[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290589946
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 ##
 @@ -185,6 +186,8 @@ abstract class BaseSessionStateBuilder(
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   I might not get the difference. How to avoid misusing them?
   
   - One is our global catalog? Another is the local catalogs? 
   - What is the semantics of `lookupCatalog` for SessionCatalog? 
   - What is the semantics of `lookupCatalog` for CatalogPlugin for DSv2?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290587603
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   ```
 new FindDataSourceTable(session) +:
   new ResolveSQLOnFile(session) +:
   new FallBackFileSourceV2(session) +:
   DataSourceResolution(conf, session.catalog(_)) +:
   ```
   
   I do not know who make this decision, but semantically, this could be 
different. one is before the rule ResolveSQLOnFile; another is after the rule 
ResolveSQLOnFile


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290530241
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -728,20 +737,16 @@ class Analyzer(
 //and the default database is only used to look up a view);
 // 3. Use the currentDb of the SessionCatalog.
 private def lookupTableFromCatalog(
+tableIdentifier: TableIdentifier,
 u: UnresolvedRelation,
 defaultDatabase: Option[String] = None): LogicalPlan = {
-  val tableIdentWithDb = u.tableIdentifier.copy(
-database = u.tableIdentifier.database.orElse(defaultDatabase))
+  val tableIdentWithDb = tableIdentifier.copy(
+database = tableIdentifier.database.orElse(defaultDatabase))
   try {
 catalog.lookupRelation(tableIdentWithDb)
   } catch {
-case e: NoSuchTableException =>
-  u.failAnalysis(s"Table or view not found: 
${tableIdentWithDb.unquotedString}", e)
-// If the database is defined and that database is not found, throw an 
AnalysisException.
-// Note that if the database is not defined, it is possible we are 
looking up a temp view.
-case e: NoSuchDatabaseException =>
-  u.failAnalysis(s"Table or view not found: 
${tableIdentWithDb.unquotedString}, the " +
-s"database ${e.db} doesn't exist.", e)
 
 Review comment:
   The original error messages are still helpful. Let us keep it. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290530304
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ##
 @@ -38,19 +38,32 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: 
TreeType, function: Str
 /**
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
- * @param tableIdentifier table name
+ * @param multipartIdentifier table name
  */
-case class UnresolvedRelation(tableIdentifier: TableIdentifier)
-  extends LeafNode {
+case class UnresolvedRelation(multipartIdentifier: Seq[String]) extends 
LeafNode {
+  import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
 
   /** Returns a `.` separated name for this relation. */
-  def tableName: String = tableIdentifier.unquotedString
+  def tableName: String = multipartIdentifier.quoted
 
   override def output: Seq[Attribute] = Nil
 
   override lazy val resolved = false
 }
 
+object UnresolvedRelation {
+  def apply(tableIdentifier: TableIdentifier): UnresolvedRelation = {
 
 Review comment:
   Thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290529952
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala
 ##
 @@ -91,6 +92,8 @@ class HiveSessionStateBuilder(session: SparkSession, 
parentState: Option[Session
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   The same question here. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290529728
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 ##
 @@ -185,6 +186,8 @@ abstract class BaseSessionStateBuilder(
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   Why not `catalog(name)`? Any difference between `catalog(name)` and 
`session.catalog(name)`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290528094
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   Why in this rule, instead of `FindDataSourceTable`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290524858
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
 ##
 @@ -190,10 +196,11 @@ case class CreateViewCommand(
   // package (e.g., HiveGenericUDF).
   child.collect {
 // Disallow creating permanent views based on temporary views.
-case s: UnresolvedRelation
-  if 
sparkSession.sessionState.catalog.isTemporaryTable(s.tableIdentifier) =>
+case UnresolvedRelation(AsTableIdentifier(ident))
+if sparkSession.sessionState.catalog.isTemporaryTable(ident) =>
+  // temporary views are only stored in the session catalog
   throw new AnalysisException(s"Not allowed to create a permanent view 
$name by " +
-s"referencing a temporary view ${s.tableIdentifier}")
+  s"referencing a temporary view $ident")
 
 Review comment:
   Nit: indent. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290524760
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
 ##
 @@ -181,6 +182,11 @@ case class CreateViewCommand(
* Permanent views are not allowed to reference temp objects, including temp 
function and views
*/
   private def verifyTemporaryObjectsNotExists(sparkSession: SparkSession): 
Unit = {
+val lookup = new LookupCatalog {
+  override protected def lookupCatalog(name: String): CatalogPlugin = 
sparkSession.catalog(name)
+}
+import lookup._
 
 Review comment:
   Do we need to build a new `LookupCatalog`?
   
   How about directly using the analyzer's AsTableIdentifier?
   > sparkSession.sessionState.analyzer.AsTableIdentifier(ident))


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290516629
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ##
 @@ -38,19 +38,32 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: 
TreeType, function: Str
 /**
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
- * @param tableIdentifier table name
+ * @param multipartIdentifier table name
  */
-case class UnresolvedRelation(tableIdentifier: TableIdentifier)
-  extends LeafNode {
+case class UnresolvedRelation(multipartIdentifier: Seq[String]) extends 
LeafNode {
+  import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
 
   /** Returns a `.` separated name for this relation. */
-  def tableName: String = tableIdentifier.unquotedString
+  def tableName: String = multipartIdentifier.quoted
 
   override def output: Seq[Attribute] = Nil
 
   override lazy val resolved = false
 }
 
+object UnresolvedRelation {
+  def apply(tableIdentifier: TableIdentifier): UnresolvedRelation = {
 
 Review comment:
   I check the usage of this. It sounds like `insert` is not covered? What is 
your following plan? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290515164
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 ##
 @@ -50,7 +50,8 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
-  class ResolveJoinStrategyHints(conf: SQLConf) extends Rule[LogicalPlan] {
+  class ResolveJoinStrategyHints(
+  conf: SQLConf) extends Rule[LogicalPlan] {
 
 Review comment:
   Nit: revert it back?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
gatorsmile commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290514686
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
 ##
 @@ -153,7 +153,8 @@ object HiveAnalysis extends Rule[LogicalPlan] {
 case CreateTable(tableDesc, mode, None) if DDLUtils.isHiveTable(tableDesc) 
=>
   CreateTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 
-case CreateTable(tableDesc, mode, Some(query)) if 
DDLUtils.isHiveTable(tableDesc) =>
+case CreateTable(tableDesc, mode, Some(query))
+if DDLUtils.isHiveTable(tableDesc) && query.resolved =>
 
 Review comment:
   This sounds like a separate issue. Could we submit a separate PR? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org