[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290983004
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
 ##
 @@ -36,17 +36,18 @@ import org.apache.spark.sql.util.SchemaUtils
  * Replaces [[UnresolvedRelation]]s if the plan is for direct query on files.
  */
 class ResolveSQLOnFile(sparkSession: SparkSession) extends Rule[LogicalPlan] {
+
   private def maybeSQLFile(u: UnresolvedRelation): Boolean = {
-sparkSession.sessionState.conf.runSQLonFile && 
u.tableIdentifier.database.isDefined
+sparkSession.sessionState.conf.runSQLonFile && u.multipartIdentifier.size 
== 2
 
 Review comment:
   Sounds good.
   
   What should be the expected behavior for this test in sql.SQLQuerySuite."run 
sql directly on files"?
   ```
   e = intercept[AnalysisException] {
 sql("select * from json.invalid_file")
   }
   assert(e.message.contains("Path does not exist"))
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290902457
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   When attempting to move `DataSourceV2Relation` to Catalyst, I discovered 
[SPARK-27961](https://issues.apache.org/jira/browse/SPARK-27961)
   "DataSourceV2Relation should not have refresh method". The refresh method 
was added by #24401.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290902457
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   When attempting to move `DataSourceV2Relation` to Catalyst, I discovered 
[SPARK-27961](https://issues.apache.org/jira/browse/SPARK-27961)
   "DataSourceV2Relation should not have refresh method".


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290595780
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -728,20 +737,16 @@ class Analyzer(
 //and the default database is only used to look up a view);
 // 3. Use the currentDb of the SessionCatalog.
 private def lookupTableFromCatalog(
+tableIdentifier: TableIdentifier,
 u: UnresolvedRelation,
 defaultDatabase: Option[String] = None): LogicalPlan = {
-  val tableIdentWithDb = u.tableIdentifier.copy(
-database = u.tableIdentifier.database.orElse(defaultDatabase))
+  val tableIdentWithDb = tableIdentifier.copy(
+database = tableIdentifier.database.orElse(defaultDatabase))
   try {
 catalog.lookupRelation(tableIdentWithDb)
   } catch {
-case e: NoSuchTableException =>
-  u.failAnalysis(s"Table or view not found: 
${tableIdentWithDb.unquotedString}", e)
-// If the database is defined and that database is not found, throw an 
AnalysisException.
-// Note that if the database is not defined, it is possible we are 
looking up a temp view.
-case e: NoSuchDatabaseException =>
-  u.failAnalysis(s"Table or view not found: 
${tableIdentWithDb.unquotedString}, the " +
-s"database ${e.db} doesn't exist.", e)
 
 Review comment:
   Unfortunately not possible since Analysis exception is thrown by 
checkAnalysis now:
   ```
 case u: UnresolvedRelation =>
   u.failAnalysis(s"Table or view not found: 
${u.multipartIdentifier.quoted}")
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290592441
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 ##
 @@ -185,6 +186,8 @@ abstract class BaseSessionStateBuilder(
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   Good questions. Please check out @rdblue's #24768 first.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290591930
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   Thanks for the suggestion. Let me give it some thought.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-05 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290591148
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
 ##
 @@ -181,6 +182,11 @@ case class CreateViewCommand(
* Permanent views are not allowed to reference temp objects, including temp 
function and views
*/
   private def verifyTemporaryObjectsNotExists(sparkSession: SparkSession): 
Unit = {
+val lookup = new LookupCatalog {
+  override protected def lookupCatalog(name: String): CatalogPlugin = 
sparkSession.catalog(name)
+}
+import lookup._
 
 Review comment:
   Good idea!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290583573
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala
 ##
 @@ -153,7 +153,8 @@ object HiveAnalysis extends Rule[LogicalPlan] {
 case CreateTable(tableDesc, mode, None) if DDLUtils.isHiveTable(tableDesc) 
=>
   CreateTableCommand(tableDesc, ignoreIfExists = mode == SaveMode.Ignore)
 
-case CreateTable(tableDesc, mode, Some(query)) if 
DDLUtils.isHiveTable(tableDesc) =>
+case CreateTable(tableDesc, mode, Some(query))
+if DDLUtils.isHiveTable(tableDesc) && query.resolved =>
 
 Review comment:
   This PR prevents lookupTableFromCatalog from throwing NoSuchTableException 
right away. Instead, it relies on checkAnalysis to throw an exception for 
UnresolvedRelation.
   
   The test hive.SQLQuerySuite."double nested data" would fail in the following 
sql without this change:
   ```
   CREATE TABLE test_ctas_1234 AS SELECT * from notexists
   ```
   HiveAnalysis gets to run before checkAnalysis, thus exposing this bug where 
query.output is used before query is resolved.
   
   So wouldn't say it is a totally separate issue.
   In addition, outside of this PR, it'd be hard to write a unit test.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290550798
 
 

 ##
 File path: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala
 ##
 @@ -91,6 +92,8 @@ class HiveSessionStateBuilder(session: SparkSession, 
parentState: Option[Session
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   Same answer as above.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290550754
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala
 ##
 @@ -185,6 +186,8 @@ abstract class BaseSessionStateBuilder(
 V2WriteSupportCheck +:
 V2StreamingScanSupportCheck +:
 customCheckRules
+
+override protected def lookupCatalog(name: String): CatalogPlugin = 
session.catalog(name)
 
 Review comment:
   SparkSession.catalog returns CatalogPlugin for DSv2 while 
BaseSessionStateBuilder.catalog is a SessionCatalog.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290550116
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
 ##
 @@ -98,8 +100,22 @@ case class DataSourceResolution(
 
 case DropViewStatement(AsTableIdentifier(tableName), ifExists) =>
   DropTableCommand(tableName, ifExists, isView = true, purge = false)
+
+case u @ UnresolvedRelation(CatalogObjectIdentifier(Some(catalog), ident)) 
=>
 
 Review comment:
   All DSv2 resolutions are kept in DataSourceResolution that requires 
LookupCatalog mixin.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290522087
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
 ##
 @@ -38,19 +38,32 @@ class UnresolvedException[TreeType <: TreeNode[_]](tree: 
TreeType, function: Str
 /**
  * Holds the name of a relation that has yet to be looked up in a catalog.
  *
- * @param tableIdentifier table name
+ * @param multipartIdentifier table name
  */
-case class UnresolvedRelation(tableIdentifier: TableIdentifier)
-  extends LeafNode {
+case class UnresolvedRelation(multipartIdentifier: Seq[String]) extends 
LeafNode {
+  import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
 
   /** Returns a `.` separated name for this relation. */
-  def tableName: String = tableIdentifier.unquotedString
+  def tableName: String = multipartIdentifier.quoted
 
   override def output: Seq[Attribute] = Nil
 
   override lazy val resolved = false
 }
 
+object UnresolvedRelation {
+  def apply(tableIdentifier: TableIdentifier): UnresolvedRelation = {
 
 Review comment:
   I am working on a PR for SPARK-27845 to add Insert. Almost done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290383294
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 ##
 @@ -50,11 +51,15 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
-  class ResolveJoinStrategyHints(conf: SQLConf) extends Rule[LogicalPlan] {
+  class ResolveJoinStrategyHints(
+  conf: SQLConf,
+  catalogLookup: String => CatalogPlugin) extends Rule[LogicalPlan] with 
LookupCatalog {
 
 Review comment:
   Like suggested approach of passing LookupCatalog instance because it will 
make it easier to add feature to LookCatalog e.g. default catalog name.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290379810
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala
 ##
 @@ -50,11 +51,15 @@ object ResolveHints {
*
* This rule must happen before common table expressions.
*/
-  class ResolveJoinStrategyHints(conf: SQLConf) extends Rule[LogicalPlan] {
+  class ResolveJoinStrategyHints(
+  conf: SQLConf,
+  catalogLookup: String => CatalogPlugin) extends Rule[LogicalPlan] with 
LookupCatalog {
 
 Review comment:
   Not used. Removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] DataSourceV2: Select from multiple catalogs

2019-06-04 Thread GitBox
jzhuge commented on a change in pull request #24741: [SPARK-27322][SQL] 
DataSourceV2: Select from multiple catalogs
URL: https://github.com/apache/spark/pull/24741#discussion_r290379652
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala
 ##
 @@ -35,18 +36,21 @@ import org.apache.spark.sql.util.SchemaUtils
 /**
  * Replaces [[UnresolvedRelation]]s if the plan is for direct query on files.
  */
-class ResolveSQLOnFile(sparkSession: SparkSession) extends Rule[LogicalPlan] {
+class ResolveSQLOnFile(sparkSession: SparkSession) extends Rule[LogicalPlan] 
with LookupCatalog {
+
+  override protected def lookupCatalog(name: String): CatalogPlugin = 
sparkSession.catalog(name)
 
 Review comment:
   Not used. Removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org