rdblue commented on a change in pull request #24768: [SPARK-27919][SQL] Add v2 session catalog
URL: https://github.com/apache/spark/pull/24768#discussion_r299727339
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceResolution.scala
##########
@@ -26,64 +26,73 @@ import org.apache.spark.sql.catalog.v2.{CatalogPlugin, Identifier, LookupCatalog
 import org.apache.spark.sql.catalog.v2.expressions.Transform
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.catalyst.analysis.CastSupport
-import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, CatalogTableType, CatalogUtils}
+import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable, CatalogTableType, CatalogUtils, UnresolvedCatalogRelation}
 import org.apache.spark.sql.catalyst.plans.logical.{CreateTableAsSelect, CreateV2Table, DropTable, LogicalPlan}
 import org.apache.spark.sql.catalyst.plans.logical.sql.{AlterTableAddColumnsStatement, AlterTableSetLocationStatement, AlterTableSetPropertiesStatement, AlterTableUnsetPropertiesStatement, AlterViewSetPropertiesStatement, AlterViewUnsetPropertiesStatement, CreateTableAsSelectStatement, CreateTableStatement, DropTableStatement, DropViewStatement, QualifiedColType}
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.execution.command.{AlterTableAddColumnsCommand, AlterTableSetLocationCommand, AlterTableSetPropertiesCommand, AlterTableUnsetPropertiesCommand, DropTableCommand}
+import org.apache.spark.sql.execution.datasources.v2.{CatalogTableAsV2, DataSourceV2Relation}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.sources.v2.TableProvider
 import org.apache.spark.sql.types.{HIVE_TYPE_STRING, HiveStringType, MetadataBuilder, StructField, StructType}

 case class DataSourceResolution(
     conf: SQLConf,
-    findCatalog: String => CatalogPlugin)
-  extends Rule[LogicalPlan] with CastSupport with LookupCatalog {
+    lookup: LookupCatalog)
+  extends Rule[LogicalPlan] with CastSupport {

   import org.apache.spark.sql.catalog.v2.CatalogV2Implicits._
+  import lookup._

-  override protected def lookupCatalog(name: String): CatalogPlugin = findCatalog(name)
-
-  def defaultCatalog: Option[CatalogPlugin] = conf.defaultV2Catalog.map(findCatalog)
+  lazy val v2SessionCatalog: CatalogPlugin = lookup.sessionCatalog
+    .getOrElse(throw new AnalysisException("No v2 session catalog implementation is available"))

   override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
     case CreateTableStatement(
         AsTableIdentifier(table), schema, partitionCols, bucketSpec, properties,
         V1WriteProvider(provider), options, location, comment, ifNotExists) =>
-
+      // the source is v1, the identifier has no catalog, and there is no default v2 catalog
Review comment:
After this PR, the rules to determine the catalog responsible for an
identifier are:
1. If the identifier starts with a known catalog, use it
2. If there is a configured default v2 catalog, use that catalog
3. Otherwise, the session catalog is responsible for the identifier
Rules 1 and 2 are implemented by the `AsTableIdentifier` and `CatalogObjectIdentifier` extractors. `AsTableIdentifier` will not match and convert to a v1 identifier when a v2 catalog is responsible for the table; in those cases, `CatalogObjectIdentifier` always returns a catalog and an `Identifier`.
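To make the ordering concrete, here is a rough, self-contained sketch of rules 1–3. Everything in it (`resolveCatalogFor`, `knownCatalogs`, `defaultV2Catalog`, the local `Ident` type) is illustrative shorthand, not the actual `LookupCatalog` extractor API:

```scala
// Illustrative sketch of the resolution order, not the real LookupCatalog code.
object CatalogResolutionSketch {
  final case class Ident(namespace: Seq[String], name: String)

  sealed trait ResolvedCatalog
  final case class V2Catalog(name: String) extends ResolvedCatalog
  case object SessionCatalog extends ResolvedCatalog

  // `parts` is a non-empty multi-part identifier, e.g. Seq("cat", "db", "tbl").
  def resolveCatalogFor(
      parts: Seq[String],
      knownCatalogs: Set[String],        // catalogs registered with the session
      defaultV2Catalog: Option[String]   // configured default v2 catalog, if any
    ): (ResolvedCatalog, Ident) = parts match {
    // Rule 1: the first part names a known catalog, so that catalog is responsible
    case head +: rest if knownCatalogs.contains(head) && rest.nonEmpty =>
      (V2Catalog(head), Ident(rest.init, rest.last))
    // Rule 2: no catalog in the identifier, but a default v2 catalog is configured
    case _ if defaultV2Catalog.isDefined =>
      (V2Catalog(defaultV2Catalog.get), Ident(parts.init, parts.last))
    // Rule 3: otherwise the session catalog is responsible
    case _ =>
      (SessionCatalog, Ident(parts.init, parts.last))
  }
}
```

For example, `resolveCatalogFor(Seq("testcat", "db", "t"), Set("testcat"), None)` resolves to the v2 catalog `testcat` (rule 1), while `resolveCatalogFor(Seq("db", "t"), Set("testcat"), None)` falls through to the session catalog (rule 3). That last case is the one this `CreateTableStatement` rule is about.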
If the session catalog is responsible for the identifier, Spark still needs to decide whether to use the session catalog directly, or to use a v2 plan backed by the `V2SessionCatalog` because the source is v2. That is what the `V1WriteProvider` matcher decides.
Putting all of that together, this rule matches when the source is v1 (use the session catalog directly), the identifier has no catalog (rule 1 does not apply), and there is no default v2 catalog (rule 2 does not apply).
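For the v1-vs-v2 split on the write side, the essence of the check is whether the named source implements the v2 `TableProvider` interface. A minimal sketch of that decision, assuming a hypothetical `lookupProviderClass` resolver (the real `V1WriteProvider` matcher lives in this file and may apply additional checks):

```scala
// Illustrative stand-in for the decision V1WriteProvider makes.
// `lookupProviderClass` is a hypothetical resolver from a source name
// (e.g. "parquet") to its implementation class; the real rule uses
// Spark's own data source lookup.
import org.apache.spark.sql.sources.v2.TableProvider

class V1WriteProviderSketch(lookupProviderClass: String => Class[_]) {
  // Matches (returns Some) only when the source does NOT implement the v2
  // TableProvider interface, i.e. the write has to go through the v1 path
  // that uses the session catalog directly.
  def unapply(provider: String): Option[String] = {
    val providerClass = lookupProviderClass(provider)
    if (classOf[TableProvider].isAssignableFrom(providerClass)) None
    else Some(provider)
  }
}
```

Read together with the sketch above, the pattern in the diff converts to a v1 plan only when both extractors match: no catalog in the identifier and no default v2 catalog (so the session catalog is responsible), and a source with no v2 write path.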