[GitHub] [spark] cloud-fan commented on a change in pull request #25330: [SPARK-28565][SQL] DataFrameWriter saveAsTable support for V2 catalogs

GitBox Thu, 08 Aug 2019 20:07:23 -0700

cloud-fan commented on a change in pull request #25330: [SPARK-28565][SQL] DataFrameWriter saveAsTable support for V2 catalogs
URL: https://github.com/apache/spark/pull/25330#discussion_r312315345


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
 ##########
 @@ -374,8 +375,12 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     df.sparkSession.sessionState.sqlParser.parseMultipartIdentifier(tableName) match {
       case CatalogObjectIdentifier(Some(catalog), ident) =>
         insertInto(catalog, ident)
+      // TODO(SPARK-28667): Support the V2SessionCatalog
 
 Review comment:
   Not related to this PR, just a note. I've been working on improving
   V2SessionCatalog for a while, and one issue I found is that it's not easy to
   implement "if the table provider is v1, go to the v1 session catalog; if the
   table provider is v2, go to the v2 session catalog".
   
   This is easy for CREATE TABLE, because the table provider is known up front.
   It's hard for SELECT and INSERT, because we first need to look up the table
   in the Hive catalog to get its provider.
   
   I'd expect two analyzer rules to handle this:
   1. One rule in sql/catalyst, which resolves `UnresolvedRelation` to
      `UnresolvedCatalogRelation` by looking up the table in the Hive catalog.
   2. One rule in sql/core, which resolves `UnresolvedCatalogRelation` to
      `DataSourceV2Relation` if the table provider is v2. This has to be in
      sql/core because `DataSource.lookupDataSource` is in sql/core. We already
      have a rule, `FindDataSourceTable`, that resolves
      `UnresolvedCatalogRelation` to a v1 relation if the table provider is v1.
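   
   The two-step resolution described above can be sketched with a toy model.
   This is only an illustration under stated assumptions: the plan types,
   `hiveCatalog`, `v2Providers`, and the function names below are hypothetical
   stand-ins, not Spark's actual classes or rule implementations, and the v1
   and v2 branches are folded into one function for brevity even though the
   comment describes them as separate rules.

```scala
// Toy model only: every type and name here is a hypothetical stand-in,
// not one of Spark's real classes or analyzer rules.
object TwoStepResolutionSketch {
  sealed trait Plan
  final case class UnresolvedRelation(name: String) extends Plan
  final case class UnresolvedCatalogRelation(name: String, provider: String) extends Plan
  final case class V1Relation(name: String, provider: String) extends Plan
  final case class V2Relation(name: String, provider: String) extends Plan

  // Assumed stand-in for the Hive catalog: table name -> table provider.
  val hiveCatalog: Map[String, String] =
    Map("t_parquet" -> "parquet", "t_v2" -> "my.v2.Source")

  // Assumed stand-in for the provider lookup that is only available in sql/core.
  val v2Providers: Set[String] = Set("my.v2.Source")

  // Step 1 (sql/catalyst in the comment): look the table up to learn its provider.
  // Unknown tables are left unresolved.
  def lookupInCatalog(plan: Plan): Plan = plan match {
    case UnresolvedRelation(name) =>
      hiveCatalog.get(name).map(UnresolvedCatalogRelation(name, _)).getOrElse(plan)
    case other => other
  }

  // Step 2 (sql/core in the comment): choose the v1 or v2 relation by provider.
  def resolveByProvider(plan: Plan): Plan = plan match {
    case UnresolvedCatalogRelation(name, provider) if v2Providers(provider) =>
      V2Relation(name, provider)
    case UnresolvedCatalogRelation(name, provider) =>
      V1Relation(name, provider)
    case other => other
  }

  // Run the two steps in order, as an analyzer would apply the two rules.
  def resolve(plan: Plan): Plan = resolveByProvider(lookupInCatalog(plan))

  def main(args: Array[String]): Unit = {
    println(resolve(UnresolvedRelation("t_parquet")))
    println(resolve(UnresolvedRelation("t_v2")))
    println(resolve(UnresolvedRelation("missing")))
  }
}
```

   The point of the split is that step 1 needs only catalog metadata, while
   step 2 needs the provider lookup, which is why the second rule cannot live
   in sql/catalyst.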
   
   I think `DataFrameWriter` should produce `...Statement` plans, so that the
   same analyzer rules can be reused across all of these cases.
   
   

