brkyvz opened a new pull request #25822: [SQL] Support partitioning and 
bucketing through DataFrameWriter.save for V2 Tables
URL: https://github.com/apache/spark/pull/25822
 
 
   ### What changes were proposed in this pull request?
   
   We add a new interface, `SupportsCreateTable`, that passes partitioning 
transforms and table properties to tables that can be created without a 
catalog. Traditionally, data sources received all the information needed to 
define a table through the options set on DataFrameWriter in conjunction with 
`save`.
   
   Through this new interface, we can continue to perform the necessary checks 
for SaveMode.ErrorIfExists and SaveMode.Ignore through `save` for V2 tables. 
For example, a file-based data source such as parquet can check whether the 
target directory is empty in `SupportsCreateTable.canCreateTable` to support 
these save modes. In addition, if metadata is available for a table (e.g. the 
schema of a JDBC data source) and a table already exists for the given 
options, the data source can verify in `SupportsCreateTable.buildTable` that 
the correct schema and partitioning transforms have been provided.
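   
   As a rough illustration of the save-mode check described above, here is a 
minimal, self-contained Java sketch with no Spark dependency. It is not the 
code in this PR: `CreatableSource`, `DirectorySource`, `resolve`, the `Action` 
enum and the `path` option key are all hypothetical names, and the real 
`SupportsCreateTable.canCreateTable` signature may differ. It only models the 
idea that a file-based source treats a missing or empty target directory as 
"table can be created".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Map;
import java.util.stream.Stream;

class SaveModeSketch {
    // Only the two save modes discussed above are modeled.
    enum SaveMode { ERROR_IF_EXISTS, IGNORE }

    // What the writer should do once the save mode has been checked.
    enum Action { CREATE_AND_WRITE, SKIP, FAIL }

    // Hypothetical stand-in for SupportsCreateTable; the real signature may differ.
    interface CreatableSource {
        boolean canCreateTable(Map<String, String> options);
    }

    // File-based source: a table can be created if the target directory is
    // missing or empty. The "path" option key is an assumption.
    static final class DirectorySource implements CreatableSource {
        @Override
        public boolean canCreateTable(Map<String, String> options) {
            Path dir = Paths.get(options.get("path"));
            if (!Files.exists(dir)) {
                return true;
            }
            try (Stream<Path> entries = Files.list(dir)) {
                return !entries.findAny().isPresent();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // ErrorIfExists fails when the table already exists; Ignore skips the write.
    static Action resolve(SaveMode mode, boolean canCreate) {
        if (canCreate) {
            return Action.CREATE_AND_WRITE;
        }
        return mode == SaveMode.ERROR_IF_EXISTS ? Action.FAIL : Action.SKIP;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("save-mode-sketch");
        Map<String, String> options =
            Collections.singletonMap("path", dir.toString());
        CreatableSource source = new DirectorySource();

        // Empty directory: the table can be created.
        System.out.println(
            resolve(SaveMode.ERROR_IF_EXISTS, source.canCreateTable(options)));

        // After a data file appears, the two modes diverge.
        Files.createFile(dir.resolve("part-00000"));
        System.out.println(
            resolve(SaveMode.ERROR_IF_EXISTS, source.canCreateTable(options)));
        System.out.println(
            resolve(SaveMode.IGNORE, source.canCreateTable(options)));
    }
}
```

   Keeping the existence check inside the source lets each data source define 
what "already exists" means for its own storage, while the mode handling stays 
the same for every source.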
   
   ### Why are the changes needed?
   
   Currently, data sources that migrate to DataSource V2 cannot receive 
partitioning and bucketing information through the DataFrameWriter.save 
method, which is one of the most commonly used methods in Apache Spark.
   
   ### Does this PR introduce any user-facing change?
   
   This adds a new interface, `SupportsCreateTable`, which DataSource 
developers can implement as part of their `TableProvider` implementation to 
support the creation of tables when a catalog is not available.
   
   ### How was this patch tested?
   
   Tests in DataSourceV2DataFrameSuite
