brkyvz opened a new pull request #25822: [SQL] Support partitioning and bucketing through DataFrameWriter.save for V2 Tables
URL: https://github.com/apache/spark/pull/25822

### What changes were proposed in this pull request?

We add a new interface, `SupportsCreateTable`, that passes partitioning transforms and table properties to tables that can be created without a catalog. Traditionally, data sources received all the information needed to define a table through the options set on `DataFrameWriter` in conjunction with `save`. Through this new interface, we can continue to perform the necessary checks for `SaveMode.ErrorIfExists` and `SaveMode.Ignore` through `save` for V2 tables.

For example, a file-based data source such as parquet can check whether the target directory is empty as part of `SupportsCreateTable.canCreateTable` to support these save modes. In addition, if metadata is available for a table (e.g. the schema of a JDBC data source would be available) and a table already exists for the given options, the data source can check in `SupportsCreateTable.buildTable` that the correct schema and partitioning transforms have been provided.

### Why are the changes needed?

Currently, partitioning and bucketing information cannot be passed through the `DataFrameWriter.save` method for data sources that migrate to DataSource V2, even though `save` is one of the most commonly used methods in Apache Spark.

### Does this PR introduce any user-facing change?

This adds a new interface, `SupportsCreateTable`, which data source developers can implement as part of their `TableProvider` interface to support the creation of tables when a catalog is not available.

### How was this patch tested?

Tests in DataSourceV2DataFrameSuite
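
As a rough illustration of the save-mode checks described above, here is a self-contained Java sketch. It does not use Spark's classes, and the signatures are assumptions: the `SupportsCreateTable` interface below is a simplified stand-in that takes plain maps and string lists instead of Spark's option, schema, and transform types. `SaveModeSketch`, `DirectoryProvider`, `save`, the two-value `SaveMode` enum, and the `"path"` option key are hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SaveModeSketch {

    // Only the two save modes discussed in the description are modeled here.
    enum SaveMode { ERROR_IF_EXISTS, IGNORE }

    // Hypothetical, simplified stand-in for the proposed interface.
    // The real signatures are not given in the description and may differ.
    interface SupportsCreateTable {
        boolean canCreateTable(Map<String, String> options);

        String buildTable(Map<String, String> options,
                          List<String> partitioning,
                          Map<String, String> properties);
    }

    // File-based example: a table can be created if the target directory
    // does not exist yet or exists but is empty.
    static final class DirectoryProvider implements SupportsCreateTable {
        @Override
        public boolean canCreateTable(Map<String, String> options) {
            Path dir = Paths.get(options.get("path"));
            if (!Files.exists(dir)) {
                return true;
            }
            if (!Files.isDirectory(dir)) {
                return false;
            }
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                return !entries.iterator().hasNext();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public String buildTable(Map<String, String> options,
                                 List<String> partitioning,
                                 Map<String, String> properties) {
            // A string description stands in for a real table object.
            return "table(path=" + options.get("path")
                + ", partitioning=" + partitioning + ")";
        }
    }

    // Sketch of how a save call could apply the two modes:
    // ERROR_IF_EXISTS fails when the table cannot be created,
    // IGNORE silently skips the write in that case.
    static Optional<String> save(SupportsCreateTable provider,
                                 SaveMode mode,
                                 Map<String, String> options,
                                 List<String> partitioning) {
        if (!provider.canCreateTable(options)) {
            if (mode == SaveMode.ERROR_IF_EXISTS) {
                throw new IllegalStateException(
                    "Table already exists for options " + options);
            }
            return Optional.empty(); // IGNORE: nothing is written
        }
        return Optional.of(
            provider.buildTable(options, partitioning, Collections.emptyMap()));
    }
}
```

In this sketch the partitioning list reaches `buildTable` only after the existence check has passed, which mirrors the idea that `save` can both enforce the save mode and hand partitioning information to the data source.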
