cloud-fan edited a comment on issue #25822: [SPARK-29127][SQL] Support partitioning and bucketing through DataFrameWriter.save for V2 Tables
URL: https://github.com/apache/spark/pull/25822#issuecomment-532781291

It's worthwhile to discuss the usefulness of `TableProvider`. So far I see 2 use cases:

1. `DataFrameReader.load()` with only append/overwrite save mode. This was from a previous decision. If we revisit it and want to support all save modes, `TableProvider` can't be used here.
2. CREATE TABLE USING with session catalog (similar to Hive EXTERNAL/MANAGED TABLE): The core idea is to keep metadata in Spark and keep data externally. `TableProvider` is a good fit because we don't need to create/alter/drop tables in the external systems; we only register external data as tables in Spark. This is the major use case of DS V1 and many users are familiar with it, so I think it's better to support it with DS v2.
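To make the second use case concrete, here is a minimal toy model in plain Python of "metadata in Spark, data kept externally". It is not Spark code and does not use the `TableProvider` API; `ToySessionCatalog`, `TableMetadata`, `external_store`, the `toy_format` provider name, and the `/data/events` location are all hypothetical names invented for illustration. The only point it shows is that creating or dropping a table changes the catalog entry and leaves the external data alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMetadata:
    """What the (hypothetical) session catalog remembers about a table."""
    name: str
    provider: str   # stands in for the format named after USING
    location: str   # where the external data lives


class ToySessionCatalog:
    """Toy stand-in for a session catalog: stores metadata only, never data."""

    def __init__(self):
        self._tables = {}

    def create_table(self, name, provider, location):
        # Registering a table records metadata; the external system is not touched.
        if name in self._tables:
            raise ValueError(f"table {name!r} already exists")
        self._tables[name] = TableMetadata(name, provider, location)

    def drop_table(self, name):
        # Dropping removes only the catalog entry; external data stays in place.
        del self._tables[name]

    def lookup(self, name):
        return self._tables[name]

    def table_names(self):
        return sorted(self._tables)


# Toy "external system": a mapping from location to rows, owned outside the catalog.
external_store = {"/data/events": [("a", 1), ("b", 2)]}

catalog = ToySessionCatalog()
catalog.create_table("events", provider="toy_format", location="/data/events")

# Reading resolves metadata in the catalog, then fetches rows from the external store.
rows = external_store[catalog.lookup("events").location]

# Dropping the table forgets the registration but does not delete the data.
catalog.drop_table("events")
```

After the drop, `catalog.table_names()` is empty while `external_store` still holds the rows, which mirrors the "register external data as tables" behavior described in the comment.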
