karuppayya commented on issue #25995: [SPARK-29324][SQL] Fix overwrite behaviour for saveAsTable URL: https://github.com/apache/spark/pull/25995#issuecomment-539013715

When I write to a partitioned table (stored in a file format other than Parquet) using `saveAsTable` with `partitionOverwriteMode` set to `dynamic`, the affected partitions may end up with data in Parquet format (the default) while the other partitions keep their original format, since the user has to specify the file format explicitly when writing through this API. I think this operation should either be disallowed when the file format is not specified, or it should reuse the format of the dropped table.

Also, during an `overwrite` + `saveAsTable` operation, we drop the table and recreate it:
1. In the case of a Hive table, we might transparently change a Hive table (provider = hive) into a datasource table. Can this have any implication when the table is read with a Spark version that has issues in ORC's datasource read flow and the user wants to use the Hive read flow instead?
2. In the case of `partitionOverwriteMode=dynamic`, we write only to particular partitions. Because of the drop, we lose the partition information. Also, all partition-level stats are lost (if the table was already analyzed). Recomputing these is expensive for larger tables.

@cloud-fan Any thoughts on these?
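The format-mismatch scenario described above can be sketched roughly as follows. This is a hypothetical repro, not code from the PR; the table name `t`, the columns, and the data are illustrative, and the behavior assumed in the comments is the one the comment reports (Spark falling back to the default Parquet source when `format()` is omitted):

```scala
import spark.implicits._

// Dynamic partition overwrite: only the partitions present in the
// incoming data are rewritten.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// Initial write: the whole table is stored as ORC.
Seq((1, "a"), (2, "b")).toDF("id", "part")
  .write.format("orc").partitionBy("part")
  .mode("overwrite").saveAsTable("t")

// Later overwrite of one partition, with no format() specified:
// per the issue, the rewritten partition "a" may be written in the
// default format (Parquet) while partition "b" remains ORC.
Seq((3, "a")).toDF("id", "part")
  .write.partitionBy("part")
  .mode("overwrite").saveAsTable("t")
```

If this sketch is accurate, a subsequent full-table scan would have to read a table whose partitions mix ORC and Parquet files, which is why disallowing the format-less write (or inheriting the dropped table's format) seems safer.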
