karuppayya commented on issue #25995: [SPARK-29324][SQL] Fix overwrite 
behaviour for saveAsTable 
URL: https://github.com/apache/spark/pull/25995#issuecomment-539013715
 
 
   1. When I write to a partitioned table (with a file format other than Parquet) using `saveAsTable` with `partitionOverwriteMode` set to `dynamic`, the affected partitions may end up with data in Parquet format (the default) while other partitions keep data in the original format. The user needs to specify the file format explicitly when writing through this API.
   I think this operation should either be disallowed when the file format is not specified, or should reuse the format from the dropped table.
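
   A minimal sketch of the scenario, assuming a `spark-shell` session with Hive support (the `events` table name and the sample values are illustrative, not from this PR):
   
   ```scala
   // Enable dynamic partition overwrite.
   spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
   
   // Existing table stored as ORC, partitioned by `dt`.
   spark.sql("""
     CREATE TABLE events (id INT, dt STRING)
     USING orc
     PARTITIONED BY (dt)
   """)
   spark.sql("INSERT INTO events VALUES (1, '2019-10-01'), (2, '2019-10-02')")
   
   import spark.implicits._
   
   // Overwrite touching only dt=2019-10-02. No .format(...) is given, so
   // the default source (Parquet) is used for the newly written data,
   // which is how the table can end up with mixed-format partitions.
   Seq((3, "2019-10-02")).toDF("id", "dt")
     .write
     .mode("overwrite")
     .partitionBy("dt")
     .saveAsTable("events")
   ```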
   
   Also, during an `overwrite` + `saveAsTable` operation, we drop the table and recreate it:
   1. In the case of a Hive table, we might transparently change a Hive (provider = hive) table into a datasource table. Can this have implications when the table is read with a Spark version that has issues in ORC's data source flow, where the user would want to use the Hive read flow?
   2. In the case of `partitionOverwriteMode=dynamic`, we write to particular partitions only. Due to the drop, we lose the partition information. Also, all partition-level stats are lost (if the table was already analyzed). Recomputing these is expensive for larger tables.
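   
   To illustrate point 2, a hedged sketch (assuming a partitioned table named `events` that was previously analyzed; the names are illustrative):
   
   ```scala
   // Partition-level statistics computed by ANALYZE live in the metastore
   // entry for the table; dropping and recreating the table discards them.
   spark.sql("ANALYZE TABLE events PARTITION (dt) COMPUTE STATISTICS")
   spark.sql("DESCRIBE EXTENDED events PARTITION (dt='2019-10-01')").show(false)
   
   // After an overwrite + saveAsTable (drop + recreate), the same DESCRIBE
   // no longer reports the previously computed partition statistics, and
   // ANALYZE must be re-run over the whole (possibly large) table.
   ```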
   
   @cloud-fan Any thoughts on these?
   
