cxzl25 commented on pull request #34493:
URL: https://github.com/apache/spark/pull/34493#issuecomment-962812348


   > Hmm, my question is, as we are going to overwrite the table partitions, 
why we need to prevent data to be deleted? Any other delete-like command, I 
think if any failure happens during deletion, there will be some data that are 
already deleted before the failure. I think we don't provide atomicity 
guarantee for this command, right?
   
   Yes, I agree with you: the operation is not guaranteed to be atomic, and data that has already been deleted when a failure occurs is not guaranteed to be restored.

   But in this case, when the number of dynamic partitions exceeds `hive.exec.max.dynamic.partitions`, Spark deletes the existing partition data first; the partition count is only checked afterwards, when `client.loadDynamicPartitions` loads the data, and the command fails immediately at that point. No data is written to the partitions, so the deleted data is not replaced.
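   
   A minimal sketch of how this plays out, assuming a hypothetical Hive-backed table `sales` partitioned by `dt` (table, column, and limit values are illustrative, not taken from the report):
   
   ```sql
   -- Hypothetical limit; whether a session-level SET reaches the Hive client
   -- depends on the deployment (it may need to be set in hive-site.xml or via
   -- spark.hadoop.* instead).
   SET hive.exec.dynamic.partition.mode=nonstrict;
   SET hive.exec.max.dynamic.partitions=100;
   
   -- Suppose the SELECT produces 500 distinct dt values. Spark first deletes the
   -- existing partition data, then client.loadDynamicPartitions rejects the load
   -- because 500 > 100. The statement fails and nothing is written back, so the
   -- deleted partitions are simply gone.
   INSERT OVERWRITE TABLE sales PARTITION (dt)
   SELECT amount, dt FROM staging_sales;
   ```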
   
   The user sees that the operation did not succeed, and would reasonably expect the original data to still be there.
   
   The user then has to check whether the number of partitions meets expectations: if it does, the Hive configuration needs to be adjusted; if it does not, the SQL logic needs to be modified (see the sketch below).
   Re-running the SQL also takes time, and during that period the data cannot be read.
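   
   If the partition count does meet expectations, a possible remediation sketch (values are hypothetical, and as above the settings may need to go into hive-site.xml or spark.hadoop.* rather than a session-level SET):
   
   ```sql
   -- Raise the Hive limits, then re-run the overwrite; until it finishes,
   -- the already-deleted partitions remain unreadable.
   SET hive.exec.max.dynamic.partitions=1000;
   SET hive.exec.max.dynamic.partitions.pernode=500;
   
   INSERT OVERWRITE TABLE sales PARTITION (dt)
   SELECT amount, dt FROM staging_sales;
   ```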
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


