AngersZhuuuu commented on PR #36056:
URL: https://github.com/apache/spark/pull/36056#issuecomment-1087649418

   ping @cloud-fan  To support all overwrite cases, Spark must not delete matching partitions before the job's computation runs. However, we also support custom partition paths, which are handled in the commit protocol, and for those Spark must delete matching partitions before the job commits. The problem is:
   
   1. On the committer side, we don't know the job's details (e.g. whether it is an overwrite, or an overwrite of a non-partitioned table)
   2. Also on the committer side, the committer doesn't know how to delete the matching partitions
   3. Custom partition path overwrite is handled in the commit protocol's `commitJob`
   
   So the best processing steps are:
   
   1. Do not delete matching partitions up front
   2. Execute the job and write the data to a temp path
   3. Delete the matching partitions
   4. Call `commitJob`; in this step, data for custom partition paths is written to its custom locations and the remaining data is committed to the staging dir
   5. Rename the staging dir to the target location
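   The ordering above can be sketched with an in-memory "filesystem" (a path-to-contents map). All names here (`OverwriteSketch`, `runOverwriteJob`, the paths) are hypothetical and exist only to illustrate the sequencing: write to temp first, delete matching partitions only after the tasks succeed, then commit (custom paths + staging) and finally rename staging to the target. This is not the actual `FileCommitProtocol` API, just a sketch of the proposal.

   ```scala
   import scala.collection.mutable

   object OverwriteSketch {
     // path -> file contents; a stand-in for a real filesystem
     type Fs = mutable.Map[String, String]

     def runOverwriteJob(
         fs: Fs,
         target: String,                      // final table location
         staging: String,                     // staging dir, renamed to target at the end
         matchingPartitions: Seq[String],     // partitions the overwrite matches
         newData: Map[String, String],        // partition -> new data
         customPaths: Map[String, String]     // partition -> custom partition path
     ): Unit = {
       // Steps 1-2: do NOT delete matching partitions yet;
       // tasks write their output to a temp path
       val temp = s"$staging/_temp"
       newData.foreach { case (part, data) => fs(s"$temp/$part") = data }

       // Step 3: only after all tasks succeed, delete the matching
       // partitions (both under the target and under custom paths)
       matchingPartitions.foreach { part =>
         fs.keys.filter(_.startsWith(s"$target/$part")).toSeq.foreach(fs.remove)
         customPaths.get(part).foreach { custom =>
           fs.keys.filter(_.startsWith(custom)).toSeq.foreach(fs.remove)
         }
       }

       // Step 4: commitJob -- custom-partition data goes straight to its
       // custom location, everything else into the staging dir
       newData.foreach { case (part, data) =>
         fs.remove(s"$temp/$part")
         customPaths.get(part) match {
           case Some(custom) => fs(s"$custom/part-00000") = data
           case None         => fs(s"$staging/$part/part-00000") = data
         }
       }

       // Step 5: rename the staging dir to the target location
       fs.keys
         .filter(k => k.startsWith(s"$staging/") && !k.startsWith(temp))
         .toSeq
         .foreach { k =>
           val data = fs.remove(k).get
           fs(target + k.stripPrefix(staging)) = data
         }
     }
   }
   ```

   Run against a toy state with one normal and one custom partition, the sketch replaces both partitions' old data, writes the custom partition to its custom path, and leaves nothing behind in the staging dir.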


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
