AngersZhuuuu commented on PR #36056: URL: https://github.com/apache/spark/pull/36056#issuecomment-1087649418
ping @cloud-fan To reach the goal of supporting all overwrites, Spark can't delete matching partitions before the job runs. But we also support custom partition paths, and that part is handled in the commit protocol, where Spark must delete matching partitions before committing the job. The problems are:

1. On the committer side, we don't know the job's details (e.g. whether it's an overwrite, or whether it's a non-partitioned table overwrite).
2. The committer side also doesn't know how to delete the matching partitions.
3. Custom-partition-path overwrite is handled in the commit protocol's `commitJob`.

So the best processing steps are:

1. Don't delete matching partitions up front.
2. Execute the job and write data to a temp path.
3. Delete the matching partitions.
4. Call `commitJob`; in this step, custom partition paths are written and data is committed to the staging dir.
5. Rename the staging dir to the target location.
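The five steps above can be sketched with plain filesystem operations. This is only an illustration using local directories in place of HDFS paths; the partition layout (`p=1`) and file names are hypothetical, not Spark's actual commit-protocol internals.

```scala
import java.nio.file.Files

// Local-filesystem sketch of the proposed commit flow; directory and
// file names here are illustrative, not Spark's real layout.

val target  = Files.createTempDirectory("table")   // final table location
val staging = Files.createTempDirectory("staging") // commit-protocol staging dir

// A pre-existing partition that the overwrite matches.
val oldPart = Files.createDirectories(target.resolve("p=1"))
Files.write(oldPart.resolve("old.parquet"), "old data".getBytes)

// Steps 1-2: do NOT delete matching partitions yet; run the job and
// write its output under the staging dir first.
val newPart = Files.createDirectories(staging.resolve("p=1"))
Files.write(newPart.resolve("new.parquet"), "new data".getBytes)

// Step 3: only after the new data is safely written, delete the
// matching partition from the target.
Files.delete(oldPart.resolve("old.parquet"))
Files.delete(oldPart)

// Steps 4-5: commitJob -- move the committed output from the staging
// dir to the target location (a custom partition path would be moved
// to its configured location in this step instead).
Files.move(staging.resolve("p=1"), target.resolve("p=1"))

val result = new String(Files.readAllBytes(target.resolve("p=1").resolve("new.parquet")))
```

The key property this ordering buys is that a job failure before step 3 leaves the old partition data untouched, which is what makes the overwrite safe to retry.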
