AngersZhuuuu commented on PR #36056: URL: https://github.com/apache/spark/pull/36056#issuecomment-1087649418
ping @cloud-fan To reach the goal of supporting all overwrites, Spark can't delete matching partitions before the job runs. But we also support custom partition paths, and that part is handled in the commit protocol, where Spark must delete matching partitions before committing the job. The problems are:

1. On the committer side, we don't know the job's details (e.g. whether it's an overwrite, or whether it's a non-partitioned table overwrite).
2. The committer side also doesn't know how to delete the matching partitions.
3. Custom-partition-path overwrite is handled in the commit protocol's `commitJob`.

So the best processing steps are:

1. Don't delete matching partitions up front.
2. Execute the job and write data to a temp path.
3. Delete the matching partitions.
4. Call `commitJob`; in this step, custom partition paths are written and data is committed to the staging dir.
5. Rename the staging dir to the target location.
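The five steps above can be sketched with plain filesystem operations. This is only an illustration using local directories in place of HDFS paths; the partition layout (`p=1`) and file names are hypothetical, not Spark's actual commit-protocol internals.

```scala
import java.nio.file.Files

// Local-filesystem sketch of the proposed commit flow; directory and
// file names here are illustrative, not Spark's real layout.

val target  = Files.createTempDirectory("table")   // final table location
val staging = Files.createTempDirectory("staging") // commit-protocol staging dir

// A pre-existing partition that the overwrite matches.
val oldPart = Files.createDirectories(target.resolve("p=1"))
Files.write(oldPart.resolve("old.parquet"), "old data".getBytes)

// Steps 1-2: do NOT delete matching partitions yet; run the job and
// write its output under the staging dir first.
val newPart = Files.createDirectories(staging.resolve("p=1"))
Files.write(newPart.resolve("new.parquet"), "new data".getBytes)

// Step 3: only after the new data is safely written, delete the
// matching partition from the target.
Files.delete(oldPart.resolve("old.parquet"))
Files.delete(oldPart)

// Steps 4-5: commitJob -- move the committed output from the staging
// dir to the target location (a custom partition path would be moved
// to its configured location in this step instead).
Files.move(staging.resolve("p=1"), target.resolve("p=1"))

val result = new String(Files.readAllBytes(target.resolve("p=1").resolve("new.parquet")))
```

The key property this ordering buys is that a job failure before step 3 leaves the old partition data untouched, which is what makes the overwrite safe to retry.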
