zhengchenyu commented on PR #37346: URL: https://github.com/apache/spark/pull/37346#issuecomment-3401003005
@viirya @dongjoon-hyun @wForget After some research, I discovered that the `.spark_staging_xxx` directory is only used for custom partition paths (introduced in https://github.com/apache/spark/pull/15814) and dynamic partition overwrite (introduced in https://github.com/apache/spark/pull/18714, with follow-up modifications in https://github.com/apache/spark/pull/29000). I suspect `.spark_staging_xxx` was introduced to avoid conflicts, for example to prevent data contamination during dynamic partition overwrite. I believe the problem of multiple applications writing partitions in parallel is similar to the two scenarios above. Could we make writing through `.spark_staging_xxx` the default behavior? This would not only solve this problem but also make the code structure cleaner.
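
For context, a minimal sketch of the dynamic partition overwrite case, one of the two paths where Spark currently routes writes through a `.spark_staging_xxx` directory. The table name, columns, and data below are illustrative only and are not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-partition-overwrite-sketch")
  // With "dynamic" mode, only the partitions present in the incoming data
  // are overwritten; new files are first written under a staging directory
  // and moved into their final partition locations on task/job commit.
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .enableHiveSupport()
  .getOrCreate()

import spark.implicits._

// Hypothetical table partitioned by `dt` (assumed schema, for illustration).
Seq(("a", "2024-01-01"), ("b", "2024-01-02"))
  .toDF("value", "dt")
  .write
  .mode("overwrite")
  .insertInto("example_db.example_table")
```

The staging directory matters here because two concurrent jobs overwriting different partitions of the same table would otherwise race on the table's output directory; writing to a per-job staging path and committing atomically avoids that, which is the same reasoning behind making it the default.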
