zhengchenyu commented on PR #37346:
URL: https://github.com/apache/spark/pull/37346#issuecomment-3401003005

   @viirya @dongjoon-hyun @wForget 
   
   After some research, I discovered that the `.spark_staging_xxx` directory is 
only used for custom partition paths (introduced in 
https://github.com/apache/spark/pull/15814) and dynamic partition overwrite 
(introduced in https://github.com/apache/spark/pull/18714, with follow-up 
changes in https://github.com/apache/spark/pull/29000). I suspect 
`.spark_staging_xxx` was introduced to avoid conflicts, for example to prevent 
data contamination during dynamic partition overwrite.
   
   I believe the issue of running multiple partitioned-write applications in 
parallel is similar to the two cases above. Could we make writing to 
`.spark_staging_xxx` the default behavior? That would not only solve this 
problem but also simplify the code structure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
