turboFei opened a new pull request #25744: [SPARK-29037] Make staging dir identified with applicationId.
URL: https://github.com/apache/spark/pull/25744

### What changes were proposed in this pull request?

For a stage whose tasks commit output, each task first saves its result to a staging dir. When all tasks of the stage succeed, the tasks' output is moved to the destination dir.

However, when we kill an application while it is committing task output, part of the tasks' results remain in the staging dir. If we then rerun the application, the new application reuses the same staging dir, and if its task commit stage succeeds, it moves the files under this staging dir, including the leftovers from the killed application, to the destination dir.

In this PR, I make the staging dir identified with applicationId.

### Why are the changes needed?

Without this PR, Spark may produce duplicated results.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Existing UT.
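
The duplicate-output scenario described above can be illustrated with a small, self-contained simulation. This is only a sketch and not Spark's commit protocol code: the `.staging` directory name, the `part-<task>-<run>` file names, and the helpers `staging_dir`, `write_task_output`, `commit_job`, and `run_scenario` are all hypothetical and exist only for illustration. It assumes that a commit moves every file found in the staging dir to the destination dir.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def staging_dir(dest: Path, app_id: Optional[str]) -> Path:
    # Hypothetical layout: one shared staging dir, or one scoped by application id.
    base = dest / ".staging"
    return base if app_id is None else base / app_id


def write_task_output(staging: Path, task: int, run_tag: str) -> None:
    # A task writes its result into the staging dir before the job commit.
    staging.mkdir(parents=True, exist_ok=True)
    (staging / f"part-{task}-{run_tag}").write_text(run_tag)


def commit_job(staging: Path, dest: Path) -> None:
    # Job commit: move every file in the staging dir to the destination dir.
    for f in staging.iterdir():
        if f.is_file():
            shutil.move(str(f), str(dest / f.name))


def run_scenario(scoped: bool) -> List[str]:
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "out"
        dest.mkdir()

        # First application: one task writes output, then the application
        # is killed before the job commit runs.
        first = staging_dir(dest, "app-1" if scoped else None)
        write_task_output(first, 0, "run1")

        # Rerun: all tasks write output and the job commit succeeds.
        second = staging_dir(dest, "app-2" if scoped else None)
        write_task_output(second, 0, "run2")
        write_task_output(second, 1, "run2")
        commit_job(second, dest)

        return sorted(p.name for p in dest.iterdir() if p.is_file())


if __name__ == "__main__":
    print("shared staging dir:", run_scenario(scoped=False))
    print("scoped by app id:  ", run_scenario(scoped=True))
```

With a shared staging dir, the leftover `part-0-run1` file from the killed run is committed together with the rerun's output. With a staging dir scoped by application id, only the rerun's files reach the destination dir.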
