[GitHub] [spark] turboFei opened a new pull request #25744: [SPARK-29037] Make staging dir identified with applicationId.

GitBox Tue, 10 Sep 2019 06:18:15 -0700

turboFei opened a new pull request #25744: [SPARK-29037] Make staging dir 
identified with applicationId.
URL: https://github.com/apache/spark/pull/25744
 
 
   ### What changes were proposed in this pull request?
   Consider a stage whose tasks commit output.
   Each task first saves its result to a staging dir; when all tasks of the 
stage succeed, the tasks' output is moved to the destination dir.
   However, if we kill an application while it is committing tasks' output, 
some of the tasks' results are left in the staging dir.
   If we then rerun the application, the new application reuses the same 
staging dir, and if its task commit stage succeeds, it moves all files under 
this staging dir to the destination dir, including the leftovers from the 
killed run.
   
   In this PR, I make the staging dir identified by applicationId, so that a 
rerun does not pick up files left behind by a previous application.
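   
   The failure mode and the effect of keying the staging dir by application 
can be sketched with a small local-filesystem simulation. This is only an 
illustration, not Spark's commit protocol: the directory names 
(`.spark-staging`, `.spark-staging-<app_id>`), the helper functions, and the 
file names are all hypothetical.

```python
import os
import shutil
import tempfile


def staging_dir(output_dir, app_id=None):
    # Hypothetical layout: one shared staging dir, or one per application id.
    name = ".spark-staging" if app_id is None else f".spark-staging-{app_id}"
    return os.path.join(output_dir, name)


def write_task_output(staging, file_name):
    # A task saves its result under the staging dir first.
    os.makedirs(staging, exist_ok=True)
    with open(os.path.join(staging, file_name), "w") as f:
        f.write("data")


def commit_job(staging, output_dir):
    # On success, move everything found under the staging dir to the destination.
    for name in sorted(os.listdir(staging)):
        shutil.move(os.path.join(staging, name), os.path.join(output_dir, name))
    os.rmdir(staging)


def run(first_app_id, second_app_id):
    # Simulate a killed first application followed by a successful rerun.
    out = tempfile.mkdtemp()
    # The first application is killed after one task has written its result.
    write_task_output(staging_dir(out, first_app_id), "part-0-run1")
    # The rerun writes its full output and commits.
    staging = staging_dir(out, second_app_id)
    write_task_output(staging, "part-0-run2")
    write_task_output(staging, "part-1-run2")
    commit_job(staging, out)
    committed = sorted(n for n in os.listdir(out) if not n.startswith("."))
    shutil.rmtree(out)
    return committed


shared = run(None, None)          # both runs use the same staging dir
per_app = run("app-1", "app-2")   # staging dir identified by application id
print(shared)
print(per_app)
```

   With a shared staging dir, the stale `part-0-run1` file is committed 
together with the rerun's output; with per-application staging dirs, only the 
rerun's files reach the destination. In this sketch the killed application's 
staging dir is simply left behind, and cleaning it up is outside the scope of 
the example.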
   
   
   ### Why are the changes needed?
   Without this PR, Spark may produce duplicated results.
   
   
   ### Does this PR introduce any user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing unit tests.


