wankunde opened a new pull request #29197:
URL: https://github.com/apache/spark/pull/29197


   ### What changes were proposed in this pull request?
   Generally, a distributed job commits files in two stages: each task commits 
its own output files, and after all tasks have succeeded, the job commits the 
output of all tasks. If a task attempt fails, another attempt retries the task.
   However, when running a dynamic partition overwrite job, for example `INSERT 
OVERWRITE table dst partition(part) SELECT * from src`, the job fails if one of 
the final-stage tasks fails.
   The data writer of the first task attempt in the final stage writes its 
output directly into the Spark staging directory. If that attempt fails, the 
data writer of the second task attempt fails to set up, because the task's 
output file already exists, and so the job fails.
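   To make the failure mode concrete, here is a minimal sketch using plain 
Hadoop `FileSystem` calls (not the actual Spark writer code; the paths and 
object names are made up for illustration): because both attempts target the 
same file under the staging directory, the retry's `create()` throws 
`FileAlreadyExistsException`.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RetryConflictSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    // Hypothetical staging-directory layout, for illustration only.
    val outputFile = new Path("/tmp/staging/part=1/part-00000")

    // First task attempt writes directly to the final location, then fails.
    fs.create(outputFile, /* overwrite = */ false).close()

    // Second task attempt targets the same path during setup and throws
    // FileAlreadyExistsException, which fails the task and the job.
    fs.create(outputFile, /* overwrite = */ false).close()
  }
}
```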
   Therefore, I think we should write the temporary data to the task attempt's 
work directory and commit the result files only after the task attempt succeeds.
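   The following is a minimal sketch of that write-then-commit pattern, again 
using plain Hadoop `FileSystem` calls with made-up paths and object names rather 
than the actual commit-protocol code: each attempt writes under its own work 
directory, and only the successful attempt's files are renamed into the staging 
directory.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object TaskAttemptCommitSketch {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Hypothetical layout: the staging directory plus a private work
    // directory per task attempt, for illustration only.
    val stagingDir  = new Path("/tmp/staging/part=1")
    val attemptDir  = new Path(stagingDir, "_temporary/attempt_0001")
    val attemptFile = new Path(attemptDir, "part-00000")

    // 1. The task attempt writes only inside its own work directory,
    //    so a retry can never collide with a failed attempt's files.
    fs.mkdirs(attemptDir)
    fs.create(attemptFile, /* overwrite = */ true).close()

    // 2. On task commit (the attempt succeeded), move the result file
    //    into its final place and drop the attempt directory.
    val finalFile = new Path(stagingDir, "part-00000")
    if (fs.rename(attemptFile, finalFile)) {
      fs.delete(attemptDir, /* recursive = */ true)
    }
  }
}
```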
   
   ### Why are the changes needed?
   Bug fix for the case where a dynamic partition data writer of a final-stage 
task fails.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added UT

