pgandhi999 commented on issue #22806: [SPARK-25250][CORE] : On successful 
completion of a task attempt on a parti…
URL: https://github.com/apache/spark/pull/22806#issuecomment-450911923
 
 
   @cloud-fan I do not fully understand your question, but here is what happens under the current behaviour:
   1. A task with task id 5005 (say), working on partition id 9000, is running in stage attempt 4.0.
   2. Stage 4.0 fails, so stage attempt 4.1 is created. Stage 4.1 sees that partition id 9000 has not completed, so it schedules a new task (say, task id 2002) for partition id 9000.
   3. After stage 4.1 starts running, task 5005 completes successfully and marks partition id 9000 as complete. This information is not propagated to stage 4.1, because the `successful` array in TaskSetManager.scala, which tracks the task indices that have completed, is only updated for the task set of stage 4.0, not for the task set running stage 4.1.
   4. Task 2002 therefore computes the result but cannot commit it to HDFS, since output for that partition has already been committed, and it hits a CommitDeniedException.
   5. Task 2002 treats this as a failure and keeps retrying, even though it should not. This continues until the Spark application as a whole succeeds and the driver shuts down all the executors, which finally kills task 2002.

   I hope this answers your query.
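
   To make the race concrete, here is a minimal toy sketch (not actual Spark code; `ToyTaskSetManager` and `ToyCommitCoordinator` are hypothetical stand-ins for TaskSetManager's per-attempt `successful` bookkeeping and the OutputCommitCoordinator): the zombie task from attempt 4.0 commits the partition first, but attempt 4.1's task set never learns of it, so its duplicate task runs and is denied the commit.

```scala
import scala.collection.mutable

// Toy stand-in for Spark's per-attempt TaskSetManager (hypothetical):
// each stage attempt keeps its own `successful` bookkeeping, so a
// completion recorded by attempt 4.0 is invisible to attempt 4.1.
class ToyTaskSetManager(val stageAttempt: String, numPartitions: Int) {
  val successful = mutable.Map[Int, Boolean]().withDefaultValue(false)
  def markSuccessful(partition: Int): Unit = successful(partition) = true
  def pendingPartitions: Seq[Int] =
    (0 until numPartitions).filterNot(successful)
}

// Toy commit coordinator (hypothetical): only the first commit per
// partition is allowed, mirroring why the late task sees a
// CommitDeniedException.
class ToyCommitCoordinator {
  private val committed = mutable.Set[Int]()
  def tryCommit(partition: Int): Boolean = committed.add(partition)
}

object ZombieTaskDemo {
  def main(args: Array[String]): Unit = {
    val coordinator = new ToyCommitCoordinator
    val attempt0 = new ToyTaskSetManager("4.0", numPartitions = 2)
    // Stage 4.0 fails and is resubmitted as 4.1; partition 0 (think
    // "9000") still looks pending, so 4.1 schedules a fresh task for it.
    val attempt1 = new ToyTaskSetManager("4.1", numPartitions = 2)

    // The zombie task from 4.0 (think task 5005) finishes first and
    // commits partition 0; only attempt0's bookkeeping is updated.
    coordinator.tryCommit(0)
    attempt0.markSuccessful(0)

    // attempt1 never hears about it: partition 0 is still pending from
    // its point of view, and its duplicate task (think task 2002) is
    // denied the commit when it tries.
    println(s"4.1 pending partitions: ${attempt1.pendingPartitions}")
    println(s"4.1 duplicate commit allowed: ${coordinator.tryCommit(0)}")
  }
}
```

   In the real code path the denied task then retries instead of treating the denial as a terminal success, which is the behaviour this PR addresses.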
