pgandhi999 commented on a change in pull request #22806: [SPARK-25250][CORE] : 
Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r246573015
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
 ##########
 @@ -1383,6 +1383,8 @@ private[spark] class DAGScheduler(
                 if (!job.finished(rt.outputId)) {
                   job.finished(rt.outputId) = true
                   job.numFinished += 1
+                  
taskScheduler.markPartitionIdAsCompletedAndKillCorrespondingTaskAttempts(
 
 Review comment:
   Ahh I see what you are saying. Yes, this method assumes that the newest 
TaskSet has been created. Yes, indeed a tiny possibility exists that when this 
method gets called, possibly the new TaskSet might have not been added to 
`taskSetsByStageIdAndAttempt`. However, when I wrote the code, my assumption 
was that there is always a small delay for the task completion event to 
propagate to the DAGScheduler. I have tested this code by reproducing 
FetchFailures multiple times and when this method is called, the new TaskSet is 
always present so there has not yet been an instance when the race condition 
described above has occurred, while before this fix, I was able to reproduce 
the bug like 4 out of 5 times. Still, a valid point.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to