pgandhi999 commented on a change in pull request #22806: [SPARK-25250][CORE] : 
Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r247217162
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##########
 @@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
     }
   }
 
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added
 
 Review comment:
   Yes, this patch covers that corner case. But as @Ngone51 pointed out, there
is an extremely rare chance that, when this method is called from
DAGScheduler.scala, the new TaskSet has not yet been added to
taskSetsByStageIdAndAttempt. My assumption while writing this code was that the
task completion event always takes a short time to propagate to the
DAGScheduler, which gives the new TaskSet enough time to be created and
registered. I have tested this patch many times and have never hit this case,
but in theory it can still happen. That is why I added the comment, in case
somebody encounters this in the future.
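
   To make the window concrete, here is a minimal, self-contained sketch of
the ordering problem. It is not the code in this patch and not Spark's real
classes: AttemptSketch, SchedulerSketch, register and attemptsByStage are
hypothetical names that only loosely model a TaskSet attempt and the
taskSetsByStageIdAndAttempt map.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for one TaskSet attempt of a stage.
// This is NOT Spark's TaskSetManager; it only tracks completed partition ids.
final class AttemptSketch(val stageId: Int, val stageAttemptId: Int) {
  val completedPartitions: mutable.Set[Int] = mutable.Set.empty[Int]

  def markPartitionCompleted(partitionId: Int): Unit =
    completedPartitions += partitionId
}

// Hypothetical registry loosely modeled on the idea behind
// taskSetsByStageIdAndAttempt: stageId -> (stageAttemptId -> attempt).
final class SchedulerSketch {
  private val attemptsByStage =
    mutable.HashMap.empty[Int, mutable.HashMap[Int, AttemptSketch]]

  // Registers a new attempt. Completions reported before this call are not
  // replayed, which is exactly the gap discussed in the review comment.
  def register(attempt: AttemptSketch): Unit = synchronized {
    val attempts = attemptsByStage.getOrElseUpdate(
      attempt.stageId, mutable.HashMap.empty[Int, AttemptSketch])
    attempts.update(attempt.stageAttemptId, attempt)
  }

  // Marks the partition as completed in every attempt that is registered
  // for the stage at the moment of the call.
  def markPartitionCompletedInAllAttempts(stageId: Int, partitionId: Int): Unit =
    synchronized {
      attemptsByStage.get(stageId).foreach { attempts =>
        attempts.values.foreach(_.markPartitionCompleted(partitionId))
      }
    }
}
```

   In this sketch, an attempt that is registered after
markPartitionCompletedInAllAttempts has run for a partition never learns that
the partition is done, while attempts registered earlier do. That is the
theoretical case the code comment warns about.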
