Ngone51 commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r247779264
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
     }
   }
 
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added

Review comment:
I think @squito has a good point here. Previously, I was wondering: what if the active TaskSet has not been created yet when we mark the partition as completed for all TaskSets? Does this fix still work? Now I realize that it works whether or not the active TaskSet has been created:

* Created: obviously fine.
* Not created: when `DAGScheduler` calls `submitMissingTasks`, it figures out which missing partitions to compute, taking into account the partitions that were already completed by tasks from a previous stage attempt. So the newly created TaskSet also knows about the completed partitions.

All of this relies on the event loop, which processes events on a single thread.
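To make the two cases concrete, here is a minimal toy model of the argument. It is not Spark's actual code: `ToyScheduler`, `ToyTaskSet`, `TaskSucceeded`, and `ResubmitStage` are hypothetical names, and the only assumption carried over from the comment is that events are handled one at a time, so a task completion is processed either entirely before or entirely after the new TaskSet is created.

```scala
import scala.collection.mutable

// Toy model only: every name below is hypothetical, not a Spark class.
sealed trait Event
final case class TaskSucceeded(stageId: Int, partition: Int) extends Event
final case class ResubmitStage(stageId: Int) extends Event

// One attempt of a stage; tracks the partitions it still has to run.
final class ToyTaskSet(val stageId: Int, val attempt: Int, partitions: Set[Int]) {
  private val pending = mutable.Set.empty[Int] ++ partitions
  def markPartitionCompleted(p: Int): Unit = pending -= p
  def pendingPartitions: Set[Int] = pending.toSet
}

final class ToyScheduler(numPartitions: Int) {
  // Stage-level record of finished partitions (stand-in for the DAGScheduler's view).
  private val finished = mutable.Map.empty[Int, mutable.Set[Int]]
  // All attempts per stage, including zombie ones.
  private val taskSets = mutable.Map.empty[Int, mutable.ArrayBuffer[ToyTaskSet]]

  // Called for one event at a time, mimicking a single-threaded event loop.
  def handle(event: Event): Unit = event match {
    case TaskSucceeded(stageId, partition) =>
      // Record the completion at stage level ...
      finished.getOrElseUpdate(stageId, mutable.Set.empty[Int]) += partition
      // ... and propagate it to every attempt that exists right now.
      taskSets.get(stageId).foreach(_.foreach(_.markPartitionCompleted(partition)))
    case ResubmitStage(stageId) =>
      // A new attempt only gets the partitions not yet finished.
      val done = finished.getOrElse(stageId, mutable.Set.empty[Int])
      val missing = (0 until numPartitions).toSet -- done
      val attempts = taskSets.getOrElseUpdate(stageId, mutable.ArrayBuffer.empty[ToyTaskSet])
      attempts += new ToyTaskSet(stageId, attempts.size, missing)
  }

  def latestTaskSet(stageId: Int): Option[ToyTaskSet] =
    taskSets.get(stageId).flatMap(_.lastOption)
}
```

In this sketch, handling `ResubmitStage(0)`, `ResubmitStage(0)`, `TaskSucceeded(0, 1)` (late completion after the new attempt exists) and handling `ResubmitStage(0)`, `TaskSucceeded(0, 1)`, `ResubmitStage(0)` (late completion before the new attempt exists) both leave the latest attempt without partition 1 in its pending set.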