Ngone51 commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r247779264
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##########
 @@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
     }
   }
 
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added
 
 Review comment:
   I think @squito has a good point here. Previously, I was wondering: what if the active TaskSet has not been created yet when we mark the partition as completed for all TaskSets? Does this fix still work? Now I realize that it works whether or not the active TaskSet has been created:
   
   * created
   
   Obviously fine.
   
   * not created
   
   When `DAGScheduler` calls `submitMissingTasks`, it figures out which partitions are still missing and need to be computed, taking into account the partitions that were completed by tasks from previous stage attempts. So the newly created TaskSet also knows about the completed partitions. All of this relies on the event loop, which processes events on a single thread.
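
   The "not created" case can be sketched with a toy model. This is only an illustration of the ordering argument, not Spark's actual code: `ToyScheduler`, `TaskCompleted`, `ResubmitStage`, and `latestTaskSet` are hypothetical names, and the real `DAGScheduler` bookkeeping is considerably more involved. The assumption is simply that events are handled strictly one after another, so a late completion is recorded before the missing partitions are computed.

```scala
import scala.collection.mutable

// Hypothetical, simplified model; none of these names are Spark's real classes.
sealed trait Event
final case class TaskCompleted(stageId: Int, partition: Int) extends Event
final case class ResubmitStage(stageId: Int) extends Event

final class ToyScheduler(numPartitions: Int) {
  // Partitions known to be finished, per stage (stand-in for the scheduler's bookkeeping).
  private val finished = mutable.Map.empty[Int, mutable.Set[Int]]
  // Partitions contained in the most recently created task set, per stage.
  val latestTaskSet = mutable.Map.empty[Int, Set[Int]]

  // Events are handled one at a time, in order, as a single-threaded event loop would do.
  def handle(event: Event): Unit = event match {
    case TaskCompleted(stageId, partition) =>
      // Record the completion even if no new task set exists yet.
      finished.getOrElseUpdate(stageId, mutable.Set.empty[Int]) += partition
    case ResubmitStage(stageId) =>
      // The new task set only contains partitions that are still missing.
      val done = finished.getOrElse(stageId, mutable.Set.empty[Int])
      latestTaskSet(stageId) = (0 until numPartitions).filterNot(done.contains).toSet
  }
}

object ToySchedulerDemo {
  def run(): Set[Int] = {
    val scheduler = new ToyScheduler(numPartitions = 4)
    // A late task from the old attempt finishes partition 2 before the new task set exists.
    val events = Seq[Event](TaskCompleted(stageId = 0, partition = 2), ResubmitStage(stageId = 0))
    events.foreach(scheduler.handle)
    scheduler.latestTaskSet(0)
  }
}
```

   In this sketch the task set created after the late completion omits partition 2, which is the behaviour the single-threaded event loop is argued to guarantee.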

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
