cloud-fan commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r246619431
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##########
 @@ -286,6 +286,34 @@ private[spark] class TaskSchedulerImpl(
     }
   }
 
+  /**
+   * SPARK-25250: Whenever any Result Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and,
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we no longer see killed tasks caused by TaskCommitDenied
+   * exceptions showing up in the UI.
+   */
+  override def completeTasks(partitionId: Int, stageId: Int, killTasks: Boolean): Unit = {
+    taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
+      tsm.partitionToIndex.get(partitionId) match {
+        case Some(index) =>
+          tsm.markPartitionCompletedForRedundantTaskAttempts(index)
 
 Review comment:
   How about `markPartitionAsAlreadyCompleted`? This name indicates that the partition has already been completed by another task set manager, so this task set manager only needs to mark it as completed, without tracking its run time or incrementing `numSuccessfulTasks`.
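
   To make the proposed distinction concrete, here is a minimal, self-contained sketch of the two code paths being contrasted. All names in it (`SketchTaskSetManager`, `handleSuccessfulTask`, the `successful` array) are illustrative assumptions for this sketch, not the actual `TaskSetManager` internals:

   ```scala
   // Sketch only: contrasts a normal task success with "mark as already
   // completed", per the review comment above.
   class SketchTaskSetManager(numTasks: Int) {

     // successful(i) is true once partition index i needs no further attempts.
     private val successful = Array.fill(numTasks)(false)

     // Counts only tasks this manager itself ran to completion.
     private var numSuccessfulTasks = 0

     // Normal path: a task owned by this manager finished, so update metrics.
     def handleSuccessfulTask(index: Int, runTimeMs: Long): Unit = {
       if (!successful(index)) {
         successful(index) = true
         numSuccessfulTasks += 1
         // ... record runTimeMs, notify the scheduler, etc.
       }
     }

     // SPARK-25250 path: another attempt of this stage already completed the
     // partition. Only flip the flag; run-time stats and numSuccessfulTasks
     // stay untouched, because this manager never ran the task itself.
     def markPartitionAsAlreadyCompleted(index: Int): Unit = {
       successful(index) = true
     }
   }
   ```

   The suggested name buys exactly this clarity: metrics such as `numSuccessfulTasks` keep reflecting only work the manager actually performed, while redundant attempts for already-finished partitions are still prevented from launching.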
