cloud-fan commented on a change in pull request #22806: [SPARK-25250][CORE] : Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r246998710
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
 ##########
 @@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
     }
   }
 
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added
+   * to taskSetsByStageIdAndAttempt. In such a case, we might still hit the same issue. However,
+   * the above scenario has not yet been reproduced.
+   */
+  override def completeTasks(partitionId: Int, stageId: Int, killTasks: Boolean): Unit = {
+    taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
+      tsm.partitionToIndex.get(partitionId) match {
+        case Some(index) =>
+          tsm.markPartitionAsAlreadyCompleted(index)
 
 Review comment:
   I don't see any downside to marking `successful(index) = true`.
   - If `tsm` is active and the task is still pending, I think killing the task is a no-op, but it may be better not to send the KillTask request, to avoid overhead. `successful(index) = true` is needed here to finish the `tsm`.
   - If `tsm` is active and the task is running, killing the task will probably save resources, and `successful(index) = true` is needed here to finish the `tsm`.
   - If `tsm` is a zombie, `successful(index) = true` is a no-op (with no overhead), as a zombie `tsm` will never be marked as finished. Killing the task may still be beneficial if the task is still running.
   
   If we want to be 100% safe and eliminate any perf regression, maybe the simplest choice is to not kill the task and just set `successful(index) = true`.
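   
   The three cases above can be summarized as a small decision table. The sketch below is a toy model only, not Spark's `TaskSetManager`: `TaskState`, `Outcome`, and `CompletionPolicy` are hypothetical names, and the handling of a pending task in a zombie `tsm` (no kill) is an assumption, since that case is not discussed above.

```scala
// Toy model of the cases discussed in the review comment.
// NOT Spark code: every name below is hypothetical and for illustration only.
sealed trait TaskState
case object Pending extends TaskState
case object Running extends TaskState

// What to do with one attempt's copy of a partition that another attempt finished.
final case class Outcome(markSuccessful: Boolean, sendKill: Boolean, markEffect: String)

object CompletionPolicy {
  def decide(isZombie: Boolean, state: TaskState, killTasks: Boolean): Outcome = {
    // Marking the partition successful is treated as always safe:
    // it lets an active tsm finish and is a harmless no-op for a zombie.
    val effect = if (isZombie) "no-op" else "lets the tsm finish"

    // A kill request is only worth sending when the task is actually running
    // and the caller opted in. For a pending task it is skipped to avoid overhead
    // (assumption: the same holds for a pending task in a zombie tsm).
    val kill = state match {
      case Pending => false
      case Running => killTasks
    }
    Outcome(markSuccessful = true, sendKill = kill, markEffect = effect)
  }
}
```

   With `killTasks = false`, which is the conservative option suggested above, `decide` never requests a kill in any case and only marks the partition as successful.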
