squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late 
zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r251915695
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
 ##########
 @@ -109,4 +109,13 @@ private[spark] trait TaskScheduler {
    */
   def applicationAttemptId(): Option[String]
 
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage.
+   * This ensures that multiple attempts of the same task do not keep running even when the
+   * corresponding partition is completed. This method must be called from inside the DAGScheduler
 
 Review comment:
   This is now the same as the old `markPartitionCompletedInAllTaskSets`, 
right?  We should only have one of them -- I prefer the old name and comment 
actually, though I do think it's worth adding to the doc on the other one that 
it should only be called from inside the DAGScheduler event loop.
   
   If there is a reason to keep this one, delete "This ensures that multiple 
attempts of the same task do not keep running even when the corresponding 
partition is completed".  That isn't really the important part; the point is to 
keep the DAGScheduler and the individual TSMs in sync wrt whether stages are 
done or not.
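   
   To illustrate the "keep them in sync" point, here is a minimal toy sketch, 
not Spark's actual code. `ToyTaskSetManager` and `ToyScheduler` are 
hypothetical stand-ins, and the two-argument signature is an assumption made 
for illustration; only the method name comes from the discussion above. The 
idea is that a completion reported for any attempt is propagated to every 
attempt of the stage, so a zombie attempt and the active one agree on which 
partitions are done.

```scala
import scala.collection.mutable

// Toy stand-in for a TaskSetManager (one per stage attempt); not Spark's real class.
final class ToyTaskSetManager(val stageId: Int, val attempt: Int, numPartitions: Int) {
  private val done = mutable.Set.empty[Int]

  def markPartitionCompleted(partitionId: Int): Unit = done += partitionId

  // The attempt counts as finished once every partition is marked completed.
  def isFinished: Boolean = done.size == numPartitions
}

// Toy stand-in for the scheduler that tracks every attempt of each stage.
final class ToyScheduler {
  private val attemptsByStage =
    mutable.Map.empty[Int, mutable.Buffer[ToyTaskSetManager]]

  def register(tsm: ToyTaskSetManager): Unit =
    attemptsByStage.getOrElseUpdate(
      tsm.stageId, mutable.Buffer.empty[ToyTaskSetManager]) += tsm

  // Assumed to be called only from a single-threaded event loop
  // (the DAGScheduler event loop in the discussion), so no locking is modelled.
  def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int): Unit =
    attemptsByStage
      .getOrElse(stageId, mutable.Buffer.empty[ToyTaskSetManager])
      .foreach(_.markPartitionCompleted(partitionId))
}
```

   With two attempts of the same stage registered, marking each partition once 
through the scheduler leaves both attempts finished, whichever attempt the 
successful task belonged to.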

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
