squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late
zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r251915695
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala
##########
@@ -109,4 +109,13 @@ private[spark] trait TaskScheduler {
*/
def applicationAttemptId(): Option[String]
+ /**
+ * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+ * corresponding partition id as completed in all attempts for that particular stage.
+ * This ensures that multiple attempts of the same task do not keep running even when the
+ * corresponding partition is completed. This method must be called from inside the DAGScheduler
Review comment:
This is now the same as the old `markPartitionCompletedInAllTaskSets`,
right? We should only have one of them -- I prefer the old name and comment,
actually, though I do think it's worth adding to the doc on the other one that
it should only be called from inside the DAGScheduler event loop.
If there is a reason to keep this one, delete "This ensures that multiple
attempts of the same task do not keep running even when the corresponding
partition is completed". That isn't really the important part; the point is to
keep the DAGScheduler and the individual tsm's (TaskSetManagers) in sync wrt
whether stages are done or not.
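
For illustration only, here is a rough sketch of the behavior being discussed: when a partition finishes, it is marked completed in every task set attempt for that stage, so all attempts agree with the caller about what is done. `SketchTaskSetManager`, `SketchTaskScheduler`, `register`, and `allPartitionsDone` are hypothetical, simplified stand-ins and not Spark's actual classes or signatures; only the method name `markPartitionCompletedInAllTaskSets` comes from the comment above.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for a per-attempt task set manager.
final class SketchTaskSetManager(val stageId: Int, val stageAttemptId: Int, numPartitions: Int) {
  private val completed = mutable.BitSet.empty

  // Record that a partition finished, regardless of which attempt ran it.
  def markPartitionCompleted(partitionId: Int): Unit = {
    require(partitionId >= 0 && partitionId < numPartitions, s"bad partition $partitionId")
    completed += partitionId
  }

  def isPartitionCompleted(partitionId: Int): Boolean = completed.contains(partitionId)

  def allPartitionsDone: Boolean = completed.size == numPartitions
}

// Hypothetical, simplified stand-in for the scheduler that owns all attempts.
final class SketchTaskScheduler {
  // stageId -> (stageAttemptId -> manager); an assumed layout for this sketch.
  private val taskSetsByStage =
    mutable.HashMap.empty[Int, mutable.HashMap[Int, SketchTaskSetManager]]

  def register(tsm: SketchTaskSetManager): Unit =
    taskSetsByStage
      .getOrElseUpdate(tsm.stageId, mutable.HashMap.empty[Int, SketchTaskSetManager])
      .update(tsm.stageAttemptId, tsm)

  // Mark the partition as completed in every attempt of the given stage.
  // In this sketch the caller is assumed to be a single event-loop thread,
  // so no locking is shown.
  def markPartitionCompletedInAllTaskSets(stageId: Int, partitionId: Int): Unit =
    taskSetsByStage.get(stageId).foreach { attempts =>
      attempts.values.foreach(_.markPartitionCompleted(partitionId))
    }
}
```

With two attempts registered for the same stage, a single call updates both, which is the "keep everything in sync" property rather than a way of stopping duplicate tasks.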