Ngone51 commented on a change in pull request #22806: [SPARK-25250][CORE] :
Late zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r247019300
##########
File path:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
}
}
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added
+   * to taskSetsByStageIdAndAttempt. In such a case, we might still hit the same issue. However,
+   * the above scenario has not yet been reproduced.
+   */
+  override def completeTasks(partitionId: Int, stageId: Int, killTasks: Boolean): Unit = {
+    taskSetsByStageIdAndAttempt.getOrElse(stageId, Map()).values.foreach { tsm =>
+      tsm.partitionToIndex.get(partitionId) match {
+        case Some(index) =>
+          tsm.markPartitionAsAlreadyCompleted(index)
Review comment:
I know that, for a given stage, there will be one active `tsm` and zero or more
zombie `tsm`s. But they can all be finished by calling `maybeFinishTaskSet()`,
and this only notifies `TaskScheduler`; it remains unknown to `DAGScheduler`
(I guess this is what you mean).
> I think we need to increase tasksSuccessful, so that a tsm can correctly be finished.
Only increasing `tasksSuccessful` is not enough, because a `tsm` can be
finished if and only if it is a zombie. If we don't set the zombie bit here,
then nowhere else will set it for the cases I mentioned above, so the
`tsm` won't be finished in the end.
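To make the point concrete, here is a minimal sketch of the argument, not the actual `TaskSetManager` code. `ToyTaskSetManager`, `markCompletedCounterOnly`, `markCompletedAndMaybeZombie`, and the `finished` flag are hypothetical names invented for illustration. The sketch assumes the finish check only succeeds when the manager is a zombie with no running tasks, which is the condition discussed above.

```scala
// Hypothetical, simplified model of a task set manager; NOT Spark's real class.
final class ToyTaskSetManager(val numTasks: Int) {
  private val successful = Array.fill(numTasks)(false)
  var tasksSuccessful: Int = 0
  var runningTasks: Int = 0
  var isZombie: Boolean = false
  var finished: Boolean = false

  // Assumed finish condition: zombie and nothing still running.
  def maybeFinishTaskSet(): Unit = {
    if (isZombie && runningTasks == 0) finished = true
  }

  // Variant A: only increase the success counter.
  def markCompletedCounterOnly(index: Int): Unit = {
    if (!successful(index)) {
      successful(index) = true
      tasksSuccessful += 1
    }
    maybeFinishTaskSet()
  }

  // Variant B: increase the counter and set the zombie bit once all tasks succeeded.
  def markCompletedAndMaybeZombie(index: Int): Unit = {
    if (!successful(index)) {
      successful(index) = true
      tasksSuccessful += 1
      if (tasksSuccessful == numTasks) isZombie = true
    }
    maybeFinishTaskSet()
  }
}

object ZombieBitDemo {
  def main(args: Array[String]): Unit = {
    val a = new ToyTaskSetManager(2)
    a.markCompletedCounterOnly(0)
    a.markCompletedCounterOnly(1)
    // All partitions are done, but the manager never becomes a zombie.
    println(s"counter only: tasksSuccessful=${a.tasksSuccessful}, finished=${a.finished}")

    val b = new ToyTaskSetManager(2)
    b.markCompletedAndMaybeZombie(0)
    b.markCompletedAndMaybeZombie(1)
    println(s"with zombie bit: tasksSuccessful=${b.tasksSuccessful}, finished=${b.finished}")
  }
}
```

In this toy model, variant A ends with `tasksSuccessful == numTasks` but `finished == false`, while variant B ends with `finished == true`. That is the case I am worried about when only the counter is increased.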