squito commented on a change in pull request #22806: [SPARK-25250][CORE] : Late
zombie task completions handled correctly even before new taskset launched
URL: https://github.com/apache/spark/pull/22806#discussion_r247730654
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
##########
@@ -286,6 +286,44 @@ private[spark] class TaskSchedulerImpl(
    }
  }
+  /**
+   * SPARK-25250: Whenever any Task gets successfully completed, we simply mark the
+   * corresponding partition id as completed in all attempts for that particular stage and
+   * additionally, for a Result Stage, we also kill the remaining task attempts running on the
+   * same partition. As a result, we do not see any Killed tasks due to
+   * TaskCommitDenied Exceptions showing up in the UI. When this method is called from
+   * DAGScheduler.scala on a task completion event being fired, it is assumed that the new
+   * TaskSet has already been created and registered. However, a small possibility does exist
+   * that when this method gets called, possibly the new TaskSet might have not been added
Review comment:
We really should try to make this scenario impossible, not just less
likely -- and in fact I think that *is* what this patch does. The DAGScheduler
achieves this by having everything that modifies shared state happen inside the
event loop, which is single-threaded. The problem with the last fix was that
it modified `TaskSchedulerImpl.taskSetsByStageIdAndAttempt`, which does
*not* have the same protection. If we can ensure that the relevant data
structures are only ever touched inside the event loop, we should be safe. I
don't want to just rely on re-running the test a million times; I've been
burned by that before :P. (Sorry I haven't had a chance to take a detailed
walk through this patch yet, though.)
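To make the pattern concrete for anyone skimming this later, here's a minimal
sketch of what I mean by "everything that modifies shared state happens inside
the event loop." All the names here (`ToyEventLoop`, `SchedulerEvent`,
`PartitionCompleted`) are hypothetical, not Spark's actual classes; the real
DAGScheduler gets this from its `DAGSchedulerEventProcessLoop`:

```scala
import java.util.concurrent.LinkedBlockingQueue
import scala.collection.mutable

// Hypothetical sketch of the single-threaded event-loop pattern.
sealed trait SchedulerEvent
case class PartitionCompleted(stageId: Int, partitionId: Int) extends SchedulerEvent

class ToyEventLoop {
  private val eventQueue = new LinkedBlockingQueue[SchedulerEvent]()

  // Shared state: only ever read or written by the single loop thread below,
  // so no lock is needed and no caller can observe a half-applied update.
  private val completedPartitions = mutable.Map.empty[Int, mutable.Set[Int]]

  // Other threads never touch the map directly; they post events instead.
  def post(event: SchedulerEvent): Unit = eventQueue.put(event)

  private val loopThread = new Thread(new Runnable {
    override def run(): Unit = {
      while (true) {
        eventQueue.take() match {
          case PartitionCompleted(stageId, partitionId) =>
            completedPartitions.getOrElseUpdate(stageId, mutable.Set.empty) += partitionId
        }
      }
    }
  })
  loopThread.setDaemon(true)
  loopThread.start()
}
```

The point is that correctness comes from the single consumer thread, not from
timing luck: a late completion posted to the queue can never interleave with
another state mutation.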
BTW, I also meant earlier that you should replace the *entire* PR description,
not just the title, with that entire blurb I wrote. The description will
become the commit msg, and it's also helpful to any other reviewers. I think
it helps focus attention on what this change adds on top of the previous fix
attempt; otherwise it sounds like it's fixing the same thing:
SPARK-23433 tried to ensure that late task completions from a zombie taskset
were properly propagated to all tasksets for the stage (zombie or not).
However, because it did this outside of the DAGScheduler event loop, the
DAGScheduler could launch another taskset for the stage at the same time,
before it had updated the set of tasks left to run. The new active taskset
would never learn about the old completion from the zombie taskset. This could
lead to multiple active tasksets (as described in SPARK-23433) or to
ResultStage failure from repeated TaskCommitDenied exceptions for the task
which the final stage attempts to re-run.
This change fixes that by moving the logic into the DAGScheduler event loop.
As an optimization, it also kills the remaining task attempts in all tasksets,
but only if it's a result stage (an extension of SPARK-25773).
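And to make the intended end state concrete, here's a rough, hypothetical
sketch (toy names, not the real TaskSchedulerImpl API) of what handling a
completion inside the event loop looks like: mark the partition done in every
attempt's taskset, and kill surviving attempts only for result stages:

```scala
import scala.collection.mutable

// Toy stand-ins for TaskSetManager and the scheduler's bookkeeping; the real
// classes carry far more state. Assumes handlePartitionCompleted only ever
// runs on the single event-loop thread, so taskset registration cannot race.
class ToyTaskSetManager(val stageAttemptId: Int, numPartitions: Int) {
  val pendingPartitions: mutable.Set[Int] = mutable.Set(0 until numPartitions: _*)

  def markPartitionCompleted(partitionId: Int): Unit =
    pendingPartitions -= partitionId

  def killRunningAttempts(partitionId: Int): Unit = {
    // Would ask executors to kill any still-running copies of this task.
  }
}

class ToyScheduler {
  // stageId -> (stageAttemptId -> taskset); same shape as
  // taskSetsByStageIdAndAttempt, but only touched from the event loop.
  val taskSetsByStage = mutable.Map.empty[Int, mutable.Map[Int, ToyTaskSetManager]]

  def handlePartitionCompleted(stageId: Int, partitionId: Int, isResultStage: Boolean): Unit = {
    for (attempts <- taskSetsByStage.get(stageId); tsm <- attempts.values) {
      // Every attempt (zombie or active) learns the partition is done, so even
      // a taskset registered a moment ago won't re-run this partition.
      tsm.markPartitionCompleted(partitionId)
      // Only result stages need the kill: a second successful copy of a result
      // task would just be refused the commit (TaskCommitDenied) anyway.
      if (isResultStage) tsm.killRunningAttempts(partitionId)
    }
  }
}
```

Crucially, any taskset created after this point is built from the stage's
updated pending partitions, which is exactly what moving the call into the
event loop guarantees.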