Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/17166#discussion_r107773629
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl
private[scheduler](
taskState: TaskState,
reason: TaskFailedReason): Unit = synchronized {
taskSetManager.handleFailedTask(tid, taskState, reason)
- if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
+ if (!taskSetManager.isZombie) {
--- End diff --
Re (a): Each revive offer is not cheap; it costs a full schedule over all
active executors, and it runs within the driver rpc handling while holding the
CoarseGrainedSchedulerBackend.this lock.
With this change we potentially introduce a large number of duplicate full
schedules that are unnecessary from the point of view of existing jobs, on the
order of ~25% of the total number of tasks in the system (with speculative
execution enabled).
Unfortunately I do not have the cluster or job(s) to give concrete numbers,
but I hope I have convinced you of the actual cost involved.
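To make the order of magnitude concrete, here is a rough back-of-the-envelope sketch. It is a toy model, not Spark code: the object and method names are hypothetical, the ~25% figure is only my estimate from above, and it assumes the work per full schedule grows linearly with the number of active executors.

```scala
// Toy cost model, not Spark code. Every name here is hypothetical.
object ReviveCostSketch {
  // Extra full schedules, assuming each killed task triggers one revive offer.
  def extraFullSchedules(totalTasks: Long, killedFraction: Double): Long =
    math.round(totalTasks * killedFraction)

  // Assumption: one full schedule evaluates an offer for every active executor,
  // so the extra work is (extra schedules) * (active executors).
  def extraOfferEvaluations(
      totalTasks: Long,
      killedFraction: Double,
      activeExecutors: Int): Long =
    extraFullSchedules(totalTasks, killedFraction) * activeExecutors
}
```

For example, `ReviveCostSketch.extraOfferEvaluations(100000L, 0.25, 500)` models 100,000 tasks on 500 executors; all of that extra work would be serialized behind the same lock.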
Re (b): I was thrown off by the `info.killed == return` in
`tsm.handleFailedTask`; I misread it as applying to the current update, when it
is for the existing state.
Given this, I completely misunderstood the semantics of killed and
apologize for the long discussion!
Having said that, I now do not believe the job hang you mentioned will
occur.
With speculative execution disabled, there is a `makeOffers(executorId)` in
CGSB.receive when the task state is finished, which can cause the task to be
rescheduled on the same executor.
If the executor itself goes away, as you mentioned earlier, that causes a
reviveOffer, which in turn causes a full schedule and submission of the killed
task.
With speculative execution enabled, a periodic full schedule will cause the
killed task to be picked up anyway.
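The case analysis above can be written down as a small toy model. This is only a sketch of my reasoning, not Spark code: the names are hypothetical, and as a simplification it assumes the per-executor offer also happens when speculation is enabled.

```scala
// Toy model of the argument above, not Spark code. All names are hypothetical.
object RescheduleSketch {
  sealed trait Trigger
  case object PerExecutorOffer extends Trigger      // makeOffers(executorId) on task finish
  case object ReviveOnExecutorLoss extends Trigger  // reviveOffer after the executor goes away
  case object PeriodicFullSchedule extends Trigger  // periodic full schedule with speculation

  // Which existing paths would offer resources again after a task is killed.
  def triggers(speculationEnabled: Boolean, executorAlive: Boolean): Set[Trigger] = {
    val fromExecutor: Set[Trigger] =
      if (executorAlive) Set(PerExecutorOffer) else Set(ReviveOnExecutorLoss)
    val periodic: Set[Trigger] =
      if (speculationEnabled) Set(PeriodicFullSchedule) else Set.empty
    fromExecutor ++ periodic
  }

  // True if every combination of the two flags has at least one trigger.
  def alwaysRescheduled: Boolean =
    (for {
      spec <- Seq(true, false)
      alive <- Seq(true, false)
    } yield triggers(spec, alive).nonEmpty).forall(identity)
}
```

Under these assumptions `RescheduleSketch.alwaysRescheduled` holds, which is why I do not expect a hang without the extra revive.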
Given this, did I miss something that requires this change to be introduced?