Github user dhruve commented on a diff in the pull request:
https://github.com/apache/spark/pull/22288#discussion_r227095389
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -415,9 +420,55 @@ private[spark] class TaskSchedulerImpl(
launchedAnyTask |= launchedTaskAtCurrentMaxLocality
} while (launchedTaskAtCurrentMaxLocality)
}
+
if (!launchedAnyTask) {
- taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+          taskSet.getCompletelyBlacklistedTaskIfAny(hostToExecutors) match {
+            case Some(taskIndex) => // Returns the taskIndex which was unschedulable
+
+              // If the taskSet is unschedulable we try to find an existing idle blacklisted
+              // executor. If we cannot find one, we abort immediately. Else we kill the idle
--- End diff --
By clearing the abort timer as soon as a task is launched, we are relaxing
this situation.
If there is a large backlog of tasks:
- If we acquire new executors or launch new tasks, we will defer the check.
- If we cannot acquire new executors and the running tasks are long-lived, so
that no new tasks can be launched, and we have fewer executors than the max
number of failures, then this check will end up being harsh. This can happen,
but it seems like a very specific edge case.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]