tgravescs commented on a change in pull request #27223:
[SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as
pending in Spark scheduler
URL: https://github.com/apache/spark/pull/27223#discussion_r370690964
##########
File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
##########
@@ -263,9 +263,15 @@ private[spark] class ExecutorAllocationManager(
   */
  private def maxNumExecutorsNeeded(): Int = {
    val numRunningOrPendingTasks = listener.totalPendingTasks +
      listener.totalRunningTasks
-    math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
-      tasksPerExecutorForFullParallelism)
-      .toInt
+    val maxNeeded = math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
+      tasksPerExecutorForFullParallelism).toInt
+    if (listener.pendingSpeculativeTasks > 0 && tasksPerExecutorForFullParallelism > 1) {
+      // If we have pending speculative tasks, allocate one more executor to satisfy the
+      // locality requirements of speculative tasks
+      maxNeeded + 1
Review comment:
Right, but you can't guarantee that the +1 executor will get your speculative task, or that the new executor won't end up on the same host. In our example case where all 999 tasks are already running, the +1 would only help if the one task that finished happened to be on the same executor, and what are the odds that exactly one task finished and it finished on that same host? Generally the reason you speculate is that you assume something on that host is wrong or slow, so you want to run the task somewhere else. If that is the case, that executor is likely to stay loaded up while the others finish. If you already have 500 executors and your speculation configs are sane, the odds of another task finishing on a different executor are pretty good. And if your regular tasks aren't all finished, they could easily be scheduled onto the +1 host before your speculative task has a chance to land on it, and you could end up with only the one host its counterpart is running on. Since the allocation manager and the scheduler aren't integrated well here, it's all just a guess/heuristic.
I'm ok with leaving the +1 logic like you have it (with an expanded comment), since that matches the previous behavior, but at the same time it can be a huge waste of resources. If every application has speculation on and you have 1000 apps running, you could easily be wasting thousands of containers when you already have enough resources to run within.
We could definitely come up with better heuristics that don't waste as much, but that can be done separately.
Also, can you add a debug log statement here for when we do the +1?
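For illustration only, a rough sketch of what I mean, assuming the branch otherwise falls back to `maxNeeded` and using `logDebug` from the class's `Logging` mixin; the exact message wording is just a suggestion:
```scala
if (listener.pendingSpeculativeTasks > 0 && tasksPerExecutorForFullParallelism > 1) {
  // If we have pending speculative tasks, allocate one more executor to satisfy the
  // locality requirements of speculative tasks
  logDebug(s"Requesting 1 executor beyond the $maxNeeded needed because " +
    s"${listener.pendingSpeculativeTasks} speculative tasks are pending")
  maxNeeded + 1
} else {
  maxNeeded
}
```
That way it's easy to see from the logs when this heuristic kicks in and how often.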
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]