tgravescs commented on a change in pull request #27223:
[SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as
pending in Spark scheduler
URL: https://github.com/apache/spark/pull/27223#discussion_r370690964
##########
File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
##########
@@ -263,9 +263,15 @@ private[spark] class ExecutorAllocationManager(
   */
  private def maxNumExecutorsNeeded(): Int = {
    val numRunningOrPendingTasks = listener.totalPendingTasks +
      listener.totalRunningTasks
-    math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
-      tasksPerExecutorForFullParallelism)
-      .toInt
+    val maxNeeded = math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
+      tasksPerExecutorForFullParallelism).toInt
+    if (listener.pendingSpeculativeTasks > 0 && tasksPerExecutorForFullParallelism > 1) {
+      // If we have pending speculative tasks, allocate one more executor to satisfy the
+      // locality requirements of speculative tasks
+      maxNeeded + 1
Review comment:
Right, but you can't guarantee that the +1 executor will get your speculative task, or that the new executor won't end up on the same host. In our example case where all 999 tasks are already running, the +1 would only help if the one task that finished happened to be on the same executor, and what are the odds that exactly one task finished and it finished on that same host? Generally the reason you speculate is that you assume something on that host is wrong or slow, so you want to run the task somewhere else. If that is the case, that executor is likely to stay loaded up while the others finish. If you already have 500 executors and your speculation configs are sane, the odds of another task finishing on a different executor are pretty good. And if your regular tasks aren't all finished, they could easily be scheduled onto the +1 host before your speculative task has a chance to land on it, and you could end up with only the one host its counterpart is running on. Since the allocation manager and the scheduler aren't integrated well here, it's all just a guess/heuristic.
I'm ok with leaving the +1 logic like you have it (with an expanded comment), since that matches the previous behavior, but at the same time it can be a huge waste of resources. If every application has speculation on and you have 1000 apps running, you could easily be wasting thousands of containers when you already have enough resources to run within.
We could definitely come up with better heuristics that don't waste as much, but that can be done separately.
Also, can you add a debug log statement here for when we do the +1?
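For illustration only, a rough sketch of what I mean, assuming the branch otherwise falls back to `maxNeeded` and using `logDebug` from the class's `Logging` mixin; the exact message wording is just a suggestion:
```scala
if (listener.pendingSpeculativeTasks > 0 && tasksPerExecutorForFullParallelism > 1) {
  // If we have pending speculative tasks, allocate one more executor to satisfy the
  // locality requirements of speculative tasks
  logDebug(s"Requesting 1 executor beyond the $maxNeeded needed because " +
    s"${listener.pendingSpeculativeTasks} speculative tasks are pending")
  maxNeeded + 1
} else {
  maxNeeded
}
```
That way it's easy to see from the logs when this heuristic kicks in and how often.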
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]