[GitHub] [spark] tgravescs commented on a change in pull request #27223: [SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as pending in Spark scheduler

GitBox Thu, 23 Jan 2020 14:51:04 -0800

tgravescs commented on a change in pull request #27223: 
[SPARK-30511][SPARK-28403][CORE] Don't treat failed/killed speculative tasks as 
pending in Spark scheduler
URL: https://github.com/apache/spark/pull/27223#discussion_r370370306


 ##########
 File path: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
 ##########
 @@ -263,9 +263,15 @@ private[spark] class ExecutorAllocationManager(
    */
   private def maxNumExecutorsNeeded(): Int = {
     val numRunningOrPendingTasks = listener.totalPendingTasks + listener.totalRunningTasks
-    math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
-              tasksPerExecutorForFullParallelism)
-      .toInt
+    val maxNeeded = math.ceil(numRunningOrPendingTasks * executorAllocationRatio /
+      tasksPerExecutorForFullParallelism).toInt
+    if (listener.pendingSpeculativeTasks > 0 && tasksPerExecutorForFullParallelism > 1) {
+      // If we have pending speculative tasks, allocate one more executor to satisfy the
+      // locality requirements of speculative tasks
+      maxNeeded + 1
 
 Review comment:
   But this isn't necessarily going to work: it doesn't guarantee that you get 
an executor on a different host. I guess it gives it a chance at least. I 
think the comment needs to be clearer and state that the scheduler logic 
will only start a speculative task if it's on a different host than the 
running attempt, and that this may not work. I now see why that code was added 
below in addExecutors, so thanks for pointing it out.
   This also doesn't make sense if we already have more than 1 executor. 
numRunningOrPendingTasks already includes speculative tasks, so let's say we 
have 1000 total tasks and are putting 1 task per executor. We are already 
asking for 1000 executors, so we don't need to add 1 more here that won't be 
used. Really, I think it should only be needed if maxNeeded == 1 and we have 
speculative tasks. Otherwise we have at least 2 executors, so they should be 
able to run.
   
   I think we should file another follow-up JIRA to perhaps handle speculative 
tasks differently on the cluster manager side.
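
   For illustration only, the suggestion above could be sketched roughly as 
follows. This is a standalone approximation, not the code in the PR: the object 
name and the function parameters are hypothetical stand-ins for the listener 
fields and configuration values used in the diff, and pendingTasks is assumed 
to already include pending speculative tasks.

```scala
object MaxExecutorsSketch {
  // Hypothetical standalone version of maxNumExecutorsNeeded.
  // All parameters are assumptions standing in for listener.totalPendingTasks,
  // listener.totalRunningTasks, listener.pendingSpeculativeTasks,
  // executorAllocationRatio and tasksPerExecutorForFullParallelism.
  def maxNumExecutorsNeeded(
      pendingTasks: Int,
      runningTasks: Int,
      pendingSpeculativeTasks: Int,
      executorAllocationRatio: Double,
      tasksPerExecutor: Int): Int = {
    val numRunningOrPendingTasks = pendingTasks + runningTasks
    val maxNeeded =
      math.ceil(numRunningOrPendingTasks * executorAllocationRatio / tasksPerExecutor).toInt
    // Only ask for one extra executor when everything would otherwise fit on a
    // single executor. This gives a pending speculative task a chance (not a
    // guarantee) of landing on a different host than the original attempt.
    if (maxNeeded == 1 && pendingSpeculativeTasks > 0) maxNeeded + 1 else maxNeeded
  }
}
```

   With 2 tasks (one of them a pending speculative task) and 4 tasks per 
executor, this asks for 2 executors instead of 1. With 1000 tasks at 1 task per 
executor it still asks for 1000, since the extra executor would not be used.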

