squito commented on a change in pull request #23951: [SPARK-27038][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r263469410
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##########
 @@ -184,11 +184,23 @@ private[spark] class TaskSetManager(
     t.epoch = epoch
   }
 
 +  // An array to store each preferred location and its task index
 +  private val locationWithTaskIndex: ArrayBuffer[(String, Int)] = new ArrayBuffer[(String, Int)]()
 +  private val addTaskStartTime = System.nanoTime()
    // Add all our tasks to the pending lists. We do this in reverse order
    // of task index so that tasks with low indices get launched first.
    for (i <- (0 until numTasks).reverse) {
 -    addPendingTask(i)
 +    addPendingTask(i, true)
    }
 +  // Convert the preferred location list to a rack list in one invocation and
 +  // zip it with the original task indices
 +  private val rackWithTaskIndex =
 +    sched.getRacksForHosts(locationWithTaskIndex.map(_._1).toList)
 
 Review comment:
   btw I meant something like this for avoiding leaving around `locationWithTaskIndex` and doing the de-duping:
   
   https://github.com/squito/spark/commit/fc0b3089efe4bf35f145e5c01bff26e7b9c5f0e7
   
   Feel free to pull that commit into your change if you think it's helpful.
   
   The de-duping part is minor, but I am concerned that the `locationWithTaskIndex` variable will be confusing if it's left around as a private member variable, even though it is only meaningful in this limited context.
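   
   A minimal sketch of the batched, member-free shape of this (the real change is in the linked commit; `getRacksForHosts` is the method from the diff above, while the host-to-index map and the `pendingTasksForRack` bookkeeping here are hypothetical stand-ins): collect the hosts in a local value inside the constructor block, resolve all racks in one call, and let the temporary collection go out of scope so nothing lingers as a field.
   
   ```scala
   import scala.collection.mutable.{ArrayBuffer, HashMap}
   
   object RackBatchSketch {
     // Hypothetical stand-in for sched.getRacksForHosts from the diff:
     // resolves a batch of hosts to their (optional) racks in one call.
     def getRacksForHosts(hosts: Seq[String]): Seq[Option[String]] =
       hosts.map(h => Some("rack-" + h.takeRight(1)))
   
     def main(args: Array[String]): Unit = {
       // Preferred hosts per task index, stubbed for the sketch
       val taskHosts = Array(Seq("host1"), Seq("host2"), Seq("host1"))
   
       // Local val, not a member field: (host -> task indices), de-duped by key
       val hostToIndices = new HashMap[String, ArrayBuffer[Int]]
       for (i <- taskHosts.indices.reverse; h <- taskHosts(i)) {
         hostToIndices.getOrElseUpdate(h, new ArrayBuffer[Int]) += i
       }
   
       // One batched resolution over the distinct hosts
       val distinctHosts = hostToIndices.keys.toSeq
       val racks = getRacksForHosts(distinctHosts)
   
       // Fan the resolved racks back out to task indices
       val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
       distinctHosts.zip(racks).foreach {
         case (host, Some(rack)) =>
           pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer[Int]) ++=
             hostToIndices(host)
         case _ => // host with no known rack: nothing to record
       }
       println(pendingTasksForRack)
     }
   }
   ```
   
   Because `hostToIndices` is local to the constructor-time block, it is garbage once initialization finishes, which avoids the confusing long-lived private member this comment is about.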

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 