LantaoJin commented on a change in pull request #23951:
[SPARK-13704][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266428467
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -184,11 +184,23 @@ private[spark] class TaskSetManager(
t.epoch = epoch
}
+ // An array to store preferred location and its task index
+ private val locationWithTaskIndex: ArrayBuffer[(String, Int)] = new ArrayBuffer[(String, Int)]()
+ private val addTaskStartTime = System.nanoTime()
// Add all our tasks to the pending lists. We do this in reverse order
// of task index so that tasks with low indices get launched first.
for (i <- (0 until numTasks).reverse) {
- addPendingTask(i)
+ addPendingTask(i, true)
}
+ // Convert preferred location list to rack list in one invocation and zip with the origin index
+ private val rackWithTaskIndex = sched.getRacksForHosts(locationWithTaskIndex.map(_._1).toList)
Review comment:
Refactor with the de-duping approach.
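For illustration only, a minimal sketch of what a de-duping approach could look like: resolve each distinct host once, then map the racks back onto the original (host, task index) pairs. `RackDedupSketch`, `racksByTask`, and the stand-in `getRacksForHosts` below are hypothetical names with assumed signatures; they are not the code in this PR, and the real `sched.getRacksForHosts` may differ.

```scala
object RackDedupSketch {
  // Counts how many hosts were handed to the resolver, to make the saving visible.
  var resolvedHosts = 0

  // Hypothetical stand-in for sched.getRacksForHosts (assumed signature).
  // Returns one rack per input host, in the same order.
  def getRacksForHosts(hosts: Seq[String]): Seq[Option[String]] = {
    resolvedHosts += hosts.size
    hosts.map(h => if (h.startsWith("host")) Some("/rack-" + h.last) else None)
  }

  // Resolve each distinct host once, then re-attach racks to the task indices.
  def racksByTask(locationWithTaskIndex: Seq[(String, Int)]): Seq[(Option[String], Int)] = {
    val distinctHosts = locationWithTaskIndex.map(_._1).distinct
    // Single resolver invocation over the de-duplicated host list.
    val rackByHost = distinctHosts.zip(getRacksForHosts(distinctHosts)).toMap
    locationWithTaskIndex.map { case (host, index) => (rackByHost(host), index) }
  }
}
```

With many tasks sharing a small set of preferred hosts, the resolver input shrinks from one entry per (task, location) pair to one entry per distinct host, while the original ordering of task indices is preserved.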