squito commented on a change in pull request #23951: [SPARK-13704][CORE][YARN]
Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266899149
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -250,8 +251,10 @@ private[spark] class TaskSetManager(
}
pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer) += index
- initializingTaskArray match {
- case Some(array) => array += ((loc.host, index))
+ initializingHostToIndices match {
+ case Some(hostToIndices) =>
+ // when TaskSetManager initializing, preferredLocation -> task indices
+ hostToIndices.getOrElseUpdate(loc.host, new ArrayBuffer) += index
Review comment:
I think you misunderstood me. I agree we should *not* try to use a single map
from both racks and hosts to indices; as you said, that would cause problems if
a rack and a host ever shared a name.
I'm just talking about how we build up the map `hostToIndices`. You can see
your added line
```scala
hostToIndices.getOrElseUpdate(loc.host, new ArrayBuffer) += index
```
is pretty much the same as the line a few lines up, except that one goes into
`pendingTasksForHost`:
https://github.com/apache/spark/blob/e402de5fd030cdc4150fda0755c7c636cad9619e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L236
The only difference is that your addition is guarded by
`initializingHostToIndices match ...`, but that is irrelevant for what you want
-- you could change `addPendingTasks` to not bother creating `hostToIndices`
and instead call
```scala
(rack, indices) <- sched.getRacksForHosts(pendingTasksForHost.keySet.toSeq)
```
since `pendingTasksForHost` holds exactly the same host-to-indices mapping at that point.
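To make the suggestion concrete, here is a minimal standalone sketch of deriving the rack map from `pendingTasksForHost` directly, without a second `hostToIndices` map. It is not the actual `TaskSetManager` code: the object name, `addPending`, `buildPendingTasksForRack`, and the toy rack lookup are all hypothetical, and it assumes `getRacksForHosts` returns one `Option[String]` per host in input order, which may not match the signature in this PR.

```scala
import scala.collection.mutable.{ArrayBuffer, HashMap}

object RackGroupingSketch {
  // Hypothetical stand-in for sched.getRacksForHosts. Assumption: it returns
  // one Option[rack] per host, in the same order as the input hosts.
  def getRacksForHosts(hosts: Seq[String]): Seq[Option[String]] =
    hosts.map {
      case h if h.startsWith("a") => Some("rack-a")
      case h if h.startsWith("b") => Some("rack-b")
      case _ => None // host with no known rack
    }

  // Same getOrElseUpdate pattern that already fills pendingTasksForHost.
  def addPending(
      pendingTasksForHost: HashMap[String, ArrayBuffer[Int]],
      host: String,
      index: Int): Unit = {
    pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int]) += index
  }

  // Resolve all hosts in one batch, then copy each host's task indices into
  // the entry for its rack. Hosts without a rack are skipped.
  def buildPendingTasksForRack(
      pendingTasksForHost: HashMap[String, ArrayBuffer[Int]])
    : HashMap[String, ArrayBuffer[Int]] = {
    val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
    val hosts = pendingTasksForHost.keySet.toSeq
    for ((rackOpt, host) <- getRacksForHosts(hosts).zip(hosts); rack <- rackOpt) {
      pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer[Int]) ++=
        pendingTasksForHost(host)
    }
    pendingTasksForRack
  }
}
```

The point of the sketch is only that the keys of `pendingTasksForHost` are enough to drive a single batched rack lookup, so the guarded `initializingHostToIndices` bookkeeping would not be needed.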