squito commented on a change in pull request #23951: [SPARK-13704][CORE][YARN] 
Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266899149
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##########
 @@ -250,8 +251,10 @@ private[spark] class TaskSetManager(
       }
       pendingTasksForHost.getOrElseUpdate(loc.host, new ArrayBuffer) += index
 
-      initializingTaskArray match {
-        case Some(array) => array += ((loc.host, index))
+      initializingHostToIndices match {
+        case Some(hostToIndices) =>
+          // when TaskSetManager initializing, preferredLocation -> task indices
+          hostToIndices.getOrElseUpdate(loc.host, new ArrayBuffer) += index
 
 Review comment:
   I think you misunderstood me.  I agree we should *not* try to use one map 
for both racks and hosts to indices, as you said that would cause problems if 
there were ever a name conflict.
   
   I'm just talking about how we build up the map `hostToIndices`.  You can see 
your added line
   
   ```scala
   hostToIndices.getOrElseUpdate(loc.host, new ArrayBuffer) += index
   ```
   
   is pretty much the same as what is a few lines up, except that one goes into 
`pendingTasksForHost`:
   
   
https://github.com/apache/spark/blob/e402de5fd030cdc4150fda0755c7c636cad9619e/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L236
   
   The only difference is that your addition is guarded by 
`initializingHostToIndices match ...`, but that is irrelevant for what you want 
-- you could change `addPendingTasks` to not bother creating `hostToIndices` 
and instead call
   
   ```scala
   (rack, indices) <- sched.getRacksForHosts(pendingTasksForHost.keySet.toSeq)
   ```
   
   as `pendingTasksForHost` already holds exactly the same host-to-indices mapping at that point.
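   
   A rough, self-contained sketch of the idea, not the actual `TaskSetManager` code. The object name, the `buildPendingTasksForHost` and `buildPendingTasksForRack` helpers, and the shape of `getRacksForHosts` (one optional rack per host, in input order) are all assumptions made for illustration; the real signature in this PR may differ.
   
   ```scala
   import scala.collection.mutable.{ArrayBuffer, HashMap}

   object RackGroupingSketch {
     // Hypothetical stand-in for sched.getRacksForHosts: returns one optional
     // rack per host, in the same order as the input. Assumed shape only.
     def getRacksForHosts(hosts: Seq[String]): Seq[Option[String]] =
       hosts.map(h => if (h.startsWith("host-a")) Some("rack-a") else None)

     // Builds the host -> task indices map the same way the existing
     // pendingTasksForHost line does.
     def buildPendingTasksForHost(
         prefs: Seq[(String, Int)]): HashMap[String, ArrayBuffer[Int]] = {
       val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]
       for ((host, index) <- prefs) {
         pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int]) += index
       }
       pendingTasksForHost
     }

     // Derives rack -> task indices from pendingTasksForHost with a single
     // batched rack lookup, so no separate hostToIndices map is needed.
     def buildPendingTasksForRack(
         pendingTasksForHost: HashMap[String, ArrayBuffer[Int]])
         : HashMap[String, ArrayBuffer[Int]] = {
       val pendingTasksForRack = new HashMap[String, ArrayBuffer[Int]]
       val (hosts, indicesPerHost) = pendingTasksForHost.toSeq.unzip
       getRacksForHosts(hosts).zip(indicesPerHost).foreach {
         case (Some(rack), indices) =>
           pendingTasksForRack.getOrElseUpdate(rack, new ArrayBuffer[Int]) ++= indices
         case (None, _) => // host with no known rack: nothing to record
       }
       pendingTasksForRack
     }
   }
   ```
   
   The rack map stays separate from the host map, so a rack and a host sharing a name cannot collide. The order of indices within a rack depends on map iteration order in this sketch.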

