LantaoJin commented on a change in pull request #23951: 
[SPARK-13704][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266403686
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
 ##########
 @@ -184,11 +184,23 @@ private[spark] class TaskSetManager(
     t.epoch = epoch
   }
 
+  // An array to store preferred location and its task index
+  private val locationWithTaskIndex: ArrayBuffer[(String, Int)] = new ArrayBuffer[(String, Int)]()
+  private val addTaskStartTime = System.nanoTime()
   // Add all our tasks to the pending lists. We do this in reverse order
   // of task index so that tasks with low indices get launched first.
   for (i <- (0 until numTasks).reverse) {
-    addPendingTask(i)
+    addPendingTask(i, true)
   }
+  // Convert preferred location list to rack list in one invocation and zip with the origin index
+  private val rackWithTaskIndex = sched.getRacksForHosts(locationWithTaskIndex.map(_._1).toList)
 
 Review comment:
   > if you repeat the same host 1000 times, if that host is not in the cache 
yet, it will repeat that lookup 1000 times.
   
   Looking up a host in a Set is no different from looking it up in a HashMap.
   For example, suppose host1, host2, and host3 are three uncached, unresolved
   hosts. The first time, `getUncachedHosts` returns a list containing all of
   them. Then `rawMapping.resolve()` resolves them, and
   `cacheResolvedHosts(uncachedHosts, resolvedHosts)` caches the three hosts in
   memory. When the second round begins with host1, host2, host3, and host4,
   `getUncachedHosts` looks each host up in a HashMap and returns a list
   containing only host4, which is then resolved and cached in memory.
   ```java
     @Override
     public List<String> resolve(List<String> names) {
       // normalize all input names to be in the form of IP addresses
       names = NetUtils.normalizeHostNames(names);
   
       List <String> result = new ArrayList<String>(names.size());
       if (names.isEmpty()) {
         return result;
       }
       List<String> uncachedHosts = getUncachedHosts(names);
       // Resolve the uncached hosts
       List<String> resolvedHosts = rawMapping.resolve(uncachedHosts);
       //cache them
       cacheResolvedHosts(uncachedHosts, resolvedHosts);
       //now look up the entire list in the cache
       return getCachedHosts(names);
     }
   
     private List<String> getCachedHosts(List<String> names) {
       List<String> result = new ArrayList<String>(names.size());
       // Construct the result
       for (String name : names) {
         String networkLocation = cache.get(name);
         if (networkLocation != null) {
           result.add(networkLocation);
         } else {
           return null;
         }
       }
       return result;
     }
   ```
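
   To make the two rounds concrete, here is a minimal, self-contained sketch of
   the same cache-then-resolve flow. It is not the Hadoop implementation: the
   class `CachedResolverSketch`, the `rawResolve` stand-in, the `rawLookups`
   counter, and the `/rack-of-...` names are all hypothetical and exist only
   to show which hosts reach the raw resolver in each round.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical illustration only; not Hadoop's CachedDNSToSwitchMapping.
   class CachedResolverSketch {
     private final Map<String, String> cache = new HashMap<>();
     // Every host handed to the stand-in raw resolver, in call order.
     final List<String> rawLookups = new ArrayList<>();

     // Stand-in for rawMapping.resolve(): maps each host to a made-up rack.
     private List<String> rawResolve(List<String> hosts) {
       List<String> racks = new ArrayList<>(hosts.size());
       for (String host : hosts) {
         rawLookups.add(host);
         racks.add("/rack-of-" + host);
       }
       return racks;
     }

     List<String> resolve(List<String> names) {
       // Collect the hosts that have no cache entry yet.
       List<String> uncached = new ArrayList<>();
       for (String name : names) {
         if (!cache.containsKey(name)) {
           uncached.add(name);
         }
       }
       // Resolve only those hosts, then cache the results.
       List<String> resolved = rawResolve(uncached);
       for (int i = 0; i < uncached.size(); i++) {
         cache.put(uncached.get(i), resolved.get(i));
       }
       // Answer the whole request from the cache.
       List<String> result = new ArrayList<>(names.size());
       for (String name : names) {
         result.add(cache.get(name));
       }
       return result;
     }

     public static void main(String[] args) {
       CachedResolverSketch resolver = new CachedResolverSketch();
       resolver.resolve(Arrays.asList("host1", "host2", "host3"));
       resolver.resolve(Arrays.asList("host1", "host2", "host3", "host4"));
       // Prints [host1, host2, host3, host4]: round two only resolved host4.
       System.out.println(resolver.rawLookups);
     }
   }
   ```

   In this sketch the second call sends only host4 to the raw resolver, which
   is the behavior described above for hosts that were cached by an earlier
   call.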
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
