LantaoJin commented on a change in pull request #23951:
[SPARK-13704][CORE][YARN] Re-implement RackResolver to reduce resolving time
URL: https://github.com/apache/spark/pull/23951#discussion_r266403686
##########
File path: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
##########
@@ -184,11 +184,23 @@ private[spark] class TaskSetManager(
t.epoch = epoch
}
+ // An array to store preferred location and its task index
+ private val locationWithTaskIndex: ArrayBuffer[(String, Int)] = new ArrayBuffer[(String, Int)]()
+ private val addTaskStartTime = System.nanoTime()
// Add all our tasks to the pending lists. We do this in reverse order
// of task index so that tasks with low indices get launched first.
for (i <- (0 until numTasks).reverse) {
- addPendingTask(i)
+ addPendingTask(i, true)
}
+ // Convert preferred location list to rack list in one invocation and zip with the origin index
+ private val rackWithTaskIndex = sched.getRacksForHosts(locationWithTaskIndex.map(_._1).toList)
Review comment:
> if you repeat the same host 1000 times, if that host is not in the cache
> yet, it will repeat that lookup 1000 times.
Looking up a host in a Set is no different from looking it up in a HashMap.
For example, suppose host1, host2 and host3 are three uncached, unresolved hosts.
The first time, `getUncachedHosts` returns a list containing all of them. Then
`rawMapping.resolve()` resolves them and `cacheResolvedHosts(uncachedHosts,
resolvedHosts)` caches the three hosts in memory. When the second round begins
with host1, host2, host3 and host4, `getUncachedHosts` looks them up in a HashMap
and returns a list containing only host4, so only host4 is resolved and
cached in memory.
```java
@Override
public List<String> resolve(List<String> names) {
  // normalize all input names to be in the form of IP addresses
  names = NetUtils.normalizeHostNames(names);
  List<String> result = new ArrayList<String>(names.size());
  if (names.isEmpty()) {
    return result;
  }
  List<String> uncachedHosts = getUncachedHosts(names);
  // Resolve the uncached hosts
  List<String> resolvedHosts = rawMapping.resolve(uncachedHosts);
  // cache them
  cacheResolvedHosts(uncachedHosts, resolvedHosts);
  // now look up the entire list in the cache
  return getCachedHosts(names);
}

private List<String> getCachedHosts(List<String> names) {
  List<String> result = new ArrayList<String>(names.size());
  // Construct the result
  for (String name : names) {
    String networkLocation = cache.get(name);
    if (networkLocation != null) {
      result.add(networkLocation);
    } else {
      return null;
    }
  }
  return result;
}
```
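
The two rounds described above can be reproduced with a small standalone sketch. It is a simplified stand-in, not the Hadoop class itself: `CachingResolverSketch`, `rawResolve`, `rawLookups` and the `/rack-of-...` names are made up for illustration, and host name normalization is left out. It only shows caching across calls and says nothing about how duplicates inside a single input list are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a caching host-to-rack resolver.
class CachingResolverSketch {
    // Every host that reached the (assumed expensive) raw resolver, in order.
    final List<String> rawLookups = new ArrayList<>();
    // Host -> rack cache, playing the role of `cache` in the snippet above.
    private final Map<String, String> cache = new HashMap<>();

    // Stand-in for rawMapping.resolve(): records the lookup, returns a made-up rack.
    private List<String> rawResolve(List<String> hosts) {
        List<String> racks = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            rawLookups.add(host);
            racks.add("/rack-of-" + host);
        }
        return racks;
    }

    // Collect the hosts that are not in the cache yet.
    private List<String> getUncachedHosts(List<String> names) {
        List<String> uncached = new ArrayList<>();
        for (String name : names) {
            if (!cache.containsKey(name)) {
                uncached.add(name);
            }
        }
        return uncached;
    }

    List<String> resolve(List<String> names) {
        List<String> uncached = getUncachedHosts(names);
        List<String> resolved = rawResolve(uncached);
        // Cache the newly resolved hosts.
        for (int i = 0; i < uncached.size(); i++) {
            cache.put(uncached.get(i), resolved.get(i));
        }
        // Answer the whole request from the cache.
        List<String> result = new ArrayList<>(names.size());
        for (String name : names) {
            result.add(cache.get(name));
        }
        return result;
    }

    public static void main(String[] args) {
        CachingResolverSketch resolver = new CachingResolverSketch();
        resolver.resolve(Arrays.asList("host1", "host2", "host3"));
        System.out.println(resolver.rawLookups); // [host1, host2, host3]
        resolver.resolve(Arrays.asList("host1", "host2", "host3", "host4"));
        System.out.println(resolver.rawLookups); // [host1, host2, host3, host4]
    }
}
```

In the second call only host4 reaches the raw resolver; the other three hosts are answered from the map.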