BELUGA BEHR created YARN-8170:
---------------------------------

             Summary: Caching Node Rack Location
                 Key: YARN-8170
                 URL: https://issues.apache.org/jira/browse/YARN-8170
             Project: Hadoop YARN
          Issue Type: New Feature
          Components: applications, nodemanager
    Affects Versions: 3.0.1
            Reporter: BELUGA BEHR


When the MapReduce ApplicationMaster assigns Mappers to nodes, it loops over
all of the queued Mappers and looks up the ideal rack location of each
Mapper.

Under the covers, the rack awareness script is called once per Mapper. The
results are cached, but only in memory and only for as long as the
ApplicationMaster exists. That means the script gets called N times every
time a new ApplicationMaster is launched. If the rack awareness script is
complex or requires an external lookup, this can be a slow process and can
even DDoS the external lookup source.

There are at least a couple of ways to tackle this:
 # Add a DNSToSwitchMapping implementation that caches in an external cache
(e.g., memcached) instead of in memory, so that all ApplicationMasters share
the same cache and only rarely call the rack awareness script.
 # Like the shuffle service, add a new NodeManager auxiliary service that
exposes a rack lookup API, so that the NodeManagers are responsible for
caching the rack locations. This would also require a DNSToSwitchMapping
implementation that talks to this new service.
 # Other?
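
To make the first option concrete, here is a minimal sketch of a resolver that consults a shared cache before falling back to the rack awareness script. It does not depend on Hadoop: SharedCache, InMemorySharedCache, and SharedCachingResolver are hypothetical names invented for this illustration, the in-memory map merely stands in for an external store such as memcached, and the "script" is a plain function. Only the resolve(List<String>) shape is borrowed from DNSToSwitchMapping; a real implementation would need to implement that interface and handle cache expiry and failures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SharedRackCacheSketch {

  /** Hypothetical client for an external cache (stand-in for memcached). */
  interface SharedCache {
    String get(String key);
    void put(String key, String value);
  }

  /** In-process fake so the sketch runs without a real cache server. */
  static class InMemorySharedCache implements SharedCache {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    public String get(String key) { return store.get(key); }
    public void put(String key, String value) { store.put(key, value); }
  }

  /** Hypothetical resolver; resolve() mirrors DNSToSwitchMapping's shape. */
  static class SharedCachingResolver {
    private final SharedCache cache;
    private final Function<String, String> rackScript;

    SharedCachingResolver(SharedCache cache,
                          Function<String, String> rackScript) {
      this.cache = cache;
      this.rackScript = rackScript;
    }

    /** Returns one rack per host, in the same order as the input. */
    List<String> resolve(List<String> hosts) {
      List<String> racks = new ArrayList<>(hosts.size());
      for (String host : hosts) {
        String rack = cache.get(host);
        if (rack == null) {
          // Cache miss: pay for the (possibly slow) script once, then share.
          rack = rackScript.apply(host);
          cache.put(host, rack);
        }
        racks.add(rack);
      }
      return racks;
    }
  }

  public static void main(String[] args) {
    AtomicInteger scriptCalls = new AtomicInteger();
    // Assumed stand-in for the rack awareness script; counts invocations.
    Function<String, String> script = host -> {
      scriptCalls.incrementAndGet();
      return "/rack-" + Math.floorMod(host.hashCode(), 4);
    };

    SharedCache shared = new InMemorySharedCache();
    List<String> hosts = Arrays.asList("node1", "node2", "node3");

    // Two resolvers model two ApplicationMasters sharing one cache.
    List<String> first = new SharedCachingResolver(shared, script).resolve(hosts);
    List<String> second = new SharedCachingResolver(shared, script).resolve(hosts);

    System.out.println("same racks: " + first.equals(second));
    System.out.println("script calls: " + scriptCalls.get());
  }
}
```

In this sketch the second resolver never invokes the script, because every host is already present in the shared cache. With the current per-ApplicationMaster in-memory cache, each new ApplicationMaster would start cold and repeat the lookups.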

{code:java}
          String host = allocated.getNodeId().getHost();
          String rack = RackResolver.resolve(host).getNetworkLocation();
{code}
[https://github.com/apache/hadoop/blob/453d48bdfbb67ed3e66c33c4aef239c3d7bdd3bc/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1435-L1464]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
