[ https://issues.apache.org/jira/browse/YARN-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768228#comment-13768228 ]

Junping Du commented on YARN-1200:
----------------------------------

bq. A reasonable regression-fixing first step is to match that of the HDFS 
functionality: Each NameNode (and NameNode alone) needs the rack resolution 
script, not all the DNs.
Right. I agree this is the first step we should address. However, I don't think 
this is a simple fix that we can decide on quickly. The reason is as follows:
YARN, unlike MRv1, actually separates the two layers of scheduling (resource 
scheduling and task scheduling) for simplicity. For resource scheduling, the AM 
translates tasks (TaskAttempts) from the Job into ContainerRequests, later 
encodes them as ResourceRequests, and sends them to the RM scheduler for 
container allocation. For task scheduling, the AM schedules map tasks onto the 
allocated containers based on locality. 
In the first step, the AM resolves topology to create several (typically 3: 
node, rack and ANY/*) ResourceRequests from one ContainerRequest that asks for 
specific nodes. The second step is decided by the AM alone, which means the AM 
has to understand the topology at least for those specific nodes. Although we 
may be able to get rid of resolving topology in the first step by simplifying 
the ResourceRequests between AM and RM, i.e. sending only 1 (node) 
ResourceRequest instead of 3 RRs, we still need to resolve it in the second 
step.
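To make the first step concrete, here is a rough sketch (not the actual 
RMContainerAllocator code; class and method names are illustrative) of how one 
request for a specific host currently fans out into node, rack and ANY 
ResourceRequests, with the AM itself calling RackResolver and therefore needing 
the topology script locally:
{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;
import org.apache.hadoop.yarn.util.RackResolver;

public class RequestExpansionSketch {

  public static List<ResourceRequest> expand(Configuration conf,
      String host, Priority priority, Resource capability) {
    // AM-side rack resolution: requires the topology script on the AM's node.
    RackResolver.init(conf);
    String rack = RackResolver.resolve(host).getNetworkLocation();

    // One ContainerRequest becomes three ResourceRequests:
    // node-local, rack-local and off-switch (ANY) for the same container.
    List<ResourceRequest> requests = new ArrayList<ResourceRequest>();
    requests.add(ResourceRequest.newInstance(priority, host, capability, 1));
    requests.add(ResourceRequest.newInstance(priority, rack, capability, 1));
    requests.add(ResourceRequest.newInstance(priority, ResourceRequest.ANY,
        capability, 1));
    return requests;
  }
}
{code}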
Given that we cannot get rid of topology resolution in the AM, we may prefer a 
cache instead of running the script on each node. I have a draft proposal below 
(a rough sketch of the cache follows the list):
- Set up a cache of <node, network_location> in the AM.
- In the response coming back from the AM-RM heartbeat, the AM gets topology 
info for the related nodes (nodes in the request and in assignedContainers) and 
adds it to the cache.
- Replace all RackResolver calls on the AM side with cache lookups.
- On a cache miss when sending a node's ResourceRequest, resolve its rack 
location to a placeholder (like UNKNOWN); the RM will then replace it with the 
correct rack info, and the heartbeat response will fill the cache later.
- The cache is not only filled but also refreshed. So if the topology changes 
and the RM is aware of it, the change may not be propagated to all AMs 
immediately, but will gradually be updated in the related AMs (those requesting 
that node's resources or getting containers assigned on it).
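A rough sketch of such a cache, assuming the RM were to return a host-to-rack 
map in the heartbeat response (a hypothetical new field on AllocateResponse, 
which is exactly what this proposal would require):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AmRackCache {
  // Placeholder rack the RM would recognize and replace with the real rack.
  public static final String UNKNOWN_RACK = "UNKNOWN";

  private final Map<String, String> nodeToRack =
      new ConcurrentHashMap<String, String>();

  /** Used when building ResourceRequests; never runs the topology script. */
  public String resolve(String host) {
    String rack = nodeToRack.get(host);
    return rack != null ? rack : UNKNOWN_RACK;  // RM fills in the real rack
  }

  /** Called on every AM-RM heartbeat response; fills and also refreshes. */
  public void update(Map<String, String> hostToRackFromRM) {
    if (hostToRackFromRM != null) {
      // Topology changes known to the RM propagate gradually to related AMs.
      nodeToRack.putAll(hostToRackFromRM);
    }
  }
}
{code}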
Thoughts?  

                
> Provide a central view for rack topologies
> ------------------------------------------
>
>                 Key: YARN-1200
>                 URL: https://issues.apache.org/jira/browse/YARN-1200
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.1.0-beta
>            Reporter: Harsh J
>
> It appears that with YARN, any AM (such as the MRv2 AM) that tries to do 
> rack-info-based work, will need to resolve racks locally rather than get rack 
> info from YARN directly: 
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java#L1054
>  and its use of a simple implementation of 
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/RackResolver.java
> This is a regression, as we've traditionally only had users maintain rack 
> mappings and their associated script on a single master role node (JobTracker), 
> not at every compute node. Task spawning hosts have never done/needed rack 
> resolution of their own.
> It is silly to have to maintain rack configs and their changes on all nodes. 
> We should have the RM host a stable interface service so that there's only a 
> single view of the topology across the cluster, and document for AMs to use 
> that instead.
