[ 
https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated YARN-466:
----------------------------

    Assignee: Zhijie Shen
    
> Slave hostname mismatches in ResourceManager/Scheduler
> ------------------------------------------------------
>
>                 Key: YARN-466
>                 URL: https://issues.apache.org/jira/browse/YARN-466
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>            Reporter: Roger Hoover
>            Assignee: Zhijie Shen
>
> The problem is that the ResourceManager learns the hostname of a slave node 
> when the NodeManager registers itself and it seems the node manager is 
> getting the hostname by asking the OS.  When a job is submitted, I think the 
> ApplicationMaster learns the hostname by doing a reverse DNS lookup based on 
> the slaves file.
> Therefore, the ApplicationMaster submits requests for containers using the 
> fully qualified domain name (node1.foo.com) but the scheduler uses the OS 
> hostname (node1) when checking whether any requests are node-local.  As a 
> result, the following lookup never finds a node-local request:
> ResourceRequest request = application.getResourceRequest(priority, node.getHostName());
> I think it's unfriendly to ask users to make sure they configure hostnames to 
> match fully qualified domain names. There should be a way for the 
> ApplicationMaster and NodeManager to agree on the hostname.
> Steps to Reproduce:
> 1) Configure the OS hostname on slaves to differ from the fully qualified 
> domain name.  For example, if the FQDN for the slave is "node1.foo.com", set 
> the hostname on the node to be just "node1".
> 2) On submitting a job, observe that the AM submits resource requests using 
> the FQDN (e.g. "node1.foo.com").  You can add logging to the allocate() 
> method of whatever scheduler you're using:
> for (ResourceRequest req : ask) {
>   LOG.debug(String.format("Request %s for %d containers on %s",
>       req, req.getNumContainers(), req.getHostName()));
> }
> 3) Observe that when the scheduler checks for node locality (in the handle() 
> method) using FiCaSchedulerNode.getHostName(), the hostname it uses is 
> the one set in the host OS (e.g. "node1").  NOTE: if you're using 
> FifoScheduler, YARN-412 needs to be fixed first 
> (https://issues.apache.org/jira/browse/YARN-412).  
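
The mismatch described above can be illustrated without a cluster. The following is a minimal sketch in plain Java, not YARN code: the class HostnameMismatchSketch, its methods, and the map of request counts are all hypothetical stand-ins for the scheduler's per-application request table. It assumes the table is keyed by the exact host string the ApplicationMaster supplied, which is what the report suggests.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class HostnameMismatchSketch {

    // Hypothetical stand-in for an application's outstanding requests,
    // keyed by the host string the ApplicationMaster used (the FQDN).
    static Map<String, Integer> requestsByHost() {
        Map<String, Integer> requests = new HashMap<>();
        requests.put("node1.foo.com", 3); // 3 containers requested on this host
        return requests;
    }

    // Exact string match, analogous to looking up by node.getHostName().
    static Integer exactLookup(Map<String, Integer> requests, String nodeHost) {
        return requests.get(nodeHost);
    }

    // Assumed normalization for illustration only: keep the first DNS
    // label, lower-cased. Not suitable for IP addresses or for clusters
    // where the same short name exists in more than one domain.
    static String shortName(String host) {
        int dot = host.indexOf('.');
        String label = dot < 0 ? host : host.substring(0, dot);
        return label.toLowerCase(Locale.ROOT);
    }

    // Lookup that normalizes both the stored key and the node's name.
    static Integer normalizedLookup(Map<String, Integer> requests, String nodeHost) {
        String wanted = shortName(nodeHost);
        for (Map.Entry<String, Integer> entry : requests.entrySet()) {
            if (shortName(entry.getKey()).equals(wanted)) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Integer> requests = requestsByHost();
        // The NodeManager registered with the OS hostname "node1".
        System.out.println("exact lookup for node1: "
                + exactLookup(requests, "node1"));
        System.out.println("normalized lookup for node1: "
                + normalizedLookup(requests, "node1"));
    }
}
```

The exact lookup returns null because "node1" and "node1.foo.com" are different keys, which mirrors the node-local requests never being found. The short-name comparison is only one possible way for both sides to agree on a name; having the ApplicationMaster and NodeManager share a single canonical hostname would avoid the ambiguity altogether.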

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
