Roger Hoover created YARN-466:
---------------------------------

             Summary: Slave hostname mismatches in ResourceManager/Scheduler
                 Key: YARN-466
                 URL: https://issues.apache.org/jira/browse/YARN-466
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager, scheduler
            Reporter: Roger Hoover


The problem is that the ResourceManager learns the hostname of a slave node 
when the NodeManager registers itself, and it seems the NodeManager gets 
the hostname by asking the OS.  When a job is submitted, I think the 
ApplicationMaster learns the hostname by doing a reverse DNS lookup based on 
the slaves file.
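The two name sources described above can be illustrated with plain JDK calls. This is a minimal sketch, not YARN code: it assumes the OS-configured name is roughly what InetAddress.getLocalHost().getHostName() returns, and that a reverse lookup corresponds to getCanonicalHostName(). The class name HostnameSources is hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostnameSources {
    // Name configured on the host OS (e.g. "node1"); getLocalHost() reads
    // the local host name and getHostName() returns it without a DNS lookup.
    static String osHostname() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostName();
    }

    // Name obtained by reverse-resolving the host's address
    // (e.g. "node1.foo.com" if DNS is configured to return the FQDN).
    // Falls back to the IP literal if the reverse lookup fails.
    static String canonicalHostname() throws UnknownHostException {
        return InetAddress.getLocalHost().getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        String os = osHostname();
        String fqdn = canonicalHostname();
        System.out.printf("OS hostname:    %s%n", os);
        System.out.printf("Canonical name: %s%n", fqdn);
        // On a slave configured as in the steps below, these differ.
        System.out.println("Names match: " + os.equals(fqdn));
    }
}
```

Running this on a slave whose OS hostname is "node1" but whose DNS entry is "node1.foo.com" should print two different names, which is the mismatch this issue is about.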

Therefore, the ApplicationMaster submits requests for containers using the 
fully qualified domain name (node1.foo.com) but the scheduler uses the OS 
hostname (node1) when checking to see if any requests are node-local.  The 
result is that node-local requests are never found using this method of 
searching for node-local requests:

ResourceRequest request = application.getResourceRequest(priority, node.getHostName());
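Why the lookup misses can be shown with a simplified model. The sketch below is hypothetical: it keeps outstanding requests in a HashMap keyed by the host name the ApplicationMaster asked for, drops the priority argument, and assumes the real lookup behaves like an exact string match on the host name.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalityLookupDemo {
    // Hypothetical stand-in for an application's outstanding requests,
    // keyed by the host name used in the ApplicationMaster's request.
    private final Map<String, Integer> requestsByHost = new HashMap<>();

    void ask(String hostName, int numContainers) {
        requestsByHost.merge(hostName, numContainers, Integer::sum);
    }

    // Models getResourceRequest(priority, node.getHostName()) as an exact
    // string lookup, so "node1" and "node1.foo.com" are different keys.
    Integer getResourceRequest(String nodeHostName) {
        return requestsByHost.get(nodeHostName);
    }

    public static void main(String[] args) {
        LocalityLookupDemo app = new LocalityLookupDemo();
        app.ask("node1.foo.com", 2);  // AM asks using the FQDN

        // Scheduler checks with the name the NodeManager registered.
        System.out.println(app.getResourceRequest("node1"));          // null
        System.out.println(app.getResourceRequest("node1.foo.com"));  // 2
    }
}
```

The node-local request exists, but it is stored under a key the scheduler never uses, so it is never found.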

I think it's unfriendly to ask users to make sure they configure hostnames to 
match fully qualified domain names. There should be a way for the 
ApplicationMaster and NodeManager to agree on the hostname.
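One possible way for both sides to agree is to compare normalized keys instead of raw strings. The sketch below is only an illustration of that idea, not the fix adopted for this issue: the HostNameKey class and its rules (lower-case, drop a trailing dot, keep the first DNS label) are hypothetical, and the approach assumes short host names are unique across the cluster, which does not hold on every network.

```java
import java.util.Locale;

public class HostNameKey {
    // Hypothetical normalization: lower-case, strip a trailing root dot,
    // keep only the first DNS label. Assumes short names are unique in
    // the cluster; not intended for IP-address literals.
    static String key(String hostName) {
        String h = hostName.trim().toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    static boolean sameHost(String a, String b) {
        return key(a).equals(key(b));
    }

    public static void main(String[] args) {
        System.out.println(sameHost("node1", "node1.foo.com"));   // true
        System.out.println(sameHost("NODE1.foo.com.", "node1"));  // true
        System.out.println(sameHost("node1", "node2.foo.com"));   // false
    }
}
```

Stripping the domain trades correctness on multi-domain networks for convenience; resolving both names through the same resolver would avoid that trade-off at the cost of DNS lookups.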

Steps to Reproduce:
1) Configure the OS hostname on slaves to differ from the fully qualified 
domain name.  For example, if the FQDN for the slave is "node1.foo.com", set 
the hostname on the node to be just "node1".
2) On submitting a job, observe that the AM submits resource requests using the 
FQDN (e.g. "node1.foo.com").  To see this, you can add logging to the 
allocate() method of whichever scheduler you're using:

    for (ResourceRequest req : ask) {
      LOG.debug(String.format("Request %s for %d containers on %s",
          req, req.getNumContainers(), req.getHostName()));
    }
3) Observe that when the scheduler checks for node locality (in the handle() 
method) using FiCaSchedulerNode.getHostName(), the hostname it uses is the 
one set in the host OS (e.g. "node1").  NOTE: if you're using FifoScheduler, 
this bug needs to be fixed first 
(https://issues.apache.org/jira/browse/YARN-412).  
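The comparison in steps 2 and 3 can be automated once both sets of names are logged. The helper below is a hypothetical diagnostic, not part of any scheduler: it assumes "*" denotes an any-host request and that rack names start with "/", and it returns requested host names that match no hostname registered by a NodeManager.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class LocalityMismatchCheck {
    // Hypothetical diagnostic: given the names nodes registered with and
    // the host names seen in AM requests, return requested hosts that no
    // registered node will ever match. Assumes "*" means "any host" and
    // rack names begin with "/".
    static List<String> unmatchedHosts(Set<String> nodeHostNames,
                                       List<String> requestedHosts) {
        List<String> unmatched = new ArrayList<>();
        for (String host : requestedHosts) {
            boolean anyHost = host.equals("*");
            boolean rack = host.startsWith("/");
            if (!anyHost && !rack && !nodeHostNames.contains(host)) {
                unmatched.add(host);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        Set<String> nodes = Set.of("node1", "node2");  // from NM registration
        List<String> asks = List.of("node1.foo.com", "/default-rack", "*");
        System.out.println(unmatchedHosts(nodes, asks));  // [node1.foo.com]
    }
}
```

A non-empty result on a cluster configured as in step 1 indicates node-local requests that can never be satisfied as node-local.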


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
