[ https://issues.apache.org/jira/browse/YARN-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bikas Saha updated YARN-466:
----------------------------
    Assignee: Zhijie Shen

> Slave hostname mismatches in ResourceManager/Scheduler
> ------------------------------------------------------
>
>                 Key: YARN-466
>                 URL: https://issues.apache.org/jira/browse/YARN-466
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager, scheduler
>            Reporter: Roger Hoover
>            Assignee: Zhijie Shen
>
> The problem is that the ResourceManager learns the hostname of a slave node
> when the NodeManager registers itself, and it seems the NodeManager gets
> the hostname by asking the OS. When a job is submitted, I think the
> ApplicationMaster learns the hostname by doing a reverse DNS lookup based on
> the slaves file.
>
> Therefore, the ApplicationMaster submits requests for containers using the
> fully qualified domain name (node1.foo.com), but the scheduler uses the OS
> hostname (node1) when checking whether any requests are node-local. The
> result is that node-local requests are never found by this lookup:
>
>     ResourceRequest request = application.getResourceRequest(priority,
>         node.getHostName());
>
> I think it's unfriendly to ask users to make sure they configure hostnames to
> match fully qualified domain names. There should be a way for the
> ApplicationMaster and NodeManager to agree on the hostname.
>
> Steps to Reproduce:
> 1) Configure the OS hostname on slaves to differ from the fully qualified
> domain name. For example, if the FQDN for the slave is "node1.foo.com", set
> the hostname on the node to be just "node1".
> 2) On submitting a job, observe that the AM submits resource requests using
> the FQDN (e.g. "node1.foo.com"). You can add logging to the allocate()
> method of whatever scheduler you're using:
>
>     for (ResourceRequest req : ask) {
>       LOG.debug(String.format("Request %s for %d containers on %s", req,
>           req.getNumContainers(), req.getHostName()));
>     }
>
> 3) Observe that when the scheduler checks for node locality (in the handle()
> method) using FiCaSchedulerNode.getHostName(), the hostname it uses is
> the one set in the host OS (e.g. "node1").
>
> NOTE: if you're using FifoScheduler, this bug needs to be fixed first
> (https://issues.apache.org/jira/browse/YARN-412).
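
The mismatch described in the issue can be illustrated outside Hadoop with a small standalone sketch. Everything here is hypothetical and only models the behaviour: the class HostnameMismatchSketch, the map standing in for the scheduler's per-host request table, and the shortName() normalization are assumptions for illustration, not YARN code and not the fix adopted for YARN-466.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical model of the lookup problem; none of these names exist in YARN.
public class HostnameMismatchSketch {

    // Builds a table of container requests keyed by the host name the AM asked for.
    static Map<String, Integer> requestsByHost(String... hosts) {
        Map<String, Integer> table = new HashMap<>();
        for (String host : hosts) {
            table.merge(host, 1, Integer::sum);
        }
        return table;
    }

    // Exact string match, analogous in spirit to looking up requests
    // with the name the node registered under.
    static int exactLookup(Map<String, Integer> table, String nodeHostName) {
        return table.getOrDefault(nodeHostName, 0);
    }

    // Assumed normalization: lower-case and keep only the first DNS label.
    static String shortName(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        int dot = lower.indexOf('.');
        return dot < 0 ? lower : lower.substring(0, dot);
    }

    // Lookup that compares normalized names on both sides.
    static int normalizedLookup(Map<String, Integer> table, String nodeHostName) {
        String wanted = shortName(nodeHostName);
        int total = 0;
        for (Map.Entry<String, Integer> entry : table.entrySet()) {
            if (shortName(entry.getKey()).equals(wanted)) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The AM asks by FQDN; the node registered with its OS hostname "node1".
        Map<String, Integer> asks =
            requestsByHost("node1.foo.com", "node1.foo.com", "node2.foo.com");
        System.out.println("exact:      " + exactLookup(asks, "node1"));
        System.out.println("normalized: " + normalizedLookup(asks, "node1"));
    }
}
```

The exact lookup finds nothing for "node1" because the table is keyed by "node1.foo.com", which is the symptom reported above. Comparing short names is only one possible way to make both sides agree; it would conflate hosts that share a first label across domains (node1.foo.com and node1.bar.com) and is not meaningful for IP address strings, so a real fix would more likely have both sides resolve names the same way.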