Sunil G created YARN-1694: ----------------------------- Summary: RM is shutting down when an NM is added to cluster without updating the hostname in /etc/hosts Key: YARN-1694 URL: https://issues.apache.org/jira/browse/YARN-1694 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0 Reporter: Sunil G Priority: Critical
A New NM is added to cluster, but the hostname mapping of this NM is not updated in /etc/hosts in RM. NM registration is successful without any problems. When a job is submitted, RM shuts down with below exception. 2013-10-04 04:37:37,611 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler java.lang.IllegalArgumentException: java.net.UnknownHostException: host-10-18-40-120 at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247) at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1296) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1344) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1210) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1169) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:870) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:707) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:751) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:93) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:449) at java.lang.Thread.run(Thread.java:662) Caused by: java.net.UnknownHostException: host-10-18-40-120 ... 15 more 2013-10-04 04:37:37,614 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye.. -- This message was sent by Atlassian JIRA (v6.1.5#6160)