Mohammad Kamrul Islam created YARN-1894:
-------------------------------------------

             Summary: RM shutdown due to java.net.UnknownHostException
                 Key: YARN-1894
                 URL: https://issues.apache.org/jira/browse/YARN-1894
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
            Reporter: Mohammad Kamrul Islam
            Assignee: Mohammad Kamrul Islam


Background:
----------------
I started Hadoop 2.3 on my Mac on my office network and submitted a few jobs 
successfully. When I went home (a new network), I submitted another job and 
it abruptly brought down the RM service.

Error in RM log:
{noformat}
2014-03-29 12:28:56,754 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing RMDelegation token with sequence number: 3
2014-03-29 12:28:57,256 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: mislam-mn.<MY.OOFICE.DOMAIN>
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1294)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1342)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1208)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1167)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:868)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:642)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:556)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:696)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:740)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:88)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:543)
        at java.lang.Thread.run(Thread.java:695)
Caused by: java.net.UnknownHostException: mislam-mn.linkedin.biz
        ... 15 more
2014-03-29 12:28:57,259 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Exiting, bbye..
2014-03-29 12:28:57,297 INFO org.mortbay.log: Stopped [email protected]:8088
2014-03-29 12:28:57,401 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2014-03-29 12:28:57,473 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033
.....
{noformat}

Proposal:
---------------
I believe the root cause is that I moved my machine from one network to 
another while the same RM service was still running.

My point is: whatever the cause, the RM is a long-running core service and it 
should not exit this way. An appropriate error message should be sufficient.

If there is a consensus (or no disagreement), I can work on a patch.
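
To make the proposed behavior concrete, here is a minimal, self-contained sketch. It is not the actual RM code and not a patch. NodeUpdateSketch, buildService and handleNodeUpdate are hypothetical names, and buildService only imitates the failure seen in the stack trace above (an unresolved address surfacing as an IllegalArgumentException that wraps an UnknownHostException). The idea is that the event handler records an error for that one update and keeps running instead of exiting.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class NodeUpdateSketch {

    // Hypothetical stand-in for the token-service step in the stack trace:
    // an unresolved address is rejected with an IllegalArgumentException
    // whose cause is an UnknownHostException.
    static String buildService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Hypothetical handler: returns true if the update was processed and
    // false if it was skipped because the host could not be resolved.
    // Any other failure is rethrown unchanged.
    static boolean handleNodeUpdate(InetSocketAddress nodeAddr, List<String> errors) {
        try {
            buildService(nodeAddr);
            return true;
        } catch (IllegalArgumentException e) {
            if (e.getCause() instanceof UnknownHostException) {
                // Record an error message instead of shutting the service down.
                errors.add("Skipping NODE_UPDATE: cannot resolve host "
                    + nodeAddr.getHostName());
                return false;
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        // createUnresolved never performs a DNS lookup, so this is deterministic.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("node.invalid", 8041);
        boolean handled = handleNodeUpdate(stale, errors);
        System.out.println("handled=" + handled + ", errors=" + errors);
    }
}
```

Whether a real fix should skip the single update, retry resolution, or mark the node unhealthy is a design question for the patch discussion. The sketch only shows that the failure can be contained at the handler boundary.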

  





--
This message was sent by Atlassian JIRA
(v6.2#6252)
