Ed Kohlwey created YARN-1114:
--------------------------------

             Summary: Resource Manager Failure Due to Unreachable DNS
                 Key: YARN-1114
                 URL: https://issues.apache.org/jira/browse/YARN-1114
             Project: Hadoop YARN
          Issue Type: Bug
          Components: resourcemanager
    Affects Versions: 2.1.0-beta
         Environment: CentOS 6.3, Hortonworks vendor distro based on Hadoop 2.1
            Reporter: Ed Kohlwey


Last night DNS was briefly unresolvable on our cluster.

Our resource manager appears to have crashed because the hostname of a node 
manager could not be resolved. This is definitely not the right behavior, since 
anyone can crash the resource manager by advertising a node manager with an 
unresolvable hostname. It also leaves the RM fragile in the face of transient 
network issues.

Here is the stack trace:
{noformat}
2013-08-28 05:06:24,703 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: <hostname removed>
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:243)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.createContainer(AppSchedulable.java:160)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:237)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:338)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:364)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:160)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:149)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:907)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:980)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:110)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
        at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.UnknownHostException: <hostname removed>
        ... 14 more
{noformat}
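
To illustrate the behavior I would expect instead, here is a minimal, 
self-contained sketch. It is not the Hadoop code: NodeUpdateSketch, 
tokenServiceFor, and assignContainers are hypothetical names, and 
tokenServiceFor only imitates what the trace shows (an 
IllegalArgumentException wrapping an UnknownHostException). The point is that 
the scheduling loop could skip the affected node rather than let the exception 
escape and take down the whole process.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class NodeUpdateSketch {

    // Hypothetical stand-in for the failing call in the trace: it rejects an
    // address whose hostname has not been resolved, wrapping the cause the
    // same way the logged exception does.
    static String tokenServiceFor(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                    new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Hypothetical scheduling pass: a node whose hostname cannot be resolved
    // is recorded in 'skipped' and passed over; every other node is handled.
    static List<String> assignContainers(List<InetSocketAddress> nodes,
                                         List<String> skipped) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress node : nodes) {
            try {
                services.add(tokenServiceFor(node));
            } catch (IllegalArgumentException e) {
                if (e.getCause() instanceof UnknownHostException) {
                    skipped.add(node.getHostName());
                } else {
                    throw e; // unrelated argument errors still propagate
                }
            }
        }
        return services;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        // One resolvable node (loopback) and one deliberately unresolved one.
        nodes.add(new InetSocketAddress(InetAddress.getLoopbackAddress(), 8041));
        nodes.add(InetSocketAddress.createUnresolved("nm.invalid", 8041));

        List<String> skipped = new ArrayList<>();
        List<String> services = assignContainers(nodes, skipped);
        System.out.println("assigned on: " + services);
        System.out.println("skipped:     " + skipped);
    }
}
```

Whether the real fix belongs in the scheduler, in the token creation path, or 
at node registration is a design question for the maintainers; the sketch only 
shows that the failure can be contained to one node.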

The following is our version information (from the Hortonworks distro):
{noformat}
Hadoop 2.1.0.2.0.4.0-38
Subversion g...@github.com:hortonworks/hadoop.git -r 1c6feea9d537846789eb3337dc5b1a8911cfd60a
Compiled by jenkins on 2013-07-08T10:29Z
From source with checksum d1403d7842ef98c85d5f3d1332fa4
This command was run using /usr/lib/hadoop/hadoop-common-2.1.0.2.0.4.0-38.jar
{noformat}
