Ed Kohlwey created YARN-1114:
--------------------------------
Summary: Resource Manager Failure Due to Unreachable DNS
Key: YARN-1114
URL: https://issues.apache.org/jira/browse/YARN-1114
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.1.0-beta
Environment: CentOS 6.3, Hortonworks vendor distro based on Hadoop 2.1
Reporter: Ed Kohlwey
We encountered an issue last night where DNS was briefly unresolvable on our
cluster.
Our resource manager appears to have crashed due to an unresolvable hostname
for a node manager. This is definitely not the right behavior, since anyone can
crash the resource manager by advertising a node manager with an unresolvable
hostname. It also makes the RM not very robust to transient network issues that
may arise.
Here is the stack trace:
{noformat}
2013-08-28 05:06:24,703 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: <hostname removed>
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
	at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:243)
	at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.createContainer(AppSchedulable.java:160)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:237)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:338)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AppSchedulable.assignContainer(AppSchedulable.java:364)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSLeafQueue.assignContainer(FSLeafQueue.java:160)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSParentQueue.assignContainer(FSParentQueue.java:149)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.nodeUpdate(FairScheduler.java:907)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:980)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.handle(FairScheduler.java:110)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:413)
	at java.lang.Thread.run(Thread.java:724)
Caused by: java.net.UnknownHostException: <hostname removed>
	... 14 more
{noformat}
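To illustrate the failure mode in the trace and one possible mitigation, here is a minimal sketch. It is not the Hadoop code: the class `DnsTolerantDispatch`, the `Resolver` hook, and `handleNodeUpdates` are hypothetical names made up for this example, and port 45454 is an arbitrary value. The only behavior taken from the trace is that the `UnknownHostException` surfaces wrapped in an unchecked `IllegalArgumentException`. The sketch assumes it would be acceptable to skip a node whose hostname cannot be resolved and keep processing the remaining events, rather than letting the exception end the dispatcher.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DnsTolerantDispatch {

    /** Hypothetical resolver hook so the sketch can run without real DNS. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /**
     * Mimics the failure mode seen in the trace: a checked
     * UnknownHostException is rethrown as an unchecked
     * IllegalArgumentException with the original as its cause.
     */
    static String buildTokenService(String host, int port, Resolver resolver) {
        try {
            InetAddress addr = resolver.resolve(host);
            return addr.getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /**
     * Handles one "node update" per host. A host that cannot be resolved
     * is recorded and skipped instead of aborting the whole loop.
     * Returns the list of skipped hosts; successful results go to 'issued'.
     */
    static List<String> handleNodeUpdates(List<String> hosts, int port,
                                          Resolver resolver, List<String> issued) {
        List<String> skipped = new ArrayList<>();
        for (String host : hosts) {
            try {
                issued.add(buildTokenService(host, port, resolver));
            } catch (IllegalArgumentException e) {
                if (e.getCause() instanceof UnknownHostException) {
                    skipped.add(host); // transient DNS problem: skip this node only
                } else {
                    throw e; // unrelated argument errors still propagate
                }
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        // Fake resolver: hosts starting with "bad" fail, all others map to 10.0.0.1.
        Resolver fake = host -> {
            if (host.startsWith("bad")) {
                throw new UnknownHostException(host);
            }
            return InetAddress.getByAddress(host, new byte[] {10, 0, 0, 1});
        };
        List<String> issued = new ArrayList<>();
        List<String> skipped =
            handleNodeUpdates(Arrays.asList("nm1", "bad-nm", "nm2"), 45454, fake, issued);
        System.out.println("issued=" + issued + " skipped=" + skipped);
    }
}
```

Whether skipping the node, retrying later, or rejecting the node at registration time is the right policy is a design decision for the RM; the sketch only shows that the exception can be contained to the one affected node.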
The following is our version information (from the Hortonworks distro):
{noformat}
Hadoop 2.1.0.2.0.4.0-38
Subversion [email protected]:hortonworks/hadoop.git -r 1c6feea9d537846789eb3337dc5b1a8911cfd60a
Compiled by jenkins on 2013-07-08T10:29Z
From source with checksum d1403d7842ef98c85d5f3d1332fa4
This command was run using /usr/lib/hadoop/hadoop-common-2.1.0.2.0.4.0-38.jar
{noformat}