Never mind... we figured out that the host's DNS entry was going missing.
john
From: John Lilley [mailto:[email protected]]
Sent: Thursday, March 13, 2014 2:52 PM
To: [email protected]
Subject: ResourceManager shutting down
We have this erratic behavior where every so often the RM will shut down with an
UnknownHostException. The odd thing is, the host it complains about has been
in use for days at that point without a problem. Any ideas?
Thanks,
John
2014-03-13 14:38:14,746 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
    at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
    at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
    ... 15 more
2014-03-13 14:38:15,794 INFO resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped [email protected]:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2014-03-13 14:38:16,013 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down
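
For anyone who lands on this thread with the same trace: the UnknownHostException surfaces in SecurityUtil.buildTokenService while the scheduler is building a container token, so a node whose DNS entry comes and goes can take the RM down even after days of normal use. One way to catch an intermittent entry is to poll name resolution from the RM host. The following is only a rough sketch under assumptions: the DnsProbe class and its resolves helper are made up for illustration and are not part of Hadoop, and a plain java.net.InetAddress lookup is assumed to be close enough to what the RM sees.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic helper, not part of Hadoop.
public class DnsProbe {

    // Returns true if the host name (or IP literal) resolves through the
    // JVM's resolver right now, false if the lookup fails.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // Usage: java DnsProbe <host> [attempts]
    // Polls once per second and prints whether the name resolved.
    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "localhost";
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 1;
        for (int i = 0; i < attempts; i++) {
            System.out.println(new java.util.Date() + " " + host
                    + " resolves=" + resolves(host));
            if (i < attempts - 1) {
                Thread.sleep(1000); // pause between polls
            }
        }
    }
}
```

Running something like `java DnsProbe skitzo.office.datalever.com 600` on the RM machine would log each lookup for about ten minutes. Note that the JVM caches lookup results for a short time, so a dropout may show up a little late in a long-running process.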