Never mind... we figured out its DNS entry was going missing.
john
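
For anyone who hits the same symptom, here is a minimal sketch of one way to catch a DNS entry that comes and goes: poll the hostname and log each failed lookup with a timestamp, so the failures can be lined up against the RM log. The function names (`resolves`, `watch`) and the polling interval are assumptions made for illustration, not anything from Hadoop. Python's lookup goes through the OS resolver and may not behave exactly like the JVM's (which also caches results), so treat this as a rough indicator only.

```python
import socket
import time
from datetime import datetime


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves right now, False on a lookup failure."""
    try:
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False


def watch(hostnames, interval_seconds=60, rounds=None,
          check=resolves, sleep=time.sleep):
    """Poll each hostname and print a timestamped line when a lookup fails.

    rounds=None polls forever; an integer stops after that many passes.
    Returns the list of (timestamp, hostname) failures that were seen.
    """
    failures = []
    done = 0
    while rounds is None or done < rounds:
        for host in hostnames:
            if not check(host):
                stamp = datetime.now().isoformat(timespec="seconds")
                failures.append((stamp, host))
                print(f"{stamp} lookup failed: {host}")
        done += 1
        # Sleep only if another pass is still to come.
        if rounds is None or done < rounds:
            sleep(interval_seconds)
    return failures
```

Running something like `watch(["skitzo.office.datalever.com"], interval_seconds=60)` on the RM host (substituting your own node hostnames) prints a line each time the name fails to resolve.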

From: John Lilley [mailto:[email protected]]
Sent: Thursday, March 13, 2014 2:52 PM
To: [email protected]
Subject: ResourceManager shutting down

We are seeing erratic behavior where every so often the RM will shut down with 
an UnknownHostException.  The odd thing is that the host it complains about has 
been in use for days at that point without problems.  Any ideas?
Thanks,
John


2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State change from ACCEPTED to RUNNING
2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to the scheduler
java.lang.IllegalArgumentException: java.net.UnknownHostException: skitzo.office.datalever.com
        at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
        at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
        ... 15 more
2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager (ResourceManager.java:run(453)) - Exiting, bbye..
2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped [email protected]:8088
2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep interrupted
2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics system...
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system shutdown complete.
2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher (ApplicationMasterLauncher.java:run(98)) - org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8141
2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping server on 8050
... and so on, it shuts down
