Hi John

Would you mind filing a JIRA with more details? The RM going down just because 
a host was not resolvable or DNS timed out is something that should be 
addressed.
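
If it helps with the details for the JIRA, a small standalone check along 
these lines could show how often lookups for the affected host fail. This is 
only a sketch: DnsProbe and countFailures are hypothetical names, not part of 
Hadoop, and it assumes the failure can be reproduced with plain 
InetAddress.getByName calls from the RM host.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Hadoop: resolves a hostname repeatedly
// and counts how many of the lookups fail.
class DnsProbe {

    // Returns the number of attempts that threw UnknownHostException.
    static int countFailures(String host, int attempts, long delayMillis)
            throws InterruptedException {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                failures++;
                System.err.println("lookup " + (i + 1) + " failed: " + e.getMessage());
            }
            // Pause between attempts, but not after the last one.
            if (delayMillis > 0 && i < attempts - 1) {
                Thread.sleep(delayMillis);
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Host to probe; defaults to localhost when no argument is given.
        String host = args.length > 0 ? args[0] : "localhost";
        int failures = countFailures(host, 10, 1000L);
        System.out.println(host + ": " + failures + " of 10 lookups failed");
    }
}
```

Note that the JVM caches lookup results (see the networkaddress.cache.ttl 
security property), so the delay between attempts probably needs to be longer 
than the cache lifetime for each attempt to reach the resolver.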

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind… we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:[email protected]] 
> Sent: Thursday, March 13, 2014 2:52 PM
> To: [email protected]
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with 
> an UnknownHostException.  The odd thing is, the host it complains about has 
> been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl (RMAppImpl.java:handle(578)) - 
> application_1394204725813_0220 State change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type NODE_UPDATE to 
> the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> skitzo.office.datalever.com
>         at 
> org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at 
> org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - Stopped 
> [email protected]:8088
> 2014-03-13 14:38:16,013 ERROR delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - InterruptedExcpetion 
> recieved for ExpiredTokenRemover thread java.lang.InterruptedException: sleep 
> interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.java:run(98)) - 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
> server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - Stopping 
> server on 8050
> … and so on, it shuts down
>  
