Hi Hitesh,

Yes, it is an issue. It is handled in 
https://issues.apache.org/jira/i#browse/YARN-713, which fixes the DNS issue. The fix is 
available in hadoop-2.4 (unreleased).
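
For readers following the trace quoted below, here is a minimal Java sketch of the failure mode: a DNS lookup failure is rethrown as an unchecked IllegalArgumentException, and unless the caller catches it, it escapes the event-handling thread. This is not the YARN-713 patch. The class DnsFailureSketch, the Resolver interface, and the methods buildService and handleNodeUpdate are hypothetical names invented for illustration, and the resolver is injected so the failure can be simulated without real DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsFailureSketch {

    /** Hypothetical resolver abstraction, so a lookup failure can be simulated. */
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    /** Mirrors the pattern in the trace: a lookup failure is rethrown unchecked. */
    static String buildService(Resolver resolver, String host, int port) {
        try {
            InetAddress addr = resolver.resolve(host);
            return addr.getHostAddress() + ":" + port;
        } catch (UnknownHostException e) {
            // Same shape as "IllegalArgumentException: java.net.UnknownHostException: <host>"
            throw new IllegalArgumentException(e);
        }
    }

    /** Handles one event; reports failure instead of propagating a DNS error. */
    static boolean handleNodeUpdate(Resolver resolver, String host, int port) {
        try {
            buildService(resolver, host, port);
            return true;
        } catch (IllegalArgumentException e) {
            if (e.getCause() instanceof UnknownHostException) {
                System.err.println("Skipping node update, cannot resolve " + host);
                return false;
            }
            throw e; // unrelated argument errors still propagate
        }
    }

    public static void main(String[] args) {
        // Simulated resolvers (assumptions for the demo, no network access needed).
        Resolver failing = host -> { throw new UnknownHostException(host); };
        Resolver loopback = host -> InetAddress.getByAddress(host, new byte[] {127, 0, 0, 1});

        System.out.println(handleNodeUpdate(failing, "node.example.invalid", 8041));
        System.out.println(handleNodeUpdate(loopback, "localhost", 8041));
    }
}
```

The point of the sketch is only that a transient lookup failure can be treated as a per-event error rather than a fatal one; how the actual fix handles it should be checked against the JIRA.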


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Hitesh Shah [mailto:[email protected]] 
Sent: 14 March 2014 09:03
To: [email protected]
Subject: Re: ResourceManager shutting down

Hi John

Would you mind filing a jira with more details? The RM going down just because 
a host was not resolvable or DNS timed out is something that should be 
addressed.

thanks
-- Hitesh

On Mar 13, 2014, at 2:29 PM, John Lilley wrote:

> Never mind... we figured out its DNS entry was going missing.
> john
>  
> From: John Lilley [mailto:[email protected]]
> Sent: Thursday, March 13, 2014 2:52 PM
> To: [email protected]
> Subject: ResourceManager shutting down
>  
> We have this erratic behavior where every so often the RM will shut down with 
> an UnknownHostException.  The odd thing is, the host it complains about has 
> been in use for days at that point without problem.  Any ideas?
> Thanks,
> John
>  
>  
> 2014-03-13 14:38:14,746 INFO  rmapp.RMAppImpl 
> (RMAppImpl.java:handle(578)) - application_1394204725813_0220 State 
> change from ACCEPTED to RUNNING
> 2014-03-13 14:38:15,794 FATAL resourcemanager.ResourceManager 
> (ResourceManager.java:run(449)) - Error in handling event type 
> NODE_UPDATE to the scheduler
> java.lang.IllegalArgumentException: java.net.UnknownHostException: 
> skitzo.office.datalever.com
>         at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418)
>         at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247)
>         at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.createContainerToken(LeafQueue.java:1297)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainer(LeafQueue.java:1345)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignOffSwitchContainers(LeafQueue.java:1211)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainersOnNode(LeafQueue.java:1170)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.assignContainers(LeafQueue.java:871)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:645)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:559)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:690)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:734)
>         at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:86)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:440)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.UnknownHostException: skitzo.office.datalever.com
>         ... 15 more
> 2014-03-13 14:38:15,794 INFO  resourcemanager.ResourceManager 
> (ResourceManager.java:run(453)) - Exiting, bbye..
> 2014-03-13 14:38:15,911 INFO  mortbay.log (Slf4jLog.java:info(67)) - 
> Stopped [email protected]:8088
> 2014-03-13 14:38:16,013 ERROR 
> delegation.AbstractDelegationTokenSecretManager 
> (AbstractDelegationTokenSecretManager.java:run(557)) - 
> InterruptedExcpetion recieved for ExpiredTokenRemover thread 
> java.lang.InterruptedException: sleep interrupted
> 2014-03-13 14:38:16,013 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(200)) - Stopping ResourceManager metrics 
> system...
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(206)) - ResourceManager metrics system stopped.
> 2014-03-13 14:38:16,014 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:shutdown(572)) - ResourceManager metrics system 
> shutdown complete.
> 2014-03-13 14:38:16,015 WARN  amlauncher.ApplicationMasterLauncher 
> (ApplicationMasterLauncher.java:run(98)) - 
> org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread
>  interrupted. Returning.
> 2014-03-13 14:38:16,015 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8141
> 2014-03-13 14:38:16,017 INFO  ipc.Server (Server.java:stop(2442)) - 
> Stopping server on 8050 ... and so on, it shuts down
>  
