[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603621#comment-13603621
 ] 

Hitesh Shah commented on YARN-196:
----------------------------------

Committed to branch-2 and trunk. Thanks, Xuan, for addressing the numerous
review comments and for being so patient. I have also filed a related JIRA
about similar handling of connection loss after the NM is up.
                
> Nodemanager should be more robust in handling connection failure to
> ResourceManager when a cluster is started
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-196
>                 URL: https://issues.apache.org/jira/browse/YARN-196
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.0.0-alpha
>            Reporter: Ramgopal N
>            Assignee: Xuan Gong
>         Attachments: MAPREDUCE-3676.patch, YARN-196.10.patch, 
> YARN-196.11.patch, YARN-196.12.1.patch, YARN-196.12.patch, YARN-196.1.patch, 
> YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch, YARN-196.5.patch, 
> YARN-196.6.patch, YARN-196.7.patch, YARN-196.8.patch, YARN-196.9.patch
>
>
> If the NM is started before the RM, the NM shuts down with the following
> error:
> {code}
> ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting services org.apache.hadoop.yarn.server.nodemanager.NodeManager
> org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
>       at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>       at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
>       ... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
>       at $Proxy23.registerNodeManager(Unknown Source)
>       at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>       ... 5 more
> Caused by: java.net.ConnectException: Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1141)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1100)
>       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
>       ... 7 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
>       at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
>       at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
>       at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1117)
>       ... 9 more
> 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: Service:Dispatcher is stopped.
> 2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:9999
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 24290
> 2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 24290
> 2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler is stopped.
> 2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> {code}

