[ 
https://issues.apache.org/jira/browse/YARN-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583823#comment-13583823
 ] 

Hitesh Shah commented on YARN-196:
----------------------------------

Patch still has trailing whitespace issues.

In YarnConfiguration.java:

+  /** Max time to wait for ResourceManger start
+   */
 
Please rephrase to "max time to wait to establish a connection to the 
ResourceManager when the NodeManager starts"

+  /** Time interval for each NM attempt to connect RM
+   */

Rephrase to "Time interval between each NM attempt to connect to the 
ResourceManager" 

You can use the same descriptions in yarn-default.xml. Not sure if "After that 
period of time, NM will throw out exceptions" is valid in yarn-default.xml. A 
better description could mention that the NM will shutdown if it cannot connect 
to the RM within the specified max time period. 
Description should also mention how to use -1 to retry forever.

Earlier comment had a point of switching to using SECONDS instead of MS for 
users to understand more easily.

+      // this.hostName = InetAddress.getLocalHost().getCanonicalHostName();
  - Please remove commented out code if not being used.

Unit test does not really seem to be testing the flow of 
RESOURCEMANAGER_CONNECT_WAIT_MS being set to -1. waitForEver is being 
explicitly set to true/false based on the updater's ctor and not really based 
on the config value. If that flow cannot be tested, it might be better to 
remove the additional complexity from the test.

Also, patch will need to be updated due to 
https://issues.apache.org/jira/browse/HADOOP-9112. 














                
> Nodemanager if started before starting Resource manager is getting 
> shutdown.But if both RM and NM are started and then after if RM is going 
> down,NM is retrying for the RM.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-196
>                 URL: https://issues.apache.org/jira/browse/YARN-196
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.0.0-alpha
>            Reporter: Ramgopal N
>            Assignee: Xuan Gong
>         Attachments: MAPREDUCE-3676.patch, YARN-196.1.patch, 
> YARN-196.2.patch, YARN-196.3.patch, YARN-196.4.patch
>
>
> If NM is started before starting the RM ,NM is shutting down with the 
> following error
> {code}
> ERROR org.apache.hadoop.yarn.service.CompositeService: Error starting 
> services org.apache.hadoop.yarn.server.nodemanager.NodeManager
> org.apache.avro.AvroRuntimeException: 
> java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:149)
>       at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:167)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:242)
> Caused by: java.lang.reflect.UndeclaredThrowableException
>       at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:182)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:145)
>       ... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
> Call From HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:131)
>       at $Proxy23.registerNodeManager(Unknown Source)
>       at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>       ... 5 more
> Caused by: java.net.ConnectException: Call From 
> HOST-10-18-52-230/10.18.52.230 to HOST-10-18-52-250:8025 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:857)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1141)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1100)
>       at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:128)
>       ... 7 more
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:659)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:469)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:563)
>       at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:211)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1247)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1117)
>       ... 9 more
> 2012-01-16 15:04:13,336 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> 2012-01-16 15:04:13,337 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:Dispatcher is stopped.
> 2012-01-16 15:04:13,392 INFO org.mortbay.log: Stopped 
> [email protected]:9999
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:org.apache.hadoop.yarn.server.nodemanager.webapp.WebServer is stopped.
> 2012-01-16 15:04:13,493 INFO org.apache.hadoop.ipc.Server: Stopping server on 
> 24290
> 2012-01-16 15:04:13,494 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server listener on 24290
> 2012-01-16 15:04:13,495 INFO org.apache.hadoop.ipc.Server: Stopping IPC 
> Server Responder
> 2012-01-16 15:04:13,496 INFO org.apache.hadoop.yarn.service.AbstractService: 
> Service:org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
>  is stopped.
> 2012-01-16 15:04:13,496 WARN org.apache.hadoop.yarn.event.AsyncDispatcher: 
> AsyncDispatcher thread interrupted
> java.lang.InterruptedException
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>       at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:76)
>       at java.lang.Thread.run(Thread.java:619)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to