Junping Du created YARN-4431:
--------------------------------

             Summary: Not necessary to do unRegisterNM() if NM gets stopped due to failure to connect to RM
                 Key: YARN-4431
                 URL: https://issues.apache.org/jira/browse/YARN-4431
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Junping Du
            Assignee: Junping Du


{noformat}
2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2015-12-07 12:16:58,876 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unregistration of the Node 10.200.10.53:25454 failed.
java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
        at org.apache.hadoop.ipc.Client.call(Client.java:1452)
        at org.apache.hadoop.ipc.Client.call(Client.java:1385)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
        at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
        at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
        at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
        at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
{noformat}
If the RM is down for some reason, the NM's NodeStatusUpdaterImpl retries the 
connection according to its retry policy. After exhausting the retries (15 
minutes by default), it sends NodeManagerEventType.SHUTDOWN to shut down the NM. 
But NM shutdown calls NodeStatusUpdaterImpl.serviceStop(), which calls 
unRegisterNM() to unregister the NM from the RM, and that call goes through the 
same retries again (another 15 minutes). This is completely unnecessary: we 
should skip unRegisterNM() when the NM is shutting down because of connection 
issues.
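
A minimal sketch of the proposed behavior, not the actual patch: the updater 
remembers that it gave up on the RM and skips unregistration in serviceStop(). 
ResourceTrackerSketch, FakeTracker, NodeStatusUpdaterSketch, heartbeatOnce() and 
the failedToConnect flag are hypothetical names made up for illustration; they 
are not the real Hadoop classes or fields, and the retry policy is left out.

```java
import java.io.IOException;
import java.net.ConnectException;

/** Hypothetical stand-in for the NM-to-RM client; not the real Hadoop API. */
interface ResourceTrackerSketch {
    void nodeHeartbeat() throws IOException;
    void unRegisterNodeManager() throws IOException;
}

/** Fake RM client that refuses every call while rmUp is false. */
class FakeTracker implements ResourceTrackerSketch {
    final boolean rmUp;
    int unregisterAttempts = 0;

    FakeTracker(boolean rmUp) {
        this.rmUp = rmUp;
    }

    @Override
    public void nodeHeartbeat() throws IOException {
        if (!rmUp) {
            throw new ConnectException("Connection refused");
        }
    }

    @Override
    public void unRegisterNodeManager() throws IOException {
        unregisterAttempts++;
        if (!rmUp) {
            throw new ConnectException("Connection refused");
        }
    }
}

/** Simplified updater: one heartbeat attempt stands in for the full retry loop. */
class NodeStatusUpdaterSketch {
    private final ResourceTrackerSketch tracker;
    // Assumed flag: set once the updater has given up connecting to the RM.
    private volatile boolean failedToConnect = false;

    NodeStatusUpdaterSketch(ResourceTrackerSketch tracker) {
        this.tracker = tracker;
    }

    /** Returns false when the RM is unreachable, i.e. the NM should shut down. */
    boolean heartbeatOnce() {
        try {
            tracker.nodeHeartbeat();
            return true;
        } catch (IOException e) {
            failedToConnect = true;
            return false;
        }
    }

    /** Unregisters from the RM unless the shutdown was caused by a connection failure. */
    void serviceStop() {
        if (failedToConnect) {
            return; // RM is unreachable; another round of retries would be wasted
        }
        try {
            tracker.unRegisterNodeManager();
        } catch (IOException e) {
            System.err.println("Unregistration failed: " + e.getMessage());
        }
    }
}

class SkipUnregisterDemo {
    public static void main(String[] args) {
        FakeTracker down = new FakeTracker(false);
        NodeStatusUpdaterSketch updater = new NodeStatusUpdaterSketch(down);
        if (!updater.heartbeatOnce()) {
            updater.serviceStop();
        }
        System.out.println("unregister attempts with RM down: " + down.unregisterAttempts);
    }
}
```

In this sketch a normal stop (RM reachable) still unregisters once, while a stop 
triggered by the connection failure makes no unregister call at all.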



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
