[ 
https://issues.apache.org/jira/browse/YARN-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-4431:
-----------------------------
    Attachment: YARN-4431.patch

Uploaded a patch. It is quite straightforward, so it should not need a unit test.

> Not necessary to do unRegisterNM() if NM get stop due to failed to connect to 
> RM
> --------------------------------------------------------------------------------
>
>                 Key: YARN-4431
>                 URL: https://issues.apache.org/jira/browse/YARN-4431
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-4431.patch
>
>
> {noformat}
> 2015-12-07 12:16:57,873 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,874 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> 2015-12-07 12:16:58,876 WARN 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: 
> Unregistration of the Node 10.200.10.53:25454 failed.
> java.net.ConnectException: Call From jduMBP.local/10.200.10.53 to 
> 0.0.0.0:8031 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.GeneratedConstructorAccessor30.newInstance(Unknown 
> Source)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1452)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1385)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at com.sun.proxy.$Proxy74.unRegisterNodeManager(Unknown Source)
>         at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.unRegisterNodeManager(ResourceTrackerPBClientImpl.java:98)
>         at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:255)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>         at com.sun.proxy.$Proxy75.unRegisterNodeManager(Unknown Source)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.unRegisterNM(NodeStatusUpdaterImpl.java:267)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStop(NodeStatusUpdaterImpl.java:245)
>         at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>         at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>         at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:377)
> {noformat}
> If the RM is down for some reason, the NM's NodeStatusUpdaterImpl retries the 
> connection according to its retry policy. After exhausting the retries (15 
> minutes by default), it sends NodeManagerEventType.SHUTDOWN to shut down the 
> NM. But NM shutdown calls NodeStatusUpdaterImpl.serviceStop(), which calls 
> unRegisterNM() to unregister the NM from the RM, and that call retries all 
> over again (another 15 minutes). This is completely unnecessary; we should 
> skip unRegisterNM() when the NM is shutting down because of connection 
> issues.
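
The behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not the attached patch: the class NodeStatusUpdaterSketch, the ResourceTracker stand-in interface, the failedToConnect flag and the method names markRegistered() and onConnectRetriesExhausted() are all assumptions made for the example. The point is that serviceStop() checks a flag set when the connection retries were exhausted and skips the unregister call in that case.

```java
import java.util.ArrayList;
import java.util.List;

class SkipUnregisterSketch {

    /** Hypothetical stand-in for the RM-facing client; not the real YARN interface. */
    interface ResourceTracker {
        void unRegisterNodeManager() throws Exception;
    }

    /** Simplified, hypothetical model of the status updater's stop logic. */
    static class NodeStatusUpdaterSketch {
        private final ResourceTracker resourceTracker;
        private boolean registeredWithRM;
        // Assumed flag: set once the retries against the RM are exhausted.
        private boolean failedToConnect;
        final List<String> events = new ArrayList<>();

        NodeStatusUpdaterSketch(ResourceTracker resourceTracker) {
            this.resourceTracker = resourceTracker;
        }

        void markRegistered() {
            registeredWithRM = true;
        }

        /** Called when the retry policy gives up; this is what triggers shutdown. */
        void onConnectRetriesExhausted() {
            failedToConnect = true;
            events.add("SHUTDOWN requested");
        }

        /** Only unregister if the RM was reachable when shutdown began. */
        void serviceStop() {
            if (registeredWithRM && !failedToConnect) {
                unRegisterNM();
            } else {
                events.add("unregister skipped");
            }
        }

        private void unRegisterNM() {
            try {
                resourceTracker.unRegisterNodeManager();
                events.add("unregistered");
            } catch (Exception e) {
                events.add("unregister failed: " + e.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        // Simulate an RM that refuses every connection.
        ResourceTracker unreachableRM = () -> {
            throw new java.net.ConnectException("Connection refused");
        };
        NodeStatusUpdaterSketch updater = new NodeStatusUpdaterSketch(unreachableRM);
        updater.markRegistered();
        updater.onConnectRetriesExhausted();
        updater.serviceStop();
        // Prints: [SHUTDOWN requested, unregister skipped]
        System.out.println(updater.events);
    }
}
```

With the flag in place, a shutdown caused by an unreachable RM never reaches the RPC layer a second time, so the second round of retries is avoided, while a normal shutdown still unregisters as before.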



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
