[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15321107#comment-15321107 ] Junping Du commented on YARN-4288: -- Sure. Please go ahead. Thanks Jason! > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967) > 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager > (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater. > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: > Failed on local exception: java.io.IOException: Connection reset by peer; > Host Details : local host is: "172.27.62.28"; destination host is: > "172.27.62.57":8025; > at >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980434#comment-14980434 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #550 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/550/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979982#comment-14979982 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #599 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/599/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979957#comment-14979957 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-trunk-Commit #8723 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8723/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java * hadoop-yarn-project/CHANGES.txt > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980148#comment-14980148 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1335 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1335/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980205#comment-14980205 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2488 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2488/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-yarn-project/CHANGES.txt * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980156#comment-14980156 ] Hudson commented on YARN-4288: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #612 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/612/]) YARN-4288. Fixed RMProxy to retry on IOException from local host. (jianhe: rev c41699965e78ce5e87669d17923ab84e494c4188) * hadoop-yarn-project/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableImplementation.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/retry/RetryPolicies.java * hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/UnreliableInterface.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/retry/TestRetryProxy.java > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Fix For: 2.8.0 > > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979205#comment-14979205 ] Hadoop QA commented on YARN-4288: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s {color} | {color:blue} docker + precommit patch detected. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 38s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 35s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 23s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 29s {color} | {color:green} trunk passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 14s {color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common in trunk cannot run convertXmlToText from findbugs {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s {color} | {color:green} trunk passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} trunk passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 16s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 16s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 57s {color} | {color:red} Patch generated 1 new checkstyle issues in root (total was 44, now 45). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s {color} | {color:green} the patch passed with JDK v1.8.0_60 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 38s {color} | {color:green} the patch passed with JDK v1.7.0_79 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 57s {color} | {color:green} hadoop-common in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 47s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.8.0_60. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 18s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s {color} | {color:green} hadoop-yarn-common in the patch passed with JDK v1.7.0_79. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 58m 22s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.7.1 Server=1.7.1 Image:test-patch-base-hadoop-date2015-10-28 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12769294/YARN-4288-v3.patch | | JIRA Issue | YARN-4288 | | Optional Tests | asflicense javac javadoc mvninstall unit findbugs checkstyle compile | | uname | Linux a5588f4967ed 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979454#comment-14979454 ] Jian He commented on YARN-4288: --- lgtm > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967) > 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager > (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater. > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: > Failed on local exception: java.io.IOException: Connection reset by peer; > Host Details : local host is: "172.27.62.28"; destination host is: > "172.27.62.57":8025; > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:223) > at >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979452#comment-14979452 ] Junping Du commented on YARN-4288: -- The findbug warning and checkstyle is not related. > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288-v2.patch, YARN-4288-v3.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967) > 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager > (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater. > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: > Failed on local exception: java.io.IOException: Connection reset by peer; > Host Details : local host is: "172.27.62.28"; destination host is: > "172.27.62.57":8025; > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:223) > at
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14977391#comment-14977391 ] Hadoop QA commented on YARN-4288: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 19m 17s | Pre-patch trunk has 3 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 8m 15s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 42s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 45s | The applied patch generated 3 new checkstyle issues (total was 41, now 44). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 6m 40s | Tests failed in hadoop-common. | | {color:green}+1{color} | yarn tests | 2m 4s | Tests passed in hadoop-yarn-common. | | | | 55m 2s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.ipc.TestDecayRpcScheduler | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12769058/YARN-4288-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 68ce93c | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9591/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-common.html | | checkstyle | https://builds.apache.org/job/PreCommit-YARN-Build/9591/artifact/patchprocess/diffcheckstylehadoop-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9591/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9591/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-yarn-common test log | https://builds.apache.org/job/PreCommit-YARN-Build/9591/artifact/patchprocess/testrun_hadoop-yarn-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9591/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9591/console | This message was automatically generated. > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288-v2.patch, YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975226#comment-14975226 ] Junping Du commented on YARN-4288: -- Thanks [~vinodkv] for the comments. bq. Why isn't existing RMProxy framework taking care of this? RMProxy is supposed to take care of this. However, the way that RMProxy to do is to do retry on specific (known) exceptions but fail directly for other exceptions. Like this case, IOException get thrown will get failed directly without any retry (for non-HA case). We are a little risky if more potential exception could get thrown during RM down time. For this particular case, I can add the IOException (other than RemoteException) to be handled directly which sounds a easy way of fix. bq. Why are we putting special code in NodeStatusUpdater? Shouldn't we use something in the RMProxy framework? See ServerProxy for example that gets used by NMClients. As I mentioned above, having a white list of exceptions to retry doesn't sound robust enough: if any exception we don't meet before, we could skip the retry unintentionally. Isn't it? Anyway, I could fix the problem with following existing retry policy framework but hopefully we could improve the framework in other JIRA. bq. Just looked at YARN-4132 too, we should definitely see if we can merge these two together. This is a bug that NM doesn't retry in some cases. YARN-4132 talk about another problem that NM retry should be longer than general RMProxy client which is a more general improvement. I think we'd better separate them out. Thoughts? > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14975004#comment-14975004 ] Vinod Kumar Vavilapalli commented on YARN-4288: --- Just looked at YARN-4132 too, we should definitely see if we can merge these two together. > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967) > 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager > (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater. > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: > Failed on local exception: java.io.IOException: Connection reset by peer; > Host Details : local host is: "172.27.62.28"; destination host is: > "172.27.62.57":8025; > at >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974996#comment-14974996 ] Vinod Kumar Vavilapalli commented on YARN-4288: --- Couple of questions - Why isn't existing RMProxy framework taking care of this? - Why are we putting special code in NodeStatusUpdater? Shouldn't we use something in the RMProxy framework? See ServerProxy for example that gets used by NMClients. > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy37.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:257) > at > org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.rebootNodeStatusUpdaterAndRegisterWithRM(NodeStatusUpdaterImpl.java:215) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager$2.run(NodeManager.java:304) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcherImpl.read0(Native Method) > at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) > at sun.nio.ch.IOUtil.read(IOUtil.java:197) > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) > at > org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at java.io.FilterInputStream.read(FilterInputStream.java:133) > at > org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:514) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1072) > at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967) > 2015-08-17 14:35:59,436 FATAL nodemanager.NodeManager > (NodeManager.java:run(307)) - Error while rebooting NodeStatusUpdater. > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: > Failed on local exception: java.io.IOException: Connection reset by peer; > Host Details : local host is: "172.27.62.28"; destination host is: >
[jira] [Commented] (YARN-4288) NodeManager restart should keep retrying to register to RM while connection exception happens during RM failed over.
[ https://issues.apache.org/jira/browse/YARN-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969388#comment-14969388 ] Hadoop QA commented on YARN-4288: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 22m 49s | Pre-patch trunk has 1 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 10m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 13m 48s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 32s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 49s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 57s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 43s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 33s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | yarn tests | 9m 34s | Tests failed in hadoop-yarn-server-nodemanager. | | | | 62m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.yarn.server.nodemanager.TestNodeStatusUpdaterForLabels | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12768048/YARN-4288.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2798723 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-YARN-Build/9523/artifact/patchprocess/trunkFindbugsWarningshadoop-yarn-server-nodemanager.html | | whitespace | https://builds.apache.org/job/PreCommit-YARN-Build/9523/artifact/patchprocess/whitespace.txt | | hadoop-yarn-server-nodemanager test log | https://builds.apache.org/job/PreCommit-YARN-Build/9523/artifact/patchprocess/testrun_hadoop-yarn-server-nodemanager.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/9523/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/9523/console | This message was automatically generated. > NodeManager restart should keep retrying to register to RM while connection > exception happens during RM failed over. > > > Key: YARN-4288 > URL: https://issues.apache.org/jira/browse/YARN-4288 > Project: Hadoop YARN > Issue Type: Bug > Components: nodemanager >Affects Versions: 2.6.0 >Reporter: Junping Du >Assignee: Junping Du >Priority: Critical > Attachments: YARN-4288.patch > > > When NM get restarted, NodeStatusUpdaterImpl will try to register to RM with > RPC which could throw following exceptions when RM get restarted at the same > time, like following exception shows: > {noformat} > 2015-08-17 14:35:59,434 ERROR nodemanager.NodeStatusUpdaterImpl > (NodeStatusUpdaterImpl.java:rebootNodeStatusUpdaterAndRegisterWithRM(222)) - > Unexpected error rebooting NodeStatusUpdater > java.io.IOException: Failed on local exception: java.io.IOException: > Connection reset by peer; Host Details : local host is: "172.27.62.28"; > destination host is: "172.27.62.57":8025; > at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772) > at org.apache.hadoop.ipc.Client.call(Client.java:1473) > at org.apache.hadoop.ipc.Client.call(Client.java:1400) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy36.registerNodeManager(Unknown Source) > at > org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:68) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606)