[
https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246125#comment-14246125
]
Steve Loughran commented on YARN-2710:
--------------------------------------
Immediate failure trigger is that the retry handler says "Not retrying because
failovers (5) exceeded maximum allowed (5)"
If you look at the lines immediately above it, the RM is still bootstrapping
hence not listening for connections.
The test is probably just starting too early. Recommend changing the
failure/retry policy to to add some backoff & maybe more retries
{code}
014-12-14 11:40:06,693 INFO [Thread-442] resourcemanager.RMAuditLogger
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins
OPERATION=refreshSuperUserGroupsConfiguration TARGET=AdminService
RESULT=SUCCESS
2014-12-14 11:40:06,693 INFO [Thread-442] conf.Configuration
(Configuration.java:getConfResourceAsInputStream(2240)) - found resource
core-site.xml at
file:/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk-Java8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes/core-site.xml
2014-12-14 11:40:06,694 INFO [Thread-442] security.Groups
(Groups.java:refresh(248)) - clearing userToGroupsMap cache
2014-12-14 11:40:06,694 INFO [Thread-442] resourcemanager.RMAuditLogger
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins
OPERATION=refreshUserToGroupsMappings TARGET=AdminService RESULT=SUCCESS
2014-12-14 11:40:06,694 INFO [Thread-442] resourcemanager.RMAuditLogger
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins
OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=SUCCESS
2014-12-14 11:40:07,539 INFO [Thread-443]
client.ConfiguredRMFailoverProxyProvider
(ConfiguredRMFailoverProxyProvider.java:performFailover(100)) - Failing over to
rm2
2014-12-14 11:40:07,541 WARN [Thread-443] retry.RetryInvocationHandler
(RetryInvocationHandler.java:invoke(117)) - Exception while invoking class
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers
over rm2. Not retrying because failovers (5) exceeded maximum allowed (5)
java.net.ConnectException: Call From asf908.gq1.ygridcore.net/67.195.81.152 to
asf908.gq1.ygridcore.net:28032 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy16.getContainers(Unknown Source)
at
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers(ApplicationClientProtocolPBClientImpl.java:410)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
at com.sun.proxy.$Proxy17.getContainers(Unknown Source)
at
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:654)
at
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetContainersOnHA(TestApplicationClientProtocolOnHA.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
at
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
at
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
at org.apache.hadoop.ipc.Client.call(Client.java:1438)
... 23 more
{code}
> RM HA tests failed intermittently on trunk
> ------------------------------------------
>
> Key: YARN-2710
> URL: https://issues.apache.org/jira/browse/YARN-2710
> Project: Hadoop YARN
> Issue Type: Bug
> Components: client
> Reporter: Wangda Tan
> Attachments: TestResourceTrackerOnHA-output.2.txt,
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt
>
>
> Failure like, it can be happened in TestApplicationClientProtocolOnHA,
> TestResourceTrackerOnHA, etc.
> {code}
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
> Time elapsed: 9.491 sec <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149
> to asf905.gq1.ygridcore.net:28032 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
> at
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> at
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
> at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
> at org.apache.hadoop.ipc.Client.call(Client.java:1438)
> at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
> at
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
> at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
> at
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
> at
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)