[ https://issues.apache.org/jira/browse/YARN-10455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210272#comment-17210272 ]

Ahmed Hussein commented on YARN-10455:
--------------------------------------

[~leftnoteasy], [~eyang], [~Jim_Brennan]
Can you please take a look at the patch?

> TestNMProxy.testNMProxyRPCRetry is not consistent
> -------------------------------------------------
>
>                 Key: YARN-10455
>                 URL: https://issues.apache.org/jira/browse/YARN-10455
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Major
>         Attachments: YARN-10455.001.patch
>
>
> The fix in YARN-8844 may fail depending on the configuration of the machine 
> running the test.
>  In some cases the address gets resolved and the unit test throws a 
> connection timeout exception instead. In that scenario the JUnit test times 
> out, and the main reason behind the failure is swallowed by the shutdown of 
> the clients.
>  To make sure that the JUnit behavior is consistent, a suggested fix is to 
> set the host address to {{127.0.0.1:1}}. This removes the possibility of 
> collisions on non-privileged ports.
>  Also, it is more correct to catch {{SocketException}} directly rather than 
> catching {{IOException}} and then checking for anything that is not a 
> {{SocketException}}.
>  
> The stack trace with such failures:
> {code:bash}
> [INFO] Running org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> [ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.293 s <<< FAILURE! - in org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy
> [ERROR] testNMProxyRPCRetry(org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy)  Time elapsed: 20.18 s  <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 20000 milliseconds
>       at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>       at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>       at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>       at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>       at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>       at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:336)
>       at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
>       at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700)
>       at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:821)
>       at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1645)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1461)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:234)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:119)
>       at com.sun.proxy.$Proxy24.startContainers(Unknown Source)
>       at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:133)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
>       at com.sun.proxy.$Proxy25.startContainers(Unknown Source)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.TestNMProxy.testNMProxyRPCRetry(TestNMProxy.java:167)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>       at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.lang.Thread.run(Thread.java:748)
> [INFO]
> [INFO] Results:
> [INFO]
> [ERROR] Errors:
> [ERROR]   TestNMProxy.testNMProxyRPCRetry:167 » TestTimedOut test timed out after 20000 ...
> {code}
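
The two points in the description (target {{127.0.0.1:1}}, and catch {{SocketException}} directly) can be illustrated with a small standalone sketch. This is not the attached patch and does not use the Hadoop IPC client or NMProxy; it uses a plain java.net.Socket, and the class name, the attempt() helper, the Outcome enum and the 2000 ms timeout are all hypothetical names and values chosen for illustration only.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical illustration only; not part of TestNMProxy or the YARN-10455 patch.
class RefusedConnectionSketch {

  /** Result of a single connection attempt. */
  enum Outcome { CONNECTED, SOCKET_EXCEPTION, OTHER_IO_EXCEPTION }

  /** Tries to connect once and reports which kind of failure occurred, if any. */
  static Outcome attempt(String host, int port, int timeoutMillis) {
    try (Socket socket = new Socket()) {
      // A literal IP address avoids any dependence on name resolution.
      socket.connect(new InetSocketAddress(host, port), timeoutMillis);
      return Outcome.CONNECTED;
    } catch (SocketException e) {
      // Caught directly. java.net.ConnectException ("Connection refused")
      // is a subclass of SocketException, so it lands here.
      return Outcome.SOCKET_EXCEPTION;
    } catch (IOException e) {
      // Anything else, for example java.net.SocketTimeoutException,
      // which is an IOException but not a SocketException.
      return Outcome.OTHER_IO_EXCEPTION;
    }
  }

  public static void main(String[] args) {
    // Port 1 on the loopback address, as suggested in the description.
    System.out.println(attempt("127.0.0.1", 1, 2000));
  }
}
```

The separate catch clauses make the distinction explicit: a connect timeout is reported as a different outcome instead of being mixed into one IOException handler. Which outcome a given machine produces still depends on its network configuration.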


