[jira] [Commented] (YARN-4385) TestDistributedShell times out
    [ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065598#comment-15065598 ]

Naganarasimha G R commented on YARN-4385:
-----------------------------------------

I hit one more intermittent failure in the 2928 branch, but it is not related to the ATS v2 code.

{code}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
Tests run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 476.165 sec <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
testDSShellWithDomain(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 29.211 sec  <<< FAILURE!
java.lang.AssertionError: expected:<2> but was:<3>
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.junit.Assert.assertEquals(Assert.java:542)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.checkTimelineV1(TestDistributedShell.java:356)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShell(TestDistributedShell.java:317)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithDomain(TestDistributedShell.java:195)

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.703 sec - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShellWithNodeLabels
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0
Running org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.508 sec - in org.apache.hadoop.yarn.applications.distributedshell.TestDSAppMaster

Results :

Failed tests:
  TestDistributedShell.testDSShellWithDomain:195->testDSShell:317->checkTimelineV1:356 expected:<2> but was:<3>

Tests run: 16, Failures: 1, Errors: 0, Skipped: 0
{code}

{{TestDistributedShell.checkTimelineV1}} checks that exactly 2 containers (the number requested) are launched, but in this run more than 2 were launched. Possible reasons:
* The RM assigned additional containers and the Distributed Shell AM launched them. I have observed similar over-assignment in MR as well, but the MR AM takes care of returning the extra containers assigned by the RM. The Distributed Shell AM should take a similar approach.
* A container was killed for some reason and an extra container was started in its place.

I am not sure which of these cases is causing the additional containers to be assigned; analyzing this requires more RM and AM logs.

Possible solutions:
* Instead of checking for exactly 2, check for at least 2, so that the test case does not fail if more than 2 containers are launched.
* Try to ensure that no more than the desired number of containers are launched, even if the RM allocates more.

> TestDistributedShell times out
> ------------------------------
>
> Key: YARN-4385
> URL: https://issues.apache.org/jira/browse/YARN-4385
> Project: Hadoop YARN
> Issue Type: Bug
> Components: test
> Reporter: Tsuyoshi Ozawa
> Assignee: Naganarasimha G R
> Attachments: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell-output.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
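
The two possible solutions in the comment above can be sketched roughly as follows. This is only a minimal, self-contained illustration under stated assumptions, not code from {{TestDistributedShell}} or the Distributed Shell AM: the class {{ContainerCountSketch}} and its methods {{assertAtLeast}} and {{splitSurplus}} are hypothetical names, and container IDs are modelled as plain strings rather than YARN types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of the two ideas discussed in the comment.
 * None of these names exist in Hadoop; they are for illustration only.
 */
final class ContainerCountSketch {

    /**
     * Idea 1 (test side): accept "at least the requested number" of
     * launched containers instead of requiring an exact match.
     */
    static void assertAtLeast(int requested, int launched) {
        if (launched < requested) {
            throw new AssertionError(
                "expected at least <" + requested + "> but was <" + launched + ">");
        }
    }

    /**
     * Idea 2 (AM side): from a batch of newly allocated containers, move
     * only as many as are still needed into toLaunch and return the rest
     * as surplus, so the caller can hand the surplus back to the RM
     * instead of launching it.
     */
    static List<String> splitSurplus(List<String> allocated,
                                     int desiredTotal,
                                     List<String> toLaunch) {
        List<String> surplus = new ArrayList<>();
        for (String containerId : allocated) {
            if (toLaunch.size() < desiredTotal) {
                toLaunch.add(containerId);   // still below the desired count
            } else {
                surplus.add(containerId);    // over-allocation, do not launch
            }
        }
        return surplus;
    }

    public static void main(String[] args) {
        // The failing run above: 2 containers requested, 3 launched.
        assertAtLeast(2, 3);

        List<String> toLaunch = new ArrayList<>();
        List<String> surplus =
            splitSurplus(Arrays.asList("c1", "c2", "c3"), 2, toLaunch);
        System.out.println("launch=" + toLaunch + " release=" + surplus);
    }
}
```

The relaxed check keeps the test green under over-allocation but also hides it, whereas capping on the AM side keeps the exact-count assertion meaningful. Which trade-off fits depends on the RM and AM logs mentioned above.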
[jira] [Commented] (YARN-4385) TestDistributedShell times out
    [ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062444#comment-15062444 ]

Naganarasimha G R commented on YARN-4385:
-----------------------------------------

Hi [~ozawa],
Please confirm whether this is reproducible; if not, I am planning to disable it.
[jira] [Commented] (YARN-4385) TestDistributedShell times out
    [ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054206#comment-15054206 ]

Naganarasimha G R commented on YARN-4385:
-----------------------------------------

Hi [~ozawa],
Is this still reproducible? I tried many times but was not able to reproduce it. In the logs, the only related exception I saw is the one below, and from it I suspect this was a temporary issue on your machine. Please confirm so that I can analyze further.

{code}
2015-11-22 19:29:54,739 DEBUG [IPC Client (273924) connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028 from ubuntu] ipc.Client (Client.java:close(1208)) - closing ipc connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028: null
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1110)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1005)
2015-11-22 19:29:54,739 DEBUG [IPC Client (273924) connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028 from ubuntu] ipc.Client (Client.java:close(1217)) - IPC Client (273924) connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028 from ubuntu: closed
2015-11-22 19:29:54,739 DEBUG [IPC Client (273924) connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028 from ubuntu] ipc.Client (Client.java:run(1018)) - IPC Client (273924) connection to ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42:52028 from ubuntu: stopped, remaining connections 0
2015-11-22 19:29:54,743 DEBUG [Thread-3684] retry.RetryInvocationHandler (RetryInvocationHandler.java:invoke(151)) - Exception while invoking getApplicationReport of class ApplicationClientProtocolPBClientImpl over null. Retrying after sleeping for 3ms.
java.io.EOFException: End of File Exception between local host is: "ip-172-31-20-42.ap-northeast-1.compute.internal/172.31.20.42"; destination host is: "ip-172-31-20-42.ap-northeast-1.compute.internal":52028; : java.io.EOFException; For more details see: http://wiki.apache.org/hadoop/EOFException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:765)
    at org.apache.hadoop.ipc.Client.call(Client.java:1452)
    at org.apache.hadoop.ipc.Client.call(Client.java:1385)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
    at com.sun.proxy.$Proxy87.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:220)
    at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy88.getApplicationReport(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:446)
    at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:740)
    at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:715)
    at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithCustomLogPropertyFile(TestDistributedShell.java:502)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at
[jira] [Commented] (YARN-4385) TestDistributedShell times out
    [ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046092#comment-15046092 ]

Naganarasimha G R commented on YARN-4385:
-----------------------------------------

Hi [~ozawa],
I would like to take a look at this, as it is related to another JIRA I was working on. Please reassign it if you are already handling it.
[jira] [Commented] (YARN-4385) TestDistributedShell times out
    [ https://issues.apache.org/jira/browse/YARN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022660#comment-15022660 ]

Tsuyoshi Ozawa commented on YARN-4385:
--------------------------------------

From https://builds.apache.org/job/Hadoop-Yarn-trunk/1380/

{quote}
### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 11262 lines...]
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShell.setup:72->setupInternal:94 » YarnRuntime java.io.IOExcept...
  TestDistributedShellWithNodeLabels.setup:47 » YarnRuntime java.io.IOException:...

Tests run: 14, Failures: 0, Errors: 12, Skipped: 0

[INFO]
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop YARN ............................... SUCCESS [ 4.803 s]
[INFO] Apache Hadoop YARN API ........................... SUCCESS [04:44 min]
[INFO] Apache Hadoop YARN Common ........................ SUCCESS [03:31 min]
[INFO] Apache Hadoop YARN Server ........................ SUCCESS [ 0.109 s]
[INFO] Apache Hadoop YARN Server Common ................. SUCCESS [ 57.348 s]
[INFO] Apache Hadoop YARN NodeManager ................... SUCCESS [10:05 min]
[INFO] Apache Hadoop YARN Web Proxy ..................... SUCCESS [ 29.458 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ..... SUCCESS [03:46 min]
[INFO] Apache Hadoop YARN ResourceManager ............... SUCCESS [ 01:03 h]
[INFO] Apache Hadoop YARN Server Tests .................. SUCCESS [01:52 min]
[INFO] Apache Hadoop YARN Client ........................ SUCCESS [07:21 min]
[INFO] Apache Hadoop YARN SharedCacheManager ............ SUCCESS [ 32.136 s]
[INFO] Apache Hadoop YARN Applications .................. SUCCESS [ 0.053 s]
[INFO] Apache Hadoop YARN DistributedShell .............. FAILURE [ 29.403 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ......... SKIPPED
[INFO] Apache Hadoop YARN Site .......................... SKIPPED
[INFO] Apache Hadoop YARN Registry ...................... SKIPPED
[INFO] Apache Hadoop YARN Project ....................... SKIPPED
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 01:37 h
[INFO] Finished at: 2015-11-09T20:36:25+00:00
[INFO] Final Memory: 81M/690M
[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.17:test (default-test) on project hadoop-yarn-applications-distributedshell: There are test failures.
[ERROR]
[ERROR] Please refer to /home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hadoop-yarn-applications-distributedshell
Build step 'Execute shell' marked build as failure
Archiving artifacts
Recording test results
Updating HDFS-9234
Sending e-mails to: yarn-...@hadoop.apache.org
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any

### ## FAILED TESTS (if any) ##
12 tests failed.
FAILED: org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithInvalidArgs

Error Message:
java.io.IOException: ResourceManager failed to start. Final state is STOPPED

Stack Trace:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED
    at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:331)
    at