[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
    Attachment: YARN-1281.1.patch

Thank you for the comment, Mit. I have assigned this issue to myself.

{code}
@Override
public ZooKeeper getNewZooKeeper() throws IOException, InterruptedException {
  return createClient(watcher, hostPort, 100);
}
{code}

I suspect that the timeout value (100) is too short for connecting to the ZK servers, because the Jenkins servers can sometimes be overloaded. The attached patch adjusts the timeout value in the test. I'm running the test hundreds of times locally and will report the result.

The following are observations from the code and the log:

1. The ZK server starts up correctly, but its client fails to connect to the server. This can be observed in the log.
2. ZKRMStateStore#stop() is not called after the test, but its connection is cleaned up after the test in ClientBaseWithFixes#tearDown. IIUC, this works correctly.

TestZKRMStateStoreZKClientConnections fails intermittently
    Key: YARN-1281
    URL: https://issues.apache.org/jira/browse/YARN-1281
    Project: Hadoop YARN
    Issue Type: Bug
    Components: resourcemanager
    Reporter: Karthik Kambatla
    Assignee: Tsuyoshi OZAWA
    Attachments: YARN-1281.1.patch, output.txt

The test fails intermittently - haven't been able to reproduce the failure deterministically.

--
This message was sent by Atlassian JIRA (v6.2#6252)
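The timeout concern raised in the comment above can be illustrated with a small, self-contained sketch. It does not use the ZooKeeper client API or the test's `createClient` helper; `ConnectTimeoutSketch` and `connectedWithin` are hypothetical names, and the 500 ms "handshake" is an assumed stand-in for a slow connection on an overloaded build machine. It only shows that a wait budget shorter than the connection time reports a failure even though the server side is healthy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a client waiting on a slow connection handshake.
// This is NOT the ZooKeeper API; it only models "wait for connect with a timeout".
class ConnectTimeoutSketch {

    /**
     * Simulates a connection that completes after connectDelayMs and reports
     * whether the caller observed it within waitTimeoutMs.
     */
    static boolean connectedWithin(long connectDelayMs, long waitTimeoutMs) {
        CountDownLatch connected = new CountDownLatch(1);

        // Background thread plays the role of the (slow) connection handshake.
        Thread handshake = new Thread(() -> {
            try {
                Thread.sleep(connectDelayMs);
                connected.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        handshake.setDaemon(true);
        handshake.start();

        try {
            // Returns false if the latch was not released before the timeout elapsed.
            return connected.await(waitTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed numbers: the handshake takes 500 ms in both runs.
        System.out.println("short wait budget connected: " + connectedWithin(500, 100));
        System.out.println("long wait budget connected:  " + connectedWithin(500, 5000));
    }
}
```

With a fast handshake both calls would succeed, which is consistent with a failure that only appears intermittently on loaded machines.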
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
    Attachment: YARN-1281.2.patch

Updated the patch to change the ZK-related timeouts correctly.
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-1281:
    Issue Type: Test (was: Bug)
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
    Attachment: TestZKRMStateStureoSKClientConnections-output.txt

Attached the complete log from the failure. RMZKUtils#getZKAcls() fails to read the ACLs. This may be caused by setup timing.

{quote}
2014-04-15 08:13:04,712 ERROR [Thread-12] resourcemanager.RMZKUtils (RMZKUtils.java:getZKAcls(51)) - Couldn't read ACLs based on yarn.resourcemanager.zk-acl
2014-04-15 08:13:04,713 INFO [Thread-12] service.AbstractService (AbstractService.java:noteFailure(272)) - Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore failed in state INITED; cause: org.apache.hadoop.util.ZKUtil$BadAclFormatException: ACL 'randomstring*' not of expected form scheme:id:perm
org.apache.hadoop.util.ZKUtil$BadAclFormatException: ACL 'randomstring*' not of expected form scheme:id:perm
    at org.apache.hadoop.util.ZKUtil.parseACLs(ZKUtil.java:110)
    at org.apache.hadoop.yarn.server.resourcemanager.RMZKUtils.getZKAcls(RMZKUtils.java:49)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.initInternal(ZKRMStateStore.java:206)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.serviceInit(RMStateStore.java:276)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections$TestZKClient$TestZKRMStateStore.init(TestZKRMStateStoreZKClientConnections.java:79)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections$TestZKClient.getRMStateStore(TestZKRMStateStoreZKClientConnections.java:129)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStoreZKClientConnections.testInvalidZKAclConfiguration(TestZKRMStateStoreZKClientConnections.java:261)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
    at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
{quote}
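The exception in the log above rejects the ACL `randomstring*` because it is "not of expected form scheme:id:perm". The following is a simplified sketch of such a format check, written only from that error message. It is not Hadoop's `ZKUtil.parseACLs`; the class `AclFormatSketch`, the method `malformedEntries`, and the comma-separated input convention are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical format check modeled on the error message in the log.
// It is NOT the Hadoop implementation and does not validate schemes or permissions.
class AclFormatSketch {

    /**
     * Returns the entries of a comma-separated ACL string that do not contain
     * at least two ':' separators, i.e. cannot be split into scheme:id:perm.
     */
    static List<String> malformedEntries(String aclString) {
        List<String> malformed = new ArrayList<>();
        for (String entry : aclString.split(",")) {
            String acl = entry.trim();
            if (acl.isEmpty()) {
                continue; // ignore empty fragments such as a trailing comma
            }
            int firstColon = acl.indexOf(':');
            int lastColon = acl.lastIndexOf(':');
            // Fewer than two colons means scheme, id, and perm cannot all be present.
            if (firstColon == -1 || firstColon == lastColon) {
                malformed.add(acl);
            }
        }
        return malformed;
    }

    public static void main(String[] args) {
        System.out.println(malformedEntries("randomstring*"));
        System.out.println(malformedEntries("world:anyone:rwcda"));
    }
}
```

The stack trace shows the exception being raised from `testInvalidZKAclConfiguration`, so a value like `randomstring*` failing this kind of check is at least consistent with a deliberately invalid test input.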
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
    Attachment: (was: TestZKRMStateStureoSKClientConnections-output.txt)
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
    Attachment: output.txt
[jira] [Updated] (YARN-1281) TestZKRMStateStoreZKClientConnections fails intermittently
[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-1281:
    Assignee: (was: Karthik Kambatla)