[ https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-1281:
---------------------------------
    Attachment: YARN-1281.1.patch

Thank you for the comment, Mit. I have assigned this issue to myself.

{code}
  @Override
  public ZooKeeper getNewZooKeeper()
      throws IOException, InterruptedException {
    return createClient(watcher, hostPort, 100);
  }
{code}

I suspect the timeout value of 100 passed to createClient is too short for the client to connect to the ZK servers, because the Jenkins servers can get overloaded at times. The attached patch changes the timeout value used by the test. I'm running the test hundreds of times locally and will report the result.

The following are observations from the code and the log:
1. The ZK server starts up correctly, but its client fails to connect to the server. This can be observed in the log.
2. stop() is not called on ZKRMStateStore after the test, but its connection is cleaned up in ClientBaseWithFixes#tearDown. IIUC, that works correctly.

> TestZKRMStateStoreZKClientConnections fails intermittently
> ----------------------------------------------------------
>
>                 Key: YARN-1281
>                 URL: https://issues.apache.org/jira/browse/YARN-1281
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Karthik Kambatla
>            Assignee: Tsuyoshi OZAWA
>         Attachments: YARN-1281.1.patch, output.txt
>
>
> The test fails intermittently - haven't been able to reproduce the failure
> deterministically.



--
This message was sent by Atlassian JIRA
(v6.2#6252)