[ 
https://issues.apache.org/jira/browse/YARN-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated YARN-1281:
---------------------------------

    Attachment: YARN-1281.1.patch

Thank you for the comment, Mit. I have assigned this issue to myself.

{code}
      @Override
      public ZooKeeper getNewZooKeeper()
          throws IOException, InterruptedException {
        return createClient(watcher, hostPort, 100);
      }
{code}

I suspect that the timeout value (100) is too short to connect to the ZK 
servers, because the Jenkins servers can sometimes get overloaded. The attached 
patch changes the test to increase the timeout value. I'm running the test 
hundreds of times locally and will report the result.
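
To illustrate the suspicion above, here is a minimal, JDK-only sketch of a client that waits a bounded time for a "connected" event. It does not use the ZooKeeper API or the test's createClient() helper. The class name, the method name, and the delay values are all hypothetical and chosen only to show how a short wait can fail when the server is slow to respond.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class ConnectTimeoutSketch {

    /**
     * Simulates waiting for a connection.
     *
     * @param serverDelayMs hypothetical time an overloaded server needs
     *                      before it signals "connected"
     * @param timeoutMs     how long the client is willing to wait
     * @return true if the "connected" signal arrived within the timeout
     */
    static boolean connectsWithin(long serverDelayMs, long timeoutMs)
            throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);

        // Stand-in for the server side: signals after serverDelayMs.
        Thread server = new Thread(() -> {
            try {
                Thread.sleep(serverDelayMs);
                connected.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.setDaemon(true);
        server.start();

        // Client side: give up once timeoutMs has elapsed.
        boolean ok = connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        server.interrupt(); // stop the simulated server if it is still sleeping
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed numbers: a loaded machine that needs 400 ms to respond.
        System.out.println("timeout 100 ms:  " + connectsWithin(400, 100));
        System.out.println("timeout 2000 ms: " + connectsWithin(400, 2000));
    }
}
```

With these assumed delays, the short wait should give up before the signal arrives, while the longer one should succeed. This is the same failure pattern I suspect on Jenkins.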

The following comments are observations from the code and the log.
1. The ZK server starts up correctly, but its client fails to connect to the 
server. We can observe this from the log.
2. stop() is not called on ZKRMStateStore after the test, but its connection 
is cleaned up after the test in ClientBaseWithFixes#tearDown. IIUC, this works 
correctly.
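
The cleanup behaviour in observation 2 can be sketched roughly as follows. This is not the actual ClientBaseWithFixes code. FakeClient and TestBase are hypothetical stand-ins, and the sketch assumes that the base class tracks every client it creates and closes them all in tearDown().

```java
import java.util.ArrayList;
import java.util.List;

class TearDownCleanupSketch {

    /** Hypothetical stand-in for a ZK client connection. */
    static class FakeClient implements AutoCloseable {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        @Override
        public void close() {
            open = false;
        }
    }

    /** Hypothetical test base that remembers every client it hands out. */
    static class TestBase {
        private final List<FakeClient> clients = new ArrayList<>();

        FakeClient createClient() {
            FakeClient c = new FakeClient();
            clients.add(c);
            return c;
        }

        /** Closes all tracked clients, even if their owner was never stopped. */
        void tearDown() {
            for (FakeClient c : clients) {
                c.close();
            }
            clients.clear();
        }
    }

    public static void main(String[] args) {
        TestBase base = new TestBase();
        FakeClient storeClient = base.createClient(); // owner never calls stop()
        base.tearDown();
        System.out.println("open after tearDown: " + storeClient.isOpen());
    }
}
```

Under that assumption, a store that never calls stop() would not leak its connection, because tearDown() closes it anyway.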


> TestZKRMStateStoreZKClientConnections fails intermittently
> ----------------------------------------------------------
>
>                 Key: YARN-1281
>                 URL: https://issues.apache.org/jira/browse/YARN-1281
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Karthik Kambatla
>            Assignee: Tsuyoshi OZAWA
>         Attachments: YARN-1281.1.patch, output.txt
>
>
> The test fails intermittently - haven't been able to reproduce the failure 
> deterministically. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)