Jian He commented on YARN-2716:

Thanks Karthik for working this ! This simplifies things a lot. 
Mostly good, few comments and questions:
- these two booleans not used, maybe removed.
{{private boolean create = false, delete = false;
- is this going to be done in this jira?
{code} // TODO: Check deleting appIdRemovePath works recursively
- will the safeDelete throw noNodeExist exception if deleting a non-existing 
- {{new RetryNTimes(numRetries, zkSessionTimeout / numRetries));}},  I think 
the second parameter should be zkRetryInterval; Also, I have a question why in 
HA case, zkRetryInterval is calculated as below
if (HAUtil.isHAEnabled(conf)) {
  zkRetryInterval = zkSessionTimeout / 

- I found this 
 saying that blockUntilConnect is not needed to call; Suppose it’s needed, I 
think the zkSessionTimeout value is too small, it would be 
numRetries*numRetryInterval, otherwise RM will exit soon after retry 10s by 
if (!curatorFramework.blockUntilConnected(
        zkSessionTimeout, TimeUnit.MILLISECONDS)) {
      LOG.fatal("Couldn't establish connection to ZK server");
      throw new YarnRuntimeException("Couldn't connect to ZK server");
- remove this ?
//      @Override
//      public ZooKeeper getNewZooKeeper() throws IOException {
//        return client;
//      }
-  I think testZKSessionTimeout may be removed too ? it looks like a test for 

> Refactor ZKRMStateStore retry code with Apache Curator
> ------------------------------------------------------
>                 Key: YARN-2716
>                 URL: https://issues.apache.org/jira/browse/YARN-2716
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Karthik Kambatla
>         Attachments: yarn-2716-1.patch, yarn-2716-prelim.patch, 
> yarn-2716-prelim.patch, yarn-2716-super-prelim.patch
> Per suggestion by [~kasha] in YARN-2131,  it's nice to use curator to 
> simplify the retry logic in ZKRMStateStore.

This message was sent by Atlassian JIRA

Reply via email to