Karthik Kambatla commented on YARN-2054:

bq. If we want these configs to match up with 
yarn.resourcemanager.zk-timeout-ms and (as YARN-1878 is trying) if that can 
change, we need to somehow make them linked dynamically?
These configs need not match, but in an HA setting, it might not make a lot of 
sense to have these significantly different.

bq. Does it make sense to also link this with whether HA is enabled? If we have 
another RM sitting on standby, we may want to fail over quickly. But if we have 
only one RM, and somehow ZK is unavailable, the RM will only retry for 10 seconds 
and then shut down.
Good point. Maybe we can come up with a good value for retry-interval based 
on whether HA is enabled and on yarn.resourcemanager.zk-timeout-ms.
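
To make the idea concrete, here is a rough sketch of one possible policy, not 
what the attached patch does. The class and method names are hypothetical, and 
the 10000 ms session timeout is an assumed example value. With HA enabled, the 
retries are spread across one ZK session timeout so a standby can take over 
quickly; without HA, the configured interval is kept so a lone RM waits longer.

```java
public class ZkRetryDefaults {

    /** Total time in milliseconds spent retrying before the RM gives up. */
    static long cumulativeWaitMs(int numRetries, long retryIntervalMs) {
        return (long) numRetries * retryIntervalMs;
    }

    /**
     * Hypothetical policy (an assumption for illustration only):
     * with HA, fit all retries inside one ZK session timeout;
     * without HA, keep the configured interval unchanged.
     */
    static long suggestedRetryIntervalMs(boolean haEnabled, long zkTimeoutMs,
                                         int numRetries, long configuredIntervalMs) {
        if (numRetries <= 0) {
            throw new IllegalArgumentException("numRetries must be positive");
        }
        if (haEnabled) {
            // Never go below 1 ms, even if the timeout is smaller than numRetries.
            return Math.max(1L, zkTimeoutMs / numRetries);
        }
        return configuredIntervalMs;
    }

    public static void main(String[] args) {
        int numRetries = 500;        // current default from this issue
        long intervalMs = 2000L;     // current default from this issue
        long zkTimeoutMs = 10000L;   // assumed example session timeout

        System.out.println("current cumulative wait (ms): "
            + cumulativeWaitMs(numRetries, intervalMs));
        System.out.println("HA interval (ms): "
            + suggestedRetryIntervalMs(true, zkTimeoutMs, numRetries, intervalMs));
        System.out.println("non-HA interval (ms): "
            + suggestedRetryIntervalMs(false, zkTimeoutMs, numRetries, intervalMs));
    }
}
```

Whether the non-HA case should keep the configured interval or use a different 
default is exactly the open question here.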

> Poor defaults for YARN ZK configs for retries and retry-interval
> ----------------------------------------------------------------
>                 Key: YARN-2054
>                 URL: https://issues.apache.org/jira/browse/YARN-2054
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: yarn-2054-1.patch
> Currently, we have the following default values:
> # yarn.resourcemanager.zk-num-retries - 500
> # yarn.resourcemanager.zk-retry-interval-ms - 2000
> This leads to a cumulative wait of 1000 seconds (500 retries x 2000 ms) 
> before the RM gives up trying to connect to ZK.

This message was sent by Atlassian JIRA