[ 
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated YARN-1029:
-----------------------------------

    Attachment: yarn-1029-1.patch

Here is an updated patch that implements only automatic failover. The patch 
also unifies the zk-connection-related configs. 

The patch also fixes a bug in RMProxy (YARN-1028): the number of failovers 
should be tuned, not the policy. I thought I had tested this on a cluster, but 
it looks like I forgot to remove yarn.client.failover-max-attempts from the 
config. I have now fixed TestRMFailover to test this.
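
For context, the leftover setting mentioned above would look roughly like the 
fragment below. This is only a sketch: the property name is the one from this 
comment, while the file (a client-side yarn-site.xml) and the value are 
assumptions, not what was on the test cluster.

```xml
<!-- Hypothetical yarn-site.xml entry; the value is a placeholder. -->
<property>
  <name>yarn.client.failover-max-attempts</name>
  <value>15</value>
</property>
```

With an explicit value like this in place, a test on the cluster exercises the 
configured limit rather than whatever RMProxy would do by default.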

Is it okay to set the zk-connection-timeout to 10 seconds? Would the store need 
a longer timeout? This timeout affects the actual failover time.

Testing: TestRMFailover and TestRMHA test the new code. I also ran a 2-node 
cluster and killed the Active RM while a job was running.

Pending: Documentation of the config changes in yarn-default.xml. I will add 
this in the next revision, along with changes for any other suggestions.

> Allow embedding leader election into the RM
> -------------------------------------------
>
>                 Key: YARN-1029
>                 URL: https://issues.apache.org/jira/browse/YARN-1029
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch, 
> yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-approach.patch
>
>
> It should be possible to embed the common ActiveStandbyElector into the RM such 
> that ZooKeeper-based leader election and notification is built in. In 
> conjunction with a ZK state store, this configuration will be a simple 
> deployment option.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
