[
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karthik Kambatla updated YARN-1029:
-----------------------------------
Attachment: yarn-1029-1.patch
Here is an updated patch that implements only automatic failover. The patch
also unifies the zk-connection-related configs.
The patch also fixes a bug in RMProxy (YARN-1028): the number of failovers
should be tuned, not the policy. I thought I had tested this on a cluster, but
it looks like I forgot to remove yarn.client.failover-max-attempts from the
config. I have now fixed TestRMFailover to test this.
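To illustrate what "tune the number of failovers, not the policy" means, here
is a minimal sketch. It is not the RMProxy code from the patch: the class
FailoverSketch, the method callWithFailover, and the endpoint names rm1/rm2 are
all hypothetical. The policy (round-robin over the configured RMs) stays fixed,
and only the failover budget is a parameter.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not the actual RMProxy implementation.
public class FailoverSketch {

    /**
     * Calls the first endpoint and, on failure, fails over round-robin to the
     * next one. The policy is fixed; only maxFailovers is tunable.
     */
    public static <T> T callWithFailover(List<String> endpoints,
                                         int maxFailovers,
                                         Function<String, T> call) {
        if (endpoints.isEmpty()) {
            throw new IllegalArgumentException("no endpoints configured");
        }
        RuntimeException last = null;
        // One initial attempt plus at most maxFailovers failovers.
        for (int failovers = 0; failovers <= maxFailovers; failovers++) {
            String target = endpoints.get(failovers % endpoints.size());
            try {
                return call.apply(target);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move to the next endpoint
            }
        }
        throw new IllegalStateException(
            "gave up after " + maxFailovers + " failovers", last);
    }

    public static void main(String[] args) {
        List<String> rms = List.of("rm1", "rm2");
        // Simulate rm1 (the Active RM) being down.
        String result = callWithFailover(rms, 1, rm -> {
            if (rm.equals("rm1")) {
                throw new RuntimeException("rm1 is down");
            }
            return "served by " + rm;
        });
        System.out.println(result);
    }
}
```

With a budget of one failover the call above reaches rm2. With a budget of
zero it would give up after the first failed attempt.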
Is it okay to set the zk-connection-timeout to 10 seconds? Would the store need
a longer timeout? This timeout affects the actual failover time.
Testing: TestRMFailover and TestRMHA test the new code. I also ran a 2-node
cluster and killed the Active RM while running a job.
Pending: Documentation of config changes in yarn-default.xml. I will add this
in the next revision, along with changes for any other suggestions.
> Allow embedding leader election into the RM
> -------------------------------------------
>
> Key: YARN-1029
> URL: https://issues.apache.org/jira/browse/YARN-1029
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch,
> yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-approach.patch
>
>
> It should be possible to embed the common ActiveStandbyElector into the RM
> such that ZooKeeper-based leader election and notification are built in. In
> conjunction with a ZK state store, this configuration will be a simple
> deployment option.
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)