[
https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059248#comment-15059248
]
Karthik Kambatla commented on YARN-4438:
----------------------------------------
bq. I'll make the change accordingly.
Not sure I understand. Regarding a single client for elector and store, are we
doing it in this patch or punting to a follow-up. I am open to either options,
would prefer doing it here. All that is needed is store the CuratorFramework
instance in RMContext.
Comments on the patch itself:
# LeaderElectorService
## The instance, rm, is not used anywhere. Why even pass it? The RMContext
should have all the information that the elector needs or updates?
## Use conf instead of getConfig() here?
{code}
String zkBasePath = getConfig().get(
YarnConfiguration.AUTO_FAILOVER_ZK_BASE_PATH,
YarnConfiguration.DEFAULT_AUTO_FAILOVER_ZK_BASE_PATH);
latchPath = zkBasePath + "/" + clusterId;
{code}
## isLeader(): update log messages to say - "rmId is elected leader,
transitioning to active". On failure, "Failed to transition to active, giving
up leadership"?
## reJoinElection: why sleep for 1 second?
## reJoinElection: Also, on exception, it might not be enough to just log it.
If it is due to close(), don't we want to force give-up so the other RM becomes
active? If it is on initAndStartLeaderLatch(), this RM will never become
active; don't we want to just die?
## How about adding a method called closeLeaderLatch to complement
initAndStart? That would help us avoid cases like the null check missing in
rejoinElection?
## notLeader: Again, we should likely do more than just logging.
# YarnConfiguration: If our long-term plan is to keep the curator version and
get rid of EmbeddedElectorService, may be we should have a config to use
embedded-elector instead of curator-elector. e.g.
yarn.resourcemanager.ha.use-active-standby-elector and set it to true by
default for now?
# ResourceManager:
## Nit: Spurious import changes - prefer leaving them out or fixing them in a
separate patch before/after.
## Why change the argument to transitionToStandby from true to false?
## Comment: Looking closer, in the following method, reinitialize(initialize)
should be called outside the if. No? I am surprised we haven't noticed this
before. May be, fix it on another JIRA?
# Looking at the changes in AdminService and ResourceManager, I still feel the
AdminService should be the one handling the LeaderElectorService. Also, the
LeaderElectorService talks to AdminService for transitions to active/standby.
> Implement RM leader election with curator
> -----------------------------------------
>
> Key: YARN-4438
> URL: https://issues.apache.org/jira/browse/YARN-4438
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Jian He
> Assignee: Jian He
> Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch
>
>
> This is to implement the leader election with curator instead of the
> ActiveStandbyElector from common package, this also avoids adding more
> configs in common to suit RM's own needs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)