[ https://issues.apache.org/jira/browse/YARN-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059248#comment-15059248 ]
Karthik Kambatla commented on YARN-4438: ---------------------------------------- bq. I'll make the change accordingly. Not sure I understand. Regarding a single client for elector and store, are we doing it in this patch or punting to a follow-up. I am open to either options, would prefer doing it here. All that is needed is store the CuratorFramework instance in RMContext. Comments on the patch itself: # LeaderElectorService ## The instance, rm, is not used anywhere. Why even pass it? The RMContext should have all the information that the elector needs or updates? ## Use conf instead of getConfig() here? {code} String zkBasePath = getConfig().get( YarnConfiguration.AUTO_FAILOVER_ZK_BASE_PATH, YarnConfiguration.DEFAULT_AUTO_FAILOVER_ZK_BASE_PATH); latchPath = zkBasePath + "/" + clusterId; {code} ## isLeader(): update log messages to say - "rmId is elected leader, transitioning to active". On failure, "Failed to transition to active, giving up leadership"? ## reJoinElection: why sleep for 1 second? ## reJoinElection: Also, on exception, it might not be enough to just log it. If it is due to close(), don't we want to force give-up so the other RM becomes active? If it is on initAndStartLeaderLatch(), this RM will never become active; don't we want to just die? ## How about adding a method called closeLeaderLatch to complement initAndStart? That would help us avoid cases like the null check missing in rejoinElection? ## notLeader: Again, we should likely do more than just logging. # YarnConfiguration: If our long-term plan is to keep the curator version and get rid of EmbeddedElectorService, may be we should have a config to use embedded-elector instead of curator-elector. e.g. yarn.resourcemanager.ha.use-active-standby-elector and set it to true by default for now? # ResourceManager: ## Nit: Spurious import changes - prefer leaving them out or fixing them in a separate patch before/after. ## Why change the argument to transitionToStandby from true to false? ## Comment: Looking closer, in the following method, reinitialize(initialize) should be called outside the if. No? I am surprised we haven't noticed this before. May be, fix it on another JIRA? # Looking at the changes in AdminService and ResourceManager, I still feel the AdminService should be the one handling the LeaderElectorService. Also, the LeaderElectorService talks to AdminService for transitions to active/standby. > Implement RM leader election with curator > ----------------------------------------- > > Key: YARN-4438 > URL: https://issues.apache.org/jira/browse/YARN-4438 > Project: Hadoop YARN > Issue Type: Improvement > Reporter: Jian He > Assignee: Jian He > Attachments: YARN-4438.1.patch, YARN-4438.2.patch, YARN-4438.3.patch > > > This is to implement the leader election with curator instead of the > ActiveStandbyElector from common package, this also avoids adding more > configs in common to suit RM's own needs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)