Tsuyoshi Ozawa commented on YARN-4438:

[~jianhe] Thank you for taking the issue. +1 for the design. Could you check 
following comments?

1. In the code path of {{RMWebApp.getHAZookeeperConnectionState}}, 
{{embeddedElector}}, which is null since {{AdminService.embeddedElector}} is 
uninitialized, can be accessed directly. Can we use 
{{rmContext.getLeaderElectorService()}} instead?

public class AdminService extends CompositeService implements
    HAServiceProtocol, ResourceManagerAdministrationProtocol {
  public String getHAZookeeperConnectionState() {
    if (!rmContext.isHAEnabled()) {
      return "ResourceManager HA is not enabled.";
    } else if (!autoFailoverEnabled) {
      return "Auto Failover is not enabled.";
    return this.embeddedElector.getHAZookeeperConnectionState();

public class RMWebApp extends WebApp implements YarnWebParams {
  public String getHAZookeeperConnectionState() {
    return rm.getRMContext().getRMAdminService()

2. In {{TestLeaderElectorService.testKillZKInstance}}, should we check (rm1 is 
active and rm2 is standby) or  (rm1 is standby and rm2 is active) explicitly to 
detect split brain problem?

    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override public Boolean get() {
        try {
          return rm1.getAdminService().getServiceStatus().getState()
              .equals(HAServiceState.ACTIVE) || rm2.getAdminService()
        } catch (IOException e) {
        return false;

can be:

              || (rm1.getAdminService().getServiceStatus().getState()
              .equals(HAServiceState.STANDBY) && rm2.getAdminService()

Following comments are minor nits:

* LeaderElectorService and TestLeaderElectorService includes lines which exceed 
80 characters.
* Should we name a variable, {{TestingCluster cluster}} in 
TestLeaderElectorService, {{zkCluster}} to make it clear?
* Found a typo:
  public void testRMFailToTransitionToActive() throws Exception{
    Thread laucnRM = new Thread() {

* We can remove unused imports in {{LeaderElectorService}}, 
{{TestLeaderElectorService}}, {{ResourceManager}}.

import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

{code:title= TestLeaderElectorService.java}
import static org.mockito.Mockito.spy;

import org.apache.curator.framework.recipes.leader.LeaderLatchListener;

* This is just a question - why did you change an argument of 
{{RMAuditLogger.logFailure}}, target, from {{RMHAProtocolService}} to {{RM}}?

@@ -319,7 +323,7 @@ public synchronized void transitionToActive(
     } catch (Exception e) {
       RMAuditLogger.logFailure(user.getShortUserName(), "transitionToActive",
-          "", "RMHAProtocolService",
+          "", "RM",
           "Exception transitioning to active");
       throw new ServiceFailedException(
           "Error when transitioning to Active mode", e);
@@ -338,7 +342,7 @@ public synchronized void transitionToActive(
           "Error on refreshAll during transistion to Active", e);
     RMAuditLogger.logSuccess(user.getShortUserName(), "transitionToActive",
-        "RMHAProtocolService");
+        "RM");

> Implement RM leader election with curator
> -----------------------------------------
>                 Key: YARN-4438
>                 URL: https://issues.apache.org/jira/browse/YARN-4438
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Jian He
>            Assignee: Jian He
>         Attachments: YARN-4438.1.patch
> This is to implement the leader election with curator instead of the 
> ActiveStandbyElector from common package,  this also avoids adding more 
> configs in common to suit RM's own needs. 

This message was sent by Atlassian JIRA

Reply via email to