[
https://issues.apache.org/jira/browse/YARN-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13853970#comment-13853970
]
Bikas Saha commented on YARN-1029:
----------------------------------
Why is fencing configurable when the ZK store is self-fenced? I don't think we need to
add any fencing-related code for the embedded FC, except for a dummy fencer to pass
into the elector code.
{code}+ public static final String RM_HA_FENCER = RM_HA_PREFIX +
"fencer";{code}
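The dummy fencer idea above can be sketched as follows. This is an illustrative stand-in, not Hadoop's actual FenceMethod/NodeFencer API; the local Fencer interface and class names are assumptions for the sketch.

```java
// Sketch: a no-op fencer for the embedded elector. The Fencer interface
// below is a local stand-in, not Hadoop's real fencing API.
interface Fencer {
    boolean fence(Object target);
}

// Since the ZK-based store is self-fenced, the embedded FC never needs to
// actually fence the old active; an "always succeeds" fencer satisfies
// the elector's contract.
class DummyFencer implements Fencer {
    @Override
    public boolean fence(Object target) {
        // Nothing to do: the ZK store rejects writes from a deposed active.
        return true;
    }
}

public class DummyFencerDemo {
    public static void main(String[] args) {
        Fencer f = new DummyFencer();
        System.out.println(f.fence(null)); // prints "true"
    }
}
```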
Can we please consolidate all ZK configs in one place in the file?
Isn't rmId alone enough, since the rest of this is available from config? The port is,
in any case, only one of many RM ports.
{code}+ required int32 port = 1;
+ required string hostname = 2;
+ required string clusterid = 3;
+ required string rmId = 4;{code}
There is a separate JIRA open to add a cluster-id.
Why was the synchronized dropped?
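To illustrate the "rmId is enough" point: the receiving side could resolve everything else from shared config keyed by rmId. A sketch, where the per-RM suffixed key name and the Map standing in for Configuration are assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: resolving an RM's address from just its rmId plus shared config.
// The key scheme ("yarn.resourcemanager.address." + rmId) is modeled on
// YARN's per-RM suffixed config convention; Map stands in for Configuration.
public class RmIdLookup {
    public static String addressFor(Map<String, String> conf, String rmId) {
        return conf.get("yarn.resourcemanager.address." + rmId);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.address.rm1", "host1:8032");
        conf.put("yarn.resourcemanager.address.rm2", "host2:8032");
        System.out.println(addressFor(conf, "rm1")); // prints "host1:8032"
    }
}
```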
{code}- private synchronized boolean isRMActive() {{code}
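For context on why the synchronized matters here: HA state is written by elector callbacks and read by RPC handlers on other threads. A minimal sketch with illustrative names (not the actual RM fields):

```java
// Sketch: the active/standby state is written by one thread (elector
// callback) and read by others (RPC handlers), so the check and the
// transition should share a lock, or the field must be safely published.
class HAState {
    private String state = "STANDBY";

    synchronized void transitionToActive() { state = "ACTIVE"; }

    // Without synchronized (or volatile), this read may observe a stale value.
    synchronized boolean isActive() { return "ACTIVE".equals(state); }
}

public class SyncDemo {
    public static void main(String[] args) {
        HAState s = new HAState();
        s.transitionToActive();
        System.out.println(s.isActive()); // prints "true"
    }
}
```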
There is no fencer in embedded election, right?
{code}+ @Override
+ public void becomeStandby() {
+ try {
+ rm.transitionToStandby(true);
+ } catch (Exception e) {
+ // Log the exception. The fencer should be able to fence this node
+ LOG.error("RM could not transition to Standby mode", e);
+ }
+ }{code}
This is probably not enough; we need to notify the RM.
{code}@Override
+ public void notifyFatalError(String errorMessage) {
+ LOG.fatal("Received " + errorMessage);
+ throw new YarnRuntimeException(errorMessage);
+ }{code}
This should be empty; there is no fencing in embedded election because the ZK store
is self-fenced.
{code}@Override
+ public void fenceOldActive(byte[] oldActiveData) {
+ RMHAServiceTarget target = dataToTarget(oldActiveData);
+
+ try {
+ target.checkFencingConfigured();
+ } catch (BadFencingConfigurationException e) {
+ throw new YarnBadConfigurationException(e.getMessage());
+ }
+
+ if (!target.getFencer().fence(target)) {
+ throw new YarnRuntimeException("Could not fence old active");
+ }
+ }{code}
I didn't quite get the purpose of the new thread. Why can we not call
elector.joinElection() in serviceStart()? There is no need for us to loop and
keep calling joinElection() in a thread.
Can we use the newly created HAUtil helper methods?
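The suggested single-call flow could look like the sketch below. MiniElector and ElectorService are stand-ins (the real class is Hadoop's ActiveStandbyElector, which re-joins on ZK session events itself, so no retry loop is needed on the caller's side):

```java
// Sketch: join the election once from serviceStart() instead of looping in a
// dedicated thread. MiniElector stands in for Hadoop's ActiveStandbyElector.
class MiniElector {
    boolean joined = false;
    void joinElection(byte[] data) { joined = true; }
}

class ElectorService {
    private final MiniElector elector = new MiniElector();
    private final byte[] localActiveData;

    ElectorService(byte[] localActiveData) { this.localActiveData = localActiveData; }

    // One call is enough: the elector re-joins on ZK session events itself.
    void serviceStart() {
        elector.joinElection(localActiveData);
    }

    boolean hasJoined() { return elector.joined; }
}

public class ServiceStartDemo {
    public static void main(String[] args) {
        ElectorService s = new ElectorService(new byte[0]);
        s.serviceStart();
        System.out.println(s.hasJoined()); // prints "true"
    }
}
```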
{code}
+ if (conf.getBoolean(YarnConfiguration.AUTO_FAILOVER_ENABLED,
+ YarnConfiguration.DEFAULT_AUTO_FAILOVER_ENABLED)) {
+ // Automatic failover enabled
+ if (conf.getBoolean(YarnConfiguration.AUTO_FAILOVER_EMBEDDED,
+ YarnConfiguration.DEFAULT_AUTO_FAILOVER_EMBEDDED)) {
+ // Embedded automatic failover enabled
+ electorService = createRMZKActiveStandbyElectorService();
+ addIfService(electorService);
{code}
In the embedded failover test, how do we know that the ZK-based failover is
being triggered? I did not understand how failover can happen so quickly when
the ZK session timeout is 10s.
IMO the ElectorService should not be calling RM.transitionToActive/Standby; it
should be calling AdminService.transitionToActive/Standby. The AdminService is
the only HA entry point into the system. By calling directly into the RM, we are
breaking the abstraction that everything else is going to follow.
Also, an alternative layering would be to make the ElectorService a member of
the AdminService. There is no need for the main body of the RM to know about
failover or failover controllers (FC). Interaction with any FC for failover is
abstracted in the AdminService. So IMO, if the FC is configured to be embedded,
we can maintain the abstraction and embed it into the AdminService.
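The layering suggested above can be sketched as follows. All names are illustrative stand-ins, not the actual YARN classes; the point is that the elector callback only ever touches the single HA entry point:

```java
// Sketch of the suggested layering: the elector callback talks only to
// AdminService, which owns the transitions; the RM core never sees the FC.
interface HAEntryPoint {
    void transitionToActive();
    void transitionToStandby();
}

class AdminService implements HAEntryPoint {
    String state = "STANDBY";
    // The embedded elector lives inside AdminService, so only this class
    // knows a failover controller exists.
    final ElectorCallback elector = new ElectorCallback(this);

    public void transitionToActive() { state = "ACTIVE"; }
    public void transitionToStandby() { state = "STANDBY"; }
}

class ElectorCallback {
    private final HAEntryPoint admin;
    ElectorCallback(HAEntryPoint admin) { this.admin = admin; }

    // Invoked by the leader-election machinery; it never calls into the RM
    // directly, only into the single HA entry point.
    void becomeActive() { admin.transitionToActive(); }
    void becomeStandby() { admin.transitionToStandby(); }
}

public class LayeringDemo {
    public static void main(String[] args) {
        AdminService admin = new AdminService();
        admin.elector.becomeActive();
        System.out.println(admin.state); // prints "ACTIVE"
    }
}
```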
> Allow embedding leader election into the RM
> -------------------------------------------
>
> Key: YARN-1029
> URL: https://issues.apache.org/jira/browse/YARN-1029
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: embedded-zkfc-approach.patch, yarn-1029-0.patch,
> yarn-1029-0.patch, yarn-1029-1.patch, yarn-1029-approach.patch
>
>
> It should be possible to embed common ActiveStandyElector into the RM such
> that ZooKeeper based leader election and notification is in-built. In
> conjunction with a ZK state store, this configuration will be a simple
> deployment option.