[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813716#comment-13813716
 ] 

Bikas Saha commented on YARN-1222:
----------------------------------

Should this be marked @Private?
{code}
+  public static String getConfValueForRMInstance(String prefix,
{code}

If the RM is the one creating the root znode, then how can someone else's ACLs 
be present on that znode? i.e., how can the ACLs on the root znode have any 
other entries?

My concern is that we only add new ACLs every time we fail over but never 
delete them. Is it possible that we end up creating too many ACLs for the 
root znode and hit ZK issues?
{code}
+    Id rmId = new Id(zkRootNodeAuthScheme,
+        DigestAuthenticationProvider.generateDigest(
+            zkRootNodeUsername + ":" + zkRootNodePassword));
+    zkRootNodeAcl.add(new ACL(CREATE_DELETE_PERMS, rmId));
+    return zkRootNodeAcl;
{code}

For both of the above, can we use well-known prefixes for the root znode ACLs 
(rm-admin-acl and rm-cd-acl)? When fencing, we don't touch the rm-admin-acl but 
remove all rm-cd-acls. We then add a new rm-cd-acl for ourselves. We don't 
touch any other ACL. Where is the shared rm-admin-acl being set such that both 
RMs have admin access to the root znode?
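
A minimal sketch of the prefix-based fencing described above, using a simplified in-memory stand-in for the ACL list rather than the ZooKeeper client classes. The `Entry` class, the `fence` method, and the way the RM id is appended to the prefix are all assumptions made for illustration; only the two prefix names come from the suggestion itself, and the permission bit values mirror ZooKeeper's `ZooDefs.Perms`.

```java
import java.util.ArrayList;
import java.util.List;

public class FencingAclSketch {
  // Bit values mirror ZooDefs.Perms in ZooKeeper (CREATE = 4, DELETE = 8, ADMIN = 16).
  static final int CREATE = 4;
  static final int DELETE = 8;
  static final int ADMIN = 16;

  // Well-known prefixes named in the comment above; their exact use is assumed.
  static final String ADMIN_PREFIX = "rm-admin-acl";
  static final String CD_PREFIX = "rm-cd-acl";

  /** Simplified, hypothetical stand-in for org.apache.zookeeper.data.ACL. */
  static final class Entry {
    final String id;
    final int perms;

    Entry(String id, int perms) {
      this.id = id;
      this.perms = perms;
    }
  }

  /** Builds the ACL list a newly active RM would set on the root znode. */
  static List<Entry> fence(List<Entry> current, String myRmId) {
    List<Entry> updated = new ArrayList<>();
    for (Entry e : current) {
      // Drop every create/delete entry left behind by a previously active RM.
      // The shared admin entry and any unrelated entries are kept untouched.
      if (!e.id.startsWith(CD_PREFIX)) {
        updated.add(e);
      }
    }
    // Add exactly one create/delete entry for the RM that is becoming active.
    updated.add(new Entry(CD_PREFIX + "-" + myRmId, CREATE | DELETE));
    return updated;
  }

  public static void main(String[] args) {
    List<Entry> acls = new ArrayList<>();
    acls.add(new Entry(ADMIN_PREFIX, ADMIN));
    acls.add(new Entry(CD_PREFIX + "-rm1", CREATE | DELETE));
    // rm2 becomes active: rm1's create/delete entry is replaced by rm2's.
    for (Entry e : fence(acls, "rm2")) {
      System.out.println(e.id + " perms=" + e.perms);
    }
  }
}
```

Because old rm-cd-acl entries are removed before the new one is added, the list does not grow with the number of failovers in this model.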

How is the following case going to work? How can the root node ACL be set in 
the conf? Upon becoming active, we have to remove the old RM's cd-acl and set 
our own cd-acl. That cannot be statically set in the conf, right?
{code}
if (HAUtil.isHAEnabled(conf)) {
+      String zkRootNodeAclConf = HAUtil.getConfValueForRMInstance
+          (YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL, conf);
+      if (zkRootNodeAclConf != null) {
+        zkRootNodeAclConf = ZKUtil.resolveConfIndirection(zkRootNodeAclConf);
+        try {
+          zkRootNodeAcl = ZKUtil.parseACLs(zkRootNodeAclConf);
+        } catch (ZKUtil.BadAclFormatException bafe) {
+          LOG.error("Invalid format for " +
+              YarnConfiguration.ZK_RM_STATE_STORE_ROOT_NODE_ACL);
+          throw bafe;
+        }
+      }
{code}

The test should probably create separate copies of conf for the two RMs.

Won't we get an exception/error from this?
{code}
+    rmService.submitApplication(SubmitApplicationRequest.newInstance(asc));
{code}
Let's add a comment saying that this triggers a state store operation that 
makes rm1 realize that it's not the master because it got fenced by the store.

This and other similar places need an @Private annotation:
{code}
+  @VisibleForTesting
+  public void createWithRetries(
{code}

Can you please specify in comments which operations are exempt from 
multi-operation? It looks like only "write" operations go through multi, the 
exceptions being initial znode creation and fence-on-active. Right?

Can we move this logic into the common RMStateStore and notify it about HA 
state loss via a standard HA exception? Will the null return make the state 
store crash?
{code}
+        } catch (KeeperException.NoAuthException nae) {
+          if (HAUtil.isHAEnabled(getConfig())) {
+            // Transition to standby
+            RMHAServiceTarget target = new RMHAServiceTarget(
+                (YarnConfiguration)getConfig());
+            target.getProxy(getConfig(), 1000).transitionToStandby(
+                new HAServiceProtocol.StateChangeRequestInfo(
+                    HAServiceProtocol.RequestSource.REQUEST_BY_USER_FORCED));
+            return null;
+          }
{code}
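
A minimal sketch of the alternative raised above, where the store-specific code only reports that it was fenced and a common layer decides what to do about it. All names here (`StoreFencedException`, `HaStateListener`, `StoreOp`, `CommonStore`) are hypothetical and are not taken from the patch or from Hadoop; a concrete store would be assumed to translate something like `KeeperException.NoAuthException` into the fenced exception.

```java
public class StoreFencedSketch {
  /** Hypothetical exception a store throws when it detects it has been fenced. */
  static class StoreFencedException extends Exception {
    StoreFencedException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Hypothetical callback the RM registers to learn about HA state loss. */
  interface HaStateListener {
    void onFenced();
  }

  /** Stand-in for a store-specific operation, for example a ZooKeeper write. */
  interface StoreOp {
    void run() throws Exception;
  }

  /** Stand-in for the common store base class; only error handling is modeled. */
  static class CommonStore {
    private final HaStateListener listener;
    private boolean fenced = false;

    CommonStore(HaStateListener listener) {
      this.listener = listener;
    }

    /** Runs one operation and reports success explicitly instead of returning null. */
    boolean execute(StoreOp op) throws Exception {
      try {
        op.run();
        return true;
      } catch (StoreFencedException e) {
        // The common layer, not the ZK-specific code, reacts to the loss of
        // active state; the listener would trigger the transition to standby.
        fenced = true;
        listener.onFenced();
        return false;
      }
    }

    boolean isFenced() {
      return fenced;
    }
  }

  public static void main(String[] args) throws Exception {
    CommonStore store = new CommonStore(
        () -> System.out.println("fenced: transitioning to standby"));
    boolean stored = store.execute(() -> {
      throw new StoreFencedException("no auth on root znode", null);
    });
    System.out.println("stored=" + stored + " fenced=" + store.isFenced());
  }
}
```

Returning an explicit boolean (or rethrowing) avoids the question of whether a null return would crash callers that expect a result.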


> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
