[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812663#comment-13812663
 ] 

Karthik Kambatla commented on YARN-1222:
----------------------------------------

bq. deleteWithRetries() - The new logic does not seem identical to the older 
logic.
Good catch! It is not identical, but should have the same effect. Instead of 
using a ZKAction that does both exist() and delete(), the second patch was 
doing exist first (with retries) and then delete which can lead to funny 
behavior. Hence, the return null.

bq. why do we do an exists check instead of catching the NoNodeException after 
calling delete directly?
Agree deleting and catching NoNodeException makes more sense. The exists() 
followed by delete() comes from earlier patches on YARN-353, and I am not sure 
the reasoning behind it. We don't seem to be using the watch anywhere. The 
updated patch (-3.patch) adopts this approach.

bq. Should this method only change the ACL's pertaining to other RM instances 
and not every ACL on the znode? Alternatively, if the assumption is that the 
root znode etc are manipulated only by the RM's then can we simply remove all 
older ACL's and set a new ACL for the current RM.
Only the RMs manipulate the znode structure. Here, we are using the ZKStoreACLs 
to generate the ZKRootNodeACLs. IIUC, the ZKStoreACLs define who has access to 
the store. That could be used to define the users that can run the RM with 
ZKStateStore; an example could be that only adminuser is allowed. The patch 
provides adminuser as much access on the ZKRootNode as possible, removing 
create-delete permissions. In addition to that, it gives exclusive 
create-delete perms to the RM on ZKRootNode. I think this is required if we 
want to support this behavior even for an ACL list where say each element gives 
one user access to run the RM. One extreme case could be two RMs running - each 
started by a different user with both users being allowed by the ACLs. 

Does that sound reasonable? Should we handle it any other way?

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to