[
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13812663#comment-13812663
]
Karthik Kambatla commented on YARN-1222:
----------------------------------------
bq. deleteWithRetries() - The new logic does not seem identical to the older
logic.
Good catch! It is not identical, but should have the same effect. Instead of
using a ZKAction that does both exist() and delete(), the second patch was
doing exist first (with retries) and then delete which can lead to funny
behavior. Hence, the return null.
bq. why do we do an exists check instead of catching the NoNodeException after
calling delete directly?
Agree deleting and catching NoNodeException makes more sense. The exists()
followed by delete() comes from earlier patches on YARN-353, and I am not sure
the reasoning behind it. We don't seem to be using the watch anywhere. The
updated patch (-3.patch) adopts this approach.
bq. Should this method only change the ACL's pertaining to other RM instances
and not every ACL on the znode? Alternatively, if the assumption is that the
root znode etc are manipulated only by the RM's then can we simply remove all
older ACL's and set a new ACL for the current RM.
Only the RMs manipulate the znode structure. Here, we are using the ZKStoreACLs
to generate the ZKRootNodeACLs. IIUC, the ZKStoreACLs define who has access to
the store. That could be used to define the users that can run the RM with
ZKStateStore; an example could be that only adminuser is allowed. The patch
provides adminuser as much access on the ZKRootNode as possible, removing
create-delete permissions. In addition to that, it gives exclusive
create-delete perms to the RM on ZKRootNode. I think this is required if we
want to support this behavior even for an ACL list where say each element gives
one user access to run the RM. One extreme case could be two RMs running - each
started by a different user with both users being allowed by the ACLs.
Does that sound reasonable? Should we handle it any other way?
> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the
> child of the root znode. This is to achieve fencing by modifying the
> create/delete permissions on the root znode.
--
This message was sent by Atlassian JIRA
(v6.1#6144)