[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814065#comment-13814065
 ] 

Karthik Kambatla commented on YARN-1222:
----------------------------------------

Thanks [~bikassaha] for the close review. 

We should probably examine the ACL strategy a little more. My reasoning behind 
allowing the user to configure the ACLs is to avoid security holes. For 
instance, if we just use yarn.resourcemanager.address and the RM's 
cluster-time-stamp, third-parties (anything but the RM) can retrieve that 
information and mess with the store.

bq. How is the following case going to work? How can the root node acl be set 
in the conf? Upon active, we have to remove the old RM's cd-acl and set our 
cd-acl. That cannot be statically set in conf right?
The root-node ACLs are per RM instance; they need to differ between RMs for 
fencing to work. The documentation in yarn-default.xml explains this - should 
we make it even clearer? 

bq. My concern is that we are only adding new ACLs every time we failover but 
never deleting them. Is it possible that we end up creating too many ACLs for 
the root znode and hit ZK issues?
I don't think that is possible. On failover, if not configured, we construct 
the root-node ACLs from the same initial ACLs, so they do not add up across 
iterations. The number of ACLs in the list is always bounded by 
(user-configured-for-store + 1). Am I missing something?

e.g. If the user doesn't configure any ZK-ACLs, the ACLs for the store are 
{world:anyone:rwcda} and the ACLs for the root node are {world:anyone:rwa, 
active-rm-address:active-rm-timestamp:cd} always. 
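To illustrate the boundedness claim, here is a minimal model of the construction described above (a sketch only: `constructRootNodeAcls` and the `Acl` record are hypothetical stand-ins, not the patch's actual classes; the permission bit values are ZooKeeper's standard `ZooDefs.Perms` constants):

```java
import java.util.ArrayList;
import java.util.List;

public class RootAclSketch {
  // ZooKeeper permission bits (values from org.apache.zookeeper.ZooDefs.Perms)
  static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;
  static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN; // rwcda

  // Simple stand-in for org.apache.zookeeper.data.ACL
  record Acl(int perms, String id) {}

  /**
   * Rebuild the root-node ACLs from the same initial store ACLs on every
   * failover: each store ACL keeps everything except create/delete, and the
   * active RM alone gets create/delete. The result always has exactly
   * (store ACLs + 1) entries, no matter how many failovers occur.
   */
  static List<Acl> constructRootNodeAcls(List<Acl> storeAcls, String activeRmId) {
    List<Acl> rootAcls = new ArrayList<>();
    for (Acl acl : storeAcls) {
      rootAcls.add(new Acl(acl.perms() & ~(CREATE | DELETE), acl.id()));
    }
    rootAcls.add(new Acl(CREATE | DELETE, activeRmId));
    return rootAcls;
  }

  public static void main(String[] args) {
    List<Acl> storeAcls = List.of(new Acl(ALL, "world:anyone"));
    List<Acl> rootAcls = storeAcls;
    // Simulate several failovers: the list is rebuilt from the same
    // initial ACLs each time, so it never grows past store + 1.
    for (int i = 0; i < 5; i++) {
      rootAcls = constructRootNodeAcls(storeAcls, "rm" + (i % 2) + ":ts" + i);
    }
    System.out.println(rootAcls.size()); // prints 2
  }
}
```

With the default store ACL {world:anyone:rwcda}, the sketch yields {world:anyone:rwa, active-rm:cd}, matching the example above.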

bq. For both of the above, can we use well-known prefixes for the root znode 
acls (rm-admin-acl and rm-cd-acl). 
We could, but the user can already achieve that in the current implementation 
by configuring the root ACLs to exactly those values.

A completely different approach would be to use session-based authentication, 
where the Active's session claims create-delete. However, we might want to do 
that as a follow-up - it might need some more refactoring in the store to 
ensure it sticks to a single session. 

bq. Can we move this logic into the common RMStateStore and notify it about HA 
state loss via a standard HA exception. 
I initially did that, but moved it to ZKRMStateStore because the common 
RMStateStore is oblivious to the implicit fencing mechanism in the ZKStore. Do 
you think we should make it aware of fencing - say, by adding something like a 
StoreFencedException?

bq. Will the null return make the state store crash?
It didn't crash the store in my testing. I'll look at it more closely for the 
next revision of the patch.

bq. This and other similar places need an @Private
ZKRMStateStore itself is @Private @Unstable. Should we still label the methods 
@Private? 

{quote} @Private?
{code}
+  public static String getConfValueForRMInstance(String prefix,
{code}
{quote}
This was intentional - there might be merit to leaving these methods public and 
marking the class itself @Public @Unstable at the moment.

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
