[
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13814065#comment-13814065
]
Karthik Kambatla commented on YARN-1222:
----------------------------------------
Thanks [~bikassaha] for the close review.
We should probably examine the ACL strategy a little more. My reasoning behind
allowing the user to configure the ACLs is to avoid security holes. For
instance, if we just use yarn.resourcemanager.address and the RM's
cluster-time-stamp, third parties (anything but the RM) could retrieve that
information and tamper with the store.
bq. How is the following case going to work? How can the root node acl be set
in the conf? Upon active, we have to remove the old RM's cd-acl and set our
cd-acl. That cannot be statically set in conf right?
The root-node ACLs are per RM instance and need to differ across RMs for
fencing to work. The documentation in yarn-default.xml explains this - we
might have to make it even clearer?
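To make the per-RM configuration concrete, here is a sketch of what each RM's yarn-site.xml might look like. The property name and ACL values are illustrative (check yarn-default.xml on your version for the exact key); the ZooKeeper ACL format is scheme:id:perms.

```xml
<!-- Sketch only: property name and values are illustrative.
     Each RM instance gets its own root-node ACL, so the Active can hold
     exclusive create-delete (cd) on the root znode while everyone keeps
     read-write-admin (rwa). -->
<property>
  <name>yarn.resourcemanager.zk-state-store.root-node.acl</name>
  <value>world:anyone:rwa,digest:rm1:cd</value>
</property>
```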
bq. My concern is that we are only adding new ACLs every time we failover but
never deleting them. Is it possible that we end up creating too many ACLs for
the root znode and hit ZK issues?
I don't think that is possible. On failover, if not configured, we construct
the root-node ACLs from the same initial ACLs, so they don't accumulate across
iterations. The number of ACLs in the list is always bounded by
(user-configured-for-store + 1). Am I missing something?
e.g. If the user doesn't configure any ZK-ACLs, the ACLs for the store are
{world:anyone:rwcda} and the ACLs for the root node are {world:anyone:rwa,
active-rm-address:active-rm-timestamp:cd} always.
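A minimal, plain-JDK sketch of why the list stays bounded. The permission bits mirror org.apache.zookeeper.ZooDefs.Perms and the nested Acl record is a stand-in for org.apache.zookeeper.data.ACL; the method name and structure are illustrative, not the actual ZKRMStateStore code.

```java
import java.util.ArrayList;
import java.util.List;

public class RootNodeAclSketch {
    // Mirrors ZooKeeper's permission bits (ZooDefs.Perms).
    public static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    // Stand-in for org.apache.zookeeper.data.ACL.
    public record Acl(int perms, String id) {}

    // On every failover the root-node ACLs are rebuilt from the same initial
    // store ACLs: each store ACL is stripped of CREATE/DELETE, then exactly
    // one CREATE+DELETE entry is added for the active RM. The result is
    // always bounded by (store ACLs + 1), no matter how many failovers occur.
    public static List<Acl> rootNodeAcls(List<Acl> storeAcls, String activeRmId) {
        List<Acl> acls = new ArrayList<>();
        for (Acl a : storeAcls) {
            acls.add(new Acl(a.perms() & ~(CREATE | DELETE), a.id()));
        }
        acls.add(new Acl(CREATE | DELETE, activeRmId));
        return acls;
    }

    public static void main(String[] args) {
        // Default store ACLs: {world:anyone:rwcda}.
        List<Acl> store = List.of(
            new Acl(READ | WRITE | CREATE | DELETE | ADMIN, "world:anyone"));
        List<Acl> acls = store;
        // Simulate repeated failovers between two RMs.
        for (int failover = 0; failover < 5; failover++) {
            acls = rootNodeAcls(store, "digest:rm" + (failover % 2));
        }
        // Bounded at store ACLs + 1 = 2, matching
        // {world:anyone:rwa, active-rm:cd} from the example above.
        System.out.println(acls.size());
    }
}
```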
bq. For both of the above, can we use well-known prefixes for the root znode
acls (rm-admin-acl and rm-cd-acl).
We might be able to do that, but the user can already achieve it in the
current implementation by configuring the root ACLs to exactly that.
A completely different approach would be to use session-based authentication,
where the Active's session claims create-delete. However, we might want to do
that as a follow-up - it would need some more refactoring to ensure the store
sticks to a single session.
bq. Can we move this logic into the common RMStateStore and notify it about HA
state loss via a standard HA exception.
I initially did that, but moved it to ZKRMStateStore because the common
RMStateStore is oblivious to the implicit fencing mechanism in the ZKStore. Do
you think we should make it aware of fencing - have something like a
StoreFencedException?
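To illustrate the idea, here is a hypothetical sketch of such an exception: the ZK-specific store throws it, and the common RMStateStore reacts to the generic fencing signal without knowing any ZooKeeper details. All names are assumptions for illustration, not the actual patch.

```java
import java.io.IOException;

public class FencingDemo {
    // Hypothetical marker exception the ZK store could throw when it
    // discovers another RM now owns create-delete on the root znode.
    public static class StoreFencedException extends IOException {
        public StoreFencedException(String message) {
            super(message);
        }
    }

    // Stand-in for a ZK store write that hits the implicit fencing mechanism.
    public static void storeOperation(boolean fenced) throws StoreFencedException {
        if (fenced) {
            throw new StoreFencedException("root znode cd-ACL held by another RM");
        }
    }

    public static void main(String[] args) {
        try {
            storeOperation(true);
        } catch (StoreFencedException e) {
            // The common RMStateStore only sees the generic signal and can,
            // for example, transition this RM to standby.
            System.out.println("fenced: " + e.getMessage());
        }
    }
}
```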
bq. Will the null return make the state store crash?
It didn't crash the store in my testing. Will look at it more closely for the
next revision of the patch.
bq. This and other similar places need an @Private
ZKRMStateStore itself is @Private @Unstable. Should we still label the methods
@Private?
{quote} @Private?
{code}
+ public static String getConfValueForRMInstance(String prefix,
{code}
{quote}
This was intentional - there might be merit to leaving these methods public
and marking the class itself @Public @Unstable for the moment.
> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch,
> yarn-1222-4.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the
> child of the root znode. This is to achieve fencing by modifying the
> create/delete permissions on the root znode.
--
This message was sent by Atlassian JIRA
(v6.1#6144)