[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815280#comment-13815280 ]
Bikas Saha commented on YARN-1222:
----------------------------------
bq. Post YARN-1318, I think RMStateStore constructor should take RMContext.
Then, we should be able to replace the RPC approach with
rmContext.getHAService.transitionToStandby()
Great, let's track that and add a comment. A self-RPC is best avoided.
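A minimal sketch of the direct-call approach, assuming RMStateStore receives an RMContext after YARN-1318. The HAService and RMContext interfaces below are hypothetical, trimmed-down stand-ins (not the real YARN types). The only point shown is that the store calls transitionToStandby() in-process instead of issuing an RPC back to its own RM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StoreWithContextSketch {

    /** Hypothetical stand-in for the RM's HA service (not the real YARN type). */
    interface HAService {
        void transitionToStandby();
    }

    /** Hypothetical, trimmed-down RMContext exposing only what the sketch needs. */
    interface RMContext {
        HAService getHAService();
    }

    /** Store that is handed the RMContext at construction time. */
    static class RMStateStore {
        private final RMContext rmContext;

        RMStateStore(RMContext rmContext) {
            this.rmContext = rmContext;
        }

        /** Invoked when the store detects that it has been fenced. */
        void onFenced() {
            // In-process call; no RPC from the RM back to itself.
            rmContext.getHAService().transitionToStandby();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean standby = new AtomicBoolean(false);
        HAService haService = () -> standby.set(true);
        RMContext rmContext = () -> haService;
        new RMStateStore(rmContext).onFenced();
        System.out.println("standby=" + standby.get());  // prints standby=true
    }
}
```

Passing the context through the constructor keeps the store testable: a test can hand it a fake HA service, as main does here.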
bq. A completely different approach might be to keep
handleStoreFencedException() in ResourceManager and have the store implementation
call it when it realizes it has been fenced. Thoughts?
That's what I was suggesting. The store reports this exception/error to the RM,
and then the RM does the right thing (in this case, transitionToStandby()).
notifyDoneStoringApplicationAttempt() etc. should not be sent when there is a
fenced exception. Extending that, we should probably send the notifyDone*
calls only upon success. That way, those callees need to handle only the
normal/success code path. Any exception should be reported to the RM, which
can examine it to see whether it is a fenced exception. If so, the RM calls
transitionToStandby(); for any other exception it dies (as we currently do in
multiple different places; now we will do it in one place).
> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
> Key: YARN-1222
> URL: https://issues.apache.org/jira/browse/YARN-1222
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Karthik Kambatla
> Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch,
> yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the
> child of the root znode. This is to achieve fencing by modifying the
> create/delete permissions on the root znode.
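To illustrate why the lock znode gives fencing, here is an in-memory simulation. FakeZk, the fence-lock path, and the principal names are all hypothetical; this is not the ZooKeeper client, where ZooKeeper.multi() with Op.create()/Op.delete() would play this role. Each write is wrapped in one atomic multi: create a lock child of the root, perform the write, delete the lock. Creating or deleting a root child requires create/delete permission on the root, so once a new active RM revokes the old RM's permission there, every later multi from the old RM fails as a whole.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ZkFencingSketch {

    /** One step of a multi-operation (hypothetical, not org.apache.zookeeper.Op). */
    record Op(boolean isCreate, String path, String data) {
        static Op create(String path, String data) { return new Op(true, path, data); }
        static Op delete(String path) { return new Op(false, path, null); }
    }

    /** In-memory stand-in for a ZK ensemble. */
    static class FakeZk {
        static final String ROOT = "/rmstore";
        final Map<String, String> nodes = new HashMap<>();
        // Principals holding create/delete permission on ROOT.
        final Set<String> rootCreateDelete = new HashSet<>();

        /** Applies every op or none of them, like a ZK multi. */
        synchronized void multi(String principal, List<Op> ops) {
            Map<String, String> staged = new HashMap<>(nodes);
            for (Op op : ops) {
                // Simplification: only direct children of ROOT are permission-checked,
                // and parent znodes are not required to exist.
                if (isChildOfRoot(op.path()) && !rootCreateDelete.contains(principal)) {
                    throw new IllegalStateException("NoAuth: " + principal + " on " + op.path());
                }
                if (op.isCreate()) {
                    if (staged.putIfAbsent(op.path(), op.data()) != null) {
                        throw new IllegalStateException("NodeExists: " + op.path());
                    }
                } else if (staged.remove(op.path()) == null) {
                    throw new IllegalStateException("NoNode: " + op.path());
                }
            }
            nodes.clear();
            nodes.putAll(staged);  // commit only after every op succeeded
        }

        static boolean isChildOfRoot(String path) {
            return path.startsWith(ROOT + "/") && path.indexOf('/', ROOT.length() + 1) < 0;
        }
    }

    /** Wraps a single write with create/delete of a lock child of ROOT. */
    static void fencedWrite(FakeZk zk, String principal, Op write) {
        String lock = FakeZk.ROOT + "/fence-lock";  // hypothetical lock znode name
        zk.multi(principal, List.of(Op.create(lock, ""), write, Op.delete(lock)));
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        zk.rootCreateDelete.add("rm1");
        fencedWrite(zk, "rm1", Op.create("/rmstore/apps/app_1", "state"));

        // rm2 becomes active and rewrites the root's permissions, fencing out rm1.
        zk.rootCreateDelete.clear();
        zk.rootCreateDelete.add("rm2");
        try {
            fencedWrite(zk, "rm1", Op.create("/rmstore/apps/app_2", "state"));
        } catch (IllegalStateException e) {
            System.out.println("rm1 fenced: " + e.getMessage());
        }
        System.out.println(zk.nodes.keySet());
    }
}
```

In this run rm1's second write is rejected with a NoAuth-style error and app_2 is never created, while the lock znode never outlives a successful multi. The data znodes themselves need no special permissions; a single permission change on the root fences all writers at once.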
--
This message was sent by Atlassian JIRA
(v6.1#6144)