[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815280#comment-13815280
 ] 

Bikas Saha commented on YARN-1222:
----------------------------------

bq. Post YARN-1318, I think RMStateStore constructor should take RMContext. 
Then, we should be able to replace the RPC approach with 
rmContext.getHAService.transitionToStandby()
Great, let's track that and add a comment. A self-RPC is best avoided.

bq. A completely different approach might be to keep 
handleStoreFencedException() in ResourceManager and the store implementation to 
call it when it realizes it got fenced. Thoughts?
That's what I was suggesting. The store reports this exception/error to the RM, 
and then the RM does the right thing (in this case, transitionToStandby).

notifyDoneStoringApplicationAttempt() etc. should not be sent when there is a 
fenced exception. Extending that, we should probably send the notifyDone* 
calls only upon success. That way the callees need to handle only the 
normal/success code path. Any exception should be reported to the RM. The RM 
can examine the exception to see whether it is a fenced exception and, if so, 
call transitionToStandby(). For any other exception, the RM dies. We currently 
do that in multiple different places; we would now do it in one place.
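
A minimal sketch of that flow, to make the proposal concrete. It is not the 
actual patch: every type and method name here (StoreFencedException, 
StoreErrorHandler, SketchResourceManager, SketchStateStore) is a hypothetical 
stand-in, and "die" is recorded as a string rather than exiting the process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are the real YARN classes.
public class FencingSketch {

    /** Stand-in for the exception a store raises when it finds it was fenced. */
    static class StoreFencedException extends Exception {
        StoreFencedException(String msg) { super(msg); }
    }

    /** The single place where store failures are handled (the RM's role). */
    interface StoreErrorHandler {
        void handleStoreException(Exception e);
    }

    /** Stand-in for the ResourceManager; it records the action it chose. */
    static class SketchResourceManager implements StoreErrorHandler {
        final List<String> actions = new ArrayList<>();

        @Override
        public void handleStoreException(Exception e) {
            if (e instanceof StoreFencedException) {
                actions.add("transitionToStandby"); // fenced: step down
            } else {
                actions.add("die");                 // anything else: fatal
            }
        }
    }

    /** A store write that may fail; assumed shape, for illustration only. */
    interface StoreOperation {
        void run() throws Exception;
    }

    /** Stand-in for the state store; it notifies callers only on success. */
    static class SketchStateStore {
        private final StoreErrorHandler handler;
        final List<String> notifications = new ArrayList<>();

        SketchStateStore(StoreErrorHandler handler) {
            this.handler = handler;
        }

        void storeApplicationAttempt(String attemptId, StoreOperation op) {
            try {
                op.run();
            } catch (Exception e) {
                // Report every failure to the RM and skip the notification.
                handler.handleStoreException(e);
                return;
            }
            // Success path only: the analogue of notifyDoneStoring*.
            notifications.add("doneStoring:" + attemptId);
        }
    }

    public static void main(String[] args) {
        SketchResourceManager rm = new SketchResourceManager();
        SketchStateStore store = new SketchStateStore(rm);
        store.storeApplicationAttempt("attempt_1", () -> { });
        store.storeApplicationAttempt("attempt_2", () -> {
            throw new StoreFencedException("fenced by the other RM");
        });
        System.out.println(store.notifications + " " + rm.actions);
    }
}
```

With this shape, callers waiting on a notification never see a failure case, 
and the decision between stepping down and dying lives in one method.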

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch, yarn-1222-5.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.
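
The fencing idea in the description can be illustrated with a toy in-memory 
model. This is not the ZooKeeper client API and not the patch itself: ToyRoot, 
claimCreateDelete, and fencedWrite are hypothetical names, and a single owner 
id stands in for the create/delete ACL on the root znode. It relies only on 
the ZooKeeper rule that creating or deleting a child requires the matching 
permission on the parent.

```java
import java.util.HashSet;
import java.util.Set;

// In-memory toy model; all names are hypothetical.
public class LockZnodeFencingSketch {

    /** Raised when the caller no longer holds create/delete on the root. */
    static class FencedException extends Exception {
        FencedException(String msg) { super(msg); }
    }

    /** Toy root znode: tracks children and who may create/delete them. */
    static class ToyRoot {
        private final Set<String> children = new HashSet<>();
        private String ownerWithCreateDelete;

        /** A newly active RM takes exclusive create/delete rights. */
        synchronized void claimCreateDelete(String rmId) {
            ownerWithCreateDelete = rmId;
        }

        /**
         * Models one multi-operation: create the lock child, apply the
         * write, delete the lock child. Nothing is applied if fenced.
         */
        synchronized void fencedWrite(String rmId, String child)
                throws FencedException {
            if (!rmId.equals(ownerWithCreateDelete)) {
                throw new FencedException(rmId + " lacks create/delete on root");
            }
            children.add("lock");    // create the lock znode under root
            children.add(child);     // the actual state update
            children.remove("lock"); // delete the lock znode again
        }

        synchronized boolean hasChild(String child) {
            return children.contains(child);
        }
    }

    public static void main(String[] args) throws Exception {
        ToyRoot root = new ToyRoot();
        root.claimCreateDelete("rm1");
        root.fencedWrite("rm1", "app_1");
        root.claimCreateDelete("rm2"); // rm2 becomes active, fencing rm1
        try {
            root.fencedWrite("rm1", "app_2");
        } catch (FencedException e) {
            System.out.println("rm1 fenced: " + e.getMessage());
        }
    }
}
```

Because every write bundles the lock create/delete, a stale RM fails on its 
next write of any kind rather than only on writes that touch the root.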



--
This message was sent by Atlassian JIRA
(v6.1#6144)
