[ 
https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817560#comment-13817560
 ] 

Bikas Saha commented on YARN-1222:
----------------------------------

Quick comments
1) The new event does not follow the convention we have for events. Events are 
grouped by their destination, i.e. the handler, so all RMStateStoreEvents are 
handled by the state store. We now have a new class of events that are handled 
by the ResourceManager, so we should not overload the RMStateStoreEvents. Let's 
create a new type that is handled by the new handler in the ResourceManager. 
When HA is enabled, on exception we should transitionToStandby() but not exit. 
When HA is not enabled, we should die like we currently do.

2) I don't quite get why the ResourceManager would send a failed_store event 
back to the store that sent it to the RM in the first place. From 1) above, the 
RM should either transitionToStandby or die when it gets that event.
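
To make 1) and 2) concrete, here is a rough, standalone sketch of the shape I 
have in mind. It is not taken from the patch: the names RMFatalEventType, 
RMFatalEvent, RMFatalEventHandler and Action are hypothetical, and the handler 
only returns the chosen action instead of really transitioning to standby or 
exiting, so that the sketch can be run on its own.

```java
public class FatalEventSketch {

    // Hypothetical event type whose destination is the ResourceManager,
    // kept separate from the event types handled by the state store.
    enum RMFatalEventType { STATE_STORE_OP_FAILED }

    // What the RM-side handler decides to do. A real handler would call
    // transitionToStandby() or exit the process; this sketch only reports it.
    enum Action { TRANSITION_TO_STANDBY, EXIT }

    // Hypothetical event sent by the store to the ResourceManager.
    static final class RMFatalEvent {
        final RMFatalEventType type;
        final Exception cause;

        RMFatalEvent(RMFatalEventType type, Exception cause) {
            this.type = type;
            this.cause = cause;
        }
    }

    // Hypothetical handler living in the ResourceManager. Note that it never
    // sends anything back to the store that raised the event.
    static final class RMFatalEventHandler {
        private final boolean haEnabled;

        RMFatalEventHandler(boolean haEnabled) {
            this.haEnabled = haEnabled;
        }

        Action handle(RMFatalEvent event) {
            // HA enabled: give up the active role but keep the process alive.
            // HA disabled: die, as the RM does today.
            return haEnabled ? Action.TRANSITION_TO_STANDBY : Action.EXIT;
        }
    }

    public static void main(String[] args) {
        RMFatalEvent event = new RMFatalEvent(
            RMFatalEventType.STATE_STORE_OP_FAILED,
            new Exception("store operation failed"));

        System.out.println(new RMFatalEventHandler(true).handle(event));
        System.out.println(new RMFatalEventHandler(false).handle(event));
    }
}
```

The point of the separate type is that the destination decides the grouping: 
the store raises the event, the ResourceManager consumes it, and nothing flows 
back to the store.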

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, 
> yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch
>
>
> Using multi-operations for every ZK interaction. 
> In every operation, automatically creating/deleting a lock znode that is the 
> child of the root znode. This is to achieve fencing by modifying the 
> create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
