[ https://issues.apache.org/jira/browse/YARN-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817560#comment-13817560 ]
Bikas Saha commented on YARN-1222:
----------------------------------

Quick comments

1) The new event does not follow our convention for events. Events are grouped by their destination, i.e. the handler, so all RMStateStoreEvents are handled by the state store. We now have a new class of events that is handled by the ResourceManager, so we should not overload RMStateStoreEvent. Let's create a new event type that is handled by the new handler in the ResourceManager. When HA is enabled, on an exception we should transitionToStandby() but not exit. When HA is not enabled, we should die as we currently do.

2) I don't quite get why the ResourceManager would send a failed_store event back to the store, which sent it to the RM in the first place. Per 1) above, the RM should either transitionToStandby() or die when it gets that event.

> Make improvements in ZKRMStateStore for fencing
> -----------------------------------------------
>
>                 Key: YARN-1222
>                 URL: https://issues.apache.org/jira/browse/YARN-1222
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Karthik Kambatla
>         Attachments: yarn-1222-1.patch, yarn-1222-2.patch, yarn-1222-3.patch, yarn-1222-4.patch, yarn-1222-5.patch, yarn-1222-6.patch
>
>
> Using multi-operations for every ZK interaction.
> In every operation, automatically creating/deleting a lock znode that is the child of the root znode. This is to achieve fencing by modifying the create/delete permissions on the root znode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
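
For illustration, the handling suggested in comment 1) could look roughly like the sketch below. It is not taken from any of the attached patches: RMFailureEvent, RMFailureEventType, RMFailureEventHandler and the haEnabled flag are hypothetical names, and the handler only records the action it would take instead of calling into a real ResourceManager or exiting the JVM.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every name in this file is hypothetical, not the YARN API.
final class FailureEventSketch {

    /** A separate event type, so the state store's own events are not overloaded. */
    enum RMFailureEventType { STORE_OPERATION_FAILED }

    /** Event sent by the state store to the ResourceManager-side handler. */
    static final class RMFailureEvent {
        final RMFailureEventType type;
        final Exception cause;

        RMFailureEvent(RMFailureEventType type, Exception cause) {
            this.type = type;
            this.cause = cause;
        }
    }

    /** Stand-in for the new handler that lives in the ResourceManager. */
    static final class RMFailureEventHandler {
        private final boolean haEnabled;
        /** Records decisions so the sketch can run without a real RM. */
        final List<String> actions = new ArrayList<>();

        RMFailureEventHandler(boolean haEnabled) {
            this.haEnabled = haEnabled;
        }

        void handle(RMFailureEvent event) {
            if (haEnabled) {
                // HA enabled: step down but keep the process alive.
                actions.add("transitionToStandby");
            } else {
                // HA disabled: a real RM would terminate here.
                actions.add("exit");
            }
        }
    }
}
```

With haEnabled set to true, handling a STORE_OPERATION_FAILED event records "transitionToStandby"; with it set to false, the handler records "exit". In neither case is an event sent back to the store, which is the point raised in comment 2).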