[ https://issues.apache.org/jira/browse/YARN-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhihai xu updated YARN-4209: ---------------------------- Description: RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by {{stateMachine.doTransition}}. The reason is {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded in {{stateMachine.doTransition}} called from public API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So right after the internal state transition from {{updateFencedState}} changes the state to FENCED state, the external state transition changes the state back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE state even {{notifyStoreOperationFailed}} is called. The only working case for FENCED state is {{notifyStoreOperationFailed}} called from {{ZKRMStateStore#VerifyActiveStatusThread}}. For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => {{notifyStoreOperationFailed}} =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} change state to FENCED => exit external {{stateMachine.doTransition}} change state to ACTIVE. was: RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by {{stateMachine.doTransition}}. The reason is {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded in {{stateMachine.doTransition}} called from public API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So right after the internal state transition from {{updateFencedState}} changes the state to FENCED state, the external state transition changes the state back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE state even notifyStoreOperationFailed is called. The only working case for FENCED state is {{notifyStoreOperationFailed}} called from {{ZKRMStateStore#VerifyActiveStatusThread}}. For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => {{notifyStoreOperationFailed}} =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} change state to FENCED => exit external {{stateMachine.doTransition}} change state to ACTIVE. > RMStateStore FENCED state doesn’t work due to updateFencedState called by > stateMachine.doTransition > --------------------------------------------------------------------------------------------------- > > Key: YARN-4209 > URL: https://issues.apache.org/jira/browse/YARN-4209 > Project: Hadoop YARN > Issue Type: Bug > Components: resourcemanager > Affects Versions: 2.7.2 > Reporter: zhihai xu > Assignee: zhihai xu > Priority: Critical > Attachments: YARN-4209.000.patch > > > RMStateStore FENCED state doesn’t work due to {{updateFencedState}} called by > {{stateMachine.doTransition}}. The reason is > {{stateMachine.doTransition}} called from {{updateFencedState}} is embedded > in {{stateMachine.doTransition}} called from public > API(removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So > right after the internal state transition from {{updateFencedState}} changes > the state to FENCED state, the external state transition changes the state > back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE > state even {{notifyStoreOperationFailed}} is called. The only working case > for FENCED state is {{notifyStoreOperationFailed}} called from > {{ZKRMStateStore#VerifyActiveStatusThread}}. > For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter > external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => > {{notifyStoreOperationFailed}} > =>{{updateFencedState}}=>{{handleStoreEvent}}=> enter internal > {{stateMachine.doTransition}} => exit internal {{stateMachine.doTransition}} > change state to FENCED => exit external {{stateMachine.doTransition}} change > state to ACTIVE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)