Rohith Sharma K S commented on YARN-4209:

I debugged your test case, and I got your point. Good catch!!

About the patch: moving {{updateFencedState();}} into StandByTransitionThread 
would solve the problem, but it does not always ensure that the stateMachine 
has moved to FENCED state, since the call is asynchronous. If any other events 
are competing for the write lock, the move to FENCED state might be delayed. 
How about changing to MultipleArcTransition, so that all the exception 
handling returns FENCED state? It also ensures state is in FENCED 
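
A rough sketch of the MultipleArcTransition idea, to make the suggestion concrete. It is not taken from the patch: {{MultiArc}}, {{removeToken}} and {{handle}} are hypothetical names, and {{MultiArc}} is a simplified local stand-in for the real YARN interface. The point is only that the transition hook itself returns the post-state, so the exception path can return FENCED directly without a nested transition.

```java
public class MultiArcSketch {
    enum State { ACTIVE, FENCED }

    // Simplified local stand-in (assumption) for a multiple-arc transition:
    // the hook decides the post-state and returns it.
    interface MultiArc<E> {
        State transition(E event);
    }

    static State state = State.ACTIVE;

    // Toy state machine: the post-state is whatever the hook returns.
    static void doTransition(MultiArc<String> arc, String event) {
        state = arc.transition(event);
    }

    // Hypothetical store operation that can fail.
    static void removeToken(String token, boolean failing) throws Exception {
        if (failing) {
            throw new Exception("store unavailable for " + token);
        }
    }

    // Runs one store event; a failure maps straight to FENCED.
    static State handle(String token, boolean failing) {
        state = State.ACTIVE;
        doTransition(ev -> {
            try {
                removeToken(ev, failing);
                return State.ACTIVE;   // success arc
            } catch (Exception e) {
                return State.FENCED;   // failure arc, no nested doTransition
            }
        }, token);
        return state;
    }

    public static void main(String[] args) {
        System.out.println(handle("token-1", false));
        System.out.println(handle("token-1", true));
    }
}
```

Because the state is assigned exactly once, by the outermost transition, there is no later assignment left to overwrite FENCED.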

> RMStateStore FENCED state doesn’t work due to updateFencedState called by 
> stateMachine.doTransition
> ---------------------------------------------------------------------------------------------------
>                 Key: YARN-4209
>                 URL: https://issues.apache.org/jira/browse/YARN-4209
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>            Priority: Critical
>         Attachments: YARN-4209.000.patch
> RMStateStore FENCED state doesn’t work because {{updateFencedState}} is 
> called from within {{stateMachine.doTransition}}. The reason is that the 
> {{stateMachine.doTransition}} call made by {{updateFencedState}} is nested 
> inside the {{stateMachine.doTransition}} call made by a public 
> API (removeRMDelegationToken...) or {{ForwardingEventHandler#handle}}. So 
> right after the internal state transition from {{updateFencedState}} changes 
> the state to FENCED state, the external state transition changes the state 
> back to ACTIVE state. The end result is that RMStateStore is still in ACTIVE 
> state even after {{notifyStoreOperationFailed}} is called. The only working 
> case for FENCED state is {{notifyStoreOperationFailed}} called from 
> {{ZKRMStateStore#VerifyActiveStatusThread}}.
> For example: {{removeRMDelegationToken}} => {{handleStoreEvent}} => enter 
> external {{stateMachine.doTransition}} => {{RemoveRMDTTransition}} => 
> {{notifyStoreOperationFailed}} => {{updateFencedState}} => 
> {{handleStoreEvent}} => enter internal {{stateMachine.doTransition}} => 
> exit internal {{stateMachine.doTransition}}, changing state to FENCED => 
> exit external {{stateMachine.doTransition}}, changing state back to ACTIVE.
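
The nesting described above can be reproduced with a toy model. This is only an illustration under assumptions, not the RMStateStore code: {{NestedTransitionDemo}}, its {{doTransition}} and {{runFailedOperation}} are made-up names, and the model assumes the state machine assigns the post-state only after the transition hook returns.

```java
import java.util.function.Supplier;

public class NestedTransitionDemo {
    enum State { ACTIVE, FENCED }

    static State state = State.ACTIVE;

    // Toy state machine (assumption): the new state is assigned only
    // after the transition hook has returned.
    static void doTransition(Supplier<State> hook) {
        State next = hook.get();   // the hook may re-enter doTransition
        state = next;              // the outermost assignment runs last
    }

    // Simulates a store operation whose failure handler fences the store
    // from inside the transition that is still in progress.
    static State runFailedOperation() {
        state = State.ACTIVE;
        doTransition(() -> {
            // internal transition: sets state to FENCED
            doTransition(() -> State.FENCED);
            // the external transition has a fixed post-state of ACTIVE
            return State.ACTIVE;
        });
        return state;
    }

    public static void main(String[] args) {
        System.out.println(runFailedOperation()); // prints ACTIVE
    }
}
```

The internal call does set FENCED, but the external call then assigns its own post-state, so the store ends up ACTIVE again.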
