[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988573#comment-13988573 ]

Tsuyoshi OZAWA commented on YARN-1861:
--------------------------------------

> We should call rm.adminService.resetLeaderElection() in the finally block. If
> rm.transitionToStandby() fails while stopping the RM's services, all RMs can get stuck.

Sorry, I noticed this is wrong. If rm.transitionToStandby() fails, the RM can get stuck
until the ZK server detects the failure. We could call EmbeddedElectorService.stop()
in the exception handler to shut down gracefully, but this is only one option.
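A minimal Java sketch of the exception-handler option. StandbyTransition, FakeElector and becomeStandby are hypothetical stand-ins, not the actual ResourceManager or EmbeddedElectorService API. The sketch assumes that stopping the elector makes this RM leave the leader election right away, so the other RM does not have to wait for the ZK server to detect the failure.

```java
import java.util.ArrayList;
import java.util.List;

public class StandbyTransitionSketch {

    /** Hypothetical stand-in for rm.transitionToStandby(); may fail while stopping services. */
    interface StandbyTransition {
        void transitionToStandby() throws Exception;
    }

    /** Hypothetical stand-in for EmbeddedElectorService; records which calls were made. */
    static class FakeElector {
        final List<String> calls = new ArrayList<>();

        // Assumption: in the real service, stop() quits the election and
        // closes the ZK connection so the other RM can become active.
        void stop() {
            calls.add("stop");
        }
    }

    /**
     * Tries to move to standby. On failure, stops the elector in the
     * exception handler instead of leaving this RM in the election.
     * Returns true if the transition succeeded.
     */
    static boolean becomeStandby(StandbyTransition rm, FakeElector elector) {
        try {
            rm.transitionToStandby();
            return true;
        } catch (Exception e) {
            elector.stop(); // give up our place in the election immediately
            return false;
        }
    }

    public static void main(String[] args) {
        FakeElector ok = new FakeElector();
        boolean r1 = becomeStandby(() -> { }, ok);

        FakeElector bad = new FakeElector();
        boolean r2 = becomeStandby(() -> {
            throw new IllegalStateException("service stop failed");
        }, bad);

        // prints "true [] false [stop]"
        System.out.println(r1 + " " + ok.calls + " " + r2 + " " + bad.calls);
    }
}
```

Unlike resetting the leader election in a finally block, this only touches the elector on the failure path; a successful transition leaves the election state alone.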

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Xuan Gong
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went
> into standby state and neither became active.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
