[
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995743#comment-13995743
]
Karthik Kambatla commented on YARN-1861:
----------------------------------------
bq. Also, we need to make sure that when automatic failover is enabled, all
external interventions like a fence like this bug (and forced-manual failover
from CLI?) do a similar reset into the leader election. There may not be cases
like this today though.
One way to future-proof this is to call resetLeaderElection in
ResourceManager#transitionToStandby itself. That looks hacky, but it means new
external interventions would not have to handle the reset explicitly.
[~vinodkv], do you think that would be a better approach?
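Below is a minimal Java sketch of the proposal, for illustration only. The Elector class, its recorded calls list, and the simplified ResourceManager are hypothetical stand-ins, not the actual Hadoop classes. Only the names resetLeaderElection and transitionToStandby come from this comment. The point it shows is that the reset lives in the single place every path into standby goes through. That way callers such as a fence or a forced manual failover do not each have to remember to do it.

```java
import java.util.ArrayList;
import java.util.List;

public class TransitionToStandbySketch {

    /** Hypothetical stand-in for the RM's embedded leader elector (not the Hadoop class). */
    static class Elector {
        final boolean automaticFailoverEnabled;
        // Records the (hypothetical) calls made, so the behavior is easy to inspect.
        final List<String> calls = new ArrayList<>();

        Elector(boolean automaticFailoverEnabled) {
            this.automaticFailoverEnabled = automaticFailoverEnabled;
        }

        /** Leave the election and rejoin it, giving the other RM a chance to win. */
        void resetLeaderElection() {
            calls.add("quitElection");
            calls.add("joinElection");
        }
    }

    /** Hypothetical, heavily simplified ResourceManager. */
    static class ResourceManager {
        enum HAState { ACTIVE, STANDBY }

        HAState state = HAState.ACTIVE;
        final Elector elector;

        ResourceManager(Elector elector) {
            this.elector = elector;
        }

        void transitionToStandby() {
            state = HAState.STANDBY;
            // The proposal: reset the election here, so every caller (fencing,
            // forced manual failover, future interventions) gets it for free.
            if (elector.automaticFailoverEnabled) {
                elector.resetLeaderElection();
            }
        }
    }

    public static void main(String[] args) {
        Elector elector = new Elector(true);
        ResourceManager rm = new ResourceManager(elector);
        rm.transitionToStandby(); // e.g. triggered by a fence
        System.out.println(rm.state + " " + elector.calls);
    }
}
```

Running main prints `STANDBY [quitElection, joinElection]`. The trade-off is the one noted above: transitionToStandby now reaches into election logic, which is the hacky part, in exchange for not relying on each intervention to reset the election itself.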
> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch,
> YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs went
> into standby state and neither became active.