Karthik Kambatla commented on YARN-1861:

bq. Also, we need to make sure that when automatic failover is enabled, all 
external interventions like a fence like this bug (and forced-manual failover 
from CLI?) do a similar reset into the leader election. There may not be cases 
like this today though.
One way to future-proof this is to call resetLeaderElection in 
ResourceManager#transitionToStandby itself. That looks hacky, but doesn't 
require new external interventions to explicitly handle it. [~vinodkv] - do you 
think that would be a better approach?

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
> In our HA tests we noticed that the tests got stuck because both RM's got 
> into standby state and no one became active.

This message was sent by Atlassian JIRA

Reply via email to