[
https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kumar Vavilapalli reassigned YARN-1861:
---------------------------------------------
Assignee: Karthik Kambatla (was: Xuan Gong)
Looking at this as it is blocking 2.4.1.
Assigning to Karthik given he has done the core code change. Will credit Xuan
too.
I tried applying just the test case and running it without the core change. I
was expecting the active RM to go to standby and the standby RM to go to active
once the originally active RM is fenced. Instead I get an NPE somewhere. Can the
test be fixed so that it behaves that way?
Also, we need to make sure that when automatic failover is enabled, all
external interventions, such as the fencing in this bug (and forced manual
failover from the CLI?), do a similar reset into the leader election. There may
not be cases like this today, though.
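
To make the "reset into the leader election" point concrete, here is a rough
toy model, not the actual patch. Every name in it (FenceResetSketch, ToyRm,
ToyElector, onFenced, resetElection) is hypothetical and none of it is Hadoop
API. It assumes an elector that only promotes a candidate when it has no
current leader, so a fenced RM that drops to standby without giving up
leadership leaves both RMs in standby.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class FenceResetSketch {
    enum HaState { ACTIVE, STANDBY }

    /** Hypothetical RM: only tracks its HA state. */
    static class ToyRm {
        HaState state = HaState.STANDBY;

        /** Fencing always demotes; optionally it also resets election membership. */
        void onFenced(ToyElector elector, boolean resetElection) {
            state = HaState.STANDBY;
            if (resetElection) {
                elector.quit(this);   // give up leadership so another RM can win
                elector.join(this);   // rejoin as an ordinary candidate
            }
        }
    }

    /** Hypothetical elector: the first candidate in the queue leads. */
    static class ToyElector {
        private final Deque<ToyRm> candidates = new ArrayDeque<>();
        private ToyRm leader;

        void join(ToyRm rm) {
            candidates.addLast(rm);
            elect();
        }

        void quit(ToyRm rm) {
            candidates.remove(rm);
            if (leader == rm) {
                leader = null;
            }
            elect();
        }

        // Promotes a candidate only when there is no current leader.
        private void elect() {
            if (leader == null && !candidates.isEmpty()) {
                leader = candidates.peekFirst();
                leader.state = HaState.ACTIVE;
            }
        }
    }

    /** Runs the scenario; returns {rm1 state, rm2 state} after rm1 is fenced. */
    static HaState[] run(boolean resetElection) {
        ToyElector elector = new ToyElector();
        ToyRm rm1 = new ToyRm();
        ToyRm rm2 = new ToyRm();
        elector.join(rm1);   // rm1 wins and becomes active
        elector.join(rm2);   // rm2 waits in standby
        rm1.onFenced(elector, resetElection);
        return new HaState[] { rm1.state, rm2.state };
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + Arrays.toString(run(false)));
        System.out.println("with reset:    " + Arrays.toString(run(true)));
    }
}
```

In this model, run(false) leaves both RMs in standby, which is the symptom
reported here, while run(true) ends with the second RM active. Any other
external intervention would have to go through the same quit-and-rejoin path.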
> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>
> Key: YARN-1861
> URL: https://issues.apache.org/jira/browse/YARN-1861
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Affects Versions: 2.4.0
> Reporter: Arpit Gupta
> Assignee: Karthik Kambatla
> Priority: Blocker
> Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch,
> YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
>
>
> In our HA tests we noticed that the tests got stuck because both RMs got
> into standby state and neither became active.