Vinod Kumar Vavilapalli commented on YARN-1861:

bq. Without the core code change, this testcase will fail. Because NM is trying 
to connect the active RM, but neither of two RMs are active. So, the NPE is 
Can we make this explicit, instead of being an NPE? Like doing a client call to 
find the current active RM or something like that?
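
A minimal sketch of what "explicit" could look like, assuming the caller can obtain each RM's HA state up front. Every name here (ActiveRmLookup, HaState, NoActiveRmException, requireActive) is hypothetical and is not part of the YARN codebase or of the patch under review; the map stands in for whatever client call would report RM state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustration only: hypothetical types, not YARN's actual API.
final class ActiveRmLookup {

    // Simplified HA states for the sketch.
    enum HaState { ACTIVE, STANDBY }

    // Thrown when no RM is active, so the failure names its cause
    // instead of surfacing later as a NullPointerException.
    static final class NoActiveRmException extends RuntimeException {
        NoActiveRmException(String message) {
            super(message);
        }
    }

    // Returns the id of the first RM reported as ACTIVE, if any.
    static Optional<String> findActive(Map<String, HaState> states) {
        return states.entrySet().stream()
                .filter(e -> e.getValue() == HaState.ACTIVE)
                .map(Map.Entry::getKey)
                .findFirst();
    }

    // Returns the active RM id, or fails with a descriptive exception.
    static String requireActive(Map<String, HaState> states) {
        return findActive(states).orElseThrow(() -> new NoActiveRmException(
                "No active RM among " + states.keySet() + "; all are in standby"));
    }

    public static void main(String[] args) {
        Map<String, HaState> states = new LinkedHashMap<>();
        states.put("rm1", HaState.STANDBY);
        states.put("rm2", HaState.STANDBY);
        try {
            requireActive(states);
        } catch (NoActiveRmException e) {
            // The test (or the NM) sees an explicit reason, not an NPE.
            System.out.println("Explicit failure: " + e.getMessage());
        }
    }
}
```

With something along these lines, the testcase could assert on the explicit exception when both RMs are in standby, rather than depending on an NPE.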

Thanks for the explanation of all the cases, Xuan.

bq. That looks hacky, but doesn't require new external interventions to 
explicitly handle it. Vinod Kumar Vavilapalli - do you think that would be a 
better approach?
That is what I was thinking, but I am concerned about locking etc. This code 
has become a little convoluted. Per Xuan, we seem to be safe for now, so maybe 
we can look at this separately?

> Both RM stuck in standby mode when automatic failover is enabled
> ----------------------------------------------------------------
>                 Key: YARN-1861
>                 URL: https://issues.apache.org/jira/browse/YARN-1861
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Arpit Gupta
>            Assignee: Karthik Kambatla
>            Priority: Blocker
>         Attachments: YARN-1861.2.patch, YARN-1861.3.patch, YARN-1861.4.patch, 
> YARN-1861.5.patch, yarn-1861-1.patch, yarn-1861-6.patch
> In our HA tests we noticed that the tests got stuck because both RMs got 
> into standby state and neither became active.

This message was sent by Atlassian JIRA