RE: YARN HA Active ResourceManager failover when machine is stopped

Rohith Sharma K S Sun, 26 Apr 2015 21:39:07 -0700

Hi

     I have seen this issue in my cluster, without HA configured, when the 
process is halted.  I assume your scenario hits a similar issue when the active 
RM machine is shut down abruptly.  Maybe you can verify this by taking a thread 
dump of the NM and comparing it with the JIRAs below.


Open JIRAs in the community regarding this problem are:
https://issues.apache.org/jira/i#browse/YARN-1061 (Without HA)
https://issues.apache.org/jira/i#browse/YARN-2578 (With HA)


Thanks & Regards
Rohith Sharma K S

From: Matt Narrell [mailto:[email protected]]
Sent: 24 April 2015 23:28
To: [email protected]
Subject: Re: YARN HA Active ResourceManager failover when machine is stopped

Also, another observation is that when the VMs are halted, it seems the 
NodeManagers do not treat this as a scenario in which to round-robin among the 
configured ResourceManagers.  Is there some timeout that I’ve missed that 
instructs the NodeManagers to round-robin when the machine is not responding 
(to distinguish it from a network blip)?
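
For illustration only, the behaviour being asked about can be sketched as 
follows: a client walks the configured ResourceManagers in round-robin order, 
with a per-attempt connect timeout so that a halted machine (no reply at all) 
is skipped just like a machine that refuses the connection.  This is a minimal 
Python sketch, not the NodeManager's actual retry logic.  The host names, the 
port, and the helpers `probe` and `next_reachable` are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Address = Tuple[str, int]  # (host, port); values used below are placeholders


def probe(addr: Address, timeout_s: float = 2.0) -> str:
    """Classify one TCP connect attempt as 'ok', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection(addr, timeout=timeout_s):
            return "ok"
    except ConnectionRefusedError:
        # Host is up but nothing listens on the port: fails immediately.
        return "refused"
    except socket.timeout:
        # Host did not answer at all (e.g. a halted VM): fails after timeout_s.
        return "timeout"
    except OSError:
        # Anything else, such as a DNS failure or an unreachable network.
        return "error"


def next_reachable(
    addrs: Sequence[Address],
    start: int,
    probe_fn: Callable[[Address], str] = probe,
) -> Optional[int]:
    """Round-robin from index `start`; return the first reachable index, else None."""
    for step in range(len(addrs)):
        i = (start + step) % len(addrs)
        if probe_fn(addrs[i]) == "ok":
            return i
    return None


if __name__ == "__main__":
    # Hypothetical RM addresses; replace with the ones from your configuration.
    rms = [("rm1.example.com", 8031), ("rm2.example.com", 8031)]
    print(next_reachable(rms, start=0))
```

Without a per-attempt timeout, a connect attempt against a halted machine can 
block for as long as the operating system's own TCP connect timeout, whereas a 
machine that is up with the service stopped refuses the connection right away.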

mn

On Apr 24, 2015, at 1:50 AM, Drake민영근 
<[email protected]<mailto:[email protected]>> wrote:

Hi, Matt

The second log file looks like a NodeManager log, not the standby 
ResourceManager's.

Thanks.

Drake 민영근 Ph.D
kt NexR

On Fri, Apr 24, 2015 at 11:39 AM, Matt Narrell 
<[email protected]<mailto:[email protected]>> wrote:
Active ResourceManager:  http://pastebin.com/hE0ppmnb
Standby ResourceManager: http://pastebin.com/DB8VjHqA

Both are oppressively chatty, with not much valuable info in them.


On Apr 23, 2015, at 4:25 PM, Vinod Kumar Vavilapalli 
<[email protected]<mailto:[email protected]>> wrote:

I have run into this offline with someone else too but couldn't root-cause it.

Will you be able to share your active/standby ResourceManager logs via pastebin 
or something?

+Vinod

On Apr 23, 2015, at 9:41 AM, Matt Narrell 
<[email protected]<mailto:[email protected]>> wrote:


I’m using Hadoop 2.6.0 from HDP 2.2.4 installed via Ambari 2.0

I’m testing the YARN HA ResourceManager failover. If I STOP the active 
ResourceManager (shut the machine off), the standby ResourceManager is elected 
to active, but the NodeManagers do not register themselves with the newly 
elected active ResourceManager. If I restart the machine (but DO NOT resume the 
YARN services) the NodeManagers register with the newly elected ResourceManager 
and my jobs resume. I assume I have some bad configuration, as this produces a 
SPOF, and is not HA in the sense I’m expecting.
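
One way to confirm which ResourceManager reports itself as active after such a 
test is to ask each one directly.  The sketch below is Python and makes several 
assumptions: that each RM serves `/ws/v1/cluster/info` on its web port, that 
the response carries a `clusterInfo.haState` field in this Hadoop version, and 
that the host names and port 8088 are placeholders.  The helper names 
`parse_ha_state` and `fetch_ha_states` are hypothetical.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional


def parse_ha_state(body: str) -> Optional[str]:
    """Extract clusterInfo.haState from a cluster-info JSON document, if present."""
    return json.loads(body).get("clusterInfo", {}).get("haState")


def fetch_ha_states(
    hosts: Iterable[str], port: int = 8088, timeout_s: float = 3.0
) -> Dict[str, Optional[str]]:
    """Query each host; map it to its reported HA state, or None if unavailable."""
    states: Dict[str, Optional[str]] = {}
    for host in hosts:
        url = f"http://{host}:{port}/ws/v1/cluster/info"
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                states[host] = parse_ha_state(resp.read().decode("utf-8"))
        except (OSError, ValueError):
            # OSError: unreachable, refused, or timed out.
            # ValueError: the response body was not valid JSON.
            states[host] = None
    return states


if __name__ == "__main__":
    # Hypothetical host names; replace with your two ResourceManagers.
    print(fetch_ha_states(["rm1.example.com", "rm2.example.com"]))
```

A host that maps to `None` here was either unreachable or did not return the 
expected JSON; the sketch does not distinguish between the two.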

Thanks,
mn
