> On 3 Jun 2015, at 22:01, Henry Saputra <[email protected]> wrote:
> 
> Hi All,
> 
> I would like to know if "yarn.resourcemanager.am.max-attempts" config
> parameter will make the already running ApplicationMaster (AM) to have
> HA mode in YARN once it is already running?
> 

if you can reconfigure the RM and restart it, the RM will pick up the new 
value (rolling upgrades and an HA cluster let you do that)

for long-lived services, you should have the cluster set up with a window for 
failures, so that sporadic, intermittent failures don't kill the app.
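
to make the window idea concrete, here is a minimal sketch of a sliding failure 
window. It is a simplified model, not the RM's actual code: the class name, 
fields and the ">= limit" check are my assumptions for illustration. As far as 
I know, an app asks for such a window at submission time via 
ApplicationSubmissionContext.setAttemptFailuresValidityInterval().

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified model of a failure window; NOT the ResourceManager's real logic. */
class FailureWindow {
    private final int maxAttempts;    // hypothetical per-app attempt limit
    private final long windowMillis;  // hypothetical window; <= 0 means "no window"
    private final Deque<Long> failures = new ArrayDeque<>();

    FailureWindow(int maxAttempts, long windowMillis) {
        this.maxAttempts = maxAttempts;
        this.windowMillis = windowMillis;
    }

    /** Records a failure at nowMillis; returns true if the app should be failed. */
    boolean recordFailure(long nowMillis) {
        failures.addLast(nowMillis);
        if (windowMillis > 0) {
            // Forget failures that are older than the window.
            while (nowMillis - failures.peekFirst() > windowMillis) {
                failures.removeFirst();
            }
        }
        return failures.size() >= maxAttempts;
    }

    public static void main(String[] args) {
        FailureWindow windowed = new FailureWindow(2, 60_000L);
        System.out.println(windowed.recordFailure(0L));
        System.out.println(windowed.recordFailure(120_000L));
    }
}
```

with a limit of 2 and a one-minute window, two failures two minutes apart do 
not add up, because the first has aged out by the time the second arrives. 
Without a window, the same two failures would exhaust the limit.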

> Meaning that if the running AM process dies (through permgen, OOM, or
> kill JVM with kill signal) then ResourceManager (RM) should be able to
> restart the number of times specified by
> "yarn.resourcemanager.am.max-attempts" config value ?

yes, though it's a "start counter", not a restart counter: the first run 
counts as attempt #1

> 
> I was trying it and it seems like there was an attempt to restart
> the AppMaster but it dies immediately.
> 

with the default yarn.resourcemanager.am.max-attempts value of 2, two failures 
in a row are enough to kill the app.
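
the counting can be sketched like this. Again a simplified model under my own 
assumptions (the class and method names are made up for illustration, this is 
not RM code): the limit bounds starts, so the first launch uses up one of them.

```java
/** Simplified model of AM attempt counting; NOT the ResourceManager's real logic. */
class AttemptCounter {
    private final int maxAttempts;    // e.g. the max-attempts value in force
    private int attemptsStarted = 0;  // counts every start, including the first

    AttemptCounter(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Starts a new attempt if the limit allows it; returns false once exhausted. */
    boolean tryStart() {
        if (attemptsStarted >= maxAttempts) {
            return false;
        }
        attemptsStarted++;
        return true;
    }

    /** Restarts available after the initial launch. */
    int restartsAllowed() {
        return Math.max(0, maxAttempts - 1);
    }

    public static void main(String[] args) {
        AttemptCounter counter = new AttemptCounter(2);
        System.out.println("restarts allowed: " + counter.restartsAllowed());
        while (counter.tryStart()) {
            System.out.println("attempt started");
        }
        System.out.println("limit reached, app fails");
    }
}
```

so a limit of 2 means one initial launch plus a single restart; if that 
restart also dies, there is nothing left.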

In https://issues.apache.org/jira/browse/YARN-2392 I have a patch that gives 
you more detail when the count is exceeded: global and app limits, plus window 
details.
