> On 3 Jun 2015, at 22:01, Henry Saputra <[email protected]> wrote:
>
> Hi All,
>
> I would like to know if "yarn.resourcemanager.am.max-attempts" config
> parameter will make the already running ApplicationMaster (AM) to have
> HA mode in YARN once it is already running?
>
If you can reconfigure the RM and restart it, the value will be picked up by the RM (rolling upgrades and an HA cluster let you do that).

For long-lived services, you should set the cluster up with a window for failures, so that sporadic, intermittent failures don't kill the app.

> Meaning that if the running AM process dies (through PermGen, OOM, or
> kill JVM with kill signal) then ResourceManager (RM) should be able to
> restart the number of times specified by
> "yarn.resourcemanager.am.max-attempts" config value?

Yes, though it's a "start counter", not a restart counter. That first run counts as attempt #1.

>
> I was trying it and it seems like there was an attempt to restart
> the AppMaster but it dies immediately.
>

With the default cluster max-attempts value of 2, two failures in a row are enough to kill the app.

In https://issues.apache.org/jira/browse/YARN-2392 I have a patch to give you more details on count-exceeded values: global and app limits, plus window details.
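To make the counting concrete, here is a minimal Python model of the behaviour described above: the first launch is attempt #1, each AM failure is counted, and once the count of failures reaches the limit no new attempt is started. The optional window lets old failures age out so sporadic crashes don't add up. `AttemptTracker`, `record_failure` and `validity_window_ms` are hypothetical names for illustration only, not Hadoop APIs, and the exact boundary handling of the window is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AttemptTracker:
    """Hypothetical model of an RM deciding whether to start another AM attempt.

    max_attempts counts *starts*: the initial launch is attempt #1.
    If validity_window_ms is set (an assumption modelling the "window for
    failures"), only failures newer than the window count toward the limit.
    """
    max_attempts: int = 2
    validity_window_ms: Optional[int] = None
    failure_times_ms: List[int] = field(default_factory=list)
    attempts_started: int = 1  # the first run already counts as attempt #1

    def record_failure(self, now_ms: int) -> bool:
        """Record an AM failure at now_ms; return True if a new attempt is started."""
        self.failure_times_ms.append(now_ms)
        if self.validity_window_ms is None:
            # No window: every failure since submission counts.
            counted = len(self.failure_times_ms)
        else:
            # Only failures strictly inside the trailing window count.
            cutoff = now_ms - self.validity_window_ms
            counted = sum(1 for t in self.failure_times_ms if t > cutoff)
        if counted >= self.max_attempts:
            return False  # limit reached: the application is failed
        self.attempts_started += 1
        return True


# Default of 2, no window: the second failure in a row kills the app.
app = AttemptTracker(max_attempts=2)
print(app.record_failure(0))      # True: attempt #2 is started
print(app.record_failure(1_000))  # False: the app is failed

# Same limit with a 60s window: spaced-out failures are tolerated.
svc = AttemptTracker(max_attempts=2, validity_window_ms=60_000)
print(svc.record_failure(0))        # True
print(svc.record_failure(120_000))  # True: the failure at t=0 has aged out
print(svc.record_failure(130_000))  # False: two failures within 60s
```

The window is what makes a long-lived service survive: without it, any two crashes over the lifetime of the app exhaust a limit of 2.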
