Hi YARN devs,
I am working on the ZKRMStateStore, and had a very basic question - on RM
failure, how long does the AM fail before crashing, or more importantly
what controls it.
Looking into the code, I see the following two parameters:
1. yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms - set to
1 min
2. Fix configs
yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
- set by default to 15 mins and 30 seconds respectively
The AM crashes only after 20 minutes.
Are there any other configs that influence this?
Thanks
Karthik