By default, the RMProxy code uses 15 minutes for connect.max-wait, but the AM gives up trying to connect only after 20 minutes. I wonder where the additional 5 minutes come from. Let me run it again and see.

Also, 15 minutes seems a little excessive, compared to other similar timeouts being 10 mins. I can fix this as part of YARN-1056 if you agree we should bring it down.

Thanks
Karthik

On Mon, Aug 12, 2013 at 10:22 AM, Bikas Saha <[email protected]> wrote:

> You should probably look at the RMProxy code and the configs it uses. I am
> hoping that all clients including the MR AM now use that proxy and so
> older configs are no longer valid.
>
> Bikas
>
> -----Original Message-----
> From: Karthik Kambatla [mailto:[email protected]]
> Sent: Sunday, August 11, 2013 8:45 PM
> To: [email protected]
> Subject: AM timeout on RM failure?
>
> Hi YARN devs,
>
> I am working on the ZKRMStateStore, and had a very basic question - on RM
> failure, how long does the AM fail before crashing, or more importantly
> what controls it.
>
> Looking into the code, I see the following two parameters:
>
> 1. yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms - set to
> 1 min
> 2. Fix configs
>
> yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
> - set by default to 15 mins and 30 seconds respectively
>
> The AM crashes only after 20 minutes.
>
> Are there any other configs that influence this?
>
> Thanks
> Karthik
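
One possible source of the gap between the configured 15 minutes and the observed 20 can be sketched with some arithmetic. This is only an illustration under two assumptions: that the retry policy turns the max wait into a fixed number of attempts (max wait divided by retry interval) instead of tracking elapsed time, and that each failed connection attempt takes some time of its own before the sleep starts. The function names and the 10-second per-attempt cost are hypothetical, picked for illustration and not measured or taken from the RMProxy code.

```python
def retry_budget(max_wait_s: int, retry_interval_s: int) -> int:
    """Number of attempts if the max wait is converted into a retry count.

    Assumption: the policy counts retries, not elapsed wall-clock time.
    """
    return max_wait_s // retry_interval_s


def total_wall_clock_s(max_wait_s: int, retry_interval_s: int,
                       per_attempt_s: int) -> int:
    """Total time before giving up, if every attempt costs per_attempt_s
    (hypothetical time spent failing to connect) plus the fixed sleep."""
    attempts = retry_budget(max_wait_s, retry_interval_s)
    return attempts * (retry_interval_s + per_attempt_s)


if __name__ == "__main__":
    max_wait_s = 15 * 60       # default max wait from the thread: 15 mins
    retry_interval_s = 30      # default retry interval from the thread: 30 s
    per_attempt_s = 10         # hypothetical cost of one failed attempt

    attempts = retry_budget(max_wait_s, retry_interval_s)
    total = total_wall_clock_s(max_wait_s, retry_interval_s, per_attempt_s)
    print(f"attempts: {attempts}")
    print(f"total wait: {total / 60:.1f} minutes")
```

Under these assumptions the sleeps alone add up to the configured 15 minutes, and any time spent inside each failed attempt comes on top of that. Whether this is what actually happens in the AM would need to be confirmed against the RMProxy retry policy and a rerun.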
