By default, the RMProxy code uses 15 minutes for connect.max-wait, but the
AM gives up trying to connect only after 20 minutes. I wonder where the
additional 5 minutes come from. Let me run it again and see.
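
One way the extra time could arise, as a back-of-the-envelope sketch only:
this assumes (not verified against RMProxy) that the max wait is converted
into a fixed number of retries, and that each failed connection attempt
takes some time of its own before the sleep starts. The class, the method
names, and the 10-second per-attempt cost are hypothetical and chosen only
to illustrate the arithmetic.

```java
public class RetryBudget {
    /** Number of attempts if the max wait is turned into a retry count (assumed behavior). */
    public static long attempts(long maxWaitMs, long retryIntervalMs) {
        return maxWaitMs / retryIntervalMs;
    }

    /** Wall-clock time when every failed attempt costs attemptCostMs on top of the sleep. */
    public static long elapsedMs(long maxWaitMs, long retryIntervalMs, long attemptCostMs) {
        return attempts(maxWaitMs, retryIntervalMs) * (retryIntervalMs + attemptCostMs);
    }

    public static void main(String[] args) {
        long maxWaitMs = 15 * 60 * 1000L;   // 15 min default mentioned above
        long retryIntervalMs = 30 * 1000L;  // 30 s default retry interval
        long attemptCostMs = 10 * 1000L;    // hypothetical cost of one failed attempt

        System.out.println(attempts(maxWaitMs, retryIntervalMs) + " attempts");
        System.out.println(elapsedMs(maxWaitMs, retryIntervalMs, attemptCostMs) / 60000 + " minutes");
    }
}
```

With these made-up numbers, 30 attempts at 40 seconds each add up to 20
minutes, so a per-attempt delay of that size would be enough to explain the
gap. Whether that is what actually happens still needs to be checked.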

Also, 15 minutes seems a little excessive, given that other similar
timeouts are 10 minutes. I can fix this as part of YARN-1056 if you agree
we should bring it down.

Thanks
Karthik


On Mon, Aug 12, 2013 at 10:22 AM, Bikas Saha <[email protected]> wrote:

> You should probably look at the RMProxy code and the configs it uses. I am
> hoping that all clients including the MR AM now use that proxy and so
> older configs are no longer valid.
>
> Bikas
>
> -----Original Message-----
> From: Karthik Kambatla [mailto:[email protected]]
> Sent: Sunday, August 11, 2013 8:45 PM
> To: [email protected]
> Subject: AM timeout on RM failure?
>
> Hi YARN devs,
>
> I am working on the ZKRMStateStore, and had a very basic question: on RM
> failure, how long does the AM wait before crashing, and more importantly,
> what controls it?
>
> Looking into the code, I see the following two parameters:
>
>    1. yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms - set to
>    1 min
>    2. yarn.resourcemanager.resourcemanager.connect.{max.wait.secs|retry_interval.secs}
>    - set by default to 15 mins and 30 seconds respectively
>
> The AM crashes only after 20 minutes.
>
> Are there any other configs that influence this?
>
> Thanks
> Karthik
>
