[ 
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14519360#comment-14519360
 ] 

Jason Lowe commented on YARN-3554:
----------------------------------

I think 10 minutes is still too high.  We didn't even have this functionality 
until 2.6 because of rolling upgrades, and NMs don't take that long to recover 
in a rolling upgrade.  They recover in tens of seconds rather than tens of 
minutes.  Therefore I don't think it makes much sense to spend a lot of time 
trying to connect to an NM beyond a few minutes.  The chances of successfully 
connecting after a few minutes of trying are going to be very low, and NMs fail 
all the time anyway.  So if we spend all that extra time trying for essentially 
no benefit, all we've done is prolong the application recovery time for no 
good reason.

> Default value for maximum nodemanager connect wait time is too high
> -------------------------------------------------------------------
>
>                 Key: YARN-3554
>                 URL: https://issues.apache.org/jira/browse/YARN-3554
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000 
> msec or 15 minutes, which is way too high.  The default container expiry time 
> from the RM and the default task timeout in MapReduce are both only 10 
> minutes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
