[
https://issues.apache.org/jira/browse/YARN-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773513#comment-16773513
]
Rayman commented on YARN-3554:
------------------------------
The RetryUpToMaximumTimeWithFixedSleep policy takes a maxTime and a sleepTime
as input, and is implemented internally as a
RetryUpToMaximumCountWithFixedSleep with maxCount = maxTime / sleepTime.
This has a problem: it does not account for the time spent performing the
actual retry attempts.
For example, RetryUpToMaximumTimeWithFixedSleep with maxTime = 30 sec and
sleepTime = 1 sec can take up to 90 seconds if each retry (e.g., a
ConnectionTimeout) takes 2 seconds to return: maxCount = 30 / 1 = 30 retries,
each costing 2 + 1 = 3 seconds, so 30 * (2 + 1) = 90.
A policy claiming to be RetryUpToMaximumTimeWithFixedSleep should *actually*
respect the *maximum time*, e.g., by recording a timestamp/timer.
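
To make the difference concrete, here is a minimal sketch, not Hadoop's actual
RetryPolicies code. The class DeadlineRetrySketch, its FakeClock, and both
method names are hypothetical, and a simulated clock stands in for real
sleeping so the numbers are deterministic. It compares the count-based
behaviour described above with one possible timestamp-based check.

```java
/** Hypothetical illustration only; none of these names exist in Hadoop. */
final class DeadlineRetrySketch {

    /** Simulated clock so the example runs instantly and deterministically. */
    static final class FakeClock {
        long nowMs = 0;
        void advance(long ms) { nowMs += ms; }
    }

    /**
     * Count-based behaviour as described in the comment:
     * maxCount = maxTime / sleepTime, attempt duration is ignored.
     * Returns the total simulated time spent when every attempt fails.
     */
    static long countBasedElapsedMs(long maxTimeMs, long sleepMs, long attemptMs) {
        FakeClock clock = new FakeClock();
        long maxCount = maxTimeMs / sleepMs;
        for (long i = 0; i < maxCount; i++) {
            clock.advance(attemptMs); // time spent in the failing attempt
            clock.advance(sleepMs);   // fixed sleep before the next attempt
        }
        return clock.nowMs;
    }

    /**
     * Deadline-based alternative (an assumption about how a fix could look):
     * record a deadline up front and give up once another sleep would
     * reach or pass it.
     */
    static long deadlineBasedElapsedMs(long maxTimeMs, long sleepMs, long attemptMs) {
        FakeClock clock = new FakeClock();
        long deadlineMs = clock.nowMs + maxTimeMs;
        while (true) {
            clock.advance(attemptMs); // time spent in the failing attempt
            if (clock.nowMs + sleepMs >= deadlineMs) {
                break;                // no time left for another sleep + attempt
            }
            clock.advance(sleepMs);
        }
        return clock.nowMs;
    }
}
```

With maxTime = 30000 ms, sleepTime = 1000 ms and 2000 ms per failing attempt,
the count-based variant accumulates 90000 ms, while the deadline-based variant
stops within the 30000 ms budget. Note that in this sketch an attempt already
in flight is not interrupted, so a slow final attempt could still overrun the
deadline by up to its own duration.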
> Default value for maximum nodemanager connect wait time is too high
> -------------------------------------------------------------------
>
> Key: YARN-3554
> URL: https://issues.apache.org/jira/browse/YARN-3554
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Jason Lowe
> Assignee: Naganarasimha G R
> Priority: Major
> Labels: BB2015-05-RFC, newbie
> Fix For: 2.8.0, 2.7.1, 2.6.2, 3.0.0-alpha1
>
> Attachments: YARN-3554-20150429-2.patch, YARN-3554.20150429-1.patch
>
>
> The default value for yarn.client.nodemanager-connect.max-wait-ms is 900000
> msec or 15 minutes, which is way too high. The default container expiry time
> from the RM and the default task timeout in MapReduce are both only 10
> minutes.