[
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943809#comment-14943809
]
Anubhav Dhoot commented on YARN-4185:
-------------------------------------
I don't think option 2, where you restart from 1, makes sense. It's also not a
goal to minimize the total wait time. The goal should be to minimize the time
to recover from short intermittent failures while still waiting long enough on
long failures before giving up. Would it be better for us to ramp up to 10 sec
exponentially and then do the n retries at 10 sec, or to do n retries in total,
including the ramp-up?
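
As a rough sketch of the two schedules in question (this is not the NM client
code; the class name, method names, the 1 sec starting interval, and n = 3 are
all assumptions made for illustration, and only the 10 sec cap comes from the
issue), the difference can be seen by listing the delays each one produces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or the NM client.
public class BackoffSketch {

    // Delay before the given 0-based attempt: baseMs doubled once per
    // earlier attempt, capped at maxMs.
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        if (baseMs <= 0 || maxMs <= 0) {
            throw new IllegalArgumentException("intervals must be positive");
        }
        long delay = baseMs;
        // Stop doubling once the cap is reached so the value cannot overflow.
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    // First schedule: ramp up exponentially until the cap is reached,
    // then perform n retries at the capped interval.
    public static List<Long> rampThenFixed(long baseMs, long maxMs, int n) {
        List<Long> schedule = new ArrayList<>();
        int attempt = 0;
        long delay = delayMs(attempt, baseMs, maxMs);
        while (delay < maxMs) {
            schedule.add(delay);
            attempt++;
            delay = delayMs(attempt, baseMs, maxMs);
        }
        for (int i = 0; i < n; i++) {
            schedule.add(maxMs);
        }
        return schedule;
    }

    // Second schedule: n retries in total, with the ramp-up counted
    // against the same budget.
    public static List<Long> totalRetries(long baseMs, long maxMs, int n) {
        List<Long> schedule = new ArrayList<>();
        for (int attempt = 0; attempt < n; attempt++) {
            schedule.add(delayMs(attempt, baseMs, maxMs));
        }
        return schedule;
    }

    public static void main(String[] args) {
        // Assumed values: start at 1 sec, cap at 10 sec, n = 3.
        System.out.println(rampThenFixed(1000, 10000, 3));
        // prints [1000, 2000, 4000, 8000, 10000, 10000, 10000]
        System.out.println(totalRetries(1000, 10000, 3));
        // prints [1000, 2000, 4000]
    }
}
```

With these assumed values the second schedule gives up before it ever reaches
the 10 sec interval, so a small n would not wait long enough on long failures.
The first schedule keeps the n retries at 10 sec and adds the short ramp-up in
front of them.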
> Retry interval delay for NM client can be improved from the fixed static
> retry
> -------------------------------------------------------------------------------
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Anubhav Dhoot
> Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays
> there, we are better off using an exponential backoff that has the same fixed
> max limit. Today the retry interval is fixed at 10 sec, which can be
> unnecessarily high, especially when NMs can complete a rolling restart within
> a sec.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)