[ 
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390490#comment-16390490
 ] 

Chandni Singh edited comment on YARN-5015 at 3/8/18 12:17 AM:
--------------------------------------------------------------

 [~leftnoteasy] Please find my answers below to some of the questions:
{quote}2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to 
org.apache.hadoop.yarn.server.nodemanager.containermanager.container: Why it is 
in server-common?
{quote}
It is in server-common so that we can later reuse it for AM restart. Eventually 
we have to unify the AM and container restart code, so this class needs to 
be accessible to the RM as well.
{quote}4) calculatePendingRetries

return retryContext.getRemainingRetries() == -1 ? retryContext.getMaxRetries() 
: retryContext.getRemainingRetries();

 Why check {{retryContext.getRemainingRetries() == -1}}? Should this be 
getMaxRetries() == -1?
{quote}
{{remainingRetries}} defaults to -1, which means it has not been set yet.

If {{remainingRetries}} is not set, then pendingRetries = {{maxRetries}}. 
Otherwise, pendingRetries = {{remainingRetries}}.
 Immediately after this we update {{remainingRetries}} = {{pendingRetries}} - 1.
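
To make the -1 sentinel concrete, here is a minimal standalone sketch of the logic described above. It is not the code from the patch: {{RetryContextSketch}}, {{PendingRetriesSketch}}, {{UNSET}} and {{tryConsumeRetry}} are hypothetical names used only for illustration, and the guard that refuses to decrement once no retries are pending is an assumption. The sketch also assumes a non-negative {{maxRetries}}.

```java
/** Hypothetical stand-in for the retry context; not the actual YARN class. */
final class RetryContextSketch {
    /** Sentinel meaning "remainingRetries has not been set yet". */
    static final int UNSET = -1;

    private final int maxRetries;
    private int remainingRetries = UNSET;

    RetryContextSketch(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    int getMaxRetries() {
        return maxRetries;
    }

    int getRemainingRetries() {
        return remainingRetries;
    }

    void setRemainingRetries(int remainingRetries) {
        this.remainingRetries = remainingRetries;
    }
}

final class PendingRetriesSketch {
    /** Falls back to maxRetries while remainingRetries is still unset. */
    static int calculatePendingRetries(RetryContextSketch ctx) {
        return ctx.getRemainingRetries() == RetryContextSketch.UNSET
            ? ctx.getMaxRetries()
            : ctx.getRemainingRetries();
    }

    /**
     * Records one retry: remainingRetries = pendingRetries - 1.
     * Assumption: nothing is decremented once no retries are pending,
     * so the counter never falls back to the UNSET sentinel.
     */
    static boolean tryConsumeRetry(RetryContextSketch ctx) {
        int pendingRetries = calculatePendingRetries(ctx);
        if (pendingRetries <= 0) {
            return false;
        }
        ctx.setRemainingRetries(pendingRetries - 1);
        return true;
    }

    public static void main(String[] args) {
        RetryContextSketch ctx = new RetryContextSketch(2);
        System.out.println(calculatePendingRetries(ctx)); // unset, so maxRetries is used
        System.out.println(tryConsumeRetry(ctx));
        System.out.println(ctx.getRemainingRetries());
    }
}
```

The check is on {{remainingRetries}} rather than {{maxRetries}} because it distinguishes "no retry has been recorded yet" from "some retries have already been used".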
{quote}1) Instead of adding getRestartTimes/getRemainingRetries to 
{{ContainerRetryContext}}, I suggest to have a separate class like 
NMContainerRetryContext which includes:
{quote}
Similar to 2, should I create a {{SlidingContainerRetryContext}} in 
server-common? It too will need to be accessible to the RM later, when we 
change the AM retry code to use this common class.

 

 



> Support sliding window retry capability for container restart 
> --------------------------------------------------------------
>
>                 Key: YARN-5015
>                 URL: https://issues.apache.org/jira/browse/YARN-5015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>            Reporter: Varun Vasudev
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: oct16-medium
>         Attachments: YARN-5015.01.patch, YARN-5015.02.patch, 
> YARN-5015.03.patch
>
>
> We support a sliding window retry policy for AM restarts (introduced in 
> YARN-611). A similar sliding window retry policy is needed for container 
> restarts.
> With this change, we can introduce a common class, 
> SlidingWindowRetryPolicy (suggested by [~vvasudev] in the comments), and 
> integrate it into container restart. 
> In a subsequent jira, we can modify the AM code to use 
> SlidingWindowRetryPolicy, which will unify the AM and container restart code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
