[
https://issues.apache.org/jira/browse/YARN-5015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16390428#comment-16390428
]
Wangda Tan commented on YARN-5015:
----------------------------------
Thanks [~csingh], see my comments below:
1) Instead of adding getRestartTimes/getRemainingRetries to
{{ContainerRetryContext}}, I suggest to have a separate class like
NMContainerRetryContext which includes:
- ContainerRetryContext
- getRestartTimes/getRemainingRetries
Since we should not add runtime information to protocol/api classes.
2) mv org.apache.hadoop.yarn.server.retry.SlidingWindowRetryPolicy to
org.apache.hadoop.yarn.server.nodemanager.containermanager.container: Why it is
in server-common?
3) {{shouldRetry}}:
- It's better to return true at the begining of the method when
{{getMaxRetries() == ContainerRetryContext.RETRY_FOREVER}}, which can avoid
lots of checks in the following functions like calculatePendingRetries.
4) {{calculatePendingRetries}}
{code}
return retryContext.getRemainingRetries() == -1 ?
retryContext.getMaxRetries() :
retryContext.getRemainingRetries();
{code}
Why check {{retryContext.getRemainingRetries() == -1}}? Should this be
getMaxRetries() == -1?
5) {{updateRetryContext}}:
{code}
retryContext.setRemainingRetries(pendingRetries -1);
{code}
6) In ContainerImpl:
{code}
int n = container.containerRetryContext.getMaxRetries()
- container.containerRetryContext.getRemainingRetries();
container.addDiagnostics("Diagnostic message from attempt "
+ n + " : ", "\n");
{code}
Under the context of SlidingWindowRetry, this n may keep changing. To avoid
introducing more logics, I suggest to remove {{n}} from the diagnostics.
> Support sliding window retry capability for container restart
> --------------------------------------------------------------
>
> Key: YARN-5015
> URL: https://issues.apache.org/jira/browse/YARN-5015
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Reporter: Varun Vasudev
> Assignee: Chandni Singh
> Priority: Major
> Labels: oct16-medium
> Attachments: YARN-5015.01.patch, YARN-5015.02.patch,
> YARN-5015.03.patch
>
>
> We support sliding window retry policy for AM restarts (Introduced in
> YARN-611). Similar sliding window retry policy is needed for container
> restarts.
> With this change, we can introduce a common class for
> SlidingWindowRetryPolicy ( suggested by [~vvasudev] in the comments) and
> integrate it to container restart.
> In a subsequent jira, we can modify the AM code to use
> SlidingWindowRetryPolicy which will unify the AM and container restart code.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]