[
https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491101#comment-16491101
]
Chandni Singh edited comment on YARN-8362 at 5/25/18 7:06 PM:
--------------------------------------------------------------
In patch 2, I fixed the checkstyle issues.
The test failure
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testLocalingResourceWhileContainerRunning}}
is not related to this change.
It fails in the existing trunk even without this change.
was (Author: csingh):
In patch 2, I fixed the checkstyle.
The test failure
{{org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager.testLocalingResourceWhileContainerRunning}}
is not related to this change.
> Number of remaining retries are updated twice after a container failure in NM
> ------------------------------------------------------------------------------
>
> Key: YARN-8362
> URL: https://issues.apache.org/jira/browse/YARN-8362
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8362.001.patch, YARN-8362.002.patch
>
>
> With YARN-5015, the {{shouldRetry(int errorCode)}} method in {{ContainerImpl}}
> also updates some fields in the retry context: remaining retries and restart
> times. This method is also called directly from outside the ContainerImpl
> class, in {{ContainerLaunch.setContainerCompletedStatus}}. This causes the
> following problems:
> # remainingRetries is updated more than once after a failure. If
> {{maxRetries = 1}}, then a retry will not be triggered because of multiple
> calls to {{shouldRetry(int errorCode)}}.
> # Writes to {{retryContext}} should be protected and performed only while the
> write lock is held.
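
The two problems in the description can be illustrated with a small standalone sketch. This is not the code from the patch or from {{ContainerImpl}}; the class {{RetryContextSketch}} and all of its method names are hypothetical, and the retry rule is reduced to a simple counter. It only shows why a check that also consumes a retry breaks when it is called twice, and one possible way to separate the read-only check from the write that happens under the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for the retry bookkeeping described above.
// None of these names are taken from the actual ContainerImpl source.
class RetryContextSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int remainingRetries;

    RetryContextSketch(int maxRetries) {
        this.remainingRetries = maxRetries;
    }

    // Problematic shape: a "query" that also consumes a retry as a side effect
    // and does not take the write lock. Calling it twice for one failure
    // consumes two retries.
    boolean shouldRetryWithSideEffect() {
        if (remainingRetries > 0) {
            remainingRetries--;
            return true;
        }
        return false;
    }

    // Safer shape, part 1: a read-only check that any caller may repeat.
    boolean shouldRetry() {
        lock.readLock().lock();
        try {
            return remainingRetries > 0;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Safer shape, part 2: the single place that consumes a retry,
    // always under the write lock.
    void recordRetry() {
        lock.writeLock().lock();
        try {
            if (remainingRetries > 0) {
                remainingRetries--;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    int getRemainingRetries() {
        lock.readLock().lock();
        try {
            return remainingRetries;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With a budget of one retry, the first call to {{shouldRetryWithSideEffect()}} returns true and the second returns false, so a second caller asking about the same failure would conclude that no retry is left. With the split version, {{shouldRetry()}} can be called any number of times and the count changes only when {{recordRetry()}} runs.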