[ https://issues.apache.org/jira/browse/YARN-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chandni Singh updated YARN-8362:
--------------------------------
Attachment: YARN-8362.002.patch
> Number of remaining retries are updated twice after a container failure in NM
> ------------------------------------------------------------------------------
>
> Key: YARN-8362
> URL: https://issues.apache.org/jira/browse/YARN-8362
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8362.001.patch, YARN-8362.002.patch
>
>
> Since YARN-5015, {{shouldRetry(int errorCode)}} in {{ContainerImpl}} also
> updates fields in the retry context: remaining retries and restart times.
> This method is also called directly from outside the {{ContainerImpl}} class,
> in {{ContainerLaunch.setContainerCompletedStatus}}. This causes the following
> problems:
> # {{remainingRetries}} is updated more than once after a failure. If
> {{maxRetries = 1}}, a retry is never triggered because of the multiple
> calls to {{shouldRetry(int errorCode)}}.
> # Writes to {{retryContext}} should be protected and performed only while
> the write lock is held.
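
The two problems above can be illustrated with a small, simplified model. This is only a sketch and not the actual NodeManager code or the attached patch: the class RetryContextSketch, its nested RetryContext, and the methods shouldRetryMutating, shouldRetry, recordRetry, buggyFlow and fixedFlow are all hypothetical names chosen for illustration. It assumes a check that decrements a counter as a side effect, called twice per failure, and contrasts it with a side-effect-free check plus a single update made under a write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RetryContextSketch {

    /** Hypothetical stand-in for a retry context; not the real Hadoop class. */
    static final class RetryContext {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private int remainingRetries;

        RetryContext(int maxRetries) {
            this.remainingRetries = maxRetries;
        }

        /** Problematic shape: the check mutates state, so each caller consumes a retry. */
        boolean shouldRetryMutating() {
            if (remainingRetries > 0) {
                remainingRetries--;
                return true;
            }
            return false;
        }

        /** Side-effect-free check; safe to call from more than one place. */
        boolean shouldRetry() {
            lock.readLock().lock();
            try {
                return remainingRetries > 0;
            } finally {
                lock.readLock().unlock();
            }
        }

        /** Single mutation point, performed only while the write lock is held. */
        void recordRetry() {
            lock.writeLock().lock();
            try {
                if (remainingRetries > 0) {
                    remainingRetries--;
                }
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    /** Two callers invoke the mutating check after one failure. */
    static boolean buggyFlow(int maxRetries) {
        RetryContext ctx = new RetryContext(maxRetries);
        ctx.shouldRetryMutating();        // first caller, e.g. while recording the exit status
        return ctx.shouldRetryMutating(); // second caller, deciding whether to relaunch
    }

    /** Both callers use the pure check; the counter is updated exactly once. */
    static boolean fixedFlow(int maxRetries) {
        RetryContext ctx = new RetryContext(maxRetries);
        ctx.shouldRetry();                // first caller only inspects the state
        boolean retry = ctx.shouldRetry();
        if (retry) {
            ctx.recordRetry();            // one update per failure
        }
        return retry;
    }

    public static void main(String[] args) {
        System.out.println("buggy flow retries with maxRetries=1: " + buggyFlow(1));
        System.out.println("fixed flow retries with maxRetries=1: " + fixedFlow(1));
    }
}
```

In this model, buggyFlow(1) returns false because the first call has already used up the only retry, while fixedFlow(1) returns true. Separating the check from the update is one possible design; the actual fix is in the attached patch.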
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)