[
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872368#comment-15872368
]
Jian He commented on YARN-6153:
-------------------------------
Thanks for updating the patch:
- Could you add a brief comment at the head of
testAMRestartNotLostContainerAfterAttemptFailuresValidityInterval to explain
what the test mainly does?
- In RMAppAttemptImpl, why is getStartTime used for checking the validity
interval? Also, given that shouldCountTowardsMaxAttemptRetry already checks
the validity interval internally, is this code needed at all? That check is
already done earlier in the
{{if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {}} branch.
{code}
      } else {
        // After AM reset window time, it is no longer the last attempt.
        long attemptFailuresValidityInterval =
            appAttempt.submissionContext.getAttemptFailuresValidityInterval();
        long end = System.currentTimeMillis();
        if (attemptFailuresValidityInterval > 0
            && appAttempt.getStartTime() < (end -
                attemptFailuresValidityInterval)) {
          keepContainersAcrossAppAttempts = true;
        }
{code}
- A couple of lines exceed the 80-column limit; please fix those.
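
To illustrate the getStartTime question above, here is a minimal standalone sketch of how a start-time-based and a finish-time-based window check can disagree for a long-running attempt. It is not the RMAppAttemptImpl code: the class WindowCheckSketch, the helper isOutsideWindow, and all timestamps are hypothetical and only mirror the shape of the comparison in the quoted snippet.

```java
class WindowCheckSketch {
    // Hypothetical helper, not a YARN API: true when timestampMs lies before
    // the window [nowMs - intervalMs, nowMs]. An interval <= 0 disables the
    // window, matching the "attemptFailuresValidityInterval > 0" guard above.
    static boolean isOutsideWindow(long timestampMs, long intervalMs, long nowMs) {
        return intervalMs > 0 && timestampMs < nowMs - intervalMs;
    }

    public static void main(String[] args) {
        long intervalMs = 300_000L; // assumed 5-minute validity interval
        long startMs = 0L;          // assumed: attempt started at t = 0
        long finishMs = 600_000L;   // assumed: attempt failed 10 minutes later
        long nowMs = 600_000L;      // check is evaluated at failure time

        // The same attempt is classified differently depending on which
        // timestamp is compared against the window.
        System.out.println("by start time:  "
            + isOutsideWindow(startMs, intervalMs, nowMs));
        System.out.println("by finish time: "
            + isOutsideWindow(finishMs, intervalMs, nowMs));
    }
}
```

With these assumed values the start-time check reports the attempt as outside the window while the finish-time check reports it as inside, so the choice of timestamp changes the outcome. Which one is appropriate for the patch is exactly the open question in the review.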
> keepContainer does not work when AM retry window is set
> -------------------------------------------------------
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.7.1
> Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch
>
>
> yarn.resourcemanager.am.max-attempts is configured to 2 in my cluster.
> I submitted a YARN application (a Slider app) with keepContainers=true and
> attemptFailuresValidityInterval=300000.
> It worked properly when the AM failed the first time:
> all containers launched by the previous AM were resynced with the new AM
> (attempt 2) without being killed.
> After 10 minutes, I expected the AM failure count to have been reset by
> attemptFailuresValidityInterval (5 minutes).
> However, all containers were killed when the AM failed a second time. (The
> new AM, attempt 3, was launched properly.)