[ https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872368#comment-15872368 ]
Jian He commented on YARN-6153:
-------------------------------

Thanks for updating the patch:
- Could you add a brief comment at the head of testAMRestartNotLostContainerAfterAttemptFailuresValidityInterval to explain what the test does?
- In RMAppAttemptImpl, why is getStartTime used for checking the validity interval? Also, given that shouldCountTowardsMaxAttemptRetry already checks the validity interval internally, is this code needed at all? The check is already done in the {{if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {}} before it.
{code}
} else {
  // After AM reset window time, it is no longer the last attempt.
  long attemptFailuresValidityInterval =
      appAttempt.submissionContext.getAttemptFailuresValidityInterval();
  long end = System.currentTimeMillis();
  if (attemptFailuresValidityInterval > 0
      && appAttempt.getStartTime() < (end - attemptFailuresValidityInterval)) {
    keepContainersAcrossAppAttempts = true;
  }
{code}
- A couple of places exceed the 80-column limit; please fix those.

> keepContainer does not work when AM retry window is set
> -------------------------------------------------------
>
>                 Key: YARN-6153
>                 URL: https://issues.apache.org/jira/browse/YARN-6153
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: kyungwan nam
>         Attachments: YARN-6153.001.patch, YARN-6153.002.patch
>
>
> yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster.
> I submitted a YARN application (slider app) that keepContainers=true,
> attemptFailuresValidityInterval=300000.
> it did work properly when AM was failed firstly.
> all containers launched by previous AM were resynced with new AM (attempt2)
> without killing containers.
> after 10 minutes, I thought AM failure count was reset by
> attemptFailuresValidityInterval (5 minutes).
> but, all containers were killed when AM was failed secondly.
> (new AM attempt3 was launched properly)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
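
To make the redundancy question in the review comment concrete, here is a minimal, self-contained sketch of a failure count that already applies the validity window. It is not the ResourceManager code: the class, the `FailedAttempt` type, and the method names are hypothetical, and the choice to compare the attempt's finish time (rather than its start time) against the window is an assumption made for illustration. The numbers mirror the scenario in the issue description (max attempts 2, interval 300000 ms, second failure roughly 10 minutes after the first).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the actual YARN implementation.
class ValidityIntervalSketch {

    /** Hypothetical record of one failed AM attempt; times are in milliseconds. */
    static final class FailedAttempt {
        final long startTime;
        final long finishTime;

        FailedAttempt(long startTime, long finishTime) {
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    /**
     * Counts failures whose finish time lies inside the validity window that
     * ends at nowMs. An interval <= 0 disables the window, so every failure counts.
     * Assumption: the window is measured against finish time, not start time.
     */
    static int countFailuresInWindow(List<FailedAttempt> failures,
                                     long intervalMs, long nowMs) {
        int count = 0;
        for (FailedAttempt f : failures) {
            if (intervalMs <= 0 || f.finishTime > nowMs - intervalMs) {
                count++;
            }
        }
        return count;
    }

    /** True while the windowed failure count is still below the attempt limit. */
    static boolean retriesRemain(List<FailedAttempt> failures,
                                 long intervalMs, long nowMs, int maxAttempts) {
        return countFailuresInWindow(failures, intervalMs, nowMs) < maxAttempts;
    }

    public static void main(String[] args) {
        long intervalMs = 300_000L; // 5 minutes, as in the issue description
        int maxAttempts = 2;

        // Attempt 1 fails after 1 minute; attempt 2 starts then and fails 10 minutes later.
        List<FailedAttempt> failures = Arrays.asList(
            new FailedAttempt(0L, 60_000L),
            new FailedAttempt(60_000L, 660_000L));
        long nowMs = 660_000L;

        // Only the second failure is inside the 5-minute window.
        System.out.println(countFailuresInWindow(failures, intervalMs, nowMs)); // prints 1
        // So this is not the last attempt, and containers could be kept.
        System.out.println(retriesRemain(failures, intervalMs, nowMs, maxAttempts)); // prints true
    }
}
```

Under these assumptions, once the windowed count decides whether the current attempt is the last one, a second window check at the call site adds nothing, which is the point the review comment raises.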