[ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15877186#comment-15877186
 ] 

Jian He commented on YARN-6153:
-------------------------------

I'm thinking about how to make the code simpler.
Can this code
{code}
          if (appAttempt.submissionContext
            .getKeepContainersAcrossApplicationAttempts()
              && !appAttempt.submissionContext.getUnmanagedAM()) {
            // See if we should retain containers for non-unmanaged applications
            if (!appAttempt.shouldCountTowardsMaxAttemptRetry()) {
              // Preemption, hardware failures, and NM resync don't count towards
              // app-failures and so we should retain containers.
              keepContainersAcrossAppAttempts = true;
            } else if (!appAttempt.maybeLastAttempt) {
              // Not preemption, hardware failures or NM resync.
              // Not last-attempt too - keep containers.
              keepContainersAcrossAppAttempts = true;
            } else {
              // After AM reset window time, it is no longer the last attempt.
              long attemptFailuresValidityInterval = appAttempt
                  .submissionContext.getAttemptFailuresValidityInterval();
              long end = System.currentTimeMillis();
              if (attemptFailuresValidityInterval > 0
                  && appAttempt.getStartTime() < (end
                  - attemptFailuresValidityInterval)) {
                keepContainersAcrossAppAttempts = true;
              }
            }
          }
{code}
be replaced with the same check that RMAppImpl uses?
{code}
if (KeepContainersInSubmissionContext
    && app.getNumFailedAppAttempts() >= app.getMaxAttempts()) {
  KeepContainers = true;
}
{code}
That would make it future-proof, since both places would share the same logic.
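
For reference, here is a minimal standalone sketch of the decision made by the first snippet, written as a pure function so the validity-interval arithmetic is easy to check. The class name, method name, and parameters are hypothetical stand-ins for the fields and getters the snippet reads; this is not ResourceManager code, and it assumes keepContainersAcrossAppAttempts defaults to false when no branch sets it.

```java
/** Standalone sketch of the container-retention decision; not RM code. */
final class KeepContainersSketch {

  /**
   * Mirrors the branches of the quoted snippet. Every parameter is a
   * hypothetical stand-in for a value the snippet reads from appAttempt.
   */
  static boolean shouldKeepContainers(
      boolean keepContainersRequested, // getKeepContainersAcrossApplicationAttempts()
      boolean unmanagedAM,             // getUnmanagedAM()
      boolean countsTowardsMaxRetry,   // shouldCountTowardsMaxAttemptRetry()
      boolean maybeLastAttempt,        // appAttempt.maybeLastAttempt
      long attemptStartTimeMs,         // getStartTime()
      long validityIntervalMs,         // getAttemptFailuresValidityInterval()
      long nowMs) {                    // System.currentTimeMillis() in the snippet
    // Retention only applies to managed AMs that asked for it.
    if (!keepContainersRequested || unmanagedAM) {
      return false;
    }
    // Failures that don't count towards the limit, or attempts that are
    // not the last one, always retain containers.
    if (!countsTowardsMaxRetry || !maybeLastAttempt) {
      return true;
    }
    // Possibly the last attempt: retain only if it started before the
    // beginning of the current failure-validity window.
    return validityIntervalMs > 0
        && attemptStartTimeMs < nowMs - validityIntervalMs;
  }

  public static void main(String[] args) {
    long interval = 300_000L; // 5 minutes, as in the issue description
    // Attempt started at t=0 and fails at t=10 minutes: outside the window.
    System.out.println(
        shouldKeepContainers(true, false, true, true, 0L, interval, 600_000L));
    // Same attempt failing at t=1 minute: still inside the window.
    System.out.println(
        shouldKeepContainers(true, false, true, true, 0L, interval, 60_000L));
  }
}
```

With a 5-minute interval, an attempt that started 10 minutes before the failure satisfies the last check (0 < 600000 - 300000), so containers are kept; one that started 1 minute before does not.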

> keepContainer does not work when AM retry window is set
> -------------------------------------------------------
>
>                 Key: YARN-6153
>                 URL: https://issues.apache.org/jira/browse/YARN-6153
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.1
>            Reporter: kyungwan nam
>         Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch
>
>
> yarn.resourcemanager.am.max-attempts is configured to 2 in my cluster.
> I submitted a YARN application (a Slider app) with keepContainers=true and
> attemptFailuresValidityInterval=300000.
> It worked properly when the AM failed the first time:
> all containers launched by the previous AM were resynced with the new AM
> (attempt2) without being killed.
> After 10 minutes, I expected the AM failure count to have been reset by
> attemptFailuresValidityInterval (5 minutes).
> However, all containers were killed when the AM failed a second time. (The new
> AM, attempt3, was launched properly.)


