[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-28 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated YARN-6153:
-
Target Version/s: 2.8.1, 3.0.0-alpha3  (was: 2.8.0, 3.0.0-alpha3)

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>
> yarn.resourcemanager.am.max-attempts has been configured to 2 in my cluster.
> I submitted a YARN application (slider app) with keepContainers=true and
> attemptFailuresValidityInterval=30.
> It worked properly when the AM failed the first time:
> all containers launched by the previous AM were resynced with the new AM (attempt2)
> without being killed.
> After 10 minutes, I expected the AM failure count to have been reset by
> attemptFailuresValidityInterval (5 minutes),
> but all containers were killed when the AM failed the second time (the new AM,
> attempt3, was launched properly).
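
To make the expectation above concrete, here is a small worked example. It is a hypothetical sketch, not ResourceManager code: the class and method names are made up, timestamps are assumed to be in milliseconds, and the window is taken as the 5 minutes stated in the report. A failure is assumed to count only if it finished within the window before the current time, which matches the `getFinishTime() > endTime - attemptFailuresValidityInterval` check in the branch-2.8 diff.

```java
import java.util.List;

// Hypothetical illustration of a sliding failure-validity window;
// not the ResourceManager's actual code.
final class FailureWindow {
    private FailureWindow() {}

    /** Counts failures whose finish time lies within validityMs before nowMs.
     *  A validityMs <= 0 means "no window": every failure counts. */
    static int countRecentFailures(List<Long> failureTimesMs, long nowMs, long validityMs) {
        int count = 0;
        for (long t : failureTimesMs) {
            if (validityMs <= 0 || t > nowMs - validityMs) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        final long minute = 60_000L;
        final int maxAttempts = 2;          // yarn.resourcemanager.am.max-attempts
        final long validityMs = 5 * minute; // the 5-minute window from the report

        // attempt1 fails at t=0; attempt2 fails at t=10 min.
        List<Long> failures = List.of(0L, 10 * minute);
        int recent = countRecentFailures(failures, 10 * minute, validityMs);

        // Only attempt2's failure is still inside the window.
        System.out.println("failures in window: " + recent);                    // 1
        System.out.println("more attempts allowed: " + (recent < maxAttempts)); // true
    }
}
```

Under these assumptions only attempt2's failure is still inside the window when it fails, so attempt3 is a regular retry rather than a last attempt, and attempt2's containers would be expected to survive.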



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-02 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: (was: YARN-6153.006-1.patch)

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-02 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: (was: YARN-6153-branch-2.8.patch)

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006-1.patch, YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-02 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153-branch-2.8.patch

Thanks for your comment.
I'm uploading a new patch for branch-2.8.
The system clock in RMAppImpl will be used for checking the validity interval.
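
A minimal sketch of checking the window against the application's own clock, assuming the application both stamps failures and evaluates the window with the same time source. `AppFailureTracker`, `ManualClock`, and the use of `java.util.function.LongSupplier` as the clock are illustrative assumptions, not the RMAppImpl API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical sketch: the application owns a single time source and uses it
// for the validity-interval check, so tests can control time explicitly.
final class AppFailureTracker {
    private final LongSupplier clock;   // e.g. System::currentTimeMillis in production
    private final long validityMs;      // plays the role of attemptFailuresValidityInterval
    private final List<Long> failureFinishTimes = new ArrayList<>();

    AppFailureTracker(LongSupplier clock, long validityMs) {
        this.clock = clock;
        this.validityMs = validityMs;
    }

    /** Records a failed attempt, stamped with the application's clock. */
    void recordFailure() {
        failureFinishTimes.add(clock.getAsLong());
    }

    /** Failures that still count, evaluated against the same clock. */
    int getNumFailedAppAttempts() {
        long now = clock.getAsLong();
        int count = 0;
        for (long finish : failureFinishTimes) {
            if (validityMs <= 0 || finish > now - validityMs) {
                count++;
            }
        }
        return count;
    }
}

// A manually advanced clock for deterministic tests (hypothetical helper).
final class ManualClock implements LongSupplier {
    private long nowMs;

    @Override
    public long getAsLong() {
        return nowMs;
    }

    void advance(long deltaMs) {
        nowMs += deltaMs;
    }
}
```

With `ManualClock clock = new ManualClock()` and `new AppFailureTracker(clock, 300_000L)`, recording one failure and then calling `clock.advance(600_000L)` drops the count from 1 to 0, so a test can move past the window without waiting in real time; this is one alternative to the Thread.sleep approach used in the branch-2.8 test change.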


> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006-1.patch, YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-02 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.006-1.patch

I'm uploading an additional patch for hadoop-trunk (YARN-6153.006-1.patch).
Problem 1 from my previous comment has been fixed in the same way as in branch-2.8.



> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006-1.patch, YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-03-02 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153-branch-2.8.patch

I'm uploading the patch for branch-2.8.

1. In testRMAppAttemptFailuresValidityInterval, the use of systemClock has been 
replaced with Thread.sleep.

Because of the following change, the time used to check the validity interval 
no longer comes from the systemClock in RMAppImpl:

{code}
-  private int getNumFailedAppAttempts() {
+  public int getNumFailedAppAttempts() {
     int completedAttempts = 0;
-    long endTime = this.systemClock.getTime();
     // Do not count AM preemption, hardware failures or NM resync
     // as attempt failure.
     for (RMAppAttempt attempt : attempts.values()) {
       if (attempt.shouldCountTowardsMaxAttemptRetry()) {
-        if (this.attemptFailuresValidityInterval <= 0
-            || (attempt.getFinishTime() > endTime
-                - this.attemptFailuresValidityInterval)) {
-          completedAttempts++;
-        }
+        completedAttempts++;
       }
     }
{code}

2. In testAMRestartNotLostContainerAfterAttemptFailuresValidityInterval, 
the timeout has been increased to 40 seconds.

YARN-4807 is not yet included in branch-2.8; I think that's why the 
timeout occurs.


> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
>Assignee: kyungwan nam
> Fix For: 2.8.0, 3.0.0-alpha3
>
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006.patch, YARN-6153-branch-2.8.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-27 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.006.patch

Thanks for your review.
OK, I'm uploading a new patch (006) that addresses your comment.

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch, 
> YARN-6153.006.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-24 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.005.patch

My source tree was not up to date; that's why the compile failed.
I'm uploading a new patch based on up-to-date source.

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch, YARN-6153.005.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-22 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.004.patch

Thanks for the review.
Yes, it looks better. I'm uploading a new patch that I have fixed as you suggested.
As a consequence, _maybeLastAttempt_ no longer appears to be used, so the 
code associated with _maybeLastAttempt_ has been removed.

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch, YARN-6153.004.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-21 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.003.patch

Thanks for the review.
I'm uploading a new patch that addresses your comment.


> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch, 
> YARN-6153.003.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-16 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.002.patch

Thanks for your review.
I'm attaching a new patch.
- As you suggested, the checking logic has been moved to 
shouldCountTowardsMaxAttemptRetry.
- A test case has been added to verify this issue.
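
Roughly, moving the check into shouldCountTowardsMaxAttemptRetry means each attempt decides for itself whether it still counts, and the caller just sums. The sketch below is hypothetical: `AttemptSketch`, `ExitReason`, and passing `nowMs` and `validityMs` as parameters are assumptions for illustration (the real method takes no arguments in the branch-2.8 diff). The exit reasons that never count are taken from the "Do not count AM preemption, hardware failures or NM resync" comment in RMAppImpl.

```java
// Hypothetical sketch of a per-attempt check that folds the validity window
// into shouldCountTowardsMaxAttemptRetry; names and signature are illustrative.
enum ExitReason { APP_FAILURE, PREEMPTED, HARDWARE_FAILURE, NM_RESYNC }

final class AttemptSketch {
    private final ExitReason reason;
    private final long finishTimeMs;

    AttemptSketch(ExitReason reason, long finishTimeMs) {
        this.reason = reason;
        this.finishTimeMs = finishTimeMs;
    }

    /** True only for a genuine AM failure that finished inside the validity window. */
    boolean shouldCountTowardsMaxAttemptRetry(long nowMs, long validityMs) {
        // AM preemption, hardware failures and NM resync never count as failures.
        if (reason != ExitReason.APP_FAILURE) {
            return false;
        }
        // A non-positive interval disables the window: every failure counts.
        return validityMs <= 0 || finishTimeMs > nowMs - validityMs;
    }

    /** Caller side: once the check lives in the attempt, counting is a plain loop. */
    static int getNumFailedAppAttempts(Iterable<AttemptSketch> attempts, long nowMs, long validityMs) {
        int completedAttempts = 0;
        for (AttemptSketch attempt : attempts) {
            if (attempt.shouldCountTowardsMaxAttemptRetry(nowMs, validityMs)) {
                completedAttempts++;
            }
        }
        return completedAttempts;
    }
}
```

Keeping the window logic next to the other "does this count?" rules means every caller that asks an attempt whether it counts sees the same answer, rather than each caller re-implementing the time check.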

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch, YARN-6153.002.patch
>
>



[jira] [Updated] (YARN-6153) keepContainer does not work when AM retry window is set

2017-02-07 Thread kyungwan nam (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-6153:
---
Attachment: YARN-6153.001.patch

If maybeLastAttempt in RMAppAttemptImpl is true, keepContainers is always 
ignored.
However, once the AM reset window has elapsed, it is no longer the last attempt.

I'm attaching a patch.
If the last attempt is older than the AM reset window, the keepContainers 
setting will be honored.
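
The failure mode described above amounts to a stale flag. Assuming, as the comment suggests, that maybeLastAttempt is fixed when an attempt is created, the sketch below contrasts that with re-evaluating the window when the attempt actually fails. `KeepContainersDecision` and its methods are hypothetical helpers, not RMAppAttemptImpl methods, and timestamps are assumed to be in milliseconds.

```java
import java.util.List;

// Hypothetical contrast between a "last attempt" flag frozen at launch time and
// one re-evaluated when the attempt actually fails.
final class KeepContainersDecision {
    private KeepContainersDecision() {}

    static int failuresInWindow(List<Long> failureTimesMs, long nowMs, long validityMs) {
        int n = 0;
        for (long t : failureTimesMs) {
            if (validityMs <= 0 || t > nowMs - validityMs) {
                n++;
            }
        }
        return n;
    }

    /** Buggy shape: maybeLastAttempt is computed when the attempt is launched,
     *  so containers are dropped even if earlier failures have since expired. */
    static boolean keepContainersDecidedAtLaunch(boolean keepContainers, List<Long> priorFailures,
                                                 long launchMs, long validityMs, int maxAttempts) {
        boolean maybeLastAttempt =
            failuresInWindow(priorFailures, launchMs, validityMs) + 1 >= maxAttempts;
        return keepContainers && !maybeLastAttempt;
    }

    /** Fixed shape: the window is evaluated at failure time, including this failure. */
    static boolean keepContainersDecidedAtFailure(boolean keepContainers, List<Long> priorFailures,
                                                  long failMs, long validityMs, int maxAttempts) {
        int counted = failuresInWindow(priorFailures, failMs, validityMs) + 1; // + this failure
        return keepContainers && counted < maxAttempts;
    }
}
```

For the reported scenario (max-attempts 2, a 5-minute window, attempt1 failing at t=0, attempt2 launched right after and failing at t=10 min), `keepContainersDecidedAtLaunch(true, List.of(0L), 1_000L, 300_000L, 2)` returns false while `keepContainersDecidedAtFailure(true, List.of(0L), 600_000L, 300_000L, 2)` returns true.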

> keepContainer does not work when AM retry window is set
> ---
>
> Key: YARN-6153
> URL: https://issues.apache.org/jira/browse/YARN-6153
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.1
>Reporter: kyungwan nam
> Attachments: YARN-6153.001.patch
>
>