[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103912#comment-14103912
 ] 

Hudson commented on YARN-2249:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #652 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/652/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618972)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14104222#comment-14104222
 ] 

Hudson commented on YARN-2249:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1843 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1843/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618972)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102049#comment-14102049
 ] 

Hadoop QA commented on YARN-2249:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12662609/YARN-2249.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-tools/hadoop-sls 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/4669//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4669//console

This message is automatically generated.

 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102773#comment-14102773
 ] 

Zhijie Shen commented on YARN-2249:
---

+1 for the latest patch. Will commit it.

 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103068#comment-14103068
 ] 

Hudson commented on YARN-2249:
--

FAILURE: Integrated in Hadoop-trunk-Commit #6088 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/6088/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618972)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14103086#comment-14103086
 ] 

Hudson commented on YARN-2249:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1868 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1868/])
YARN-2249. Avoided AM release requests being lost on work preserving RM 
restart. Contributed by Jian He. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1618972)
* 
/hadoop/common/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/ResourceSchedulerWrapper.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fifo/FifoScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockAM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestApplicationMasterService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMRestart.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestWorkPreservingRMRestart.java


 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Fix For: 2.6.0

 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-18 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101250#comment-14101250
 ] 

Zhijie Shen commented on YARN-2249:
---

1. Do the following in AbstractYarnScheduler.serviceInit?
{code}
+super.nmExpireInterval =
+conf.getInt(YarnConfiguration.RM_NM_EXPIRY_INTERVAL_MS,
+  YarnConfiguration.DEFAULT_RM_NM_EXPIRY_INTERVAL_MS);
{code}
{code}
+createReleaseCache();
{code}

2. Add RM_NM_EXPIRY_INTERVAL_MS in yarn-default.xml?

3. Not sure it's going to be an efficient data structure. Different apps' 
containers should not affect each other, right? mutex on the whole collection 
seems to be a too coarse granularity (blocking allocate call). Should we use 
MapAppAttemptId, ListContainerId and make each app have separate mutex?
{code}
+  private SetContainerId pendingRelease = null;
+  private final Object mutex = new Object();
{code}

 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2249) AM release request may be lost on RM restart

2014-08-18 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101443#comment-14101443
 ] 

Jian He commented on YARN-2249:
---

Thanks Zhijie for the review ! 
bq.  Do the following in AbstractYarnScheduler.serviceInit?
fixed.
bq. Add RM_NM_EXPIRY_INTERVAL_MS in yarn-default.xml?
It is already present.
bq. Not sure it's going to be an efficient data structure. Different apps' 
containers should not affect each other, right? mutex on the whole collection 
seems to be a too coarse granularity (blocking allocate call). Should we use 
MapAppAttemptId, ListContainerId and make each app have separate mutex?
I moved the pendingReleases to SchedulerApplicationAttempt and lock the attempt 
object instead.

 AM release request may be lost on RM restart
 

 Key: YARN-2249
 URL: https://issues.apache.org/jira/browse/YARN-2249
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2249.1.patch, YARN-2249.1.patch, YARN-2249.2.patch, 
 YARN-2249.2.patch, YARN-2249.3.patch, YARN-2249.4.patch, YARN-2249.5.patch


 AM resync on RM restart will send outstanding container release requests back 
 to the new RM. In the meantime, NMs report the container statuses back to RM 
 to recover the containers. If RM receives the container release request  
 before the container is actually recovered in scheduler, the container won't 
 be released and the release request will be lost.



--
This message was sent by Atlassian JIRA
(v6.2#6252)