[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025208#comment-14025208
 ] 

Hudson commented on YARN-1368:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1769 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1769/])
YARN-1368. Added core functionality of recovering container state into 
schedulers after ResourceManager Restart so as to preserve running work in the 
cluster. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601303)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerRecoverEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStartedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 

[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025271#comment-14025271
 ] 

Hudson commented on YARN-1368:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1796 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1796/])
YARN-1368. Added core functionality of recovering container state into 
schedulers after ResourceManager Restart so as to preserve running work in the 
cluster. Contributed by Jian He. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601303)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/yarn-default.xml
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContext.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/RMContextImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceTrackerService.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerEventType.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmcontainer/RMContainerRecoverEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeImpl.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmnode/RMNodeStartedEvent.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/Queue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerApplicationAttempt.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/SchedulerNode.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CSQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/CapacityScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/LeafQueue.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/capacity/ParentQueue.java
* 

[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-08 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021545#comment-14021545
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

+1, looks good. Checking this in..

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.10.patch, 
 YARN-1368.11.patch, YARN-1368.2.patch, YARN-1368.3.patch, YARN-1368.4.patch, 
 YARN-1368.5.patch, YARN-1368.7.patch, YARN-1368.8.patch, YARN-1368.9.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020209#comment-14020209
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648667/YARN-1368.9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3922//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3922//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, YARN-1368.8.patch, 
 YARN-1368.9.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020342#comment-14020342
 ] 

Hadoop QA commented on YARN-1368:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12648699/YARN-1368.10.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3927//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3927//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.10.patch, 
 YARN-1368.2.patch, YARN-1368.3.patch, YARN-1368.4.patch, YARN-1368.5.patch, 
 YARN-1368.7.patch, YARN-1368.8.patch, YARN-1368.9.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020595#comment-14020595
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

Thanks for the udpate! This looks oh-so-close. Some more comments, including 
review of tests now:
 - YarnConfiguration.RM_WORK_PRESERVING_RECOVERY_ENABLED as Private?
 - Also say that in yarn-default.xml that the feature is experimental now.
 - o.a.h.y.s.rm.Applicatoin changes seem unnecessary?
 - MockAM.allocateContainers() is added but not used.
 - MockRM why reduce the sleeps?
 - TestMoveApplication changes are unnecessary?
 - There is a bunch of settle down sleeps. May be in future tickets, we 
should fix them.



 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.10.patch, 
 YARN-1368.2.patch, YARN-1368.3.patch, YARN-1368.4.patch, YARN-1368.5.patch, 
 YARN-1368.7.patch, YARN-1368.8.patch, YARN-1368.9.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-06-06 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020627#comment-14020627
 ] 

Jian He commented on YARN-1368:
---

Thanks Vinod for the review!
bq. MockRM why reduce the sleeps?
reverted.
bq. There is a bunch of settle down sleeps. May be in future tickets, we 
should fix them.
Added a new method to explicitly wait for the num of containers to be recovered.
Fixed other comments also. 

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.10.patch, 
 YARN-1368.11.patch, YARN-1368.2.patch, YARN-1368.3.patch, YARN-1368.4.patch, 
 YARN-1368.5.patch, YARN-1368.7.patch, YARN-1368.8.patch, YARN-1368.9.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014858#comment-14014858
 ] 

Hadoop QA commented on YARN-1368:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647794/YARN-1368.8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3879//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3879//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, YARN-1368.8.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013880#comment-14013880
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], while reading YARN-2022, I don't know if you considered 
masterContainer recovering in this patch or not. I haven't found it in the 
patch, I think we need to consider it if it's not here. For preemption policy, 
it's important to get AM containers before making preemption decisions.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013998#comment-14013998
 ] 

Jian He commented on YARN-1368:
---

Wangda, AM container is just one type of container and should be covered 
already in the patch.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014426#comment-14014426
 ] 

Jian He commented on YARN-1368:
---

bq.Kill container? Same for the following too?
good point,fixed.
bq. Instead we should use getCurrentAttemptForContainer(ContainerId 
containerId)?
I think the RMContainer should be created with the original attempt Id. The 
containerId to attemptId routing will happen automatically.
bq. ContainerRecoveredTransition: Missing other transitions that a regular 
container goes through?
checked the code, we only need to send event to update the ranNodes. Added 
here. Eventually, YARN-1885 should fix the ranNodes thing on recovery.
bq. Kill the container when the following happens?
I added comment saying this condition can never happen.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014439#comment-14014439
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], I mean after RM restart and recover, the 
RMAppAttempt.getMasterContainer will return correct master container or not?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014442#comment-14014442
 ] 

Jian He commented on YARN-1368:
---

bq.  RMAppAttempt.getMasterContainer will return correct master container or 
not?
Yes, RMAppAttemptImpl.recover does that.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014445#comment-14014445
 ] 

Hadoop QA commented on YARN-1368:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12647729/YARN-1368.7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 13 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3871//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3871//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.7.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012750#comment-14012750
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

Started looking at this. It is too big a patch. Can you split this into pieces: 
one for RM-NM protocol stuff and one for the remaining?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012795#comment-14012795
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

bq. Unless I'm missing something, Anubhav was working on this JIRA. It is great 
that Jian did the refactoring to have common code for the schedulers and some 
testcases for it, but most of the work has been done by Anubhav and he was 
working actively on it. We should reassign the JIRA back to Anubhav and let him 
drive it to completion, agree?
[~tucu00], if you see the history of this ticket, there was nothing here that 
showed that Anubhav was working on this. There was a prototype patch on the 
umbrella JIRA, but the part of the patch that focuses on this JIRA was 
completely off the mark. It is specifically for these reasons that I publicly 
discouraged posting prototype patch for this well-defined feature. In this 
case, the prototype was off the mark, and folks(not just [~jianhe]) were 
confused about the individual tasks filed.

bq. Jian He, I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing to the assignee/author of the original patch the 
changes and offering to contribute/breakdown tasks. Please do so next time.
[~tucu00], I just looked the prototype as well as this patch. Not only is this 
patch taking a different approach, it is also *not* based on the work in the 
prototype. The approach in the prototype is pretty broken - everyone's free to 
look back on both the patches and argue about this - this one aims at 
correcting it by starting from scratch and moving beyond to set the right stage 
for the entire effort.

Even if it were, it is sometimes a major burden to correct original patches, 
specifically when their approaches are completely botched, and then direct them 
towards the right approach. This one's one of those.

I agree in general that he shouldn't have re-purposed this JIRA. This JIRA's 
original purpose was a very small subset of what it is doing now.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-29 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013013#comment-14013013
 ] 

Sandy Ryza commented on YARN-1368:
--

Regardless of the technical approach taken in any initial patch, my 
understanding of basic JIRA practice is that assigning a JIRA to oneself (as 
Anubhav did here) means give me some space to work on this.  Because Hadoop 
development is so distributed, it acts as a good coordination mechanism that 
helps us all avoid duplicating work.  Of course, we don't want people to be 
able to sit on JIRAs and stall progress, but a simple Will you be able to get 
to this soon? If not, mind if I take it up? and then waiting a couple of days 
usually suffices to deal with that.

Etiquette is obviously difficult to codify, and I realize that this discussion 
is detracting from the technical details of this JIRA, so I'll stop my 
belly-aching here.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008421#comment-14008421
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646657/YARN-1368.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3826//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3826//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3826//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-25 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008538#comment-14008538
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe],
Thanks for addressing my comments, two more comments,
1. 
bq. it's a repeated field
I think it's better to follow YARN's convention, repeated field uses plural 
formal of word, you can checkout yarn_server_common_service_protos.proto for 
examples,

2. It's better to add test for RegisterNodeManagerPBImpl as well,

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007972#comment-14007972
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646645/YARN-1368.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3823//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3823//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3823//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3823//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3823//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14007992#comment-14007992
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646645/YARN-1368.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.
See 
https://builds.apache.org/job/PreCommit-YARN-Build/3824//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 
release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerPreemption
  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3824//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3824//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3824//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3824//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14008044#comment-14008044
 ] 

Hadoop QA commented on YARN-1368:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12646657/YARN-1368.5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 18 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3825//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/3825//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3825//console

This message is automatically generated.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.4.patch, YARN-1368.5.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-21 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14005511#comment-14005511
 ] 

Jian He commented on YARN-1368:
---

The new patch is rebased on YARN-2017 and created a new ContainerRecoveryReport 
record in NM-RM protocol to include the container resource capability.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, YARN-1368.3.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998823#comment-13998823
 ] 

Wangda Tan commented on YARN-1368:
--

Sorry I went to wrong JIRA, please ignore above comment :-/

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998822#comment-13998822
 ] 

Wangda Tan commented on YARN-1368:
--

LGTM, +1 (non-binding). Please kick off Jenkins building.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-15 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998164#comment-13998164
 ] 

Alejandro Abdelnur commented on YARN-1368:
--

[~ jianhe], I understand the patch is taking a different approach, which is 
based on the work Anubhav started. Instead hijacking the JIRA, the correct way 
should have been proposing -to the assignee/author of the original patch- the 
changes and offering to contribute/breakdown tasks. Please do so next time.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997144#comment-13997144
 ] 

Jian He commented on YARN-1368:
---

New patch fixed Wangda's comments and also implemented the specific recover 
methods for FifoScheduler queue. This patch should be rebased on top of 
YARN-2017.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-14 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997386#comment-13997386
 ] 

Wangda Tan commented on YARN-1368:
--

Hi Jian,
Thanks for updating your patch, I took a look at it, some comments,
1) RMAppImpl.java
{code}
+  // Add the current attempt to the scheduler.It'll be removed from
+  // scheduler in RMAppAttempt#BaseFinalTransition
+  app.handler.handle(new AppAttemptAddedSchedulerEvent(app.currentAttempt
+.getAppAttemptId(), false));
{code}
Not quite understand about this, in current trunk code, how RMAppAttempt notify 
scheduler about AppAttemptAddedSchedulerEvent when recovering? If this is a 
missing part in current trunk, I tend to put sending 
AppAttemptAddedSchedulerEvent code into RMAppAttemptImpl.

2) RMAppAttemptImpl.java
{code}
+   new ContainerFinishedTransition(
+  new AMContainerCrashedAtRunningTransition(),
+  RMAppAttemptState.RUNNING))
{code}
And
{code}
+   new ContainerFinishedTransition(
+  new AMContainerCrashedBeforeRunningTransition(),
+  RMAppAttemptState.LAUNCHED))
{code}
I found the RUNNING and LAUNCHED state are passed in as targetedFinalState, 
and the targetFinalState will only used in FinalStateSavedTransition, I got 
confused, could you please elaborate on this? Why use split 
AMContainerCrashedTransition to two transitions and set their states to 
RUNNING/LAUNCHED differently.

3) In AbstractYarnScheduler.java
3.1 
{code}
+  if (rmApp == null) {
+LOG.error(Skip recovering container  + status
++  for unknown application.);
+continue;
+  }
{code}
And
{code}
+  if (rmApp.getApplicationSubmissionContext().getUnmanagedAM()) {
+if (LOG.isDebugEnabled()) {
+  LOG.debug(Skip recovering container  + status
+  +  for unmanaged AM. + rmApp.getApplicationId());
+}
+continue;
+  }
{code}
And
{code}
+  SchedulerApplication schedulerApp = applications.get(appId);
+  if (schedulerApp == null) {
+LOG.info(Skip recovering container   + status
++  for unknown SchedulerApplication. Application state is 
++ rmApp.getState());
+continue;
+  }
{code}
It's better to make log level more consistency.

3.2
{code}
+  public RMContainer createContainer(ContainerStatus status, RMNode node) {
+Container container =
+Container.newInstance(status.getContainerId(), node.getNodeID(),
+  node.getHttpAddress(), Resource.newInstance(1024, 1),
+  Priority.newInstance(0), null);
{code}
Should we change Resource(1024, 1) to its actually resource?

3.3 For recoverContainersOnNode, is it possible NODE_ADDED happened before 
APP_ADDED? I ask this before container may be recovered before its application 
added to scheduler if yes.

4) In FiCaSchedulerNode.java:
{code}
+  @Override
+  public void recoverContainer(RMContainer rmContainer) {
+if (rmContainer.getState().equals(RMContainerState.COMPLETED)) {
+  return;
+}
+allocateContainer(null, rmContainer);
+  }
{code}
Since the allocateContainer doesn't use application-id parameter, I think it's 
better to remove it.

5) In TestWorkPreservingRMRestart.java
{code}
+assertEquals(usedCapacity, queue.getAbsoluteUsedCapacity(), 0);
{code}
It may better to use two parameter assertEquals, because delta is 0

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-14 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997132#comment-13997132
 ] 

Jian He commented on YARN-1368:
---

New patch uploaded.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.2.patch, 
 YARN-1368.combined.001.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995819#comment-13995819
 ] 

Jian He commented on YARN-1368:
---

Hi [~adhoot], thanks for working on the FS changes, can you please separate it 
and upload onto YARN-1370 for FS specific change? I already  have a local patch 
which changes quite a bit from the latest patch uploaded here. And also  
YARN-2017 is likely to go in first which again conflicts quite a bit with the 
patch here. 

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997024#comment-13997024
 ] 

Alejandro Abdelnur commented on YARN-1368:
--

[~jianhe], [~vinodkv], 

Unless I'm missing something, Anubhav was working on this JIRA. It is great 
that Jian did the refactoring to have common code for the schedulers and some 
testcases for it, but most of the work has been done by Anubhav and he was 
working actively on it. We should reassign the JIRA back to Anubhav and let him 
drive it to completion, agree?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-13 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997071#comment-13997071
 ] 

Jian He commented on YARN-1368:
---

[~tucu00], Please understand that the patch uploaded here is completely 
different from the prototype patch on YARN-556. The patch here is using a 
different approach to cover all schedulers and also the whole container 
recovery flow is different which simplifies things a lot. This jira itself was 
originally opened as “RM should populate running container allocation 
information from NM resync” and did not cover recovering the schedulers. I 
should have opened a new jira to express the approaches instead of renaming 
this one to avoid confusion.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Jian He
 Attachments: YARN-1368.1.patch, YARN-1368.combined.001.patch, 
 YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991571#comment-13991571
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], thanks for your reply, only one comment on your reply, others make 
sense to me,
bq. That’s half way code, we eventually want to merge the common code of 
FSSchedulerNode and FicaSchedulerNode into a common base AbstractSchedulerNode.
I think is it better to finish YARN-2017 before commit this JIRA? Otherwise we 
will have lots of additional merge works to do.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.1.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-07 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13991571#comment-13991571
 ] 

Wangda Tan commented on YARN-1368:
--

[~jianhe], thanks for your reply, only one comment on your reply, others make 
sense to me,
bq. That’s half way code, we eventually want to merge the common code of 
FSSchedulerNode and FicaSchedulerNode into a common base AbstractSchedulerNode.
I think is it better to finish YARN-2017 before commit this JIRA? Otherwise we 
will have lots of additional merge works to do.


 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.1.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-05 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990233#comment-13990233
 ] 

Wangda Tan commented on YARN-1368:
--

Hi [~jianhe], thanks for this patch, I'm agree with major strategies. But I've 
some comments and questions,

In AbstractYarnScheduler:recoverContainersOnNode
{code}
+  if (rmApp.getApplicationSubmissionContext().getUnmanagedAM()) {
+if (LOG.isDebugEnabled()) {
+  LOG.debug(Skip recovering container  + status
+  +  for unmanaged AM. + rmApp.getApplicationId());
+}
+continue;
+  }
{code}
Why we don't recover container in unmanaged AM case? In my understand, no 
matter it's managed or unmanaged AM, the recover process should be same. Is 
there any difference between them?

Should this be included in schedulerAttempt.recoverContainer(...)?
{code}
+  // recover app scheduling info
+  schedulerAttempt.appSchedulingInfo.recoverContainer(rmContainer);
{code}

In AppSchedulingInfo.recoverContainer(...)
{code}
+QueueMetrics metrics = queue.getMetrics();
+if (pending) {
+  // If there was any running containers, the application was
+  // running from scheduler's POV.
+  pending = false;
+  metrics.runAppAttempt(applicationId, user);
+}
+if (rmContainer.getState().equals(RMContainerState.COMPLETED)) {
+  return;
+}
+metrics.allocateResources(user, 1, Resource.newInstance(1024, 1), false);
{code}
Should this be a part of queue.recoverContainer(...)? Is it better to create 
QueueMetrics.recoverContainer(...)?

In CapacityScheduler,
{code}
-CollectionFiCaSchedulerNode nodes = cs.getAllNodes().values();
+CollectionSchedulerNode nodes = cs.getAllNodes().values();
{code}
Could you elaborate why do this and a series of change between SchedulerNode 
and FiCaSchedulerNode? Not really understand.

For recoverContainer in queue, should we do top-down (recover from root queue) 
or bottom-up (recover from leaf queue). I found in the patch it's bottom-up, 
should this be decided by scheduler implementation?

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.1.patch, YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13987155#comment-13987155
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

bq. I think the prototype is doing as a scheduler-specifc fashion and only for 
the FairScheduler. It'll be a maintenance overhead if we implement each 
scheduler separately.
This makes sense. As much as possible, RM restart should be orthogonal to 
schedulers. Further, where possible, we should try to reuse code within the 
schedulers. That way we won't miss stuff like metrics, scheduler calculations, 
resource consumption etc.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-01 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13987182#comment-13987182
 ] 

Sandy Ryza commented on YARN-1368:
--

+1 to sharing common code between schedulers where possible.  However, the 
patch proposed on this JIRA leaves two separate accountings for the nodes in 
the cluster - one inside AbstractYarnScheduler and one inside the specific 
schedulers.  As [~leftnoteasy] pointed out, if we want to merge scheduler code, 
it's probably best to do so in a patch that's separate from adding the 
re-population logic.

A meta-comment - [~jianhe], if you have concerns about an approach / prototype 
it's better to provide feedback than to rewrite it yourself.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-05-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13987190#comment-13987190
 ] 

Vinod Kumar Vavilapalli commented on YARN-1368:
---

bq. Further, where possible, we should try to reuse code within the schedulers.
To clarify, what I meant was not reuse across schedulers (which BTW is in 
itself a good goal), but reusing existing code that is inside schedulers to 
also do stuff when recovering from a restart. Anyways..

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-04-30 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986213#comment-13986213
 ] 

Wangda Tan commented on YARN-1368:
--

Thanks [~jianhe] for this proposal, I think recover container from NM heartbeat 
is a reasonable way, +1 for general ideas,
Some minor comments,
bq. Noticed that FiCaSchedulerNode and FSSchedulerNode are almost the same. Any 
reason for keeping both ? thinking to merge the common methods into 
SchedulerNode.
Currently IMO, we'd better keep both. To avoid involving too much parts in this 
JIRA, we can separate the merge common logic of them to a new task.
bq. ContainerStatus sent in NM registration doesn’t capture enough information 
for re-constructing the containers. we may replace that with a new object or 
just adding more fields to encapsulate all the necessary information for 
re-constructing the container.
Personally I think create a new type specialized for container recovering is 
better, ContainerStatus is also used in node heartbeat. Including too much 
fields in each heartbeat isn't safe or efficient

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-04-30 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986222#comment-13986222
 ] 

Anubhav Dhoot commented on YARN-1368:
-

Hi [~jianhe], I have spent a bunch time of time thinking on this and related 
issues and have covered a bunch of these in the prototype on 
[YARN-556|https://issues.apache.org/jira/browse/YARN-556]. Lets sync up so we 
avoid duplicated effort.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-04-30 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986270#comment-13986270
 ] 

Jian He commented on YARN-1368:
---

Hi [~adhoot],  sure. Thanks for sharing the prototype. I looked at the patch.
I think the prototype  is doing as a scheduler-specifc fashion and only for the 
FairScheduler. It'll be a maintenance  overhead if we implement each scheduler 
separately.  The prototype recovers a portion of the FairScheduler by reusing 
some existing APIs,  but I think missed some other state for the scheduler 
attempt, queue metrics, schedulerNode.
The patch submitted here is to experiment  a generic approach for recovering 
all scheduler state including all entities(metrics, schedulerNode etc,)  with 
no or minimum scheduler-specific changes.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1368.preliminary.patch


 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985151#comment-13985151
 ] 

Jian He commented on YARN-1368:
---

Hi [~adhoot], mind if I take this over ? I have a preliminary patch which does 
the bulk of the work. I can upload very soon. thx

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1368) Common work to re-populate containers’ state into scheduler

2014-04-29 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985164#comment-13985164
 ] 

Jian He commented on YARN-1368:
---

It's good to have a scheduler-agnostic way to recover the containers and all 
the other scheduler states for app/attempts. Renamed the title to do this.

 Common work to re-populate containers’ state into scheduler
 ---

 Key: YARN-1368
 URL: https://issues.apache.org/jira/browse/YARN-1368
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

 YARN-1367 adds support for the NM to tell the RM about all currently running 
 containers upon registration. The RM needs to send this information to the 
 schedulers along with the NODE_ADDED_EVENT so that the schedulers can recover 
 the current allocation state of the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)