[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-12 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028868#comment-14028868
 ] 

Zhijie Shen commented on YARN-2075:
---

[~shahrs87], thanks for your explanation.

+1 for the fix here. I'll commit the patch. We can also consider doing 
something similar to what is done for the namenode.

 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}
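The UnsupportedOperationException above is the classic symptom of calling remove() on a fixed-size list view such as the one returned by Arrays.asList. Below is a minimal, self-contained sketch of that failure mode and of the defensive-copy workaround; the variable names are illustrative, and this is not the actual HAAdmin code or the committed patch.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FixedSizeListDemo {
  public static void main(String[] args) {
    // Arrays.asList returns a fixed-size view backed by the array, so
    // remove() on it (or on its iterator) throws UnsupportedOperationException.
    List<String> targetIds = Arrays.asList("rm1", "rm2");
    try {
      targetIds.remove("rm1");
    } catch (UnsupportedOperationException e) {
      System.out.println("fixed-size list: " + e);
    }

    // A defensive copy into a mutable list makes remove() legal.
    List<String> mutableIds = new ArrayList<String>(targetIds);
    mutableIds.remove("rm1");
    System.out.println("after removal: " + mutableIds);
  }
}
{code}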



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028911#comment-14028911
 ] 

Hudson commented on YARN-2075:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5689 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5689/])
YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji 
Kikushima. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602071)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Fix For: 2.5.0

 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-06-12 Thread Kenji Kikushima (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kenji Kikushima updated YARN-1480:
--

Attachment: YARN-1480-5.patch

Rebased on trunk. Thanks for setting target version, [~zjshen].

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
 YARN-1480-5.patch, YARN-1480.patch


 Nowadays the RM web service's getApps() accepts many more filters than the 
 ApplicationCLI list command, which only accepts state and type. IMHO, 
 ideally, different interfaces should provide consistent functionality. Would it 
 be better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029024#comment-14029024
 ] 

Hadoop QA commented on YARN-1480:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650030/YARN-1480-5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3970//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3970//console

This message is automatically generated.

 RM web services getApps() accepts many more filters than ApplicationCLI 
 list command
 --

 Key: YARN-1480
 URL: https://issues.apache.org/jira/browse/YARN-1480
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, 
 YARN-1480-5.patch, YARN-1480.patch


 Nowadays the RM web service's getApps() accepts many more filters than the 
 ApplicationCLI list command, which only accepts state and type. IMHO, 
 ideally, different interfaces should provide consistent functionality. Would it 
 be better to allow more filters in ApplicationCLI?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029047#comment-14029047
 ] 

Hudson commented on YARN-2148:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/581/])
YARN-2148. TestNMClient failed due more exit code values added and passed to AM 
(Wangda Tan via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602043)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


 TestNMClient failed due more exit code values added and passed to AM
 

 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2148.patch


 Currently, TestNMClient fails in trunk; see 
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}
 Test cases in TestNMClient use the following code to verify the exit code of 
 COMPLETED containers:
 {code}
   testGetContainerStatus(container, i, ContainerState.COMPLETE,
       "Container killed by the ApplicationMaster.", Arrays.asList(
           new Integer[] {137, 143, 0}));
 {code}
 But YARN-2091 added logic to make the exit code reflect the actual status, so 
 the exit code of a container killed by the ApplicationMaster will be -105:
 {code}
   if (container.hasDefaultExitCode()) {
     container.exitCode = exitEvent.getExitCode();
   }
 {code}
 We should update the test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029045#comment-14029045
 ] 

Hudson commented on YARN-2125:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/581/])
YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug 
level. Contributed by Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601980)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
 ---

 Key: YARN-2125
 URL: https://issues.apache.org/jira/browse/YARN-2125
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2125.patch, YARN-2125.patch


 Currently, the output of logToCSV() is written using LOG.info() in 
 ProportionalCapacityPreemptionPolicy, which generates non-human-readable 
 text in the resource manager's log every few seconds, like:
 {code}
 ...
 2014-06-05 15:57:07,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 2014-06-05 15:57:10,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 ...
 {code}
 It's better to output it only when debug logging is enabled.
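For illustration only, not the committed YARN-2125 patch: the usual commons-logging pattern for emitting such CSV lines only at debug level, with a hypothetical stand-in for the logToCSV() method named in the description.
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugOnlyCsvLogging {
  private static final Log LOG = LogFactory.getLog(DebugOnlyCsvLogging.class);

  // Hypothetical stand-in for ProportionalCapacityPreemptionPolicy.logToCSV().
  private static String logToCSV() {
    return "QUEUESTATE: 1401955027603, a1, 4096, 3, ...";
  }

  public static void main(String[] args) {
    // Guarding with isDebugEnabled() skips both the CSV construction and the
    // log call unless debug logging is enabled for this class.
    if (LOG.isDebugEnabled()) {
      LOG.debug(logToCSV());
    }
  }
}
{code}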



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029044#comment-14029044
 ] 

Hudson commented on YARN-2124:
--

SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/581/])
YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by 
Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601964)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy cannot work because it's initialized 
 before scheduler initialized
 --

 Key: YARN-2124
 URL: https://issues.apache.org/jira/browse/YARN-2124
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.5.0

 Attachments: YARN-2124.patch, YARN-2124.patch


 When I played with the scheduler with preemption enabled, I found that 
 ProportionalCapacityPreemptionPolicy cannot work. An NPE is raised when the RM 
 starts:
 {code}
 2014-06-05 11:01:33,201 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw 
 an Exception.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 This is caused by ProportionalCapacityPreemptionPolicy needing the 
 ResourceCalculator from CapacityScheduler, but 
 ProportionalCapacityPreemptionPolicy gets initialized before the CapacityScheduler 
 is initialized. So the ResourceCalculator is still null in 
 ProportionalCapacityPreemptionPolicy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029175#comment-14029175
 ] 

Hudson commented on YARN-2124:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/])
YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by 
Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601964)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy cannot work because it's initialized 
 before scheduler initialized
 --

 Key: YARN-2124
 URL: https://issues.apache.org/jira/browse/YARN-2124
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.5.0

 Attachments: YARN-2124.patch, YARN-2124.patch


 When I played with the scheduler with preemption enabled, I found that 
 ProportionalCapacityPreemptionPolicy cannot work. An NPE is raised when the RM 
 starts:
 {code}
 2014-06-05 11:01:33,201 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw 
 an Exception.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 This is caused by ProportionalCapacityPreemptionPolicy needing the 
 ResourceCalculator from CapacityScheduler, but 
 ProportionalCapacityPreemptionPolicy gets initialized before the CapacityScheduler 
 is initialized. So the ResourceCalculator is still null in 
 ProportionalCapacityPreemptionPolicy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029177#comment-14029177
 ] 

Hudson commented on YARN-2075:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/])
YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji 
Kikushima. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602071)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Fix For: 2.5.0

 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029176#comment-14029176
 ] 

Hudson commented on YARN-2125:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/])
YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug 
level. Contributed by Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601980)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
 ---

 Key: YARN-2125
 URL: https://issues.apache.org/jira/browse/YARN-2125
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2125.patch, YARN-2125.patch


 Currently, the output of logToCSV() is written using LOG.info() in 
 ProportionalCapacityPreemptionPolicy, which generates non-human-readable 
 text in the resource manager's log every few seconds, like:
 {code}
 ...
 2014-06-05 15:57:07,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 2014-06-05 15:57:10,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 ...
 {code}
 It's better to output it only when debug logging is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029178#comment-14029178
 ] 

Hudson commented on YARN-2148:
--

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/])
YARN-2148. TestNMClient failed due more exit code values added and passed to AM 
(Wangda Tan via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602043)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


 TestNMClient failed due more exit code values added and passed to AM
 

 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2148.patch


 Currently, TestNMClient fails in trunk; see 
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}
 Test cases in TestNMClient use the following code to verify the exit code of 
 COMPLETED containers:
 {code}
   testGetContainerStatus(container, i, ContainerState.COMPLETE,
       "Container killed by the ApplicationMaster.", Arrays.asList(
           new Integer[] {137, 143, 0}));
 {code}
 But YARN-2091 added logic to make the exit code reflect the actual status, so 
 the exit code of a container killed by the ApplicationMaster will be -105:
 {code}
   if (container.hasDefaultExitCode()) {
     container.exitCode = exitEvent.getExitCode();
   }
 {code}
 We should update the test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts

2014-06-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029188#comment-14029188
 ] 

Wangda Tan commented on YARN-1885:
--

[~jlowe], you're right. The old node will be replaced by the new node in 
ReconnectNodeTransition. My bad, sorry about that.

 RM may not send the finished signal to some nodes where the application ran 
 after RM restarts
 -

 Key: YARN-1885
 URL: https://issues.apache.org/jira/browse/YARN-1885
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Wangda Tan
 Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, 
 YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch


 During our HA testing we have seen cases where YARN application logs are not 
 available through the CLI, but I can look at AM logs through the UI. The RM was 
 also being restarted in the background while the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-12 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He reassigned YARN-2147:
-

Assignee: Chen He

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor

 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error. Only the message of the exception is conveyed to the client, 
 which sometimes isn't enough to debug.
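As a hedged sketch of the gap described here (not the eventual patch; the helper name below is made up): the difference between forwarding only getMessage() and forwarding the full stringified exception, cause chain included, as diagnostics.
{code}
import java.io.PrintWriter;
import java.io.StringWriter;

public class DiagnosticsDemo {
  // Hypothetical helper: build a diagnostics string that keeps the cause chain
  // and stack trace instead of only the top-level message.
  static String fullDiagnostics(Throwable t) {
    StringWriter sw = new StringWriter();
    t.printStackTrace(new PrintWriter(sw, true));
    return sw.toString();
  }

  public static void main(String[] args) {
    Exception renewFailure = new RuntimeException("token renewal failed",
        new IllegalStateException("no renewer configured"));
    System.out.println("message only: " + renewFailure.getMessage());
    System.out.println("full diagnostics:\n" + fullDiagnostics(renewFailure));
  }
}
{code}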



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029247#comment-14029247
 ] 

Hudson commented on YARN-2124:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/])
YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by 
Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601964)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy cannot work because it's initialized 
 before scheduler initialized
 --

 Key: YARN-2124
 URL: https://issues.apache.org/jira/browse/YARN-2124
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.5.0

 Attachments: YARN-2124.patch, YARN-2124.patch


 When I played with the scheduler with preemption enabled, I found that 
 ProportionalCapacityPreemptionPolicy cannot work. An NPE is raised when the RM 
 starts:
 {code}
 2014-06-05 11:01:33,201 ERROR 
 org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
 Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw 
 an Exception.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82)
 at java.lang.Thread.run(Thread.java:744)
 {code}
 This is caused by ProportionalCapacityPreemptionPolicy needing the 
 ResourceCalculator from CapacityScheduler, but 
 ProportionalCapacityPreemptionPolicy gets initialized before the CapacityScheduler 
 is initialized. So the ResourceCalculator is still null in 
 ProportionalCapacityPreemptionPolicy.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029248#comment-14029248
 ] 

Hudson commented on YARN-2125:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/])
YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug 
level. Contributed by Wangda Tan (jianhe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1601980)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java


 ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
 ---

 Key: YARN-2125
 URL: https://issues.apache.org/jira/browse/YARN-2125
 Project: Hadoop YARN
  Issue Type: Task
  Components: resourcemanager, scheduler
Affects Versions: 3.0.0
Reporter: Wangda Tan
Assignee: Wangda Tan
Priority: Minor
 Fix For: 2.5.0

 Attachments: YARN-2125.patch, YARN-2125.patch


 Currently, the output of logToCSV() is written using LOG.info() in 
 ProportionalCapacityPreemptionPolicy, which generates non-human-readable 
 text in the resource manager's log every few seconds, like:
 {code}
 ...
 2014-06-05 15:57:07,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 2014-06-05 15:57:10,603 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy:
   QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 
 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 
 3072, 2, 3072, 2, 0, 0, 0, 0
 ...
 {code}
 It's better to output it only when debug logging is enabled.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029250#comment-14029250
 ] 

Hudson commented on YARN-2148:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/])
YARN-2148. TestNMClient failed due more exit code values added and passed to AM 
(Wangda Tan via bikas) (bikas: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602043)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java


 TestNMClient failed due more exit code values added and passed to AM
 

 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2148.patch


 Currently, TestNMClient fails in trunk; see 
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}
 Test cases in TestNMClient use the following code to verify the exit code of 
 COMPLETED containers:
 {code}
   testGetContainerStatus(container, i, ContainerState.COMPLETE,
       "Container killed by the ApplicationMaster.", Arrays.asList(
           new Integer[] {137, 143, 0}));
 {code}
 But YARN-2091 added logic to make the exit code reflect the actual status, so 
 the exit code of a container killed by the ApplicationMaster will be -105:
 {code}
   if (container.hasDefaultExitCode()) {
     container.exitCode = exitEvent.getExitCode();
   }
 {code}
 We should update the test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029249#comment-14029249
 ] 

Hudson commented on YARN-2075:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/])
YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji 
Kikushima. (zjshen: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1602071)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java


 TestRMAdminCLI consistently fail on trunk and branch-2
 --

 Key: YARN-2075
 URL: https://issues.apache.org/jira/browse/YARN-2075
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Kenji Kikushima
 Fix For: 2.5.0

 Attachments: YARN-2075.patch


 {code}
 Running org.apache.hadoop.yarn.client.TestRMAdminCLI
 Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec 
  FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI
 testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time 
 elapsed: 0.082 sec   ERROR!
 java.lang.UnsupportedOperationException: null
   at java.util.AbstractList.remove(AbstractList.java:144)
   at java.util.AbstractList$Itr.remove(AbstractList.java:360)
   at java.util.AbstractCollection.remove(AbstractCollection.java:252)
   at 
 org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173)
   at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144)
   at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447)
   at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380)
   at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180)
 testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI)  Time elapsed: 0.088 
 sec   FAILURE!
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366)
   at 
 org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029270#comment-14029270
 ] 

Tsuyoshi OZAWA commented on YARN-2148:
--



[~zjshen] [~leftnoteasy], thank you for the points.


{quote}
Previously, I have 0 here because it is possible that the container finishes so 
quickly that the kill command hasn't even been processed.

However, CLEANUP_CONTAINER is executed on another thread. Before it is 
executed, the container has already exited normally, with exit code 0.
{quote}

I'm checking whether this case can happen. Please wait a moment.

{quote}
It seems that we still have 137 and 143 in ExitCode. We need to make sure the 
container will not exit with these two codes here.
{quote}

Is this because the signal is sent from 
{{ContainerLaunch#cleanupContainer()}} (SIGTERM) and 
{{DelayedProcessKiller#run()}} (SIGKILL)? If so, 
{{ContainerImpl#exitCode}} is set in {{KillTransition#transition}} before 
the container is signaled. Therefore, both cases are covered.

{code}
  @SuppressWarnings("unchecked") // dispatcher not typed
  public void cleanupContainer() throws IOException {
    ...
    final Signal signal = sleepDelayBeforeSigKill > 0
        ? Signal.TERM
        : Signal.KILL;
  }
{code}
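A small aside, not from this thread: on Linux a process killed by a signal conventionally exits with 128 plus the signal number, which is where the 137 (SIGKILL = 9) and 143 (SIGTERM = 15) in the test's expected set come from. The enum below is illustrative, not Hadoop's Signal enum.
{code}
public class SignalExitCodes {
  // Illustrative subset of signals; not the real NodeManager Signal enum.
  enum Sig {
    KILL(9), TERM(15);

    final int number;

    Sig(int number) {
      this.number = number;
    }
  }

  public static void main(String[] args) {
    for (Sig s : Sig.values()) {
      // Shell convention: exit code of a signal-killed process is 128 + signal number.
      System.out.println("SIG" + s + " -> exit code " + (128 + s.number));
    }
  }
}
{code}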



 TestNMClient failed due more exit code values added and passed to AM
 

 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2148.patch


 Currently, TestNMClient fails in trunk; see 
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}
 Test cases in TestNMClient use the following code to verify the exit code of 
 COMPLETED containers:
 {code}
   testGetContainerStatus(container, i, ContainerState.COMPLETE,
       "Container killed by the ApplicationMaster.", Arrays.asList(
           new Integer[] {137, 143, 0}));
 {code}
 But YARN-2091 added logic to make the exit code reflect the actual status, so 
 the exit code of a container killed by the ApplicationMaster will be -105:
 {code}
   if (container.hasDefaultExitCode()) {
     container.exitCode = exitEvent.getExitCode();
   }
 {code}
 We should update the test case as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not been assigned more containers when its usedResource had reach the maxResource limit

2014-06-12 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029273#comment-14029273
 ] 

Wei Yan commented on YARN-2083:
---

Thanks, [~tianyi]. Here are more comments.
Could we move the test code to a new file, TestFSQueue.java, since the function 
under test is located in FSQueue?

{code}
boolean couldAssignMoreContainer = schedulable.assignContainerPreCheck(
    new FSSchedulerNode(fakeNode, true));
{code}
We don't need to create a new FSSchedulerNode each time. Just create one and reuse it.

Some nitpicky comments: (1) For comment style, I'd prefer "// Test the..." 
instead of "//test the..."; you can check the other comments in the code.
(2) No need to create couldAssignMoreContainer; just put the 
assignContainerPreCheck call directly inside the assertTrue/assertFalse (see the sketch below).
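To make the two suggestions concrete, here is a hedged JUnit 4 sketch. Apart from the assignContainerPreCheck name taken from the snippet above, the classes are stand-in fakes rather than the real FSQueue/FSSchedulerNode, and the real test would live in the proposed TestFSQueue.java.
{code}
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Before;
import org.junit.Test;

public class TestFSQueuePreCheckSketch {
  // Assumed stand-ins for the scheduler objects used in the real test.
  private FakeQueue schedulable;
  private FakeNode node;

  @Before
  public void setUp() {
    // Create the node once and reuse it across checks, as suggested.
    node = new FakeNode();
    schedulable = new FakeQueue();
  }

  @Test
  public void testAssignContainerPreCheck() {
    // Put the pre-check call directly inside the assertion, no temp boolean.
    assertTrue(schedulable.assignContainerPreCheck(node));
    schedulable.setOverMaxResource(true);
    assertFalse(schedulable.assignContainerPreCheck(node));
  }

  // Minimal fakes so the sketch compiles on its own.
  static class FakeNode {
  }

  static class FakeQueue {
    private boolean overMax;

    void setOverMaxResource(boolean overMax) {
      this.overMax = overMax;
    }

    boolean assignContainerPreCheck(FakeNode node) {
      return !overMax;
    }
  }
}
{code}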

 In fair scheduler, Queue should not been assigned more containers when its 
 usedResource had reach the maxResource limit
 ---

 Key: YARN-2083
 URL: https://issues.apache.org/jira/browse/YARN-2083
 Project: Hadoop YARN
  Issue Type: Bug
  Components: scheduler
Affects Versions: 2.3.0
Reporter: Yi Tian
  Labels: assignContainer, fair, scheduler
 Fix For: 2.4.1

 Attachments: YARN-2083-1.patch, YARN-2083.patch


 In the fair scheduler, FSParentQueue and FSLeafQueue do an 
 assignContainerPreCheck to guarantee the queue is not over its limit.
 But the fitsIn function in Resource.java does not return false when the 
 usedResource equals the maxResource.
 I think we should create a new function, fitsInWithoutEqual, to use instead of 
 fitsIn in this case.
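A minimal sketch of what such a strict variant could look like, using a simplified stand-in resource type with only memory and vcores; this illustrates the proposal, not the actual change to the Hadoop code.
{code}
public class StrictFitsInSketch {
  // Simplified stand-in for a YARN Resource (memory + vcores only).
  static final class Res {
    final long memory;
    final int vcores;

    Res(long memory, int vcores) {
      this.memory = memory;
      this.vcores = vcores;
    }
  }

  // Existing semantics: usage that exactly equals the limit still "fits".
  static boolean fitsIn(Res smaller, Res bigger) {
    return smaller.memory <= bigger.memory && smaller.vcores <= bigger.vcores;
  }

  // Proposed strict variant: reaching the limit no longer counts as fitting,
  // so a queue already at maxResource would not pass the pre-check.
  static boolean fitsInWithoutEqual(Res smaller, Res bigger) {
    return smaller.memory < bigger.memory && smaller.vcores < bigger.vcores;
  }

  public static void main(String[] args) {
    Res used = new Res(4096, 4);
    Res max = new Res(4096, 4);
    System.out.println(fitsIn(used, max));             // true
    System.out.println(fitsInWithoutEqual(used, max)); // false
  }
}
{code}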



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN

2014-06-12 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029320#comment-14029320
 ] 

haosdent commented on YARN-1964:


Cool feature!

 Create Docker analog of the LinuxContainerExecutor in YARN
 --

 Key: YARN-1964
 URL: https://issues.apache.org/jira/browse/YARN-1964
 Project: Hadoop YARN
  Issue Type: New Feature
Affects Versions: 2.2.0
Reporter: Arun C Murthy
Assignee: Abin Shahab
 Attachments: yarn-1964-branch-2.2.0-docker.patch, 
 yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, 
 yarn-1964-docker.patch


 Docker (https://www.docker.io/) is, increasingly, a very popular container 
 technology.
 In the context of YARN, support for Docker will provide a very elegant 
 solution to allow applications to *package* their software into a Docker 
 container (entire Linux file system incl. custom versions of perl, python, 
 etc.) and use it as a blueprint to launch all their YARN containers with the 
 requisite software environment. This provides both consistency (all YARN 
 containers will have the same software environment) and isolation (no 
 interference with whatever is installed on the physical machine).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Wei Yan (JIRA)
Wei Yan created YARN-2155:
-

 Summary: FairScheduler: Incorrect check when trigger a preemption
 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2155:
--

Description: 
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAvailableVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}

preemptionUtilizationThreshold should be compared with allocatedResource 
instead of availableResource.
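For reference, a hedged sketch of the check compared against the allocated share rather than the available share. The metrics are passed in as plain numbers so the sketch runs on its own; whether the attached patch looks exactly like this is not shown here.
{code}
public class PreemptionCheckSketch {
  // Hedged rewrite of shouldAttemptPreemption(): preempt only when the
  // allocated (used) share of the cluster exceeds the threshold.
  static boolean shouldAttemptPreemption(boolean preemptionEnabled,
      float preemptionUtilizationThreshold,
      long allocatedMB, long clusterMB,
      int allocatedVCores, int clusterVCores) {
    if (!preemptionEnabled) {
      return false;
    }
    float usedShare = Math.max(
        (float) allocatedMB / clusterMB,
        (float) allocatedVCores / clusterVCores);
    return preemptionUtilizationThreshold < usedShare;
  }

  public static void main(String[] args) {
    // 80% of memory allocated with a 0.75 threshold -> consider preemption.
    System.out.println(shouldAttemptPreemption(true, 0.75f, 8192, 10240, 4, 10));
    // Only 10% allocated -> no preemption attempt.
    System.out.println(shouldAttemptPreemption(true, 0.75f, 1024, 10240, 1, 10));
  }
}
{code}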

 FairScheduler: Incorrect check when trigger a preemption
 

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan

 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2155:
--

Description: 
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAvailableVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}

preemptionUtilizationThreshold should be compared with allocatedResource 
instead of availableResource.

  was:
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAvailableVirtualCores() /
            clusterResource.getVirtualCores()));
  }
  return false;
}
{code}

preemptionUtilizationThreshold should be compared with allocatedResource 
instead of availableResource.


 FairScheduler: Incorrect check when trigger a preemption
 

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan

 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Wei Yan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Yan updated YARN-2155:
--

Attachment: YARN-2155.patch

 FairScheduler: Incorrect check when trigger a preemption
 

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2155.patch


 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2137) Add support for logaggregation to a path on non-default filecontext

2014-06-12 Thread Sumit Kumar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029436#comment-14029436
 ] 

Sumit Kumar commented on YARN-2137:
---

Thanks for pointing these out, [~vinodkv]. These make sense :-) I'm working on 
them right now.

 Add support for logaggregation to a path on non-default filecontext
 ---

 Key: YARN-2137
 URL: https://issues.apache.org/jira/browse/YARN-2137
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: log-aggregation
Affects Versions: 2.4.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
 Attachments: YARN-2137.patch


 The current log-aggregation implementation supports log aggregation to the default 
 filecontext only. This patch is to support log aggregation to any of the 
 supported filesystems within the Hadoop ecosystem (hdfs, s3, swiftfs, etc.). So, 
 for example, a customer could use hdfs as the default filesystem but use s3 or 
 swiftfs for log aggregation. The current implementation makes mixed use of the 
 FileContext+AbstractFileSystem APIs as well as the FileSystem APIs, which is 
 confusing.
 This patch does three things (see the sketch after this list):
 # moves the log-aggregation implementation to use only FileContext APIs
 # adds support for doing log aggregation on a non-default filesystem as well
 # changes TestLogAggregationService to use the local filesystem itself instead of 
 mocking the behavior
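A hedged illustration of item 2: writing through a FileContext resolved from the log directory's own URI instead of the default filesystem. The cluster address and paths are made up, and this is not the attached patch.
{code}
import java.net.URI;
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class NonDefaultFsAggregationSketch {
  public static void main(String[] args) throws Exception {
    // Resolve the FileContext from the remote log dir's URI (here a second,
    // non-default HDFS cluster) rather than from the default filesystem.
    URI remoteLogDir = new URI("hdfs://archive-cluster:8020/app-logs");
    FileContext fc = FileContext.getFileContext(remoteLogDir);

    Path logFile = new Path(remoteLogDir.toString(),
        "application_0001/node1.log");
    FSDataOutputStream out = fc.create(logFile,
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE));
    try {
      out.writeUTF("aggregated container logs would be written here");
    } finally {
      out.close();
    }
  }
}
{code}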



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM

2014-06-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029504#comment-14029504
 ] 

Tsuyoshi OZAWA commented on YARN-2148:
--

{quote}
The race condition I've observed before is that KillTransition is executed, and 
the diagnostics info has been added. However, CLEANUP_CONTAINER is executed 
on another thread. Before it is executed, the container has already exited 
normally, with exit code 0.
{quote}

This was a race condition between the thread which executes CLEANUP_CONTAINER in 
the ContainerLauncher and KillTransition. {{ContainerImpl#exitCode}} is set in 
{{KillTransition}} after YARN-2091. Therefore, the case of exit code 0 
doesn't occur, and it's also covered by [~leftnoteasy]'s patch. I think 
it's a consistent change.

{quote}
One more concern: ContainerExitStatus is a public class. YARN-2091 seems to be an 
incompatible change, while the old code has been used for a while.
{quote}

YARN-2091 introduces new ContainerExitStatus values. If old code uses an old jar 
from before YARN-2091, the new exit codes should be handled as INVALID or unknown 
exit codes. IMHO, we should announce this to YARN application creators at release 
time. One option is adding documentation which describes this.
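Not from the thread, just a hedged sketch of the defensive handling suggested above: an AM written against the pre-YARN-2091 exit codes treats unrecognized statuses as unknown instead of assuming the old 137/143/0 set. The -105 value is taken from this issue's description; the other constants are local stand-ins, not the real ContainerExitStatus fields.
{code}
public class ExitStatusHandlingSketch {
  // Local stand-in constants; -105 is the post-YARN-2091 code quoted in this
  // issue for a container killed by the ApplicationMaster.
  static final int SUCCESS = 0;
  static final int KILLED_BY_APPMASTER = -105;

  static String classify(int exitStatus) {
    switch (exitStatus) {
      case SUCCESS:
        return "completed normally";
      case KILLED_BY_APPMASTER:
        return "killed by the ApplicationMaster";
      case 137: // 128 + SIGKILL
      case 143: // 128 + SIGTERM
        return "killed by a signal";
      default:
        // Exit codes added in newer releases land here instead of breaking the AM.
        return "unknown exit status " + exitStatus;
    }
  }

  public static void main(String[] args) {
    System.out.println(classify(-105));
    System.out.println(classify(-106));
  }
}
{code}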

 TestNMClient failed due more exit code values added and passed to AM
 

 Key: YARN-2148
 URL: https://issues.apache.org/jira/browse/YARN-2148
 Project: Hadoop YARN
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0, 2.5.0
Reporter: Wangda Tan
Assignee: Wangda Tan
 Fix For: 2.5.0

 Attachments: YARN-2148.patch


 Currently, TestNMClient fails in trunk; see 
 https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/
 {code}
 java.lang.AssertionError: null
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347)
   at 
 org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226)
 {code}
 Test cases in TestNMClient use the following code to verify the exit code of 
 COMPLETED containers
 {code}
    testGetContainerStatus(container, i, ContainerState.COMPLETE,
        "Container killed by the ApplicationMaster.", Arrays.asList(
            new Integer[] {137, 143, 0}));
 {code}
 But YARN-2091 added logic to make the exit code reflect the actual status, so 
 the exit code of containers killed by the ApplicationMaster will be -105:
 {code}
   if (container.hasDefaultExitCode()) {
      container.exitCode = exitEvent.getExitCode();
   }
 {code}
 We should update the test case as well.
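 One possible adjustment to the expectation, shown only as an illustration (the 
 exact accepted set depends on the final patch), is to also accept the new code:
 {code}
   // Illustrative only: additionally accept the exit code the discussion above
   // attributes to "killed by the ApplicationMaster" after YARN-2091.
   testGetContainerStatus(container, i, ContainerState.COMPLETE,
       "Container killed by the ApplicationMaster.", Arrays.asList(
           new Integer[] {137, 143, -105}));
 {code}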



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029514#comment-14029514
 ] 

Hadoop QA commented on YARN-2155:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650080/YARN-2155.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3971//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3971//console

This message is automatically generated.

 FairScheduler: Incorrect check when trigger a preemption
 

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2155.patch


 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.
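 For reference, a hedged sketch of what the corrected check could look like, 
 comparing the threshold against the allocated (utilization) ratio; 
 getAllocatedMB()/getAllocatedVirtualCores() are the existing QueueMetrics 
 counterparts of the getters above:
 {code}
 // Sketch of the intended check: attempt preemption only when utilization
 // (allocated / cluster total) exceeds the configured threshold.
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAllocatedVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}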



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1919) Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE

2014-06-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029520#comment-14029520
 ] 

Jian He commented on YARN-1919:
---

lgtm, +1

 Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE
 --

 Key: YARN-1919
 URL: https://issues.apache.org/jira/browse/YARN-1919
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.3.0, 2.4.0
Reporter: Devaraj K
Assignee: Tsuyoshi OZAWA
Priority: Minor
 Attachments: YARN-1919.1.patch


 {code:xml}
 2014-04-09 16:14:16,392 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:122)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1038)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029539#comment-14029539
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

[~jianhe], I think it's OK after the fencing operation, but one problem is that 
{{recover()}} is invoked before the fencing. My idea to deal with the problem 
is as follows (sketched below):

1. The active RM stores the current epoch value.
2. After the failover, the new active RM recovers the epoch and recognizes the 
epoch value as {{epoch + 1}}.
3. The new active RM issues {{fence()}} on ZKRMStateStore and increments the epoch.
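
A rough sketch of the proposed sequence, purely illustrative (loadEpoch and 
fenceAndStoreEpoch are hypothetical placeholders, not existing RMStateStore methods):
{code}
// Illustrative pseudo-implementation of the three steps above.
public final class EpochRecoverySketch {
  interface StateStore {
    long loadEpoch() throws Exception;                    // hypothetical
    void fenceAndStoreEpoch(long epoch) throws Exception; // hypothetical
  }

  public static long recoverEpoch(StateStore store) throws Exception {
    long lastEpoch = store.loadEpoch(); // step 1: value stored by the previously active RM
    long myEpoch = lastEpoch + 1;       // step 2: new active RM assumes epoch + 1
    store.fenceAndStoreEpoch(myEpoch);  // step 3: fence the store and persist the new epoch
    return myEpoch;
  }
}
{code}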

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services

2014-06-12 Thread Varun Vasudev (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-1702:


Attachment: apache-yarn-1702.14.patch

{quote}
RMWebService.hasAppAcess() is not used anywhere.
By default, we don't have any filters, and so all the writable web-services 
get an 'unauthorized' errors. This seems reasonable, but let's document it.
{quote}

Fixed.
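
As a usage illustration (not part of the patch), a client killing an app through 
the new endpoint could look roughly like this; the RM host, port and application 
id are placeholders:
{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KillAppSketch {
  // Illustrative client: PUT the target state to the app's /state resource.
  public static int killApp(String appId) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/" + appId + "/state");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);             // request the KILLED target state
    }
    return conn.getResponseCode(); // 200/202 on success, 401/403 if unauthorized
  }
}
{code}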

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, 
 apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, 
 apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029578#comment-14029578
 ] 

Jian He commented on YARN-2052:
---

bq.  but one problem is recover() is invoked before the fencing
I didn't get you. After checking the code, isn't fencing invoked before recover?

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029590#comment-14029590
 ] 

Hadoop QA commented on YARN-1885:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650015/YARN-1885.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3972//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3972//console

This message is automatically generated.

 RM may not send the finished signal to some nodes where the application ran 
 after RM restarts
 -

 Key: YARN-1885
 URL: https://issues.apache.org/jira/browse/YARN-1885
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Arpit Gupta
Assignee: Wangda Tan
 Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, 
 YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch


 During our HA testing we have seen cases where yarn application logs are not 
 available through the CLI but I can look at AM logs through the UI. The RM was 
 also being restarted in the background as the application was running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029595#comment-14029595
 ] 

Jian He commented on YARN-2152:
---

Probably we can add the extra to-be-recovered container information to 
ContainerTokenIdentifier as a payload, which will then be sent to the NM on 
container launch.

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

 Container information such as container priority and container start time 
 cannot be recovered because NM container today lacks such container 
 information to send across on NM registration when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-12 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029619#comment-14029619
 ] 

Tsuyoshi OZAWA commented on YARN-2052:
--

[~jianhe], my bad, you're right. I missed that the RMStateStore is registered as 
a service of the RM. Then we don't need the tricky approach I described.


 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-06-12 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA reassigned YARN-2052:


Assignee: Tsuyoshi OZAWA

 ContainerId creation after work preserving restart is broken
 

 Key: YARN-2052
 URL: https://issues.apache.org/jira/browse/YARN-2052
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA

 Container ids are made unique by using the app identifier and appending a 
 monotonically increasing sequence number to it. Since container creation is a 
 high churn activity the RM does not store the sequence number per app. So 
 after restart it does not know what the new sequence number should be for new 
 allocations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029647#comment-14029647
 ] 

Hadoop QA commented on YARN-1702:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12650103/apache-yarn-1702.14.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3973//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3973//console

This message is automatically generated.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, 
 apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, 
 apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues

2014-06-12 Thread Andrey Stepachev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029768#comment-14029768
 ] 

Andrey Stepachev commented on YARN-2151:


Actually there is not much code for the preemption itself; most of it is about 
Min Share. So this patch can be applied (after rb, of course) and should not 
contradict or interfere with future changes in container preemption logic.

 FairScheduler option for global preemption within hierarchical queues
 -

 Key: YARN-2151
 URL: https://issues.apache.org/jira/browse/YARN-2151
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Andrey Stepachev
 Attachments: YARN-2151.patch


 FairScheduler has hierarchical queues, but fair share calculation and 
 preemption still work within a limited range and are effectively still 
 non-hierarchical.
 This patch addresses this incompleteness in two aspects:
 1. Currently MinShare is not propagated to the upper queue, which means the
 fair share calculation ignores all Min Shares in deeper queues. 
 Let's take an example
 (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues)
 {code}
 <?xml version="1.0"?>
 <allocations>
   <queue name="queue1">
     <maxResources>10240mb, 10vcores</maxResources>
     <queue name="big"/>
     <queue name="sub1">
       <schedulingPolicy>fair</schedulingPolicy>
       <queue name="sub11">
         <minResources>6192mb, 6vcores</minResources>
       </queue>
     </queue>
     <queue name="sub2">
     </queue>
   </queue>
 </allocations>
 {code}
 Then bigApp is started within queue1.big with 10x1GB containers.
 That effectively eats all of the maximum allowed resources for queue1.
 Subsequent requests for app1 (queue1.sub1.sub11) and 
 app2 (queue1.sub2) (5x1GB each) will wait for free resources. 
 Note that sub11 has a min share requirement of 6x1GB.
 Without this patch, fair share will be calculated with no knowledge 
 of min share requirements, and app1 and app2 will get an equal 
 number of containers.
 With the patch, resources will be split according to min share (in the test
 it will be 5 for app1 and 1 for app2).
 That behaviour is controlled by the same parameter as 'globalPreemption',
 but that can be changed easily.
 The implementation is a bit awkward, but it seems the method for min share
 recalculation can be exposed as a public or protected api and the constructor
 in FSQueue can call it before using the minShare getter. But right now
 the current implementation with nulls should work too.
 2. Preemption doesn't work between queues on different levels of the
 queue hierarchy. Moreover, it is not possible to override various 
 parameters for children queues. 
 This patch adds the parameter 'globalPreemption', which enables the global 
 preemption algorithm modifications.
 In a nutshell, the patch adds a function shouldAttemptPreemption(queue),
 which can calculate usage for nested queues; if a queue with usage above 
 the specified threshold is found, preemption can be triggered.
 The aggregated minShare does the rest of the work, and preemption will work
 as expected within a hierarchy of queues with different MinShare/MaxShare
 specifications on different levels.
 Test case TestFairScheduler#testGlobalPreemption depicts how it works.
 One big app gets resources above its fair share and app1 has a declared
 min share. On submission the code finds that starvation and preempts enough
 containers to give app1 enough room.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-12 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2147:
--

Attachment: YARN-2147.patch

I made changes to log all tokens when the RM fails to renew them. 
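
Conceptually the change is along these lines; a hedged sketch (not the attached 
patch) of surfacing the token list when renewal fails:
{code}
import java.io.IOException;
import java.util.Collection;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.security.token.Token;

public class TokenRenewLoggingSketch {
  private static final Log LOG = LogFactory.getLog(TokenRenewLoggingSketch.class);

  // Sketch: wrap a renewal failure so the tokens involved appear in the
  // diagnostics that flow back to the submitting client.
  public static IOException describeFailure(Collection<Token<?>> tokens,
      String appId, Exception cause) {
    String msg = "Failed to renew tokens " + tokens + " for application " + appId;
    LOG.warn(msg, cause);
    return new IOException(msg, cause);
  }
}
{code}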

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration

2014-06-12 Thread Svetozar (JIRA)
Svetozar created YARN-2156:
--

 Summary: ApplicationMasterService#serviceStart() method has 
hardcoded AuthMethod.TOKEN as security configuration
 Key: YARN-2156
 URL: https://issues.apache.org/jira/browse/YARN-2156
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Svetozar


org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart()
 method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security 
authentication. 

It looks like that:


{code}

@Override
  protected void serviceStart() throws Exception {
Configuration conf = getConfig();
YarnRPC rpc = YarnRPC.create(conf);

InetSocketAddress masterServiceAddress = conf.getSocketAddr(
YarnConfiguration.RM_SCHEDULER_ADDRESS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);

Configuration serverConf = conf;
// If the auth is not-simple, enforce it to be token-based.
serverConf = new Configuration(conf);
serverConf.set(
CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
SaslRpcServer.AuthMethod.TOKEN.toString());


...
}
{code}

Obviously such code makes sense only if 
CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is 
missing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption

2014-06-12 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029792#comment-14029792
 ] 

Karthik Kambatla commented on YARN-2155:


Good catch [~octo47]. Thanks for fixing it, Wei.

The fix looks good to me. +1. 

 FairScheduler: Incorrect check when trigger a preemption
 

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Attachments: YARN-2155.patch


 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration

2014-06-12 Thread Jian He (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029796#comment-14029796
 ] 

Jian He commented on YARN-2156:
---

Svetozar, I think this is expected because the AMRMToken today is used in both 
secure and non-secure environments.

 ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN 
 as security configuration
 ---

 Key: YARN-2156
 URL: https://issues.apache.org/jira/browse/YARN-2156
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Svetozar Ivanov

 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart()
  method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security 
 authentication. 
 It looks like that:
 {code}
 @Override
   protected void serviceStart() throws Exception {
 Configuration conf = getConfig();
 YarnRPC rpc = YarnRPC.create(conf);
 InetSocketAddress masterServiceAddress = conf.getSocketAddr(
 YarnConfiguration.RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
 Configuration serverConf = conf;
 // If the auth is not-simple, enforce it to be token-based.
 serverConf = new Configuration(conf);
 serverConf.set(
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
 SaslRpcServer.AuthMethod.TOKEN.toString());
 
 ...
 }
 {code}
 Obviously such code makes sense only if 
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting 
 is missing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-06-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029850#comment-14029850
 ] 

Vinod Kumar Vavilapalli commented on YARN-1702:
---

Looks good, checking this in.

 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, 
 apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, 
 apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2155) FairScheduler: Incorrect threshold check for preemption

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029868#comment-14029868
 ] 

Hudson commented on YARN-2155:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5695 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5695/])
YARN-2155. FairScheduler: Incorrect threshold check for preemption. (Wei Yan 
via kasha) (kasha: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602295)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java


 FairScheduler: Incorrect threshold check for preemption
 ---

 Key: YARN-2155
 URL: https://issues.apache.org/jira/browse/YARN-2155
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Wei Yan
Assignee: Wei Yan
 Fix For: 2.5.0

 Attachments: YARN-2155.patch


 {code}
 private boolean shouldAttemptPreemption() {
   if (preemptionEnabled) {
     return (preemptionUtilizationThreshold < Math.max(
         (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(),
         (float) rootMetrics.getAvailableVirtualCores() /
             clusterResource.getVirtualCores()));
   }
   return false;
 }
 {code}
 preemptionUtilizationThreshold should be compared with allocatedResource 
 instead of availableResource.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029871#comment-14029871
 ] 

Hadoop QA commented on YARN-2147:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650136/YARN-2147.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3974//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3974//console

This message is automatically generated.

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services

2014-06-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029885#comment-14029885
 ] 

Hudson commented on YARN-1702:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5696 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5696/])
YARN-1702. Added kill app functionality to RM web services. Contributed by 
Varun Vasudev. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602298)
* /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppState.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java
* 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm


 Expose kill app functionality as part of RM web services
 

 Key: YARN-1702
 URL: https://issues.apache.org/jira/browse/YARN-1702
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Varun Vasudev
Assignee: Varun Vasudev
 Fix For: 2.5.0

 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, 
 apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, 
 apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, 
 apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, 
 apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch


 Expose functionality to kill an app via the ResourceManager web services API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-12 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He updated YARN-2147:
--

Attachment: YARN-2147-v2.patch

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration

2014-06-12 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029913#comment-14029913
 ] 

Daryn Sharp commented on YARN-2156:
---

Yes, this is by design.  Yarn uses tokens regardless of your security setting.

 ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN 
 as security configuration
 ---

 Key: YARN-2156
 URL: https://issues.apache.org/jira/browse/YARN-2156
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Svetozar Ivanov

 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart()
  method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security 
 authentication. 
 It looks like that:
 {code}
 @Override
   protected void serviceStart() throws Exception {
 Configuration conf = getConfig();
 YarnRPC rpc = YarnRPC.create(conf);
 InetSocketAddress masterServiceAddress = conf.getSocketAddr(
 YarnConfiguration.RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
 Configuration serverConf = conf;
 // If the auth is not-simple, enforce it to be token-based.
 serverConf = new Configuration(conf);
 serverConf.set(
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
 SaslRpcServer.AuthMethod.TOKEN.toString());
 
 ...
 }
 {code}
 Obviously such code makes sense only if 
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting 
 is missing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration

2014-06-12 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp resolved YARN-2156.
---

Resolution: Not a Problem

 ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN 
 as security configuration
 ---

 Key: YARN-2156
 URL: https://issues.apache.org/jira/browse/YARN-2156
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Svetozar Ivanov

 org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart()
  method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security 
 authentication. 
 It looks like that:
 {code}
 @Override
   protected void serviceStart() throws Exception {
 Configuration conf = getConfig();
 YarnRPC rpc = YarnRPC.create(conf);
 InetSocketAddress masterServiceAddress = conf.getSocketAddr(
 YarnConfiguration.RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
 YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
 Configuration serverConf = conf;
 // If the auth is not-simple, enforce it to be token-based.
 serverConf = new Configuration(conf);
 serverConf.set(
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION,
 SaslRpcServer.AuthMethod.TOKEN.toString());
 
 ...
 }
 {code}
 Obviously such code makes sense only if 
 CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting 
 is missing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029968#comment-14029968
 ] 

Hadoop QA commented on YARN-2147:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650154/YARN-2147-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3975//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3975//console

This message is automatically generated.

 client lacks delegation token exception details when application submit fails
 -

 Key: YARN-2147
 URL: https://issues.apache.org/jira/browse/YARN-2147
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Jason Lowe
Assignee: Chen He
Priority: Minor
 Attachments: YARN-2147-v2.patch, YARN-2147.patch


 When a client submits an application and the delegation token process fails, 
 the client can lack critical details needed to understand the nature of the 
 error.  Only the message of the error exception is conveyed to the client, 
 which sometimes isn't enough to debug.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (YARN-2157) Document YARN metrics

2014-06-12 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created YARN-2157:
---

 Summary: Document YARN metrics
 Key: YARN-2157
 URL: https://issues.apache.org/jira/browse/YARN-2157
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA


YARN-side of HADOOP-6350.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2157) Document YARN metrics

2014-06-12 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated YARN-2157:


Description: YARN-side of HADOOP-6350. Add YARN metrics to Metrics 
document.  (was: YARN-side of HADOOP-6350.)

 Document YARN metrics
 -

 Key: YARN-2157
 URL: https://issues.apache.org/jira/browse/YARN-2157
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA

 YARN-side of HADOOP-6350. Add YARN metrics to Metrics document.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers

2014-06-12 Thread Anubhav Dhoot (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030016#comment-14030016
 ] 

Anubhav Dhoot commented on YARN-1367:
-

I will send an updated patch shortly

 After restart NM should resync with the RM without killing containers
 -

 Key: YARN-1367
 URL: https://issues.apache.org/jira/browse/YARN-1367
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot
 Attachments: YARN-1367.prototype.patch


 After RM restart, the RM sends a resync response to NMs that heartbeat to it. 
  Upon receiving the resync response, the NM kills all containers and 
 re-registers with the RM. The NM should be changed to not kill the container 
 and instead inform the RM about all currently running containers including 
 their allocations etc. After the re-register, the NM should send all pending 
 container completions to the RM as usual.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues

2014-06-12 Thread Ashwin Shankar (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030057#comment-14030057
 ] 

Ashwin Shankar commented on YARN-2151:
--

Hi [~octo47],
Couple of questions :
1. The first problem you are trying to solve is minResources of a queue not 
getting accounted for in the fair share calculation of its ancestry.
Can't we solve this simply by configuring 'weight' properly at the parents? 
Here is what I'm talking about. I tested the following config out in my 3-node 
cluster of 40G and it works:
{code:xml}
<?xml version="1.0"?>
<allocations>
  <queue name="queue1">
    <maxResources>10240mb, 10vcores</maxResources>
    <weight>9</weight>
    <queue name="big"/>
    <queue name="sub1">
      <weight>7</weight>
      <schedulingPolicy>fair</schedulingPolicy>
      <queue name="sub11">
        <minResources>6192mb, 6vcores</minResources>
      </queue>
    </queue>
    <queue name="sub2">
    </queue>
  </queue>
</allocations>
{code}

2. I checked out your testcase TestFairScheduler#testGlobalPreemption. In a 
nutshell you are testing two things - problem 1 described above
and whether you are inheriting minSharePreemptionTimeout from queue1 to sub11. 
I have a couple of questions on this:
 a. Why don't you want to define minSharePreemptionTimeout in sub11 
itself, since you are configuring it anyway?
 b. What if someone doesn't want minSharePreemptionTimeout to be inherited and 
would rather use the global default?

 FairScheduler option for global preemption within hierarchical queues
 -

 Key: YARN-2151
 URL: https://issues.apache.org/jira/browse/YARN-2151
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: fairscheduler
Reporter: Andrey Stepachev
 Attachments: YARN-2151.patch


 FairScheduler has hierarchical queues, but fair share calculation and 
 preemption still work within a limited range and are effectively still 
 non-hierarchical.
 This patch addresses this incompleteness in two aspects:
 1. Currently MinShare is not propagated to the upper queue, which means the
 fair share calculation ignores all Min Shares in deeper queues. 
 Let's take an example
 (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues)
 {code}
 <?xml version="1.0"?>
 <allocations>
   <queue name="queue1">
     <maxResources>10240mb, 10vcores</maxResources>
     <queue name="big"/>
     <queue name="sub1">
       <schedulingPolicy>fair</schedulingPolicy>
       <queue name="sub11">
         <minResources>6192mb, 6vcores</minResources>
       </queue>
     </queue>
     <queue name="sub2">
     </queue>
   </queue>
 </allocations>
 {code}
 Then bigApp is started within queue1.big with 10x1GB containers.
 That effectively eats all of the maximum allowed resources for queue1.
 Subsequent requests for app1 (queue1.sub1.sub11) and 
 app2 (queue1.sub2) (5x1GB each) will wait for free resources. 
 Note that sub11 has a min share requirement of 6x1GB.
 Without this patch, fair share will be calculated with no knowledge 
 of min share requirements, and app1 and app2 will get an equal 
 number of containers.
 With the patch, resources will be split according to min share (in the test
 it will be 5 for app1 and 1 for app2).
 That behaviour is controlled by the same parameter as 'globalPreemption',
 but that can be changed easily.
 The implementation is a bit awkward, but it seems the method for min share
 recalculation can be exposed as a public or protected api and the constructor
 in FSQueue can call it before using the minShare getter. But right now
 the current implementation with nulls should work too.
 2. Preemption doesn't work between queues on different levels of the
 queue hierarchy. Moreover, it is not possible to override various 
 parameters for children queues. 
 This patch adds the parameter 'globalPreemption', which enables the global 
 preemption algorithm modifications.
 In a nutshell, the patch adds a function shouldAttemptPreemption(queue),
 which can calculate usage for nested queues; if a queue with usage above 
 the specified threshold is found, preemption can be triggered.
 The aggregated minShare does the rest of the work, and preemption will work
 as expected within a hierarchy of queues with different MinShare/MaxShare
 specifications on different levels.
 Test case TestFairScheduler#testGlobalPreemption depicts how it works.
 One big app gets resources above its fair share and app1 has a declared
 min share. On submission the code finds that starvation and preempts enough
 containers to give app1 enough room.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2152:
--

Attachment: YARN-2152.1.patch

Uploaded a patch (rough flow sketched below):
- Added two more fields (priority and createTime) to ContainerTokenIdentifier.
- The NM will populate the extra container information in NMContainerStatus based 
on the information carried by the ContainerTokenIdentifier. 
- I feel createTime is more appropriate than startTime because it indicates the 
time when the container is created.
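
A rough illustration of that flow; the getters getPriority()/getCreationTime() 
and the NMContainerStatus.newInstance(...) arity are assumptions of the sketch, 
not verified signatures:
{code}
import org.apache.hadoop.yarn.api.records.ContainerState;
import org.apache.hadoop.yarn.security.ContainerTokenIdentifier;
import org.apache.hadoop.yarn.server.api.protocolrecords.NMContainerStatus;

public class RecoveredStatusSketch {
  // Sketch: on NM registration after an RM restart, rebuild the to-be-recovered
  // fields from the container token identifier received at container launch.
  public static NMContainerStatus toStatus(ContainerTokenIdentifier id,
      ContainerState state, String diagnostics, int exitCode) {
    return NMContainerStatus.newInstance(
        id.getContainerID(),
        state,
        id.getResource(),
        diagnostics,
        exitCode,
        id.getPriority(),       // recovered container priority
        id.getCreationTime());  // recovered create time
  }
}
{code}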

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered because NM container today lacks such container 
 information to send across on NM registration when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy

2014-06-12 Thread Mayank Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030143#comment-14030143
 ] 

Mayank Bansal commented on YARN-2022:
-

Hi [~vinodkv]

What you are saying makes sense and I agree with that; however, I think we still 
need this patch, as it will ensure AM containers get the least priority for killing.

Thoughts?

[~sunilg] Thanks for the patch.

Here are some high-level comments:

{code}
+  public static final String SKIP_AM_CONTAINER_FROM_PREEMPTION = 
+      "yarn.resourcemanager.monitor.capacity.preemption.skip_am_container";
{code}
Please run the formatter; this doesn't seem to fit the standard line length.

{code}
+skipAMContainer = config.getBoolean(SKIP_AM_CONTAINER_FROM_PREEMPTION,
+false);
{code}
By default it should be true, as we always want the AM to have the least priority 
(see the sketch after this comment).

Did you run the test on the cluster?

Thanks,
Mayank
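
To make that point concrete, a hedged sketch (not the attached patch) of deferring 
AM containers when ordering preemption victims, with the skip flag defaulting to 
true; the isAM predicate stands in for however the policy identifies AM containers:
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PreemptionOrderingSketch {
  // Sketch: order candidates so AM containers are only considered last.
  public static <C> List<C> orderVictims(List<C> candidates, Predicate<C> isAM,
      boolean skipAMContainer) {
    List<C> victims = new ArrayList<>();
    List<C> ams = new ArrayList<>();
    for (C c : candidates) {
      if (skipAMContainer && isAM.test(c)) {
        ams.add(c);          // defer AM containers
      } else {
        victims.add(c);
      }
    }
    victims.addAll(ams);     // AMs are taken only if nothing else is left
    return victims;
  }
}
{code}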

 Preempting an Application Master container can be kept as least priority when 
 multiple applications are marked for preemption by 
 ProportionalCapacityPreemptionPolicy
 -

 Key: YARN-2022
 URL: https://issues.apache.org/jira/browse/YARN-2022
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Sunil G
Assignee: Sunil G
 Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, 
 YARN-2022.3.patch, YARN-2022.4.patch, Yarn-2022.1.patch


 Cluster Size = 16GB [2NM's]
 Queue A Capacity = 50%
 Queue B Capacity = 50%
 Consider there are 3 applications running in Queue A which has taken the full 
 cluster capacity. 
 J1 = 2GB AM + 1GB * 4 Maps
 J2 = 2GB AM + 1GB * 4 Maps
 J3 = 2GB AM + 1GB * 2 Maps
 Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ].
 Currently in this scenario, Jobs J3 will get killed including its AM.
 It is better if AM can be given least priority among multiple applications. 
 In this same scenario, map tasks from J3 and J2 can be preempted.
 Later when cluster is free, maps can be allocated to these Jobs.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2152) Recover missing container information

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030150#comment-14030150
 ] 

Hadoop QA commented on YARN-2152:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650196/YARN-2152.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests:

  org.apache.hadoop.yarn.TestContainerLaunchRPC
  org.apache.hadoop.yarn.TestRPC
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync
  org.apache.hadoop.yarn.server.nodemanager.TestEventFlow
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServer
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown
  
org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater
  
org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager
  
org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3976//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3976//console

This message is automatically generated.

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered because NM container today lacks such container 
 information to send across on NM registration when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2152:
--

Attachment: YARN-2152.1.patch

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered because NM container today lacks such container 
 information to send across on NM registration when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2152:
--

Attachment: (was: YARN-2152.1.patch)

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered because NM container today lacks such container 
 information to send across on NM registration when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2152:
--

Attachment: YARN-2152.1.patch

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered, because the container report that the NM sends on 
 registration today lacks this information when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2014-06-12 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030175#comment-14030175
 ] 

Beckham007 commented on YARN-2140:
--

I think we could use the net_cls subsystem of cgroups to handle this.
First, org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler 
would need to be refactored to support various resource types, not only CPU.
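
A minimal sketch of that classid-tagging idea, assuming a hypothetical helper 
class; the cgroup mount point, hierarchy name, and method names below are 
illustrative and are not part of CgroupsLCEResourcesHandler today.

{code}
// Hypothetical sketch: tag a container's processes with a net_cls classid so
// that tc filters can match its traffic. Paths and names are illustrative.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NetClsTagger {
  // Assumed net_cls mount point; real deployments may differ.
  private static final String NET_CLS_ROOT = "/sys/fs/cgroup/net_cls/hadoop-yarn";

  /**
   * Creates a per-container net_cls cgroup and writes the classid
   * (tc handle major:minor encoded as (major << 16) | minor).
   */
  public static void tagContainer(String containerId, int major, int minor)
      throws IOException {
    Path cgroup = Paths.get(NET_CLS_ROOT, containerId);
    Files.createDirectories(cgroup);
    long classid = ((long) major << 16) | minor;
    Files.write(cgroup.resolve("net_cls.classid"), Long.toString(classid).getBytes());
  }

  /** Moves a container process into the cgroup so its packets carry the classid. */
  public static void addProcess(String containerId, int pid) throws IOException {
    Path tasks = Paths.get(NET_CLS_ROOT, containerId, "tasks");
    Files.write(tasks, (pid + "\n").getBytes());
  }
}
{code}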

 Add support for network IO isolation/scheduling for containers
 --

 Key: YARN-2140
 URL: https://issues.apache.org/jira/browse/YARN-2140
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2152) Recover missing container information

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030201#comment-14030201
 ] 

Hadoop QA commented on YARN-2152:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650210/YARN-2152.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3977//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3977//console

This message is automatically generated.

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered, because the container report that the NM sends on 
 registration today lacks this information when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2014-06-12 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030220#comment-14030220
 ] 

haosdent commented on YARN-2140:


net_cls only classifies packets, so cgroups alone are not enough for network IO 
isolation/scheduling. I have tried tc together with net_cls, but they did not do 
well at network IO isolation/scheduling, and sometimes had no effect on the 
packets in a flow at all.

 Add support for network IO isolation/scheduling for containers
 --

 Key: YARN-2140
 URL: https://issues.apache.org/jira/browse/YARN-2140
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2152) Recover missing container information

2014-06-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030221#comment-14030221
 ] 

Hadoop QA commented on YARN-2152:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12650213/YARN-2152.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 16 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/3978//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3978//console

This message is automatically generated.

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered, because the container report that the NM sends on 
 registration today lacks this information when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2152) Recover missing container information

2014-06-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030254#comment-14030254
 ] 

Wangda Tan commented on YARN-2152:
--

[~jianhe], thanks for working on this.
I've looked at your patch; my only comment is: can we rename startTime in 
RMContainerImpl to createTime for consistency? Otherwise, people might think 
the two have different meanings.

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch


 Container information such as container priority and container start time 
 cannot be recovered, because the container report that the NM sends on 
 registration today lacks this information when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (YARN-2152) Recover missing container information

2014-06-12 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2152:
--

Attachment: YARN-2152.2.patch

Did the rename as suggested

 Recover missing container information
 -

 Key: YARN-2152
 URL: https://issues.apache.org/jira/browse/YARN-2152
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Jian He
Assignee: Jian He
 Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch


 Container information such as container priority and container start time 
 cannot be recovered, because the container report that the NM sends on 
 registration today lacks this information when RM recovery happens



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2014-06-12 Thread Beckham007 (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030264#comment-14030264
 ] 

Beckham007 commented on YARN-2140:
--

{code}
tc class add dev ${net_dev} parent ${parent_classid} classid ${classid} htb \
  rate ${guaranteed_bandwidth}kbps ceil ${max_bandwidth}kbps
{code}
It could be used to control the min and max bandwidth of each container.
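
A minimal sketch of how an NM-side handler might issue that command, assuming it 
shells out to tc with sufficient privileges; the class name, method names, and 
handle values are illustrative, not an existing Hadoop API.

{code}
// Hypothetical sketch: build and run the suggested htb class command from Java.
// Device, handles, and privilege handling are assumptions for illustration.
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class TcClassShaper {

  /** Adds an htb class that bounds a container's egress bandwidth. */
  public static void addContainerClass(String netDev, String parentClassid,
      String classid, int guaranteedKbps, int maxKbps)
      throws IOException, InterruptedException {
    List<String> cmd = Arrays.asList(
        "tc", "class", "add", "dev", netDev,
        "parent", parentClassid, "classid", classid, "htb",
        "rate", guaranteedKbps + "kbps", "ceil", maxKbps + "kbps");
    Process p = new ProcessBuilder(cmd).inheritIO().start();
    int rc = p.waitFor();
    if (rc != 0) {
      throw new IOException("tc exited with status " + rc + " for " + cmd);
    }
  }

  public static void main(String[] args) throws Exception {
    // Example values only: guarantee ~10 Mbit/s, cap at ~50 Mbit/s on eth0.
    addContainerClass("eth0", "1:1", "1:10", 1250, 6250);
  }
}
{code}
Note that in tc, "kbps" means kilobytes per second; "kbit" expresses kilobits per second.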

 Add support for network IO isolation/scheduling for containers
 --

 Key: YARN-2140
 URL: https://issues.apache.org/jira/browse/YARN-2140
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures

2014-06-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030269#comment-14030269
 ] 

Wangda Tan commented on YARN-2074:
--

[~jianhe], I just found that this patch fails to apply on trunk; could you 
update it against trunk?
Thanks,

 Preemption of AM containers shouldn't count towards AM failures
 ---

 Key: YARN-2074
 URL: https://issues.apache.org/jira/browse/YARN-2074
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager
Reporter: Vinod Kumar Vavilapalli
Assignee: Jian He
 Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch


 One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM 
 containers getting preempted shouldn't count towards AM failures and thus 
 shouldn't eventually fail applications.
 We should explicitly handle AM container preemption/kill as a separate issue 
 and not count it towards the limit on AM failures.
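
A minimal sketch of that policy, assuming a simplified attempt counter; the class 
here is illustrative rather than the actual RMAppAttempt code, and the PREEMPTED 
constant mirrors the semantics of YARN's ContainerExitStatus.PREEMPTED.

{code}
// Hypothetical sketch of the proposed policy: a preempted AM container should
// not consume one of the application's allowed AM attempts.
public class AmFailurePolicy {
  // Mirrors the exit status YARN reports for containers killed by preemption.
  static final int PREEMPTED = -102;

  private final int maxAmAttempts;
  private int countedFailures = 0;

  public AmFailurePolicy(int maxAmAttempts) {
    this.maxAmAttempts = maxAmAttempts;
  }

  /** Called when an AM container finishes abnormally. */
  public void onAmContainerFinished(int exitStatus) {
    if (exitStatus == PREEMPTED) {
      return; // preemption is not the application's fault; do not count it
    }
    countedFailures++;
  }

  /** True when the application has exhausted its AM attempts and should fail. */
  public boolean shouldFailApplication() {
    return countedFailures >= maxAmAttempts;
  }
}
{code}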



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers

2014-06-12 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030271#comment-14030271
 ] 

Wei Yan commented on YARN-2140:
---

[~haosd...@gmail.com], [~beckham007], net_cls can be used to limit the network 
bandwidth used by each task per device. One problem here is that it is not 
easy for users to specify an accurate network bandwidth requirement for the 
application. I'm still working on the design.

 Add support for network IO isolation/scheduling for containers
 --

 Key: YARN-2140
 URL: https://issues.apache.org/jira/browse/YARN-2140
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Wei Yan
Assignee: Wei Yan





--
This message was sent by Atlassian JIRA
(v6.2#6252)