[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028868#comment-14028868 ] Zhijie Shen commented on YARN-2075: --- [~shahrs87], thanks for your explanation. +1 for the fix here. I'll commit the patch. We can also consider doing something similar to what is done for the namenode. TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
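For context on the failure above: the UnsupportedOperationException comes from calling remove() on a fixed-size List (the stack goes through AbstractCollection.remove and AbstractList$Itr.remove), which is the behavior of lists such as those returned by Arrays.asList. A minimal, self-contained sketch of that failure mode and the usual remedy (copying into a mutable list first), assuming nothing about the actual HAAdmin fix beyond the stack trace:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class FixedSizeListRemove {
  public static void main(String[] args) {
    // Arrays.asList returns a fixed-size list backed by the array; remove()
    // (directly or via its iterator) throws UnsupportedOperationException.
    Collection<String> targetIds = Arrays.asList("rm1", "rm2");
    try {
      targetIds.remove("rm1");
    } catch (UnsupportedOperationException e) {
      System.out.println("remove() is unsupported on a fixed-size list");
    }

    // Copying into a mutable collection first avoids the problem.
    Collection<String> mutable = new ArrayList<String>(targetIds);
    mutable.remove("rm1");
    System.out.println(mutable); // prints [rm2]
  }
}
{code}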
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028911#comment-14028911 ] Hudson commented on YARN-2075: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5689 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5689/]) YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji Kikushima. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Fix For: 2.5.0 Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kenji Kikushima updated YARN-1480: -- Attachment: YARN-1480-5.patch Rebased on trunk. Thanks for setting target version, [~zjshen]. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480.patch Nowadays RM web services getApps() accepts many more filters than ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.2#6252)
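If the CLI does grow more filters, the wiring would presumably go through the option parsing that ApplicationCLI already uses. A hypothetical sketch with Apache Commons CLI; the option names (appStates, appTags) and the behavior here are illustrative assumptions, not the contents of the attached patch:
{code}
import java.util.Arrays;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class AppListFilterSketch {
  public static void main(String[] args) throws ParseException {
    Options opts = new Options();
    opts.addOption("appStates", true, "Comma-separated application states to filter by");
    opts.addOption("appTags", true, "Comma-separated application tags to filter by");
    CommandLine cli = new GnuParser().parse(opts, args);
    if (cli.hasOption("appStates")) {
      String[] states = cli.getOptionValue("appStates").split(",");
      // A real implementation would pass these through to the list request.
      System.out.println("Would filter on states: " + Arrays.toString(states));
    }
  }
}
{code}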
[jira] [Commented] (YARN-1480) RM web services getApps() accepts many more filters than ApplicationCLI list command
[ https://issues.apache.org/jira/browse/YARN-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029024#comment-14029024 ] Hadoop QA commented on YARN-1480: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650030/YARN-1480-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3970//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3970//console This message is automatically generated. RM web services getApps() accepts many more filters than ApplicationCLI list command -- Key: YARN-1480 URL: https://issues.apache.org/jira/browse/YARN-1480 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-1480-2.patch, YARN-1480-3.patch, YARN-1480-4.patch, YARN-1480-5.patch, YARN-1480.patch Nowadays RM web services getApps() accepts many more filters than ApplicationCLI list command, which only accepts state and type. IMHO, ideally, different interfaces should provide consistent functionality. Is it better to allow more filters in ApplicationCLI? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM
[ https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029047#comment-14029047 ] Hudson commented on YARN-2148: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/581/]) YARN-2148. TestNMClient failed due more exit code values added and passed to AM (Wangda Tan via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java TestNMClient failed due more exit code values added and passed to AM Key: YARN-2148 URL: https://issues.apache.org/jira/browse/YARN-2148 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2148.patch Currently, TestNMClient will be failed in trunk, see https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} Test cases in TestNMClient uses following code to verify exit code of COMPLETED containers {code} testGetContainerStatus(container, i, ContainerState.COMPLETE, Container killed by the ApplicationMaster., Arrays.asList( new Integer[] {137, 143, 0})); {code} But YARN-2091 added logic to make exit code reflecting the actual status, so exit code of the killed by ApplicationMaster will be -105, {code} if (container.hasDefaultExitCode()) { container.exitCode = exitEvent.getExitCode(); } {code} We should update test case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
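Given that YARN-2091 makes a container killed by the AM report ContainerExitStatus.KILLED_BY_APPMASTER (-105), one plausible shape of the test update is simply to add that value to the accepted set. A sketch mirroring the snippet quoted above; it may differ from the committed patch:
{code}
// Accept the post-YARN-2091 exit status for AM-initiated kills alongside the
// legacy signal-derived codes; KILLED_BY_APPMASTER is -105.
testGetContainerStatus(container, i, ContainerState.COMPLETE,
    "Container killed by the ApplicationMaster.",
    Arrays.asList(new Integer[] {
        ContainerExitStatus.KILLED_BY_APPMASTER, 137, 143, 0}));
{code}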
[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
[ https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029045#comment-14029045 ] Hudson commented on YARN-2125: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/581/]) YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug level. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601980) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled --- Key: YARN-2125 URL: https://issues.apache.org/jira/browse/YARN-2125 Project: Hadoop YARN Issue Type: Task Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2125.patch, YARN-2125.patch Currently, logToCSV() will be output using LOG.info() in ProportionalCapacityPreemptionPolicy. Which will generate non-human-readable texts in resource manager's log every several seconds, like {code} ... 2014-06-05 15:57:07,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 2014-06-05 15:57:10,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 ... {code} It's better to make it output when debug enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
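The fix pattern being asked for is the standard isDebugEnabled() guard. A self-contained sketch with commons-logging, where logToCSV() is a stand-in for the policy's CSV dump (its real signature may differ):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugGuardSketch {
  private static final Log LOG = LogFactory.getLog(DebugGuardSketch.class);

  // Stand-in for the policy's CSV dump referenced in the description.
  private static String logToCSV() {
    return "1401955027603, a1, 4096, 3, 2048, 2, ...";
  }

  public static void main(String[] args) {
    // Build and emit the CSV line only when debug logging is enabled, so it
    // no longer floods the ResourceManager log at INFO level.
    if (LOG.isDebugEnabled()) {
      LOG.debug("QUEUESTATE: " + logToCSV());
    }
  }
}
{code}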
[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029044#comment-14029044 ] Hudson commented on YARN-2124: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #581 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/581/]) YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601964) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized -- Key: YARN-2124 URL: https://issues.apache.org/jira/browse/YARN-2124 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.5.0 Attachments: YARN-2124.patch, YARN-2124.patch When I play with scheduler with preemption, I found ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM start {code} 2014-06-05 11:01:33,201 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:744) {code} This is caused by ProportionalCapacityPreemptionPolicy needs ResourceCalculator from CapacityScheduler. 
But ProportionalCapacityPreemptionPolicy gets initialized before CapacityScheduler is initialized, so ResourceCalculator will be set to null in ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
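A simplified illustration of the ordering problem described above, and one way to dodge it by reading the calculator only when the policy actually runs; the types here (CapacitySchedulerLike, ResourceCalculatorLike) are placeholders, and this is not the attached patch:
{code}
// Placeholder types; illustrates the init-ordering hazard, not the real fix.
class PreemptionPolicySketch {
  interface ResourceCalculatorLike { }
  interface CapacitySchedulerLike { ResourceCalculatorLike getResourceCalculator(); }

  private final CapacitySchedulerLike scheduler;

  PreemptionPolicySketch(CapacitySchedulerLike scheduler) {
    // Caching scheduler.getResourceCalculator() here would capture null,
    // because the scheduler has not been initialized yet.
    this.scheduler = scheduler;
  }

  void editSchedule() {
    // Read the calculator at invocation time instead.
    ResourceCalculatorLike rc = scheduler.getResourceCalculator();
    if (rc == null) {
      return; // scheduler not ready yet; skip this monitoring cycle
    }
    // ... compute ideal distribution / preemption using rc ...
  }
}
{code}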
[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029175#comment-14029175 ] Hudson commented on YARN-2124: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/]) YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601964) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized -- Key: YARN-2124 URL: https://issues.apache.org/jira/browse/YARN-2124 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.5.0 Attachments: YARN-2124.patch, YARN-2124.patch When I play with scheduler with preemption, I found ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM start {code} 2014-06-05 11:01:33,201 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:744) {code} This is caused by ProportionalCapacityPreemptionPolicy needs ResourceCalculator from CapacityScheduler. 
But ProportionalCapacityPreemptionPolicy gets initialized before CapacityScheduler is initialized, so ResourceCalculator will be set to null in ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029177#comment-14029177 ] Hudson commented on YARN-2075: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/]) YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji Kikushima. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Fix For: 2.5.0 Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
[ https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029176#comment-14029176 ] Hudson commented on YARN-2125: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/]) YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug level. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601980) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled --- Key: YARN-2125 URL: https://issues.apache.org/jira/browse/YARN-2125 Project: Hadoop YARN Issue Type: Task Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2125.patch, YARN-2125.patch Currently, logToCSV() will be output using LOG.info() in ProportionalCapacityPreemptionPolicy. Which will generate non-human-readable texts in resource manager's log every several seconds, like {code} ... 2014-06-05 15:57:07,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 2014-06-05 15:57:10,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 ... {code} It's better to make it output when debug enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM
[ https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029178#comment-14029178 ] Hudson commented on YARN-2148: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1772 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1772/]) YARN-2148. TestNMClient failed due more exit code values added and passed to AM (Wangda Tan via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java TestNMClient failed due more exit code values added and passed to AM Key: YARN-2148 URL: https://issues.apache.org/jira/browse/YARN-2148 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2148.patch Currently, TestNMClient will be failed in trunk, see https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} Test cases in TestNMClient uses following code to verify exit code of COMPLETED containers {code} testGetContainerStatus(container, i, ContainerState.COMPLETE, Container killed by the ApplicationMaster., Arrays.asList( new Integer[] {137, 143, 0})); {code} But YARN-2091 added logic to make exit code reflecting the actual status, so exit code of the killed by ApplicationMaster will be -105, {code} if (container.hasDefaultExitCode()) { container.exitCode = exitEvent.getExitCode(); } {code} We should update test case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029188#comment-14029188 ] Wangda Tan commented on YARN-1885: -- [~jlowe], you're right. The old node will be replaced by the new node in ReconnectNodeTransition. My bad, sorry about that. RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where YARN application logs are not available through the CLI, but I can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He reassigned YARN-2147: - Assignee: Chen He client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor When a client submits an application and the delegation token process fails, the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
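One common way to surface more detail than ex.getMessage() is to propagate the original exception as the cause of whatever is reported back to the submitter. A small illustrative sketch; the method name and exception type here are assumptions, not the RM's actual code path:
{code}
import java.io.IOException;

public class TokenErrorPropagation {
  // Illustrative only: keep the underlying failure as the cause instead of
  // forwarding just its message, so the client sees where token handling broke.
  static void reportFailure(String appId, Exception cause) throws IOException {
    throw new IOException(
        "Delegation token handling failed for application " + appId, cause);
  }

  public static void main(String[] args) {
    try {
      reportFailure("application_0000_0001", new RuntimeException("token renewer unreachable"));
    } catch (IOException e) {
      e.printStackTrace(); // prints the full chain, including the cause
    }
  }
}
{code}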
[jira] [Commented] (YARN-2124) ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized
[ https://issues.apache.org/jira/browse/YARN-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029247#comment-14029247 ] Hudson commented on YARN-2124: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/]) YARN-2124. Fixed NPE in ProportionalCapacityPreemptionPolicy. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601964) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/ResourceManager.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/SchedulingMonitor.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/MockRM.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/TestProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy cannot work because it's initialized before scheduler initialized -- Key: YARN-2124 URL: https://issues.apache.org/jira/browse/YARN-2124 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Critical Fix For: 2.5.0 Attachments: YARN-2124.patch, YARN-2124.patch When I play with scheduler with preemption, I found ProportionalCapacityPreemptionPolicy cannot work. NPE will be raised when RM start {code} 2014-06-05 11:01:33,201 ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[SchedulingMonitor (ProportionalCapacityPreemptionPolicy),5,main] threw an Exception. java.lang.NullPointerException at org.apache.hadoop.yarn.util.resource.Resources.greaterThan(Resources.java:225) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.computeIdealResourceDistribution(ProportionalCapacityPreemptionPolicy.java:302) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.recursivelyComputeIdealAssignment(ProportionalCapacityPreemptionPolicy.java:261) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.containerBasedPreemptOrKill(ProportionalCapacityPreemptionPolicy.java:198) at org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy.editSchedule(ProportionalCapacityPreemptionPolicy.java:174) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor.invokePolicy(SchedulingMonitor.java:72) at org.apache.hadoop.yarn.server.resourcemanager.monitor.SchedulingMonitor$PreemptionChecker.run(SchedulingMonitor.java:82) at java.lang.Thread.run(Thread.java:744) {code} This is caused by ProportionalCapacityPreemptionPolicy needs ResourceCalculator from CapacityScheduler. 
But ProportionalCapacityPreemptionPolicy gets initialized before CapacityScheduler is initialized, so ResourceCalculator will be set to null in ProportionalCapacityPreemptionPolicy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2125) ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled
[ https://issues.apache.org/jira/browse/YARN-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029248#comment-14029248 ] Hudson commented on YARN-2125: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/]) YARN-2125. Changed ProportionalCapacityPreemptionPolicy to log CSV in debug level. Contributed by Wangda Tan (jianhe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1601980) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/monitor/capacity/ProportionalCapacityPreemptionPolicy.java ProportionalCapacityPreemptionPolicy should only log CSV when debug enabled --- Key: YARN-2125 URL: https://issues.apache.org/jira/browse/YARN-2125 Project: Hadoop YARN Issue Type: Task Components: resourcemanager, scheduler Affects Versions: 3.0.0 Reporter: Wangda Tan Assignee: Wangda Tan Priority: Minor Fix For: 2.5.0 Attachments: YARN-2125.patch, YARN-2125.patch Currently, logToCSV() will be output using LOG.info() in ProportionalCapacityPreemptionPolicy. Which will generate non-human-readable texts in resource manager's log every several seconds, like {code} ... 2014-06-05 15:57:07,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955027603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 2014-06-05 15:57:10,603 INFO org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy: QUEUESTATE: 1401955030603, a1, 4096, 3, 2048, 2, 4096, 3, 4096, 3, 0, 0, 0, 0, b1, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0, b2, 3072, 2, 1024, 1, 3072, 2, 3072, 2, 0, 0, 0, 0 ... {code} It's better to make it output when debug enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM
[ https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029250#comment-14029250 ] Hudson commented on YARN-2148: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/]) YARN-2148. TestNMClient failed due more exit code values added and passed to AM (Wangda Tan via bikas) (bikas: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602043) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/api/impl/TestNMClient.java TestNMClient failed due more exit code values added and passed to AM Key: YARN-2148 URL: https://issues.apache.org/jira/browse/YARN-2148 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2148.patch Currently, TestNMClient will be failed in trunk, see https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} Test cases in TestNMClient uses following code to verify exit code of COMPLETED containers {code} testGetContainerStatus(container, i, ContainerState.COMPLETE, Container killed by the ApplicationMaster., Arrays.asList( new Integer[] {137, 143, 0})); {code} But YARN-2091 added logic to make exit code reflecting the actual status, so exit code of the killed by ApplicationMaster will be -105, {code} if (container.hasDefaultExitCode()) { container.exitCode = exitEvent.getExitCode(); } {code} We should update test case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029249#comment-14029249 ] Hudson commented on YARN-2075: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1799 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1799/]) YARN-2075. Fixed the test failure of TestRMAdminCLI. Contributed by Kenji Kikushima. (zjshen: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602071) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/src/test/java/org/apache/hadoop/yarn/client/TestRMAdminCLI.java TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Fix For: 2.5.0 Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM
[ https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029270#comment-14029270 ] Tsuyoshi OZAWA commented on YARN-2148: -- [~zjshen] [~leftnoteasy], thank you for the points. {quote} Previously, I have 0 here because it is possible that the container finishes so quickly that kill command even hasn't be processed. However, CLEANUP_CONTAINER is executed on another thread. Before it is executed, the container has already exit as normal, with the exit code 0. {quote} I'm checking whether this case can happen. Please wait a moment. {quote} It seems that we still have 137 and 143 in ExitCode. We need to make sure the container will not exit with these two codes here. {quote} Is this because that the signal is sent from {{ContainerLaunch#cleanupContainer()}}(SIGTERM) and {{DelayedProcessKiller#run()}}(SIGKILL), right? If the answer is positive, {{ContainerImpl#exitCode}} is set in the {{KillTransition#transition}} before container's being signaled. Therefore, both cases are covered. {code} @SuppressWarnings(unchecked) // dispatcher not typed public void cleanupContainer() throws IOException { ... final Signal signal = sleepDelayBeforeSigKill 0 ? Signal.TERM : Signal.KILL; } {code} TestNMClient failed due more exit code values added and passed to AM Key: YARN-2148 URL: https://issues.apache.org/jira/browse/YARN-2148 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2148.patch Currently, TestNMClient will be failed in trunk, see https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} Test cases in TestNMClient uses following code to verify exit code of COMPLETED containers {code} testGetContainerStatus(container, i, ContainerState.COMPLETE, Container killed by the ApplicationMaster., Arrays.asList( new Integer[] {137, 143, 0})); {code} But YARN-2091 added logic to make exit code reflecting the actual status, so exit code of the killed by ApplicationMaster will be -105, {code} if (container.hasDefaultExitCode()) { container.exitCode = exitEvent.getExitCode(); } {code} We should update test case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
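For reference on the two legacy values in the test's expected set: 137 and 143 are the conventional shell exit codes for a process terminated by SIGKILL and SIGTERM respectively (128 plus the signal number), which matches the TERM/KILL choice in cleanupContainer() quoted above. A tiny sketch of the arithmetic:
{code}
public class SignalExitCodes {
  public static void main(String[] args) {
    // A signal-terminated process is conventionally reported as 128 + signo.
    final int SIGKILL = 9, SIGTERM = 15;
    System.out.println(128 + SIGKILL); // 137: killed with SIGKILL
    System.out.println(128 + SIGTERM); // 143: killed with SIGTERM
  }
}
{code}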
[jira] [Commented] (YARN-2083) In fair scheduler, Queue should not be assigned more containers when its usedResource has reached the maxResource limit
[ https://issues.apache.org/jira/browse/YARN-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029273#comment-14029273 ] Wei Yan commented on YARN-2083: --- Thanks, [~tianyi]. Here are more comments. Could we move the test code to a new file, TestFSQueue.java, as the evaluated function is located in FSQueue? {code} boolean couldAssignMoreContainer = schedulable.assignContainerPreCheck( new FSSchedulerNode(fakeNode, true)); {code} We don't need to create a new FSSchedulerNode each time. Just create one. Some nitty comments: (1) For comment style, I'd much prefer // Test the... instead of //test the. You can check the other comments in the code. (2) No need to create couldAssignMoreContainer; just put the assignContainerPreCheck call directly inside the assertTrue/assertFalse. In fair scheduler, Queue should not be assigned more containers when its usedResource has reached the maxResource limit --- Key: YARN-2083 URL: https://issues.apache.org/jira/browse/YARN-2083 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: Yi Tian Labels: assignContainer, fair, scheduler Fix For: 2.4.1 Attachments: YARN-2083-1.patch, YARN-2083.patch In fair scheduler, FSParentQueue and FSLeafQueue do an assignContainerPreCheck to guarantee this queue is not over its limit. But the fitsIn function in Resource.java does not return false when the usedResource equals the maxResource. I think we should create a new function fitsInWithoutEqual instead of fitsIn in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
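The description's proposal amounts to a strict variant of the fits check, so that a queue already at its maxResource is treated as full. A hedged sketch of what fitsInWithoutEqual could look like using the Resource getters of that era; this is illustrative, not the attached patch:
{code}
// Illustrative strict-fit helper: returns false once usage has reached the
// maximum, so a queue at its maxResource limit gets no further containers.
public static boolean fitsInWithoutEqual(Resource smaller, Resource bigger) {
  return smaller.getMemory() < bigger.getMemory()
      && smaller.getVirtualCores() < bigger.getVirtualCores();
}
{code}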
[jira] [Commented] (YARN-1964) Create Docker analog of the LinuxContainerExecutor in YARN
[ https://issues.apache.org/jira/browse/YARN-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029320#comment-14029320 ] haosdent commented on YARN-1964: Cool feature! Create Docker analog of the LinuxContainerExecutor in YARN -- Key: YARN-1964 URL: https://issues.apache.org/jira/browse/YARN-1964 Project: Hadoop YARN Issue Type: New Feature Affects Versions: 2.2.0 Reporter: Arun C Murthy Assignee: Abin Shahab Attachments: yarn-1964-branch-2.2.0-docker.patch, yarn-1964-branch-2.2.0-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch, yarn-1964-docker.patch Docker (https://www.docker.io/) is, increasingly, a very popular container technology. In context of YARN, the support for Docker will provide a very elegant solution to allow applications to *package* their software into a Docker container (entire Linux file system incl. custom versions of perl, python etc.) and use it as a blueprint to launch all their YARN containers with requisite software environment. This provides both consistency (all YARN containers will have the same software environment) and isolation (no interference with whatever is installed on the physical machine). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
Wei Yan created YARN-2155: - Summary: FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2155: -- Description: {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
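Following the description, the corrected check would compare the threshold against current utilization (allocated over total capacity) rather than the available share; QueueMetrics exposes getAllocatedMB() and getAllocatedVirtualCores() for this. A sketch of that shape, which may differ from the attached patch:
{code}
private boolean shouldAttemptPreemption() {
  if (preemptionEnabled) {
    // Compare the threshold against utilization (allocated / total capacity).
    return (preemptionUtilizationThreshold < Math.max(
        (float) rootMetrics.getAllocatedMB() / clusterResource.getMemory(),
        (float) rootMetrics.getAllocatedVirtualCores()
            / clusterResource.getVirtualCores()));
  }
  return false;
}
{code}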
[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2155: -- Description: {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshould should be compared with allocatedResource instead of availableResource. was: {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshould should be compared with allocatedResource instead of availableResource. FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshould should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-2155: -- Attachment: YARN-2155.patch FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshould should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2137) Add support for logaggregation to a path on non-default filecontext
[ https://issues.apache.org/jira/browse/YARN-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029436#comment-14029436 ] Sumit Kumar commented on YARN-2137: --- Thanks for pointing these out [~vinodkv]. These make sense :-) working on these right now. Add support for logaggregation to a path on non-default filecontext --- Key: YARN-2137 URL: https://issues.apache.org/jira/browse/YARN-2137 Project: Hadoop YARN Issue Type: Improvement Components: log-aggregation Affects Versions: 2.4.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Attachments: YARN-2137.patch Current log-aggregation implementation supports logaggregation to default filecontext only. This patch is to support logaggregation to any of the supported filesystems within hadoop eco-system (hdfs, s3, swiftfs etc). So for example a customer could use hdfs as default filesystem but use s3 or swiftfs for logaggregation. Current implementation makes mixed usages of FileContext+AbstractFileSystem apis as well as FileSystem apis which is confusing. This patch does two things: # moves logaggregation implementation to use only FileContext apis # adds support for doing log aggregation on non-default filesystem as well. # changes TestLogAggregationService to use local filesystem itself instead of mocking the behavior -- This message was sent by Atlassian JIRA (v6.2#6252)
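The key mechanical piece of aggregating to a non-default filesystem is resolving the FileContext from the remote log directory's own URI instead of always using the default one. A hedged sketch using the public FileContext factory methods; the helper name and surrounding wiring are assumptions:
{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;

public class LogDirFileContext {
  // Illustrative only: pick the FileContext based on the scheme of the
  // configured remote log dir, so aggregation can target e.g. an s3 or
  // swift URI while fs.defaultFS stays on HDFS.
  static FileContext getLogContext(Configuration conf, Path remoteLogDir)
      throws Exception {
    URI uri = remoteLogDir.toUri();
    return uri.getScheme() == null
        ? FileContext.getFileContext(conf)        // no scheme: default filesystem
        : FileContext.getFileContext(uri, conf);  // explicit, possibly non-default FS
  }
}
{code}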
[jira] [Commented] (YARN-2148) TestNMClient failed due more exit code values added and passed to AM
[ https://issues.apache.org/jira/browse/YARN-2148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029504#comment-14029504 ] Tsuyoshi OZAWA commented on YARN-2148: -- {quote} The race condition I've observed before is that KillTransition is executed, and the the diagnostics info has been added. However, CLEANUP_CONTAINER is executed on another thread. Before it is executed, the container has already exit as normal, with the exit code 0 {quote} This was a race condition between a thread which executes CLEANUP_CONTAINER and ContainerLauncher and KillTransition. {{ContainerImpl#exitCode}} is set in {{KillTransition}} after YARN-2091. Therefore, the case of the exit code 0 doesn't occur and it's also covered with the [~leftnoteasy]'s patch. I think it's consistent change. {quote} One more concern: ContainerExitStatus is a pubic class. YARN-2091 seems to be incompatible change, while the old code has been used for a while. {quote} YARN-2091 introduces new ContainerExitStatus. If old code uses old jar before YARN-2091, new exit code should be handled as INVALID or unknown exit code. IHMO, we should announce it for YARN application creators at the release time. One option is adding document which describe this. TestNMClient failed due more exit code values added and passed to AM Key: YARN-2148 URL: https://issues.apache.org/jira/browse/YARN-2148 Project: Hadoop YARN Issue Type: Bug Components: client Affects Versions: 3.0.0, 2.5.0 Reporter: Wangda Tan Assignee: Wangda Tan Fix For: 2.5.0 Attachments: YARN-2148.patch Currently, TestNMClient will be failed in trunk, see https://builds.apache.org/job/PreCommit-YARN-Build/3959/testReport/junit/org.apache.hadoop.yarn.client.api.impl/TestNMClient/testNMClient/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testGetContainerStatus(TestNMClient.java:385) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testContainerManagement(TestNMClient.java:347) at org.apache.hadoop.yarn.client.api.impl.TestNMClient.testNMClient(TestNMClient.java:226) {code} Test cases in TestNMClient uses following code to verify exit code of COMPLETED containers {code} testGetContainerStatus(container, i, ContainerState.COMPLETE, Container killed by the ApplicationMaster., Arrays.asList( new Integer[] {137, 143, 0})); {code} But YARN-2091 added logic to make exit code reflecting the actual status, so exit code of the killed by ApplicationMaster will be -105, {code} if (container.hasDefaultExitCode()) { container.exitCode = exitEvent.getExitCode(); } {code} We should update test case as well. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029514#comment-14029514 ] Hadoop QA commented on YARN-2155: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650080/YARN-2155.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3971//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3971//console This message is automatically generated. FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshould should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1919) Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE
[ https://issues.apache.org/jira/browse/YARN-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029520#comment-14029520 ] Jian He commented on YARN-1919: --- lgtm, +1 Log yarn.resourcemanager.cluster-id is required for HA instead of throwing NPE -- Key: YARN-1919 URL: https://issues.apache.org/jira/browse/YARN-1919 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.3.0, 2.4.0 Reporter: Devaraj K Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: YARN-1919.1.patch {code:xml} 2014-04-09 16:14:16,392 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:122) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1038) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
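As a hedged illustration of the configuration involved (not the patch itself): the NPE above disappears once yarn.resourcemanager.cluster-id is set, and the JIRA asks for a clear error like the guard sketched below instead of the raw NullPointerException. The helper name and message are assumptions made for the example; only YarnConfiguration.RM_CLUSTER_ID and YarnRuntimeException are existing YARN types.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.exceptions.YarnRuntimeException;

public class ClusterIdCheck {
  /** Fail fast with a readable message when the HA cluster id is missing. */
  static String requireClusterId(Configuration conf) {
    String clusterId = conf.get(YarnConfiguration.RM_CLUSTER_ID);
    if (clusterId == null || clusterId.isEmpty()) {
      throw new YarnRuntimeException(YarnConfiguration.RM_CLUSTER_ID
          + " is required for HA, but it is not set");
    }
    return clusterId;
  }

  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    conf.set(YarnConfiguration.RM_CLUSTER_ID, "yarn-cluster-1"); // yarn.resourcemanager.cluster-id
    System.out.println("cluster id = " + requireClusterId(conf));
  }
}
{code}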
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029539#comment-14029539 ] Tsuyoshi OZAWA commented on YARN-2052: -- [~jianhe], I think it's OK after the fencing operation, but one problem is that {{recover()}} is invoked before the fencing. My idea to deal with the problem is as follows: 1. The active RM stores the current epoch value. 2. After the fail-over, the new active RM recovers the epoch and recognizes the epoch value as {{epoch + 1}}. 3. The new active RM issues {{fence()}} on ZKRMStateStore and increments the epoch. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
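A purely hypothetical sketch of the epoch idea discussed above (the names and bit layout are illustrative, not from any YARN patch): the RM recovers an epoch from the state store, bumps it on every restart or fail-over, and folds it into newly generated container ids so they can never collide with ids handed out before the restart.

{code}
import java.util.concurrent.atomic.AtomicLong;

public class EpochBasedContainerIdGenerator {
  private final long epoch;                        // recovered from the state store, then incremented
  private final AtomicLong sequence = new AtomicLong(0);

  public EpochBasedContainerIdGenerator(long recoveredEpoch) {
    this.epoch = recoveredEpoch + 1;               // each new active RM starts a fresh epoch
  }

  /** Encode the epoch in the high bits so a restarted RM cannot reuse an id. */
  public long nextContainerId() {
    return (epoch << 40) | sequence.incrementAndGet();
  }

  public static void main(String[] args) {
    EpochBasedContainerIdGenerator gen = new EpochBasedContainerIdGenerator(3);
    System.out.println("first id of the new epoch: " + gen.nextContainerId());
  }
}
{code}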
[jira] [Updated] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Vasudev updated YARN-1702: Attachment: apache-yarn-1702.14.patch {quote} RMWebService.hasAppAcess() is not used anywhere. By default, we don't have any filters, and so all the writable web-services get an 'unauthorized' errors. This seems reasonable, but let's document it. {quote} Fixed. Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
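To make the feature concrete, here is a minimal client-side sketch of the web-service call this JIRA adds: a PUT to /ws/v1/cluster/apps/{app-id}/state with a JSON body of {"state":"KILLED"}. The host, port and application id below are placeholders, and the response-code handling (200 when the app is already in the target state, 202 while the kill is in progress) reflects my reading of the documentation added by the patch.

{code}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KillAppViaRmWebServices {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://rm-host:8088/ws/v1/cluster/apps/"
        + "application_1402793634158_0001/state");        // placeholder application id
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    byte[] body = "{\"state\":\"KILLED\"}".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());  // expect 202 (accepted) or 200
  }
}
{code}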
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029578#comment-14029578 ] Jian He commented on YARN-2052: --- bq. but one problem is recover() is invoked before the fencing I didn't get you. After checking the code, isn't fencing invoked before recover? ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1885) RM may not send the finished signal to some nodes where the application ran after RM restarts
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029590#comment-14029590 ] Hadoop QA commented on YARN-1885: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650015/YARN-1885.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3972//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3972//console This message is automatically generated. RM may not send the finished signal to some nodes where the application ran after RM restarts - Key: YARN-1885 URL: https://issues.apache.org/jira/browse/YARN-1885 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Wangda Tan Attachments: YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch, YARN-1885.patch During our HA testing we have seen cases where yarn application logs are not available through the cli but i can look at AM logs through the UI. RM was also being restarted in the background as the application was running. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029595#comment-14029595 ] Jian He commented on YARN-2152: --- Probably we can add extra to-be-recovered container information in ContainerTokenIdentifier as a payload and that'll be sent to NM on container launch. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029619#comment-14029619 ] Tsuyoshi OZAWA commented on YARN-2052: -- [~jianhe], my bad, you're right. I misread that RMStore is registered as a service of RM. Then we don't need such a tricky way I described. ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2052) ContainerId creation after work preserving restart is broken
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-2052: Assignee: Tsuyoshi OZAWA ContainerId creation after work preserving restart is broken Key: YARN-2052 URL: https://issues.apache.org/jira/browse/YARN-2052 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Tsuyoshi OZAWA Assignee: Tsuyoshi OZAWA Container ids are made unique by using the app identifier and appending a monotonically increasing sequence number to it. Since container creation is a high churn activity the RM does not store the sequence number per app. So after restart it does not know what the new sequence number should be for new allocations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029647#comment-14029647 ] Hadoop QA commented on YARN-1702: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650103/apache-yarn-1702.14.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3973//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3973//console This message is automatically generated. Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues
[ https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029768#comment-14029768 ] Andrey Stepachev commented on YARN-2151: Actually there is not much code for the preemption itself; it is more about Min Share. So, this patch can be applied (after rb of course) and should not contradict or interfere with future changes in container preemption logic. FairScheduler option for global preemption within hierarchical queues - Key: YARN-2151 URL: https://issues.apache.org/jira/browse/YARN-2151 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Andrey Stepachev Attachments: YARN-2151.patch FairScheduler has hierarchical queues, but fair share calculation and preemption still work within a limited range and are effectively still nonhierarchical. This patch solves this incompleteness in two aspects: 1. Currently MinShare is not propagated to the upper queue, which leads the fair share calculation to ignore all Min Shares in deeper queues. Let's take an example (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues) {code} <?xml version="1.0"?> <allocations> <queue name="queue1"> <maxResources>10240mb, 10vcores</maxResources> <queue name="big"/> <queue name="sub1"> <schedulingPolicy>fair</schedulingPolicy> <queue name="sub11"> <minResources>6192mb, 6vcores</minResources> </queue> </queue> <queue name="sub2"> </queue> </queue> </allocations> {code} Then bigApp is started within queue1.big with 10x1GB containers. That effectively eats all of the maximum allowed resources for queue1. Subsequent requests for app1 (queue1.sub1.sub11) and app2 (queue1.sub2) (5x1GB each) will wait for free resources. Note that sub11 has a min share requirement of 6x1GB. Without the patch, fair share is calculated with no knowledge of the min share requirements and app1 and app2 get an equal number of containers. With the patch, resources are split according to min share (in the test it is 5 for app1 and 1 for app2). That behaviour is controlled by the same parameter, ‘globalPreemption’, but that can be changed easily. The implementation is a bit awkward, but it seems the method for min share recalculation could be exposed as a public or protected API, and the FSQueue constructor could call it before the minShare getter is used. But right now the current implementation with nulls should work too. 2. Preemption doesn't work between queues on different levels of the queue hierarchy. Moreover, it is not possible to override various parameters for children queues. This patch adds the parameter ‘globalPreemption’, which enables the modified global preemption algorithm. In a nutshell, the patch adds a function shouldAttemptPreemption(queue), which can calculate usage for nested queues; if a queue with usage above the specified threshold is found, preemption can be triggered. Aggregated minShare does the rest of the work, and preemption will work as expected within a hierarchy of queues with different MinShare/MaxShare specifications on different levels. Test case TestFairScheduler#testGlobalPreemption depicts how it works. One big app gets resources above its fair share and app1 has a declared min share. On submission, the code detects that starvation and preempts enough containers to make room for app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
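The "aggregated min share" idea above can be illustrated with a small, self-contained sketch (plain placeholder classes, not FairScheduler APIs): a parent queue's effective min share is taken as its own configured minResources plus the aggregated min shares of its children, which is what lets sub11's 6 GB requirement influence fair-share splits higher up the hierarchy.

{code}
import java.util.ArrayList;
import java.util.List;

class QueueNode {
  final String name;
  final long configuredMinShareMb;          // from <minResources> in the allocation file
  final List<QueueNode> children = new ArrayList<>();

  QueueNode(String name, long configuredMinShareMb) {
    this.name = name;
    this.configuredMinShareMb = configuredMinShareMb;
  }

  /** Effective min share = own configured min share + children's aggregated min shares. */
  long aggregatedMinShareMb() {
    long total = configuredMinShareMb;
    for (QueueNode child : children) {
      total += child.aggregatedMinShareMb();
    }
    return total;
  }

  public static void main(String[] args) {
    QueueNode sub1 = new QueueNode("sub1", 0);
    sub1.children.add(new QueueNode("sub11", 6192));
    System.out.println("sub1 aggregated min share = " + sub1.aggregatedMinShareMb() + " MB");
  }
}
{code}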
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: YARN-2147.patch I made changes to log all tokens when renewal fails. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
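A hedged sketch of the kind of change described (not the attached patch verbatim; the helper name is invented for illustration): when renewal fails during application submission, log every token in the application's credentials along with the original failure, so the logs identify which token could not be renewed.

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class TokenRenewalFailureLogging {
  private static final Log LOG = LogFactory.getLog(TokenRenewalFailureLogging.class);

  /** Log every token carried by the application when renewal fails. */
  static void logTokensOnRenewalFailure(Credentials credentials, Exception cause) {
    LOG.warn("Failed to renew delegation tokens for the submitted application", cause);
    for (Token<?> token : credentials.getAllTokens()) {
      LOG.warn("Token in application credentials: " + token);
    }
  }
}
{code}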
[jira] [Created] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
Svetozar created YARN-2156: -- Summary: ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like that: {code} @Override protected void serviceStart() throws Exception { Configuration conf = getConfig(); YarnRPC rpc = YarnRPC.create(conf); InetSocketAddress masterServiceAddress = conf.getSocketAddr( YarnConfiguration.RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT); Configuration serverConf = conf; // If the auth is not-simple, enforce it to be token-based. serverConf = new Configuration(conf); serverConf.set( CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString()); ... } {code} Obviously such code makes sense only if CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect check when trigger a preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029792#comment-14029792 ] Karthik Kambatla commented on YARN-2155: Good catch [~octo47]. Thanks for fixing it, Wei. The fix looks good to me. +1. FairScheduler: Incorrect check when trigger a preemption Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
[ https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029796#comment-14029796 ] Jian He commented on YARN-2156: --- Svetozar, I think this is expected because AMRMToken today is used in both secure and non-secure environment. ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration --- Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar Ivanov org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like that: {code} @Override protected void serviceStart() throws Exception { Configuration conf = getConfig(); YarnRPC rpc = YarnRPC.create(conf); InetSocketAddress masterServiceAddress = conf.getSocketAddr( YarnConfiguration.RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT); Configuration serverConf = conf; // If the auth is not-simple, enforce it to be token-based. serverConf = new Configuration(conf); serverConf.set( CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString()); ... } {code} Obviously such code makes sense only if CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029850#comment-14029850 ] Vinod Kumar Vavilapalli commented on YARN-1702: --- Looks good, checking this in. Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2155) FairScheduler: Incorrect threshold check for preemption
[ https://issues.apache.org/jira/browse/YARN-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029868#comment-14029868 ] Hudson commented on YARN-2155: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5695 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5695/]) YARN-2155. FairScheduler: Incorrect threshold check for preemption. (Wei Yan via kasha) (kasha: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602295) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/TestFairSchedulerPreemption.java FairScheduler: Incorrect threshold check for preemption --- Key: YARN-2155 URL: https://issues.apache.org/jira/browse/YARN-2155 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.5.0 Attachments: YARN-2155.patch {code} private boolean shouldAttemptPreemption() { if (preemptionEnabled) { return (preemptionUtilizationThreshold < Math.max( (float) rootMetrics.getAvailableMB() / clusterResource.getMemory(), (float) rootMetrics.getAvailableVirtualCores() / clusterResource.getVirtualCores())); } return false; } {code} preemptionUtilizationThreshold should be compared with allocatedResource instead of availableResource. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029871#comment-14029871 ] Hadoop QA commented on YARN-2147: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650136/YARN-2147.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3974//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3974//console This message is automatically generated. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1702) Expose kill app functionality as part of RM web services
[ https://issues.apache.org/jira/browse/YARN-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029885#comment-14029885 ] Hudson commented on YARN-1702: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5696 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5696/]) YARN-1702. Added kill app functionality to RM web services. Contributed by Varun Vasudev. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1602298) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/RMWebServices.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/dao/AppState.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/webapp/TestRMWebServicesAppsModification.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm Expose kill app functionality as part of RM web services Key: YARN-1702 URL: https://issues.apache.org/jira/browse/YARN-1702 Project: Hadoop YARN Issue Type: Sub-task Reporter: Varun Vasudev Assignee: Varun Vasudev Fix For: 2.5.0 Attachments: apache-yarn-1702.10.patch, apache-yarn-1702.11.patch, apache-yarn-1702.12.patch, apache-yarn-1702.13.patch, apache-yarn-1702.14.patch, apache-yarn-1702.2.patch, apache-yarn-1702.3.patch, apache-yarn-1702.4.patch, apache-yarn-1702.5.patch, apache-yarn-1702.7.patch, apache-yarn-1702.8.patch, apache-yarn-1702.9.patch Expose functionality to kill an app via the ResourceManager web services API. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-2147: -- Attachment: YARN-2147-v2.patch client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
[ https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029913#comment-14029913 ] Daryn Sharp commented on YARN-2156: --- Yes, this is by design. Yarn uses tokens regardless of your security setting. ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration --- Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar Ivanov org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like that: {code} @Override protected void serviceStart() throws Exception { Configuration conf = getConfig(); YarnRPC rpc = YarnRPC.create(conf); InetSocketAddress masterServiceAddress = conf.getSocketAddr( YarnConfiguration.RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT); Configuration serverConf = conf; // If the auth is not-simple, enforce it to be token-based. serverConf = new Configuration(conf); serverConf.set( CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString()); ... } {code} Obviously such code makes sense only if CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2156) ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration
[ https://issues.apache.org/jira/browse/YARN-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp resolved YARN-2156. --- Resolution: Not a Problem ApplicationMasterService#serviceStart() method has hardcoded AuthMethod.TOKEN as security configuration --- Key: YARN-2156 URL: https://issues.apache.org/jira/browse/YARN-2156 Project: Hadoop YARN Issue Type: Bug Reporter: Svetozar Ivanov org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService#serviceStart() method has mistakenly hardcoded AuthMethod.TOKEN as Hadoop security authentication. It looks like that: {code} @Override protected void serviceStart() throws Exception { Configuration conf = getConfig(); YarnRPC rpc = YarnRPC.create(conf); InetSocketAddress masterServiceAddress = conf.getSocketAddr( YarnConfiguration.RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS, YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT); Configuration serverConf = conf; // If the auth is not-simple, enforce it to be token-based. serverConf = new Configuration(conf); serverConf.set( CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, SaslRpcServer.AuthMethod.TOKEN.toString()); ... } {code} Obviously such code makes sense only if CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION config setting is missing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2147) client lacks delegation token exception details when application submit fails
[ https://issues.apache.org/jira/browse/YARN-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029968#comment-14029968 ] Hadoop QA commented on YARN-2147: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650154/YARN-2147-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3975//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3975//console This message is automatically generated. client lacks delegation token exception details when application submit fails - Key: YARN-2147 URL: https://issues.apache.org/jira/browse/YARN-2147 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Affects Versions: 2.4.0 Reporter: Jason Lowe Assignee: Chen He Priority: Minor Attachments: YARN-2147-v2.patch, YARN-2147.patch When an client submits an application and the delegation token process fails the client can lack critical details needed to understand the nature of the error. Only the message of the error exception is conveyed to the client, which sometimes isn't enough to debug. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2157) Document YARN metrics
Akira AJISAKA created YARN-2157: --- Summary: Document YARN metrics Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA YARN-side of HADOOP-6350. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2157) Document YARN metrics
[ https://issues.apache.org/jira/browse/YARN-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated YARN-2157: Description: YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. (was: YARN-side of HADOOP-6350.) Document YARN metrics - Key: YARN-2157 URL: https://issues.apache.org/jira/browse/YARN-2157 Project: Hadoop YARN Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: Akira AJISAKA YARN-side of HADOOP-6350. Add YARN metrics to Metrics document. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1367) After restart NM should resync with the RM without killing containers
[ https://issues.apache.org/jira/browse/YARN-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030016#comment-14030016 ] Anubhav Dhoot commented on YARN-1367: - I will send an updated patch shortly After restart NM should resync with the RM without killing containers - Key: YARN-1367 URL: https://issues.apache.org/jira/browse/YARN-1367 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Bikas Saha Assignee: Anubhav Dhoot Attachments: YARN-1367.prototype.patch After RM restart, the RM sends a resync response to NMs that heartbeat to it. Upon receiving the resync response, the NM kills all containers and re-registers with the RM. The NM should be changed to not kill the container and instead inform the RM about all currently running containers including their allocations etc. After the re-register, the NM should send all pending container completions to the RM as usual. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2151) FairScheduler option for global preemption within hierarchical queues
[ https://issues.apache.org/jira/browse/YARN-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030057#comment-14030057 ] Ashwin Shankar commented on YARN-2151: -- Hi [~octo47], A couple of questions: 1. The first problem you are trying to solve is minResources of a queue not being accounted for in the fair share calculation of its ancestry. Can't we solve this simply by configuring 'weight' properly at the parents? Here is what I'm talking about. I tested the following config out on my 3-node cluster of 40G and it works: {code:xml} <?xml version="1.0"?> <allocations> <queue name="queue1"> <maxResources>10240mb, 10vcores</maxResources> <weight>9</weight> <queue name="big"/> <queue name="sub1"> <weight>7</weight> <schedulingPolicy>fair</schedulingPolicy> <queue name="sub11"> <minResources>6192mb, 6vcores</minResources> </queue> </queue> <queue name="sub2"> </queue> </queue> </allocations> {code} 2. I checked out your testcase TestFairScheduler#testGlobalPreemption. In a nutshell you are testing two things - problem 1 described above and whether you are inheriting minSharePreemptionTimeout from queue1 to sub11. I have a couple of questions on this: a. Why don't you want to define minSharePreemptionTimeout in sub11 itself, since you are configuring it anyway? b. What if someone doesn't want minSharePreemptionTimeout to be inherited and wants to use the global default? FairScheduler option for global preemption within hierarchical queues - Key: YARN-2151 URL: https://issues.apache.org/jira/browse/YARN-2151 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Andrey Stepachev Attachments: YARN-2151.patch FairScheduler has hierarchical queues, but fair share calculation and preemption still work within a limited range and are effectively still nonhierarchical. This patch solves this incompleteness in two aspects: 1. Currently MinShare is not propagated to the upper queue, which leads the fair share calculation to ignore all Min Shares in deeper queues. Let's take an example (implemented as test case TestFairScheduler#testMinShareInHierarchicalQueues) {code} <?xml version="1.0"?> <allocations> <queue name="queue1"> <maxResources>10240mb, 10vcores</maxResources> <queue name="big"/> <queue name="sub1"> <schedulingPolicy>fair</schedulingPolicy> <queue name="sub11"> <minResources>6192mb, 6vcores</minResources> </queue> </queue> <queue name="sub2"> </queue> </queue> </allocations> {code} Then bigApp is started within queue1.big with 10x1GB containers. That effectively eats all of the maximum allowed resources for queue1. Subsequent requests for app1 (queue1.sub1.sub11) and app2 (queue1.sub2) (5x1GB each) will wait for free resources. Note that sub11 has a min share requirement of 6x1GB. Without the patch, fair share is calculated with no knowledge of the min share requirements and app1 and app2 get an equal number of containers. With the patch, resources are split according to min share (in the test it is 5 for app1 and 1 for app2). That behaviour is controlled by the same parameter, ‘globalPreemption’, but that can be changed easily. The implementation is a bit awkward, but it seems the method for min share recalculation could be exposed as a public or protected API, and the FSQueue constructor could call it before the minShare getter is used. But right now the current implementation with nulls should work too. 2. Preemption doesn't work between queues on different levels of the queue hierarchy. Moreover, it is not possible to override various parameters for children queues.
This patch adds the parameter ‘globalPreemption’, which enables the modified global preemption algorithm. In a nutshell, the patch adds a function shouldAttemptPreemption(queue), which can calculate usage for nested queues; if a queue with usage above the specified threshold is found, preemption can be triggered. Aggregated minShare does the rest of the work, and preemption will work as expected within a hierarchy of queues with different MinShare/MaxShare specifications on different levels. Test case TestFairScheduler#testGlobalPreemption depicts how it works. One big app gets resources above its fair share and app1 has a declared min share. On submission, the code detects that starvation and preempts enough containers to make room for app1. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2152: -- Attachment: YARN-2152.1.patch Uploaded a patch: - Added two more fields (priority and createTime) in ContainerTokenIdentifier - The NM will populate the extra container information in NMContainerStatus based on the information passed in the ContainerTokenIdentifier. - I feel createTime is more appropriate than startTime because it indicates the time when the container is created. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
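A self-contained illustration of that flow (plain placeholder classes only, not the patch's APIs — the real change extends ContainerTokenIdentifier and NMContainerStatus): the RM embeds the priority and creation time in the container token at allocation time, the NM remembers them for each launched container, and echoes them back when it re-registers with a restarted RM.

{code}
import java.util.HashMap;
import java.util.Map;

public class ContainerMetadataRecoverySketch {
  /** Metadata the RM would embed in the container token at allocation time. */
  static final class TokenPayload {
    final int priority;
    final long createTime;
    TokenPayload(int priority, long createTime) {
      this.priority = priority;
      this.createTime = createTime;
    }
  }

  /** NM-side view: container id -> metadata decoded from the launch token. */
  private final Map<String, TokenPayload> liveContainers = new HashMap<>();

  void onContainerLaunch(String containerId, TokenPayload fromToken) {
    liveContainers.put(containerId, fromToken);
  }

  /** What the NM would report per container when re-registering with a restarted RM. */
  Map<String, TokenPayload> reportForReregistration() {
    return new HashMap<>(liveContainers);
  }
}
{code}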
[jira] [Commented] (YARN-2022) Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy
[ https://issues.apache.org/jira/browse/YARN-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030143#comment-14030143 ] Mayank Bansal commented on YARN-2022: - Hi [~vinodkv], what you are saying makes sense and I agree with that; however, I think we still need this patch as it will ensure we give the least priority to killing AMs. Thoughts? [~sunilg] Thanks for the patch. Here are some high-level comments {code} + public static final String SKIP_AM_CONTAINER_FROM_PREEMPTION = "yarn.resourcemanager.monitor.capacity.preemption.skip_am_container"; {code} Please run the formatter; this doesn't seem to be the standard line length. {code} +skipAMContainer = config.getBoolean(SKIP_AM_CONTAINER_FROM_PREEMPTION, +false); {code} By default it should be true, as we always want the AM to have the least priority. Did you run the test on the cluster? Thanks, Mayank Preempting an Application Master container can be kept as least priority when multiple applications are marked for preemption by ProportionalCapacityPreemptionPolicy - Key: YARN-2022 URL: https://issues.apache.org/jira/browse/YARN-2022 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Affects Versions: 2.4.0 Reporter: Sunil G Assignee: Sunil G Attachments: YARN-2022-DesignDraft.docx, YARN-2022.2.patch, YARN-2022.3.patch, YARN-2022.4.patch, Yarn-2022.1.patch Cluster Size = 16GB [2NM's] Queue A Capacity = 50% Queue B Capacity = 50% Consider there are 3 applications running in Queue A which has taken the full cluster capacity. J1 = 2GB AM + 1GB * 4 Maps J2 = 2GB AM + 1GB * 4 Maps J3 = 2GB AM + 1GB * 2 Maps Another Job J4 is submitted in Queue B [J4 needs a 2GB AM + 1GB * 2 Maps ]. Currently in this scenario, Job J3 will get killed, including its AM. It is better if AM can be given least priority among multiple applications. In this same scenario, map tasks from J3 and J2 can be preempted. Later when the cluster is free, maps can be allocated to these Jobs. -- This message was sent by Atlassian JIRA (v6.2#6252)
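A hypothetical ordering sketch for the idea above (not Sunil's patch and not ProportionalCapacityPreemptionPolicy code): when selecting preemption victims, sort candidates so AM containers come last; with the proposed skip_am_container switch enabled, they could be filtered out entirely.

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PreemptionCandidate {
  final String id;
  final boolean isAMContainer;
  final long allocationTime;   // newer containers are often preferred as victims

  PreemptionCandidate(String id, boolean isAMContainer, long allocationTime) {
    this.id = id;
    this.isAMContainer = isAMContainer;
    this.allocationTime = allocationTime;
  }
}

public class PreemptionOrdering {
  /** Non-AM containers first (newest first); AM containers only as a last resort. */
  static void orderVictims(List<PreemptionCandidate> candidates) {
    candidates.sort(
        Comparator.comparing((PreemptionCandidate c) -> c.isAMContainer)
            .thenComparing(
                Comparator.comparingLong((PreemptionCandidate c) -> c.allocationTime)
                    .reversed()));
  }

  public static void main(String[] args) {
    List<PreemptionCandidate> c = new ArrayList<>();
    c.add(new PreemptionCandidate("am-1", true, 100));
    c.add(new PreemptionCandidate("map-2", false, 200));
    c.add(new PreemptionCandidate("map-1", false, 150));
    orderVictims(c);
    c.forEach(x -> System.out.println(x.id));   // map-2, map-1, am-1
  }
}
{code}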
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030150#comment-14030150 ] Hadoop QA commented on YARN-2152: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650196/YARN-2152.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.TestContainerLaunchRPC org.apache.hadoop.yarn.TestRPC org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerReboot org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerResync org.apache.hadoop.yarn.server.nodemanager.TestEventFlow org.apache.hadoop.yarn.server.nodemanager.webapp.TestContainerLogsPage org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServer org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.TestContainersMonitor org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers org.apache.hadoop.yarn.server.nodemanager.containermanager.container.TestContainer org.apache.hadoop.yarn.server.nodemanager.TestNodeManagerShutdown org.apache.hadoop.yarn.server.nodemanager.TestNodeStatusUpdater org.apache.hadoop.yarn.server.nodemanager.containermanager.TestContainerManager org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3976//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3976//console This message is automatically generated. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2152: -- Attachment: YARN-2152.1.patch Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2152: -- Attachment: (was: YARN-2152.1.patch) Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2152: -- Attachment: YARN-2152.1.patch Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030175#comment-14030175 ] Beckham007 commented on YARN-2140: -- I think it could use the net_cls subsystem of cgroups to handle this. First, it needs to refactor org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler to support various resources, not only CPU. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030201#comment-14030201 ] Hadoop QA commented on YARN-2152: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650210/YARN-2152.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3977//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3977//console This message is automatically generated. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030220#comment-14030220 ] haosdent commented on YARN-2140: net_cls just classifies packets, so cgroups alone are not enough to do network IO isolation/scheduling. And I have tried tc and net_cls, but they don't do well at network IO isolation/scheduling and couldn't have any effect on incoming traffic. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030221#comment-14030221 ] Hadoop QA commented on YARN-2152: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12650213/YARN-2152.1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 16 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3978//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3978//console This message is automatically generated. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030254#comment-14030254 ] Wangda Tan commented on YARN-2152: -- [~jianhe], thanks for working on this. I've looked at your patch; my only comment is: can we rename startTime in RMContainerImpl to createTime for consistency? Otherwise, people might think they have different meanings. Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2152) Recover missing container information
[ https://issues.apache.org/jira/browse/YARN-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated YARN-2152: -- Attachment: YARN-2152.2.patch Did the rename as suggested Recover missing container information - Key: YARN-2152 URL: https://issues.apache.org/jira/browse/YARN-2152 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Jian He Assignee: Jian He Attachments: YARN-2152.1.patch, YARN-2152.1.patch, YARN-2152.2.patch Container information such as container priority and container start time cannot be recovered because NM container today lacks such container information to send across on NM registration when RM recovery happens -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030264#comment-14030264 ] Beckham007 commented on YARN-2140: -- {code} tc class add dev ${net_dev} parent ${parent_classid} classid ${classid} htb rate ${guaranteed_bandwidth}kbps ceil ${max_bandwidth}kbps {code} This command could be used to control the minimum and maximum bandwidth of each container. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
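To put that command in context, a rough, hypothetical sketch (again, not the YARN implementation) of the surrounding tc setup: a root htb qdisc, the per-container class with rate/ceil as quoted, and a cgroup filter so packets tagged via net_cls land in that class. The device name, class ids, and rates are placeholders; note that in tc units, kbps means kilobytes per second and kbit means kilobits per second.
{code}
import java.io.IOException;

public class TcHtbSketch {
  private static void tc(String... args) throws IOException, InterruptedException {
    String[] cmd = new String[args.length + 1];
    cmd[0] = "tc";
    System.arraycopy(args, 0, cmd, 1, args.length);
    new ProcessBuilder(cmd).inheritIO().start().waitFor();
  }

  public static void main(String[] args) throws Exception {
    String dev = "eth0";  // placeholder device name
    // Root htb qdisc that all per-container classes hang off.
    tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "htb");
    // Per-container class: 'rate' is the guaranteed share, 'ceil' the hard cap.
    tc("class", "add", "dev", dev, "parent", "1:", "classid", "1:10",
       "htb", "rate", "5000kbps", "ceil", "10000kbps");
    // Route packets tagged with classid 1:10 (via net_cls) into that class.
    tc("filter", "add", "dev", dev, "parent", "1:",
       "protocol", "ip", "prio", "10", "handle", "1:", "cgroup");
  }
}
{code}
With this in place, rate acts as the container's guaranteed share and ceil as the hard cap it may borrow up to when the link is otherwise idle.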
[jira] [Commented] (YARN-2074) Preemption of AM containers shouldn't count towards AM failures
[ https://issues.apache.org/jira/browse/YARN-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030269#comment-14030269 ] Wangda Tan commented on YARN-2074: -- [~jianhe], I just found that this patch fails to apply on trunk; could you update it against trunk? Thanks. Preemption of AM containers shouldn't count towards AM failures --- Key: YARN-2074 URL: https://issues.apache.org/jira/browse/YARN-2074 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager Reporter: Vinod Kumar Vavilapalli Assignee: Jian He Attachments: YARN-2074.1.patch, YARN-2074.2.patch, YARN-2074.3.patch One orthogonal concern with issues like YARN-2055 and YARN-2022 is that AM containers getting preempted shouldn't count towards AM failures and thus shouldn't eventually fail applications. We should explicitly handle AM container preemption/kill as a separate issue and not count it towards the limit on AM failures. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2140) Add support for network IO isolation/scheduling for containers
[ https://issues.apache.org/jira/browse/YARN-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030271#comment-14030271 ] Wei Yan commented on YARN-2140: --- [~haosd...@gmail.com], [~beckham007], net_cls can be used to limit the network bandwidth used by each task per device. One problem here is that it is not easy for users to specify an accurate network bandwidth requirement for the application. I'm still working on the design. Add support for network IO isolation/scheduling for containers -- Key: YARN-2140 URL: https://issues.apache.org/jira/browse/YARN-2140 Project: Hadoop YARN Issue Type: New Feature Reporter: Wei Yan Assignee: Wei Yan -- This message was sent by Atlassian JIRA (v6.2#6252)
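As a back-of-the-envelope illustration of why that requirement is hard to state up front (purely hypothetical numbers, not a proposed API): even the crudest estimate needs the bytes a task will move and the time window it should finish in, figures most users do not know before the job runs.
{code}
public class BandwidthEstimateSketch {
  /**
   * Rough bandwidth requirement in megabits per second for moving
   * 'bytes' within 'seconds'. Both inputs are usually unknown up front,
   * which is what makes per-container bandwidth requests hard to specify.
   */
  static double requiredMbps(long bytes, double seconds) {
    return (bytes * 8.0) / (seconds * 1_000_000.0);
  }

  public static void main(String[] args) {
    // Example: move 2 GiB within 60 seconds -> ~286 Mbit/s (placeholder numbers).
    System.out.printf("%.1f Mbit/s%n", requiredMbps(2L * 1024 * 1024 * 1024, 60));
  }
}
{code}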