[jira] [Commented] (YARN-2103) Fix code bug in SerializedExceptionPBImpl
[ https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009347#comment-14009347 ] Binglin Chang commented on YARN-2103: - I plan to add a generic test that covers all PBImpls in YARN-2051, so separate tests are not needed. Fix code bug in SerializedExceptionPBImpl - Key: YARN-2103 URL: https://issues.apache.org/jira/browse/YARN-2103 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2103.v1.patch {code} SerializedExceptionProto proto = SerializedExceptionProto.getDefaultInstance(); SerializedExceptionProto.Builder builder = null; boolean viaProto = false; {code} Since viaProto is false, we should initialize the builder rather than the proto. -- This message was sent by Atlassian JIRA (v6.2#6252)
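A minimal sketch of the initialization pattern the comment points at, assuming the usual YARN PBImpl constructor layout; this illustrates the suggested direction rather than the committed patch:
{code}
// With viaProto == false, reads and writes go through the builder, so the default
// constructor should initialize the builder; the proto field is only used as-is when
// the object is constructed from an existing proto (viaProto == true).
public SerializedExceptionPBImpl() {
  builder = SerializedExceptionProto.newBuilder();
}

public SerializedExceptionPBImpl(SerializedExceptionProto proto) {
  this.proto = proto;
  viaProto = true;
}
{code}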
[jira] [Updated] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mit Desai updated YARN-2075: Summary: TestRMAdminCLI consistently fail on trunk and branch-2 (was: TestRMAdminCLI consistently fail on trunk) TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
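The UnsupportedOperationException above is thrown from java.util.AbstractList.remove, the classic signature of calling remove() on a fixed-size list such as the one returned by Arrays.asList. A small self-contained reproduction of that failure mode (an assumption about the cause in HAAdmin.isOtherTargetNodeActive, not a patch):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class FixedSizeListDemo {
  public static void main(String[] args) {
    // Arrays.asList returns a fixed-size list backed by the array; its remove()
    // (reached via AbstractCollection.remove -> Iterator.remove) throws
    // UnsupportedOperationException, matching the stack trace above.
    Collection<String> ids = Arrays.asList("rm1", "rm2");
    // ids.remove("rm1");  // would throw java.lang.UnsupportedOperationException

    // Copying into a mutable collection is the usual fix.
    Collection<String> mutable = new ArrayList<String>(ids);
    mutable.remove("rm1");
    System.out.println(mutable);  // prints [rm2]
  }
}
{code}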
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009749#comment-14009749 ] Mit Desai commented on YARN-2075: - Hi Kenji, I applied the patch to trunk and branch-2. The tests still fail TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1728) History server doesn't understand percent encoded paths
[ https://issues.apache.org/jira/browse/YARN-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009795#comment-14009795 ] jay vyas commented on YARN-1728: Possible link to MAPREDUCE-5902, not sure exactly how this would pop up in two places, but it seems almost the exact same problem, just on the DFS side instead of on the web side. History server doesn't understand percent encoded paths --- Key: YARN-1728 URL: https://issues.apache.org/jira/browse/YARN-1728 Project: Hadoop YARN Issue Type: Bug Reporter: Abraham Elmahrek For example, going to the job history server page http://localhost:19888/jobhistory/logs/localhost%3A8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr results in the following error: {code} Cannot get container logs. Invalid nodeId: test-cdh5-hue.ent.cloudera.com%3A8041 {code} Whereas the URL-decoded version works: http://localhost:19888/jobhistory/logs/localhost:8041/container_1391466602060_0011_01_01/job_1391466602060_0011/admin/stderr It seems like both should be supported, as the former is simply percent encoding. -- This message was sent by Atlassian JIRA (v6.2#6252)
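A small illustration of the decoding that appears to be missing, assuming the nodeId is taken verbatim from the URL path segment; this is not the history server's actual code path, just a sketch of the expected normalization:
{code}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class NodeIdDecodeDemo {
  public static void main(String[] args) throws UnsupportedEncodingException {
    // Decoding the percent-encoded path segment first would make
    // "localhost%3A8041" and "localhost:8041" resolve to the same nodeId.
    String rawNodeId = "localhost%3A8041";
    String decoded = URLDecoder.decode(rawNodeId, "UTF-8");
    System.out.println(decoded);  // prints localhost:8041
  }
}
{code}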
[jira] [Commented] (YARN-2096) Race in TestRMRestart#testQueueMetricsOnRMRestart
[ https://issues.apache.org/jira/browse/YARN-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009807#comment-14009807 ] Tsuyoshi OZAWA commented on YARN-2096: -- One piece of good news: TestRMRestart with Anubhav's patch works well - after running the tests hundreds of times, no failure. Good job :-) Race in TestRMRestart#testQueueMetricsOnRMRestart - Key: YARN-2096 URL: https://issues.apache.org/jira/browse/YARN-2096 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Fix For: 2.5.0 Attachments: YARN-2096.patch org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testQueueMetricsOnRMRestart fails randomly because of a race condition. The test validates that metrics are incremented, but does not wait for all transitions to finish before checking the values. It also resets metrics after kicking off recovery of the second RM. The metrics that need to be incremented race with this reset, causing the test to fail randomly. We need to wait for the right transitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
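A minimal sketch of the kind of wait the description calls for, assuming JUnit and the existing QueueMetrics#getAppsSubmitted getter; the helper name and the choice of metric are illustrative, not the actual patch:
{code}
// Poll the metric with a timeout instead of asserting immediately, so the
// asynchronous RMApp/RMAppAttempt transitions have time to finish.
private static void waitForAppsSubmitted(QueueMetrics metrics, int expected,
    long timeoutMs) throws InterruptedException {
  long deadline = System.currentTimeMillis() + timeoutMs;
  while (metrics.getAppsSubmitted() != expected
      && System.currentTimeMillis() < deadline) {
    Thread.sleep(50);
  }
  Assert.assertEquals(expected, metrics.getAppsSubmitted());
}
{code}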
[jira] [Updated] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen He updated YARN-1680: -- Attachment: YARN-1680-v2.patch availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
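A rough sketch of the adjustment the report asks for, under the assumption that the scheduler can enumerate a per-application blacklist; getBlacklistedNodes() and the surrounding variable names are hypothetical, not the actual scheduler API:
{code}
// Remove the free capacity of nodes this application has blacklisted from the
// headroom it is sent, since that capacity is unusable for its containers.
Resource headroom = Resources.clone(clusterAvailableResource);
for (SchedulerNode node : application.getBlacklistedNodes()) {  // hypothetical accessor
  Resources.subtractFrom(headroom, node.getAvailableResource());
}
application.setHeadroom(headroom);
{code}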
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009893#comment-14009893 ] Jian Fang commented on YARN-796: Hi Bikas, I think it is better to have the node manager specify its own labels and then register them with the RM. Also, it would be great if YARN could provide an API to add/update labels on a node. This is based on the following scenario. Usually a Hadoop cluster in the cloud is elastic, that is to say, the cluster size can be automatically or manually expanded or shrunk based on the cluster situation, for example, idleness. When a node in a cluster is chosen to be shrunk, i.e., to be removed, we could call the API to label the node so that no more tasks would be assigned to it. We could use the decommission API to achieve this goal, but I think the label API may be more elegant. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009905#comment-14009905 ] Hadoop QA commented on YARN-1474: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646928/YARN-1474.17.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-tools/hadoop-sls hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3833//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3833//console This message is automatically generated. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009928#comment-14009928 ] Tsuyoshi OZAWA commented on YARN-1474: -- The three test failures of TestFairScheduler are filed as YARN-2105. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14009931#comment-14009931 ] Hadoop QA commented on YARN-1680: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646932/YARN-1680-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3834//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3834//console This message is automatically generated. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2106) TestFairScheduler in trunk is failing
Wei Yan created YARN-2106: - Summary: TestFairScheduler in trunk is failing Key: YARN-2106 URL: https://issues.apache.org/jira/browse/YARN-2106 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Some issues due to the Queue Placement policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (YARN-2106) TestFairScheduler in trunk is failing
[ https://issues.apache.org/jira/browse/YARN-2106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved YARN-2106. -- Resolution: Duplicate TestFairScheduler in trunk is failing - Key: YARN-2106 URL: https://issues.apache.org/jira/browse/YARN-2106 Project: Hadoop YARN Issue Type: Bug Reporter: Wei Yan Assignee: Wei Yan Some issues due to the Queue Placement policy. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1680) availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory.
[ https://issues.apache.org/jira/browse/YARN-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010005#comment-14010005 ] Chen He commented on YARN-1680: --- These three errors are reported in [YARN-2105|https://issues.apache.org/jira/browse/YARN-2105] and are not related to this JIRA. availableResources sent to applicationMaster in heartbeat should exclude blacklistedNodes free memory. -- Key: YARN-1680 URL: https://issues.apache.org/jira/browse/YARN-1680 Project: Hadoop YARN Issue Type: Sub-task Affects Versions: 2.2.0, 2.3.0 Environment: SuSE 11 SP2 + Hadoop-2.3 Reporter: Rohith Assignee: Chen He Attachments: YARN-1680-v2.patch, YARN-1680.patch There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Cluster slow start is set to 1. A running job's reducer tasks occupy 29GB of the cluster. One NodeManager (NM-4) became unstable (3 maps got killed), so the MRAppMaster blacklisted the unstable NodeManager (NM-4). All reducer tasks are running in the cluster now. The MRAppMaster does not preempt the reducers because, for the reducer preemption calculation, the headroom still counts the blacklisted node's memory. This makes jobs hang forever (the ResourceManager does not assign any new containers on blacklisted nodes, but the availableResources it returns still counts the cluster's free memory). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk
[ https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010013#comment-14010013 ] Karthik Kambatla commented on YARN-2105: Looks good to me. I'll wait for Sandy to also take a look. Three TestFairScheduler tests fail in trunk --- Key: YARN-2105 URL: https://issues.apache.org/jira/browse/YARN-2105 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Ashwin Shankar Attachments: YARN-2105-v1.txt The following tests fail in trunk: {code} Failed tests: TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:1 but was:0 Tests in error: TestFairScheduler.testQueuePlacementWithPolicy:624 NullPointer TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2107: -- Issue Type: Bug (was: Sub-task) Parent: (was: YARN-1530) Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Bug Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (YARN-2107) Refactor timeline classes into server.timeline package
Vinod Kumar Vavilapalli created YARN-2107: - Summary: Refactor timeline classes into server.timeline package Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk
[ https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010164#comment-14010164 ] Tsuyoshi OZAWA commented on YARN-2105: -- The patch works well on my local. Three TestFairScheduler tests fail in trunk --- Key: YARN-2105 URL: https://issues.apache.org/jira/browse/YARN-2105 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Ashwin Shankar Attachments: YARN-2105-v1.txt The following tests fail in trunk: {code} Failed tests: TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:1 but was:0 Tests in error: TestFairScheduler.testQueuePlacementWithPolicy:624 NullPointer TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2107: -- Issue Type: Sub-task (was: Bug) Parent: YARN-1530 Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2107: -- Attachment: YARN-2107.txt Here's a simple eclipse-refactor patch attached. The easiest way to review, if on git, is to apply the patch, git add the new files, and run git diff -M Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-2107.txt Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010200#comment-14010200 ] Hadoop QA commented on YARN-2107: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646971/YARN-2107.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests: org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryServer org.apache.hadoop.yarn.server.applicationhistoryservice.TestApplicationHistoryClientService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3835//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3835//console This message is automatically generated. Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-2107.txt Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2108) Show minShare on RM Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2108: -- Description: Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare, possibly with a different color code, so that we know when a queue is using more than its min share. Show minShare on RM Scheduler page -- Key: YARN-2108 URL: https://issues.apache.org/jira/browse/YARN-2108 Project: Hadoop YARN Issue Type: Task Reporter: Siqi Li Assignee: Siqi Li Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare, possibly with a different color code, so that we know when a queue is using more than its min share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1021) Yarn Scheduler Load Simulator
[ https://issues.apache.org/jira/browse/YARN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010234#comment-14010234 ] Wei Yan commented on YARN-1021: --- [~cristiana.voicu], the SLS directly supports rumen traces. In general, you need to have some existing workload traces (i.e., from some production clusters), and then use Rumen to generate workload traces. Then let the SLS load these traces. Or you can generate some traces randomly (random # of jobs, requests, lifetime, etc). Sorry that I don't have the traces used in that page right now. Yarn Scheduler Load Simulator - Key: YARN-1021 URL: https://issues.apache.org/jira/browse/YARN-1021 Project: Hadoop YARN Issue Type: New Feature Components: scheduler Reporter: Wei Yan Assignee: Wei Yan Fix For: 2.3.0 Attachments: YARN-1021-demo.tar.gz, YARN-1021-images.tar.gz, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.patch, YARN-1021.pdf The Yarn Scheduler is a fertile area of interest with different implementations, e.g., Fifo, Capacity and Fair schedulers. Meanwhile, several optimizations are also made to improve scheduler performance for different scenarios and workload. Each scheduler algorithm has its own set of features, and drives scheduling decisions by many factors, such as fairness, capacity guarantee, resource availability, etc. It is very important to evaluate a scheduler algorithm very well before we deploy it in a production cluster. Unfortunately, currently it is non-trivial to evaluate a scheduling algorithm. Evaluating in a real cluster is always time and cost consuming, and it is also very hard to find a large-enough cluster. Hence, a simulator which can predict how well a scheduler algorithm for some specific workload would be quite useful. We want to build a Scheduler Load Simulator to simulate large-scale Yarn clusters and application loads in a single machine. This would be invaluable in furthering Yarn by providing a tool for researchers and developers to prototype new scheduler features and predict their behavior and performance with reasonable amount of confidence, there-by aiding rapid innovation. The simulator will exercise the real Yarn ResourceManager removing the network factor by simulating NodeManagers and ApplicationMasters via handling and dispatching NM/AMs heartbeat events from within the same JVM. To keep tracking of scheduler behavior and performance, a scheduler wrapper will wrap the real scheduler. The simulator will produce real time metrics while executing, including: * Resource usages for whole cluster and each queue, which can be utilized to configure cluster and queue's capacity. * The detailed application execution trace (recorded in relation to simulated time), which can be analyzed to understand/validate the scheduler behavior (individual jobs turn around time, throughput, fairness, capacity guarantee, etc). * Several key metrics of scheduler algorithm, such as time cost of each scheduler operation (allocate, handle, etc), which can be utilized by Hadoop developers to find the code spots and scalability limits. The simulator will provide real time charts showing the behavior of the scheduler and its performance. 
A short demo is available http://www.youtube.com/watch?v=6thLi8q0qLE, showing how to use simulator to simulate Fair Scheduler and Capacity Scheduler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities
[ https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010244#comment-14010244 ] Ashwin Shankar commented on YARN-2099: -- Ah, I didn't know about YARN-596, this is very nice! I agree with Sandy's comment. Keeping app preemption based on the leaf queue's scheduling policy and having a separate policy which is purely based on priority makes sense to me. Preemption in fair scheduler should consider app priorities --- Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.5.0 Reporter: Ashwin Shankar Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated YARN-1769: Attachment: YARN-1769.patch Upmerged to latest. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities
[ https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010251#comment-14010251 ] Wei Yan commented on YARN-2099: --- Hey, [~ashwinshankar77], Are you working on this one? If not, I would like to take it. Preemption in fair scheduler should consider app priorities --- Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.5.0 Reporter: Ashwin Shankar Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2099) Preemption in fair scheduler should consider app priorities
[ https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010269#comment-14010269 ] Ashwin Shankar commented on YARN-2099: -- Hey [~ywskycn], please go ahead. Preemption in fair scheduler should consider app priorities --- Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.5.0 Reporter: Ashwin Shankar Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010277#comment-14010277 ] Zhijie Shen commented on YARN-2107: --- +1 for the new namespace. The test failure is caused by the defaults: {code} <property> <description>Store class name for timeline store.</description> <name>yarn.timeline-service.store-class</name> <value>org.apache.hadoop.yarn.server.applicationhistoryservice.timeline.LeveldbTimelineStore</value> </property> {code} We need to change yarn-default.xml accordingly. Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-2107.txt Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2099) Preemption in fair scheduler should consider app priorities
[ https://issues.apache.org/jira/browse/YARN-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan reassigned YARN-2099: - Assignee: Wei Yan Preemption in fair scheduler should consider app priorities --- Key: YARN-2099 URL: https://issues.apache.org/jira/browse/YARN-2099 Project: Hadoop YARN Issue Type: Sub-task Components: api, resourcemanager Affects Versions: 2.5.0 Reporter: Ashwin Shankar Assignee: Wei Yan Fair scheduler should take app priorities into account while preempting containers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010319#comment-14010319 ] Hadoop QA commented on YARN-1769: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646981/YARN-1769.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3836//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3836//console This message is automatically generated. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1769) CapacityScheduler: Improve reservations
[ https://issues.apache.org/jira/browse/YARN-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010358#comment-14010358 ] Thomas Graves commented on YARN-1769: - TestFairScheduler is failing for other reasons; see https://issues.apache.org/jira/browse/YARN-2105. CapacityScheduler: Improve reservations Key: YARN-1769 URL: https://issues.apache.org/jira/browse/YARN-1769 Project: Hadoop YARN Issue Type: Improvement Components: capacityscheduler Affects Versions: 2.3.0 Reporter: Thomas Graves Assignee: Thomas Graves Attachments: YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch, YARN-1769.patch Currently the CapacityScheduler uses reservations in order to handle requests for large containers and the fact that there might not currently be enough space available on a single host. The current algorithm for reservations is to reserve as many containers as currently required and then it will start to reserve more above that after a certain number of re-reservations (currently biased against larger containers). Any time it hits the limit on the number reserved, it stops looking at any other nodes. This results in potentially missing nodes that have enough space to fulfill the request. The other place for improvement is that currently reservations count against your queue capacity. If you have reservations you could hit the various limits, which would then stop you from looking further at that node. The above 2 cases can cause an application requesting a larger container to take a long time to get its resources. We could improve upon both of those by simply continuing to look at incoming nodes to see if we could potentially swap out a reservation for an actual allocation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated YARN-2107: -- Attachment: YARN-2107.1.txt Tx for the review and the tip, Zhijie. I fixed both yarn-default.xml and the documentation. Technically the rename is an incompatible change to the LevelDBStore impl. But the Timeline service wasn't 'declared' stable, so I am not creating any compatibility bridges. Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-2107.1.txt, YARN-2107.txt Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010372#comment-14010372 ] Tsuyoshi OZAWA commented on YARN-1474: -- [~kkambatl], v17 is ready for review. could you take a look? Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2107) Refactor timeline classes into server.timeline package
[ https://issues.apache.org/jira/browse/YARN-2107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010448#comment-14010448 ] Hadoop QA commented on YARN-2107: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646997/YARN-2107.1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-applicationhistoryservice hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3837//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3837//console This message is automatically generated. Refactor timeline classes into server.timeline package -- Key: YARN-2107 URL: https://issues.apache.org/jira/browse/YARN-2107 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Attachments: YARN-2107.1.txt, YARN-2107.txt Right now, most of timeline-server classes are present in an applicationhistoryserver package instead of a top level timeline package. This is one part of YARN-2043, there is more to do.. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-1961) Fair scheduler preemption doesn't work for non-leaf queues
[ https://issues.apache.org/jira/browse/YARN-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar reassigned YARN-1961: Assignee: Ashwin Shankar Fair scheduler preemption doesn't work for non-leaf queues -- Key: YARN-1961 URL: https://issues.apache.org/jira/browse/YARN-1961 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.4.0 Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: scheduler Setting minResources and minSharePreemptionTimeout on a non-leaf queue doesn't cause preemption to happen when that non-leaf queue is below minResources and there are outstanding demands in that non-leaf queue. Here is an example fs allocation config (partial): {code:xml} <queue name="abc"> <minResources>3072 mb,0 vcores</minResources> <minSharePreemptionTimeout>30</minSharePreemptionTimeout> <queue name="childabc1"> </queue> <queue name="childabc2"> </queue> </queue> {code} With the above configs, preemption doesn't seem to happen if queue abc is below minShare and it has outstanding unsatisfied demands from apps in its child queues. Ideally in such cases we would like preemption to kick off and reclaim resources from other queues (not under queue abc). Looking at the code it seems like preemption checks for starvation only at the leaf queue level and not at the parent level. {code:title=FairScheduler.java|borderStyle=solid} boolean isStarvedForMinShare(FSLeafQueue sched) boolean isStarvedForFairShare(FSLeafQueue sched) {code} This affects our use case where we have a parent queue with probably 100 unconfigured leaf queues under it. We want to give a minShare to the parent queue to protect all the leaf queues under it, but we cannot do it due to this bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
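A hedged sketch of the direction the report points at: generalize the starvation check from FSLeafQueue to any FSQueue so a parent queue's minResources and minSharePreemptionTimeout are honored. The body mirrors the existing leaf-queue min-share check (RESOURCE_CALCULATOR and clusterResource are the scheduler's existing fields); this is an illustration, not the eventual patch:
{code}
boolean isStarvedForMinShare(FSQueue sched) {
  // Same min-share test as the leaf-queue version, but applicable to parent queues:
  // a queue is starved when its current usage is below min(minShare, demand).
  Resource desiredShare = Resources.min(RESOURCE_CALCULATOR, clusterResource,
      sched.getMinShare(), sched.getDemand());
  return Resources.lessThan(RESOURCE_CALCULATOR, clusterResource,
      sched.getResourceUsage(), desiredShare);
}
{code}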
[jira] [Commented] (YARN-2105) Three TestFairScheduler tests fail in trunk
[ https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010497#comment-14010497 ] Sandy Ryza commented on YARN-2105: -- +1. Thanks for the quick turnaround on this Ashwin. Three TestFairScheduler tests fail in trunk --- Key: YARN-2105 URL: https://issues.apache.org/jira/browse/YARN-2105 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Ashwin Shankar Attachments: YARN-2105-v1.txt The following tests fail in trunk: {code} Failed tests: TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:1 but was:0 Tests in error: TestFairScheduler.testQueuePlacementWithPolicy:624 NullPointer TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2108: - Summary: Show minShare on RM Fair Scheduler page (was: Show minShare on RM Scheduler page) Show minShare on RM Fair Scheduler page --- Key: YARN-2108 URL: https://issues.apache.org/jira/browse/YARN-2108 Project: Hadoop YARN Issue Type: Task Reporter: Siqi Li Assignee: Siqi Li Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare, possibly with a different color code, so that we know when a queue is using more than its min share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2108: -- Attachment: YARN-2108.v1.patch Show minShare on RM Fair Scheduler page --- Key: YARN-2108 URL: https://issues.apache.org/jira/browse/YARN-2108 Project: Hadoop YARN Issue Type: Task Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2108.v1.patch Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare, possibly with a different color code, so that we know when a queue is using more than its min share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1801) NPE in public localizer
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010507#comment-14010507 ] Tsuyoshi OZAWA commented on YARN-1801: -- Looks good to me(non-binding). [~jlowe], can you take a look please? NPE in public localizer --- Key: YARN-1801 URL: https://issues.apache.org/jira/browse/YARN-1801 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Hong Zhiguo Priority: Critical Attachments: YARN-1801.patch While investigating YARN-1800 found this in the NM logs that caused the public localizer to shutdown: {noformat} 2014-01-23 01:26:38,655 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(651)) - Downloading public rsrc:{ hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, 1390440382009, FILE, null } 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(726)) - Error: Shutting down java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712) 2014-01-23 01:26:38,656 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2105) Fix TestFairScheduler after YARN-2012
[ https://issues.apache.org/jira/browse/YARN-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2105: - Summary: Fix TestFairScheduler after YARN-2012 (was: Three TestFairScheduler tests fail in trunk) Fix TestFairScheduler after YARN-2012 - Key: YARN-2105 URL: https://issues.apache.org/jira/browse/YARN-2105 Project: Hadoop YARN Issue Type: Test Reporter: Ted Yu Assignee: Ashwin Shankar Fix For: 2.5.0 Attachments: YARN-2105-v1.txt The following tests fail in trunk: {code} Failed tests: TestFairScheduler.testDontAllowUndeclaredPools:2412 expected:1 but was:0 Tests in error: TestFairScheduler.testQueuePlacementWithPolicy:624 NullPointer TestFairScheduler.testNotUserAsDefaultQueue:530 » NullPointer {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2108) Show minShare on RM Fair Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated YARN-2108: -- Attachment: YARN-2108.v2.patch Show minShare on RM Fair Scheduler page --- Key: YARN-2108 URL: https://issues.apache.org/jira/browse/YARN-2108 Project: Hadoop YARN Issue Type: Task Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch Today the RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare, possibly with a different color code, so that we know when a queue is using more than its min share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1801) NPE in public localizer
[ https://issues.apache.org/jira/browse/YARN-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010560#comment-14010560 ] Hadoop QA commented on YARN-1801: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646195/YARN-1801.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3839//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3839//console This message is automatically generated. NPE in public localizer --- Key: YARN-1801 URL: https://issues.apache.org/jira/browse/YARN-1801 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Reporter: Jason Lowe Assignee: Hong Zhiguo Priority: Critical Attachments: YARN-1801.patch While investigating YARN-1800 found this in the NM logs that caused the public localizer to shutdown: {noformat} 2014-01-23 01:26:38,655 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:addResource(651)) - Downloading public rsrc:{ hdfs://colo-2:8020/user/fertrist/oozie-oozi/601-140114233013619-oozie-oozi-W/aggregator--map-reduce/map-reduce-launcher.jar, 1390440382009, FILE, null } 2014-01-23 01:26:38,656 FATAL localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(726)) - Error: Shutting down java.lang.NullPointerException at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.run(ResourceLocalizationService.java:712) 2014-01-23 01:26:38,656 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(728)) - Public cache exiting {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA reassigned YARN-2091: Assignee: Tsuyoshi OZAWA Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters --- Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Tsuyoshi OZAWA Currently, the AM cannot programmatically determine if the task was killed due to using excessive memory. The NM kills it without passing this information in the container status back to the RM. So the AM cannot take any action here. The jira tracks adding this exit status and passing it from the NM to the RM and then the AM. In general, there may be other such actions taken by YARN that are currently opaque to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010580#comment-14010580 ] Tsuyoshi OZAWA commented on YARN-2091: -- ContainerManagerImpl cannot currently distinguish the exit reason because ContainersMonitorImpl dispatches ContainerKillEvent without it. I plan to add the exit reason to ContainerKillEvent. Please let me know if you have a better idea. Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters --- Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Tsuyoshi OZAWA Currently, the AM cannot programmatically determine if the task was killed due to using excessive memory. The NM kills it without passing this information in the container status back to the RM. So the AM cannot take any action here. The jira tracks adding this exit status and passing it from the NM to the RM and then the AM. In general, there may be other such actions taken by YARN that are currently opaque to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010585#comment-14010585 ] Bikas Saha commented on YARN-2091: -- That's the missing piece AFAIK. That exit reason needs to be passed along internally through the NM and then on to the RM and AM. Maybe simply directly use ContainerExitStatus instead of a new reason object inside ContainerKillEvent. Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters --- Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Tsuyoshi OZAWA Currently, the AM cannot programmatically determine if the task was killed due to using excessive memory. The NM kills it without passing this information in the container status back to the RM. So the AM cannot take any action here. The jira tracks adding this exit status and passing it from the NM to the RM and then the AM. In general, there may be other such actions taken by YARN that are currently opaque to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-596: - Attachment: YARN-596.patch Uploaded a new patch now that YARN-2105 is in. In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested in. This means that an application can shield itself from preemption by requesting it's containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over it's fair share is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2108) Show minShare on RM Fair Scheduler page
[ https://issues.apache.org/jira/browse/YARN-2108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010611#comment-14010611 ] Hadoop QA commented on YARN-2108: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647016/YARN-2108.v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3840//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3840//console This message is automatically generated. Show minShare on RM Fair Scheduler page --- Key: YARN-2108 URL: https://issues.apache.org/jira/browse/YARN-2108 Project: Hadoop YARN Issue Type: Task Reporter: Siqi Li Assignee: Siqi Li Attachments: YARN-2108.v1.patch, YARN-2108.v2.patch Today RM Scheduler page shows FairShare, Used, Used (over fair share) and MaxCapacity. It would be better to show MinShare with possibly different color code, so that we know queue is running more than its min share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010614#comment-14010614 ] Tsuyoshi OZAWA commented on YARN-2091: -- Hi Bikas, let me clarify what "simply directly use" means. I meant to pass the exit reason via ContainerKillEvent, like {{ContainerKillEvent(containerId, ContainerExitStatus.KILL_EXCEEDED_MEMORY, msg)}}. Is this off the mark? Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters --- Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Tsuyoshi OZAWA Currently, the AM cannot programmatically determine if the task was killed due to using excessive memory. The NM kills it without passing this information in the container status back to the RM. So the AM cannot take any action here. The jira tracks adding this exit status and passing it from the NM to the RM and then the AM. In general, there may be other such actions taken by YARN that are currently opaque to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
[ https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Yan updated YARN-1913: -- Attachment: YARN-1913.patch With Fair Scheduler, cluster can logjam when all resources are consumed by AMs -- Key: YARN-1913 URL: https://issues.apache.org/jira/browse/YARN-1913 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Wei Yan Labels: easyfix Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch It's possible to deadlock a cluster by submitting many applications at once, and have all cluster resources taken up by AMs. One solution is for the scheduler to limit resources taken up by AMs, as a percentage of total cluster resources, via a maxApplicationMasterShare config. -- This message was sent by Atlassian JIRA (v6.2#6252)
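As a rough illustration of the mitigation named in the description, here is a self-contained sketch of an AM-share admission check. The names (maxAMShare, canRunAppAM) and the memory-only accounting are assumptions made for the example, not the FairScheduler code in the attached patches.
{code}
// Toy admission check: only launch another AM container while AMs stay under a
// configured fraction of cluster memory. Illustrative only.
public class AmShareLimitSketch {
  private final double maxAMShare;    // e.g. 0.5 => AMs may hold at most half the cluster
  private final long clusterMemoryMB; // total schedulable memory
  private long amUsedMemoryMB;        // memory currently held by AM containers

  public AmShareLimitSketch(double maxAMShare, long clusterMemoryMB) {
    this.maxAMShare = maxAMShare;
    this.clusterMemoryMB = clusterMemoryMB;
  }

  /** Returns true if launching one more AM container of the given size is allowed. */
  public boolean canRunAppAM(long amContainerMemoryMB) {
    if (maxAMShare < 0) {
      return true; // treat a negative value as "no limit"
    }
    return amUsedMemoryMB + amContainerMemoryMB <= maxAMShare * clusterMemoryMB;
  }

  public void amContainerStarted(long memoryMB) { amUsedMemoryMB += memoryMB; }
  public void amContainerFinished(long memoryMB) { amUsedMemoryMB -= memoryMB; }
}
{code}
With maxAMShare = 0.5 on a 10240 MB cluster, a sixth 1024 MB AM would be held back until an earlier AM completes, which is exactly the logjam the description wants to avoid.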
[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010630#comment-14010630 ] Bikas Saha commented on YARN-2091: -- We are on the same page. The kill reason is directly a ContainerExitStatus. Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters --- Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN Issue Type: Task Reporter: Bikas Saha Assignee: Tsuyoshi OZAWA Currently, the AM cannot programmatically determine if the task was killed due to using excessive memory. The NM kills it without passing this information in the container status back to the RM. So the AM cannot take any action here. The jira tracks adding this exit status and passing it from the NM to the RM and then the AM. In general, there may be other such actions taken by YARN that are currently opaque to the AM. -- This message was sent by Atlassian JIRA (v6.2#6252)
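To make the agreement above concrete, here is a minimal sketch of a kill event that carries an exit-status code alongside the diagnostic message. The class and enum are simplified stand-ins for the NM's ContainerKillEvent and ContainerExitStatus, sketched from the discussion rather than taken from a patch.
{code}
// Simplified sketch: the kill reason travels with the event so the container's
// kill transition can copy it into the status reported back to the RM and AM.
public class ContainerKillEventSketch {
  public enum ExitStatus { INVALID, KILLED_BY_RESOURCEMANAGER, KILLED_EXCEEDED_MEMORY }

  private final String containerId;
  private final ExitStatus exitStatus;
  private final String diagnostic;

  public ContainerKillEventSketch(String containerId, ExitStatus exitStatus, String diagnostic) {
    this.containerId = containerId;
    this.exitStatus = exitStatus;
    this.diagnostic = diagnostic;
  }

  public String getContainerId() { return containerId; }
  public ExitStatus getContainerExitStatus() { return exitStatus; }
  public String getDiagnostic() { return diagnostic; }
}
{code}
The containers monitor would then dispatch something like new ContainerKillEventSketch(id, ExitStatus.KILLED_EXCEEDED_MEMORY, msg) when a container exceeds its memory limit, matching the constructor shape proposed earlier in the thread.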
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010643#comment-14010643 ] Hadoop QA commented on YARN-596: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647026/YARN-596.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3841//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3841//console This message is automatically generated. In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested in. This means that an application can shield itself from preemption by requesting it's containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over it's fair share is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-1913) With Fair Scheduler, cluster can logjam when all resources are consumed by AMs
[ https://issues.apache.org/jira/browse/YARN-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010659#comment-14010659 ] Hadoop QA commented on YARN-1913: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647029/YARN-1913.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/3842//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/3842//console This message is automatically generated. With Fair Scheduler, cluster can logjam when all resources are consumed by AMs -- Key: YARN-1913 URL: https://issues.apache.org/jira/browse/YARN-1913 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.3.0 Reporter: bc Wong Assignee: Wei Yan Labels: easyfix Attachments: YARN-1913.patch, YARN-1913.patch, YARN-1913.patch, YARN-1913.patch It's possible to deadlock a cluster by submitting many applications at once, and have all cluster resources taken up by AMs. One solution is for the scheduler to limit resources taken up by AMs, as a percentage of total cluster resources, via a maxApplicationMasterShare config. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder
[ https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010700#comment-14010700 ] Binglin Chang commented on YARN-2103: - Hi [~ozawa], thanks for reviewing the patch and the comments. I used the original title because the bug isn't just about the inconsistent viaProto flag; it is also about the lack of equals() and hashCode() methods (which will affect other records that use SerializedException). I guess I should point out all the bugs in the JIRA. About code format: most PBImpl classes use this common code: {code} private void maybeInitBuilder() { if (viaProto || builder == null) { builder = GetApplicationsRequestProto.newBuilder(proto); } viaProto = false; } @Override public int hashCode() { return getProto().hashCode(); } @Override public boolean equals(Object other) { if (other == null) return false; if (other.getClass().isAssignableFrom(this.getClass())) { return this.getProto().equals(this.getClass().cast(other).getProto()); } return false; } {code} You can see GetApplicationsRequestPBImpl/GetApplicationsResponsePBImpl; I just followed those patterns. Maybe we can change them all in another JIRA; changing them may not fit into this JIRA. bq. How about adding concrete tests as a first step of generic tests on YARN-2051. After the generic tests are added, those old tests will probably be redundant and can be removed. I guess we can discuss this in the future; I can provide a separate test for now. Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder - Key: YARN-2103 URL: https://issues.apache.org/jira/browse/YARN-2103 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2103.v1.patch {code} SerializedExceptionProto proto = SerializedExceptionProto .getDefaultInstance(); SerializedExceptionProto.Builder builder = null; boolean viaProto = false; {code} Since viaProto is false, we should initiate build rather than proto -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2103) Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder
[ https://issues.apache.org/jira/browse/YARN-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated YARN-2103: Description: Bug 1: {code} SerializedExceptionProto proto = SerializedExceptionProto .getDefaultInstance(); SerializedExceptionProto.Builder builder = null; boolean viaProto = false; {code} Since viaProto is false, we should initiate build rather than proto Bug 2: the class does not provide hashcode() and equals() like other PBImpl records, this class is used in other records, it may affect other records' behavior. was: {code} SerializedExceptionProto proto = SerializedExceptionProto .getDefaultInstance(); SerializedExceptionProto.Builder builder = null; boolean viaProto = false; {code} Since viaProto is false, we should initiate build rather than proto Inconsistency between viaProto flag and initial value of SerializedExceptionProto.Builder - Key: YARN-2103 URL: https://issues.apache.org/jira/browse/YARN-2103 Project: Hadoop YARN Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: YARN-2103.v1.patch Bug 1: {code} SerializedExceptionProto proto = SerializedExceptionProto .getDefaultInstance(); SerializedExceptionProto.Builder builder = null; boolean viaProto = false; {code} Since viaProto is false, we should initiate build rather than proto Bug 2: the class does not provide hashcode() and equals() like other PBImpl records, this class is used in other records, it may affect other records' behavior. -- This message was sent by Atlassian JIRA (v6.2#6252)
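Following the description's own suggestion, a minimal sketch of field initialization that is consistent with viaProto == false would look like the fragment below: the builder is initialized instead of the proto, and getProto() then follows the common PBImpl flow quoted in the earlier comment. This is a sketch of the direction described in the JIRA, not the attached patch.
{code}
// Sketch: keep the fields consistent with viaProto == false by starting from a
// builder rather than the default proto instance.
SerializedExceptionProto proto = null;
SerializedExceptionProto.Builder builder = SerializedExceptionProto.newBuilder();
boolean viaProto = false;

public SerializedExceptionProto getProto() {
  // Build from the builder the first time, then reuse the built proto.
  proto = viaProto ? proto : builder.build();
  viaProto = true;
  return proto;
}
{code}
Pairing this with equals() and hashCode() along the lines of the GetApplicationsRequestPBImpl snippet quoted above would cover both bugs listed in the description.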
[jira] [Commented] (YARN-1474) Make schedulers services
[ https://issues.apache.org/jira/browse/YARN-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010728#comment-14010728 ] Karthik Kambatla commented on YARN-1474: Thanks Tsuyoshi. We are very close to getting this in. A few minor comments: # In each of the schedulers, I don't think we need the following snippet, or for that matter the variable {{initialized}}, at all. {{reinitialize()}} would have just the contents of the else-block. When using the scheduler, one should call setRMContext(), init(), and then reinitialize() thereafter. {code} if (!initialized) { this.rmContext = rmContext; initScheduler(configuration); startSchedulerThreads(); } else { {code} # ResourceSchedulerWrapper should override the serviceInit, serviceStart and serviceStop methods, not init, start and stop. # I have a feeling we'll have to update some tests, including the ones modified in the latest patch, to call scheduler.init() right after scheduler.setRMContext(), if we are not using the scheduler from a MockRM or ResourceManager instance. Make schedulers services Key: YARN-1474 URL: https://issues.apache.org/jira/browse/YARN-1474 Project: Hadoop YARN Issue Type: Sub-task Components: scheduler Affects Versions: 2.3.0, 2.4.0 Reporter: Sandy Ryza Assignee: Tsuyoshi OZAWA Attachments: YARN-1474.1.patch, YARN-1474.10.patch, YARN-1474.11.patch, YARN-1474.12.patch, YARN-1474.13.patch, YARN-1474.14.patch, YARN-1474.15.patch, YARN-1474.16.patch, YARN-1474.17.patch, YARN-1474.2.patch, YARN-1474.3.patch, YARN-1474.4.patch, YARN-1474.5.patch, YARN-1474.6.patch, YARN-1474.7.patch, YARN-1474.8.patch, YARN-1474.9.patch Schedulers currently have a reinitialize but no start and stop. Fitting them into the YARN service model would make things more coherent. -- This message was sent by Atlassian JIRA (v6.2#6252)
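For context, a compact sketch of the lifecycle described in the comment (setRMContext(), then init(), with reinitialize() reduced to the old else-block) is shown below. It extends the real org.apache.hadoop.service.AbstractService, but the scheduler method names and the RMContext stand-in are illustrative, not the patch.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.service.AbstractService;

// Illustrative scheduler-as-a-service skeleton; private method bodies are stubs.
public class SchedulerServiceSketch extends AbstractService {
  private Object rmContext; // stand-in for RMContext

  public SchedulerServiceSketch() {
    super(SchedulerServiceSketch.class.getName());
  }

  public void setRMContext(Object rmContext) {
    this.rmContext = rmContext;
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    initScheduler(conf); // one-time setup, formerly guarded by the 'initialized' flag
    super.serviceInit(conf);
  }

  @Override
  protected void serviceStart() throws Exception {
    startSchedulerThreads();
    super.serviceStart();
  }

  @Override
  protected void serviceStop() throws Exception {
    stopSchedulerThreads();
    super.serviceStop();
  }

  /** reinitialize() now only refreshes configuration; there is no first-time branch. */
  public void reinitialize(Configuration conf) {
    refreshConfiguration(conf);
  }

  private void initScheduler(Configuration conf) { /* ... */ }
  private void startSchedulerThreads() { /* ... */ }
  private void stopSchedulerThreads() { /* ... */ }
  private void refreshConfiguration(Configuration conf) { /* ... */ }
}
{code}
A caller or test would then do scheduler.setRMContext(ctx); scheduler.init(conf); scheduler.start(); and later scheduler.reinitialize(newConf), which is the ordering the comment asks the tests to follow.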
[jira] [Commented] (YARN-596) In fair scheduler, intra-application container priorities affect inter-application preemption decisions
[ https://issues.apache.org/jira/browse/YARN-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010791#comment-14010791 ] Sandy Ryza commented on YARN-596: - Thanks Wei. Getting close - a few more comments. {code} + private static final ResourceCalculator RESOURCE_CALCULATOR = + new DefaultResourceCalculator(); {code} This is no longer needed in FSQueue, right? FIFOPolicy should throw an unsupported operation exception if its checkIfUsageOverFairShare is called. fairshare should be fairShare In fair scheduler, intra-application container priorities affect inter-application preemption decisions --- Key: YARN-596 URL: https://issues.apache.org/jira/browse/YARN-596 Project: Hadoop YARN Issue Type: Bug Components: scheduler Affects Versions: 2.0.3-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch, YARN-596.patch In the fair scheduler, containers are chosen for preemption in the following way: All containers for all apps that are in queues that are over their fair share are put in a list. The list is sorted in order of the priority that the container was requested in. This means that an application can shield itself from preemption by requesting it's containers at higher priorities, which doesn't really make sense. Also, an application that is not over its fair share, but that is in a queue that is over it's fair share is just as likely to have containers preempted as an application that is over its fair share. -- This message was sent by Atlassian JIRA (v6.2#6252)
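On the FIFOPolicy point, the intended behavior is presumably along the lines of the fragment below; the method signature is inferred from the comment and the fair-scheduler policy style, not copied from the patch.
{code}
// FIFO ordering has no per-application fair-share semantics, so failing loudly is
// safer than silently returning a guess when preemption logic asks the question.
@Override
public boolean checkIfUsageOverFairShare(Resource usage, Resource fairShare) {
  throw new UnsupportedOperationException(
      "checkIfUsageOverFairShare is not supported by FIFO policy");
}
{code}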
[jira] [Commented] (YARN-2075) TestRMAdminCLI consistently fail on trunk and branch-2
[ https://issues.apache.org/jira/browse/YARN-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010804#comment-14010804 ] Kenji Kikushima commented on YARN-2075: --- Hi [~mitdesai ], thanks for your testing. I also tested patch on trunk locally, and confirmed TestRMAdminCLI passed. This patch contains modification for HAAdmin.java. Please refresh if you didn't refresh o.a.h.ha yet. {noformat} $ mvn test -Dtest=org.apache.hadoop.yarn.client.TestRMAdminCLI [INFO] Scanning for projects... [INFO] [INFO] [INFO] Building hadoop-yarn-client 3.0.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-yarn-client --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-resources-plugin:2.2:resources (default-resources) @ hadoop-yarn-client --- [INFO] Using default encoding to copy filtered resources. [INFO] [INFO] --- maven-compiler-plugin:2.5.1:compile (default-compile) @ hadoop-yarn-client --- [INFO] Nothing to compile - all classes are up to date [INFO] [INFO] --- maven-resources-plugin:2.2:testResources (default-testResources) @ hadoop-yarn-client --- [INFO] Using default encoding to copy filtered resources. [INFO] [INFO] --- maven-compiler-plugin:2.5.1:testCompile (default-testCompile) @ hadoop-yarn-client --- [INFO] Nothing to compile - all classes are up to date [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hadoop-yarn-client --- [INFO] Surefire report directory: /home/user/hadoop-common/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/surefire-reports --- T E S T S --- --- T E S T S --- Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.308 sec - in org.apache.hadoop.yarn.client.TestRMAdminCLI Results : Tests run: 13, Failures: 0, Errors: 0, Skipped: 0 [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 4.266s [INFO] Finished at: Wed May 28 13:33:31 UTC 2014 [INFO] Final Memory: 17M/268M [INFO] {noformat} TestRMAdminCLI consistently fail on trunk and branch-2 -- Key: YARN-2075 URL: https://issues.apache.org/jira/browse/YARN-2075 Project: Hadoop YARN Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Kenji Kikushima Attachments: YARN-2075.patch {code} Running org.apache.hadoop.yarn.client.TestRMAdminCLI Tests run: 13, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 1.191 sec FAILURE! - in org.apache.hadoop.yarn.client.TestRMAdminCLI testTransitionToActive(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.082 sec ERROR! java.lang.UnsupportedOperationException: null at java.util.AbstractList.remove(AbstractList.java:144) at java.util.AbstractList$Itr.remove(AbstractList.java:360) at java.util.AbstractCollection.remove(AbstractCollection.java:252) at org.apache.hadoop.ha.HAAdmin.isOtherTargetNodeActive(HAAdmin.java:173) at org.apache.hadoop.ha.HAAdmin.transitionToActive(HAAdmin.java:144) at org.apache.hadoop.ha.HAAdmin.runCmd(HAAdmin.java:447) at org.apache.hadoop.ha.HAAdmin.run(HAAdmin.java:380) at org.apache.hadoop.yarn.client.cli.RMAdminCLI.run(RMAdminCLI.java:318) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testTransitionToActive(TestRMAdminCLI.java:180) testHelp(org.apache.hadoop.yarn.client.TestRMAdminCLI) Time elapsed: 0.088 sec FAILURE! 
java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testError(TestRMAdminCLI.java:366) at org.apache.hadoop.yarn.client.TestRMAdminCLI.testHelp(TestRMAdminCLI.java:307) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)