[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534947#comment-14534947 ] Vinod Kumar Vavilapalli commented on YARN-276: -- There is a lot of history here. [~cwelch] / [~leftnoteasy], I think one of your recent patches (YARN-2637?) may have fixed this. Can you please go through this JIRA and confirm/resolve it? Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: Nemon Lou Assignee: Nemon Lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14534835#comment-14534835 ] Rohith commented on YARN-276: - Any update on this issue? Is this problem still exists in trunk and branch-2? Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: Nemon Lou Assignee: Nemon Lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675840#comment-13675840 ] Hadoop QA commented on YARN-276: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12586295/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1122//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1122//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675574#comment-13675574 ] Thomas Graves commented on YARN-276: I need to spend more time looking through the new logic, here are a few more comments for now. remove the comment from overAMUsedPercent about max active application since its no longer present. FicaSchedulerApp getAMResource, change amRequedResource - amRequiredResource the max active applications per user used to use the absolute queue capacity instead of the absolute max queue capacity. It was changed to use the absolute capacity because it uses the userlimitfactor in the calculation, which should be applied to the capacity and not max capacity (see MAPREDUCE-3897 for more details). We should change the overAMUsedPercentPerUser similarly to use absolute capacity, not absolute max capacity. This can be filed as a separate jira since it was pre-existing but a bad app that requests 0 for the memory could cause divide by zero exception. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673458#comment-13673458 ] Thomas Graves commented on YARN-276: Nemon, Sorry it appears this got lost in the shuffle and it no longer applies, could you update the patch for the current trunk/branch-2? Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13673453#comment-13673453 ] Hadoop QA commented on YARN-276: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1258/YARN-276.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1081//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Labels: incompatible Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638923#comment-13638923 ] Hadoop QA commented on YARN-276: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/1258/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/804//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/804//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634383#comment-13634383 ] Thomas Graves commented on YARN-276: Following up on the web services. Its not heavily used so I'm ok with us just removing the old maxActiveApplications and marking it as incompatible change. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631585#comment-13631585 ] Hadoop QA commented on YARN-276: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578689/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/735//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html Console output: https://builds.apache.org/job/PreCommit-YARN-Build/735//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631599#comment-13631599 ] Hadoop QA commented on YARN-276: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578699/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/736//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/736//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631659#comment-13631659 ] Hadoop QA commented on YARN-276: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578712/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/737//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/737//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630939#comment-13630939 ] nemon lou commented on YARN-276: [~tgraves] Here is the initial thoughts on checking cluster level AM resource percent in each leafqueue: Leaf queue'capacity is computed based on absoluteMaxCapacity. Considering we have 10 leaf queues,each with a value of 100% absoluteMaxCapacity and 10% maxAMResourcePerQueuePercent configured, there is still a chance that all leaf queue's resources taken up by AM before reaching the 10% maxAMResourcePerQueuePercent limit. Note that a cluster basis' am resource percent only works in leaf queue if no am resource percent configured for this leaf queue. As Thomas Graves mentioned,cluster level checking will causing one queue restrict another.I will remove cluster level checking. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628069#comment-13628069 ] Zhijie Shen commented on YARN-276: -- I found one more issue with the per queue basis check. The used AM resource cannot exceed the per queue cap, and the real situation would be that there is be some wasted resource of each queue, because it is not enough to accommodate one more AM. However, by gathering this wasted resource of each queue, it is possible to accommodate one more AM. The problem is caused by the hard resourced resource reservation. It will be more severe if there are more queues. Anyway, it could be a separate jira, which can be addressed later. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617403#comment-13617403 ] Thomas Graves commented on YARN-276: I agree with Zhijie on this, if we are changing it to use actual used am resource we don't need maxActiveApplications as it will just cause more confusion/unneeded logic. One thing we need to think about is what to do with the web services api as it has maxActiveApplications in it. If it no longer applies we might need to make v2 or perhaps we come up with a better way to hide the internal details. I didn't look at the patch in great details so perhaps I missed something, but why are we checking both the cluster level and the queue level? It seems like queue level should be enough and is generally where we are most concerned about this. If somehow one queue does go over, it shouldn't really restrict another queue from using its share. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Assignee: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616708#comment-13616708 ] Zhijie Shen commented on YARN-276: -- IMO, the essential problem is that maxActiveApplications is a loose bound. See the formular bellow. 1. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications. maxActiveApplications is computed by assuming each application only requires minAllocation. In fact, AM container may require more. Therefore, 2. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications = (minAllocation_1 + minAllocation_2 + ... + minAllocation_k) = (requestedResource_1 + requestedResource_2 + ... + minAllocation_k), where k = maxActiveApplications. Hence when maxActiveApplications applications are activated and they require more than minAllocation resource, such that more than maximumApplicationMasterResourcePercent of clusterResource may be used by AMs, and even clusterResource is likely to be exceeded. @nemon's solution looks good, which is actually a more restrict bound of the max allowed active applications. Whenever an application is to be activated, the following criteria is checked. 3. clusterResource * maximumApplicationMasterResourcePercent - ApplicationMasterResource = requestedResource. The issue here is that when this criteria is met, maxActiveApplications should be met as well, because this one is more restricted. So instead of add the new criteria, how about replacing maxActiveApplications with it? Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617008#comment-13617008 ] nemon lou commented on YARN-276: [~zjshen] Yes,a dynamic maxActiveApplications will work ,too.And no need adding any new criteria .I'll give it a try . Thanks. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539260#comment-13539260 ] nemon lou commented on YARN-276: updating the patch. Four properties have been added to CS web page: Max AM Used Per Queue Percent Actual AM Used Per Queue Percent Max AM Used Percent For Cluster Actual AM Used Percent For Cluster This patch keeps track of AM used resources and checks for it both at cluster level and leaf Queue level. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13539263#comment-13539263 ] Hadoop QA commented on YARN-276: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562321/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/258//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/258//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537168#comment-13537168 ] Robert Joseph Evans commented on YARN-276: -- I am not an expert on the scheduler code, so I have not done an in depth review of the patch. My biggest concern with this is that there is no visibility in the UI/web services about why an app may not have been scheduled. It would be great if you could update CapacitySchedulerLeafQueueInfo.java and the web page that uses it CapacitySchedulerPage.java. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13537643#comment-13537643 ] nemon lou commented on YARN-276: Good idea ,Robert.Thank you for your comment. I think it's good to display AM used resources and AM percent limit(or max resources that AMs can use) for each leaf queue on capacity scheduler page. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535809#comment-13535809 ] nemon lou commented on YARN-276: All YARN and MR 's tests passed on my own cluster.So Submit Patch again. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13535813#comment-13535813 ] Hadoop QA commented on YARN-276: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561672/YARN-276.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/237//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13536818#comment-13536818 ] nemon lou commented on YARN-276: This patch is ready for review now.Thank you. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13534585#comment-13534585 ] Hadoop QA commented on YARN-276: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561406/YARN-276.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher org.apache.hadoop.yarn.server.resourcemanager.TestRM org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization org.apache.hadoop.yarn.server.resourcemanager.security.TestClientTokens {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/227//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/227//console This message is automatically generated. Capacity Scheduler can hang when submit many jobs concurrently -- Key: YARN-276 URL: https://issues.apache.org/jira/browse/YARN-276 Project: Hadoop YARN Issue Type: Bug Components: capacityscheduler Affects Versions: 3.0.0, 2.0.1-alpha Reporter: nemon lou Attachments: YARN-276.patch Original Estimate: 24h Remaining Estimate: 24h In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity scheduler can hang with most resources taken up by AM and don't have enough resources for tasks.And then all applications hang there. The cause is that yarn.scheduler.capacity.maximum-am-resource-percent not check directly.Instead ,this property only used for maxActiveApplications. And maxActiveApplications is computed by minimumAllocation (not by Am actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira