[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534947#comment-14534947 ]

Vinod Kumar Vavilapalli commented on YARN-276:
----------------------------------------------

There is a lot of history here. [~cwelch] / [~leftnoteasy], I think one of your recent patches (YARN-2637?) may have fixed this. Can you please go through this JIRA and confirm/resolve it?

> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
>                 Key: YARN-276
>                 URL: https://issues.apache.org/jira/browse/YARN-276
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 2.0.1-alpha
>            Reporter: Nemon Lou
>            Assignee: Nemon Lou
>              Labels: incompatible
>         Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In Hadoop 2.0.1, when I submit many jobs concurrently, the Capacity
> Scheduler can hang with most resources taken up by AMs, leaving not enough
> resources for tasks. All applications then hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is not
> checked directly. Instead, this property is only used to compute
> maxActiveApplications, and maxActiveApplications is computed from
> minimumAllocation (not from the resources the AMs actually use).

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
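The cause reported in the issue description can be made concrete with a little arithmetic. The following is a minimal sketch, not the scheduler's code: the class and method names are hypothetical, the formula is a simplified reading of the description (the AM percentage applied to the number of minimum-allocation slots), and the cluster numbers are invented for the example.

```java
// Hypothetical sketch: names and numbers are illustrative, not taken from Hadoop.
class AmLimitSketch {

    // Limit derived only from the minimum allocation, as the issue describes:
    // the AM percentage is applied to the number of minimum-allocation slots.
    static int maxActiveApplications(long clusterMemoryMb, long minimumAllocationMb,
                                     double maxAmResourcePercent) {
        long slots = clusterMemoryMb / minimumAllocationMb;
        return Math.max((int) Math.ceil(slots * maxAmResourcePercent), 1);
    }

    // Memory the AMs really occupy when every active application runs an AM
    // of the given size.
    static long actualAmMemoryMb(int activeApplications, long amContainerMb) {
        return (long) activeApplications * amContainerMb;
    }

    public static void main(String[] args) {
        long clusterMb = 102_400;   // 100 GB cluster (assumed)
        long minAllocMb = 1_024;    // 1 GB minimum allocation (assumed)
        double amPercent = 0.1;     // maximum-am-resource-percent set to 10% (assumed)

        int maxActive = maxActiveApplications(clusterMb, minAllocMb, amPercent);
        long amMb = actualAmMemoryMb(maxActive, 8_192); // each AM asks for 8 GB (assumed)

        System.out.println("maxActiveApplications = " + maxActive);
        System.out.println("AM share of cluster = " + (100.0 * amMb / clusterMb) + "%");
    }
}
```

With these assumed numbers, 10 applications are admitted and their AMs hold 80% of the cluster memory even though the configured AM limit is 10%, which is the kind of starvation the report describes. Checking the AMs' actual resource usage against the percentage avoids the mismatch.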
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534835#comment-14534835 ]

Rohith commented on YARN-276:
-----------------------------

Any update on this issue? Does this problem still exist in trunk and branch-2?
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677544#comment-13677544 ]

Thomas Graves commented on YARN-276:
------------------------------------

Thanks for the updates. Some comments:

- We need to escapeHtml the AM used resources, similar to YARN-764.
- I think you should put back maxAMResourcePerQueuePerUserPercent. The main reason is that it's useful to show to users so that they know what limit they might be hitting. Otherwise their job could be waiting to activate while the UI doesn't show them any limits they might be hitting.
  The overAMUsedPercentPerUser check should use the capacity, not the max capacity. The per-user checks need to take into account the minimum user percent as well as the user limit factor (as they did in the previous version of the patch). Ideally this would be figured out dynamically instead of being hardcoded like before, since you could have a user limit percent of, say, 20%, but if there are only 2 users each user really gets 50%. That could be complicated, depending on the timing of things. The downside of the dynamic approach is that it makes it much harder for users to understand why their job might not be launched. It might make more sense to keep the formula similar to before, where it uses both the user limit factor and the user limit percent, for now, and file a separate JIRA to investigate making that more dynamic. That JIRA could also look into addressing the AM resource percent applying to the absolute max capacity.
- Can you update the web services documentation (./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm)?
- We can remove the "Per Queue" from the web UI: Max AM Resource Per Queue Percent. I think we can remove the "PerQueue" bit from the REST web services too: maxAMResourcePerQueuePercent -> maxAMResourcePercent.
- We are keeping the AM used resource percent at the user level. It might be nice to output this at least through the REST web services. It would be nice to have in the UI too, but I'm a bit afraid it's going to get too cluttered there.
- The REST web services output of amUsedResources should be of type ResourceInfo so that you get it in separate fields, like: 4096 2 The old format that we kept for backwards compatibility was: . We don't need that format since this is new.
- TestApplicationLimits: remove the old comment "// set max active to 2".
- TestApplicationLimits: why are you multiplying by the userLimitFactor?
    +Resource queueResource = Resources.multiply(clusterResources,
    +queue.getAbsoluteCapacity() * queue.getUserLimitFactor());
- What are the changes in TestClientTokens.java?
- In the MiniYarnCluster, why are we setting the AM resource percent to 100%?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
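The trade-off between a fixed per-user formula and a dynamic one, as discussed in the review comment above, can be sketched as follows. This is only an illustration under stated assumptions: the class, method, and parameter names are hypothetical, the static formula is one plausible reading of "uses both user limit factor and user limit percent", and the dynamic variant simply takes the larger of the configured percent and an even split among active users.

```java
// Hypothetical sketch: not the patch's code, names are illustrative only.
class UserAmLimitSketch {

    // Static variant: the queue's AM budget scaled by the configured
    // user limit percent and the user limit factor.
    static double staticUserAmLimitMb(double clusterMb, double absoluteCapacity,
                                      double maxAmResourcePercent,
                                      int userLimitPercent, double userLimitFactor) {
        double queueAmLimitMb = clusterMb * absoluteCapacity * maxAmResourcePercent;
        return queueAmLimitMb * (userLimitPercent / 100.0) * userLimitFactor;
    }

    // Dynamic variant: a user's share is the larger of the configured
    // percent and an even split among the users currently active.
    static double dynamicUserSharePercent(int userLimitPercent, int activeUsers) {
        return Math.max(userLimitPercent, 100.0 / Math.max(activeUsers, 1));
    }

    public static void main(String[] args) {
        // User limit percent of 20% with only 2 active users: each really gets 50%.
        System.out.println(dynamicUserSharePercent(20, 2));
        // With 10 active users the configured 20% floor applies.
        System.out.println(dynamicUserSharePercent(20, 10));
    }
}
```

The static variant gives users a number they can read off the configuration, while the dynamic variant changes as users come and go, which is the readability concern raised in the comment.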
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675840#comment-13675840 ]

Hadoop QA commented on YARN-276:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12586295/YARN-276.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1122//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1122//console

This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675778#comment-13675778 ]

nemon lou commented on YARN-276:
--------------------------------

Hi Thomas, thank you for your review.

The "used resource not showing up" issue seems to be a bug that already exists. I will file another JIRA for it. (Resource.java's toString() method uses the symbols "<>", which are ignored by browsers.)

The "divide by zero exception" problem has not been fixed, as I haven't found which piece of code can cause it.

The other review comments will be addressed in the latest patch. Thanks.

After reconsidering the user limit, I find that the property "maxAMResourcePerQueuePerUserPercent" that I added is not a proper one. It will be removed, and maxAMResourcePerQueue will be checked for each user instead.
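The blank "used resource" field described above is what happens when a string wrapped in angle brackets is written into an HTML page unescaped: the browser treats it as an unknown tag and renders nothing. A minimal sketch of the fix follows, assuming a hand-written escaping helper and an assumed sample string in the style of Resource.toString(); a real patch would presumably call an existing escaping utility instead.

```java
// Hypothetical sketch: the helper and the sample string are illustrative only.
class ResourceHtmlSketch {

    // Minimal HTML escaping for the characters that matter here.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample of a resource string wrapped in angle brackets.
        String used = "<memory:4096, vCores:2>";
        // Unescaped, a browser would parse this as a tag and show nothing.
        System.out.println(escapeHtml(used)); // prints "&lt;memory:4096, vCores:2&gt;"
    }
}
```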
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675574#comment-13675574 ]

Thomas Graves commented on YARN-276:
------------------------------------

I need to spend more time looking through the new logic; here are a few more comments for now.

Remove the comment from overAMUsedPercent about max active applications, since that is no longer present.

FicaSchedulerApp getAMResource: change amRequedResource -> amRequiredResource.

The max active applications per user check used the absolute queue capacity rather than the absolute max queue capacity. It had been changed to use the absolute capacity because it uses the userlimitfactor in the calculation, which should be applied to the capacity and not the max capacity (see MAPREDUCE-3897 for more details). We should change overAMUsedPercentPerUser similarly to use the absolute capacity, not the absolute max capacity.

This can be filed as a separate JIRA since it was pre-existing, but a bad app that requests 0 for the memory could cause a divide-by-zero exception.
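To illustrate why the user limit factor is meant to multiply the capacity rather than the max capacity, here is a small worked sketch. The class and method names are hypothetical and the queue numbers are invented; it only shows the arithmetic, not the scheduler's actual check.

```java
// Hypothetical sketch: names and queue settings are illustrative only.
class UserLimitBaseSketch {

    // Fraction of the cluster a single user may reach when the user limit
    // factor is applied to the given queue fraction.
    static double userShareOfCluster(double queueFraction, double userLimitFactor) {
        return queueFraction * userLimitFactor;
    }

    public static void main(String[] args) {
        double absoluteCapacity = 0.25;     // queue guaranteed 25% of the cluster (assumed)
        double absoluteMaxCapacity = 0.75;  // queue may grow to 75% (assumed)
        double userLimitFactor = 2.0;       // a user may use twice the queue capacity (assumed)

        // Applied to the capacity: 0.5, a bound that still fits in the cluster.
        System.out.println(userShareOfCluster(absoluteCapacity, userLimitFactor));
        // Applied to the max capacity: 1.5, a bound larger than the whole cluster.
        System.out.println(userShareOfCluster(absoluteMaxCapacity, userLimitFactor));
    }
}
```

In this example the max-capacity base yields a per-user bound above 100% of the cluster, so the check could never restrict anything, whereas the capacity base gives a meaningful limit.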
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675315#comment-13675315 ]

Thomas Graves commented on YARN-276:
------------------------------------

Thanks Nemon, I'm still reviewing it; here are a couple of things so far. I hope to finish reviewing later tonight.

- LeafQueue - please wrap at 80 characters
- LeafQueue - please use the @VisibleForTesting annotation on setMaxAMResourcePerQueuePerUserPercent
- FicaSchedulerApp - "for" is misspelled as "foe"
- FicaSchedulerApp - please use the @VisibleForTesting annotation on setAMResource

I ran a few tests and looked at the scheduler web UI for the queue I was running in, and the used resource and AM used resources showed up blank even though there were jobs running. Can you please take a look to see why? The REST web services calls were returning values for those fields.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13674076#comment-13674076 ]

Hadoop QA commented on YARN-276:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12586047/YARN-276.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/1104//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1104//console

This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673453#comment-13673453 ]

Hadoop QA commented on YARN-276:
--------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/1258/YARN-276.patch
  against trunk revision .

    {color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1081//console

This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673458#comment-13673458 ]

Thomas Graves commented on YARN-276:
------------------------------------

Nemon, sorry, it appears this got lost in the shuffle and the patch no longer applies. Could you update it for the current trunk/branch-2?
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638944#comment-13638944 ]

nemon lou commented on YARN-276:
--------------------------------

The parameter maxActiveApplicationsPerUsers is changed to maxAMResourcePerQueuePerUserPercent. It is also checked against the user's actual AM used resources. The web service has this change, too. [~tgraves]
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638923#comment-13638923 ]

Hadoop QA commented on YARN-276:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/1258/YARN-276.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/804//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/804//console

This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634383#comment-13634383 ]

Thomas Graves commented on YARN-276:
------------------------------------

Following up on the web services: it's not heavily used, so I'm OK with us just removing the old maxActiveApplications and marking it as an incompatible change.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631720#comment-13631720 ]

Thomas Graves commented on YARN-276:
------------------------------------

Thanks for the update, Nemon. You are going to want to change maxActiveApplicationsPerUsers also; otherwise you hit the same issues in the per-user checks.

I don't really want to just remove maxActiveApplications from the web services, as that is incompatible for anyone using the 0.23 version. Let me investigate that a little bit more.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631659#comment-13631659 ]

Hadoop QA commented on YARN-276:
--------------------------------

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12578712/YARN-276.patch
  against trunk revision .

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

    {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.

    {color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/737//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/737//console

This message is automatically generated.
> Capacity Scheduler can hang when submit many jobs concurrently > -- > > Key: YARN-276 > URL: https://issues.apache.org/jira/browse/YARN-276 > Project: Hadoop YARN > Issue Type: Bug > Components: capacityscheduler >Affects Versions: 3.0.0, 2.0.1-alpha >Reporter: nemon lou >Assignee: nemon lou > Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch, > YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, > YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > In hadoop2.0.1,When i submit many jobs concurrently at the same time,Capacity > scheduler can hang with most resources taken up by AM and don't have enough > resources for tasks.And then all applications hang there. > The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" not > check directly.Instead ,this property only used for maxActiveApplications. > And maxActiveApplications is computed by minimumAllocation (not by Am > actually used). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631599#comment-13631599 ] Hadoop QA commented on YARN-276:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578699/YARN-276.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/736//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/736//console
This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13631585#comment-13631585 ] Hadoop QA commented on YARN-276:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12578689/YARN-276.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/735//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/735//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/735//console
This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630939#comment-13630939 ] nemon lou commented on YARN-276:
[~tgraves] Here are my initial thoughts on checking the cluster-level AM resource percent in each leaf queue: A leaf queue's capacity is computed from absoluteMaxCapacity. Suppose we have 10 leaf queues, each configured with 100% absoluteMaxCapacity and 10% maxAMResourcePerQueuePercent. There is still a chance that all of the leaf queues' resources are taken up by AMs before any queue reaches its 10% maxAMResourcePerQueuePercent limit. Note that the cluster-wide AM resource percent applies to a leaf queue only if no AM resource percent is configured for that queue. As Thomas Graves mentioned, cluster-level checking will cause one queue to restrict another, so I will remove the cluster-level check.
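The ten-queue scenario in the comment above can be checked with a little arithmetic. The following is only a sketch under stated assumptions: the class, the method names, and the 100 GB cluster size are hypothetical and are not taken from the patch or from the Capacity Scheduler code. It assumes each leaf queue's AM cap is its absoluteMaxCapacity share of the cluster multiplied by its AM percent.

```java
// Hypothetical sketch: names and numbers are illustrative, not taken from the patch.
final class QueueAmCapSketch {

    /** AM cap of one leaf queue in MB: cluster * absoluteMaxCapacity% * amPercent%. */
    static long perQueueAmCapMb(long clusterMb, int absoluteMaxCapacityPercent, int amPercent) {
        long queueMaxMb = clusterMb * absoluteMaxCapacityPercent / 100;
        return queueMaxMb * amPercent / 100;
    }

    /** Sum of the per-queue AM caps when every queue is configured identically. */
    static long totalAmCapMb(long clusterMb, int queueCount,
                             int absoluteMaxCapacityPercent, int amPercent) {
        return queueCount * perQueueAmCapMb(clusterMb, absoluteMaxCapacityPercent, amPercent);
    }

    public static void main(String[] args) {
        long clusterMb = 102_400; // assumed 100 GB cluster
        System.out.println("per-queue AM cap (MB): " + perQueueAmCapMb(clusterMb, 100, 10));
        System.out.println("sum of AM caps (MB):   " + totalAmCapMb(clusterMb, 10, 100, 10)
                + " of " + clusterMb);
    }
}
```

Under these assumptions the ten per-queue caps add up to the whole cluster, which is why per-queue limits alone do not rule out AMs filling every queue.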
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628069#comment-13628069 ] Zhijie Shen commented on YARN-276:
I found one more issue with the per-queue check. The AM resource in use cannot exceed the per-queue cap, so in practice each queue will have some wasted resource that is too small to accommodate one more AM. Gathered across queues, however, this wasted resource may be enough to accommodate one more AM. The problem is caused by the hard resource reservation, and it becomes more severe as the number of queues grows. Anyway, it could be a separate JIRA, to be addressed later.
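A small worked example may make the wasted-headroom effect concrete. This is a sketch only: it assumes every AM has the same size and every queue has the same AM cap, and all names and numbers are hypothetical rather than taken from the scheduler.

```java
// Hypothetical sketch of per-queue AM headroom that is too small for one more AM.
final class AmHeadroomSketch {

    /** MB left under one queue's AM cap once it holds as many whole AMs as fit. */
    static long leftoverMb(long queueAmCapMb, long amSizeMb) {
        return queueAmCapMb % amSizeMb;
    }

    /** Extra AMs that would fit if the leftovers of all queues could be pooled. */
    static long extraAmsIfPooled(int queueCount, long queueAmCapMb, long amSizeMb) {
        return queueCount * leftoverMb(queueAmCapMb, amSizeMb) / amSizeMb;
    }

    public static void main(String[] args) {
        // Assumed values: 4 queues, 2560 MB AM cap per queue, 1024 MB per AM.
        System.out.println("leftover per queue (MB): " + leftoverMb(2560, 1024));
        System.out.println("extra AMs if pooled:     " + extraAmsIfPooled(4, 2560, 1024));
    }
}
```

With these assumed values each queue strands 512 MB, and the four leftovers together would hold two more AMs; more queues means more stranded headroom.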
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617444#comment-13617444 ] Thomas Graves commented on YARN-276:
Also note that the AM resource percent can be set on a per-queue basis as well as on a cluster basis, so a cluster-level check doesn't make sense in that case.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617403#comment-13617403 ] Thomas Graves commented on YARN-276:
I agree with Zhijie on this: if we are changing it to use the actual AM resource in use, we don't need maxActiveApplications, as it will just cause more confusion and unneeded logic. One thing we need to think about is what to do with the web services API, since it exposes maxActiveApplications. If it no longer applies, we might need to make a v2, or perhaps come up with a better way to hide the internal details. I didn't look at the patch in great detail, so perhaps I missed something, but why are we checking both the cluster level and the queue level? It seems the queue level should be enough, and it is generally where we are most concerned about this. If one queue somehow does go over, it shouldn't really restrict another queue from using its share.
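The concern about one queue restricting another can be illustrated with two independent checks. This is a hedged sketch, not the patch's logic: the class, the method names, and the caps are assumptions made for the example.

```java
// Hypothetical sketch contrasting a queue-level and a cluster-level AM check.
final class CheckLevelSketch {

    /** Queue-level check: only this queue's own AM usage counts. */
    static boolean queueLevelOk(long queueAmCapMb, long queueAmUsedMb, long requestMb) {
        return queueAmUsedMb + requestMb <= queueAmCapMb;
    }

    /** Cluster-level check: AM usage of every queue counts against one shared cap. */
    static boolean clusterLevelOk(long clusterAmCapMb, long clusterAmUsedMb, long requestMb) {
        return clusterAmUsedMb + requestMb <= clusterAmCapMb;
    }

    public static void main(String[] args) {
        // Assumed state: queue B is idle, but AMs in queue A already hold
        // the entire 10240 MB cluster-wide AM budget.
        boolean queueOnly = queueLevelOk(5120, 0, 1024);
        boolean both = queueOnly && clusterLevelOk(10240, 10240, 1024);
        System.out.println("B may activate (queue check only): " + queueOnly);
        System.out.println("B may activate (both checks):      " + both);
    }
}
```

In this assumed state the idle queue passes its own check but is blocked once the cluster-level check is added, which is the cross-queue restriction described above.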
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617008#comment-13617008 ] nemon lou commented on YARN-276:
[~zjshen] Yes, a dynamic maxActiveApplications will work too, with no need to add any new criterion. I'll give it a try. Thanks.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616708#comment-13616708 ] Zhijie Shen commented on YARN-276:
IMO, the essential problem is that maxActiveApplications is a loose bound. See the formulas below.
1. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications.
maxActiveApplications is computed by assuming that each application requires only minAllocation. In fact, an AM container may require more. Therefore,
2. clusterResource * maximumApplicationMasterResourcePercent = minAllocation * maxActiveApplications = (minAllocation_1 + minAllocation_2 + ... + minAllocation_k) <= (requestedResource_1 + requestedResource_2 + ... + requestedResource_k), where k = maxActiveApplications.
Hence, when maxActiveApplications applications are activated and each requires more than minAllocation, more than maximumApplicationMasterResourcePercent of clusterResource may be used by AMs, and the total requested by AMs may even exceed clusterResource. @nemon's solution looks good; it is effectively a stricter bound on the number of active applications. Whenever an application is about to be activated, the following criterion is checked:
3. clusterResource * maximumApplicationMasterResourcePercent - ApplicationMasterResource >= requestedResource.
The issue is that whenever this criterion is met, the maxActiveApplications limit is met as well, because this criterion is the stricter one. So instead of adding the new criterion, how about replacing maxActiveApplications with it?
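The difference between the count-based bound (formula 1) and the resource-based criterion (formula 3) can be sketched in a few lines. Everything here is an assumption made for illustration: the class and method names are hypothetical, memory is the only resource, the percent is an integer to avoid floating-point rounding, and the ceiling division is a simplification rather than the scheduler's actual computation.

```java
// Hypothetical sketch of the two activation bounds; not the Capacity Scheduler's code.
final class AmLimitSketch {

    /** Formula 1: loose bound that assumes every AM needs only minAllocMb. */
    static int maxActiveApplications(long clusterMb, int amPercent, long minAllocMb) {
        long amBudgetMb = clusterMb * amPercent / 100;
        return (int) Math.max(1, (amBudgetMb + minAllocMb - 1) / minAllocMb);
    }

    /** Formula 3: activate only if the remaining AM budget covers the request. */
    static boolean canActivate(long clusterMb, int amPercent, long amUsedMb, long requestMb) {
        return clusterMb * amPercent / 100 - amUsedMb >= requestMb;
    }

    /** AM memory in use when activation is limited only by the application count. */
    static long amUsageUnderCountBound(long clusterMb, int amPercent, long minAllocMb,
                                       long amRequestMb, int submitted) {
        int active = Math.min(submitted, maxActiveApplications(clusterMb, amPercent, minAllocMb));
        return active * amRequestMb;
    }

    /** AM memory in use when each activation must pass the resource criterion. */
    static long amUsageUnderResourceBound(long clusterMb, int amPercent,
                                          long amRequestMb, int submitted) {
        long used = 0;
        for (int i = 0; i < submitted && canActivate(clusterMb, amPercent, used, amRequestMb); i++) {
            used += amRequestMb;
        }
        return used;
    }

    public static void main(String[] args) {
        // Assumed setup: 100 GB cluster, 50% AM limit, 1 GB minAllocation, 2 GB per AM.
        System.out.println(amUsageUnderCountBound(102_400, 50, 1024, 2048, 100));
        System.out.println(amUsageUnderResourceBound(102_400, 50, 2048, 100));
    }
}
```

In this assumed setup the count bound admits 50 applications whose AMs together request the entire cluster, leaving nothing for tasks, while the resource criterion stops at the configured 50%.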
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539263#comment-13539263 ] Hadoop QA commented on YARN-276:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562321/YARN-276.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/258//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/258//console
This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13539260#comment-13539260 ] nemon lou commented on YARN-276:
Updating the patch. Four properties have been added to the CS web page:
Max AM Used Per Queue Percent
Actual AM Used Per Queue Percent
Max AM Used Percent For Cluster
Actual AM Used Percent For Cluster
This patch keeps track of the resources used by AMs and checks them at both the cluster level and the leaf queue level.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537643#comment-13537643 ] nemon lou commented on YARN-276:
Good idea, Robert. Thank you for your comment. I think it's good to display the resources used by AMs and the AM percent limit (or the maximum resources that AMs can use) for each leaf queue on the capacity scheduler page.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537168#comment-13537168 ] Robert Joseph Evans commented on YARN-276:
I am not an expert on the scheduler code, so I have not done an in-depth review of the patch. My biggest concern is that there is no visibility in the UI or web services into why an app may not have been scheduled. It would be great if you could update CapacitySchedulerLeafQueueInfo.java and the web page that uses it, CapacitySchedulerPage.java.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536818#comment-13536818 ] nemon lou commented on YARN-276:
This patch is ready for review now. Thank you.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536781#comment-13536781 ] Hadoop QA commented on YARN-276:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561839/YARN-276.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/240//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/240//console
This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535813#comment-13535813 ] Hadoop QA commented on YARN-276:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12561672/YARN-276.patch against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/237//console
This message is automatically generated.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535809#comment-13535809 ]

nemon lou commented on YARN-276:

All YARN and MR tests passed on my own cluster, so I am submitting the patch again.
[jira] [Commented] (YARN-276) Capacity Scheduler can hang when submit many jobs concurrently
[ https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534585#comment-13534585 ]

Hadoop QA commented on YARN-276:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561406/YARN-276.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test files.

+1 javac. The applied patch does not increase the total number of javac compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

+1 release audit. The applied patch does not increase the total number of release audit warnings.

-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.server.resourcemanager.TestApplicationCleanup
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationLimits
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
org.apache.hadoop.yarn.server.resourcemanager.TestApplicationMasterLauncher
org.apache.hadoop.yarn.server.resourcemanager.TestRM
org.apache.hadoop.yarn.server.resourcemanager.security.TestApplicationTokens
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientTokens

+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/227//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/227//console

This message is automatically generated.