[
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677544#comment-13677544
]
Thomas Graves commented on YARN-276:
------------------------------------
Thanks for the updates, some comments:
- we need to escapeHtml the AM used resources, similar to YARN-764 (see the
escaping sketch after this list)
- I think you should put back maxAMResourcePerQueuePerUserPercent. The main
reason is that it's useful to show to users so they know what limit they
might be hitting. Otherwise their job could be waiting to activate and the UI
doesn't show them any limits they might be hitting. The
overAMUsedPercentPerUser should use the capacity, not maxCapacity.
The per-user checks need to take into account the minimum user limit percent as
well as the user limit factor (like the previous version of the patch did).
Ideally this would be figured out dynamically instead of being hardcoded like
before, since you could have a user limit percent of, say, 20%, but if there are
only two users, each user really gets 50%. That could be complicated given the
timing of things. The downside of the dynamic approach is that it makes it much
harder for users to understand why their job might not be launched. It might
make more sense for now to keep the formula similar to before, where it uses
both the user limit factor and the user limit percent (a sketch of that formula
follows this list), and file a separate jira to investigate making it more
dynamic. That jira could also look into addressing the AM resource percent
applying to the absolute max capacity.
- can you update the web services documentation
(./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm)
- we can remove the "Per Queue" from the web UI: Max AM Resource Per Queue
Percent. I think we can remove the "PerQueue" bit from the REST web services
too: maxAMResourcePerQueuePercent -> maxAMResourcePercent
- we are keeping the AM used resource percent at the user level. It might be
nice to output this at least through the REST web services. It would be
nice to have in the UI too, but I'm a bit afraid it's going to get too cluttered
there.
- the REST web services output of amUsedResources should be of type
ResourceInfo so that you get it in separate fields (see the DAO sketch after
this list), like:
<amResourcesUsed>
<memory>4096</memory>
<vCores>2</vCores>
</amResourcesUsed>
The old format that we kept for backwards compatibility was
<usedResources><memory:4096, vCores:2></usedResources>. We don't need that
format here since this is new.
- TestApplicationLimits - remove the old comment - // set max active to 2
- TestApplicationLimits - why are you multiplying by the userLimitFactor?
+ Resource queueResource = Resources.multiply(clusterResources,
+ queue.getAbsoluteCapacity() * queue.getUserLimitFactor());
- what are the changes in TestClientTokens.java?
- In the MiniYarnCluster why are we setting the AM resource percent to 100%?
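
To make a few of the points above more concrete, here are some rough sketches;
none of these are meant as the actual patch code. First, the escaping of the AM
used resources string (assuming commons-lang's StringEscapeUtils, which is
already on the classpath; the value shown is just an example):
{code:java}
import org.apache.commons.lang.StringEscapeUtils;

public class EscapeAmResources {
  public static void main(String[] args) {
    // The Resource toString form contains angle brackets, so it must be
    // escaped before it is rendered in the capacity scheduler web page.
    String amUsedResources = "<memory:4096, vCores:2>";
    System.out.println(StringEscapeUtils.escapeHtml(amUsedResources));
    // prints: &lt;memory:4096, vCores:2&gt;
  }
}
{code}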
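
The per-user AM limit formula I'd keep for now looks roughly like this (the
helper and its names are illustrative, not from the patch; it just ties the
queue-level AM limit to the minimum user limit percent and the user limit
factor):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class UserAMLimitSketch {
  // queueAMResourceLimit: the queue-level AM limit already derived from
  // maximum-am-resource-percent and the queue's capacity (not maxCapacity).
  static Resource userAMResourceLimit(Resource queueAMResourceLimit,
      float minimumUserLimitPercent, float userLimitFactor) {
    // e.g. minimumUserLimitPercent = 0.20f and userLimitFactor = 2.0f
    // would give each user up to 40% of the queue's AM resource limit.
    return Resources.multiply(queueAMResourceLimit,
        minimumUserLimitPercent * userLimitFactor);
  }
}
{code}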
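
And for the REST output, something along these lines in the leaf queue DAO
(assuming the existing ResourceInfo webapp DAO; the class and field names here
are only illustrative):
{code:java}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.ResourceInfo;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class AmResourcesUsedSketch {
  // Serializes as <amResourcesUsed><memory>4096</memory><vCores>2</vCores></amResourcesUsed>
  protected ResourceInfo amResourcesUsed;

  // no-arg constructor required by JAXB
  public AmResourcesUsedSketch() {
  }

  public AmResourcesUsedSketch(Resource amUsed) {
    this.amResourcesUsed = new ResourceInfo(amUsed);
  }

  public ResourceInfo getAmResourcesUsed() {
    return amResourcesUsed;
  }
}
{code}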
> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
> Key: YARN-276
> URL: https://issues.apache.org/jira/browse/YARN-276
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: nemon lou
> Assignee: nemon lou
> Labels: incompatible
> Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In hadoop 2.0.1, when I submit many jobs concurrently, the Capacity
> Scheduler can hang with most resources taken up by AMs, leaving not enough
> resources for tasks. All applications then hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is
> not checked directly. Instead, this property is only used to compute
> maxActiveApplications, and maxActiveApplications is computed from
> minimumAllocation (not from what the AMs actually use).