[
https://issues.apache.org/jira/browse/YARN-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677544#comment-13677544
]
Thomas Graves commented on YARN-276:
------------------------------------
Thanks for the updates, some comments:
- we need to escapeHtml the AM used resources, similar to YARN-764 (see the
escaping sketch after this list)
- I think you should put back maxAMResourcePerQueuePerUserPercent. The main
reason is that it's useful to show to users so they know what limit they
might be hitting. Otherwise their job could be waiting to activate and the UI
doesn't show them any limits they might be hitting. The
overAMUsedPercentPerUser should use the capacity, not maxCapacity.
The per-user checks need to take into account the minimum user limit percent as
well as the user limit factor (like the previous version of the patch did).
Ideally this would be figured out dynamically instead of being hardcoded like
before, since you could have a user limit percent of, say, 20%, but if there are
only two users, each user really gets 50%. That could be complicated given the
timing of things. The downside of the dynamic approach is that it makes it much
harder for users to understand why their job might not be launched. It might
make more sense for now to keep the formula similar to before, where it uses
both the user limit factor and the user limit percent (a sketch of that formula
follows this list), and file a separate jira to investigate making it more
dynamic. That jira could also look into addressing the AM resource percent
applying to the absolute max capacity.
- can you update the web services documentation
(./hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm)
- we can remove the "Per Queue" from the web UI: Max AM Resource Per Queue
Percent. I think we can remove the "PerQueue" bit from the REST web services
too: maxAMResourcePerQueuePercent -> maxAMResourcePercent
- we are keeping the AM used resource percent at the user level. It might be
nice to output this at least through the REST web services. It would be
nice to have in the UI too, but I'm a bit afraid it's going to get too cluttered
there.
- the REST web services output of amUsedResources should be of type
ResourceInfo so that you get it in separate fields (see the DAO sketch after
this list), like:
<amResourcesUsed>
<memory>4096</memory>
<vCores>2</vCores>
</amResourcesUsed>
The old format that we kept for backwards compatibility was
<usedResources><memory:4096, vCores:2></usedResources>. We don't need that
format here since this is new.
- TestApplicationLimits - remove the old comment - // set max active to 2
- TestApplicationLimits - why are you multiplying by the userLimitFactor?
+ Resource queueResource = Resources.multiply(clusterResources,
+ queue.getAbsoluteCapacity() * queue.getUserLimitFactor());
- what are the changes in TestClientTokens.java?
- In the MiniYarnCluster why are we setting the AM resource percent to 100%?
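
To make a few of the points above more concrete, here are some rough sketches;
none of these are meant as the actual patch code. First, the escaping of the AM
used resources string (assuming commons-lang's StringEscapeUtils, which is
already on the classpath; the value shown is just an example):
{code:java}
import org.apache.commons.lang.StringEscapeUtils;

public class EscapeAmResources {
  public static void main(String[] args) {
    // The Resource toString form contains angle brackets, so it must be
    // escaped before it is rendered in the capacity scheduler web page.
    String amUsedResources = "<memory:4096, vCores:2>";
    System.out.println(StringEscapeUtils.escapeHtml(amUsedResources));
    // prints: &lt;memory:4096, vCores:2&gt;
  }
}
{code}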
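
The per-user AM limit formula I'd keep for now looks roughly like this (the
helper and its names are illustrative, not from the patch; it just ties the
queue-level AM limit to the minimum user limit percent and the user limit
factor):
{code:java}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

public class UserAMLimitSketch {
  // queueAMResourceLimit: the queue-level AM limit already derived from
  // maximum-am-resource-percent and the queue's capacity (not maxCapacity).
  static Resource userAMResourceLimit(Resource queueAMResourceLimit,
      float minimumUserLimitPercent, float userLimitFactor) {
    // e.g. minimumUserLimitPercent = 0.20f and userLimitFactor = 2.0f
    // would give each user up to 40% of the queue's AM resource limit.
    return Resources.multiply(queueAMResourceLimit,
        minimumUserLimitPercent * userLimitFactor);
  }
}
{code}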
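
And for the REST output, something along these lines in the leaf queue DAO
(assuming the existing ResourceInfo webapp DAO; the class and field names here
are only illustrative):
{code:java}
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;

import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.ResourceInfo;

@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class AmResourcesUsedSketch {
  // Serializes as <amResourcesUsed><memory>4096</memory><vCores>2</vCores></amResourcesUsed>
  protected ResourceInfo amResourcesUsed;

  // no-arg constructor required by JAXB
  public AmResourcesUsedSketch() {
  }

  public AmResourcesUsedSketch(Resource amUsed) {
    this.amResourcesUsed = new ResourceInfo(amUsed);
  }

  public ResourceInfo getAmResourcesUsed() {
    return amResourcesUsed;
  }
}
{code}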
> Capacity Scheduler can hang when submit many jobs concurrently
> --------------------------------------------------------------
>
> Key: YARN-276
> URL: https://issues.apache.org/jira/browse/YARN-276
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 3.0.0, 2.0.1-alpha
> Reporter: nemon lou
> Assignee: nemon lou
> Labels: incompatible
> Attachments: YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch,
> YARN-276.patch, YARN-276.patch, YARN-276.patch, YARN-276.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In hadoop 2.0.1, when I submit many jobs concurrently, the Capacity
> Scheduler can hang with most resources taken up by AMs, leaving not enough
> resources for tasks. All applications then hang.
> The cause is that "yarn.scheduler.capacity.maximum-am-resource-percent" is
> not checked directly. Instead, this property is only used to compute
> maxActiveApplications, and maxActiveApplications is computed from
> minimumAllocation (not from what the AMs actually use).