[ 
https://issues.apache.org/jira/browse/YARN-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343995#comment-14343995
 ] 

Wangda Tan commented on YARN-3243:
----------------------------------

How continuous reservation looking works now:
- For ParentQueue: if its used capacity after dropping all reserved containers 
is <= the maximum capacity of the queue, it tries to assign containers to its 
children.
- For LeafQueue: if its used capacity after dropping all reserved containers of 
an application, plus the required resource, is < the maximum capacity of the 
LeafQueue, it continues.
- For an application: if the LeafQueue/ParentQueue has marked that some 
containers need to be unreserved, it tries to unreserve a container whose 
resource > the asked resource.

But what we actually need to ensure is:
{{min(LeafQueue.limit - LeafQueue.usage, user.limit - user.usage) - required + 
application-unreserved-resource >= 0}}.
And, as we did in YARN-3265, {{LeafQueue.limit = min(Parent.limit, 
LeafQueue.max)}}.
Otherwise, some capacity limits in the queue hierarchy will be violated.
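
The inequality above can be sketched in Java roughly as follows. This is only an illustration under simplifying assumptions: resources are modeled as a single long (for example, MB of memory), and the class and method names are hypothetical, not the actual CapacityScheduler API. The second method just rearranges the inequality to get the smallest amount an application would have to unreserve.

```java
// Hypothetical sketch; names are illustrative and not part of CapacityScheduler.
final class ReservationCheckSketch {

    // Resource still available to the application: bounded by both the
    // LeafQueue limit (itself capped by the parent's limit) and the user limit.
    static long available(long parentLimit, long leafMax, long leafUsage,
                          long userLimit, long userUsage) {
        long leafLimit = Math.min(parentLimit, leafMax);
        return Math.min(leafLimit - leafUsage, userLimit - userUsage);
    }

    // True if: min(leaf.limit - leaf.usage, user.limit - user.usage)
    //          - required + unreserved >= 0
    static boolean canAllocate(long available, long required, long unreserved) {
        return available - required + unreserved >= 0;
    }

    // Smallest amount that must be unreserved for the check to pass.
    static long minToUnreserve(long available, long required) {
        return Math.max(0L, required - available);
    }

    public static void main(String[] args) {
        // leaf limit = min(10, 20) = 10; available = min(10 - 8, 6 - 3) = 2
        long avail = available(10, 20, 8, 6, 3);
        System.out.println(canAllocate(avail, 4, 0));  // 2 - 4 + 0 < 0
        System.out.println(canAllocate(avail, 4, 2));  // 2 - 4 + 2 >= 0
        System.out.println(minToUnreserve(avail, 4));
    }
}
```

With these made-up numbers, a request of 4 only fits once the application gives back at least 2 of its reserved resource.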

Passing "ResourceLimits" down the hierarchy can enforce the limit described 
above and also simplify the code structure. Working on a patch now.

> CapacityScheduler should pass headroom from parent to children to make sure 
> ParentQueue obey its capacity limits.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3243
>                 URL: https://issues.apache.org/jira/browse/YARN-3243
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> Currently, CapacityScheduler has some issues in making sure a ParentQueue 
> always obeys its capacity limits, for example:
> 1) When allocating a container under a parent queue, it only checks 
> parentQueue.usage < parentQueue.max. If a leaf queue allocates a container 
> with size > (parentQueue.max - parentQueue.usage), the parent queue can 
> exceed its max resource limit, as in the following example:
> {code}
>         A  (usage=54, max=55)
>        /     \
>       A1     A2 (usage=1, max=55)
> (usage=53, max=53)
> {code}
> Queue-A2 is able to allocate a container since its usage < max, but if we do 
> that, A's usage can exceed A.max.
> 2) When doing the continuous reservation check, the parent queue only tells 
> its children "you need to unreserve *some* resource, so that I will be under 
> my maximum resource", but it does not tell them how much resource needs to 
> be unreserved. This may lead to the parent queue exceeding its configured 
> maximum capacity as well.
> With YARN-3099/YARN-3124, we now have a {{ResourceUsage}} class in each queue 
> class, *here is my proposal*:
> - ParentQueue will set its children's ResourceUsage.headroom, which means the 
> *maximum resource its children can allocate*.
> - ParentQueue will set its children's headroom to (say the parent's name is 
> "qA"): min(qA.headroom, qA.max - qA.used). This makes sure the capacity 
> limits of qA's ancestors are enforced as well (qA.headroom is set by qA's 
> parent).
> - {{needToUnReserve}} is no longer necessary; instead, children can work out 
> how much resource needs to be unreserved to stay within their parent's 
> resource limit.
> - Moreover, with this, YARN-3026 will make a clear boundary between 
> LeafQueue and FiCaSchedulerApp; headroom will take user-limit, etc. into 
> account.
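
To make the proposed headroom rule concrete, here is a minimal Java sketch applied to the A/A1/A2 example from the description. It assumes a single-dimensional resource stored as a long and an unbounded headroom handed to A by its own parent; the class and method names are hypothetical and do not correspond to real CapacityScheduler code.

```java
// Hypothetical sketch of headroom propagation; not the CapacityScheduler API.
final class HeadroomSketch {

    // Headroom a parent hands to its children:
    // min(parent.headroom, parent.max - parent.used)
    static long childHeadroom(long parentHeadroom, long parentMax, long parentUsed) {
        return Math.min(parentHeadroom, parentMax - parentUsed);
    }

    // The check described in item 1): only looks at the queue's own usage and max.
    static boolean usageOnlyCheck(long usage, long max) {
        return usage < max;
    }

    // Headroom-aware check: the container must fit within both the queue's
    // own remaining capacity and the headroom passed down by its parent.
    static boolean headroomCheck(long headroom, long usage, long max, long containerSize) {
        return containerSize <= Math.min(headroom, max - usage);
    }

    public static void main(String[] args) {
        // A: usage=54, max=55; assume A's own headroom is effectively unbounded.
        long headroomForA2 = childHeadroom(Long.MAX_VALUE, 55, 54);
        long containerSize = 2; // made-up request size

        // A2: usage=1, max=55
        System.out.println(headroomForA2);
        System.out.println(usageOnlyCheck(1, 55));
        System.out.println(headroomCheck(headroomForA2, 1, 55, containerSize));
    }
}
```

In this sketch the usage-only check lets A2 allocate a container of size 2, which would push A to 56 against a max of 55, while the headroom-aware check rejects it because A only leaves 1 unit of headroom for its children.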



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
