Wangda Tan commented on YARN-3434:

I think your concerns may not be a problem: ResourceLimits will be replaced 
(instead of updated) on each node heartbeat. The ResourceLimits object itself 
exists to decouple Parent and Child (e.g. ParentQueue to children, LeafQueue to 
apps). The Child doesn't need to understand how the Parent computes limits; it 
only needs to respect them. For example, an app doesn't need to understand how 
the queue computes queue capacity/user-limit/continuous-reservation-looking; it 
only needs to know the "limit" considering all factors, so it can make a decision to 

The usage of ResourceLimits I have in mind for the user-limit case is:
- ParentQueue computes/sets limits
- LeafQueue stores limits (for why it stores them, see 1.)
- LeafQueue recomputes/sets user-limit when trying to allocate for each 
- LeafQueue checks user-limit as well as limits when trying to allocate/reserve 
- The user-limit saved in ResourceLimits is only used in the normal 
allocation/reservation path; if it's a reserved allocation, we will reset 
user-limit to unlimited.
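
The steps above could be sketched roughly as follows. This is a simplified 
illustration, not the actual CapacityScheduler code: resources are reduced to a 
single long (memory in MB), and every class and method name here 
(ResourceLimitsSketch, Limits, Leaf, setLimitsFromParent, tryAssign) is 
hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch under stated assumptions; all names are hypothetical.
final class ResourceLimitsSketch {
    static final long UNLIMITED = Long.MAX_VALUE;

    /** Immutable limits handed from parent to child; replaced, never updated. */
    static final class Limits {
        final long queueLimit;
        final long userLimit;

        Limits(long queueLimit, long userLimit) {
            this.queueLimit = queueLimit;
            this.userLimit = userLimit;
        }

        Limits withUserLimit(long newUserLimit) {
            return new Limits(queueLimit, newUserLimit);
        }
    }

    /** Stand-in for a leaf queue that stores the latest limits from its parent. */
    static final class Leaf {
        private Limits stored = new Limits(UNLIMITED, UNLIMITED);
        private long queueUsed;
        private final Map<String, Long> usedByUser = new HashMap<>();

        // The parent computes the limit; the leaf stores a fresh object.
        void setLimitsFromParent(long queueLimit) {
            stored = new Limits(queueLimit, UNLIMITED);
        }

        // The leaf recomputes the user-limit per attempt and checks both limits.
        // For a reserved allocation the user-limit is reset to unlimited.
        boolean tryAssign(String user, long request, long computedUserLimit,
                          boolean fromReservation) {
            long userLimit = fromReservation ? UNLIMITED : computedUserLimit;
            Limits current = stored.withUserLimit(userLimit);
            long userUsed = usedByUser.getOrDefault(user, 0L);

            // Written as "request > limit - used" to avoid overflow at UNLIMITED.
            if (request > current.queueLimit - queueUsed) {
                return false;
            }
            if (request > current.userLimit - userUsed) {
                return false;
            }
            queueUsed += request;
            usedByUser.merge(user, request, Long::sum);
            return true;
        }
    }
}
```

The point of the sketch is that the check in tryAssign only sees the final 
numbers in Limits; it never needs to know how the parent arrived at them.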

1. Why store limits in LeafQueue instead of passing them down?
This is required by headroom computation: an app's headroom is affected by 
changes to the queue's parent as well as its siblings. We cannot update every 
app's headroom when those change, but we need to recompute headroom when an app 
heartbeats, so we have to store the latest ResourceLimits in LeafQueue. See 
YARN-2008 for more information.
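
As a rough illustration of why the stored limits matter, the sketch below 
recomputes headroom lazily on app heartbeat from whatever limit the parent set 
last. The headroom formula here (the smaller of remaining queue room and 
remaining user room, clamped at zero) is a simplifying assumption for 
illustration, not the formula used by the scheduler, and all names are 
hypothetical.

```java
// Simplified sketch under stated assumptions; all names are hypothetical.
final class HeadroomSketch {
    // Latest limit pushed down by the parent; replaced on each node heartbeat.
    private long latestQueueLimit = Long.MAX_VALUE;
    private long queueUsed;

    // Called on the node-heartbeat path: parent/sibling changes show up here.
    void onNodeHeartbeat(long newQueueLimit) {
        latestQueueLimit = newQueueLimit;
    }

    void setQueueUsed(long used) {
        queueUsed = used;
    }

    // Called on the app-heartbeat path: no per-app state was updated when the
    // limit changed, so headroom is derived from the stored limit on demand.
    long headroomOnAppHeartbeat(long userLimit, long userUsed) {
        long queueRoom = Math.max(0L, latestQueueLimit - queueUsed);
        long userRoom = Math.max(0L, userLimit - userUsed);
        return Math.min(queueRoom, userRoom);
    }
}
```

With a queue limit of 100, queue usage of 30, a user-limit of 50 and user usage 
of 10, this gives a headroom of 40; if a later node heartbeat lowers the queue 
limit to 60, the next app heartbeat sees 30 without any app having been updated 
in between.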

I'm not sure whether the above makes my suggestion clearer. 
Please let me know your thoughts.

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --------------------------------------------------------------------------------------
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
> ULF was set to 1.0
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, each 8G, within about 5 seconds. I think this allowed the 
> logic in assignToUser() to allow the userlimit to be surpassed.
