Thomas Graves commented on YARN-3434:

Note I had a reproducible test case for this. Set the user limit % to 100% and
the user limit factor to 1, on 15 nodes with 20GB each. The 1st queue was
configured for capacity 70, the 2nd queue for capacity 30.
In the 1st queue I started a sleep job needing 10 containers of 12GB each. I then
started a second job in the 2nd queue that needed 25 containers of 12GB each. The
second job got some containers but then had to reserve others while waiting for
the first job to release some.
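For reference, the numbers in this test case work out roughly as follows. This is a simplified sketch. It assumes the user limit is just queue capacity times the user limit factor (the real CapacityScheduler calculation also takes active users and current usage into account), and it ignores any maximum capacity set on the queue.

```python
import math

# Test-case parameters from the comment above.
NODES = 15
NODE_MEM_GB = 20
CONTAINER_GB = 12
QUEUE2_CAPACITY_PCT = 30
USER_LIMIT_FACTOR = 1
JOB1_CONTAINERS = 10
JOB2_CONTAINERS = 25

cluster_gb = NODES * NODE_MEM_GB                              # 300 GB
queue2_capacity_gb = cluster_gb * QUEUE2_CAPACITY_PCT // 100  # 90 GB
# Simplification (assumption): with one active user and user limit % = 100,
# the user limit is taken as queue capacity * user limit factor.
user_limit_gb = queue2_capacity_gb * USER_LIMIT_FACTOR        # 90 GB

# Only one 12 GB container fits on a 20 GB node (8 GB is left over).
containers_per_node = NODE_MEM_GB // CONTAINER_GB                   # 1
nodes_for_job1 = math.ceil(JOB1_CONTAINERS / containers_per_node)   # 10
job2_initial = (NODES - nodes_for_job1) * containers_per_node       # 5 containers (60 GB)
job2_max_within_limit = user_limit_gb // CONTAINER_GB               # 7 containers (84 GB)

print(cluster_gb, queue2_capacity_gb, user_limit_gb)                         # 300 90 90
print(job2_initial, job2_max_within_limit, JOB2_CONTAINERS * CONTAINER_GB)   # 5 7 300
```

Under this simplified limit, the second job starts with 5 containers on the free nodes and has to reserve the rest. Only 7 of its 25 containers fit under 90GB, so picking up every node the first job frees would push the user past the limit.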

Without this change, when the first job started releasing containers, the second
job would grab them and go over the user limit. With this fix it stayed within
the user limit.

> Interaction between reservations and userlimit can result in significant ULF 
> violation
> --------------------------------------------------------------------------------------
>                 Key: YARN-3434
>                 URL: https://issues.apache.org/jira/browse/YARN-3434
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>         Attachments: YARN-3434.patch
> ULF was set to 1.0.
> User was able to consume 1.4X queue capacity.
> It looks like when this application launched, it reserved about 1000 
> containers, 8G each, within about 5 seconds. I think this let the logic in 
> assignToUser() allow the userlimit to be surpassed.
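The description suggests that reservations let the user limit check in assignToUser() pass when it should not. The toy model below only illustrates that general failure mode and is not the CapacityScheduler code. `within_limit()` and `simulate()` are hypothetical, the 90GB limit and 12GB containers are the simplified test-case numbers, and it assumes every reservation is later fulfilled without being checked again.

```python
# Toy model only: these names and this check are hypothetical and are NOT
# the CapacityScheduler's assignToUser() implementation.

LIMIT_GB = 90       # simplified user limit from the test case (assumption)
CONTAINER_GB = 12   # container size from the test case


def within_limit(allocated_gb, reserved_gb, request_gb, count_reserved):
    """Return True if the user may take one more container or reservation."""
    used = allocated_gb + (reserved_gb if count_reserved else 0)
    return used + request_gb <= LIMIT_GB


def simulate(count_reserved, start_allocated_gb=60, busy_nodes=10):
    """Try to reserve on each busy node, then assume every reservation is
    eventually turned into a running container without a second check."""
    allocated, reserved = start_allocated_gb, 0
    for _ in range(busy_nodes):
        if within_limit(allocated, reserved, CONTAINER_GB, count_reserved):
            reserved += CONTAINER_GB
    # Memory held once all reservations have become containers.
    return allocated + reserved


print(simulate(count_reserved=False))  # 180
print(simulate(count_reserved=True))   # 84
```

If reserved memory is ignored, the check keeps approving a reservation on every busy node, and the user ends up at twice the limit in this toy. Counting reservations as usage stops the user at 84GB, just under the limit.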

This message was sent by Atlassian JIRA
