[ https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376060#comment-14376060 ]

Nathan Roberts commented on YARN-3388:
--------------------------------------

Example (there is a lot going on in this algorithm; I simplified it to just the 
key pieces for clarity.)
Tuples are resources: [memory] or [memory,cpu].

just memory:
-----------------
Queue Capacity is [100]
2 active users, both request [10] at a time
User1 is at [45]
User2 is at [40]
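Limit is calculated to be 100/2=50 (the larger of Capacity and used Capacity, 
divided by the number of active users); both users are under it and can allocate
User2 goes to [50] - used Capacity is now 45+50=95
Limit is still 50, since used Capacity is below Capacity
User1 goes to [55] - used Capacity is now 50+55=105
Limit is now 105/2=52.5
User2 goes to [60] - used Capacity is now 60+55=115
Limit is now 115/2=57.5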
And so on, until maxCapacity is hit.
Notice how the users essentially leapfrog one another, allowing the Limit to 
continually move higher.
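To make the leapfrogging concrete, here is a minimal Python sketch that replays the memory-only example. It assumes the simplified rule the example implies (Limit = max(Capacity, used Capacity) / active users, and a user may allocate while at or under the Limit), strict turn-taking starting with User2, and a made-up maxCapacity of 200. `user_limit` and `simulate` are hypothetical helpers for illustration, not CapacityScheduler code.

```python
# Hypothetical names throughout; a toy model of the example, not YARN code.


def user_limit(capacity, used_total, active_users):
    # Simplified rule read off the example: split the larger of the queue's
    # Capacity and its current usage evenly across the active users.
    return max(capacity, used_total) / active_users


def simulate(capacity, max_capacity, usage, requests, order):
    """Let users take turns allocating until nobody can; return (usage, trace)."""
    usage = dict(usage)
    trace = []  # (user, usage after allocating, Limit after allocating)
    progress = True
    while progress:
        progress = False
        for user in order:
            used_total = sum(usage.values())
            limit = user_limit(capacity, used_total, len(usage))
            # Allocate if the user is not over the Limit and the queue
            # stays within maxCapacity.
            if usage[user] <= limit and used_total + requests[user] <= max_capacity:
                usage[user] += requests[user]
                new_limit = user_limit(capacity, sum(usage.values()), len(usage))
                trace.append((user, usage[user], new_limit))
                progress = True
    return usage, trace


# Starting point from the comment; maxCapacity=200 is an assumed value.
final, trace = simulate(
    capacity=100,
    max_capacity=200,
    usage={"User1": 45, "User2": 40},
    requests={"User1": 10, "User2": 10},
    order=["User2", "User1"],
)
for user, used, limit in trace[:3]:
    print(user, used, limit)
# Prints:
# User2 50 50.0
# User1 55 52.5
# User2 60 57.5
print(final, sum(final.values()))  # {'User1': 95, 'User2': 100} 195
```

Every 10-unit allocation raises used Capacity, and with it the Limit, so the next user to allocate stays just under the Limit; in this run the queue stops at 195 only because one more request would exceed the assumed maxCapacity.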

memory and cpu:
------------------------
Queue Capacity is [100,100]
2 active users, User1 asks for [10,20], User2 asks for [20,10]
User1 is at [35,45]
User2 is at [45,35]
Limit is calculated to be [100/2=50,100/2=50]; both users are under it and can allocate
User2 goes to [65,45] - used Capacity is now [65+35=100,45+45=90]
Limit is still [50,50]
User1 goes to [45,65] - used Capacity is now [65+45=110,45+65=110]
Limit is now [110/2=55, 110/2=55]
User1 and User2 are now both considered over the limit, and neither can allocate: 
User1 is over on cpu (65 > 55) and User2 is over on memory (65 > 55), so the 
queue stays stuck at [110,110].
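The same kind of sketch for the two-resource case reproduces the stall. Assumptions beyond the example: the Limit is computed per resource with the same max(Capacity, used Capacity) / active users rule; "over the Limit" means the user's dominant share (its largest per-resource fraction) exceeds the Limit's dominant share, with shares measured against the queue Capacity rather than the cluster for simplicity; and maxCapacity is a made-up [200,200]. This is a DRF-style simplification, not the actual DominantResourceCalculator logic, and all helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical toy model of the two-resource example; not YARN code.


def dominant_share(resources, total):
    # DRF-style dominant share: the largest per-resource fraction of `total`.
    return max(Fraction(r, t) for r, t in zip(resources, total))


def user_limit(capacity, used, active_users):
    # Simplified rule applied per resource:
    # max(Capacity, used Capacity) / activeUsers.
    return tuple(Fraction(max(c, u), active_users) for c, u in zip(capacity, used))


def total_used(usage):
    # Sum the [memory, cpu] tuples of all users, resource by resource.
    return tuple(map(sum, zip(*usage.values())))


def simulate(capacity, max_capacity, usage, requests, order):
    """Let users take turns allocating until nobody can; return final usage."""
    usage = dict(usage)
    progress = True
    while progress:
        progress = False
        for user in order:
            used = total_used(usage)
            limit = user_limit(capacity, used, len(usage))
            # Over the Limit: the user's dominant share exceeds the Limit's
            # (both measured against queue Capacity in this sketch).
            over = dominant_share(usage[user], capacity) > dominant_share(limit, capacity)
            after = tuple(u + r for u, r in zip(used, requests[user]))
            fits = all(a <= m for a, m in zip(after, max_capacity))
            if not over and fits:
                usage[user] = tuple(u + r for u, r in zip(usage[user], requests[user]))
                progress = True
    return usage


# Starting point from the comment; maxCapacity=[200,200] is an assumed value.
final = simulate(
    capacity=(100, 100),
    max_capacity=(200, 200),
    usage={"User1": (35, 45), "User2": (45, 35)},
    requests={"User1": (10, 20), "User2": (20, 10)},
    order=["User2", "User1"],
)
print(final)              # {'User1': (45, 65), 'User2': (65, 45)}
print(total_used(final))  # (110, 110)
```

After the two allocations described above, each user has a dominant share of 65/100 against a Limit share of 55/100, so neither passes the test and the loop exits at [110,110], well below the assumed maxCapacity.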

Open to suggestions on simple ways to fix this. I'm currently thinking that a 
reasonable (simple, effective, computationally cheap, mostly fair) approach 
might be to give userLimit a small percentage of additional leeway. 
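One possible reading of the leeway idea, sketched under assumptions: the computed Limit is multiplied by (1 + leeway) before the over-limit test. `leeway` and `can_allocate` are hypothetical, and the over-limit test is a DRF-style dominant-share comparison measured against the queue Capacity, not the actual CapacityScheduler logic. The sketch only checks whether the stuck state from the two-resource example becomes unstuck.

```python
from fractions import Fraction

# Hypothetical sketch of "a small percentage of additional leeway for userLimit".


def dominant_share(resources, total):
    # DRF-style dominant share: the largest per-resource fraction of `total`.
    return max(Fraction(r, t) for r, t in zip(resources, total))


def can_allocate(user_usage, limit, capacity, leeway):
    # Inflate the computed Limit by `leeway` (a fraction, e.g. 1/10 for 10%)
    # before the over-limit test, so users slightly over it may still allocate.
    relaxed = tuple(x * (1 + leeway) for x in limit)
    return dominant_share(user_usage, capacity) <= dominant_share(relaxed, capacity)


# The stuck state from the example: used Capacity [110,110], Limit [55,55].
capacity = (100, 100)
limit = (55, 55)
users = {"User1": (45, 65), "User2": (65, 45)}
for pct in (0, 10, 20):
    leeway = Fraction(pct, 100)
    print(pct, {u: can_allocate(r, limit, capacity, leeway) for u, r in users.items()})
# Prints:
# 0 {'User1': False, 'User2': False}
# 10 {'User1': False, 'User2': False}
# 20 {'User1': True, 'User2': True}
```

In this toy example both users sit 10 units (about 18% of the Limit) above it, so 10% leeway does not unstick them while 20% does; how much leeway is needed depends on how far the last allocation pushed each user past the Limit.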



> userlimit isn't playing well with DRF calculator
> ------------------------------------------------
>
>                 Key: YARN-3388
>                 URL: https://issues.apache.org/jira/browse/YARN-3388
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled. 
> However, when there are multiple resources, the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> An example is illustrated in a subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)