[ 
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387413#comment-14387413
 ] 

Wangda Tan commented on YARN-3388:
----------------------------------

Hi [~nroberts],
Sorry for the late response, and thanks for reporting and working on this. I 
think your proposal is sound: it computes Σ(user.dominate_share), and the user 
with the smallest dominate_share can always continue.
On the implementation:
- {{updateConsumedRatio}} is called whenever clusterResource changes or any 
resource is allocated, but it has to loop over all users in the LeafQueue. This 
should be improved, since there could be 100 or more users in a queue.

A similar approach would be to save "user.dominate_share" in each user, and 
also total_dominate_share = Σ(user.dominate_share) in each LeafQueue. With 
this, we need only O(1) time when a resource is allocated or released, and 
O(#user) time when clusterResource changes. Resource allocation/release seems 
more frequent than clusterResource changes to me.
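
A minimal sketch of the bookkeeping described above, assuming only two 
resources (memory and vcores). The class and method names here are 
hypothetical and are not taken from the CapacityScheduler code or from the 
attached patch:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual LeafQueue implementation. */
class DominantShareTracker {
    // Cluster totals (assumed: memory in MB and vcores only).
    private long clusterMemory;
    private long clusterVcores;

    // Per-user consumption as {memory, vcores}.
    private final Map<String, long[]> usedByUser = new HashMap<>();
    // Cached user.dominate_share for each user.
    private final Map<String, Double> shareByUser = new HashMap<>();
    // Cached total_dominate_share, i.e. the sum of all cached user shares.
    private double totalShare = 0.0;

    DominantShareTracker(long clusterMemory, long clusterVcores) {
        this.clusterMemory = clusterMemory;
        this.clusterVcores = clusterVcores;
    }

    // Largest fraction of any single cluster resource that 'used' occupies.
    private double dominantShare(long[] used) {
        double mem = clusterMemory > 0 ? (double) used[0] / clusterMemory : 0.0;
        double cpu = clusterVcores > 0 ? (double) used[1] / clusterVcores : 0.0;
        return Math.max(mem, cpu);
    }

    /** O(1): positive deltas for an allocation, negative for a release. */
    void onResourceChanged(String user, long memDelta, long vcoreDelta) {
        long[] used = usedByUser.computeIfAbsent(user, u -> new long[2]);
        double oldShare = shareByUser.getOrDefault(user, 0.0);
        used[0] += memDelta;
        used[1] += vcoreDelta;
        double newShare = dominantShare(used);
        shareByUser.put(user, newShare);
        // Adjust the cached total by this user's change only.
        totalShare += newShare - oldShare;
    }

    /** O(#user): every cached share depends on the cluster totals. */
    void onClusterResourceChanged(long newMemory, long newVcores) {
        clusterMemory = newMemory;
        clusterVcores = newVcores;
        totalShare = 0.0;
        for (Map.Entry<String, long[]> e : usedByUser.entrySet()) {
            double share = dominantShare(e.getValue());
            shareByUser.put(e.getKey(), share);
            totalShare += share;
        }
    }

    double getTotalShare() {
        return totalShare;
    }

    double getUserShare(String user) {
        return shareByUser.getOrDefault(user, 0.0);
    }
}
```

The incremental update can accumulate floating-point rounding error over many 
allocations; in this sketch the full recomputation on a clusterResource change 
also resets that drift.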

For label support, in addition to the above (if you think the above suggestion 
is fine), we need to record user.dominate_share-by-label and 
total_dominate_share-by-label, which could solve the user-limit-by-label 
problem.
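
One way the per-label records could be kept is a map keyed by label, sketched 
below. The class name and the convention that the caller passes in an already 
computed share are assumptions for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-label share bookkeeping. */
class LabelShareBook {
    // label -> (user -> cached user.dominate_share-by-label)
    private final Map<String, Map<String, Double>> shareByLabel = new HashMap<>();
    // label -> cached total_dominate_share-by-label
    private final Map<String, Double> totalByLabel = new HashMap<>();

    /** O(1): store a user's new share under one label and adjust that label's total. */
    void updateShare(String label, String user, double newShare) {
        Map<String, Double> users =
            shareByLabel.computeIfAbsent(label, l -> new HashMap<>());
        Double previous = users.put(user, newShare);
        double oldShare = (previous == null) ? 0.0 : previous;
        totalByLabel.merge(label, newShare - oldShare, Double::sum);
    }

    double getTotal(String label) {
        return totalByLabel.getOrDefault(label, 0.0);
    }

    double getUserShare(String label, String user) {
        Map<String, Double> users = shareByLabel.get(label);
        if (users == null) {
            return 0.0;
        }
        return users.getOrDefault(user, 0.0);
    }
}
```

Each label keeps its own total, so an allocation on one label does not touch 
the totals of the others.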

Please let me know your thoughts.

Thanks,

> userlimit isn't playing well with DRF calculator
> ------------------------------------------------
>
>                 Key: YARN-3388
>                 URL: https://issues.apache.org/jira/browse/YARN-3388
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-3388-v0.patch
>
>
> When there are multiple active users in a queue, it should be possible for 
> those users to make use of capacity up to max_capacity (or close). The 
> resources should be fairly distributed among the active users in the queue. 
> This works pretty well when there is a single resource being scheduled.   
> However, when there are multiple resources the situation gets more complex 
> and the current algorithm tends to get stuck at Capacity. 
> Example illustrated in subsequent comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
