[
https://issues.apache.org/jira/browse/YARN-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387413#comment-14387413
]
Wangda Tan commented on YARN-3388:
----------------------------------
Hi [~nroberts],
Sorry for the late response, and thanks for reporting and working on this. I
think your proposal is good: it computes Σ(user.dominate_share), and the user
with the smallest dominate_share can always continue.
For the implementation:
- {{updateConsumedRatio}} is called whenever clusterResource changes or any
resource is allocated, and each call has to loop over all users in the
LeafQueue. This should be improved, since there could be 100 or more users in
a queue.
I think a similar approach is to save "user.dominate_share" in each user, and
total_dominate_share = Σ(user.dominate_share) in each LeafQueue. With this, we
need only O(1) time when a resource is allocated/released and O(#user) time
when clusterResource changes. Resource allocation/release seems more frequent
to me than clusterResource changes.
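
A minimal sketch of this bookkeeping, assuming only two resource types (memory and vcores) and a dominant share defined as the larger of used/cluster over those two. The class {{DominantShareTracker}} and all of its method names are hypothetical and are not taken from LeafQueue or from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the CapacityScheduler's LeafQueue code.
class DominantShareTracker {
    // Per-user usage stored as {memoryMb, vcores}.
    private final Map<String, long[]> usedByUser = new HashMap<>();
    // Cached per-user dominant share ("user.dominate_share").
    private final Map<String, Double> shareByUser = new HashMap<>();
    private long clusterMemoryMb;
    private long clusterVcores;
    // Cached sum of all per-user shares ("total_dominate_share").
    private double totalDominantShare;

    DominantShareTracker(long clusterMemoryMb, long clusterVcores) {
        this.clusterMemoryMb = clusterMemoryMb;
        this.clusterVcores = clusterVcores;
    }

    // Dominant share = the larger of the two used/cluster ratios.
    private double dominantShare(long[] used) {
        double mem = clusterMemoryMb > 0 ? (double) used[0] / clusterMemoryMb : 0.0;
        double cpu = clusterVcores > 0 ? (double) used[1] / clusterVcores : 0.0;
        return Math.max(mem, cpu);
    }

    // O(1): only the affected user's share is recomputed and the total is
    // adjusted by the difference. Negative deltas model a release.
    void adjust(String user, long memoryMbDelta, long vcoresDelta) {
        long[] used = usedByUser.computeIfAbsent(user, u -> new long[2]);
        used[0] += memoryMbDelta;
        used[1] += vcoresDelta;
        double oldShare = shareByUser.getOrDefault(user, 0.0);
        double newShare = dominantShare(used);
        shareByUser.put(user, newShare);
        totalDominantShare += newShare - oldShare;
    }

    // O(#users): every ratio depends on the cluster size, so all cached
    // shares and the total are rebuilt from scratch.
    void setClusterResource(long memoryMb, long vcores) {
        clusterMemoryMb = memoryMb;
        clusterVcores = vcores;
        double total = 0.0;
        for (Map.Entry<String, long[]> e : usedByUser.entrySet()) {
            double share = dominantShare(e.getValue());
            shareByUser.put(e.getKey(), share);
            total += share;
        }
        totalDominantShare = total;
    }

    double getUserDominantShare(String user) {
        return shareByUser.getOrDefault(user, 0.0);
    }

    double getTotalDominantShare() {
        return totalDominantShare;
    }
}
```

Adjusting a floating-point total by deltas can accumulate rounding error over many allocations; in this sketch the full recomputation in {{setClusterResource}} also resets that drift.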
For label support, in addition to the above (if you think the above suggestion
is fine), we need to record user.dominate_share-by-label and
total_dominate_share-by-label, which could solve the user-limit-by-label
problem.
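
A rough sketch of the by-label variant, assuming the caller has already computed a user's dominant share against the resources of a given label. The class {{LabelShareTracker}} and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; keeps one share per (label, user) pair and
// one running total per label.
class LabelShareTracker {
    // label -> (user -> dominant share under that label)
    private final Map<String, Map<String, Double>> shareByLabel = new HashMap<>();
    // label -> sum of all user shares under that label
    private final Map<String, Double> totalByLabel = new HashMap<>();

    // O(1): stores the user's new share for this label and adjusts that
    // label's total by the difference from the previous value.
    void updateShare(String label, String user, double newShare) {
        Map<String, Double> users =
            shareByLabel.computeIfAbsent(label, l -> new HashMap<>());
        Double previous = users.put(user, newShare);
        double oldShare = previous == null ? 0.0 : previous;
        totalByLabel.merge(label, newShare - oldShare, Double::sum);
    }

    double getShare(String label, String user) {
        Map<String, Double> users = shareByLabel.get(label);
        return users == null ? 0.0 : users.getOrDefault(user, 0.0);
    }

    double getTotal(String label) {
        return totalByLabel.getOrDefault(label, 0.0);
    }
}
```

A user-limit check for one label would then read only that label's total and the user's share under it, without touching usage recorded for other labels.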
Please let me know your thoughts.
Thanks,
> userlimit isn't playing well with DRF calculator
> ------------------------------------------------
>
> Key: YARN-3388
> URL: https://issues.apache.org/jira/browse/YARN-3388
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacityscheduler
> Affects Versions: 2.6.0
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
> Attachments: YARN-3388-v0.patch
>
>
> When there are multiple active users in a queue, it should be possible for
> those users to make use of capacity up-to max_capacity (or close). The
> resources should be fairly distributed among the active users in the queue.
> This works pretty well when there is a single resource being scheduled.
> However, when there are multiple resources the situation gets more complex
> and the current algorithm tends to get stuck at Capacity.
> Example illustrated in subsequent comment.