[
https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076062#comment-14076062
]
Wangda Tan commented on YARN-2008:
----------------------------------
Hi Craig,
As we discussed in YARN-1198, I think we should consider the resources used by a
queue's siblings when computing headroom. I took another look at your patch and
have some comments.
First we need to think about how to calculate headroom in general. Based on the
sub-JIRAs of YARN-1198, I think headroom is:
{code}
queue_available = min(clusterResource - used_by_sibling_of_parents - used_by_this_queue,
                      queue_max_resource)
headroom = min(queue_available - available_resource_in_blacklisted_nodes,
               user_limit)
{code}
So I think this JIRA focuses on computing {{used_by_sibling_of_parents}}, is
that right?
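To make the formula above concrete, here is a minimal Java sketch that uses plain long values (for example, memory in MB) instead of YARN's Resource type. The class, method, and parameter names are hypothetical and simply mirror the pseudocode; flooring the result at zero is an added assumption, not part of the formula or the patch.

```java
public class HeadroomSketch {
    // Hypothetical helper mirroring the pseudocode; all values share one unit (e.g. MB).
    public static long queueAvailable(long clusterResource, long usedBySiblingOfParents,
                                      long usedByThisQueue, long queueMaxResource) {
        long free = clusterResource - usedBySiblingOfParents - usedByThisQueue;
        return Math.min(free, queueMaxResource);
    }

    public static long headroom(long queueAvailable, long availableInBlacklistedNodes,
                                long userLimit) {
        long h = Math.min(queueAvailable - availableInBlacklistedNodes, userLimit);
        // Assumption: never report negative headroom.
        return Math.max(h, 0L);
    }

    public static void main(String[] args) {
        long available = queueAvailable(100, 40, 20, 80);
        System.out.println(headroom(available, 5, 50));
    }
}
```

With a cluster of 100 units, 40 used by siblings of the parents and 20 by this queue, the queue has 40 available; subtracting 5 units on blacklisted nodes leaves 35, which is below the user limit of 50.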
The general approach looks good to me, except for the following in
CSQueueUtils.java (I will review the tests in the next iteration):
1)
{code}
// sibling used is parent used - my used...
float siblingUsedCapacity = Resources.ratio(
    resourceCalculator,
    Resources.subtract(parent.getUsedResources(), queue.getUsedResources()),
    parentResource);
{code}
This computation does not seem robust when the parent resource is empty,
whether it is a zero-capacity queue or its siblings have used 100% of the
cluster. It would be good to add an edge test case to guard against this
division by zero as well.
2)
It would be better to explicitly cap {{return absoluteMaxAvail}} to the range
\[0, 1\] to guard against floating-point computation errors.
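A minimal sketch of both points, using plain floats instead of Resource objects. The class and method names are hypothetical, and what an empty parent should map to is left to the caller because the right policy is an open question here; treating NaN as 0 is likewise an assumption.

```java
public class CapacityGuards {
    // Ratio that avoids dividing by zero; the caller decides what an empty parent means.
    public static float safeRatio(float used, float total, float whenTotalIsZero) {
        if (total <= 0f) {
            return whenTotalIsZero;
        }
        return used / total;
    }

    // Clamp a capacity fraction into [0, 1]; NaN is treated as 0 (assumption).
    public static float clamp01(float value) {
        if (Float.isNaN(value)) {
            return 0f;
        }
        return Math.max(0f, Math.min(1f, value));
    }

    public static void main(String[] args) {
        // Zero-capacity parent: nothing used, nothing available.
        float parentUsed = 0f, queueUsed = 0f, parentTotal = 0f;
        float siblingUsed = safeRatio(parentUsed - queueUsed, parentTotal, 1f);
        System.out.println(clamp01(1f - siblingUsed));
    }
}
```

An edge test case for the patch could exercise the same two inputs: a parent with zero resources, and a parent whose other children already use everything.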
Thanks,
Wangda
> CapacityScheduler may report incorrect queueMaxCap if there is hierarchy
> queue structure
> -----------------------------------------------------------------------------------------
>
> Key: YARN-2008
> URL: https://issues.apache.org/jira/browse/YARN-2008
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.3.0
> Reporter: Chen He
> Assignee: Craig Welch
> Attachments: YARN-2008.1.patch, YARN-2008.2.patch
>
>
> Suppose there are two queues, both allowed to use 100% of the actual resources
> in the cluster. Q1 and Q2 each currently use 50% of the cluster's actual
> resources, so there is no space actually available. With the current method of
> computing headroom, CapacityScheduler thinks there are still resources
> available for users in Q1, but they have already been used by Q2.
> If the CapacityScheduler has a hierarchical queue structure, it may report an
> incorrect queueMaxCap. Here is an example:
> || ||rootQueue|| ||
> | | / | \ |
> | L1ParentQueue1 | | L1ParentQueue2 |
> | (allowed to use up 80% of its parent) | | (allowed to use 20% in minimum of its parent) |
> | / | \ | |
> | L2LeafQueue1 | L2LeafQueue2 | |
> | (50% of its parent) | (50% of its parent in minimum) | |
> When we calculate the headroom of a user in L2LeafQueue2, the current method
> assumes L2LeafQueue2 can use 40% (80%*50%) of the actual rootQueue resources.
> However, without checking L1ParentQueue1, we are not sure. It is possible
> that L1ParentQueue2 has used 40% of the rootQueue resources right now. In that
> case, L2LeafQueue2 can only use 30% (60%*50%).
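The arithmetic in the example above can be checked with a short sketch. It works in whole percentages of rootQueue and, like the description, applies the leaf's 50% share to whatever its parent can actually obtain. The class, method, and parameter names are hypothetical.

```java
public class HierarchyExample {
    // Percentage of the root that a leaf can reach, given its parent's configured
    // maximum and what the parent's siblings are already using.
    public static int leafLimitPercent(int parentMaxPercent, int usedBySiblingsOfParentPercent,
                                       int leafSharePercent) {
        int parentReachable = Math.min(parentMaxPercent, 100 - usedBySiblingsOfParentPercent);
        parentReachable = Math.max(parentReachable, 0);
        return parentReachable * leafSharePercent / 100;
    }

    public static void main(String[] args) {
        // Ignoring L1ParentQueue2's usage: 80% * 50%.
        System.out.println(leafLimitPercent(80, 0, 50));
        // L1ParentQueue2 already uses 40% of the root: 60% * 50%.
        System.out.println(leafLimitPercent(80, 40, 50));
    }
}
```

The first call reproduces the 40% figure from the current method, and the second reproduces the 30% figure once the sibling's usage is taken into account.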