[
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Naganarasimha G R updated YARN-4415:
------------------------------------
Attachment: capacity-scheduler.xml
Hi [~wangda],
bq. I think QueueCapacitiesInfo should not assume maxCapacity will be > eps. We
have normalizations while setting values to QueueCapacities, so we should copy
exactly same value from QueueCapacities to QueueCapacitiesInfo (cap it between
0 and 1 is fine).
The point I am trying to make here is that no capacities are configured for the
given queue and partition, so QueueCapacities does not have any configured
capacities for that label. When QueueCapacitiesInfo is queried for the
non-existent label, it returns the default capacity as 0 and the max capacity as
100 (though the max can be corrected to 1).
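A simplified sketch of the mismatch described above. It assumes per-label capacities are kept in a map keyed by partition name. With that storage, the scheduler-side lookup falls back to 0 for a label that was never configured, while the web-UI-side lookup falls back to 100%. The class and method names are hypothetical and do not mirror the actual `QueueCapacities`/`QueueCapacitiesInfo` code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of per-partition queue capacities.
// Scheduler-side values are fractions (0.0f..1.0f); web-UI values are percentages.
public class LabelCapacityMismatchSketch {

    // Scheduler side: only labels with configured capacities have entries.
    private final Map<String, Float> configuredMaxCapacity = new HashMap<>();

    public void setMaxCapacity(String label, float fraction) {
        configuredMaxCapacity.put(label, fraction);
    }

    // Scheduler-side lookup: an unconfigured label falls back to 0,
    // so nothing can be allocated on that partition.
    public float schedulerMaxCapacity(String label) {
        return configuredMaxCapacity.getOrDefault(label, 0.0f);
    }

    // Web-UI-side lookup: an unconfigured label falls back to 100%,
    // which is what the scheduler page ends up displaying.
    public float webUiMaxCapacityPercent(String label) {
        Float fraction = configuredMaxCapacity.get(label);
        return fraction == null ? 100.0f : fraction * 100.0f;
    }

    public static void main(String[] args) {
        LabelCapacityMismatchSketch q = new LabelCapacityMismatchSketch();
        q.setMaxCapacity("", 1.0f); // the default partition is configured
        // Partition "xxx" was never configured for this queue.
        System.out.println("scheduler: " + q.schedulerMaxCapacity("xxx"));
        System.out.println("web UI:    " + q.webUiMaxCapacityPercent("xxx") + "%");
    }
}
```

Running `main` prints `0.0` for the scheduler and `100.0%` for the web UI. This is the same disagreement a user sees when the application is stuck.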
bq. It's a valid use case that a queue has max capacity = 0, for example,
reservation system (YARN-1051) could dynamically adjust queue capacities.
I am not against configuring the max capacity as zero, but the default should not
be zero; otherwise we cannot benefit from configuring accessible node labels as
{{*}}.
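One way to express the proposal above is sketched below. The sketch assumes "not configured" can be told apart from "configured as 0", here by treating a missing map entry as unconfigured. An explicit 0 is honored, which keeps the reservation-system use case working. A label with no configured value defaults to a max capacity of 1.0 (100%). Explicit values are capped between 0 and 1, as suggested earlier in the thread. This is a hypothetical policy illustrating the argument, not the actual patch for this issue.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical defaulting policy for a queue's max capacity on a partition.
public class MaxCapacityDefaultSketch {

    // Assumed default when no max capacity is configured for a label: 100%.
    public static final float DEFAULT_MAX_WHEN_UNCONFIGURED = 1.0f;

    // configured: label -> configured max capacity fraction; a missing entry means "not configured".
    public static float effectiveMaxCapacity(Map<String, Float> configured, String label) {
        Float value = configured.get(label);
        if (value != null) {
            // An explicit value, including 0 (e.g. set by a reservation system),
            // is honored, only capped to the range [0, 1].
            return Math.max(0.0f, Math.min(1.0f, value));
        }
        // Not configured: do not silently block a partition the queue can access.
        return DEFAULT_MAX_WHEN_UNCONFIGURED;
    }

    public static void main(String[] args) {
        Map<String, Float> configured = new HashMap<>();
        configured.put("reserved", 0.0f);
        System.out.println(effectiveMaxCapacity(configured, "reserved")); // explicit 0 -> 0.0
        System.out.println(effectiveMaxCapacity(configured, "xxx"));      // unconfigured -> 1.0
    }
}
```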
bq. I may not fully understand why we need to fetch parent queue's capacities
while setting QueueCapacitiesInfo. As I mentioned above, QueueCapacities should
have everything considered and calculated at QueueCapacities (including parent
queue's capacities), correct
In the example scenarios I have mentioned, the queue can access a particular
partition, but no capacities are configured for it, so QueueCapacities will not
have the label. Also, when accessible node labels are configured as {{*}}, any
new label can be added to the cluster and an NM can be mapped to it, but because
no capacities are configured for the queue on that label, allocations cannot
happen.
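The situation above can be sketched as two separate checks. The model is deliberately simple: a queue's accessible labels are a set of strings, `*` means "any label", and resources are reduced to memory in MB. The access check passes for a newly added label such as `xxx`. The allocation limit, however, is derived from an unconfigured and therefore zero max capacity, so it still rejects every request. All names are hypothetical.

```java
import java.util.Set;

// Hypothetical model of two checks a request on a labeled partition must pass.
public class WildcardAccessSketch {

    // Access check: "*" grants access to any label, including ones added later.
    public static boolean canAccess(Set<String> accessibleLabels, String label) {
        return accessibleLabels.contains("*") || accessibleLabels.contains(label);
    }

    // Allocation check: usage after the request must stay within
    // maxCapacity * partition resource (memory in MB only, for simplicity).
    public static boolean canAllocate(float maxCapacity, long partitionMb,
                                      long usedMb, long requestMb) {
        long limitMb = (long) (maxCapacity * partitionMb);
        return usedMb + requestMb <= limitMb;
    }

    public static void main(String[] args) {
        Set<String> accessible = Set.of("*");
        float unconfiguredMax = 0.0f; // default when nothing is configured for "xxx"
        boolean access = canAccess(accessible, "xxx");
        boolean alloc = canAllocate(unconfiguredMax, 8192, 0, 1024);
        System.out.println("access=" + access + ", allocate=" + alloc);
    }
}
```

Running `main` prints `access=true, allocate=false`: the queue is allowed onto the partition but can never get a container there.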
I hope I am clear; if not, I have uploaded my capacity-scheduler.xml. Just
create a new partition label xxx and try to submit a job to it in the default
queue (the default queue is configured with accessible node labels as {{*}}).
The job will not be able to proceed.
> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit
> application doesnt get assigned
> ------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png,
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create an exclusive partition *xxx* and map an NM to it
> # ensure no capacities are configured for the default queue for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that
> queue
> Scenario 2:
> # create a non-exclusive partition *sharedPartition* and map an NM to it
> # ensure no capacities are configured for the default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that
> queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and absolute
> max capacity are set to 0%.
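The zero absolute max capacity in both scenarios follows from the usual parent-times-child calculation. This sketch assumes, in simplified form, that a queue's absolute max capacity on a partition equals its parent's absolute max capacity multiplied by its own max capacity. Under that assumption, a zero default on the queue yields zero regardless of the parent. The method name is hypothetical.

```java
// Hypothetical sketch of absolute max capacity propagation down the queue hierarchy.
public class AbsoluteMaxCapacitySketch {

    // Assumed rule, per partition: absoluteMax(child) = absoluteMax(parent) * max(child).
    public static float absoluteMaxCapacity(float parentAbsoluteMax, float maxCapacity) {
        return parentAbsoluteMax * maxCapacity;
    }

    public static void main(String[] args) {
        float rootAbsMax = 1.0f;      // root owns the whole partition
        float defaultQueueMax = 0.0f; // nothing configured for "xxx" or "sharedPartition"
        float abs = absoluteMaxCapacity(rootAbsMax, defaultQueueMax);
        System.out.println("default queue absolute max on partition: " + abs * 100 + "%");
    }
}
```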
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)