Naganarasimha G R updated YARN-4415:
    Attachment: capacity-scheduler.xml

Hi [~wangda],
bq. I think QueueCapacitiesInfo should not assume maxCapacity will be > eps. We 
have normalizations while setting values to QueueCapacities, so we should copy 
exactly same value from QueueCapacities to QueueCapacitiesInfo (cap it between 
0 and 1 is fine).
The point I am trying to make here is that no capacities are configured for 
the given queue and partition, so QueueCapacities will not hold configured 
capacities for that label. When QueueCapacitiesInfo is queried for the 
nonexistent label, it returns the default capacity as 0 and the max as 100 
(though this can be corrected to 1).
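
To illustrate the mismatch, here is a minimal Java sketch. It is not the actual QueueCapacities / QueueCapacitiesInfo code: the class names, method names, and the epsilon value are hypothetical. It assumes the scheduler side falls back to 0 for an unconfigured label, while the UI side shows 100% whenever the max is not above epsilon.

```java
import java.util.HashMap;
import java.util.Map;

public class LabelCapacitySketch {
    // Hypothetical threshold below which a capacity is treated as "not set".
    static final float EPSILON = 1e-8f;

    // Hypothetical stand-in for the scheduler-side, per-label capacity store.
    static class SchedulerCapacities {
        private final Map<String, Float> maxByLabel = new HashMap<>();

        void setMaximumCapacity(String label, float max) {
            maxByLabel.put(label, max);
        }

        // Assumption: a label with no configured capacity falls back to 0.
        float getMaximumCapacity(String label) {
            return maxByLabel.getOrDefault(label, 0f);
        }
    }

    // Hypothetical stand-in for the value the web UI displays:
    // a max that is not above epsilon is shown as 100%.
    static float displayedMaxPercent(SchedulerCapacities caps, String label) {
        float max = caps.getMaximumCapacity(label);
        if (max < EPSILON) {
            max = 1f;
        }
        return max * 100f;
    }

    // The scheduler can only allocate when the real max capacity is non-zero.
    static boolean canAllocate(SchedulerCapacities caps, String label) {
        return caps.getMaximumCapacity(label) > EPSILON;
    }

    public static void main(String[] args) {
        SchedulerCapacities caps = new SchedulerCapacities();
        caps.setMaximumCapacity("", 1f); // only the default partition is configured

        // Label "xxx" has no configured capacities for this queue.
        System.out.println("UI max %: " + displayedMaxPercent(caps, "xxx"));
        System.out.println("can allocate: " + canAllocate(caps, "xxx"));
    }
}
```

In this sketch the UI value and the scheduler decision disagree for label xxx, which is the symptom reported in this issue.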

bq. It's a valid use case that a queue has max capacity = 0, for example, 
reservation system (YARN-1051) could dynamically adjust queue capacities.
I am not against the concept of configuring the max capacity as zero, but the 
default should not be zero; otherwise we will not be able to benefit from 
accessible node labels configured as {{*}}.

bq. I may not fully understand why we need to fetch parent queue's capacities 
while setting QueueCapacitiesInfo. As I mentioned above, QueueCapacities should 
have everything considered and calculated at QueueCapacities (including parent 
queue's capacities), correct
In the example scenarios I mentioned, the queue can access a particular 
partition, but the capacities for it are not configured. In that case 
QueueCapacities will not have the label. Also, when the accessible node label 
is configured as {{*}}, any new label can be added to the cluster and an NM 
can be mapped to it, but as the capacities are not configured for the queue, 
allocations cannot happen.

I hope this is clear; if not, I have uploaded my capacity-scheduler.xml. Just 
create a new partition label xxx and try to submit a job for it in the default 
queue (the default queue is configured with accessible node labels as {{*}}). 
The job will not be able to proceed.

> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit 
> application doesnt get assigned
> ------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4415
>                 URL: https://issues.apache.org/jira/browse/YARN-4415
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.2
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>         Attachments: App info with diagnostics info.png, 
> capacity-scheduler.xml, screenshot-1.png
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue(default) with accessible node labels as *
> # create an exclusive partition *xxx* and map a NM to it
> # ensure no capacities are configured for default for label xxx
> # start an RM app with queue as default and label as xxx
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue
> Scenario 2:
> # create a nonexclusive partition *sharedPartition* and map a NM to it
> # ensure no capacities are configured for default queue
> # start an RM app with queue as *default* and label as *sharedPartition*
> # application is stuck but scheduler ui shows 100% as max capacity for that 
> queue for *sharedPartition*
> For both issues the cause is the same: the default max capacity and abs max 
> capacity are set to zero %.

This message was sent by Atlassian JIRA
