[
https://issues.apache.org/jira/browse/YARN-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803880#comment-15803880
]
Ying Zhang commented on YARN-4415:
----------------------------------
Hi [~Naganarasimha], [~leftnoteasy], [~xinxianyin], we've encountered the same
issue during our testing, and I noticed that this JIRA has been open for a
while. I understand the reasons [~leftnoteasy] and [~xinxianyin] have for
choosing 0 or 100 as the default max capacity when it is not set. The current
problem is that we use 0 as the default max capacity internally (using the
constant CSQueueUtils.EPSILON) when allocating resources, but the RM Scheduler
UI shows 100 as the max capacity (because the class
PartitionQueueCapacitiesInfo uses 100 as the default value in this case). Could
we change both places to use the same default value to avoid the inconsistency?
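To make the mismatch concrete, here is a minimal, self-contained sketch. It
does not use the real YARN classes: MaxCapacityDefaults and all of its members
are hypothetical names, and the two default values are simply the ones
described above (0 on the allocation path, 100 percent in the UI layer).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; these are not the actual YARN classes.
class MaxCapacityDefaults {
    // Assumed defaults, taken from the description above.
    static final float SCHEDULER_DEFAULT_MAX = 0f;     // fraction used when allocating
    static final float UI_DEFAULT_MAX_PERCENT = 100f;  // percentage shown in the UI

    // Explicitly configured max capacity per partition, as a fraction in [0, 1].
    private final Map<String, Float> configuredMax = new HashMap<>();

    public void configure(String partition, float fraction) {
        configuredMax.put(partition, fraction);
    }

    // Fraction the allocation path would use for this partition.
    public float schedulerMax(String partition) {
        return configuredMax.getOrDefault(partition, SCHEDULER_DEFAULT_MAX);
    }

    // Percentage the UI would display for this partition.
    public float uiMaxPercent(String partition) {
        Float configured = configuredMax.get(partition);
        return configured == null ? UI_DEFAULT_MAX_PERCENT : configured * 100f;
    }

    // True when the displayed value matches what the allocation path uses.
    public boolean isConsistent(String partition) {
        return Math.abs(schedulerMax(partition) * 100f - uiMaxPercent(partition)) < 1e-3f;
    }

    public static void main(String[] args) {
        MaxCapacityDefaults queue = new MaxCapacityDefaults();
        // Nothing configured for partition "xxx": the two defaults disagree.
        System.out.println("unset:      scheduler=" + queue.schedulerMax("xxx")
            + " ui=" + queue.uiMaxPercent("xxx")
            + " consistent=" + queue.isConsistent("xxx"));
        // Once a value is configured, both paths read the same number.
        queue.configure("xxx", 0.5f);
        System.out.println("configured: scheduler=" + queue.schedulerMax("xxx")
            + " ui=" + queue.uiMaxPercent("xxx")
            + " consistent=" + queue.isConsistent("xxx"));
    }
}
```

In this sketch the disagreement only appears for partitions with no configured
value, which is why sharing a single default between the two paths would be
enough to remove it.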
{quote}
But I think there's one thing we need to fix:
When queue.accessible-node-labels == *,
QueueCapacitiesInfo#QueueCapacitiesInfo(QueueCapacities) should call
RMNodeLabelsManager.getClusterNodeLabelNames to get all labels instead of
calling getExistingNodeLabels. So after we add/remove labels, queue's
capacities in webUI/REST response will be updated as well.
{quote}
[~leftnoteasy], I'm not sure I understand what you mean, but it might be better
to keep using getExistingNodeLabels so that only the node label partitions the
queue has access to are shown in the RM Scheduler UI.
> Scheduler Web Ui shows max capacity for the queue is 100% but when we submit
> application doesnt get assigned
> ------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4415
> URL: https://issues.apache.org/jira/browse/YARN-4415
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacity scheduler, resourcemanager
> Affects Versions: 2.7.2
> Reporter: Naganarasimha G R
> Assignee: Naganarasimha G R
> Attachments: App info with diagnostics info.png,
> capacity-scheduler.xml, screenshot-1.png
>
>
> Steps to reproduce the issue :
> Scenario 1:
> # Configure a queue (default) with accessible node labels set to *
> # Create an exclusive partition *xxx* and map an NM to it
> # Ensure no capacities are configured for the default queue for label xxx
> # Start an RM app with the queue set to default and the label set to xxx
> # The application is stuck, but the scheduler UI shows 100% as the max
> capacity for that queue
> Scenario 2:
> # Create a non-exclusive partition *sharedPartition* and map an NM to it
> # Ensure no capacities are configured for the default queue
> # Start an RM app with the queue set to *default* and the label set to
> *sharedPartition*
> # The application is stuck, but the scheduler UI shows 100% as the max
> capacity for that queue for *sharedPartition*
> In both scenarios the cause is the same: the default max capacity and the
> absolute max capacity are set to zero percent.