[
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wangda Tan updated YARN-2744:
-----------------------------
Attachment: YARN-2744-20141025-1.patch
Thanks to [~sumitmohanty] for reporting this issue.
I have attached a fix. The bug is caused by the check that a queue's labels
are known to NodeLabelsManager being placed in the wrong location:
{code}
   if (this.accessibleLabels == null && parent != null) {
     this.accessibleLabels = parent.getAccessibleNodeLabels();
-    SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
-        this.accessibleLabels);
   }
+  SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
+      this.accessibleLabels);
{code}
Previously, the labels were checked only when 1) the queue's own labels were
not set, 2) the queue was not the root queue, and 3) the labels of the queue's
parent were not empty.
The patch also adds tests covering the cases where a queue is parsed whose
accessible-node-labels are not known to NodeLabelsManager; each of these cases
should throw an exception.
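To illustrate the effect of moving the check, here is a minimal, self-contained sketch. It is not the actual CapacityScheduler code: the class QueueLabelCheckSketch, its method names, the use of a null parent label set to mean "no parent", the treatment of a null label set as "nothing to check", and the IllegalArgumentException are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the queue label resolution discussed in this issue.
class QueueLabelCheckSketch {

    // Stand-in for SchedulerUtils.checkIfLabelInClusterNodeLabels.
    // Assumption: a null label set means there is nothing to check.
    static void checkLabelsKnown(Set<String> clusterLabels, Set<String> queueLabels) {
        if (queueLabels == null) {
            return;
        }
        for (String label : queueLabels) {
            if (!clusterLabels.contains(label)) {
                throw new IllegalArgumentException("Label not known to the cluster: " + label);
            }
        }
    }

    // Before the patch: the check runs only when labels are inherited from the parent.
    // A null parentLabels models "no parent" here (a simplification).
    static Set<String> resolveBeforePatch(Set<String> own, Set<String> parentLabels,
                                          Set<String> clusterLabels) {
        Set<String> accessible = own;
        if (accessible == null && parentLabels != null) {
            accessible = parentLabels;
            checkLabelsKnown(clusterLabels, accessible);
        }
        return accessible;
    }

    // After the patch: the check runs for every queue, inherited labels or not.
    static Set<String> resolveAfterPatch(Set<String> own, Set<String> parentLabels,
                                         Set<String> clusterLabels) {
        Set<String> accessible = own;
        if (accessible == null && parentLabels != null) {
            accessible = parentLabels;
        }
        checkLabelsKnown(clusterLabels, accessible);
        return accessible;
    }

    public static void main(String[] args) {
        // Cluster labels were lost across the restart; the queue still names them.
        Set<String> clusterLabels = new HashSet<>();
        Set<String> queueLabels = new HashSet<>(Arrays.asList("x", "y"));

        // Old placement: explicitly configured labels are never validated.
        resolveBeforePatch(queueLabels, null, clusterLabels);
        System.out.println("before patch: accepted");

        // New placement: the stale labels are rejected while parsing the queue.
        try {
            resolveAfterPatch(queueLabels, null, clusterLabels);
            System.out.println("after patch: accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("after patch: rejected");
        }
    }
}
```

In this model, a queue that sets its own labels bypasses the old check entirely, which corresponds to the scenario in the report: the labels disappear across the restart while the queue configuration still names them.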
Wangda
> Under some scenario, it is possible to end up with capacity scheduler
> configuration that uses labels that no longer exist
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-2744
> URL: https://issues.apache.org/jira/browse/YARN-2744
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: capacityscheduler
> Affects Versions: 2.5.1
> Reporter: Sumit Mohanty
> Assignee: Wangda Tan
> Priority: Critical
> Fix For: 2.6.0
>
> Attachments: YARN-2744-20141025-1.patch
>
>
> Use the following steps:
> * Ensure default in-memory storage is configured for labels
> * Define some labels and assign nodes to labels (e.g. define two labels and
> assign both labels to the host on a one host cluster)
> * Invoke refreshQueues
> * Modify capacity scheduler to create two top level queues and allow access
> to the labels from both the queues
> * Assign appropriate "label + queue" specific capacities
> * Restart resource manager
> Noticed that RM starts without any issues. The labels are not preserved
> across restart and thus the capacity-scheduler ends up using labels that are
> no longer present.
> At this point submitting an application to YARN will not succeed as there are
> no resources available with the labels.