[ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-----------------------------
    Attachment: YARN-2744-20141025-1.patch

Thanks to [~sumitmohanty] for reporting this issue.

I've attached a fix for this issue. The bug occurs because the check that a 
queue's labels are known to NodeLabelsManager is in the wrong place:

{code}
     if (this.accessibleLabels == null && parent != null) {
       this.accessibleLabels = parent.getAccessibleNodeLabels();
-      SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
-          this.accessibleLabels);
     }
+    SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
+        this.accessibleLabels);
{code}

Previously, labels were only checked when 1) the queue's own labels were not 
set, 2) the queue was not the root queue, and 3) the labels of the queue's 
parent were not empty. With the fix, the check runs for every queue.
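
To illustrate the difference, here is a minimal standalone sketch. It is not 
the actual CapacityScheduler code: the class QueueLabelCheckSketch, its 
methods, and the use of IllegalArgumentException are hypothetical stand-ins, 
and a null parentLabels argument is assumed to mean "no parent". It only 
shows why a queue with its own explicitly configured labels skipped 
validation before the patch.

```java
import java.util.Set;

// Hypothetical illustration only; names do not come from the YARN code base.
final class QueueLabelCheckSketch {

  // Stand-in for the real label check: rejects any label unknown to the cluster.
  static void checkLabelsInCluster(Set<String> clusterLabels,
      Set<String> queueLabels) {
    if (queueLabels == null) {
      return; // nothing configured, nothing to validate
    }
    for (String label : queueLabels) {
      if (!clusterLabels.contains(label)) {
        throw new IllegalArgumentException(
            "Label '" + label + "' is not known to the cluster");
      }
    }
  }

  // Old placement: the check runs only when labels are inherited from the parent.
  static Set<String> resolveBeforePatch(Set<String> own,
      Set<String> parentLabels, Set<String> clusterLabels) {
    Set<String> labels = own;
    if (labels == null && parentLabels != null) {
      labels = parentLabels;
      checkLabelsInCluster(clusterLabels, labels);
    }
    return labels;
  }

  // New placement: the check runs whether labels are inherited or explicit.
  static Set<String> resolveAfterPatch(Set<String> own,
      Set<String> parentLabels, Set<String> clusterLabels) {
    Set<String> labels = own;
    if (labels == null && parentLabels != null) {
      labels = parentLabels;
    }
    checkLabelsInCluster(clusterLabels, labels);
    return labels;
  }
}
```

In this sketch, a queue configured with a label that the cluster no longer 
knows passes through resolveBeforePatch unchecked, while resolveAfterPatch 
rejects it.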

The patch also adds tests covering the cases where a queue is parsed but its 
accessible-node-labels are not known to NodeLabelsManager; in those cases an 
exception should be thrown. 

Wangda



> Under some scenario, it is possible to end up with capacity scheduler 
> configuration that uses labels that no longer exist
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2744
>                 URL: https://issues.apache.org/jira/browse/YARN-2744
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacityscheduler
>    Affects Versions: 2.5.1
>            Reporter: Sumit Mohanty
>            Assignee: Wangda Tan
>            Priority: Critical
>             Fix For: 2.6.0
>
>         Attachments: YARN-2744-20141025-1.patch
>
>
> Use the following steps:
> * Ensure default in-memory storage is configured for labels
> * Define some labels and assign nodes to labels (e.g. define two labels and 
> assign both labels to the host on a one host cluster)
> * Invoke refreshQueues
> * Modify capacity scheduler to create two top level queues and allow access 
> to the labels from both the queues
> * Assign appropriate "label + queue" specific capacities
> * Restart resource manager
> Noticed that RM starts without any issues. The labels are not preserved 
> across restart and thus the capacity-scheduler ends up using labels that are 
> no longer present.
> At this point submitting an application to YARN will not succeed as there are 
> no resources available with the labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
