[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-27 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-
Priority: Critical  (was: Major)

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
Priority: Critical
 Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch


 Use the following steps:
 * Ensure default in-memory storage is configured for labels
 * Define some labels and assign nodes to labels (e.g. define two labels and 
 assign both labels to the host on a one host cluster)
 * Invoke refreshQueues
 * Modify capacity scheduler to create two top level queues and allow access 
 to the labels from both the queues
 * Assign appropriate label + queue specific capacities
 * Restart resource manager
 The RM starts without reporting any problem. However, the labels are not 
 preserved across the restart, so the capacity scheduler ends up referencing 
 labels that no longer exist.
 At this point, submitting an application to YARN fails because no resources 
 are available with those labels.
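
The effect of the steps above can be illustrated with a small, self-contained sketch. It does not use any YARN classes: `InMemoryLabelStore`, `danglingLabels`, and the label names `x` and `y` are hypothetical stand-ins, and a "restart" is modelled simply as creating a fresh, empty store while the queue configuration keeps its label references.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the scenario; none of these types exist in YARN.
class LabelRestartSketch {

  // In-memory label store: its contents live only as long as the object does.
  static class InMemoryLabelStore {
    private final Set<String> labels = new HashSet<>();

    void addLabels(String... names) {
      labels.addAll(Arrays.asList(names));
    }

    Set<String> getLabels() {
      return new HashSet<>(labels); // defensive copy
    }
  }

  // Returns the labels referenced by a queue that the store does not know.
  static Set<String> danglingLabels(Set<String> queueLabels, InMemoryLabelStore store) {
    Set<String> missing = new HashSet<>(queueLabels);
    missing.removeAll(store.getLabels());
    return missing;
  }

  public static void main(String[] args) {
    // The queue configuration is persisted on disk, so it survives a restart.
    Set<String> queueLabels = new HashSet<>(Arrays.asList("x", "y"));

    InMemoryLabelStore beforeRestart = new InMemoryLabelStore();
    beforeRestart.addLabels("x", "y");
    System.out.println("dangling before restart: "
        + danglingLabels(queueLabels, beforeRestart));

    // A restart is modelled as a brand-new, empty in-memory store.
    InMemoryLabelStore afterRestart = new InMemoryLabelStore();
    System.out.println("dangling after restart: "
        + danglingLabels(queueLabels, afterRestart));
  }
}
```

In this model the queue configuration references no unknown labels before the restart and references both of its labels as unknown afterwards, which mirrors the state described in the report.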



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-
Issue Type: Sub-task  (was: Bug)
Parent: YARN-2492

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0




[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-
Attachment: YARN-2744-20141025-1.patch

Thanks to [~sumitmohanty] for reporting this issue.

Attached a fix for this issue. The bug occurs because the check that a queue's 
labels are known to the NodeLabelsManager is currently in the wrong place:

{code}
 if (this.accessibleLabels == null && parent != null) {
   this.accessibleLabels = parent.getAccessibleNodeLabels();
-  SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
-      this.accessibleLabels);
 }
+SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
+    this.accessibleLabels);
{code}

Previously, labels were checked only when 1) the queue's own labels were not 
set, 2) the queue was not the root queue, and 3) the labels of the queue's 
parent were not empty. 

The patch also adds tests covering the cases where, while parsing queues, a 
queue's accessible-node-labels are not known to the NodeLabelsManager. Each of 
these cases should throw an exception. 
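
To make the difference between the two placements concrete, here is a minimal sketch that does not depend on YARN. `QueueLabelCheckSketch`, `resolveOld`, `resolvePatched`, and `checkLabelsInCluster` are hypothetical names; the last one only plays the role of `SchedulerUtils.checkIfLabelInClusterNodeLabels`, and a null `parentLabels` stands in for a queue without a parent.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for queue label resolution; these are not the real YARN classes.
class QueueLabelCheckSketch {

  // Rejects any queue label that the cluster does not know about.
  static void checkLabelsInCluster(Set<String> clusterLabels, Set<String> queueLabels) {
    if (queueLabels == null) {
      return; // nothing configured, nothing to validate
    }
    for (String label : queueLabels) {
      if (!clusterLabels.contains(label)) {
        throw new IllegalArgumentException("Unknown node label: " + label);
      }
    }
  }

  // Old placement: the check runs only when labels are inherited from the parent.
  static Set<String> resolveOld(Set<String> configured, Set<String> parentLabels,
      Set<String> clusterLabels) {
    Set<String> accessible = configured;
    if (accessible == null && parentLabels != null) {
      accessible = parentLabels;
      checkLabelsInCluster(clusterLabels, accessible);
    }
    return accessible;
  }

  // Patched placement: the check runs for every queue, inherited labels or not.
  static Set<String> resolvePatched(Set<String> configured, Set<String> parentLabels,
      Set<String> clusterLabels) {
    Set<String> accessible = configured;
    if (accessible == null && parentLabels != null) {
      accessible = parentLabels;
    }
    checkLabelsInCluster(clusterLabels, accessible);
    return accessible;
  }

  public static void main(String[] args) {
    Set<String> clusterLabels = new HashSet<>(); // labels lost, e.g. after a restart
    Set<String> configured = new HashSet<>(Arrays.asList("x", "y"));
    Set<String> parentLabels = new HashSet<>();

    // Explicitly configured labels skip the old check entirely.
    System.out.println("old placement accepted: "
        + resolveOld(configured, parentLabels, clusterLabels));
    try {
      resolvePatched(configured, parentLabels, clusterLabels);
    } catch (IllegalArgumentException e) {
      System.out.println("patched placement rejected: " + e.getMessage());
    }
  }
}
```

With explicitly configured labels, the old placement never reaches the check, so stale labels are accepted silently; the patched placement validates them regardless of how they were obtained.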

Wangda



 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2744-20141025-1.patch




[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-25 Thread Wangda Tan (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wangda Tan updated YARN-2744:
-
Attachment: YARN-2744-20141025-2.patch

[~ste...@apache.org],
Thanks for your comments! I've changed every stop call in TestQueueParsing to 
stopQuietly, and will keep this in mind when writing tests in the future.

Wangda

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
Priority: Critical
 Fix For: 2.6.0

 Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch




[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

2014-10-25 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-2744:
--
Priority: Major  (was: Critical)
Target Version/s: 2.6.0
   Fix Version/s: (was: 2.6.0)

If I understand correctly, all this patch does is validate the labels on 
restart, even when memory-based-config-store is used.

Downgrading the priority; revert it if you disagree.

 Under some scenario, it is possible to end up with capacity scheduler 
 configuration that uses labels that no longer exist
 -

 Key: YARN-2744
 URL: https://issues.apache.org/jira/browse/YARN-2744
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: capacityscheduler
Affects Versions: 2.5.1
Reporter: Sumit Mohanty
Assignee: Wangda Tan
 Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch

