[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-2744:

    Priority: Critical (was: Major)

Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

    Key: YARN-2744
    URL: https://issues.apache.org/jira/browse/YARN-2744
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: capacityscheduler
    Affects Versions: 2.5.1
    Reporter: Sumit Mohanty
    Assignee: Wangda Tan
    Priority: Critical
    Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch

Use the following steps:
* Ensure default in-memory storage is configured for labels
* Define some labels and assign nodes to labels (e.g. define two labels and assign both labels to the host on a one host cluster)
* Invoke refreshQueues
* Modify capacity scheduler to create two top level queues and allow access to the labels from both the queues
* Assign appropriate label + queue specific capacities
* Restart resource manager

Noticed that RM starts without any issues. The labels are not preserved across restart and thus the capacity-scheduler ends up using labels that are no longer present. At this point submitting an application to YARN will not succeed as there are no resources available with the labels.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-2744:

    Issue Type: Sub-task (was: Bug)
    Parent: YARN-2492

Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

    Key: YARN-2744
    URL: https://issues.apache.org/jira/browse/YARN-2744
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: capacityscheduler
    Affects Versions: 2.5.1
    Reporter: Sumit Mohanty
    Assignee: Wangda Tan
    Priority: Critical
    Fix For: 2.6.0
[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-2744:

    Attachment: YARN-2744-20141025-1.patch

Thanks [~sumitmohanty] for reporting this issue.

I've attached a fix. The bug occurs because the check that a queue's labels are known to the NodeLabelsManager is in the wrong place:

{code}
 if (this.accessibleLabels == null && parent != null) {
   this.accessibleLabels = parent.getAccessibleNodeLabels();
-  SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
-      this.accessibleLabels);
 }
+SchedulerUtils.checkIfLabelInClusterNodeLabels(labelManager,
+    this.accessibleLabels);
{code}

Previously, labels were checked only when 1) the queue's own labels were empty, 2) the queue was not the root queue, and 3) the labels of the queue's parent were not empty.

The patch adds tests covering the cases where a queue is parsed but its accessible-node-labels are not known to the NodeLabelsManager; these cases should throw an exception.

Wangda

Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

    Key: YARN-2744
    URL: https://issues.apache.org/jira/browse/YARN-2744
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: capacityscheduler
    Affects Versions: 2.5.1
    Reporter: Sumit Mohanty
    Assignee: Wangda Tan
    Priority: Critical
    Fix For: 2.6.0
    Attachments: YARN-2744-20141025-1.patch
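
The effect of moving the check, as described in the message above, can be illustrated with a small standalone sketch. This is not the Hadoop code: the class `QueueLabelCheckSketch`, the methods `setupOld`, `setupFixed`, and `checkLabelsInCluster`, and the use of plain `Set<String>` values in place of the queue, parent queue, and label manager objects are all hypothetical simplifications. A `null` parent label set stands in for "no parent queue".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the queue label setup; not the actual Hadoop classes. */
public class QueueLabelCheckSketch {

  /** Models the label validation: throws if a queue label is unknown to the cluster. */
  static void checkLabelsInCluster(Set<String> clusterLabels, Set<String> queueLabels) {
    if (queueLabels == null) {
      return; // nothing to validate
    }
    for (String label : queueLabels) {
      if (!clusterLabels.contains(label)) {
        throw new IllegalArgumentException("Label not in cluster: " + label);
      }
    }
  }

  /** Old placement: validation runs only when labels are inherited from the parent. */
  static Set<String> setupOld(Set<String> configured, Set<String> parentLabels,
      Set<String> clusterLabels) {
    Set<String> accessible = configured;
    if (accessible == null && parentLabels != null) {
      accessible = parentLabels;
      checkLabelsInCluster(clusterLabels, accessible);
    }
    return accessible;
  }

  /** Fixed placement: validation runs for configured and inherited labels alike. */
  static Set<String> setupFixed(Set<String> configured, Set<String> parentLabels,
      Set<String> clusterLabels) {
    Set<String> accessible = configured;
    if (accessible == null && parentLabels != null) {
      accessible = parentLabels;
    }
    checkLabelsInCluster(clusterLabels, accessible);
    return accessible;
  }

  public static void main(String[] args) {
    // After a restart with in-memory label storage, the cluster knows no labels.
    Set<String> clusterLabels = Collections.emptySet();
    // The scheduler configuration still names labels explicitly on the queue.
    Set<String> configured = new HashSet<>(Arrays.asList("x", "y"));

    // Old placement: explicitly configured labels skip validation entirely.
    System.out.println("old placement accepted: "
        + setupOld(configured, null, clusterLabels));

    // Fixed placement: the stale labels are rejected while the queue is parsed.
    try {
      setupFixed(configured, null, clusterLabels);
    } catch (IllegalArgumentException e) {
      System.out.println("fixed placement rejected: " + e.getMessage());
    }
  }
}
```

In this model, a queue whose labels are set explicitly never enters the `if` branch, so the old placement lets stale labels through, while the fixed placement fails at queue-parsing time.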
[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-2744:

    Attachment: YARN-2744-20141025-2.patch

[~ste...@apache.org],

Thanks for your comments! I've changed all stop calls in TestQueueParsing to stopQuietly. I will keep this in mind when writing tests in the future.

Wangda

Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

    Key: YARN-2744
    URL: https://issues.apache.org/jira/browse/YARN-2744
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: capacityscheduler
    Affects Versions: 2.5.1
    Reporter: Sumit Mohanty
    Assignee: Wangda Tan
    Priority: Critical
    Fix For: 2.6.0
    Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch
[jira] [Updated] (YARN-2744) Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist
[ https://issues.apache.org/jira/browse/YARN-2744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated YARN-2744:

    Priority: Major (was: Critical)
    Target Version/s: 2.6.0
    Fix Version/s: (was: 2.6.0)

If I understand correctly, all this patch is doing is validating the labels on restart even if memory-based-config-store is used.

Downgrading priority, revert back if you disagree.

Under some scenario, it is possible to end up with capacity scheduler configuration that uses labels that no longer exist

    Key: YARN-2744
    URL: https://issues.apache.org/jira/browse/YARN-2744
    Project: Hadoop YARN
    Issue Type: Sub-task
    Components: capacityscheduler
    Affects Versions: 2.5.1
    Reporter: Sumit Mohanty
    Assignee: Wangda Tan
    Attachments: YARN-2744-20141025-1.patch, YARN-2744-20141025-2.patch