[jira] [Created] (YARN-2324) Race condition in continuousScheduling for FairScheduler
zhihai xu created YARN-2324:
---
Summary: Race condition in continuousScheduling for FairScheduler
Key: YARN-2324
URL: https://issues.apache.org/jira/browse/YARN-2324
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhihai xu

There is a race condition in continuousScheduling for FairScheduler: removeNode can run while continuousScheduling is executing in the schedulingThread. If the node has been removed from nodes, nodes.get(n2) and getFSSchedulerNode(nodeId) will return null, so we need to add a lock to remove the NPE/race condition.
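A minimal sketch of the kind of guard this report implies, using the field and method names it mentions; this is illustrative only, not the attached YARN-2324.000.patch:
{code}
// Sketch: re-check node existence under the scheduler lock so a concurrent
// removeNode() cannot cause an NPE mid-iteration. Method names follow the
// report; the loop shape and wiring are assumptions for illustration.
private void continuousSchedulingAttempt(List<NodeId> nodeIdList) {
  for (NodeId nodeId : nodeIdList) {
    synchronized (this) {
      FSSchedulerNode node = getFSSchedulerNode(nodeId);
      if (node == null) {
        // Node was removed between snapshotting the list and scheduling on it.
        continue;
      }
      attemptScheduling(node);
    }
  }
}
{code}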
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067821#comment-14067821 ]

Wei Yan commented on YARN-2324:
---
Duplicate of YARN-2273?
[jira] [Assigned] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu reassigned YARN-2324:
---
Assignee: zhihai xu
[jira] [Updated] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-2324:
---
Attachment: YARN-2324.000.patch
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067844#comment-14067844 ]

Hadoop QA commented on YARN-2324:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656778/YARN-2324.000.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4373//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4373//console

This message is automatically generated.
[jira] [Updated] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated YARN-2013:
---
Attachment: YARN-2013.5.patch

[~djp], thank you for the review! Updated the patch to address your comment.

The diagnostics is always the ExitCodeException stack when the container crashes
---
Key: YARN-2013
URL: https://issues.apache.org/jira/browse/YARN-2013
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager
Reporter: Zhijie Shen
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2013.1.patch, YARN-2013.2.patch, YARN-2013.3-2.patch, YARN-2013.3.patch, YARN-2013.4.patch, YARN-2013.5.patch

When a container crashes, an ExitCodeException will be thrown from Shell. Default/LinuxContainerExecutor captures the exception and puts the exception stack into the diagnostics. Therefore, the exception stack is always the same:
{code}
String diagnostics = "Exception from container-launch: \n"
    + StringUtils.stringifyException(e) + "\n" + shExec.getOutput();
container.handle(new ContainerDiagnosticsUpdateEvent(containerId, diagnostics));
{code}
In addition, it seems that the exception always has an empty message, as there's no message from stderr. Hence the diagnostics are not of much use for users to analyze the reason for the container crash.
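One plausible shape for such a fix, sketched here under the assumption that the executor can surface the exit code and the shell's captured output; this is not necessarily what YARN-2013.5.patch does:
{code}
import org.apache.hadoop.util.Shell.ExitCodeException;

// Hypothetical helper: build user-facing diagnostics from the exit code and
// the shell's captured output instead of the always-identical stack trace.
static String buildDiagnostics(ExitCodeException e, String shellOutput) {
  StringBuilder sb = new StringBuilder("Exception from container-launch.\n");
  sb.append("Exit code: ").append(e.getExitCode()).append('\n');
  if (e.getMessage() != null && !e.getMessage().isEmpty()) {
    sb.append("Exception message: ").append(e.getMessage()).append('\n');
  }
  // The shell output (stderr/stdout) usually carries the real failure reason.
  sb.append("Shell output: ").append(shellOutput).append('\n');
  return sb.toString();
}
{code}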
[jira] [Commented] (YARN-2324) Race condition in continuousScheduling for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067889#comment-14067889 ]

Tsuyoshi OZAWA commented on YARN-2324:
---
[~zxu], thank you for reporting this and for your contribution! As [~ywskycn] mentioned, we're addressing the problem on YARN-2273. Let's close this ticket as a duplicate.
[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067898#comment-14067898 ]

Hadoop QA commented on YARN-2013:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656788/YARN-2013.5.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 2 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager:
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4374//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4374//console

This message is automatically generated.
[jira] [Commented] (YARN-2013) The diagnostics is always the ExitCodeException stack when the container crashes
[ https://issues.apache.org/jira/browse/YARN-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067922#comment-14067922 ]

Tsuyoshi OZAWA commented on YARN-2013:
---
The test failures are not related.
[jira] [Commented] (YARN-2309) NPE during RM-Restart test scenario
[ https://issues.apache.org/jira/browse/YARN-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067928#comment-14067928 ]

Tsuyoshi OZAWA commented on YARN-2309:
---
[~nishan], thank you for reporting. The patch to fix the problem is available on YARN-1919. Any feedback is welcome :-)

NPE during RM-Restart test scenario
---
Key: YARN-2309
URL: https://issues.apache.org/jira/browse/YARN-2309
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

During RM-Restart test scenarios, we met with the exception below. A point to note here is that ZooKeeper was also unstable during this testing; we saw many ZooKeeper exceptions before getting this NPE.
{code}
2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039)
{code}
ZooKeeper exception:
{code}
2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017)
	at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632)
	at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766)
{code}
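For context, the stack trace above points at serviceStop() running against a partially initialized elector; a guard along these lines would avoid the NPE (illustrative only, the actual fix is the YARN-1919 patch):
{code}
// Illustrative null guard: serviceStop() can be invoked after serviceInit()
// failed part-way, leaving the elector field unset. Field name assumed.
@Override
protected synchronized void serviceStop() throws Exception {
  if (elector != null) {
    elector.quitElection(false);    // leave leader election cleanly
    elector.terminateConnection();  // close the ZooKeeper connection
  }
  super.serviceStop();
}
{code}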
[jira] [Commented] (YARN-2313) Livelock can occur on FairScheduler when there are lots of entries in the queue
[ https://issues.apache.org/jira/browse/YARN-2313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067998#comment-14067998 ]

Sandy Ryza commented on YARN-2313:
---
Thanks for reporting this, [~ozawa]. A couple of nits:
* The new configuration should be defined in FairSchedulerConfiguration like other fair scheduler props.
* If I understand correctly, the race described in the findbugs could never actually happen. For code readability, I think it's better to add a findbugs exclude than an unnecessary synchronization.
* In the warning, replace "use" with "using".
* Extra space after DEFAULT_RM_SCHEDULER_FS_UPDATE_INTERVAL_MS.

Eventually, I think we should try to be smarter about the work that goes on in update(). In most cases, the fair shares will stay the same, or will only change for apps in a particular queue, so we can avoid recomputation.

Livelock can occur on FairScheduler when there are lots of entries in the queue
---
Key: YARN-2313
URL: https://issues.apache.org/jira/browse/YARN-2313
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.4.1
Reporter: Tsuyoshi OZAWA
Assignee: Tsuyoshi OZAWA
Attachments: YARN-2313.1.patch, YARN-2313.2.patch, YARN-2313.3.patch, rm-stack-trace.txt

Observed a livelock on FairScheduler when there are lots of entries in the queue. After investigating the code, the following case can occur:
1. {{update()}} called by UpdateThread takes longer than UPDATE_INTERVAL (500 ms) if there are lots of queues.
2. UpdateThread goes into a busy loop.
3. Other threads (AllocationFileReloader, ResourceManager$SchedulerEventDispatcher) can wait forever.
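A minimal sketch of the fix shape under discussion, assuming a configurable interval read from FairSchedulerConfiguration; the names and plumbing here are illustrative, not the committed patch:
{code}
// Illustrative update thread that cannot busy-loop: it always sleeps for the
// configured interval, even when update() itself runs long.
private class UpdateThread extends Thread {
  private final long updateIntervalMs; // would come from FairSchedulerConfiguration

  UpdateThread(long updateIntervalMs) {
    this.updateIntervalMs = updateIntervalMs;
    setName("FairSchedulerUpdateThread");
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(updateIntervalMs); // fixed pause between passes
        update();                       // recompute fair shares
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // exit on shutdown
      }
    }
  }
}
{code}
If update() routinely exceeds the interval, a warning (the one Sandy's wording nit refers to) can tell operators to raise the configured value.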
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068017#comment-14068017 ]

Sandy Ryza commented on YARN-796:
---
I'm worried that the proposal is becoming too complex. Can we try to whittle the proposal down to a minimum viable feature? I'm not necessarily opposed to the more advanced parts of it, like queue label policies and updating labels on the fly, and the design should aim to make them possible in the future, but I don't think they need to be part of the initial implementation. To me it seems like the essential requirements here are:
* A way for nodes to be tagged with labels
* A way to make scheduling requests based on these labels

I'm also skeptical about the need for adding/removing labels dynamically. Do we have concrete use cases for this?

Lastly, as BC and Sunil have pointed out, specifying the labels in the NodeManager confs greatly simplifies configuration when nodes are being added. Are there advantages to a centralized configuration?

Allow for (admin) labels on nodes and resource-requests
---
Key: YARN-796
URL: https://issues.apache.org/jira/browse/YARN-796
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Arun C Murthy
Assignee: Wangda Tan
Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch

It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels.
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068024#comment-14068024 ]

Craig Welch commented on YARN-2008:
---
Hmm, I don't think we can change headroom to be just the guaranteed or base capacity, because I believe that will defeat the support for having a max capacity above the base capacity. As I understand it, that is in place so that busy queues can grow to use more of the cluster when other queues are underutilized - to achieve more efficient and full use of the cluster overall - and if the application gets the low baseline headroom it will not be able to effectively use that greater capacity. Assuming we keep support for the max capacity, then even with preemption I think we will need this logic, because preemption won't guarantee that all the queues have their max capacity available to them, as the total max capacity can be over 100%. Preemption will help, certainly, but I don't think it can replace this logic - I think we need both.

CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
---
Key: YARN-2008
URL: https://issues.apache.org/jira/browse/YARN-2008
Project: Hadoop YARN
Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Chen He
Assignee: Chen He
Attachments: YARN-2008.1.patch, YARN-2008.2.patch

Suppose there are two queues, both allowed to use 100% of the actual resources in the cluster, and Q1 and Q2 each currently use 50% of the cluster's resources, so there is no actual space available. With the current method of computing headroom, the CapacityScheduler thinks there are still resources available for users in Q1, but they have already been used by Q2. If the CapacityScheduler has a hierarchical queue structure, it may report an incorrect queueMaxCap. Here is an example:
{code}
                        rootQueue
                       /         \
         L1ParentQueue1           L1ParentQueue2
 (allowed to use up to 80%        (allowed to use 20% at minimum
  of its parent)                   of its parent)
      /            \
L2LeafQueue1      L2LeafQueue2
(50% of its       (50% of its parent
 parent)           at minimum)
{code}
When we calculate the headroom of a user in L2LeafQueue2, the current method will think L2LeafQueue2 can use 40% (80% * 50%) of the actual rootQueue resources. However, without checking L1ParentQueue1, we cannot be sure. It is possible that L1ParentQueue2 has used 40% of rootQueue resources right now; in that case, L2LeafQueue2 can actually only use 30% (60% * 50%).
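The arithmetic from the example above, spelled out as a sketch (illustrative only, not the patch): the leaf's usable fraction must be capped by what is actually still available to each ancestor, not just by the product of configured capacities.
{code}
// Illustrative headroom arithmetic for the example above (not the patch).
static void headroomExample() {
  // Configured: L1ParentQueue1 may use up to 80% of root; L2LeafQueue2 up to
  // 50% of L1ParentQueue1. Naive max cap: 0.8 * 0.5 = 0.40 of the cluster.
  double naiveMaxCap = 0.8 * 0.5;                                // 0.40

  // But L1ParentQueue2 is already using 40% of the cluster, so only 60%
  // remains for L1ParentQueue1. The leaf's real ceiling is capped by the
  // parent's remaining share: min(0.8, 0.6) * 0.5 = 0.30 of the cluster.
  double parentRemaining = 1.0 - 0.4;                            // 0.60
  double correctedMaxCap = Math.min(0.8, parentRemaining) * 0.5; // 0.30
  System.out.println(naiveMaxCap + " vs " + correctedMaxCap);
}
{code}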
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068026#comment-14068026 ]

Chen He commented on YARN-2008:
---
Thank you for the patch, [~cwelch].
{quote}
and if the application gets the low baseline headroom it will not be able to effectively use that greater capacity.
{quote}
If we overestimate the headroom, it will cause some AMs to hang or, in the worst case, deadlock. This JIRA is to avoid that.
[jira] [Created] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
zhihai xu created YARN-2325:
---
Summary: Need to check whether node is null in nodeUpdate for FairScheduler
Key: YARN-2325
URL: https://issues.apache.org/jira/browse/YARN-2325
Project: Hadoop YARN
Issue Type: Bug
Reporter: zhihai xu

We need to check whether the node is null in nodeUpdate for FairScheduler. If nodeUpdate is called after removeNode, getFSSchedulerNode will return null. If the node is null, we should return with an error message.
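A sketch of the proposed guard (illustrative; the attached patch may differ):
{code}
// In FairScheduler.nodeUpdate(), the node may already have been removed by a
// concurrent removeNode(), so bail out with an error instead of NPE-ing.
private synchronized void nodeUpdate(RMNode nm) {
  FSSchedulerNode node = getFSSchedulerNode(nm.getNodeID());
  if (node == null) {
    LOG.error("Node not found when handling nodeUpdate: " + nm.getNodeID());
    return;
  }
  // ... continue with normal node-update processing ...
}
{code}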
[jira] [Updated] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated YARN-2325:
---
Attachment: YARN-2325.000.patch
[jira] [Commented] (YARN-2325) Need to check whether node is null in nodeUpdate for FairScheduler
[ https://issues.apache.org/jira/browse/YARN-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068035#comment-14068035 ]

Hadoop QA commented on YARN-2325:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656795/YARN-2325.000.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4375//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4375//console

This message is automatically generated.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068041#comment-14068041 ]

Jian Fang commented on YARN-796:
---
As Sandy pointed out, it seems the scope is becoming bigger and bigger. Take our use case as an example: we initially only needed to prevent application masters from being assigned to some nodes, such as spot instances in EC2. In our design, we only added the following parameters to yarn-site.xml and then modified the Hadoop code:
* yarn.label.enabled
* yarn.nodemanager.labels
* yarn.app.mapreduce.am.labels

This function works now. With the current proposal, I wonder how long it may take to finish.

I also doubt the assumption that an admin will configure labels for a cluster. Usually a cluster comes with hundreds or thousands of nodes; how could the admin possibly configure the labels manually? This type of work can easily be automated by a script or a Java process running on each node that writes the labels, such as OS, processor, and other parameters, to yarn-site.xml before the cluster is started. This is especially true for clusters in a cloud, because everything is automated there. The admin UI would only be used in some special cases that require human intervention.

One use case for dynamic labeling is that we can put a label on a node when we try to shrink a cluster, so that Hadoop will not assign tasks to that node any more, giving that node some grace time to be decommissioned. This is most likely to be implemented by a RESTful API call from a process that chooses a node to remove based on cluster metrics.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068043#comment-14068043 ]

Allen Wittenauer commented on YARN-796:
---
I agree pretty much completely with everything Sandy said, especially on the centralized configuration. It actually makes configuration harder for heterogeneous node setups. One caveat:
{quote}
I'm also skeptical about the need for adding/removing labels dynamically. Do we have concrete use cases for this?
{quote}
If you have the nodemanager push the labels to the RM (especially if you can do this via a user-defined script or Java class...), you basically have to have dynamic labels for nodes. Use cases are pretty easy to hit if you label nodes based upon the software stack installed. A quick example for those not following:
# User writes software that depends upon a particular version of libfoo.so.2.
# Configuration management does an install of libfoo.so.2.
# The NodeManager label script picks up that it has both libfoo.so.1 and libfoo.so.2, and publishes that it now has libfoo1 and libfoo2. (Remember, this is C and not the screwed-up Java universe, so having two versions is completely legitimate.)
# The system can now do operations appropriate for either libfoo on that node.
# libfoo1 gets deprecated and removed from the system, again via configuration management.
# The label script picks up the change and removes libfoo1 from the label listing.
# The system acts appropriately and no longer does operations on the node based upon the libfoo1 label.

... and all without restarting or reconfiguring anything on the Hadoop side. If there is any sort of manual step required in configuring the nodes, short of the initial label script/class and other obviously user-provided bits, then we've failed.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068048#comment-14068048 ]

Alejandro Abdelnur commented on YARN-796:
---
I agree with Sandy and Allen. That said, we currently don't do anything centralized on a per-NodeManager basis; if we want to do that, we should think about solving it in a more general way than just labels. And I would suggest doing that (if we decide to) in a different JIRA.
[jira] [Updated] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuan Gong updated YARN-2211:
---
Attachment: YARN-2211.5.1.patch

Same patch, with the release audit -1 fixed.

RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
---
Key: YARN-2211
URL: https://issues.apache.org/jira/browse/YARN-2211
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Xuan Gong
Assignee: Xuan Gong
Attachments: YARN-2211.1.patch, YARN-2211.2.patch, YARN-2211.3.patch, YARN-2211.4.patch, YARN-2211.5.1.patch, YARN-2211.5.patch

After YARN-2208, the AMRMToken can be rolled over periodically. We need to save the related master keys and use them to recover the AMRMToken when RM restart/failover happens.
[jira] [Commented] (YARN-2211) RMStateStore needs to save AMRMToken master key for recovery when RM restart/failover happens
[ https://issues.apache.org/jira/browse/YARN-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068069#comment-14068069 ]

Hadoop QA commented on YARN-2211:
---
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12656801/YARN-2211.5.1.patch
against trunk revision.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 9 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. There were no new javadoc warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common, hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager, and hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesContainers
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServices
  org.apache.hadoop.yarn.server.nodemanager.webapp.TestNMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps
  org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4376//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4376//console

This message is automatically generated.
[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068074#comment-14068074 ]

Jian He commented on YARN-1372:
---
{code}
NM informs RM and holds on to the information (YARN-1336 should handle this as well)
RM informs AM
AM acks RM
RM acks NM
NM deletes the information
{code}
The approach looks reasonable to me. [~adhoot], wanna take a stab at this?

Ensure all completed containers are reported to the AMs across RM restart
---
Key: YARN-1372
URL: https://issues.apache.org/jira/browse/YARN-1372
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Bikas Saha
Assignee: Anubhav Dhoot

Currently the NM informs the RM about completed containers and then removes those containers from the RM notification list. The RM passes that completed-container information on to the AM, and the AM pulls this data. If the RM dies before the AM pulls this data, the AM may not be able to get this information again. To fix this, the NM should maintain a separate list of such completed-container notifications sent to the RM. After the AM has pulled the containers from the RM, the RM will inform the NM about it, and the NM can remove the completed container from the new list. Upon re-registering with the RM (after RM restart), the NM should send the entire list of completed containers to the RM, along with any other containers that completed while the RM was dead. This ensures that the RM can inform the AMs about all completed containers. Some container completions may be reported more than once, since the AM may have pulled the container but the RM may die before notifying the NM about the pull.
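A sketch of the NM-side bookkeeping the quoted handshake implies; all names here are hypothetical, not from a committed patch:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.ContainerStatus;

// Hypothetical NM-side tracking: completed containers stay in 'pendingAck'
// until the RM confirms the AM pulled them, so they survive RM restarts.
class CompletedContainerTracker {
  private final Map<ContainerId, ContainerStatus> pendingAck =
      new ConcurrentHashMap<ContainerId, ContainerStatus>();

  void onContainerCompleted(ContainerStatus status) {
    pendingAck.put(status.getContainerId(), status); // hold until RM acks
  }

  List<ContainerStatus> statusesForHeartbeat() {
    // Re-send everything unacked; on RM restart the full list goes out again.
    return new ArrayList<ContainerStatus>(pendingAck.values());
  }

  void onRMAck(Collection<ContainerId> ackedByRM) {
    // RM confirmed the AM pulled these; now it is safe to forget them.
    pendingAck.keySet().removeAll(ackedByRM);
  }
}
{code}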
[jira] [Updated] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jian He updated YARN-2249:
---
Description: AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost. (was: AM resync on RM restart will send outstanding resource requests, container release list etc. back to the new RM. It is possible that RM receives the container release request before the container is actually recovered.)

RM may receive container release request on AM resync before container is actually recovered
---
Key: YARN-2249
URL: https://issues.apache.org/jira/browse/YARN-2249
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager
Reporter: Jian He
Assignee: Jian He

AM resync on RM restart will send outstanding container release requests back to the new RM. In the meantime, NMs report the container statuses back to the RM to recover the containers. If the RM receives a container release request before the container is actually recovered in the scheduler, the container won't be released and the release request will be lost.
[jira] [Commented] (YARN-2249) RM may receive container release request on AM resync before container is actually recovered
[ https://issues.apache.org/jira/browse/YARN-2249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068076#comment-14068076 ]

Jian He commented on YARN-2249:
---
One possible solution is to have the AM always send the whole set of pending release requests in every allocate call. A pending release is removed once the AM receives the completed status of the released container. Specifically, change AMRMClient to send pendingRelease instead of release in the allocate method.
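A rough sketch of that idea against the AMRMClient allocate flow; the name pendingRelease comes from the comment above, but the wiring (and the surrounding fields lastResponseId, askList, and rmClient) are assumptions, not the actual patch:
{code}
// Illustrative only: resend all pending releases on every allocate, and drop
// one only when the released container's completed status comes back.
private final Set<ContainerId> pendingRelease =
    Collections.newSetFromMap(new ConcurrentHashMap<ContainerId, Boolean>());

public void releaseAssignedContainer(ContainerId containerId) {
  pendingRelease.add(containerId); // remember until completion is confirmed
}

public AllocateResponse allocate(float progress)
    throws YarnException, IOException {
  AllocateRequest request = AllocateRequest.newInstance(
      lastResponseId, progress, askList,
      new ArrayList<ContainerId>(pendingRelease), // whole set, every heartbeat
      null);
  AllocateResponse response = rmClient.allocate(request);
  for (ContainerStatus status : response.getCompletedContainersStatuses()) {
    pendingRelease.remove(status.getContainerId()); // release confirmed
  }
  return response;
}
{code}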
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068116#comment-14068116 ]

Wangda Tan commented on YARN-796:
---
Really, thanks for all your comments above. On the concerns about centralized configuration that Sandy, Alejandro and Allen mentioned: my thinking is that node labels are more dynamic compared to the other existing options of the NM. An important use case we can see is that some customers want to mark a label on each node indicating which department/team the node belongs to; when a new team comes in and new machines are added, labels may need to be changed. It is also possible that the whole cluster is booked to run some huge batch job at 12am-2am, for example. So such labels will be changed frequently. If we only have distributed configuration on each node, it is a nightmare for admins to re-configure. I think we should have the same internal interface for distributed/centralized configuration, like what we've done for RMStateStore.

And as Jian Fang mentioned:
bq. doubt about the assumption for admin to configure labels for a cluster.
I think using a script to mark labels is a great way to save configuration work. But lots of other use cases need human intervention as well - good examples above from Allen and me.

Thanks,
Wangda
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068120#comment-14068120 ]

Alejandro Abdelnur commented on YARN-796:
---
Wangda, your use case is throwing overboard the work of the scheduler regarding matching nodes with data locality. You can solve it in a much better way using the scheduler queues configuration, which can be dynamically adjusted.
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068124#comment-14068124 ]

Wangda Tan commented on YARN-796:
---
Hi Alejandro, I totally understand that the use case I mentioned is antithetical to the design philosophy of YARN, which should be elastically sharing the resources of a multi-tenant environment. But hard partitioning has some important use cases, even if it is not strongly recommended - for example, in a performance-sensitive environment, a user may want to run HBase master/region-servers on a group of nodes and not want any other tasks running on those nodes even if they have free resources. Our current queue configuration cannot solve such a problem. Of course the user can create a separate YARN cluster in this case, but I think keeping such NMs under the same RM is easier to use and manage. Do you agree?

Thanks,
[jira] [Commented] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068143#comment-14068143 ]

Sandy Ryza commented on YARN-2323:
---
As it's a static final variable, ONE should be all caps. Otherwise, LGTM.

FairShareComparator creates too many Resource objects
---
Key: YARN-2323
URL: https://issues.apache.org/jira/browse/YARN-2323
Project: Hadoop YARN
Issue Type: Improvement
Components: fairscheduler
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Minor
Attachments: YARN-2323.patch

Each call of {{FairShareComparator}} creates a new Resource object, {{one}}:
{code}
Resource one = Resources.createResource(1);
{code}
At the volume of 1000 nodes and 1000 apps, the comparator will be called more than 10 million times per second, thus creating more than 10 million {{one}} objects, which is unnecessary. Since the object {{one}} is read-only and is never referenced outside of the comparator, we could make it static.
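A minimal sketch of the change being reviewed, with the comparison logic elided (the real comparator weighs shares and usage ratios):
{code}
// Sketch: hoist the constant Resource into a static final field so each
// compare() call allocates nothing.
private static class FairShareComparator
    implements Comparator<Schedulable>, Serializable {
  // Read-only and never escapes the comparator, so one shared instance is safe.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Schedulable s1, Schedulable s2) {
    // ... use ONE wherever Resources.createResource(1) was allocated before ...
    return 0; // comparison logic elided
  }
}
{code}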
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068144#comment-14068144 ]

Alejandro Abdelnur commented on YARN-796:
---
Wangda, I'm afraid I'm lost with your last comment. I thought labels were to express desired node affinity based on a label, not to fence off nodes. I don't understand how you will achieve fencing off a node with a label unless you have a more complex annotation mechanism than just a label (i.e., "book this node only if label X is present"). Also, you would have to add ACLs to labels to avoid anybody simply asking for a label. Am I missing something?
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068145#comment-14068145 ]

Wangda Tan commented on YARN-796:
---
Alejandro, I think we've mentioned this in our design doc; you can check https://issues.apache.org/jira/secure/attachment/12654446/Node-labels-Requirements-Design-doc-V1.pdf, under top-level requirements - admin tools - "Security and access controls for managing Labels". Please let me know if you have any comments on it. Thanks :)
[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068147#comment-14068147 ]

Wangda Tan commented on YARN-1198:
---
I've just taken a look at all the sub-tasks of this JIRA, and I'm wondering if we should first define what the headroom is. Previously in YARN, including YARN-1198, the headroom was defined as the maximum resource an application can get. In YARN-2008, the headroom is defined as the available resource an application can get, because we already consider the used resource of sibling queues.

I wonder if we need to add a new field, like a guaranteed headroom of an application, that considers its absolute capacity (not maximum capacity), user limits, etc. We may want to keep both of them, because:
- The maximum resource is not always achievable, since the sum of the maximum resources of the leaf queues may exceed the cluster resource.
- With preemption, resource beyond the guaranteed resource will likely be preempted; it should be considered a temporary resource.

And with this, an AM can:
- Use the guaranteed headroom to allocate resources that will not be preempted.
- Use the maximum headroom to try to allocate resources beyond its guaranteed headroom.

In my humble opinion, the "available resource an application can get" doesn't make a lot of sense here, and may cause some backward-compatibility problems as well. In a dynamic cluster the number can change rapidly; it is possible that the cluster is filled by another application one second after the AM got the available headroom. Also, this field cannot solve the deadlock problem either: a malicious application can ask for much more resource than this, and a careless developer can totally ignore this field. The only valid solution in my head is putting such logic on the scheduler side and enforcing resource usage via the preemption policy. Any thoughts? [~jlowe], [~cwelch]

Thanks,
Wangda

Capacity Scheduler headroom calculation does not work as expected
---
Key: YARN-1198
URL: https://issues.apache.org/jira/browse/YARN-1198
Project: Hadoop YARN
Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi
Attachments: YARN-1198.1.patch

Today, headroom calculation (for the app) takes place only when:
* A node is added to or removed from the cluster
* A new container is assigned to the application

However, there are potentially a lot of situations which are not considered in this calculation:
* If a container finishes, then the headroom for that application will change and should be notified to the AM accordingly.
* If a single user has submitted multiple applications (app1 and app2) to the same queue, then:
** If app1's container finishes, then not only app1's but also app2's AM should be notified about the change in headroom.
** Similarly, if a container is assigned to either application (app1/app2), then both AMs should be notified about their headroom.
** To simplify the whole communication process, it is ideal to keep the headroom per user per LeafQueue so that everyone gets the same picture (apps belonging to the same user and submitted to the same queue).
* If a new user submits an application to the queue, then all applications submitted by all users in that queue should be notified of the headroom change.
* Also, today the headroom is an absolute number (I think it should be normalized, but then this is not going to be backward compatible).
* Also, when an admin user refreshes the queue, the headroom has to be updated.

These are all potential bugs in the headroom calculations.
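A rough formulation of the two headroom notions distinguished in the comment above, using Hadoop's Resources helpers; every name here is an assumption for illustration, not committed code:
{code}
// Illustrative arithmetic only. 'guaranteed' derives from absolute capacity
// and user limits; 'maximum' from absolute max capacity. Callers would pass
// in the queue's configured amounts; none of these names are from a patch.
static Resource guaranteedHeadroom(ResourceCalculator calc, Resource cluster,
    Resource queueGuaranteed, Resource userLimit, Resource userConsumed) {
  // Resource the app can hold without risking preemption.
  return Resources.subtract(
      Resources.min(calc, cluster, queueGuaranteed, userLimit), userConsumed);
}

static Resource maximumHeadroom(ResourceCalculator calc, Resource cluster,
    Resource queueMaxCap, Resource userLimit, Resource userConsumed) {
  // Best-effort ceiling; may be unachievable if sibling queues are busy,
  // and anything beyond the guarantee may later be preempted.
  return Resources.subtract(
      Resources.min(calc, cluster, queueMaxCap, userLimit), userConsumed);
}
{code}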
[jira] [Commented] (YARN-2008) CapacityScheduler may report incorrect queueMaxCap if there is a hierarchical queue structure
[ https://issues.apache.org/jira/browse/YARN-2008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068148#comment-14068148 ]

Wangda Tan commented on YARN-2008:
---
Hi [~cwelch], [~airbots], I've put my comment on YARN-1198 (https://issues.apache.org/jira/browse/YARN-1198?focusedCommentId=14068147&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14068147), because I think it is a general comment about headroom. Please share your ideas here.

Thanks,
Wangda
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068163#comment-14068163 ] Wangda Tan commented on YARN-796: - Hi [~sunilg],
bq. 2. Regarding reservations, how about introducing node-label reservations? The idea is: if an application is lacking resource on a node, it can reserve on that node as well as on the node-label. Then, when a suitable node update comes from another node in the same node-label, we can try allocating the container on the new node by unreserving from the old one.
I think this makes sense; we'd better support it. I will check how our current resource reservation/unreservation logic could support it, and will keep you posted.
bq. 3. My approach was more like having a centralized configuration, but later, if we want to add a new node to the cluster, it can start with a hardcoded label in its yarn-site. In your approach, we need to use the RESTful API or an admin command to bring this node under one label. Maybe the node could be set under a label at startup itself. Your thoughts?
A problem I can see with a mixed centralized/distributed configuration is that it will be hard to manage labels after an RM/NM restart: should we use the labels specified in the NM config or in our centralized config? I also replied to Jian Fang about this previously: https://issues.apache.org/jira/browse/YARN-796?focusedCommentId=14063316page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14063316. Maybe a workaround is to define that the centralized config always overwrites the distributed config. E.g., a user defines GPU in the NM config and an admin adds FPGA via the RESTful API; the RM will serialize both GPU and FPGA into a centralized storage system, and after an RM or NM restart, the RM will ignore the NM config whenever anything is defined on the RM side. But I still think it's better to avoid using both of them together. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
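The overwrite rule described above is simple to state in code. This is a minimal hypothetical sketch; the types, method, and store shape are illustrative and not part of YARN:
{code}
import java.util.Map;
import java.util.Set;

final class NodeLabelMerge {
  /**
   * Labels the RM has serialized centrally for a node win over whatever the
   * NodeManager reports from its own yarn-site after a restart.
   */
  static Set<String> effectiveLabels(String nodeId,
                                     Map<String, Set<String>> centralStore,
                                     Set<String> nmReported) {
    Set<String> central = centralStore.get(nodeId);
    // If anything is defined centrally for this node, ignore the NM config.
    return (central != null && !central.isEmpty()) ? central : nmReported;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> rmStore = Map.of("node1", Set.of("GPU", "FPGA"));
    // The NM reports only GPU from its local config; the central entry wins.
    System.out.println(effectiveLabels("node1", rmStore, Set.of("GPU")));
  }
}
{code}
In the GPU/FPGA example above, the central store holds both labels for the node, so the NM-reported set is ignored after either daemon restarts.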
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068169#comment-14068169 ] Allen Wittenauer commented on YARN-796: --- bq. An important use case we can see is that some customers want to mark a label on each node indicating which department/team the node belongs to; when a new team comes in and new machines are added, labels may need to be changed. You can solve this problem today by just running separate RMs. In practice, however, marking nodes for specific teams in queue systems doesn't work, because doing so assumes that the capacity never changes... i.e., that nodes never fail. That happens all the time, of course, which is why percentages make a lot more sense. If you absolutely want a fixed amount of capacity, you still wouldn't mark specific nodes: you'd say queue x gets y machines, with no specification of which nodes. bq. And also, it is possible that the whole cluster is booked to run some huge batch job at 12am-2am, for example. So such labels will be changed frequently. Well, no, they won't. They'll change exactly twice a day. But it doesn't matter: you can solve this problem today too, by setting up something that changes the queue ACLs at 12am and 2am via a cron job. bq. For example, a user may want to run HBase master/region servers on a group of nodes, and not want any other tasks running on those nodes even if they have free resources. Our current queue configuration cannot solve such a problem. ... except, you guessed it: this is a solved problem today too. You just need to make sure the requested container sizes consume the whole node. bq. If we only have distributed configuration on each node, it is a nightmare for admins to re-configure. Hi. My name is Allen and I'm an admin. Even if using labels for this type of scheduling were sane, it still wouldn't be a nightmare, because any competent admin would use configuration management to roll out changes to the nodes in a controlled manner. But more importantly: these use cases are *solved problems* and have been in YARN for a very long time. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2323) FairShareComparator creates too much Resource object
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Zhiguo updated YARN-2323: -- Attachment: YARN-2323-2.patch Patch revised according to [~sandyr]'s comments. FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
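The fix is essentially a one-line hoist. Here is a simplified sketch of the shape of the change; the surrounding class and the comparison logic are elided, so treat the details as illustrative rather than the exact patch:
{code}
import java.util.Comparator;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.Resources;

class FairShareComparatorSketch implements Comparator<Object> {
  // Before: compare() allocated a fresh Resource on every call.
  // After: one shared, read-only instance for the whole JVM.
  private static final Resource ONE = Resources.createResource(1);

  @Override
  public int compare(Object s1, Object s2) {
    // ... the fair-share logic would use ONE here instead of calling
    // Resources.createResource(1) on every comparison ...
    return 0; // comparison logic elided
  }
}
{code}
At 10 million comparisons per second, this removes 10 million short-lived allocations per second and the corresponding GC pressure.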
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068184#comment-14068184 ] Wangda Tan commented on YARN-796: - bq. You can solve this problem today by just running separate RMs. I don't think that's good from a configuration standpoint: users would need to maintain several configuration folders on their nodes for job submission. bq. In practice, however, marking nodes for specific teams in queue systems doesn't work because doing so assumes that the capacity never changes... i.e. In a heterogeneous cluster, you may not be able to replace a failed node with a random node. E.g., only some nodes have GPUs, and those nodes are dedicated to the data-science team. A percentage of queue capacity doesn't make a lot of sense here. bq. ... except, you guessed it: this is a solved problem today too. You just need to make sure the container sizes that are requested consume the whole node. Assume an HBase master wants to run on a node that has 64G of memory and InfiniBand. You can ask for a 64G container, but it may well be allocated on a 128G node that doesn't have InfiniBand. Again, it's another heterogeneity issue. And asking for such a big container may take a great amount of time, waiting for resource reservation, etc. bq. it still wouldn't be a nightmare because any competent admin would use configuration management to roll out changes to the nodes in a controlled manner. Very likely not every admin has scripts like yours, especially new YARN users; we'd better make this feature usable out of the box. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2323) FairShareComparator creates too much Resource object
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068207#comment-14068207 ] Hadoop QA commented on YARN-2323: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656814/YARN-2323-2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServices org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesCapacitySched org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesDelegationTokens org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesAppsModification org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesNodes org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesApps org.apache.hadoop.yarn.server.resourcemanager.webapp.TestRMWebServicesFairScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-YARN-Build/4377//testReport/ Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4377//console This message is automatically generated. FairShareComparator creates too much Resource object Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated YARN-2323: - Summary: FairShareComparator creates too many Resource objects (was: FairShareComparator creates too much Resource object) FairShareComparator creates too many Resource objects - Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2323) FairShareComparator creates too many Resource objects
[ https://issues.apache.org/jira/browse/YARN-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068210#comment-14068210 ] Hudson commented on YARN-2323: -- FAILURE: Integrated in Hadoop-trunk-Commit #5921 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5921/]) YARN-2323. FairShareComparator creates too many Resource objects (Hong Zhiguo via Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1612187) * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java FairShareComparator creates too many Resource objects - Key: YARN-2323 URL: https://issues.apache.org/jira/browse/YARN-2323 Project: Hadoop YARN Issue Type: Improvement Components: fairscheduler Reporter: Hong Zhiguo Assignee: Hong Zhiguo Priority: Minor Fix For: 2.6.0 Attachments: YARN-2323-2.patch, YARN-2323.patch Each call of {{FairShareComparator}} creates a new Resource object, {{one}}: {code} Resource one = Resources.createResource(1); {code} At the scale of 1,000 nodes and 1,000 apps, the comparator will be called more than 10 million times per second, creating more than 10 million {{one}} objects, which is unnecessary. Since {{one}} is read-only and never referenced outside the comparator, we could make it static. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033_ALL.2.patch Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated YARN-2033: -- Attachment: YARN-2033.2.patch The TestFSDownload failure is not related; uploading a patch to fix the remaining issues. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Chiang updated YARN-2284: - Attachment: YARN2284-02.patch Second attempt:
- Make minimal changes to Configuration so it can read in XML files and retain keys with no values, for comparison.
- Pull common functions out into a utility class.
- Create two unit tests: one for YarnConfiguration/yarn-default.xml and another for MRJobConfig/mapred-default.xml.
The unit tests generate output in surefire-reports that compares the Configuration class against the XML file and reports which keys exist in one but not the other. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2284-01.patch, YARN2284-02.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
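The core of the comparison can be sketched in a few lines. This is a hypothetical standalone version, not the patch's actual utility code: the class name, the "yarn." prefix heuristic, and the reflection filter are assumptions, and it does not handle the keys-with-no-values case that the patch changes Configuration to retain:
{code}
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MissingConfigKeys {
  public static void main(String[] args) throws Exception {
    // Keys defined in yarn-default.xml. Configuration is iterable over its
    // key/value entries; keys without values would need the patch's tweak.
    Configuration conf = new Configuration(false);
    conf.addResource("yarn-default.xml");
    Set<String> xmlKeys = new HashSet<>();
    for (Map.Entry<String, String> e : conf) {
      xmlKeys.add(e.getKey());
    }

    // Key names held by public static String constants in YarnConfiguration.
    Set<String> classKeys = new HashSet<>();
    for (Field f : YarnConfiguration.class.getFields()) {
      if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
        String v = (String) f.get(null);
        if (v != null && v.startsWith("yarn.")) {
          classKeys.add(v);
        }
      }
    }

    // Report the set differences in both directions.
    Set<String> onlyInXml = new HashSet<>(xmlKeys);
    onlyInXml.removeAll(classKeys);
    Set<String> onlyInClass = new HashSet<>(classKeys);
    onlyInClass.removeAll(xmlKeys);
    System.out.println("In yarn-default.xml only: " + onlyInXml);
    System.out.println("In YarnConfiguration only: " + onlyInClass);
  }
}
{code}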
[jira] [Commented] (YARN-2033) Investigate merging generic-history into the Timeline Store
[ https://issues.apache.org/jira/browse/YARN-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068217#comment-14068217 ] Hadoop QA commented on YARN-2033: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656819/YARN-2033_ALL.2.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4378//console This message is automatically generated. Investigate merging generic-history into the Timeline Store --- Key: YARN-2033 URL: https://issues.apache.org/jira/browse/YARN-2033 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Zhijie Shen Attachments: ProposalofStoringYARNMetricsintotheTimelineStore.pdf, YARN-2033.1.patch, YARN-2033.2.patch, YARN-2033.Prototype.patch, YARN-2033_ALL.1.patch, YARN-2033_ALL.2.patch Having two different stores isn't amenable to generic insights on what's happening with applications. This is to investigate porting generic-history into the Timeline Store. One goal is to try to keep most of the client-side interfaces as close to what we have today as possible. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-2284) Find missing config options in YarnConfiguration and yarn-default.xml
[ https://issues.apache.org/jira/browse/YARN-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068218#comment-14068218 ] Hadoop QA commented on YARN-2284: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656820/YARN2284-02.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4379//console This message is automatically generated. Find missing config options in YarnConfiguration and yarn-default.xml - Key: YARN-2284 URL: https://issues.apache.org/jira/browse/YARN-2284 Project: Hadoop YARN Issue Type: Improvement Affects Versions: 2.4.1 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: supportability Attachments: YARN2284-01.patch, YARN2284-02.patch YarnConfiguration has one set of properties. yarn-default.xml has another set of properties. Ideally, there should be an automatic way to find missing properties in either location. This is analogous to MAPREDUCE-5130, but for yarn-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14068230#comment-14068230 ] Allen Wittenauer commented on YARN-796: --- Then let me be more blunt about it: I'm -1 on this patch if I can't do dynamic labels from the node manager via a script. Allow for (admin) labels on nodes and resource-requests --- Key: YARN-796 URL: https://issues.apache.org/jira/browse/YARN-796 Project: Hadoop YARN Issue Type: Sub-task Reporter: Arun C Murthy Assignee: Wangda Tan Attachments: LabelBasedScheduling.pdf, Node-labels-Requirements-Design-doc-V1.pdf, YARN-796.patch It will be useful for admins to specify labels for nodes. Examples of labels are OS, processor architecture, etc. We should expose these labels and allow applications to specify labels on resource-requests. Obviously we need to support admin operations on adding/removing node labels. -- This message was sent by Atlassian JIRA (v6.2#6252)