[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165951#comment-14165951 ]

Hadoop QA commented on YARN-2496:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12674034/YARN-2496-20141009-1.patch
against trunk revision 596702a.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5348//console

This message is automatically generated.

> [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
> -------------------------------------------------------------------------------------
>
> Key: YARN-2496
> URL: https://issues.apache.org/jira/browse/YARN-2496
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: resourcemanager
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-2496-20141009-1.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch, YARN-2496.patch
>
> This JIRA Includes:
> - Add/parse labels option to {{capacity-scheduler.xml}} similar to other options of queue like capacity/maximum-capacity, etc.
> - Include a "default-label-expression" option in queue config, if an app doesn't specify label-expression, "default-label-expression" of queue will be used.
> - Check if labels can be accessed by the queue when submit an app with labels-expression to queue or update ResourceRequest with label-expression
> - Check labels on NM when trying to allocate ResourceRequest on the NM with label-expression
> - Respect labels when calculate headroom/user-limit

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161550#comment-14161550 ]

Hadoop QA commented on YARN-2496:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12673291/YARN-2496.patch
against trunk revision 0fb2735.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
org.apache.hadoop.yarn.server.resourcemanager.security.TestAMRMTokens
org.apache.hadoop.yarn.server.resourcemanager.TestResourceManager
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-YARN-Build/5300//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5300//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161402#comment-14161402 ]

Hadoop QA commented on YARN-2496:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12673283/YARN-2496.patch
against trunk revision 519e5a7.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5297//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161155#comment-14161155 ]

Hadoop QA commented on YARN-2496:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12673224/YARN-2496.patch
against trunk revision 8dc6abf.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5287//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160684#comment-14160684 ]

Hadoop QA commented on YARN-2496:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12673145/YARN-2496.patch
against trunk revision ea26cc0.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 12 new or modified test files.

{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/5277//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143993#comment-14143993 ]

Craig Welch commented on YARN-2496:
-----------------------------------

So, re the headroom issue (2) - the short version - I don't think we can put off addressing this, because I think it is going to be a typical case and will be problematic. I think the most realistic solution is to support only a short list of pre-configured label expressions per queue. Another option is to limit nodes to supporting only 1 label per node (which, realistically, might be sufficient). A third option is to limit the number of labels which a queue can access to a very small value (1-2) plus the "all" value. Basically, one of the factors driving the large set of possible values which must be considered to properly calculate headroom needs to be made finite or drastically reduced.

Longer version...

I don't think we should move forward without addressing it. I say this because I think it is likely to be a typical situation to have a queue which has more than one label associated with it - most likely, the simple case of a queue which can address all nodes, some of which have a label and some of which do not. Jobs entering these queues using a restrictive label expression will hit this headroom issue. That is especially true where the labeled resources are scarce, which is what one would expect from a "small set of special machines" (i.e. the typical node label case). It's important to make sure headroom is correctly handled as we add node labels, and as things stand, we know it is not.

I'm afraid it is something of a design issue: allowing arbitrary node label expressions with multiple labels on queues, etc., is leading to something of a combinatorial explosion. It may be that the right solution is to narrow the feature set a bit for this iteration. We could choose to support only a restricted set of expressions on a given queue. This could even mean supporting only the default label expression, but I'm concerned that this may be too restrictive, and so we would need to support a set of expressions. This could then be a finite list which is pre-calculated. I think, in practical terms, this will probably meet people's needs.

A second option is to restrict the number of labels supported on a queue; a small enough set could be pre-calculated for all possibilities. I am suspicious of this latter option, though. It would have to be a very small number of labels to be manageable, and I think it reduces, realistically, to the restricted set of expressions.

I also don't see any performant way to support arbitrary node label expressions on every request with unlimited labels per queue and node, which is how things are now. It appears to me you would need to keep track of all resource values for the intersection of all label combinations. If we limited the number of possible labels on a node to one, then we could calculate based on expressions at runtime (possibly also for a very small number > 1, but again, growth is exponential, I believe, and functionally complex).
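To make the size of the problem above concrete, here is a rough sketch in Java. It is not code from the patch. It assumes a hypothetical `Node` class holding a label set and a memory size, and it treats a label expression as a plain set of labels that must all be present on a node; the real expression syntax and scheduler data structures may well differ. It contrasts the number of label combinations that arbitrary expressions could touch with a table pre-computed for a short, pre-configured list of expressions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LabelExpressionTableSketch {

  /** Hypothetical node model: the labels on a node and its memory in GB. */
  static final class Node {
    final Set<String> labels;
    final int memoryGb;

    Node(Set<String> labels, int memoryGb) {
      this.labels = labels;
      this.memoryGb = memoryGb;
    }
  }

  /** Number of non-empty label combinations for n distinct labels: 2^n - 1. */
  static long labelCombinations(int labelCount) {
    return (1L << labelCount) - 1;
  }

  /** Total memory of the nodes that carry every label in the expression. */
  static int resourceGbFor(Set<String> expression, List<Node> nodes) {
    int total = 0;
    for (Node node : nodes) {
      if (node.labels.containsAll(expression)) {
        total += node.memoryGb;
      }
    }
    return total;
  }

  /** One pre-computed total per allowed expression: O(expressions * nodes) work. */
  static Map<Set<String>, Integer> precompute(List<Set<String>> allowed, List<Node> nodes) {
    Map<Set<String>, Integer> table = new HashMap<>();
    for (Set<String> expression : allowed) {
      table.put(expression, resourceGbFor(expression, nodes));
    }
    return table;
  }
}
```

With 10 labels, `labelCombinations` already gives 1023 combinations to track, whereas the pre-computed table only grows with the length of the configured expression list. The table would still need to be refreshed whenever nodes or their labels change, which this sketch does not attempt.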
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142001#comment-14142001 ]

Wangda Tan commented on YARN-2496:
----------------------------------

Craig, still about #2: what you commented makes sense to me. The AM could get a more precise headroom to plan its subsequent resource usage, but I think:

1) It may not be enough to do what you said:
bq. For this reason, headroom should reflect the labels in the last resource request from the application, not the queue's labels.
It is possible for an AM to send resource requests with different label expressions, so which headroom should we send back to the AM? I think maybe we need a new field in AllocateRequest to request different headrooms under different label expressions.

2) Even with 1), I cannot think of a good way to compute the resource for an arbitrary label expression within an acceptable time complexity; it is possible for thousands of different label expressions to exist in a big cluster at the same time. Our current implementation can make sure the resource of a queue's labels is up to date whenever a resource change happens.

Given 1) and 2), I suggest making it a pending task that we can deal with in the future.

About
bq. -re 5, I thought * could be in requests, if not, then it should not be an issue.
Yes, we don't support specifying * in requests, because it may cause some resource wastage. The AM should clearly know what resource it needs.

Thanks,
Wangda
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141736#comment-14141736 ]

Craig Welch commented on YARN-2496:
-----------------------------------

ach, not finished - anyway, re 2 - a particular job may well only be able to use nodes with one label in a queue, and so if the headroom includes nodes without that label, we'll end up with another deadlock case where it spins up reducers too early and then can't complete its maps. It is definitely a valid use case to have a queue with two labels (a and b, as in this example) and an app which is consistently requesting only one of those two labels (from submission to completion...) - perhaps only "a" nodes have the "special resource" it needs (special hardware capability, etc). For this reason, headroom should reflect the labels in the last resource request from the application, not the queue's labels.

(-re 5, I thought * could be in requests, if not, then it should not be an issue.)
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141724#comment-14141724 ]

Craig Welch commented on YARN-2496:
-----------------------------------

Ok, so it sounds like 1, 3 and 4 are ok. I think that 2 is still a problem though - headroom is an app level value, and even though an app may be able to use either label in a resource request it will, in some cases, not be able to. A typical case will be where a subset of nodes in
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141661#comment-14141661 ]

Wangda Tan commented on YARN-2496:
----------------------------------

Hi [~cwelch],

Thanks for your comments:

1) Regarding,
{code}
Headroom Calculation for JobA:
userConsumed = 8G
maxCapacityConsiderLabelA = 6G (Node1 only)
headroom = -2G (assume it will normalize to 0G)
{code}
Currently, we calculate headroom by,
bq. Headroom = min(userLimit, queue-max-cap, max-capacity-consider-label) - consumed
The {max-capacity-consider-label} is queue-wise, not app-wise, so for this queue max-capacity-consider-label = node1 + node2. You can think of {max-capacity-consider-label} as guaranteed to be always greater than or equal to the total resource the queue can use.

2) Regarding
bq. The "labels" are the labels for the queue, but the resource requests coming from the application can be a subset of that, no? So if application "a" is running on a queue with labels a and b, but it has a label expression of only a, which it is using for resource requests, it's going to get a headroom based on nodes with both labels a and b, but in fact it only has a "real" headroom for nodes with label "a"
Yes and no. Even if app-a has label "a" at the app level, its ResourceRequest(s) can override that and use b. The app-level label is just a default label expression, used when a ResourceRequest doesn't set one. So app-a can still use all labels of the queue.

3) Regarding,
bq. On the parent/leaf refactor to share AbstractCSQueue - a great idea, thought about it myself when seeing the duplication,
I agree with that. I think it may not be too risky, but it does hide the functional changes we made. Let's get more opinions about this, because reverting it would take some effort.

4) Regarding,
bq. CSQueueUtils - just removing a line, should revert
Will do.

5) Regarding,
bq. SchedulerUtils.checkNodeLabelExpression - I think there is an issue here with the * case
{{checkNodeLabelExpression}} is used to check whether a ResourceRequest can be allocated on a node. We don't support specifying * in any label-expression (including ResourceRequest, ASC, queue-default-label-expression), because that would cause many problems. Instead, we support * in a queue's labels (not its default-label-expression), which means the queue *can* access any labels. The checking methods are {{checkQueueAccessToNode}} and {{checkQueueLabelExpression}}.

Thanks,
Wangda
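The arithmetic behind point 1) can be laid out side by side. The following is only an illustrative Java sketch of the formula quoted above, not the scheduler's code: the class and method names are hypothetical, memory is a plain integer in GB rather than a Resource object, and both the user limit and the queue maximum capacity are assumed to be the whole 12G cluster from the example (two 6G nodes, user allowed 100%). The floor at zero follows the "normalize to 0G" assumption in the quoted example.

```java
final class HeadroomSketch {

  /** headroom = min(userLimit, queueMaxCap, maxCapacityConsiderLabel) - consumed, floored at 0. */
  static int headroomGb(int userLimitGb, int queueMaxCapGb, int maxCapConsiderLabelGb, int consumedGb) {
    int limit = Math.min(userLimitGb, Math.min(queueMaxCapGb, maxCapConsiderLabelGb));
    return Math.max(0, limit - consumedGb);
  }

  public static void main(String[] args) {
    int clusterGb = 12; // Node1 (label a, 6G) + Node2 (label b, 6G)

    // Per-label limit but cross-label consumption: 6 - 8 = -2, floored to 0.
    System.out.println(headroomGb(clusterGb, clusterGb, 6, 8));

    // Queue-wise limit and queue-wise consumption, as described in 1): 12 - 8 = 4.
    System.out.println(headroomGb(clusterGb, clusterGb, 12, 8));

    // Per-label limit and per-label consumption for label a: 6 - 4 = 2.
    System.out.println(headroomGb(clusterGb, clusterGb, 6, 4));
  }
}
```

The three calls print 0, 4 and 2. The gap between the second and third results is the part of the queue-wise headroom that a job restricted to label a could not actually obtain.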
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141508#comment-14141508 ]

Craig Welch commented on YARN-2496:
-----------------------------------

SchedulerUtils.checkNodeLabelExpression - I think there is an issue here with the * case. As I read the code, a * expression will not properly match against a node with a label. After the first check, there should be a check for ANY in the expression, and if it is present, return true.
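Wangda's reply to this comment draws a distinction between * in a queue's label set (allowed, meaning the queue can access any label) and * inside a request's label expression (not supported). A minimal Java sketch of that distinction follows. The class and method names are hypothetical and are not the signatures of `checkNodeLabelExpression`, `checkQueueAccessToNode` or `checkQueueLabelExpression` in the patch, and a label expression is simplified to a set of labels that must all be present.

```java
import java.util.Set;

final class QueueLabelAccessSketch {

  static final String ANY = "*";

  /** Queue-side check: a queue whose label set contains "*" may access every label. */
  static boolean queueCanAccess(Set<String> queueLabels, Set<String> requestedLabels) {
    if (requestedLabels.contains(ANY)) {
      return false; // "*" is rejected inside a request's label expression
    }
    return queueLabels.contains(ANY) || queueLabels.containsAll(requestedLabels);
  }

  /**
   * Node-side check: the node must carry every requested label.
   * An empty request matches any node here, which may not match the patch's behaviour.
   */
  static boolean nodeSatisfies(Set<String> nodeLabels, Set<String> requestedLabels) {
    return nodeLabels.containsAll(requestedLabels);
  }
}
```

Under this reading, a wildcard never reaches the node-side check, so no ANY branch is needed there; whether that matches the patch as submitted is for the code itself to confirm.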
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141369#comment-14141369 ]

Craig Welch commented on YARN-2496:
-----------------------------------

CSQueueUtils - just removing a line, should revert
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141367#comment-14141367 ]

Craig Welch commented on YARN-2496:
-----------------------------------

On the parent/leaf refactor to share AbstractCSQueue - a great idea, thought about it myself when seeing the duplication. But I think that doing it while making changes like adding node labels adds confusion and makes it harder to see the functional changes. I think it should have been done in isolation at some point (where no other changes were occurring). I don't think you should change course on it now (I'm not suggesting any changes to what you have... I think that would be more risky than not at this point), just a thought for future cases like this.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141355#comment-14141355 ]

Craig Welch commented on YARN-2496:
-----------------------------------

I'm also concerned about this:
{code}
Resource maxCapacityConsiderLabel =
    labelManager == null ? clusterResource
        : labelManager.getQueueResource(queueName, labels, clusterResource);
{code}
The "labels" are the labels for the queue, but the resource requests coming from the application can be a subset of that, no? So if application "a" is running on a queue with labels a and b, but it has a label expression of only a, which it is using for resource requests, it's going to get a headroom based on nodes with both labels a and b, but in fact it only has a "real" headroom for nodes with label "a"
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141342#comment-14141342 ] Craig Welch commented on YARN-2496: ---

Let's say user UserU is running two jobs, JobA and JobB, in queue QueueAB, which has both LabelA and LabelB. JobA has label LabelA and JobB has label LabelB. It's a two-node cluster: Node1 has LabelA and Node2 has LabelB. Let's say the user has access to 100% of the cluster (just the one queue, etc.), each node has 6G of RAM, and JobA and JobB are each using 4G.

Headroom calculation for JobA:
userConsumed = 8G
maxCapacityConsiderLabelA = 6G (Node1 only)
headroom = -2G (assume it will normalize to 0G)

However, the user should still be able to use the remaining 2G for JobA, as they are using only 4G of the 6G available to that label. The issue I see is userConsumed: maxCapacityConsiderLabel considers the label, but userConsumed does not. It should be a "userConsumedForLabel"; if it were, JobA would see 2G of headroom, as it should (the consumed value for LabelA is only 4G). The problem, I think, is in subtracting a cross-label value from a per-label value.
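The arithmetic in the example above can be checked with a minimal sketch. The class and method names (`LabelHeadroomSketch`, `crossLabelHeadroom`, `perLabelHeadroom`) are hypothetical and only mirror the quantities named in the comment; this is not the LeafQueue code from the patch, and clamping negative headroom to zero follows the comment's "assume it will normalize to 0G".

```java
import java.util.HashMap;
import java.util.Map;

public class LabelHeadroomSketch {
    /** Headroom when usage summed over all labels is subtracted from a per-label capacity. */
    static long crossLabelHeadroom(long labelCapacityGb, Map<String, Long> consumedByLabelGb) {
        long totalConsumed = 0;
        for (long gb : consumedByLabelGb.values()) {
            totalConsumed += gb;
        }
        return Math.max(0L, labelCapacityGb - totalConsumed);
    }

    /** Headroom when only the usage recorded for the same label is subtracted. */
    static long perLabelHeadroom(long labelCapacityGb, Map<String, Long> consumedByLabelGb, String label) {
        long consumed = consumedByLabelGb.getOrDefault(label, 0L);
        return Math.max(0L, labelCapacityGb - consumed);
    }

    public static void main(String[] args) {
        // UserU's usage from the example: JobA uses 4G on LabelA, JobB uses 4G on LabelB.
        Map<String, Long> consumed = new HashMap<>();
        consumed.put("LabelA", 4L);
        consumed.put("LabelB", 4L);
        long labelACapacityGb = 6L; // Node1 only

        System.out.println(crossLabelHeadroom(labelACapacityGb, consumed));         // 6 - 8, clamped to 0
        System.out.println(perLabelHeadroom(labelACapacityGb, consumed, "LabelA")); // 6 - 4 = 2
    }
}
```

The two methods differ only in which consumed value they subtract, which is the distinction the comment draws between userConsumed and a "userConsumedForLabel".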
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136878#comment-14136878 ] Wangda Tan commented on YARN-2496: --

Hi [~cwelch],

I'm not sure I quite understand this. Did you mean we need to calculate consumed resources for each label (or label expression) under each queue? Could you give me an example of how that would avoid job starvation? What confuses me is this: if we track resources per label/label expression, should we also track resources per host/rack (since we can ask for resources only on a host/rack by specifying relax-locality)?

Thanks,
Wangda
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135772#comment-14135772 ] Craig Welch commented on YARN-2496: ---

LeafQueue.java: I think there’s a bug in the way a user’s consumed value is obtained when working with labels (it’s unchanged from the existing behavior, but I think it needs to change). The consumed value covers all of the user’s jobs regardless of label or label expression, but it’s being subtracted from a value which is limited to the user’s label expression. This could be incorrect if a user has some jobs without labels and some with (or with different labels) in the same queue. Jobs using labels could be starved by jobs running under the same user with other labels. I believe consumed resources need to be tracked per label and/or label expression.
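One way to picture "consumed resources per label" is a per-user usage table keyed by label. The sketch below is purely illustrative: `UserLabelUsageSketch`, its methods, and the convention of filing unlabeled usage under an empty-string key are assumptions, not structures from the patch or from the capacity scheduler.

```java
import java.util.HashMap;
import java.util.Map;

public class UserLabelUsageSketch {
    /** Assumed key under which usage without any label is recorded. */
    static final String NO_LABEL = "";

    private final Map<String, Long> consumedMb = new HashMap<>();

    /** Records an allocation against the given label (null means no label). */
    void allocate(String label, long mb) {
        consumedMb.merge(key(label), mb, Long::sum);
    }

    /** Releases memory for the given label, dropping the entry once it reaches zero. */
    void release(String label, long mb) {
        consumedMb.computeIfPresent(key(label), (k, current) -> {
            long left = current - mb;
            return left > 0 ? Long.valueOf(left) : null;
        });
    }

    /** Usage for a single label: the value a per-label headroom would subtract. */
    long consumed(String label) {
        return consumedMb.getOrDefault(key(label), 0L);
    }

    /** Usage across all labels: the value the comment says is currently subtracted. */
    long totalConsumed() {
        long total = 0;
        for (long mb : consumedMb.values()) {
            total += mb;
        }
        return total;
    }

    private static String key(String label) {
        return label == null ? NO_LABEL : label;
    }

    public static void main(String[] args) {
        UserLabelUsageSketch usage = new UserLabelUsageSketch();
        usage.allocate("LabelA", 4096);
        usage.allocate("LabelB", 4096);
        usage.allocate(null, 1024);
        System.out.println("LabelA: " + usage.consumed("LabelA") + " MB, total: " + usage.totalConsumed() + " MB");
    }
}
```

Keeping both views available lets a scheduler subtract the per-label figure for labeled requests while still enforcing any limit that applies to the user's total usage.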
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131616#comment-14131616 ] Hadoop QA commented on YARN-2496: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668345/YARN-2496.patch against trunk revision 78b0483.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4918//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129875#comment-14129875 ] Hadoop QA commented on YARN-2496: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12668003/YARN-2496.patch against trunk revision 4be9517.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 10 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4888//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129845#comment-14129845 ] Hadoop QA commented on YARN-2496: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667992/YARN-2496.patch against trunk revision 4be9517.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4887//console

This message is automatically generated.
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129837#comment-14129837 ] Wangda Tan commented on YARN-2496: --

Hi [~jianhe],

Thanks for your comments.

bq. CSQueueUtils.java format change only, we can revert
Reverted.

bq. why checking labelManager != null every where ? we only need to check where it’s needed.
The checks were there to reduce changes in tests. I think we should remove them and improve the tests.

bq. We may not need to change the method signature to add one more parameter, just pass the queues map into NodeLabelManager#reinitializeQueueLabels, to avoid a number of test changes.
Makes sense. I have reverted the changes to the related tests and now get queueToLabels after parseQueue.

bq. label initialization code is duplicated between ParentQueue and LeafQueue, how about creating an AbstractCSQueue and put common initialization methods there ?
Makes sense, there is indeed a lot of common code between PQ and LQ. I have now merged all the common parts into AbstractCSQueue.

Attached a new patch.

Thanks,
Wangda
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129610#comment-14129610 ] Jian He commented on YARN-2496: ---

Briefly looked at the patch:
- CSQueueUtils.java has format changes only; we can revert them.
- Why check {{labelManager != null}} everywhere? We only need to check where it’s needed.
- We may not need to change the method signature to add one more parameter; just pass the queues map into NodeLabelManager#reinitializeQueueLabels to avoid a number of test changes.
{code}
parseQueue(this, conf, null, CapacitySchedulerConfiguration.ROOT, queues, queues, noop, queueToLabels);
{code}
- Label initialization code is duplicated between ParentQueue and LeafQueue. How about creating an AbstractCSQueue and putting the common initialization methods there?
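The last suggestion above, moving duplicated label initialization into a shared base class, might look roughly like the sketch below. All names here (`AbstractQueueSketch`, `ParentQueueSketch`, `LeafQueueSketch`, `canAccess`) are hypothetical, and the rule that a queue without configured labels inherits its parent's labels is an assumption for illustration, not something stated in the thread.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class QueueLabelRefactorSketch {
    /** Shared base: label setup lives here once instead of in both queue types. */
    abstract static class AbstractQueueSketch {
        final String name;
        final Set<String> accessibleLabels;
        final String defaultLabelExpression;

        AbstractQueueSketch(String name, AbstractQueueSketch parent,
                            Set<String> configuredLabels, String configuredDefaultExpression) {
            this.name = name;
            Set<String> labels = configuredLabels;
            if (labels == null) {
                // Assumed rule: fall back to the parent's labels when none are configured.
                labels = parent != null ? parent.accessibleLabels : Collections.<String>emptySet();
            }
            this.accessibleLabels = Collections.unmodifiableSet(new HashSet<>(labels));
            this.defaultLabelExpression =
                configuredDefaultExpression != null ? configuredDefaultExpression : "";
        }

        /** True if this queue may place requests on nodes carrying the label. */
        boolean canAccess(String label) {
            return accessibleLabels.contains(label);
        }
    }

    static final class ParentQueueSketch extends AbstractQueueSketch {
        ParentQueueSketch(String name, AbstractQueueSketch parent, Set<String> labels, String expr) {
            super(name, parent, labels, expr);
        }
    }

    static final class LeafQueueSketch extends AbstractQueueSketch {
        LeafQueueSketch(String name, AbstractQueueSketch parent, Set<String> labels, String expr) {
            super(name, parent, labels, expr);
        }
    }

    public static void main(String[] args) {
        Set<String> rootLabels = new HashSet<>(Arrays.asList("a", "b"));
        ParentQueueSketch root = new ParentQueueSketch("root", null, rootLabels, null);
        LeafQueueSketch leaf = new LeafQueueSketch("root.q1", root, null, "a");
        System.out.println(leaf.name + " can access b: " + leaf.canAccess("b"));
    }
}
```

Both subclasses reduce to a constructor call, so any later change to label handling touches one place only.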
[jira] [Commented] (YARN-2496) [YARN-796] Changes for capacity scheduler to support allocate resource respect labels
[ https://issues.apache.org/jira/browse/YARN-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127168#comment-14127168 ] Hadoop QA commented on YARN-2496: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12667431/YARN-2496.patch against trunk revision 90c8ece.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified test files.
{color:red}-1 javac{color}. The patch appears to cause the build to fail.

Console output: https://builds.apache.org/job/PreCommit-YARN-Build/4855//console

This message is automatically generated.