[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147034#comment-14147034 ]

Craig Welch commented on YARN-796:
----------------------------------

Some additional info regarding the headroom problem - one of the prototypical 
node label cases is a queue which can access the whole cluster but which also 
can access a particular label ("a").  A mapreduce job is launched on this queue 
with an expression limiting it to "a" nodes.  It will receive headroom 
reflecting access to the whole cluster, even though it can only use "a" nodes.  
This will sometimes result in a deadlock: based on the incorrect (inflated) 
headroom, the job starts reducers before it should, and then cannot start the 
mappers it needs to complete the map phase.  If there are significantly fewer 
"a" nodes than the total cluster (expected to be a frequent case) and those 
nodes are at high or full utilization (again, desirable and probably typical), 
this deadlock will occur. 
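
To make the deadlock concrete, here is a minimal sketch of the kind of 
headroom check an application master makes when deciding whether it can keep 
its reducers or must release them so pending mappers can run.  This is not the 
actual mapreduce AM code - the names and numbers are purely illustrative - but 
it shows how a cluster-wide headroom hides the fact that the "a" nodes are 
already full:

{code:java}
// Illustrative only - not the real RMContainerAllocator logic.
public class ReducerRampDownSketch {

  // Release reducers when the reported headroom no longer covers the
  // resources still needed to finish the map phase.
  static boolean mustReleaseReducers(long headroomMb, long pendingMapAskMb) {
    return headroomMb < pendingMapAskMb;
  }

  public static void main(String[] args) {
    long clusterWideHeadroomMb = 100L * 1024; // headroom over the whole cluster
    long labelAHeadroomMb      = 0;           // actual free space on "a" nodes
    long pendingMapAskMb       = 4L * 1024;   // one mapper still waiting

    // Inflated headroom: reducers are kept, the waiting mapper can only run
    // on the fully used "a" nodes, and the job deadlocks.
    System.out.println(mustReleaseReducers(clusterWideHeadroomMb, pendingMapAskMb)); // false

    // Label-aware headroom: reducers would be released and the job progresses.
    System.out.println(mustReleaseReducers(labelAHeadroomMb, pendingMapAskMb));      // true
  }
}
{code}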

It is possible to make no change and receive the correct headroom value for a 
very restricted set of configurations.  If queues are restricted to a single 
label (rather than * or "the whole cluster plus a label"), and jobs run with a 
label expression selecting that single label, they should get the correct 
headroom values.  Unfortunately, this eliminates a great many use cases/cluster 
configurations, including the one above, which I think it is very important 
to support.

A couple of additional details regarding Solution 1 above - in addition to the 
potential to expand the allocate response api to include a map of 
expression->headroom values, this approach also makes it possible to return 
the correct headroom value where it is currently returned for a job with a 
single expression.  So, in a scenario I think is very likely - the first use 
case above (a queue which can see the whole cluster plus a label with 
"special" nodes, say label "GPU"), with a default label expression of "GPU" 
used by the job throughout, running an unmodified mapreduce job (or hive, etc.) 
where no special support for labels has been added to that component in the 
platform - the correct headroom will be returned.  I think it's important to 
be able to introduce node label usability in a largely backward compatible 
way, so that mapreduce and the components above can make use of node labels 
with just configuration and the yarn platform implementation, and this is the 
solution (of the ones we've considered) which will make that possible. 
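
As a rough illustration of the Solution 1 shape described above - the type and 
method names below are hypothetical, not an existing YARN API - the allocate 
response would carry an optional per-expression headroom map, while the 
existing single headroom value keeps being returned (now computed against the 
labeled nodes) for a job that uses one expression throughout, so an unmodified 
mapreduce AM continues to work unchanged:

{code:java}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of Solution 1; the real AllocateResponse does not
// currently have these members.
public class LabelAwareAllocateResponseSketch {

  // Headroom for the job's single/default label expression (existing value,
  // but computed against the labeled nodes rather than the whole cluster).
  private final long availableMb;

  // Optional map of label expression -> headroom, for jobs that mix expressions.
  private final Map<String, Long> expressionHeadroomMb = new HashMap<String, Long>();

  public LabelAwareAllocateResponseSketch(long availableMb) {
    this.availableMb = availableMb;
  }

  public long getAvailableMb() {
    return availableMb;
  }

  public void setExpressionHeadroom(String expression, long headroomMb) {
    expressionHeadroomMb.put(expression, headroomMb);
  }

  // Fall back to the single headroom value for expressions the RM did not report.
  public long getHeadroomFor(String expression) {
    Long value = expressionHeadroomMb.get(expression);
    return value != null ? value : availableMb;
  }

  public Map<String, Long> getExpressionHeadroom() {
    return Collections.unmodifiableMap(expressionHeadroomMb);
  }
}
{code}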

> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
>                 Key: YARN-796
>                 URL: https://issues.apache.org/jira/browse/YARN-796
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.1
>            Reporter: Arun C Murthy
>            Assignee: Wangda Tan
>         Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
