[
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147034#comment-14147034
]
Craig Welch commented on YARN-796:
----------------------------------
Some additional info regarding the headroom problem - one of the prototypical
node label cases is a queue which can access the whole cluster but which can
also access a particular label ("a"). A mapreduce job is launched on this queue
with an expression limiting it to "a" nodes. It will receive headroom
reflecting access to the whole cluster, even though it can only use "a" nodes.
This can result in a deadlock: based on the incorrect (inflated) headroom, the
job starts reducers before it should, and then cannot start the mappers needed
to complete the map phase. If there are significantly fewer "a" nodes than the
total cluster (expected to be a frequent case), then during periods of high or
full utilization of those nodes (again, desirable and probably typical), this
deadlock will occur.
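The arithmetic behind the inflation can be sketched as follows (a hypothetical illustration, not actual YARN code - the class and numbers are made up for this comment):

```java
// Hypothetical sketch of the headroom inflation (not actual YARN code).
// Headroom should be computed against the "a" partition the job is
// restricted to, not against the whole cluster.
public class HeadroomSketch {
    // Resources in MB, for simplicity.
    static int headroom(int capacity, int used) {
        return Math.max(0, capacity - used);
    }

    public static void main(String[] args) {
        int clusterCapacity = 100_000; // whole cluster
        int labelACapacity = 10_000;   // nodes labeled "a" only
        int usedOnA = 10_000;          // "a" nodes fully utilized

        // Inflated: reported against the whole cluster.
        int inflated = headroom(clusterCapacity, usedOnA); // 90_000

        // Correct: reported against the "a" partition only.
        int correct = headroom(labelACapacity, usedOnA);   // 0

        // Seeing 90 GB "free", the AM starts reducers, but no "a" node
        // can actually run another mapper - hence the deadlock.
        System.out.println(inflated + " vs " + correct);
    }
}
```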
It is possible to make no change and receive the correct headroom value for a
very restricted set of configurations. If queues are restricted to a single
label (and not * or "also the whole cluster"), and jobs run with a label
expression selecting that single label, they should get the correct headroom
values. Unfortunately, this eliminates a great many use cases/cluster
configurations, including the one above, which I think it is very important
to support.
A couple of additional details regarding Solution 1 above - in addition to the
potential to expand the allocate response api to include a map of
expression->headroom values, with this approach it is also possible to return
the correct headroom value where it is currently returned for a job with a
single expression. So, in a scenario I think very likely - the first use case
above (a queue which can see the whole cluster plus a label with "special"
nodes, say label "GPU"), with a default label expression of "GPU" (used by the
job throughout), running an unmodified mapreduce job (or hive, etc.) where no
special support for labels has been added to that component in the platform -
the correct headroom will be returned. I think it's important to be able to
introduce node label usability in a largely backward compatible way, so that
mapreduce & the things above can make use of node labels with just
configuration and the yarn platform implementation, and this is the solution
(of the ones we've considered) which will make this possible.
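To make the Solution 1 shape concrete, here is a rough sketch of the idea (hypothetical - the class and method names below are invented for this comment and are not the actual YARN AllocateResponse API):

```java
// Hypothetical sketch of the Solution 1 idea (not the actual YARN API):
// the allocate response carries a map from label expression to headroom,
// while the existing single headroom field holds the value for the job's
// own expression, so an unmodified MapReduce AM still works correctly.
import java.util.HashMap;
import java.util.Map;

public class LabeledAllocateResponse {
    private final Map<String, Integer> headroomByExpression = new HashMap<>();
    private final String jobLabelExpression; // e.g. a default of "GPU"

    LabeledAllocateResponse(String jobLabelExpression) {
        this.jobLabelExpression = jobLabelExpression;
    }

    void putHeadroom(String expression, int mb) {
        headroomByExpression.put(expression, mb);
    }

    // Backward-compatible single value: an unmodified AM keeps reading one
    // headroom number, but it now reflects the job's label expression
    // rather than the whole cluster.
    int getHeadroom() {
        return headroomByExpression.getOrDefault(jobLabelExpression, 0);
    }

    // Label-aware clients could instead consult the full map.
    Map<String, Integer> getHeadroomByExpression() {
        return headroomByExpression;
    }
}
```

The point of the sketch is the backward-compatibility path: a job that never looks at the map still gets a single, now-correct headroom figure for its default expression.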
> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
> Key: YARN-796
> URL: https://issues.apache.org/jira/browse/YARN-796
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 2.4.1
> Reporter: Arun C Murthy
> Assignee: Wangda Tan
> Attachments: LabelBasedScheduling.pdf,
> Node-labels-Requirements-Design-doc-V1.pdf,
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf,
> YARN-796.node-label.consolidate.1.patch,
> YARN-796.node-label.consolidate.2.patch,
> YARN-796.node-label.consolidate.3.patch,
> YARN-796.node-label.consolidate.4.patch,
> YARN-796.node-label.consolidate.5.patch,
> YARN-796.node-label.consolidate.6.patch,
> YARN-796.node-label.consolidate.7.patch,
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1,
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)