[ 
https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146321#comment-14146321
 ] 

Wangda Tan commented on YARN-796:
---------------------------------

Had an offline discussion with [~cwelch] today, based on Craig's comment on 
YARN-2496: 
https://issues.apache.org/jira/browse/YARN-2496?focusedCommentId=14143993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14143993.
I think it's better to post it here for further discussion.

*A simple summary of the problem is:*
Queues and nodes now have labels, and a queue may not be able to access all 
nodes in the cluster, so the real headroom might be less than the headroom 
calculated today.
In YARN-2496, the headroom calculation is changed to 
{{headroom = min(headroom, total-resource-of-the-queue-can-access)}}.
However, this may not be enough: an application may specify the labels it 
requires (e.g. label-expression = GPU && LARGE_MEMORY). It's better to return 
headroom according to the application's label expression, to avoid problems 
such as resource deadlock.
There are two problems in supporting this:
# There can be thousands of label-expression combinations, so calculating 
headroom becomes very expensive when many applications are running and asking 
for different labels at the same time.
# A single application can ask for different label expressions for different 
containers (e.g. mappers need GPU but reducers do not), so a single headroom 
returned in AllocateResponse may not be enough.
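
To make the gap concrete, here is a rough sketch (not YARN code) of the 
queue-level cap versus a cap computed per label expression. It assumes 
resources are reduced to memory in MB and that a label expression is a plain 
AND of labels, as in the example above. {{Node}}, {{parse_expression}} and 
{{capped_headroom}} are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# All names here are hypothetical; resources are reduced to memory in MB.


@dataclass(frozen=True)
class Node:
    name: str
    labels: frozenset
    total_mb: int


def parse_expression(expr):
    """Parse 'GPU && LARGE_MEMORY' into {'GPU', 'LARGE_MEMORY'} (AND only)."""
    return frozenset(p.strip() for p in expr.split("&&") if p.strip())


def capped_headroom(headroom_mb, nodes, expr=""):
    """min(headroom, total resource of the nodes that satisfy expr)."""
    required = parse_expression(expr)
    matching_mb = sum(n.total_mb for n in nodes if required <= n.labels)
    return min(headroom_mb, matching_mb)


# Nodes the queue can access (assumed to be already filtered by queue labels).
queue_nodes = [
    Node("n1", frozenset({"GPU", "LARGE_MEMORY"}), 8192),
    Node("n2", frozenset({"GPU"}), 4096),
    Node("n3", frozenset(), 4096),
]

# Queue-level cap, in the spirit of the YARN-2496 formula.
queue_level = capped_headroom(20000, queue_nodes)  # 16384
# Cap for one application's label expression.
app_level = capped_headroom(20000, queue_nodes, "GPU && LARGE_MEMORY")  # 8192
```

With these made-up numbers, the queue-level headroom is twice what an 
application asking for GPU && LARGE_MEMORY could actually get.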

*Proposed solutions:*
Solution #1:
Assume a relatively small number of unique label-expressions can satisfy most 
applications. We can add an option in capacity-scheduler.xml where users list 
the label-expressions that need to be pre-calculated. The number of such 
label-expressions should be small (e.g. <= 100 in the whole cluster). 
NodeLabelManager will update the pre-calculated headrooms when a node joins, 
leaves, or has its labels changed.
Then add a new field to AllocateResponse, like {{Map<LabelExpression(String), 
Headroom(Resource)> labelExpToHeadroom}}. We will return the pre-calculated 
headrooms to the AM, and the AM can decide how to use them.
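
A minimal sketch of the bookkeeping Solution #1 would need, under the same 
simplifying assumptions (memory-only resources, AND-only expressions). 
{{LabelHeadroomCache}} and its method names are hypothetical and do not 
correspond to any existing NodeLabelManager API. The point is only that each 
node event touches at most the configured expressions, not every possible 
combination.

```python
def parse_expression(expr):
    """Parse 'GPU && LARGE_MEMORY' into {'GPU', 'LARGE_MEMORY'} (AND only)."""
    return frozenset(p.strip() for p in expr.split("&&") if p.strip())


class LabelHeadroomCache:
    """Keeps per-expression resource totals for a configured expression list."""

    def __init__(self, expressions):
        self._required = {e: parse_expression(e) for e in expressions}
        self._total_mb = {e: 0 for e in expressions}
        self._nodes = {}  # node name -> (labels, total_mb)

    def _adjust(self, labels, mb, sign):
        # Add or subtract a node's resource for every expression it satisfies.
        for expr, required in self._required.items():
            if required <= labels:
                self._total_mb[expr] += sign * mb

    def node_left(self, name):
        node = self._nodes.pop(name, None)
        if node is not None:
            self._adjust(node[0], node[1], -1)

    def node_joined(self, name, labels, total_mb):
        self.node_left(name)  # treat a re-join as a replacement
        labels = frozenset(labels)
        self._nodes[name] = (labels, total_mb)
        self._adjust(labels, total_mb, +1)

    def labels_changed(self, name, new_labels):
        _, total_mb = self._nodes[name]
        self.node_joined(name, new_labels, total_mb)

    def label_exp_to_headroom(self, queue_headroom_mb):
        """Build the map that would go into the response to the AM."""
        return {e: min(queue_headroom_mb, mb)
                for e, mb in self._total_mb.items()}
```

Each update costs one pass over the configured expressions, which is why the 
list needs to stay small.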

Solution #2:
The AM will receive updated nodes (a list of NodeReport) from the RM in 
AllocateResponse, and the AM itself can work out the headroom for a given 
label-expression from the updated NMs. This is simpler than #1, but the AM side 
needs to implement its own logic to support it.
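
For comparison, a sketch of what the AM-side logic in Solution #2 might look 
like. The {{NodeReport}} class below is a simplified stand-in, not the real 
YARN record: it assumes each report carries the node's labels, capability, 
used memory and a running flag. {{AmNodeView}} is a hypothetical helper. The 
result is still capped by the headroom the RM reports, since the AM cannot see 
queue limits from node reports alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeReport:
    """Simplified stand-in for a node report; fields are assumptions."""
    node_id: str
    labels: frozenset
    capability_mb: int
    used_mb: int
    running: bool = True


class AmNodeView:
    """AM-side view of the cluster, built from incremental node updates."""

    def __init__(self):
        self._nodes = {}  # node_id -> latest NodeReport

    def apply_updates(self, updated_nodes):
        for report in updated_nodes:
            if report.running:
                self._nodes[report.node_id] = report
            else:
                # Node is no longer usable; forget it.
                self._nodes.pop(report.node_id, None)

    def headroom_mb(self, rm_headroom_mb, required_labels):
        """Free memory on nodes carrying all required labels, capped by RM."""
        free = sum(r.capability_mb - r.used_mb
                   for r in self._nodes.values()
                   if required_labels <= r.labels)
        return min(rm_headroom_mb, free)
```

The AM would call {{apply_updates}} on every allocate heartbeat and query 
{{headroom_mb}} once per label expression it cares about.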

Hope to get more thoughts about this,

Thanks,
Wangda

> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
>                 Key: YARN-796
>                 URL: https://issues.apache.org/jira/browse/YARN-796
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.1
>            Reporter: Arun C Murthy
>            Assignee: Wangda Tan
>         Attachments: LabelBasedScheduling.pdf, 
> Node-labels-Requirements-Design-doc-V1.pdf, 
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf, 
> YARN-796.node-label.consolidate.1.patch, 
> YARN-796.node-label.consolidate.2.patch, 
> YARN-796.node-label.consolidate.3.patch, 
> YARN-796.node-label.consolidate.4.patch, 
> YARN-796.node-label.consolidate.5.patch, 
> YARN-796.node-label.consolidate.6.patch, 
> YARN-796.node-label.consolidate.7.patch, 
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1, 
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels 
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on 
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
