[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146321#comment-14146321 ]
Wangda Tan commented on YARN-796:
---------------------------------

Had an offline discussion with [~cwelch] today, based on Craig's comment on YARN-2496: https://issues.apache.org/jira/browse/YARN-2496?focusedCommentId=14143993&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14143993
I think it's better to post it here for more discussion.

*A simple summary of the problem:*
Queues and nodes now have labels, and a queue may not be able to access all nodes in the cluster, so the actual headroom might be less than the headroom calculated today. In YARN-2496, the headroom calculation was changed to {{headroom = min(headroom, total-resource-of-the-queue-can-access)}}.

However, this may not be enough: an application may set the label expression it requires (e.g. label-expression = GPU && LARGE_MEMORY). It would be better to return headroom according to the application's label expression, to avoid problems such as resource deadlock.

Supporting this raises two problems:
# There can be thousands of label-expression combinations, so calculating headroom becomes very expensive when many applications are running and asking for different labels at the same time.
# A single application can ask for different label expressions for different containers (e.g. mappers need GPU but reducers do not), so a single headroom returned in AllocateResponse may not be enough.

*Proposed solutions:*
Solution #1: Assume that a relatively small number of unique label-expressions can satisfy most applications. We can add an option in capacity-scheduler.xml where users list the label-expressions that need to be pre-calculated; the number of such label-expressions should be small (e.g. <= 100 in the whole cluster). NodeLabelManager will update them when a node joins, a node leaves, or a label changes. We would also add a new field to AllocateResponse, like {{Map<LabelExpression(String), Headroom(Resource)> labelExpToHeadroom}}. The RM returns the pre-calculated headrooms to the AM, and the AM decides how to use them.
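To make Solution #1 more concrete, here is a rough sketch of the pre-calculation, not actual YARN code. Everything in it is a simplifying assumption for illustration: the class {{LabelHeadroomSketch}}, the {{Node}} type, and the method names are hypothetical, a resource is reduced to memory in MB, a label expression is treated as a plain {{&&}} conjunction of labels, and the node list is assumed to already be restricted to the nodes the queue can access.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only; none of these types exist in YARN.
final class LabelHeadroomSketch {

  /** Stand-in for a node: its labels and its free memory in MB. */
  static final class Node {
    final Set<String> labels;
    final long availableMb;

    Node(long availableMb, String... labels) {
      this.availableMb = availableMb;
      this.labels = new HashSet<>(Arrays.asList(labels));
    }
  }

  /** Splits "GPU && LARGE_MEMORY" into {GPU, LARGE_MEMORY}; only && is handled. */
  static Set<String> parse(String expression) {
    Set<String> required = new HashSet<>();
    for (String part : expression.split("&&")) {
      String label = part.trim();
      if (!label.isEmpty()) {
        required.add(label);
      }
    }
    return required;
  }

  /** headroom = min(queue headroom, free resource on nodes matching the expression). */
  static long headroomFor(String expression, List<Node> nodes, long queueHeadroomMb) {
    Set<String> required = parse(expression);
    long matchingMb = 0;
    for (Node node : nodes) {
      if (node.labels.containsAll(required)) {
        matchingMb += node.availableMb;
      }
    }
    return Math.min(queueHeadroomMb, matchingMb);
  }

  /** Builds the labelExpToHeadroom map for the configured expressions. */
  static Map<String, Long> precompute(
      List<String> expressions, List<Node> nodes, long queueHeadroomMb) {
    Map<String, Long> labelExpToHeadroom = new LinkedHashMap<>();
    for (String expression : expressions) {
      labelExpToHeadroom.put(expression, headroomFor(expression, nodes, queueHeadroomMb));
    }
    return labelExpToHeadroom;
  }

  public static void main(String[] args) {
    List<Node> nodes = Arrays.asList(
        new Node(8192, "GPU", "LARGE_MEMORY"),
        new Node(4096, "GPU"),
        new Node(2048));
    // Would be recomputed whenever a node joins, leaves, or changes labels.
    System.out.println(
        precompute(Arrays.asList("GPU", "GPU && LARGE_MEMORY"), nodes, 10240));
  }
}
```

In this sketch the map is rebuilt in full for every configured expression; with the expression list capped at a small number, that cost stays bounded regardless of how many applications are running.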
Solution #2: The AM already receives updated nodes (a list of NodeReport) from the RM in AllocateResponse, so the AM itself can figure out the headroom for a given label-expression from the updated NMs. This is simpler than #1, but the AM side needs to implement its own logic to support it.

Hope to get more thoughts about this.

Thanks,
Wangda

> Allow for (admin) labels on nodes and resource-requests
> -------------------------------------------------------
>
>                 Key: YARN-796
>                 URL: https://issues.apache.org/jira/browse/YARN-796
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.4.1
>            Reporter: Arun C Murthy
>            Assignee: Wangda Tan
>         Attachments: LabelBasedScheduling.pdf,
> Node-labels-Requirements-Design-doc-V1.pdf,
> Node-labels-Requirements-Design-doc-V2.pdf, YARN-796-Diagram.pdf,
> YARN-796.node-label.consolidate.1.patch,
> YARN-796.node-label.consolidate.2.patch,
> YARN-796.node-label.consolidate.3.patch,
> YARN-796.node-label.consolidate.4.patch,
> YARN-796.node-label.consolidate.5.patch,
> YARN-796.node-label.consolidate.6.patch,
> YARN-796.node-label.consolidate.7.patch,
> YARN-796.node-label.consolidate.8.patch, YARN-796.node-label.demo.patch.1,
> YARN-796.patch, YARN-796.patch4
>
>
> It will be useful for admins to specify labels for nodes. Examples of labels
> are OS, processor architecture etc.
> We should expose these labels and allow applications to specify labels on
> resource-requests.
> Obviously we need to support admin operations on adding/removing node labels.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)