[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380937#comment-15380937
 ] 

Carlo Curino commented on YARN-2915:
------------------------------------

[~vinodkv], Incidentally we were discussing this with [~subru] just yesterday.

*Philosophically:*
I agree with you that node-labels and node-label expressions are very powerful 
and could subsume much of the rest of YARN's locality and sub-cluster notions. 

Another aspect that makes this equivalence somewhat pleasant is that in some 
reasonably restricted scenarios it is quite natural. E.g., given two 
node-partition labels (blue, red), at the moment the {{CapacityScheduler}} 
behaves almost as if the worlds of blue nodes and red nodes were completely 
orthogonal to each other. Mapping this onto having two separate RMs dealing 
with blue nodes and red nodes should be rather straightforward. This is to say 
that if we simply "paint" each sub-cluster blue or red, it is not too hard to 
enforce this. Admins could use this concept to manually allocate capacity onto 
physical sub-clusters by manipulating labels instead of thinking of 
sub-clusters too explicitly. 
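
To make the "painting" idea concrete, here is a minimal sketch in which every sub-cluster carries exactly one node-partition label and a label resolves to the sub-clusters painted with it. The class and method names ({{PaintedSubClusters}}, {{paint}}, {{subClustersFor}}) are hypothetical and are not part of YARN or of the federation patches; this only illustrates the aligned case described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of "painting" sub-clusters; not a YARN API. */
final class PaintedSubClusters {
    // sub-cluster id -> the single node-partition label it is painted with
    // (TreeMap keeps sub-cluster ids sorted, so lookups return a stable order)
    private final Map<String, String> labelBySubCluster = new TreeMap<>();

    /** Paint a whole sub-cluster with one label, replacing any earlier color. */
    void paint(String subClusterId, String label) {
        labelBySubCluster.put(subClusterId, label);
    }

    /** Return the ids of all sub-clusters painted with the given label. */
    List<String> subClustersFor(String label) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : labelBySubCluster.entrySet()) {
            if (entry.getValue().equals(label)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```

In this restricted setting an admin only ever reasons about labels; the sub-cluster ids stay an internal detail of the mapping.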

My two concerns with this are:
 # labels are very generic, and we risk confusing admins if we use the same 
constructs to refer to both very physical and very logical entities (this 
generality is both good and bad)
 # handling richer, more complex intersections of node-label partitions and 
sub-cluster notions (i.e., where they are not aligned as I described) might 
get trickier and would require the "digging deeper" you suggested.

All in all, I am in favor of this, especially if we also tackle more 
substantially the scheduler rewrite work we have discussed.

*Practically:*
I think we should land a v0 of federation with all basic mechanisms in place, 
but with a somewhat limited admin surface that is not fully transparent yet 
(i.e., we give users full transparency, but ask a little more of our admins to 
begin with). This allows us to harden much of the internals and mechanics 
before polishing all the tooling around it. Priority-wise this is very 
important to us.

In v1 (soon after) we will improve this with: 
 # admin tooling that maps the single logical view of (queue + labels) to 
multiple sub-cluster queues + labels transparently (achieving, I think, what 
you ask for as an admin experience), 
 # policies that direct a job's asks based on labels + locality (providing the 
physical substrate to support (1)).

Note that the general architecture makes (1) and (2) quite feasible. For 
example, if you look at the policies I just posted in YARN-52324, YARN-5235 it 
is easy (literally a handful of LOC) to modify the "routing" behavior to be 
based on node-labels while reusing much of the rest of the mechanics around it. 
In fact, if you or [~wangda] have time/interest to work on this I am happy to 
help you orient yourself in what we are doing in the policy space. 
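
As a rough sketch of what label-based routing could look like, the example below wraps an existing policy and only overrides the choice when the request carries a known node-label expression. The {{RoutingPolicy}} interface, its {{chooseSubCluster}} method, and {{LabelAwarePolicy}} are hypothetical stand-ins invented for illustration; they are not the interfaces from the posted patches, which may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

final class LabelRoutingSketch {
    /** Hypothetical stand-in for a routing policy; not the actual patch API. */
    interface RoutingPolicy {
        String chooseSubCluster(String queue, String nodeLabelExpression);
    }

    /** Routes by node label when possible, otherwise defers to an existing policy. */
    static final class LabelAwarePolicy implements RoutingPolicy {
        private final Map<String, String> subClusterByLabel;
        private final RoutingPolicy fallback;

        LabelAwarePolicy(Map<String, String> subClusterByLabel, RoutingPolicy fallback) {
            // Defensive copy so later changes to the caller's map do not alter routing.
            this.subClusterByLabel = new HashMap<>(subClusterByLabel);
            this.fallback = fallback;
        }

        @Override
        public String chooseSubCluster(String queue, String nodeLabelExpression) {
            if (nodeLabelExpression != null && !nodeLabelExpression.isEmpty()) {
                String target = subClusterByLabel.get(nodeLabelExpression);
                if (target != null) {
                    return target; // label is mapped: route to the painted sub-cluster
                }
            }
            // No label, or an unmapped one: reuse the existing routing behavior.
            return fallback.chooseSubCluster(queue, nodeLabelExpression);
        }
    }
}
```

Wrapping a fallback policy rather than replacing it is one way to keep the label-specific part small while reusing the rest of the mechanics.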

> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>         Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, 
> Federation-BoF.pdf, Yarn_federation_design_v1.pdf, federation-prototype.patch
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising tens of thousands of nodes. That is, rather than 
> limiting a YARN-managed cluster to about 4k nodes in size, the proposal is to 
> enable the YARN-managed cluster to be elastically scalable.



