[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617707#comment-14617707
 ] 

Carlo Curino commented on YARN-2915:
------------------------------------

{color:red}
*ENFORCING GLOBAL INVARIANT*
{color}

During the Birds of a Feather session at Hadoop Summit 2015, and in separate 
conversations with [~kasha], [~leftnoteasy], [~jianhe], [~vinodkv], we received 
multiple questions on how we plan to enforce global scheduler invariants given 
that enforcement is performed locally by the sub-cluster RMs. 

The attached FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf is a short presentation 
that explains our ideas in more detail. 

The key intuition is that we will have a spectrum of options ranging from 
full replication of the queue structure in each sub-cluster to a full 
partitioning of it. At one extreme we get the best spreading of load and the 
best fairness, while at the opposite extreme we get the best scalability and 
isolation among tenants. Navigating the middle ground requires dynamic 
algorithms that continuously re-balance the queue mappings. Conceptually the 
problem is very close to preemption for node-labels when we allow rich 
expressions and preferences on node labels. 

We propose an initial simple approach (re-using some of the preemption work to 
detect global imbalances), and we are considering an LP-based modeling of the 
problem (possibly leveraging the Apache-licensed solver in Google or-tools). 
The proposed solution can provide a simple, concrete initial version (which is 
likely to scale substantially) that we can then improve iteratively. Much of 
this must be driven by experimental results from our initial prototype (for 
which we are about to post code).
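
As a rough illustration of the idea above (not the prototype's code), the 
sketch below detects a global imbalance by comparing each queue's usage summed 
over all sub-clusters against its global guarantee, and then re-splits each 
queue's guarantee across sub-clusters in proportion to pending demand. All 
names, data shapes, and the proportional rule itself are assumptions made for 
illustration; the sketch also ignores the size limit of each sub-cluster, 
which a real algorithm (or the LP formulation) would have to respect.

```python
from typing import Dict

# Hypothetical shapes, chosen only for this sketch:
#   guaranteed: queue -> guaranteed fraction of the whole federation
#   usage/demand: sub-cluster -> queue -> amount (same unit across sub-clusters)
Shares = Dict[str, float]
PerSubCluster = Dict[str, Dict[str, float]]


def global_imbalance(guaranteed: Shares, usage: PerSubCluster) -> Shares:
    """Per queue: usage summed over all sub-clusters minus the global guarantee.

    A positive value means the queue is above its global guarantee even if
    every sub-cluster RM considers it within its local limits.
    """
    totals = {queue: 0.0 for queue in guaranteed}
    for per_queue in usage.values():
        for queue, used in per_queue.items():
            totals[queue] = totals.get(queue, 0.0) + used
    return {q: totals[q] - guaranteed.get(q, 0.0) for q in totals}


def rebalance(guaranteed: Shares, demand: PerSubCluster) -> PerSubCluster:
    """Split each queue's global guarantee across sub-clusters.

    Assumed rule: proportional to pending demand; with no demand anywhere the
    guarantee is spread evenly (the "full replication" end of the spectrum).
    Demand concentrated in one sub-cluster yields the "full partitioning" end.
    """
    sub_clusters = list(demand)
    mapping: PerSubCluster = {sc: {} for sc in sub_clusters}
    for queue, share in guaranteed.items():
        total = sum(demand[sc].get(queue, 0.0) for sc in sub_clusters)
        for sc in sub_clusters:
            if total > 0:
                mapping[sc][queue] = share * demand[sc].get(queue, 0.0) / total
            else:
                mapping[sc][queue] = share / len(sub_clusters)
    return mapping
```

By construction, the local capacities that rebalance() assigns to a queue sum 
to that queue's global guarantee, which is the invariant the sub-cluster RMs 
cannot see on their own.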

 

> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>         Attachments: FEDERATION_CAPACITY_ALLOCATION_JIRA.pdf, 
> Yarn_federation_design_v1.pdf
>
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising tens of thousands of nodes. That is, rather than 
> limiting a YARN managed cluster to about 4k in size, the proposal is to 
> enable the YARN managed cluster to be elastically scalable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
