[ 
https://issues.apache.org/jira/browse/YARN-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232144#comment-14232144
 ] 

Carlo Curino commented on YARN-2915:
------------------------------------

A couple of design principles at play:

 # We are designing federation so that it requires minimal changes to YARN. 
 # We are trying hard to make federation completely transparent to 
applications. 
 # We are investigating uses of federation that could facilitate maintenance / 
fault-tolerance / sub-cluster customization. 


Regarding (1), in the context of the cluster pooling / private cloud idea
mentioned above, the clusters being pooled can be (largely) unaware of the fact
that they are being federated together, as all/most public protocols are
unmodified, and the AMRMProxy of YARN-2884 can be run only on a small cluster
that works as a launch pad for the federation.

Regarding (2), it seems plausible (we have a working prototype) to make the
federation transparent to the applications, but more analysis of security,
load balancing, and HA aspects is required.

Regarding (3), federation should facilitate upgrades of each sub-cluster, can
be made more fault-tolerant by having the routing layer fall back on a
secondary cluster upon cluster-wide failures, and could be leveraged for
customization (e.g., run a smaller cluster with very fast heartbeats and a
bigger cluster with slower heartbeats, and pull them together on demand).
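
To make the fall-back idea in (3) concrete, here is a minimal sketch of a
routing layer that tries sub-clusters in preference order and moves on to the
next one when a whole sub-cluster is down. This is not YARN code: SubCluster,
Router, and submit are hypothetical names invented for illustration, and a
cluster-wide failure is simulated with a simple flag.

```python
from dataclasses import dataclass


@dataclass
class SubCluster:
    """Hypothetical stand-in for one federated sub-cluster (one RM)."""
    name: str
    healthy: bool = True

    def submit(self, app_id: str) -> str:
        # A cluster-wide failure is modelled as a connection error.
        if not self.healthy:
            raise ConnectionError(f"sub-cluster {self.name} is unavailable")
        return f"{app_id} accepted by {self.name}"


class Router:
    """Hypothetical routing layer: try sub-clusters in preference order."""

    def __init__(self, sub_clusters):
        self.sub_clusters = list(sub_clusters)

    def submit(self, app_id: str) -> str:
        errors = []
        for cluster in self.sub_clusters:
            try:
                return cluster.submit(app_id)
            except ConnectionError as exc:
                # Remember the failure and fall back to the next sub-cluster.
                errors.append(str(exc))
        raise RuntimeError("all sub-clusters failed: " + "; ".join(errors))


primary = SubCluster("primary")
secondary = SubCluster("secondary")
router = Router([primary, secondary])

first = router.submit("app_1")   # primary is up, so it takes the app
primary.healthy = False          # simulate a cluster-wide failure
second = router.submit("app_2")  # router falls back to the secondary
```

In this sketch the application only ever talks to the router, which is what
would keep the fall-back transparent to it; how a real routing layer detects
failures and picks the secondary is left open here.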

> Enable YARN RM scale out via federation using multiple RM's
> -----------------------------------------------------------
>
>                 Key: YARN-2915
>                 URL: https://issues.apache.org/jira/browse/YARN-2915
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Sriram Rao
>            Assignee: Subru Krishnan
>
> This is an umbrella JIRA that proposes to scale out YARN to support large 
> clusters comprising tens of thousands of nodes. That is, rather than 
> limiting a YARN-managed cluster to about 4k nodes, the proposal is to 
> enable the YARN-managed cluster to be elastically scalable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
