Re: [Vote] merge feature branch YARN-2915 (Federation) to trunk

Daniel Templeton Tue, 01 Aug 2017 13:51:55 -0700

Thanks, Subru!  Carry on. :)

Daniel


On 8/1/17 1:42 PM, Subru Krishnan wrote:

Hi Daniel,

You were just in time; Carlo and I were just talking about moving
forward with the merge :).

To answer your questions:

    1. The expectation about the store is that the user will have a database
    set up (we only link to the install instructions page), but we do have the
    scripts for the schema and stored procedures. This is in fact called out in
    the doc in the *State Store* section (just before *Running a Sample Job*).
    Additionally, we are working on a ZK-based implementation of the store.
    Inigo has a patch in YARN-6900 [1].
    2. We rely on existing YARN/Hadoop security mechanisms for running
    applications on Federation as-is, so you should not need any additional
    Kerberos configuration. Disclaimer: we don't use Kerberos for securing
    Hadoop but rely on our production infrastructure.

Thanks,
Subru

[1] https://issues.apache.org/jira/browse/YARN-6900

On Tue, Aug 1, 2017 at 1:25 PM, Daniel Templeton <[email protected]>
wrote:

Subru, sorry for the last minute contribution... :)  I've been looking at
the branch, and I have two questions.

First, what's the out-of-box experience regarding the data store? Is the
expectation that the user will have a database set up and ready to go?
Will the state store set up the schema automatically, or is that on the
user?  I don't see that in the docs.

Second, how well does federation play with Kerberos?  Anything special
that needs to be configured to make it work?

Daniel

On 7/25/17 8:24 PM, Subru Krishnan wrote:

Hi all,

Per earlier discussion [9], I'd like to start a formal vote to merge
feature YARN Federation (YARN-2915) [1] to trunk. The vote will run for 7
days, and will end Aug 1 7PM PDT.

We have been developing the feature in a branch (YARN-2915 [2]) for a
while, and we are reasonably confident that the state of the feature meets
the criteria to be merged onto trunk.

*Key Ideas*:

YARN’s centralized design allows strict enforcement of scheduling
invariants and effective resource sharing, but becomes a scalability
bottleneck (in number of jobs and nodes) well before reaching the scale of
our clusters (e.g., 20k-50k nodes).


To address these limitations, we developed a scale-out, federation-based
solution (YARN-2915). Our architecture scales near-linearly to
datacenter-sized clusters by partitioning nodes across multiple sub-clusters
(each running a YARN cluster of a few thousand nodes). Applications can span
multiple sub-clusters *transparently (i.e. no code change or recompilation
of existing apps)*, thanks to a layer of indirection that negotiates with
multiple sub-clusters' Resource Managers on behalf of the application.


This design is structurally scalable, as it bounds the number of nodes each
RM is responsible for. Appropriate policies ensure that the majority of
applications reside within a single sub-cluster, thus further controlling
the load on each RM. This provides near-linear scale-out by simply adding
more sub-clusters. The same mechanism enables pooling of resources from
clusters owned and operated by different teams.
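
As a rough illustration of the bounded-load argument above, here is a toy
Python sketch. It is not code from the YARN-2915 branch: the function names,
the 4,000-node limit per RM, and the hash-based "home sub-cluster" policy are
all assumptions made up for this example.

```python
import math
import zlib


def partition_nodes(nodes, max_nodes_per_rm):
    """Split a flat list of nodes into sub-clusters of bounded size."""
    count = math.ceil(len(nodes) / max_nodes_per_rm)
    return {
        f"subcluster-{i}": nodes[i * max_nodes_per_rm:(i + 1) * max_nodes_per_rm]
        for i in range(count)
    }


def home_subcluster(app_id, subclusters):
    """Toy policy (an assumption, not a Federation policy): pin each
    application to a single sub-cluster using a stable hash of its id."""
    names = sorted(subclusters)
    return names[zlib.crc32(app_id.encode()) % len(names)]


# 20k nodes with at most 4k nodes per RM (both numbers are illustrative).
nodes = [f"node-{n}" for n in range(20_000)]
subclusters = partition_nodes(nodes, max_nodes_per_rm=4_000)
largest = max(len(members) for members in subclusters.values())
print(len(subclusters), largest)  # prints "5 4000"
```

Growing the node list only increases the number of sub-clusters; the largest
sub-cluster, and hence the load on any single RM, stays at or below the chosen
limit.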

Status:

     - The version we would like to merge to trunk is termed "MVP" (minimal
     viable product). The feature will have a complete end-to-end application
     execution flow with the ability to span a single application across
     multiple YARN (sub) clusters.
     - There were 50+ sub-tasks that were completed as part of this effort.
     Every patch has been reviewed and +1ed by a committer. Thanks to Jian,
     Wangda, Karthik, Vinod, Varun & Arun for the thorough reviews!
     - Federation is designed to be built around YARN and consequently has
     minimal code changes to core YARN. The relevant JIRAs that modify the
     existing YARN code base are YARN-3671 [7] & YARN-3673 [8]. We also paid
     close attention to ensure that if federation is disabled there is zero
     impact to existing functionality (disabled by default).
     - We found a few bugs as we went along, which we fixed directly upstream
     in trunk and/or branch-2.
     - We have been continuously rebasing the feature branch [2], so the merge
     should be a straightforward cherry-pick.
     - The current version has been rather thoroughly tested and is currently
     deployed in a *10,000+ node federated YARN cluster that's running
     upwards of 50k jobs daily with a reliability of 99.9%*.
     - We have a few ideas for follow-up extensions/improvements, which are
     tracked in the umbrella JIRA YARN-5597 [3].


Documentation:

     - Quick start guide (maven site) - YARN-6484[4].
     - Overall design doc [5] and the slide-deck [6] we used for our talk at
     Hadoop Summit 2016 are available in the umbrella JIRA - YARN-2915.


Credits:

This is a group effort that would not have been possible without the ideas
and hard work of many other folks, and we would like to specifically call
out Giovanni, Botong & Ellen for their invaluable contributions. Also, big
thanks to the many folks in the community (Sriram, Kishore, Sarvesh, Jian,
Wangda, Karthik, Vinod, Varun, Inigo, Vrushali, Sangjin, Joep, Rohith and
many more) that helped us shape our ideas and code with very insightful
feedback and comments.

Cheers,
Subru & Carlo

[1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
[2] https://github.com/apache/hadoop/tree/YARN-2915
[3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
[4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
[5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
[6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
[7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
[8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
[9] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201706.mbox/%3CCAOScs9bSsZ7mzH15Y%2BSPDU8YuNUAq7QicjXpDoX_tKh3MS4HsA%40mail.gmail.com%3E

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


