Dear all, good morning,
This is Kiran Kumar Pulamolu. I am doing research on resource optimization in Hadoop through adaptive resource sharing with fairness policies, and I would like to contribute to this group. Please advise me on how to proceed.

Thank you,
Kiran Kumar Pulamolu
+919492400797

On 24 Jun 2017 5:42 a.m., "Wangda Tan" <[email protected]> wrote:

> Thanks all for working on the feature, I'm in favor of moving forward as
> well.
>
> Best,
> Wangda
>
> On Fri, Jun 23, 2017 at 2:44 PM, Sangjin Lee <[email protected]> wrote:
>
> > Thanks for the clarification, Subru. I am in favor of moving forward.
> >
> > Sangjin
> >
> > On Thu, Jun 22, 2017 at 6:21 PM, Karthik Shashank Kambatla
> > <[email protected]> wrote:
> >
> > > Given RTC and the amount of production testing this feature has
> > > received, I am totally in favor of this merge.
> > >
> > > On Tue, Jun 20, 2017 at 4:28 PM, Subru Krishnan <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > We would like to open a discussion on merging the YARN Federation
> > > > (YARN-2915) [1] feature to trunk. We have been developing the
> > > > feature in a feature branch (YARN-2915 [2]) for a while, and we are
> > > > reasonably confident that the state of the feature meets the
> > > > criteria to be merged onto trunk.
> > > >
> > > > *Key Ideas*:
> > > >
> > > > YARN's centralized design allows strict enforcement of scheduling
> > > > invariants and effective resource sharing, but it becomes a
> > > > scalability bottleneck (in number of jobs and nodes) well before
> > > > reaching the scale of our clusters (e.g., 20k-50k nodes).
> > > >
> > > > To address these limitations, we developed a scale-out,
> > > > federation-based solution (YARN-2915). Our architecture scales
> > > > near-linearly to datacenter-sized clusters by partitioning nodes
> > > > across multiple sub-clusters (each running a YARN cluster of a few
> > > > thousand nodes). Applications can span multiple sub-clusters
> > > > *transparently (i.e., no code change or recompilation of existing
> > > > apps)*, thanks to a layer of indirection that negotiates with
> > > > multiple sub-clusters' Resource Managers on behalf of the
> > > > application.
> > > >
> > > > This design is structurally scalable, as it bounds the number of
> > > > nodes each RM is responsible for. Appropriate policies ensure that
> > > > the majority of applications reside within a single sub-cluster,
> > > > further controlling the load on each RM. This provides near-linear
> > > > scale-out by simply adding more sub-clusters. The same mechanism
> > > > enables pooling of resources from clusters owned and operated by
> > > > different teams.
> > > >
> > > > Status:
> > > >
> > > > - The version we would like to merge to trunk is termed the "MVP"
> > > > (minimum viable product). The feature will have a complete
> > > > end-to-end application execution flow, with the ability to span a
> > > > single application across multiple YARN (sub-)clusters.
> > > > - 50+ sub-tasks were completed as part of this effort. Every patch
> > > > has been reviewed and +1ed by a committer. Thanks to Jian, Wangda,
> > > > Karthik, Vinod, Varun & Arun for the thorough reviews!
> > > > - Federation is designed to be built around YARN and consequently
> > > > requires minimal code changes to core YARN. The relevant JIRAs that
> > > > modify the existing YARN code base are YARN-3671 [7] & YARN-3673 [8].
> > > > We also paid close attention to ensure that if federation is
> > > > disabled (the default), there is zero impact on existing
> > > > functionality.
> > > > - We found a few bugs as we went along, which we fixed directly
> > > > upstream in trunk and/or branch-2.
> > > > - We have been continuously rebasing the feature branch [2], so the
> > > > merge should be a straightforward cherry-pick.
> > > > - The current version has been thoroughly tested and is currently
> > > > deployed in a *10,000+ node federated YARN cluster that runs
> > > > upwards of 50k jobs daily with a reliability of 99.9%*.
> > > > - We have a few ideas for follow-up extensions/improvements, which
> > > > are tracked in the umbrella JIRA YARN-5597 [3].
> > > >
> > > > Documentation:
> > > >
> > > > - Quick start guide (maven site): YARN-6484 [4].
> > > > - The overall design doc [5] and the slide deck [6] we used for our
> > > > talk at Hadoop Summit 2016 are available in the umbrella JIRA,
> > > > YARN-2915.
> > > >
> > > > Credits:
> > > >
> > > > This is a group effort that would not have been possible without
> > > > the ideas and hard work of many other folks, and we would like to
> > > > specifically call out Giovanni, Botong & Ellen for their invaluable
> > > > contributions. Also, big thanks to the many folks in the community
> > > > (Sriram, Kishore, Sarvesh, Jian, Wangda, Karthik, Vinod, Varun,
> > > > Inigo, Vrushali, Sangjin, Joep, Rohith and many more) who helped us
> > > > shape our ideas and code with very insightful feedback and comments.
> > > >
> > > > We plan to start the merge vote in the next week or so. The branch
> > > > is close to complete (~5 patches before one can kick the tires on a
> > > > running deployment). Please look through the branch; feedback is
> > > > welcome. Thanks!
> > > >
> > > > Cheers,
> > > > Subru & Carlo
> > > >
> > > > [1] YARN-2915: https://issues.apache.org/jira/browse/YARN-2915
> > > > [2] https://github.com/apache/hadoop/tree/YARN-2915
> > > > [3] YARN-5597: https://issues.apache.org/jira/browse/YARN-5597
> > > > [4] YARN-6484: https://issues.apache.org/jira/browse/YARN-6484
> > > > [5] https://issues.apache.org/jira/secure/attachment/12733292/Yarn_federation_design_v1.pdf
> > > > [6] https://issues.apache.org/jira/secure/attachment/12819229/YARN-Federation-Hadoop-Summit_final.pptx
> > > > [7] YARN-3671: https://issues.apache.org/jira/browse/YARN-3671
> > > > [8] YARN-3673: https://issues.apache.org/jira/browse/YARN-3673
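To illustrate the transparency claim in the thread above (no code change or recompilation for existing apps), here is a minimal sketch of a completely standard YARN client. The only assumption, not something the thread states explicitly, is that in a federated deployment the client's yarn.resourcemanager.address resolves to the federation routing layer instead of a single RM; the Java code itself contains nothing federation-specific.

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    public class PlainYarnClientExample {
      public static void main(String[] args) throws Exception {
        // Stock client configuration; the assumption is that in a federated
        // deployment yarn.resourcemanager.address points at the federation
        // routing endpoint rather than a single RM.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Request a new application ID exactly as against a single RM; the
        // federation layer decides which sub-cluster(s) actually host it.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationId appId =
            app.getNewApplicationResponse().getApplicationId();
        System.out.println("Allocated application ID: " + appId);

        yarnClient.stop();
      }
    }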
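And since the thread stresses that federation is disabled by default with zero impact on existing functionality, enabling it would presumably be an opt-in switch in yarn-site.xml. A sketch of what that might look like; treat the exact property names as assumptions until the YARN-6484 quick-start guide [4] lands:

    <!-- yarn-site.xml (sketch; keys assumed from the YARN-6484 guide) -->
    <!-- Opt in to federation; off by default, so existing clusters are
         untouched unless this is set. -->
    <property>
      <name>yarn.federation.enabled</name>
      <value>true</value>
    </property>
    <!-- On each NodeManager: proxy AM-to-RM traffic so that a single
         application can span sub-clusters transparently. -->
    <property>
      <name>yarn.nodemanager.amrmproxy.enabled</name>
      <value>true</value>
    </property>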
