+1 Excellent work, Carlo and Subru. -C
On Fri, Sep 26, 2014 at 11:50 AM, Carlo Curino <[email protected]> wrote:
> (Apologies if it is delivered twice.)
>
> YARN Devs,
>
> We propose to merge the YARN-1051 development branch into trunk.
>
> Key Idea:
> This work adds support for Reservations to YARN RM. The key idea is to
> allow users to request dedicated access to resources (a reservation) ahead
> of time. For example, I can ask for "10 containers for 1 hour sometime
> between 4pm and 9pm today". The RM keeps track of the accepted reservations
> by means of a Plan (think of it as an agenda of how the cluster resources
> will be used), and performs admission control to guarantee that if a
> reservation is accepted, enough resources are set aside to satisfy it. We
> enforce the reservation promises by dynamically creating/resizing/removing
> queues at the right time. This allows us to leverage the existing
> schedulers for the actual container assignment and tracking. The key
> benefit is to give the scheduler flexibility in allocation while
> guaranteeing users predictable resource allocation.
>
> Status
>
> * The work has been "broken down" into 14 subtasks (+3 patches already
>   committed to trunk for move/kill of apps). All the issues have been
>   resolved.
>
> * Jenkins +1 the patch (with the exception of one test failure which we
>   did not introduce, which is tracked here:
>   https://issues.apache.org/jira/browse/MAPREDUCE-6094)
>
> * Simple integration with MapReduce:
>   https://issues.apache.org/jira/browse/MAPREDUCE-6103
>
> * The broken-down patches have been reviewed and +1ed by Vinod Kumar
>   Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and Chris Douglas.
>   Thanks to all of you for the thorough reviews!
>
> * The current version has been rather thoroughly tested by running it on
>   our 250-machine research cluster for months (the first prototype was
>   operational about a year ago) by:
>
>   o Running hundreds of thousands of jobs generated by a modified version
>     of gridmix that exercises the reservation mechanism side by side with
>     normal queues.
>
>   o Supporting our integration with the resource estimation framework
>     Perforator (http://research.microsoft.com/pubs/178971/perforator.pdf).
>     Kaushik and Dharmesh have been pounding the reservation system for
>     their research for 3-4 months now, and helped us spot a few bugs and
>     iron them out.
>
>   o Having the code inspected/extended by 4-5 other researchers who are
>     exploring integration with other systems and extensions of our
>     algorithms for "reservation placement".
>
> * We have a few ideas for follow-up extensions/improvements, tracked by
>   the umbrella JIRA https://issues.apache.org/jira/browse/YARN-2572
>
> Documents and Deliverables
>
> * This work was accepted for publication at SoCC 2014 (pre-camera-ready
>   version of the paper here):
>   https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf
>
> * Shorter design doc:
>   https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf
>
> * Overall patch:
>   https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch
>
> * Per Karthik's request we are preparing a small how-to document and
>   example code/configuration, tracked by
>   https://issues.apache.org/jira/browse/YARN-2609
>
> Credits
> Subru and I did lots of the coding (hence the flow of patches from us),
> but this is a group effort that would not have been possible without the
> ideas and hard work of many other folks in our research group
> (Microsoft-CISL). Major kudos to: Chris Douglas, Sriram Rao, Raghu
> Ramakrishnan, and our intern Djellel Difallah.
> Also big thanks to the many folks in the community (Arun, Vinod,
> Alejandro, Bikas, Karthik, Sandy, Hitesh, Jakob, Mohammad, Mayank, Jason,
> Bobby, and many more) that helped us shape our ideas and code with very
> insightful feedback and comments.
>
> The vote will run for the usual 7 days and expire at 12pm PDT on Oct 3.
> Please feel free to reach out to us if you have any questions/doubts.
>
> Cheers,
> Carlo & Subru
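
For readers new to the thread, the "Key Idea" in the quoted message (a Plan that records accepted reservations, plus admission control that accepts a request only if resources can be set aside) can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Plan` class, `try_reserve`, the hourly time slots, and the first-fit placement are all hypothetical names and simplifications chosen for illustration. It is not the YARN-1051 API, and it is not the placement algorithm used in the patch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Plan:
    """Hypothetical agenda: containers already promised per hourly slot."""
    capacity: int                                             # total containers in the cluster
    committed: Dict[int, int] = field(default_factory=dict)  # hour -> promised containers

    def free(self, hour: int) -> int:
        """Containers still unpromised in the given hour."""
        return self.capacity - self.committed.get(hour, 0)

    def try_reserve(self, containers: int, duration: int,
                    earliest: int, deadline: int) -> Optional[int]:
        """Toy admission control with first-fit placement (an assumption).

        Returns the chosen start hour, or None if no window that starts at or
        after `earliest` and ends by `deadline` has enough free capacity.
        """
        for start in range(earliest, deadline - duration + 1):
            hours = range(start, start + duration)
            if all(self.free(h) >= containers for h in hours):
                # Accept: set the resources aside in the plan.
                for h in hours:
                    self.committed[h] = self.committed.get(h, 0) + containers
                return start
        return None  # Reject: the promise could not be kept.


plan = Plan(capacity=15)
# "10 containers for 1 hour sometime between 4pm and 9pm"
first = plan.try_reserve(10, 1, earliest=16, deadline=21)
second = plan.try_reserve(10, 1, earliest=16, deadline=21)
too_big = plan.try_reserve(20, 1, earliest=16, deadline=21)
print(first, second, too_big)
```

In this toy run the second request does not fit alongside the first at 4pm, so it is placed an hour later, and the request for more containers than the cluster has is rejected. The flexibility in the time window is what lets the plan accept both of the first two requests.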
