+1.

Nicely done, Subru and Carlo.

I have been partially involved with the work, and have reviewed some of the patches. With some help from Subru and documentation from Carlo (thanks!), I was able to play with the reservation system. Verified the following:

1. Reservations can be made only for the amount of resources available for that queue.
2. Jobs submitted against a reservation run in the corresponding "reservation" queue, and jobs submitted to the same higher-level queue but not against a reservation run in the corresponding "default" queue.
3. The web-ui shows the reserved resources in a queue even when there are no apps running.

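To make check 1 concrete, here is a minimal sketch of the kind of admission test involved, using the "10 containers for 1 hour sometime between 4pm and 9pm" example from the proposal quoted below. Everything in it is an assumption for illustration: the class ToyPlan, its reserve method, the one-hour slots, and the greedy earliest-fit placement are hypothetical and are not the YARN-1051 classes or its placement algorithm.

```java
import java.util.Arrays;

/**
 * Toy model of a reservation plan (hypothetical, not YARN code):
 * tracks how many containers are still unreserved in each hour of one day.
 */
final class ToyPlan {
    private final int[] free; // free[h] = containers still unreserved in hour h

    ToyPlan(int queueCapacity) {
        free = new int[24];
        Arrays.fill(free, queueCapacity);
    }

    /**
     * Tries to place `containers` for `hours` consecutive hours, starting no
     * earlier than hour `earliest` and finishing no later than hour `deadline`.
     * Uses greedy earliest-fit (an assumption made for this sketch).
     * Returns the chosen start hour, or -1 if the request is rejected.
     */
    int reserve(int containers, int hours, int earliest, int deadline) {
        if (containers <= 0 || hours <= 0 || earliest < 0 || deadline > free.length) {
            return -1;
        }
        for (int start = earliest; start + hours <= deadline; start++) {
            if (fits(containers, start, hours)) {
                // Accepting the request sets the resources aside in the plan.
                for (int h = start; h < start + hours; h++) {
                    free[h] -= containers;
                }
                return start;
            }
        }
        return -1; // no window has enough free capacity: reject
    }

    private boolean fits(int containers, int start, int hours) {
        for (int h = start; h < start + hours; h++) {
            if (free[h] < containers) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyPlan plan = new ToyPlan(10);
        System.out.println(plan.reserve(10, 1, 16, 21)); // window 4pm to 9pm
        System.out.println(plan.reserve(10, 1, 16, 21)); // same request again
        System.out.println(plan.reserve(11, 1, 16, 21)); // more than the queue has
    }
}
```

With a queue capacity of 10, the first request is placed at hour 16, the identical second request is pushed to hour 17, and the request for 11 containers is rejected because it exceeds what the queue can ever offer.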
There are a few follow-up items towards feature completeness, and I am okay with working on them post merge to trunk as planned.

1. Support for FairScheduler
2. Recover reservations on RM restart/failover
3. CLI and/or REST APIs to make reservations - this is very useful for testing
4. Documentation in the usual apt.vm format.

Cheers!
Karthik

On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <[email protected]> wrote:

> +1 (non-binding),
> Reviewed several patches related to scheduler side changes. As Jian mentioned, this will not affect existing behavior.
> Looking forward to this feature being used by more people. Thanks to Carlo and Subru!
>
> Thanks,
> Wangda
>
> On Wed, Oct 1, 2014 at 1:21 PM, Jian He <[email protected]> wrote:
>
> > +1,
> >
> > Carlo and Subru, great job! Thanks for your contribution!
> > I reviewed a couple of CapacityScheduler related patches, they are in good shape. At a minimum, they are not affecting existing behavior. Should be safe to merge.
> >
> > Jian
> >
> > On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <[email protected]> wrote:
> >
> > > +1 (non-binding)
> > > Thanks for adding this, really useful feature.
> > >
> > > On 30 September 2014 19:40, Chris Douglas <[email protected]> wrote:
> > >
> > > > +1
> > > >
> > > > Excellent work, Carlo and Subru. -C
> > > >
> > > > On Fri, Sep 26, 2014 at 11:50 AM, Carlo Curino <[email protected]> wrote:
> > > > > (Apologies if it is delivered twice.)
> > > > >
> > > > > YARN Devs,
> > > > >
> > > > > We propose to merge YARN-1051 development branch into trunk.
> > > > >
> > > > > Key Idea:
> > > > > This work adds support for Reservations to YARN RM. The key idea is to allow users to request dedicated access to resources (a reservation), ahead of time.
> > > > > For example I can ask for "10 containers for 1 hour sometime between 4pm and 9pm today".
> > > > > The RM keeps track of the accepted reservations by means of a Plan (think of it as an agenda of how the cluster resources will be used), and performs admission control to guarantee that if a reservation is accepted, enough resources are set aside to satisfy it. We enforce the reservation promises by dynamically creating/resizing/removing queues at the right time. This allows us to leverage the existing schedulers for the actual container assignment and tracking. The key benefit is to expose to the scheduler flexibility of allocation, while guaranteeing users predictable resource allocation.
> > > > >
> > > > > Status
> > > > >
> > > > > * The work has been "broken down" into 14 subtasks (+3 patches already committed to trunk for move/kill of apps). All the issues have been resolved.
> > > > >
> > > > > * Jenkins +1 the patch (with the exception of one test failure which we did not introduce, which is tracked here: https://issues.apache.org/jira/browse/MAPREDUCE-6094)
> > > > >
> > > > > * Simple integration with MapReduce: https://issues.apache.org/jira/browse/MAPREDUCE-6103
> > > > >
> > > > > * The broken-down patches have been reviewed and +1ed by Vinod Kumar Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and Chris Douglas. Thanks to all of you for the thorough reviews!
> > > > >
> > > > > * The current version has been rather thoroughly tested by running it on our 250-machine research cluster for months (first prototype was operational about a year ago) by:
> > > > >
> > > > >   o Running hundreds of thousands of jobs generated by a modified version of gridmix that exercise the reservation mechanism side-by-side with normal queues.
> > > > >
> > > > >   o Supporting our integration with the resource estimation framework Perforator (http://research.microsoft.com/pubs/178971/perforator.pdf). Kaushik and Dharmesh have been pounding the reservation system for their research for 3-4 months now, and helped us spot a few bugs and iron them out.
> > > > >
> > > > >   o Code has been inspected/extended by 4-5 other researchers who are exploring integration with other systems and extensions of our algorithms for "reservation placement".
> > > > >
> > > > > * We have a few ideas for follow-up extensions/improvements, tracked by the umbrella JIRA https://issues.apache.org/jira/browse/YARN-2572
> > > > >
> > > > > Documents and Deliverables
> > > > >
> > > > > * This work was accepted for publication at SoCC 2014 (pre-camera-ready version of the paper here):
> > > > >   https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf
> > > > >
> > > > > * Shorter design doc:
> > > > >   https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf
> > > > >
> > > > > * Overall patch:
> > > > >   https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch
> > > > >
> > > > > * Per Karthik's request we are preparing a small how-to document and example code/configuration, tracked by https://issues.apache.org/jira/browse/YARN-2609
> > > > >
> > > > > Credits
> > > > > Subru and I did lots of the coding (hence the flow of patches from us), but this is a group effort that would not have been possible without the ideas and hard work of many other folks in our research group (Microsoft-CISL). Major kudos to: Chris Douglas, Sriram Rao, Raghu Ramakrishnan, and our intern Djellel Difallah.
> > > > > Also big thanks to the many folks in the community (Arun, Vinod, Alejandro, Bikas, Karthik, Sandy, Hitesh, Jakob, Mohammad, Mayank, Jason, Bobby, and many more) who helped us shape our ideas and code with very insightful feedback and comments.
> > > > >
> > > > > We expect the vote to run for the usual 7 days and to expire at 12pm PDT on Oct 3. Please feel free to reach out to us if you have any questions/doubts.
> > > > >
> > > > > Cheers,
> > > > > Carlo & Subru
