+1 (non-binding). I misread the deadline, so my vote may not count. This adds a relevant and important dimension to the scheduling features in YARN.
On Fri, Oct 3, 2014 at 2:23 PM, Carlo Curino <[email protected]> wrote:

> Thanks everyone for voting, if I count right we have:
> 4 +1 binding,
> 5 +1 non-binding (including ourselves)
>
> So we are proceeding with merge to trunk (via Chris Douglas),
> and per Vinod's and Karthik's suggestions, we will get a couple
> of clean builds / jenkins runs, and repeat our usual suite of
> runs on clusters and then commit to branch-2 and branch-2.6.
>
> Thanks,
> Carlo & Subru
>
> On 10/2/14 4:17 PM, "Karthik Kambatla" <[email protected]> wrote:
>
> >If this vote is meant for all branches:
> >
> >+1 to merge to trunk
> >+1 to merge to branch-2
> >+1 to merge to branch-2.6, provided we "label" this feature
> >experimental/alpha until the follow-up items are addressed.
> >-0 to unconditional merge to branch-2.6.
> >
> >PS: We should decide on the way to communicate the stability of a feature.
> >Maybe the new-feature notes in the release documentation should have this
> >label?
> >
> >On Wed, Oct 1, 2014 at 6:23 PM, Karthik Kambatla <[email protected]>
> >wrote:
> >
> >> +1. Nicely done, Subru and Carlo.
> >>
> >> I have been partially involved with the work, and have reviewed some of
> >> the patches. With some help from Subru and documentation from Carlo
> >> (thanks!), I was able to play with the reservation system. Verified the
> >> following:
> >> 1. Reservations can be made only for the amount of resources available
> >> for that queue.
> >> 2. Jobs submitted against a reservation run in the corresponding
> >> "reservation" queue, and jobs submitted to the same higher-level queue
> >> but not against a reservation run in the corresponding "default" queue.
> >> 3. The web-ui shows the reserved resources in a queue even when there
> >> are no apps running.
> >>
> >> There are a few follow-up items towards feature completeness, and I am
> >> okay with working on them post merge to trunk as planned.
> >> 1. Support for FairScheduler
> >> 2. Recover reservations on RM restart/failover
> >> 3. CLI and/or REST APIs to make reservations - this is very useful for
> >> testing
> >> 4. Documentation in the usual apt.vm format.
> >>
> >> Cheers!
> >> Karthik
> >>
> >> On Wed, Oct 1, 2014 at 1:29 PM, Wangda Tan <[email protected]> wrote:
> >>
> >>> +1 (non-binding),
> >>> Reviewed several patches related to scheduler side changes. As Jian
> >>> mentioned, this will not affect existing behavior.
> >>> Looking forward to this feature being used by more people. Thanks to
> >>> Carlo and Subru!
> >>>
> >>> Thanks,
> >>> Wangda
> >>>
> >>> On Wed, Oct 1, 2014 at 1:21 PM, Jian He <[email protected]> wrote:
> >>>
> >>> > +1,
> >>> >
> >>> > Carlo and Subru, great job! Thanks for your contribution!
> >>> > I reviewed a couple of CapacityScheduler related patches, they are in
> >>> > good shape. At a minimum, they do not affect existing behavior, so
> >>> > they should be safe to merge.
> >>> >
> >>> > Jian
> >>> >
> >>> > On Wed, Oct 1, 2014 at 2:46 AM, Thomas Jungblut <[email protected]>
> >>> > wrote:
> >>> >
> >>> > > +1 (non-binding)
> >>> > > Thanks for adding this, really useful feature.
> >>> > >
> >>> > > On 30 September 2014 19:40, Chris Douglas <[email protected]> wrote:
> >>> > >
> >>> > > > +1
> >>> > > >
> >>> > > > Excellent work, Carlo and Subru. -C
> >>> > > >
> >>> > > > On Fri, Sep 26, 2014 at 11:50 AM, Carlo Curino <[email protected]>
> >>> > > > wrote:
> >>> > > > > (Apologies if it is delivered twice.)
> >>> > > > >
> >>> > > > > YARN Devs,
> >>> > > > >
> >>> > > > > We propose to merge YARN-1051 development branch into trunk.
> >>> > > > >
> >>> > > > > Key Idea:
> >>> > > > > This work adds support for Reservations to YARN RM. The key idea is
> >>> > > > > to allow users to request dedicated access to resources (a
> >>> > > > > reservation), ahead of time. For example I can ask for "10 containers
> >>> > > > > for 1 hour sometime between 4pm and 9pm today". The RM keeps track of
> >>> > > > > the accepted reservations by means of a Plan (think of it as an agenda
> >>> > > > > of how the cluster resources will be used), and performs admission
> >>> > > > > control to guarantee that if a reservation is accepted, enough
> >>> > > > > resources are set aside to satisfy it. We enforce the reservation
> >>> > > > > promises by dynamically creating/resizing/removing queues at the right
> >>> > > > > time. This allows us to leverage the existing schedulers for the
> >>> > > > > actual container assignment and tracking. The key benefit is to expose
> >>> > > > > allocation flexibility to the scheduler, while guaranteeing users
> >>> > > > > predictable resource allocation.
> >>> > > > >
> >>> > > > > Status
> >>> > > > >
> >>> > > > > * The work has been "broken down" into 14 subtasks (+3 patches
> >>> > > > > already committed to trunk for move/kill of apps). All the issues
> >>> > > > > have been resolved.
> >>> > > > >
> >>> > > > > * Jenkins +1 the patch (with the exception of one test failure
> >>> > > > > which we did not introduce, which is tracked here:
> >>> > > > > https://issues.apache.org/jira/browse/MAPREDUCE-6094)
> >>> > > > >
> >>> > > > > * Simple integration with MapReduce:
> >>> > > > > https://issues.apache.org/jira/browse/MAPREDUCE-6103
> >>> > > > >
> >>> > > > > * The broken-down patches have been reviewed and +1ed by Vinod
> >>> > > > > Kumar Vavilapalli, Jian He, Wangda Tan, Karthik Kambatla, and Chris
> >>> > > > > Douglas. Thanks to all of you for the thorough reviews!
> >>> > > > >
> >>> > > > > * The current version has been rather thoroughly tested by
> >>> > > > > running it on our 250-machine research cluster for months (the first
> >>> > > > > prototype was operational about a year ago) by:
> >>> > > > >
> >>> > > > > o Running hundreds of thousands of jobs generated by a modified
> >>> > > > > version of gridmix that exercises the reservation mechanism side by
> >>> > > > > side with normal queues.
> >>> > > > >
> >>> > > > > o Supporting our integration with the resource estimation framework
> >>> > > > > Perforator (http://research.microsoft.com/pubs/178971/perforator.pdf).
> >>> > > > > Kaushik and Dharmesh have been pounding the reservation system for
> >>> > > > > their research for 3-4 months now, and helped us spot a few bugs and
> >>> > > > > iron them out.
> >>> > > > >
> >>> > > > > o Having the code inspected/extended by 4-5 other researchers who
> >>> > > > > are exploring integration with other systems and extensions of our
> >>> > > > > algorithms for "reservation placement".
> >>> > > > >
> >>> > > > > * We have a few ideas for follow-up extensions/improvements,
> >>> > > > > tracked by the umbrella JIRA
> >>> > > > > https://issues.apache.org/jira/browse/YARN-2572
> >>> > > > >
> >>> > > > > Documents and Deliverables
> >>> > > > >
> >>> > > > > * This work was accepted for publication to SoCC 2014
> >>> > > > > (pre-camera ready version of the paper here):
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12671498/socc14-paper15.pdf
> >>> > > > >
> >>> > > > > * Shorter design doc:
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12628330/YARN-1051-design.pdf
> >>> > > > >
> >>> > > > > * Overall patch:
> >>> > > > > https://issues.apache.org/jira/secure/attachment/12671361/YARN-1051.1.patch
> >>> > > > >
> >>> > > > > * Per Karthik's request we are preparing a small how-to document
> >>> > > > > and example code/configuration tracked by
> >>> > > > > https://issues.apache.org/jira/browse/YARN-2609
> >>> > > > >
> >>> > > > > Credits
> >>> > > > > Subru and I did lots of the coding (hence the flow of patches from
> >>> > > > > us), but this is a group effort that would not have been possible
> >>> > > > > without the ideas and hard work of many other folks in our research
> >>> > > > > group (Microsoft-CISL). Major kudos to: Chris Douglas, Sriram Rao,
> >>> > > > > Raghu Ramakrishnan, and our intern Djellel Difallah. Also big thanks
> >>> > > > > to the many folks in the community (Arun, Vinod, Alejandro, Bikas,
> >>> > > > > Karthik, Sandy, Hitesh, Jakob, Mohammad, Mayank, Jason, Bobby, and
> >>> > > > > many more) that helped us shape our ideas and code with very
> >>> > > > > insightful feedback and comments.
> >>> > > > >
> >>> > > > > We expect the vote to run for the usual 7 days and to expire at 12pm
> >>> > > > > PDT on Oct 3. Please feel free to reach out to us if you have any
> >>> > > > > questions/doubts.
> >>> > > > >
> >>> > > > > Cheers,
> >>> > > > > Carlo & Subru
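
For readers new to the thread, here is a rough illustration of the admission-control idea from the quoted proposal: a reservation is accepted only if enough resources can be set aside for it somewhere inside its time window. This is a toy sketch, not the YARN-1051 implementation. The `Plan` class, the `try_reserve` method, the hour-long slots, and the first-fit placement are all assumptions made for the example, which reuses the "10 containers for 1 hour sometime between 4pm and 9pm" request from the proposal.

```python
from typing import Dict, Optional


class Plan:
    """Toy agenda of containers promised per hour-long slot (hypothetical, not YARN code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity        # total containers available to reservations
        self.used: Dict[int, int] = {}  # slot (hour of day) -> containers already promised

    def _fits(self, start: int, duration: int, containers: int) -> bool:
        # The request fits if every slot it covers still has enough free capacity.
        return all(
            self.used.get(t, 0) + containers <= self.capacity
            for t in range(start, start + duration)
        )

    def try_reserve(self, containers: int, duration: int,
                    earliest: int, deadline: int) -> Optional[int]:
        """Admit the request at the earliest feasible start hour, or return None to reject it."""
        # Assumed first-fit placement: scan candidate start hours in order.
        for start in range(earliest, deadline - duration + 1):
            if self._fits(start, duration, containers):
                for t in range(start, start + duration):
                    self.used[t] = self.used.get(t, 0) + containers
                return start
        return None


plan = Plan(capacity=15)
# "10 containers for 1 hour sometime between 4pm and 9pm today"
first = plan.try_reserve(containers=10, duration=1, earliest=16, deadline=21)
second = plan.try_reserve(containers=10, duration=1, earliest=16, deadline=21)
# A request that needs the whole window cannot be satisfied any more.
third = plan.try_reserve(containers=10, duration=5, earliest=16, deadline=21)
print(first, second, third)
```

With a capacity of 15 containers, the first request is placed at 4pm, an identical second request is pushed to 5pm, and the 5-hour request over the same window is rejected because the 4pm slot is already partly promised.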
