Catching up on this thread. I'd missed Wangda's comment. This was merged without testing it end-to-end? And backported to branch-2?
I share Karthik's reservations. This is a very aggressive merging strategy. Every other feature merged to branch-2 must also help stabilize YARN-1197. Going through the patch, the particular changes are not terribly threatening and the passing unit tests are encouraging. However, a few days' delay to test the feature is not asking much, particularly when you have reason to be confident in its stability. -0, after the fact. -C On Thu, Sep 24, 2015 at 11:39 AM, Wangda Tan <[email protected]> wrote: > Trunk build seems stable now after the merge. A couple of yarn trunk builds > also ran fine such as: https://builds.apache.org/job/Hadoop-Yarn-trunk/1169/ > . > > I've merged this into branch-2 so as to minimize YARN dev effort across > branches as per the discussion thread of YARN-1197 merge. > > Thanks, > Wangda > > > On Wed, Sep 23, 2015 at 3:15 PM, Wangda Tan <[email protected]> wrote: > >> Thanks Karthik, >> >> Doing end-to-end test as well as regression test is the top priority of >> this feature. Will keep you updated. >> >> Regards, >> Wangda >> >> On Wed, Sep 23, 2015 at 3:01 PM, Karthik Kambatla <[email protected]> >> wrote: >> >>> I did skim through the patch before voting :) >>> >>> As I mentioned earlier, I understand vast majority of the code is net new, >>> and the likelihood of breaking changes is low. Still, I would like for us >>> to be more careful and run some end-to-end and regression tests before >>> including this in a release. While I won't block the merge to branch-2, >>> will be a -1 on cutting a release without due testing. >>> >>> At the risk of digressing, I also feel branch-2 should be in a releasable >>> state more often than not. The only way to bring any cadence and >>> predictability to our releases is by maintaining branch-2 in such a state. >>> >>> On Tue, Sep 22, 2015 at 8:14 PM, Wangda Tan <[email protected]> wrote: >>> >>> > Hi Karthik, >>> > >>> > Let me elaborate more to make you feel better of this change, don't be >>> > scared by the size of the patch :) >>> > >>> > Common RM/Scheduler part: >>> > - AbstractYarnScheduler new logic only. >>> > - AppSchedulingInfo new logic only. >>> > - RMContainer / RMNode state machine, new logic only >>> > - SchedulerApplicationAttempt / Allocation, refactoring to existing >>> > reservation logic so increase request reservation can reuse it, and >>> > refactored to simply updating container token / pull container part so >>> > increase/decrease/new-allocation can reuse same code. >>> > >>> > FairScheduler: >>> > - Small change since we updated how to pull container updated token. I >>> > believe it will be a straightforward change for you if you take a closer >>> > look at it. >>> > >>> > CapacityScheduler: >>> > - Most changes are separate logic or small refactorings, most complex >>> > allocation logic stays within IncreaseContainerAllocator.java. >>> > >>> > Please let me know where you want to get more details of >>> implementations. >>> > >>> > I strongly suggest you to take a glance at the diff, we have already >>> worked >>> > on the merge for the past one week, and we've paid a lot of extra time >>> to >>> > keep YARN-1197 sync with trunk in the past several months. After this >>> merge >>> > finished, a couple of weeks needed to finish end-to-end test and some >>> other >>> > extra tests, it won't affect our upcoming branch-2 release. >>> > >>> > I would not prefer to merge to trunk only, all people working on RM side >>> > will be affected, we're very carefully avoid such divergence of RM in >>> > trunk/branch-2. Since nobody wants to create two different patches for >>> > every RM changes. And also, after this finished, other efforts can >>> happen >>> > in parallel such as YARN-4091. >>> > >>> > Let me know if you have any other questions/concerns. >>> > >>> > Thanks, >>> > Wangda >>> > >>> > On Tue, Sep 22, 2015 at 7:26 PM, Karthik Kambatla <[email protected]> >>> > wrote: >>> > >>> > > I am sorry, but merging a potentially disruptive change to branch-2 >>> > without >>> > > end-to-end tests seems too disruptive to me. >>> > > >>> > > I do agree with you on the potential inconvenience of having to post >>> > > different patches for trunk and branch-2, but I would rather have that >>> > > inconvenience than the risk of merging something that hasn't been >>> > > thoroughly tested. >>> > > >>> > > On Tue, Sep 22, 2015 at 6:18 PM, Wangda Tan <[email protected]> >>> wrote: >>> > > >>> > > > Hi Karthik, >>> > > > >>> > > > Thanks for comments! However, I think only merge to trunk may not >>> work, >>> > > > this patch involves thousands lines of code changes in scheduler >>> side, >>> > > only >>> > > > putting that to trunk could lead to trunk/branch-2 totally >>> incompatible >>> > > for >>> > > > resource manager. I think most of the code changes are new to >>> scheduler >>> > > > instead of modifying existed logic, they're not very tricky to me. >>> And >>> > > when >>> > > > 2.8 will be released is not planned yet, at least we have a couple >>> of >>> > > > months to make sure this feature becomes usable and not cause >>> existing >>> > > > behavior regressions. >>> > > > >>> > > > Sounds good to you? >>> > > > >>> > > > Wangda >>> > > > >>> > > > >>> > > > On Tue, Sep 22, 2015 at 3:48 PM, Karthik Kambatla < >>> [email protected]> >>> > > > wrote: >>> > > > >>> > > > > +1 on merging to trunk. It would be nice to have some amount of >>> > testing >>> > > > > done before the merge, but I understand how merging to trunk would >>> > > likely >>> > > > > speed up the testing efforts. >>> > > > > >>> > > > > Let us not merge into branch-2 until after we have done a fair >>> bit of >>> > > > > testing, and are comfortable including it in a release. While the >>> > code >>> > > > > mostly appears to not mess with existing scheduling logic, I am >>> > > concerned >>> > > > > about regressions to existing scheduling behavior. >>> > > > > >>> > > > > >>> > > > > >>> > > > > On Tue, Sep 22, 2015 at 1:28 PM, Karthik Kambatla < >>> > [email protected]> >>> > > > > wrote: >>> > > > > >>> > > > > > By the way, for the purposes of merge vote, I believe a >>> committer's >>> > > > vote >>> > > > > > is binding. So, Wangda and Zhihai's votes should be binding. :) >>> > > > > > >>> > > > > > On Tue, Sep 22, 2015 at 11:38 AM, Zhihai Xu <[email protected]> >>> > > wrote: >>> > > > > > >>> > > > > >> +1 (non-binding) >>> > > > > >> >>> > > > > >> thanks >>> > > > > >> Zhihai Xu >>> > > > > >> >>> > > > > >> On Tue, Sep 22, 2015 at 12:10 AM, Xuan Gong < >>> > [email protected]> >>> > > > > >> wrote: >>> > > > > >> >>> > > > > >> > +1 Binding >>> > > > > >> > >>> > > > > >> > Thanks >>> > > > > >> > >>> > > > > >> > Xuan Gong >>> > > > > >> > >>> > > > > >> > > On Sep 22, 2015, at 12:03 AM, Junping Du < >>> [email protected] >>> > > >>> > > > > wrote: >>> > > > > >> > > >>> > > > > >> > > +1. (Binding). >>> > > > > >> > > >>> > > > > >> > > Thanks, >>> > > > > >> > > >>> > > > > >> > > Junping >>> > > > > >> > > ________________________________________ >>> > > > > >> > > From: Wangda Tan <[email protected]> >>> > > > > >> > > Sent: Thursday, September 17, 2015 3:19 AM >>> > > > > >> > > To: [email protected] >>> > > > > >> > > Subject: Re: [VOTE] Merge YARN-1197 container resize into >>> > trunk >>> > > > > >> > > >>> > > > > >> > > +1 (non-binding), >>> > > > > >> > > >>> > > > > >> > > Thanks Jian starting this thread. This can minimize effort >>> of >>> > > > works >>> > > > > >> > across branches. >>> > > > > >> > > >>> > > > > >> > > To clarify, this feature is end-to-end code completed, we >>> > have >>> > > > API, >>> > > > > >> > rm/nm implementations patches committed, but we haven't >>> tested >>> > it >>> > > > > >> > end-to-end. Filed YARN-4175 to create an example program to >>> test >>> > > it >>> > > > > >> > end-to-end. >>> > > > > >> > > >>> > > > > >> > > Regards, >>> > > > > >> > > Wangda >>> > > > > >> > > >>> > > > > >> > >> On Sep 16, 2015, at 6:30 PM, Jian He <[email protected] >>> > >>> > > > wrote: >>> > > > > >> > >> >>> > > > > >> > >> Hi All, >>> > > > > >> > >> >>> > > > > >> > >> Thanks Meng Ding and Wangda Tan for all the hard work ! >>> > > > > >> > >> >>> > > > > >> > >> I would like to call a vote to merge YARN-1197 container >>> > resize >>> > > > > into >>> > > > > >> > trunk. >>> > > > > >> > >> >>> > > > > >> > >> Key idea: >>> > > > > >> > >> This feature adds the ability for AM to change container >>> > > resource >>> > > > > >> size >>> > > > > >> > at runtime. >>> > > > > >> > >> >>> > > > > >> > >> Details: >>> > > > > >> > >> - This feature is tracked at >>> > > > > >> > https://issues.apache.org/jira/browse/YARN-1197 >>> > > > > >> > >> - It’s currently developed at a separate branch: >>> > > > > >> > https://github.com/apache/hadoop/commits/YARN-1197 >>> > > > > >> > >> - A uber patch( >>> > https://issues.apache.org/jira/browse/YARN-4157 >>> > > ) >>> > > > > >> > generated from YARN-1197 to run against trunk shows all unit >>> > > tests >>> > > > > have >>> > > > > >> > passed. >>> > > > > >> > >> - This feature now can work end-to-end. >>> > > > > >> > >> - All the unresolved jiras under YARN-1197 will be the >>> next >>> > > > step. >>> > > > > >> > >> >>> > > > > >> > >> Thanks, >>> > > > > >> > >> Wangda Tan & Meng Ding & Jian He >>> > > > > >> > > >>> > > > > >> > > >>> > > > > >> > >>> > > > > >> > >>> > > > > >> >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >> >>
