Re: [DISCUSS] Developing features in branches

Steve Loughran Fri, 01 May 2015 03:00:30 -0700

People are already doing work in branches, on private GitHub repos, with their 
own personal review/commit policy.


Branches in the Apache codebase would permit more structured collaboration 
between committers. But to get the same level of review as a final patch, they 
probably need supervision as they go along.

The strengths of doing things in branches are
-no half-complete patches in trunk, either during the work or otherwise
-one person's work-in-progress doesn't break other people's work.


The weaknesses?
 -the longer-lived the branch, the harder the merge. You can reduce the impact 
by rebasing the branch, which you can't do on shared branches, or simply 
through regular merges from trunk (which complicates the graph).
 -there's potentially more of a tendency to accept a long-lived branch 
without enough final review, on the basis that the work has been ongoing for 
longer.
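
To illustrate the rebase-versus-merge trade-off, here is a minimal sketch in a
throwaway repository. The branch names (feature-shared, feature-private) and
file names are hypothetical, chosen only for this example; it is not a
description of any project's actual workflow.

```shell
#!/bin/sh
# Sketch only: all branch and file names below are made up for illustration.
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/trunk   # name the initial branch "trunk"
git config user.email "dev@example.invalid"
git config user.name "Example Dev"

echo base > core.txt
git add core.txt
git commit -q -m "trunk: initial commit"

# A shared feature branch, with one commit of its own.
git checkout -q -b feature-shared trunk
echo shared > shared.txt
git add shared.txt
git commit -q -m "feature-shared: add work"

# A private feature branch, also with one commit of its own.
git checkout -q -b feature-private trunk
echo private > private.txt
git add private.txt
git commit -q -m "feature-private: add work"

# Meanwhile trunk moves on.
git checkout -q trunk
echo fix > fix.txt
git add fix.txt
git commit -q -m "trunk: unrelated fix"

# Shared branch: regular merge from trunk. History is preserved, but the
# graph gains a merge commit with two parents.
git checkout -q feature-shared
git merge -q --no-edit trunk
shared_words=$(git rev-list --parents -n 1 HEAD | wc -w)

# Private branch: rebase onto trunk. History stays linear, but the branch's
# commits are rewritten, which is why this is unsuitable for shared branches.
git checkout -q feature-private
git rebase -q trunk
private_words=$(git rev-list --parents -n 1 HEAD | wc -w)
```

Here `git rev-list --parents` prints the commit followed by its parents, so the
word count is one more than the number of parents: the merged branch tip has
two parents, the rebased branch tip has one.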

whatever: it works for HDFS

> On 1 May 2015, at 05:14, Bikas Saha <bi...@hortonworks.com> wrote:
> 
> In other words, the best solution is careful up-front design and breaking up 
> changes so that they can be made incrementally. At that point working on 
> master is not much different from working in a branch. However, if a set of 
> changes can be made inside a branch in a non-disruptive manner and people 
> want the extra work of maintaining a branch, then that choice could be made. 
> E.g. there could be parts which are less thought out and more risky. 
> Preparatory patches could go into master first, with the isolated riskier 
> changes then done in a branch. This would make sense when the riskier change 
> is worth 5-10 JIRAs or more, i.e. the work is substantial enough that it 
> needs multiple JIRAs over multiple weeks to get to completion, so avoiding 
> master is beneficial. That's the way I would think about it.
> 
> -----Original Message-----
> From: Chris Nauroth [mailto:cnaur...@hortonworks.com] 
> Sent: Thursday, April 30, 2015 3:46 PM
> To: yarn-dev@hadoop.apache.org
> Subject: Re: [DISCUSS] Developing features in branches
> 
> In HDFS, our recent feature branches tried to keep large portions of their 
> new code in new classes (e.g.
> org.apache.hadoop.hdfs.server.namenode.CacheManager) or even new Java 
> packages (e.g. org.apache.hadoop.hdfs.server.namenode.snapshot).  We tried to 
> make minimal changes in existing code: just enough to hook into the new code. 
>  If hooking into the new code isn't easy for some reason, then sometimes you 
> can submit a non-impactful refactoring patch to trunk to help make it easier. 
>  By submitting straightforward refactorings to trunk first, you can reduce 
> some of the difficulty of reviewing a large consolidated patch at merge time. 
>  Reviewers can focus on the new logic.
> 
> This tends to minimize the impact of merge conflicts coming from either trunk 
> or a sibling feature branch.  This is only possible if it's a logically 
> distinct new feature and this kind of code organization makes sense for that 
> feature, but it's something to keep in mind.
> 
> --Chris Nauroth
> 
> 
> 
> 
> On 4/30/15, 3:23 PM, "Zhijie Shen" <zs...@hortonworks.com> wrote:
> 
>> Exactly. Branch development is good, but I am concerned about too many 
>> concurrent branches. In terms of code management, good candidates for 
>> branch development are features like the registry, shared cache and 
>> timeline service. Most of their changes are incremental code in a new 
>> sub-module, are less likely to conflict with trunk/branch-2, and are 
>> rarely depended on by other parallel development.
>> 
>> Thanks,
>> Zhijie
>> ________________________________________
>> From: Bikas Saha <bi...@hortonworks.com>
>> Sent: Thursday, April 30, 2015 12:52 PM
>> To: yarn-dev@hadoop.apache.org
>> Subject: RE: [DISCUSS] Developing features in branches
>> 
>> I think what Zhijie is talking about is a little different. Pieces of 
>> work happening in parallel across two branches have no clue about each 
>> other, since they don't get updates via master. If we try to merge a 
>> bunch of these branches close to a release, there are likely to be a 
>> lot of surprises. As an example, let's say support for speculation and 
>> node labels were happening in separate branches. It is very likely that 
>> more than 50% of the code would conflict, not just in code but also in 
>> semantics.
>> 
>> Bikas
>> 
>> -----Original Message-----
>> From: Ray Chiang [mailto:rchi...@cloudera.com]
>> Sent: Thursday, April 30, 2015 10:35 AM
>> To: yarn-dev@hadoop.apache.org
>> Subject: Re: [DISCUSS] Developing features in branches
>> 
>> Following up on Zhijie's comments, there's nothing to prevent 
>> periodically pulling updates from the "main" branch (e.g. branch-2 or
>> trunk) into the feature branch, is there?  Or cherry-picking some 
>> changes to alleviate conflict management during branch merging?
>> 
>> I've seen other projects use one of the two techniques above.
>> 
>> -Ray
>> 
>> 
>> On Wed, Apr 29, 2015 at 9:43 PM, Zhijie Shen <zs...@hortonworks.com>
>> wrote:
>> 
>>> My 2 cents:
>>> 
>>> Branch maintenance cost should be fine if we have few features being 
>>> developed in branches. However, if there are too many, each branch 
>>> may be blind to most of the latest code changes in the others, and
>>> trunk/branch-2 becomes stale. So with the increasing adoption of 
>>> branch development, the cost of merging each branch back is likely 
>>> to increase.
>>> 
>>> Some features may span more than one release, such as RM restart 
>>> previously and the timeline service now. Even if such a feature is 
>>> developed in a branch, we may want to merge its milestones, such as 
>>> phase 1 and phase 2, back to trunk/branch-2 to align with some 
>>> release before it's completely done. Moreover, my experience is that 
>>> the longer a feature stays in the branch, the more conflicts we have 
>>> to resolve when merging. Hence, it may not be a good idea to hold a 
>>> feature in the branch too long before merging it back.
>>> 
>>> Thanks,
>>> Zhijie
>>> ________________________________________
>>> From: Subramaniam V K <subru...@gmail.com>
>>> Sent: Wednesday, April 29, 2015 7:16 PM
>>> To: yarn-dev@hadoop.apache.org
>>> Subject: Re: [DISCUSS] Developing features in branches
>>> 
>>> Karthik, thanks for starting the thread.
>>> 
>>> Here's my $0.02 based on the experience of working on a feature 
>>> branch while adding reservations (YARN-1051).
>>> 
>>> Overall a +1 for the approach.
>>> 
>>> The couple of pain points we faced were:
>>> 1) Merge cost with trunk
>>> 2) Lack of CI in the feature branch
>>> 
>>> The migration to git and keeping the feature branch in continuous sync 
>>> with trunk mitigated (1), and with Allen's new test-patch.sh 
>>> addressing (2), feature branches, especially if used for all major 
>>> features, seem like an excellent choice.
>>> 
>>> -Subru
>>> 
>>> On Tue, Apr 28, 2015 at 5:47 PM, Sangjin Lee <sjl...@gmail.com> wrote:
>>> 
>>>> Ah, I missed that part (obviously). Fantastic!
>>>> 
>>>> On Tue, Apr 28, 2015 at 5:31 PM, Sean Busbey <bus...@cloudera.com> wrote:
>>>> 
>>>>> On Apr 28, 2015 5:59 PM, "Sangjin Lee" <sjl...@gmail.com> wrote:
>>>>>> 
>>>>> 
>>>>>> That said, in a way we're deferring the cost of cleaning things up
>>>>>> towards the end of the branch. For example, we don't get the same
>>>>>> treatment of the hadoop jenkins in a branch development. It's left up
>>>>>> to the group or the individuals to make sure to run test-patch.sh to
>>>>>> ensure tech debt does not accumulate.
>>>>> 
>>>>> As Allen previously mentioned, the QA bot will run test-patch 
>>>>> against feature branches so long as you name the patch file correctly.
>>>>> 
>>>> 
>>> 
>
