[sympy] Avoiding big branches

Aaron Meurer Mon, 25 Jul 2011 15:36:38 -0700

Hi everyone.

I wrote a blog post about this
(http://asmeurersympy.wordpress.com/2011/07/25/merging-integration3-with-sympy-0-7-0-nightmare/),
but I think it's important enough that it should be discussed here.


You can read my blog post for the details, but basically I spent the
past three weeks on and off merging master into my integration3
development branch.  I'm not yet ready to submit this for review: I
just did the merge so that I could get some upstream fixes.  But, for
whatever reason, git decided that I needed to review basically every
change made in master since the base of my branch.

Even apart from that, I found at least two regressions in the
polys and two regressions outside the polys, had to make several
changes to my code to make it run again (like replacing as_basic()
with as_expr()), and had to manually check the diff of the merge to
make sure I handled all the merge conflicts correctly.

We talked about this a little back when we did the polys12 merge, but
I want to reiterate it.  Having big branches like this is very bad.
Keeping things up to date with master is a nightmare.  Furthermore,
there are bound to be regressions in master that are not noticed
because they only trigger test failures in your branch.

Therefore, I recommend that anyone who has a big branch of work try to
get it merged in, and that in the future you submit your fixes as
separate small (but still meaningful) pull requests.  git makes it easy
to work with several small branches; much easier than working with one
big branch, in my opinion.  This is especially
true if you are like me and do not like rebasing.  You can just merge
your small branches together, and the merges will be easier because
the size of the changes will be smaller.  And since we basically never
rebase pull requests any more due to the merge button, you can usually
just use the exact same commits.

But even if your git habits are more rebase oriented, you should still
do this, because rebases can be just as painful as merges for big
branches.  And regardless of your merging habits, you still have the
same scenario where someone changes something in master and makes
fixes everywhere where necessary, but this does not include your code,
because it's separate from master.   For example, we recently changed
the printer to use lexicographic ordering by default.  So all the
doctests in SymPy had to be updated.  Anyone who had a development
branch had to change his own doctests when he merged/rebased over this
change.  But if those dev changes were merged with master, they would
have been fixed by Mateusz when he changed all the docstrings in
SymPy.
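
To make the doctest point concrete, here is a toy sketch in plain
Python.  None of this is SymPy code: render(), in_master() and
in_branch() are hypothetical names, and sorting strings only stands in
for the real printer ordering.  It just shows that the doctest kept in
master gets updated along with the printer, while the identical
doctest sitting in an unmerged branch goes stale.

```python
import doctest


def render(terms):
    """Toy printer as it looks on master after the ordering change."""
    return " + ".join(sorted(terms))


def in_master():
    """Doctest that lived in master, so it was updated with the printer.

    >>> render(["y", "x", "1"])
    '1 + x + y'
    """


def in_branch():
    """Doctest in an unmerged branch, still expecting the old order.

    >>> render(["y", "x", "1"])
    'y + x + 1'
    """


def doctest_failures(func):
    """Run the doctests in func's docstring and return the failure count."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {"render": render}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the count matters here.
    result = runner.run(test, out=lambda text: None)
    return result.failed


if __name__ == "__main__":
    print("master failures:", doctest_failures(in_master))
    print("branch failures:", doctest_failures(in_branch))
```

The branch author only finds out about the stale doctest at the next
merge or rebase, which is exactly the extra work described above.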

I've noticed that Mateusz has stopped using the big branch model.
Rather, he submits all changes as pull requests to master.  And so far
we have not had anything near what we had with polys12.  The changes
are all reviewed (another problem with big branches is that they are
harder to review), merge conflicts are minimal, since only specific
pull requests get merge conflicts, rather than a whole big branch, and
it's easier to understand what is being done with each pull request.

Sure, some of his requests haven't been merged yet, but because of
those that *have* been merged, more of his work is in master than if he
had a polys13 with everything in it and none of it merged.  And
like I said, with git, it's very easy to take in the changes you need
locally if they haven't been merged yet.

Other experienced devs could probably also share their experiences
with this sort of thing.

I want to get this message across especially to our GSoC students, who
are the ones making the most changes right now, and also may not
remember all the polys12 stuff enough to see the problems that were
shown with it.

Some of the GSoC students are doing a good job of submitting smaller
changes back as pull requests, especially atomic changes that do not
require the rest of their work to be merged.  Others are not doing so
well, I think.  I could go through and tell you each how well you are
doing more specifically if you want.

Even if you have development work that is not ready for user-level
interaction, you should still get this merged with master.  Then
people will notice regressions against your code when your tests fail,
and if you allow some user interaction, for example, by turning it off
by default or by marking the module as unstable, some people will use
it, and find bugs in it for you.  You may even get some people
submitting patches fixing your code.  People do this with code that's
in master, but very rarely with code that's buried in a development
branch.
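
One way to ship unfinished work "off by default" might look like the
sketch below.  This is only an illustration under my own assumptions:
integrate_sketch(), the experimental keyword and ExperimentalWarning
are made-up names, not anything that exists in SymPy.

```python
import warnings


class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for unstable code paths."""


def integrate_sketch(expr, experimental=False):
    """Toy stand-in for a public entry point; not SymPy's integrate()."""
    if not experimental:
        # Default path: users only see the stable behavior.
        return ("stable", expr)
    # Opt-in path: the new code is in master and tested, but flagged.
    warnings.warn(
        "experimental algorithm enabled; results may change",
        ExperimentalWarning,
        stacklevel=2,
    )
    return ("experimental", expr)


if __name__ == "__main__":
    print(integrate_sketch("x"))
    print(integrate_sketch("x", experimental=True))
```

The new code still runs under the test suite in master, so regressions
against it get noticed, while ordinary users are not affected unless
they opt in.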

As for my branch, I still can't merge it because the code relies
pretty heavily on a regression I had to make, which was basically to
disable algebraic substitution in exp.subs.  So I am going to put
priority on fixing subs (see
http://groups.google.com/group/sympy/browse_thread/thread/4a19d0f39f51fda6#)
over any additional improvements to the Risch algorithm.  Then, when
this is fixed, I will submit my branch for review, and any additional
fixes I make, no matter how small, I will submit immediately as pull
requests, rather than storing them up in some dev branch.

Aaron Meurer

