Re: Bring together merge and rebase

2018-01-06 Thread Theodore Ts'o

On Sat, Jan 06, 2018 at 10:29:21AM -0700, Carl Baldwin wrote:
> > When n==m==1, "amended" pointer from X1 to A1 may allow you to
> > answer "Is this the first attempt?  If this is refined, what did the
> > earlier one look like?" when given X1, but you would also want to
> > answer a related question "This was a good start, but did the effort
> > result in a refined patch, and if so what is it?" when given A1, and
> > "amended" pointer won't help at all.  Needless to say, the "pointer"
> > approach breaks down when !(n==m==1).
> 
> It doesn't break down. It merely presents more sophisticated situations
> that may be more work for the tool to help out with. This is where I
> think a prototype will help see these situations and develop the tool to
> manage them.

That's another way of saying "break down".

And if the goal is a prototype, may I gently suggest that the way
forward is trailers in the commit body, ala:

Change-Id: I0b793feac9664bcc8935d8ec04ca16d5

or

Upstream-4.15-SHA1: 73875fc2b3934e45b4b9a94eb57ca8cd

Making changes in the commit header is complex, and has all *sorts* of
forward and backwards compatibility challenges, especially when it's
not clear what the proper data model should be.
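
A minimal sketch of how a prototype might read such trailers back out of
a commit message. The function name and the sample message are made up
for illustration, and the key pattern is deliberately permissive so that
it accepts the dotted key shown above; it is not git's own trailer
parsing rule.

```python
import re

# Permissive "Key: value" pattern (an assumption, not git's exact rule).
TRAILER_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9.-]*):\s*(.*)$")


def parse_trailers(message):
    """Return (key, value) pairs from the last paragraph of a commit message.

    Simplified: the final blank-line-separated paragraph counts as a
    trailer block only if every line in it looks like "Key: value".
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # a lone subject line is never treated as trailers
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # not a pure trailer block
        trailers.append((match.group(1), match.group(2)))
    return trailers


# Hypothetical commit message carrying the two trailers from this mail.
message = """Fix off-by-one in allocator

Longer explanation of the change.

Change-Id: I0b793feac9664bcc8935d8ec04ca16d5
Upstream-4.15-SHA1: 73875fc2b3934e45b4b9a94eb57ca8cd
"""

trailers = parse_trailers(message)
```

Because the data lives in the message body, older git versions simply
carry it along as text, which is the compatibility point being made.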

Cheers,

 -Ted

Re: Bring together merge and rebase

2018-01-06 Thread Carl Baldwin

On Sat, Jan 06, 2018 at 10:29:19AM -0700, Carl Baldwin wrote:
> To me, this is roughly equivalent to saying that parent pointers
> embedded in a commit object is a good idea because we want a richer
> relationship than mere "parent". Look how much we've done with this
> simple relationship. Similarly, the new relationship that I'm
> proposing handles much more than the simple m==n==1 case. Read below
> for more detail.

Of course, I meant to say "is not a good idea" in the above paragraph.
Please pardon my error.

Re: Bring together merge and rebase

2018-01-06 Thread Carl Baldwin

On Fri, Jan 05, 2018 at 12:14:28PM -0800, Junio C Hamano wrote:
> Martin Fick writes:
> 
> > These scenarios seem to come up most for me at Gerrit hack-
> > a-thons where we collaborate a lot in short time spans on 
> > changes.  We (the Gerrit maintainers) too have wanted and 
> > sometimes discussed ways to track the relation of "amended" 
> > commits (which is generally what Gerrit patchsets are).  We 
> > also concluded that some sort of parent commit pointer was 
> > needed, although parent is somewhat the wrong term since 
> > that already means something in git.  Rather, maybe some 
> > "predecessor" type of term would be better, maybe 
> > "antecedent", but "amended-commit" pointer might be best?
> 
> In general, I agree that you would want richer set of "relationship"
> than mere "predecessor" or "related", but I do not think "amended"
> is sufficient.  I certainly do not think a "pointer" embedded in a
> commit object is a good idea, either (a new commit object header is

To me, this is roughly equivalent to saying that parent pointers
embedded in a commit object is a good idea because we want a richer
relationship than mere "parent". Look how much we've done with this
simple relationship. Similarly, the new relationship that I'm proposing
handles much more than the simple m==n==1 case. Read below for more
detail.

> out of question, but I doubt it is a good idea to make a pointer
> back to an existing commit as a part of the log message).
> 
> You may have had a set of n patches A1, A2, ..., An, that turned
> into m patches X1, X2, ..., Xm, after refactoring.  During the work,
> it may have turned out that some things the original tried to do are
> not sensible and were dropped, while some other things were added in
> the final series.
> 
> When n==m==1, "amended" pointer from X1 to A1 may allow you to
> answer "Is this the first attempt?  If this is refined, what did the
> earlier one look like?" when given X1, but you would also want to
> answer a related question "This was a good start, but did the effort
> result in a refined patch, and if so what is it?" when given A1, and
> "amended" pointer won't help at all.  Needless to say, the "pointer"
> approach breaks down when !(n==m==1).

It doesn't break down. It merely presents more sophisticated situations
that may be more work for the tool to help out with. This is where I
think a prototype will help see these situations and develop the tool to
manage them.

When each of n commits is amended or rebased trivially into m==n new
commits then each change is represented by a distinct graph of
predecessors that can be followed independently of others. With rebase,
this is accomplished by using only "pick" in interactive mode or not
using interactive mode at all (and no autosquash).

The more sophisticated cases can be broken down into two operations that
change the number of resulting commits.

  1. Squashing two commits together ("fixup", "squash"). In this case,
     the resulting commit will have two or more pointers. This clearly
     shows that multiple changes converged into one at this point.

  2. Splitting a single commit into multiple new commits ("edit"). In
     this case, the graph shows multiple new commits pointing to the
     same predecessor. In my experience, this is less common. It also is
     a little more challenging to think about the tool managing
     divergent work but I think it is possible.

The end result is m commits where m can be any positive number (even,
coincidentally, n). However, the graph of amended commits still tells
the story quite well. Even if commits are reordered, the graphs can
still be useful. The predecessor graph is independent of the parent
graph which makes up normal git commit history so it isn't inherently
bad that the order of commits was changed.
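
A rough sketch of the predecessor graph described above, kept separate
from the parent graph. The commit names reuse the A/X naming from the
quoted mail, and the dictionary layout and function names are
assumptions made for illustration only. Since the pointers run from new
to old, the reverse question ("what did A1 turn into?") is answered here
by building an inverse index.

```python
from collections import defaultdict

# Hypothetical predecessor edges: new commit -> commits it replaces.
predecessors = {
    "X1": ["A1"],        # plain amend/rebase: one-to-one
    "X2": ["A2", "A3"],  # squash: two changes converged into one
    "X3": ["A4"],        # split: A4 was divided ...
    "X4": ["A4"],        # ... into X3 and X4
}


def successors_of(predecessors):
    """Invert the graph: old commit -> commits that replaced it."""
    inverse = defaultdict(list)
    for new, olds in predecessors.items():
        for old in olds:
            inverse[old].append(new)
    return dict(inverse)


def squashes(predecessors):
    """Commits with two or more predecessor pointers."""
    return sorted(c for c, olds in predecessors.items() if len(olds) >= 2)


def splits(predecessors):
    """Commits that two or more new commits point back to."""
    inverse = successors_of(predecessors)
    return sorted(old for old, news in inverse.items() if len(news) >= 2)
```

With this data, `squashes(predecessors)` reports `X2` and
`splits(predecessors)` reports `A4`; reordering the commits in the
parent graph would not change either answer.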

We can dream up some very interesting graphs. Sure, as we do
increasingly more complicated history rewriting, it is going to be
increasingly more difficult for the tool to help out. I'm not really
deterred by this at this point. I want to experiment and work it out
with a prototype.

My primary objective personally is to detect where work on a single
change has diverged by working on it from more than one workspace
whether it's multiple people chipping in or just me. Merely having the
ability to reject an update that clobbers divergent work is a big win.
No more silent corruption of work.

My secondary objective is to develop a tool to help get the divergent
work back on track. I believe that in the majority of common cases, this
tool can be successful in either finding an automatic way to bring the
divergent work back into a new revision of the change or present the
user with conflicts to resolve that end up being much easier than what
I've had to do in past experience with rebase workflows.

Carl

Re: Bring together merge and rebase

2018-01-05 Thread Junio C Hamano

Martin Fick writes:

> These scenarios seem to come up most for me at Gerrit hack-
> a-thons where we collaborate a lot in short time spans on 
> changes.  We (the Gerrit maintainers) too have wanted and 
> sometimes discussed ways to track the relation of "amended" 
> commits (which is generally what Gerrit patchsets are).  We 
> also concluded that some sort of parent commit pointer was 
> needed, although parent is somewhat the wrong term since 
> that already means something in git.  Rather, maybe some 
> "predecessor" type of term would be better, maybe 
> "antecedent", but "amended-commit" pointer might be best?

In general, I agree that you would want richer set of "relationship"
than mere "predecessor" or "related", but I do not think "amended"
is sufficient.  I certainly do not think a "pointer" embedded in a
commit object is a good idea, either (a new commit object header is
out of question, but I doubt it is a good idea to make a pointer
back to an existing commit as a part of the log message).

You may have had a set of n patches A1, A2, ..., An, that turned
into m patches X1, X2, ..., Xm, after refactoring.  During the work,
it may have turned out that some things the original tried to do are
not sensible and were dropped, while some other things were added in
the final series.

When n==m==1, "amended" pointer from X1 to A1 may allow you to
answer "Is this the first attempt?  If this is refined, what did the
earlier one look like?" when given X1, but you would also want to
answer a related question "This was a good start, but did the effort
result in a refined patch, and if so what is it?" when given A1, and
"amended" pointer won't help at all.  Needless to say, the "pointer"
approach breaks down when !(n==m==1).

Re: Bring together merge and rebase

2018-01-04 Thread Carl Baldwin

On Thu, Jan 04, 2018 at 10:09:19PM -0700, Carl Baldwin wrote:
> This would be very cool. I've wanted to tackle this for a long time. I
> think I even filed an issue with gerrit about this years ago.

Yep, it turned out that it was a duplicate but I described what I did to
work around it.

https://bugs.chromium.org/p/gerrit/issues/detail?id=2375

Re: Bring together merge and rebase

2018-01-04 Thread Carl Baldwin

On Thu, Jan 04, 2018 at 12:19:34PM -0700, Martin Fick wrote:
> On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller wrote:
> > On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin wrote:
> > >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin wrote:
> > >> A bit of a tangent here, but a thought I didn't wanna
> > >> lose: In the general case where a patch was rebased
> > >> and the original parent pointer was changed, it is
> > >> actually quite hard to show a diff of what changed
> > >> between versions.
> > 
> > My biggest gripes are that the gerrit web interface
> > doesn't itself do something like this (and jgit does not
> > appear to be able to generate combined diffs at all!)
> 
> I believe it now does, a presentation was given at the 
> Gerrit User summit in London describing this work.  It would 
> indeed be great if git could do this also!

This would be very cool. I've wanted to tackle this for a long time. I
think I even filed an issue with gerrit about this years ago.

Carl

Re: Bring together merge and rebase

2018-01-04 Thread Carl Baldwin

On Thu, Jan 04, 2018 at 01:06:27PM -0700, Martin Fick wrote:
> On Tuesday, December 26, 2017 01:31:55 PM Carl Baldwin 
> wrote:
> ...
> > What I propose is that gerrit and github could end up more
> > robust, featureful, and interoperable if they had this
> > feature to build from.
> 
> I agree (assuming we come up with a well defined feature)
> 
> > With gerrit specifically, adopting this feature would make
> > the "change" concept richer than it is now because it
> > could supersede the change-id in the commit message and
> > allow a change to evolve in a distributed non-linear way
> > with protection against clobbering work.
> 
> We (the Gerrit maintainers) would like changes to be able to 
> evolve non-linearly so that we can eventually support 
> distributed Gerrit reviews, and the amended-commit pointer 
> is one way I have thought to resolve this.

I really think that keeping these references is the key to doing this.

> > I have no intention to disparage either tool. I love them
> > both. They've both made my career better in different
> > ways. I know there is no guarantee that github, gerrit,
> > or any other tool will do anything to adopt this. But,
> > I'm hoping they are reading this thread and that they
> > recognize how this feature can make them a little bit
> > better and jump in and help. I know it is a lot to hope
> > for but I think it could be great if it happened.
> 
> We (the Gerrit maintainers) do recognize it, and I am glad 
> that someone is pushing for solutions in this space.  I am 
> not sure what the right solution is, and how to modify 
> workflows to deal better with this.  I do think that starting 
> by making your local repo track pointers to amended-commits, 
> likely with various git hooks and notes (as also proposed by 
> Johannes Schindelin), would be a good start.   With that in 
> place, then you can attack various specific workflows.

I have started a prototype that I will use to demonstrate this. I hope
to have something in a couple of weeks. I do have a day job also, so it
will be slow going. One idea that I had was to put my own server with
special hooks in it in front of gerrit to illustrate how collaboration
on a gerrit change, or even a chain of them, can be made safe. It would
act as a middle man between my client and the gerrit server. I'd just
have to change remote reference on my client to demonstrate.

> If you want to then attack the Gerrit workflow, it would be 
> good if you could prevent pushing new patchsets that are
> amended versions of patchsets that are out of date.  While
> it would be great if Gerrit could reject such pushes, I
> wonder if, to start, git could detect it and prevent the push
> in this situation?  Could a git push hook analyze the ref 
> advertisements and figure this out (all the patchsets are in 
> the advertisement)?  Can a git hook look at the ref 
> advertisement?

I'll think about this. At the least, the hook would have to look at the
server to see if there are new revisions. It would be difficult to close
race conditions that occur because the client will always be using
potentially out of date information even if it just went and pulled down
the latest stuff. I think I still like my middle man idea better as a
short term proof of concept.

Preventing pushing amended/rebased versions of out of date changes is
simple. Follow the "predecessor" references until you hit a known
commit. If that commit is the latest revision of the change then it is
up to date. If that commit is not the latest revision, then it is out of
date. Reject it. This is what I plan to illustrate in my middle man
server.

If you traverse the entire graph of predecessors without finding a known
commit, then you have a new change. (In fact, the changeset id in the
commit message in a gerrit change seems unnecessary at this point). It
gets a little more complicated when you think about combining/squashing
changes (resulting in two or more "predecessor" references from a single
commit) or dividing a change into multiple but it works.
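
A minimal sketch of that check, under simplifying assumptions: the
predecessor graph is a plain dictionary, the server tracks a single
change with one latest revision, and all names (`classify_push`, the
P/Q/R commits) are hypothetical. Squashes across several changes would
need more bookkeeping than this shows.

```python
def classify_push(new_commit, predecessors, known_commits, latest_revision):
    """Walk predecessor edges from new_commit until reaching known commits.

    Returns "new-change", "up-to-date", or "out-of-date".
    """
    seen = set()
    reached = set()  # known commits where the walk stopped
    stack = [new_commit]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        for old in predecessors.get(commit, []):
            if old in known_commits:
                reached.add(old)  # stop here: the server already has it
            else:
                stack.append(old)  # intermediate commit, keep walking
    if not reached:
        return "new-change"
    if reached == {latest_revision}:
        return "up-to-date"
    return "out-of-date"


# Hypothetical data: the server knows P1 <- P2 <- P3 and P3 is the latest.
predecessors = {
    "P2": ["P1"],
    "P3": ["P2"],
    "Q": ["P1"],    # amended from a stale copy
    "R1": ["P3"],   # rebase of the latest revision ...
    "R2": ["R1"],   # ... followed by an amend, both kept
}
known = {"P1", "P2", "P3"}
```

Here `Q` would be rejected as out of date, `R2` would be accepted, and a
commit with no predecessor edges at all would start a new change.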

The harder part is the push/pull interaction between client and server.
When you go to push your amended update to a patchset, you want git to
send along any other new commits to complete the predecessor graph on
the server side. For example, you might rebase your commit and then
amend it to fix something. Personally, I'd like the rebase and the amend
to both be kept separately.

Similarly, when you've just had a push rejected because you're out of
date, you want to be able to easily pull down the commits you're missing
so that you can merge locally and try to push again.

You also don't want gc to garbage collect the intermediate commits. I
think gerrit uses many references internally in the git repo to "pin"
older revisions in the repository so that they don't appear orphaned. I
think I'm going to have to do something similar in my prototype.
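
One way a prototype might pin those commits is to give each one its own
ref, since gc keeps anything reachable from a ref. The sketch below only
builds the `git update-ref` command lines; the `refs/predecessors`
namespace and the function name are assumptions, not something Gerrit or
git defines.

```python
def pin_commands(revisions, namespace="refs/predecessors"):
    """Build one `git update-ref` command per revision to keep it reachable."""
    return [
        ["git", "update-ref", f"{namespace}/{sha}", sha]
        for sha in revisions
    ]


# Placeholder object names, for illustration only.
commands = pin_commands(["abc123", "def456"])
```

Each command list could then be handed to `subprocess.run(cmd, check=True)`
inside the repository.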

If you think about it, this is all very much like what git already does
with its

Re: Bring together merge and rebase

2018-01-04 Thread Carl Baldwin

On Thu, Jan 04, 2018 at 12:54:00PM -0700, Martin Fick wrote:
> On Monday, December 25, 2017 06:16:40 PM Carl Baldwin wrote:
> > On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o 
> wrote:
> > Look at what happens in a rebase type workflow in any of
> > the following scenarios. All of these came up regularly
> > in my time with Gerrit.
> > 
> > 1. Make a quick edit through the web UI then later
> > work on the change again in your local clone. It is easy
> > to forget to pull down the change made through the UI
> > before starting to work on it again. If that happens, the
> > change made through the UI will almost certainly be
> > clobbered.
> > 
> > 2. You or someone else creates a second change that is
> > dependent on yours and works on it while yours is still
> > evolving. If the second change gets rebased with an older
> > copy of the base change and then posted back up for
> > review, newer work in the base change has just been
> > clobbered.
> > 
> > 3. As a reviewer, you decide the best way to explain
> > how you'd like to see something done differently is to
> > make the quick change yourself and push it up. If the
> > author fails to fetch what you pushed before continuing
> > onto something else, it gets clobbered.
> > 
> > 4. You want to collaborate on a single change with
> > someone else in any way and for whatever reason. As soon
> > as that change starts hitting multiple work spaces, there
> > are synchronization issues that currently take careful
> > manual intervention.
> 
> These scenarios seem to come up most for me at Gerrit hack-
> a-thons where we collaborate a lot in short time spans on 
> changes.  We (the Gerrit maintainers) too have wanted and 
> sometimes discussed ways to track the relation of "amended" 
> commits (which is generally what Gerrit patchsets are).  We 
> also concluded that some sort of parent commit pointer was 
> needed, although parent is somewhat the wrong term since 
> that already means something in git.  Rather, maybe some 
> "predecessor" type of term would be better, maybe 
> "antecedent", but "amended-commit" pointer might be best?

I like "replaces" as I have proposed or "supersedes". "predecessor" also
seems pretty good. I may add that to my list of favorites.

Carl

Re: Bring together merge and rebase

2018-01-04 Thread Martin Fick

> On Jan 4, 2018 11:19 AM, "Martin Fick" wrote:
> > On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller wrote:
> > > On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin wrote:
> > > >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin wrote:
> > > >> A bit of a tangent here, but a thought I didn't wanna
> > > >> lose: In the general case where a patch was rebased
> > > >> and the original parent pointer was changed, it is
> > > >> actually quite hard to show a diff of what changed
> > > >> between versions.
> > >
> > > My biggest gripes are that the gerrit web interface
> > > doesn't itself do something like this (and jgit does not
> > > appear to be able to generate combined diffs at all!)
> >
> > I believe it now does, a presentation was given at the
> > Gerrit User summit in London describing this work.  It
> > would indeed be great if git could do this also!


On Thursday, January 04, 2018 04:02:40 PM Jacob Keller wrote:
> Any chance slides or a recording was posted anywhere? I'm
> quite interested in this topic.

Slides and video + transcript here:

https://gerrit.googlesource.com/summit/2017/+/master/sessions/new-in-2.15.md

Watch the part after the backend improvements,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

Re: Bring together merge and rebase

2018-01-04 Thread Martin Fick

On Tuesday, December 26, 2017 01:31:55 PM Carl Baldwin 
wrote:
...
> What I propose is that gerrit and github could end up more
> robust, featureful, and interoperable if they had this
> feature to build from.

I agree (assuming we come up with a well defined feature)

> With gerrit specifically, adopting this feature would make
> the "change" concept richer than it is now because it
> could supersede the change-id in the commit message and
> allow a change to evolve in a distributed non-linear way
> with protection against clobbering work.

We (the Gerrit maintainers) would like changes to be able to 
evolve non-linearly so that we can eventually support 
distributed Gerrit reviews, and the amended-commit pointer 
is one way I have thought to resolve this.

> I have no intention to disparage either tool. I love them
> both. They've both made my career better in different
> ways. I know there is no guarantee that github, gerrit,
> or any other tool will do anything to adopt this. But,
> I'm hoping they are reading this thread and that they
> recognize how this feature can make them a little bit
> better and jump in and help. I know it is a lot to hope
> for but I think it could be great if it happened.

We (the Gerrit maintainers) do recognize it, and I am glad 
that someone is pushing for solutions in this space.  I am 
not sure what the right solution is, and how to modify 
workflows to deal better with this.  I do think that starting 
by making your local repo track pointers to amended-commits, 
likely with various git hooks and notes (as also proposed by 
Johannes Schindelin), would be a good start.   With that in 
place, then you can attack various specific workflows.

If you want to then attack the Gerrit workflow, it would be 
good if you could prevent pushing new patchsets that are
amended versions of patchsets that are out of date.  While
it would be great if Gerrit could reject such pushes, I
wonder if, to start, git could detect it and prevent the push
in this situation?  Could a git push hook analyze the ref 
advertisements and figure this out (all the patchsets are in 
the advertisement)?  Can a git hook look at the ref 
advertisement?

-Martin


Re: Bring together merge and rebase

2018-01-04 Thread Martin Fick

On Monday, December 25, 2017 06:16:40 PM Carl Baldwin wrote:
> On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o 
wrote:
> Look at what happens in a rebase type workflow in any of
> the following scenarios. All of these came up regularly
> in my time with Gerrit.
> 
> 1. Make a quick edit through the web UI then later
> work on the change again in your local clone. It is easy
> to forget to pull down the change made through the UI
> before starting to work on it again. If that happens, the
> change made through the UI will almost certainly be
> clobbered.
> 
> 2. You or someone else creates a second change that is
> dependent on yours and works on it while yours is still
> evolving. If the second change gets rebased with an older
> copy of the base change and then posted back up for
> review, newer work in the base change has just been
> clobbered.
> 
> 3. As a reviewer, you decide the best way to explain
> how you'd like to see something done differently is to
> make the quick change yourself and push it up. If the
> author fails to fetch what you pushed before continuing
> onto something else, it gets clobbered.
> 
> 4. You want to collaborate on a single change with
> someone else in any way and for whatever reason. As soon
> as that change starts hitting multiple work spaces, there
> are synchronization issues that currently take careful
> manual intervention.

These scenarios seem to come up most for me at Gerrit hack-
a-thons where we collaborate a lot in short time spans on 
changes.  We (the Gerrit maintainers) too have wanted and 
sometimes discussed ways to track the relation of "amended" 
commits (which is generally what Gerrit patchsets are).  We 
also concluded that some sort of parent commit pointer was 
needed, although parent is somewhat the wrong term since 
that already means something in git.  Rather, maybe some 
"predecessor" type of term would be better, maybe 
"antecedent", but "amended-commit" pointer might be best?

-Martin


Re: Bring together merge and rebase

2018-01-04 Thread Martin Fick

On Sunday, December 24, 2017 12:01:38 AM Johannes Schindelin 
wrote:
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > I imagine that a "git commit --amend" would also insert
> > a "replaces" reference to the original commit but I
> > failed to mention that in my original post.
> 
> And cherry-pick, too, of course.
> 
> Both of these examples hint at a rather huge urge of some
> users to turn this feature off because the referenced
> commits may very well be throw-away commits in their
> case, making the newly-recorded information completely
> undesired.
> 
> Example: I am working on a topic branch. In the middle, I
> see a typo. I commit a fix, continue to work on the topic
> branch. Later, I cherry-pick that commit to a separate
> topic branch because I really don't think that those two
> topics are related. Now I definitely do not want a
> reference of the cherry-picked commit to the original
> one: the latter will never be pushed to a public
> repository, and gc'ed in a few weeks.
> 
> Of course, that is only my wish, other users in similar
> situations may want that information. Demonstrating that
> you would be better served with an opt-in feature that
> uses notes rather than a baked-in commit header.

I think what you are highlighting is not when to track this, 
but rather when to share this tracking.  In my local repo, I 
would definitely want to know that I cherry-picked this from 
elsewhere, it helps me understand what I have done later 
when I look back at old commits and branches that need to 
potentially be thrown away.  But I agree you may not want to 
share these publicly.

I am not sure what the right formula is, for when to share 
these pointers publicly, but it seems like it might be that 
whenever you push something, it should push along any 
references to amended commits that were publicly available 
already.  I am not sure how to track that, but I suspect it 
is a subset of the union of commits you have fetched, and 
commits you have pushed (i.e. you got it from elsewhere, or 
you created it and already shared it with others)?  Maybe it 
should also include any commits reachable by advertisements 
to places you are pushing to (in case it got shared some 
other way)?
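
As a sketch of that formula, assuming the client somehow records which
commits it has fetched and pushed (the open question above), the
selection is a plain set intersection. The function name and sample
commit names are hypothetical.

```python
def predecessors_to_share(predecessor_refs, fetched, pushed,
                          advertised=frozenset()):
    """Pick the predecessor pointers that refer to already-public commits.

    A commit counts as public if it was fetched from elsewhere, already
    pushed, or (optionally) advertised by the remote being pushed to.
    """
    publicly_known = set(fetched) | set(pushed) | set(advertised)
    return set(predecessor_refs) & publicly_known


# Hypothetical example: a3 is a local throw-away commit.
local_predecessors = {"a1", "a2", "a3"}
shared = predecessors_to_share(local_predecessors,
                               fetched={"a1"}, pushed={"a2"})
```

In this example `a3` stays private unless the remote happens to
advertise it.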

-Martin


Re: Bring together merge and rebase

2018-01-04 Thread Martin Fick

On Tuesday, December 26, 2017 12:40:26 AM Jacob Keller wrote:
> On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin wrote:
> >> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin wrote:
> >> A bit of a tangent here, but a thought I didn't wanna
> >> lose: In the general case where a patch was rebased
> >> and the original parent pointer was changed, it is
> >> actually quite hard to show a diff of what changed
> >> between versions.
> 
> My biggest gripes are that the gerrit web interface
> doesn't itself do something like this (and jgit does not
> appear to be able to generate combined diffs at all!)

I believe it now does, a presentation was given at the 
Gerrit User summit in London describing this work.  It would 
indeed be great if git could do this also!

-Martin

Re: Bring together merge and rebase

2018-01-04 Thread Johannes Schindelin

Hi,

On Sun, 24 Dec 2017, Alexei Lozovsky wrote:

> On Dec 24, 2017, at 01:01, Johannes Schindelin wrote:
> > 
> > Hi Carl,
> > 
> > On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > 
> >> I imagine that a "git commit --amend" would also insert a "replaces"
> >> reference to the original commit but I failed to mention that in my
> >> original post.
> > 
> > And cherry-pick, too, of course.
> 
> Why would it?

Because that's the command you use if you perform an interactive rebase
"manually". Or if you need to split a topic branch into two.

Ciao,
Johannes

Re: Bring together merge and rebase

2017-12-27 Thread Carl Baldwin

On Wed, Dec 27, 2017 at 03:35:58PM +0200, Alexei Lozovsky wrote:
> I think the reasoning behind Theo's words is that it would be better
> to first implement the commit relationship tracking as an add-in which
> uses commit messages for data storage, then evaluate its usefulness
> when it's actually available (including extensions to gitk and stuff
> to support the new metadata), and then it could be moved into core git
> data structures, when it has proven itself useful. It's not a trivial
> feature which warrants immediate addition to git and its design can
> change when faced with real-world use-cases, so it would be bad for
> compatibility to rush its addition. Storage location for metadata
> seems to be an implementation detail which could be technically
> changed more or less easily. But it's much easier to ignore a trailer
> in commit message in the favor of a commit header field than to
> replace a deprecated commit header field with a better one, which
> could cause massive headache for all git repositories in the world.

Yeah, this is a point that everyone is eager to make instead of really
trying to understand what I'm trying to do and offering constructive
suggestions. It's not that I'm not listening. I'm not really concerned
about headers vs trailers or the aesthetics of the whole thing as much as
I'm concerned about how the server / client interaction will be. I worry
that anything that I come up with that isn't implemented in the regular
git core push and fetch will end up being awkward or end up needing to
reimplement a lot of what's already in git. But, maybe it just needs a
little more thought. Let me try to think through it...

Imagine John posts a new change up for review to a review server. The
current master points at commit A and so he grabs it and drafts his
first proposal, B1.

digraph history {
B1 -> A
}

Soon after posting, he notices a couple of simple errors and uses the
web UI to correct them. This creates B2. (Dashed edges are replaces
references).

digraph history {
B1 -> A
B2 -> A
B2 -> B1 [ style="dashed"; ]
}

Anna reviews B2 and finds a small nit. She asks John if she can just fix
it and push up a new review. He agrees. She pushes up B3.

digraph history {
B1 -> A
B2 -> A
B3 -> A
B2 -> B1 [ style="dashed"; ]
B3 -> B2 [ style="dashed"; ]
}

John goes back to his workspace and does a little more work on B. He
creates the fourth revision, B4 but since he didn't update his workspace
with the other two most recent revisions, his new revision is derived
from B1.

digraph history {
B1 -> A
B2 -> A
B3 -> A
B4 -> A
B2 -> B1 [ style="dashed"; ]
B3 -> B2 [ style="dashed"; ]
B4 -> B1 [ style="dashed"; ]
}

John then pushes to the server. I imagined that would be a command
similar to what gerrit does.

git push codereview refs/for/master

At this point, I want a couple of things to happen. First, the server
should be able to match the new revision to the change by following the
replaces references to the commits it already has. Then it should
recognize that this is not a fast forward update to the change and
reject it on those grounds.
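
A minimal sketch of that server-side check for the B1..B4 example,
assuming the replaces edges are available as a dictionary. The function
names are made up for illustration. "Fast forward" is taken here to mean
that the new revision transitively replaces the change's current tip.

```python
def replaced_by(commit, replaces):
    """Return every commit reachable from `commit` via replaces edges."""
    reachable = set()
    stack = list(replaces.get(commit, []))
    while stack:
        old = stack.pop()
        if old not in reachable:
            reachable.add(old)
            stack.extend(replaces.get(old, []))
    return reachable


def is_fast_forward(new_revision, current_tip, replaces):
    """True if the new revision (transitively) replaces the current tip."""
    return current_tip in replaced_by(new_revision, replaces)


# The dashed edges from the graphs above.
replaces = {"B2": ["B1"], "B3": ["B2"], "B4": ["B1"]}

# Revisions John lacks: the tip's history minus his own revision's history.
missing = (replaced_by("B3", replaces) | {"B3"}) - (
    replaced_by("B4", replaces) | {"B4"}
)
```

B4 reaches only B1, so the push is rejected while the tip is B3, and
`missing` comes out as B2 and B3. A later B5 that replaces both B4 and
B3 would pass the same check.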

After that, John needs to be able to fetch B2 and B3 so that his local
client can perform a merge. I guess John needs to know what change he's
trying to fetch. In this case, he needs to fetch both B2 and B3 in order
to get the full history graph of the change. The problem I see here is that
today's git fetch would see B2 and B3 as unrelated branches. There could
be any number of them to fetch. So, how does he ask for everything
related to the change? Does he do a wild card or something?

git fetch codereview refs/changes/123/*

Or does he just fetch all refs (this could be many on a busy review
server)? Or do we need to do something out of band to discover the list
of references that need to be fetched?
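
To make the wild card option concrete, here is a sketch that filters a
ref listing by pattern on the client side. It assumes the listing has
one "<object name><TAB><ref name>" pair per line, as `git ls-remote`
prints it; the object names and the ref layout under `refs/changes/`
are invented for the example.

```python
from fnmatch import fnmatchcase


def parse_ref_listing(text):
    """Parse "<sha>\\t<ref>" lines into a {ref: sha} dictionary."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def refs_for_change(refs, pattern):
    """Return the ref names matching a shell-style pattern, sorted."""
    return sorted(ref for ref in refs if fnmatchcase(ref, pattern))


# Made-up listing; the object names are placeholders.
listing = (
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/changes/123/1\n"
    "3333333333333333333333333333333333333333\trefs/changes/123/2\n"
    "4444444444444444444444444444444444444444\trefs/changes/123/3\n"
    "5555555555555555555555555555555555555555\trefs/changes/124/1\n"
)

wanted = refs_for_change(parse_ref_listing(listing), "refs/changes/123/*")
```

This still means listing every ref on the server first, which is the
cost in question on a busy review server.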

I've been thinking out loud a bit. I guess this could be a path forward.
I guess to make gc happy, I've got to keep around a ref pointing at each
new revision so that it doesn't get garbage collected.

Carl

Re: Bring together merge and rebase

2017-12-27 Thread Alexei Lozovsky

On Dec 27, 2017, at 06:35, Carl Baldwin wrote:
> 
> On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
>> 
>> My experience, from seeing these much more complex use cases ---
>> starting with something as simple as the Linux Kernel Stable Kernel
>> Series, and extending to something much more complex such as the
>> workflow that is used to support a Google Kernel Rebase, is that using
>> just a simple extra "Replaces" pointer in the commit header is not
>> nearly expressive enough.  And, if you make it a core part of the
>> commit data structure, there are all sorts of compatibility headaches
>> with older versions of git that wouldn't know about it.  And if it
> 
> The more I think about this, the less I worry. Be sure that you're using 
> 
>> then turns out it's not sufficient for the more complex workflows
>> *anyway*, maybe adding a new "replace" pointer in the core git data
>> structures isn't worth it.  It might be that just keeping such things
>> as trailers in the commit body might be the better way to go.
> 
> It doesn't need to be everything to everyone to be useful. I hope to
> show in this thread that it is useful enough to be a compelling addition
> to git. I think I've also shown that it could be used as a part of your
> more complex workflow. Maybe even a bigger part of it than you had
> realized.

I think the reasoning behind Theo's words is that it would be better to
first implement the commit relationship tracking as an add-in which uses
commit messages for data storage, then evaluate its usefulness when it's
actually available (including extensions to gitk and stuff to support the
new metadata), and then it could be moved into core git data structures,
when it has proven itself useful. It's not a trivial feature which warrants
immediate addition to git and its design can change when faced with real-
world use-cases, so it would be bad for compatibility to rush its addition.
Storage location for metadata seems to be an implementation detail which
could be technically changed more or less easily. But it's much easier to
ignore a trailer in a commit message in favor of a commit header field
than to replace a deprecated commit header field with a better one, which
could cause massive headache for all git repositories in the world.
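
To make the "add-in which uses commit messages for data storage" idea concrete, here is a minimal Python sketch that reads trailer lines from the last paragraph of a commit message. The `Replaces:` trailer name is an assumption made for illustration only; it is not an existing git convention. The key pattern also allows dots so that keys like `Upstream-4.12-SHA1` from this thread match, which is looser than git's own trailer handling (`git interpret-trailers` is the real tool for this).

```python
import re

# Key, colon, value. Dots are allowed in the key only so that keys such as
# "Upstream-4.12-SHA1" from this thread are accepted (an assumption).
TRAILER_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9.-]*):\s*(\S.*)$")


def parse_trailers(message):
    """Return (key, value) pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []  # subject line only, so there is no trailer block
    trailers = []
    for line in paragraphs[-1].splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # last paragraph is ordinary prose, not trailers
        trailers.append((match.group(1), match.group(2)))
    return trailers


MESSAGE = """Fix frobnication

Longer explanation of the change.

Replaces: e8aa79baf6aef573da930a385e4db915187d5187
Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5
"""

for key, value in parse_trailers(MESSAGE):
    print(key, value)
```

Because the data lives in the message body, older clients simply display it as text, which is the compatibility property argued for above.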

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin

On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Here's another potential use case.  The stable kernels (e.g., 3.18.y,
> 4.4.y, 4.9.y, etc.) have cherry picks from the upstream kernel,
> and this is handled by putting in the commit body something like this:
> 
> [ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]

I think replaces could apply to cherry picks like this too. The more I
think about it, I actually think that replaces isn't a bad name for it
in the cherry pick context. When you cherry pick a commit, you create a
new commit that is derived from it and stands in for or replaces it in
the new context. It is a stretch but I don't think it is that bad.

You can tell that it is a cherry pick because the referenced commit's
history is not reachable in the current context.

Though we could consider some different names like "derivedfrom",
"obsoletes", "succeeds", "supersedes", "supplants".

> 
> 
> And here's yet another use case.  For internal Google kernel
> development, we maintain a kernel that has a large number of patches
> on top of a kernel version.  When we backport an upstream fix (say,
> one that first appeared in the 4.12 version of the upstream kernel),
> we include a line in the commit body that looks like this:
> 
> Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5
> 
> This is useful, because when we switch to use a newer upstream kernel,
> we need to make sure we can account for all patches that were built on
> top of the 3xx kernel (which might have been using 4.10, for the sake
> of argument), to the 4xx kernel series (which might be using 4.15 ---
> the version numbers have been changed to protect the innocent).  This
> means going through each and every patch that was on top of the 3xx
> kernel, and if it has a line such as "Upstream 4.12-SHA1", we know
> that it will already be included in a 4.15 based kernel, so we don't
> need to worry about carrying that patch forward.

Are 3xx and 4xx internal version numbers? If I understand correctly, in
your example, 3xx is the heavily patched internal kernel based on 4.10
and 4xx is the internal patched version of 4.15. I think I'm following
so far.

Let's say that you used a "replaces" reference instead of your
"Upstream-4.12-SHA1" reference. The only piece of metadata that is
missing is the "4.12" of your string. However, you could replicate this
with some set arithmetic. If the sha1 referred to by "replaces" exists
in the set of commits reachable from 4.15 then you've answered the same
question.
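
The set arithmetic can be sketched as follows. This is only a toy model: the commit names stand in for SHA-1s, and the `parents` and `replaces` dictionaries and both function names are hypothetical. In a real repository the same question can be asked with `git merge-base --is-ancestor`, which reports through its exit status whether one commit is an ancestor of another.

```python
def reachable(tip, parents):
    """Return the set of commits reachable from `tip` via parent pointers."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def already_upstream(commit, replaces, parents, upstream_tip):
    """True if the commit that `commit` replaces is part of upstream history."""
    target = replaces.get(commit)
    return target is not None and target in reachable(upstream_tip, parents)


# Toy upstream history: v4.10 <- fix412 <- v4.15
PARENTS = {"v4.15": ["fix412"], "fix412": ["v4.10"], "v4.10": []}
# "backport" stands in for an upstream fix; "carried" for an internal-only patch.
REPLACES = {"backport": "fix412", "carried": "old3xx"}

print(already_upstream("backport", REPLACES, PARENTS, "v4.15"))
print(already_upstream("carried", REPLACES, PARENTS, "v4.15"))
```

The version string ("4.12") is not stored anywhere in this model; it is implied by which upstream tips contain the replaced commit.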

> In other cases, we might decide that the patch is no longer needed.
> It could be because the patch has already been included upstream, in
> which case we might check in a commit with an empty patch body, but
> whose header contains something like this in the 4xx kernel:
> 
> Origin-3xx-SHA1: fe546bdfc46a92255ebbaa908dc3a942bc422faa
> Upstream-Dropped-4.11-SHA1: d90dc0ae7c264735bfc5ac354c44ce2e

So, the first reference is the old commit that patched the 3xx series?
What is the second reference? What is "4.11" indicating? Is that the
patch that was included in the upstream kernel that obsoleted your 3xx
patch?

If I understood that correctly, you could use a "replaces" reference for
the first line and the second line would still have to be included as a
separate header in your commit message? Does this mean "replaces" is not
useful in your case? I don't think so.

> Or we could decide that the commit is no longer no longer needed ---

no longer no longer needed? Is this a double negative indicating that it
is needed again? Or, is it a mistake?

> perhaps because the relevant subsystem was completely rewritten and
> the functionality was added in a different way.  Then we might have
> just have an empty commit with an explanation of why the commit is no
> longer needed and the commit body would have the metadata:
> 
> Origin-Dropped-3xx-SHA1: 26f49fcbb45e4bc18ad5b52dc93c3afe

The metadata in this reference indicates that it was dropped since 3xx.
Doesn't the empty body (and maybe a commit message saying dropping a
patch) indicate this if a "replaces" pointer were used instead? The
3xx part of the metadata could be derived again by set arithmetic.

> Or perhaps the commit is still needed, and for various reasons the
> commit was never upstreamed; perhaps because it's only useful for
> Google-specific hardware, or the patch was rejected upstream.  Then we
> will have a cherry-pick that would include in the body:
> 
> Origin-3xx-SHA1: 8f3b6df74b9b4ec3ab615effb984c1b5

Replaces reference and set arithmetic.

> (Note: all commits that are added in the rebase workflow, even the
> empty commits that just have the Origin-Dropped-3xx-SHA1 or
> Upstream-Dropped-4.11-SHA1 headers, are patch reviewed through Gerrit,
> so we have an audited, second-engineer review to make sure each commit
> in the 3xx kernel that Google had been carrying had the correct
> disposition when rebasing to the 4xx kernel.)

This is

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin

On Tue, Dec 26, 2017 at 01:08:45PM +0900, Mike Hommey wrote:
> FWIW, your proposal has a lot in common (but is not quite equivalent)
> to mercurial's obsolescence markers and changeset evolution features.

I've had experience with mercurial but not since about 2009. After
reading up a little bit on this changeset evolution feature, it looks
very much like what I'm proposing. Obsolescence markers look a lot like
replaces references except, as illustrated by this blog [1], they point
the other way! Hence, the illustrations confused me for a moment. It
seems more natural to embed the reference in the new commit pointing at
the old. That said, the illustrated direction of the arrows doesn't
really affect the usefulness of the idea.

His third example (#3-working-with-other-people) appears to be the kind
of collaboration that I'm trying to describe here. To quote the blog:

  In git or vanilla (no extension) mercurial, you would have to figure
  out that b’ and b” are two new versions of b and merge them. Changeset
  evolution detects that situation, marks b’ and b” as being divergent.
  It then suggests automatic resolution with a merge and preserves
  history.

This is the kind of thing that I had to deal with manually in gerrit. I
hadn't seen this feature in mercurial but I'm glad to know now there is
a precedent for it.

Carl

[1] https://blog.laurentcharignon.com/post/2016-02-02-changeset-evolution/

Re: Bring together merge and rebase

2017-12-26 Thread Igor Djordjevic

Very interesting topic, just this one part I wanted to comment on:

On 26/12/2017 02:28, Jacob Keller wrote:
> 
> What about some way to take the reflog and turn it into a commit-based
> linkage and export that? Rather than tying it into the individual
> commit history, keep track of it outside the commit, possibly via
> something like notes, or some other mechanism.

This seems like the most useful approach, might be not touching reflog 
per se, but having some kind of "cherry-picked commits source" log 
(where rebasing is a subset of cherry-picking). What Johannes 
mentioned, a mapping between "old" and "new" commits. Might be notes 
could fit in nicely, but I`m not competent to comment on that at the 
moment.

For me, the most interesting use case is not even tied to code review 
(thus no review comments to think about), but a situation where one 
might be rebasing a set of downstream patches on top of updating 
upstream - it might be possible for a bug to slip through due to some 
upstream changes, even where there are no conflicts and test suite is 
executed regularly (might be test revealing the bug is yet to be added).

In that situation, instead of just going back in "regular" history 
(single dimension) and eventually finding the offending (rebased) 
commit (its N-th rebased version, that is), it might be great to 
actually keep drilling down the "rebase history" now (second 
dimension), finding the exact rebase iteration / rebased commit 
version where the error first appeared.
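
A minimal Python sketch of drilling into that second dimension, assuming each rebased commit records a single pointer to the version it was rebased from. The `replaces` mapping, the function names, and the `is_bad` callback are hypothetical, and the scan is linear rather than a true bisect since nothing guarantees that every iteration after the first bad one is also bad.

```python
def iterations(commit, replaces):
    """Return all versions of one change, oldest first, by following pointers."""
    chain = [commit]
    while commit in replaces:
        commit = replaces[commit]
        chain.append(commit)
    chain.reverse()
    return chain


def first_bad_iteration(commit, replaces, is_bad):
    """Return the earliest rebase iteration for which `is_bad` is true."""
    for version in iterations(commit, replaces):
        if is_bad(version):
            return version
    return None


# Toy data: C3 was rebased from C2, which was rebased from C1.
REBASED_FROM = {"C3": "C2", "C2": "C1"}
BROKEN = {"C2", "C3"}  # stands in for "the test suite fails at this version"

print(first_bad_iteration("C3", REBASED_FROM, lambda v: v in BROKEN))
```

Here the regular history would only point at C3 as the offending commit; walking the rebase iterations narrows it down to the rebase that produced C2.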

Carl, you described this well in your document[1], and Johannes 
provided a valuable first-hand experience[2] from working around the 
very same native Git limitation for years, mentioning using (fragile, 
costly and not very automatable) rebased commits message search to 
drill down the second dimension (rebase iterations), which seems to 
be the only possible approach at the moment, with "vanilla" Git, at 
least.

So this might be much more interesting case, if code review one is 
less appropriate because of review comments being also relevant to 
commit rebase iterations (which should be then stored somewhere, too, 
relating to corresponding commits, not to lose context).

Regards, Buga

p.s. "Merging rebase" and "shears.sh" script[3] seem to be orthogonal 
to this - really great on their own in improving rebase itself and 
making it smarter and much more powerful and useful, where I guess 
they would benefit from native Git "cherry-picked (rebased) commits 
iterations tracking" (old/source <> new/destination commit mapping), 
too, as would other Git tools.

[1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
[2] 
https://public-inbox.org/git/20171226040843.h7o6txkrp6zlv...@glandium.org/T/#m2e5079488bed2968d4ea52a10051a06c06ff61e0
[3] 
https://github.com/git-for-windows/build-extra/blob/af9cff5005/shears.sh#L12-L18

Re: Bring together merge and rebase

2017-12-26 Thread Mike Hommey

On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> The big contention among git users is whether to rebase or to merge
> changes [2][3] while iterating. I used to firmly believe that merging
> was the way to go and rebase was harmful. More recently, I have worked
> in some environments where I saw rebase used very effectively while
> iterating on changes and I relaxed my stance a lot. Now, I'm on the
> fence. I appreciate the strengths and weaknesses of both approaches. I
> waffle between the two depending on the situation, the tools being
> used, and I guess, to some extent, my mood.
> 
> I think what git needs is something brand new that brings the two
> together and has all of the advantages of both approaches. Let me
> explain what I've got in mind...
> 
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.
> 
> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.
> 
> Replay handles collaboration between multiple authors on a single
> change. This is difficult and prone to accidental loss when using
> rebase and it results in a complex history when done with merge. With
> replay, collaborators could merge while collaborating on a single
> change and a record of each one's contributions can be preserved.
> Attempting this level of collaboration caused me many headaches when I
> worked with the gerrit workflow (which in many ways, I like a lot).
> 
> I blogged about this proposal earlier this year when I first thought
> of it [1]. I got busy and didn't think about it for a while. Now with
> a little time off of work, I've come back to revisit it. The blog
> entry has a few examples showing how it works and how the history will
> look in a few examples. Take a look.
> 
> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations. Bisect is an interesting one. I
> tend to think that bisect should prefer the regular commit history but
> have the ability to drill into the change history if necessary.
> 
> In my opinion, this proposal would bring together rebase and merge in
> a powerful way and could end the contention. Thanks for your
> consideration.

FWIW, your proposal has a lot in common (but is not quite equivalent) to
mercurial's obsolescence markers and changeset evolution features.

Mike

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin

On Tue, Dec 26, 2017 at 03:19:02PM -0500, Paul Smith wrote:
> As someone working in an environment where we do a lot of rebasing and
> very little merging, I read these proposals with interest.  I'm not
> convinced that we would switch to using a "replaces"-type feature, but
> I'm pretty sure that the "null-merge and rebase" trick described
> previously would not be something we're interested in using.

In the near term, maybe. I'm still working with it to be sure I
understand it right.

> Although "git log" doesn't follow these merges (unless requested), all
> the graphical tools that are used to display history WOULD show all
> those branches.  In a "replaces"-type environment I think the point is
> that we would not want to see them (certainly not by default) as they
> would be used mainly for deeper spelunking, but since they just seem
> like normal merges I don't see any way to turn them off.

You've touched on some of my concerns with the null-merge approach. I
want the end result to be as clean as possible which I think is what
lures many to the rebase methodology in the first place.

> If "replaces" was a separate capability then it could be treated
> differently by history browsing tools, and shown or not shown as
> desired.  For example, a commit that had a "replaces" element could be
> selected somehow and you could expand that set of commits that were
> replaced, or something like that.

Exactly!

Carl

Re: Bring together merge and rebase

2017-12-26 Thread Paul Smith

On Tue, 2017-12-26 at 12:44 -0700, Carl Baldwin wrote:
> > Sure, it could be opt in, be a new format etc. But you haven't
> > explained why you think a feature like this would need to rely on
> > an entirely new parent structure and side-DAG, as opposed to just
> > the more minor changes I'm pointing out above, and which I think
> > will give you what you need from a UX level.
> 
> I have not wrapped my head around it enough to convince myself that
> it gives what I'm after. Let me spend a little more time with it to
> get a feel for it.

As someone working in an environment where we do a lot of rebasing and
very little merging, I read these proposals with interest.  I'm not
convinced that we would switch to using a "replaces"-type feature, but
I'm pretty sure that the "null-merge and rebase" trick described
previously would not be something we're interested in using.

Although "git log" doesn't follow these merges (unless requested), all
the graphical tools that are used to display history WOULD show all
those branches.  In a "replaces"-type environment I think the point is
that we would not want to see them (certainly not by default) as they
would be used mainly for deeper spelunking, but since they just seem
like normal merges I don't see any way to turn them off.

If "replaces" was a separate capability then it could be treated
differently by history browsing tools, and shown or not shown as
desired.  For example, a commit that had a "replaces" element could be
selected somehow and you could expand that set of commits that were
replaced, or something like that.

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin

On Tue, Dec 26, 2017 at 01:04:36PM -0500, Theodore Ts'o wrote:
> On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> > At this point, you might wonder why I'm not proposing to simply add a
> > "change-id" to the commit object. The short answer is that the
> > "change-id" Gerrit uses in the commit messages cannot stand on its own.
> > It depends on data stored on the server which maintains a relationship
> > of commits to a review number and a linear ordering of commits within
> > the review (hopefully I'm not over simplifying this). The "replaces"
> > reference is an attempt to make something which can stand on its own. I
> > don't think we need to solve the problem of where to keep comments at
> > this point.
> 
> I strongly disagree, and one way to see that is by doing a real-life
> experiment.  If you take a look at a gerrit change, which in my
> experience can have up to ten or twelve revisions, and strip out the
> comments, all you get to look at is a half-dozen or more
> revisions.  How useful is it *really*?  How does it get used in
> practice?  What development problem does it help to solve?

I didn't mean to imply that we need to get along without the comments. I
was only pointing out that gerrit, github, other code review UIs have
already figured out how to store comments anchored to specific revisions
of files in the repository. I'm suggesting that we let them continue to
do that part while we take the first step of specifying how the
intermediate revisions are kept.

If the various code review servers adopted this then we'd have a client
side which could push up revisions for review to any of them. In
addition, they'd all get the collaborative functionality that I
described in my reply to your previous message.

What we get with this proposal is if I push up a review and that review
is changed by someone (maybe even me) outside of my original workspace,
my client gives me the tools to detect it and merge with it. If I try to
push over (clobber) that work then I get an error that the remote cannot
be fast-forwarded and I'm forced to fetch it and merge it.

I get this while using the rebase methodology I've grown to enjoy having
since using gerrit and I end up with a mainline history that looks
exactly the way I want it to.

> And when you say that it is a bug that the Gerrit Change-Id does not
> stand alone, consider that it can also be a *feature*.  If you keep
> all of this in the main repo, the number of commits can easily grow by
> an order of magnitude.  And these are commits that you have to keep
> forever, which means it slows down every subsequent git clone, git gc
> operation, git tag --contains search, etc.

I didn't say it was a bug; just that it is at odds with what I'm hoping
to do.

I agree that the number of commits in the repository will go up.
However, I think there will be ways to mitigate the costs.

The commits are not in the mainline history. So, I wouldn't expect a git
tag --contains or most other commands that traverse history to consider
them at all.

It could be possible to make the default git clone skip them all and
only fetch them on demand for specific changes.

> So what are the benefits, and what are the costs?  If the benefits
> were huge, then perhaps it would be worthwhile.  But if you lose a
> huge amount of the value because you are missing the *why* between the
> half-dozen to dozen past revisions of the commit, then is it really
> worth it to adopt that particular workflow?
> 
> It seems to me your argument is contrasting a "replaces" pointer
> versus the github PR.  But compared to the Gerrit solution, I don't
> think the "replaces" pointer proposal is as robust or as featureful.
> Also, please keep in mind that just because it's in core git doesn't
> guarantee that Github will support it.  As far as I know github has
> zero support for notes, for example.

What I propose is that gerrit and github could end up more robust,
featureful, and interoperable if they had this feature to build from.

With gerrit specifically, adopting this feature would make the "change"
concept richer than it is now because it could supersede the change-id
in the commit message and allow a change to evolve in a distributed
non-linear way with protection against clobbering work.

I have no intention to disparage either tool. I love them both. They've
both made my career better in different ways. I know there is no
guarantee that github, gerrit, or any other tool will do anything to
adopt this. But, I'm hoping they are reading this thread and that they
recognize how this feature can make them a little bit better and jump in
and help. I know it is a lot to hope for but I think it could be great
if it happened.

Carl

Re: Bring together merge and rebase

2017-12-26 Thread Carl Baldwin

On Tue, Dec 26, 2017 at 06:49:56PM +0100, Ævar Arnfjörð Bjarmason wrote:
> New headers should be added after existing headers, but other than
> that it won't choke on it. See 4b2bced559 when the encoding header was
> added, this also passes most tests:
> 
> diff --git a/commit.c b/commit.c
> index cab8d4455b..cd2bafbaa0 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t msg_len,
>         if (!encoding_is_utf8)
>                 strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
> 
> +       strbuf_addf(&buffer, "replaces \n");
> +
>         while (extra) {
>                 add_extra_header(&buffer, extra);
>                 extra = extra->next;
> 
> Only "most" since of course this changes the sha1 of every commit git
> creates from what you get now.
> 
> > Even if core git code does not simply choke on it, I would like push and
> > pull to follow these pointers and transfer the history behind them. I
> > assumed that git would not do this today. I would also like gc to
> > preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> > it doesn't purge it from the history.
> 
> It won't pay any attention to them if "replaces" is something entirely
> new, what I was pointing out in my earlier reply is that you can simply
> *also* create the parent pointers to these no-op merge commits that hide
> away the previous history the "replaces" headers will be referencing.
> 
> The reason to do that is 100% backwards compatibility, and only
> needing to make minor UI changes to have this feature (to e.g. history
> walking), as opposed to needing to hack everything that now follows
> "parent" or constructs a commit graph.

Thank you for clarifying this. I have learned something.

> Sure, it could be opt in, be a new format etc. But you haven't
> explained why you think a feature like this would need to rely on an
> entirely new parent structure and side-DAG, as opposed to just the
> more minor changes I'm pointing out above, and which I think will give
> you what you need from a UX level.

I have not wrapped my head around it enough to convince myself that it
gives what I'm after. Let me spend a little more time with it to get a
feel for it.

Carl

Re: Bring together merge and rebase

2017-12-26 Thread Theodore Ts'o

On Mon, Dec 25, 2017 at 06:16:40PM -0700, Carl Baldwin wrote:
> At this point, you might wonder why I'm not proposing to simply add a
> "change-id" to the commit object. The short answer is that the
> "change-id" Gerrit uses in the commit messages cannot stand on its own.
> It depends on data stored on the server which maintains a relationship
> of commits to a review number and a linear ordering of commits within
> the review (hopefully I'm not over simplifying this). The "replaces"
> reference is an attempt to make something which can stand on its own. I
> don't think we need to solve the problem of where to keep comments at
> this point.

I strongly disagree, and one way to see that is by doing a real-life
experiment.  If you take a look at a gerrit change, which in my
experience can have up to ten or twelve revisions, and strip out the
comments, all you get to look at is a half-dozen or more
revisions.  How useful is it *really*?  How does it get used in
practice?  What development problem does it help to solve?

And when you say that it is a bug that the Gerrit Change-Id does not
stand alone, consider that it can also be a *feature*.  If you keep
all of this in the main repo, the number of commits can easily grow by
an order of magnitude.  And these are commits that you have to keep
forever, which means it slows down every subsequent git clone, git gc
operation, git tag --contains search, etc.

So what are the benefits, and what are the costs?  If the benefits
were huge, then perhaps it would be worthwhile.  But if you lose a
huge amount of the value because you are missing the *why* between the
half-dozen to dozen past revisions of the commit, then is it really
worth it to adopt that particular workflow?

It seems to me your argument is contrasting a "replaces" pointer
versus the github PR.  But compared to the Gerrit solution, I don't
think the "replaces" pointer proposal is as robust or as featureful.
Also, please keep in mind that just because it's in core git doesn't
guarantee that Github will support it.  As far as I know github has
zero support for notes, for example.

- Ted

Re: Bring together merge and rebase

2017-12-26 Thread Ævar Arnfjörð Bjarmason


On Tue, Dec 26 2017, Carl Baldwin jotted:

> On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> But I don't see why you think this needs a new "replaces" parent
>> >> pointer orthogonal to parent pointers, i.e. something that would
>> >> need to be a new field in the commit object (I may have misread the
>> >> proposal, it's not heavy on technical details).
>> >
>> > Just to clarify, I am proposing a new "replaces" pointer in the commit
>> > object. Imagine starting with rebase exactly as it works today. This new
>> > field would be inserted into any new commit created by a rebase command
>> > to reference the original commit on which it was based. Though, I'm not
>> > sure if it would be better to change the behavior of the existing rebase
>> > command, provide a switch or config option to turn it on, or provide a
>> > new command entirely (e.g. git replay or git replace) to avoid
>> > compatibility issues with the existing rebase.
>>
>> Yeah that sounds fine, I thought you meant that this "replaces" field
>> would replace the "parent" field, which would require some rather deep
>> incompatible changes to all git clients.
>>
>> But then I don't get why you think fetch/pull/gc would need to be
>> altered, if it's because you thought that adding arbitrary *new* fields
>> to the commit object would require changes to those that's not the case.
>
> Thank you again for your reply. Following is the kind of commit that I
> would like to create.
>
> tree fcce2f309177c7da9c795448a3e392a137434cf1
> parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
> replaces e8aa79baf6aef573da930a385e4db915187d5187
> author Carl Baldwin  1514057225 -0700
> committer Carl Baldwin  1514058444 -0700
>
> What will happen if I create this today? I assumed git would just choke
> on it but I'm not certain. It has been a long time since I attempted to
> get into the internals of git.

New headers should be added after existing headers, but other than that
it won't choke on it. See 4b2bced559 when the encoding header was added,
this also passes most tests:

diff --git a/commit.c b/commit.c
index cab8d4455b..cd2bafbaa0 100644
--- a/commit.c
+++ b/commit.c
@@ -1565,6 +1565,8 @@ int commit_tree_extended(const char *msg, size_t msg_len,
        if (!encoding_is_utf8)
                strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);

+       strbuf_addf(&buffer, "replaces \n");
+
        while (extra) {
                add_extra_header(&buffer, extra);
                extra = extra->next;

Only "most" since of course this changes the sha1 of every commit git
creates from what you get now.
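
To see why an extra header changes every commit id, here is a small Python sketch that lays out a commit object with the header placed after the existing ones and hashes it the way git hashes objects (SHA-1 over the object type, a space, the body length, a NUL byte, then the body). The `build_commit` helper and the example.com address are assumptions for illustration; the tree, parent, and replaces ids are the ones from Carl's example commit, and the `replaces` header itself is the proposal, not something git understands today.

```python
import hashlib


def git_object_id(obj_type, body):
    """Hash an object body the way git does: "<type> <size>\\0" + body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def build_commit(tree, parents, author, committer, message, extra_headers=()):
    """Lay out a commit body; extra headers go after the existing headers."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    lines += [f"{key} {value}" for key, value in extra_headers]
    return ("\n".join(lines) + "\n\n" + message).encode()


FIELDS = dict(
    tree="fcce2f309177c7da9c795448a3e392a137434cf1",
    parents=["b3758d9223b63ebbfbc16c9b23205e42272cd4b9"],
    author="Carl Baldwin <carl@example.com> 1514057225 -0700",      # address is made up
    committer="Carl Baldwin <carl@example.com> 1514058444 -0700",   # address is made up
    message="Example change\n",
)

PLAIN = build_commit(**FIELDS)
WITH_REPLACES = build_commit(
    **FIELDS,
    extra_headers=[("replaces", "e8aa79baf6aef573da930a385e4db915187d5187")],
)

print(git_object_id("commit", PLAIN))
print(git_object_id("commit", WITH_REPLACES))
```

The two printed ids differ even though the tree, parent, and message are identical, which is why tests that compare against hard-coded commit ids fail once the header is emitted unconditionally.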

> Even if core git code does not simply choke on it, I would like push and
> pull to follow these pointers and transfer the history behind them. I
> assumed that git would not do this today. I would also like gc to
> preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> it doesn't purge it from the history.

It won't pay any attention to them if "replaces" is something entirely
new, what I was pointing out in my earlier reply is that you can simply
*also* create the parent pointers to these no-op merge commits that hide
away the previous history the "replaces" headers will be referencing.

The reason to do that is 100% backwards compatibility, and only
needing to make minor UI changes to have this feature (to e.g. history
walking), as opposed to needing to hack everything that now follows
"parent" or constructs a commit graph.

> I'm currently thinking of an example of the workflow that I'm after in
> response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
> it makes it clearer why I want it this way.
>
> [snip]
>
>> Instead, if I understand what you're actually trying to do, it could
>> also be done as:
>>
>>  1) Just add a new replaces  field to new commit objects
>>
>>  2) Make git-rebase know how to write those, e.g. add two of those
>> pointing to A & B when it squashes them into AB.
>>
>>  3) Write a history traversal mechanism similar to --full-history
>> that'll ignore any commits on branches that yield no changes, or
>> only those whose commits are referenced by this "replaces" field.
>>
>> You'd then end up with:
>>
>>  A) A way to "stash" these commits in the permanent history
>>
>>  B) ... that wouldn't be visible in "git log" by default
>>
>>  C) Would require no underlying changes to the commit model, i.e. it
>> would work with all past & future git clients, if they didn't know
>> about the "replaces" field they'd just show more verbose history.
>
> I get this point. I don't underestimate how difficult making such a
> change to the core model is. I know there are older clients which cannot
> simply be updated. There are also alternate implementations (e.g. jgit)
> that also need to be considered. This is the thing I worry

Re: Bring together merge and rebase

2017-12-26 Thread Jacob Keller

On Mon, Dec 25, 2017 at 10:02 PM, Carl Baldwin  wrote:
> On Mon, Dec 25, 2017 at 05:47:55PM -0800, Jacob Keller wrote:
>> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin  wrote:
>> > Anyway, now I am compelled to use github which is also a fine tool and I
>> > appreciate all of the work that has gone into it. About 80% of the time,
>> > I rebase and force push to my branch to update a pull request. I've come
>> > to like the end product of the rebase workflow. However, github doesn't
>> > excel at this approach. For one, it doesn't preserve older revisions
>> > which were already reviewed which makes it difficult for reviewers to
>> > pick up where they left off the last time. If it preserved them, as
>> > gerrit does, the reviewer can compare a new revision with the most
>> > recent older revision they reviewed to see just what has been addressed
>> > since then.
>>
>> A bit of a tangent here, but a thought I didn't wanna lose: In the
>> general case where a patch was rebased and the original parent pointer
>> was changed, it is actually quite hard to show a diff of what changed
>> between versions.
>>
>> The best I've found is to do something like a 4-way --cc merge diff,
>> which mostly works, but has a few awkward cases, and ends up usually
>> showing double ++ and -- notation.
>>
>> Just something I've thought about a fair bit, trying to come up with
>> some good way to show "what changed between A1 and A2, but ignore all
>> changes between parent P1 and P2 which you don't care that much about
>> in this context."
>
> I ran into this all the time with gerrit. I wrote a script that you'd
> run on a working copy (with no local changes). I'd fetch and check out
> the latest patch set that I want to review (say, for example, it's patch set
> 5) from gerrit. Then, say I wanted to compare it with patch set 3 which
> has a different parent. I'd run this from the top level of my working
> copy.
>
> compare-to-previous-patchset 3
>
> It would fetch patch set 3 from gerrit, rebase it to the same parent as
> the current patch set on a detached HEAD and then git diff it with the
> current patch set. If there were conflicts, it would just commit the
> conflict markers to the commit. There is no attempt to resolve the
> conflicts. The script was crude but it helped me out many times and it
> was nice to be able to review how conflicts were resolved when those
> came up.
>
> Carl
>

Interesting. That could work fairly well. I usually do something along
the lines of:

git diff patch-new patch-old patch-base-new patch-base-old --cc, which
produces a combined diff format patch which usually works ok.

My biggest gripes are that the gerrit web interface doesn't itself do
something like this (and jgit does not appear to be able to generate
combined diffs at all!)

> PS In case you're curious, here's my script...
>
> #!/bin/bash
>
> remote=gerrit
> previous_patchset=$1; shift
>
> # Assumes we're sitting on the latest patch set.
> new_patch_set_id=$(git rev-parse HEAD)
>
> branch=$(git branch | awk '/^\*/ {print$2}')
> [ "$branch" = "(no" ] && branch=
>
> # set user, host, port, and project from git config
> eval $(echo "$(git config remote.$remote.url)" |
>sed 's,ssh://\(.*\)@\(.*\):\([[:digit:]]*\)/\(.*\).git,user=\1 host=\2 
> p<
>
> gerrit() {
>     ssh $user@$host -p $port gerrit ${1+"$@"}
> }
>
> # Grabs a bunch of information from gerrit about the current patch
> eval $(gerrit query --current-patch-set $new_patch_set_id |
>     awk '
>         BEGIN {mode="main"}
>         / currentPatchSet:/ { mode="currentPatchSet" }
>         / ref:/ { printf "new_patch_ref=%s\n", $2 }
>         / number:/ {
>             if (mode=="main") {
>                 printf "review_num=%s\n", $2
>             }
>             if (mode=="currentPatchSet") {
>                 printf "new_patchset=%s\n", $2
>             }
>         }
>     ')
>
> # Fetch the old patch set
> old_patch_ref=${new_patch_ref%$new_patchset}$previous_patchset
> git fetch $remote $old_patch_ref && git checkout FETCH_HEAD
>
> # Rebase the old patch set to the parent of the new patch set.
> if ! git rebase HEAD^ --onto ${new_patch_set_id}^
> then
>     git diff --name-only --diff-filter=U -z | xargs -0 git add
>     git rebase --continue
> fi
>
> previous_patchset_rebased=$(git rev-parse HEAD)
>
> # Go back to the new patch set and diff it against the rebased old one.
> if [ "$branch" ]
> then
>     git checkout $branch
> else
>     git checkout $new_patch_set_id
> fi
> git diff $previous_patchset_rebased

One thing you might do is have it create a temporary worktree in order
to avoid problems with being in the local checkout.

Thanks,
Jake

Re: Bring together merge and rebase

2017-12-25 Thread Carl Baldwin

On Mon, Dec 25, 2017 at 05:47:55PM -0800, Jacob Keller wrote:
> On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin  wrote:
> > Anyway, now I am compelled to use github which is also a fine tool and I
> > appreciate all of the work that has gone into it. About 80% of the time,
> > I rebase and force push to my branch to update a pull request. I've come
> > to like the end product of the rebase workflow. However, github doesn't
> > excel at this approach. For one, it doesn't preserve older revisions
> > which were already reviewed which makes it difficult for reviewers to
> > pick up where they left off the last time. If it preserved them, as
> > gerrit does, the reviewer can compare a new revision with the most
> > recent older revision they reviewed to see just what has been addressed
> > since then.
> 
> A bit of a tangent here, but a thought I didn't wanna lose: In the
> general case where a patch was rebased and the original parent pointer
> was changed, it is actually quite hard to show a diff of what changed
> between versions.
>
> The best I've found is to do something like a 4-way --cc merge diff,
> which mostly works, but has a few awkward cases, and ends up usually
> showing double ++ and -- notation.
>
> Just something I've thought about a fair bit, trying to come up with
> some good way to show "what changed between A1 and A2, but ignore all
> changes between parent P1 and P2 which you don't care that much about
> in this context."

I ran into this all the time with gerrit. I wrote a script that you'd
run on a working copy (with no local changes). I'd fetch and check out
the latest patch set that I want to review (say, for example, it's patch set
5) from gerrit. Then, say I wanted to compare it with patch set 3 which
has a different parent. I'd run this from the top level of my working
copy.

compare-to-previous-patchset 3

It would fetch patch set 3 from gerrit, rebase it to the same parent as
the current patch set on a detached HEAD and then git diff it with the
current patch set. If there were conflicts, it would just commit the
conflict markers to the commit. There is no attempt to resolve the
conflicts. The script was crude but it helped me out many times and it
was nice to be able to review how conflicts were resolved when those
came up.

Carl

PS In case you're curious, here's my script...

#!/bin/bash

remote=gerrit
previous_patchset=$1; shift

# Assumes we're sitting on the latest patch set.
new_patch_set_id=$(git rev-parse HEAD)

branch=$(git branch | awk '/^\*/ {print$2}')
[ "$branch" = "(no" ] && branch=

# set user, host, port, and project from git config
eval $(echo "$(git config remote.$remote.url)" |
   sed 's,ssh://\(.*\)@\(.*\):\([[:digit:]]*\)/\(.*\).git,user=\1 host=\2 p<

gerrit() {
    ssh $user@$host -p $port gerrit ${1+"$@"}
}

# Grabs a bunch of information from gerrit about the current patch
eval $(gerrit query --current-patch-set $new_patch_set_id |
    awk '
        BEGIN {mode="main"}
        / currentPatchSet:/ { mode="currentPatchSet" }
        / ref:/ { printf "new_patch_ref=%s\n", $2 }
        / number:/ {
            if (mode=="main") {
                printf "review_num=%s\n", $2
            }
            if (mode=="currentPatchSet") {
                printf "new_patchset=%s\n", $2
            }
        }
    ')

# Fetch the old patch set
old_patch_ref=${new_patch_ref%$new_patchset}$previous_patchset
git fetch $remote $old_patch_ref && git checkout FETCH_HEAD

# Rebase the old patch set to the parent of the new patch set.
if ! git rebase HEAD^ --onto ${new_patch_set_id}^
then
    git diff --name-only --diff-filter=U -z | xargs -0 git add
    git rebase --continue
fi

previous_patchset_rebased=$(git rev-parse HEAD)

# Go back to the new patch set and diff it against the rebased old one.
if [ "$branch" ]
then
    git checkout $branch
else
    git checkout $new_patch_set_id
fi
git diff $previous_patchset_rebased

Re: Bring together merge and rebase

2017-12-25 Thread Jacob Keller

On Mon, Dec 25, 2017 at 5:16 PM, Carl Baldwin  wrote:
> Anyway, now I am compelled to use github which is also a fine tool and I
> appreciate all of the work that has gone into it. About 80% of the time,
> I rebase and force push to my branch to update a pull request. I've come
> to like the end product of the rebase workflow. However, github doesn't
> excel at this approach. For one, it doesn't preserve older revisions
> which were already reviewed which makes it difficult for reviewers to
> pick up where they left off the last time. If it preserved them, as
> gerrit does, the reviewer can compare a new revision with the most
> recent older revision they reviewed to see just what has been addressed
> since then.

A bit of a tangent here, but a thought I didn't wanna lose: In the
general case where a patch was rebased and the original parent pointer
was changed, it is actually quite hard to show a diff of what changed
between versions.

The best I've found is to do something like a 4-way --cc merge diff,
which mostly works, but has a few awkward cases, and ends up usually
showing double ++ and -- notation.

Just something I've thought about a fair bit, trying to come up with
some good way to show "what changed between A1 and A2, but ignore all
changes between parent P1 and P2 which you don't care that much about
in this context."

Thanks,
Jake

Re: Bring together merge and rebase

2017-12-25 Thread Jacob Keller

On Mon, Dec 25, 2017 at 4:16 PM, Carl Baldwin  wrote:
> On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> >> But I don't see why you think this needs a new "replaces" parent
>> >> pointer orthogonal to parent pointers, i.e. something that would
>> >> need to be a new field in the commit object (I may have misread the
>> >> proposal, it's not heavy on technical details).
>> >
>> > Just to clarify, I am proposing a new "replaces" pointer in the commit
>> > object. Imagine starting with rebase exactly as it works today. This new
>> > field would be inserted into any new commit created by a rebase command
>> > to reference the original commit on which it was based. Though, I'm not
>> > sure if it would be better to change the behavior of the existing rebase
>> > command, provide a switch or config option to turn it on, or provide a
>> > new command entirely (e.g. git replay or git replace) to avoid
>> > compatibility issues with the existing rebase.
>>
>> Yeah that sounds fine, I thought you meant that this "replaces" field
>> would replace the "parent" field, which would require some rather deep
>> incompatible changes to all git clients.
>>
>> But then I don't get why you think fetch/pull/gc would need to be
>> altered, if it's because you thought that adding arbitrary *new* fields
>> to the commit object would require changes to those that's not the case.
>
> Thank you again for your reply. Following is the kind of commit that I
> would like to create.
>
> tree fcce2f309177c7da9c795448a3e392a137434cf1
> parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
> replaces e8aa79baf6aef573da930a385e4db915187d5187
> author Carl Baldwin  1514057225 -0700
> committer Carl Baldwin  1514058444 -0700
>
> What will happen if I create this today? I assumed git would just choke
> on it but I'm not certain. It has been a long time since I attempted to
> get into the internals of git.
>
> Even if core git code does not simply choke on it, I would like push and
> pull to follow these pointers and transfer the history behind them. I
> assumed that git would not do this today. I would also like gc to
> preserve e8aa79baf6 as if it were referenced by a parent pointer so that
> it doesn't purge it from the history.
>
> I'm currently thinking of an example of the workflow that I'm after in
> response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
> it makes it clearer why I want it this way.
>
> [snip]
>
>> Instead, if I understand what you're actually trying to do, it could
>> also be done as:
>>
>>  1) Just add a new replaces  field to new commit objects
>>
>>  2) Make git-rebase know how to write those, e.g. add two of those
>> pointing to A & B when it squashes them into AB.
>>
>>  3) Write a history traversal mechanism similar to --full-history
>> that'll ignore any commits on branches that yield no changes, or
>> only those whose commits are referenced by this "replaces" field.
>>
>> You'd then end up with:
>>
>>  A) A way to "stash" these commits in the permanent history
>>
>>  B) ... that wouldn't be visible in "git log" by default
>>
>>  C) Would require no underlying changes to the commit model, i.e. it
>> would work with all past & future git clients, if they didn't know
>> about the "replaces" field they'd just show more verbose history.
>
> I get this point. I don't underestimate how difficult making such a
> change to the core model is. I know there are older clients which cannot
> simply be updated. There are also alternate implementations (e.g. jgit)
> that also need to be considered. This is the thing I worry about the
> most. I think at the very least, this new feature will have to be an
> opt-in feature for teams who can easily ensure a minimum version of git
> will be used. Maybe the core.repositoryformatversion config or something
> like that would have to play into it. There may also be some minimal
> amount that could be backported to older clients to at least avoid
> choking on new repos (I know this doesn't guarantee older clients will
> be updated). Just throwing a few ideas out.
>
> I want to be sure that the implications have been explored before giving
> up and doing something external to git.
>
> Carl

What about some way to take the reflog and turn it into a commit-based
linkage and export that? Rather than tying it into the individual
commit history, keep track of it outside the commit, possibly via
something like notes, or some other mechanism.

This also ties into work done by Josh Triplett on git series [1] and
some previous mail discussions that I remember. He had some mechanism
for tracking series history which works ok, but can cause problems you
mentioned when simply adding a second parent commit.

I tend to think some mechanism to store both patch/commit history and
review based comments would be a very useful thing to integrate so
that multiple platforms

Re: Bring together merge and rebase

2017-12-25 Thread Carl Baldwin

On Sun, Dec 24, 2017 at 10:52:15PM -0500, Theodore Ts'o wrote:
> As a suggestion, before diving into the technical details of your
> proposal, it might be useful to consider the usage scenario you are
> targeting.  Things like "git rebase" and "git merge" and your
> proposed "git replace/replay" are *mechanisms*.
> 
> But how they fit into a particular workflow is much more important
> from a design perspective, given that there are many different git
> workflows which are used by different projects, and by different
> developers within a particular project.
> 
> For example, rebase gets used in many different ways, and many of the
> debates when people talk about "git rebase" being evil generally
> presuppose a particular workflow that the advocate has in mind.
> If someone is using git rebase or git commit --amend before git
> commits have ever been pushed out to a public repository, or to anyone
> else, that's a very different case from one where it has been visible
> elsewhere.  Even the most strident advocates of "you must never rewrite a
> commit and all history must be preserved" generally don't insist that
> every single edit must be preserved on the theory that "all history is
> valuable".
> 
> > The git history now has two dimensions. The first shows a cleaned up
> > history where fix ups and code review feedback have been rolled into
> > the original changes and changes can possibly be ordered in a nice
> > linear progression that is much easier to understand. The second
> > drills into the history of a change. There is no loss and you don't
> > change history in a way that will cause problems for others who have
> > the older commits.
> 
> If your goal is to preserve the history of the change, one of the
> problems with any git-centric solution is that you generally lose the
> code review feedback and the discussions that are involved with a
> commit.  Just simply preserving the different versions of the commits
> is going to lose a huge amount of the context that makes the history
> valuable.
> 
> So for example, I would claim that if *that* is your goal, a better
> solution is to use Gerrit, so that all of the different versions of
> the commits are preserved along with the line-by-line comments and
> discussions that were part of the code review.  In that model, each
> commit has something like this in the commit trailer:
> 
> Change-Id: I8d89b33683274451bcd6bfbaf75bce98

Thank you for your reply. I agree that discussing the workflows is very
valuable and I certainly haven't done that justice yet.

Gerrit is the tool that got me thinking about my proposal in the first
place. I spent a few years developing and doing a significant number of
code reviews using it. I've since changed to an environment where I no
longer have it. It turns out that "a better solution is to use Gerrit"
is not helpful to me now because it isn't up to me. Gerrit is not nearly
as ubiquitous as git itself.

In my opinion, Gerrit has shown us the power of the "change". As you
point out, it introduced the change-id embedded into the commit message
and uses it to track a change's progress as a "review." I think these
are powerful concepts and Gerrit did a nice job with them. I guess one
of my goals with my proposal here is to formalize the "change" idea so
that any git-based tool understands it and can interoperate. This is why
I want it in the core git commit object and I want push, pull, gc, and
other commands to understand it.

At this point, you might wonder why I'm not proposing to simply add a
"change-id" to the commit object. The short answer is that the
"change-id" Gerrit uses in the commit messages cannot stand on its own.
It depends on data stored on the server which maintains a relationship
of commits to a review number and a linear ordering of commits within
the review (hopefully I'm not oversimplifying this). The "replaces"
reference is an attempt to make something which can stand on its own. I
don't think we need to solve the problem of where to keep comments at
this point.

An unbroken chain of "replaces" references obviates the need for the
change id in the commit message. From any given commit in the chain, we
can follow the references to the first commit which started the review.
However, the chain is even more useful because it is not limited to a
linear progression of revisions. Let me try to explain how this can
solve some of the most common issues I ran into with the rebase type
workflow.
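
As a rough illustration of following the references back to the first commit, here is a minimal sketch. It assumes the "replaces" links have already been collected into "NEW OLD" pairs, one per line; that pair list and the name first_revision are hypothetical stand-ins, and the sketch only handles the simple case where each commit replaces at most one other.

```shell
#!/bin/sh
# Follow "replaces" links from a commit back to the first revision of a change.
# stdin: one "NEW OLD" pair per line, meaning commit NEW replaces commit OLD.
# $1:    the commit to start from.
# The pair list is an assumed input format, not something git produces today.
first_revision() {
    awk -v cur="$1" '
        { replaces[$1] = $2 }
        END {
            # Stop when the current commit replaces nothing, or on a cycle.
            while ((cur in replaces) && !(cur in seen)) {
                seen[cur] = 1
                cur = replaces[cur]
            }
            print cur
        }
    '
}
```

For example, `printf 'c3 c2\nc2 c1\n' | first_revision c3` walks c3, then c2, and stops at c1.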

Look at what happens in a rebase type workflow in any of the following
scenarios. All of these came up regularly in my time with Gerrit.

1. Make a quick edit through the web UI then later work on the
   change again in your local clone. It is easy to forget to pull
   down the change made through the UI before starting to work on it
   again. If that happens, the change made through the UI will
   almost certainly be clobbered.

2. You or someone else creates a second change that is dependent on

Re: Bring together merge and rebase

2017-12-25 Thread Carl Baldwin

On Sat, Dec 23, 2017 at 11:09:59PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> But I don't see why you think this needs a new "replaces" parent
> >> pointer orthogonal to parent pointers, i.e. something that would
> >> need to be a new field in the commit object (I may have misread the
> >> proposal, it's not heavy on technical details).
> >
> > Just to clarify, I am proposing a new "replaces" pointer in the commit
> > object. Imagine starting with rebase exactly as it works today. This new
> > field would be inserted into any new commit created by a rebase command
> > to reference the original commit on which it was based. Though, I'm not
> > sure if it would be better to change the behavior of the existing rebase
> > command, provide a switch or config option to turn it on, or provide a
> > new command entirely (e.g. git replay or git replace) to avoid
> > compatibility issues with the existing rebase.
> 
> Yeah that sounds fine, I thought you meant that this "replaces" field
> would replace the "parent" field, which would require some rather deep
> incompatible changes to all git clients.
> 
> But then I don't get why you think fetch/pull/gc would need to be
> altered, if it's because you thought that adding arbitrary *new* fields
> to the commit object would require changes to those that's not the case.

Thank you again for your reply. Following is the kind of commit that I
would like to create.

tree fcce2f309177c7da9c795448a3e392a137434cf1
parent b3758d9223b63ebbfbc16c9b23205e42272cd4b9
replaces e8aa79baf6aef573da930a385e4db915187d5187
author Carl Baldwin  1514057225 -0700
committer Carl Baldwin  1514058444 -0700

What will happen if I create this today? I assumed git would just choke
on it but I'm not certain. It has been a long time since I attempted to
get into the internals of git.

Even if core git code does not simply choke on it, I would like push and
pull to follow these pointers and transfer the history behind them. I
assumed that git would not do this today. I would also like gc to
preserve e8aa79baf6 as if it were referenced by a parent pointer so that
it doesn't purge it from the history.
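
To make the layout concrete, here is a small sketch of how a tool might read the proposed header back out of a raw commit object. The "replaces" header is only the proposal under discussion, not something git defines, and the function name replaces_of is made up for the example.

```shell
#!/bin/sh
# Print the object IDs named by "replaces" headers in a raw commit object
# read from stdin. "replaces" is the proposed header, not an existing one.
replaces_of() {
    awk '
        /^$/ { exit }                 # a blank line ends the header block
        /^replaces / { print $2 }     # one object ID per "replaces" line
    '
}
```

With a real repository the input could come from `git cat-file commit <commit>`, which prints the raw headers followed by the message.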

I'm currently thinking of an example of the workflow that I'm after in
response to Theodore Ts'o's message from yesterday. Stay tuned, I hope
it makes it clearer why I want it this way.

[snip]

> Instead, if I understand what you're actually trying to do, it could
> also be done as:
> 
>  1) Just add a new replaces  field to new commit objects
> 
>  2) Make git-rebase know how to write those, e.g. add two of those
> pointing to A & B when it squashes them into AB.
> 
>  3) Write a history traversal mechanism similar to --full-history
> that'll ignore any commits on branches that yield no changes, or
> only those whose commits are referenced by this "replaces" field.
> 
> You'd then end up with:
> 
>  A) A way to "stash" these commits in the permanent history
> 
>  B) ... that wouldn't be visible in "git log" by default
> 
>  C) Would require no underlying changes to the commit model, i.e. it
> would work with all past & future git clients, if they didn't know
> about the "replaces" field they'd just show more verbose history.

I get this point. I don't underestimate how difficult making such a
change to the core model is. I know there are older clients which cannot
simply be updated. There are also alternate implementations (e.g. jgit)
that also need to be considered. This is the thing I worry about the
most. I think at the very least, this new feature will have to be an
opt-in feature for teams who can easily ensure a minimum version of git
will be used. Maybe the core.repositoryformatversion config or something
like that would have to play into it. There may also be some minimal
amount that could be backported to older clients to at least avoid
choking on new repos (I know this doesn't guarantee older clients will
be updated). Just throwing a few ideas out.

I want to be sure that the implications have been explored before giving
up and doing something external to git.

Carl

RE: Bring together merge and rebase

2017-12-25 Thread Randall S. Becker

On December 25, 2017 6:44 PM Carl Baldwin wrote:
> On Sun, Dec 24, 2017 at 12:01:38AM +0100, Johannes Schindelin wrote:
> > On Sat, 23 Dec 2017, Carl Baldwin wrote:
> > > I imagine that a "git commit --amend" would also insert a "replaces"
> > > reference to the original commit but I failed to mention that in my
> > > original post.
> >
> > And cherry-pick, too, of course.
> 
> This brings up a good point. I do think this can be applied to
> cherry-pick, but as someone else pointed out, the name "replaces"
> doesn't seem right in the context of a cherry-pick. So, maybe "replaces"
> is not the right name. I'm open to suggestions.

Just an off the wall suggestion: what about "stitch" or "suture" since this
is now way beyond a band-aid solution (sorry, but only a little). I was
thinking along the lines of "blend" but that seems less graphic and doesn't
apply to cherry-picking.

Holiday Cheers,
Randall

-- Brief whoami: NonStop developer since approximately UNIX(421664400)/NonStop(2112884442)
-- In my real life, I talk too much.

Re: Bring together merge and rebase

2017-12-25 Thread Carl Baldwin

On Sun, Dec 24, 2017 at 12:01:38AM +0100, Johannes Schindelin wrote:
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> 
> > I imagine that a "git commit --amend" would also insert a "replaces"
> > reference to the original commit but I failed to mention that in my
> > original post.
> 
> And cherry-pick, too, of course.

This brings up a good point. I do think this can be applied to
cherry-pick, but as someone else pointed out, the name "replaces"
doesn't seem right in the context of a cherry-pick. So, maybe "replaces"
is not the right name. I'm open to suggestions.

It occurs to me now that the reason that I want a separate, orthogonal
history dimension is that a "replaces" reference does not imply that the
referenced commit is pulled in with all of its history like a "parent"
reference does. It isn't creating a merge commit. It means that the
referenced commit is derived from the other one and, at least in the
context of this branch's main history, renders it obsolete. Given this
definition, I think it applies to a cherry-pick.

> Both of these examples hint at a rather huge urge of some users to turn
> this feature off because the referenced commits may very well be
> throw-away commits in their case, making the newly-recorded information
> completely undesired.

I certainly don't want to make it difficult to get rid of throw-away
commits.

The workflows I'm interested in are mostly around iterating on what will
end up looking like a single commit in the final history. I'm imagining
posting a change, (or changes) somewhere to be reviewed by others.
Others submit feedback and I continue iterating given the feedback. If
certain intermediate throw-away commits have only been seen locally by
the author, they could be squashed into a single minimal new update.

I'm diving deeper into these workflows in my reply to Theodore. To avoid
fragmenting my ideas too much, I'll take the details over to that reply.
I hope to finish that soon.

Carl

Re: Bring together merge and rebase

2017-12-25 Thread Carl Baldwin

On Sat, Dec 23, 2017 at 05:19:35PM -0500, Randall S. Becker wrote:
> No matter how this plays out, let's please make very sure to provide
> sufficient user documentation so that those of us who have to explain
> the differences to users have a decent reference. Even now, explaining
> rebase vs. merge is difficult enough for people new to git to choose
> which to use when (sometimes pummeling is involved to get the point
> across  ), even though it should be intuitive to most of us. I am
> predicting that adding this capability is going to further confuse the
> *new* user community a little. Entirely out of enlighted
> self-interest, I am offering to help document
> (edits/contribution//whatever) this once we get to that point in
> development.

I agree. I have a feeling that it may take a while for this to play out.
This has been on my mind for a while and I think there will be some more
discussion before anything gets started.

Carl

> Something else to consider is how (or if) this capability is going to
> be presented in front-ends and in Cloud services. GitK is a given, of
> course. I'm still impatiently waiting for worktree support from some
> other front-ends.

It all takes time. :)

> Cheers,
> Randall
> 
> -- Brief whoami: NonStop developer since approximately UNIX(421664400)/NonStop(2112884442)
> -- In my real life, I talk too much.

Re: Bring together merge and rebase

2017-12-24 Thread Theodore Ts'o

On Fri, Dec 22, 2017 at 11:10:19PM -0700, Carl Baldwin wrote:
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.

As a suggestion, before diving into the technical details of your
proposal, it might be useful to consider the usage scenario you are
targeting.  Things like "git rebase" and "git merge" and your
proposed "git replace/replay" are *mechanisms*.

But how they fit into a particular workflow is much more important
from a design perspective, given that there are many different git
workflows which are used by different projects, and by different
developers within a particular project.

For example, rebase gets used in many different ways, and many of the
debates when people talk about "git rebase" being evil generally
presuppose a particular workflow that the advocate has in mind.
If someone is using git rebase or git commit --amend before git
commits have ever been pushed out to a public repository, or to anyone
else, that's a very different case from one where it has been visible
elsewhere.  Even the most strident advocates of "you must never rewrite a
commit and all history must be preserved" generally don't insist that
every single edit must be preserved on the theory that "all history is
valuable".

> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.

If your goal is to preserve the history of the change, one of the
problems with any git-centric solution is that you generally lose the
code review feedback and the discussions that are involved with a
commit.  Just simply preserving the different versions of the commits
is going to lose a huge amount of the context that makes the history
valuable.

So for example, I would claim that if *that* is your goal, a better
solution is to use Gerrit, so that all of the different versions of
the commits are preserved along with the line-by-line comments and
discussions that were part of the code review.  In that model, each
commit has something like this in the commit trailer:

Change-Id: I8d89b33683274451bcd6bfbaf75bce98

You can then cut and paste the Change-Id into the Gerrit user
interface, and see the different commits and, more importantly, the
discussion surrounding each change.
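
For illustration, here is a minimal Python sketch of how a tool might
pull such a trailer out of a commit message.  The function name
find_change_id and the sample message are hypothetical; the sketch only
assumes that the trailer sits on its own line in the last paragraph of
the message, as in the example above.

```python
import re
from typing import Optional

# Matches a Gerrit-style trailer line such as "Change-Id: I8d89b336...".
CHANGE_ID_RE = re.compile(r"^Change-Id:\s*(I[0-9a-f]+)\s*$")


def find_change_id(message: str) -> Optional[str]:
    """Return the Change-Id found in the last paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if not paragraphs:
        return None
    # Trailers live in the final paragraph; scan it bottom-up.
    for line in reversed(paragraphs[-1].splitlines()):
        match = CHANGE_ID_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical commit message used only for this example.
message = """ext4: fix a hypothetical bug

Longer explanation of the change.

Signed-off-by: A Developer <dev@example.com>
Change-Id: I8d89b33683274451bcd6bfbaf75bce98
"""
print(find_change_id(message))
```

The returned string is what one would paste into the Gerrit search box.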

If the complaint about Gerrit is that it's not a core part of Git, the
challenge is (a) how to carry the code review comments in the git
repository, and (b) do so in a way that doesn't bloat the core
repository, since most of the time, you *don't* want or need to keep a
local copy of all of the code review comments going back since the
beginning of the project.

-

Here's another potential use case.  The stable kernels (e.g., 3.18.y,
4.4.y, 4.9.y, etc.) have cherry picks from the upstream kernel,
and this is handled by putting in the commit body something like this:

[ Upstream commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe ]

And here's yet another use case.  For internal Google kernel
development, we maintain a kernel that has a large number of patches
on top of a kernel version.  When we backport an upstream fix (say,
one that first appeared in the 4.12 version of the upstream kernel),
we include a line in the commit body that looks like this:

Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5

This is useful, because when we switch to use a newer upstream kernel,
we need to make sure we can account for all patches that were built on
top of the 3xx kernel (which might have been using 4.10, for the sake
of argument), to the 4xx kernel series (which might be using 4.15 ---
the version numbers have been changed to protect the innocent).  This
means going through each and every patch that was on top of the 3xx
kernel, and if it has a line such as "Upstream-4.12-SHA1", we know
that it will already be included in a 4.15 based kernel, so we don't
need to worry about carrying that patch forward.
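
The triage described above can be sketched roughly as follows.  This is
only an illustration under stated assumptions: the helper names
upstream_version and needs_carrying, the sample patches, and the rule
"a trailer version no newer than the new base means the fix is already
included" are mine, not a description of any actual internal tooling.

```python
import re
from typing import List, Optional, Tuple

# Trailer of the form "Upstream-4.12-SHA1: <sha>", as in the example above.
UPSTREAM_RE = re.compile(
    r"^Upstream-(\d+)\.(\d+)-SHA1:\s*([0-9a-f]+)\s*$", re.MULTILINE
)


def upstream_version(message: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) of the upstream release named in the trailer."""
    match = UPSTREAM_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_carrying(message: str, new_base: Tuple[int, int]) -> bool:
    """True if the patch must still be looked at when moving to new_base."""
    version = upstream_version(message)
    # No trailer: a local patch that needs a human decision.
    if version is None:
        return True
    # Backport of a fix that first appeared after the new base: keep it.
    return version > new_base


# Hypothetical patch messages; only the trailer format comes from the text.
patches: List[str] = [
    "fs: backported fix\n\n"
    "Upstream-4.12-SHA1: 5649645d725c73df4302428ee4e02c869248b4c5\n",
    "net: local-only change\n\nNo upstream equivalent yet.\n",
]
to_review = [p.splitlines()[0] for p in patches if needs_carrying(p, (4, 15))]
print(to_review)
```

Only the patch without an upstream trailer is left for manual review
when the new base is 4.15.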

In other cases, we might decide that the patch is no longer needed.
It could be because the patch has already been included upstream, in

Re: Bring together merge and rebase

2017-12-24 Thread Alexei Lozovsky

On Dec 24, 2017, at 01:01, Johannes Schindelin wrote:
> 
> Hi Carl,
> 
> On Sat, 23 Dec 2017, Carl Baldwin wrote:
> 
>> I imagine that a "git commit --amend" would also insert a "replaces"
>> reference to the original commit but I failed to mention that in my
>> original post.
> 
> And cherry-pick, too, of course.

Why would it? In my mind, cherry-picking does not 'replace' or 'refine'
commits, it copies them into other, unrelated branches (usually something
like stable branches maintained separately from the mainline). If anything,
cherry-pick could add a separate "cherry-picked from" reference which may
be useful, I guess, for conflict resolution if two branches with the same
commit are merged.

> Of course, that is only my wish, other users in similar situations may
> want that information. Demonstrating that you would be better served with
> an opt-in feature that uses notes rather than a baked-in commit header.

Using notes also allows testing and evaluating this new feature without
any changes to core git, using it as an extension at first.

Re: Bring together merge and rebase

2017-12-23 Thread Johannes Schindelin

Hi Carl,

On Sat, 23 Dec 2017, Carl Baldwin wrote:

> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my
> original post.

And cherry-pick, too, of course.

Both of these examples hint at a rather huge urge of some users to turn
this feature off because the referenced commits may very well be
throw-away commits in their case, making the newly-recorded information
completely undesired.

Example: I am working on a topic branch. In the middle, I see a typo. I
commit a fix, continue to work on the topic branch. Later, I cherry-pick
that commit to a separate topic branch because I really don't think that
those two topics are related. Now I definitely do not want a reference of
the cherry-picked commit to the original one: the latter will never be
pushed to a public repository, and gc'ed in a few weeks.

Of course, that is only my wish; other users in similar situations may
want that information. That demonstrates that you would be better served
with an opt-in feature that uses notes rather than a baked-in commit header.

Ciao,
Johannes

Re: Bring together merge and rebase

2017-12-23 Thread Johannes Schindelin

Hi Ævar,

On Sat, 23 Dec 2017, Ævar Arnfjörð Bjarmason wrote:

> On Sat, Dec 23 2017, Carl Baldwin jotted:
> 
> > The big contention among git users is whether to rebase or to merge
> > changes [2][3] while iterating. I used to firmly believe that merging
> > was the way to go and rebase was harmful. More recently, I have worked
> > in some environments where I saw rebase used very effectively while
> > iterating on changes and I relaxed my stance a lot. Now, I'm on the
> > fence. I appreciate the strengths and weaknesses of both approaches. I
> > waffle between the two depending on the situation, the tools being
> > used, and I guess, to some extent, my mood.
> >
> > I think what git needs is something brand new that brings the two
> > together and has all of the advantages of both approaches. Let me
> > explain what I've got in mind...
> >
> > I've been calling this proposal `git replay` or `git replace` but I'd
> > like to hear other suggestions for what to name it. It works like
> > rebase except with one very important difference. Instead of orphaning
> > the original commit, it keeps a pointer to it in the commit just like
> > a `parent` entry but calls it `replaces` instead to distinguish it
> > from regular history. In the resulting commit history, following
> > `parent` pointers shows exactly the same history as if the commit had
> > been rebased. Meanwhile, the history of iterating on the change itself
> > is available by following `replaces` pointers. The new commit replaces
> > the old one but keeps it around to record how the change evolved.
> >
> > The git history now has two dimensions. The first shows a cleaned up
> > history where fix ups and code review feedback have been rolled into
> > the original changes and changes can possibly be ordered in a nice
> > linear progression that is much easier to understand. The second
> > drills into the history of a change. There is no loss and you don't
> > change history in a way that will cause problems for others who have
> > the older commits.
> >
> > Replay handles collaboration between multiple authors on a single
> > change. This is difficult and prone to accidental loss when using
> > rebase and it results in a complex history when done with merge. With
> > replay, collaborators could merge while collaborating on a single
> > change and a record of each one's contributions can be preserved.
> > Attempting this level of collaboration caused me many headaches when I
> > worked with the gerrit workflow (which in many ways, I like a lot).
> >
> > I blogged about this proposal earlier this year when I first thought
> > of it [1]. I got busy and didn't think about it for a while. Now with
> > a little time off of work, I've come back to revisit it. The blog
> > entry has a few examples showing how it works and how the history will
> > look in a few examples. Take a look.
> >
> > Various git commands will have to learn how to handle this kind of
> > history. For example, things like fetch, push, gc, and others that
> > move history around and clean out orphaned history should treat
> > anything reachable through `replaces` pointers as precious. Log and
> > related history commands may need new switches to traverse the history
> > differently in different situations. Bisect is an interesting one. I
> > tend to think that bisect should prefer the regular commit history but
> > have the ability to drill into the change history if necessary.
> >
> > In my opinion, this proposal would bring together rebase and merge in
> > a powerful way and could end the contention. Thanks for your
> > consideration.
> >
> > Carl Baldwin
> >
> > [1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
> > [2] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
> > [3] http://changelog.complete.org/archives/586-rebase-considered-harmful
> 
> I think this is a worthwhile thing to implement, there are certainly
> use-cases where you'd like to have your cake & eat it too as it were,
> i.e. have a nice rebased history in "git log", but also have the "raw"
> history for all the reasons the fossil people like to talk about, or for
> some compliance reasons.
> 
> But I don't see why you think this needs a new "replaces" parent pointer
> orthogonal to parent pointers, i.e. something that would need to be a
> new field in the commit object (I may have misread the proposal, it's
> not heavy on technical details).
> 
> Consider a merge use case like this:
> 
>   A---B---C topic
>  / \
> D---E---F---G---H master
> 
> Here we worked on a topic with commits A,B & C, maybe we regret not
> squashing B into A, but it gives us the "raw" history. Instead we might
> rebase it like this:
> 
>   A+B---C topic
>  /
> G---H master
> 
> Now we can push "topic" to master, but as you've noted this loses the
> raw history, but now consider doing this instead:
> 
>   A---B---C   A2+B2---C2 topic
>  / \ /
>

RE: Bring together merge and rebase

2017-12-23 Thread Randall S. Becker

On December 23, 2017 4:02 PM, Carl Baldwin wrote:
> On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > I think this is a worthwhile thing to implement, there are certainly
> > use-cases where you'd like to have your cake & eat it too as it were,
> > i.e. have a nice rebased history in "git log", but also have the "raw"
> > history for all the reasons the fossil people like to talk about, or
> > for some compliance reasons.
> 
> Thank you kindly for your reply. I do think we can have the cake and eat it
> too in this case. At a high level, what you describe above is what I'm after.
> I'm sorry if I left something out or was unclear. I hoped to keep my original
> post brief. Maybe it was too brief to be useful.
> However, I'd like to follow up and be understood.
> 
> > But I don't see why you think this needs a new "replaces" parent
> > pointer orthogonal to parent pointers, i.e. something that would need
> > to be a new field in the commit object (I may have misread the
> > proposal, it's not heavy on technical details).
> 
> Just to clarify, I am proposing a new "replaces" pointer in the commit object.
> Imagine starting with rebase exactly as it works today. This new field would
> be inserted into any new commit created by a rebase command to reference
> the original commit on which it was based. Though, I'm not sure if it would
> be better to change the behavior of the existing rebase command, provide a
> switch or config option to turn it on, or provide a new command entirely (e.g.
> git replay or git replace) to avoid compatibility issues with the existing 
> rebase.
> 
> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my original
> post. The amend use case is similar to adding a fixup commit and then doing
> a squash in interactive mode.
> 
> > Consider a merge use case like this:
> >
> >   A---B---C topic
> >  / \
> > D---E---F---G---H master
> 
> This is a bit different than the use cases that I've had in mind. You show
> that the topic has already merged to master. I have imagined this proposal being
> useful before the topic becomes a part of the master branch. I'm thinking in
> the context of something like a github pull request under active development
> and review or a gerrit review. So, at this point, we still look like this:
> 
>   A---B---C topic
>  /
> D---E---F---G
> 
> > Here we worked on a topic with commits A,B & C, maybe we regret not
> > squashing B into A, but it gives us the "raw" history. Instead we
> > might rebase it like this:
> >
> >   A+B---C topic
> >  /
> > G---H master
> 
> Since H already merged the topic, I'm not sure what the A+B and C commits
> are doing.
> 
> At the point where I have C and G above, let's say I regret not having
> squashed A and B as you suggested. My proposal would end up as I draw
> below where the primes are the new versions of the commits (A' is A+B).
> Bear with me, I'm not sure of the best way to draw this in ascii. It has that
> orthogonal dimension that makes the ascii drawings a little more
> complex: (I left out the parent of A' which is still E)
> 
>A--B---C
> \ |\<- "replaces" rather than "parent"
>  -A'C' topic
>  /
> D---E---F---G master
> 
> We can continue by actually changing the base. All of these commits are
> kept, I just drop them from the drawings to avoid getting too complex.
> 
> A'--C'
>  \   \  <- "replaces" rather than "parent"
>   A"--C" topic
>  /
> D---E---F---G master
> 
> Normal git log operations would ignore them by default. When finally
> merging to master, it ends up very simple (by default) but the history is
> still there to support archaeological operations.
> 
> D---E---F---G---A"--C" master
> 
> > Now we can push "topic" to master, but as you've noted this loses the
> > raw history, but now consider doing this instead:
> >
> >   A---B---C   A2+B2---C2 topic
> >  / \ /
> > D---E---F---G---G master
> 
> There are two Gs in this drawing. Should the second be H? Sorry, I'm just
> trying to understand the use case you're describing and I don't
> understand it yet which makes it difficult to comment on the rest of your
> reply.
> 
> > I.e. you could have started working on commit A/B/C, now you "git
> > replace" them (which would be some fancy rebase alias), and what it'll
> > do is create a merge commit that entirely resolves the conflict so
> > that the tree is equivalent to what "master" was already at. Then you
> > rewrite them and re-apply them on top.
> >
> > If you run "git log" it will already ignore A,B,C unless you specify
> > --full-history, so git already knows to ignore these sort of side
> > histories that result in no changes on the branch they got

Re: Bring together merge and rebase

2017-12-23 Thread Ævar Arnfjörð Bjarmason


On Sat, Dec 23 2017, Carl Baldwin jotted:

> On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> I think this is a worthwhile thing to implement, there are certainly
>> use-cases where you'd like to have your cake & eat it too as it were,
>> i.e. have a nice rebased history in "git log", but also have the "raw"
>> history for all the reasons the fossil people like to talk about, or for
>> some compliance reasons.
>
> Thank you kindly for your reply. I do think we can have the cake and eat
> it too in this case. At a high level, what you describe above is what
> I'm after. I'm sorry if I left something out or was unclear. I hoped to
> keep my original post brief. Maybe it was too brief to be useful.
> However, I'd like to follow up and be understood.
>
>> But I don't see why you think this needs a new "replaces" parent pointer
>> orthogonal to parent pointers, i.e. something that would need to be a
>> new field in the commit object (I may have misread the proposal, it's
>> not heavy on technical details).
>
> Just to clarify, I am proposing a new "replaces" pointer in the commit
> object. Imagine starting with rebase exactly as it works today. This new
> field would be inserted into any new commit created by a rebase command
> to reference the original commit on which it was based. Though, I'm not
> sure if it would be better to change the behavior of the existing rebase
> command, provide a switch or config option to turn it on, or provide a
> new command entirely (e.g. git replay or git replace) to avoid
> compatibility issues with the existing rebase.

Yeah that sounds fine, I thought you meant that this "replaces" field
would replace the "parent" field, which would require some rather deep
incompatible changes to all git clients.

But then I don't get why you think fetch/pull/gc would need to be
altered, if it's because you thought that adding arbitrary *new* fields
to the commit object would require changes to those that's not the case.
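
As a rough sketch of the parsing side only, the Python below reads a raw
commit as header lines, a blank line, then the message, and simply keeps
any header it does not recognise.  The object IDs are made up, the
"replaces" header is the hypothetical field under discussion rather than
anything git defines, and multi-line headers are ignored for brevity.
The sketch says nothing about reachability.

```python
from typing import Dict, List, Tuple


def parse_commit(raw: str) -> Tuple[Dict[str, List[str]], str]:
    """Split a raw commit into (headers, message), keeping unknown headers."""
    header_text, _, message = raw.partition("\n\n")
    headers: Dict[str, List[str]] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        # A key may repeat (e.g. several "parent" lines), so collect a list.
        headers.setdefault(key, []).append(value)
    return headers, message


# Hypothetical object IDs; "replaces" is the proposed field, not a real one.
raw_commit = (
    "tree 1111111111111111111111111111111111111111\n"
    "parent 2222222222222222222222222222222222222222\n"
    "replaces 3333333333333333333333333333333333333333\n"
    "author A Developer <dev@example.com> 1514000000 +0000\n"
    "committer A Developer <dev@example.com> 1514000000 +0000\n"
    "\n"
    "Squash B into A\n"
)
headers, message = parse_commit(raw_commit)
print(headers["parent"], headers.get("replaces"))
```

A reader that only looks up "tree" and "parent" gets the same answers
whether or not the extra line is present.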

> I imagine that a "git commit --amend" would also insert a "replaces"
> reference to the original commit but I failed to mention that in my
> original post. The amend use case is similar to adding a fixup commit
> and then doing a squash in interactive mode.
>
>> Consider a merge use case like this:
>>
>>   A---B---C topic
>>  / \
>> D---E---F---G---H master
>
> This is a bit different than the use cases that I've had in mind. You
> show that the topic has already merged to master. I have imagined this
> proposal being useful before the topic becomes a part of the master
> branch. I'm thinking in the context of something like a github pull
> request under active development and review or a gerrit review. So, at
> this point, we still look like this:
>
>   A---B---C topic
>  /
> D---E---F---G

Right, I'm just mentioning this for context, i.e. "if you only used
git-merge".

>> Here we worked on a topic with commits A,B & C, maybe we regret not
>> squashing B into A, but it gives us the "raw" history. Instead we might
>> rebase it like this:
>>
>>   A+B---C topic
>>  /
>> G---H master
>
> Since H already merged the topic, I'm not sure what the A+B and C
> commits are doing.

This means that master is at commit H, but your newly rebased topic is
at C, i.e. master has no new commits so you could `git push origin
C:master` without -f.
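
The "no -f needed" condition is the usual fast-forward check: the
current tip of master must be an ancestor of the commit being pushed.
A minimal Python sketch, assuming a graph in which the rebased topic
sits directly on top of H (the commit X and the function name
is_ancestor are made up for the example):

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], old: str, new: str) -> bool:
    """True if `old` is reachable from `new` by following parent pointers."""
    stack, seen = [new], set()
    while stack:
        commit = stack.pop()
        if commit == old:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


# Assumed graph: the rebased topic (A+B, C) sits on top of master's tip H,
# while the hypothetical commit X was built on G and does not contain H.
parents = {"H": ["G"], "A+B": ["H"], "C": ["A+B"], "X": ["G"]}
can_push_c = is_ancestor(parents, "H", "C")  # pushing C fast-forwards master
can_push_x = is_ancestor(parents, "H", "X")  # pushing X would need -f
print(can_push_c, can_push_x)
```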

> At the point where I have C and G above, let's say I regret not having
> squashed A and B as you suggested. My proposal would end up as I draw
> below where the primes are the new versions of the commits (A' is A+B).
> Bear with me, I'm not sure of the best way to draw this in ascii. It has
> that orthogonal dimension that makes the ascii drawings a little more
> complex: (I left out the parent of A' which is still E)
>
>A--B---C
> \ |\<- "replaces" rather than "parent"
>  -A'C' topic
>  /
> D---E---F---G master
>
> We can continue by actually changing the base. All of these commits are
> kept, I just drop them from the drawings to avoid getting too complex.
>
> A'--C'
>  \   \  <- "replaces" rather than "parent"
>   A"--C" topic
>  /
> D---E---F---G master
>
> Normal git log operations would ignore them by default. When finally
> merging to master, it ends up very simple (by default) but the history
> is still there to support archaeological operations.
>
> D---E---F---G---A"--C" master
>
>> Now we can push "topic" to master, but as you've noted this loses the
>> raw history, but now consider doing this instead:
>>
>>   A---B---C   A2+B2---C2 topic
>>  / \ /
>> D---E---F---G---G master
>
> There are two Gs in this drawing. Should the second be H? Sorry, I'm
> just trying to understand the use case you're describing and I don't
> understand it yet which makes it difficult to

Re: Bring together merge and rebase

2017-12-23 Thread Carl Baldwin

On Sat, Dec 23, 2017 at 07:59:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
> I think this is a worthwhile thing to implement, there are certainly
> use-cases where you'd like to have your cake & eat it too as it were,
> i.e. have a nice rebased history in "git log", but also have the "raw"
> history for all the reasons the fossil people like to talk about, or for
> some compliance reasons.

Thank you kindly for your reply. I do think we can have the cake and eat
it too in this case. At a high level, what you describe above is what
I'm after. I'm sorry if I left something out or was unclear. I hoped to
keep my original post brief. Maybe it was too brief to be useful.
However, I'd like to follow up and be understood.

> But I don't see why you think this needs a new "replaces" parent pointer
> orthogonal to parent pointers, i.e. something that would need to be a
> new field in the commit object (I may have misread the proposal, it's
> not heavy on technical details).

Just to clarify, I am proposing a new "replaces" pointer in the commit
object. Imagine starting with rebase exactly as it works today. This new
field would be inserted into any new commit created by a rebase command
to reference the original commit on which it was based. Though, I'm not
sure if it would be better to change the behavior of the existing rebase
command, provide a switch or config option to turn it on, or provide a
new command entirely (e.g. git replay or git replace) to avoid
compatibility issues with the existing rebase.

I imagine that a "git commit --amend" would also insert a "replaces"
reference to the original commit but I failed to mention that in my
original post. The amend use case is similar to adding a fixup commit
and then doing a squash in interactive mode.

> Consider a merge use case like this:
> 
>   A---B---C topic
>  / \
> D---E---F---G---H master

This is a bit different than the use cases that I've had in mind. You
show that the topic has already merged to master. I have imagined this
proposal being useful before the topic becomes a part of the master
branch. I'm thinking in the context of something like a github pull
request under active development and review or a gerrit review. So, at
this point, we still look like this:

      A---B---C topic
     /
D---E---F---G

> Here we worked on a topic with commits A,B & C, maybe we regret not
> squashing B into A, but it gives us the "raw" history. Instead we might
> rebase it like this:
> 
>   A+B---C topic
>  /
> G---H master

Since H already merged the topic, I'm not sure what the A+B and C
commits are doing.

At the point where I have C and G above, let's say I regret not having
squashed A and B as you suggested. My proposal would end up as I draw
below where the primes are the new versions of the commits (A' is A+B).
Bear with me, I'm not sure of the best way to draw this in ascii. It has
that orthogonal dimension that makes the ascii drawings a little more
complex: (I left out the parent of A' which is still E)

   A--B---C
\ |\<- "replaces" rather than "parent"
 -A'C' topic
 /
D---E---F---G master

We can continue by actually changing the base. All of these commits are
kept, I just drop them from the drawings to avoid getting too complex.

A'--C'
 \   \  <- "replaces" rather than "parent"
  A"--C" topic
 /
D---E---F---G master

Normal git log operations would ignore them by default. When finally
merging to master, it ends up very simple (by default) but the history
is still there to support archaeological operations.

D---E---F---G---A"--C" master
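
To make the two dimensions concrete, here is a small Python model of the
example.  It is only a sketch: the Commit class and the walk function
are made up for illustration, and the graph encodes my reading of the
drawings above (A' has parent E and replaces A and B, C' replaces C, and
the double-primed commits are the same changes rebased onto G).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    name: str
    parents: List[str] = field(default_factory=list)
    replaces: List[str] = field(default_factory=list)  # proposed pointer


def walk(commits: Dict[str, Commit], start: str, attr: str) -> List[str]:
    """Depth-first walk from start following only `attr` pointers."""
    seen: List[str] = []
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.append(name)
        # Push in reverse so the first pointer is visited first.
        stack.extend(reversed(getattr(commits[name], attr)))
    return seen


graph = {c.name: c for c in [
    Commit("D"), Commit("E", ["D"]), Commit("F", ["E"]), Commit("G", ["F"]),
    Commit("A", ["E"]), Commit("B", ["A"]), Commit("C", ["B"]),
    Commit("A'", ["E"], replaces=["A", "B"]),
    Commit("C'", ["A'"], replaces=["C"]),
    Commit('A"', ["G"], replaces=["A'"]),
    Commit('C"', ['A"'], replaces=["C'"]),
]}

clean_history = walk(graph, 'C"', "parents")    # the cleaned-up history
change_history = walk(graph, 'A"', "replaces")  # how one change evolved
print(clean_history)
print(change_history)
```

Following only parents never reaches A, B, C or the single-primed
commits, while following only replaces from A" reaches A', A and B.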

> Now we can push "topic" to master, but as you've noted this loses the
> raw history, but now consider doing this instead:
> 
>   A---B---C   A2+B2---C2 topic
>  / \ /
> D---E---F---G---G master

There are two Gs in this drawing. Should the second be H? Sorry, I'm
just trying to understand the use case you're describing and I don't
understand it yet which makes it difficult to comment on the rest of
your reply.

> I.e. you could have started working on commit A/B/C, now you "git
> replace" them (which would be some fancy rebase alias), and what it'll
> do is create a merge commit that entirely resolves the conflict so that
> the tree is equivalent to what "master" was already at. Then you rewrite
> them and re-apply them on top.
> 
> If you run "git log" it will already ignore A,B,C unless you specify
> --full-history, so git already knows to ignore these sort of side
> histories that result in no changes on the branch they got merged
> into. I don't know about bisect, but if it's not doing something similar
> already it would be easy to make it do so.

I haven't had the need to use --full-history much. Let me see if I can
play around with it to see if I can figure out how to use it in a way

Re: Bring together merge and rebase

2017-12-23 Thread Ævar Arnfjörð Bjarmason


On Sat, Dec 23 2017, Carl Baldwin jotted:

> The big contention among git users is whether to rebase or to merge
> changes [2][3] while iterating. I used to firmly believe that merging
> was the way to go and rebase was harmful. More recently, I have worked
> in some environments where I saw rebase used very effectively while
> iterating on changes and I relaxed my stance a lot. Now, I'm on the
> fence. I appreciate the strengths and weaknesses of both approaches. I
> waffle between the two depending on the situation, the tools being
> used, and I guess, to some extent, my mood.
>
> I think what git needs is something brand new that brings the two
> together and has all of the advantages of both approaches. Let me
> explain what I've got in mind...
>
> I've been calling this proposal `git replay` or `git replace` but I'd
> like to hear other suggestions for what to name it. It works like
> rebase except with one very important difference. Instead of orphaning
> the original commit, it keeps a pointer to it in the commit just like
> a `parent` entry but calls it `replaces` instead to distinguish it
> from regular history. In the resulting commit history, following
> `parent` pointers shows exactly the same history as if the commit had
> been rebased. Meanwhile, the history of iterating on the change itself
> is available by following `replaces` pointers. The new commit replaces
> the old one but keeps it around to record how the change evolved.
>
> The git history now has two dimensions. The first shows a cleaned up
> history where fix ups and code review feedback have been rolled into
> the original changes and changes can possibly be ordered in a nice
> linear progression that is much easier to understand. The second
> drills into the history of a change. There is no loss and you don't
> change history in a way that will cause problems for others who have
> the older commits.
>
> Replay handles collaboration between multiple authors on a single
> change. This is difficult and prone to accidental loss when using
> rebase and it results in a complex history when done with merge. With
> replay, collaborators could merge while collaborating on a single
> change and a record of each one's contributions can be preserved.
> Attempting this level of collaboration caused me many headaches when I
> worked with the gerrit workflow (which in many ways, I like a lot).
>
> I blogged about this proposal earlier this year when I first thought
> of it [1]. I got busy and didn't think about it for a while. Now with
> a little time off of work, I've come back to revisit it. The blog
> entry has a few examples showing how it works and how the history will
> look in a few examples. Take a look.
>
> Various git commands will have to learn how to handle this kind of
> history. For example, things like fetch, push, gc, and others that
> move history around and clean out orphaned history should treat
> anything reachable through `replaces` pointers as precious. Log and
> related history commands may need new switches to traverse the history
> differently in different situations. Bisect is an interesting one. I
> tend to think that bisect should prefer the regular commit history but
> have the ability to drill into the change history if necessary.
>
> In my opinion, this proposal would bring together rebase and merge in
> a powerful way and could end the contention. Thanks for your
> consideration.
>
> Carl Baldwin
>
> [1] http://blog.episodicgenius.com/post/merge-or-rebase--neither/
> [2] https://git-scm.com/book/en/v2/Git-Branching-Rebasing
> [3] http://changelog.complete.org/archives/586-rebase-considered-harmful

I think this is a worthwhile thing to implement, there are certainly
use-cases where you'd like to have your cake & eat it too as it were,
i.e. have a nice rebased history in "git log", but also have the "raw"
history for all the reasons the fossil people like to talk about, or for
some compliance reasons.

But I don't see why you think this needs a new "replaces" parent pointer
orthogonal to parent pointers, i.e. something that would need to be a
new field in the commit object (I may have misread the proposal, it's
not heavy on technical details).

Consider a merge use case like this:

      A---B---C topic
     /         \
D---E---F---G---H master

Here we worked on a topic with commits A,B & C, maybe we regret not
squashing B into A, but it gives us the "raw" history. Instead we might
rebase it like this:

  A+B---C topic
 /
G---H master

Now we can push "topic" to master, but as you've noted this loses the
raw history, but now consider doing this instead:

  A---B---C   A2+B2---C2 topic
 / \ /
D---E---F---G---G master

I.e. you could have started working on commit A/B/C, now you "git
replace" them (which would be some fancy rebase alias), and what it'll
do is create a merge commit that entirely resolves the conflict so that
the tree is equivalent to what
