Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Fredrik Kuivinen
On Fri, Aug 26, 2005 at 04:48:32PM -0400, Daniel Barkalow wrote:
 On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:
 
  I will try to describe how the algorithm works. The problem with the
  usual 3-way merge algorithm is that we sometimes do not have a unique
  common ancestor. In [1] B and C seems to be equally good. What this
  algorithm does is to _merge_ the common ancestors, in this case B and
  C, into a temporary tree lets call it T. It does then use this
  temporary tree T as the common ancestor for D and E to produce the
  final merge result. In the case described in [1] this will work out
  fine and we get a clean merge with the expected result.
 
 The only problem I can see with this is that it's likely to generate
 conflicts between the shared heads, and the user is going to be confused
 trying to resolve them, because the files with the conflicts will be
 missing all of the more recent changes.

I don't actually think that conflicts between shared heads is a
problem. Given the criss-cross case (we want to merge A and B into M):

 M
 |\
 | \
 A  B
 |\/|
 |/\|
 C  D
 | /
 |/
 E

Lets assume there is a merge conflict if we try to merge C and D
(which are the two shared heads). Then both A and B must resolve this
conflict. If they have done it in the same way we wont get a merge
conflict at M, if they have resolved it differently we will get a
merge conflict. In the first case there is no merge conflict at M, in
the second case the user has to pick which one of the two different
resolutions she wants.

Note that the algorithm will happily write non-clean merge results to
the object database during the merge shared heads stage. Hence, when
we are merging C and D internally we will _not_ ask the user to
resolve any eventual merge conflicts.

 Other than that, I think it should
 give the right answer, although it will presumably involve a lot of
 ancient history doing the internal merge. (Which would probably be really
 painful if you've got two branches that cross-merge regularly and never
 actually completely sync)

The expensive part is the repeated merging. But as I wrote in my mail
multiple shared heads seems to be pretty uncommon. As far as I can
tell there is no reason for the number of shared heads to increase as
a repository grows larger. However, this do probably depend on usage
patterns.

But it is certainly possible to construct cases with an arbitrary
number of shared heads. In which case the algorithm will be a bit
slow, at least if there are real content merges going on which are
handled by merge(1).

 I'm getting pretty close to having a version of read-tree that does the
 trivial merge portion based comparing the sides against all of the shared
 heads. I think yours will be better for the cases we've identified, giving
 the correct answer for Tony's case rather than reporting a conflict, but
 it's clearly more complicated. I think my changes are worthwhile anyway,
 since they make the merging logic more central, but obviously
 insufficient.
 
 I've been thinking that could be useful to have read-tree figure out the
 history itself, instead of being passed ancestors, in which case it could
 use your method, except more efficiently (and only look further at the
 history when needed).

It will be interesting to have a look at the code when you are done.
I find the Git architecture with respect to merging to be quite
nice. A core which handles the simple cases _fast_ and let the more
complicated cases be handled by someone else.

- Fredrik
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Linus Torvalds


On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:

 In real numbers it is as follows: In Linus' kernel tree there are
 5996 commits. 400 of those have more than one parent. Of those 400
 merge commits 4 have more than one shared head.

Ok, that's already interesting in itself. I was wanting to re-run all the 
merges with the new git-merge-base -a to see which merges might have had 
different merge bases, and you've actually done that. Interesting to see 
the numbers.

 * Is it worth it? That is, is the added complexity in the merge logic
   worth the advantages of correctly handling some strange (but real
   life) merge cases?

I am of two minds on this. I hate the notion of a more complex merge. But
at the same time, it clearly is a very interesting case when we do have
multiple possible shared parents, and I think that at the very least we
should warn the user. And using a more complex merge algorithm when it
happens seems to be a very valid thing to do.

Also, it's possible that other developers see more of the criss-crossing
merges than I do - iow, they're probably more likely to happen in the
throw-away trees than in some of the main trees. Neither of the two cases
we've seen and had issues were merges I did, for example. Which means that
your 1% of all merges number is probably low. Of course, it's quite
likely that in most cases, the pick either one approach will give the
exact same result as the more complex merge.

Using python, which people have less exposure to, sounds like an 
additional thorny issue..

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Junio C Hamano
Linus Torvalds [EMAIL PROTECTED] writes:

 On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:

 In real numbers it is as follows: In Linus' kernel tree there are
 5996 commits. 400 of those have more than one parent. Of those 400
 merge commits 4 have more than one shared head.

 Ok, that's already interesting in itself. I was wanting to re-run all the 
 merges with the new git-merge-base -a to see which merges might have had 
 different merge bases, and you've actually done that. Interesting to see 
 the numbers.

Fredrik, mind giving us the commit ID of those four for us to
take a look at them?

 I am of two minds on this. I hate the notion of a more complex merge. But
 at the same time, it clearly is a very interesting case when we do have
 multiple possible shared parents, and I think that at the very least we
 should warn the user. And using a more complex merge algorithm when it
 happens seems to be a very valid thing to do.

I agree.  GIT is lightening fast for trivial cases and we can
afford to spend more time to handle more complex ones carefully,
at the same time telling the user what we are doing is a good thing.

 Using python, which people have less exposure to, sounds like an 
 additional thorny issue..

Not too big a problem for me to follow the patch ;-).


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)

2005-08-26 Thread Fredrik Kuivinen
On Fri, Aug 26, 2005 at 06:08:33PM -0700, Junio C Hamano wrote:
 Linus Torvalds [EMAIL PROTECTED] writes:
 
  On Fri, 26 Aug 2005, Fredrik Kuivinen wrote:
 
  In real numbers it is as follows: In Linus' kernel tree there are
  5996 commits. 400 of those have more than one parent. Of those 400
  merge commits 4 have more than one shared head.
 
  Ok, that's already interesting in itself. I was wanting to re-run all the 
  merges with the new git-merge-base -a to see which merges might have had 
  different merge bases, and you've actually done that. Interesting to see 
  the numbers.
 
 Fredrik, mind giving us the commit ID of those four for us to
 take a look at them?

Sure, they are:
467ca22d3371f132ee225a5591a1ed0cd518cb3d which has two shared heads
7e2987503dda95a5f80290bb8c06279009c2419e and
eff910a91ac04ab1d9e210d4f721484af3b39c8d

0e396ee43e445cb7c215a98da4e76d0ce354d9d7 with the heads
462cee296476278acaa54c41925b3273e0e4dd40 and
8be3de3fd8469154a2b3e18a4712032dac5b4a53

3190186362466658f01b2e354e639378ce07e1a9 with
38d84c3bd6dd22bdb1f797c87006931133d71aea and
46906c4415f88cebfad530917bada0835d651824

and finally

da28c12089dfcfb8695b6b555cdb8e03dda2b690 with
9e566d8bd61f939b7f5d7d969f5b178571471cf9 and
18190cc08d70a6ec8ef69f0f6ede021f7cb3f9b8

(The script which finds those commits also prints out the commit id of
any octopus commits. There is one commit with more than two parents in
the kernel repository,
13e652800d1644dfedcd0d59ac95ef0beb7f3165. However, it only looks like
that commit has three parents, two of them are actually identical.)

  I am of two minds on this. I hate the notion of a more complex merge. But
  at the same time, it clearly is a very interesting case when we do have
  multiple possible shared parents, and I think that at the very least we
  should warn the user. And using a more complex merge algorithm when it
  happens seems to be a very valid thing to do.
 
 I agree.  GIT is lightening fast for trivial cases and we can
 afford to spend more time to handle more complex ones carefully,
 at the same time telling the user what we are doing is a good thing.
 
  Using python, which people have less exposure to, sounds like an 
  additional thorny issue..
 
 Not too big a problem for me to follow the patch ;-).
 

Good to know :)

I mainly wanted to try the idea with the new algorithm, so some kind
of high level language seemed like a good choice.

- Fredrik
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html