Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Fredrik Kuivinen
On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
 On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
 
  Of the 500 merge commits that currently exists in the kernel
  repository 19 produces non-clean merges with git-merge-script. The
  four merge cases listed in
  [EMAIL PROTECTED] are cleanly merged by
  git-merge-script. Every merge commit which is cleanly merged by
  git-resolve-script is also cleanly merged by git-merge-script,
  furthermore the results are identical. There are currently two merges
  in the kernel repository which are not cleanly merged by
  git-resolve-script but are cleanly merged by git-merge-script.
 
 If you use my read-tree and change git-resolve-script to pass all of the 
 ancestors to it, how does it do? I expect you'll still be slightly ahead, 
 because we don't (yet) have content merge with multiple ancestors. You 
 should also check the merge that Tony Luck reported, which undid a revert, 
 as well as the one that Len Brown reported around the same time which had 
 similar problems. I think maintainer trees are a much better test for a 
 merge algorithm, because the kernel repository is relatively linear, while 
 maintainers tend more to merge things back and forth.

Junio tested some of the multiple common ancestor cases with your
version of read-tree and reported his results in
[EMAIL PROTECTED].

The two cases my algorithm merges cleanly and git-resolve-script do
not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
ancestors. The second one have, as far as I know, not been tested with
your read-tree.

The merge cases reported by Tony Luck and Len Brown are both cleanly
merged by my code.

You are probably right about the maintainer trees. I should have a
look at some of them. Do you know any specific repositories with
interesting merge cases?

- Fredrik
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

 On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
  On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
  
   Of the 500 merge commits that currently exists in the kernel
   repository 19 produces non-clean merges with git-merge-script. The
   four merge cases listed in
   [EMAIL PROTECTED] are cleanly merged by
   git-merge-script. Every merge commit which is cleanly merged by
   git-resolve-script is also cleanly merged by git-merge-script,
   furthermore the results are identical. There are currently two merges
   in the kernel repository which are not cleanly merged by
   git-resolve-script but are cleanly merged by git-merge-script.
  
  If you use my read-tree and change git-resolve-script to pass all of the 
  ancestors to it, how does it do? I expect you'll still be slightly ahead, 
  because we don't (yet) have content merge with multiple ancestors. You 
  should also check the merge that Tony Luck reported, which undid a revert, 
  as well as the one that Len Brown reported around the same time which had 
  similar problems. I think maintainer trees are a much better test for a 
  merge algorithm, because the kernel repository is relatively linear, while 
  maintainers tend more to merge things back and forth.
 
 Junio tested some of the multiple common ancestor cases with your
 version of read-tree and reported his results in
 [EMAIL PROTECTED].

Oh, right. I'm clearly not paying enough attention here.

 The two cases my algorithm merges cleanly and git-resolve-script do
 not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
 ancestors. The second one have, as far as I know, not been tested with
 your read-tree.

Okay, I'll have to check whether the result I get seems right. I take it 
your result agrees with what the users actually produced by hand?

 The merge cases reported by Tony Luck and Len Brown are both cleanly
 merged by my code.

Do they come out correctly? Both of those have cases which cannot be 
decided correctly with only the ancestor trees, due to one branch 
reverting a patch that was only in one ancestor. The correct result is to 
revert that patch, but figuring out that requires looking at more trees. I 
think your algorithm should work for this case, but it would be good to 
have verification. (IIRC, Len got the correct result while Tony got the 
wrong result and then corrected it later.)

 You are probably right about the maintainer trees. I should have a
 look at some of them. Do you know any specific repositories with
 interesting merge cases?

Not especially, except that I would guess that people who have reported 
hitting bad cases would be more likely to have other interesting merges in 
their trees. You might also try merging maintainer trees with each other, 
since it's relatively likely that there would be complicating overlap that 
only doesn't cause confusion because things get rearranged in -mm. For 
that matter, I bet you'd get plenty of test cases out of trying to 
replicate -mm as a git tree.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Chuck Lever [EMAIL PROTECTED] writes:

 Junio C Hamano wrote:
 Chuck Lever [EMAIL PROTECTED] writes:

 once the list implementation is working well, i plan to go back and 
 replace it with something more sophisticated which can perform 
 insertion, deletion, and searching efficiently.

Well, what I wanted to see was if we can get away without
introducing more sophisticated data structures, which would add
complexity.

 I haven't timed it like you did, but my gut feeling is that the
 most wastage during the merge is coming from having to move
 entries because we insert into or remove from the same
 active-cache array.

 the merge actually replaces entries in place, so it is already fairly 
 efficient.  the wastage in the merge case arises from the many list 
 insertions done by read_cache, all of which involve moving some of the 
 active cache array.

Yes, the reading of three trees upfront is probably the culprit
in your case, and I think that is something Daniel's read-tree
rewrite can help.  I still see some remove_cache_entry_at()
there because it works on the active_cache in place, but the
structure of the new code should be much easier to make the kind
of modification we are talking about.

 I think what Daniel
 did to read-tree recently, still in the proposed updates branch,
 would make this kind of change far easier to implement.

 as i'm new to git development, i wasn't aware of the proposed branch.  i 
 will see if i can take a look (or send me a pointer to the list archives 
 if he has posted them here).

If you already have a local copy of git.git repository you work in:

git fetch http://kernel.org/pub/scm/git/git.git +pu:refs/heads/ko-pu
git checkout -f ko-pu

Otherwise

git clone http://kernel.org/pub/scm/git/git.git peek
cd peek
git checkout -f pu

The interesting change from the mainline is in read-tree.c

You can browse http://kernel.org/git/ and see commitdiffs in the
pu branch if you are not interested in slurping the whole
proposed updates branch.

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Junio C Hamano wrote:
 
 Yes, the reading of three trees upfront is probably the culprit
 in your case

However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.

Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.

But remember what the most common use for the index is - it's the single 
tree case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.

So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a next pointer to the cache entries and 
not allow the in-place thing that read_cache() does).

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Linus Torvalds [EMAIL PROTECTED] writes:

 So the data structure is optimized for a different case than reading in 
 trees. Big deal. That optimization is definitely worth it: it allows us to 
 do the read_cache() with the actual index entries being totally read-only 
 (a linked list would have to add a next pointer to the cache entries and 
 not allow the in-place thing that read_cache() does).

Yes, you are right.  What is being discussed is to help
read-tree.c while keeping that nice property.

I think Daniel's patch is going in the right direction.  It can
be told to populate the resulting cache from scratch by reading
the current cache and the trees being read, instead of inserting
and removing the current cache as it does right now.  Once that
is done, we would not have to do repeated memmove to insert and
delete an entry one at a time anymore.


-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Chuck Lever

Linus Torvalds wrote:


On Thu, 8 Sep 2005, Junio C Hamano wrote:


Yes, the reading of three trees upfront is probably the culprit
in your case



However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.


in my case the merges were taking significantly longer than a half 
second.  making this change is certainly not worth it if merges are 
running fast...


Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.


But remember what the most common use for the index is - it's the single 
tree case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.


read_cache is fast as implemented.  the issue is that the next thing 
that is often done is a cache insert, which requires a memmove, which is 
slow.


So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a next pointer to the cache entries and 
not allow the in-place thing that read_cache() does).


they are still read-only with my linked list implementation.
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:[EMAIL PROTECTED]
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard



Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Chuck Lever wrote:
 
 in my case the merges were taking significantly longer than a half 
 second.  making this change is certainly not worth it if merges are 
 running fast...

Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full git pull will do the diffstat 
etc, and then the expense ends up being in the actual git diff part.

So read-tree itself may be half a second, but a merge ends up having other 
parts.

 they are still read-only with my linked list implementation.

Btw, in the sparse project, we have this really smart pointer list data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.

Any interest?

Linus
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Chuck Lever

Linus Torvalds wrote:


On Thu, 8 Sep 2005, Chuck Lever wrote:

in my case the merges were taking significantly longer than a half 
second.  making this change is certainly not worth it if merges are 
running fast...



Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full git pull will do the diffstat 
etc, and then the expense ends up being in the actual git diff part.


So read-tree itself may be half a second, but a merge ends up having other 
parts.


i measured this using the following test...

i have a linux kernel git repository under control of stgit and it has 
about 70 patches in it.  i did an stg status to heat the page cache. 
i popped all the patches, then did an stg pull origin.


i started oprofile, and did an stg push -a.  it took about 9 minutes.

i stopped oprofile and looked at the results.  roughly 65% of the total 
CPU time was spent in libc:memmove.  after instrumenting git, i 
determined that in the stg push case the critical memmove was the one 
in add_cache_entry.


note that i'm not saying that the 9 minutes of wall clock time was 
entirely due to CPU... catalin has been steadily improving stg push so 
this time has shortened by more than half, recently.  but i do notice 
that working in a kernel repository is significantly slower than working 
in my git or stgit source repositories, which are smaller by far.  the 
small repositories behave just as i expect, the tools respond quite 
snappily.



they are still read-only with my linked list implementation.


Btw, in the sparse project, we have this really smart pointer list data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.


Any interest?


i'm not married to splay trees.  i think we should explore several 
different data structures before picking one, and this one sounds 
reasonable to try.
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:[EMAIL PROTECTED]
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard



Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Fredrik Kuivinen
On Thu, Sep 08, 2005 at 11:27:35AM -0400, Daniel Barkalow wrote:

...

  The two cases my algorithm merges cleanly and git-resolve-script do
  not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
  0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
  ancestors. The second one have, as far as I know, not been tested with
  your read-tree.
 
 Okay, I'll have to check whether the result I get seems right. I take it 
 your result agrees with what the users actually produced by hand?
 
The first one agrees with what was actually committed. For the second
one the difference between the tree produced by the algorithm and what
was committed is:

diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
--- a/include/net/ieee80211.h
+++ b/include/net/ieee80211.h
@@ -425,9 +425,7 @@ struct ieee80211_stats {
 
 struct ieee80211_device;
 
-#if 0 /* for later */
 #include ieee80211_crypt.h
-#endif
 
 #define SEC_KEY_1 (10)
 #define SEC_KEY_2 (11)


I have looked at the files and common ancestors involved and I think
that this change have been introduced manually. I may have missed
something when I analysed it though...


  The merge cases reported by Tony Luck and Len Brown are both cleanly
  merged by my code.
 
 Do they come out correctly? Both of those have cases which cannot be 
 decided correctly with only the ancestor trees, due to one branch 
 reverting a patch that was only in one ancestor. The correct result is to 
 revert that patch, but figuring out that requires looking at more trees. I 
 think your algorithm should work for this case, but it would be good to 
 have verification. (IIRC, Len got the correct result while Tony got the 
 wrong result and then corrected it later.)

Len's merge case come out identically to the tree he committed. I have
described what I got for Tony's case in
[EMAIL PROTECTED] (my merge algorithm
produces the result Tony expected to get, but he didn't get that from
git-resolve-script).

- Fredrik
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Junio C Hamano [EMAIL PROTECTED] writes:

 ... I'd like to leave what
 merge strategy to use as an user option, and leave the door open
 for other merge strategies to emerge later.  I know Pasky wants
 to look into pcdv merge and other alternatives.

 This is still off-the-top-of-my-head, but the top-level merge
 entry point for the end user would be just:

 git merge head remote merge-message

 and by default 'git-merge-script' (sorry, I am taking over the
 name of your script for this 'generic driver for underlying
 merge strategy scripts') would do something like:

And here is an illustration of concept patch, requesting for
comments.  It probably is buggy as h*ll, but I am showing
interfaces early, hoping this is something people can agree on
to use to plug various merge strategies into.

The patch uses 'git-show-branches --merge-base' as an improved
'git-merge-base -a' that can do more than two heads, which still
only exists in my development repository, but I hope you get the
idea.


Subject: [PATCH] Multi-backend merge driver.

This is just an illustration of concept patch.

The new command 'git merge' takes the current head and one or more
remote heads, with the commit log message for the automated case.

If the heads being merged are simple fast-forwards, it acts the
same way as the current 'git resolve'.  Otherwise, it tries
different merge strategies and takes the result from the one that
succeeded auto-merging, if there is any.

If no merge strategy succeeds auto-merging, their results are
evaluated for number of paths needed for hand resolving, and the
one with the least number of such paths is left in the working
tree.  The user is asked to resolve them by hand and make a
commit manually.

The calling convention from the 'git merge' driver to merge
strategy programs is very simple:

 - A strategy program is to be called 'git-merge-strategy'.

 - They take input of this form:

common1 common2 ... '--' head remote1 remote2...

   That is, one or more the common ancestors, double dash, the
   current head, and one or more remote heads being merged into
   the current branch.

 - Before a strategy program is called, the working tree is
   matched to the current head.

 - The strategy program exits with status code 0 when it
   successfully auto-merges the given heads.  It should do
   update-cache for all the merged paths when it does so -- the
   index file will be used to record the merge result as a
   commit by the driver.

 - The strategy program exits with status code 1 when it leaves
   conflicts behind.  It should do update-cache for all the
   merged paths that it successfully auto-merged, and leave the
   cache entry in the index file as the same as head for paths
   it could not auto-merge, and leave its best-effort result
   with conflict markers in the working tree when it does so.

 - The strategy program exists with status code other than 0 or
   1 if it does not handle the given merge at all.

As examples, this commit comes with merge strategies based on
'git resolve' and 'git octopus'.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]

---

 git-commit.sh|2 
 git-merge-octopus.sh |   86 +
 git-merge-resolve.sh |   80 
 git-merge.sh |  205 ++
 4 files changed, 372 insertions(+), 1 deletions(-)
 create mode 100755 git-merge-octopus.sh
 create mode 100755 git-merge-resolve.sh
 create mode 100755 git-merge.sh

a65a614ce7f67c3ed4dee3bed3ab4d86d3f79577
diff --git a/git-commit.sh b/git-commit.sh
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -158,7 +158,7 @@ if [ ! -r $GIT_DIR/HEAD ]; then
PARENTS=
 else
if [ -f $GIT_DIR/MERGE_HEAD ]; then
-   PARENTS=-p HEAD -p MERGE_HEAD
+   PARENTS=-p HEAD `sed -e 's/^/-p /' $GIT_DIR/MERGE_HEAD`
fi
if test $use_commit != 
then
diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh
new file mode 100755
--- /dev/null
+++ b/git-merge-octopus.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+# Resolve two or more trees.
+#
+
+# The first parameters up to -- are merge bases; the rest are heads.
+bases= head= remotes= sep_seen=
+for arg
+do
+   case ,$sep_seen,$head,$arg, in
+   *,--,)
+   sep_seen=yes
+   ;;
+   ,yes,,*)
+   head=$arg
+   ;;
+   ,yes,*)
+   remotes=$remotes$arg 
+   ;;
+   *)
+   bases=$bases$arg 
+   ;;
+   esac
+done
+
+# Reject if this is not an Octopus -- resolve should be used instead.
+case $remotes in
+?*' '?*)
+   ;;
+*)
+   exit 2 ;;
+esac
+
+# MRC is the current merge reference commit
+# MRT is the current merge result tree
+
+MRC=$head MSG= PARENT=-p $head
+MRT=$(git-write-tree)
+CNT=1 ;# counting our head
+NON_FF_MERGE=0
+for SHA1 in $remotes
+do
+   common=$(git-merge-base $MRC 

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread A Large Angry SCM

Junio C Hamano wrote:
[...]

Subject: [PATCH] Multi-backend merge driver.

This is just an illustration of concept patch.

The new command 'git merge' takes the current head and one or more
remote heads, with the commit log message for the automated case.

If the heads being merged are simple fast-forwards, it acts the
same way as the current 'git resolve'.  Otherwise, it tries
different merge strategies and takes the result from the one that
succeeded auto-merging, if there is any.

If no merge strategy succeeds auto-merging, their results are
evaluated for number of paths needed for hand resolving, and the
one with the least number of such paths is left in the working
tree.  The user is asked to resolve them by hand and make a
commit manually.

[...]

This last paragraph concerns me because it limits us to a very simple 
metric, number of paths needed for hand resolving. Some hand 
resolving situations can be more complex than others and it would nice 
to use that information if it was available.


Of course I have no idea what that information may be or how it can be 
computed.

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

 The first one agrees with what was actually committed. For the second
 one the difference between the tree produced by the algorithm and what
 was committed is:
 
 diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
 --- a/include/net/ieee80211.h
 +++ b/include/net/ieee80211.h
 @@ -425,9 +425,7 @@ struct ieee80211_stats {
  
  struct ieee80211_device;
  
 -#if 0 /* for later */
  #include ieee80211_crypt.h
 -#endif
  
  #define SEC_KEY_1 (10)
  #define SEC_KEY_2 (11)
 
 
 I have looked at the files and common ancestors involved and I think
 that this change have been introduced manually. I may have missed
 something when I analysed it though...

Certainly possible that it was done manually.

   The merge cases reported by Tony Luck and Len Brown are both cleanly
   merged by my code.
  
  Do they come out correctly? Both of those have cases which cannot be 
  decided correctly with only the ancestor trees, due to one branch 
  reverting a patch that was only in one ancestor. The correct result is to 
  revert that patch, but figuring out that requires looking at more trees. I 
  think your algorithm should work for this case, but it would be good to 
  have verification. (IIRC, Len got the correct result while Tony got the 
  wrong result and then corrected it later.)
 
 Len's merge case come out identically to the tree he committed. I have
 described what I got for Tony's case in
 [EMAIL PROTECTED] (my merge algorithm
 produces the result Tony expected to get, but he didn't get that from
 git-resolve-script).

Good. It looks to me like this is a good algorithm in practice, then.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Fredrik Kuivinen
Hi,

Here is the new version of the merge algorithm patch.

The major changes compared to the previous patch are:

* No more messing around with merge-cache. git-ls-files used to get
  the unmerged entries instead.
* The python code is now contained in two files, git-merge-script and
  gitMergeCommon.py.
* The user interface is identical to the interface provided by
  git-resolve-script
* In the non-clean case the unmerged cache entries will not be
  removed from the cache.

I have also attached a test script which can redo every merge in a
repository with both git-resolve-script and git-merge-script. It will
report any non-clean merges and non-identical results for clean
merges. Do _not_ use this script in repositories you care about. It
calls 'git reset --hard' repeatedly and will probably not leave the
repository in its original state when it's done.

Of the 500 merge commits that currently exists in the kernel
repository 19 produces non-clean merges with git-merge-script. The
four merge cases listed in
[EMAIL PROTECTED] are cleanly merged by
git-merge-script. Every merge commit which is cleanly merged by
git-resolve-script is also cleanly merged by git-merge-script,
furthermore the results are identical. There are currently two merges
in the kernel repository which are not cleanly merged by
git-resolve-script but are cleanly merged by git-merge-script.

I guess the need for this has decreased with Daniel's new read-tree
code. Is there any chance of getting this code merged into mainline
git?

- Fredrik
#!/usr/bin/env python

import sys, math, random, os, re, signal, tempfile, time
from heapq import heappush, heappop
from sets import Set
from gitMergeCommon import *

def mergeMerge(a, b):
print 'Running merge-script HEAD', b.sha, '...'
[out, code] = runProgram(['git-merge-script', 'HEAD', b.sha, 'merge message'],
 returnCode=True, pipeOutput=False)
if code == 0:
return True
else:
return False

def gitResolveMerge(a, b):
print 'Running git resolve HEAD', b.sha, '...'
[out, code] = runProgram(['git', 'resolve', 'HEAD', b.sha, 'merge message'],
 returnCode=True, pipeOutput=False)

if code == 0:
return True
else:
return False

def doWork(graph, commits):
print 'commits:', repr(commits)
result = []
totalMergeTime = 0
totalResolveTime = 0
numCommits = 0
try:
for commit in graph.commits:
if len(commits)  0 and not (commit.sha in commits):
continue

if len(commit.parents)  1:
res = commit.sha + ' : '
if len(commit.parents) == 2:
numCommits += 1
print '---'
print 'Testing commit', commit.sha, '(tree)', commit.tree()
a = commit.parents[0]
b = commit.parents[1]

runProgram(['git-reset-script', '--hard', a.sha])
print 'Running git resolve...'
stdout.flush()
startTime = time.time()
resResolve = gitResolveMerge(a, b)
timeResolve = time.time() - startTime
totalResolveTime += timeResolve

if resResolve:
resolveHead = Commit(runProgram(['git-rev-parse', '--verify', 'HEAD']).rstrip(), [a, b])

runProgram(['git-reset-script', '--hard', a.sha])
print 'Running merge...'
stdout.flush()
startTime = time.time()
resMerge = mergeMerge(a, b)
timeMerge = time.time() - startTime
totalMergeTime += timeMerge

if resMerge:
mergeHead = Commit(runProgram(['git-rev-parse', '--verify', 'HEAD']).rstrip(), [a, b])

res += 'time r: ' + str(int(timeResolve)) + ' m: ' + str(int(timeMerge)) + '\t'
if resResolve and resMerge:
if resolveHead.tree() == mergeHead.tree():
res += 'Identical result'
else:
res += 'Non-identical results! resolve: ' + resolveHead.sha + \
   ' merge: ' + mergeHead.sha
else:
if resResolve:
res += 'resolve succeeded (' + resolveHead.sha + '), '
else:
res += 'resolve failed, '

if resMerge:
res += 'merge succeeded (' + mergeHead.sha + ')'
else:
res += 'merge failed'
else:
res += 'Ignoring octupus merge'

print res
  

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Daniel Barkalow
On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:

 Of the 500 merge commits that currently exists in the kernel
 repository 19 produces non-clean merges with git-merge-script. The
 four merge cases listed in
 [EMAIL PROTECTED] are cleanly merged by
 git-merge-script. Every merge commit which is cleanly merged by
 git-resolve-script is also cleanly merged by git-merge-script,
 furthermore the results are identical. There are currently two merges
 in the kernel repository which are not cleanly merged by
 git-resolve-script but are cleanly merged by git-merge-script.

If you use my read-tree and change git-resolve-script to pass all of the 
ancestors to it, how does it do? I expect you'll still be slightly ahead, 
because we don't (yet) have content merge with multiple ancestors. You 
should also check the merge that Tony Luck reported, which undid a revert, 
as well as the one that Len Brown reported around the same time which had 
similar problems. I think maintainer trees are a much better test for a 
merge algorithm, because the kernel repository is relatively linear, while 
maintainers tend more to merge things back and forth.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Junio C Hamano
Fredrik Kuivinen [EMAIL PROTECTED] writes:

 I guess the need for this has decreased with Daniel's new read-tree
 code. Is there any chance of getting this code merged into mainline
 git?

I do not think Daniel's code decreased anything.  The two
algorithms are solving the same problem but they do it
differently.  There would be cases where multi-base merge would
give better results over your base-on-merge-bases merge; and in
other cases yours would perform better.  I'd like to leave what
merge strategy to use as an user option, and leave the door open
for other merge strategies to emerge later.  I know Pasky wants
to look into pcdv merge and other alternatives.

This is still off-the-top-of-my-head, but the top-level merge
entry point for the end user would be just:

git merge head remote merge-message

and by default 'git-merge-script' (sorry, I am taking over the
name of your script for this 'generic driver for underlying
merge strategy scripts') would do something like:

 - Run 'git-merge-base -a' to find common ancestors.

 - If there is one single common ancestor, do what the current
   'git-resolve-script' does: if fast-forward, declare success
   and exit without touching anything but $GIT_DIR/HEAD;
   otherwise, run read-tree -m 3-way followed by merge-cache,
   and commit with the given message if the merge was clean and
   exit.  If the merge was not clean, go on to the next step.

 - User preference (maybe $HOME/.git/merge-preference, maybe
   command line option to 'git merge', maybe interactive
   prompting) can specify which complex merge strategies to use.
   we probably would want to be able to say try this, that and
   that-over-there in this order in the preference. One of the
   strategies could be 'I give up, just exit and sort it out by
   hand'.

 - The chosen merge backend, be it the one that uses Daniel's
   multi-base merge or your base-on-merge-bases merge, takes
   over.  In any case, the backend should attempt to resolve the
   two heads, and stop at the point just before committing.

 - Evaluate how good the merge is by looking at the result from
   the backend at this point, and if it is not good enough,
   reset the working tree back to the pre-merge state and try
   the next backend.  The looping code needs a termination
   clause that picks the best result after exhausting the list
   of backends.

 - Then, if the merge backend auto-resolved the heads, we
   commit with merge-message; otherwise we exit with
   non-zero status and have the user sort it out.

The above assumes that (1) we are already doing a reasonable
thing in simple one-common-ancestor case; (2) it is in the fast
path and we want the multiple backend code out of it.  You could
think of the current 'git-resolve-script' code equivalent of
having a single 'complex merge backend' that just 'gives up'.

If the above is a good user model, we need a couple of interface
convention between merge backends and the above driver:

 - Some merge backends (like yours) would require that the
   starting tree is clean while others may not care as long as
   the paths involved are clean (i.e. index matches head and
   the working file matches index -- the traditional code and
   Daniel's even allow such a working file being different from
   index if it happens to match the merge result).  They need to
   be able to say declined to merge even though I might do a
   better job than others if given a clean working tree.  Then
   the user can retry the same merge after getting the working
   tree in a clean state.  I personally feel that we do not need
   this complexity and it is reasonable to always require the
   starting tree to be clean when the merge is not a
   fast-forward, though.

 - When a merge backend finishes, it may leave the working tree
   in a failed merge state.  If we were to try different
   backends in the loop to find the best result, we would need
   some way to assign scores to them.  The score should be able
   to tell us if the result can be auto-committable or not, and
   if not how bad it is.  I think exit status from the backend
   can be used to tell us the former (i.e. exit non-zero if your
   result has conflicts and you do not want your result to be
   committed immediately), and number of cache-dirty entries can
   be used as an indication of how bad it is (i.e. leave the
   paths you know have not been cleanly merged cache-dirty -- do
   not run git-update-cache on them).  This is essentially the
   same convention used by the current 'git-resolve-script'.

I think the 'renaming merge heuristics' Linus outlined in
another thread will fall naturally into this picture as a merge
backend.

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html