Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> The first one agrees with what was actually committed. For the second
> one the difference between the tree produced by the algorithm and what
> was committed is:
> 
> diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
> --- a/include/net/ieee80211.h
> +++ b/include/net/ieee80211.h
> @@ -425,9 +425,7 @@ struct ieee80211_stats {
>  
>  struct ieee80211_device;
>  
> -#if 0 /* for later */
>  #include "ieee80211_crypt.h"
> -#endif
>  
>  #define SEC_KEY_1 (1<<0)
>  #define SEC_KEY_2 (1<<1)
> 
> 
> I have looked at the files and common ancestors involved and I think
> that this change have been introduced manually. I may have missed
> something when I analysed it though...

Certainly possible that it was done manually.

> > > The merge cases reported by Tony Luck and Len Brown are both cleanly
> > > merged by my code.
> > 
> > Do they come out correctly? Both of those have cases which cannot be 
> > decided correctly with only the ancestor trees, due to one branch 
> > reverting a patch that was only in one ancestor. The correct result is to 
> > revert that patch, but figuring out that requires looking at more trees. I 
> > think your algorithm should work for this case, but it would be good to 
> > have verification. (IIRC, Len got the correct result while Tony got the 
> > wrong result and then corrected it later.)
> 
> Len's merge case come out identically to the tree he committed. I have
> described what I got for Tony's case in
> <[EMAIL PROTECTED]> (my merge algorithm
> produces the result Tony expected to get, but he didn't get that from
> git-resolve-script).

Good. It looks to me like this is a good algorithm in practice, then.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread A Large Angry SCM

Junio C Hamano wrote:
[...]

Subject: [PATCH] Multi-backend merge driver.

This is just an illustration of concept patch.

The new command 'git merge' takes the current head and one or more
remote heads, with the commit log message for the automated case.

If the heads being merged are simple fast-forwards, it acts the
same way as the current 'git resolve'.  Otherwise, it tries
different merge strategies and takes the result from the one that
succeeded auto-merging, if there is any.

If no merge strategy succeeds auto-merging, their results are
evaluated for number of paths needed for hand resolving, and the
one with the least number of such paths is left in the working
tree.  The user is asked to resolve them by hand and make a
commit manually.

[...]

This last paragraph concerns me because it limits us to a very simple 
metric, "number of paths needed for hand resolving". Some "hand 
resolving" situations can be more complex than others and it would nice 
to use that information if it was available.


Of course I have no idea what that information may be or how it can be 
computed.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Chuck Lever wrote:
> > 
> > Btw, in the sparse project, we have this really smart "pointer list" data
> > structure, which is extremely space- and time-efficient. It ends up
> > _looking_ like a linked list, but it batches things up in hunks of 29
> > entries (29 pointers plus overhead gives you blocks of 32 longwords, which
> > is the allocation size) and thus gives basically a cache-friendly
> > doubly-linked list. It knows how to do insertions, traversals etc very 
> > efficiently.
> > 
> > Any interest?
> 
> i'm not married to splay trees.  i think we should explore several 
> different data structures before picking one, and this one sounds 
> reasonable to try.

Actually, I just realized that one of the issues is just plain looking up
a name (ie "pos = cache_name_pos(..)", and the sparse list don't make that
very efficient unless you have a good starting point.

So a splay tree is probably more appropriate.

Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Junio C Hamano <[EMAIL PROTECTED]> writes:

> ... I'd like to leave what
> merge strategy to use as an user option, and leave the door open
> for other merge strategies to emerge later.  I know Pasky wants
> to look into pcdv merge and other alternatives.
>
> This is still off-the-top-of-my-head, but the top-level merge
> entry point for the end user would be just:
>
> git merge   
>
> and by default 'git-merge-script' (sorry, I am taking over the
> name of your script for this 'generic driver for underlying
> merge strategy scripts') would do something like:

And here is "an illustration of concept" patch, requesting for
comments.  It probably is buggy as h*ll, but I am showing
interfaces early, hoping this is something people can agree on
to use to plug various merge strategies into.

The patch uses 'git-show-branches --merge-base' as an improved
'git-merge-base -a' that can do more than two heads, which still
only exists in my development repository, but I hope you get the
idea.


Subject: [PATCH] Multi-backend merge driver.

This is just an illustration of concept patch.

The new command 'git merge' takes the current head and one or more
remote heads, with the commit log message for the automated case.

If the heads being merged are simple fast-forwards, it acts the
same way as the current 'git resolve'.  Otherwise, it tries
different merge strategies and takes the result from the one that
succeeded auto-merging, if there is any.

If no merge strategy succeeds auto-merging, their results are
evaluated for number of paths needed for hand resolving, and the
one with the least number of such paths is left in the working
tree.  The user is asked to resolve them by hand and make a
commit manually.

The calling convention from the 'git merge' driver to merge
strategy programs is very simple:

 - A strategy program is to be called 'git-merge-'.

 - They take input of this form:

  ... '--'   ...

   That is, one or more the common ancestors, double dash, the
   current head, and one or more remote heads being merged into
   the current branch.

 - Before a strategy program is called, the working tree is
   matched to the current .

 - The strategy program exits with status code 0 when it
   successfully auto-merges the given heads.  It should do
   update-cache for all the merged paths when it does so -- the
   index file will be used to record the merge result as a
   commit by the driver.

 - The strategy program exits with status code 1 when it leaves
   conflicts behind.  It should do update-cache for all the
   merged paths that it successfully auto-merged, and leave the
   cache entry in the index file as the same as  for paths
   it could not auto-merge, and leave its best-effort result
   with conflict markers in the working tree when it does so.

 - The strategy program exists with status code other than 0 or
   1 if it does not handle the given merge at all.

As examples, this commit comes with merge strategies based on
'git resolve' and 'git octopus'.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>

---

 git-commit.sh|2 
 git-merge-octopus.sh |   86 +
 git-merge-resolve.sh |   80 
 git-merge.sh |  205 ++
 4 files changed, 372 insertions(+), 1 deletions(-)
 create mode 100755 git-merge-octopus.sh
 create mode 100755 git-merge-resolve.sh
 create mode 100755 git-merge.sh

a65a614ce7f67c3ed4dee3bed3ab4d86d3f79577
diff --git a/git-commit.sh b/git-commit.sh
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -158,7 +158,7 @@ if [ ! -r "$GIT_DIR/HEAD" ]; then
PARENTS=""
 else
if [ -f "$GIT_DIR/MERGE_HEAD" ]; then
-   PARENTS="-p HEAD -p MERGE_HEAD"
+   PARENTS="-p HEAD "`sed -e 's/^/-p /' "$GIT_DIR/MERGE_HEAD"`
fi
if test "$use_commit" != ""
then
diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh
new file mode 100755
--- /dev/null
+++ b/git-merge-octopus.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+# Resolve two or more trees.
+#
+
+# The first parameters up to -- are merge bases; the rest are heads.
+bases= head= remotes= sep_seen=
+for arg
+do
+   case ",$sep_seen,$head,$arg," in
+   *,--,)
+   sep_seen=yes
+   ;;
+   ,yes,,*)
+   head=$arg
+   ;;
+   ,yes,*)
+   remotes="$remotes$arg "
+   ;;
+   *)
+   bases="$bases$arg "
+   ;;
+   esac
+done
+
+# Reject if this is not an Octopus -- resolve should be used instead.
+case "$remotes" in
+?*' '?*)
+   ;;
+*)
+   exit 2 ;;
+esac
+
+# MRC is the current "merge reference commit"
+# MRT is the current "merge result tree"
+
+MRC=$head MSG= PARENT="-p $head"
+MRT=$(git-write-tree)
+CNT=1 ;# counting our head
+NON_FF_MERGE=0
+for SHA1 in $remotes
+do
+   common=$(git-merge-base $MRC $SHA1) ||
+   

Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Fredrik Kuivinen
On Thu, Sep 08, 2005 at 11:27:35AM -0400, Daniel Barkalow wrote:

...

> > The two cases my algorithm merges cleanly and git-resolve-script do
> > not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
> > 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
> > ancestors. The second one have, as far as I know, not been tested with
> > your read-tree.
> 
> Okay, I'll have to check whether the result I get seems right. I take it 
> your result agrees with what the users actually produced by hand?
 
The first one agrees with what was actually committed. For the second
one the difference between the tree produced by the algorithm and what
was committed is:

diff --git a/include/net/ieee80211.h b/include/net/ieee80211.h
--- a/include/net/ieee80211.h
+++ b/include/net/ieee80211.h
@@ -425,9 +425,7 @@ struct ieee80211_stats {
 
 struct ieee80211_device;
 
-#if 0 /* for later */
 #include "ieee80211_crypt.h"
-#endif
 
 #define SEC_KEY_1 (1<<0)
 #define SEC_KEY_2 (1<<1)


I have looked at the files and common ancestors involved and I think
that this change have been introduced manually. I may have missed
something when I analysed it though...


> > The merge cases reported by Tony Luck and Len Brown are both cleanly
> > merged by my code.
> 
> Do they come out correctly? Both of those have cases which cannot be 
> decided correctly with only the ancestor trees, due to one branch 
> reverting a patch that was only in one ancestor. The correct result is to 
> revert that patch, but figuring out that requires looking at more trees. I 
> think your algorithm should work for this case, but it would be good to 
> have verification. (IIRC, Len got the correct result while Tony got the 
> wrong result and then corrected it later.)

Len's merge case come out identically to the tree he committed. I have
described what I got for Tony's case in
<[EMAIL PROTECTED]> (my merge algorithm
produces the result Tony expected to get, but he didn't get that from
git-resolve-script).

- Fredrik
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Chuck Lever

Linus Torvalds wrote:


On Thu, 8 Sep 2005, Chuck Lever wrote:

in my case the merges were taking significantly longer than a half 
second.  making this change is certainly not worth it if merges are 
running fast...



Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full "git pull" will do the diffstat 
etc, and then the expense ends up being in the actual "git diff" part.


So read-tree itself may be half a second, but a merge ends up having other 
parts.


i measured this using the following test...

i have a linux kernel git repository under control of stgit and it has 
about 70 patches in it.  i did an "stg status" to heat the page cache. 
i popped all the patches, then did an "stg pull origin".


i started oprofile, and did an "stg push -a".  it took about 9 minutes.

i stopped oprofile and looked at the results.  roughly 65% of the total 
CPU time was spent in libc:memmove.  after instrumenting git, i 
determined that in the "stg push" case the critical memmove was the one 
in add_cache_entry.


note that i'm not saying that the 9 minutes of wall clock time was 
entirely due to CPU... catalin has been steadily improving "stg push" so 
this time has shortened by more than half, recently.  but i do notice 
that working in a kernel repository is significantly slower than working 
in my git or stgit source repositories, which are smaller by far.  the 
small repositories behave just as i expect, the tools respond quite 
snappily.



they are still read-only with my linked list implementation.


Btw, in the sparse project, we have this really smart "pointer list" data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.


Any interest?


i'm not married to splay trees.  i think we should explore several 
different data structures before picking one, and this one sounds 
reasonable to try.
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:[EMAIL PROTECTED]
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard



Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Chuck Lever wrote:
> 
> in my case the merges were taking significantly longer than a half 
> second.  making this change is certainly not worth it if merges are 
> running fast...

Note that in cold-cache cases, all the expense of read-tree is in actually
reading the tree objects themselves (a kernel tree has more than a
thousand subdirectories). Also, a full "git pull" will do the diffstat 
etc, and then the expense ends up being in the actual "git diff" part.

So read-tree itself may be half a second, but a merge ends up having other 
parts.

> they are still read-only with my linked list implementation.

Btw, in the sparse project, we have this really smart "pointer list" data
structure, which is extremely space- and time-efficient. It ends up
_looking_ like a linked list, but it batches things up in hunks of 29
entries (29 pointers plus overhead gives you blocks of 32 longwords, which
is the allocation size) and thus gives basically a cache-friendly
doubly-linked list. It knows how to do insertions, traversals etc very 
efficiently.

Any interest?

Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Chuck Lever

Linus Torvalds wrote:


On Thu, 8 Sep 2005, Junio C Hamano wrote:


Yes, the reading of three trees upfront is probably the culprit
in your case



However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.


in my case the merges were taking significantly longer than a half 
second.  making this change is certainly not worth it if merges are 
running fast...


Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.


But remember what the most common use for the index is - it's the "single 
tree" case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.


read_cache is fast as implemented.  the issue is that the next thing 
that is often done is a cache insert, which requires a memmove, which is 
slow.


So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a "next" pointer to the cache entries and 
not allow the in-place thing that read_cache() does).


they are still read-only with my linked list implementation.
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:[EMAIL PROTECTED]
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard



Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:

> So the data structure is optimized for a different case than reading in 
> trees. Big deal. That optimization is definitely worth it: it allows us to 
> do the read_cache() with the actual index entries being totally read-only 
> (a linked list would have to add a "next" pointer to the cache entries and 
> not allow the in-place thing that read_cache() does).

Yes, you are right.  What is being discussed is to help
read-tree.c while keeping that nice property.

I think Daniel's patch is going in the right direction.  It can
be told to populate the resulting cache from scratch by reading
the current cache and the trees being read, instead of inserting
and removing the current cache as it does right now.  Once that
is done, we would not have to do repeated memmove to insert and
delete an entry one at a time anymore.


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Linus Torvalds


On Thu, 8 Sep 2005, Junio C Hamano wrote:
> 
> Yes, the reading of three trees upfront is probably the culprit
> in your case

However, note that _most_ tree reading just reads one.

Merges may take half a second, and yes, when I did it, the fact that we 
move things around in the array is by far the highest cost. But the thing 
is, if merges take half a second, that's still not only damn good, it's 
not even the most common operation.

Yes, the active_cache[] layout as one big array is inconvenient for 
read_tree(), which tends to want to interleave different trees in the 
array and thus forces things to be moved around.

But remember what the most common use for the index is - it's the "single 
tree" case from read_cache(). That's _so_ much more common than 
read_tree() that it's not even funny.

So the data structure is optimized for a different case than reading in 
trees. Big deal. That optimization is definitely worth it: it allows us to 
do the read_cache() with the actual index entries being totally read-only 
(a linked list would have to add a "next" pointer to the cache entries and 
not allow the in-place thing that read_cache() does).

Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Junio C Hamano
Chuck Lever <[EMAIL PROTECTED]> writes:

> Junio C Hamano wrote:
>> Chuck Lever <[EMAIL PROTECTED]> writes:
>>
> once the list implementation is working well, i plan to go back and 
> replace it with something more sophisticated which can perform 
> insertion, deletion, and searching efficiently.

Well, what I wanted to see was if we can get away without
introducing more sophisticated data structures, which would add
complexity.

>> I haven't timed it like you did, but my gut feeling is that the
>> most wastage during the merge is coming from having to move
>> entries because we "insert into" or "remove from" the same
>> active-cache array.
>
> the merge actually replaces entries in place, so it is already fairly 
> efficient.  the wastage in the merge case arises from the many list 
> insertions done by read_cache, all of which involve moving some of the 
> active cache array.

Yes, the reading of three trees upfront is probably the culprit
in your case, and I think that is something Daniel's read-tree
rewrite can help.  I still see some remove_cache_entry_at()
there because it works on the active_cache in place, but the
structure of the new code should be much easier to make the kind
of modification we are talking about.

>> I think what Daniel
>> did to read-tree recently, still in the proposed updates branch,
>> would make this kind of change far easier to implement.
>
> as i'm new to git development, i wasn't aware of the proposed branch.  i 
> will see if i can take a look (or send me a pointer to the list archives 
> if he has posted them here).

If you already have a local copy of git.git repository you work in:

git fetch http://kernel.org/pub/scm/git/git.git +pu:refs/heads/ko-pu
git checkout -f ko-pu

Otherwise

git clone http://kernel.org/pub/scm/git/git.git peek
cd peek
git checkout -f pu

The interesting change from the mainline is in read-tree.c

You can browse http://kernel.org/git/ and see commitdiffs in the
"pu" branch if you are not interested in slurping the whole
proposed updates branch.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Daniel Barkalow
On Thu, 8 Sep 2005, Fredrik Kuivinen wrote:

> On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
> > On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
> > 
> > > Of the 500 merge commits that currently exists in the kernel
> > > repository 19 produces non-clean merges with git-merge-script. The
> > > four merge cases listed in
> > > <[EMAIL PROTECTED]> are cleanly merged by
> > > git-merge-script. Every merge commit which is cleanly merged by
> > > git-resolve-script is also cleanly merged by git-merge-script,
> > > furthermore the results are identical. There are currently two merges
> > > in the kernel repository which are not cleanly merged by
> > > git-resolve-script but are cleanly merged by git-merge-script.
> > 
> > If you use my read-tree and change git-resolve-script to pass all of the 
> > ancestors to it, how does it do? I expect you'll still be slightly ahead, 
> > because we don't (yet) have content merge with multiple ancestors. You 
> > should also check the merge that Tony Luck reported, which undid a revert, 
> > as well as the one that Len Brown reported around the same time which had 
> > similar problems. I think maintainer trees are a much better test for a 
> > merge algorithm, because the kernel repository is relatively linear, while 
> > maintainers tend more to merge things back and forth.
> 
> Junio tested some of the multiple common ancestor cases with your
> version of read-tree and reported his results in
> <[EMAIL PROTECTED]>.

Oh, right. I'm clearly not paying enough attention here.

> The two cases my algorithm merges cleanly and git-resolve-script do
> not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
> 0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
> ancestors. The second one have, as far as I know, not been tested with
> your read-tree.

Okay, I'll have to check whether the result I get seems right. I take it 
your result agrees with what the users actually produced by hand?

> The merge cases reported by Tony Luck and Len Brown are both cleanly
> merged by my code.

Do they come out correctly? Both of those have cases which cannot be 
decided correctly with only the ancestor trees, due to one branch 
reverting a patch that was only in one ancestor. The correct result is to 
revert that patch, but figuring out that requires looking at more trees. I 
think your algorithm should work for this case, but it would be good to 
have verification. (IIRC, Len got the correct result while Tony got the 
wrong result and then corrected it later.)

> You are probably right about the maintainer trees. I should have a
> look at some of them. Do you know any specific repositories with
> interesting merge cases?

Not especially, except that I would guess that people who have reported 
hitting bad cases would be more likely to have other interesting merges in 
their trees. You might also try merging maintainer trees with each other, 
since it's relatively likely that there would be complicating overlap that 
only doesn't cause confusion because things get rearranged in -mm. For 
that matter, I bet you'd get plenty of test cases out of trying to 
replicate -mm as a git tree.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-08 Thread Chuck Lever

Junio C Hamano wrote:

Chuck Lever <[EMAIL PROTECTED]> writes:


for the past two weeks i've been attempting to replace the active_cache 
array with an abstract data type (linked list just as a prototype) in 
order to eliminate the need to use memmove() during insertions and 
deletions.


one big win is that the merge algorithm no longer needs to overwrite the 
active_cache.  you can keep two lists and simply move elements from one 
to the other as you do the merge, and then hook the new, merged list 
back to the cache head when you're done.


anyway, i'm at a point where i could use some help and advice if this is 
something you think could be useful in the long run.



This is interesting.

For something like a kernel tree that has around 18k entries, a
single memmove will at the worst case move that many pointers,
so naturally a list based implementation would perform well for
many insertions and deletions.   On the other hand, the list
based implementation would make it very expensive to find an
entry read-only, I suspect, unlike the array based
implementation which lets us do binary search.


using a list was an easy prototype to see what issues there are when 
converting from an array into an abstract data type.  i've created a few 
helpers that allow me to limit the list implementation changes to 
cache.h, read-cache.c, and read-tree.c; but having these helpers means 
it should be fairly simple to switch to any reasonable data structure. 
(the helper parts are already working and i can post them to the list if 
folks are interested).


once the list implementation is working well, i plan to go back and 
replace it with something more sophisticated which can perform 
insertion, deletion, and searching efficiently.


in the long run, i see using a balanced tree.  that would make 
insertions and searches quick, and prevent the tree from degenerating 
into a linked list due to a series of in-order insertions.  a splay tree 
might be even more entertaining, as it always remains balanced, and a 
search operation places the found node or insertion point right at the 
root of the tree.


each node in the tree would represent a unique path, and have references 
to the entries of all four stages, contained in an array.  that would 
simplify several routines in read-cache.c and probably make the merge 
logic even easier to understand and more efficient.



I haven't timed it like you did, but my gut feeling is that the
most wastage during the merge is coming from having to move
entries because we "insert into" or "remove from" the same
active-cache array.


the merge actually replaces entries in place, so it is already fairly 
efficient.  the wastage in the merge case arises from the many list 
insertions done by read_cache, all of which involve moving some of the 
active cache array.



If we allocate a new array and populate it
from scratch as we iterate through the paths being merged and
replace the active_cache pointer at the very end, we would do
much better without going to a linked list implementation, which
would penalize the normal read-only case.


i can already tell you that using a separate source and destination data 
structure during the merge simplifies the logic and makes it easier to 
understand.  i imagine it will also be easier to maintain and enhance in 
the long run.



I think what Daniel
did to read-tree recently, still in the proposed updates branch,
would make this kind of change far easier to implement.


as i'm new to git development, i wasn't aware of the proposed branch.  i 
will see if i can take a look (or send me a pointer to the list archives 
if he has posted them here).



I wanted to cc this message to Daniel but your message to me was
somehow private so I did not.  Please feel free to bring this
issue up either with him directly or on the list.


i wanted to assess whether this was a reasonable idea or if there was 
already work in progress in this area.  thanks for your response!
begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:[EMAIL PROTECTED]
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard



Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Fredrik Kuivinen
On Wed, Sep 07, 2005 at 02:33:42PM -0400, Daniel Barkalow wrote:
> On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:
> 
> > Of the 500 merge commits that currently exists in the kernel
> > repository 19 produces non-clean merges with git-merge-script. The
> > four merge cases listed in
> > <[EMAIL PROTECTED]> are cleanly merged by
> > git-merge-script. Every merge commit which is cleanly merged by
> > git-resolve-script is also cleanly merged by git-merge-script,
> > furthermore the results are identical. There are currently two merges
> > in the kernel repository which are not cleanly merged by
> > git-resolve-script but are cleanly merged by git-merge-script.
> 
> If you use my read-tree and change git-resolve-script to pass all of the 
> ancestors to it, how does it do? I expect you'll still be slightly ahead, 
> because we don't (yet) have content merge with multiple ancestors. You 
> should also check the merge that Tony Luck reported, which undid a revert, 
> as well as the one that Len Brown reported around the same time which had 
> similar problems. I think maintainer trees are a much better test for a 
> merge algorithm, because the kernel repository is relatively linear, while 
> maintainers tend more to merge things back and forth.

Junio tested some of the multiple common ancestor cases with your
version of read-tree and reported his results in
<[EMAIL PROTECTED]>.

The two cases my algorithm merges cleanly and git-resolve-script do
not merge cleanly are 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 and
0c168775709faa74c1b87f1e61046e0c51ade7f3. Both of them have two common
ancestors. The second one have, as far as I know, not been tested with
your read-tree.

The merge cases reported by Tony Luck and Len Brown are both cleanly
merged by my code.

You are probably right about the maintainer trees. I should have a
look at some of them. Do you know any specific repositories with
interesting merge cases?

- Fredrik
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Junio C Hamano
Fredrik Kuivinen <[EMAIL PROTECTED]> writes:

> I guess the need for this has decreased with Daniel's new read-tree
> code. Is there any chance of getting this code merged into mainline
> git?

I do not think Daniel's code decreased anything.  The two
algorithms are solving the same problem but they do it
differently.  There would be cases where multi-base merge would
give better results over your base-on-merge-bases merge; and in
other cases yours would perform better.  I'd like to leave what
merge strategy to use as an user option, and leave the door open
for other merge strategies to emerge later.  I know Pasky wants
to look into pcdv merge and other alternatives.

This is still off-the-top-of-my-head, but the top-level merge
entry point for the end user would be just:

git merge   

and by default 'git-merge-script' (sorry, I am taking over the
name of your script for this 'generic driver for underlying
merge strategy scripts') would do something like:

 - Run 'git-merge-base -a' to find common ancestors.

 - If there is one single common ancestor, do what the current
   'git-resolve-script' does: if fast-forward, declare success
   and exit without touching anything but $GIT_DIR/HEAD;
   otherwise, run read-tree -m 3-way followed by merge-cache,
   and commit with the given message if the merge was clean and
   exit.  If the merge was not clean, go on to the next step.

 - User preference (maybe $HOME/.git/merge-preference, maybe
   command line option to 'git merge', maybe interactive
   prompting) can specify which complex merge strategies to use.
   we probably would want to be able to say "try this, that and
   that-over-there in this order" in the preference. One of the
   strategies could be 'I give up, just exit and sort it out by
   hand'.

 - The chosen merge backend, be it the one that uses Daniel's
   multi-base merge or your base-on-merge-bases merge, takes
   over.  In any case, the backend should attempt to resolve the
   two heads, and stop at the point just before committing.

 - Evaluate how good the merge is by looking at the result from
   the backend at this point, and if it is not good enough,
   reset the working tree back to the pre-merge state and try
   the next backend.  The looping code needs a termination
   clause that picks the best result after exhausting the list
   of backends.

 - Then, if the merge backend auto-resolved the heads, we
   commit with ; otherwise we exit with
   non-zero status and have the user sort it out.

The above assumes that (1) we are already doing a reasonable
thing in simple one-common-ancestor case; (2) it is in the fast
path and we want the multiple backend code out of it.  You could
think of the current 'git-resolve-script' code equivalent of
having a single 'complex merge backend' that just 'gives up'.

If the above is a good user model, we need a couple of interface
convention between merge backends and the above driver:

 - Some merge backends (like yours) would require that the
   starting tree is clean while others may not care as long as
   the paths involved are clean (i.e. index matches  and
   the working file matches index -- the traditional code and
   Daniel's even allow such a working file being different from
   index if it happens to match the merge result).  They need to
   be able to say "declined to merge even though I might do a
   better job than others if given a clean working tree".  Then
   the user can retry the same merge after getting the working
   tree in a clean state.  I personally feel that we do not need
   this complexity and it is reasonable to always require the
   starting tree to be clean when the merge is not a
   fast-forward, though.

 - When a merge backend finishes, it may leave the working tree
   in a failed merge state.  If we were to try different
   backends in the loop to find the best result, we would need
   some way to assign scores to them.  The score should be able
   to tell us if the result can be auto-committable or not, and
   if not how bad it is.  I think exit status from the backend
   can be used to tell us the former (i.e. exit non-zero if your
   result has conflicts and you do not want your result to be
   committed immediately), and number of cache-dirty entries can
   be used as an indication of how bad it is (i.e. leave the
   paths you know have not been cleanly merged cache-dirty -- do
   not run git-update-cache on them).  This is essentially the
   same convention used by the current 'git-resolve-script'.

I think the 'renaming merge heuristics' Linus outlined in
another thread will fall naturally into this picture as a merge
backend.

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] A new merge algorithm, take 3

2005-09-07 Thread Daniel Barkalow
On Wed, 7 Sep 2005, Fredrik Kuivinen wrote:

> Of the 500 merge commits that currently exists in the kernel
> repository 19 produces non-clean merges with git-merge-script. The
> four merge cases listed in
> <[EMAIL PROTECTED]> are cleanly merged by
> git-merge-script. Every merge commit which is cleanly merged by
> git-resolve-script is also cleanly merged by git-merge-script,
> furthermore the results are identical. There are currently two merges
> in the kernel repository which are not cleanly merged by
> git-resolve-script but are cleanly merged by git-merge-script.

If you use my read-tree and change git-resolve-script to pass all of the 
ancestors to it, how does it do? I expect you'll still be slightly ahead, 
because we don't (yet) have content merge with multiple ancestors. You 
should also check the merge that Tony Luck reported, which undid a revert, 
as well as the one that Len Brown reported around the same time which had 
similar problems. I think maintainer trees are a much better test for a 
merge algorithm, because the kernel repository is relatively linear, while 
maintainers tend more to merge things back and forth.

-Daniel
*This .sig left intentionally blank*
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html