Re: Moved files and merges

2005-09-05 Thread H. Peter Anvin

Junio C Hamano wrote:

    1
   / \
  0-2-3-5-7
    \   /
     4-6

  It shouldn't matter to the merge at 7 if the 2-3 reorganization was done
  locally, by applying a patch, or by merging.

 There was another problem in my message that treated #3
 specially.  I did it that way primarily because I wanted to have
 an algorithm that needs to look at only a limited number (namely,
 one) of commits beyond what we currently look at.  The
 problem is that the trail #0..#1..#3 (in the example in the second
 message, whose rename probably happened between #0 and #1) may
 change the contents of the renamed file so drastically that the diff
 between #2 and #3 may not look like a rename anymore, while we
 could still detect it if we followed the whole trail and looked
 for renames between each commit on it.

One question, of course, is if one should simply keep additional 
metadata around to handle this sort of situation.  One could, for 
example, keep a UUID for each file, which would be carried over by the 
renaming commit.  If one runs into a tree which doesn't have the UUIDs, 
they should be generated at that time (this could be a bit tricky to do 
without invalidating all signatures in the tree, since the obvious way 
-- adding it to the tree object -- would invalidate all the commit and 
tag objects.)


In some ways this is similar to the Unix filesystem model of separating 
location (pathname) from identity (device:inode).


It would also have the somewhat interesting possibility that one could 
remove and recreate a file and have it exist as a different entity. 
That probably needs to be a user option.


-hpa
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Moved files and merges

2005-09-05 Thread Linus Torvalds


On Mon, 5 Sep 2005, H. Peter Anvin wrote:
 
 It would also have the somewhat interesting possibility that one could 
 remove and recreate a file and have it exist as a different entity. 
 That probably needs to be a user option.

It's a totally broken model. Really.

You think it solves issues, but it just creates more bugs and problems 
than it solves.

Trust me. The whole point of git is that content is the only thing that 
matters, and that there isn't any other meta-data. If you break that 
fundamental assumption, everything git does so well will break. 

I think we've already shown that the content matters approach works.  I
claim that the git rename tracking works better than any other SCM out 
there, _exactly_ because it doesn't make the mistake of trying to track 
anything but content.

The moved + modified files is not anything special. The current 
automatic merger may not handle it, but that's not because it _can't_ 
handle it, it's because it tries to be simple and efficient. 

And because it's so _incredibly_ fast for all the normal cases, you can 
now spend some effort on figuring out renames dynamically for the few 
cases where it fails. Does it do so now? No. Would adding UUID's help? 
Hell no. It would be just an unmitigated disaster.

Exactly the same way git-diff-tree can figure out renames, a merge 
algorithm can figure them out. 

Right now, we have two stages in merges: we try the trivial merge first
(pure git-read-tree), and when that fails, we try the automatic 3-way
merge. The fact that we don't have a third (and fourth, and fifth) merge
algorithm for when those two trivial merges happen to not work is _not_ an
indication that the contents only approach doesn't work - it's just an
indication of the fact that 99.9% of all merges are trivial, and they
should be optimized for.
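
A minimal sketch of what the first, trivial stage amounts to, assuming each tree is simplified to a mapping from path to blob id. The function name `trivial_merge` and the dict representation are illustrative assumptions, not how git-read-tree is implemented; the point is only that the common cases are decided by comparing ids, without reading any file contents.

```python
def trivial_merge(base, ours, theirs):
    """Resolve each path by comparing blob ids only.

    Each argument maps path -> blob id; a missing path means the file is
    absent in that tree. Returns (merged, unresolved), where unresolved
    lists the paths left for a later, more expensive merge stage.
    """
    merged, unresolved = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (this includes "both deleted")
            result = o
        elif b == o:      # only their side touched the path
            result = t
        elif b == t:      # only our side touched the path
            result = o
        else:             # both sides changed it, in different ways
            unresolved.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, unresolved


# One side renames x to y, the other modifies x in place.
merged, unresolved = trivial_merge(
    base={"x": "a1"},
    ours={"y": "a1"},
    theirs={"x": "a2"},
)
print(merged)      # y is kept as a plain addition
print(unresolved)  # x is left over: deleted on one side, modified on the other
```

The leftover path in this example is the kind of case the later stages would have to pick up.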

So the next step is _not_ to do UUID's, it's to notice that merge errors 
happened, and try to figure out why. Right now we just give up and say
"sort it out by hand". That's actually a perfectly valid approach even in
the presence of moved files - it's a bit painful, but once you _do_ sort
it out and commit the merge, especially if you can push the merge back (so 
that both sides then agree on the final rename), future merges will be 
trivial again - ie you won't have to go through it over and over again.

Of course, if you don't push it back, but keep the two trees separate and 
keep on modifying files that have different names in the other repository, 
you'll keep on getting into the situation that the trivial merge doesn't 
work. So we _do_ want to get an automated phase 3 (and maybe 4..) merge 
that can figure out renames, but the point here is that it's something we 
_can_ figure out.

For example, one way of doing it is to just do the exact merge we do now,
and then look at the files that didn't merge. Do a cross-diff between such
files and new/deleted files (if not _exactly_ the way we do for git diff
-M, then at least it's exactly the same concept), and try to do a
three-way merge where the base/first/second pairs don't have the same
name.

For example, let's say that you have the common commit A, and file x,
and two paths (B and C) where B has renamed the file x to y, and C has
modified file x. You end up with the scenario that our trivial merge
fails to handle, and right now we give up, and don't help the user very
much at all. But the _solution_ is not to change read-tree to know about
renames, nor is it to make git keep any new data. The solution is to just 
make phase 3 say:

 - Automatic merge failed, trying rename merge
 - go through all files that exist in C but not in B (or vice versa), and 
   pair them up with all files that exist in B but not in C (or vice
   versa) and see if _they_ can be handled as a three-way merge. And 
   exactly the same way that we do the rename detection, we may want to
   find the optimal pairing by looking at the distance between the
   files.

Notice? This will automatically handle the "renamed in one branch,
modified in another" case. In fact, if the renamer modified it too, that's
not a problem at all - the three-way merge will work exactly the same way
it does now for a non-moved file modified on both sides.
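
The pairing step of such a phase 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: trees are dicts from path to file text, `difflib.SequenceMatcher` stands in for whatever distance the rename detection really uses, the 0.5 threshold is arbitrary, and the greedy best-first matching is a simplification of "optimal pairing". The names `similarity` and `rename_merge_candidates` are made up for the example, and the content-level three-way merge itself is left out.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] between two file contents."""
    return SequenceMatcher(None, a, b).ratio()


def rename_merge_candidates(base, b, c, threshold=0.5):
    """Pair files found only in B with files found only in C.

    Returns (base_path, b_path, c_path) triples that look worth feeding
    to an ordinary three-way merge even though the names differ.
    """
    only_b = [p for p in b if p not in c]
    only_c = [p for p in c if p not in b]
    scored = sorted(
        ((similarity(b[pb], c[pc]), pb, pc) for pb in only_b for pc in only_c),
        reverse=True,
    )
    triples, used_b, used_c = [], set(), set()
    for score, pb, pc in scored:
        if score < threshold:
            break                      # remaining pairs are even less alike
        if pb in used_b or pc in used_c:
            continue
        # The merge base is whichever of the two names the ancestor had.
        base_path = pb if pb in base else pc if pc in base else None
        if base_path is None:
            continue
        used_b.add(pb)
        used_c.add(pc)
        triples.append((base_path, pb, pc))
    return triples


base = {"x": "line1\nline2\nline3\n"}
b = {"y": "line1\nline2\nline3\n", "k": "0000000000"}   # renamed x to y, added k
c = {"x": "line1\nline2 changed\nline3\n"}              # modified x in place
print(rename_merge_candidates(base, b, c))
```

Nothing here needs stored metadata: the pairing is recomputed from the contents of the three trees each time.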

Problem solved. Without complicating the trivial (and very common) cases, 
and without introducing any new metadata that is fundamentally impossible 
to maintain (and it _is_ fundamentally impossible to maintain, because it 
has nothing to do with the contents of the files, so patches etc will by 
definition break it).

Linus


Re: Moved files and merges

2005-09-05 Thread H. Peter Anvin

Linus Torvalds wrote:

 It's a totally broken model. Really.

 You think it solves issues, but it just creates more bugs and problems
 than it solves.

 Trust me. The whole point of git is that content is the only thing that
 matters, and that there isn't any other meta-data. If you break that
 fundamental assumption, everything git does so well will break.

 I think we've already shown that the content matters approach works.  I
 claim that the git rename tracking works better than any other SCM out
 there, _exactly_ because it doesn't make the mistake of trying to track
 anything but content.

 The moved + modified files is not anything special. The current
 automatic merger may not handle it, but that's not because it _can't_
 handle it, it's because it tries to be simple and efficient.

 And because it's so _incredibly_ fast for all the normal cases, you can
 now spend some effort on figuring out renames dynamically for the few
 cases where it fails. Does it do so now? No. Would adding UUID's help?
 Hell no. It would be just an unmitigated disaster.

Okay, how about keeping a cache (derived from the contents) of these 
types of data, to assist the merge if necessary?  If it doesn't exist 
when needed, it can just be created, which may take O(n) time.
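
To make the idea concrete, here is a rough sketch, assuming trees are plain dicts from path to content. Everything in it (`tree_id`, `RenameCache`, `exact_renames`) is a hypothetical name for illustration, and the toy detector only recognizes renames with identical content. The cache is keyed purely by content-derived ids, so it can be thrown away and rebuilt at any time.

```python
import hashlib
import json


def tree_id(tree):
    """Content-derived identifier for a tree given as {path: content}."""
    blob = json.dumps(sorted(tree.items())).encode()
    return hashlib.sha1(blob).hexdigest()


def exact_renames(old_tree, new_tree):
    """Toy detector: a path that vanished and reappeared with identical content."""
    appeared = {c: p for p, c in new_tree.items() if p not in old_tree}
    return {p: appeared[c] for p, c in old_tree.items()
            if p not in new_tree and c in appeared}


class RenameCache:
    """Remembers rename detection results; holds no information of its own."""

    def __init__(self, detect):
        self.detect = detect      # function (old_tree, new_tree) -> {old: new}
        self.entries = {}
        self.misses = 0

    def renames(self, old_tree, new_tree):
        key = (tree_id(old_tree), tree_id(new_tree))
        if key not in self.entries:
            self.misses += 1      # not cached yet: compute it from the contents
            self.entries[key] = self.detect(old_tree, new_tree)
        return self.entries[key]


cache = RenameCache(exact_renames)
old, new = {"F": "data"}, {"G": "data"}
print(cache.renames(old, new))   # computed on first use
print(cache.renames(old, new))   # served from the cache
print(cache.misses)
```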


-hpa


Re: Moved files and merges

2005-09-05 Thread Junio C Hamano
Linus Torvalds [EMAIL PROTECTED] writes:

 Of course, if you don't push it back, but keep the two trees separate and 
 keep on modifying files that have different names in the other repository, 
 you'll keep on getting into the situation that the trivial merge doesn't 
 work. So we _do_ want to get an automated phase 3 (and maybe 4..) merge 
 that can figure out renames, but the point here is that it's something we 
 _can_ figure out.

Thanks.  You said very well exactly what I should have said; I
failed to explain where that `reusing rename information from
previous merge` algorithm should fit in the bigger picture.

And the algorithm you describe (and Daniel briefly outlined) is
slightly different from what I had in mind in that it does not
have to depend on a previous merge, which is also nice.



Re: Moved files and merges

2005-09-05 Thread Junio C Hamano
H. Peter Anvin [EMAIL PROTECTED] writes:

 One question, of course, is if one should simply keep additional 
 metadata around to handle this sort of situation.  One could, for 
 example, keep a UUID for each file,...

If I am not mistaken, that is exactly what tla does.  It seems
to work well in practice and seems so simple (at least
superficially, I have not looked deeply into the issues involved
in keeping it in sync with the contents and how to recover if the
user ever screws up, etc.), and I can see why people find it so
attractive.  I myself once did.

But previous argument by Linus made in a distant (in git
timescale) past is now ingrained in my brain: the additional
metadata recorded at commit time can only help us with what we
envisioned in the past when the tool to record that metadata was
written.  If we try to track by contents, we can do at least
the same (diff -M being able to tell renames is an example that
we can get away without having a UUID) and possibly better,
depending on how much effort we are willing to spend drilling
down when we actually need to know what happened at merge
time.  What I found most important in that argument by Linus is
that the drilling down algorithm can improve over time while
the additional metadata specification is cast in stone when a
commit is made.






Re: Moved files and merges

2005-09-04 Thread Daniel Barkalow
On Sun, 4 Sep 2005, Junio C Hamano wrote:

 Sam Ravnborg [EMAIL PROTECTED] writes:
 
  If the problem is not fully understood it can be difficult to come up
  with the proper solution. And with the example above the problem should
  be really easy to understand.
  Then we have the tree as used by hpa with a few more merges in it. But
  the above is what we initially tried to do, with the added complexity of a
  few more renames etc.
 
 All true.  Let's redraw that simplified scenario, and see if
 what I said still holds.  It may be interesting to store my
 previous message and this one and run diff between them.  I
 suspect that the main difference to come out would be the
 problem description part, and the merge machinery part would not
 be all that different.

I'm not quite so convinced, because I think that the actual situation is a 
bit more natural, and therefore our expectations at the end should be 
closer to right with less attention to detail. But I think the actual 
situation is more interesting, anyway, because it's more likely to happen 
and we're more likely to be able to help.

 
 This is a simplified scenario of klibc vs klibc-kbuild HPA had
 trouble with, to help us think of a way to solve this
 interesting merge problem.
 
     #1 - #3 - #5 - #7
    /    /         /
 #0 - #2 - #4 - #6
 
 There are two lines of development.  #0-#1 renames F to G and
 introduces K.  #0-#2 keeps F as F and does not introduce K.
 
 At commit #3, #2 is merged into #1.  The changes made to the
 file contents of F between #0 and #2 are appreciated, but we
 would also want to keep our decision to rename F to G and our
 new file K.  So commit #3 has the resulting merge contents in G
 and has K, inherited from #1.  This _might_ be different from
 what we traditionally consider a 'merge', but from the use case
 point of view it is a valid thing one would want to do.

I think this is actually quite a regular merge, and I think we should be 
able to offer some assistance. The situation with K is normal: case #3ALT. 
If someone introduces a file and there's no file or directory with that 
name in other trees, we assume that the merge should include it.

F/G is trickier, and I don't think we can actually do much about it with 
the current structure of read-tree/merge-cache/etc, but, theoretically, we 
should recognize that #0-#1 is a rename plus content changes, and #0-#2 
is content changes, so the total should be the rename plus contents 
changes; I think we want to additionally signal a conflict, because 
there's a reasonable chance that the rename will interfere with the #0-#2 
changes, and need intervention. Most likely, this just means that we 
should not commit automatically, but have the user test the result first.

For now, of course, we don't get renames at any point in the merging 
procedure, so our code can't tell, and sees it as a big conflict that the 
user has to deal with. But we can agree on what the result is if the user 
includes all the changes from the other branch (and see the situation 
you reported first as cherry-picking the content and leaving the 
structural changes).

 Commit #4 is a continued development from #2; changes are made
 to F, and there is no K.  Commit #5 similarly is a continued
 development from #3; its changes are made to G and K also has
 further changes.
 
 We are about to merge #6 into #5 to create #7.  We should be
 able to take advantage of what the user did when the merge #3
 was made; namely, we should be able to infer that the line of
 development that flows #0 .. #3 .. #7 prefers to rename F to G,
 and also wants the newly introduced K.  We should be able to
 tell it by looking at what the merge #3 did.

Again, K should be unexceptional, because we're keeping a file that was 
added to one side but not the other. (In the other situation, it still 
works; relative to the common ancestor, we're in #8ALT, since #5 doesn't 
have K, which was in #2 and #6; we see the rejection in a merge as a 
removal, which is effectively the same.)

 Now, how can we use git to figure that out?

First off, it should handle K automatically, because we're still including 
a file added by one side without interference from the other side.

 First, given our current head (#5) and the other head we are
 about to merge (#6), we need a way to tell if we merged from
 them before (i.e. the existence of #3) and if so the latest of
 such merge (i.e. #3).
 
 The merge base between #5 and #6 is #2.  We can look at commits
 between us (#5) and the merge base (#2), find a merge (#3),
 which has two parents.  One of the parents is #2 which is
 reachable from #6, and the other is #1 which is not reachable
 from #6 but is reachable from #5.  Can we say that this reliably
 tells us that #2 is on their side and #1 is on our side?  Does
 the fact that #3 is the commit topologically closest to #5 tell
 us that #3 is the one we want to look deeper into?
 
 This is still handwaving, but 

Re: Moved files and merges

2005-09-04 Thread Junio C Hamano
Daniel Barkalow [EMAIL PROTECTED] writes:

 I think this is actually quite a regular merge, and I think we should be 
 able to offer some assistance. The situation with K is normal: case #3ALT. 
 If someone introduces a file and there's no file or directory with that 
 name in other trees, we assume that the merge should include it.

I was not particularly interested in discussing the initial
merge, which is a perfectly regular merge as you said.  I was
focusing more on reusing the tree-structure change information
we _could_ find in merge #3 when we make later merges, because
that merge is something the user did in the past and would be a
good guide for guessing what the user wants to happen in this
round.

There is no question about K in 'keeping addition' case.  It
gets interesting only when the first merge preferred 'reject
addition by them' and we would want to reuse that preference in
the second merge.  But as I tried to clarify in the 'a couple of
things worth mentioning' message, there is no fundamental reason
to treat removal and addition any differently.  It is just a way
to reduce unnecessary conflicts.

 Most likely, this just means that we 
 should not commit automatically, but have the user test the result first.

No question about it again.

 Of course, read-tree is in flux at 
 the moment, so making more structural changes to it at the same time is 
 awkward.

Doing this in read-tree is a bit premature.  I'd prefer a
scripted solution first to see what we want and how well it
works in practice.

   1
  / \
 0-2-3-5-7
   \   /
    4-6

 It shouldn't matter to the merge at 7 if the 2-3 reorganization was done 
 locally, by applying a patch, or by merging.

There was another problem in my message that treated #3
specially.  I did it that way primarily because I wanted to have
an algorithm that needs to look at only a limited number (namely,
one) of commits beyond what we currently look at.  The
problem is that the trail #0..#1..#3 (in the example in the second
message, whose rename probably happened between #0 and #1) may
change the contents of the renamed file so drastically that the diff
between #2 and #3 may not look like a rename anymore, while we
could still detect it if we followed the whole trail and looked
for renames between each commit on it.
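
A small sketch of why following the whole trail can succeed where a single endpoint diff fails. It assumes trees are dicts from path to a tuple of lines, and uses a toy shared-lines score with an arbitrary 0.6 threshold in place of the real similarity estimate; `line_similarity`, `detect_renames` and `compose_renames` are names invented for this illustration.

```python
def line_similarity(a, b):
    """Fraction of shared lines; a toy stand-in for a real similarity score."""
    return len(set(a) & set(b)) / max(len(a), len(b), 1)


def detect_renames(old, new, threshold=0.6):
    """Pair each vanished path with the most similar newly appeared path."""
    added = [p for p in new if p not in old]
    renames = {}
    for gone in (p for p in old if p not in new):
        best = max(added, key=lambda n: line_similarity(old[gone], new[n]),
                   default=None)
        if best is not None and line_similarity(old[gone], new[best]) >= threshold:
            renames[gone] = best
            added.remove(best)
    return renames


def compose_renames(trail, detect):
    """Follow renames step by step along `trail`, a list of trees."""
    current = {p: p for p in trail[0]}   # original path -> path in latest tree
    for old, new in zip(trail, trail[1:]):
        step = detect(old, new)
        nxt = {}
        for origin, path in current.items():
            if path in step:
                nxt[origin] = step[path]     # renamed in this step
            elif path in new:
                nxt[origin] = path           # still there under the same name
            # otherwise the file was deleted and is dropped
        current = nxt
    return {o: p for o, p in current.items() if o != p}


t0 = {"F": ("a", "b", "c", "d")}
t1 = {"G": ("a", "b", "c", "X")}   # renamed, with a small change
t2 = {"G": ("a", "Y", "Z", "X")}   # then changed heavily
print(detect_renames(t0, t2))                    # endpoints alone: no rename seen
print(compose_renames([t0, t1, t2], detect_renames))
```

The price is one rename detection per commit on the trail instead of one overall.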



Re: Moved files and merges

2005-09-03 Thread Junio C Hamano
This is a simplified scenario of klibc vs klibc-kbuild HPA had
trouble with, to help us think of a way to solve this
interesting merge problem.

    #1 - #3 - #5 - #7
   /    /         /
#0 - #2 - #4 - #6

There are two lines of development.  #0-#2 renames F to G and
introduces K.  #0-#1 keeps F as F and does not introduce K.

At commit #3, #2 is merged into #1.  The changes made to the
file contents of F between #0 and #2 are appreciated, but the
renaming of F to G and introduction of K were not.  So commit #3
has the resulting merge contents in F and does not have file K.
This _might_ be different from what we traditionally consider a
'merge', but from the use case point of view it is a valid thing
one would want to do.

Commit #4 is a continued development from #2; changes are made
to G, and K has further changes.  Commit #5 similarly is a
continued development from #3; its changes are in F and K does
not exist.

We are about to merge #6 into #5 to create #7.  We should be
able to take advantage of what the user did when the merge #3
was made; namely, we should be able to infer that the line of
development that flows #0 .. #3 .. #7 prefers to keep F as F,
and does not want the newly introduced K.  We should be able to
tell it by looking at what the merge #3 did.

Now, how can we use git to figure that out?

First, given our current head (#5) and the other head we are
about to merge (#6), we need a way to tell if we merged from
them before (i.e. the existence of #3) and if so the latest of
such merge (i.e. #3).

The merge base between #5 and #6 is #2.  We can look at commits
between us (#5) and the merge base (#2), find a merge (#3),
which has two parents.  One of the parents is #2 which is
reachable from #6, and the other is #1 which is not reachable
from #6 but is reachable from #5.  Can we say that this reliably
tells us that #2 is on their side and #1 is on our side?  Does
the fact that #3 is the commit topologically closest to #5 tell
us that #3 is the one we want to look deeper into?

This is still handwaving, but assuming the answers to these
questions are yes, we have found that the 'previous' merge is
#3, that #1 is its parent on our side, and that #2 is its parent
on their side.
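
The search described above could be sketched like this, assuming the history is given as a dict from commit name to its list of parents. The names `ancestors` and `previous_merge` are made up for the example, and the sketch simply assumes the answers to the questions above are yes.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def previous_merge(parents, ours, theirs):
    """Find the nearest merge on our side that pulled in their history.

    Returns (merge, parent_on_our_side, parent_on_their_side) or None.
    """
    theirs_reach = ancestors(parents, theirs)
    queue, seen = [ours], set()
    while queue:                         # breadth-first: nearest commits first
        c = queue.pop(0)
        if c in seen or c in theirs_reach:
            continue                     # stop at history shared with them
        seen.add(c)
        ps = parents[c]
        if len(ps) == 2:
            their_side = [p for p in ps if p in theirs_reach]
            our_side = [p for p in ps if p not in theirs_reach]
            if len(their_side) == 1 and len(our_side) == 1:
                return c, our_side[0], their_side[0]
        queue.extend(ps)
    return None


# The history from the picture above, before #7 is made.
parents = {
    "#0": [], "#1": ["#0"], "#2": ["#0"], "#3": ["#1", "#2"],
    "#4": ["#2"], "#5": ["#3"], "#6": ["#4"],
}
print(previous_merge(parents, "#5", "#6"))
```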

Then we can ask 'diff-tree -M #2 #3' to see what `tree
structure` changes we do _not_ want from their line of
development, while slurping the contents changes from them.
When making the tree to put at #7, just like I outlined in my
previous message to HPA, we can first create a tree that is a
derivative of #6 with only the structural changes detected
between #2 and #3 (which are 'rename from G to F' and 'removal
of K') applied.  Similarly, we make another derivative, this
time of #2, with only the structural changes to adjust it to
'our' tree (again, 'rename from G to F' and 'removal of K').
Then we can run 3-way git-read-tree like this:

git-read-tree -m -u '#2-adjusted' '#5' '#6-adjusted'

The last part, using the structurally adjusted tree as the
merge-base tree, is what I forgot to do in the previous message
to HPA.
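
As a rough illustration of the adjustment step, assuming trees are dicts from path to content and that the structural changes ('rename from G to F', 'removal of K') have already been extracted from the diff between #2 and #3. The helper name `adjust` and the placeholder contents are hypothetical.

```python
def adjust(tree, renames, removals):
    """Apply only structural changes to a tree: renames and removals.

    File contents are carried over untouched.
    """
    adjusted = {}
    for path, content in tree.items():
        if path in removals:
            continue
        adjusted[renames.get(path, path)] = content
    return adjusted


# Structural changes our side made at merge #3, relative to #2.
renames = {"G": "F"}
removals = {"K"}

tree2 = {"G": "contents at #2", "K": "K at #2"}   # merge base
tree5 = {"F": "contents at #5"}                   # our head
tree6 = {"G": "contents at #6", "K": "K at #6"}   # their head

base_adjusted = adjust(tree2, renames, removals)
theirs_adjusted = adjust(tree6, renames, removals)

# All three trees now use the same names, so an ordinary
# three-way merge can line the files up path by path.
print(sorted(base_adjusted), sorted(tree5), sorted(theirs_adjusted))
```

Only the names change in the adjusted trees; the content changes between #2 and #6 are still there for the three-way merge to pick up.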

Hmm.



Re: Moved files and merges

2005-09-03 Thread Sam Ravnborg
On Sat, Sep 03, 2005 at 01:25:50AM -0700, Junio C Hamano wrote:
 Junio C Hamano [EMAIL PROTECTED] writes:
 
  H. Peter Anvin [EMAIL PROTECTED] writes:
 
  I currently have two klibc trees,
 
  I cloned them to take a look.  You _do_ seem to have a lot of
  renames.
 
 Well, I think I understand what your trees' ancestry looks like,
 but still haven't come up with a good problem definition.  I am
 sorry that this message is not a solution for your problem but
 will end up being just my rambling and thinking aloud.
 
 The ancestry looks like this:
 
        #4------#5---#7
       /        /
      /---#1--#3---#6
     //      /
   #0------#2

 #0: 1.0.14 released, next version is 1.0.15
     5691e96ebfccd21a1f75d3518dd55a96b311d1aa
 #1: Explain why execvpe/execlpe work the way they do.
     1d774a8cbd8e8b90759491591987cb509122bd78
 #2: 1.1 released, next version is 1.1.1
     3a41b60f6730077db3f04cf2874c96a0e53da453
 #3: Merge of #2 into #1
     7ab38d71de2964129cf1d5bc4e071d103e807a0d
 #4: socketcalls aren't always *.S files; they can...
     f52be163e684fc3840e557ecf242270926136b67
 #5: Merge of #3 into #4
     2e2a79d62a96b6b0d4bc93697fe77cd3030cdfd9
 #6: Warnings cleanup
     f5260f8737517f19a03ee906cd64dfc9930221cd
 #7: Remove obsoleted files from merge
     59709a172ee58c9d529a8c4b6f5cf53460629cb3
 
 and you are trying to merge #6 into #7 (or #7 into #6).  #6 does
 not have usr/kinit and nfsmount at the top; #7 has nfsmount
 under usr/kinit/.


Hi Junio.

I can explain some of the background for this particular merge.
About one month ago I cloned the current klibc.git tree and started
doing the necessary modifications needed to introduce kbuild - the
build system used in the kernel.
Furthermore we decided to move files around so they fit the directory
structure planned to be used in the kernel, for when we at some point
in the future merge with mainline.
While I was modifying the build system, development continued and a
few files saw some updates in the official klibc tree.

So what we want to do in this case is:
- Merge the kbuild changes into the official tree without losing the
  changes made to renamed files.

On purpose I did not modify any of the renamed files so the klibc-kbuild
tree contains renames only for these.

If it would be possible to merge:
libs/klibc/klibc.git and libs/klibc/sam/klibc-kbuild.git
using the above rules it would be perfect.

Then a few of the patches from libs/klibc/klibc-kbuild.git would have to
be applied again, but that's doable.

Anyway, that's my view on it. Since Peter is the one doing the merge he may have
better ideas.

Sam


Re: Moved files and merges

2005-09-03 Thread Fredrik Kuivinen
On Sat, Sep 03, 2005 at 11:46:53AM -0700, Junio C Hamano wrote:
[lots of good stuff]

I obviously misunderstood the complexity of this merge case. Thank you
for the explanation. 

- Fredrik


Re: Moved files and merges

2005-09-03 Thread Junio C Hamano
Sam Ravnborg [EMAIL PROTECTED] writes:

 As explained in another mail what we want to do is actually to
 transpose the changes made to F to the now renamed file G.
 So we end up with G containing the modifications made to F.

 Also we want to include the new file K.

Thanks for the clarification.  But the principles are the same.



Re: Moved files and merges

2005-09-03 Thread H. Peter Anvin

Martin Langhoff wrote:

 Probably should be hacked into cg-merge. When the merge reports a file
 is missing, what happens? Does it leave a .rej file or anything?

The error message is:

MERGE ERROR: nfsmount/mount.c: Not handling case 
3225ecdf8d172cda2a6ea5276af0d3edc566a0e7 -  - 
c02da9e576a525a2a49da930107ed3936a45b6e1
MERGE ERROR: nfsmount/sunrpc.c: Not handling case 
037e33e84ebcee4e097a009439c1bab7143ef92d -  - 
e2fe5f8b728b5235010ed317e759222179dcd45c


Conflicts during merge. Do cg-commit after resolving them.

-hpa