Re: Moved files and merges
Junio C Hamano wrote:

      1
     / \
    0-2-3-5-7
       \   /
        4-6

It shouldn't matter to the merge at 7 if the 2-3 reorganization was done locally, by applying a patch, or by merging.

There was another problem in my message that treated #3 specially. I did it that way primarily because I wanted to have an algorithm that needs to look at only a limited number (namely, one) of commits beyond what we currently look at. The problem is that the trail #0..#1..#3 (in the example in the second message, whose rename probably happened between #0 and #1) may change the contents of the renamed file so drastically that the diff between #2 and #3 may not look like a rename anymore, while we could still detect it if we followed the whole trail and looked for renames between each commit on it.

One question, of course, is if one should simply keep additional metadata around to handle this sort of situation. One could, for example, keep a UUID for each file, which would be carried over by the renaming commit. If one runs into a tree which doesn't have the UUIDs, they should be generated at that time (this could be a bit tricky to do without invalidating all signatures in the tree, since the obvious way -- adding it to the tree object -- would invalidate all the commit and tag objects.)

In some ways this is similar to the Unix filesystem model of separating location (pathname) from identity (device:inode).

It would also have the somewhat interesting possibility that one could remove and recreate a file and have it exist as a different entity. That probably needs to be a user option.

-hpa

- To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Moved files and merges
On Mon, 5 Sep 2005, H. Peter Anvin wrote: It would also hade the somewhat interesting possibility that one could remove and recreate a file and have it exist as a different entity. That probably needs to be a user option. It's a totally broken model. Really. You think it solves issues, but it just creates more bugs and problems than it solves. Trust me. The whole point of git is that content is the only thing that matters, and that there isn't any other meta-data. If you break that fundamental assumption, everything git does so well will break. I think we've already shown that the content matters approach works. I claim that the git rename tracking works better than any other SCM out there, _exactly_ because it doesn't make the mistake of trying to track anything but content. The moved + modified files is not anything special. The current automatic merger may not handle it, but that's not because it _can't_ handle it, it's because it tries to be simple and efficient. And because it's so _incredibly_ fast for all the normal cases, you can now spend some effort on figuring out renames dynamically for the few cases where it fails. Does it do so now? No. Would adding UUID's help? Hell no. It would be just an unmitigated disaster. Exactly the same way git-diff-tree can figure out renames, a merge algorithm can figure them out. Right now, we have two stages in merges: we try the trivial merge first (pure git-read-tree), and when that fails, we try the automatic 3-way merge. The fact that we don't have a third (and fourth, and fifth) merge algorithm for when those two trivial merges happen to not work is _not_ an indication that the contents only approach doesn't work - it's just an indication of the fact that 99.9% of all merges are trivial, and they should be optimized for. So the next step is _not_ to do UUID's, it's to notice that merge errors happened, and try to figure out why. Right now we just give up and say sort it out by hand. 
That's actually a perfectly valid approach even in the presence of moved files - it's a bit painful, but once you _do_ sort it out and commit the merge, especially if you can push the merge back (so that both sides then agree on the final rename), future merges will be trivial again - i.e. you won't have to go through it over and over again.

Of course, if you don't push it back, but keep the two trees separate and keep on modifying files that have different names in the other repository, you'll keep on getting into the situation that the trivial merge doesn't work. So we _do_ want to get an automated phase 3 (and maybe 4..) merge that can figure out renames, but the point here is that it's something we _can_ figure out.

For example, one way of doing it is to just do the exact merge we do now, and then look at the files that didn't merge. Do a cross-diff between such files and new/deleted files (if not _exactly_ the way we do for git diff -M, then at least it's exactly the same concept), and try to do a three-way merge where the base/first/second pairs don't have the same name.

For example, let's say that you have the common commit A, and file x, and two paths (B and C) where B has renamed the file x to y, and C has modified file x. You end up with the scenario that our trivial merge fails to handle, and right now we give up, and don't help the user very much at all.

But the _solution_ is not to change read-tree to know about renames, nor is it to make git keep any new data. The solution is to just make phase 3 say:

 - Automatic merge failed, trying rename merge
 - go through all files that exist in C but not in B (or vice versa), and pair them up with all files that exist in B but not in C (or vice versa) and see if _they_ can be handled as a three-way merge.

And exactly the same way that we do the rename detection, we may want to find the optimal pairing by looking at the distance between the files.

Notice? 
This will automatically handle the renamed in one branch, modified in another case. In fact, if the renamer modified it too, that's not a problem at all - the three-way merge will work exactly the same way it does now with the case of a non-moved modified in both files. Problem solved. Without complicating the trivial (and very common) cases, and without introducing any new metadata that is fundamentally impossible to maintain (and it _is_ fundamentally impossible to maintain, because it has nothing to do with the contents of the files, so patches etc will by definition break it). Linus
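The "phase 3" pairing described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how git does it: trees are modelled as plain dicts mapping path to file content, similarity is measured with Python's difflib rather than git's own rename scoring, and the names `pair_up`, `rename_merge_inputs` and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two file contents."""
    return SequenceMatcher(None, a, b).ratio()


def pair_up(tree_b: dict, tree_c: dict, threshold: float = 0.5) -> list:
    """Pair paths that exist only in B with paths that exist only in C.

    Candidates are scored by content similarity and picked greedily,
    best score first, so each path is used in at most one pair.
    """
    only_b = sorted(set(tree_b) - set(tree_c))
    only_c = sorted(set(tree_c) - set(tree_b))
    scored = sorted(
        ((similarity(tree_b[pb], tree_c[pc]), pb, pc)
         for pb in only_b for pc in only_c),
        reverse=True,
    )
    used_b, used_c, pairs = set(), set(), []
    for score, pb, pc in scored:
        if score < threshold:
            break  # remaining candidates are even less similar
        if pb in used_b or pc in used_c:
            continue
        used_b.add(pb)
        used_c.add(pc)
        pairs.append((pb, pc))
    return pairs


def rename_merge_inputs(base: dict, tree_b: dict, tree_c: dict,
                        threshold: float = 0.5) -> list:
    """Return (base_path, b_path, c_path) triples for a three-way merge
    whose base/first/second entries do not share the same name."""
    triples = []
    for pb, pc in pair_up(tree_b, tree_c, threshold):
        # The base version lives under whichever name the ancestor used.
        base_path = pb if pb in base else pc if pc in base else None
        if base_path is not None:
            triples.append((base_path, pb, pc))
    return triples


if __name__ == "__main__":
    # Common commit A has x; B renamed x to y; C modified x.
    a = {"x": "line1\nline2\nline3\n"}
    b = {"y": "line1\nline2\nline3\n", "new.txt": "unrelated\n"}
    c = {"x": "line1\nline2 changed\nline3\n"}
    print(rename_merge_inputs(a, b, c))
```

Each returned triple names the three blobs that an ordinary three-way file merge would then be run on; nothing is stored beyond the contents of the three trees.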
Re: Moved files and merges
Linus Torvalds wrote:

It's a totally broken model. Really. You think it solves issues, but it just creates more bugs and problems than it solves. Trust me. The whole point of git is that content is the only thing that matters, and that there isn't any other meta-data. If you break that fundamental assumption, everything git does so well will break.

I think we've already shown that the content matters approach works. I claim that the git rename tracking works better than any other SCM out there, _exactly_ because it doesn't make the mistake of trying to track anything but content.

The moved + modified files is not anything special. The current automatic merger may not handle it, but that's not because it _can't_ handle it, it's because it tries to be simple and efficient. And because it's so _incredibly_ fast for all the normal cases, you can now spend some effort on figuring out renames dynamically for the few cases where it fails. Does it do so now? No. Would adding UUID's help? Hell no. It would be just an unmitigated disaster.

Okay, how about keeping a cache (derived from the contents) of these types of data, to assist the merge if necessary? If it doesn't exist when needed, it can just be created, which may take O(n) time.

-hpa
Re: Moved files and merges
Linus Torvalds [EMAIL PROTECTED] writes:

Of course, if you don't push it back, but keep the two trees separate and keep on modifying files that have different names in the other repository, you'll keep on getting into the situation that the trivial merge doesn't work. So we _do_ want to get an automated phase 3 (and maybe 4..) merge that can figure out renames, but the point here is that it's something we _can_ figure out.

Thanks. You said very well exactly what I should have said; I failed to explain where that `reusing rename information from previous merge` algorithm should fit in the bigger picture. And the algorithm you describe (and Daniel briefly outlined) is slightly different from what I had in mind in that it does not have to depend on a previous merge, which is also nice.
Re: Moved files and merges
H. Peter Anvin [EMAIL PROTECTED] writes:

One question, of course, is if one should simply keep additional metadata around to handle this sort of situation. One could, for example, keep a UUID for each file,...

If I am not mistaken, that is exactly what tla does. It seems to work well in practice and seems so simple (at least superficially; I have not looked deeply into the issues involved in keeping it in sync with the contents and how to recover if the user ever screws up, etc.), and I can see why people find it so attractive. I myself once did.

But a previous argument by Linus, made in the distant (in git timescale) past, is now ingrained in my brain: the additional metadata recorded at commit time can only help us with what we envisioned in the past, when the tool to record that metadata was written. If we try to track by contents, we can do at least the same (diff -M being able to tell renames is an example that we can get away without having a UUID) and possibly better, depending on how much effort we are willing to spend drilling down when we actually need to know what happened at merge time.

What I found most important in that argument by Linus is that the drilling down algorithm can improve over time while the additional metadata specification is cast in stone when a commit is made.
Re: Moved files and merges
On Sun, 4 Sep 2005, Junio C Hamano wrote:

Sam Ravnborg [EMAIL PROTECTED] writes:

If the problem is not fully understood it can be difficult to come up with the proper solution. And with the example above the problem should be really easy to understand. Then we have the tree as used by hpa with a few more merges in it. But the above is what was initially tried, with the added complexity of a few more renames etc.

All true. Let's redraw that simplified scenario, and see if what I said still holds. It may be interesting to store my previous message and this one and run diff between them. I suspect that the main difference to come out would be the problem description part, and the merge machinery part would not be all that different.

I'm not quite so convinced, because I think that the actual situation is a bit more natural, and therefore our expectations at the end should be closer to right with less attention to detail. But I think the actual situation is more interesting, anyway, because it's more likely to happen and we're more likely to be able to help.

This is a simplified scenario of klibc vs klibc-kbuild HPA had trouble with, to help us think of a way to solve this interesting merge problem.

       #1 - #3 - #5 - #7
      /    /         /
    #0 - #2 - #4 - #6

There are two lines of development. #0-#1 renames F to G and introduces K. #0-#2 keeps F as F and does not introduce K. At commit #3, #2 is merged into #1. The changes made to the file contents of F between #0 and #2 are appreciated, but we would also want to keep our decision to rename F to G and our new file K. So commit #3 has the resulting merge contents in G and has K, inherited from #1. This _might_ be different from what we traditionally consider a 'merge', but from the use case point of view it is a valid thing one would want to do.

I think this is actually quite a regular merge, and I think we should be able to offer some assistance. The situation with K is normal: case #3ALT. 
If someone introduces a file and there's no file or directory with that name in other trees, we assume that the merge should include it. F/G is trickier, and I don't think we can actually do much about it with the current structure of read-tree/merge-cache/etc, but, theoretically, we should recognize that #0-#1 is a rename plus content changes, and #0-#2 is content changes, so the total should be the rename plus contents changes; I think we want to additionally signal a conflict, because there's a reasonable chance that the rename will interfere with the #0-#2 changes, and need intervention. Most likely, this just means that we should not commit automatically, but have the user test the result first. For now, of course, we don't get renames at any point in the merging procedure, so our code can't tell, and sees it as a big conflict that the user has to deal with. But we can agree on what the result is if the user includes all the changes from the other branch (and see the situation you reported first as cherry-picking the content and leaving the structural changes). Commit #4 is a continued development from #2; changes are made to F, and there is no K. Commit #5 similarly is a continued development from #3; its changes are made to G and K also has further changes. We are about to merge #6 into #5 to create #7. We should be able to take advantage of what the user did when the merge #3 was made; namely, we should be able to infer that the line of development that flows #0 .. #3 .. #7 prefers to rename F to G, and also wants the newly introduced K. We should be able to tell it by looking at what the merge #3 did. Again, K should be unexceptional, because we're keeping a file that was added to one side but not the other. (In the other situation, it still works; relative to the common ancestor, we're in #8ALT, since #5 doesn't have K, which was in #2 and #6; we see the rejection in a merge as a removal, which is effectively the same.) 
Now, how can we use git to figure that out? First off, it should handle K automatically, because we're still including a file added by one side without interference from the other side. First, given our current head (#5) and the other head we are about to merge (#6), we need a way to tell if we merged from them before (i.e. the existence of #3) and if so the latest of such merge (i.e. #3). The merge base between #5 and #6 is #2. We can look at commits between us (#5) and the merge base (#2), find a merge (#3), which has two parents. One of the parents is #2 which is reachable from #6, and the other is #1 which is not reachable from #6 but is reachable from #5. Can we say that this reliably tells us that #2 is on their side and #1 is on our side? Does the fact that #3 is the commit topologically closest to #5 tell us that #3 is the one we want to look deeper? This is still handwaving, but
Re: Moved files and merges
Daniel Barkalow [EMAIL PROTECTED] writes:

I think this is actually quite a regular merge, and I think we should be able to offer some assistance. The situation with K is normal: case #3ALT. If someone introduces a file and there's no file or directory with that name in other trees, we assume that the merge should include it.

I was not particularly interested in discussing the initial merge, which is a perfectly regular merge as you said. I was more focusing on reusing the tree-structure change information we _could_ find in merge #3 when we make later merges, because that merge is something the user did in the past and would be a good guide for guessing what the user wants to happen in this round.

There is no question about K in the 'keeping addition' case. It gets interesting only when the first merge preferred 'reject addition by them' and we would want to reuse that preference in the second merge. But as I tried to clarify in the "a couple of things worth mentioning" message, there is no fundamental reason to treat removal and addition any differently. It is just a way to reduce unnecessary conflicts.

Most likely, this just means that we should not commit automatically, but have the user test the result first.

No question about it again.

Of course, read-tree is in flux at the moment, so making more structural changes to it at the same time is awkward.

Doing this in read-tree is a bit premature. I'd prefer a scripted solution first to see what we want and how well it works in practice.

      1
     / \
    0-2-3-5-7
       \   /
        4-6

It shouldn't matter to the merge at 7 if the 2-3 reorganization was done locally, by applying a patch, or by merging.

There was another problem in my message that treated #3 specially. I did it that way primarily because I wanted to have an algorithm that needs to look at only a limited number (namely, one) of commits beyond what we currently look at. 
The problem is that the trail #0..#1..#3 (in the example in the second message, whose rename probably happened between #0 and #1) may change the contents of the renamed file so drastically that the diff between #2 and #3 may not look like a rename anymore, while we could still detect it if we followed the whole trail and looked for renames between each commit on it.
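The point about following the whole trail can be illustrated with a small sketch. It is hypothetical throughout: trees are dicts of path to content, similarity is line-based via Python's difflib with an arbitrary 0.5 threshold, and `detect_renames` and `follow_trail` are made-up names, not git commands. The idea is only that pairwise rename detection between adjacent commits can succeed where a single diff across the whole span fails.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Line-based similarity score between two file contents (0..1)."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def detect_renames(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Map each path that disappeared from `old` to the most similar
    path that newly appeared in `new`, if it is similar enough."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    renames, taken = {}, set()
    for src in removed:
        best_score, best_dst = threshold, None
        for dst in added:
            if dst in taken:
                continue
            score = similarity(old[src], new[dst])
            if score >= best_score:
                best_score, best_dst = score, dst
        if best_dst is not None:
            renames[src] = best_dst
            taken.add(best_dst)
    return renames


def follow_trail(trees: list, path: str):
    """Track `path` through consecutive trees, detecting renames one
    step at a time. Returns the final name, or None if it is lost."""
    for old, new in zip(trees, trees[1:]):
        if path in new:
            continue  # still under the same name at this step
        path = detect_renames(old, new).get(path)
        if path is None:
            return None
    return path


if __name__ == "__main__":
    t0 = {"F": "a\nb\nc\nd\n"}
    t1 = {"G": "a\nb\nc\nx\n"}        # renamed, one line changed
    t3 = {"G": "a\ny\nz\nx\nw\nv\n"}  # heavily edited in place
    print(follow_trail([t0, t1, t3], "F"))  # step by step
    print(detect_renames(t0, t3))           # single jump across the trail
```

In this toy data the step-by-step walk still ends at G, while the single comparison between the first and last tree finds no rename because the contents have drifted below the threshold.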
Re: Moved files and merges
This is a simplified scenario of klibc vs klibc-kbuild HPA had trouble with, to help us think of a way to solve this interesting merge problem.

       #1 - #3 - #5 - #7
      /    /         /
    #0 - #2 - #4 - #6

There are two lines of developments. #0-#2 renames F to G and introduces K. #0-#1 keeps F as F and does not introduce K. At commit #3, #2 is merged into #1. The changes made to the file contents of F between #0 and #2 are appreciated, but the renaming of F to G and introduction of K were not. So commit #3 has the resulting merge contents in F and does not have file K. This _might_ be different from what we traditionally consider a 'merge', but from the use case point of view it is a valid thing one would want to do.

Commit #4 is a continued development from #2; changes are made to G, and K has further changes. Commit #5 similarly is a continued development from #3; its changes are in F and K does not exist.

We are about to merge #6 into #5 to create #7. We should be able to take advantage of what the user did when the merge #3 was made; namely, we should be able to infer that the line of development that flows #0 .. #3 .. #7 prefers to keep F as F, and does not want the newly introduced K. We should be able to tell it by looking at what the merge #3 did.

Now, how can we use git to figure that out?

First, given our current head (#5) and the other head we are about to merge (#6), we need a way to tell if we merged from them before (i.e. the existence of #3) and if so the latest of such merge (i.e. #3). The merge base between #5 and #6 is #2. We can look at commits between us (#5) and the merge base (#2), find a merge (#3), which has two parents. One of the parents is #2 which is reachable from #6, and the other is #1 which is not reachable from #6 but is reachable from #5. Can we say that this reliably tells us that #2 is on their side and #1 is on our side? Does the fact that #3 is the commit topologically closest to #5 tell us that #3 is the one we want to look deeper? 
This is still handwaving, but assuming the answers to these questions are yes, we have found that the 'previous' merge is #3, that #1 is its parent on our side, and that #2 is its parent on their side. Then we can ask 'diff-tree -M #2 #3' to see what `tree structure` changes we do _not_ want from their line of development, while slurping the contents changes from them.

When making the tree to put at #7, just like I outlined in my previous message to HPA, we can first create a tree that is a derivative of #6 with only the structural changes detected between #2 and #3 (which are 'rename from G to F' and 'removal of K') applied. Similarly, we make another derivative, this time of #2, with only the structural changes to adjust it to 'our' tree (again, 'rename from G to F' and 'removal of K'). Then we can run 3-way git-read-tree like this:

    git-read-tree -m -u '#2-adjusted' '#5' '#6-adjusted'

The last part, using the structurally adjusted tree as the merge-base tree, is what I forgot to do in the previous message to HPA. Hmm.
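The two steps in this message (locating the previous merge, then building structurally adjusted trees) can be sketched as below. This is a toy model under loud assumptions: the history is a dict mapping each commit name to its list of parents, trees are dicts of path to content, and `find_previous_merge` and `apply_structure` are hypothetical helpers. It does not call git, and it assumes the answer to the open questions above is "yes".

```python
from collections import deque


def ancestors(parents: dict, commit: str) -> set:
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen


def find_previous_merge(parents: dict, ours: str, theirs: str):
    """Walk back from `ours`, nearest commits first, and return
    (merge, parent_on_our_side, parent_on_their_side) for the first
    two-parent commit with exactly one parent reachable from `theirs`."""
    their_side = ancestors(parents, theirs)
    queue, seen = deque([ours]), set()
    while queue:
        c = queue.popleft()
        if c in seen or c in their_side:
            continue  # already visited, or already shared history
        seen.add(c)
        ps = parents.get(c, [])
        if len(ps) == 2:
            on_theirs = [p for p in ps if p in their_side]
            on_ours = [p for p in ps if p not in their_side]
            if len(on_theirs) == 1 and len(on_ours) == 1:
                return c, on_ours[0], on_theirs[0]
        queue.extend(ps)
    return None


def apply_structure(tree: dict, renames: dict, removals: set) -> dict:
    """Return a copy of `tree` with only structural changes applied:
    paths in `removals` are dropped, paths in `renames` are moved."""
    adjusted = {}
    for path, content in tree.items():
        if path in removals:
            continue
        adjusted[renames.get(path, path)] = content
    return adjusted


if __name__ == "__main__":
    history = {
        "#0": [], "#1": ["#0"], "#2": ["#0"], "#3": ["#1", "#2"],
        "#4": ["#2"], "#5": ["#3"], "#6": ["#4"],
    }
    print(find_previous_merge(history, "#5", "#6"))
    # Structural changes between #2 and #3: rename G to F, remove K.
    print(apply_structure({"G": "g6", "K": "k6"}, {"G": "F"}, {"K"}))
```

The adjusted copies of #2 and #6 produced this way correspond to the '#2-adjusted' and '#6-adjusted' arguments of the read-tree invocation above; the renames and removals themselves would come from the diff-tree -M comparison of #2 and #3.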
Re: Moved files and merges
On Sat, Sep 03, 2005 at 01:25:50AM -0700, Junio C Hamano wrote:

Junio C Hamano [EMAIL PROTECTED] writes:

H. Peter Anvin [EMAIL PROTECTED] writes:

I currently have two klibc trees,

I cloned them to take a look. You _do_ seem to have a lot of renames.

Well, I think I understand what your trees' ancestry looks like, but still haven't come up with a good problem definition. I am sorry that this message is not a solution for your problem but would end up being just my rambling and thinking aloud.

The ancestry looks like this:

                #4-#5---#7
               /  /
         /---#1-#3---#6
        //     /
      #0-#2

    #0: 1.0.14 released, next version is 1.0.15
        5691e96ebfccd21a1f75d3518dd55a96b311d1aa
    #1: Explain why execvpe/execlpe work the way they do.
        1d774a8cbd8e8b90759491591987cb509122bd78
    #2: 1.1 released, next version is 1.1.1
        3a41b60f6730077db3f04cf2874c96a0e53da453
    #3: Merge of #2 into #1
        7ab38d71de2964129cf1d5bc4e071d103e807a0d
    #4: socketcalls aren't always *.S files; they can...
        f52be163e684fc3840e557ecf242270926136b67
    #5: Merge of #3 into #4
        2e2a79d62a96b6b0d4bc93697fe77cd3030cdfd9
    #6: Warnings cleanup
        f5260f8737517f19a03ee906cd64dfc9930221cd
    #7: Remove obsoleted files from merge
        59709a172ee58c9d529a8c4b6f5cf53460629cb3

and you are trying to merge #6 into #7 (or #7 into #6). #6 does not have usr/kinit and nfsmount at the top; #7 has nfsmount under usr/kinit/.

Hi Junio.

I can explain some of the background for this particular merge. About one month ago I cloned the current klibc.git tree and started doing the necessary modifications needed to introduce kbuild - the build system used in the kernel. Furthermore we decided to move files around so they fit the directory structure planned to be used in the kernel - when we at some point in the future merge with mainline. While I was modifying the build system the development continued and a few files saw some updates in the official klibc tree. 
So what we want to do in this case is:

 - Merge the kbuild changes into the official tree without losing the changes made to renamed files.

On purpose I did not modify any of the renamed files, so the klibc-kbuild tree contains renames only for these. If it would be possible to merge libs/klibc/klibc.git and libs/klibc/sam/klibc-kbuild.git using the above rules it would be perfect. Then a few of the patches from libs/klibc/klibc-kbuild.git would have to be applied again, but that's doable.

Anyway, that's my view on it. Since Peter is the one doing the merge he may have better ideas.

Sam
Re: Moved files and merges
On Sat, Sep 03, 2005 at 11:46:53AM -0700, Junio C Hamano wrote: [lots of good stuff] I obviously misunderstood the complexity of this merge case. Thank you for the explanation. - Fredrik
Re: Moved files and merges
Sam Ravnborg [EMAIL PROTECTED] writes: As explained in another mail what we want to do is actually to transpose the changes made to F to the now renamed file G. So we end up with G containing the modifications made to F. Also we want to include the new file K. Thanks for the clarification. But the principles are the same.
Re: Moved files and merges
Martin Langhoff wrote:

Probably should be hacked into cg-merge. When the merge reports a file is missing, what happens? Does it leave a .rej file or anything?

The error message is:

    MERGE ERROR: nfsmount/mount.c: Not handling case 3225ecdf8d172cda2a6ea5276af0d3edc566a0e7 - - c02da9e576a525a2a49da930107ed3936a45b6e1
    MERGE ERROR: nfsmount/sunrpc.c: Not handling case 037e33e84ebcee4e097a009439c1bab7143ef92d - - e2fe5f8b728b5235010ed317e759222179dcd45c

    Conflicts during merge. Do cg-commit after resolving them.

-hpa