Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
On Fri, Aug 26, 2005 at 04:48:32PM -0400, Daniel Barkalow wrote: On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: I will try to describe how the algorithm works. The problem with the usual 3-way merge algorithm is that we sometimes do not have a unique common ancestor. In [1] B and C seems to be equally good. What this algorithm does is to _merge_ the common ancestors, in this case B and C, into a temporary tree lets call it T. It does then use this temporary tree T as the common ancestor for D and E to produce the final merge result. In the case described in [1] this will work out fine and we get a clean merge with the expected result. The only problem I can see with this is that it's likely to generate conflicts between the shared heads, and the user is going to be confused trying to resolve them, because the files with the conflicts will be missing all of the more recent changes. I don't actually think that conflicts between shared heads is a problem. Given the criss-cross case (we want to merge A and B into M): M |\ | \ A B |\/| |/\| C D | / |/ E Lets assume there is a merge conflict if we try to merge C and D (which are the two shared heads). Then both A and B must resolve this conflict. If they have done it in the same way we wont get a merge conflict at M, if they have resolved it differently we will get a merge conflict. In the first case there is no merge conflict at M, in the second case the user has to pick which one of the two different resolutions she wants. Note that the algorithm will happily write non-clean merge results to the object database during the merge shared heads stage. Hence, when we are merging C and D internally we will _not_ ask the user to resolve any eventual merge conflicts. Other than that, I think it should give the right answer, although it will presumably involve a lot of ancient history doing the internal merge. (Which would probably be really painful if you've got two branches that cross-merge regularly and never actually completely sync) The expensive part is the repeated merging. But as I wrote in my mail multiple shared heads seems to be pretty uncommon. As far as I can tell there is no reason for the number of shared heads to increase as a repository grows larger. However, this do probably depend on usage patterns. But it is certainly possible to construct cases with an arbitrary number of shared heads. In which case the algorithm will be a bit slow, at least if there are real content merges going on which are handled by merge(1). I'm getting pretty close to having a version of read-tree that does the trivial merge portion based comparing the sides against all of the shared heads. I think yours will be better for the cases we've identified, giving the correct answer for Tony's case rather than reporting a conflict, but it's clearly more complicated. I think my changes are worthwhile anyway, since they make the merging logic more central, but obviously insufficient. I've been thinking that could be useful to have read-tree figure out the history itself, instead of being passed ancestors, in which case it could use your method, except more efficiently (and only look further at the history when needed). It will be interesting to have a look at the code when you are done. I find the Git architecture with respect to merging to be quite nice. A core which handles the simple cases _fast_ and let the more complicated cases be handled by someone else. - Fredrik - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: In real numbers it is as follows: In Linus' kernel tree there are 5996 commits. 400 of those have more than one parent. Of those 400 merge commits 4 have more than one shared head. Ok, that's already interesting in itself. I was wanting to re-run all the merges with the new git-merge-base -a to see which merges might have had different merge bases, and you've actually done that. Interesting to see the numbers. * Is it worth it? That is, is the added complexity in the merge logic worth the advantages of correctly handling some strange (but real life) merge cases? I am of two minds on this. I hate the notion of a more complex merge. But at the same time, it clearly is a very interesting case when we do have multiple possible shared parents, and I think that at the very least we should warn the user. And using a more complex merge algorithm when it happens seems to be a very valid thing to do. Also, it's possible that other developers see more of the criss-crossing merges than I do - iow, they're probably more likely to happen in the throw-away trees than in some of the main trees. Neither of the two cases we've seen and had issues were merges I did, for example. Which means that your 1% of all merges number is probably low. Of course, it's quite likely that in most cases, the pick either one approach will give the exact same result as the more complex merge. Using python, which people have less exposure to, sounds like an additional thorny issue.. Linus - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
Linus Torvalds [EMAIL PROTECTED] writes: On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: In real numbers it is as follows: In Linus' kernel tree there are 5996 commits. 400 of those have more than one parent. Of those 400 merge commits 4 have more than one shared head. Ok, that's already interesting in itself. I was wanting to re-run all the merges with the new git-merge-base -a to see which merges might have had different merge bases, and you've actually done that. Interesting to see the numbers. Fredrik, mind giving us the commit ID of those four for us to take a look at them? I am of two minds on this. I hate the notion of a more complex merge. But at the same time, it clearly is a very interesting case when we do have multiple possible shared parents, and I think that at the very least we should warn the user. And using a more complex merge algorithm when it happens seems to be a very valid thing to do. I agree. GIT is lightening fast for trivial cases and we can afford to spend more time to handle more complex ones carefully, at the same time telling the user what we are doing is a good thing. Using python, which people have less exposure to, sounds like an additional thorny issue.. Not too big a problem for me to follow the patch ;-). - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC, PATCH] A new merge algorithm (EXPERIMENTAL)
On Fri, Aug 26, 2005 at 06:08:33PM -0700, Junio C Hamano wrote: Linus Torvalds [EMAIL PROTECTED] writes: On Fri, 26 Aug 2005, Fredrik Kuivinen wrote: In real numbers it is as follows: In Linus' kernel tree there are 5996 commits. 400 of those have more than one parent. Of those 400 merge commits 4 have more than one shared head. Ok, that's already interesting in itself. I was wanting to re-run all the merges with the new git-merge-base -a to see which merges might have had different merge bases, and you've actually done that. Interesting to see the numbers. Fredrik, mind giving us the commit ID of those four for us to take a look at them? Sure, they are: 467ca22d3371f132ee225a5591a1ed0cd518cb3d which has two shared heads 7e2987503dda95a5f80290bb8c06279009c2419e and eff910a91ac04ab1d9e210d4f721484af3b39c8d 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 with the heads 462cee296476278acaa54c41925b3273e0e4dd40 and 8be3de3fd8469154a2b3e18a4712032dac5b4a53 3190186362466658f01b2e354e639378ce07e1a9 with 38d84c3bd6dd22bdb1f797c87006931133d71aea and 46906c4415f88cebfad530917bada0835d651824 and finally da28c12089dfcfb8695b6b555cdb8e03dda2b690 with 9e566d8bd61f939b7f5d7d969f5b178571471cf9 and 18190cc08d70a6ec8ef69f0f6ede021f7cb3f9b8 (The script which finds those commits also prints out the commit id of any octopus commits. There is one commit with more than two parents in the kernel repository, 13e652800d1644dfedcd0d59ac95ef0beb7f3165. However, it only looks like that commit has three parents, two of them are actually identical.) I am of two minds on this. I hate the notion of a more complex merge. But at the same time, it clearly is a very interesting case when we do have multiple possible shared parents, and I think that at the very least we should warn the user. And using a more complex merge algorithm when it happens seems to be a very valid thing to do. I agree. GIT is lightening fast for trivial cases and we can afford to spend more time to handle more complex ones carefully, at the same time telling the user what we are doing is a good thing. Using python, which people have less exposure to, sounds like an additional thorny issue.. Not too big a problem for me to follow the patch ;-). Good to know :) I mainly wanted to try the idea with the new algorithm, so some kind of high level language seemed like a good choice. - Fredrik - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html