Re: Re: Re: Re: write-tree is pasky-0.4
Dear diary, on Sat, Apr 16, 2005 at 02:22:45AM CEST, I got a letter where Linus Torvalds <[EMAIL PROTECTED]> told me that... > > > On Sat, 16 Apr 2005, Petr Baudis wrote: > > > > But otherwise it is great news to me. Actually, in that case, is it > > worth renaming it to Cogito and using cg to invoke it? Wouldn't be that > > actually more confusing after it gets merged? IOW, should I stick to > > "git" or feel free to rename it to "cg"? > > I'm perfectly happy for it to stay as "git", and in general I don't have > any huge preferences either way. You guys can discuss names as much as you > like, it's the "tracking renames" and "how to merge" things that worry me. :-) > I think I've explained my name tracking worries. When it comes to "how to > merge", there's three issues: > > - we do commonly have merge clashes where both trees have applied the >exact same patch. That should merge perfectly well using the 3-way >merge from a common parent that Junio has, but not your current "bring >patches forward" kind of strategy. My current "bring patches forward" strategy is only very interim, to have something working well enough for me to merge with you. I will gladly change it to use merge-tree*, when it is done. (Or read-tree -m - I will yet have to have a look, but it looks extremely promising.) > - I _do_ actually sometimes merge with dirty state in my working >directory, which is why I want the merge to take place in a separate >(and temporary) directory, which allows for a failed merge without >having any major cleanup. If the merge fails, it's not a big deal, and >I can just blow the merge directory away without losing the work I had >in my "real" working directory. Ok. But still, especially when you do some nontrivial conflicts resolving, how do you check if it even compiles after the merge? Or do you just commit it and possibly fix the compilation in another commit? > - reliability. 
I care much less for "clever" than I care for "guaranteed >to never do the wrong thing". If I have to fix up some stuff by hand, >I'll happily do so. But if I can't trust the merge and have to _check_ >things by hand afterwards, that will make me leery of the merges, and >_that_ is bad. > > The third point is why I'm going to the ultra-conservative "three-way > merge from the common parent". It's not fancy, but it's something I feel > comfortable with as a merge strategy. For example, arch (and in particular > darcs) seems to want to try to be "clever" about the merges, and I'd > always live in fear. I agree and I would like to achieve the same. I too think the three-way merge from the common parent is the best way to go for now. > And, finally, there's obviously performance. I _think_ a normal merge with > nary a conflict and just a few tens of files changed should be possible in > a second. I realize that sounds crazy to some people, but I think it's > entirely doable. Half of that is writing the new tree out (that is a > relative costly op due to the compression). The other half is the "work". Being written in shell, there is plenty of space for optimization - from using bash internals instead of textutils to rewriting parts of it in C. My priority now is to get it right first, though. :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Linus Torvalds wrote: > On Fri, 15 Apr 2005, Daniel Barkalow wrote: > > > > So you want to merge someone else's tree into your committed state, and > > then merge the result with your working directory to get the working > > directory you continue with, provided that the second merge is trivial? > > No, you don't even "merge" the working directory. > > The low-level tools should entirely ignore the working directory. To a > low-level merge, the working directory doesn't even exist. It just gets > three commits (or trees) and merges two of them with the third as a > parent, and does all of it in it's own temporary "merge working > directory". It seems like users won't expect there to be a new working directory for the merge in which they are supposed to resolve te conflicts, but where they don't see their uncommited changes. In any case, the low-level tools have to care about *some* working directory, even if it isn't the parent of .git, and the parent of .git seems like where other similar things happen. If we're being conservative about merging, we're likely to report a lot of conflicts, at least until we work out better techniques than a simple 3-way merge. > > For the latter, there are sometimes multiple ancestors which fit this > > criterion > > Yes. Let's just pick one at random (or more likely, the latest one by > date - let's not actually be _random_ random) at first. Okay; I've currently got the one where the number of generations it is away from the further head is the smallest, and of equal ones, an arbitrary choice. If people are generally similar in the amount they diverge before commiting, this should be the most similar ancestor. > There are other heuristics we can try, ie if it turns out that it's common > to have a couple of alternatives (but no more than some small number, say > five or so), we can literally just -try- to do a tree-only merge, and see > how many lines out common output you get from "diff-tree". 
>
> Because that "how many files do we need to merge" is the number you want
> to minimize, and doing a couple of extra "diff-tree" + "join" operations
> should be so fast that nobody will notice that we actually tried five
> different merges to see which one looked the best.
>
> But hey, especially if the merge fails with real clashes (ie there are
> changes in common and running "merge" leaves conflicts), and there were
> other alternate parents to choose, there's nothing wrong with just
> printing them out and saying "you might try to specify one of these
> manually".

I think we should be able to get good results out of doing the 5 merges and reporting a conflict only if there's a conflict in all of them; it shouldn't be possible for two to succeed but give different results (if they did, clearly our current algorithm is unsafe, since it would give some undesired output if it happened to use the wrong ancestor).

I'm thinking of not actually calling "merge(1)" for this at all; it just calls diff3, and diff3 is only 1745 lines including option parsing. We can probably arrange to look around for better ancestors in case of conflicts we'd otherwise have to report, and get this all tidy and more efficient than having diff3 re-read files. And if we only go to other ancestors in case of conflicts, we're going to be a lot faster total than getting a reaction from the user, almost no matter what we do.

> I really don't think we should worry too much about this until we've
> actually used the system for a while and seen what it does. So just start
> with "nearest common parent with most recent date". Which I think you
> already implemented, no?

I've got something like that (see above); did you want it in some form other than the patch I sent you?

-Daniel
*This .sig left intentionally blank*
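The ancestor choice Daniel describes (the common ancestor whose generation distance from the further head is smallest, with an arbitrary pick among equals) can be sketched roughly as follows. This is only an illustration under assumptions: the commit graph is a plain dict from commit id to a list of parent ids, the function names are made up for the example, and ties are broken by commit id purely to make the result deterministic. It is not the patch referred to in the mail.

```python
from collections import deque


def generations(parents, head):
    """Map every ancestor of head (head included) to its minimum generation distance."""
    dist = {head: 0}
    queue = deque([head])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, ()):
            if parent not in dist:
                dist[parent] = dist[commit] + 1
                queue.append(parent)
    return dist


def nearest_common_ancestor(parents, head_a, head_b):
    """Pick the common ancestor that is fewest generations from the further head."""
    dist_a = generations(parents, head_a)
    dist_b = generations(parents, head_b)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    # Assumption: ties are broken by commit id, standing in for "an arbitrary choice".
    return min(common, key=lambda c: (max(dist_a[c], dist_b[c]), c))


# Hypothetical history: root <- base <- a1 <- a2, and base <- b1.
history = {"root": [], "base": ["root"], "a1": ["base"], "a2": ["a1"], "b1": ["base"]}
print(nearest_common_ancestor(history, "a2", "b1"))
```

With this toy history both "root" and "base" are common ancestors, and "base" wins because it is two generations from the further head while "root" is three.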
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Daniel Barkalow wrote: > > So you want to merge someone else's tree into your committed state, and > then merge the result with your working directory to get the working > directory you continue with, provided that the second merge is trivial? No, you don't even "merge" the working directory. The low-level tools should entirely ignore the working directory. To a low-level merge, the working directory doesn't even exist. It just gets three commits (or trees) and merges two of them with the third as a parent, and does all of it in it's own temporary "merge working directory". So on a technical level, the "plumbing" part really really doesn't care at all. However, from a _usability_ part, you expect after a merge that your working directory has been updated to be the merged tree. And that's where the "if I have a working tree that is dirty, I want that part to fail" comes in. In other words, the final phase (after the "tree-merge" has actually successfully already finished) is to go back to the working directory, and check out the merged results. But that checkout would be a variation on "checkout-cache -a" which first checks that none of the files it is going to overwrite are dirty. Don't worry about this part. It's really totally separate from the true merge itself. The "real work" has already been done by the time we notice that "oops, we can't actually show him the newly merged tree, because he has got dirty data where we want to show it". > > I care. Even if the best common parent is 3 months ago, I care. I'd much > > rather get a big explicit conflict than a "clean merge" that ends up being > > debatable because people played games with per-file merging or something > > questionable like that. > > Are you thinking that the best common ancestor is the one that ties up > absolutely all of the chains of commits, or the closest one that the sides > have in common? The closest common one. 
> For the latter, there are sometimes multiple ancestors which fit this
> criterion

Yes. Let's just pick one at random (or more likely, the latest one by date - let's not actually be _random_ random) at first.

There are other heuristics we can try, ie if it turns out that it's common to have a couple of alternatives (but no more than some small number, say five or so), we can literally just -try- to do a tree-only merge, and see how many lines of common output you get from "diff-tree".

Because that "how many files do we need to merge" is the number you want to minimize, and doing a couple of extra "diff-tree" + "join" operations should be so fast that nobody will notice that we actually tried five different merges to see which one looked the best.

But hey, especially if the merge fails with real clashes (ie there are changes in common and running "merge" leaves conflicts), and there were other alternate parents to choose, there's nothing wrong with just printing them out and saying "you might try to specify one of these manually".

I really don't think we should worry too much about this until we've actually used the system for a while and seen what it does. So just start with "nearest common parent with most recent date". Which I think you already implemented, no?

Linus
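The "try each candidate parent and count the files that need merging" heuristic above might look something like this. It is a sketch under stated assumptions, not how diff-tree or join actually work: trees are modelled as dicts from path to blob id, each candidate is a hypothetical (commit_date, tree) pair, and the most recent date is used only to break ties.

```python
def changed_paths(base, tree):
    """Paths added, removed or modified between two trees (dicts of path -> blob id)."""
    return {p for p in base.keys() | tree.keys() if base.get(p) != tree.get(p)}


def files_to_merge(base, ours, theirs):
    """Paths touched on both sides: the 'join' of the two tree diffs."""
    return changed_paths(base, ours) & changed_paths(base, theirs)


def pick_base(candidates, ours, theirs):
    """Choose the candidate with the fewest files to merge; the newer date breaks ties.

    candidates is a list of (commit_date, tree) pairs; the shape is an assumption
    made for this example only.
    """
    return min(
        candidates,
        key=lambda c: (len(files_to_merge(c[1], ours, theirs)), -c[0]),
    )


older = (100, {"a": "1", "b": "1"})
newer = (200, {"a": "2", "b": "1"})
ours = {"a": "3", "b": "1"}
theirs = {"a": "2", "b": "2"}
print(pick_base([older, newer], ours, theirs)[0])
```

Against the older candidate, file "a" changed on both sides; against the newer one, no file did, so the newer candidate is chosen.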
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Linus Torvalds wrote: > On Fri, 15 Apr 2005, Daniel Barkalow wrote: > > > > Is there some reason you don't commit before merging? All of the current > > merge theory seems to want to merge two commits, using the information git > > keeps about them. > > Note that the 3-way merge would _only_ merge the committed state. The > thing is, 99% of all merges end up touching files that I never touch > myself (ie other architectures), so me being able to merge them even when > _I_ am in the middle of something is a good thing. > > So even when I have dirty state, the "merge" would only merge the clean > state. And then before the merge information is put back into my working > directory, I'd do a "check-files" on the result, making sure that nothing > that got changed by the merge isn't up-to-date. So you want to merge someone else's tree into your committed state, and then merge the result with your working directory to get the working directory you continue with, provided that the second merge is trivial? > > How much do you care about the situation where there is no best common > > ancestor > > I care. Even if the best common parent is 3 months ago, I care. I'd much > rather get a big explicit conflict than a "clean merge" that ends up being > debatable because people played games with per-file merging or something > questionable like that. Are you thinking that the best common ancestor is the one that ties up absolutely all of the chains of commits, or the closest one that the sides have in common? I have the feeling that the former isn't going to be useful, because there will be lines you're considering merging which go back to ancient kernels, where they keep merging in your changes, but they still have a lineage back to 2.6.0 or something. For the latter, there are sometimes multiple ancestors which fit this criterion, and different ones of them are most helpful for different portions of the merge. 
I think this primarily happens when a branch you want to merge has accepted multiple patches that you've also accepted (and the history identifies this fact); this may or may not be a situation you want to allow on a regular basis.

-Daniel
*This .sig left intentionally blank*
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Daniel Barkalow wrote: > > Is there some reason you don't commit before merging? All of the current > merge theory seems to want to merge two commits, using the information git > keeps about them. Note that the 3-way merge would _only_ merge the committed state. The thing is, 99% of all merges end up touching files that I never touch myself (ie other architectures), so me being able to merge them even when _I_ am in the middle of something is a good thing. So even when I have dirty state, the "merge" would only merge the clean state. And then before the merge information is put back into my working directory, I'd do a "check-files" on the result, making sure that nothing that got changed by the merge isn't up-to-date. > How much do you care about the situation where there is no best common > ancestor I care. Even if the best common parent is 3 months ago, I care. I'd much rather get a big explicit conflict than a "clean merge" that ends up being debatable because people played games with per-file merging or something questionable like that. > I think that the time spent on I/O will be overwhelmed by the time spent > issuing the command at that rate. There is no time at all spent on IO. All my email is local, and if this all ends up working out well, I can track the other peoples object trees in local subdirectories with some daily rsyncs. And I have enough memory in my machines that there is basically no disk IO - the only tree I normally touch is the kernel trees, they all stay in cache. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
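The check described above (merge only the committed state, then refuse to update the working directory if a file the merge changed is dirty there) could be sketched as below. This is an assumption-laden illustration, not the behaviour of the real "check-files" tool: all three states are modelled as dicts from path to content hash, and the function name is invented for the example.

```python
def blocked_paths(head, merged, working):
    """Return the paths that the merge wants to change but that are dirty locally.

    head:    tree of the last commit
    merged:  tree produced by the merge
    working: current working-directory contents
    """
    changed_by_merge = {
        p for p in head.keys() | merged.keys() if head.get(p) != merged.get(p)
    }
    dirty = {
        p for p in head.keys() | working.keys() if head.get(p) != working.get(p)
    }
    return sorted(changed_by_merge & dirty)


head = {"arch/ppc/irq.c": "1", "kernel/sched.c": "1"}
merged = {"arch/ppc/irq.c": "2", "kernel/sched.c": "1"}
working = {"arch/ppc/irq.c": "1", "kernel/sched.c": "9"}  # sched.c edited locally
print(blocked_paths(head, merged, working))
```

Here the merge touches only the (hypothetical) architecture file while the local edit is elsewhere, so nothing blocks the checkout; the list would be non-empty only if both touched the same path.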
Re: Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Linus Torvalds wrote: > I think I've explained my name tracking worries. When it comes to "how to > merge", there's three issues: > > - we do commonly have merge clashes where both trees have applied the >exact same patch. That should merge perfectly well using the 3-way >merge from a common parent that Junio has, but not your current "bring >patches forward" kind of strategy. I think 3-way merge is probably the best starting point, but I think that there might be value in being able to identify the commits of each side involved in a conflict. I think this would help with cases where both sides pick up an identical patch, and then each side makes a further change to a different part of the changed region (you find out that the other guy's change was supposed to follow the patch, and don't conflict with it). > - I _do_ actually sometimes merge with dirty state in my working >directory, which is why I want the merge to take place in a separate >(and temporary) directory, which allows for a failed merge without >having any major cleanup. If the merge fails, it's not a big deal, and >I can just blow the merge directory away without losing the work I had >in my "real" working directory. Is there some reason you don't commit before merging? All of the current merge theory seems to want to merge two commits, using the information git keeps about them. It should be cheap to get a new clean working directory to merge in, too, particularly if we add a cache of hardlinkable expanded blobs. > - reliability. I care much less for "clever" than I care for "guaranteed >to never do the wrong thing". If I have to fix up some stuff by hand, >I'll happily do so. But if I can't trust the merge and have to _check_ >things by hand afterwards, that will make me leery of the merges, and >_that_ is bad. > > The third point is why I'm going to the ultra-conservative "three-way > merge from the common parent". 
> It's not fancy, but it's something I feel
> comfortable with as a merge strategy. For example, arch (and in particular
> darcs) seems to want to try to be "clever" about the merges, and I'd
> always live in fear.

How much do you care about the situation where there is no best common ancestor (which can happen if you're merging two main lines, each of which has merged with both of a pair of minor trees)?

I think that arch is even more conservative, in that it doesn't look for a common ancestor, and reports conflicts whenever changes overlap at all. Of course, reliability by virtue of never working without help is not a big win over living in fear; you always have to check over it, not because you're afraid, but because it needs you to.

> And, finally, there's obviously performance. I _think_ a normal merge with
> nary a conflict and just a few tens of files changed should be possible in
> a second. I realize that sounds crazy to some people, but I think it's
> entirely doable. Half of that is writing the new tree out (that is a
> relatively costly op due to the compression). The other half is the "work".

I think that the time spent on I/O will be overwhelmed by the time spent issuing the command at that rate. It might matter if you start getting into merging lots of things at once, but that's more like a minute for a merge group with 600 changes rather than a second per merge; we could potentially save a lot of time based on having a bunch of information left over from the previous merge when starting merge number 2. So 15 seconds plus half a second per merge might be better than a second per merge in the case that matters.

-Daniel
*This .sig left intentionally blank*
Re: Re: Re: write-tree is pasky-0.4
On Sat, 16 Apr 2005, Petr Baudis wrote: > > But otherwise it is great news to me. Actually, in that case, is it > worth renaming it to Cogito and using cg to invoke it? Wouldn't be that > actually more confusing after it gets merged? IOW, should I stick to > "git" or feel free to rename it to "cg"? I'm perfectly happy for it to stay as "git", and in general I don't have any huge preferences either way. You guys can discuss names as much as you like, it's the "tracking renames" and "how to merge" things that worry me. I think I've explained my name tracking worries. When it comes to "how to merge", there's three issues: - we do commonly have merge clashes where both trees have applied the exact same patch. That should merge perfectly well using the 3-way merge from a common parent that Junio has, but not your current "bring patches forward" kind of strategy. - I _do_ actually sometimes merge with dirty state in my working directory, which is why I want the merge to take place in a separate (and temporary) directory, which allows for a failed merge without having any major cleanup. If the merge fails, it's not a big deal, and I can just blow the merge directory away without losing the work I had in my "real" working directory. - reliability. I care much less for "clever" than I care for "guaranteed to never do the wrong thing". If I have to fix up some stuff by hand, I'll happily do so. But if I can't trust the merge and have to _check_ things by hand afterwards, that will make me leery of the merges, and _that_ is bad. The third point is why I'm going to the ultra-conservative "three-way merge from the common parent". It's not fancy, but it's something I feel comfortable with as a merge strategy. For example, arch (and in particular darcs) seems to want to try to be "clever" about the merges, and I'd always live in fear. And, finally, there's obviously performance. 
I _think_ a normal merge with nary a conflict and just a few tens of files changed should be possible in a second. I realize that sounds crazy to some people, but I think it's entirely doable. Half of that is writing the new tree out (that is a relatively costly op due to the compression). The other half is the "work".

Linus
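The conservative "three-way merge from the common parent" can be illustrated at the tree level roughly as follows. This is a sketch, not the merger mentioned in the thread: trees are assumed to be dicts from path to blob id, a missing path means the file does not exist on that side, and anything changed differently on both sides is simply reported as a conflict rather than resolved.

```python
def merge_trees(base, ours, theirs):
    """Tree-level three-way merge; returns (merged_tree, conflicting_paths)."""
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # identical on both sides, e.g. the same patch applied twice
        elif o == b:
            result = t      # only their side changed the file
        elif t == b:
            result = o      # only our side changed the file
        else:
            conflicts.append(path)  # both changed it differently: leave for a human
            continue
        if result is not None:      # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "3"}
theirs = {"a": "2", "b": "4", "c": "5"}
print(merge_trees(base, ours, theirs))
```

In the example, "a" received the same change on both sides and merges cleanly, "b" takes their change, and "c" is reported as a conflict, which matches the preference stated above for an explicit conflict over a clever guess.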
Re: Re: Re: write-tree is pasky-0.4
Dear diary, on Fri, Apr 15, 2005 at 10:13:21PM CEST, I got a letter where Linus Torvalds <[EMAIL PROTECTED]> told me that... > > > On Fri, 15 Apr 2005, Petr Baudis wrote: > > > > So, I assume that you don't want to merge my "SCM layer" (which is > > perfectly fine by me). However, I also apply plenty of patches > > concerning the "core git" - be it portability, leak fixes, argument > > parsing fixes and so on. > > I'm actually perfectly happy to merge your SCM layer too eventually, but > I'm nervous at this point. Especially while people are discussing some > SCM options that I'm personally very leery of, and think that may make > sense for others, but that I personally distrust. You mean the renames tracking and similar yet mostly theoretical discussions? Or do you dislike something already implemented? I'd be happy to hear about it in that case. (To argue about it and likely get persuaded... ;-) But otherwise it is great news to me. Actually, in that case, is it worth renaming it to Cogito and using cg to invoke it? Wouldn't be that actually more confusing after it gets merged? IOW, should I stick to "git" or feel free to rename it to "cg"? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: write-tree is pasky-0.4
Dear diary, on Fri, Apr 15, 2005 at 10:58:10PM CEST, I got a letter where "C. Scott Ananian" <[EMAIL PROTECTED]> told me that... > On Fri, 15 Apr 2005, Junio C Hamano wrote: > > >to yours is no problem for me. Currently I see your HEAD is at > >461aef08823a18a6c69d472499ef5257f8c7f6c8, so I will generate a > >set of patches against it. > > Have you considered using an s/key-like system to make these hashes more > human-readable? Using the S/Key translation (11-bit chunks map to a 1-4 > letter word), Linus' HEAD is at: > WOW-SCAN-NAVE-AUK-JILL-BASH-HI-LACE-LID-RIDE-RUSE-LINE-GLEE-WICK-A > ...which is a little longer, but speaking of branch "wow-scan" (which > gives 22 bits of disambiguation) is probably less error-prone than > discussing branch '461...' (only 12 bits). > > You could supercharge this algorithm by using (say) > /usr/dict/american-english-large (>2^17 words; 160 bits of hash = 10 > dictionary words), or mixing upper and lower case (likely to reduce the 15 > word s/key phrase to ~11 words) to give something like >RiDe-Rift-rIMe-rOSy-ScaR-sCat-ShiN-sIde-Sine-seeK-TIEd-TINT > My personal feeling is that case is likely to be dropped in casual > conversation, so speaking of branch 'wow', 'wow-scan', or 'wow-scan-nave' > is likely to be significantly more useful than trying to pronounce > mixed-cased versions of these. > > This is obviously a cogito issue, rather than a git-fs thing. I kind of like it, the only thing I fear is possible conflict with branch names; it is not very likely though, I think. I believe (at least) the first three words should be used if possible. I'm not sure in what cases do you think we should use those "verbal" names, though. Of course we should accept them as IDs, but I don't think we should ever show them automatically. Probably provide a trivial to use tool to convert to them, and parameters for *-id tools to show them. I assume we would have a custom tool for the translation? 
-- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
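The word-based naming idea discussed above (split the 160-bit hash into 11-bit chunks and map each chunk to a short word) can be sketched like this. The chunking follows the description in the mail; everything else is an assumption for illustration: the function names are invented, the hash is padded with zero bits on the right to fill the last chunk, and the word list is a generated placeholder rather than the real S/Key dictionary, so the output will not reproduce the words quoted in the mail.

```python
def hash_to_indices(hex_hash, bits=11):
    """Split a hex hash into fixed-size bit chunks, most significant chunk first."""
    nbits = len(hex_hash) * 4
    pad = (-nbits) % bits                 # zero bits appended to fill the last chunk
    value = int(hex_hash, 16) << pad
    nchunks = (nbits + pad) // bits
    mask = (1 << bits) - 1
    return [(value >> (bits * (nchunks - 1 - i))) & mask for i in range(nchunks)]


def indices_to_words(indices, wordlist):
    """Join the words selected by each chunk value with dashes."""
    return "-".join(wordlist[i] for i in indices)


# Placeholder dictionary of 2**11 entries; a real tool would ship a proper word list.
placeholder_words = [f"w{i:04d}" for i in range(2048)]

head = "461aef08823a18a6c69d472499ef5257f8c7f6c8"
indices = hash_to_indices(head)
print(len(indices), indices_to_words(indices[:3], placeholder_words))
```

A 160-bit hash needs 15 such chunks, which agrees with the 15-word phrase in the mail, and using only the first two or three words gives the 22 or 33 bits of disambiguation being discussed.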
Re: Re: write-tree is pasky-0.4
On Fri, 15 Apr 2005, Petr Baudis wrote: > > So, I assume that you don't want to merge my "SCM layer" (which is > perfectly fine by me). However, I also apply plenty of patches > concerning the "core git" - be it portability, leak fixes, argument > parsing fixes and so on. I'm actually perfectly happy to merge your SCM layer too eventually, but I'm nervous at this point. Especially while people are discussing some SCM options that I'm personally very leery of, and think that may make sense for others, but that I personally distrust. > BTW, just out of interest, are you personally planning to use Cogito for > your kernel and sparse (and possibly even git) work, or will you stay > with your lowlevel plumbing for that? I'm really really hoping I'd use cogito, and that it ends up being just one project. In particular, I'm hoping that in a few days, I'll have done enough plumbing that I don't even care any more, and then I'd not even maintain a tree of my own. I'm really not that much of an SCM guy. I detest pretty much all SCM's out there, and while it's been interesting to do 'git', I've done it because I was forced to, and because I really wanted to put _my_ needs and opinions first in an SCM, and see how that works. That's why I've been so adamant about having a "philosophy", because otherwise I'd probably just end up with yet another SCM that I'd despise. So for me, the "optimal" situation really ends up that you guys end up as the maintainers. I don't even _want_ to maintain it, although I'd be more than happy to be part of the engineering team. I just want to mark out the direction well enough and get it to a point where I can _use_ it, that I feel like I'm done. But before I can do that, I need to feel like I can live with the end result. The only missing part is merges, and I think you and Junio are getting pretty close (with Daniel's parent finder, Junio's merger etc). 
Linus
Re: Re: write-tree is pasky-0.4
Dear diary, on Fri, Apr 15, 2005 at 08:44:02PM CEST, I got a letter where Linus Torvalds <[EMAIL PROTECTED]> told me that... > And I merged your "Add -z option to show-files", but you had based your > other patches on Petr's tree which due to my other changes is not going to > merge totally cleanly with mine, so I'm wondering if you might want to try > to re-merge your mergepoint stuff against my current tree? That way I can > continue to maintain a set of "core files", and Pasky can maintain the > "usable interfaces" part.. Actually, I wanted to ask about this. :-) So, I assume that you don't want to merge my "SCM layer" (which is perfectly fine by me). However, I also apply plenty of patches concerning the "core git" - be it portability, leak fixes, argument parsing fixes and so on. Would it be of any benefit if I maintained two trees, one with just your core git but what I merge (I think I'd call this branch git-pb), and one with my git-pasky (to be renamed to Cogito) layer. I'd then put the "core git" changes to the git-pb branch and pull from it to the Cogito branch regularily, but it should be safe for you to pull from it too. In fact, in that case I might even end up entirely separating the Cogito tools from the core git and distributing them independently. BTW, just out of interest, are you personally planning to use Cogito for your kernel and sparse (and possibly even git) work, or will you stay with your lowlevel plumbing for that? Thanks, -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html