Re: Re: Merge with git-pasky II.
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Sun, 17 Apr 2005, Ingo Molnar wrote: > > > > in fact, this attack cannot even be proven to be malicious, purely via > > the email from Malice: it could be incredible bad luck that caused that > > good-looking patch to be mistakenly matching a dangerous object. > > I really hate theoretical discussions. i was only replying to your earlier point: > > > Almost all attacks on sha1 will depend on _replacing_ a file with > > > a bogus new one. So guys, instead of using sha256 or going > > > overboard, just make sure that when you synchronize, you NEVER > > > import a file you already have. which point i still believe is subtly wrong. You were suggesting to concentrate on file replacement to counter most of the practical attacks, while i pointed out an attack _using the same basic mechanism that your point above supposed_. [ if you can replace a file with a known hash, with a bogus new one, and you still have enough control over the contents of your bogus new file that it is 1) a valid file that builds 2) compromises the kernel, then you likely have the same amount of control my 'theoretical' attack requires. ] > And the thing is, _if_ somebody finds a way to make sha1 act as just a > complex parity bit, and comes up with generating a clashing object > that actually makes sense, then going to sha256 is likely pointless > too [...] yes, that's why i suggested to not actually trust the hash to be cryptographically secure, but to just assume it's a good generic hash we can design a DB around, and to turn -DCOLLISION_CHECK on and enforce consistency rules on boundaries. [ it's not bad to keep sha1 because even my suggested enhancement still leaves 'content-less trust-pointers to untrusted content via email' vectors open against attack (maintainer sends you an email that commit X in Malice's repository Y is fine to pull, and you pull it blindly, while the attacker has replaced his content with the compromised one meanwhile), but it at least validates the bulk traffic that goes into the DB: patches via emails and trusted repositories. ] so all i was suggesting was to extend your suggested 'overwrite collision check' to a stricter 'content we throw away and use the sha1 shortcut for needs to be checked against the in-DB content as well'. in other words, your suggested 'rename check' is checking for 'positive duplicate content', while my addition would also check for 'negative duplicate content' as well. but as usual, i could be wrong, so dont take this too serious :-) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Sun, 17 Apr 2005, Ingo Molnar wrote: > > in fact, this attack cannot even be proven to be malicious, purely via > the email from Malice: it could be incredible bad luck that caused that > good-looking patch to be mistakenly matching a dangerous object. I really hate theoretical discussions. The fact is, a lot of _crap_ engineering gets done because of the question "what if?". It results in over-engineering, often to the point where the end result is quite a lot measurably worse than the sane results. You are _literally_ arguing for the equivalent of "what if a meteorite hit my plane while it was in flight - maybe I should add three inches of high-tension armored steel around the plane, so that my passengers would be protected". That's not engineering. That's five-year-olds discussing building their imaginary forts ("I want gun-turrets and a mechanical horse one mile high, and my command center is 5 miles under-ground and totally encased in 5 meters of lead"). I absolutely _hate_ doing engineering on the principle of "this might be possible in theory", and I'm violently opposed to it. So far, I have not heard a single argument that I consider even _remotely_ likely. The thing is, even if you can force a hash collission by sending somebody a patch, it's really pretty much almost guaranteed that the patch is not just "a few strange characters", unless sha1 is really broken to the point where it's not cryptographically secure _at_all_. In other words, unless somebody finds a way to make sha1 appear as nothing more than a complicated set of parity bits, all brute-force "get the same sha1" is likely to be about generating a really strange blob based on the thing you want to replace - and by "really strange" I mean total binary crap. And likely _much_ bigger too. And by "much bigger" I mean "possibly gigabytes of data". And the thing is, _if_ somebody finds a way to make sha1 act as just a complex parity bit, and comes up with generating a clashing object that actually makes sense, then going to sha256 is likely pointless too - I think the algorithm is basically the same, just with more bits. If you've broken sha1 to the point where it's _that_ breakable, then you've likely broken sha256 too. Nobody has ever proven that you couldn't break sha256 with some really clever algorithm... So if you start playing "what if?" games, dammit, I can play mine. If we want to have any kind of confidence that the hash is reall yunbreakable, we should make it not just longer than 160 bits, we should make sure that it's two or more hashes, and that they are based on totally different principles. And we should all digitally sign every single object too, and we should use 4096-bit PGP keys and unguessable passphrases that are at least 20 words in length. And we should then build a bunker 5 miles underground, encased in lead, so that somebody cannot flip a few bits with a ray-gun, and make us believe that the sha1's match when they don't. Oh, and we need to all wear aluminum propeller beanies to make sure that they don't use that ray-gun to make us do the modification _outselves_. And the thing is, that's just crazy talk. The difference between a crazy person and an intelligent one is that the crazy one doesn't realize what makes sense in the world. The goal of good engineering is not to ask "what if?", but to ask "how do I make this work as well as possible". So please stop with the theoretical sha1 attacks. It is simply NOT TRUE that you can generate an object that looks halfway sane and still gets you the sha1 you want. Even the "breakage" doesn't actually do that. And if it ever _does_ become true, it will quite possibly be thanks to some technology that breaks other hashes too. So until proven otherwise, I worry about accidental hashes, and in 160 bits of good hashing, that just isn't an issue either. Anybody who compares a 128-bit md5-sum to a 160-bit sha1 doesn't understand the math. It didn't get "slightly less likely" to happen. It got so _unbelievably_ less likely to happen that it's not even funny. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > The compromise relies on you having reviewed something harmless, while > in reality what happened within the DB was far less harmless. And the > DB remains self-consistent: neither fsck, nor others importing your > tree will be able to detect the compromise. This attack can only be > detected when you apply the patch, after that point all the > information (except Malice's message in your inbox) is gone. in fact, this attack cannot even be proven to be malicious, purely via the email from Malice: it could be incredible bad luck that caused that good-looking patch to be mistakenly matching a dangerous object. In fact this could happen even today, _accidentally_. (but i'm willing to bet that hell will be freezing over first, and i'll have some really good odds ;) There's probably a much higher likelyhood of Linus' tree getting corrupted in some old fashioned way and introducing a security hole by accident) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Brad Roberts <[EMAIL PROTECTED]> wrote: > While I agree that a hash collision is bad and certainly worth > preventing during new object creation, for it to actually implant a > trojan in a build successfully it'd have to meet even more criteria > than you've layed out. It'd have to... > - provide all the public symbols the shadowed object provided so that it > would still build and link successfully that's not a problem. Most modules dont provide public symbols. Especially not drivers. Generally it's the modules that _dont_ have any global impact that get reviewed less stringently - an attacker would thus choose them for psychological reasons anyway. > - be shadowing an object that's part of an active tree > > Shadowing an object that's not part of the working tree means > something on another branch or obsoleted some time in the past is > still db corruption, but not nearly as big an issue from a trojan > standpoint. it's not DB corruption, it's a feature of GIT: it's a content _cache_, new and old alike. Nothing in GIT says that old objects in the repository (which are still very much part of history) cannot be revived in newer trees. (in fact it regularly happens - e.g. if a fix is undone manually.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Sun, 17 Apr 2005, Ingo Molnar wrote: > Date: Sun, 17 Apr 2005 16:52:32 +0200 > From: Ingo Molnar <[EMAIL PROTECTED]> > To: Linus Torvalds <[EMAIL PROTECTED]> > Cc: Petr Baudis <[EMAIL PROTECTED]>, Simon Fowler <[EMAIL PROTECTED]>, > David Lang <[EMAIL PROTECTED]>, git@vger.kernel.org > Subject: Re: Re: Merge with git-pasky II. > > > * Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > Almost all attacks on sha1 will depend on _replacing_ a file with a > > bogus new one. So guys, instead of using sha256 or going overboard, > > just make sure that when you synchronize, you NEVER import a file you > > already have. > > With tens of thousands (or hundreds of thousands) of objects expected in > the repository sooner or later, there's quite a selection to pick from. > Once you apply the patch, instead of the expected new file that you > reviewed and found safe, the attacker has the other object included in > the official kernel. > > A dangerous object can be anything: e.g. a debugging hack that allows > arbitrary kernel-space writes. Or a known-insecure module (which since > then got fixed, but the buggy code still exists in the DB). The module > is in a single file and is self-installing (e.g. it has __init code to > register itself as some driver.) While I agree that a hash collision is bad and certainly worth preventing during new object creation, for it to actually implant a trojan in a build successfully it'd have to meet even more criteria than you've layed out. It'd have to... - be shadowing an object that's part of an active tree - provide all the public symbols the shadowed object provided so that it would still build and link successfully Shadowing an object that's not part of the working tree means something on another branch or obsoleted some time in the past is still db corruption, but not nearly as big an issue from a trojan standpoint. Later, Brad - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > Almost all attacks on sha1 will depend on _replacing_ a file with a > bogus new one. So guys, instead of using sha256 or going overboard, > just make sure that when you synchronize, you NEVER import a file you > already have. here is a bit complex, but still practical attack that doesnt rely on replacement and which can only be detected if we check the sha1 uniqueness assumptions. If you can generate a duplicate sha1 key for an arbitrary 'target' file, and Malice sends you a GIT-generated patch that introduces a new file (which doesnt exist in the current tree) which you review (in the email) and which looks safe to apply & harmless. Maybe the patch has a bit weird formatting and some weird comments (which in reality Malice used to generate the proper sha1 key) but otherwise the patch is for some seldom used arcane driver that no-one used for quite some time and no-one really cares about, so you are happy to apply the patch. The compromise occurs when you apply the patch: the seemingly harmless patch has an sha1 key that Malice manufacured to match that of an already existing, 'dangerous' object in your database. With tens of thousands (or hundreds of thousands) of objects expected in the repository sooner or later, there's quite a selection to pick from. Once you apply the patch, instead of the expected new file that you reviewed and found safe, the attacker has the other object included in the official kernel. A dangerous object can be anything: e.g. a debugging hack that allows arbitrary kernel-space writes. Or a known-insecure module (which since then got fixed, but the buggy code still exists in the DB). The module is in a single file and is self-installing (e.g. it has __init code to register itself as some driver.) Malice might even previously plant a dangerous object as some 'firmware module' in another arcane driver, which doesnt get compiled by default, but still shows up in the DB. Or Malice might plant a dangerous object via an innocent-looking documentation file. (which contains some sample code and is called sample.txt) this type of 'false sharing attack' can only be prevented if an object is only 'shared' with another object if it has been memcmp-ed with the object in the repository. I.e. if we trust the sharing decision! Once the attack has occured it cannot be detected automatically: only people will notice it. (why did that weird unrelated module show up in that old driver?) The compromise relies on you having reviewed something harmless, while in reality what happened within the DB was far less harmless. And the DB remains self-consistent: neither fsck, nor others importing your tree will be able to detect the compromise. This attack can only be detected when you apply the patch, after that point all the information (except Malice's message in your inbox) is gone. so unless we actively check for collisions, once an sha1 key can be generated at will on near-arbitrary input, it's not a secure system anymore. We might be lucky and safe, but we wont be secure. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Sat, 16 Apr 2005, Linus Torvalds wrote: Almost all attacks on sha1 will depend on _replacing_ a file with a bogus new one. So guys, instead of using sha256 or going overboard, just make sure that when you synchronize, you NEVER import a file you already have. It's really that simple. Add "--ignore-existing" to your rsync scripts, and you're pretty much done. That guarantees that a new evil blob by the next mad scientist out to take over the world will never touch your repository, and if we make this part of the _standard_ scripts, then dammit, security is in good _practices_ rather than just relying blindly on the hash being secure. In other words, I think we could have used md5's as the hash, if we just make sure we have good practices. And it wouldn't have been "insecure". The fact is, you don't merge with people you don't trust. If you don't trust them, they have a much easier time corrupting your repository by just creating bugs in the code and checking that thing in. Who cares about hash collisions, when you can generate a kernel root vulnerability by just adding a single line of code and use the _correct_ hash for it. So the sha1 hash does not replace _trust_. That comes from something else altogether. What I am bringing up is not intended to be a trust thing, but instead a safety thing, accidents, not evil intent. makeing the rsync scripts --ignore-existing will avoid corrupting local data when pulling remotely, but it won't solve the problem of running into a collision locally (and won't do much to help you figure out what's wrong when you run into a remote collision) David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Sat, Apr 16, 2005 at 06:03:33PM +0200, Petr Baudis wrote: > Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter > where Simon Fowler <[EMAIL PROTECTED]> told me that... > > On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote: > > > Simon > > > > > > given that you have multiple machines creating files, how do you deal > > > with > > > the idea of the same 'unique id' being assigned to different files by > > > different machines? > > > > > The id is a sha1 hash of the current time and the full path of the > > file being added - the chances of that being replicated without > > malicious intent is extremely small. There are other things that > > could be used, like the hostname, username of the person running the > > program, etc, but I don't really see them being necessary. > > Why not just use UUID? > Hey, everything else in git seems to use sha1, so I just copied Linus' sha1 code ;-) All I wanted was something that had a good chance of being unique across any potential set of distributed repositories, to avoid the chance of accidental clashes. A sha1 hash of something that's not likely to be replicated is a simple way to do that. Simon -- PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc (crappy) Homepage: http://himi.org doe #237 (see http://www.lemuria.org/DeCSS) My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ signature.asc Description: Digital signature
Re: Re: Merge with git-pasky II.
On Sat, 16 Apr 2005, Petr Baudis wrote: > Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter > where Simon Fowler <[EMAIL PROTECTED]> told me that... > > > The id is a sha1 hash of the current time and the full path of the > > file being added - the chances of that being replicated without > > malicious intent is extremely small. There are other things that > > could be used, like the hostname, username of the person running the > > program, etc, but I don't really see them being necessary. > > Why not just use UUID? Note that using anything that isn't data-related totally destroys the whole point of the object database. Remember: any time we don't uniquely generate the same name for the same object, we'll waste disk-space. So adding in user/machine/uuid's to the thing is always a mistake. The whole thing depends on the hash being as close to 1:1 with the contents as humanly possible. There's also the issue of size. Yes, I could have chosen sha256 instead of sha1. But the keys would be almost twice as big, which in turn means that the "tree" objects would be bigger, and that the "index" file would be bigger. Is that a huge problem? No. We can certainly move to it if sha1 ever shows itself to be weak. But I really think we are much better off just re-generating the whole tree and history at that point, rather than try to predict the future. The fact is, with current knowledge, sha1 _is_ safe for what git uses it for, for the forseeable future. And we have a migration strategy if I'm wrong. Don't worry about it. Almost all attacks on sha1 will depend on _replacing_ a file with a bogus new one. So guys, instead of using sha256 or going overboard, just make sure that when you synchronize, you NEVER import a file you already have. It's really that simple. Add "--ignore-existing" to your rsync scripts, and you're pretty much done. That guarantees that a new evil blob by the next mad scientist out to take over the world will never touch your repository, and if we make this part of the _standard_ scripts, then dammit, security is in good _practices_ rather than just relying blindly on the hash being secure. In other words, I think we could have used md5's as the hash, if we just make sure we have good practices. And it wouldn't have been "insecure". The fact is, you don't merge with people you don't trust. If you don't trust them, they have a much easier time corrupting your repository by just creating bugs in the code and checking that thing in. Who cares about hash collisions, when you can generate a kernel root vulnerability by just adding a single line of code and use the _correct_ hash for it. So the sha1 hash does not replace _trust_. That comes from something else altogether. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter where Simon Fowler <[EMAIL PROTECTED]> told me that... > On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote: > > Simon > > > > given that you have multiple machines creating files, how do you deal with > > the idea of the same 'unique id' being assigned to different files by > > different machines? > > > The id is a sha1 hash of the current time and the full path of the > file being added - the chances of that being replicated without > malicious intent is extremely small. There are other things that > could be used, like the hostname, username of the person running the > program, etc, but I don't really see them being necessary. Why not just use UUID? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Dear diary, on Fri, Apr 15, 2005 at 12:22:26PM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > After I re-read [*R1*], in which Linus talks about dircache, > especially this section: > > - The "current directory cache" describes some baseline. In particular, >note the "some" part. It's not tied to any special baseline, and you >can change your baseline any way you please. > >So it does NOT have to track any particular state in either the object >database _or_ in your actual current working tree. In fact, all real >interactions with "git" are really about updating this staging area one >way or the other: you might check out the state from it into your >working area (partially or fully), you can push your working area into >the staging area (again, partially or fully). > >And if you want to, you can write the thing that the staging area >represents as a "tree" into the object database, or you can merge a >tree from the object database into the staging area. > >In other words: the staging area aka "current directory cache" is >really how all interaction takes place. The object database never >interacts directly with your working directory contents. ALL >interactions go through the current directory cache. > > I started to have more doubts on the approach of *not* > performing the merge in the dircache I set up specifically for > merging, which is the direction in which you are pushing if I > understand you correctly. Maybe I completely misunderstand what > you want. This message is long but I need a clear understanding > of what is expected to be useful to you, so please bear with me. > PB> merge-tree.pl -b $base $(tree-id) $merged | parse-your-output > > Please help me understand this example you have given earlier. > Here is my understanding of your assumption when the above > pipeline takes place. Correct me if I am mistaken. > > * The user is in a working directory $W. It is controlled by >git-tools and there are $W/.git/. directory and $W/.git/index >dircache. > > * The dircache $W/.git/index started its life as a read-tree >from some commit. The git-tools is keeping track of which >commit it is somewhere, presumably in $W/.git/ directory. >Let's call it $C (commit). > > ? Question. Is the $(tree-id) in your example the same as $C >above? Yes. Actually $(tree-id) returns ID of the tree object, not the commit object; but that doesn't matter here, probably - let's ignore that distinction for simplicity. > * The user have run [*1*] (see Footnote below) checkout-cache >on $W/.git/index some time in the past and $W is full of >working files. Some of them may or may not have modified. >There may be some additions or deletions. So the contents of >the working directory may not match the tree associated with >$C. > > * The user may or may not have run [*1*] update-cache in $W. >The contents of the dircache $W/.git/index may not match the >tree associated with $C. > > ? Question. Are you forbidding the user to run update-cache by >hand, and keeping track of the changes yourself, to be >applied all at once at "git commit" time, thereby >guaranteeing the $W/.git/index to match the tree associated >with $C all times? From the description of The "GIT toolkit" >section in README, it is not clear to me which part of his >repository an end user is not supposed to muck with himself. Ideally, he shouldn't be using *any* of the low-level plumbing by now. The only exception is update-cache --refresh, which he can do at will (I'm yet thinking what to do with it :-). The git-tools always assume that index basically contains the state as of the last commit. (Actually the only time when this matters *now* might be git diff - the user would get confused from the results.) > * Now the user has some changes in his working directory and >notices upstream or a side branch has notable changes >desireble to be picked up. So he runs some git-tools command >to cause the above quoted pipeline to run. > > ? Question. Does $merged in your example mean such an upstream >or side branch? Is $base in your example the common ancestor >between $C and $merged? Correct. *HOWEVER* what is not correct is that git-tools would let you merge in your working directory while you have local changes there. In the past, the merge would happen in your working tree, but git-tools wouldn't let you go for it unless your working tree has no local changes. It would complain loudly and refuse to, since it's *not* what you want to do and it was most likely a mistake. Currently, git merge just creates a ,,merge/ subdirectory sharing the object database with your working tree, but with an independent checkout of it; it will do the merge there, and when you commit it there, it will update your working tree with the merged chan
Re: Re: Merge with git-pasky II.
Dear diary, on Fri, Apr 15, 2005 at 02:58:25AM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > > "PB" == Petr Baudis <[EMAIL PROTECTED]> writes: > >> I think the above would result in what SCM person would call > >> "merge upstream/sidestream changes into my working directory". > > PB> And that's exactly what I'm doing now with git merge. ;-) In fact, > PB> ideally the whole change in my scripts when your script is finished > PB> would be replacing > > PB> checkout-cache `diff-tree` # symbolic > PB> git diff $base $merged | git apply > > PB> with > > PB> merge-tree.pl -b $base $(tree-id) $merged | parse-your-output > > In the above I presume by $merged you mean the tree ID (or > commit ID) the user's working directory is based upon? Well, > merge-trees (Linus has a single directory merge-tree already) > looks at tree IDs (or commit IDs); it would never involve > working files in random state that is not recorded as part of a > tree (committed or not). Given that constraints I am not sure > how well that would pan out. I have to think about this a bit. No, $(tree-id) is the "destination branhc", what the user directory is based upon; $merged is the branch you are merging now, relative to $base. When I throw away the useless "-b" argument, in practice it would look like merge-trees abcd 1234 5678 for doing /-- 1234 -+- abcd < / \-- 5678 (not that the order of 1234 and 5678 would actually really matter) I fear I don't understand the rest of your paragraph. :-( > I do like, however, the idea of separating the step of doing any > checkout/merge etc. and actually doing them. So the command set > of parse-your-output needs to be defined. Based on what I have > done so far, it would consist of the following: > > - Result is this object $SHA1 with mode $mode at $path (takes >one of the trees); you can do update-cache --cacheinfo (if >you want to muck with dircache) or cat-file blob (if you want >to get the file) or both. > > - Result is to delete $path. > > - Result is a merge between object $SHA1-1 and $SHA1-2 with >mode $mode-1 or $mode-2 at $path. > > Would this be a good enough command set? What about the conflicts? Like one tree deleting, other tree modifying? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
BTW, I am not competing with Junio script. If that is the way we all agree on. It is should be very easy for Junio to fix his perl script. right? Chris On Thu, Apr 14, 2005 at 04:37:17PM -0400, Christopher Li wrote: > Is that some thing you want to see? Maybe clean up the error printing. > > > Chris > > --- /dev/null 2003-01-30 05:24:37.0 -0500 > +++ merge.py 2005-04-14 16:34:39.0 -0400 > @@ -0,0 +1,76 @@ > +#!/usr/bin/env python > + > +import re > +import sys > +import os > +from pprint import pprint > + > +def get_tree(commit): > +data = os.popen("cat-file commit %s"%commit).read() > +return re.findall(r"(?m)^tree (\w+)", data)[0] > + > +PREFIX = 0 > +PATH = -1 > +SHA = -2 > +ORIGSHA = -3 > + > +def get_difftree(old, new): > +lines = os.popen("diff-tree %s %s"%(old, new)).read().split("\x00") > +patterns = (r"(\*)(\d+)->(\d+)\s(\w+)\s(\w+)->(\w+)\s(.*)", > + r"([+-])(\d+)\s(\w+)\s(\w+)\s(.*)") > +res = {} > +for l in lines: > + if not l: continue > + for p in patterns: > + m = re.findall(p, l) > + if m: > + m = m[0] > + res[m[-1]] = m > + break > + else: > + raise "difftree: unknow line", l > +return res > + > +def analyze(diff1, diff2): > +diff1only = [ diff1[k] for k in diff1 if k not in diff2 ] > +diff2only = [ diff2[k] for k in diff2 if k not in diff1 ] > +both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ] > + > +action(diff1only) > +action(diff2only) > +action_two(both) > + > +def action(diffs): > +for act in diffs: > + if act[PREFIX] == "*": > + print "modify", act[PATH], act[SHA] > + elif act[PREFIX] == '-': > + print "remove", act[PATH], act[SHA] > + elif act[PREFIX] == '+': > + print "add", act[PATH], act[SHA] > + else: > + raise "unknow action" > + > +def action_two(diffs): > +for act1, act2 in diffs: > + if len(act1) == len(act2): # same kind type > + if act1[PREFIX] == act2[PREFIX]: > + if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': > + return action(act1) > + if act1[PREFIX]=='*': > + print "do_merge", act1[PATH], act1[ORIGSHA], act1[SHA], > act2[SHA] > + return > + print "unable to handle", act[PATH] > + print "one side wants", act1[PREFIX] > + print "the other side wants", act2[PREFIX] > + > + > +args = sys.argv[1:] > +if len(args)!=3: > +print "Usage merge.py " > +trees = map(get_tree, args) > +print "checkout-tree", trees[0] > +diff1 = get_difftree(trees[0], trees[1]) > +diff2 = get_difftree(trees[0], trees[2]) > +analyze(diff1, diff2) > + > - > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Is that some thing you want to see? Maybe clean up the error printing. Chris --- /dev/null 2003-01-30 05:24:37.0 -0500 +++ merge.py2005-04-14 16:34:39.0 -0400 @@ -0,0 +1,76 @@ +#!/usr/bin/env python + +import re +import sys +import os +from pprint import pprint + +def get_tree(commit): +data = os.popen("cat-file commit %s"%commit).read() +return re.findall(r"(?m)^tree (\w+)", data)[0] + +PREFIX = 0 +PATH = -1 +SHA = -2 +ORIGSHA = -3 + +def get_difftree(old, new): +lines = os.popen("diff-tree %s %s"%(old, new)).read().split("\x00") +patterns = (r"(\*)(\d+)->(\d+)\s(\w+)\s(\w+)->(\w+)\s(.*)", + r"([+-])(\d+)\s(\w+)\s(\w+)\s(.*)") +res = {} +for l in lines: + if not l: continue + for p in patterns: + m = re.findall(p, l) + if m: + m = m[0] + res[m[-1]] = m + break + else: + raise "difftree: unknow line", l +return res + +def analyze(diff1, diff2): +diff1only = [ diff1[k] for k in diff1 if k not in diff2 ] +diff2only = [ diff2[k] for k in diff2 if k not in diff1 ] +both = [ (diff1[k],diff2[k]) for k in diff2 if k in diff1 ] + +action(diff1only) +action(diff2only) +action_two(both) + +def action(diffs): +for act in diffs: + if act[PREFIX] == "*": + print "modify", act[PATH], act[SHA] + elif act[PREFIX] == '-': + print "remove", act[PATH], act[SHA] + elif act[PREFIX] == '+': + print "add", act[PATH], act[SHA] + else: + raise "unknow action" + +def action_two(diffs): +for act1, act2 in diffs: + if len(act1) == len(act2): # same kind type + if act1[PREFIX] == act2[PREFIX]: + if act1[SHA] == act2[SHA] or act1[PREFIX] == '-': + return action(act1) + if act1[PREFIX]=='*': + print "do_merge", act1[PATH], act1[ORIGSHA], act1[SHA], act2[SHA] + return + print "unable to handle", act[PATH] + print "one side wants", act1[PREFIX] + print "the other side wants", act2[PREFIX] + + +args = sys.argv[1:] +if len(args)!=3: +print "Usage merge.py " +trees = map(get_tree, args) +print "checkout-tree", trees[0] +diff1 = get_difftree(trees[0], trees[1]) +diff2 = get_difftree(trees[0], trees[2]) +analyze(diff1, diff2) + - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Fri, Apr 15, 2005 at 01:31:59AM +0200, Petr Baudis wrote: > > I am just trying to follow my understanding of what Linus > > wanted. One of the guiding principle is to do as much things as > > in dircache without ever checking things out or touching working > > files unnecessarily. > > I'm just arguing that instead of directly touching the directory cache, > you should just list what would you do there - and you already do this, That is exactly what I suggest in the previous email. And my python script does exactly that ;-) Chris - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Dear diary, on Fri, Apr 15, 2005 at 01:12:34AM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > > "PB" == Petr Baudis <[EMAIL PROTECTED]> writes: > > PB> What I would like your script to do is therefore just do the > PB> merge in a given already prepared (including built index) > PB> directory, with a passed base. The base should be determined > PB> by a separate tool (I already saw some patches); most future > PB> "science" will probably go to a clever selection of this > PB> base, anyway. > > I think you are contradicting yourself for saying the above > after agreeing with me that the script should just work on trees > not commits. My understanding is that the tools is just to > merge two related trees relative to another ancestor tree, > nothing more. Especially, it should not care what is in the > working directory---that is SCM person's business. Yes. Isn't this exactly what I'm saying? I'm arguing for doing less in my paragraph, you are arguing for doing less in your paragraph, and we even seem to agree on the direction in which we should do less. > I am just trying to follow my understanding of what Linus > wanted. One of the guiding principle is to do as much things as > in dircache without ever checking things out or touching working > files unnecessarily. I'm just arguing that instead of directly touching the directory cache, you should just list what would you do there - and you already do this, I think. So I'd be happy with a switch which would just do that and not touch the directory cache. I'll parse your output and do the right thing for me. > PB> This will give the tool maximal flexibility. > > I suspect it would force me to have a working directory > populated with files, just to do a merge. Why would that be so? > PB> I'm all for an -o, and I don't mind ,, - I just don't want it uselessly > PB> long. I hope "git~merge~$$" was a joke... :-) > > Which part do you object to? PID part? or tilde? Would > git~merge do, perhaps? It probably would not matter to you > because as an SCM you would always give an explicit --output > parameter to the script anyway. Yes. I'll just override it with ,,merge, I think. So, do whatever you want. ;-)) > PB> By the way, what about indentation with tabs? If you have a > PB> strong opinion about this, I don't insist - but if you > PB> really don't mind/care either way, it'd be great to use tabs > PB> as in the rest of the git code. > > I do not have a strong opinion, but it is more trouble for me > only because I am lazy and am used to the indentation my Emacs > gives me. I write code other than git, so changing Perl-mode > indentation setting globally for all .pl files is not an option > for me. I'll see what I can do when I have time. Doesn't Emacs have something equivalent to ./.vimrc? I've also seen those funny -*- strings. Well, if it would mean a lot of trouble for you, just forget about it. > PB> Is there a fundamental reason why the directory cache > PB> contains the ancestor instead of the destination branch? > > Because you are thinking as an SCM person where there are > distinction between tree-A and tree-B, two heads being merged. > There is no "destination branch" nor "source branch" in what I > am doing. It is a merge of two equals derived from the same > ancestor. That's a valid point of view too. Actually, when you would have a mode in which you would not write to the directory cache, do you need to read from it? You could do just direct cat-files like for the other trees, and it would be even faster. Then, you could do without a directory cache altogether in this mode. > PB> And this is another thing I dislike a lot. I'd like merge-tree.pl to > PB> leave my directory cache alone, thank you very much. You know, I see > PB> what goes to the directory cache as actually part of the policy part. > > Remember I am not touching *your* dircache. It is a dircache in > the temporary merge area, specifically set up to help you review > the merge. Yes, but I want to have a control over its dircache too. :-) That is because I want the user to be able to use the regular git commands like "git diff" there. > Can't the SCM driver do things along this line, perhaps? > > - You have your working files and your dircache. They may not >match because you have uncommitted changes to your >environment. You want to merge with Linus head. You know >its SHA1 (call it COMMIT-Linus). Your SCM knows which commit >you started with (call it COMMIT-Current). > > - First you merge the tree associated with COMMIT-Current. Use >it and COMMIT-Linus to find the common ancestor to use. > > - Now use the tree SHA of COMMIT-Current, tree SHA1 of >COMMIT-Linus, and tree SHA1 of the common ancestor commit to >drive git-merge.perl (to be renamed ;-). You will get a >temporary directory. Have your user examine what is in >there, and fix the merge and have them tell you
Re: Re: Re: Merge with git-pasky II.
Dear diary, on Thu, Apr 14, 2005 at 10:23:26PM CEST, I got a letter where Erik van Konijnenburg <[EMAIL PROTECTED]> told me that... > On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote: > > Hmm. I actually don't like this naming. I think it's not too consistent, > > is irregular, therefore parsing it would be ugly. What I propose: > > > > 12c\tname <- legend > > <- original file > > D <- tree #1 removed file > > D<- tree #2 removed file > > DD<- both trees removed file > > M <- tree #1 modified file > > M > > DM* <- conflict, tree #1 removed file, tree #2 modified file > > MD* > > MM<- exact same modification > > MM* <- different modifications, merging > > > > This is generic, theoretically scales well even to more trees, is easy > > to parse trivially, still is human readable (actually the asterisk in > > the 'conflict' column is there basically only for the humans), is > > completely regular and consistent. > > Detail: perhaps use underscore instead of space, to avoid space/tab typos > that are invisible on paper and user friendly mail clients? I'd go for dots in that case. Looks less intrusive. :^) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
On Thu, Apr 14, 2005 at 09:35:07PM +0200, Petr Baudis wrote: > Hmm. I actually don't like this naming. I think it's not too consistent, > is irregular, therefore parsing it would be ugly. What I propose: > > 12c\tname <- legend > <- original file > D <- tree #1 removed file > D<- tree #2 removed file > DD<- both trees removed file > M <- tree #1 modified file > M > DM* <- conflict, tree #1 removed file, tree #2 modified file > MD* > MM<- exact same modification > MM* <- different modifications, merging > > This is generic, theoretically scales well even to more trees, is easy > to parse trivially, still is human readable (actually the asterisk in > the 'conflict' column is there basically only for the humans), is > completely regular and consistent. Detail: perhaps use underscore instead of space, to avoid space/tab typos that are invisible on paper and user friendly mail clients? Regards, Erik - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Dear diary, on Thu, Apr 14, 2005 at 09:59:04PM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > > "LT" == Linus Torvalds <[EMAIL PROTECTED]> writes: > > LT> On Thu, 14 Apr 2005, Junio C Hamano wrote: > > >> Sorry, I have not seen what you have been doing since pasky 0.3, > >> and I have not even started to understand the mental model of > >> the world your tool is building. That said, my gut feeling is > >> that telling this script about git-pasky's world model might be > >> a mistake. I'd rather see you consider the script as mere "part > >> of the plumbing". > > LT> I agree. Having separate abstraction layers is good. I'm actually very > LT> happy with Pasky's cleaned-up-tree, exactly because unlike the first one, (Just a side-note - functionally and even organizationally, the cleaned up tree does not differ significantly from the original one.) > LT> Pasky did a great job of maintaining the abstraction between "plumbing" > LT> and user interfaces. > > Agreed, not just with your agreeing with me, but with the > statement that Pasky did a good job (although I am ashamed to > say I have not caught up with the "userland" tools). Thanks. :-) > LT> The plumbing should take user interface needs into account, but the more > LT> conceptually separate it is ("does it makes sense on its own?") the better > LT> off we'll be. And "merge these two trees" (which works on a _tree_ level) > LT> or "find the common commit" (which works on a _commit_ level) look like > LT> plumbing to me - the kind of things I should have written, if I weren't > LT> such a lazy slob. > > I am planning drop the ancestor computation from the script, and > make it another command line parameter to the script. Dan > Barkalow's merge-base program should be used to compute it and > his result should drive the merge. That sounds more UNIXy to > me. Good move, I say! > I even may want to make the script take three trees not > commits, since the merge script does not need commits (it only > needs trees). As plumbing it would be cleaner interface to it > to do so. The wrapper SCM scripts can and should make sure it > is fed trees when the user gives it commits (or symbolic > representation of it like .git/tags/blah, or `cat .git/HEAD`). Agreed. > But one different thing to note here. > > You say "merge these two trees" above (I take it that you mean > "merge these two trees, taking account of this tree as their > common ancestor", so actually you are dealing with three trees), > and I am tending to agree with the notion of merging trees not > commits. However you might get richer context and more sensible > resulting merge if you say "merge these two commits". Since > commit chaining is part of the fundamental git object model you > may as well use it. Could you be more particular on the richer context etc? I think this script should stay strictly on the level of trees. When someone invents it, there could be a merge-commits script which does something very smart about two commits, traversing the graph between them etc, and doing a set of merge-tree invocations, possibly preparing the staging area for them etc. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
Dear diary, on Thu, Apr 14, 2005 at 08:12:35PM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > > "PB" == Petr Baudis <[EMAIL PROTECTED]> writes: > > PB> Bah, you outran me. ;-) > > Just being in a different timezone, I guess. > > PB> I'll change it to use the cool git-pasky stuff (commit-id etc) and its > PB> style of committing - that is, it will merely record the update-caches > PB> to be done upon commit, and it will read-tree the branch we are merging > PB> to instead of the ancestor. (So that git diff gives useful output.) > > Sorry, I have not seen what you have been doing since pasky 0.3, > and I have not even started to understand the mental model of > the world your tool is building. That said, my gut feeling is > that telling this script about git-pasky's world model might be > a mistake. I'd rather see you consider the script as mere "part > of the plumbing". Maybe adding an extra parameter to the script > to let the user explicitly specify the common ancestor to use > would be needed, but I would prefer git-pasky-merge to do its > own magic (converting symbolic commit names into raw commit > names and such) before calling this low level script. > > That way people like me who have not migrated to your framework > can still keep using it. All the script currently needs is a > bare git object database; i.e., nothing other than what is in > .git/objects and a couple of commit record SHA1s as its > parameters. No .git/heads/, no .git/HEAD.local, no .git/tags, > are involved for it to work, and I would prefer to keep things > that way if possible. I see, and I actually agree with it. However, I'll want merge-tree.pl to do a little less than it does now for that, though. The mechanics in "kernel" is fine as long as I can control policy in my "userspace". ;-) BTW, the git* name sorta imply my toilet instead of the core plumbing, and it'd be more consistent with the current plumbnaming; and could we have it with the .pl extension, please? :-) What I would like your script to do is therefore just do the merge in a given already prepared (including built index) directory, with a passed base. The base should be determined by a separate tool (I already saw some patches); most future "science" will probably go to a clever selection of this base, anyway. This will give the tool maximal flexibility. E.g., then someone who wants to can just merge with his working copy (if you don't give checkout-cache -f - but why would you anyway), or do whatever other cleverness he wants. > >> * show-diff updates to add -r flag to squelch diffs for files not in > >> the working directory. This is mainly useful when verifying the > >> result of an automated merge. ..snip.. > was too tired and did not think of a letter when I wrote it. I > guess '-r' stood for removed, but I agree it is a bad choice. > Any objections to '-q'? None here. > >> +# Create a temporary directory and go there. > >> +system 'rm', '-rf', ',,merge-temp'; > > PB> Can't we call it just ,,merge? > > I'd rather have a command line option '-o' (scrapping the > current '-o' and renaming it to something else; as you can see I > am terrible at picking option names ;-)) to mean "output to this > directory". I am not really an Arch person so I do not > particulary care about /^,,/. How about "git~merge~$$"? I'm all for an -o, and I don't mind ,, - I just don't want it uselessly long. I hope "git~merge~$$" was a joke... :-) > >> +for ((',,merge-temp', '.git')) { mkdir $_; chdir $_; } > >> +symlink "../../.git/objects", "objects"; > >> +chdir '..'; > >> + > >> +my $ancestor_tree = read_commit_tree($common); > >> +system 'read-tree', $ancestor_tree; > >> + > >> +my %tree0 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[0])); > >> +my %tree1 = read_diff_tree($ancestor_tree, read_commit_tree($ARGV[1])); > >> + > >> +my @ancestor_file = read_show_files(); > >> +my %ancestor_file = map { $_ => 1 } @ancestor_file; > >> + > >> +for (@ancestor_file) { > >> +if (! exists $tree0{$_} && ! exists $tree1{$_}) { By the way, what about indentation with tabs? If you have a strong opinion about this, I don't insist - but if you really don't mind/care either way, it'd be great to use tabs as in the rest of the git code. > >> + if ($full_checkout) { > >> + system 'checkout-cache', $_; > >> + } > >> + print STDERR "O - $_\n"; > > PB> Huh, what are you trying to do here? I think you should just record > PB> remove, no? (And I wouldn't do anything with my read-tree. ;-) > > At this moment in the script, we have run "read-tree" the > ancestor so the dircache has the original. %tree0 and %tree1 > both did not touch the path ($_ here) so it is the same as > ancestor. When '-f' is specified we are populating the output > working tree with the merge result so that is what that > 'checkout-cache' is about. "O - $path" means "we took the > original". Aha! Thanks. Is there a fundamental reason why the dire
Re: Re: Merge with git-pasky II.
Dear diary, on Thu, Apr 14, 2005 at 01:14:13PM CEST, I got a letter where Junio C Hamano <[EMAIL PROTECTED]> told me that... > Here is a diff to update the git-merge.perl script I showed you > earlier today ;-). It contains the following updates against > your HEAD (bb95843a5a0f397270819462812735ee29796fb4). Bah, you outran me. ;-) > * git-merge.perl command we talked about on the git list. I've >covered the changed-to-the-same case etc. I still haven't done >anything about file-vs-directory case yet. > >It does warn when it needed to run merge to automerge and let >merge give a warning message about conflicts if any. In >modify/remove cases, modified in one but removed in the other >files are left in either $path~A~ or $path~B~ in the merge >temporary directory, and the script issues a warning at the >end. I think I will take it rather my working git merge implementation - it's getting insane in bash. ;-) I'll change it to use the cool git-pasky stuff (commit-id etc) and its style of committing - that is, it will merely record the update-caches to be done upon commit, and it will read-tree the branch we are merging to instead of the ancestor. (So that git diff gives useful output.) > * show-files and ls-tree updates to add -z flag to NUL terminate records; >this is needed for git-merge.perl to work. > > * show-diff updates to add -r flag to squelch diffs for files not in >the working directory. This is mainly useful when verifying the >result of an automated merge. -r traditionally means recursive - what's the reasoning behind the choice of this letter? > * update-cache updates to add "--cacheinfo mode sha1" flag to register >a file that is not in the current working directory. Needed for >minimum-checkout merging by git-merge.perl. > > > Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]> > > --- > > git-merge.perl | 247 > + > ls-tree.c |9 +- > show-diff.c| 11 +- > show-files.c | 12 ++ > update-cache.c | 25 + > 5 files changed, 296 insertions(+), 8 deletions(-) > > diff -x .git -Nru ,,1/git-merge.perl ,,2/git-merge.perl > --- ,,1/git-merge.perl1969-12-31 16:00:00.0 -0800 > +++ ,,2/git-merge.perl2005-04-14 04:00:14.0 -0700 > @@ -0,0 +1,247 @@ > +#!/usr/bin/perl -w > + > +use Getopt::Long; use strict? > + > +my $full_checkout = 0; > +my $oneside_checkout = 0; > +GetOptions("full" => \$full_checkout, > +"oneside" => \$oneside_checkout) > +or die; > + > +if ($full_checkout) { > +$oneside_checkout = 1; > +} > + > +sub read_rev_tree { > +my (@head) = @_; > +my ($fhi); > +open $fhi, '-|', 'rev-tree', '--edges', @head > + or die "$!: rev-tree --edges @head"; > +my %common; > +while (<$fhi>) { > + chomp; > + (undef, undef, my @common) = split(/ /, $_); > + for (@common) { > + if (s/^([a-f0-f]{40}):3$/$1/) { > + $common{$_}++; > + } > + } > +} > +close $fhi; > + > +my @common = (map { $_->[1] } > + sort { $b->[0] <=> $a->[0] } > + map { [ $common{$_} => $_ ] } > + keys %common); > + > +return $common[0]; > +} It'd be simpler to do just my @common = (map { $common{$_} } sort { $b <=> $a } keys %common) But I really think this is a horrible heuristic. I believe you should take the latest commit in the --edges output, and from that choose the base whose rev-tree --edges the_base merged_branch has the least lines on output. (That is, the path to it is shortest - ideally it's already part of the merged_branch.) > + > +sub read_commit_tree { > +my ($commit) = @_; > +my ($fhi); > +open $fhi, '-|', 'cat-file', 'commit', $commit > + or die "$!: cat-file commit $commit"; > +my $tree = <$fhi>; > +close $fhi; > +$tree =~ s/^tree //; > +return $tree; > +} > + > +# Reads diff-tree -r output and gives a hash that maps a path > +# to 3-tuple (old-mode new-mode new-sha). > +# When creating, old-mode is undef. When removing, new-* are undef. What about sub OLDMODE { 0 } sub NEWMODE { 1 } sub NEWSHA { 2 } and then using that when accessing the tuple? Would make the code much more readable. > +sub read_diff_tree { > +my (@tree) = @_; > +my ($fhi); > +local ($_, $/); > +$/ = "\0"; > +my %path; > +open $fhi, '-|', 'diff-tree', '-r', @tree > + or die "$!: diff-tree -r @tree"; > +while (<$fhi>) { > + chomp; > + if (/^\*([0-7]+)->([0-7]+)\tblob\t[0-9a-f]+->([0-9a-f]{40})\t(.*)$/s) { > + $path{$4} = [$1, $2, $3]; > + } > + elsif (/^\+([0-7]+)\tblob\t([0-9a-f]{40})\t(.*)$/s) { > + $path{$3} = [undef, $1, $2]; > + } > + elsif (/^\-([0-7]+)\tblob\t[0-9a-f]{40}\t(.*)$/s) { > + $path{$2} = [$1, undef, undef]; > + } > + else { > +