Re: Migrating away from SHA-1?
On Sat, Jun 18, 2016 at 03:10:27AM +0100, Leo Gaspard wrote:
> First, sorry for not having this message threaded: I'm not subscribed to
> the list and haven't found a way to get a Message-Id from gmane.

Sorry it's taken so long to get back to this. I've been at a conference.

> So, my questions to the git team:
> * Is there a consensus, that git should migrate away from SHA-1 before
> it gets a collision attack, because it would mean chosen-prefix
> collision isn't far away and people wouldn't have the time to upgrade?

I plan on adding support for a new hash as soon as that's possible, but
I don't have a firm timeline. This is a volunteer effort in my own
limited free time.

> * Is there a consensus, that Peter Anvin's amended transition plan is
> the way to go?

I'm not planning on changing algorithms in the middle of a repository.
This will only be available on new or imported repositories.

My current thinking on proposed algorithms is SHA3-256 or BLAKE2b-256.
The cryptanalysis on SHA-256 indicates that it may not be a great
long-term choice, and I expect people won't want to change algorithms
frequently.

If time becomes extremely urgent, we can always add support for a
160-bit hash first (e.g. BLAKE2b-160) and then finish the object_id
transition later as it becomes convenient. I'd like to avoid that,
though.

> * If the two conditions above are fulfilled, has work started on it
> yet? (I guess as Brian Carlson had started his work 9 weeks ago and he
> was speaking about working on it on the week-end he should have finished
> it now, so excluding this)

It takes a long time to get a patch series through. I'm rather busy and
don't always have time to rebase and address issues during the week.

> * If the two first conditions are fulfilled, is there anything I could
> do to help this transition? (including helping Brian if his work hasn't
> actually ended yet)

You're welcome to send patches if you like. I try to avoid areas I know
are under heavy development, like the refs code.

--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
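Editorial illustration (not part of the original message): the digest sizes of the algorithms named above can be compared with Python's standard `hashlib`, which happens to ship SHA-1, SHA3-256 and BLAKE2b. The `payload` and the `candidates` table are assumptions made up for this sketch; git itself is written in C and nothing here reflects its implementation.

```python
import hashlib

payload = b"example object contents"  # arbitrary sample input (assumption)

# Algorithms mentioned in the message, plus SHA-1 for comparison.
candidates = {
    "SHA-1": hashlib.sha1(payload),
    "SHA3-256": hashlib.sha3_256(payload),
    "BLAKE2b-256": hashlib.blake2b(payload, digest_size=32),
    "BLAKE2b-160": hashlib.blake2b(payload, digest_size=20),
}

# Raw digest length in bytes for each candidate.
digest_sizes = {name: h.digest_size for name, h in candidates.items()}

for name, size in digest_sizes.items():
    print(f"{name:12} {size:2d} bytes  {size * 2} hex chars")
```

A 160-bit BLAKE2b digest occupies the same 20 bytes as SHA-1, which is presumably why it is described as a stopgap that could land before the `object_id` conversion is finished; the 256-bit candidates need 32 bytes.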
Re: Migrating away from SHA-1?
Leo Gaspard wrote:
> First, sorry for not having this message threaded: I'm not subscribed to
> the list and haven't found a way to get a Message-Id from gmane.

Appending "/raw" to the gmane URL will get you the raw message
with full headers:

  article.gmane.org/gmane.comp.version-control.git/$NUMBER/raw

You can also use that article $NUMBER via NNTP on news.gmane.org

> So, my questions to the git team:

It is customary to Cc: all relevant parties involved with that
thread since they may not all be subscribed, either.

> * Is there a consensus, that git should migrate away from SHA-1 before
> it gets a collision attack, because it would mean chosen-prefix
> collision isn't far away and people wouldn't have the time to upgrade?
> * Is there a consensus, that Peter Anvin's amended transition plan is
> the way to go?
> * If the two conditions above are fulfilled, has work started on it
> yet? (I guess as Brian Carlson had started his work 9 weeks ago and he
> was speaking about working on it on the week-end he should have finished
> it now, so excluding this)

AFAIK, brian is still working on it. Last series on the matter
begins here:

  http://mid.gmane.org/20160607005716.69222-2-sand...@crustytoothpaste.net

I'm just on the sidelines observing :)

> * If the two first conditions are fulfilled, is there anything I could
> do to help this transition? (including helping Brian if his work hasn't
> actually ended yet)
Re: Migrating away from SHA-1?
First, sorry for not having this message threaded: I'm not subscribed to
the list and haven't found a way to get a Message-Id from gmane.

I just wanted to ask, as an end-user highly relying on commit
signatures, a few questions as to the migration away from SHA-1.

SHA-1 already suffers from a freestart collision attack. Based on what I
understand of the object model of git, a chosen-prefix collision attack
(perhaps somewhat improved) is enough to make reviewers accept a patch,
sign it, and then swap the innocuous-looking patch for an evil-doing one
-- which *will be signed*.

As for the issue about code checking being an easier entrypoint
(Theodore Ts'o, 2016-04-14 22:40:51 GMT), in a use case of mine there is
a repo with my dotfiles on an untrusted server. Yet I download them and
am able to execute them without fear because each commit is PGP-signed
with my key. The point being that code checking is not even a possible
entrypoint in some cases, so SHA-1 seems to be(come) the weakest link.

So, I don't think it is possible to disagree with Jeff King when he
wrote his 2016-04-12 23:15:19 GMT email.

Peter Anvin (2016-04-14 17:28:50 GMT) has a point in that there is no
need to hurry (chosen-prefix collisions may still be quite a long way
off, even though there is no guesswork in these matters), and quality is
important.

Yet Jeff King's proposal (2016-04-12 23:42:52 GMT), amended by Junio
Hamano (2016-04-13 01:03:02 GMT) and himself (2016-04-13 01:36:32 GMT),
seems to have met no opposition.

So, my questions to the git team:
 * Is there a consensus, that git should migrate away from SHA-1 before
it gets a collision attack, because it would mean chosen-prefix
collision isn't far away and people wouldn't have the time to upgrade?
 * Is there a consensus, that Peter Anvin's amended transition plan is
the way to go?
 * If the two conditions above are fulfilled, has work started on it
yet? (I guess as Brian Carlson had started his work 9 weeks ago and he
was speaking about working on it on the week-end he should have finished
it now, so excluding this)
 * If the two first conditions are fulfilled, is there anything I could
do to help this transition? (including helping Brian if his work hasn't
actually ended yet)

Sorry for bringing up again a subject that seems to be quite recurrent,
and for this long block of text,

Leo Gaspard
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 06:58:10PM -0700, H. Peter Anvin wrote:
> On April 12, 2016 6:51:12 PM PDT, Duy Nguyen wrote:
> >On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin wrote:
> >> OK, I'm going to open this can of worms...
> >>
> >> At what point do we migrate from SHA-1?
> >
> >Brian Carlson has been slowly refactoring git code base, abstracting
> >SHA-1 away. Once that work is done, I think we can talk about moving
> >away from SHA-1. The process is slow because it likely causes
> >conflicts with in-flight topics. A quick grep shows we still have
> >about 300 SHA-1 references, so it'll be quite some time.
>
> Well, at least it sounds like work is underway. That is a big deal.

Yes, it's a bunch of slow manual refactoring, and I've been busy as
we've been doing house- and car-related things recently. I'll try to
spend a little more time on it this weekend.

The first step is to convert all of the individual places that use
unsigned char [20] to use struct object_id, which can then be extended
to use different hash algorithms. There are also constants,
GIT_SHA1_RAWSZ and GIT_SHA1_HEXSZ, that abstract the 20 and 40 values in
the codebase so they can be changed in the future.

While this is a project I've been mostly working on, I have no objection
to other people sending in a patch or series as they feel like it.

--
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204
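Editorial illustration (not part of the original message): the idea of wrapping a bare 20-byte array in a named type, with the sizes held in constants, can be sketched in Python. Only the constant names `GIT_SHA1_RAWSZ` and `GIT_SHA1_HEXSZ` come from the message; the `ObjectId` class, its `raw` field and its methods are hypothetical stand-ins and do not mirror git's actual C `struct object_id`.

```python
from dataclasses import dataclass

GIT_SHA1_RAWSZ = 20                  # raw digest length in bytes
GIT_SHA1_HEXSZ = 2 * GIT_SHA1_RAWSZ  # length of the hex representation


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical wrapper: callers never hard-code 20 or 40."""
    raw: bytes

    def __post_init__(self):
        # Centralizing the size check means a future hash change
        # only has to touch the constants above.
        if len(self.raw) != GIT_SHA1_RAWSZ:
            raise ValueError(
                f"expected {GIT_SHA1_RAWSZ} bytes, got {len(self.raw)}"
            )

    @classmethod
    def from_hex(cls, text: str) -> "ObjectId":
        if len(text) != GIT_SHA1_HEXSZ:
            raise ValueError(
                f"expected {GIT_SHA1_HEXSZ} hex digits, got {len(text)}"
            )
        return cls(bytes.fromhex(text))

    def to_hex(self) -> str:
        return self.raw.hex()


empty_blob = ObjectId.from_hex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
print(empty_blob.to_hex())
```

Once every call site goes through a type like this, supporting a longer digest becomes a change to the type and its constants rather than a hunt through the codebase for literal 20s and 40s.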
Re: Migrating away from SHA-1?
On Thu, Apr 14, 2016 at 07:18:53PM -0700, Junio C Hamano wrote:
> Jeff King writes:
>
> > [2] Somewhere in the list archive is my patch to find partial
> > collisions like "git commit --sha1=31337", and I did in fact use
> > that micro-optimization. That, along with multi-threading, made it
> > feasible to do 6-8 character prefixes, as I recall.
>
> In our testsuite, we have a test that uses many objects, all of
> which have object names that begin with 10 '0' characters.

Can you give more details on which test?

10 zeroes is 40 bits, which means that by random chance, only about one
in a trillion objects would match that. We certainly didn't hit that
randomly, and it seems like it would be computationally expensive to
have come up with the input for even one such object, let alone "many".

-Peff
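Editorial illustration (not part of the original message): the "one in a trillion" figure follows from 4 bits per hex digit, and a toy brute-force search shows why short prefixes are cheap while long ones are not. The `find_prefix` helper, the payload and the 3-digit target are assumptions for this sketch; it hashes raw bytes with `hashlib` rather than real git objects.

```python
import hashlib

HEX_DIGITS = 10
bits = 4 * HEX_DIGITS        # each hex digit encodes 4 bits -> 40 bits
expected_tries = 2 ** bits   # average attempts needed for one match


def find_prefix(prefix: str, base: bytes, limit: int = 1_000_000):
    """Vary a nonce until sha1(base + nonce) starts with `prefix`."""
    for nonce in range(limit):
        digest = hashlib.sha1(base + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
    return None  # not found within `limit` attempts


print(f"10 hex zeroes: about 1 in {expected_tries:,}")

# A 3-digit prefix needs about 16**3 = 4096 attempts on average.
result = find_prefix("000", b"toy payload ")
print(result)
```

Each extra hex digit multiplies the expected work by 16, so going from the 6-8 digit prefixes mentioned in the thread to 10 digits costs a factor of 256 to 65536 more hashing.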
Re: Migrating away from SHA-1?
Jeff King writes:

> [2] Somewhere in the list archive is my patch to find partial
> collisions like "git commit --sha1=31337", and I did in fact use
> that micro-optimization. That, along with multi-threading, made it
> feasible to do 6-8 character prefixes, as I recall.

In our testsuite, we have a test that uses many objects, all of
which have object names that begin with 10 '0' characters.
Re: Migrating away from SHA-1?
On Thu, Apr 14, 2016 at 06:40:51PM -0400, Theodore Ts'o wrote:
> Also, remember that while we can write programs that look for
> suspicious git objects that have stuff hidden after the null
> terminator (in fact, maybe that would be a good thing to add to git,
> hmmm?)[...]

Detecting the hidden bytes is underway elsewhere on the list. And while
I think it's a good idea to do so, I don't think it really introduces a
meaningful defense against collision attacks.

You can also hide bytes in arbitrary headers in a git object[1], and
they will not be shown by default. Adding the extra bytes at the end is
certainly easier if you're micro-optimizing the collision process[2],
but I don't think it changes the fundamental equation. It reduces the
work you do per-sha1 by a constant factor, but not the number of sha1s
you expect to compute.

-Peff

[1] Obviously neither "extra headers" nor "stuff after NUL" applies to
    patches sent by email, where everything short of binary-diffs is
    human-readable. So for the kernel, you're really talking about
    attacking a lieutenant whose repo gets pulled. But there are plenty
    of other projects that "git merge" from strangers.

[2] Somewhere in the list archive is my patch to find partial
    collisions like "git commit --sha1=31337", and I did in fact use
    that micro-optimization. That, along with multi-threading, made it
    feasible to do 6-8 character prefixes, as I recall.
Re: Migrating away from SHA-1?
On Thu, Apr 14, 2016 at 10:28:50AM -0700, H. Peter Anvin wrote:
>
> Either way, I agree with Ted, that we have enough time to do it
> right, but that is a good reason to do it sooner rather than later
> (see also my note about freezing the cryptographic properties.)

Sure, I think we should do it as well. But the fact that the attacker
will likely need to get a commit into the tree in order to be able to
carry out a collision attack means that it's easier (and probably less
detectable) to get some underhanded C code into the tree. For one
thing, you just need to introduce it via a patch ("Hi, I'm super eager
newbie Nick, here's a cleanup patch!"), as opposed to getting a
sublieutenant to accept a git pull request.

Also, remember that while we can write programs that look for
suspicious git objects that have stuff hidden after the null terminator
(in fact, maybe that would be a good thing to add to git, hmmm?), the
state of the art in detecting underhanded C code which is deliberately
designed to not be noticed by static code checkers (or humans doing a
superficial code review, for that matter) is not particularly
encouraging to me.

- Ted
Re: Migrating away from SHA-1?
On April 14, 2016 10:23:03 AM PDT, David Turner wrote:
> On Wed, 2016-04-13 at 21:53 -0400, Theodore Ts'o wrote:
>> On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
>>>
>>> If SHA-1 is broken (in certain ways), someone *can* replace an
>>> arbitrary blob. GPG does not help in this case, because the
>>> signature is over the commit object (which points to a tree, which
>>> eventually points to the blob), and the commit hasn't changed. So
>>> the GPG signature will still verify.
>>
>> The "in certain ways" is the critical bit. The question is whether
>> you are trying to replace an arbitrary blob, or a blob that was
>> submitted under your control.
>>
>> If you are trying to replace an arbitrary blob, you need to carry
>> out a preimage attack. That means that given a particular hash, you
>> need to find another blob that has the same hash. SHA-1 is currently
>> resistant against preimage attack (that is, you need to use brute
>> force, so the work factor is 2**159).
>>
>> If you are trying to replace an arbitrary blob which is under your
>> control, then all you need is a collision attack, and this is where
>> SHA-1 has been weakened. It is now possible to find a collision with
>> a work factor of 2**69, instead of the requisite 2**80.
>>
>> It was a MD5 collision which was involved with the Flame attack.
>> Someone (in probably the US or Israeli intelligence services)
>> submitted a Certificate Signing Request (CSR) to the Microsoft
>> Terminal Services Licensing server. That CSR was under the control
>> of the attacker, and it resulted in a certificate where parts of the
>> certificate could be swapped out with the corresponding fields from
>> another CSR (which was not submitted to the Certifying Authority)
>> which had the code signing bit set.
>>
>> So in order to carry out this attack, not only did the (cough)
>> "unknown" attackers have to come up with a collision, but the two
>> colliding blobs had to be parsable as valid CSRs, one of which had
>> to pass inspection by the automated CA signing authority, and the
>> other of which had to have the desired code signing bits set so the
>> attacker could sabotage an Iranian nuclear centrifuge.
>>
>> OK, so how does this map to git? First of all, from a collision
>> perspective, the two blobs have to map into valid C code, one of
>> which has to be innocuous enough such that any humans who review the
>> patch and/or git pull request don't notice anything wrong.
>
> It looks like Linux contains at least some firmware which would be hard
> to audit. One random example is:
> firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex

Either way, I agree with Ted, that we have enough time to do it right,
but that is a good reason to do it sooner rather than later (see also my
note about freezing the cryptographic properties.)
Re: Migrating away from SHA-1?
On Wed, 2016-04-13 at 21:53 -0400, Theodore Ts'o wrote:
> On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
> >
> > If SHA-1 is broken (in certain ways), someone *can* replace an
> > arbitrary blob. GPG does not help in this case, because the
> > signature is over the commit object (which points to a tree, which
> > eventually points to the blob), and the commit hasn't changed. So
> > the GPG signature will still verify.
>
> The "in certain ways" is the critical bit. The question is whether
> you are trying to replace an arbitrary blob, or a blob that was
> submitted under your control.
>
> If you are trying to replace an arbitrary blob, you need to carry
> out a preimage attack. That means that given a particular hash, you
> need to find another blob that has the same hash. SHA-1 is currently
> resistant against preimage attack (that is, you need to use brute
> force, so the work factor is 2**159).
>
> If you are trying to replace an arbitrary blob which is under your
> control, then all you need is a collision attack, and this is where
> SHA-1 has been weakened. It is now possible to find a collision with
> a work factor of 2**69, instead of the requisite 2**80.
>
> It was a MD5 collision which was involved with the Flame attack.
> Someone (in probably the US or Israeli intelligence services)
> submitted a Certificate Signing Request (CSR) to the Microsoft
> Terminal Services Licensing server. That CSR was under the control
> of the attacker, and it resulted in a certificate where parts of the
> certificate could be swapped out with the corresponding fields from
> another CSR (which was not submitted to the Certifying Authority)
> which had the code signing bit set.
>
> So in order to carry out this attack, not only did the (cough)
> "unknown" attackers have to come up with a collision, but the two
> colliding blobs had to be parsable as valid CSRs, one of which had
> to pass inspection by the automated CA signing authority, and the
> other of which had to have the desired code signing bits set so the
> attacker could sabotage an Iranian nuclear centrifuge.
>
> OK, so how does this map to git? First of all, from a collision
> perspective, the two blobs have to map into valid C code, one of
> which has to be innocuous enough such that any humans who review the
> patch and/or git pull request don't notice anything wrong.

It looks like Linux contains at least some firmware which would be hard
to audit. One random example is:
firmware/bnx2x/bnx2x-e1h-6.2.9.0.fw.ihex
Re: Migrating away from SHA-1?
Theodore Ts'o wrote:
> OK, so how does this map to git? First of all, from a collision
> perspective, the two blobs have to map into valid C code

Git provides other places to hide the colliding blobs; the best seems
to be as an added header in the commit object, or as trailing data
after a \0 in the commit message.

git is very good at hiding such potentially colliding data from the
user, as https://github.com/joeyh/supercollider demonstrates.

commit 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
Author: Joey Hess
Date:   Fri Sep 9 11:49:21 2011 -0400

    an innocent commit

    If this were a sha1 colliding attack, there would be some sort of
    binary garbage below. Which there isn't. So this can be safely merged.

joey@darkstar:~/tmp/supercollider>git cat-file -p 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
tree 735a7633237c07b398856005de3bc9ea00446747
author Joey Hess 1315583361 -0400
committer Joey Hess 1315583361 -0400

an innocent commit

If this were a sha1 colliding attack, there would be some sort of
binary garbage below. Which there isn't. So this can be safely merged.
??b???[?i??ͯ?t? 2??os??Q??H?*zl?RA˂q?E ?E7???\?m???U?>MU GY?d)?ȼ??'g?~D??ɯhQ????/"E??X?m???^??S?D??;w6(?`??>?縘?AѲ?*!??@v>?8??2?!??=*?J ???ynH???c?w?\??K7???N?6?????A5?FM?wZ?~?pKY?R???s7??(?ƶ?_"??m%1a??ʀ??K[ t????!A0?ΈfT.?T?w?ᛵƌ?р???aco?V/2??nَ? ?}?6?_?z?{

(The other possibility would be to hide the colliding blob in the tree
object, but that seems unlikely.)

--
see shy jo
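Editorial illustration (not part of the original message): a minimal sketch of why trailing bytes after a NUL matter. Git names an object by hashing `"<type> <size>\0"` followed by the body, so hidden bytes change the object ID even though a viewer that stops at the first NUL shows the same message. The `visible_message` helper, the placeholder identity and the 32 padding bytes are assumptions for this sketch; the padding is ordinary data, not an actual collision block.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over git's '<type> <size>' header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def visible_message(body: bytes) -> str:
    """Approximate a viewer that truncates the message at the first NUL."""
    _, _, message = body.partition(b"\n\n")  # headers end at a blank line
    return message.split(b"\0", 1)[0].decode()


headers = (
    b"tree 735a7633237c07b398856005de3bc9ea00446747\n"
    b"author A U Thor <author@example.com> 1315583361 -0400\n"
    b"committer A U Thor <author@example.com> 1315583361 -0400\n"
)
message = b"an innocent commit\n"

clean = headers + b"\n" + message
# Same commit with 32 hidden bytes appended after a NUL terminator.
padded = clean + b"\0" + bytes(range(1, 33))

print(visible_message(clean) == visible_message(padded))
print(git_object_id("commit", clean))
print(git_object_id("commit", padded))
```

The two commits display identically under this assumption yet have different object IDs, which is the slack an attacker would use to hold collision-tuning data.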
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:
>
> If SHA-1 is broken (in certain ways), someone *can* replace an
> arbitrary blob. GPG does not help in this case, because the signature
> is over the commit object (which points to a tree, which eventually
> points to the blob), and the commit hasn't changed. So the GPG
> signature will still verify.

The "in certain ways" is the critical bit. The question is whether
you are trying to replace an arbitrary blob, or a blob that was
submitted under your control.

If you are trying to replace an arbitrary blob, you need to carry out a
preimage attack. That means that given a particular hash, you need to
find another blob that has the same hash. SHA-1 is currently resistant
against preimage attack (that is, you need to use brute force, so the
work factor is 2**159).

If you are trying to replace an arbitrary blob which is under your
control, then all you need is a collision attack, and this is where
SHA-1 has been weakened. It is now possible to find a collision with
a work factor of 2**69, instead of the requisite 2**80.

It was a MD5 collision which was involved with the Flame attack.
Someone (in probably the US or Israeli intelligence services)
submitted a Certificate Signing Request (CSR) to the Microsoft
Terminal Services Licensing server. That CSR was under the control of
the attacker, and it resulted in a certificate where parts of the
certificate could be swapped out with the corresponding fields from
another CSR (which was not submitted to the Certifying Authority)
which had the code signing bit set.

So in order to carry out this attack, not only did the (cough)
"unknown" attackers have to come up with a collision, but the two
colliding blobs had to be parsable as valid CSRs, one of which had to
pass inspection by the automated CA signing authority, and the other of
which had to have the desired code signing bits set so the attacker
could sabotage an Iranian nuclear centrifuge.

OK, so how does this map to git? First of all, from a collision
perspective, the two blobs have to map into valid C code, one of which
has to be innocuous enough such that any humans who review the patch
and/or git pull request don't notice anything wrong. The second has to
contain whatever security backdoor the attacker is going to try to
introduce into the git tree. Ideally this also should pass muster by
humans who are inspecting the code, but if the attack is targeted
against a specific victim which is not likely to look at the code, it
might be okay if something like this:

    #if 0
    /* this is needed to make the hash collision work */
    aev2Ein4Hagh8eimshood5aTeteiVo9hOhchohN6jiem6AiNEipeeR3Pie4ePaeJ
    fo8eLa9ateeKie5VeG5eZuu2Sahqu1Ohai9ohGhuAevoot5OtohQuai7koo4IeTh
    ohCefae4Ahkah0eiku2Efo0iuHai8ideaRooth8wVahlia0nuu1eeSh5oht1Kaer
    aiJi4chunahK9oozpaiWu7viee5aiFahud6Ee2zieich1veKque6PhiaAit1shie
    #endif

... was hidden in the middle of the replacement blob. One would
*hope*, though, that if something like this appeared in a blob that
was being sent to the upstream repository, that even a sloppy github
pull request reviewer would notice.

That's because in this scenario, the attacker needs to be able to get
the first blob into the git tree first, which means they need to be
trusted enough to get the first blob in. And so the question which
comes to mind is if you are that trusted (or if the git pull review
process is that crappy), might it not be easier to simply introduce
obfuscated code that has a security weakness? That is, something from
the Underhanded C contest, or an accidental buffer overrun, hopefully
one that isn't noticed by static code checkers. If you do that, you
don't even need to figure out how to create a SHA-1 collision.

Does that mean that we shouldn't figure out how to migrate to another
hash function? No, it's probably worth planning how to do it. But we
probably have a fair amount of time to get this right.

Cheers,

- Ted
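Editorial illustration (not part of the original message): the work factors quoted above can be checked with simple arithmetic, and the gap between preimage and collision resistance can be seen on a deliberately weakened hash. The 24-bit truncation and the `truncated_collision` helper are assumptions chosen so the search finishes quickly; this is a generic birthday search, not the cryptanalytic attack behind the 2**69 figure.

```python
import hashlib

HASH_BITS = 160
preimage_work = 2 ** (HASH_BITS - 1)    # average brute-force preimage search
birthday_work = 2 ** (HASH_BITS // 2)   # generic collision bound, 2**80
attack_work = 2 ** 69                   # figure cited in the message
speedup = birthday_work // attack_work  # how much the attack saves


def truncated_collision(bits: int = 24):
    """Find two inputs whose SHA-1 digests share the first `bits` bits."""
    nbytes = bits // 8
    seen = {}
    counter = 0
    while True:  # pigeonhole guarantees termination within 2**bits + 1 steps
        data = str(counter).encode()
        key = hashlib.sha1(data).digest()[:nbytes]
        if key in seen:
            return seen[key], data, counter + 1
        seen[key] = data
        counter += 1


first, second, tries = truncated_collision()
print(f"collision attack is {speedup}x cheaper than the birthday bound")
print(f"24-bit collision after {tries} hashes: {first!r} vs {second!r}")
```

A collision on n bits costs on the order of 2**(n/2) hashes, while matching one fixed target costs on the order of 2**n, which is why an attacker who controls both blobs is in a far stronger position than one who must match an existing blob.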
Re: Migrating away from SHA-1?
On April 12, 2016 6:51:12 PM PDT, Duy Nguyen wrote:
>On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin wrote:
>> OK, I'm going to open this can of worms...
>>
>> At what point do we migrate from SHA-1?
>
>Brian Carlson has been slowly refactoring git code base, abstracting
>SHA-1 away. Once that work is done, I think we can talk about moving
>away from SHA-1. The process is slow because it likely causes
>conflicts with in-flight topics. A quick grep shows we still have
>about 300 SHA-1 references, so it'll be quite some time.

Well, at least it sounds like work is underway. That is a big deal.
Re: Migrating away from SHA-1?
On Wed, Apr 13, 2016 at 5:38 AM, H. Peter Anvin wrote:
> OK, I'm going to open this can of worms...
>
> At what point do we migrate from SHA-1?

Brian Carlson has been slowly refactoring git code base, abstracting
SHA-1 away. Once that work is done, I think we can talk about moving
away from SHA-1. The process is slow because it likely causes
conflicts with in-flight topics. A quick grep shows we still have
about 300 SHA-1 references, so it'll be quite some time.
--
Duy
Re: Migrating away from SHA-1?
On 04/12/16 18:03, Junio C Hamano wrote:
>> and so on. Of course trees don't have any space for this; they have a
>> fixed-length for the hash part of each record, which is basically:
>>
>> NUL <20-byte-sha1>
>>
>> So we'd probably need a "treev2" object type that gives room for an
>> algorithm byte (or we'd have to try to shove it into the mode, but since
>> old versions won't know the new algorithm anyway, I don't think it
>> solves that much...). Or you can just define for the whole tree object
>> (either implicit in its type, or in a header) that it always uses
>> algorithm X.
>
> This will hurt the performance a lot during the transition period as
> it no longer will be possible to rely on "most of the time a fine
> grained commit changes only a small part of the tree, and we can
> cheaply avoid descending into trees that haven't changed because we
> can tell that the corresponding tree objects in the pre- and post-
> trees have the same object name" optimization. But we cannot avoid
> it.

Not really, because you can point to the algoX hash even for the
existing objects.

Perhaps the tree object can add a format descriptor at the beginning;
something like:

>> Transitioning to that would be something like:
>>
>> 0. Overhaul all of the git code to handle arbitrary-sized object ids.
>>
>> 1. Decide on the new algorithm and implement it in git.
>>
>> 2. Recognize parameterized object ids in commits and tags (designing
>> format, implementing the reading side).
>>
>> 3. Recognize parameterized object ids somehow in trees (designing
>> format, implementing the reading side).
>>
>> 4. Teach the object database to index objects by the new algorithm (or
>> possibly both algorithms).
>>
>> 5. Add a protocol extension so that both sides can decide which
>> algorithm is being used when they talk about oids.
>>
>> 6. Add a config option to write references in objects using the new
>> algorithm.
>>
>> 7. After a while, flip the config option on. Hopefully the readers
>> from steps 1-5 have percolated to the masses by then, and it's not
>> a horrible flag day.
>>
>> We're basically on step 0 right now. I'm sure I'm missing some
>> subtleties in there, too.
>
> One subtlety is that 7. "not a flag day" may not be a good thing.
>
> There has to be a section of a history that spans the transition,
> set of commits and trees that have pointers to both kinds of object
> names. The narrower such a section of the history, the more
> pleasant to use the result of the transition would be.
>
> Different projects that can have their own flag days at their own
> pace is a good thing, so the above observation does not invalidate
> your transition plan, though.

I don't think there is any way this can *not* be by repository and
somehow require a manual operation in order to preserve the
cryptographic integrity. In some ways, the transition point and the
transition table becomes a special kind of tag object. There may have
to be more than one in the case of commits in multiple trees.
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 06:03:02PM -0700, Junio C Hamano wrote:
> > So we'd probably need a "treev2" object type that gives room for an
> > algorithm byte (or we'd have to try to shove it into the mode, but since
> > old versions won't know the new algorithm anyway, I don't think it
> > solves that much...). Or you can just define for the whole tree object
> > (either implicit in its type, or in a header) that it always uses
> > algorithm X.
>
> This will hurt the performance a lot during the transition period as
> it no longer will be possible to rely on "most of the time a fine
> grained commit changes only a small part of the tree, and we can
> cheaply avoid descending into trees that haven't changed because we
> can tell that the corresponding tree objects in the pre- and post-
> trees have the same object name" optimization. But we cannot avoid
> it.

Yeah. I'd hope in general that there would be a single commit that does
the transition, and we'd only pay it when doing diffs across the
boundary. And even then, I think a local-only cache of aliases could
mitigate the worst of it.

> > 7. After a while, flip the config option on. Hopefully the readers
> > from steps 1-5 have percolated to the masses by then, and it's not
> > a horrible flag day.
> >
> > We're basically on step 0 right now. I'm sure I'm missing some
> > subtleties in there, too.
>
> One subtlety is that 7. "not a flag day" may not be a good thing.
>
> There has to be a section of a history that spans the transition,
> set of commits and trees that have pointers to both kinds of object
> names. The narrower such a section of the history, the more
> pleasant to use the result of the transition would be.
>
> Different projects that can have their own flag days at their own
> pace is a good thing, so the above observation does not invalidate
> your transition plan, though.

Good point. I do think projects would do well to have a moment where
they switch to the new format, and don't freely intermingle.

We could possibly do some magic there to help things out. For example,
if we are building on a commit that is sha-2, we automatically use more
sha-2 objects to point to them. And then the "flag day" for a project
is simply that somebody pushes to "master" using sha-2, and everybody
else's git (which learned long ago to speak the new algorithm) just
picks it up.

Of course that's not exactly a flag day for projects that branch from
old history for bugfixes. But it might be close enough.

-Peff
Re: Migrating away from SHA-1?
Jeff King writes:

> So a slightly nicer thing is to parameterize the algorithm for every
> object name reference. So commits look like:
>
>   tree sha256:1234abcd...
>   parent sha256:1234abcd...
>
> and so on. Of course trees don't have any space for this; they have a
> fixed length for the hash part of each record, which is basically:
>
>   <mode> <name> NUL <20-byte-sha1>
>
> So we'd probably need a "treev2" object type that gives room for an
> algorithm byte (or we'd have to try to shove it into the mode, but since
> old versions won't know the new algorithm anyway, I don't think it
> solves that much...). Or you can just define for the whole tree object
> (either implicit in its type, or in a header) that it always uses
> algorithm X.

This will hurt performance a lot during the transition period, as it
will no longer be possible to rely on the "most of the time a
fine-grained commit changes only a small part of the tree, and we can
cheaply avoid descending into trees that haven't changed because we can
tell that the corresponding tree objects in the pre- and post- trees
have the same object name" optimization. But we cannot avoid it.

> Transitioning to that would be something like:
>
>   0. Overhaul all of the git code to handle arbitrary-sized object ids.
>
>   1. Decide on the new algorithm and implement it in git.
>
>   2. Recognize parameterized object ids in commits and tags (designing
>      format, implementing the reading side).
>
>   3. Recognize parameterized object ids somehow in trees (designing
>      format, implementing the reading side).
>
>   4. Teach the object database to index objects by the new algorithm (or
>      possibly both algorithms).
>
>   5. Add a protocol extension so that both sides can decide which
>      algorithm is being used when they talk about oids.
>
>   6. Add a config option to write references in objects using the new
>      algorithm.
>
>   7. After a while, flip the config option on. Hopefully the readers
>      from steps 1-5 have percolated to the masses by then, and it's not
>      a horrible flag day.
>
> We're basically on step 0 right now. I'm sure I'm missing some
> subtleties in there, too.

One subtlety is that the "not a flag day" in step 7 may not be a good
thing. There has to be a section of the history that spans the
transition: a set of commits and trees that have pointers to both kinds
of object names. The narrower that section of the history is, the more
pleasant the result of the transition would be to use.

That different projects can have their own flag days at their own pace
is a good thing, though, so the above observation does not invalidate
your transition plan.
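Illustration (not part of the original message): a minimal Python sketch of the optimization described above, under loud assumptions. The toy tree model (nested dicts), the helpers `tree_id`, `tree_name` and `diff_trees`, and the sample trees are all hypothetical; this is not git's tree format or its diff code. It only shows why equal object names let a diff skip a subtree, and why names computed with two different algorithms never allow that shortcut.

```python
import hashlib

def tree_id(tree, algo):
    """Hash a toy tree (dict: name -> bytes or nested dict) bottom-up."""
    h = hashlib.new(algo)
    for name in sorted(tree):
        child = tree[name]
        if isinstance(child, dict):
            child_id = tree_id(child, algo)
        else:
            child_id = hashlib.new(algo, child).hexdigest()
        h.update(f"{name}\0{child_id}\n".encode())
    return h.hexdigest()

def tree_name(tree, algo):
    """A parameterized object name: (algorithm, hex digest)."""
    return (algo, tree_id(tree, algo))

def diff_trees(old, new, old_algo, new_algo, prefix="", visited=None):
    """Return (changed paths, list of trees we had to descend into)."""
    visited = [] if visited is None else visited
    visited.append(prefix or "/")
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        path = prefix + name
        if isinstance(a, dict) and isinstance(b, dict):
            # Equal names mean equal content, so the subtree can be skipped.
            # Names from different algorithms never compare equal, even
            # when the content underneath is identical.
            if tree_name(a, old_algo) == tree_name(b, new_algo):
                continue
            sub_changed, _ = diff_trees(a, b, old_algo, new_algo,
                                        path + "/", visited)
            changed.extend(sub_changed)
        elif a != b:
            changed.append(path)
    return changed, visited

# Hypothetical sample data: only README differs between the two trees.
old_tree = {"README": b"hi", "src": {"a.c": b"int a;"}, "docs": {"x.txt": b"x"}}
new_tree = {"README": b"hi!", "src": {"a.c": b"int a;"}, "docs": {"x.txt": b"x"}}
```

With both sides named by the same algorithm, `diff_trees(old_tree, new_tree, "sha1", "sha1")` visits only the root. With `"sha1"` on one side and `"sha256"` on the other, it has to walk into `docs/` and `src/` as well, even though nothing in them changed. A local mapping between old and new names, as mentioned elsewhere in the thread, would be one way to get the shortcut back.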
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 07:15:34PM -0400, David Turner wrote:

> It would be possible, of course, to GPG-sign the entire commit's
> transitive data (rather than just the SHA1s of same). But as far as I
> know, that is not ever what is done.

There is a project called git-evtag which does this, and you can find
mention on the list. The problem is just that it's not very efficient.
That's maybe OK for tag-signing, which is relatively rare. It wouldn't
really work for commit-signing.

-Peff
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 03:38:04PM -0700, H. Peter Anvin wrote:

> For existing repositories we will need to have a migration mechanism.
> Since we can't modify objects without completely invalidating the
> cryptographic properties, what I would suggest is that we leave the
> existing objects as is, with a persistent lookup table from SHA-1 to
> <new hash>, and have that lookup table signed (e.g. GPG) by the person
> responsible for converting the repository. This freezes the
> cryptographic status of the existing SHA-1 objects at the time the
> conversion happens. This is a very good reason to do this before SHA-1
> is actually broken. In contrast, SHA-2 has been surprisingly resistant
> to cryptanalysis, to the point that SHA-3 was motivated by performance
> and the desire to have a well-tested function based on entirely
> different principles should a generic attack against the common
> structure of MD5/SHA-1/SHA-2 ever be found.

There are a few threads in the list archive discussing options, if you
search.

A conversion table like you mention seems like a "step 2". I think the
first step is figuring out what the new format looks like, and how
objects refer to each other.

The absolute simplest thing that could work is literally replacing sha1
with a 160-bit truncation of sha-256, telling everybody to convert their
repos, and accepting that existing gpg signatures and external sha1
references are all obsolete. Old versions of git are obsolete, but the
code changes are very minor.

That sucks for a lot of reasons, obviously.

So a slightly nicer thing is to parameterize the algorithm for every
object name reference. So commits look like:

  tree sha256:1234abcd...
  parent sha256:1234abcd...

and so on. Of course trees don't have any space for this; they have a
fixed length for the hash part of each record, which is basically:

  <mode> <name> NUL <20-byte-sha1>

So we'd probably need a "treev2" object type that gives room for an
algorithm byte (or we'd have to try to shove it into the mode, but since
old versions won't know the new algorithm anyway, I don't think it
solves that much...). Or you can just define for the whole tree object
(either implicit in its type, or in a header) that it always uses
algorithm X.

And then the "new" objects can refer to the older sha1 objects directly
(either via "sha1:1234abcd", or we'd probably define a parameter-less
reference to mean "sha1:"), and that essentially grafts the old history
to the new. You can always walk the old history. And because we're
really talking about collision attacks and not pre-image attacks, it
probably remains fairly trustworthy for chaining (because nobody is
making _new_ objects and referring to them via sha1).

And then if you buy into the collision vs pre-image thing above, there's
not much point in caring about the mapping between sha1 and the new
algorithm. The old ones are set in stone and probably fine. You might
want such a mapping for performance (e.g., so that you can immediately
tell that an old sha-1 tree and a new sha-2 tree have an empty diff,
even though they have different ids), but that's purely a local thing.

So perhaps you were thinking of something in between, or an alternative
plan altogether. I haven't been able to think of a scheme that is
secure, convenient, and involves less work than the one above.

Transitioning to that would be something like:

  0. Overhaul all of the git code to handle arbitrary-sized object ids.

  1. Decide on the new algorithm and implement it in git.

  2. Recognize parameterized object ids in commits and tags (designing
     format, implementing the reading side).

  3. Recognize parameterized object ids somehow in trees (designing
     format, implementing the reading side).

  4. Teach the object database to index objects by the new algorithm (or
     possibly both algorithms).

  5. Add a protocol extension so that both sides can decide which
     algorithm is being used when they talk about oids.

  6. Add a config option to write references in objects using the new
     algorithm.

  7. After a while, flip the config option on. Hopefully the readers
     from steps 1-5 have percolated to the masses by then, and it's not
     a horrible flag day.

We're basically on step 0 right now. I'm sure I'm missing some
subtleties in there, too.

Things get simpler if you don't fully parameterize (e.g., just assume
everything is moved to the new algorithm, and provide a "legacy" parent
pointer for connecting to sha1 history). But part of this would be
future-proofing for a day when sha-2 fails.

-Peff
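Illustration (not part of the original message): a small Python sketch of what reading the parameterized references proposed above might look like. The `algo:hex` spelling and the rule that a reference without a prefix means sha1 come from the message; the names `parse_oid`, `parse_commit_refs`, `HEX_LENGTHS` and `DEFAULT_ALGO`, and the choice to validate the hex length, are assumptions made only for this example. No such format was agreed on in the thread.

```python
# Hex digest length per algorithm (hypothetical table for this sketch).
HEX_LENGTHS = {"sha1": 40, "sha256": 64}
DEFAULT_ALGO = "sha1"  # a parameter-less reference is read as sha1
HEX_DIGITS = set("0123456789abcdef")

def parse_oid(text):
    """Split 'sha256:1234...' or a bare '1234...' into (algo, hex)."""
    algo, sep, hexdigest = text.partition(":")
    if not sep:
        algo, hexdigest = DEFAULT_ALGO, text
    expected = HEX_LENGTHS.get(algo)
    if expected is None:
        raise ValueError(f"unknown hash algorithm: {algo!r}")
    if len(hexdigest) != expected or not set(hexdigest) <= HEX_DIGITS:
        raise ValueError(f"malformed {algo} object id: {hexdigest!r}")
    return algo, hexdigest

def parse_commit_refs(raw):
    """Collect (field, algo, hex) for tree/parent lines in a commit header."""
    refs = []
    for line in raw.split("\n"):
        if not line:  # a blank line ends the header
            break
        key, _, value = line.partition(" ")
        if key in ("tree", "parent"):
            refs.append((key,) + parse_oid(value))
    return refs
```

A commit whose `tree` line uses `sha256:` and whose `parent` line is a bare 40-character name would come back as one sha256 reference and one sha1 reference, which is the "graft the old history onto the new" case described above.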
Re: Migrating away from SHA-1?
On Tue, 2016-04-12 at 16:00 -0700, Stefan Beller wrote:
> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin wrote:
> > OK, I'm going to open this can of worms...
> >
> > At what point do we migrate from SHA-1? At this point the
> > cryptanalysis of SHA-1 is most likely a matter of time.
>
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see
> http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
>
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

Here's my reasoning as to why the security of SHA1 matters:

If SHA-1 is not broken, and someone hacks into e.g. kernel.org, they
can't replace an arbitrary blob with anything else without being
detected by git's automatic checksumming of objects. GPG is necessary
here because otherwise the HEAD commit could be changed (to point to a
new tree that points to the new blob).

If SHA-1 is broken (in certain ways), someone *can* replace an arbitrary
blob. GPG does not help in this case, because the signature is over the
commit object (which points to a tree, which eventually points to the
blob), and the commit hasn't changed. So the GPG signature will still
verify.

It would be possible, of course, to GPG-sign the entire commit's
transitive data (rather than just the SHA1s of same). But as far as I
know, that is not ever what is done.

This is the argument for migration to a more-secure hash.
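Illustration (not part of the original message): a Python sketch of the hash chain this argument relies on. The object name computation (SHA-1 over a `<type> <size>` header, a NUL byte, and the body) is how git names objects; everything else here, including the helper names, the single-file tree, the `Makefile` path and the fixed author identity, is a simplified assumption for the example. It shows that the bytes a signature would cover contain only the tree's name, so the file content is protected only as long as the hash holds.

```python
import hashlib

def git_object_id(kind, body):
    """SHA-1 object name: hash of '<type> <size>' + NUL + body."""
    header = kind.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, hex id) entries; files only, sorted by name."""
    return b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid in sorted(entries, key=lambda e: e[1])
    )

def commit_body(tree_oid, message):
    """A minimal commit body with a fixed, made-up identity."""
    ident = "A U Thor <author@example.com> 0 +0000"
    text = f"tree {tree_oid}\nauthor {ident}\ncommitter {ident}\n\n{message}\n"
    return text.encode()

def commit_for(file_content):
    """Build blob -> tree -> commit for one file; return (commit bytes, id)."""
    blob = git_object_id("blob", file_content)
    tree = git_object_id("tree", tree_body([("100644", "Makefile", blob)]))
    body = commit_body(tree, "release v1.0")
    return body, git_object_id("commit", body)
```

Calling `commit_for` with two different file contents yields two different commit names, which is the detection described in the first case above. The returned commit bytes never include the file content itself, only a name for it, so a second blob with the same SHA-1 name would leave the commit, and any signature over it, unchanged.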
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 04:00:18PM -0700, Stefan Beller wrote:
> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin wrote:
> > OK, I'm going to open this can of worms...
> >
> > At what point do we migrate from SHA-1? At this point the
> > cryptanalysis of SHA-1 is most likely a matter of time.
>
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see
> http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
>
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

I have never understood that reasoning at all, nor why it is so often
repeated. The GPG signature is over a single object that mentions other
objects by their sha1 ids. But users don't care that v1.0 is securely
mapped to tree 1234abcd. They care which files are in 1234abcd, and if
sha1 is broken, it means you can't credibly verify the content down to
the blob level.

There's some additional protection in that git generally prefers objects
it already has to new ones. So it's hard to reliably distribute your
evil colliding object, depending on where people might have fetched from
first. But:

  1. I know there's at least one race[1] where a colliding object can
     still enter the repository. There may be more that have either
     existed all along, or that have grown over the years. I don't think
     this is something we've paid attention to and tested.

  2. That helps some people, I guess, but it's little consolation to
     somebody who runs "git clone" followed by verifying the tag.

-Peff

[1] The race I am thinking of is that for performance reasons, we don't
    re-scan the pack directory when index-pack checks has_sha1_file() on
    an incoming object and it comes up negative. So if somebody else is
    repacking, we might skip the collision check in such a case. At
    least that race is not under control of an attacker, though.
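Illustration (not part of the original message): a toy Python sketch of the "prefer the object we already have" protection and the collision check mentioned above. `ObjectStore`, `CollisionError`, `sha1_id` and the pluggable `id_fn` are hypothetical names for this example; this is not git's index-pack code. The `id_fn` parameter exists only so that a collision can be simulated without a real SHA-1 collision.

```python
import hashlib

class CollisionError(Exception):
    """Two different byte strings arrived under the same object id."""

def sha1_id(data):
    return hashlib.sha1(data).hexdigest()

class ObjectStore:
    def __init__(self, id_fn=sha1_id):
        self.id_fn = id_fn
        self.objects = {}

    def add(self, data):
        """Store incoming data, keeping any object we already have."""
        oid = self.id_fn(data)
        existing = self.objects.get(oid)
        if existing is None:
            # The lookup came up negative, so nothing is compared.
            self.objects[oid] = data
        elif existing != data:
            # Same id, different bytes: refuse the newcomer.
            raise CollisionError(oid)
        return oid
```

In this model the existing object always wins, and a colliding newcomer is rejected. The race in [1] corresponds to the lookup wrongly coming up negative for an object that is actually present: the comparison branch is then never reached, which is how a colliding object could slip in.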
Re: Migrating away from SHA-1?
On 04/12/16 16:00, Stefan Beller wrote:
> On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin wrote:
> > OK, I'm going to open this can of worms...
> >
> > At what point do we migrate from SHA-1? At this point the
> > cryptanalysis of SHA-1 is most likely a matter of time.
>
> And I thought the cryptographic properties of SHA1 did not matter for
> Git's use case.
> We could employ broken md5 or such as well.
> ( see
> http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
> )
> That is because security goes on top via gpg signing of tags/commits.
>
> I am not sure if anyone came up with
> a counter argument to Linus's reasoning there?

Not true, because what we are signing is a chain of SHA-1s; the
signature is meaningless unless the integrity of the hash chain is
inviolate.

> > For existing repositories we will need to have a migration mechanism.
> > Since we can't modify objects without completely invalidating the
> > cryptographic properties, what I would suggest is that we leave the
> > existing objects as is, with a persistent lookup table from SHA-1 to
> > <new hash>, and have that lookup table signed (e.g. GPG) by the
> > person responsible for converting the repository. This freezes the
> > cryptographic status of the existing SHA-1 objects at the time the
> > conversion happens. This is a very good reason to do this before
> > SHA-1 is actually broken. In contrast, SHA-2 has been surprisingly
> > resistant to cryptanalysis, to the point that SHA-3 was motivated by
> > performance and the desire to have a well-tested function based on
> > entirely different principles should a generic attack against the
> > common structure of MD5/SHA-1/SHA-2 ever be found.
>
> When the kernel moved from BitKeeper to Git, all history was thrown
> away, and started from scratch. The old history could be grafted into
> the repo, if you cared though.
> I'd propose to go that route again and use a sha1 graft history which
> you can optionally put into your new history for convenience.

That was done more for legal reasons than anything else, as far as I
understand. The userbase of git today is also much, much larger than the
userbase for BK ever was.

-hpa
Re: Migrating away from SHA-1?
On Tue, Apr 12, 2016 at 3:38 PM, H. Peter Anvin wrote:
> OK, I'm going to open this can of worms...
>
> At what point do we migrate from SHA-1? At this point the
> cryptanalysis of SHA-1 is most likely a matter of time.

And I thought the cryptographic properties of SHA1 did not matter for
Git's use case.
We could employ broken md5 or such as well.
( see
http://stackoverflow.com/questions/28792784/why-does-git-use-a-cryptographic-hash-function
)
That is because security goes on top via gpg signing of tags/commits.

I am not sure if anyone came up with
a counter argument to Linus's reasoning there?

>
> For existing repositories we will need to have a migration mechanism.
> Since we can't modify objects without completely invalidating the
> cryptographic properties, what I would suggest is that we leave the
> existing objects as is, with a persistent lookup table from SHA-1 to
> <new hash>, and have that lookup table signed (e.g. GPG) by the person
> responsible for converting the repository. This freezes the
> cryptographic status of the existing SHA-1 objects at the time the
> conversion happens. This is a very good reason to do this before SHA-1
> is actually broken. In contrast, SHA-2 has been surprisingly resistant
> to cryptanalysis, to the point that SHA-3 was motivated by performance
> and the desire to have a well-tested function based on entirely
> different principles should a generic attack against the common
> structure of MD5/SHA-1/SHA-2 ever be found.

When the kernel moved from BitKeeper to Git, all history was thrown
away, and started from scratch. The old history could be grafted into
the repo, if you cared though.
I'd propose to go that route again and use a sha1 graft history which
you can optionally put into your new history for convenience.

Stefan

>
> -hpa
Migrating away from SHA-1?
OK, I'm going to open this can of worms...

At what point do we migrate from SHA-1? At this point the cryptanalysis
of SHA-1 is most likely a matter of time.

For existing repositories we will need to have a migration mechanism.
Since we can't modify objects without completely invalidating the
cryptographic properties, what I would suggest is that we leave the
existing objects as is, with a persistent lookup table from SHA-1 to
<new hash>, and have that lookup table signed (e.g. GPG) by the person
responsible for converting the repository. This freezes the
cryptographic status of the existing SHA-1 objects at the time the
conversion happens. This is a very good reason to do this before SHA-1
is actually broken.

In contrast, SHA-2 has been surprisingly resistant to cryptanalysis, to
the point that SHA-3 was motivated by performance and the desire to have
a well-tested function based on entirely different principles should a
generic attack against the common structure of MD5/SHA-1/SHA-2 ever be
found.

-hpa
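Illustration (not part of the original message): a Python sketch of one way the suggested lookup table could be built and prepared for signing. The message does not say how the new name is computed or how the table is stored, so everything here is an assumption: the helpers `object_name`, `build_lookup_table` and `serialize_table` are hypothetical, SHA-256 stands in for the unspecified new hash, and the new name is simply taken over the same unchanged object bytes. The signing step itself (e.g. with GPG) is left out.

```python
import hashlib

def object_name(algo, kind, body):
    """Hash '<type> <size>' + NUL + body with the given algorithm."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()

def build_lookup_table(objects):
    """Map each existing SHA-1 name to a name under the new hash.

    `objects` is an iterable of (kind, body) pairs for objects that are
    left untouched in the repository.
    """
    return {
        object_name("sha1", kind, body): object_name("sha256", kind, body)
        for kind, body in objects
    }

def serialize_table(table):
    """Deterministic byte form of the table, suitable as input to a signer."""
    lines = (f"{old} {new}\n" for old, new in sorted(table.items()))
    return "".join(lines).encode()
```

Sorting the entries before serializing means that two people converting the same set of objects produce byte-identical tables, so a signature made over the output of `serialize_table` can be checked by anyone who rebuilds the table.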