Re: SHA1 hash safety
On Tue, Apr 19, 2005 at 06:48:57PM -0400, C. Scott Ananian wrote: > On Tue, 19 Apr 2005, David Meybohm wrote: > > >But doesn't this require assuming the distribution of MD5 is uniform, > >and don't the papers finding collisions in less show it's not? So, your > >birthday-argument for calculating the probability wouldn't apply, because > >it rests on the assumption MD5 is uniform, and it isn't. > > No, the collision papers don't show this at all. I didn't mean they showed it directly. I meant by finding collisions in MD5 quickly, MD5 would have to have some non-uniformity. But that's nevertheless wrong because uniformness and collision finding ability aren't related. Sorry to have wasted everyone's time. Dave - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Tue, 19 Apr 2005, David Meybohm wrote: But doesn't this require assuming the distribution of MD5 is uniform, and don't the papers finding collisions in less show it's not? So, your birthday-argument for calculating the probability wouldn't apply, because it rests on the assumption MD5 is uniform, and it isn't. No, the collision papers don't show this at all. --scott atomic strategic HBDRILL SARANAC COBRA JUDY Ft. Meade assassination politics Mossad HOPEFUL ZPSEMANTIC DTFROGS HTKEEPER LITEMPO LIONIZER operation ( http://cscott.net/ ) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Mon, Apr 18, 2005 at 12:43:23AM -0700, Andy Isaacson wrote: > > I'm not going to do the sums, but I would hazard a guess that it's more > likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE > FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that > you actually discovered a collision with a measly million (or even > hundred million) plaintexts. But doesn't this require assuming the distribution of MD5 is uniform, and don't the papers finding collisions in less show it's not? So, your birthday-argument for calculating the probability wouldn't apply, because it rests on the assumption MD5 is uniform, and it isn't. For example, say most people are married in June, get pregnant, and there are more births around March, 9 months later, than in other months. Then if you are born in March you have a higher chance of seeing a collision of your birthday with someone else's. The same is true for someone else born in March too, and this makes the chances of seeing a collision for the whole function higher. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Mon, 18 Apr 2005, Andy Isaacson wrote: If you had actual evidence of a collision, I'd love to see it - even if it's just the equivalent of % md5 foo d3b07384d113edec49eaa6238ad5ff00 foo % md5 bar d3b07384d113edec49eaa6238ad5ff00 bar % cmp foo bar foo bar differ: byte 25, line 1 % But in the absence of actual evidence, we have to assume (just based on the probabilities) that there was some error in your testing. I've already had a long correspondence with this poster. He claims that "this happened 7 years ago", involved a "commercial contract covered by Swiss Banking Law" (with caps!) and that, of course, he "certainly doesn't retain [his] client's documents", and even if he *did*, he wouldn't show them to *me*. And then he was unable to comprehend that I couldn't accept his word alone as prima facie evidence that the laws of probability did not apply to him or his clients. I've been a coder far too long to attribute to "The Mysterious Hand Of God" what can adequately be described by subtle programmer error. The most reasonable explanation, given the (lack of) evidence, is that the programmer involved quickly took refuge in a (wildly improbable, but his clients'll never know) "MD5 collision" instead of buckling down and finding the bug in his code. --scott ODOATH Ortega FBI SGUAT AEBARMAN India Peking ODACID operation RYBAT [Hello to all my fans in domestic surveillance] for Dummies KUCLUB ( http://cscott.net/ ) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sun, 17 Apr 2005, Horst von Brand wrote: crypto-babble about collision whitepapers is uninteresting without a repo that has real collisions. git is far too cool as is - prove I Just copy over a file (might be the first step in splitting it, or a header file that is duplicated for convenience, ...) This is not a collision. This is a *feature*. --scott payment UKUSA ODOATH AVBLIMP ESSENCE JUBILIST ASW AK-47 CABOUNCE Ortega PBPRIME North Korea anthrax Milosevic bomb Soviet QKFLOWAGE Yeltsin ( http://cscott.net/ ) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
[trimmed cc list, nobody wants to read this noise] On Sat, Apr 16, 2005 at 11:35:39PM +0200, Brian O'Mahoney wrote: > >> (1) I _have_ seen real-life collisions with MD5, in the context of > >>Document management systems containing ~10^6 ms-WORD documents. > > > > Dude! You could have been *famous*! Why the > > aitch-ee-double-hockey-sticks didn't you publish this when you found it? > > Seriously, man. > > The MD5 has was fine, or at least the code (a) produced the correct > results on the published test cases, and, (b) was properly applied to > all bytes of the file(s). I was surprised when it happened, which is why > I bothered to post to this list at this time, so I make two more points OK, I guess it's time for some remedial math. There are 2^128 = 340282366920938463463374607431768211456 different MD5 hashes. You are suggesting that you found a collision using ~1e6 = ~1,000,000 plaintexts. Let's suppose there were actually 100,000,000 = 1e8 plaintexts, just in case you underestimated the number. Applying the birthday paradox, we have a 50% probability that you'd find one collision if there were ~7,213,475,309,916,173 possible hash values. If you extend the birthday argument ("what is the probability of a collision if you take N samples from a set of size M?") you get the following results, with N = 1e8: 50% (1 in 2) probability of collision in 7,213,475,309,916,173. 1% (1 in 100) probability of collision in497,496,027,172,833,194. .05% (1 in 1845) probability of collision in 9,223,372,036,854,775,806. That's where my quick-and-dirty solver craps out, but we're still a really long ways from 340,282,366,920,938,463,463,374,607,431,768,211,456. A simple linear extrapolation (certainly wrong, but not by more than a few dozen orders of magnitude) says that the odds would be 1 in 68,056,473,384,187,692,692 for the full MD5 hash (I'm not even going to dignify that with a percentage). I'm not going to do the sums, but I would hazard a guess that it's more likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that you actually discovered a collision with a measly million (or even hundred million) plaintexts. (Of course, I don't know how many tests of the hash you actually did. But the point stands.) Hell, if you're *that* lucky, what are you doing in IT? You could be making a killing at the roulette table. Or, even more likely, there was some other factor in the system (most likely that it was only using a few bits, probably 32, of the hash when looking for collisions) that resulted in a false alarm. If you had actual evidence of a collision, I'd love to see it - even if it's just the equivalent of % md5 foo d3b07384d113edec49eaa6238ad5ff00 foo % md5 bar d3b07384d113edec49eaa6238ad5ff00 bar % cmp foo bar foo bar differ: byte 25, line 1 % But in the absence of actual evidence, we have to assume (just based on the probabilities) that there was some error in your testing. -andy - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sun, Apr 17, 2005 at 12:38:37AM -0400, David A. Wheeler wrote: > The probability of an accidental overlap for SHA-1 for two > different files is absurdly remote; it's just not worth worrying about. > > However, the possibility of an INTENTIONAL overlap is a completely > different matter. I think the hash algorithm should change in the > future; I have a proposal below. > > Someone has ALREADY broken into a server to modify the Linux kernel > code already, so the idea of an attack on kernel code > is not an idle fantasy. MD5 is dead, and SHA-1's work factor has > already been sufficiently broken that people have already been told > "walk to the exits" (i.e., DO NOT USE SHA-1 for new programs like git). We're very clearly going to need a FAQ for git. SHA-1's work factor has been decreased to 2**69 from 2**80 for generating two messages that have the same hash value, WHERE THE HASH VALUE AND THE MESSAGES ARE NOT UNDER THE ATTACKER'S CONTROL. This is not the same as a pre-image attack, where given a message M1 which hashes to value H, the attacker can find another message M2 which also hashes to value H. In even if the attacker can do this, the result has to have valid git metadata format, and also be valid C code. So the the recent result which has weakened (but not broken) SHA-1's use in digital signatures, and which has resulted in the advice to "walk not run" for the exits, do not apply to git. Can we guarantee that there won't be further innovations that may break SHA-1? Of course not. But an attacker who wants to introduced a trojan into the Linux kernel would have a much easier time doing a "black bag job" --- i.e., breaking into Linus's house in Portland, and then inserting a buggered patch into his master source tree. If you're going to be a professional paranoid, it's best to worry about the realistic attacks before stressing out over the unrealistic ones. - Ted - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
Linus wants to drive ahead, and ignore the collision issue for now, and has been dismissive of the risks, he wants a result not heart searching, and the list comments exhibit a confusion with the engineering problem of avoiding accidental collisions v deliberate sabotage. Since this is not a show-stopper, and getting the BK replacement in place is time critical, and if you look at the code it is easy to extend the content key, LET US just leave this issue for now. Horst von Brand wrote: > [EMAIL PROTECTED] said: > > [...] > > >>Linus has already weighed in that he doesn't give a crap. All the >>crypto-babble about collision whitepapers is uninteresting without a >>repo that has real collisions. git is far too cool as is - prove I >>should be concerned. > > > Just copy over a file (might be the first step in splitting it, or a > header file that is duplicated for convenience, ...) -- mit freundlichen Grüßen, Brian. Dr. Brian O'Mahoney Mobile +41 (0)79 334 8035 Email: [EMAIL PROTECTED] Bleicherstrasse 25, CH-8953 Dietikon, Switzerland PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D F5 14 39 C9 6D 38 56 D5 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
[EMAIL PROTECTED] said: [...] > Linus has already weighed in that he doesn't give a crap. All the > crypto-babble about collision whitepapers is uninteresting without a > repo that has real collisions. git is far too cool as is - prove I > should be concerned. Just copy over a file (might be the first step in splitting it, or a header file that is duplicated for convenience, ...) -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, ChileFax: +56 32 797513 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
I have nothing further to contribute to this subtopic. Good luck with it. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
> "Tkil" == Tkil <[EMAIL PROTECTED]> writes: Tkil> but the chance of any collision at all wigs me out. > "Paul" == Paul Jackson <[EMAIL PROTECTED]> writes: Paul> Guess you're just going to get wigged out then. Wig wig. :) I didn't mean "wigs me out to the point I won't use it" but more of "wigs me out so that I'm curious whether there are backup schemes worth considering". In particular, the comparisons between hash collisions and hardware failure seem contrived -- if I have bad RAM, or a bad block on my HD, I can recover it from known good sources. But if the actual known good source is structured in such a way that a particular set of data cannot be represented, that bothers me. In this case, the fact that it has to be the same length, same SHA-1, correct C, and functionally similar C at that, makes for a comforting cushion. Further, git wouldn't be the only representation; there would be periodic tarballs, different trees, etc. On the other paw, if "effectively random" MS Word docs gave true MD5 collisions (when we have a proper MD5 hash computed over the entire document) in a "mere" 1e7 space, that is interesting/scary. (I was also trying to add a few factoids to the MSW comment, as their structure could lead to collisions if (say) only the first 512 bytes were considered -- it's possible that nothing but size and date might change in that, and /those/ I can see colliding in 1e7 documents.) Finally, I apologize for taking your time. I'm just watching this from the sidelines, and the questions above are just intellectual curiosity. :-/ (The only other thread I'm really following is people trying to chunk files in a way that would increase storage efficiency; reading the Venti paper, I was wondering how efficient it would be if a one-byte addition at the top of the file would generate all-new blocks, while the rsync-ish protocol seems to offer substantial relief. But if the "interesting history" fits in 10USD worth of HD, that might be enough. Babble.) Thanks, t. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
Paul Jackson wrote: what I'm talking about is the chance that somewhere, sometime there will be two different documents that end up with the same hash I have vastly greater chance of a file colliding due to hardware or software glitch than a random message digest collision of two legitimate documents. The probability of an accidental overlap for SHA-1 for two different files is absurdly remote; it's just not worth worrying about. However, the possibility of an INTENTIONAL overlap is a completely different matter. I think the hash algorithm should change in the future; I have a proposal below. Someone has ALREADY broken into a server to modify the Linux kernel code already, so the idea of an attack on kernel code is not an idle fantasy. MD5 is dead, and SHA-1's work factor has already been sufficiently broken that people have already been told "walk to the exits" (i.e., DO NOT USE SHA-1 for new programs like git). The fact that blobs are compressed first, with a length header in front, _may_ make it harder to attack. But maybe not. I haven't checked for this case, but most decompression algorithms I know of have a "don't change" mode that essentially just copies the data behind it. If the one used in git has such a mode (I bet it does!), an attacker could use that mode to make it MUCH easier to create an attack vector than it would appear at first. Now the attacker just needs to create a collision (hmmm, where was that paper?). Remember, you don't need to run a hash algorithm over an entire file; you can precompute to near the end, and then try your iterations from there. A little hardware (inc. FPGAs) would speed the attack. Of course, that assumes you actually check everything to make sure that an attacker can't slip in something different. After each rsync, are all new files' hash values checked? Do they uncompress to right length? Do they have excess data after the decompression? I'm hoping that sort of input-checking (since the data might be from an attacker, if indirectly!) is already going on, though I haven't reviewed the git source code. While the jury's still out, the current belief by most folks I talk to is that SHA-1 variants with more bits, such as SHA-256, are the way to go now. The SHA-1 attack simply reduces the work factor (it's not a COMPLETE break), so adding more bits is believed to increase the work factor enough to counter it. Adding more information to the hash can make attacking even harder. Here's one idea: whenever that hash algorithm switch occurs, create a new "hash" value as this: SHA-256 "+" uncompressed-length Where SHA-256 is computed just like SHA-1 is now, e.g., SHA-256(file) where file = typecode + length + compressed data. Leave the internal format as-is (with the length embedded as well). This means that an attacker has to come up with an attack that creates the same length uncompressed, yet has the same hash of the compressed result. That's harder to do. Length is also really, really cheap to compute :-). That also might help the convince the "what happens if there's an accidental collision" crowd: now, if the file lengths are different, you're GUARANTEED that the hash values are different, though that's not the best reason to do that. One reason to think about switching sooner rather than later is that it'd be really nice if the object store also included signatures, so that in one fell swoop you could check who signed what (and thus you could later on CONFIRM with much more certainty who REALLY submitted a given change... say if it was clearly malicious). If you switch hash algorithms, the signatures might not work, depending on how you do it. --- David A. Wheeler - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
> but the chance of any collision at all wigs me out. Guess you're just going to get wigged out then. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
> "Brian" == Brian O'Mahoney <[EMAIL PROTECTED]> writes: Brian> (1) I _have_ seen real-life collisions with MD5, in the context Brian> of Document management systems containing ~10^6 ms-WORD Brian> documents. Was this whole-document based, or was it blocked or otherwise chunked? I'm wondering, because (SFAIK) the MS word on-disk format is some serialized version of one or more containers, possibly nested. If you're blocks are sized so that the first block is the same across multiple files, this could cause collisions -- but they're the good kind, that allow us to save disk space, so they're not a problem. Are you saying that, within 1e7 documents, that you found two documents with the same MD5 hash yet different contents? That's not an accusation, btw; I'm just trying to get clarity on the terminology. I'm fascinated by the idea of using this sort of content-addressable filesystem, but the chance of any collision at all wigs me out. I look at the probabilities, but still. Thanks, t. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
Hi! > We've already computed the chances of a random pure hash collision > with SHA1 - it's something like an average of 1 collision every > 10 billion years if we have 10,000 coders generating 1 new file > version every minute, non-stop, 24 hours a day, 365 days a year. GIT is safe even for the millions of monkeys writing Shakespeare :-) Have a nice fortnight -- Martin `MJ' Mares <[EMAIL PROTECTED]> http://atrey.karlin.mff.cuni.cz/~mj/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Homo homini lupus, frater fratri lupior, bohemus bohemo lupissimus. - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
> sysadmins realize that there are an infinante number of files that map to Sysadmins know that there are an infinite ways for their systems to crap out, and try to cover for the ones that there is a snow balls chance in Hades of them seeing in their lifetime. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
> what I'm talking about is the chance that somewhere, sometime there will > be two different documents that end up with the same hash I have vastly greater chance of a file colliding due to hardware or software glitch than a random message digest collision of two legitimate documents. I've lost quite a few files in 25 years of computing to just such glitches, sometimes without knowing it until months or years later. We've already computed the chances of a random pure hash collision with SHA1 - it's something like an average of 1 collision every 10 billion years if we have 10,000 coders generating 1 new file version every minute, non-stop, 24 hours a day, 365 days a year. Get real. There are _many_ sources of random error in our tools. When some sources are billions of billions times more likely to occur, it makes sense to worry about them first. Reminds me of the drunk looking under the lamp post for the house keys he dropped - because that's where the light is. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: SHA1 hash safety
On Sat, 16 Apr 2005, C. Scott Ananian wrote: Date: Sat, 16 Apr 2005 11:36:28 -0400 (EDT) From: C. Scott Ananian <[EMAIL PROTECTED]> To: Petr Baudis <[EMAIL PROTECTED]> Cc: [EMAIL PROTECTED], David Lang <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>, git@vger.kernel.org Subject: Re: Re: SHA1 hash safety On Sat, 16 Apr 2005, Petr Baudis wrote: I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. http://cryptography.hyperlink.cz/MD5_collisions.html OK, OK, I spoke too sloppily. Let me rephrase: It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions *IN ARBITRARILY CHOSEN DOCUMENTS* are likely. I could add, "WITHOUT SPECIAL EFFORT BY AN ATTACKER". you are missing the point. I'm not talking about takeing one document (sched.c) and finding a replacement that can drop in without being noticed. what I'm talking about is the chance that somewhere, sometime there will be two different documents that end up with the same hash what git is doing (in very crude sysadminish terms) is to take all the files you care about, move them into a new directory where they are named by their hash with a symlink that replaces the origional file (and then a bunch of stuff to manage multiple versions of those symlinks) if you are taking every file that you ever care about and loosing all refrence to it except by it's hash then when you get a second file that has the same hash you loose the contents of one of the two files (race condition over which file gets written into the storage directory last) anywhere else that hashing algorithms are used people realize that there will be hash collisions and plan accordingly, however people tend to put blinders on when you say SHA1 or MD5 and decide that somehow the same thing cannot happen with these types of hashes. they can, and eventually they will so you need to plan accordingly. David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
that's the difference between CS researchers and sysadmins. sysadmins realize that there are an infinante number of files that map to the same hash value and plan accordingly (becouse we KNOW we will run across them eventually), and don't see it as a big deal when we finally do. CS researches quote statistics that show how hard it is to intentiallly create two files with the same hash and insist it just doesn't happen until presented by the proof, at which point it is a big deal. a difference in viewpoints. David Lang On Sat, 16 Apr 2005, C. Scott Ananian wrote: Date: Sat, 16 Apr 2005 10:58:15 -0400 (EDT) From: C. Scott Ananian <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: David Lang <[EMAIL PROTECTED]>, Ingo Molnar <[EMAIL PROTECTED]>, git@vger.kernel.org Subject: Re: SHA1 hash safety On Sat, 16 Apr 2005, Brian O'Mahoney wrote: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. Dude! You could have been *famous*! Why the aitch-ee-double-hockey-sticks didn't you publish this when you found it? Seriously, man. Even given the known weaknesses in MD5, it would take much more than a million documents to find MD5 collisions. I can only conclude that the hash was being used incorrectly; most likely truncated (my wild-ass guess would be to 32 bits; a collision is likely with > 50% probability in a million document store for a hash of less than 40 bits). I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. I believe there are only two or so known to exist so far, and those were found by a research team in China (which, yes, is fairly famous among the cryptographic community now after publishing a paper consisting of little apart from the two collisions themselves). Please, let's talk about hash collisions responsibly. I posted earlier about the *actual computed probability* of finding two files with an SHA-1 collision before the sun goes supernova. It's 10^28 to 1 against. The recent cryptographic works has shown that there are certain situations where a decent amount of computer work (2^69 operations) can produce two sequences with the same hash, but these sequences are not freely chosen; they've got very specific structure. This attack does not apply to (effectively) random files sitting in a SCM. http://www.schneier.com/blog/archives/2005/02/sha1_broken.html That said, Linux's widespread use means that it may not be unimaginable for an attacker to devote this amount of resources to an attack, which would probably involve first committing some specially structured file to the SCM (but would Linus accept it?) and then silently corrupting said file via a SHA1 collision to toggle some bits (which would presumably Do Evil). Thus hashes other than SHA1 really ought to be considered... ...but the cryptographic community has not yet come to a conclusion on what the replacement ought to be. These attacks are so new that we don't really understand what it is about the structure of SHA1 which makes them possible, which makes it hard to determine which other hashes are similarly vulnerable. It will take time. I believe Linus has already stated on this list that his plan is to eventually provide a tool for bulk migration of an existing SHA1 git repository to a new hash type. Basically munging through the repository in bulk, replacing all the hashes. This seems a perfectly adequate strategy at the moment. --scott WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing ( http://cscott.net/ ) -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sat, 16 Apr 2005, Brian O'Mahoney wrote: Three points: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. (2) The HMAC (ethernet-harware-address) of any interface _should_ help to make a unique Id. you want a unique ID that can be computed directly from the file contents. what file integrety programa (ala tripwire) do is to use multiple identification routines (aide uses MD4+MD5+filesize IIRC) David Lang wrote: On Sat, 16 Apr 2005, Ingo Molnar wrote: * David Lang <[EMAIL PROTECTED]> wrote: this issue was raised a few days ago in the context of someone tampering with the files and it was decided that the extra checks were good enough to prevent this (at least for now), but what about accidental collisions? if I am understanding things right the objects get saved in the filesystem in filenames that are the SHA1 hash. of two legitimate files have the same hash I don't see any way for both of them to exist. yes the risk of any two files having the same has is low, but in the earlier thread someone chimed in and said that they had two files on their system that had the same hash.. you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision checking (disabled currently). If there indeed exist two files that have different content but the same hash, could someone send those two files? remember that the flap over SHA1 being 'broken' a couple weeks ago was not from researchers finding multiple files with the same hash, but finding that it was more likly then expected that files would have the same hash. there was qa discussion on LKML within the last year about useing MD5 hashes for identifying unique filesystem blocks (with the idea of being able to merge identical blocks) and in that discussion it was pointed out that collisions are a known real-life issue. so if collision detection is turned on in git, does that make it error out if it runs into a second file with the same hash, or does it do something else? David Lang -- Brian -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
Please see below: C. Scott Ananian wrote: > On Sat, 16 Apr 2005, Brian O'Mahoney wrote: > >> (1) I _have_ seen real-life collisions with MD5, in the context of >>Document management systems containing ~10^6 ms-WORD documents. > > > Dude! You could have been *famous*! Why the > aitch-ee-double-hockey-sticks didn't you publish this when you found it? > Seriously, man. The MD5 has was fine, or at least the code (a) produced the correct results on the published test cases, and, (b) was properly applied to all bytes of the file(s). I was surprised when it happened, which is why I bothered to post to this list at this time, so I make two more points (1) These hashes were designed, to assist in the construction of digital signatures, ie so it is hard to produce a document to hash to a known hash value, and that with a defined document format so they are designed (i) hash similar documents far apart, and (ii) be hard to reverse; it says nothing about naturally ocurring collisions, ie where the document is not constrained to be similar, > > Even given the known weaknesses in MD5, it would take much more than a > million documents to find MD5 collisions. I can only conclude that the > hash was being used incorrectly; most likely truncated (my wild-ass > guess would be to 32 bits; a collision is likely with > 50% probability > in a million document store for a hash of less than 40 bits). > > I know the current state of the art here. It's going to take more than > just hearsay to convince me that full 128-bit MD5 collisions are likely. > I believe there are only two or so known to exist so far, and those were > found by a research team in China (which, yes, is fairly famous among > the cryptographic community now after publishing a paper consisting of > little apart from the two collisions themselves). (2) I am not concerned with cryptography here, merely sound engineering tradeoffs and the avoidance of _pain_in_the_ass_ when we do see a random collision, [NB the 2^69 is to 'cause a collision in SHA1' not the odds against such a collision] ... (below) > > Please, let's talk about hash collisions responsibly. I posted earlier > about the *actual computed probability* of finding two files with an > SHA-1 collision before the sun goes supernova. It's 10^28 to 1 against. > The recent cryptographic works has shown that there are certain > situations where a decent amount of computer work (2^69 operations) can > produce two sequences with the same hash, but these sequences are not > freely chosen; they've got very specific structure. This attack does > not apply to (effectively) random files sitting in a SCM. > http://www.schneier.com/blog/archives/2005/02/sha1_broken.html > > That said, Linux's widespread use means that it may not be unimaginable > for an attacker to devote this amount of resources to an attack, which > would probably involve first committing some specially structured file > to the SCM (but would Linus accept it?) and then silently corrupting > said file via a SHA1 collision to toggle some bits (which would > presumably Do Evil). Thus hashes other than SHA1 really ought to be > considered... > > ..but the cryptographic community has not yet come to a conclusion on > what the replacement ought to be. These attacks are so new that we > don't really understand what it is about the structure of SHA1 which > makes them possible, which makes it hard to determine which other hashes > are similarly vulnerable. It will take time. > > I believe Linus has already stated on this list that his plan is to > eventually provide a tool for bulk migration of an existing SHA1 git > repository to a new hash type. Basically munging through the > repository in bulk, replacing all the hashes. This seems a perfectly > adequate strategy at the moment. ... [I say again, the problem here is NOT forgery of hashes, though SCO like paranoia ...] ... but the hashes are a tiny part of the total space, even for trivial patches, so that, providing _NOW_ for a longer hash (and then why not use, say, SHA-256 for now as well) is prudent. We do not want to revisit the plumbing, in the next 3-10 years, for 16 bytes per hash. Finally I can do no more than quote Schneier: "SHA-1 has been broken. Not a reduced-round version. Not a simplified version. The real thing. ... It's time for us all to migrate away from SHA-1. Luckily, there are alternatives. The National Institute of Standards and Technology already has standards for longer -- and harder to break -- hash functions: SHA-224, SHA-256, SHA-384, and SHA-512. They're already government standards, and can already be used." and there are FOSS implementations. Or, put more simply by Jon Callas, PGP's CTO: "It's time to walk, but not run, to the fire exits. You don't see smoke, but the fire alarms have gone off." That's basically what he said last August [2004]. > --scott > > WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY > geneti
Re: SHA1 hash safety
Scott wrote: > Please, let's talk about hash collisions responsibly. Agreed. Chasing down links from the one Petr provided: http://cryptography.hyperlink.cz/MD5_collisions.html the best read I found was: MD5 To Be Considered Harmful Someday http://eprint.iacr.org/2004/357.pdf As the author, Dan Kaminsky, states: > it is far too easy to overestimate the risks described in this paper. This paper does a good job of explaining the vulnerabilities that MD5 has, currently (and yes, git uses SHA1 ...). We have far greater vulnerabilities from intentional or accidental coding errors, inadequately audited code, root exploits of user (non-kernel) code, compilation and build tools, unreliable hardware (how many of us use non-ECC memory - I do), poorly administered systems, ... -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <[EMAIL PROTECTED]> 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sat, Apr 16, 2005 at 10:58:15AM -0400, C. Scott Ananian wrote: > Even given the known weaknesses in MD5, it would take much more than a > million documents to find MD5 collisions. I can only conclude that the > hash was being used incorrectly; most likely truncated (my wild-ass guess > would be to 32 bits; a collision is likely with > 50% probability in a > million document store for a hash of less than 40 bits). I've also seen non thread-safe GUID generation, using MD5m hit collisions: but of course that was due to the fact that the code had thread safety issues, not because anyone actually ever hit a MD5 collision... Of course there are constructed cases of MD5 collision, but those are pretty disinteresting. Give me two files that have useful content and the same hash, and then I'll be impressed. Linus has already weighed in that he doesn't give a crap. All the crypto-babble about collision whitepapers is uninteresting without a repo that has real collisions. git is far too cool as is - prove I should be concerned. -- Ross Vandegrift [EMAIL PROTECTED] "The good Christian should beware of mathematicians, and all those who make empty prophecies. The danger already exists that the mathematicians have made a covenant with the devil to darken the spirit and to confine man in the bonds of Hell." --St. Augustine, De Genesi ad Litteram, Book II, xviii, 37 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: SHA1 hash safety
On Sat, 16 Apr 2005, Petr Baudis wrote: I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. http://cryptography.hyperlink.cz/MD5_collisions.html OK, OK, I spoke too sloppily. Let me rephrase: It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions *IN ARBITRARILY CHOSEN DOCUMENTS* are likely. I could add, "WITHOUT SPECIAL EFFORT BY AN ATTACKER". But you're right, I was too busy thrashing around with the basic probability cluestick to carefully distinguish MD5 (in which *collisions* can be found fairly easily now by an attacker, although not *preimages*) and SHA1 (which is what git is actually using, and still requires 2^69 hash computations to collide). And note again that these are not preimage attacks. Even with MD5, an attacker can't arbitrarily change existing code in the Linux kernel by creating a malicious file with the same MD5 hash. But extreme caution is necessary, because both of these hash mechanisms have been shown to be weak, and algorithms grow weaker with time, not stronger. I think the only conclusion that can be made is that "one should not rely on the hash for security". And I don't believe that we are. We should be careful to continue saying "branch 46f *in Linus' tree*" instead of just "branch 46f" and assuming that that is unique. The security is provided by Linus' control over his repository, not by the hash. --scott [The 'MD5 collisions in 15 minutes on a laptop' paper did surprise me. I vaguely remember hearing about this before, but I'd forgotten just how broken MD5 is. It's still a fine *hash* function; just not a terribly good *cryptographically secure* hash function.] Israel PBSUCCESS $400 million in gold bullion President Nader jihad RNC LPMEDLEY agent HTKEEPER Cheney SEQUIN SARANAC Clinton biowarfare ( http://cscott.net/ ) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: SHA1 hash safety
Dear diary, on Sat, Apr 16, 2005 at 04:58:15PM CEST, I got a letter where "C. Scott Ananian" <[EMAIL PROTECTED]> told me that... > On Sat, 16 Apr 2005, Brian O'Mahoney wrote: > > >(1) I _have_ seen real-life collisions with MD5, in the context of > > Document management systems containing ~10^6 ms-WORD documents. > > Dude! You could have been *famous*! Why the > aitch-ee-double-hockey-sticks didn't you publish this when you found it? > Seriously, man. > > Even given the known weaknesses in MD5, it would take much more than a > million documents to find MD5 collisions. I can only conclude that the > hash was being used incorrectly; most likely truncated (my wild-ass guess > would be to 32 bits; a collision is likely with > 50% probability in a > million document store for a hash of less than 40 bits). > > I know the current state of the art here. It's going to take more than > just hearsay to convince me that full 128-bit MD5 collisions are likely. > I believe there are only two or so known to exist so far, and those were > found by a research team in China (which, yes, is fairly famous among the > cryptographic community now after publishing a paper consisting of little > apart from the two collisions themselves). http://cryptography.hyperlink.cz/MD5_collisions.html -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sat, 16 Apr 2005, Brian O'Mahoney wrote: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. Dude! You could have been *famous*! Why the aitch-ee-double-hockey-sticks didn't you publish this when you found it? Seriously, man. Even given the known weaknesses in MD5, it would take much more than a million documents to find MD5 collisions. I can only conclude that the hash was being used incorrectly; most likely truncated (my wild-ass guess would be to 32 bits; a collision is likely with > 50% probability in a million document store for a hash of less than 40 bits). I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. I believe there are only two or so known to exist so far, and those were found by a research team in China (which, yes, is fairly famous among the cryptographic community now after publishing a paper consisting of little apart from the two collisions themselves). Please, let's talk about hash collisions responsibly. I posted earlier about the *actual computed probability* of finding two files with an SHA-1 collision before the sun goes supernova. It's 10^28 to 1 against. The recent cryptographic works has shown that there are certain situations where a decent amount of computer work (2^69 operations) can produce two sequences with the same hash, but these sequences are not freely chosen; they've got very specific structure. This attack does not apply to (effectively) random files sitting in a SCM. http://www.schneier.com/blog/archives/2005/02/sha1_broken.html That said, Linux's widespread use means that it may not be unimaginable for an attacker to devote this amount of resources to an attack, which would probably involve first committing some specially structured file to the SCM (but would Linus accept it?) and then silently corrupting said file via a SHA1 collision to toggle some bits (which would presumably Do Evil). Thus hashes other than SHA1 really ought to be considered... ...but the cryptographic community has not yet come to a conclusion on what the replacement ought to be. These attacks are so new that we don't really understand what it is about the structure of SHA1 which makes them possible, which makes it hard to determine which other hashes are similarly vulnerable. It will take time. I believe Linus has already stated on this list that his plan is to eventually provide a tool for bulk migration of an existing SHA1 git repository to a new hash type. Basically munging through the repository in bulk, replacing all the hashes. This seems a perfectly adequate strategy at the moment. --scott WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing ( http://cscott.net/ ) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
Three points: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. (2) The HMAC (ethernet-harware-address) of any interface _should_ help to make a unique Id. (3) While I havn't looked at the details of the plumbing, this is the time to make sure we can, easily, drop in SHA-160, SHA-256 (or whatever comes from NIST) when needed. David Lang wrote: > On Sat, 16 Apr 2005, Ingo Molnar wrote: > >> * David Lang <[EMAIL PROTECTED]> wrote: >> >>> this issue was raised a few days ago in the context of someone >>> tampering with the files and it was decided that the extra checks were >>> good enough to prevent this (at least for now), but what about >>> accidental collisions? >>> >>> if I am understanding things right the objects get saved in the >>> filesystem in filenames that are the SHA1 hash. of two legitimate >>> files have the same hash I don't see any way for both of them to >>> exist. >>> >>> yes the risk of any two files having the same has is low, but in the >>> earlier thread someone chimed in and said that they had two files on >>> their system that had the same hash.. >> >> >> you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision >> checking (disabled currently). If there indeed exist two files that have >> different content but the same hash, could someone send those two files? > > > remember that the flap over SHA1 being 'broken' a couple weeks ago was > not from researchers finding multiple files with the same hash, but > finding that it was more likly then expected that files would have the > same hash. > > there was qa discussion on LKML within the last year about useing MD5 > hashes for identifying unique filesystem blocks (with the idea of being > able to merge identical blocks) and in that discussion it was pointed > out that collisions are a known real-life issue. > > so if collision detection is turned on in git, does that make it error > out if it runs into a second file with the same hash, or does it do > something else? > > David Lang > -- Brian - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
On Sat, 16 Apr 2005, Ingo Molnar wrote: * David Lang <[EMAIL PROTECTED]> wrote: this issue was raised a few days ago in the context of someone tampering with the files and it was decided that the extra checks were good enough to prevent this (at least for now), but what about accidental collisions? if I am understanding things right the objects get saved in the filesystem in filenames that are the SHA1 hash. of two legitimate files have the same hash I don't see any way for both of them to exist. yes the risk of any two files having the same has is low, but in the earlier thread someone chimed in and said that they had two files on their system that had the same hash.. you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision checking (disabled currently). If there indeed exist two files that have different content but the same hash, could someone send those two files? remember that the flap over SHA1 being 'broken' a couple weeks ago was not from researchers finding multiple files with the same hash, but finding that it was more likly then expected that files would have the same hash. there was qa discussion on LKML within the last year about useing MD5 hashes for identifying unique filesystem blocks (with the idea of being able to merge identical blocks) and in that discussion it was pointed out that collisions are a known real-life issue. so if collision detection is turned on in git, does that make it error out if it runs into a second file with the same hash, or does it do something else? David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
* David Lang <[EMAIL PROTECTED]> wrote: > this issue was raised a few days ago in the context of someone > tampering with the files and it was decided that the extra checks were > good enough to prevent this (at least for now), but what about > accidental collisions? > > if I am understanding things right the objects get saved in the > filesystem in filenames that are the SHA1 hash. of two legitimate > files have the same hash I don't see any way for both of them to > exist. > > yes the risk of any two files having the same has is low, but in the > earlier thread someone chimed in and said that they had two files on > their system that had the same hash.. you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision checking (disabled currently). If there indeed exist two files that have different content but the same hash, could someone send those two files? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
SHA1 hash safety
this issue was raised a few days ago in the context of someone tampering with the files and it was decided that the extra checks were good enough to prevent this (at least for now), but what about accidental collisions? if I am understanding things right the objects get saved in the filesystem in filenames that are the SHA1 hash. of two legitimate files have the same hash I don't see any way for both of them to exist. yes the risk of any two files having the same has is low, but in the earlier thread someone chimed in and said that they had two files on their system that had the same hash.. David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html