Re: SHA1 hash safety

2005-04-20 Thread David Meybohm
On Tue, Apr 19, 2005 at 06:48:57PM -0400, C. Scott Ananian wrote:
 On Tue, 19 Apr 2005, David Meybohm wrote:
 
 But doesn't this require assuming the distribution of MD5 is uniform,
 and don't the papers finding collisions in less show it's not? So, your
 birthday-argument for calculating the probability wouldn't apply, because
 it rests on the assumption MD5 is uniform, and it isn't.
 
 No, the collision papers don't show this at all.

I didn't mean they showed it directly. I meant that by finding collisions
in MD5 quickly, MD5 would have to have some non-uniformity. But that's
wrong anyway, because uniformity and the ability to find collisions
aren't related. Sorry to have wasted everyone's time.

Dave


Re: SHA1 hash safety

2005-04-19 Thread C. Scott Ananian
On Tue, 19 Apr 2005, David Meybohm wrote:
But doesn't this require assuming the distribution of MD5 is uniform,
and don't the papers finding collisions in less show it's not? So, your
birthday-argument for calculating the probability wouldn't apply, because
it rests on the assumption MD5 is uniform, and it isn't.
No, the collision papers don't show this at all.
 --scott
atomic strategic HBDRILL SARANAC COBRA JUDY Ft. Meade assassination politics 
Mossad HOPEFUL ZPSEMANTIC DTFROGS HTKEEPER LITEMPO LIONIZER operation
 ( http://cscott.net/ )


Re: SHA1 hash safety

2005-04-18 Thread Andy Isaacson
[trimmed cc list, nobody wants to read this noise]

On Sat, Apr 16, 2005 at 11:35:39PM +0200, Brian O'Mahoney wrote:
  (1) I _have_ seen real-life collisions with MD5, in the context of
 Document management systems containing ~10^6 ms-WORD documents.
  
  Dude!  You could have been *famous*!  Why the
  aitch-ee-double-hockey-sticks didn't you publish this when you found it?
  Seriously, man.
 
 The MD5 hash was fine, or at least the code (a) produced the correct
 results on the published test cases, and (b) was properly applied to
 all bytes of the file(s). I was surprised when it happened, which is why
 I bothered to post to this list at this time, so I'll make two more points

OK, I guess it's time for some remedial math.

There are 2^128 = 340282366920938463463374607431768211456 different MD5
hashes.

You are suggesting that you found a collision using ~1e6 = ~1,000,000
plaintexts.

Let's suppose there were actually 100,000,000 = 1e8 plaintexts, just in
case you underestimated the number.

Applying the birthday paradox, we have a 50% probability that you'd find
one collision if there were ~7,213,475,309,916,173 possible hash values.
If you extend the birthday argument (what is the probability of a
collision if you take N samples from a set of size M?) you get the
following results, with N = 1e8:

50%   (1 in 2)    probability of collision in a space of      7,213,475,309,916,173.
1%    (1 in 100)  probability of collision in a space of    497,496,027,172,833,194.
0.05% (1 in 1845) probability of collision in a space of  9,223,372,036,854,775,806.

That's where my quick-and-dirty solver craps out, but we're still a
really long ways from

 340,282,366,920,938,463,463,374,607,431,768,211,456.

A simple linear extrapolation (certainly wrong, but not by more than a
few dozen orders of magnitude) says that the odds would be
1 in 68,056,473,384,187,692,692 for the full MD5 hash (I'm not even
going to dignify that with a percentage).
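
For reference, a minimal sketch that reproduces these figures -- not
the solver itself, just the standard birthday approximation
p ~= 1 - exp(-n*(n-1)/(2*m)) for n samples drawn from a space of m
values:

  import math

  def collision_probability(n, m):
      # Approximate probability of at least one collision among
      # n uniform random samples from a space of size m.
      return -math.expm1(-n * (n - 1) / (2 * m))

  n = 10**8  # the generous 1e8-plaintext estimate above
  for m in (7_213_475_309_916_173,      # ~50%   (1 in 2)
            497_496_027_172_833_194,    # ~1%    (1 in 100)
            9_223_372_036_854_775_806,  # ~0.05% (1 in 1845); 2**63 - 2
            2**128):                    # the full MD5 space
      print(f"m = {m}: p ~ {collision_probability(n, m):.3g}")

For the full 2^128 space this prints p ~ 1.5e-23, i.e. about 1 in
7e22 -- a few orders of magnitude longer odds still than the linear
extrapolation.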

I'm not going to do the sums, but I would hazard a guess that it's more
likely your PC suffered a cosmic-ray-induced memory fault - EACH OF THE
FOUR TIMES YOU TESTED IT - causing it to report the same MD5, than that
you actually discovered a collision with a measly million (or even
hundred million) plaintexts.

(Of course, I don't know how many tests of the hash you actually did.
But the point stands.)

Hell, if you're *that* lucky, what are you doing in IT?  You could be
making a killing at the roulette table.

Or, even more likely, there was some other factor in the system (most
likely that it was only using a few bits, probably 32, of the hash
when looking for collisions) that resulted in a false alarm.
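
That truncation theory is easy to sanity-check with the same birthday
approximation: with only 32 bits of the hash compared, a collision
among ~1e6 documents is all but guaranteed.

  import math

  # n = 1e6 documents, m = 2**32 possible truncated hash values;
  # the exponent is about -116, so p is 1.0 to machine precision.
  p = -math.expm1(-(10**6) * (10**6 - 1) / (2 * 2**32))
  print(p)  # ~1.0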

If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%

But in the absence of actual evidence, we have to assume (just based on
the probabilities) that there was some error in your testing.

-andy


Re: SHA1 hash safety

2005-04-18 Thread C. Scott Ananian
On Mon, 18 Apr 2005, Andy Isaacson wrote:
If you had actual evidence of a collision, I'd love to see it - even if
it's just the equivalent of
% md5 foo
d3b07384d113edec49eaa6238ad5ff00 foo
% md5 bar
d3b07384d113edec49eaa6238ad5ff00 bar
% cmp foo bar
foo bar differ: byte 25, line 1
%
But in the absence of actual evidence, we have to assume (just based on
the probabilities) that there was some error in your testing.
I've already had a long correspondence with this poster.  He claims that 
this happened 7 years ago, involved a commercial contract covered by 
Swiss Banking Law (with caps!) and that, of course, he certainly doesn't 
retain [his] client's documents, and even if he *did*, he wouldn't show 
them to *me*.

And then he was unable to comprehend that I couldn't accept his word alone 
as prima facie evidence that the laws of probability did not apply to him or 
his clients.

I've been a coder far too long to attribute to The Mysterious Hand Of 
God what can adequately be explained by subtle programmer error.

The most reasonable explanation, given the (lack of) evidence, is that 
the programmer involved quickly took refuge in a (wildly improbable, but 
his clients'll never know) MD5 collision instead of buckling down and 
finding the bug in his code.
 --scott

ODOATH Ortega FBI SGUAT AEBARMAN India Peking ODACID operation RYBAT 
[Hello to all my fans in domestic surveillance] for Dummies KUCLUB
 ( http://cscott.net/ )


Re: SHA1 hash safety

2005-04-16 Thread Ingo Molnar

* David Lang [EMAIL PROTECTED] wrote:

 this issue was raised a few days ago in the context of someone 
 tampering with the files and it was decided that the extra checks were 
 good enough to prevent this (at least for now), but what about 
 accidental collisions?
 
 if I am understanding things right the objects get saved in the 
 filesystem in filenames that are the SHA1 hash. if two legitimate 
 files have the same hash I don't see any way for both of them to 
 exist.
 
 yes the risk of any two files having the same hash is low, but in the 
 earlier thread someone chimed in and said that they had two files on 
 their system that had the same hash..

you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision 
checking (disabled currently). If there indeed exist two files that have 
different content but the same hash, could someone send those two files?
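
Conceptually, the check amounts to something like this sketch
(illustrative Python, not git's actual code -- real objects are
compressed on disk, which the sketch ignores): before trusting that
an object with the same hash is the same object, compare the bytes.

  import hashlib, os

  def write_object(store_dir, data):
      oid = hashlib.sha1(data).hexdigest()
      path = os.path.join(store_dir, oid)
      if os.path.exists(path):
          with open(path, "rb") as f:
              if f.read() != data:   # same hash, different content
                  raise RuntimeError("SHA1 collision on " + oid)
          return oid                 # genuine duplicate; nothing to write
      with open(path, "wb") as f:
          f.write(data)
      return oid

With the check enabled, a real collision errors out loudly instead of
silently serving one file's content for the other.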

Ingo


Re: SHA1 hash safety

2005-04-16 Thread Brian O'Mahoney
Three points:
(1) I _have_ seen real-life collisions with MD5, in the context of
Document management systems containing ~10^6 ms-WORD documents.
(2) The MAC (Ethernet hardware address) of any interface _should_
help to make a unique Id.
(3) While I haven't looked at the details of the plumbing, this is
the time to make sure we can, easily, drop in SHA-160, SHA-256
(or whatever comes from NIST) when needed.


David Lang wrote:
 On Sat, 16 Apr 2005, Ingo Molnar wrote:
 
 * David Lang [EMAIL PROTECTED] wrote:

 this issue was raised a few days ago in the context of someone
 tampering with the files and it was decided that the extra checks were
 good enough to prevent this (at least for now), but what about
 accidental collisions?

 if I am understanding things right the objects get saved in the
 filesystem in filenames that are the SHA1 hash. if two legitimate
 files have the same hash I don't see any way for both of them to
 exist.

 yes the risk of any two files having the same hash is low, but in the
 earlier thread someone chimed in and said that they had two files on
 their system that had the same hash..


 you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision
 checking (disabled currently). If there indeed exist two files that have
 different content but the same hash, could someone send those two files?
 
 
 remember that the flap over SHA1 being 'broken' a couple weeks ago was
 not from researchers finding multiple files with the same hash, but
 from finding that it was more likely than expected that files would have
 the same hash.
 
 there was a discussion on LKML within the last year about using MD5
 hashes for identifying unique filesystem blocks (with the idea of being
 able to merge identical blocks) and in that discussion it was pointed
 out that collisions are a known real-life issue.
 
 so if collision detection is turned on in git, does that make it error
 out if it runs into a second file with the same hash, or does it do
 something else?
 
 David Lang
 

-- 
Brian


Re: Re: SHA1 hash safety

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 04:58:15PM CEST, I got a letter
where C. Scott Ananian [EMAIL PROTECTED] told me that...
 On Sat, 16 Apr 2005, Brian O'Mahoney wrote:
 
 (1) I _have_ seen real-life collisions with MD5, in the context of
Document management systems containing ~10^6 ms-WORD documents.
 
 Dude!  You could have been *famous*!  Why the 
 aitch-ee-double-hockey-sticks didn't you publish this when you found it?
 Seriously, man.
 
 Even given the known weaknesses in MD5, it would take much more than a 
 million documents to find MD5 collisions.  I can only conclude that the 
 hash was being used incorrectly; most likely truncated (my wild-ass guess 
 would be to 32 bits; a collision is likely with > 50% probability in a 
 million document store for a hash of less than 40 bits).
 
 I know the current state of the art here.  It's going to take more than 
 just hearsay to convince me that full 128-bit MD5 collisions are likely. 
 I believe there are only two or so known to exist so far, and those were 
 found by a research team in China (which, yes, is fairly famous among the 
 cryptographic community now after publishing a paper consisting of little 
 apart from the two collisions themselves).

http://cryptography.hyperlink.cz/MD5_collisions.html

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: SHA1 hash safety

2005-04-16 Thread David Lang
that's the difference between CS researchers and sysadmins.
sysadmins realize that there are an infinite number of files that map to 
the same hash value and plan accordingly (because we KNOW we will run 
across them eventually), and don't see it as a big deal when we finally 
do.

CS researchers quote statistics that show how hard it is to intentionally 
create two files with the same hash and insist it just doesn't happen, 
until presented with the proof, at which point it is a big deal.

a difference in viewpoints.
David Lang
 On Sat, 16 Apr 2005, C. Scott Ananian wrote:
Date: Sat, 16 Apr 2005 10:58:15 -0400 (EDT)
From: C. Scott Ananian [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: David Lang [EMAIL PROTECTED], Ingo Molnar [EMAIL PROTECTED],
git@vger.kernel.org
Subject: Re: SHA1 hash safety
On Sat, 16 Apr 2005, Brian O'Mahoney wrote:
(1) I _have_ seen real-life collisions with MD5, in the context of
   Document management systems containing ~10^6 ms-WORD documents.
Dude!  You could have been *famous*!  Why the aitch-ee-double-hockey-sticks 
didn't you publish this when you found it?
Seriously, man.

Even given the known weaknesses in MD5, it would take much more than a 
million documents to find MD5 collisions.  I can only conclude that the hash 
was being used incorrectly; most likely truncated (my wild-ass guess would be 
to 32 bits; a collision is likely with > 50% probability in a million 
document store for a hash of less than 40 bits).

I know the current state of the art here.  It's going to take more than just 
hearsay to convince me that full 128-bit MD5 collisions are likely. I believe 
there are only two or so known to exist so far, and those were found by a 
research team in China (which, yes, is fairly famous among the cryptographic 
community now after publishing a paper consisting of little apart from the 
two collisions themselves).

Please, let's talk about hash collisions responsibly.  I posted earlier about 
the *actual computed probability* of finding two files with an SHA-1 
collision before the sun goes supernova.  It's 10^28 to 1 against.
The recent cryptographic work has shown that there are certain situations 
where a decent amount of computer work (2^69 operations) can produce two 
sequences with the same hash, but these sequences are not freely chosen; 
they've got very specific structure.  This attack does not apply to 
(effectively) random files sitting in a SCM.
 http://www.schneier.com/blog/archives/2005/02/sha1_broken.html

That said, Linux's widespread use means that it may not be unimaginable for 
an attacker to devote this amount of resources to an attack, which would 
probably involve first committing some specially structured file to the SCM 
(but would Linus accept it?) and then silently corrupting said file via a 
SHA1 collision to toggle some bits (which would presumably Do Evil).  Thus 
hashes other than SHA1 really ought to be considered...

...but the cryptographic community has not yet come to a conclusion on what 
the replacement ought to be.  These attacks are so new that we don't really 
understand what it is about the structure of SHA1 which makes them possible, 
which makes it hard to determine which other hashes are similarly vulnerable. 
It will take time.

I believe Linus has already stated on this list that his plan is to 
eventually provide a tool for bulk migration of an existing SHA1 git 
repository to a new hash type.   Basically munging through the repository in 
bulk, replacing all the hashes.  This seems a perfectly adequate strategy at 
the moment.
--scott

WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY 
genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing
( http://cscott.net/ )

--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 what I'm talking about is the chance that somewhere, sometime there will 
 be two different documents that end up with the same hash

I have a vastly greater chance of a file being corrupted by a hardware or
software glitch than of a random message digest collision between two
legitimate documents.

I've lost quite a few files in 25 years of computing to just
such glitches, sometimes without knowing it until months or years
later.

We've already computed the chances of a random pure hash collision
with SHA1 - it's something like an average of 1 collision every
10 billion years if we have 10,000 coders generating 1 new file
version every minute, non-stop, 24 hours a day, 365 days a year.

Get real.  There are _many_ sources of random error in our
tools.  When some sources are billions of billions of times
more likely to occur, it makes sense to worry about them first.

Reminds me of the drunk looking under the lamp post for the
house keys he dropped - because that's where the light is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 sysadmins realize that there are an infinite number of files that map to 

Sysadmins know that there are an infinite number of ways for their
systems to crap out, and try to cover for the ones that have a
snowball's chance in Hades of being seen in their lifetime.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: SHA1 hash safety

2005-04-16 Thread Martin Mares
Hi!

 We've already computed the chances of a random pure hash collision
 with SHA1 - it's something like an average of 1 collision every
 10 billion years if we have 10,000 coders generating 1 new file
 version every minute, non-stop, 24 hours a day, 365 days a year.

GIT is safe even for the millions of monkeys writing Shakespeare :-)

Have a nice fortnight
-- 
Martin `MJ' Mares   [EMAIL PROTECTED]   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Homo homini lupus, frater fratri lupior, bohemus bohemo lupissimus.


Re: SHA1 hash safety

2005-04-16 Thread Tkil
 Brian == Brian O'Mahoney [EMAIL PROTECTED] writes:

Brian (1) I _have_ seen real-life collisions with MD5, in the context
Brian of Document management systems containing ~10^6 ms-WORD
Brian documents.

Was this whole-document based, or was it blocked or otherwise chunked?

I'm wondering, because (SFAIK) the MS Word on-disk format is some
serialized version of one or more containers, possibly nested.  If
your blocks are sized so that the first block is the same across
multiple files, this could cause collisions -- but they're the good
kind, that allow us to save disk space, so they're not a problem.

Are you saying that, within 1e7 documents, you found two
documents with the same MD5 hash yet different contents?

That's not an accusation, btw; I'm just trying to get clarity on the
terminology.  I'm fascinated by the idea of using this sort of
content-addressable filesystem, but the chance of any collision at all
wigs me out.  I look at the probabilities, but still.

Thanks,
t.


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 but the chance of any collision at all wigs me out.

Guess you're just going to get wigged out then.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: SHA1 hash safety

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
  what I'm talking about is the chance that somewhere, sometime there will
  be two different documents that end up with the same hash

 I have a vastly greater chance of a file being corrupted by a hardware or
 software glitch than of a random message digest collision between two
 legitimate documents.

The probability of an accidental overlap for SHA-1 for two
different files is absurdly remote; it's just not worth worrying about.
However, the possibility of an INTENTIONAL overlap is a completely
different matter.  I think the hash algorithm should change in the
future; I have a proposal below.

Someone has ALREADY broken into a server to modify the Linux kernel
code, so the idea of an attack on kernel code
is not an idle fantasy. MD5 is dead, and SHA-1's work factor has
already been sufficiently broken that people have already been told to
walk to the exits (i.e., DO NOT USE SHA-1 for new programs like git).

The fact that blobs are compressed first, with a length header
in front, _may_ make it harder to attack.  But maybe not.
I haven't checked for this case, but most decompression algorithms
I know of have a "don't change" mode that essentially just copies the
data behind it.  If the one used in git has such a mode
(I bet it does!), an attacker could use that mode to
make it MUCH easier to create an attack vector than it would
appear at first.  Now the attacker just needs to create a collision
(hmmm, where was that paper?).  Remember, you don't need to
run a hash algorithm over an entire file; you can precompute
to near the end, and then try your iterations from there.
A little hardware (inc. FPGAs) would speed the attack.
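
A quick check bears this out for zlib (which is what git uses):
deflate's stored mode -- compression level 0 in Python's binding --
copies the input bytes into the output essentially verbatim.

  import zlib

  data = b"x" * 64
  raw = zlib.compress(data, 0)  # level 0 = stored (uncompressed) blocks
  assert data in raw            # the plaintext appears unmodified
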
Of course, that assumes you actually
check everything to make sure that an attacker can't slip
in something different. After each rsync, are all new files'
hash values checked?  Do they uncompress to the right length?
Do they have excess data after the decompression?
I'm hoping that sort of input-checking (since the data
might be from an attacker, if indirectly!) is already going on,
though I haven't reviewed the git source code.

While the jury's still out, the current belief by most folks
I talk to is that SHA-1 variants with more bits, such as SHA-256,
are the way to go now.  The SHA-1 attack simply reduces
the work factor (it's not a COMPLETE break), so adding
more bits is believed to increase the work factor
enough to counter it.

Adding more information to the hash can make attacking even harder.
Here's one idea: whenever that hash algorithm
switch occurs, create a new hash value as this:

  SHA-256 + uncompressed-length

where SHA-256 is computed just like SHA-1 is now, e.g.,
SHA-256(file) where file = typecode + length + compressed data.
Leave the internal format as-is (with the length embedded as well).
This means that an attacker has to come up with an attack
that creates the same length uncompressed, yet has the same hash
of the compressed result. That's harder to do.
Length is also really, really cheap to compute :-).

That also might help convince the "what happens if there's
an accidental collision" crowd: now, if the file lengths
are different, you're GUARANTEED that the hash values are different,
though that's not the best reason to do that.
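
To make the proposal concrete, a sketch of such an identifier
(the names and framing are illustrative, not git's actual object
layout):

  import hashlib, zlib

  def proposed_object_id(typecode, payload):
      # file = typecode + length + compressed data, as described above
      file_bytes = (typecode + str(len(payload)).encode()
                    + zlib.compress(payload))
      digest = hashlib.sha256(file_bytes).hexdigest()
      # Appending the uncompressed length means two payloads of
      # different lengths can never share an identifier, no matter
      # what happens to the hash.
      return digest + "-" + str(len(payload))

An attacker now has to find a same-length document whose compressed
form collides under SHA-256 -- both constraints at once.
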
One reason to think about switching sooner rather than later
is that it'd be really nice if the object store also included
signatures, so that in one fell swoop you could check who signed what
(and thus you could later on CONFIRM with much more certainty who
REALLY submitted a given change... say if it was clearly malicious).
If you switch hash algorithms, the signatures might not work,
depending on how you do it.
--- David A. Wheeler


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
I have nothing further to contribute to this subtopic.
Good luck with it.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401