Re: binary file compare...

2009-04-18 Thread Piet van Oostrum
> Adam Olsen (AO) wrote: >AO> The Wayback Machine has 150 billion pages, so 2**37. Google's index >AO> is a bit larger at over a trillion pages, so 2**40. A little closer >AO> than I'd like, but that's still 56294995000 to 1 odds of having >AO> *any* collisions between *any* of the file

Re: binary file compare...

2009-04-17 Thread Steven D'Aprano
On Fri, 17 Apr 2009 11:19:31 -0700, Adam Olsen wrote: > Actually, *cryptographic* hashes handle that just fine. Even for files > with just a 1 bit change the output is totally different. This is known > as the Avalanche Effect. Otherwise they'd be vulnerable to attacks. > > Which isn't to say
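
Adam's point about the avalanche effect is easy to check. The sketch below is only an illustration: it assumes SHA-256 from Python's `hashlib` as a representative cryptographic hash and uses an arbitrary sample input. It flips a single bit and counts how many of the 256 digest bits change. For a well-behaved hash the count is typically close to half.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = bytearray(b"binary file compare" * 100)  # arbitrary sample data
flipped = bytearray(original)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(flipped).digest()

# Report how many of the 256 output bits changed after a 1-bit input change.
print(bit_difference(d1, d2), "of", len(d1) * 8, "bits differ")
```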

Re: binary file compare...

2009-04-17 Thread Lawrence D'Oliveiro
In message , Nigel Rantor wrote: > Adam Olsen wrote: > >> The chance of *accidentally* producing a collision, although >> technically possible, is so extraordinarily rare that it's completely >> overshadowed by the risk of a hardware or software failure producing >> an incorrect result. > > Not

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, SpreadTooThin wrote: > You know this is just insane.  I'd be satisfied with a CRC16 or > something in the situation i'm in. > I have two large files, one local and one remote.  Transferring every > byte across the internet to be sure that the two files are identical > is just n
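
For the local/remote case, only a short digest has to cross the network. A minimal sketch, assuming SHA-256 and a hypothetical helper named `file_digest`: run it on each machine and compare the two hex strings. Reading in chunks keeps memory use flat even for very large files.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks (hypothetical helper, 1 MiB by default)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# On each host (the path is a placeholder):
#     print(file_digest("/path/to/large_file"))
# Differing digests prove the files differ; matching digests mean they are
# identical with overwhelming (but not absolute) probability.
```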

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, norseman wrote: > The more complicated the math the harder it is to keep a higher form of > math from checking (or improperly displacing) a lower one.  Which, of > course, breaks the rules.  Commonly called improper thinking. A number > of math teasers make use of that. Of cou

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 5:30 am, Tim Wintle wrote: > On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: > > The Wayback Machine has 150 billion pages, so 2**37.  Google's index > > is a bit larger at over a trillion pages, so 2**40.  A little closer > > than I'd like, but that's still 56294995000 to 1 od

Re: binary file compare...

2009-04-17 Thread SpreadTooThin
On Apr 17, 4:54 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 16, 11:15 am, SpreadTooThin wrote: > >> And yes he is right CRCs hashing all have a probability of saying that > >> the files are identical when in fact they are not. > > > Here's the bottom line.  It is either: > > > A) Sever

Re: binary file compare...

2009-04-17 Thread norseman
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The

Re: binary file compare...

2009-04-17 Thread Tim Wintle
On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: > The Wayback Machine has 150 billion pages, so 2**37. Google's index > is a bit larger at over a trillion pages, so 2**40. A little closer > than I'd like, but that's still 56294995000 to 1 odds of having > *any* collisions between *any* o

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 4:27 pm, "Rhodri James" wrote: On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter ho

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 4:27 pm, "Rhodri James" wrote: > On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: > > On Apr 16, 3:16 am, Nigel Rantor wrote: > >> Okay, before I tell you about the empirical, real-world evidence I have > >> could you please accept that hashes collide and that no matter how many

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 11:15 am, SpreadTooThin wrote: > And yes he is right CRCs hashing all have a probability of saying that > the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong. The birthday problem
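
Adam's birthday-problem argument can be put into numbers. A sketch, assuming hash outputs behave like independent, uniformly random values (an idealization): the usual approximation for n files and a b-bit hash is p ≈ 1 - exp(-n(n-1) / 2^(b+1)). The function name below is hypothetical.

```python
import math

def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound approximation of the chance that any two of n_items
    random hash values, each hash_bits wide, coincide."""
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny
    return -math.expm1(-exponent)
```

For example, `collision_probability(2**40, 128)` (2**40 items, a 128-bit hash such as MD5) comes out at about 1.8e-15, roughly 2**-49.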

Re: binary file compare...

2009-04-16 Thread Rhodri James
On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Okay, before I tell you about the empirical, real-world evidence I have could you please accept that hashes collide and that no matter how many samples you use the probability of finding two files th

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 8:59 am, Grant Edwards wrote: > On 2009-04-16, Adam Olsen wrote: > > I'm afraid you will need to back up your claims with real files. > > Although MD5 is a smaller, older hash (128 bits, so you only need > > 2**64 files to find collisions), > > You don't need quite that many to have a
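
Grant's reply is truncated here, but the standard birthday bound shows how the number of files needed depends on the collision probability you choose. Inverting the approximation gives n ≈ sqrt(2^(b+1) * ln(1/(1 - p))). A sketch under the same uniform-randomness assumption (the function name is hypothetical):

```python
import math

def items_for_probability(p: float, hash_bits: int) -> float:
    """Approximate number of random items before a collision among
    hash_bits-wide values occurs with probability p (birthday bound)."""
    # -log1p(-p) == ln(1 / (1 - p)), computed accurately for small p
    return math.sqrt(2 ** (hash_bits + 1) * -math.log1p(-p))
```

For p = 0.5 and b = 128 this is about 1.18 * 2**64. For much smaller p the count shrinks roughly in proportion to sqrt(p), since ln(1/(1 - p)) ≈ p.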

Re: binary file compare...

2009-04-16 Thread SpreadTooThin
On Apr 16, 3:16 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 15, 12:56 pm, Nigel Rantor wrote: > >> Adam Olsen wrote: > >>> The chance of *accidentally* producing a collision, although > >>> technically possible, is so extraordinarily rare that it's completely > >>> overshadowed by the

Re: binary file compare...

2009-04-16 Thread Grant Edwards
On 2009-04-16, Adam Olsen wrote: > The chance of *accidentally* producing a collision, although > technically possible, is so extraordinarily rare that it's > completely overshadowed by the risk of a hardware or software > failure producing an incorrect result. Not when

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 3:16 am, Nigel Rantor wrote: > Adam Olsen wrote: > > On Apr 15, 12:56 pm, Nigel Rantor wrote: > >> Adam Olsen wrote: > >>> The chance of *accidentally* producing a collision, although > >>> technically possible, is so extraordinarily rare that it's completely > >>> overshadowed by the

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect resu

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 15, 12:56 pm, Nigel Rantor wrote: > Adam Olsen wrote: > > The chance of *accidentally* producing a collision, although > > technically possible, is so extraordinarily rare that it's completely > > overshadowed by the risk of a hardware or software failure producing > > an incorrect result.

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files. Tr

Re: binary file compare...

2009-04-15 Thread Adam Olsen
On Apr 15, 11:04 am, Nigel Rantor wrote: > The fact that two md5 hashes are equal does not mean that the sources > they were generated from are equal. To do that you must still perform a > byte-by-byte comparison which is much less work for the processor than > generating an md5 or sha hash. > > I
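
Nigel's caveat suggests a two-stage approach when many files are involved: use a digest to find candidate matches, then confirm each candidate with a real byte comparison so a hash collision can never produce a false "identical" result. A sketch with a hypothetical `find_duplicates` helper and SHA-256 as the assumed hash:

```python
import filecmp
import hashlib
from collections import defaultdict

def find_duplicates(paths, chunk_size=1 << 20):
    """Return lists of byte-identical files (hypothetical helper)."""
    by_digest = defaultdict(list)
    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        by_digest[h.digest()].append(path)

    duplicates = []
    for group in by_digest.values():
        clusters = []  # each cluster holds files confirmed byte-identical
        for path in group:
            for cluster in clusters:
                # shallow=False compares contents, not just os.stat() data
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])  # no match: start a new cluster
        duplicates.extend(c for c in clusters if len(c) > 1)
    return duplicates
```

Grouping by digest first means byte comparisons only happen between files that already look identical, instead of between every pair.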

Re: binary file compare...

2009-04-15 Thread SpreadTooThin
On Apr 15, 8:04 am, Grant Edwards wrote: > On 2009-04-15, Martin wrote: > > > > > Hi, > > > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: > >> On 2009-04-13, SpreadTooThin wrote: > > >>> I want to compare two binary files and see if they are the same. > >>> I see the filecmp.cmp functi

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Grant Edwards wrote: We all rail against premature optimization, but using a checksum instead of a direct comparison is premature unoptimization. ;) And more than that, will provide false positives for some inputs. So, basically it's a worse-than-useless approach for determining if two files
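
The false positives Nigel mentions are easy to provoke with a 16-bit CRC. This sketch assumes `binascii.crc_hqx` (the standard library's 16-bit CRC-CCITT) as a stand-in for "a CRC16"; the input naming scheme is arbitrary. With only 65,536 possible values, two different inputs must share a CRC within 65,537 tries, and on average a match turns up after a few hundred.

```python
import binascii

def find_crc16_collision():
    """Return two different byte strings with the same 16-bit CRC."""
    seen = {}  # CRC value -> first input that produced it
    i = 0
    while True:
        data = f"file-{i}".encode()
        crc = binascii.crc_hqx(data, 0)
        if crc in seen:
            return seen[crc], data
        seen[crc] = data
        i += 1
```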

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Martin wrote: On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote: The checksum does look at every byte in each file. Checksumming isn't a way to avoid looking at each byte of the two files, it is a way of mapping all the bytes to a single number. My understanding of the original question

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin wrote: > On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano > I'd still say rather burn CPU cycles than development hours (if I got > the question right), _Hours_? Calling the file compare module takes _one_line_of_code_. Implementing a file compare from scratch takes abo

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin wrote: > Hi, > > On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: >> On 2009-04-13, SpreadTooThin wrote: >> >>> I want to compare two binary files and see if they are the same. >>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >>> that it is doin

Re: binary file compare...

2009-04-15 Thread Martin
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano wrote: > The checksum does look at every byte in each file. Checksumming isn't a > way to avoid looking at each byte of the two files, it is a way of > mapping all the bytes to a single number. My understanding of the original question was a way t

Re: binary file compare...

2009-04-15 Thread Steven D'Aprano
On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote: >> Perhaps I'm being dim, but how else are you going to decide if two >> files are the same unless you compare the bytes in the files? > > I'd say checksums, just about every download relies on checksums to > verify you do have indeed the same fil

Re: binary file compare...

2009-04-14 Thread Martin
Hi, On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >> that it is doing a byte by byte comparison of two files

Re: binary file compare...

2009-04-14 Thread Adam Olsen
On Apr 13, 8:39 pm, Grant Edwards wrote: > On 2009-04-13, Peter Otten <__pete...@web.de> wrote: > > > But there's a cache. A change of file contents may go > > undetected as long as the file stats don't change: > > Good point.  You can fool it if you force the stats to their > old values after you

binary file compare...

2009-04-14 Thread SpreadTooThin
I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they are the same. What should I be using if not filecmp.cmp? -- http://mail.python.org/mailman/li
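
With its default `shallow=True`, `filecmp.cmp` treats two files as equal when their `os.stat()` signatures (file type, size, modification time) match, without reading any bytes; `shallow=False` forces a content comparison. A small demonstration, assuming Python 3 (the temporary file names are arbitrary):

```python
import filecmp
import os
import tempfile

def shallow_vs_deep():
    """Compare two same-size files with different bytes but equal mtimes."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.bin")
        b = os.path.join(tmp, "b.bin")
        with open(a, "wb") as f:
            f.write(b"\x00" * 1024)
        with open(b, "wb") as f:
            f.write(b"\xff" * 1024)
        # Give b exactly the same timestamps as a.
        st = os.stat(a)
        os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))

        shallow = filecmp.cmp(a, b)              # default: trusts os.stat()
        deep = filecmp.cmp(a, b, shallow=False)  # reads and compares bytes
        return shallow, deep

print(shallow_vs_deep())  # (True, False)
```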

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Peter Otten <__pete...@web.de> wrote: > But there's a cache. A change of file contents may go > undetected as long as the file stats don't change: Good point. You can fool it if you force the stats to their old values after you modify a file and you don't clear the cache. -- Gra
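
The cache applies even with `shallow=False`: results are stored under the file names plus their `os.stat()` signatures, so a same-size edit with the old timestamps restored returns a stale answer. `filecmp.clear_cache()` discards the cache, but it was only added in Python 3.4, after this thread. A sketch of the effect, using arbitrary temporary files:

```python
import filecmp
import os
import tempfile

def stale_cache_demo():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.bin")
        b = os.path.join(tmp, "b.bin")
        for path in (a, b):
            with open(path, "wb") as f:
                f.write(b"same bytes")
        first = filecmp.cmp(a, b, shallow=False)  # True, and now cached

        # Change b's contents without changing its size, then restore its
        # timestamps so its os.stat() signature looks unchanged.
        st = os.stat(b)
        with open(b, "wb") as f:
            f.write(b"diff bytes")
        os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))

        stale = filecmp.cmp(a, b, shallow=False)  # cache hit: still True
        filecmp.clear_cache()                     # Python 3.4+
        fresh = filecmp.cmp(a, b, shallow=False)  # re-reads both files
        return first, stale, fresh

print(stale_cache_demo())  # (True, True, False)
```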

Re: binary file compare...

2009-04-13 Thread Dave Angel
SpreadTooThin wrote: On Apr 13, 2:37 pm, Grant Edwards wrote: On 2009-04-13, Grant Edwards wrote: On 2009-04-13, SpreadTooThin wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that i

Re: binary file compare...

2009-04-13 Thread Steven D'Aprano
On Mon, 13 Apr 2009 15:03:32 -0500, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. I see >> the filecmp.cmp function but I don't get a warm fuzzy feeling that it >> is doing a byte by byte comparison of two files t

Re: binary file compare...

2009-04-13 Thread Peter Otten
Grant Edwards wrote: > On 2009-04-13, Grant Edwards wrote: >> On 2009-04-13, SpreadTooThin wrote: >> >>> I want to compare two binary files and see if they are the same. >>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >>> that it is doing a byte by byte comparison of two

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:37 pm, Grant Edwards wrote: > On 2009-04-13, Grant Edwards wrote: > > > > > On 2009-04-13, SpreadTooThin wrote: > > >> I want to compare two binary files and see if they are the same. > >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling > >> that it is doing a by

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > >> I want to compare two binary files and see if they are the same. >> I see the filecmp.cmp function but I don't get a warm fuzzy feeling >> that it is doing a byte by byte comparison of two files to see if they >> are t

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:03 pm, Grant Edwards wrote: > On 2009-04-13, SpreadTooThin wrote: > > > I want to compare two binary files and see if they are the same. > > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > > that it is doing a byte by byte comparison of two files to see if they

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, SpreadTooThin wrote: > I want to compare two binary files and see if they are the same. > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > that it is doing a byte by byte comparison of two files to see if they > are the same. Perhaps I'm being dim, but how el
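
Comparing the bytes themselves is the definitive test. A minimal hand-rolled version (hypothetical `same_bytes` helper) checks the sizes first, reads both files in chunks, stops at the first mismatch, and keeps no cache:

```python
import os

def same_bytes(path_a, path_b, chunk_size=64 * 1024):
    """Return True only if both files contain exactly the same bytes."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False  # different lengths can never be identical
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # first differing chunk ends the comparison
            if not chunk_a:
                return True  # both files exhausted at the same point
```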

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:00 pm, Przemyslaw Kaminski wrote: > SpreadTooThin wrote: > > I want to compare two binary files and see if they are the same. > > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > > that it is doing a byte by byte comparison of two files to see if they > > are they

Re: binary file compare...

2009-04-13 Thread Przemyslaw Kaminski
SpreadTooThin wrote: > I want to compare two binary files and see if they are the same. > I see the filecmp.cmp function but I don't get a warm fuzzy feeling > that it is doing a byte by byte comparison of two files to see if they > are the same. > > What should I be using if not filecmp.cmp? W