Re: binary file compare...

2009-04-18 Thread Piet van Oostrum
Adam Olsen rha...@gmail.com (AO) wrote: The Wayback Machine has 150 billion pages, so 2**37. Google's index is a bit larger at over a trillion pages, so 2**40. A little closer than I'd like, but that's still 56294995000 to 1 odds of having *any* collisions between *any* of the

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and

Re: binary file compare...

2009-04-17 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 4:27 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen rha...@gmail.com wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Okay, before I tell you about the empirical, real-world evidence I have, could

Re: binary file compare...

2009-04-17 Thread Tim Wintle
On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: The Wayback Machine has 150 billion pages, so 2**37. Google's index is a bit larger at over a trillion pages, so 2**40. A little closer than I'd like, but that's still 56294995000 to 1 odds of having *any* collisions between *any* of

Re: binary file compare...

2009-04-17 Thread norseman
Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and

Re: binary file compare...

2009-04-17 Thread SpreadTooThin
On Apr 17, 4:54 am, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line.  It

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 5:30 am, Tim Wintle tim.win...@teamrubber.com wrote: On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote: The Wayback Machine has 150 billion pages, so 2**37.  Google's index is a bit larger at over a trillion pages, so 2**40.  A little closer than I'd like, but that's still

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, norseman norse...@hughes.net wrote: The more complicated the math the harder it is to keep a higher form of math from checking (or improperly displacing) a lower one.  Which, of course, breaks the rules.  Commonly called improper thinking. A number of math teasers make use

Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, SpreadTooThin bjobrie...@gmail.com wrote: You know this is just insane. I'd be satisfied with a CRC16 or something in the situation I'm in. I have two large files, one local and one remote. Transferring every byte across the internet to be sure that the two files are
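For the local-versus-remote case described in this message, one common approach is to compute a digest on each side and transfer only the digest. The following is a minimal sketch, not anything posted in the thread: the helper name `sha256_of_file` and the choice of SHA-256 are assumptions made for illustration, and how the remote digest is obtained is left out entirely.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks.

    Hypothetical helper: reading in chunks keeps memory use flat for
    large files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Running the same function on the remote machine and comparing the two hex strings avoids moving the file itself. Equal digests make equal contents overwhelmingly likely rather than certain, which is the point the rest of the thread argues about.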

Re: binary file compare...

2009-04-17 Thread Lawrence D'Oliveiro
In message mailman.3934.1239821812.11746.python-l...@python.org, Nigel Rantor wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software

Re: binary file compare...

2009-04-17 Thread Steven D'Aprano
On Fri, 17 Apr 2009 11:19:31 -0700, Adam Olsen wrote: Actually, *cryptographic* hashes handle that just fine. Even for files with just a 1 bit change the output is totally different. This is known as the Avalanche Effect. Otherwise they'd be vulnerable to attacks. Which isn't to say you

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's

Re: binary file compare...

2009-04-16 Thread Nigel Rantor
Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing

Re: binary file compare...

2009-04-16 Thread Grant Edwards
On 2009-04-16, Adam Olsen rha...@gmail.com wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're

Re: binary file compare...

2009-04-16 Thread SpreadTooThin
On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote: Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 8:59 am, Grant Edwards inva...@invalid wrote: On 2009-04-16, Adam Olsen rha...@gmail.com wrote: I'm afraid you will need to back up your claims with real files. Although MD5 is a smaller, older hash (128 bits, so you only need 2**64 files to find collisions), You don't need

Re: binary file compare...

2009-04-16 Thread Rhodri James
On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen rha...@gmail.com wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Okay, before I tell you about the empirical, real-world evidence I have, could you please accept that hashes collide and that no matter how many samples you use the

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote: And yes he is right CRCs hashing all have a probability of saying that the files are identical when in fact they are not. Here's the bottom line. It is either: A) Several hundred years of mathematics and cryptography are wrong.

Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 4:27 pm, Rhodri James rho...@wildebst.demon.co.uk wrote: On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen rha...@gmail.com wrote: On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote: Okay, before I tell you about the empirical, real-world evidence I have, could you please

Re: binary file compare...

2009-04-15 Thread Steven D'Aprano
On Wed, 15 Apr 2009 07:54:20 +0200, Martin wrote: Perhaps I'm being dim, but how else are you going to decide if two files are the same unless you compare the bytes in the files? I'd say checksums, just about every download relies on checksums to verify you do have indeed the same file.

Re: binary file compare...

2009-04-15 Thread Martin
On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: The checksum does look at every byte in each file. Checksumming isn't a way to avoid looking at each byte of the two files, it is a way of mapping all the bytes to a single number. My understanding

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin mar...@marcher.name wrote: Hi, On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't

Re: binary file compare...

2009-04-15 Thread Grant Edwards
On 2009-04-15, Martin mar...@marcher.name wrote: On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano I'd still say rather burn CPU cycles than development hours (if I got the question right), _Hours_? Calling the file compare module takes _one_line_of_code_. Implementing a file compare from

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Martin wrote: On Wed, Apr 15, 2009 at 11:03 AM, Steven D'Aprano ste...@remove.this.cybersource.com.au wrote: The checksum does look at every byte in each file. Checksumming isn't a way to avoid looking at each byte of the two files, it is a way of mapping all the bytes to a single number. My

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Grant Edwards wrote: We all rail against premature optimization, but using a checksum instead of a direct comparison is premature unoptimization. ;) And more than that, will provide false positives for some inputs. So, basically it's a worse-than-useless approach for determining if two

Re: binary file compare...

2009-04-15 Thread SpreadTooThin
On Apr 15, 8:04 am, Grant Edwards inva...@invalid wrote: On 2009-04-15, Martin mar...@marcher.name wrote: Hi, On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if

Re: binary file compare...

2009-04-15 Thread Adam Olsen
On Apr 15, 11:04 am, Nigel Rantor wig...@wiggly.org wrote: The fact that two md5 hashes are equal does not mean that the sources they were generated from are equal. To do that you must still perform a byte-by-byte comparison which is much less work for the processor than generating an md5 or

Re: binary file compare...

2009-04-15 Thread Nigel Rantor
Adam Olsen wrote: The chance of *accidentally* producing a collision, although technically possible, is so extraordinarily rare that it's completely overshadowed by the risk of a hardware or software failure producing an incorrect result. Not when you're using them to compare lots of files.

binary file compare...

2009-04-14 Thread SpreadTooThin
I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they are the same. What should I be using if not filecmp.cmp?
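As a rough illustration of the question, here is a sketch of two ways to get a content comparison in Python. `filecmp.cmp` with `shallow=False` compares file contents instead of trusting matching `os.stat()` signatures. The hand-written `same_bytes` function is a hypothetical equivalent, shown only to make the byte-by-byte logic visible; its name and chunk size are assumptions.

```python
import filecmp
import os


def files_equal(path_a, path_b):
    """Compare contents with the standard library; shallow=False reads the bytes."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def same_bytes(path_a, path_b, chunk_size=64 * 1024):
    """Hypothetical manual equivalent: compare two regular files chunk by chunk."""
    # Files of different sizes cannot be identical, so skip reading them.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                # Both files reached end of file without a mismatch.
                return True
```

In most cases the one-line `filecmp.cmp(..., shallow=False)` call is enough; the manual version is mainly useful for seeing what a byte-by-byte comparison involves.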

Re: binary file compare...

2009-04-14 Thread Adam Olsen
On Apr 13, 8:39 pm, Grant Edwards gra...@visi.com wrote: On 2009-04-13, Peter Otten __pete...@web.de wrote: But there's a cache. A change of file contents may go undetected as long as the file stats don't change: Good point.  You can fool it if you force the stats to their old values

Re: binary file compare...

2009-04-14 Thread Martin
Hi, On Mon, Apr 13, 2009 at 10:03 PM, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by

Re: binary file compare...

2009-04-13 Thread Przemyslaw Kaminski
SpreadTooThin wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they are the same. What should I be using if not filecmp.cmp? Well,

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:00 pm, Przemyslaw Kaminski cge...@gmail.com wrote: SpreadTooThin wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two files to see if they are the same. Perhaps I'm

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:03 pm, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison of two

Re: binary file compare...

2009-04-13 Thread SpreadTooThin
On Apr 13, 2:37 pm, Grant Edwards inva...@invalid wrote: On 2009-04-13, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm

Re: binary file compare...

2009-04-13 Thread Peter Otten
Grant Edwards wrote: On 2009-04-13, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by

Re: binary file compare...

2009-04-13 Thread Steven D'Aprano
On Mon, 13 Apr 2009 15:03:32 -0500, Grant Edwards wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp function but I don't get a warm fuzzy feeling that it is doing a byte by byte comparison

Re: binary file compare...

2009-04-13 Thread Dave Angel
SpreadTooThin wrote: On Apr 13, 2:37 pm, Grant Edwards inva...@invalid wrote: On 2009-04-13, Grant Edwards inva...@invalid wrote: On 2009-04-13, SpreadTooThin bjobrie...@gmail.com wrote: I want to compare two binary files and see if they are the same. I see the filecmp.cmp

Re: binary file compare...

2009-04-13 Thread Grant Edwards
On 2009-04-13, Peter Otten __pete...@web.de wrote: But there's a cache. A change of file contents may go undetected as long as the file stats don't change: Good point. You can fool it if you force the stats to their old values after you modify a file and you don't clear the cache. -- Grant
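The cache concern raised here can be sidestepped in current Python versions. The sketch below assumes Python 3.4 or later, where `filecmp.clear_cache()` is available; that function did not exist when this thread was written, and the wrapper name `compare_without_cache` is hypothetical.

```python
import filecmp


def compare_without_cache(path_a, path_b):
    """Drop any cached filecmp results, then compare file contents.

    Assumes Python 3.4+, where filecmp.clear_cache() exists.
    """
    # Clearing first means a stale entry cannot mask a change made to a
    # file whose stat values were restored afterwards.
    filecmp.clear_cache()
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Clearing the cache on every call costs a full read of both files each time, which seems a reasonable trade when correctness matters more than speed.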