Grant Edwards wrote:

> On 2009-04-13, Grant Edwards <inva...@invalid> wrote:
>> On 2009-04-13, SpreadTooThin <bjobrie...@gmail.com> wrote:
>>
>>> I want to compare two binary files and see if they are the same.
>>> I see the filecmp.cmp function but I don't get a warm fuzzy feeling
>>> that it is doing a byte by byte comparison of two files to see if they
>>> are the same.
>>
>> Perhaps I'm being dim, but how else are you going to decide if
>> two files are the same unless you compare the bytes in the
>> files?
>>
>> You could hash them and compare the hashes, but that's a lot
>> more work than just comparing the two byte streams.
>>
>>> What should I be using if not filecmp.cmp?
>>
>> I don't understand what you've got against comparing the files
>> when you stated that what you wanted to do was compare the files.
>
> Doh! I misread your post and thought you weren't getting a
> warm fuzzy feeling _because_ it was doing a byte-byte
> compare. Now I'm a bit confused. Are you under the impression
> it's _not_ doing a byte-byte compare? Here's the code:
>
> def _do_cmp(f1, f2):
>     bufsize = BUFSIZE
>     fp1 = open(f1, 'rb')
>     fp2 = open(f2, 'rb')
>     while True:
>         b1 = fp1.read(bufsize)
>         b2 = fp2.read(bufsize)
>         if b1 != b2:
>             return False
>         if not b1:
>             return True
>
> It looks like a byte-by-byte comparison to me. Note that when
> this function is called the file lengths have already been
> compared and found to be equal.

But there's a cache. A change of file contents may go undetected as long as
the file stats don't change:

$ cat fool_filecmp.py
import filecmp, shutil, sys

for fn in "adb":
    with open(fn, "w") as f:
        f.write("yadda")

shutil.copystat("d", "a")
filecmp.cmp("a", "b", False)

with open("a", "w") as f:
    f.write("*****")
shutil.copystat("d", "a")

if "--clear" in sys.argv:
    print "clearing cache"
    filecmp._cache.clear()

if filecmp.cmp("a", "b", False):
    print "file a and b are equal"
else:
    print "file a and b differ"

print "a's contents:", open("a").read()
print "b's contents:", open("b").read()

$ python2.6 fool_filecmp.py
file a and b are equal
a's contents: *****
b's contents: yadda

Oops. If you are paranoid you have to clear the cache before doing the
comparison:

$ python2.6 fool_filecmp.py --clear
clearing cache
file a and b differ
a's contents: *****
b's contents: yadda

Peter
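
A note for readers on Python 3: since Python 3.4 the filecmp module has a
public filecmp.clear_cache() function, so there is no need to reach into the
private filecmp._cache dict. If you would rather not depend on the cache at
all, a small helper that always rereads both files is one option. What follows
is only a sketch: the name files_identical and the 64 KiB chunk size are my
own choices for illustration and are not part of filecmp.

```python
import os


def files_identical(path1, path2, chunk_size=64 * 1024):
    """Return True if both files hold exactly the same bytes.

    Hypothetical helper, not part of filecmp: it keeps no cache,
    so every call rereads both files from disk.
    """
    # Files of different sizes can never match, so skip reading them.
    if os.path.getsize(path1) != os.path.getsize(path2):
        return False
    with open(path1, "rb") as fp1, open(path2, "rb") as fp2:
        while True:
            b1 = fp1.read(chunk_size)
            b2 = fp2.read(chunk_size)
            if b1 != b2:
                return False
            if not b1:  # both files reached EOF at the same point
                return True
```

Used in place of filecmp.cmp("a", "b", False) in the script above, this
reports the two files as different without any cache clearing, at the cost of
rereading both files on every call.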