Jörn Engel wrote:
The chances of bits on your hard drive platter randomly losing their
magnetism or capacitors in your RAM losing charge and changing are probably higher than two different files having an SHA1 collision :-).
I used to have the same opinion. Then I read this: http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html
relative path and file size. My assumption was that if these all match,
the files are probably going to be the same anyway.
In that case, you can ignore the hashes anyway: do a direct comparison, and nothing is lost.
I think this is ultimately a matter of faith. Personally, my gut feeling is that the cryptographers know more than the skeptics, and I wait with keen interest for the skeptics to show that the birthday paradox actually happens in real life when applied to SHA-1. So, every extra hash bit is actually only a factor of sqrt(2) of extra randomness. But sqrt(2^160) is still 2^80, which is still a very large number (compare sqrt(365) ≈ 19).
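To put rough numbers on the birthday argument, here is a small Python sketch (my own illustration, not part of unify-dirs; the function name is hypothetical). It uses the standard approximation p ≈ 1 - exp(-n^2 / 2N) for the chance that n random digests drawn from N possible values contain at least one collision.

```python
import math


def collision_probability(n_items: int, space_size: int) -> float:
    """Approximate chance that at least two of n_items values drawn
    uniformly at random from space_size possibilities are equal.

    Uses the birthday approximation p ~= 1 - exp(-n^2 / (2 * N)).
    """
    x = (n_items * n_items) / (2.0 * space_size)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x
    return -math.expm1(-x)


if __name__ == "__main__":
    # Classic birthday paradox: roughly sqrt(365) people give a real chance.
    print(round(math.sqrt(365), 1))                         # 19.1
    print(round(collision_probability(23, 365), 2))         # 0.52
    # SHA-1 has 160-bit digests, so collisions only get likely near 2^80 items.
    print(round(collision_probability(2**80, 2**160), 2))   # 0.39
    # A billion random files is nowhere near that point.
    print(f"{collision_probability(10**9, 2**160):.1e}")    # 3.4e-31
```

This assumes SHA-1 behaves like a uniformly random function over honest (non-adversarial) file contents, which is exactly the point under debate; it says nothing about deliberately constructed collisions.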
When I read the original CryptoBytes newsletter about the MD5 hash function weakness, I was left with the impression that the only thing they thought possible was *inserted* blocks that do not affect the MD5 value. The actual MD5 hash function has (to my knowledge) no known flaw. This is one of the things that they fixed with the SHA-* suite.
Each to their own of course! Maybe a full comparison should be the default behaviour, but personally I'm happy with the digest.
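For what it's worth, making the full comparison an option rather than the default is cheap to express. The Python sketch below is illustrative only: `sha1_of`, `same_contents` and the `full_compare` flag are hypothetical names, not unify-dirs' actual interface. It rejects on size first, then on the SHA-1 digest, and only does a byte-by-byte check (via the stdlib `filecmp.cmp` with `shallow=False`) when asked.

```python
import filecmp
import hashlib
import os
import tempfile


def sha1_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(a: str, b: str, full_compare: bool = False) -> bool:
    """Decide whether two files hold identical data.

    Cheap check first (size), then the SHA-1 digest. If full_compare is
    True, a digest match is confirmed byte by byte, so a hash collision
    can never cause two different files to be treated as identical.
    """
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    if sha1_of(a) != sha1_of(b):
        return False  # different digests always mean different data
    if full_compare:
        return filecmp.cmp(a, b, shallow=False)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name, data in (("a", b"libc"), ("b", b"libc"), ("c", b"libm")):
            p = os.path.join(d, name)
            with open(p, "wb") as f:
                f.write(data)
            paths.append(p)
        print(same_contents(paths[0], paths[1], full_compare=True))  # True
        print(same_contents(paths[0], paths[2]))                     # False
```

Because mismatched digests already prove the files differ, the extra read only happens on the rare digest match, so the "paranoid" mode costs little beyond reading both files once more.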
Failing active monitoring, a simple compromise would be for unify-dirs to optionally store its internal inode/stat/SHA1 hash cache in a Berkeley database and run the script every hour or so via cron. That would certainly avoid the copious stat() calls the script makes, at the expense of not noticing unlikely unification opportunities until the DB cache entries expire.
Your problem is simpler, compared to the one I want to solve. Also,
with final cowlinks, it's perfectly sane to combine two files with
different owners, permissions, [amc]times, etc. Both will have
separate inodes, just the data is identical.
Yes, if you can do some kind of kernel-side inode -> inode "semi-soft" (Bagua?) link like COWlinks, you get these advantages.
It just means the unification script has to save a lot more state information, which for my purposes I consider unnecessary; but then, I'm mostly concerned with saving space for system libraries and binaries across a whole load of virtually identical vservers.
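The Berkeley DB cache compromise mentioned above could look something like the following Python sketch. Assumptions, all mine: Python 3 no longer ships a Berkeley DB binding, so the stdlib `dbm` module stands in for it; the key layout, `CACHE_TTL` and `cached_entry` are hypothetical and not how unify-dirs actually stores its state; the cron job that would run it is not shown.

```python
import dbm
import hashlib
import os
import tempfile
import time

# Hypothetical expiry: entries older than this are re-checked with stat().
CACHE_TTL = 3600.0  # seconds, matching an hourly cron run


def sha1_file(path: str) -> str:
    """Hex SHA-1 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_entry(db, path: str, now=None):
    """Return (dev, ino, size, sha1) for path.

    A fresh cache entry is trusted without calling stat(), which is the
    point of the cache; the cost is that changes go unnoticed until the
    entry is older than CACHE_TTL.
    """
    now = time.time() if now is None else now
    key = path.encode()
    raw = db.get(key)
    if raw is not None:
        stamp, dev, ino, size, digest = raw.decode().split(":")
        if now - float(stamp) < CACHE_TTL:
            return int(dev), int(ino), int(size), digest
    # Missing or expired entry: stat and hash the file, then store it.
    st = os.stat(path)
    digest = sha1_file(path)
    db[key] = f"{now}:{st.st_dev}:{st.st_ino}:{st.st_size}:{digest}".encode()
    return st.st_dev, st.st_ino, st.st_size, digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "libfoo.so")
        with open(target, "wb") as f:
            f.write(b"example contents")
        with dbm.open(os.path.join(d, "unify-cache"), "c") as db:
            print(cached_entry(db, target)[3])  # computed and cached
            print(cached_entry(db, target)[3])  # served from the cache
```

The trade-off is the one described in the thread: within the TTL window a changed file keeps its old digest, so a byte-for-byte check before actually linking would be a sensible safety net.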
-- Sam Vilain, sam /\T vilain |><>T net, PGP key ID: 0x05B52F13 (include my PGP key ID in personal replies to avoid spam filtering)
_______________________________________________ Vserver mailing list [EMAIL PROTECTED] http://list.linux-vserver.org/mailman/listinfo/vserver
