On Fri, 6 August 2004 12:53:43 +1200, Sam Vilain wrote:
>
> The chances of bits on your hard drive platter randomly losing their
> magnetism or capacitors in your RAM losing charge and changing are
> probably higher than two different files having an SHA1 collision :-).

I used to have the same opinion.  Then I read this:
http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/hash.html

> Hashing only the first block of the file as an optimisation is a
> sensible idea.

Yes.

> The script could be easily modified to do this as a separate step,
> however bear in mind that it will only even consider checking the file's
> contents if the files already have the same owner/group/permissions,
> relative path and file size.  My assumption was that if these all match,
> the files are probably going to be the same anyway.

In that case, you can skip the hashes altogether.  Do a direct
comparison, nothing lost.

> Nice idea, but I think on UNIX that's pretty much a can of worms with no
> easy answer.  You'd need something in the kernel that notifies userland
> when any inode on a filesystem changes.  Have a look at the intermezzo
> module if you want to go down that path.  If you can provide the kernel
> half, I'll be more than happy to extend unify-dirs to work with it :).

Yes, I know.  Quite a few people have tried it already; Al Viro didn't
like any of it.

> Failing active monitoring, as a simple compromise there's no reason that
> unify-dirs couldn't optionally store its internal inode/stat/SHA1 hash
> cache in a Berkeley database, and run the script every hour or so via
> cron.  It would certainly prevent the copious stat()'ing that the script
> does, at the expense of not noticing unlikely unification situations
> until the DB cache entries expire.
>
> Of course, it would still absolutely hammer the VFS every time it runs
> with readdir() calls and find all those glorious reiserfs corner case
> bugs, but in my experience with a "handful" (say, 30) of vservers that
> are already mostly unified the script completes in under a minute when
> unifying just the OS (eg, /usr, /lib, /sbin and /bin).
>
> Who knows, maybe there are other optimizations possible - like only
> stat()'ing the leaf directories in the hierarchy, to see if any files
> have been added or removed before actually using readdir() to read them.
> Again this will not catch some unlikely unification situations until
> full stat()'ing happens.

Your problem is simpler, compared to the one I want to solve.  Also,
with final cowlinks, it's perfectly sane to combine two files with
different owners, permissions, [amc]times, etc.  Both will have
separate inodes, just the data is identical.

Jörn

--
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu
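To illustrate the direct-comparison suggestion made in the reply above, here is a minimal sketch in Python.  It is not how unify-dirs works; the function names (same_metadata, same_content, can_unify) and the 64 KiB read size are made up for this example.  It assumes the pre-filter described in the quoted text (same owner, group, permissions and size) and then compares the bytes directly, stopping at the first difference, so no hash is computed at all.

```python
import os
import stat

CHUNK_SIZE = 64 * 1024  # arbitrary read size, chosen only for illustration


def same_metadata(st_a, st_b):
    """True if owner, group, permission bits and size all match."""
    return (
        st_a.st_uid == st_b.st_uid
        and st_a.st_gid == st_b.st_gid
        and stat.S_IMODE(st_a.st_mode) == stat.S_IMODE(st_b.st_mode)
        and st_a.st_size == st_b.st_size
    )


def same_content(path_a, path_b):
    """Compare two files chunk by chunk, stopping at the first difference."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(CHUNK_SIZE)
            chunk_b = fb.read(CHUNK_SIZE)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files reached EOF at the same point
                return True


def can_unify(path_a, path_b):
    """Hypothetical check: cheap metadata filter first, then a direct compare."""
    st_a = os.lstat(path_a)
    st_b = os.lstat(path_b)
    # Only regular files are candidates; symlinks, devices etc. are skipped.
    if not (stat.S_ISREG(st_a.st_mode) and stat.S_ISREG(st_b.st_mode)):
        return False
    if not same_metadata(st_a, st_b):
        return False
    return same_content(path_a, path_b)
```

When the two files really are identical, which is the expected case once the metadata matches, this reads each file once, the same I/O a hash would need, and it cannot be fooled by a collision.  Hashes only start to pay off when one file has to be matched against many candidates.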
