Jörn Engel wrote:
>> There is vunify which is part of util-vserver package. What is better for general usage is Sam's unify-dirs script. It is located at http://mirrors.paul.sladen.org/sam.vilain.net/vserver/unify-dirs.
>> Just use this without the -l or -i options and it will just do hard links, which is what you want for the current implementation of cow links.
> Darn! Doesn't work for me yet. One personal problem is a slow (USB1) 300GB hard drive that contains some identical files. I was thinking about hashing only the first 4k or so of each file and then doing a direct comparison when the hashes match. Even with sha1 over the complete file, there is no guarantee that matching hashes mean two identical files.
The chances of bits on your hard drive platter randomly losing their magnetism or capacitors in your RAM losing charge and changing are probably higher than two different files having an SHA1 collision :-). Hey, maybe *that's* why I get those random reiserfs corruptions!
Hashing only the first block of the file as an optimisation is a sensible idea.
The script could easily be modified to do this as a separate step. However, bear in mind that it will only even consider checking a file's contents if the files already have the same owner/group/permissions, relative path and file size. My assumption was that if these all match, the files are probably going to be the same anyway.
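To make the order of checks concrete, here is a minimal sketch in Python. It is not the unify-dirs code; the function names and the 4096-byte prefix are assumptions for illustration. Files are grouped by relative path, owner, group, permissions and size first, then by a SHA1 of the first block, and only the survivors are compared byte for byte.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

PREFIX_BYTES = 4096  # "the first 4k or so"; the exact size is an assumption


def metadata_key(root, relpath):
    """Cheap key: relative path, owner, group, permissions and size."""
    st = os.lstat(os.path.join(root, relpath))
    return (relpath, st.st_uid, st.st_gid, st.st_mode, st.st_size)


def prefix_hash(path, nbytes=PREFIX_BYTES):
    """SHA1 over only the first nbytes of the file."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read(nbytes)).hexdigest()


def find_identical(roots, relpath):
    """Return groups of copies of relpath (one per root) that are identical."""
    by_meta = defaultdict(list)
    for root in roots:
        path = os.path.join(root, relpath)
        if os.path.isfile(path) and not os.path.islink(path):
            by_meta[metadata_key(root, relpath)].append(path)

    groups = []
    for paths in by_meta.values():
        if len(paths) < 2:
            continue  # nothing to unify with
        by_prefix = defaultdict(list)
        for p in paths:
            by_prefix[prefix_hash(p)].append(p)
        for remaining in by_prefix.values():
            # A matching prefix hash is only a hint, so confirm with a
            # full byte-for-byte comparison before reporting a group.
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [p for p in rest
                                  if filecmp.cmp(first, p, shallow=False)]
                if len(same) > 1:
                    groups.append(same)
                remaining = [p for p in rest if p not in same]
    return groups
```

On a slow disk the prefix hash reads one block per candidate, so files that differ early are rejected without reading them in full.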
> Also, I want a database with all already known files. Ultimately this could be turned into a daemon that watches the complete fs tree for changes and turns new files into cowlinks shortly after creation. With such a daemon, "cp -r" will temporarily flush part of the page cache, but otherwise have the same result as "cowcopy -r".
Nice idea, but I think on UNIX that's pretty much a can of worms with no easy answer. You'd need something in the kernel that notifies userland when any inode on a filesystem changes. Have a look at the intermezzo module if you want to go down that path. If you can provide the kernel half, I'll be more than happy to extend unify-dirs to work with it :).
Failing active monitoring, a simple compromise is possible: there's no reason that unify-dirs couldn't optionally store its internal inode/stat/SHA1 hash cache in a Berkeley database, with the script run every hour or so via cron. It would certainly prevent the copious stat()'ing that the script does, at the expense of not noticing unlikely unification situations until the DB cache entries expire.
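A rough sketch of such a persistent cache, in Python rather than as a patch to unify-dirs. Everything here is an assumption for illustration: the standard-library dbm module stands in for a Berkeley database, and the record layout, the cached_sha1 name and the one-hour expiry are made up.

```python
import dbm
import hashlib
import json
import os
import time

MAX_AGE = 3600  # seconds a cache entry is trusted; hypothetical value


def sha1_file(path):
    """SHA1 over the complete file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_sha1(db, path, now=None, max_age=MAX_AGE):
    """Return the SHA1 of path, consulting the on-disk cache first."""
    now = time.time() if now is None else now
    key = os.fsencode(path)
    raw = db.get(key)
    entry = json.loads(raw) if raw is not None else None

    # Fresh entry: trust it without even calling stat().
    if entry is not None and now - entry["checked"] < max_age:
        return entry["sha1"]

    # Expired or missing: stat() and rehash only if the file looks different.
    st = os.lstat(path)
    sig = [st.st_ino, st.st_size, st.st_mtime_ns]
    if entry is not None and entry["sig"] == sig:
        digest = entry["sha1"]
    else:
        digest = sha1_file(path)
    record = {"sig": sig, "sha1": digest, "checked": now}
    db[key] = json.dumps(record).encode()
    return digest
```

A cron run would open the cache with dbm.open(cache_path, "c") and call cached_sha1 for each candidate. The trade-off is the one described above: a file that changes inside the expiry window keeps its old hash until the entry expires.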
Of course, it would still absolutely hammer the VFS with readdir() calls every time it runs, and find all those glorious reiserfs corner case bugs, but in my experience with a "handful" (say, 30) of vservers that are already mostly unified, the script completes in under a minute when unifying just the OS (e.g. /usr, /lib, /sbin and /bin).
Who knows, maybe there are other optimizations possible, like only stat()'ing the leaf directories in the hierarchy to see if any files have been added or removed before actually using readdir() to read them. Again, this will not catch some unlikely unification situations until full stat()'ing happens.
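The directory check could look something like the following Python sketch. It relies on a directory's mtime changing when an entry is added, removed or renamed in it. The snapshot and changed_dirs names are hypothetical, and for simplicity it checks every directory recorded by the previous full scan rather than only the leaves.

```python
import os


def snapshot(root):
    """Full scan: record the mtime of every directory under root."""
    return {dirpath: os.stat(dirpath).st_mtime_ns
            for dirpath, _dirnames, _filenames in os.walk(root)}


def changed_dirs(previous):
    """Return the directories whose entries changed since the snapshot.

    Costs one stat() per known directory and no readdir() at all.
    """
    changed = []
    for dirpath, old_mtime in previous.items():
        try:
            mtime = os.stat(dirpath).st_mtime_ns
        except FileNotFoundError:
            changed.append(dirpath)  # the directory itself was removed
            continue
        if mtime != old_mtime:
            changed.append(dirpath)
    return changed
```

Only the directories returned by changed_dirs would need a readdir() pass. A file rewritten in place leaves its directory's mtime alone, so it stays unnoticed until the next full scan.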
Sam.
