On Wednesday, 25 February 2009, Chinmay Kamat wrote:
> We had thought of using a smaller hash function. However, the following
> issue arises: the hash value of a block being written to disk is
> calculated and compared against the tree index. If a match is found, we
> are never sure whether the blocks are identical or it's a hash collision.
> So we need to do a byte-by-byte comparison of the two blocks: the block
> currently being written and the block pointed to by the matching tree
> entry. This means doing a disk read for the block pointed to by the tree
> entry, so each detection of a duplicate block has the overhead of a
> block read.
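
To spell out the cost being described, a toy, self-contained sketch of such a
write path could look like the following (the in-memory "disk", the flat index
and the FNV-1a hash are all made up for illustration; none of this is actual
Tux3 code):

/* Toy sketch of a dedup write path with a weak hash plus verification.
 * Nothing here is Tux3 code; the "disk" is an in-memory array and there
 * is no bounds checking. */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 1024

static char disk[MAX_BLOCKS][BLOCK_SIZE];   /* stand-in for the volume   */
static uint32_t index_hash[MAX_BLOCKS];     /* weak hash -> block number */
static int nblocks;

static uint32_t weak_hash(const void *p)    /* deliberately small hash   */
{
	const unsigned char *b = p;
	uint32_t h = 2166136261u;
	for (int i = 0; i < BLOCK_SIZE; i++)
		h = (h ^ b[i]) * 16777619u; /* FNV-1a */
	return h;
}

/* Returns the block number the data ends up at; *was_dup tells the
 * caller whether an existing block was reused. */
static int write_dedup(const void *data, int *was_dup)
{
	uint32_t h = weak_hash(data);

	for (int i = 0; i < nblocks; i++) {
		if (index_hash[i] != h)
			continue;
		/* Weak hash matched, but it might still be a collision,
		 * so the candidate block has to be read back and compared
		 * byte by byte -- this is the extra block read. */
		if (memcmp(disk[i], data, BLOCK_SIZE) == 0) {
			*was_dup = 1;
			return i;           /* true duplicate */
		}
	}
	/* No duplicate (or only collisions): write a new block. */
	memcpy(disk[nblocks], data, BLOCK_SIZE);
	index_hash[nblocks] = h;
	*was_dup = 0;
	return nblocks++;
}

int main(void)
{
	char buf[BLOCK_SIZE] = "hello";
	int dup;

	printf("block %d dup=%d\n", write_dedup(buf, &dup), dup);
	printf("block %d dup=%d\n", write_dedup(buf, &dup), dup);
	return 0;
}

The second write in main() takes the duplicate path, and that is exactly where
the verification cost shows up: here it is just a memcmp(), but on a real
volume it is a read from disk.
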
I thought about that last night, and came to a similar idea as Michael:

On Wednesday, 25 February 2009, Michael Keulkeul wrote:
> If someone asked me, I would answer that verification is necessary and
> that using a weaker, smaller hash is fine.
> Storing the block with its hash, marked "non-deduplicated", is fine; just
> dedup it later with a background process when the filesystem has some
> idle iops to spend on it, and mark it "deduplicated" when done.
> I've the feeling that the tux3 design is well suited to such a thing
> (multiple trees, just add one to the forest).

How about not doing deduplication on *every* block, but only for specially
marked files? I.e. the whole root filesystem is not marked; but if I create
a big file that is to be used as a loopback mount, and I want to use several
versions of it, I could mark it for deduplication (reminds me of
Copy-On-Write semantics), which would cause it to be indexed in some way.
If I need a copy, I just mark the file to be written as "to be
de-duplicated", and the copying process finds the duplicate blocks. (Or
maybe there's some special "cowcp" for Tux3, since a real copy only needs
the extents refcounted?)

Then there would be far fewer hash values computed, therefore fewer
collisions, and there would be *much* less read traffic.

Well, maybe the SSD people would create a new command: "compare this data
against blocks A, B, C and D; if there's a match, return the block number;
else write into block E". As an SSD has no seek time, this should be (on
average) faster than normal writes.

[ A similar command would be good for RAID sets and online verification,
BTW ]

> On the other hand, in case of a larger hash (SHA1), when 2 hash values
> match, the blocks should be duplicates, assuming the chances of collision
> with large hashes are very remote.

Probably more remote than hw failure. Well, IMO even a remote chance of
unseen data damage is bad.

Regards,

Phil
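
P.S.: To make the "compare against blocks A..D, else write into E" idea a bit
more concrete: this is purely hypothetical, no SSD or kernel exposes such a
command today, and the function below merely simulates in memory what I
imagine the drive would do internally.

/* Hypothetical "compare-or-write" command, simulated in memory.
 * No real device or driver provides anything like this. */
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 4096
#define NBLOCKS    16

static char flash[NBLOCKS][BLOCK_SIZE];   /* stand-in for the SSD media */

/* "Compare the data against candidate blocks A..D; on a match return that
 * block number, otherwise write into the fallback block E and return E." */
static int compare_or_write(const void *data,
                            const int *candidates, int ncand,
                            int fallback)
{
	for (int i = 0; i < ncand; i++)
		if (memcmp(flash[candidates[i]], data, BLOCK_SIZE) == 0)
			return candidates[i];      /* duplicate found on-device */

	memcpy(flash[fallback], data, BLOCK_SIZE); /* no match: normal write */
	return fallback;
}

int main(void)
{
	char buf[BLOCK_SIZE] = "some data";
	int cand[4] = { 1, 2, 3, 4 };

	memcpy(flash[3], buf, BLOCK_SIZE);  /* pretend block 3 already holds it */
	printf("stored at block %d\n", compare_or_write(buf, cand, 4, 5));
	return 0;
}

Since the comparison would happen inside the device, the host never pays for
the verification read, which is the whole point of such a command.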