> -----Original Message-----
> From: Philipp Marek [mailto:philipp.ma...@emerion.com]
> Sent: Thursday, February 26, 2009 8:23 AM
>
> On Wednesday, 25 February 2009, Chinmay Kamat wrote:
> > We had thought of using a smaller hash function. However, the
> > following issue arises: the hash value of a block being written to
> > disk is calculated and compared in the tree index. If a match is
> > found, we are never sure whether the blocks are identical or it's a
> > hash collision. So we need to do a byte-by-byte comparison of the
> > two blocks: the current block being written and the block pointed
> > to by the matching tree entry. This would mean doing a disk read to
> > fetch the block pointed to by the tree entry. So each detection of
> > a duplicate block will have the overhead of a block read.
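To make the scheme quoted above concrete, here is a rough sketch in
Python. It is not tux3 code: the DedupStore class, the dict standing in
for the tree index, the list standing in for the disk, and the 4-byte
truncated SHA-1 are all assumptions made up for illustration. It only
shows the point under discussion: a hash match is just a hint, and
every hint costs one block read to confirm.

```python
import hashlib


def short_hash(data: bytes) -> bytes:
    # Hypothetical "smaller hash": the first 4 bytes of SHA-1.
    return hashlib.sha1(data).digest()[:4]


class DedupStore:
    """Toy in-memory block store (illustrative only, not tux3 code)."""

    def __init__(self, hash_fn=short_hash):
        self.hash_fn = hash_fn
        self.blocks = []       # stands in for the disk: block number -> bytes
        self.index = {}        # stands in for the tree index: hash -> block numbers
        self.verify_reads = 0  # extra reads spent on verification

    def _read_block(self, blockno: int) -> bytes:
        # Every verification costs one block read.
        self.verify_reads += 1
        return self.blocks[blockno]

    def write(self, data: bytes) -> int:
        """Store a block and return the block number that now holds it."""
        key = self.hash_fn(data)
        for candidate in self.index.get(key, []):
            # A hash match may be a collision, so compare byte by byte.
            if self._read_block(candidate) == data:
                return candidate  # true duplicate: share the existing block
        # No match, or only collisions: store the block as new.
        self.blocks.append(data)
        blockno = len(self.blocks) - 1
        self.index.setdefault(key, []).append(blockno)
        return blockno


if __name__ == "__main__":
    # Force every block to collide to exercise the verification path.
    store = DedupStore(hash_fn=lambda data: b"x")
    first = store.write(bytes(4096))
    other = store.write(b"\x01" * 4096)
    again = store.write(bytes(4096))
    print(first, other, again, store.verify_reads)
```

With the constant hash in the demo, the second write collides with the
first but is kept as a separate block after the comparison fails, while
the third write is recognized as a real duplicate of the first. Skipping
the comparison would silently alias two different blocks.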
If deduplication is done at times of reduced I/O, I don't see the
problem with doing an extra read for verification. Any hash value will
have collisions, and loss of data is an absolute no-no for filesystems.

> I thought about that last night, and came to a similar idea
> as Michael:
>
> On Wednesday, 25 February 2009, Michael Keulkeul wrote:
> > If someone asked me, I would answer that verification is necessary
> > and that using a weaker, smaller hash is fine.
> > Storing the block with its hash, marked "non deduplicated", is
> > fine; just dedup it later with a background process when the
> > filesystem has some idle IOPS to spend on it, and mark it
> > "deduplicated" when done.
> > I have the feeling that the tux3 design is neat for doing such a
> > thing (multiple trees, just add one to the forest).
>
> How about not doing deduplication on *every* block, but only
> for specially marked files?

This might reduce the problem of the extra I/O needed to verify that
blocks are identical.

On a side note regarding security: there should be no way for any user
other than root to see whether a disk block is duplicated. Otherwise
information can leak to people who are not allowed to read another
user's files.

--
Stefan

_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3