We had thought of using a smaller hash function. However, the following issue arises: the hash of a block being written to disk is computed and looked up in the tree index. If a match is found, we can never be sure whether the blocks are identical or it is a hash collision, so we must do a byte-by-byte comparison of the two blocks: the block being written and the block pointed to by the matching tree entry. That means a disk read to fetch the block the tree entry points to, so every detection of a duplicate block carries the overhead of one block read.
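To make the extra I/O concrete, here is a minimal sketch of that write path, with a toy in-memory "disk" and a 32-bit CRC standing in for the smaller hash. The names (dedup_write, read_block) are illustrative, not Tux3 code:

```python
import zlib

disk = []     # block number -> block contents
index = {}    # small hash -> block number

def read_block(blocknr):
    """Stand-in for a disk read: this is the extra I/O a small hash forces."""
    return disk[blocknr]

def dedup_write(block):
    h = zlib.crc32(block)          # small (32-bit) hash; collisions are realistic
    if h in index:
        candidate = index[h]
        # A hash match alone is not proof of identity, so we pay one
        # block read and compare the two blocks byte by byte.
        if read_block(candidate) == block:
            return candidate       # true duplicate: reuse the existing block
        # Otherwise it was a collision; fall through and write a new block.
    disk.append(block)
    index[h] = len(disk) - 1
    return len(disk) - 1
```

Writing the same data twice returns the same block number, but only after the verifying read; every deduplicated write pays that read.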
On the other hand, with a larger hash (SHA1), when two hash values match the blocks can be treated as duplicates, on the assumption that the chance of a collision with a large hash is very remote, probably more remote than hardware failure. The tree contains only the first 64 bits of the SHA1 hash, which saves some space (we are working on handling collisions of this 64-bit prefix). As for the SHA1 implementation, as Daniel mentioned in his mail, we will be looking at implementations other than OpenSSL before the kernel port.

Regards,
Gaurav Tungatkar
Chinmay Kamat
Kushal Dalmia
Amey Magar
_______________________________________________
Tux3 mailing list
Tux3@tux3.org
http://mailman.tux3.org/cgi-bin/mailman/listinfo/tux3
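For reference, deriving the 64-bit tree key from the full SHA1 digest can be sketched like this (an illustrative helper, not the Tux3 source; the big-endian byte order is an assumption):

```python
import hashlib
import struct

def tree_key(block):
    """First 64 bits of the block's SHA1 digest, used as the tree index key.
    With a full-strength hash, a match on the full digest is treated as a
    duplicate without re-reading the candidate block; two distinct blocks
    can still share this 64-bit prefix, which is the collision case the
    mail says is being worked on."""
    digest = hashlib.sha1(block).digest()       # 160-bit digest
    return struct.unpack(">Q", digest[:8])[0]   # 64-bit prefix as an integer
```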