> Even better would be using the ZFS block checksums (assuming we are only
> summing the data, not its position or time :)...
>
> Then we could have two files that have 90% the same blocks, and still
> get some dedup value... ;)

Yes, but you will need to add some sort of highly collision-resistant
checksum (SHA+MD5, maybe) plus code to (a) bit-level compare blocks on
collision (100% bit verification), and (b) handle linked or cascaded
collision tables (2+ blocks with the same hash but differing bits).  I
actually coded some of this and was playing with it.  My testbed relied on
another internal data store to track hash maps, collisions (dedup lists),
and collision cascades (much like what perl does with hash key
collisions).  It turned out to be a real pain once snapshots and clones
were taken into account.  I decided to wait until the resilver/grow/remove
code was in place, as that seems to be part of the puzzle.
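For what it's worth, the collision handling boils down to something like
the sketch below.  This is illustrative only -- dedup_entry_t,
dedup_lookup, and the sizes are names I made up here, not anything from
my testbed or from ZFS itself:

    /* Minimal sketch of chained dedup collision handling. */
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE 131072   /* 128K record, for illustration */
    #define HASH_SIZE  32       /* e.g. a SHA-256 digest */

    typedef struct dedup_entry {
        unsigned char        hash[HASH_SIZE]; /* block checksum */
        unsigned char       *data;            /* the block's bits */
        struct dedup_entry  *next;            /* cascade: same hash,
                                                 differing bits */
    } dedup_entry_t;

    /*
     * Look up a candidate block in its hash bucket.  An existing
     * entry is returned only after a full bit-level compare (the
     * "100% bit verification" step); hash matches with differing
     * data stay chained, like perl's key collisions.  Returns NULL
     * if the block is new and was inserted.
     */
    dedup_entry_t *
    dedup_lookup(dedup_entry_t **bucket, const unsigned char *hash,
        const unsigned char *data)
    {
        dedup_entry_t *e;

        for (e = *bucket; e != NULL; e = e->next) {
            if (memcmp(e->hash, hash, HASH_SIZE) != 0)
                continue;
            if (memcmp(e->data, data, BLOCK_SIZE) == 0)
                return (e);   /* true duplicate: share this block */
            /* collision with differing bits: walk the chain */
        }

        /* No match: chain a new entry at the head of the bucket.
         * A real store would reference the block as written out,
         * not the caller's buffer. */
        e = malloc(sizeof (*e));
        if (e == NULL)
            return (NULL);
        memcpy(e->hash, hash, HASH_SIZE);
        e->data = (unsigned char *)data;
        e->next = *bucket;
        *bucket = e;
        return (NULL);
    }

The snapshot/clone pain comes in once you try to keep those chains
consistent with blocks that are already referenced many times over.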


-Wade
