> Even better would be using the ZFS block checksums (assuming we are only
> summing the data, not its position or time :)...
>
> Then we could have two files that have 90% the same blocks, and still
> get some dedup value... ;)
Yes, but you will need to add some sort of highly collision-resistant checksum (SHA + MD5, maybe), plus code to (a) bit-level compare blocks on a hash collision (100% bit verification) and (b) handle linked or cascaded collision tables (two or more blocks with the same hash but differing bits).

I actually coded some of this and was playing with it. My testbed relied on another internal data store to track hash maps, collisions (dedup lists), and collision cascades (kind of like what Perl does with hash-key collisions). It turned out to be a real pain once snaps and clones were taken into account. I decided to wait until the resilver/grow/remove code is in place, as that seems to be part of the puzzle.

-Wade

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
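For anyone curious, here is a minimal Python sketch of the scheme I was describing: a hash table keyed by block checksum, with full bit-level verification on a hash match and a cascade (list) of entries for true collisions. All names here are my own invention for illustration; this is not ZFS code, and real dedup would track refcounts on disk, not in memory.

```python
import hashlib

class BlockStore:
    """Toy block-level dedup table: digest -> cascade of [block, refcount].

    Hypothetical sketch of hash-based dedup with bit verification;
    not related to any actual ZFS implementation.
    """

    def __init__(self):
        self.table = {}  # hex digest -> list of [block_bytes, refcount]

    def put(self, block):
        """Store a block; return (digest, index into the collision cascade)."""
        digest = hashlib.sha256(block).hexdigest()
        cascade = self.table.setdefault(digest, [])
        # Bit-level verify on hash match: walk the cascade and compare
        # the full block contents before treating it as a duplicate.
        for idx, entry in enumerate(cascade):
            if entry[0] == block:
                entry[1] += 1          # true duplicate: bump refcount
                return digest, idx
        # New block -- or a genuine hash collision (same digest, differing
        # bits) -- gets its own entry appended to the cascade.
        cascade.append([block, 1])
        return digest, len(cascade) - 1
```

Writing the same block twice lands on the same cascade entry with a refcount of 2, while a block that merely shared a digest (vanishingly unlikely with SHA-256, but the point of the exercise) would get index 1 in the same cascade instead of being wrongly deduplicated.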