On Wed, Dec 28, 2011 at 3:14 PM, Brad Diggs <brad.di...@oracle.com> wrote:
> The two key takeaways from this exercise were as follows. There is
> tremendous caching potential through the use of ZFS deduplication.
> However, the current block-level deduplication does not benefit
> directory as much as it perhaps could if deduplication occurred at the
> byte level rather than the block level. It very well could be that
> byte-level deduplication doesn't work well either. Until that option
> is available, we won't know for sure.
How would byte-level dedup even work? My best idea would be to apply
the rsync algorithm: roll the rsync weak checksum over the data, looking
for small chunks whose checksum matches that of an existing chunk
(which then has to be read and compared to confirm the match), and so
on. The result would be incredibly slow on write and would have huge
storage overhead. On the read side you could need many more I/Os too,
so reads would get much slower as well. I suspect any other byte-level
dedup solution would be similarly lousy. There would be no real net
savings, so the idea is not worthwhile.
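
To make the rolling-checksum idea concrete, here is a rough Python sketch of what such a search might look like. It is only an illustration, not anything ZFS does: the names (CHUNK, weak_sum, roll, find_duplicates) and the 16-byte window are made up for the example, and the weak checksum follows the two-component rolling sum described for rsync. Note that every weak match still needs the stored chunk to be read and compared, which is where the extra I/O comes from.

```python
CHUNK = 16      # window size in bytes (hypothetical; chosen only for the demo)
MOD = 1 << 16   # modulus for both checksum components


def weak_sum(block):
    """rsync-style weak checksum of one window, returned as (a, b)."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_duplicates(stored, incoming):
    """Return (incoming_offset, stored_offset) pairs for duplicate chunks."""
    # Index every aligned chunk of the already-stored data by weak checksum.
    index = {}
    for off in range(0, len(stored) - CHUNK + 1, CHUNK):
        index.setdefault(weak_sum(stored[off:off + CHUNK]), []).append(off)

    matches = []
    pos = 0
    if len(incoming) < CHUNK:
        return matches
    a, b = weak_sum(incoming[:CHUNK])
    while pos + CHUNK <= len(incoming):
        window = incoming[pos:pos + CHUNK]
        # A weak match is only a hint: the candidate must be read and compared.
        hit = next((off for off in index.get((a, b), ())
                    if stored[off:off + CHUNK] == window), None)
        if hit is not None:
            matches.append((pos, hit))
            pos += CHUNK                      # skip past the matched chunk
            if pos + CHUNK <= len(incoming):
                a, b = weak_sum(incoming[pos:pos + CHUNK])
        elif pos + CHUNK < len(incoming):
            a, b = roll(a, b, incoming[pos], incoming[pos + CHUNK], CHUNK)
            pos += 1                          # slide by a single byte
        else:
            break
    return matches


stored = bytes(range(64))
incoming = b"xyz" + stored[16:48] + b"!"
print(find_duplicates(stored, incoming))
```

Even in this toy form, the write path does a checksum update per byte plus an index lookup, and every candidate match costs a read of existing data.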
Dedup is for very specific use cases. If your use case doesn't
benefit from block-level dedup, then don't bother with dedup. (The
same applies to compression, but compression is much more likely to be
useful in general, which is why it should generally be on.)
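
As a rough way to see whether data benefits from block-level dedup, the sketch below hashes fixed-size blocks and reports total blocks divided by unique blocks. This is an illustration under assumptions, not how ZFS accounts for dedup: block_dedup_ratio and the 4096-byte block size are hypothetical choices for the example. It also shows why data that repeats at unaligned offsets gains nothing: a single inserted byte shifts every later block.

```python
import hashlib


def block_dedup_ratio(data, block_size=4096):
    """Total blocks / unique blocks for fixed-size, aligned blocks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    unique = {hashlib.sha256(blk).digest() for blk in blocks}
    return len(blocks) / len(unique)


# Deterministic filler whose two 4096-byte halves differ from each other.
base = bytes((7 * i + 3) % 251 for i in range(8192))

aligned = base + base              # second copy starts on a block boundary
shifted = base + b"\x00" + base    # one extra byte misaligns the second copy

print(block_dedup_ratio(aligned))  # every block of the copy is a duplicate
print(block_dedup_ratio(shifted))  # no block of the copy lines up
```

If a ratio measured this way on representative data stays near 1, that is a hint the workload is not one of the specific use cases dedup is for.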
zfs-discuss mailing list