On Wed, Dec 28, 2011 at 3:14 PM, Brad Diggs <brad.di...@oracle.com> wrote:
> The two key takeaways from this exercise were as follows. There is
> tremendous caching potential through the use of ZFS deduplication.
> However, the current block-level deduplication does not benefit directory
> as much as it perhaps could if deduplication occurred at the byte level
> rather than the block level. It very well could be that byte-level
> deduplication doesn't work well either. Until that option is available,
> we won't know for sure.
How would byte-level dedup even work? My best idea would be to apply the rsync algorithm: keep a table of known chunks keyed by their weak rolling checksums, then roll the checksum over incoming data one byte at a time until it matches some known chunk (which then has to be read and compared byte-for-byte to rule out a checksum collision), and so on. The result would be incredibly slow on write and would carry huge storage overhead. On the read side you could have many more I/Os too, so reads would get much slower as well. I suspect any other byte-level dedup scheme would be similarly lousy. There'd be no real savings to be had, making the idea not worthwhile.

Dedup is for very specific use cases. If your use case doesn't benefit from block-level dedup, then don't bother with dedup. (The same applies to compression, but compression is much more likely to be useful in general, which is why it should generally be on.)

Nico
--
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
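[Archive note: the rolling-checksum search described above can be sketched in a few lines of Python. This is an illustration of the rsync-style weak checksum and its O(1) rolling update, not any real ZFS code; the function names and the brute-force scan are hypothetical, chosen only to show why a match test at every byte offset is so expensive.]

```python
M = 1 << 16  # modulus for the rsync-style weak checksum

def weak_checksum(data):
    """rsync-style weak checksum of a chunk: (a, b) pair mod 2^16."""
    a = sum(data) % M
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % M
    return a, b

def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window right by one byte in O(1):
    drop out_byte from the front, append in_byte at the back."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b

def find_duplicate_chunks(data, known_chunks, n):
    """Scan data at every byte offset; report offsets whose rolling
    checksum matches a known n-byte chunk, confirmed by a full
    byte-for-byte compare (weak checksums can collide)."""
    table = {weak_checksum(c): c for c in known_chunks}
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    hits = []
    for off in range(len(data) - n + 1):
        chunk = table.get((a, b))
        if chunk is not None and data[off:off + n] == chunk:
            hits.append(off)
        if off + n < len(data):
            a, b = roll(a, b, data[off], data[off + n], n)
    return hits
```

Even with the O(1) rolling update, every write still means a table probe per byte offset plus a read-and-compare on each candidate hit, which is where the write-path cost and extra read I/Os come from.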