On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote:
> I would imagine that if it's read-mostly, it's a win, but
> otherwise it costs more than it saves.  Even more conventional
> compression tends to be more resource intensive than decompression...
>
> What I'm wondering is when dedup is a better value than compression.
> Most obviously, when there are a lot of identical blocks across different
> files; but I'm not sure how often that happens, aside from maybe
> blocks of zeros (which may well be sparse anyway).
Shared/identical blocks come into play in several specific scenarios:

1) Multiple VMs, cloud.  If you have multiple guest OSes installed,
   they're going to benefit heavily from dedup.  Even Zones can
   benefit here.

2) Situations with lots of copies of large amounts of data where only
   some of the data is different between each copy.  The classic
   example is a Solaris build server hosting dozens, or even hundreds,
   of copies of the Solaris source tree, each being worked on by a
   different developer.  Typically a developer is working on less than
   1% of the total source code, so the other 99% can be shared via
   dedup.

For general purpose usage, e.g. hosting your music or movie
collection, I doubt that dedup offers any real advantage.  If I were
deploying dedup, I'd only use it in situations like the two above, and
not for a general purpose storage server.  For general purpose
applications I think compression is better.  (Though I think dedup
will have higher savings -- significantly so -- in the particular
situation where you know you have lots and lots of duplicate/redundant
data.)

Note also that dedup can give your duplicated data an effective
increase in redundancy/security, because ZFS makes sure that deduped
data is stored with higher redundancy than non-deduped data.  (This
sounds counterintuitive, but as long as you have at least 3 copies of
the duplicated data, it's a net win.)

Btw, compression on top of dedup may actually kill your benefit of
dedup.  My hypothesis (unproven, admittedly) is that because many
compression algorithms let small permutations of the data
significantly change the bit values throughout the compressed object
(even just by shifting their offset in the binary), they can seriously
defeat dedup's efficacy.

	- Garrett
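To make the block-sharing scenarios above concrete, here is a minimal
sketch (plain Python, nothing ZFS-specific) of one way to estimate, ahead
of time, how well a set of files -- say a handful of cloned guest images
or checked-out source trees -- would dedup at a fixed block size: hash
aligned blocks and count how many are unique.  The 128 KiB block size
(chosen to mirror the default ZFS recordsize) and the file names in the
comment are assumptions made for the example.

    import hashlib
    import sys

    BLOCK_SIZE = 128 * 1024  # assumed block size, matching the default ZFS recordsize

    def block_hashes(path, block_size=BLOCK_SIZE):
        """Yield SHA-256 digests of aligned fixed-size blocks of a file."""
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                yield hashlib.sha256(block).digest()

    def dedup_estimate(paths):
        """Count total vs. unique aligned blocks across the given files."""
        seen = set()
        total = 0
        for path in paths:
            for digest in block_hashes(path):
                total += 1
                seen.add(digest)
        unique = len(seen)
        ratio = total / unique if unique else 1.0
        return total, unique, ratio

    if __name__ == "__main__":
        # e.g. python dedup_estimate.py guest1.img guest2.img  (hypothetical file names)
        total, unique, ratio = dedup_estimate(sys.argv[1:])
        print(f"{total} blocks, {unique} unique, estimated dedup ratio {ratio:.2f}x")

On a real pool you would simply enable the feature per dataset
(zfs set dedup=on <dataset>) and watch the DEDUP ratio in zpool list;
the sketch is only meant to show the underlying "identical aligned
blocks" idea being described above.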
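And here is a toy demonstration of the compression-on-top-of-dedup
hypothesis in the last paragraph -- not a claim about how ZFS itself
combines the two features, just an illustration of how a tiny change can
ripple through a compressed stream: compare aligned block hashes of two
nearly identical buffers, before and after zlib compression.  The 128 KiB
block size and the synthetic half-compressible data are assumptions made
for the example.

    import hashlib
    import os
    import zlib

    BLOCK = 128 * 1024  # assumed fixed "record" size, used only for the comparison

    def blocks(data, size=BLOCK):
        """Split a byte string into aligned fixed-size blocks."""
        return [data[i:i + size] for i in range(0, len(data), size)]

    def differing(a, b):
        """Count aligned blocks whose SHA-256 digests differ between two buffers."""
        ba, bb = blocks(a), blocks(b)
        n = max(len(ba), len(bb))
        diff = sum(
            1 for i in range(n)
            if i >= len(ba) or i >= len(bb)
            or hashlib.sha256(ba[i]).digest() != hashlib.sha256(bb[i]).digest()
        )
        return diff, n

    # Synthetic data that compresses roughly 2:1 (only the low 4 bits carry
    # entropy), with a single byte flipped in the middle of the second copy.
    original = bytes(b & 0x0F for b in os.urandom(4 * 1024 * 1024))
    changed = bytearray(original)
    changed[len(changed) // 2] ^= 0x01
    changed = bytes(changed)

    raw_diff, raw_total = differing(original, changed)
    zip_diff, zip_total = differing(zlib.compress(original), zlib.compress(changed))

    print(f"raw data:   {raw_diff} of {raw_total} blocks differ")
    print(f"compressed: {zip_diff} of {zip_total} blocks differ")

The raw copies differ in only the one block containing the flipped byte,
while the compressed streams typically diverge from roughly the change
point onward -- the kind of behaviour that leaves far fewer identical
blocks for dedup to find.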