On Sun, 2010-07-18 at 16:18 -0700, Richard L. Hamilton wrote:

> 
> I would imagine that if it's read-mostly, it's a win, but
> otherwise it costs more than it saves.  Even more conventional
> compression tends to be more resource intensive than decompression...
> 
> What I'm wondering is when dedup is a better value than compression.
> Most obviously, when there are a lot of identical blocks across different
> files; but I'm not sure how often that happens, aside from maybe
> blocks of zeros (which may well be sparse anyway).

Shared/identical blocks come into play in several specific scenarios:

1) Multiple VMs, cloud.  If you have multiple guest OSes installed,
they're going to benefit heavily from dedup.  Even Zones can benefit
here.

2) Situations with lots of copies of large amounts of data where only
some of the data differs between copies.  The classic example is a
Solaris build server hosting dozens, or even hundreds, of copies of the
Solaris tree, each being worked on by a different developer.  Typically
a developer is working on less than 1% of the total source code, so the
other 99% can be shared via dedup (rough numbers below).
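
To put rough numbers on that 1%/99% split, here's a back-of-the-envelope
sketch in Python; the 10 GB tree size and 100-developer count are made-up
figures for illustration, not measurements from any real build server:

tree_gb = 10            # one copy of the source tree (assumed)
developers = 100        # copies checked out on the server (assumed)
unique_fraction = 0.01  # each developer touches ~1% of the tree

without_dedup = developers * tree_gb
with_dedup = tree_gb + developers * tree_gb * unique_fraction

print(f"without dedup: {without_dedup:.0f} GB")             # 1000 GB
print(f"with dedup:    {with_dedup:.0f} GB")                # ~20 GB
print(f"ratio:         {without_dedup / with_dedup:.1f}x")  # ~50x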

For general purpose usage, e.g. hosting your music or movie collection,
I doubt that dedup offers any real advantage.  If I were deploying
dedup, I'd only use it in situations like the two I mentioned, and not
for a general purpose storage server.  For general purpose applications
I think compression is better.  (Though I think dedup will have higher
savings -- significantly so -- in the particular situations where you
know you have lots and lots of duplicate/redundant data.)

Note also that dedup can actually give your duplicated data an
effective increase in redundancy/security, because it makes sure that
deduped data has higher redundancy than non-deduped data.  (This sounds
counterintuitive, but as long as you have at least 3 copies of the
duplicated data, it's a net win.)
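
To make the "at least 3 copies" arithmetic concrete, here is a toy
Python sketch.  The assumption (my simplification, not the actual ZFS
policy knobs) is that a deduped block with multiple references gets
stored as 2 physical copies, while without dedup each logical copy is
stored once:

def space_and_redundancy(logical_copies, dedup):
    if not dedup:
        # N separate blocks on disk, but any single file has only 1 copy
        return logical_copies, 1
    # one shared block, written twice; every referencing file benefits
    return 2, 2

for n in (2, 3, 10):
    plain = space_and_redundancy(n, dedup=False)
    deduped = space_and_redundancy(n, dedup=True)
    print(f"{n:2d} logical copies: no dedup -> {plain[0]} blocks, "
          f"{plain[1]} copy per file; dedup -> {deduped[0]} blocks, "
          f"{deduped[1]} copies per file")

At 2 copies it's a wash on space; from 3 copies up you both save space
and give every referencing file an extra physical copy of the shared
block.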

Btw, compression on top of dedup may actually kill the benefit of
dedup.  My hypothesis (unproven, admittedly) is that many compression
algorithms cause small changes in the data to significantly alter the
bit values throughout the compressed object (even just by shifting
their offsets), which can seriously defeat dedup's efficacy.
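
A quick way to poke at that hypothesis outside of ZFS: compress two
streams that differ by a single byte and count how many fixed-size
blocks of the compressed output still line up.  This models compressing
a whole file as one stream (think gzip'ing it before it ever hits the
filesystem); the block size and synthetic data below are arbitrary
choices for the illustration.

import random
import zlib

BLOCK = 4096  # stand-in for a filesystem recordsize

def blocks(data, size=BLOCK):
    return [data[i:i + size] for i in range(0, len(data), size)]

# moderately compressible synthetic data: ~2 MB of words drawn from a
# small random vocabulary (seeded, so the run is repeatable)
random.seed(0)
vocab = [bytes(random.choices(range(97, 123), k=random.randint(3, 8)))
         for _ in range(200)]
base = b" ".join(random.choices(vocab, k=300000))
edited = b"X" + base[1:]   # identical except for the very first byte

# uncompressed: only the first block differs, the rest would dedup
plain = sum(a == b for a, b in zip(blocks(base), blocks(edited)))
print("identical uncompressed blocks:", plain, "of", len(blocks(base)))

# compressed as whole streams: the one-byte edit changes offsets and
# bit patterns downstream, so almost no compressed block matches
cbase, cedited = zlib.compress(base), zlib.compress(edited)
comp = sum(a == b for a, b in zip(blocks(cbase), blocks(cedited)))
print("identical compressed blocks:  ", comp, "of", len(blocks(cbase)))

On the raw data the one-byte edit costs a single block; after
whole-stream compression it ripples through essentially the entire
output, which is exactly the effect that would starve block-level
dedup of matches.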

        - Garrett

