On 7/18/2010 4:18 PM, Richard L. Hamilton wrote:
Even the most expensive decompression algorithms generally run
significantly faster than I/O to disk -- at least when real disks are
involved. So, as long as you don't run out of CPU and have to wait for
CPU to be available for decompression, the decompression will win. The
same concept is true for dedup, although I don't necessarily think of
dedup as a form of compression (others might reasonably do so, though).
Effectively, dedup is a form of compression of the filesystem rather
than of any single file, but one oriented toward not interfering with
access to any of the files that may be sharing blocks.
I would imagine that if the data is read-mostly, dedup is a win, but
otherwise it costs more than it saves. Even with more conventional
compression, compressing tends to be more resource-intensive than
decompressing...
What I'm wondering is when dedup is a better value than compression.
Most obviously, when there are a lot of identical blocks across different
files; but I'm not sure how often that happens, aside from maybe
blocks of zeros (which may well be sparse anyway).
From my own experience, a dedup "win" is much more data-usage-dependent
than a compression win.

Compression has been of general use across the vast majority of data
I've encountered, with the one big exception of media file servers
(where the data is already compressed pictures, audio, or video). Since
I've always got spare CPU cycles and compression really isn't very
"expensive" in CPU terms in most cases, it's of general utility. Of
course, the *value* of compression varies according to the data (i.e.,
how much it will compress), but that doesn't much affect its *utility*.
Dedup, on the other hand, currently has a very steep price in terms of
the ARC/L2ARC/RAM it needs, so it's much harder to justify in those
cases where it provides only modest benefits. Additionally, we're still
in the development phase of dedup (IMHO), so I can't really make a full
evaluation of the dedup concept, since many of its issues today are
implementation-related, not concept-related.

All that said, dedup has a showcase use case where it is of *massive*
benefit: hosting virtual machines. For a machine hosting only 100 VM
data stores, I can see 99% space savings (see the back-of-the-envelope
numbers below). And I see a significant performance boost, since I can
cache that one deduplicated VM image in RAM easily.

There are other places where dedup seems modestly useful these days
(one is the afore-mentioned media file server, where you'd be surprised
how much duplication there is), but it's *much* harder to pre-determine
dedup's utility for a given dataset unless you have highly detailed
knowledge of that dataset's composition.
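
To make the VM numbers concrete, here is a rough sketch of the
arithmetic. The 20 GB image size, the 128K recordsize, and the ~320
bytes of core memory per DDT entry are figures I'm assuming for
illustration, not measurements:

# Back-of-the-envelope dedup math for the 100-VM case.  All figures
# (image size, recordsize, bytes of core per DDT entry) are assumptions
# for illustration, not measurements.

GiB = 1024 ** 3

num_vms       = 100
image_size    = 20 * GiB     # assumed size of each (nearly identical) VM image
recordsize    = 128 * 1024   # assumed 128K records
ddt_entry_ram = 320          # assumed bytes of core memory per DDT entry

logical  = num_vms * image_size         # what the guests think they store
physical = image_size                   # ~one shared copy if the images match
savings  = 1 - physical / logical

unique_blocks = physical // recordsize  # one DDT entry per unique block
ddt_ram       = unique_blocks * ddt_entry_ram

print("logical data : %d GiB" % (logical / GiB))
print("deduped data : %d GiB (~%.0f%% savings)" % (physical / GiB, savings * 100))
print("DDT entries  : %d (~%.0f MiB of core)" % (unique_blocks, ddt_ram / 2 ** 20))

Real VM images diverge over time (patches, logs, swap), so the shared
fraction drops, but the point stands: the DDT stays small for this
workload precisely because the amount of unique data is small.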
I'll admit to not being a big fan of the dedup concept originally (go
back a couple of years here on this list), but, given that the world is
marching straight toward virtualization as fast as it can, I'm a
convert now.
From my perspective, here are a few things that I think would improve
dedup's utility for me:
(a) fix the outstanding issues in the current implementation (duh!).
(b) add the ability to store the entire DDT in the backing store, and
not have to construct it in ARC from disk-resident info (this would be
of great help where the backing store is an SSD or a RAM-based device).
(c) be able to "test-dedup" a given filesystem. I'd like ZFS to be able
to look at a filesystem and tell me how much dedup I'd get out of it,
WITHOUT having to actually create a dedup-enabled filesystem and copy
the data to it. While it would be nice to be able to simply turn on
dedup for a filesystem, and have ZFS dedup the existing data there
(in-place, without copying), I realize the implementation is hard given
how things currently work, and frankly, that's of much lower priority
for me than being able to test-dedup a dataset.
(d) increase the slab (record) size significantly, to at least 1MB or
more. I daresay the primary way VM images are stored these days is as
single, large files (though iSCSI volumes are coming up fast), and as
such, I've got 20G files which would really, really benefit from a much
larger slab size (the sketch after this list shows how much that would
shrink the DDT for files like these).
(e) and, of course, seeing if there's some way we can cut down on
dedup's piggy DDT size. :-)
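
As a rough illustration of why (d) and (e) go hand in hand -- again
using an assumed ~320 bytes of core per DDT entry -- here's how the
record size changes the number of DDT entries a single 20G file
generates:

# How record (slab) size affects the DDT entry count for one large file.
# The 320-bytes-per-entry figure is an assumption for illustration.

GiB = 1024 ** 3
file_size     = 20 * GiB
ddt_entry_ram = 320

for recordsize in (128 * 1024, 1024 * 1024):   # today's 128K vs a hypothetical 1M
    entries = file_size // recordsize
    core = entries * ddt_entry_ram / 2 ** 20
    print("recordsize %4dK: %7d DDT entries, ~%.1f MiB of core"
          % (recordsize // 1024, entries, core))

An 8x larger record cuts the number of DDT entries (and the core memory
they want) by 8x for data like this, at the cost of coarser dedup
matching and bigger read-modify-write units.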
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA