On 7/18/2010 4:18 PM, Richard L. Hamilton wrote:
Even the most expensive decompression algorithms generally run significantly faster than I/O to disk -- at least when real disks are involved. So, as long as you don't run out of CPU and have to wait for CPU to be available for decompression, the decompression will win. The same concept is true for dedup, although I don't necessarily think of dedup as a form of compression (others might reasonably do so though).

Effectively, dedup is a form of compression of the filesystem rather than any single file, but one oriented to not interfering with access to any of what may be sharing blocks.

I would imagine that if it's read-mostly, it's a win, but
otherwise it costs more than it saves.  Even more conventional
compression tends to be more resource intensive than decompression...

What I'm wondering is when dedup is a better value than compression.
Most obviously, when there are a lot of identical blocks across different
files; but I'm not sure how often that happens, aside from maybe
blocks of zeros (which may well be sparse anyway).

From my own experience, whether dedup is a "win" is far more dependent on the data in use than compression is.

Compression seems to be of general use across the vast majority of data I've encountered, with the one big exception of media file servers (where the data is already-compressed pictures, audio, or video). I've almost always got spare CPU cycles, and compression really isn't very "expensive" in CPU terms in most cases, so it's of general utility. Of course, the *value* of compression varies with the data (i.e. how well it compresses), but for the most part that doesn't affect its *utility*.
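
For anyone who hasn't tried it, turning compression on and then checking what it actually buys you is about as cheap as experiments get. A minimal sketch (the pool/filesystem name is just a placeholder):

    # enable compression on an existing filesystem; note that only
    # newly-written blocks get compressed
    zfs set compression=on tank/data

    # after some data has been written, see what it bought you
    zfs get compressratio tank/data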

Dedup, on the other hand, currently carries a very steep price in terms of needed ARC/L2ARC/RAM, so it's much harder to justify in cases where it provides only modest benefits. Additionally, we're still on the development side of dedup (IMHO), so I can't really give the concept a full evaluation: many of its issues today are implementation-related, not concept-related.

All that said, dedup has a showcase use case where it is of *massive* benefit: hosting virtual machines. For a machine hosting only 100 VM data stores, I can see 99% space savings. I also see a significant performance boost, since I can easily cache that one VM image in RAM.

There are other places where dedup seems modestly useful these days (one is the aforementioned media-file server, where you'd be surprised how much duplication there is), but it's *much* harder to predict dedup's utility for a given dataset ahead of time unless you have highly detailed knowledge of that dataset's composition.
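
To put a rough number on that "steep price": the in-core cost per DDT entry that usually gets quoted on this list is a few hundred bytes per unique block -- call it ~320 bytes as a working assumption, not a spec. A back-of-envelope sketch (pool name made up):

    # ~1 TB of unique data at 128K recordsize:
    #   1 TB / 128 KB per record  = ~8.4 million DDT entries
    #   8.4M entries * ~320 bytes = ~2.5 GB of ARC/L2ARC just for the DDT
    echo $(( (1024*1024*1024*1024 / (128*1024)) * 320 / (1024*1024) ))   # => 2560 (MB)

    # once dedup is enabled, the achieved ratio shows up pool-wide:
    zpool get dedupratio tank

The same amount of data in small files (or at a small recordsize) blows that number up proportionally, which is exactly where the justification gets hard.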

I'll admit I wasn't a big fan of the dedup concept originally (go back a couple of years here on this list), but given that the world is marching toward virtualization as fast as it can, I'm a convert now.


From my perspective, here are a few things that I think would help improve dedup's utility for me:

(a) fix the outstanding issues in the current implementation (duh!).

(b) add the ability to keep the entire DDT in the backing store, rather than having to construct it in ARC from disk-resident info (this would be of great help where the backing store is an SSD or something RAM-based; see the sketch after this list for the closest thing available today).

(c) be able to "test-dedup" a given filesystem. I'd like ZFS to be able to look at a filesystem and tell me how much dedup I'd get out of it, WITHOUT having to actually create a dedup-enabled filesystem and copy the data into it. While it would be nice to simply turn on dedup for a filesystem and have ZFS dedup the existing data in place (without copying), I realize that's hard to implement given how things currently work, and frankly, it's of much lower priority for me than being able to test-dedup a dataset (see the sketch after this list for a rough pool-level approximation).

(d) increase the slab (record) size significantly, to at least 1MB. I daresay the primary way VM images are stored these days is as single, large files (though iSCSI volumes are coming up fast), and as such, I've got 20G files which would really, really benefit from a much larger slab size.

(e) and, of course, see if there's some way we can cut down on dedup's piggy DDT size. :-)
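
For what it's worth, a couple of these can already be approximated with what's in the tree today. A rough sketch, with pool, dataset, and device names made up, and with the caveat that zdb's options may differ between builds:

    # (b)-ish: give the DDT somewhere fast to spill to by adding an
    # SSD as an L2ARC cache device
    zpool add tank cache c2t0d0

    # (c)-ish: simulate dedup on an existing pool without enabling it;
    # prints a simulated DDT histogram plus an estimated dedup ratio
    zdb -S tank

    # (d): for reference, recordsize currently tops out at 128K
    zfs get recordsize tank/vmstore
    zfs set recordsize=128K tank/vmstore

Note that zdb -S works at the pool level rather than per-filesystem (which is what I'd really want), and it has to walk the whole pool to build its simulated DDT, so it's not something to run casually on a big, busy box.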


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
