On Jul 10, 2010, at 5:33 AM, Erik Trimble wrote:

> On 7/10/2010 5:24 AM, Richard Elling wrote:
>> On Jul 9, 2010, at 11:10 PM, Brandon High wrote:
>>
>>> On Fri, Jul 9, 2010 at 5:18 PM, Brandon High <bh...@freaks.com> wrote:
>>> I think that DDT entries are a little bigger than what you're using. The
>>> size seems to range between 150 and 250 bytes depending on how it's
>>> calculated; call it 200 bytes each. Your 128G dataset would require
>>> closer to 200M (+/- 25%) for the DDT if your data were completely
>>> unique. 1TB of unique data would require 600M - 1000M for the DDT.
>>>
>>> Using 376 bytes per entry, it's 376M for 128G of unique data, or just
>>> under 3GB for 1TB of unique data.
>>
>> 4% seems to be a pretty good SWAG.
>>
>>> A 1TB zvol with 8k blocks would require almost 24GB of memory to hold
>>> the DDT. Ouch.
>>
>> ... or more than 300GB for 512-byte records.
>>
>> The performance issue is that DDT access tends to be random. This
>> implies that if you don't have a lot of RAM and your pool has poor
>> random read I/O performance, you will not be impressed with dedup
>> performance. In other words, trying to dedup lots of data on a
>> small-DRAM machine using big, slow pool HDDs will not set any benchmark
>> records. By contrast, using SSDs for the pool can deliver good random
>> read performance. As the price per bit of HDDs continues to drop, the
>> value of deduping pools built on HDDs drops with it.
>>  -- richard
>
> Which brings up an interesting idea: if I have a pool with good random
> I/O (perhaps made from SSDs, or even one of those nifty Oracle F5100
> things), I would probably not want to have a DDT created, or at least
> have one that was very significantly abbreviated. What capability does
> ZFS have for recognizing that we won't need a full DDT created for
> high-I/O-speed pools?
>
> Particularly with the fact that such pools would almost certainly be
> heavy candidates for dedup (the $/GB being significantly higher than
> other mediums, and thus space being at a premium)?
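[The back-of-envelope sizing quoted above reduces to "one DDT entry per unique block." A minimal sketch of that arithmetic, assuming the thread's per-entry estimates (~200 bytes, or 376 bytes by the other accounting) and a hypothetical helper name `ddt_ram_bytes`:]

```python
def ddt_ram_bytes(unique_data_bytes, block_size_bytes, bytes_per_entry=200):
    """Estimate RAM needed to hold the ZFS dedup table in core.

    There is one DDT entry per unique block; per-entry size here is the
    thread's rough estimate (150-250 bytes, or 376 by another accounting),
    not a fixed constant.
    """
    n_entries = unique_data_bytes // block_size_bytes
    return n_entries * bytes_per_entry

TB = 10**12
GB = 10**9

# 1 TB zvol with 8 KiB blocks at ~200 bytes/entry: roughly 24 GB of DDT
print(ddt_ram_bytes(TB, 8192) / GB)

# 1 TB at the default 128 KiB recordsize, 376 bytes/entry: just under 3 GB
print(ddt_ram_bytes(TB, 128 * 1024, 376) / GB)
```

[Which is why the block size matters so much: shrinking records from 128 KiB to 8 KiB multiplies the entry count, and thus the DDT footprint, sixteenfold.]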
Methinks it is impossible to build a complete DDT, we'll run out of atoms...
maybe if we can use strings? :-)

Think of it as a very, very sparse array. Otherwise it is managed just like
other metadata.
 -- richard

--
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss