On Jul 10, 2010, at 5:33 AM, Erik Trimble wrote:

> On 7/10/2010 5:24 AM, Richard Elling wrote:
>> On Jul 9, 2010, at 11:10 PM, Brandon High wrote:
>> 
>>   
>>> On Fri, Jul 9, 2010 at 5:18 PM, Brandon High <bh...@freaks.com> wrote:
>>> I think that DDT entries are a little bigger than what you're using. The 
>>> size seems to range between 150 and 250 bytes depending on how it's 
>>> calculated; call it 200 bytes each. Your 128G dataset would require closer 
>>> to 200M (+/- 25%) for the DDT if your data was completely unique. 1TB of 
>>> unique data would require roughly 1.2G - 2G for the DDT.
>>> 
>>> Using 376 bytes per entry, it's 376M for 128G of unique data, or just under 
>>> 3GB for 1TB of unique data.
>>>     
>> 4% seems to be a pretty good SWAG.
>> 
>>   
>>> A 1TB zvol with 8k blocks would require almost 24GB of memory to hold the 
>>> DDT. Ouch.
>>>     
>> ... or more than 300GB for 512-byte records.
>> 
>> The performance issue is that DDT access tends to be random. This implies that 
>> if you don't have a lot of RAM and your pool has poor random read I/O 
>> performance, then you will not be impressed with dedup performance. In other 
>> words, trying to dedup lots of data on a small-DRAM machine using big, slow 
>> pool HDDs will not set any benchmark records. By contrast, using SSDs for the 
>> pool can demonstrate good random read performance. As the price per bit of 
>> HDDs continues to drop, the value of deduping pools using HDDs also drops.
>>  -- richard
>> 
>>   
> 
> Which brings up an interesting idea: if I have a pool with good random I/O 
> (perhaps made from SSDs, or even one of those nifty Oracle F5100 things), I 
> would probably not want to have a full DDT created, or at least would want one 
> that was very significantly abbreviated. What capability does ZFS have for 
> recognizing that we won't need a full DDT created for high-I/O-speed pools? 
> Particularly since such pools would almost certainly be prime candidates for 
> dedup (the $/GB being significantly higher than other media, and thus space 
> being at a premium)?

Methinks it is impossible to build a complete DDT; we'll run out of atoms... 
maybe if we can use strings?  :-)  Think of it as a very, very sparse array. 
Otherwise it is managed just like other metadata.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/
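
For reference, a rough back-of-the-envelope sketch (Python) of the DDT sizing 
arithmetic quoted above. The per-entry sizes (150 to 376 bytes) are the thread's 
ballpark estimates rather than authoritative ZFS constants, and ddt_size_bytes() 
is a made-up helper name, so treat the results as order-of-magnitude figures only:

    # Rough DDT sizing sketch using the per-entry estimates quoted in this thread.
    # Assumes completely unique data, i.e. one DDT entry per block.

    def ddt_size_bytes(data_bytes, block_size, entry_bytes=376):
        """Estimate DDT size: one entry per unique block times bytes per entry."""
        return (data_bytes // block_size) * entry_bytes

    GiB = 1 << 30
    TiB = 1 << 40

    # 128G at the default 128K recordsize, 376 B/entry: ~0.37 GiB ("376M")
    print(ddt_size_bytes(128 * GiB, 128 * 1024) / GiB)

    # 1TB at 128K recordsize: ~2.9 GiB ("just under 3GB")
    print(ddt_size_bytes(TiB, 128 * 1024) / GiB)

    # 1TB zvol with 8K blocks: ~25 GiB at 200 B/entry, ~47 GiB at 376 B/entry
    print(ddt_size_bytes(TiB, 8 * 1024, 200) / GiB)
    print(ddt_size_bytes(TiB, 8 * 1024) / GiB)

    # 1TB of 512-byte records: ~300 GiB even at 150 B/entry
    print(ddt_size_bytes(TiB, 512, 150) / GiB)

If memory serves, zdb -S <pool> will simulate dedup on an existing pool to 
estimate the table before enabling it, and zdb -DD <pool> reports actual DDT 
entry counts and sizes once dedup is in use.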



