> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
> 
> So here's what I'm going to do.  With arc_meta_limit at 7680M, of which
> 100M
> was consumed "naturally," that leaves me 7580 to play with.  Call it
> 7500M.
> Divide by 412 bytes, it means I'll hit a brick wall when I reach a little
> over 19M blocks.  Which means if I set my recordsize to 32K, I'll hit that
> limit around 582G disk space consumed.  That is my hypothesis, and now
> beginning the test.

Well, this is interesting.  With 7580MB theoretically available for DDT in
ARC, the expectation was that 19M DDT entries would finally max out the ARC
and then I'd jump off a performance cliff and start seeing a bunch of pool
reads killing my write performance.
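
For reference, here's that back-of-the-envelope arithmetic as a quick Python
sketch, using the ~412 bytes per DDT entry apparently measured earlier in
this thread:

    # Arithmetic behind the hypothesis quoted above.
    meta_budget = 7500 * 1024 * 1024   # ~7500M of arc_meta_limit headroom, in bytes
    ddt_entry_size = 412               # apparent in-ARC size of one DDT entry, bytes
    recordsize = 32 * 1024             # 32K recordsize -> one DDT entry per 32K block

    max_entries = meta_budget // ddt_entry_size
    max_unique_data = max_entries * recordsize

    print("DDT entries before hitting arc_meta_limit: %.1fM" % (max_entries / 1e6))
    print("Unique data at 32K recordsize: %.1fG" % (max_unique_data / 2.0**30))
    # -> a little over 19M entries, and roughly 582G of unique data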

In reality, what I saw was:  
* Up to a million blocks, the performance difference with/without dedup was
basically negligible.  Write time with dedup = 1x write time without dedup.
* After a million, the dedup write time consistently ran at 2x the native
write time.  This happened when my ARC became full of user data (not
metadata).
* As the number of unique blocks in the pool increased, the dedup write time
gradually diverged from the non-dedup write time:  2x, 3x, 4x.  I got a
consistent 4x write time with dedup enabled after the pool reached 22.5M
blocks.
* And then it jumped off a cliff.  At 24M blocks, I got the last datapoint I
was able to collect:  28x slower writes with dedup (4966 sec to write 3G,
compared to 178 sec), and for the first time, a nonzero rm time.  All the
way up till now, even with dedup, the rm time was zero.  But now it was
72 sec.
* I waited another 6 hours, and never got another data point.  So I found
the limit where the pool becomes unusably slow.  

At a cursory glance, you might say this supported the hypothesis.  You might
say "24M compared to 19M, that's not too far off.  This could be accounted
for by using the 376-byte size of ddt_entry_t, instead of the 412-byte size
apparently measured... This would adjust the hypothesis to 21.1M blocks."
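
In numbers, that adjustment works out like this (assuming the full 7580M of
headroom rather than the rounded 7500M):

    # Same calculation with sizeof(ddt_entry_t) = 376 bytes instead of ~412.
    print("%.1fM blocks" % (7580 * 1024 * 1024 / 376.0 / 1e6))   # -> 21.1M blocks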

But I don't think that's quite fair, because my arc_meta_used never got
above 5,159M.  And I never saw the massive read overload that was predicted
to be the cause of failure.  In fact, starting from 0.4M-0.5M blocks (early,
early, early on), I always had 40-50 reads for every 250 writes, right to
the bitter end.  And my ARC was full of user data, not metadata.

So the conclusions I'm drawing are:

(1)  If you don't tweak arc_meta_limit and you want to enable dedup, you're
toast.  But if you do tweak arc_meta_limit, you might reasonably expect
dedup to perform 3x to 4x slower on unique data...  And based on results
that I haven't talked about here yet, dedup performs 3x to 4x faster on
duplicate data.  So if you have 50% or more duplicate data (dedup ratio 2x
or higher), plenty of memory, and you tweak arc_meta_limit accordingly, then
your performance with dedup could be comparable to, or even faster than,
running without dedup.  Of course, it depends on your data patterns and
usage patterns.  YMMV.
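
To put (1) in concrete terms, here's a rough sizing sketch based on the ~412
bytes of ARC metadata per DDT entry measured in this thread.  The pool
parameters in the example are made up purely for illustration:

    # Rough rule of thumb: arc_meta_limit headroom needed to keep the whole
    # DDT in ARC, at ~412 bytes of ARC metadata per unique block.
    DDT_BYTES_PER_ENTRY = 412

    def arc_meta_headroom_needed(unique_data_bytes, recordsize_bytes):
        entries = unique_data_bytes // recordsize_bytes
        return entries * DDT_BYTES_PER_ENTRY

    # Example: 2T of unique data at the default 128K recordsize.
    need = arc_meta_headroom_needed(2 * 2**40, 128 * 1024)
    print("~%.1fG of arc_meta_limit headroom" % (need / 2.0**30))   # -> ~6.4G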

(2)  The above is pretty much the best you can do if your server is going
to be a "normal" server, handling both reads & writes.  Because the data and
the metadata are both stored in the ARC, the data has a tendency to push the
metadata out.  But consider a special use case:  suppose you only care about
write performance and saving disk space.  For example, suppose you're the
destination server of a backup policy.  You only do writes, so you don't
care about keeping data in cache.  You want to enable dedup to save cost on
backup disks.  You only care about keeping metadata in ARC.  If you set
primarycache=metadata ....  I'll go test this now.  The hypothesis is that
my arc_meta_used should actually climb up to the arc_meta_limit before I
start hitting any disk reads, so my write performance with/without dedup
should be pretty much equal up to that point.  I'm sacrificing the potential
read benefit of caching data in ARC in order to hopefully gain write
performance, so that writes can be just as good with dedup enabled or
disabled.  In fact, if there's much duplicate data, the dedup write
performance in this case should be significantly better than without dedup.
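
For what it's worth, here's roughly what that test setup looks like, driven
from Python via the zfs command.  The dataset name tank/backup is just a
hypothetical placeholder:

    # Hypothetical setup for the write-mostly backup-target case described
    # above: dedup on, and only metadata kept in the ARC.
    import subprocess

    dataset = "tank/backup"   # hypothetical dataset name
    for prop in ("dedup=on", "primarycache=metadata"):
        subprocess.check_call(["zfs", "set", prop, dataset])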
