> From: Erik Trimble [mailto:erik.trim...@oracle.com]
> 
> OK, I just re-looked at a couple of things, and here's what I /think/ are
> the correct numbers.
> 
> I just checked, and the current size of this structure (the DDT entry) is
> 0x178, or 376 bytes.
> 
> Each ARC entry, which points to either an L2ARC item (of any kind,
> cached data, metadata, or a DDT line) or actual data/metadata/etc., is
> defined in the struct "arc_buf_hdr" :
> 
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#431
> 
> Its current size is 0xb0, or 176 bytes.
> 
> These are fixed-size structures.

heheheh...  See what I mean about all the conflicting sources of
information?  Is it 376 and 176?  Or is it 270 and 200?
Erik says it's fixed-size.  Richard says "The DDT entries vary in size."

So far, what Erik says is at least based on reading the source code, with a
disclaimer that he may be misreading it.  What Richard says is just a
statement of supposed absolute fact without any backing.

In any event, thank you both for your input.  Can anyone answer these
authoritatively?  (Neil?)   I'll send you a pizza.  ;-)


> For 1TB of data, broken into the following block sizes:
>               Block size   DDT size          ARC consumption
>               512b         752GB    (73%)    352GB   (34%)
>               4k           94GB     (9%)     44GB    (4.3%)
>               8k           47GB     (4.5%)   22GB    (2.1%)
>               32k          11.75GB  (2.2%)   5.5GB   (0.5%)
>               64k          5.9GB    (1.1%)   2.75GB  (0.3%)
>               128k         2.9GB    (0.6%)   1.4GB   (0.1%)

At least the methodology to calculate all this seems reasonable to me.  If
the new numbers (376 and 176) are correct, I would just state it like this:

DDT size = 376 bytes * # unique blocks
You can find the number of blocks in an existing pool with "zdb -bb poolname".
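
For instance, something like this quick Python sketch (the 376-byte entry
size is just Erik's figure from above, and the block count is a made-up
placeholder you'd replace with whatever zdb -bb reports):

#!/usr/bin/env python3
# Back-of-the-envelope DDT sizing, assuming 376 bytes per DDT entry
# (one entry per unique block in the pool).

DDT_ENTRY_BYTES = 376           # per-entry size quoted above (assumption)
unique_blocks = 25_000_000      # placeholder -- use the count from "zdb -bb poolname"

ddt_bytes = DDT_ENTRY_BYTES * unique_blocks
print("Estimated DDT size: %.1f GB" % (ddt_bytes / 2**30))

Obviously garbage in, garbage out -- if the 376-byte figure is wrong, so is
the result.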

ARC consumption = 176 bytes * # blocks in the L2ARC
You can estimate the # blocks in the L2ARC: divide total pool disk usage by
the number of blocks in the pool (obtained above) to get the average block
size, then divide the total L2ARC capacity by that average block size to get
the number of average-sized blocks your L2ARC can hold.
(Equivalently: # blocks in L2ARC ~= L2ARC capacity / total pool usage * #
blocks in the whole pool.)
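
And the L2ARC side of it, same caveats (176 bytes per arc_buf_hdr is Erik's
figure; the pool usage, pool block count, and L2ARC size below are made-up
placeholders):

#!/usr/bin/env python3
# Estimate the ARC (RAM) overhead of headers for blocks held in the L2ARC,
# assuming 176 bytes of arc_buf_hdr per L2ARC block.  All inputs are
# placeholders -- plug in your own pool numbers.

ARC_HDR_BYTES = 176                 # per-header size quoted above (assumption)
pool_used_bytes = 10 * 2**40        # e.g. 10 TB of pool usage (placeholder)
pool_blocks = 80_000_000            # total blocks, from "zdb -bb poolname" (placeholder)
l2arc_bytes = 256 * 2**30           # e.g. a 256 GB L2ARC device (placeholder)

avg_block_size = pool_used_bytes / pool_blocks      # average block size in the pool
l2arc_blocks = l2arc_bytes / avg_block_size         # average-sized blocks the L2ARC holds
# equivalently: l2arc_blocks = l2arc_bytes / pool_used_bytes * pool_blocks

arc_overhead = ARC_HDR_BYTES * l2arc_blocks
print("Average block size    : %.0f bytes" % avg_block_size)
print("Blocks in L2ARC (est.): %.0f" % l2arc_blocks)
print("ARC header overhead   : %.2f GB" % (arc_overhead / 2**30))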


> 
> ARC consumption presumes the whole DDT is stored in the L2ARC.
> 
> Percentage size is relative to the original 1TB total data size
> 
> 
> 
> Of course, the trickier proposition here is that we DON'T KNOW what our
> dedup value is ahead of time on a given data set.  That is, given a data
> set of X size, we don't know how big the deduped data size will be. The
> above calculations are for DDT/ARC size for a data set that has already
> been deduped down to 1TB in size.
> 
> 
> Perhaps it would be nice to have some sort of userland utility that
> builds its own DDT as a test and does all the above calculations, to
> see how dedup would work on a given dataset.  'zdb -S' sorta, kinda does
> that, but...
> 
> 
> --
> Erik Trimble
> Java System Support
> Mailstop:  usca22-317
> Phone:  x67195
> Santa Clara, CA
> Timezone: US/Pacific (GMT-0800)

