On Sep 24, 2012, at 10:08 AM, Jason Usher <jushe...@yahoo.com> wrote:

> Oh, and one other thing ...
> --- On Fri, 9/21/12, Jason Usher <jushe...@yahoo.com> wrote:
>>> It shows the allocated number of bytes used by the filesystem, i.e.
>>> after compression. To get the uncompressed size, multiply "used" by
>>> "compressratio" (so for example if used=65G and compressratio=2.00x,
>>> then your decompressed size is 2.00 x 65G = 130G).
>> Ok, thank you.  The problem with this is, the
>> compressratio only goes to two significant digits, which
>> means if I do the math, I'm only getting an
>> approximation.  Since we may use these numbers to
>> compute billing, it is important to get it right.
>> Is there any way at all to get the real *exact* number ?
> I'm hoping the answer is yes - I've been looking but do not see it ...

none can hide from dtrace!
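# dtrace -qn 'dsl_dataset_stats:entry {
    /* arg0 is the dsl_dataset_t whose stats are being gathered */
    this->ds = (dsl_dataset_t *)arg0;
    printf("%s\tcompressed size = %d\tuncompressed size=%d\n",
        this->ds->ds_dir->dd_myname,
        this->ds->ds_phys->ds_compressed_bytes,
        this->ds->ds_phys->ds_uncompressed_bytes);
}'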
openindiana-1   compressed size = 3667988992    uncompressed size=3759321088

[zfs get all rpool/openindiana-1 in another shell]

For reporting, compressratio is rounded to 2 decimal places.
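The size of that rounding error can be estimated from the byte counts in the dtrace output. This is a minimal sketch, assuming compressratio is uncompressed bytes divided by compressed bytes and treating the compressed byte count as the "used" figure. In practice "used" can also include snapshots and child datasets, so this covers only the compression part of the question. The helper names are illustrative.

```python
# Byte counts printed by the dtrace one-liner for openindiana-1.
compressed = 3667988992
uncompressed = 3759321088


def exact_ratio(comp, uncomp):
    """Exact compression ratio, assumed to be uncompressed / compressed."""
    return uncomp / comp


def estimate_uncompressed(comp, displayed_ratio):
    """The 'used x compressratio' estimate, using the 2-decimal ratio."""
    return round(comp * displayed_ratio)


exact = exact_ratio(compressed, uncompressed)
reported = round(exact, 2)  # what a 2-decimal display would show
estimate = estimate_uncompressed(compressed, reported)
error = uncompressed - estimate

print(f"exact ratio    = {exact:.6f}")
print(f"reported ratio = {reported:.2f}x")
print(f"estimate       = {estimate} bytes")
print(f"actual         = {uncompressed} bytes")
print(f"error          = {error} bytes ({error / uncompressed:.2%})")
```

Here the displayed 1.02x undercounts by roughly 17 MB on a roughly 3.7 GB dataset, about 0.5%. For billing, reading the exact byte counts avoids the approximation entirely.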

>> Ok.  So the dedupratio I see for the entire pool is
>> "dedupe ratio for filesystems in this pool that have dedupe
>> enabled" ... yes ?
>>>> Also, why do I not see any dedupe stats for the individual filesystem ?
>>>> I see compressratio, and I see dedup=on, but I don't see any dedupratio
>>>> for the filesystem itself...
>> Ok, getting back to precise accounting ... if I turn on
>> dedupe for a particular filesystem, and then I multiply the
>> "used" property by the compressratio property, and calculate
>> the real usage, do I need to do another calculation to
>> account for the deduplication ?  Or does the "used"
>> property not take into account deduping ?
> So if the answer to this is "yes, the used property is not only a compressed 
> figure, but a deduped figure" then I think we have a bigger problem ...
> You described dedupe as operating not only within the filesystem with 
> dedup=on, but between all filesystems with dedupe enabled.
> Doesn't that mean that if I enabled dedupe on more than one filesystem, I can 
> never know how much total, raw space each of those is using ?  Because if the 
> dedupe ratio is calculated across all of them, it's not the actual ratio for 
> any one of them ... so even if I do the math, I can't decide what the total 
> raw usage for one of them is ... right ?

Correct. This is by design so that blocks shared amongst different datasets can
be deduped -- the common case for things like virtual machine images.
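Why a per-dataset raw figure cannot be recovered is easy to see in a toy model of a pool-wide dedup table. This is a hypothetical sketch, not the ZFS implementation: blocks are keyed by a SHA-256 of their contents, loosely mirroring dedup by block checksum, and the names ToyDedupPool, vm1 and vm2 are illustrative.

```python
import hashlib
from collections import defaultdict


class ToyDedupPool:
    """Hypothetical model: one dedup table shared by all dedup=on datasets."""

    def __init__(self):
        self.ddt = {}                     # checksum -> (block size, refcount)
        self.logical = defaultdict(int)   # dataset -> bytes it wrote

    def write(self, dataset, block):
        key = hashlib.sha256(block).hexdigest()
        size, refs = self.ddt.get(key, (len(block), 0))
        self.ddt[key] = (size, refs + 1)
        self.logical[dataset] += len(block)

    def allocated(self):
        # Each unique block is stored once, however many references it has.
        return sum(size for size, _ in self.ddt.values())

    def dedup_ratio(self):
        return sum(self.logical.values()) / self.allocated()


pool = ToyDedupPool()
base_image = [bytes([i]) * 4096 for i in range(8)]  # 8 distinct 4 KiB blocks
for block in base_image:
    pool.write("vm1", block)  # two VMs cloned from the same image
    pool.write("vm2", block)
pool.write("vm2", b"\xff" * 4096)  # one block unique to vm2

print(dict(pool.logical), pool.allocated(), f"{pool.dedup_ratio():.2f}x")
# The 8 shared blocks are stored once for both datasets. Charging them all
# to vm1, all to vm2, or half to each are equally valid splits, so no
# per-dataset "raw usage" falls out of the pool-wide ratio.
```

The ratio here is a property of the whole table, which is why it only makes sense at the pool level.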

> Again, if "used" does not reflect dedupe, and I don't need to do any math to 
> get the "raw" storage figure, then it doesn't matter...
>>>> Did turning on dedupe for a single filesystem turn it on for the entire
>>>> pool ?
>>> In a sense, yes. The dedup machinery is pool-wide, but only writes from
>>> filesystems which have dedup enabled enter it. The rest simply pass it
>>> by and work as usual.
>> Ok - but from a performance point of view, I am only using
>> ram/cpu resources for the deduping of just the individual
>> filesystems I enabled dedupe on, right ?  I hope that
>> turning on dedupe for just one filesystem did not incur
>> ram/cpu costs across the entire pool...
> I also wonder about this performance question...

It depends.
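The routing described in the quoted reply can be modeled as a toy: one pool-wide dedup table, entered only by writes from datasets with dedup=on. Everything here is hypothetical (ToyPool, tank/vm, tank/home). It is not ZFS code, and it counts table lookups rather than measuring real RAM or CPU cost.

```python
import hashlib


class ToyPool:
    """Hypothetical model of the dedup write routing, not ZFS code."""

    def __init__(self, dedup_enabled):
        self.dedup_enabled = dedup_enabled  # dataset -> dedup property
        self.ddt = {}                       # pool-wide table: checksum -> refcount
        self.ddt_lookups = 0                # how often the dedup path was taken
        self.allocated = 0                  # bytes physically written

    def write(self, dataset, block):
        if not self.dedup_enabled.get(dataset, False):
            # dedup=off: bypass the table and allocate as usual
            self.allocated += len(block)
            return
        self.ddt_lookups += 1
        key = hashlib.sha256(block).hexdigest()
        if key not in self.ddt:
            self.allocated += len(block)    # only the first copy is stored
        self.ddt[key] = self.ddt.get(key, 0) + 1


pool = ToyPool({"tank/vm": True, "tank/home": False})
block = b"\x00" * 4096
for _ in range(3):
    pool.write("tank/vm", block)
    pool.write("tank/home", block)
print(pool.ddt_lookups, len(pool.ddt), pool.allocated)
```

In this model only the tank/vm writes touch the shared table. How much that costs on a real system is the part that depends.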
 -- richard

illumos Day & ZFS Day, Oct 1-2, 2012 San Francisco

zfs-discuss mailing list
