[exposed organs below…]

On Oct 7, 2011, at 8:25 PM, Daniel Carosone wrote:
> On Tue, Oct 04, 2011 at 09:28:36PM -0700, Richard Elling wrote:
>> On Oct 4, 2011, at 4:14 PM, Daniel Carosone wrote:
>>> I sent it twice, because something strange happened on the first send,
>>> to the ashift=12 pool.  "zfs list -o space" showed figures at least
>>> twice those on the source, maybe roughly 2.5 times.
>> Can you share the output?
> Source machine, zpool v14 snv_111b:
> int/iscsi_01    99.2G   237G   37.9G   199G      0      0   200G
> Destination machine, zpool v31 snv_151b:
> geek/iscsi_01   3.64T   550G   88.4G   461G      0      0   200G
> uext/iscsi_01   1.73T   245G   39.2G   206G      0      0   200G
> geek is the ashift=12 pool, obviously.  I'm assuming the smaller
> difference for uext is due to other layout differences in the pool
> versions.
>>> What is going on? Is there really that much metadata overhead?  How
>>> many metadata blocks are needed for each 8k vol block, and are they
>>> each really only holding 512 bytes of metadata in a 4k allocation?
>>> Can they not be packed appropriately for the ashift?
>> Doesn't matter how small metadata compresses, the minimum size you can write
>> is 4KB.
> This isn't about whether the metadata compresses, this is about
> whether ZFS is smart enough to use all the space in a 4k block for
> metadata, rather than assuming it can fit at best 512 bytes,
> regardless of ashift.  By packing, I meant packing them full rather
> than leaving them mostly empty and wasted (or anything to do with
> compression). 

The answer is: it depends. Let's look for more clues first...
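
For scale, a rough back-of-the-envelope (assuming the usual 16KB indirect
blocks, 128 block pointers each, and two metadata copies; check your pool
before trusting these numbers):
        200GB zvol / 8KB volblocksize            ~= 26.2M data blocks
        26.2M blkptrs / 128 per indirect block   ~= 205K L1 indirect blocks
        worst case, stored uncompressed, 2 copies:  205K x 16KB x 2 ~= 6.6GB
So even if every indirect block were allocated uncompressed, the metadata for
this zvol is single-digit GB, nowhere near the extra ~250GB seen on geek. The
rounding question is a fair one, but it is unlikely to be the whole story.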

>> I think we'd need to see the exact layout of the internal data. This can be 
>> achieved with the zfs_blkstats macro in mdb. Perhaps we can take this offline
>> and report back?
> Happy to - what other details / output would you like?

This is easier to do offline, but while we're here…
[assuming Solaris-derived OS with mdb]

0. scrub the pool, so that the block usage stats are loaded
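        For example, using the geek pool from above (the scrub is what gathers
        the block stats, so let it run to completion; zpool status shows its
        progress):
        # zpool scrub geek
        # zpool status geek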

1. find the address of the pool's spa structure, for example
        # echo ::spa | mdb -k
        ADDR                 STATE NAME                                         
        ffffff01c647d580    ACTIVE stuff
        ffffff01c52b1040    ACTIVE syspool

2. look at the block usage stats, for example
        # echo ffffff01c52b1040::zfs_blkstats | mdb -k
        Dittoed blocks on same vdev: 4541
        Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
             1    16K      1K   3.00K   3.00K   16.00     0.00  object directory
             3  1.50K   1.50K   4.50K   1.50K    1.00     0.00  object array
           163  19.8M   1.46M   4.39M   27.6K   13.52     0.28  bpobj
           336  1.79M    724K   2.12M   6.46K    2.53     0.13  SPA space map

3. compare the block usage stats for the various pools
        Block counts are obvious
        LSIZE = logical size
        PSIZE = physical size, after compression
        ASIZE = allocated size, how much disk space is used (including raidz parity & mirror copies)
        avg = average allocated size per block
        comp = compression ratio (LSIZE:PSIZE)
        %Total is the percent of total allocated space

It should be obvious that ashift = 9 for the above example: the object
directory's copies add up to only 3.00K allocated, about 1KB per copy, which
is only possible with 512-byte sectors (an ashift=12 pool would allocate at
least 4KB per copy).
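
If mdb isn't handy, zdb can produce a similar per-type breakdown straight from
the pool (it walks every block, so expect it to take roughly as long as a
scrub), for example
        # zdb -bb geek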
 -- richard


ZFS and performance consulting
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9 
