On Mar 18, 2012, at 11:16 AM, Jim Klimov wrote:

> Hello all,
> 
> I was asked if it is possible to convert a ZFS pool created
> explicitly with ashift=12 (via the tweaked binary) and filled
> with data back into ashift=9 so as to use the slack space
> from small blocks (BP's, file tails, etc.)

Copy out, copy in. A vdev's ashift is fixed at creation, so the data
has to be rewritten. Whether this is easy or not depends on how well
you planned your storage use ...
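
For example, a sketch only: the pool and snapshot names below are made
up, and you need a scratch pool created with ashift=9 to receive the
data.

#!/usr/bin/env python
# Sketch: replicate a pool into a freshly created ashift=9 pool.
# "tank", "newpool", and the snapshot name are assumptions.
import subprocess

SRC_SNAP, DST = "tank@migrate", "newpool"

subprocess.check_call(["zfs", "snapshot", "-r", SRC_SNAP])

# zfs send -R carries all datasets, snapshots, and properties;
# zfs recv -F -d rebuilds the hierarchy under the destination pool.
send = subprocess.Popen(["zfs", "send", "-R", SRC_SNAP],
                        stdout=subprocess.PIPE)
subprocess.check_call(["zfs", "recv", "-F", "-d", DST],
                      stdin=send.stdout)
send.stdout.close()
if send.wait() != 0:
    raise RuntimeError("zfs send failed")

Then destroy the old pool, recreate it with ashift=9, and send the
data back the same way.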

> The user's HDD marketing text says that it "efficiently"
> emulates 512b sectors while using 4Kb ones natively (that's
> why ashift=12 was enforced in the first place).

Marketing: 2 drink minimum

> 
> Questions are:
> 1) How bad would a performance hit be with 512b blocks used
>   on a 4kb drive with such "efficient emulation"?

Depends almost exclusively on the workload and hardware. In my
experience, most folks who bite the 4KB bullet have low-cost HDDs
where one cannot reasonably expect high performance.
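
If you want intuition for the penalty: an unaligned 512B write to a
4KB physical sector forces a read-modify-write inside the drive. The
latencies below are illustrative assumptions, not measurements.

# Rough model of the 512e read-modify-write penalty (numbers assumed).
SEEK_MS = 8.5                    # average seek, commodity 7200 rpm HDD
ROTATION_MS = 60000.0 / 7200     # 8.33 ms per revolution

aligned = SEEK_MS + ROTATION_MS / 2   # seek + half-rotation
rmw = aligned + ROTATION_MS           # read 4KB, spin once, write back
print("aligned 4KB write: %.1f ms" % aligned)                     # ~12.7
print("512B RMW write:    %.1f ms" % rmw)                         # ~21.0
print("penalty:           %.0f%%" % (100 * (rmw / aligned - 1)))  # ~66%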

> Is it
>   possible to model/emulate the situation somehow in advance
>   to see if it's worth that change at all?

It will be far more cost effective to just make the change.

> 2) Is it possible to easily estimate the amount of "wasted"
>   disk space in slack areas of the currently active ZFS
>   allocation (unused portions of 4kb blocks that might
>   become available if the disks were reused with ashift=9)?

Detailed space use is available from the zfs_blkstats mdb macro, as
described in previous threads on this list.
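
Given a histogram of physical block sizes (from zfs_blkstats or
zdb -bb; the exact output format varies by release), the slack
estimate is simple arithmetic. A sketch with a made-up histogram:

# Space recoverable by going from ashift=12 back to ashift=9:
# each block's allocation shrinks from its 4KB roundup to its 512B one.
def roundup(n, align):
    return (n + align - 1) // align * align

# hist maps physical block size -> block count; values here are fake.
hist = {512: 120000, 1536: 40000, 4096: 900000, 131072: 250000}

slack = sum(cnt * (roundup(size, 4096) - roundup(size, 512))
            for size, cnt in hist.items())
print("recoverable slack: about %.2f GB" % (slack / 1e9))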

> 3) How many parts of ZFS pool are actually affected by the
>   ashift setting?

Everything is impacted. But that isn't a useful answer.

>   From what I gather, it is applied at the top-level vdev
>   level (I read that one can mix ashift=9 and ashift=12
>   TLVDEVs in one pool spanning several TLVDEVs). Is that
>   a correct impression?

Yes
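
You can check what each vdev actually uses; a sketch that pulls the
ashift lines out of the cached config (the pool name is an assumption):

# 'zdb -C <pool>' dumps the pool configuration; each vdev's entry
# includes a line like "ashift: 12".
import subprocess

out = subprocess.check_output(["zdb", "-C", "tank"])
for line in out.splitlines():
    if "ashift" in line:
        print(line.strip())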

>   If yes, how does ashift size influence the amount of
>   slots in uberblock ring (128 vs. 32 entries) which is
>   applied at the leaf vdev level (right?) but should be
>   consistent across the pool?

It should be consistent across the top-level vdev. 

There is 128KB of space available for the uberblock ring, and the
minimum size of an uberblock entry is 1KB, which gives 128 entries
with 512B sectors. A 4KB disk can't write less than 4KB, so with
ashift=12 each entry occupies 4KB and the ring holds only 32 entries.
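
The arithmetic, as a sketch:

# Uberblock ring: 128KB per label; entry size is max(1KB, 1 << ashift).
def uberblock_entries(ashift):
    return (128 * 1024) // max(1024, 1 << ashift)

print(uberblock_entries(9))    # 128 entries (512B sectors)
print(uberblock_entries(12))   # 32 entries  (4KB sectors)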

>   As far as I see in ZFS on-disk format, all sizes and
>   offsets are in either bytes or 512b blocks, and the
>   ashift'ed block size is not actually used anywhere
>   except to set the minimal block size and its implicit
>   alignment during writes.

The on-disk format doc is somewhat dated and unclear here. UTSL (use
the source, Luke).

>   Is it wrong to think that it's enough to forge an
>   uberblock with ashift=9 and a matching self-checksum
>   and place that into the pool (leaf vdev labels), and
>   magically have all old data 4kb-aligned still available,
>   while new writes would be 512b-aligned?

Yes, it is wrong to think that. The ashift is recorded in the vdev
label's nvlist config, not in the uberblock, and structures such as
the metaslab space maps encode offsets in ashift-sized units, so a
forged uberblock changes nothing, and flipping the ashift underneath
existing data would corrupt the pool's space accounting.

> 
> Thanks for helping me grasp the theory,
> //Jim

 -- richard

--
DTrace Conference, April 3, 2012, 
http://wiki.smartos.org/display/DOC/dtrace.conf
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
