On Mar 18, 2012, at 11:16 AM, Jim Klimov wrote:
> Hello all,
> I was asked if it is possible to convert a ZFS pool created
> explicitly with ashift=12 (via the tweaked binary) and filled
> with data back into ashift=9 so as to use the slack space
> from small blocks (BP's, file tails, etc.)
Copy out, copy in. Whether this is easy or not depends on how well
you planned your storage use ...
> The user's HDD marketing text says that it "efficiently"
> emulates 512b sectors while using 4Kb ones natively (that's
> why ashift=12 was enforced in the first place).
Marketing: 2 drink minimum
> Questions are:
> 1) How bad would a performance hit be with 512b blocks used
> on a 4kb drive with such "efficient emulation"?
Depends almost exclusively on the workload and hardware. In my
experience, most folks who bite the 4KB bullet have low-cost HDDs
where one cannot reasonably expect high performance.
> Is it
> possible to model/emulate the situation somehow in advance
> to see if it's worth that change at all?
It will be far more cost effective to just make the change.
> 2) Is it possible to easily estimate the amount of "wasted"
> disk space in slack areas of the currently active ZFS
> allocation (unused portions of 4kb blocks that might
> become available if the disks were reused with ashift=9)?
Detailed space use is available from the zfs_blkstats mdb macro,
as previously described in other threads on this list.
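To make the estimate concrete, here is a small sketch of the arithmetic involved: once you have a block-size histogram (from the zfs_blkstats macro, or zdb's block statistics), the slack per block is just the round-up to the next sector boundary. The histogram values below are fabricated for illustration; only the rounding logic is the point.

```python
# Illustrative sketch: estimate slack ("wasted") space when logical
# block sizes are rounded up to the sector size, comparing ashift=9
# (512B sectors) against ashift=12 (4KB sectors).
# NOTE: the histogram below is made up; a real one would come from
# the pool's actual block statistics.

def slack(block_size, sector):
    """Bytes lost rounding one block up to a multiple of the sector size."""
    return -block_size % sector

# fabricated histogram: {logical block size in bytes: block count}
histogram = {512: 10000, 1024: 5000, 4096: 2000, 131072: 300}

for ashift in (9, 12):
    sector = 1 << ashift
    wasted = sum(slack(size, sector) * count
                 for size, count in histogram.items())
    print(f"ashift={ashift}: {wasted} bytes of slack")
```

With ashift=9 every size in the (made-up) histogram is already a multiple of 512, so the slack is zero; with ashift=12 the sub-4KB blocks each waste most of a sector.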
> 3) How many parts of ZFS pool are actually affected by the
> ashift setting?
Everything is impacted. But that isn't a useful answer.
> From what I gather, it is applied at the top-level vdev
> level (I read that one can mix ashift=9 and ashift=12
> TLVDEVs in one pool spanning several TLVDEVs). Is that
> a correct impression?
> If yes, how does ashift size influence the amount of
> slots in uberblock ring (128 vs. 32 entries) which is
> applied at the leaf vdev level (right?) but should be
> consistent across the pool?
It should be consistent across the top-level vdev.
There is 128KB of space available for the uberblock list. The minimum
size of an uberblock entry is 1KB. Obviously, a 4KB-sector disk cannot write
only 1KB, so for 4KB sectors there are 32 entries in the uberblock list.
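The ring sizing above works out as simple integer division: 128KB of label space divided by the per-entry size, where an entry is padded to at least 1KB but never less than one sector.

```python
# Worked arithmetic for the uberblock ring sizing described above:
# the ring occupies 128KB in each vdev label, and each entry takes
# max(1KB, sector size), i.e. max(1024, 2**ashift) bytes.

RING_BYTES = 128 * 1024   # space reserved for the uberblock list
MIN_ENTRY = 1024          # minimum uberblock entry size

def uberblock_entries(ashift):
    entry = max(MIN_ENTRY, 1 << ashift)
    return RING_BYTES // entry

print(uberblock_entries(9))   # 512B sectors -> 1KB entries -> 128 entries
print(uberblock_entries(12))  # 4KB sectors  -> 4KB entries -> 32 entries
```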
> As far as I see in ZFS on-disk format, all sizes and
> offsets are in either bytes or 512b blocks, and the
> ashift'ed block size is not actually used anywhere
> except to set the minimal block size and its implicit
> alignment during writes.
The on-disk format doc is somewhat dated and unclear here. UTSL.
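One detail from the source that bears on the question (and is easy to miss in the dated doc): DVA offsets are encoded in 512-byte units via SPA_MINBLOCKSHIFT, independent of the vdev's ashift. A minimal sketch of that decoding, assuming the constant's value from the ZFS source:

```python
# Sketch of how a DVA offset field is interpreted, per the ZFS source:
# offsets are stored in units of 2**SPA_MINBLOCKSHIFT (512 bytes),
# regardless of the top-level vdev's ashift.  The helper name here is
# hypothetical; the real macro is DVA_GET_OFFSET in the ZFS headers.

SPA_MINBLOCKSHIFT = 9  # 512-byte units

def dva_offset_bytes(encoded_offset):
    """Convert a DVA offset field to a byte offset within the vdev."""
    return encoded_offset << SPA_MINBLOCKSHIFT

print(dva_offset_bytes(1))  # 512
print(dva_offset_bytes(8))  # 4096
```

This is why ashift shows up as an alignment/minimum-size constraint on writes rather than as a unit of addressing.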
> Is it wrong to think that it's enough to forge an
> uberblock with ashift=9 and a matching self-checksum
> and place that into the pool (leaf vdev labels), and
> magically have all old data 4kb-aligned still available,
> while new writes would be 512b-aligned?
Yes, it is wrong to think that.
> Thanks for helping me grasp the theory,
DTrace Conference, April 3, 2012,
ZFS Performance and Training
zfs-discuss mailing list