2012-10-12 11:11, andy thomas wrote:
> Great, thanks for the explanation! I didn't realise you could have a
> sort of 'stacked pyramid' vdev/pool structure.
Well, you can - the layers are "pool" - "top-level VDEVs" - "leaf
VDEVs", though on trivial pools like single-disk ones, the layers
kinda merge into one or two :) This should be described in the
manpage in greater detail.
So the pool stripes over Top-Level VDEVs (TLVDEVs), roughly by
round-robining whole logical blocks upon write, and then each
tlvdev depending on its redundancy configuration forms the sectors
to be written onto its component leaf vdevs (low-level disks,
partitions or slices, luns, files, etc.). Since full-stripe writes
are not required by ZFS, smaller blocks can consume fewer sectors
than there are leaf disks in a tlvdev, but this does not result
in lost-space "holes" nor in read-modify-write (RMW) cycles as on
full-stripe RAID systems. If there's a free "hole" of contiguous
logical addressing (roughly, striped across leaf vdevs within the
tlvdev) where the userdata sectors (after optional compression)
plus the redundancy sectors fit - it will be used.
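That sector accounting can be sketched roughly as follows - a
hypothetical simplified model, not the actual ZFS code (the padding
rule is my understanding of how raidz avoids unusably small holes):

```python
import math

def raidz_asize(data_sectors, ndisks, parity):
    """Approximate sectors a raidzN tlvdev allocates for one block.

    Simplified model: one parity sector per "row" of up to
    (ndisks - parity) data sectors, then the total is padded up to a
    multiple of (parity + 1) so any freed hole can still hold a
    minimal block. Not the real ZFS implementation.
    """
    rows = math.ceil(data_sectors / (ndisks - parity))
    total = data_sectors + rows * parity
    pad = (parity + 1) - (total % (parity + 1))
    if pad != parity + 1:
        total += pad
    return total

# a 3-sector block on a 6-disk raidz2: one row of 3 data + 2 parity,
# padded from 5 up to 6 sectors
print(raidz_asize(3, 6, 2))
```

Note how a small block occupies far fewer sectors than a full stripe
across all six disks would, which is exactly why ZFS needs no RMW.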
I guess it is because of this contiguous addressing that a tlvdev
with raidzN cannot (currently) change its number of component disks,
and a pool cannot decrease its number of tlvdevs. If you add new
tlvdevs to an existing pool, the ZFS allocator will try to put
more load on the emptier tlvdevs and balance the writes, although
according to list discussions this can still lead to imbalance and
performance problems on particular installations.
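The free-space bias can be pictured with a toy model like this (a
hypothetical sketch - the real allocator rotates through tlvdevs and
weighs metaslabs, it does not simply pick the emptiest device):

```python
def pick_tlvdev(tlvdevs):
    """Choose the tlvdev to receive the next block.

    Toy model: prefer the tlvdev with the highest fraction of free
    space, so a newly added (empty) tlvdev absorbs most new writes.
    """
    return max(tlvdevs, key=lambda t: t["free"] / t["size"])

pool = [
    {"name": "raidz1-0", "size": 100, "free": 10},  # old, nearly full
    {"name": "raidz1-1", "size": 100, "free": 95},  # newly added
]
print(pick_tlvdev(pool)["name"])  # the emptier tlvdev wins
```

This also shows where the imbalance comes from: until the new tlvdev
fills up, it takes nearly all the write load by itself.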
In fact, you can (although it is not recommended, for balancing
reasons) have tlvdevs of mixed size (as in Freddie's example) and
even of different structure (e.g. mixing raidz and mirrors or even
single LUNs) by forcing the disk attachment.
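For example (device names here are made up; zpool normally refuses
to mix redundancy levels, so -f is needed to force it):

```shell
# start with a raidz1 tlvdev
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0

# force in a mirror tlvdev, then even a single-disk tlvdev
zpool add -f tank mirror c2t0d0 c2t1d0
zpool add -f tank c3t0d0

zpool status tank
```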
Note however that a loss of a tlvdev kills your whole pool, so
don't stripe important data over single disks/luns ;)
And since you don't control what gets written where, you'd get an
averaged mix of raidz and mirror performance, with the speed of any
particular userdata block's storage being unpredictable.
zfs-discuss mailing list