> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Erik Trimble
> 
> the thing that folks tend to forget is that RaidZ is IOPS limited.  For
> the most part, if I want to reconstruct a single slab (stripe) of data,
> I have to issue a read to EACH disk in the vdev, and wait for that disk
> to return the value, before I can write the computed parity value out
> to the disk under reconstruction.

Trying to interpret your whole message, Erik, and condense it, I think
I get the following.  Please tell me if and where I'm wrong.

In any given zpool, some number of slabs are used across the whole
pool.  In raidzN, a portion of each slab is written on each disk.
Therefore, during resilver, if there are a total of 1 million slabs
used in the zpool, each good disk will need to read 1 million partial
slabs, and the replaced disk will need to write 1 million partial
slabs.  Each good disk receives its read request in parallel, and all
of those reads must complete before the write is issued to the new
disk.  Each read/write cycle completes before the next cycle begins.
(It seems this could be accelerated by letting the good disks keep
reading ahead in parallel instead of waiting, right?)
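
To put rough numbers on that lock-step model, here is a
back-of-the-envelope Python sketch.  Everything in it is assumed for
illustration: the disk count, the uniform 2-12 ms per-op latency
window, and the slab count are invented, not measured from real
hardware:

    import random

    N_GOOD_DISKS = 6                 # surviving disks in the vdev (assumed)
    N_SLABS = 1_000_000              # slabs used in the pool (assumed)
    LAT_MIN, LAT_MAX = 0.002, 0.012  # per-op latency in seconds (assumed)

    total = 0.0
    for _ in range(N_SLABS):
        # All good disks read their portion in parallel; the cycle is
        # gated on the slowest read, and then the reconstructed portion
        # is written to the new disk before the next cycle starts.
        reads = [random.uniform(LAT_MIN, LAT_MAX) for _ in range(N_GOOD_DISKS)]
        write = random.uniform(LAT_MIN, LAT_MAX)
        total += max(reads) + write

    print(f"estimated resilver time: {total / 3600:.1f} hours")

With these made-up numbers the estimate comes out around 5 hours, and
the total scales linearly with N_SLABS but only weakly with
N_GOOD_DISKS, which is exactly the conclusion below.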

The conclusion I would reach is:

Given no bus bottleneck:

It is true that resilvering a raidz will be slower with many disks in
the vdev, because the expected latency of the worst of N disks
increases as N increases.  But that effect is only marginal, and it is
bounded between the average latency of a single disk and the
worst-case latency of a single disk.
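
As a sanity check on that bound: if, purely as an illustrative
assumption, each disk's per-read latency were uniform on [2 ms, 12 ms],
then the expected worst-of-N latency works out to 2 + 10*N/(N+1) ms:

    # Expected worst-of-N latency under an assumed uniform 2-12 ms model:
    # E[max of N iid Uniform(a, b)] = a + (b - a) * N / (N + 1)
    for n in (1, 2, 5, 10, 20):
        print(f"{n:2d} disks: {2 + 10 * n / (n + 1):5.2f} ms")

The figure climbs from the single-disk average (7 ms) toward, but never
past, the single-disk worst case (12 ms); for example, going from 5
good disks to 10 adds only about 7% per cycle.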

The characteristic that *really* makes a big difference is the number
of slabs in the pool, i.e. whether your filesystem is composed of
mostly small files and fragments, or mostly large, unfragmented files.

