> From: Haudy Kazemi [mailto:kaze0...@umn.edu]
>
> There is another optimization in the Best Practices Guide that says the
> number of devices in a vdev should be (N+P) with P = 1 (raidz), 2
> (raidz2), or 3 (raidz3) and N equals 2, 4, or 8.
> I.e. 2^n + P devices where n is 1, 2, or 3 and P is the RAIDZ level.
> 
> I.e. Optimal sizes
> RAIDZ1 vdevs should have 3, 5, or 9 devices in each vdev
> RAIDZ2 vdevs should have 4, 6, or 10 devices in each vdev
> RAIDZ3 vdevs should have 5, 7, or 11 devices in each vdev

This sounds logical, although I don't know how real it is.  The logic seems
to be ... Assuming slab sizes of 128K, the amount of data written to each
disk within the vdev comes out to a multiple of 512B or 4K (newer drives are
supposedly starting to use 4K sectors instead of 512B).
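
For what it's worth, here's a quick back-of-the-envelope check of that
division (just arithmetic in Python, nothing pulled from the ZFS code; the
128K figure is the default recordsize assumed above):

  # Per-disk share of a 128K record for various data-disk counts, and
  # whether that share lands on a 512B / 4K boundary.
  RECORD = 128 * 1024
  for n_data in range(2, 11):
      per_disk = RECORD / n_data
      print("%2d data disks: %8.1f bytes/disk  512B: %-5s  4K: %s"
            % (n_data, per_disk, per_disk % 512 == 0, per_disk % 4096 == 0))

Only 2, 4, and 8 data disks come out as a clean multiple of both 512B and
4K, which matches the 3/5/9, 4/6/10, 5/7/11 list above once you add the
parity disks back in.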

But I have my doubts about how much this matters in practice, because ... an
awful lot of the time your actual slabs are smaller than 128K, simply because
you're not performing sustained sequential writes very often.

But it seems to make sense that, whenever you *do* have some sequential
writes, you would want the data written to each disk to be a multiple of
512B or 4K.  If you had a 128K slab divided across 5 data disks, each disk
would write 25.6K, and even for sustained sequential writes some degree of
fragmentation would be impossible to avoid.  Actually, I don't think
fragmentation is technically the correct term for that behavior.  It might
be more appropriate to simply say it forces a less-than-100% duty cycle.
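
If I've got the on-disk layout right (this is just my mental model, not
lifted from the ZFS code), the record actually gets carved into 512B sectors
and dealt out across the data disks, so with 5 data disks the columns simply
can't come out even:

  RECORD, SECTOR, n_data = 128 * 1024, 512, 5
  sectors = RECORD // SECTOR                 # 256 sectors in the record
  base, extra = divmod(sectors, n_data)      # 51 each, 1 left over
  columns = [base + (1 if i < extra else 0) for i in range(n_data)]
  print(columns)                             # [52, 51, 51, 51, 51]
  print([c * SECTOR for c in columns])       # [26624, 26112, 26112, 26112, 26112]

So one disk ends up writing an extra sector on every such record, which seems
like the kind of unevenness the power-of-two rule is trying to avoid.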

And another thing ... Doesn't the checksum take up some space anyway?  Even
if you obeyed the BPG and used, let's say, 4 disks for N, then each disk has
32K of data to write, which is a multiple of both 4K and 512B ... but each
disk also needs to write the checksum.  So each disk writes 32K plus a few
bytes, which defeats the whole purpose anyway, doesn't it?

The effect, if real at all, might be negligible.  I don't know how small it
is, but I'm quite certain it's not huge.
