Hi Lasse,

I expanded this entry to include more details of the zpool list and
zfs list reporting.

See if the new explanation provides enough details.

Thanks,

Cindy

On 02/08/10 16:51, Lasse Osterild wrote:
On 09/02/2010, at 00.23, Daniel Carosone wrote:

On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
Ok, thanks. I know that the amount of used space will vary, but what's
the usefulness of the total size when, e.g. in my pool above, 4 x 1G
(roughly, depending on recordsize) are reserved for parity? It's not
like it's usable for anything else :)  I just don't see the point when
it's a raidz or raidz2 pool, but I guess I am missing something here.
The basis of raidz is that each block is its own raid stripe, with its
own layout.  At present, this only matters for the size of the stripe.
For example, if I write a single 512-byte block to a dual-parity
raidz2, I will write three sectors (one data, two parity) to three
disks.  With a larger
block, I will have more data over more disks, until the block is big
enough to stripe evenly over all of them. As the block gets bigger
yet, more is written to each disk as part of the stripe, and the
parity units get bigger to match the size of the largest data unit.
This "rounding" can very often mean that different disks have
different amounts of data for each stripe.
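
A rough sketch of that arithmetic, in Python (the helper name and the
5-disk vdev are made up for illustration; this is not the actual ZFS
allocator, and it ignores some padding/rounding the real code does):

    import math

    def raidz_sectors(block_bytes, ndisks, nparity, sector=512):
        """Rough per-block model of raidz space use: each block is its
        own stripe, laid out in rows of (ndisks - nparity) data sectors
        plus nparity parity sectors, so the parity column grows along
        with the largest data column.  (Hypothetical helper, not the
        real ZFS code.)"""
        data = math.ceil(block_bytes / sector)        # data sectors in the block
        rows = math.ceil(data / (ndisks - nparity))   # rows the stripe occupies
        parity = nparity * rows                       # parity sectors written
        return data, parity

    # A single 512-byte block on a 5-disk raidz2: one data sector plus
    # two parity sectors -- three sectors on three disks, as above.
    print(raidz_sectors(512, ndisks=5, nparity=2))        # (1, 2)

    # A full 128K block on the same vdev spreads over all five disks:
    print(raidz_sectors(128 * 1024, ndisks=5, nparity=2)) # (256, 172)
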
Crucially, it also means the ratio of parity-to-data is not fixed.
This tends to average out on a pool with lots of data and mixed block
sizes, but not always; consider an extreme case of a pool
containing only datasets with blocksize=512. That's what the comments
in the documentation are referring to, and the major reason for the
zpool output you see.
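
To put rough numbers on that, reusing the hypothetical raidz_sectors()
sketch above (again with a 5-disk raidz2 picked purely for
illustration):

    # Per-block parity overhead at the two extremes:
    for size in (512, 128 * 1024):
        data, parity = raidz_sectors(size, ndisks=5, nparity=2)
        print(f"{size:6d} B block: {data:3d} data + {parity:3d} parity "
              f"sectors ({parity / data:.0%} overhead)")

    # Which prints (under the model above):
    #    512 B block:   1 data +   2 parity sectors (200% overhead)
    # 131072 B block: 256 data + 172 parity sectors (67% overhead)

So a pool holding nothing but 512-byte blocks loses two thirds of its
raw space to parity, far more than the disk count alone would suggest.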

In future, it may go further and be more important.

Just as the data count per stripe can vary, there's nothing
fundamental in the raidz layout that says that the same parity count
and method has to be used for the entire pool, either.  Raidz already
degrades to simple mirroring in some of the same small-stripe cases
discussed above.
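
Using the same hypothetical sketch, the degenerate small-stripe case on
single-parity raidz looks like this:

    # A single-sector block on a 5-disk raidz1 is one data sector plus
    # one parity sector, and the XOR parity of a single sector is just a
    # copy of it -- effectively a two-way mirror of that block.
    print(raidz_sectors(512, ndisks=5, nparity=1))  # (1, 1)
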

There's no particular reason, in theory, why they could not also have
different amounts of parity on a per-block basis.  I imagine that when
bp-rewrite and the ability to reshape pools come along, this will
indeed be the case, at least during transition.  As a simple example,
when reshaping a raidz1 to a raidz2 by adding a disk, there will be
blocks with single parity and other blocks with dual for a time until
the operation is finished.
Maybe one day in the future, there will just be a basic "raidz" vdev
type, and we can set dataset properties for the number of additional
parity blocks each should get.  This might be a little like the way we
can currently set "copies", including that it would only affect new
writes and lead to very mixed redundancy states.
No one has actually said this is a real goal, and the reasons it's not
presently allowed include administrative and operational simplicity as
well as implementation and testing constraints, but I think it would
be handy and cool.
--
Dan.

Thanks Dan! :)

That explanation made perfect sense, and I appreciate you taking the
time to write this.  Perhaps parts of it could go into the FAQ?  I
realise that it's sort of in there already, but it doesn't explain it
very well.

Cheers,

 - Lasse
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss