On Sat, Oct 27, 2012 at 12:35 PM, Jim Klimov <jimkli...@cos.ru> wrote:

> 2012-10-27 20:54, Toby Thain wrote:
>>> Parity is very simple to calculate and doesn't use a lot of CPU - just
>>> slightly more work than reading all the blocks: read all the stripe
>>> blocks on all the drives involved in a stripe, then do a simple XOR
>>> operation across all the data.  The actual checksums are more expensive
>>> as they're MD5 - much nicer when these can be hardware accelerated.
>> Checksums are MD5??
> No, they are fletcher variants or sha256, with more probably coming
> up soon, and some of these might also be boosted by certain hardware
> capabilities, but I tend to agree that parity calculations likely
> are faster (even if not all parities are simple XORs - that would
> be silly for double- or triple-parity sets which may use different
> algos just to be sure).

I would expect raidz2 and raidz3 to use the same math as traditional RAID 6
for parity: https://en.wikipedia.org/wiki/Raid6#RAID_6 .  Note in particular
the sentence "For a computer scientist, a good way to think about this is that
<operator> is a bitwise XOR operator and <g superscript i> is the action of
a linear feedback shift register on a chunk of data."  If I understood it
correctly, it applies a different number of iterations of the LFSR to each
data sector, depending on that sector's position among the data sectors.
The LFSR is applied independently to small groups of bytes in each sector,
and the results are then XORed together to get the second parity sector
(for third parity, I believe it needs to use a different generator
polynomial for the LFSR).  For small numbers of iterations, multiple
iterations of the LFSR can be optimized to a single shift and an XOR with a
lookup value on the lowest bits.  For larger numbers of iterations (if you
have, say, 28 disks in a raidz3), it could construct the 25th iteration by
doing 10, 10, 5, but I have no idea how ZFS actually implements it.
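
To make the LFSR description above concrete, here is a minimal Python sketch
of the textbook RAID 6 P/Q construction from that Wikipedia section.  It is
not taken from the ZFS source: the polynomial 0x11D, the one-byte group size,
and the function names (lfsr_step, pq_naive, pq_horner) are all assumptions
made for illustration.

```python
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1 (assumed; a common RAID 6 choice)

def lfsr_step(b):
    """One LFSR iteration: multiply a byte by g = 2 in GF(2^8)."""
    b <<= 1
    if b & 0x100:      # a bit fell off the top,
        b ^= POLY      # so fold it back in via the polynomial
    return b

def pq_naive(sectors):
    """P = XOR of all sectors; Q = XOR of (i LFSR steps applied to sector i)."""
    size = len(sectors[0])
    p, q = bytearray(size), bytearray(size)
    for i, sector in enumerate(sectors):
        for k, byte in enumerate(sector):
            p[k] ^= byte
            for _ in range(i):          # i iterations for the i-th data sector
                byte = lfsr_step(byte)
            q[k] ^= byte
    return bytes(p), bytes(q)

def pq_horner(sectors):
    """Same P and Q, but only one LFSR step per byte per sector (Horner's rule)."""
    size = len(sectors[0])
    p, q = bytearray(size), bytearray(size)
    for sector in reversed(sectors):    # start from the highest-numbered sector
        for k, byte in enumerate(sector):
            p[k] ^= byte
            q[k] = lfsr_step(q[k]) ^ byte
    return bytes(p), bytes(q)
```

The Horner form is one way to avoid building large iteration counts at all:
walking the sectors from last to first, the running Q value gets one LFSR
step per sector, so a 25-data-disk stripe costs 25 single steps per byte
rather than 0 + 1 + ... + 24.  Whether ZFS does it this way is not something
this sketch claims.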

As I understand it, fletcher checksums are extremely simple: basically two
additions and two modulus operations per unit of input, whatever number of
bytes it processes at a time.  So I wouldn't be surprised if fletcher was
about the same speed as computing second/third parity.  I don't know about
SHA256, but I would expect it to be more expensive, simply because it is a
cryptographic hash.
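
For reference, the "two additions and two modulus" shape looks like this in
the textbook Fletcher-16 form.  This is only a sketch of the classic
algorithm; the fletcher variants ZFS uses may well differ in word size,
number of running sums, and how overflow is handled, and the function name
fletcher16 is just an illustrative choice.

```python
def fletcher16(data):
    """Classic Fletcher-16 over a bytes object: two running sums modulo 255."""
    sum1 = 0  # plain sum of the bytes
    sum2 = 0  # sum of the successive values of sum1 (makes it order-sensitive)
    for byte in data:
        sum1 = (sum1 + byte) % 255   # first addition and modulus
        sum2 = (sum2 + sum1) % 255   # second addition and modulus
    return (sum2 << 8) | sum1

print(hex(fletcher16(b"abcde")))  # prints 0xc8f0
```

Per byte that is the same order of work as one shift-and-XOR parity step,
which is why the two seem likely to land in the same ballpark.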

zfs-discuss mailing list
