Re: [zfs-discuss] Raidz - what is stored in parity?

Marty Scholes Wed, 11 Aug 2010 06:53:47 -0700

Erik Trimble wrote:
> On 8/10/2010 9:57 PM, Peter Taps wrote:
> > Hi Eric,
> >
> > Thank you for your help. At least one part is clear now.
> >
> > I still am confused about how the system is still functional after one
> > disk fails.
> >
> > Consider my earlier example of 3 disks zpool configured for raidz-1. To
> > keep it simple let's not consider block sizes.
> >
> > Let's say I send a write value "abcdef" to the zpool.
> >
> > As the data gets striped, we will have 2 characters per disk.
> >
> > disk1 = "ab" + some parity info
> > disk2 = "cd" + some parity info
> > disk3 = "ef" + some parity info
> >
> > Now, if disk2 fails, I lost "cd." How will I ever recover this? The
> > parity info may tell me that something is bad but I don't see how my
> > data will get recovered.
> >
> > The only good thing is that any newer data will now be striped over two
> > disks.
> >
> > Perhaps I am missing some fundamental concept about raidz.
> >
> > Regards,
> > Peter
> 
> Parity is not intended to tell you *if* something is bad (well, it's not 
> *designed* for that). It tells you how to RECONSTRUCT something should 
> it be bad.  ZFS uses Checksums of the data (which are stored as data 
> themselves) to tell if some data is bad, and thus needs to be re-written


To follow up Erik's post, parity is used both to detect and to correct errors in a 
string of equal-sized numbers; each parity is the same size as each of the 
numbers.  In the old serial protocols, one parity bit was used to detect an error in a 
string of 7 bits, so each "number" in the string was a single bit.  In the case of 
ZFS, each "number" in the string is a disk block.  The length of the string of 
numbers is completely arbitrary.
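To make the "string of equal-sized numbers" idea concrete, here is a minimal Python 
sketch (the function names are hypothetical, not ZFS code).  It computes an 
even-parity bit over a 7-bit character, as a serial line would, and then applies the 
same XOR idea to whole equal-sized blocks.  Real RAIDZ works on disk sectors, not 
2-byte strings.

```python
def parity_bit(value: int, width: int = 7) -> int:
    """Even-parity bit for the low `width` bits: 1 if they contain an odd number of 1s."""
    return bin(value & ((1 << width) - 1)).count("1") % 2


def xor_parity(blocks: list) -> bytes:
    """Byte-wise XOR of equal-sized blocks; the parity is the same size as each block."""
    size = len(blocks[0])
    if any(len(blk) != size for blk in blocks):
        raise ValueError("all blocks in a stripe must be the same size")
    parity = bytearray(size)
    for blk in blocks:
        for i, byte in enumerate(blk):
            parity[i] ^= byte
    return bytes(parity)


# Serial-line case: each "number" is one bit of a 7-bit character.
sent = ord("a")
check = parity_bit(sent)
received = sent ^ 0b100             # one bit flipped in transit
error_detected = parity_bit(received) != check

# ZFS-like case: each "number" is a whole block.
stripe_parity = xor_parity([b"ab", b"cd"])
```

The string length is indeed arbitrary here: xor_parity() accepts any number of 
blocks, as long as they are all the same size.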

I am rusty on parity math, but Reed-Solomon coding is used (XOR is its degenerate, 
single-parity case) so that each parity is independent of the other parities.  
RAIDZ can support up to three parities per stripe.
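As a rough illustration of "independent parities", the sketch below uses the 
classic RAID-6 style P+Q construction: P is plain XOR, and Q weights data block i by 
g^i in GF(2^8), with generator g = 2 and reduction polynomial 0x11d.  I believe 
RAIDZ2 uses a similar P/Q scheme, but treat the field parameters and every function 
name here as assumptions, not the ZFS source.  With two independent parities, any 
two lost data blocks can be solved for.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    # The nonzero elements form a group of order 255, so a^254 == a^-1.
    return gf_pow(a, 254)


def pq_parity(data: list) -> tuple:
    """P = XOR of all blocks; Q = XOR of g^i * block_i (hypothetical helper)."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, blk in enumerate(data):
        coef = gf_pow(2, i)
        for j, byte in enumerate(blk):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


def recover_two(data: list, p: bytes, q: bytes) -> tuple:
    """Rebuild the two data blocks marked None, using the survivors plus P and Q."""
    x, y = [i for i, blk in enumerate(data) if blk is None]  # exactly two lost
    survivors = [bytes(len(p)) if blk is None else blk for blk in data]
    pxy, qxy = pq_parity(survivors)  # parities with the lost blocks set to zero
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)            # nonzero because x != y and g is primitive
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        ps = p[j] ^ pxy[j]           # Dx ^ Dy
        qs = q[j] ^ qxy[j]           # gx*Dx ^ gy*Dy
        # gy*ps ^ qs == (gx ^ gy) * Dx, so divide by (gx ^ gy).
        dx[j] = gf_mul(gf_mul(gy, ps) ^ qs, inv)
        dy[j] = ps ^ dx[j]
    return bytes(dx), bytes(dy)
```

For example, with data [b"ab", b"cd", b"ef"], losing both "cd" and "ef" still lets 
recover_two([b"ab", None, None], p, q) return both blocks.  A third parity adds 
another independent equation in the same way.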

Generally, a single parity can either detect a single corrupt number in a 
string or, if it is known which number is corrupt, correct that number.  
Traditional RAID5 assumes it knows which number (i.e. block) is bad because the 
disk failed, and therefore it can use the parity block to reconstruct it.  RAID5 
cannot repair a random bit-flip on a healthy disk, because parity alone does not 
say which block holds the flipped bit.
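A small Python sketch of that difference, using the blocks from Peter's example 
(helper names are illustrative only).  When the failed block is known, XOR of the 
survivors rebuilds it.  When a bit silently flips, the XOR of the whole stripe (the 
"syndrome") becomes nonzero, but it comes out identical whichever block was hit, so 
parity alone cannot point at the culprit.

```python
def xor_blocks(blocks: list) -> bytes:
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def rebuild_known(blocks: list, bad_index: int) -> bytes:
    """RAID5-style rebuild: the failed block is the XOR of every other member."""
    return xor_blocks([b for i, b in enumerate(blocks) if i != bad_index])


def flip_bit(block: bytes, byte_index: int, bit: int) -> bytes:
    mutable = bytearray(block)
    mutable[byte_index] ^= 1 << bit
    return bytes(mutable)


data = [b"ab", b"cd"]
stripe = data + [xor_blocks(data)]   # two data blocks plus one parity block

# Clean stripe: the syndrome is all zeros.
clean_syndrome = xor_blocks(stripe)

# Known failure (disk 2 gone): rebuild it from disk 1 and the parity.
rebuilt = rebuild_known(stripe, 1)

# Unknown bit-flip: the same syndrome whether block 0 or block 1 was hit.
hit0 = xor_blocks([flip_bit(stripe[0], 0, 3), stripe[1], stripe[2]])
hit1 = xor_blocks([stripe[0], flip_bit(stripe[1], 0, 3), stripe[2]])
```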

RAIDZ takes a different approach: the checksum for the number string (i.e. 
stripe) is stored in a different, already validated stripe.  With that checksum in 
hand, ZFS knows when a stripe is corrupt, but not which block.  ZFS will then 
reconstruct each data block in the stripe from the parity block, one data 
block at a time, until the checksum matches.  At that point ZFS knows which 
block is bad and can rebuild it and write it to disk.  A scrub does this for 
all stripes and all parities in each stripe.
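The trial-and-error repair just described can be sketched in Python.  This is a 
simplification under stated assumptions: hashlib.sha256 stands in for whatever 
checksum the pool actually uses, the stripe has a single XOR parity, and self_heal() 
is a hypothetical name, not a ZFS function.

```python
import hashlib


def xor_blocks(blocks: list) -> bytes:
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def checksum(data: list) -> bytes:
    # Stand-in for the checksum kept in a separate, already validated block.
    return hashlib.sha256(b"".join(data)).digest()


def self_heal(data: list, parity: bytes, expected: bytes) -> tuple:
    """Return (repaired data, index of the bad block or None if nothing was wrong)."""
    if checksum(data) == expected:
        return data, None
    for bad in range(len(data)):
        # Assume block `bad` is corrupt and rebuild it from parity and the others.
        others = [blk for i, blk in enumerate(data) if i != bad]
        candidate = list(data)
        candidate[bad] = xor_blocks(others + [parity])
        if checksum(candidate) == expected:
            return candidate, bad    # now we know which block to rewrite
    raise ValueError("no single-block reconstruction matches the checksum")


data = [b"ab", b"cd"]
parity = xor_blocks(data)
expected = checksum(data)

# Disk 2 stays online but returns bad data: "cX" instead of "cd".
repaired, bad = self_heal([b"ab", b"cX"], parity, expected)
```

The first guess (block 0 bad) produces a stripe whose checksum does not match, so 
it is discarded; the second guess restores "cd" and identifies block 1.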

Using the example above, the disk layout would look more like the following for 
a single stripe, and as Erik mentioned, the location of the data and parity 
blocks will change from stripe to stripe:
disk1 = "ab"
disk2 = "cd"
disk3 = parity info

Again using the example above, if disk 2 fails, or even stays online but 
produces bad data, the information can be reconstructed from disks 1 and 3.

The beauty of ZFS is that it does not depend on parity to detect errors; your 
stripes can be as wide as you want (up to 100-ish devices), and you can choose 
1, 2 or 3 parity devices.

Hope that makes sense,
Marty
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
