That was exactly the summary I was looking for.

I would encourage folks to read the article Toby linked:

   http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data

The RAID-Z summary linked from that article is also very interesting, since some form of redundancy like this is needed to recover from checksum failures:

Which brings us to the coolest thing about RAID-Z: self-healing data. In addition to handling whole-disk failure, RAID-Z can also detect and correct silent data corruption. Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data. It then repairs the damaged disk and returns good data to the application. ZFS also reports the incident through Solaris FMA so that the system administrator knows that one of the disks is silently failing.

Finally, note that *RAID-Z doesn't require any special hardware.* It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks.


   http://blogs.sun.com/roller/page/bonwick?entry=raid_z
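To make the "combinatorial reconstruction" step concrete, here is a minimal Python sketch under several stated assumptions: a single XOR parity column, SHA-256 as a stand-in for the block checksum, and at most one bad data column (the parity itself is assumed good). The helper names (xor_bytes, checksum, read_self_healing) are hypothetical; this is not ZFS's actual code or on-disk layout.

```python
import hashlib
from functools import reduce

def xor_bytes(*blocks: bytes) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(data: bytes) -> bytes:
    # Stand-in block checksum (assumption); ZFS's real checksums differ.
    return hashlib.sha256(data).digest()

def read_self_healing(columns, parity, expected):
    """Return (data, bad_index); bad_index is None if every column was good.

    columns:  list of data columns as returned by each disk (at most one assumed bad)
    parity:   XOR of the correct data columns, read from the parity disk
    expected: checksum stored apart from the data (e.g. in the block pointer)
    """
    if checksum(b"".join(columns)) == expected:
        return b"".join(columns), None
    # Checksum failed: assume each disk in turn returned bad data, rebuild its
    # column from parity plus the other columns, and keep the guess that verifies.
    for i in range(len(columns)):
        others = columns[:i] + columns[i + 1:]
        rebuilt = xor_bytes(parity, *others)
        candidate = columns[:i] + [rebuilt] + columns[i + 1:]
        if checksum(b"".join(candidate)) == expected:
            return b"".join(candidate), i
    raise OSError("no single-disk reconstruction matches the checksum")

cols = [b"AAAA", b"BBBB", b"CCCC"]
par = xor_bytes(*cols)
good = checksum(b"".join(cols))
# Disk 1 silently returns corrupted data.
data, bad = read_self_healing([b"AAAA", b"BxBB", b"CCCC"], par, good)
print(data, bad)  # b'AAAABBBBCCCC' 1
```

The key point is that the checksum, stored separately from the data, is what tells you which reconstruction is right; parity alone can only tell you that something is inconsistent. A real implementation would then rewrite the bad disk with the rebuilt column and report the event, as the quoted article describes.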




Toby Thain wrote:


On 4-Aug-06, at 3:25 AM, Russell Leighton wrote:


If the software (filesystem like ZFS or database like Berkeley DB) finds a mismatch for a checksum on a block read, then what?

Is there a recovery mechanism, or are you just happy to know there is a problem (and go to backup)?


ZFS will correct from a good mirror (http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data).
--T
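A minimal Python sketch of the mirror case: verify each copy against a checksum stored apart from the data, return the first copy that verifies, and list the copies that should be rewritten from it. The function names and the SHA-256 checksum are assumptions for illustration, not ZFS internals.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    # Illustrative checksum (assumption); not ZFS's on-disk format.
    return hashlib.sha256(data).digest()

def read_mirror(copies, expected):
    """Return (good_data, indices_to_repair) for the copies of one mirrored block.

    copies:   the same block as read from each side of the mirror
    expected: checksum stored separately from the data (e.g. in the parent pointer)
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # No copy verifies: this is the "go to backup" case.
        raise OSError("all mirror copies fail the checksum")
    # Every copy that fails verification should be rewritten from the good one.
    bad = [i for i, c in enumerate(copies) if checksum(c) != expected]
    return good, bad

data, to_repair = read_mirror([b"hello wurld", b"hello world"], checksum(b"hello world"))
print(data, to_repair)  # b'hello world' [0]
```

Without the independent checksum, a two-way mirror that returns two different answers gives no way to tell which side is correct.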


Thx

Matthias Andree wrote:

Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).
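To illustrate how a per-page checksum catches such a torn write, here is a minimal Python sketch. The page layout (20-byte SHA-1 digest followed by the body), the 16K page and 4K sector sizes, and the helper names are all hypothetical; Berkeley DB's real page format and checksum handling may differ.

```python
import hashlib

PAGE_SIZE = 16 * 1024    # hypothetical database page size
SECTOR_SIZE = 4 * 1024   # assumed unit the drive writes atomically
DIGEST_LEN = 20          # SHA-1 digest length in bytes

def make_page(payload: bytes) -> bytes:
    """Build a page: SHA-1 of the body, followed by the zero-padded body."""
    body = payload.ljust(PAGE_SIZE - DIGEST_LEN, b"\0")
    return hashlib.sha1(body).digest() + body

def page_ok(page: bytes) -> bool:
    """Recompute the checksum over the body and compare it with the stored one."""
    stored, body = page[:DIGEST_LEN], page[DIGEST_LEN:]
    return hashlib.sha1(body).digest() == stored

def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first `sectors_written` sectors hit disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]

old = make_page(b"balance=100")
new = make_page(b"balance=250" + b"x" * 10000)  # changes span several sectors
print(page_ok(old), page_ok(new))         # True True
print(page_ok(torn_write(old, new, 2)))   # False
```

The torn page carries the new header's digest but a mix of new and old body bytes, so verification fails on the next read. Note that this only detects the damage; recovering the page still needs another copy, such as a log or a backup.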



