That was exactly the summary I was looking for.
I would encourage folks to read the referenced link Toby sent:
http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data
...also the RAID-Z summary linked from this article was very
interesting, since something like it is needed to recover from
checksum failures:
Which brings us to the coolest thing about RAID-Z: self-healing data.
In addition to handling whole-disk failure, RAID-Z can also detect and
correct silent data corruption. Whenever you read a RAID-Z block, ZFS
compares it against its checksum. If the data disks didn't return the
right answer, ZFS reads the parity and then does combinatorial
reconstruction to figure out which disk returned bad data. It then
repairs the damaged disk and returns good data to the application. ZFS
also reports the incident through Solaris FMA so that the system
administrator knows that one of the disks is silently failing.
Finally, note that *RAID-Z doesn't require any special hardware.* It
doesn't need NVRAM for correctness, and it doesn't need write
buffering for good performance. With RAID-Z, ZFS makes good on the
original RAID promise: it provides fast, reliable storage using cheap,
commodity disks.
http://blogs.sun.com/roller/page/bonwick?entry=raid_z
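To make the "combinatorial reconstruction" step concrete, here is a minimal Python sketch, assuming a single-parity layout where the parity chunk is the XOR of the data chunks and the block's checksum is stored separately (passed in as `expected`). SHA-256 stands in for whatever checksum the pool uses, and the names `xor_bytes`, `checksum`, and `self_healing_read` are hypothetical, not ZFS code. The idea is to assume each disk in turn returned bad data, rebuild that chunk from parity plus the others, and accept the first candidate whose checksum matches.

```python
import hashlib
from functools import reduce


def xor_bytes(chunks):
    # XOR a list of equal-length byte strings together, column by column.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


def checksum(data_chunks):
    # Stand-in block checksum (SHA-256) over the concatenated data chunks.
    return hashlib.sha256(b"".join(data_chunks)).digest()


def self_healing_read(data_chunks, parity, expected):
    """Return (good_chunks, bad_index); bad_index is None if no repair was needed."""
    if checksum(data_chunks) == expected:
        return list(data_chunks), None
    # Combinatorial reconstruction: suspect each disk in turn, rebuild its
    # chunk from parity and the remaining chunks, and test the checksum.
    for i in range(len(data_chunks)):
        others = [c for j, c in enumerate(data_chunks) if j != i]
        candidate = list(data_chunks)
        candidate[i] = xor_bytes(others + [parity])
        if checksum(candidate) == expected:
            return candidate, i  # caller would rewrite disk i and report it
    raise IOError("unrecoverable: more than one chunk is bad")
```

For example, with `disks = [b"AAAA", b"BBBB", b"CCCC"]` and `parity = xor_bytes(disks)`, reading `[b"AAAA", b"BxBB", b"CCCC"]` against `checksum(disks)` identifies disk 1 as the liar and returns the original chunks. Single parity can only fix one bad chunk per block, which is why the sketch gives up when no single substitution satisfies the checksum.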
Toby Thain wrote:
On 4-Aug-06, at 3:25 AM, Russell Leighton wrote:
If the software (filesystem like ZFS or database like Berkeley DB)
finds a mismatch for a checksum on a block read, then what?
Is there a recovery mechanism, or are you just happy to know
there is a problem (and go to backup)?
ZFS will correct from a good mirror
(http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data).
--T
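Correcting "from a good mirror" amounts to: verify each copy against a checksum kept apart from the data, return the first copy that matches, and overwrite the copies that failed. Below is a minimal sketch of that loop, assuming each mirror side is modeled as a Python dict from block id to bytes and SHA-256 is the checksum; `mirrored_read` and the dict model are hypothetical illustrations, not ZFS interfaces.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def mirrored_read(sides, block_id, expected):
    """Return (data, repaired_side_indices) for a block on a simple mirror.

    sides: list of dicts (one per mirror side) mapping block_id -> bytes.
    expected: the block's checksum, stored separately from the data.
    """
    bad = []
    for idx, side in enumerate(sides):
        data = side.get(block_id)
        if data is not None and sha256(data) == expected:
            # Self-heal: overwrite every copy that failed verification.
            for b in bad:
                sides[b][block_id] = data
            return data, bad
        bad.append(idx)
    # No copy verifies: this is the "go to backup" case.
    raise IOError("all mirror copies failed checksum verification")
```

With `sides = [{7: b"hellx"}, {7: b"hello"}]` and `expected = sha256(b"hello")`, the read returns `b"hello"` and rewrites side 0. The checksum has to live outside the block itself; otherwise a corrupted copy could carry a matching corrupted checksum.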
Thx
Matthias Andree wrote:
Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).
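The reason a page checksum catches failed "atomic" writes is that a drive commits a multi-kilobyte page sector by sector, so a crash can leave a page that is half new and half old, and that mix will not match either checksum. The sketch below illustrates this with SHA-1, assuming a hypothetical 4096-byte page whose first 20 bytes hold the digest of the rest and 512-byte sectors; it is not Berkeley DB's on-disk format or API, and `seal_page`, `verify_page`, and `torn_write` are made-up names.

```python
import hashlib

DIGEST_LEN = 20    # SHA-1 digest size in bytes
PAGE_SIZE = 4096   # hypothetical page size
SECTOR = 512       # assumed unit the drive writes atomically


def seal_page(payload: bytes) -> bytes:
    """Build an on-disk page: SHA-1 of the payload followed by the payload."""
    if len(payload) != PAGE_SIZE - DIGEST_LEN:
        raise ValueError("payload must fill the page exactly")
    return hashlib.sha1(payload).digest() + payload


def verify_page(page: bytes) -> bool:
    """Recompute the payload digest and compare it with the stored one."""
    stored, payload = page[:DIGEST_LEN], page[DIGEST_LEN:]
    return hashlib.sha1(payload).digest() == stored


def torn_write(old: bytes, new: bytes, sectors_written: int = 1) -> bytes:
    """Simulate a crash after only the first few sectors of `new` hit the disk."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]
```

Sealing two pages of all `a` and all `b` bytes, both verify on their own, while `verify_page(torn_write(old, new))` returns `False`: the stored digest belongs to the new page but most of the payload is still the old one. Detection is all this gives you; recovering the page still needs another copy, such as a log record, a mirror, or a backup.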