Re: Checksumming blocks? [was Re: the " 'official' point of view" expressed by kernelnewbies.org regarding reiser4 inclusion]

Russell Leighton Fri, 04 Aug 2006 04:52:58 -0700


That was exactly the summary I was looking for.


I would encourage folks to read the referenced link Toby sent:

   http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data

...also the linked RAID-Z summary from this article was very interesting, since something like this is needed for recovery from checksum failures:

Which brings us to the coolest thing about RAID-Z: self-healing data. In addition to handling whole-disk failure, RAID-Z can also detect and correct silent data corruption. Whenever you read a RAID-Z block, ZFS compares it against its checksum. If the data disks didn't return the right answer, ZFS reads the parity and then does combinatorial reconstruction to figure out which disk returned bad data. It then repairs the damaged disk and returns good data to the application. ZFS also reports the incident through Solaris FMA so that the system administrator knows that one of the disks is silently failing.
Finally, note that *RAID-Z doesn't require any special hardware.* It doesn't need NVRAM for correctness, and it doesn't need write buffering for good performance. With RAID-Z, ZFS makes good on the original RAID promise: it provides fast, reliable storage using cheap, commodity disks.


   http://blogs.sun.com/roller/page/bonwick?entry=raid_z
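
The read path described in that quote can be illustrated with a minimal sketch. It assumes single XOR parity and one SHA-256 checksum over the whole block. The function names are hypothetical, and this is not ZFS's actual layout or code, only the idea of guessing each disk in turn and letting the checksum confirm the guess:

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(columns):
    """Checksum of the logical block (all data columns concatenated)."""
    return hashlib.sha256(b"".join(columns)).digest()


def self_healing_read(columns, parity, expected):
    """Return (good_columns, bad_index); bad_index is None on a clean read.

    columns  -- list of byte strings, one per data disk, as read back
    parity   -- XOR of the original data columns
    expected -- checksum recorded when the block was written
    """
    if checksum(columns) == expected:
        return list(columns), None
    # Assume each data disk in turn returned bad data, rebuild that
    # column from parity plus the other columns, and re-check.
    for i in range(len(columns)):
        others = columns[:i] + columns[i + 1:]
        candidate = list(columns)
        candidate[i] = xor_blocks(others + [parity])
        if checksum(candidate) == expected:
            return candidate, i
    raise OSError("unrecoverable: more than one column is damaged")
```

A caller would write the returned good column back to disk `bad_index` and log the event. With plain parity and no checksum, the mismatch could be detected but the guilty disk could not be identified.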




Toby Thain wrote:

On 4-Aug-06, at 3:25 AM, Russell Leighton wrote:
If the software (filesystem like ZFS or database like Berkeley DB) finds a mismatch for a checksum on a block read, then what?
Is there a recovery mechanism, or do you just be happy you know there is a problem (and go to backup)?
ZFS will correct from a good mirror (http://blogs.sun.com/roller/page/bonwick?entry=zfs_end_to_end_data).
--T
Thx

Matthias Andree wrote:
Berkeley DB can, since version 4.1 (IIRC), write checksums (newer
versions document this as SHA1) on its database pages, to detect
corruptions and writes that were supposed to be atomic but failed
(because you cannot write 4K or 16K atomically on a disk drive).
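
A rough sketch of how a per-page checksum exposes such a torn write follows. The page layout here (a SHA-1 digest in the first 20 bytes of a 4K page) and all names are assumptions for illustration only, not Berkeley DB's on-disk format or API:

```python
import hashlib

PAGE_SIZE = 4096                          # assumed page size
SECTOR = 512                              # assumed atomic write unit of the drive
DIGEST_LEN = hashlib.sha1().digest_size   # 20 bytes


def seal_page(payload):
    """Pad payload to fill the page and prepend the SHA-1 of the body."""
    body = payload.ljust(PAGE_SIZE - DIGEST_LEN, b"\0")
    if len(body) != PAGE_SIZE - DIGEST_LEN:
        raise ValueError("payload too large for one page")
    return hashlib.sha1(body).digest() + body


def page_is_intact(page):
    """Recompute the digest on read and compare it with the stored one."""
    digest, body = page[:DIGEST_LEN], page[DIGEST_LEN:]
    return len(page) == PAGE_SIZE and hashlib.sha1(body).digest() == digest


def torn_write(old_page, new_page, sectors_written):
    """Simulate a crash after only some sectors of new_page hit the disk."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + old_page[cut:]
```

A page that is half new and half old fails `page_is_intact`, so the corruption is detected on the next read. Repairing it still needs something else, such as a log or a backup.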
