On Sun, Jan 15, 2012 at 3:04 PM, Jim Klimov <jimkli...@cos.ru> wrote:
> "Does raidzN actually protect against bitrot?"
> That's a kind of radical, possibly offensive, question formula
> that I have lately.

Yup, it does. That's why many of us use it.

> The way I get it, RAID5/6 generally has no mechanism to detect
> *WHICH* sector was faulty, if all of them got read without
> error reports from the disk.

ZFS validates the block checksum on every read. If it doesn't match,
then at least one of the devices is returning incorrect data. So it
simply tries reconstruction assuming each device in turn is bad until
it gets data that matches the checksum. That gives you the correct
data and tells you which device was wrong, and ZFS then writes the
correct data back to the errant device.
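
As a rough illustration of that loop (a toy model, not the actual ZFS
code), here is a single-parity sketch in Python. The function names, the
use of SHA-256 as the checksum and plain XOR as the parity are all
assumptions made for the example; real raidz has its own checksum and
parity routines.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together (toy single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(data_cols):
    """One checksum over the whole block, not one per device."""
    return hashlib.sha256(b"".join(data_cols)).digest()

def read_with_repair(cols, expected):
    """cols = data columns followed by one parity column, as read from disk.
    Returns (data_columns, index_of_bad_device_or_None)."""
    data = cols[:-1]
    if checksum(data) == expected:
        return data, None                  # checksum matches, nothing to do
    for bad in range(len(data)):           # assume each data device is bad in turn
        others = [c for i, c in enumerate(cols) if i != bad]
        candidate = list(data)
        candidate[bad] = xor_blocks(others)    # rebuild it from the rest plus parity
        if checksum(candidate) == expected:
            return candidate, bad          # right answer, and we know who was wrong
    raise RuntimeError("not enough redundancy to reconstruct")

data = [b"aaaa", b"bbbb", b"cccc"]
good = checksum(data)                      # stored apart from the data itself
cols = data + [xor_blocks(data)]
cols[1] = b"bXbb"                          # silent corruption, no I/O error reported
fixed, bad = read_with_repair(cols, good)
```

Here `fixed` is the original data again and `bad` names the device that
returned the wrong bytes, which is the one that would get rewritten.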

> Perhaps it won't even test whether
> parity matches and bytes zero out, as long as there were no read
> errors reported.

Absolutely not. It always checks, regardless. (Try writing over one
half of a zfs mirror with dd and watch it cheerfully repair your data
without an actual error in sight.)

> 1) How does raidzN protect against bit-rot without known full
>   death of a component disk, if it at all does?
>   Or does it only help against "loud corruption" where the
>   disk reports a sector-access error or dies completely?
>
> 2) Do the "leaf blocks" (on-disk sectors or ranges of sectors
>   that belong to a raidzN stripe) have any ZFS checksums of
>   their own? That is, can ZFS determine which of the disks
>   produced invalid data and reconstruct the whole stripe?

No, there is one checksum for the whole stripe, not one per disk. ZFS
does the combinatorial reconstruction to work out which disk's piece
is bad.

> 2**) Alternatively, how does raidzN get into situation like
>   "I know there is an error somewhere, but don't know where"?
>   Does this signal simultaneous failures in different disks
>   of one stripe?

If you have raidz1, and two devices give bad data, then you don't
have enough redundancy to do the reconstruction. I've not seen this
myself for random bitrot, but it's the sort of thing that can give you
trouble if you lose a whole disk and then hit a bad block on another
device during resilver.
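
To make that concrete, here is a small Python sketch of the raidz1 case
(a toy model only; XOR parity, SHA-256 and the name `try_single_parity`
are assumptions for the example, not ZFS internals). With one bad
column, some single rebuild matches the checksum. With two bad columns,
no single rebuild can.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together (toy single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def try_single_parity(cols, expected):
    """Return the index of the bad data column if rebuilding exactly one
    column makes the checksum match, otherwise None."""
    data = cols[:-1]
    for bad in range(len(data)):
        candidate = list(data)
        candidate[bad] = xor_blocks([c for i, c in enumerate(cols) if i != bad])
        if hashlib.sha256(b"".join(candidate)).digest() == expected:
            return bad
    return None

data = [b"aaaa", b"bbbb", b"cccc"]
expected = hashlib.sha256(b"".join(data)).digest()
cols = data + [xor_blocks(data)]

one_bad = list(cols)
one_bad[0] = b"aXaa"                     # one device returns wrong data
two_bad = list(one_bad)
two_bad[2] = b"ccYc"                     # a second device is wrong as well

recoverable = try_single_parity(one_bad, expected)      # identifies column 0
unrecoverable = try_single_parity(two_bad, expected)    # None: no rebuild matches
```

The checksum still tells you the block is wrong in the second case; what
is missing is enough parity to work out what it should have been.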

(Regular scrubs to identify and fix bad individual blocks before you have
to do a resilver are therefore a good thing.)

>   How *do* some things get fixed then - can only dittoed data
>   or metadata be salvaged from second good copies on raidZ?

You can recover anything you have enough redundancy for. Which
means everything, up to the redundancy of the vdev. Beyond that,
you may be able to recover dittoed data (of which metadata is just
one example) even if you've lost an entire vdev.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
