>>>>> "nw" == Nicolas Williams <nicolas.willi...@sun.com> writes:
>>>>> "tt" == Toby Thain <t...@telegraphics.com.au> writes:
>>>>> "jh" == Johan Hartzenberg <jhart...@gmail.com> writes:

    nw> If you can fully trust the SAN then there's no reason not to
    nw> run ZFS on top of it with no ZFS mirrors and no RAID-Z.

The best practice, as I understand it, is currently to use zpool-layer
redundancy with SAN storage even more so than with single-spindle
local storage, because of (1) the new corruption problems people are
having with ZFS on single-LUN SANs that they didn't have when using
UFS and vxfs on the same SANs, and (2) the new severity of the
problem: losing the whole pool instead of the few files you lose to
UFS corruption, or that you're supposed to lose to random bit flips on
ZFS.
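
For concreteness, zpool-layer redundancy over SAN LUNs looks roughly
like the sketch below.  The pool name and device names are made up,
and ideally the two LUNs come from separate arrays (or at least
separate controllers) so ZFS has an independent copy to repair from:

    # Hypothetical pool/LUN names -- mirror two SAN LUNs so ZFS can
    # self-heal any block whose checksum fails on one side.
    zpool create tank mirror c4t600601601234d0 c5t600601605678d0

    # Afterwards 'zpool status tank' should show the mirror vdev and
    # per-device READ/WRITE/CKSUM counters.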

The problems do not sound like random bit-flips.  They're corruption
of every ueberblock.  The best-guess explanation, AIUI, is not FC
checksum gremlins---it's that write access to the SAN is lost and then
comes back---e.g. the SAN target loses power or fabric access but the
ZFS host doesn't reboot---and either the storage stack is misreporting
the failure or ZFS isn't correctly responding to the errors.  See the
posts I referenced.
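
If someone wants to look at the on-disk state of such a LUN directly,
zdb can print the vdev labels (which is where the ueberblock arrays
live).  Just a sketch, and the device path is made up:

    # Print the four vdev labels straight from the raw LUN; the labels
    # are what hold the ueberblock arrays that were found corrupted.
    zdb -l /dev/rdsk/c4t600601601234d0s0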

Apparently the layering is not as simple in practice as one might
imagine.

Even if you ignore the post-mortem analysis of the corrupt pools and
look only at the symptom: if it were random corruption from DRAM or FC
checksum gremlins, we should see reports of a few files lost to
checksum errors on single-LUN SANs, reported in 'zpool status', much
more often than reports of whole zpools lost, yet exactly the opposite
is happening.
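
That is, the random-bit-flip failure mode should surface like this
(pool name hypothetical)---a handful of CKSUM errors and a list of
damaged files, not an unimportable pool:

    zpool scrub tank        # walk all data and verify checksums
    zpool status -v tank    # with -v, ZFS lists the individual files
                            # affected by any permanent errors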

    jh> The only bit that I understand about why HW raid "might" be
    jh> bad is that if it had access to the disks behind a HW RAID
    jh> LUN, then _IF_ zfs were to encounter corrupted data in a read,

In at least one case it's certain there were no latent sector errors
reported by the SAN on the corrupt LUN---'dd if=<..lun..>
of=/dev/null' completed cleanly for at least one person who lost a
single-LUN zpool.  It doesn't sound to me like random bit-flips are
causing the problem, since all copies of the ueberblock are corrupt,
and that's a bit far-fetched to happen randomly on a LUN that scrubs
almost clean when mounted with the second-newest ueberblock.
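
For reference, that read-scan amounts to something like the following
(device path made up).  A clean exit only proves every sector is
readable; it says nothing about silently stale or misdirected writes:

    # Read the whole LUN front to back; any unreadable sector would
    # show up as an I/O error from dd.
    dd if=/dev/rdsk/c4t600601601234d0s2 of=/dev/null bs=1024k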

    jh> ZFS' notorious instability during error conditions.

Right, availability is a reason to use RAID below the ZFS layer.  It
might or might not be related to the SAN problems.  Maybe yes, if the
corruption happens during a path failover or a temporary connectivity
interruption.  But the symptom here is different from the one in the
timeout/availability thread: a corrupt, unmountable pool.  The hang
discussion was about frozen systems where the pool imports fine after
reboot, which is a different symptom.
