>>>>> "jw" == Jonathan Wheeler <[EMAIL PROTECTED]> writes:

    jw> A common example used all over the place is zfs send | ssh
    jw> $host. In these examples is ssh guaranteeing the data delivery
    jw> somehow?

It's really all just apologetics.  It sounds like a zfs bug to me.

The only alternative is bad hardware (not disks), so you could look
for that with memory testers, continuous big 'make -j <big number,
like 4 - 10>' builds, or scripted continuous zpool send/recv, like the
sketch below.
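
Something like this is what I mean by scripted send/recv.  It's only a
sketch, and the 'tank/test' and 'scratch' names are made up, so point
it at a throwaway pool:

    #!/bin/ksh
    # Round-trip a stream through a file and receive it, over and over,
    # to see whether the corruption is reproducible on your hardware.
    # "tank/test" and "scratch" are placeholder names.
    i=0
    while :; do
        i=$((i + 1))
        zfs snapshot tank/test@stress                   || exit 1
        zfs send tank/test@stress > /var/tmp/stress.zfs || exit 1
        zfs receive scratch/test < /var/tmp/stress.zfs  || { echo "broke on pass $i"; exit 1; }
        zfs destroy -r scratch/test
        zfs destroy tank/test@stress
    done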

    jw> you may end up in a situation like the one I'm in today if you
    jw> don't somehow test your backups.

which is why I asked you to check whether 'zfs receive -n' spots it.
It doesn't---the tool gives you no way to test the backups!
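
(The dry run in question is just this, with names made up; it parses
the stream and reports what it would do, and still doesn't notice the
damage that the real receive trips over:)

    # -n parses the stream without actually receiving it; -v is verbose.
    zfs receive -n -v tank/restore-test < /backup/home.zfs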

I've lost data before because I backed things up onto tape, wiped the
original, and then had the tape go bad.  The idea of backups is to
always have two copies, so I should have written two tapes.  But I
don't see any reason to believe you wouldn't get two bad copies in
your case, since it sounds like a bug.

I also made the mistake of using FancyTape---I used some DAT bullshit
with a ``table of contents'' that can become ``corrupt'' if you power
off the drive at the wrong moment, which simpler tape formats don't
have.  DAT also has block checksums, and some drives, if they can't
read part of the tape, just hang forever and can't seek past it
(weirdly analogous to zfs receive).  I had already learned not to gzip
a tarball before writing it to tape if the tarball contained mostly
uncompressible things, because the gzip format is less robust than the
tar format.  But I got bitten anyway because of the stupid tape TOC
and the poor exception handling in the DAT drive's firmware.
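
(The tar-vs-gzip lesson boils down to the difference between these
two; the tape device path is the usual Solaris no-rewind one, adjust
it for your setup:)

    # Plain tar to tape: a bad block costs the files stored near it,
    # and you can usually keep reading past the damage.
    tar cf /dev/rmt/0n /export/home

    # tar piped through gzip: one flipped bit early in the stream can
    # make everything after it unrecoverable.
    tar cf - /export/home | gzip > /dev/rmt/0n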

What's required, *given hindsight*, is to realize that the purpose of
backups for ZFS users is partly to protect ourselves from ZFS bugs, so
the backups need to be stored in a format that has nothing to do with
ZFS, like tar or UDF or a non-ZFS filesystem.
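
Concretely I mean something in this spirit, where nothing but tar is
needed to read the backup later (the host and path names are
invented):

    # Back up the mounted filesystem contents, not the zfs stream.
    tar cf - /tank/home | ssh backuphost 'cat > /backup/home.tar'

    # And you can actually test it, with a tool that owes nothing to ZFS.
    ssh backuphost 'tar tf /backup/home.tar > /dev/null' && echo "backup reads back ok"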

However, if you have lots of snapshots or clones, I'm not sure this
is possible because the data expands too much.  In that case I might
store backups in a zpool rather than in a file, because I expect zpool
corruption bugs will get more attention sooner than 'zfs send'
corruption bugs.  But that's still sketchy, and had it not been for
your experience, I might have trusted the zfs send format.
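
By storing backups ``in a zpool'' I mean receiving at backup time
instead of parking the raw stream in a file, roughly like this (names
invented; later backups would be incrementals with 'zfs send -i'):

    # If the stream is corrupt this fails now, while the original data
    # still exists, instead of failing months later at restore time.
    zfs snapshot tank/home@monday
    zfs send tank/home@monday | ssh backuphost zfs receive backup/home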

``learn'', fine, but I don't think you've done anything unreasonable.

    jw> is there anything I can do to recover my data from these zfs
    jw> dumps? Anything at all :)

Fix 'zfs receive' to ignore the error? :)

Bury the dumps in the sand for two years, and hope someone else fixes
ZFS in the meantime? :)  That's what I did to my tape with the bad
TOC.  No good news yet.

    jw> If the problem is "just" that "zfs receive" is checksumming
    jw> the data on the way in, can I disable this somehow within zfs?
    jw> Can I globally disable checksumming in the kernel module? mdb
    jw> something or rather?

Sounds plausible, but I don't know how, so please let me know if you
find a way.

I also found some magic /etc/system incantations, but they don't seem
to apply to 'zfs receive'.  They're more of what you found, more
``simon sez, import!'' stuff:

 http://opensolaris.org/jive/message.jspa?messageID=192572#194209
 http://sunsolve.sun.com/search/document.do?assetkey=1-66-233602-1
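
For the record, the incantation those threads describe looks roughly
like the fragment below.  I'm quoting it from memory, so treat it as a
guess and read the links before touching /etc/system; and again, it is
about importing a damaged pool, not about 'zfs receive':

    * /etc/system fragment (from memory; verify against the links above).
    * It relaxes some internal assertions so a damaged pool can sometimes
    * be imported.  It does nothing for a damaged 'zfs send' stream.
    set zfs:zfs_recover = 1
    set aok = 1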
