Re: [zfs-discuss] [O.Seibert@cs.ru.nl: A broken ZFS pool...]

Jim Klimov Thu, 16 Feb 2012 05:09:17 -0800

2012-02-16 14:57, Olaf Seibert wrote:

On Wed 15 Feb 2012 at 14:49:14 +0100, Olaf Seibert wrote:

         NAME                     STATE     READ WRITE CKSUM
         tank                     FAULTED      0     0     2
           raidz2-0               DEGRADED     0     0     8
             da0                  ONLINE       0     0     0
             da1                  ONLINE       0     0     0
             da2                  ONLINE       0     0     0
             da3                  ONLINE       0     0     0
             3758301462980058947  UNAVAIL      0     0     0  was /dev/da4
             da5                  ONLINE       0     0     0


Current status: I've been running "zdb -bcsvL -e -L -p /dev tank", a
magical command I found at
http://sigtar.com/2009/10/19/opensolaris-zfs-recovery-after-kernel-panic/.
I apparently had to export the tank first.

It has been running overnight now, and the only output so far was

fourquid.0:/tmp$ sudo zdb -bcsvL -e -L -p /dev tank

Traversing all blocks to verify checksums ...
zdb_blkptr_cb: Got error 122 reading <42, 0, 3, 0>  DVA[0]=<0:508c6a90c00:3000>
DVA[1]=<0:1813ba6c800:3000>  [L3 DMU dnode] fletcher4 lzjb LE contiguous unique double
size=4000L/1c00P birth=244334305L/244334305P fill=18480533
cksum=2a43556fd2b:95a3245729a27:15e3e48f3c6a490e:70fa77061df61a76 -- skipping
zdb_blkptr_cb: Got error 122 reading <42, 0, 3, 3>  DVA[0]=<0:508c6aa2000:3000>
DVA[1]=<0:1813ba72800:3000>  [L3 DMU dnode] fletcher4 lzjb LE contiguous unique double
size=4000L/1e00P birth=244334321L/244334321P fill=16777409
cksum=2ad6a555e8f:a1dcced71be6c:191abf84e5905b05:e8564e4004372491 -- skipping


with the "error 122" messages appearing after an hour or so.
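If you keep zdb's output in a file, the "-- skipping" messages can be picked
apart with a few lines of Python, e.g. to total the fill counts of all skipped
blocks. A minimal sketch, assuming each message has first been re-joined onto a
single line (the mail above wraps them) and that the layout is exactly as shown;
parse_skip() and SkippedBlock are hypothetical names. As far as I can tell from
zdb's source, the four numbers in angle brackets are the block's bookmark:
objset, object, level and blkid, with blkid printed in hex.

```python
import re
from dataclasses import dataclass
from typing import Optional

# One zdb "Got error ... -- skipping" message, re-joined onto a single line.
# The regex is fitted to the two messages quoted above; treating it as the
# general zdb output format is an assumption.
SKIP_RE = re.compile(
    r"zdb_blkptr_cb: Got error (?P<err>\d+) reading\s*"
    r"<(?P<objset>\d+), (?P<object>\d+), (?P<level>-?\d+), (?P<blkid>[0-9a-f]+)>"
    r".*?\[(?P<kind>L\d+ [^\]]+)\]"   # e.g. "[L3 DMU dnode]"
    r".*?fill=(?P<fill>\d+)"
)


@dataclass
class SkippedBlock:
    err: int      # error number reported by zdb (122 here)
    objset: int   # bookmark: objset id
    obj: int      # bookmark: object number
    level: int    # bookmark: indirection level (the N in LN)
    blkid: int    # bookmark: block id within that level
    kind: str     # level and object type, e.g. "L3 DMU dnode"
    fill: int     # fill count from the block pointer


def parse_skip(line: str) -> Optional[SkippedBlock]:
    """Return the parsed message, or None if the line is not a skip message."""
    m = SKIP_RE.search(line)
    if m is None:
        return None
    return SkippedBlock(
        err=int(m["err"]),
        objset=int(m["objset"]),
        obj=int(m["object"]),
        level=int(m["level"]),
        blkid=int(m["blkid"], 16),  # assumption: zdb prints blkid in hex
        kind=m["kind"],
        fill=int(m["fill"]),
    )


# The two messages from the output above, re-joined onto single lines.
SAMPLE = [
    "zdb_blkptr_cb: Got error 122 reading <42, 0, 3, 0>  "
    "DVA[0]=<0:508c6a90c00:3000>  DVA[1]=<0:1813ba6c800:3000>  "
    "[L3 DMU dnode] fletcher4 lzjb LE contiguous unique double "
    "size=4000L/1c00P birth=244334305L/244334305P fill=18480533 "
    "cksum=2a43556fd2b:95a3245729a27:15e3e48f3c6a490e:70fa77061df61a76 -- skipping",
    "zdb_blkptr_cb: Got error 122 reading <42, 0, 3, 3>  "
    "DVA[0]=<0:508c6aa2000:3000>  DVA[1]=<0:1813ba72800:3000>  "
    "[L3 DMU dnode] fletcher4 lzjb LE contiguous unique double "
    "size=4000L/1e00P birth=244334321L/244334321P fill=16777409 "
    "cksum=2ad6a555e8f:a1dcced71be6c:191abf84e5905b05:e8564e4004372491 -- skipping",
]

if __name__ == "__main__":
    blocks = [b for b in map(parse_skip, SAMPLE) if b is not None]
    for b in blocks:
        print(b)
    print("total fill of skipped blocks:", sum(b.fill for b in blocks))
```

For the two messages above this yields objset 42, object 0, level 3, blkids 0
and 3. Object 0 of an objset is, as far as I know, its meta-dnode (the array
holding all of its dnodes), which matches the "DMU dnode" type.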

Would these 2 errors be the "2" in the CKSUM column?

I haven't checked yet whether this has automagically fixed / unlinked
these blocks, but if it hasn't, how would I do that?


At present, zdb is only supposed to do read-only checks,
accessing the storage hardware directly. Whatever errors
it finds are not propagated to the kernel and do not get
fixed, nor do they cause kernel panics if they are
unfixable by the current algorithms. It also doesn't use
the kernel's ARC, so zdb is often quite slow (dumping my
pool's DDT into a text file for grep analysis took over
a day).

Well, as I've been told a number of times, "you can see
it in the source": go to src.illumos.org and search the
freebsd-gate for the error message text, the error
number, or the function name zdb_blkptr_cb(). From there
you can try to figure out the logic that led to the
error. I've only had error=50 reported so far, and
apparently those were blocks that unrecoverably
mismatched their previously known checksums (they showed
up as CKSUM error counts at the raidz and pool levels).

I've also seen more errors at the raidz level (2) than
at the pool level (1); I guess this means that the
logical block had two ditto copies in ZFS, and both were
corrupted. For the pool these were counted as a single
block with two DVA addresses, while for the raidz they
were two separately stored, broken blocks.

I may be wrong in this interpretation, though.
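To make this guess concrete, here is a toy model of the accounting I have in
mind. It is only my interpretation written down as Python, NOT the actual ZFS
error-counting code, and count_cksum_errors() is a hypothetical name. Each
logical block is a list of per-DVA flags (True if that ditto copy failed its
checksum), and the model assumes all copies live on the same raidz vdev, as
both DVAs in your output do (they start with "0:").

```python
def count_cksum_errors(blocks):
    """Toy model of the interpretation above (hypothetical, not ZFS code).

    `blocks` is a list of logical blocks; each block is a list of booleans,
    one per DVA (ditto copy), True if that copy failed its checksum.
    Every failed copy is charged to the raidz vdev; the pool is charged
    once per logical block, and only if no copy was readable.
    """
    counts = {"raidz": 0, "pool": 0}
    for copies in blocks:
        failed = sum(copies)  # True counts as 1
        counts["raidz"] += failed
        if copies and failed == len(copies):
            counts["pool"] += 1
    return counts


if __name__ == "__main__":
    # One block with two ditto copies, both broken: raidz 2, pool 1.
    print(count_cksum_errors([[True, True]]))
```

Under this model a block with one good copy out of two would show up only at
the raidz level, since the pool can still read it.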

How can I see whether
these blocks are "important", i.e. are required for access to much data?


These are L3 blocks, so there are three layers of block
pointers between them and your data (with up to 128
entries in each indirect block). Like this:
                             L3
                        L2[0]...L2[127]
           L1[0,0]...L1[0,127]  ...  L1[127,0]...L1[127,127]
     lots of L0[0,0,0]-L0[127,127,127] block pointers referencing
           your user data (including redundant ditto copies)
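The fan-out can be checked with a little arithmetic: the "size=4000L" in the
zdb output is a 16 KiB logical size, and an on-disk block pointer is 128 bytes,
giving 128 pointers per indirect block. A sketch, assuming every indirect block
below the L3 has that same size; the dnode figures further assume classic
512-byte dnodes packed into 16 KiB L0 dnode blocks, which I have not checked
against your pool. The function names are hypothetical.

```python
BLKPTR_SIZE = 128          # bytes per on-disk block pointer (blkptr_t)
INDIRECT_SIZE = 0x4000     # 16 KiB, from "size=4000L" in the zdb output
DNODE_SIZE = 512           # bytes per dnode (assumption: classic fixed-size dnodes)
DNODE_BLOCK_SIZE = 0x4000  # bytes per L0 dnode block (assumption)


def entries_per_indirect(indirect_size: int = INDIRECT_SIZE) -> int:
    """Number of block pointers that fit in one indirect block."""
    return indirect_size // BLKPTR_SIZE


def max_l0_under(level: int, indirect_size: int = INDIRECT_SIZE) -> int:
    """Upper bound on L0 blocks reachable from a single block at `level`,
    assuming every indirect block below it has the same size."""
    return entries_per_indirect(indirect_size) ** level


def max_dnodes_under(level: int) -> int:
    """Upper bound on dnodes reachable from one LN block of a dnode object."""
    return max_l0_under(level) * (DNODE_BLOCK_SIZE // DNODE_SIZE)


if __name__ == "__main__":
    print(entries_per_indirect())  # 128
    print(max_l0_under(3))         # 2097152
    print(max_dnodes_under(3))     # 67108864
```

Both fill values in your output (18480533 and 16777409) exceed the 128^3 bound
on L0 blocks but fit under the dnode bound.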

Overall, if your data sizes require it, there can be up
to L7 blocks in the structure of each ZFS object, to
ultimately address its zillions of L0 blocks.

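Each LN block has a "fill" field which tells you how
many L0 blocks it addresses in the end; for DMU dnode
blocks like yours, it counts the allocated dnodes
(objects such as files and directories) underneath
instead. In your case fill=18480533 and fill=16777409
amount to quite a lot of data.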

Would running it on OpenIndiana or so instead of on FreeBSD make a
difference?


Not sure about that... I THINK most of the code should
be the same, so it probably depends on which platform
you're more used to working on (in particular, for
recompiling test kernels). I haven't tried FreeBSD in
over a decade, so I can't help here :)

I'm still trying to beat my pool back into a sane state,
and so far I can say that the current OpenIndiana code
does not contain a miraculous recovery wizard ;)


-Olaf.


Good luck, really,
//Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
