I've had this error on my pool for over a year now; I posted
and asked about it back then. The general consensus was that
it can only be fixed by recreating the pool, and that if
things don't die right away, the problem may be benign (i.e.
it may sit in some of the first blocks of the MOS, which in
practice are written once and not really used or relied upon).

In detailed "zpool status" output, this error shows as:

By analogy with other errors in unnamed files, this was taken
to refer to the MOS dataset, object number 0.

Anyway, now that I am digging deeper into the ZFS bowels (as
detailed in my other current thread), I've made a tool that can
read the sectors pertaining to a given DVA and verify their XOR
parity.
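For readers unfamiliar with this kind of check: on a single-parity
raidz1 vdev, the P parity column is the byte-wise XOR of the data
columns (shorter columns count as zero-padded), so a block is
consistent when XOR-ing its data columns reproduces the stored
parity. A minimal sketch, assuming raidz1 and assuming the columns
of one block have already been read from the child disks (the
DVA-to-sector mapping is not shown). The function names are
hypothetical, not taken from the tool above or from ZFS itself.
raidz2/raidz3 add Galois-field Q and R parity, which plain XOR
does not cover.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR; the shorter input is treated as zero-padded."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\0")
    b = b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


def raidz1_parity(data_columns: list[bytes]) -> bytes:
    """Recompute P parity for one block (assumes raidz1: plain XOR)."""
    return reduce(xor_bytes, data_columns, b"")


def raidz1_row_ok(parity: bytes, data_columns: list[bytes]) -> bool:
    """True if the stored parity equals the XOR of all data columns."""
    expected = raidz1_parity(data_columns)
    return expected.ljust(len(parity), b"\0") == parity


# Usage: two tiny data columns and their stored parity.
d0, d1 = b"\x01\x02", b"\x10\x20"
print(raidz1_row_ok(b"\x11\x22", [d0, d1]))
```

A mismatch here means a bad sector in some column but does not
say which one; the block checksum is what decides that.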

I've used ZDB to extract what I believe to be the block-pointer
tree for this object. Since ZDB tries to dump the whole pool
when asked for the MOS rather than a child dataset (I saw
recently on-list that someone else picked up this ZDB "bug" as
well), I used a bit of perl magic to cut out just the MOS part:

# time zdb -ddddd -bbbbbb -e 1601233584937321596 0 | \
  perl -ne '
    $in_mos = 1 if /^Dataset mos/;             # MOS dump starts here
    exit 0 if /^Dataset / && !/^Dataset mos/;  # next dataset: stop
    print if $in_mos;' > mos.txt

This gives me everything ZDB thinks is part of the MOS, up to
the start of the next Dataset dump:

Dataset mos [META], ID 0, cr_txg 4, 50.5G, 76355 objects,
rootbp DVA[0]=<0:590df6a4000:3000> DVA[1]=<0:8e4c636000:3000>
DVA[2]=<0:8107426b000:3000> [L0 DMU objset] fletcher4 lzjb LE
contiguous unique triple size=800L/200P birth=326429440L/326429440P
fill=76355 cksum=1042f7ae8a:63ab010a1de:138cbe92583cd:29e4cd03f544fe

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    3    16K    16K  84.1M  80.2M   46.49  DMU dnode
        dnode flags: USED_BYTES
        dnode maxblkid: 5132
Indirect blocks:

               0 L2   DVA[0]=<0:590df6a1000:3000>
DVA[1]=<0:8e4c630000:3000> DVA[2]=<0:81074268000:3000> [L2 DMU dnode]
fletcher4 lzjb LE contiguous unique triple size=4000L/e00P
birth=326429440L/326429440P fill=76355

               0  L1  DVA[0]=<0:590df69b000:6000>
DVA[1]=<0:8fd76b8000:6000> DVA[2]=<0:81074262000:6000> [L1 DMU dnode]
fletcher4 lzjb LE contiguous unique triple size=4000L/1200P
birth=326429440L/326429440P fill=1155

               0   L0 DVA[0]=<0:590df695000:3000>
DVA[1]=<0:8e4c61e000:3000> DVA[2]=<0:8107425c000:3000> [L0 DMU dnode]
fletcher4 lzjb LE contiguous unique triple size=4000L/c00P
birth=326429440L/326429440P fill=31
(for a total of 3572 block pointers)
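To turn a dump like this into a work list for a verification
tool, the DVAs and sizes can be pulled out with a couple of
regular expressions. A minimal sketch, assuming zdb's notation as
seen above: DVA[n]=<vdev:offset:asize> with the vdev id in
decimal and offset/asize in hex, plus size=<lsize>L/<psize>P in
hex. The names Dva, parse_dvas and parse_sizes are hypothetical.

```python
import re
from typing import NamedTuple

# "DVA[n]=<vdev:offset:asize>"; \s* tolerates a mail client having
# wrapped the line right after "DVA[n]=".
DVA_RE = re.compile(r"DVA\[(\d+)\]=\s*<(\d+):([0-9a-f]+):([0-9a-f]+)>")
# "size=<lsize>L/<psize>P", both in hex.
SIZE_RE = re.compile(r"size=([0-9a-f]+)L/([0-9a-f]+)P")


class Dva(NamedTuple):
    copy: int    # ditto copy index: DVA[0], DVA[1], ...
    vdev: int    # top-level vdev id
    offset: int  # byte offset on that vdev, as printed by zdb
    asize: int   # allocated bytes (on raidz includes parity/padding)


def parse_dvas(text: str) -> list[Dva]:
    """Return every DVA found in a chunk of zdb output, in order."""
    return [Dva(int(c), int(v), int(o, 16), int(a, 16))
            for c, v, o, a in DVA_RE.findall(text)]


def parse_sizes(text: str) -> list[tuple[int, int]]:
    """Return (logical, physical) sizes in bytes, one per block pointer."""
    return [(int(l, 16), int(p, 16)) for l, p in SIZE_RE.findall(text)]


# Usage: parse the saved MOS dump.
if __name__ == "__main__":
    with open("mos.txt") as f:
        dump = f.read()
    print(len(parse_dvas(dump)), "DVAs")
```

For the L2 dnode block above, size=4000L/e00P decodes to 16384
logical bytes stored as 3584 physical bytes after lzjb
compression, and each of its three DVAs allocates 0x3000 = 12288
bytes.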

I fed this list into my new verification tool, testing all
ditto copies of each DVA, and it found no blocks with bad
sectors: all the XOR parities and checksums matched their sector
or two worth of data.
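For the checksum half of such a verification: the blocks above
use fletcher4 on little-endian ("LE") data. Fletcher-4 as used by
ZFS runs four 64-bit accumulators, each wrapping modulo 2**64,
over the buffer's 32-bit words, and it is computed over the
physical psize bytes (here the lzjb-compressed data), not the
decompressed contents. A minimal sketch, assuming the block has
already been reassembled from its data columns; fletcher4 and
format_cksum are hypothetical names, not ZFS functions.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes) -> tuple[int, int, int, int]:
    """Fletcher-4 over little-endian 32-bit words (assumes "LE" blocks).

    The input length must be a multiple of 4, as for whole sectors.
    """
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def format_cksum(ck: tuple[int, int, int, int]) -> str:
    """Render as lowercase hex joined by ':', like zdb's cksum= field."""
    return ":".join(f"{x:x}" for x in ck)
```

The result of format_cksum(fletcher4(block)) can then be compared
with the cksum= value zdb prints for that block pointer.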

So, given that there are no on-disk errors in "Dataset mos
[META], ID 0", "Object #0", what is it that zpool scrub finds
time after time and reports as an "error in metadata:0x0"?

//Jim Klimov

zfs-discuss mailing list
