Hello experts, particularly those familiar with the
low-level side of ZFS!

I've mentioned this in another thread, but this question
may need some more general attention:

My main data pool shows a discrepancy between the traversal
and alloc sizes, as quoted below. In another thread it was
suggested that, due to some corruption, ZFS might end up with
two different block pointers pointing to the same logical
block - so one of them produces a checksum mismatch and
accounts for the discrepancy. This sounds credible to me.

Question to the low-level gurus: can this really happen despite
ZFS's best efforts to keep everything perfectly safe?

Is it possible to determine which block pointer is
invalid and perhaps force its destruction or
reallocation, similar to the discovery and "recovery"
of "lost clusters" on other filesystems (maybe using
some of the ZFS-forensics Python scripts out there)?
If this matter really only concerns some 12 KB of user
data, I won't mind corrupting or losing it if that
would make my pool "sane" again.
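
(On the "knowing what it is" part: if a suspect block pointer can be
identified - by one of those forensic scripts, or from a zdb error
report like the ones further below - then, as far as I understand,
zdb itself can dump the raw block given its DVA. A sketch, assuming
the "vdev:offset:size[:flags]" syntax of "zdb -R" and its "r" (raw
dump to stdout) flag; the DVA below is purely an illustrative
placeholder:

# zdb -e -R 1601233584937321596 0:a8c8e600:20000:r > /tmp/suspect.bin
# file /tmp/suspect.bin; strings /tmp/suspect.bin | head

Looking at the recovered bytes should at least hint at which file the
unreachable space belongs to.)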

I do want to ensure that my pool is consistent,
even if it means forging some blocks on-disk,
bypassing ZFS, and perhaps sacrificing a little
bit of the data (preferably knowing what that data
is, so that if I have another copy lying around
I can fetch it; otherwise I would at least know,
and accept, what is gone).

However, I do not want to recreate this pool from
scratch, because I have neither a backup nor the
means to back up its entire contents. Being the
presumably reliable home storage, this pool *was*
the backup, and the original storage has since been
repurposed and filled up as well ;)


More detailed ZDB walks (i.e. with leaked-block detection)
have failed so far, probably due to depleting RAM (8 GB) and
swap (35 GB); I might retry with a bit more swap soon.
Also, all the tests quoted below were done on oi_148a
on home-built hardware (an ex-desktop, no ECC), and I
plan to redo some of them on oi_151a.
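
(For reference, the extra swap I plan to add would be along these
lines - a sketch, with the zvol name and size being arbitrary
placeholders, and the zvol living on my rpool rather than on the
troubled data pool:

# zfs create -V 64G rpool/swap2
# swap -a /dev/zvol/dsk/rpool/swap2
# swap -l

The last command just confirms that the new device is in use.)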


Sorry, from here on the post gets lengthy due to the "screenshots"...

======= Simple ZDB walk of the pool:

# time zdb -bsvL -e 1601233584937321596
Traversing all blocks ...

block traversal size 9044810649600 != alloc 9044810661888 (unreachable 12288)

        bp count:        85247389
        bp logical:    8891475160064      avg: 104302
        bp physical:   7985515234304     avg:  93674     compression:   1.11
        bp allocated:  12429088972800    avg: 145800     compression:   0.72
        bp deduped:    3384278323200   ref>1: 13909855   deduplication:   1.27
        SPA allocated: 9044810661888     used: 75.64%

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     -      -       -       -       -       -        -  unallocated
     2    32K      4K   72.0K   36.0K    8.00     0.00  object directory
     3  1.50K   1.50K    108K   36.0K    1.00     0.00  object array
     2    32K   2.50K   72.0K   36.0K   12.80     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
 7.80K   988M    208M   1.12G    147K    4.75     0.01  bpobj
     -      -       -       -       -       -        -  bpobj header
     -      -       -       -       -       -        -  SPA space map header
  185K   761M    523M   6.57G   36.3K    1.46     0.06  SPA space map
    22  1020K   1020K   1.58M   73.6K    1.00     0.00  ZIL intent log
  933K  14.6G   3.11G   25.2G   27.6K    4.69     0.22  DMU dnode
 1.75K  3.50M    898K   42.0M   24.0K    3.99     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
    390   243K    200K   13.7M   36.0K    1.21     0.00  DSL directory child map
    388   298K    208K   13.6M   36.0K    1.43     0.00  DSL dataset snap map
   715  10.2M   1.14M   25.1M   36.0K    8.92     0.00  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS V0 ACL
 76.1M  8.06T   7.25T   11.2T    150K    1.11    98.67  ZFS plain file
 2.17M  2.76G   1.33G   52.7G   24.3K    2.08     0.46  ZFS directory
   341   314K    171K   7.99M   24.0K    1.84     0.00  ZFS master node
   856  25.4M   1.16M   20.1M   24.1K   21.92     0.00  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
    33  4.02M    763K   4.46M    139K    5.39     0.00  SPA history
     -      -       -       -       -       -        -  SPA history offsets
     1    512     512   36.0K   36.0K    1.00     0.00  Pool properties
     -      -       -       -       -       -        -  DSL permissions
 17.1K  12.7M   8.63M    411M   24.0K    1.48     0.00  ZFS ACL
     -      -       -       -       -       -        -  ZFS SYSACL
     5  80.0K   5.00K    120K   24.0K   16.00     0.00  FUID table
     -      -       -       -       -       -        -  FUID table size
 1.37K   723K    705K   49.3M   36.0K    1.03     0.00  DSL dataset next clones
     -      -       -       -       -       -        -  scan work queue
 2.69K  2.57M   1.36M   64.5M   24.0K    1.89     0.00  ZFS user/group used
     -      -       -       -       -       -        -  ZFS user/group quota
     -      -       -       -       -       -        -  snapshot refcount tags
 1.87M  7.48G   4.41G   67.4G   36.0K    1.70     0.58  DDT ZAP algorithm
     2    32K   4.50K   72.0K   36.0K    7.11     0.00  DDT statistics
    21  10.5K   10.5K    504K   24.0K    1.00     0.00  System attributes
   288   144K    144K   6.75M   24.0K    1.00     0.00  SA master node
    288   432K    144K   6.75M   24.0K    3.00     0.00  SA attr registration
   576  9.00M   1008K   13.5M   24.0K    9.14     0.00  SA attr layouts
     -      -       -       -       -       -        -  scan translations
     -      -       -       -       -       -        -  deduplicated block
 1.84K  4.73M   1.20M   66.3M   36.0K    3.95     0.00  DSL deadlist map
     -      -       -       -       -       -        -  DSL deadlist map hdr
    94  68.0K   50.0K   3.30M   36.0K    1.36     0.00  DSL dir clones
    11  1.38M   49.5K    792K   72.0K   28.44     0.00  bpobj subobj
    20   258K   49.5K    864K   43.2K    5.21     0.00  deferred free
   815  22.0M   10.3M   23.8M   29.9K    2.13     0.00  dedup ditto
 81.3M  8.09T   7.26T   11.3T    142K    1.11   100.00  Total

                          capacity   operations   bandwidth  ---- errors ----
description             used avail  read write  read write  read write cksum
pool                   8.23T 2.65T    89     0  418K     0     0     0     0
  raidz2               8.23T 2.65T    89     0  418K     0     0     0     0
    /dev/dsk/c6t0d0s0                  8     0  497K     0     0     0     0
    /dev/dsk/c6t1d0s0                 13     0  531K     0     0     0     0
    /dev/dsk/c6t2d0s0                  9     0  479K     0     0     0     0
    /dev/dsk/c6t3d0s0                  8     0  495K     0     0     0     0
    /dev/dsk/c6t4d0s0                 13     0  531K     0     0     0     0
    /dev/dsk/c6t5d0s0                  9     0  481K     0     0     0     0

real    632m2.412s
user    311m54.827s
sys     5m1.200s
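
(For the record, the reported discrepancy works out to exactly the
12 KB (12288 bytes) mentioned above:

# echo $((9044810661888 - 9044810649600))
12288

and since this is allocated (ASIZE) space on a raidz2 vdev, the amount
of actual user data behind it is presumably even smaller once parity
overhead is subtracted.)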


======== Attempts to analyze the pool with leaked-block
detection failed even after adding lots of swap (35 GB),
most likely due to running out of virtual memory, as shown below:

root@openindiana:~# time zdb -bsvc -e 1601233584937321596

Traversing all blocks to verify checksums and verify nothing leaked ...

Assertion failed: zio_wait(zio_claim(0L, zcb->zcb_spa, refcnt ? 0 : spa_first_txg(zcb->zcb_spa), bp, 0L, 0L, ZIO_FLAG_CANFAIL)) == 0 (0x2 == 0x0), file ../zdb.c, line 1950
Abort

real    7197m41.288s
user    291m39.256s
sys     25m48.133s
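
(If it is the assertion itself, rather than memory, that kills the
walk, one more thing I might try is telling zdb to ignore assertion
failures - a sketch, assuming the -A/-AA/-AAA options described in
the zdb usage message (ignore assertions / enable panic recovery /
both) are present in the oi_148a and oi_151a builds:

# time zdb -AAA -bsvc -e 1601233584937321596

Of course that only papers over the assertion; it doesn't explain it.)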

root@openindiana:~# time zdb -bb -e 1601233584937321596

Traversing all blocks to verify nothing leaked ...

(LAN Disconnected; RAM/SWAP used up)


=== Virtual memory was depleted by this ZDB walk attempt:

# iostat -Xn -Td c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0  60
iostat: near-zero disk I/O (20-40 KB/s in 60-second averages)

# top
last pid: 4154; load avg: 0.16, 0.17, 0.17; up 7+06:59:56 17:10:31
50 processes: 49 sleeping, 1 on cpu
CPU states: 92.0% idle,  0.0% user,  8.0% kernel,  0.0% iowait,  0.0% swap
Kernel: 1618 ctxsw, 142 trap, 1163 intr, 774 syscall, 77 flt, 256 pgin, 264 pgout
Memory: 8191M phys mem, 128M free mem, 35G total swap, 998M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  4133 root      205  60    0   23G 6541M sleep  255:04  0.24% zdb
  4028 root        1  59    0 2300K 1380K sleep   35:23  0.18% vmstat
  3887 root        1  59    0 3984K  896K cpu/1   27:08  0.13% top
  3705 jack        1  59    0 7596K 1048K sleep    2:59  0.01% sshd

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf  pi  po  fr de  sr f0 lf lf rm   in   sy   cs us sy id
 0 2 0 404468 130880  2  76 269 285 285  0 636  0  0  0  0 1660  762 1834  0 10 90
 0 2 0 404468 130928  1  66 228 272 276  0 461  0  0  0  0 1453  786 1625  0  8 92
 0 2 0 404468 130880  1  77 275 235 235  0 356  0  0  0  0 1307  750 1618  0  8 92
 0 2 0 404468 130916  1  67 237 277 277  0 621  0  0  0  0 1593 1105 2022  0 10 90
 0 2 0 404468 130876  4  75 255 231 231  0 598  0  0  0  0 1356  751 1897  0  8 91
 0 1 0 404468 130872  6  79 256 272 272  0 584  0  0  0  0 1337  765 1702  0  8 92
 0 2 0 404468 130908  3  70 237 281 281  0 583  0  0  0  0 1593  762 1885  0 10 90
 0 2 0 404468 130868  6  86 287 275 275  0 534  0  0  0  0 1074  754 1635  0  8 92
 0 2 0 404468 130884  2  71 245 273 273  0 597  0  0  0  0 1327  763 1652  0  8 92
 0 2 0 404468 130900  6  85 287 318 318  0 520  0  0  0  0 1370  770 1913  0 10 89
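
Next time I will also keep an eye on how much memory the zdb process
itself accumulates, roughly like this (a sketch using the usual
Solaris tools):

# swap -s
# pmap -x `pgrep zdb` | tail -1
# prstat -s size -p `pgrep zdb` 60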



======== Also, when my non-redundant rpool failed,
I could not walk it with ZDB - the tool was reporting
errors and bailing out as shown below. At the moment
this is irrelevant to my current problem, as I have
since recreated that rpool, but it seems wrong that
errors in file data precluded the pool from working
at all (being imported by zpool, or checked by zdb).



root@openindiana:~# time zdb -bb -e 17995958177810353692

Traversing all blocks to verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)

real    0m12.184s
user    0m0.367s
sys     0m0.474s

root@openindiana:~# time zdb -bsvc -e 17995958177810353692

Traversing all blocks to verify checksums and verify nothing leaked ...
Assertion failed: ss->ss_start <= start (0x79e22600 <= 0x79e1dc00), file ../../../uts/common/fs/zfs/space_map.c, line 173
Abort (core dumped)

real    0m12.019s
user    0m0.360s
sys     0m0.458s

=== There were some errors, but all the known ones proved to
be in file data - so why the fatal bailout (and why the
inability to import that rpool - the machine hung on every
attempt)?
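
(For the record, in case someone else hits this: the workaround I have
seen suggested for such space_map assertions is to relax assertion
handling and let ZFS attempt recovery on import - a sketch, assuming
the usual illumos tunables and the "zpool import -F" recovery option
apply to this kind of damage:

# echo "set zfs:zfs_recover=1" >> /etc/system
# echo "set aok=1" >> /etc/system
  ... reboot, then:
# zpool import -F -R /a 17995958177810353692

I have not verified this on the dead rpool, since it is already gone.)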

root@openindiana:~# time zdb -bsvcL -e 17995958177810353692

Traversing all blocks to verify checksums ...

zdb_blkptr_cb: Got error 50 reading <182, 19177, 0, 1> DVA[0]=<0:a8c8e600:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=82L/82P fill=1 cksum=3401f5fe522b:109ee10ba48ed38c:e7f49c220f7b8bc:ff405ef051b91e65 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 19202, 0, 1> DVA[0]=<0:a9030a00:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=82L/82P fill=1 cksum=11c4c738b0ba:7bb81bce3313913:8f85a7abf1b9e34:58e8746d63119393 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 24924, 0, 0> DVA[0]=<0:b1aaec00:14a00> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=14a00L/14a00P birth=85L/85P fill=1 cksum=270679cd905d:6119a969a134566:6f0f7da64c4d2d90:3ab86aa985abef02 -- skipping
zdb_blkptr_cb: Got error 50 reading <182, 24944, 0, 0> DVA[0]=<0:b1cdf000:10800> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=10800L/10800P birth=85L/85P fill=1 cksum=1ebb4d1ae9f5:3cf5f42afa9a332:757613fc2d2de7b3:5f197017333a4f89 -- skipping

zdb_blkptr_cb: Got error 50 reading <493, 947, 0, 165> DVA[0]=<0:b3efc200:20000> [L0 ZFS plain file] fletcher4 uncompressed LE contiguous unique single size=20000L/20000P birth=26691L/26691P fill=1 cksum=2cdc2ae22d10:b33d31bcbc0d8da:f1571c9975e151b0:a037073594569635 -- skipping

Error counts:

        errno  count
           50  5
block traversal size 11986202624 != alloc 11986203136 (unreachable 512)

        bp count:          405927
        bp logical:    15030449664      avg:  37027
        bp physical:   12995855872      avg:  32015     compression:   1.16
        bp allocated:  13172434944      avg:  32450     compression:   1.14
        bp deduped:    1186232320    ref>1:  12767   deduplication:   1.09
        SPA allocated: 11986203136     used: 56.17%
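
(By the way, for mapping such error tuples back to file names: as far
as I understand, the tuple is <objset, object, level, blkid>, so
something along these lines should show the affected dataset and the
object's path - a sketch, where DATASETNAME is a placeholder for
whatever dataset zdb reports as having objset ID 182, and assuming
zdb accepts the pool guid in place of the pool name here:

# zdb -e -d 17995958177810353692
# zdb -e -dddd 17995958177810353692/DATASETNAME 19177

The first command lists the datasets with their IDs; the second dumps
object 19177 in detail, including its path, if the metadata for it is
still readable.)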



//Jim
