Re: dump -X of large LVM based FFSv2 with WAPBL panics
Hello Jaromir,

actually, I had already run a forced fsck on the filesystem in question beforehand, while it was unmounted. To be sure, I just ran the command again; it passes with no errors the second time. When I run dump -X again, the panic still occurs.

Best regards,
Matthias

nuc# fsck -P /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is clean; not checking
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK** | 97%
SALVAGE? [yn] y

59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)

* FILE SYSTEM WAS MODIFIED *
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)
nuc# mount /p
nuc# touch /p/test.ignore
nuc# umount /p
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
59412 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks, 0.0% fragmentation)
nuc#

On 15.11.2017 at 20:29, Jaromír Doleček wrote:
> Hi,
>
> can you try whether doing a full forced fsck (fsck -f) would resolve this?
>
> I've seen several such persistent panics when I was debugging WAPBL. Even
> after kernel fixes I had persistent panics around ffs_newvnode() due to
> disk data corruption from previous runs. This is worth trying.
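
Editorial note: the fsck summary lines in Matthias's transcript can be cross-checked against the superblock values from the `dumpfs -s` output in the original report. The following is a minimal Python sketch, not part of the thread; the helper names are hypothetical, and it assumes that fsck counts "used" and "free" in fragments, which the numbers posted here are consistent with.

```python
import re

# Summary line copied from the fsck runs in the transcript.
SUMMARY = ("59411 files, 63408414 used, 35694535 free "
           "(2079 frags, 4461557 blocks, 0.0% fragmentation)")

# Values copied from the dumpfs -s output in the original report.
FRAGS_PER_BLOCK = 8          # "frag 8"
DATA_FRAGS = 99102949        # "blocks 99102949"
SUPERBLOCK_NBFREE = 4461561  # "nbfree 4461561", read before the salvage


def parse_fsck_summary(line):
    """Extract the counters from an fsck summary line (hypothetical helper)."""
    m = re.search(r"(\d+) files, (\d+) used, (\d+) free "
                  r"\((\d+) frags, (\d+) blocks", line)
    if m is None:
        raise ValueError("not an fsck summary line")
    keys = ("files", "used", "free", "free_frags", "free_blocks")
    return dict(zip(keys, map(int, m.groups())))


def check_summary(s):
    """Return (free_ok, total_ok) for one parsed summary."""
    # Assumption: "free" = loose free fragments + whole free blocks, in fragments.
    free_ok = s["free"] == s["free_frags"] + s["free_blocks"] * FRAGS_PER_BLOCK
    # Assumption: used + free covers the data area reported by dumpfs.
    total_ok = s["used"] + s["free"] == DATA_FRAGS
    return free_ok, total_ok


summary = parse_fsck_summary(SUMMARY)
print(check_summary(summary))
print(SUPERBLOCK_NBFREE - summary["free_blocks"])
```

Both checks hold for the posted numbers. The last line shows that the `nbfree` value dumpfs read earlier is four blocks higher than what fsck counted. That may be the discrepancy behind "FREE BLK COUNT(S) WRONG IN SUPERBLK", or simply filesystem activity between the two readings; the thread does not say which.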
Re: dump -X of large LVM based FFSv2 with WAPBL panics
On Wed, Nov 15, 2017 at 08:29:51PM +0100, Jaromír Doleček wrote:
> Hi,
>
> can you try whether doing a full forced fsck (fsck -f) would resolve this?
>
> I've seen several such persistent panics when I was debugging WAPBL. Even
> after kernel fixes I had persistent panics around ffs_newvnode() due to
> disk data corruption from previous runs. This is worth trying.
>
> Some day I plan to add a counter, so that boot would actually
> force fsck every X boots even when clean, similarly to what Linux does with
> ext3/4.

I hope it will be configurable. On Linux I always turn it off (you don't
want a multi-hour fsck following a "quick reboot for kernel update").
I'd prefer a forced fsck when the kernel has detected filesystem corruption.
This indeed needs a write to the superblock ...

--
Manuel Bouyer
     NetBSD: 26 years of experience will always make the difference
--
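
Editorial note: the two triggers discussed above (a configurable "every X boots" counter, and a flag the kernel sets when it detects corruption) can be combined in one decision function. This is a minimal Python sketch of the idea only. The field and function names are hypothetical; nothing in the thread says NetBSD's superblock has such fields.

```python
from dataclasses import dataclass


@dataclass
class SuperblockState:
    """Hypothetical persistent state; not actual FFS superblock fields."""
    mounts_since_fsck: int   # incremented on every mount, reset by a full fsck
    corruption_seen: bool    # written by the kernel when it detects damage


def needs_forced_fsck(sb, max_mounts=0):
    """Decide at boot whether to force a full check of a clean filesystem.

    max_mounts is the configurable "X"; 0 disables the periodic check,
    which leaves only the corruption flag as a trigger.
    """
    if sb.corruption_seen:
        return True
    if max_mounts > 0 and sb.mounts_since_fsck >= max_mounts:
        return True
    return False


# Periodic check disabled, no corruption recorded: boot proceeds without fsck.
print(needs_forced_fsck(SuperblockState(mounts_since_fsck=120, corruption_seen=False)))
# Corruption flag set: fsck is forced regardless of the counter setting.
print(needs_forced_fsck(SuperblockState(mounts_since_fsck=1, corruption_seen=True)))
```

Both pieces of state have to survive a reboot, which is why either variant implies a superblock write, as Manuel points out.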
Re: dump -X of large LVM based FFSv2 with WAPBL panics
Hi,

can you try whether doing a full forced fsck (fsck -f) would resolve this?

I've seen several such persistent panics when I was debugging WAPBL. Even after kernel fixes I had persistent panics around ffs_newvnode() due to disk data corruption from previous runs. This is worth trying.

Some day I plan to add a counter, so that boot would actually force fsck every X boots even when clean, similarly to what Linux does with ext3/4.

Jaromir

2017-11-15 12:56 GMT+01:00 Matthias Petermann:
> Hello,
>
> on my system I have observed a serious panic when doing FFSv2 dumps under
> certain conditions. I did some googling on my own and found some references
> regarding the lead symptom
>
> "ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero
> blocks ff00 or size 0"
>
> but all of them ended up as solved back in 2016. So I wanted to share my
> observation here, in the hope somebody can give me some pointers how the
> issue could be narrowed down further.
>
> 1) Given:
>
> - NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around 2017-11-06)
>
>   NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI) #0: Mon
>   Nov 6 14:31:17 CET 2017
>   admin@nuc.local:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI
>   amd64
>
> - A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL enabled
>   (/dev/mapper/vg0-photo mounted at /p)
>
> - (An external USB 3.0 Drive)
>
> 2) What I tried:
>
> - make a dump of the aforementioned filesystem, using snapshots
>
>   # dump -X -0auf /mnt/photo.0.dump /p
>
> 3) What happens then:
>
> - the System crashes, leaving a coredump with the following
>   indication:
>
> ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero blocks
> ff00 or size 0
> fatal page fault in supervisor mode
> trap type 6 code 0x2 rip 0x8022c0cc cs 0x8 rflags 0x10246 cr2
> 0xfe82deaddf1d ilevel 0x3 rsp 0xfe810e6b1eb8
> curlwp 0xfe827f736000 pid 0.4 lowest kstack 0xfe810e6ae2c0
> panic: trap
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> trap() at netbsd:trap+0xc6b
> --- trap (number 6) ---
> mutex_enter() at netbsd:mutex_enter+0xc
> biodone2() at netbsd:biodone2+0x9b
> biodone2() at netbsd:biodone2+0x9b
> biointr() at netbsd:biointr+0x3a
> softint_dispatch() at netbsd:softint_dispatch+0xd3
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfe810e6b1ff0
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
> cpu0: End traceback...
>
> dumping to dev 0,1 (offset=168119, size=2076255):
> dump
>
> - gdb backtrace shows:
>
> (gdb) target kvm netbsd.3.core
> 0x80229545 in cpu_reboot ()
> (gdb) bt
> #0 0x80229545 in cpu_reboot ()
> #1 0x809a4afc in vpanic ()
> #2 0x809a4bb0 in panic ()
> #3 0x8022b176 in trap ()
> #4 0x8020113e in alltraps ()
> #5 0x8022c0cc in mutex_enter ()
> #6 0x80a029f5 in wapbl_biodone ()
> #7 0x809e2f20 in biodone2 ()
> #8 0x809e2f20 in biodone2 ()
> #9 0x809e303e in biointr ()
> #10 0x8097bc1d in softint_dispatch ()
> #11 0x80223eef in Xsoftintr ()
> (gdb)
>
> 4) What I tried afterwards:
>
> - make a dump of the aforementioned filesystem, using NO snapshots
>
>   # dump -0auf /mnt/photo.0.dump /p
>
>   -> works
>
> - umount the filesystem, enforcing a manual fsck
>
>   -> no problems
>
> - dumpfs -s /dev/mapper/vg0-photo
>
> nuc# dumpfs -s /dev/mapper/vg0-photo
> file system: /dev/mapper/vg0-photo
> format FFSv2
> endian little-endian
> location 65536 (-b 128)
> magic 19540119 time Wed Nov 15 12:26:52 2017
> superblock location 65536 id [ 59f8026a 16319237 ]
> cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
> nbfree 4461561 ndir 1865 nifree 24770027 nffree 2079
> ncg 530 size 100663296 blocks 99102949
> bsize 32768 shift 15 mask 0x8000
> fsize 4096 shift 12 mask 0xf000
> frag 8 shift 3 fsbtodb 3
> bpg 23742 fpg 189936 ipg 46848
> minfree 5% optim time maxcontig 2 maxbpg 4096
> symlinklen 120 contigsumsize 2
> maxfilesize 0x000800800805
> nindir 4096 inopb 128
> avgfilesize 16384 avgfpdir 64
> sblkno 24 cblkno 32 iblkno 40 dblkno 2968
> sbsize 4096 cgsize 32768
> csaddr 2968 cssize 12288
> cgrotor 0 fmod 0 ronly 0 clean 0x01
> wapbl version 0x1 location 2 flags 0x0
> wapbl loc0 402688128 loc1 131072 loc2 512 loc3 3
> flags none
> fsmnt /p
> volname swuid 0
>
> 5) Further