Re: ZFS PANIC: HELP.
On 02/27/2022 3:58 pm, Mark Johnston wrote:

On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:

On 02/26/2022 11:08 am, Larry Rosenman wrote:

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman wrote:

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

I'd put, like:

    echo ${FS}

before "sudo zfs send", to get at least a bit of a clue on where it can get to.

otis

--
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!), bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.

Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the controller device node, about the size of a disk. Presumably this is the result of some kind of misconfiguration somewhere, and /dev/mfid0 was meant instead.

Per advice from markj@ I deleted the /{etc,boot}/zfs/zpool.cache files, and the issue went away. They were stale cache files that are no longer needed.

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
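A minimal sketch of the cleanup Larry describes, assuming the two default cache file locations named in his message; "zroot" below is a hypothetical pool name, and whether or where a fresh cache file gets written afterwards is governed by the pool's cachefile property:

    # Remove the stale pool cache files left over from the previous install
    # (verify the paths first; these are the two named above).
    rm -f /etc/zfs/zpool.cache /boot/zfs/zpool.cache

    # Optionally confirm where the pool expects its cache file to live.
    zpool get cachefile zroot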
Re: ZFS PANIC: HELP.
On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:

On 02/26/2022 11:08 am, Larry Rosenman wrote:

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman wrote:

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

I'd put, like:

    echo ${FS}

before "sudo zfs send", to get at least a bit of a clue on where it can get to.

otis

--
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!), bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.

Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the controller device node, about the size of a disk. Presumably this is the result of some kind of misconfiguration somewhere, and /dev/mfid0 was meant instead.
Re: ZFS PANIC: HELP.
On 2/27/22 16:09, Larry Rosenman wrote:

On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

These messages may or may not be related. I found both the mfi and mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended up setting the debug flag for mrsas in /etc/sysctl.conf:

    dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi.

Michael

I don't see it:

    ✖1 ❯ sysctl dev.mfi
    dev.mfi.0.keep_deleted_volumes: 0
    dev.mfi.0.delete_busy_volumes: 0
    dev.mfi.0.%parent: pci3
    dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 subdevice=0x1f17 class=0x010400
    dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
    dev.mfi.0.%driver: mfi
    dev.mfi.0.%desc: Dell PERC H700 Integrated
    dev.mfi.%parent:

My brain-fade - you're right; it is only there and tunable in the mrsas driver. My apologies :-(

Michael
Re: ZFS PANIC: HELP.
On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

These messages may or may not be related. I found both the mfi and mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended up setting the debug flag for mrsas in /etc/sysctl.conf:

    dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi.

Michael

I don't see it:

    ✖1 ❯ sysctl dev.mfi
    dev.mfi.0.keep_deleted_volumes: 0
    dev.mfi.0.delete_busy_volumes: 0
    dev.mfi.0.%parent: pci3
    dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 subdevice=0x1f17 class=0x010400
    dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
    dev.mfi.0.%driver: mfi
    dev.mfi.0.%desc: Dell PERC H700 Integrated
    dev.mfi.%parent:

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

These messages may or may not be related. I found both the mfi and mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended up setting the debug flag for mrsas in /etc/sysctl.conf:

    dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi.

Michael
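For reference, a quick sketch of the two usual ways such a knob is set on FreeBSD -- at runtime and persistently -- using the mrsas sysctl named above (per Michael's own follow-up further up this archive, the mfi driver turns out not to have an equivalent):

    # set it on the running system
    sysctl dev.mrsas.0.mrsas_debug=0

    # make it persistent across reboots
    echo 'dev.mrsas.0.mrsas_debug=0' >> /etc/sysctl.conf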
Re: ZFS PANIC: HELP.
On 02/26/2022 11:08 am, Larry Rosenman wrote:

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman wrote:

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

I'd put, like:

    echo ${FS}

before "sudo zfs send", to get at least a bit of a clue on where it can get to.

otis

--
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!), bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.

Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about.

I'm now seeing:

    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 48 (zpool), jid 0, uid 0: exited on signal 6
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    mfi0: IOCTL 0x40086481 not handled
    pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot. Ideas?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman wrote:

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

I'd put, like:

    echo ${FS}

before "sudo zfs send", to get at least a bit of a clue on where it can get to.

otis

--
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!), bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.

Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman wrote:

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

I'd put, like:

    echo ${FS}

before "sudo zfs send", to get at least a bit of a clue on where it can get to.

otis

--
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!), bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.

Thanks, all!

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
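A rough sketch of that bectl step, with "14-BROKEN" standing in as a hypothetical name for whichever boot environment the send died on:

    # list boot environments (and which one is active/next)
    bectl list

    # destroy the offending BE; -o also destroys its origin snapshot
    bectl destroy -o 14-BROKEN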
Re: ZFS PANIC: HELP.
Quoting Larry Rosenman (from Fri, 25 Feb 2022 20:03:51 -0600):

On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600):

I tried a scrub -- it panic'd on a fatal double fault. Suggestions?

The safest / cleanest (but not fastest) approach is data export and pool re-creation. If you export dataset by dataset (instead of recursively all at once), you can even see which dataset is causing the issue. In case this per-dataset export narrows down the issue and it is a dataset you don't care about (as in: 1) no issue to recreate it from scratch, or 2) there is a backup available), you could delete this (or each such) dataset and re-create it in place (= not re-creating the entire pool).

Bye,
Alexander.

http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

How will I know a "Problem" dataset?

You said a scrub is panicking the system. A scrub only touches occupied blocks. As such, a problem dataset should panic your system. If it doesn't panic at all, the problem may be within a snapshot which contains data that has been deleted in later versions of the dataset.

Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
Re: ZFS PANIC: HELP.
On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600):

I tried a scrub -- it panic'd on a fatal double fault. Suggestions?

The safest / cleanest (but not fastest) approach is data export and pool re-creation. If you export dataset by dataset (instead of recursively all at once), you can even see which dataset is causing the issue. In case this per-dataset export narrows down the issue and it is a dataset you don't care about (as in: 1) no issue to recreate it from scratch, or 2) there is a backup available), you could delete this (or each such) dataset and re-create it in place (= not re-creating the entire pool).

Bye,
Alexander.

http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF

I'm running this script:

    #!/bin/sh
    for i in $(zfs list -H | awk '{print $1}')
    do
      FS=$1
      FN=$(echo ${FS} | sed -e s@/@_@g)
      sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
    done

How will I know a "Problem" dataset?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
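As posted, the loop sets FS=$1 (the script's first positional argument) rather than the loop variable $i, so ${FS} would be empty on every pass; presumably the script actually run used $i. A corrected sketch with the same send flags and destination, and with Juraj's echo suggestion folded in:

    #!/bin/sh
    # Send each dataset's @REPAIR_SNAP snapshot to the backup host,
    # one stream per dataset, into a file named after the dataset.
    for FS in $(zfs list -H -o name)
    do
      FN=$(echo "${FS}" | sed -e 's@/@_@g')   # e.g. zroot/usr/home -> zroot_usr_home
      echo "${FS}"                            # progress marker, per Juraj's suggestion
      sudo zfs send -vecLep "${FS}@REPAIR_SNAP" | \
          ssh l...@freenas.lerctr.org "cat - > ${FN}"
    done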
Re: ZFS PANIC: HELP.
Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600):

I tried a scrub -- it panic'd on a fatal double fault. Suggestions?

The safest / cleanest (but not fastest) approach is data export and pool re-creation. If you export dataset by dataset (instead of recursively all at once), you can even see which dataset is causing the issue. In case this per-dataset export narrows down the issue and it is a dataset you don't care about (as in: 1) no issue to recreate it from scratch, or 2) there is a backup available), you could delete this (or each such) dataset and re-create it in place (= not re-creating the entire pool).

Bye,
Alexander.

--
http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
Re: ZFS PANIC: HELP.
On 02/24/2022 8:07 pm, Larry Rosenman wrote:

On 02/24/2022 1:27 pm, Larry Rosenman wrote:

On 02/24/2022 10:48 am, Rob Wing wrote:

Even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH.

I chroot'd to the pool and built a no-INVARIANTS kernel. It booted and seems(!) to be running. Is there any way to diagnose/clear the crappy ZIL?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

I tried a scrub -- it panic'd on a fatal double fault. Suggestions?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 1:27 pm, Larry Rosenman wrote:

On 02/24/2022 10:48 am, Rob Wing wrote:

Even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH.

I chroot'd to the pool and built a no-INVARIANTS kernel. It booted and seems(!) to be running. Is there any way to diagnose/clear the crappy ZIL?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 10:48 am, Rob Wing wrote:

Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman wrote:

On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup.

Are those tunables that I can set in loader.conf?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH.

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman wrote:

On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup.

Are those tunables that I can set in loader.conf?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup.

Are those tunables that I can set in loader.conf?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup.

On Thu, Feb 24, 2022 at 7:30 AM Alexander Motin wrote:

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and the only pool on the system), see: https://www.lerctr.org/~ler/14-BOOT-Crash.png

Where do I go from here?

I see two ways: 1) Since it is only an assertion and 13 is working (so far), you may just build a 14 kernel without the INVARIANTS option and later recreate the pool when you have time. 2) You may treat it as metadata corruption: import the pool read-only and evacuate the data. If you have recent enough snapshots, you may be able to easily replicate the pool with all its settings to some other disk. The ZIL is not replicated, so corruption there should not be a problem. If there are no snapshots, then either copy at the file level, or you may be able to create a snapshot for replication on 13 (or on 14 without INVARIANTS), importing the pool read-write.

--
Alexander Motin
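A sketch of how those two knobs can be set as boot-time tunables, assuming you want them in effect before the root pool is imported (both also exist as runtime sysctls, as used above):

    # /boot/loader.conf
    # attempt the import despite otherwise-fatal errors
    vfs.zfs.recover="1"
    # skip metadata verification during pool load
    vfs.zfs.spa.load_verify_metadata="0"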
Re: ZFS PANIC: HELP.
On 02/24/2022 10:29 am, Alexander Motin wrote:

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and the only pool on the system), see: https://www.lerctr.org/~ler/14-BOOT-Crash.png

Where do I go from here?

I see two ways: 1) Since it is only an assertion and 13 is working (so far), you may just build a 14 kernel without the INVARIANTS option and later recreate the pool when you have time. 2) You may treat it as metadata corruption: import the pool read-only and evacuate the data. If you have recent enough snapshots, you may be able to easily replicate the pool with all its settings to some other disk. The ZIL is not replicated, so corruption there should not be a problem. If there are no snapshots, then either copy at the file level, or you may be able to create a snapshot for replication on 13 (or on 14 without INVARIANTS), importing the pool read-write.

Ugh. The box is a 6-disk R710, and all 6 disks are in the pool. I do have a FreeNAS box with enough space to copy the data out. There ARE snaps of MOST filesystems, taken regularly. The 13 I'm booting from is the 13 memstick image. There are ~70 filesystems (IIRC), with poudriere, ports, et al. I'm not sure how to build the 14 kernel from the 13 booted box. Ideas? Methods?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
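One possible shape of that no-INVARIANTS build, as a sketch only: it assumes the 14-CURRENT source tree is reachable from the booted 13 environment (Larry reports chroot'ing into the pool to do this in a follow-up), and uses a hypothetical config name. Recent source trees also ship a stock GENERIC-NODEBUG config that does much the same thing.

    # sys/amd64/conf/NOINV (hypothetical) -- GENERIC minus the debug assertions
    include GENERIC
    ident   NOINV
    nooptions INVARIANTS
    nooptions INVARIANT_SUPPORT
    nooptions WITNESS
    nooptions WITNESS_SKIPSPIN

    # then, from the top of the 14 source tree:
    # make kernel-toolchain        # if the 14 toolchain isn't built yet
    make -j8 buildkernel KERNCONF=NOINV
    make installkernel KERNCONF=NOINV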
Re: ZFS PANIC: HELP.
On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and the only pool on the system), see: https://www.lerctr.org/~ler/14-BOOT-Crash.png

Where do I go from here?

I see two ways: 1) Since it is only an assertion and 13 is working (so far), you may just build a 14 kernel without the INVARIANTS option and later recreate the pool when you have time. 2) You may treat it as metadata corruption: import the pool read-only and evacuate the data. If you have recent enough snapshots, you may be able to easily replicate the pool with all its settings to some other disk. The ZIL is not replicated, so corruption there should not be a problem. If there are no snapshots, then either copy at the file level, or you may be able to create a snapshot for replication on 13 (or on 14 without INVARIANTS), importing the pool read-write.

--
Alexander Motin
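A sketch of what option 2 might look like on the command line. The pool, snapshot, and receiving dataset names are hypothetical, and the snapshot would have to exist already (or be created while the pool is imported read-write, per the note above):

    # import the damaged pool read-only, mounted under /mnt
    zpool import -o readonly=on -R /mnt zroot

    # replicate everything below the pool root -- datasets, properties,
    # and snapshots -- to the backup host
    zfs send -R zroot@REPAIR_SNAP | ssh l...@freenas.lerctr.org zfs receive -d backup/ler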
Re: ZFS PANIC: HELP.
On 02/23/2022 9:27 pm, Larry Rosenman wrote:

On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

Why do you need to import without mounting the file systems? I think you may actually wish them to be mounted, to replay their ZILs. Just use the -R option to mount the file systems in some different place.

I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry? Or do something(tm) here?

This looks weird, but may possibly depend on the mount-point topology, whether /mnt is writable, etc. What happens if you export it now and try to import it in the normal way on 14 without -F?

It crashes just after root mount (this is the boot pool and the only pool on the system), see: https://www.lerctr.org/~ler/14-BOOT-Crash.png

Where do I go from here?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

Why do you need to import without mounting the file systems? I think you may actually wish them to be mounted, to replay their ZILs. Just use the -R option to mount the file systems in some different place.

I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry? Or do something(tm) here?

This looks weird, but may possibly depend on the mount-point topology, whether /mnt is writable, etc. What happens if you export it now and try to import it in the normal way on 14 without -F?

It crashes just after root mount (this is the boot pool and the only pool on the system), see: https://www.lerctr.org/~ler/14-BOOT-Crash.png

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

Why do you need to import without mounting the file systems? I think you may actually wish them to be mounted, to replay their ZILs. Just use the -R option to mount the file systems in some different place.

I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry? Or do something(tm) here?

This looks weird, but may possibly depend on the mount-point topology, whether /mnt is writable, etc. What happens if you export it now and try to import it in the normal way on 14 without -F?

--
Alexander Motin
Re: ZFS PANIC: HELP.
On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

Why do you need to import without mounting the file systems? I think you may actually wish them to be mounted, to replay their ZILs. Just use the -R option to mount the file systems in some different place.

I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry? Or do something(tm) here?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

Why do you need to import without mounting the file systems? I think you may actually wish them to be mounted, to replay their ZILs. Just use the -R option to mount the file systems in some different place.

--
Alexander Motin
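A sketch of the two import variants under discussion, with "zroot" as a hypothetical pool name -- the altroot import Alexander suggests (datasets get mounted under /mnt, so their ZILs are replayed), and, for completeness, the no-mount import Larry asked about:

    # import and mount everything under /mnt; mounting replays the ZILs
    zpool import -R /mnt zroot

    # import without mounting any datasets (what was originally asked for)
    zpool import -N zroot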
Re: ZFS PANIC: HELP.
On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

How can I import the pool withOUT it mounting the filesystems, so I can export it cleanly on the 13 system?

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may simply not have that debugging enabled, and so don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which tries to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% of the time and is only a last resort. Then again, that assertion may just be excessively strict for that specific recovery case.

If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:

I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

--
Alexander Motin
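The procedure Alexander is suggesting, as a short sketch (pool name hypothetical; for a root pool the final import would normally happen at boot rather than by hand):

    # on the 13-REL rescue system
    zpool scrub zroot          # let it finish; watch with: zpool status zroot
    zpool export zroot

    # then boot 14 and import normally, without the -F rewind option
    zpool import zroot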
ZFS PANIC: HELP.
I've got my main dev box that crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome -- I really need to get this box back.

--
Larry Rosenman                  http://www.lerctr.org/~ler
Phone: +1 214-642-9640          E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106