Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/27/2022 3:58 pm, Mark Johnston wrote:

On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:

On 02/26/2022 11:08 am, Larry Rosenman wrote:
> On 02/26/2022 10:57 am, Larry Rosenman wrote:
>> On 02/26/2022 10:37 am, Juraj Lutter wrote:
 On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
 I'm running this script:
 #!/bin/sh
 for i in $(zfs list -H | awk '{print $1}')
 do
   FS=$1
   FN=$(echo ${FS} | sed -e s@/@_@g)
   sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh
 l...@freenas.lerctr.org cat - \> $FN
 done



>>> I’d put, like:
>>>
>>> echo ${FS}
>>>
>>> before “sudo zfs send”, to get at least a bit of a clue on where it
>>> can get to.
>>>
>>> otis
>>>
>>>
>>> —
>>> Juraj Lutter
>>> o...@freebsd.org
>> I just looked at the destination to see where it died (it did!) and I
>> bectl destroy'd the
>> BE that crashed it, and am running a new scrub -- we'll see whether
>> that was sufficient.
>>
>> Thanks, all!
> Well, it was NOT sufficient More zfs export fun to come :(

I was able to export the rest of the datasets, and re-install 
14-CURRENT

from a recent snapshot, and restore the datasets I care about.

I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the
controller device node, about the size of a disk.  Presumably this is
the result of some kind of misconfiguration somewhere, and /dev/mfid0
was meant instead.



per advice from markj@ I deleted the /{etc,boot}/zfs/zpool.cache files, 
and this issue went

away.  Stale cache files which are no longer needed.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-27 Thread Mark Johnston
On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:
> On 02/26/2022 11:08 am, Larry Rosenman wrote:
> > On 02/26/2022 10:57 am, Larry Rosenman wrote:
> >> On 02/26/2022 10:37 am, Juraj Lutter wrote:
>  On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
>  I'm running this script:
>  #!/bin/sh
>  for i in $(zfs list -H | awk '{print $1}')
>  do
>    FS=$1
>    FN=$(echo ${FS} | sed -e s@/@_@g)
>    sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
>  l...@freenas.lerctr.org cat - \> $FN
>  done
>  
>  
>  
> >>> I’d put, like:
> >>> 
> >>> echo ${FS}
> >>> 
> >>> before “sudo zfs send”, to get at least a bit of a clue on where it 
> >>> can get to.
> >>> 
> >>> otis
> >>> 
> >>> 
> >>> —
> >>> Juraj Lutter
> >>> o...@freebsd.org
> >> I just looked at the destination to see where it died (it did!) and I
> >> bectl destroy'd the
> >> BE that crashed it, and am running a new scrub -- we'll see whether
> >> that was sufficient.
> >> 
> >> Thanks, all!
> > Well, it was NOT sufficient More zfs export fun to come :(
> 
> I was able to export the rest of the datasets, and re-install 14-CURRENT 
> from a recent snapshot, and restore the datasets I care about.
> 
> I'm now seeing:
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> pid 48 (zpool), jid 0, uid 0: exited on signal 6
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> pid 54 (zpool), jid 0, uid 0: exited on signal 6
> 
> On boot.  Ideas?

That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the
controller device node, about the size of a disk.  Presumably this is
the result of some kind of misconfiguration somewhere, and /dev/mfid0
was meant instead.



Re: ZFS PANIC: HELP.

2022-02-27 Thread Michael Butler

On 2/27/22 16:09, Larry Rosenman wrote:

On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 
14-CURRENT from a recent snapshot, and restore the datasets I care 
about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and
mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended
up setting the debug flag for mrsas in /etc/sysctl.conf ..

dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael


I don't see it:
✖1 ❯ sysctl dev.mfi
dev.mfi.0.keep_deleted_volumes: 0
dev.mfi.0.delete_busy_volumes: 0
dev.mfi.0.%parent: pci3
dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 
subdevice=0x1f17 class=0x010400

dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mfi.0.%driver: mfi
dev.mfi.0.%desc: Dell PERC H700 Integrated
dev.mfi.%parent:


 my brain-fade - you're right; it is only there and tunable in the 
mrsas driver.


My apologies :-(

Michael





Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 
14-CURRENT from a recent snapshot, and restore the datasets I care 
about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and
mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended
up setting the debug flag for mrsas in /etc/sysctl.conf ..

dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael


I don't see it:
✖1 ❯ sysctl dev.mfi
dev.mfi.0.keep_deleted_volumes: 0
dev.mfi.0.delete_busy_volumes: 0
dev.mfi.0.%parent: pci3
dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 
subdevice=0x1f17 class=0x010400

dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mfi.0.%driver: mfi
dev.mfi.0.%desc: Dell PERC H700 Integrated
dev.mfi.%parent:

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-27 Thread Michael Butler

 [ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 14-CURRENT 
from a recent snapshot, and restore the datasets I care about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and mrsas 
drivers to be 'chatty' in this way - IOCTL complaints. I ended up 
setting the debug flag for mrsas in /etc/sysctl.conf ..


dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael






Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/26/2022 11:08 am, Larry Rosenman wrote:

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
l...@freenas.lerctr.org cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it 
can get to.


otis


—
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether
that was sufficient.

Thanks, all!

Well, it was NOT sufficient More zfs export fun to come :(


I was able to export the rest of the datasets, and re-install 14-CURRENT 
from a recent snapshot, and restore the datasets I care about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
l...@freenas.lerctr.org cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it 
can get to.


otis


—
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether
that was sufficient.

Thanks, all!

Well, it was NOT sufficient More zfs export fun to come :(

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org 
cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it can 
get to.


otis


—
Juraj Lutter
o...@freebsd.org
I just looked at the destination to see where it died (it did!) and I 
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether that 
was sufficient.


Thanks, all!
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Fri, 25 Feb 2022  
20:03:51 -0600):



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


 The safest / cleanest (but not fastest) is data export and  
pool re-creation. If you export dataset by dataset (instead of  
recursively all), you can even see which dataset is causing the  
issue. In case this per dataset export narrows down the issue and  
it is a dataset you don't care about (as in: 1) no issue to  
recreate from scratch or 2) there is a backup available) you could  
delete this (or each such) dataset and re-create it in-place (= not  
re-creating the entire pool).


Bye,
Alexander.
 http://www.Leidinger.net alexan...@leidinger.net: PGP  
0x8F31830F9F2772BF

http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


  I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh  
l...@freenas.lerctr.org cat - \> $FN

done

   

  How will I know a "Problem" dataset?


You told a scrub is panicing the system. A scrub only touches occupied  
blocks. As such a problem-dataset should panic your system. If it  
doesn't panic at all, the problem may be within a snapshot which  
contains data which is deleted in later versions of the dataset.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpYqsU391ZUr.pgp
Description: Digitale PGP-Signatur


Re: ZFS PANIC: HELP.

2022-02-25 Thread Larry Rosenman



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman  (from Thu, 24 Feb 2022 20:19:45 
-0600):



I tried a scrub -- it panic'd on a fatal double fault.

Suggestions?


The safest / cleanest (but not fastest) is data export and pool 
re-creation. If you export dataset by dataset (instead of recursively 
all), you can even see which dataset is causing the issue. In case this 
per dataset export narrows down the issue and it is a dataset you don't 
care about (as in: 1) no issue to recreate from scratch or 2) there is 
a backup available) you could delete this (or each such) dataset and 
re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.

http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 
0x8F31830F9F2772BF


I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org 
cat - \> $FN

done

How will I know a "Problem" dataset?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-25 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


The safest / cleanest (but not fastest) is data export and pool  
re-creation. If you export dataset by dataset (instead of recursively  
all), you can even see which dataset is causing the issue. In case  
this per dataset export narrows down the issue and it is a dataset you  
don't care about (as in: 1) no issue to recreate from scratch or 2)  
there is a backup available) you could delete this (or each such)  
dataset and re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpbleK3b3rSl.pgp
Description: Digitale PGP-Signatur


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 8:07 pm, Larry Rosenman wrote:


On 02/24/2022 1:27 pm, Larry Rosenman wrote:

On 02/24/2022 10:48 am, Rob Wing wrote:

even with those set, I still get the panid. :(

Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.


I chroot'd to the pool, and built a no invariants kernel.  It booted and 
seems(!) to be running.


Is there any diagnostics/clearing the crappy ZIL?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

I tried a scrub -- it panic'd on a fatal double fault.

Suggestions?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 1:27 pm, Larry Rosenman wrote:


On 02/24/2022 10:48 am, Rob Wing wrote:


even with those set, I still get the panid. :(


Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.


I chroot'd to the pool, and built a no invariants kernel.  It booted and 
seems(!) to be running.


Is there any diagnostics/clearing the crappy ZIL?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 10:48 am, Rob Wing wrote:


Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman  wrote:

On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl 
vfs.zfs.spa.load_verify_metadata=0`.


I had a similar error the other day (couple months ago). The best I did 
was being able to import the pool read only. I ended up restoring from 
backup.


Are those tunables that I can set in loader.conf?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

even with those set, I still get the panid. :(

Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Rob Wing
Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman  wrote:

> On 02/24/2022 10:36 am, Rob Wing wrote:
>
> You might try setting `sysctl vfs.zfs.recover=1` and `sysctl
> vfs.zfs.spa.load_verify_metadata=0`.
>
> I had a similar error the other day (couple months ago). The best I did
> was being able to import the pool read only. I ended up restoring from
> backup.
>
>
>
> Are those tunables that I can set in loader.conf?
>
>
> --
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl 
vfs.zfs.spa.load_verify_metadata=0`.


I had a similar error the other day (couple months ago). The best I did 
was being able to import the pool read only. I ended up restoring from 
backup.






Are those tunables that I can set in loader.conf?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Rob Wing
You might try setting `sysctl vfs.zfs.recover=1` and `sysctl
vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (couple months ago). The best I did was
being able to import the pool read only. I ended up restoring from backup.

On Thu, Feb 24, 2022 at 7:30 AM Alexander Motin  wrote:

> On 24.02.2022 10:57, Larry Rosenman wrote:
> > On 02/23/2022 9:27 pm, Larry Rosenman wrote:
> >> It crashes just after root mount (this is the boot pool and only pool
> >> on the system),
> >> seeL
> >> https://www.lerctr.org/~ler/14-BOOT-Crash.png
> >
> > Where do I go from here?
>
> I see 2 ways: 1) Since it is only an assertion and 13 is working (so
> far), you may just build 14 kernel without INVARIANTS option and later
> recreate the pool when you have time.  2) You may treat it as metadata
> corruption: import pool read-only and evacuate the data.  If you have
> recent enough snapshots you may be able to easily replicate the pool
> with all the settings to some other disk.  ZIL is not replicated, so
> corruptions there should not be a problem.  If there are no snapshots,
> then either copy on file level, or you may be able to create snapshot
> for replication in 13 (on 14 without INVARIANTS), importing pool
> read-write.
>
> --
> Alexander Motin
>
>


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman

On 02/24/2022 10:29 am, Alexander Motin wrote:

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


I see 2 ways: 1) Since it is only an assertion and 13 is working (so
far), you may just build 14 kernel without INVARIANTS option and later
recreate the pool when you have time.  2) You may treat it as metadata
corruption: import pool read-only and evacuate the data.  If you have
recent enough snapshots you may be able to easily replicate the pool
with all the settings to some other disk.  ZIL is not replicated, so
corruptions there should not be a problem.  If there are no snapshots,
then either copy on file level, or you may be able to create snapshot
for replication in 13 (on 14 without INVARIANTS), importing pool
read-write.


Ugh.  The box is a 6 disk R710, and all 6 disks are in the pool.

I do have a FreeNAS box with enough space to copy the data out.  There 
ARE snaps of MOST filesystems that are taken regularly.


The 13 I'm booting from is the 13 memstick image.

There are ~70 filesystems (IIRC) with poudriere, ports, et al.

I'm not sure how to build the 14 kernel from the 13 booted box.

Ideas?  Methods?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-24 Thread Alexander Motin

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


I see 2 ways: 1) Since it is only an assertion and 13 is working (so 
far), you may just build 14 kernel without INVARIANTS option and later 
recreate the pool when you have time.  2) You may treat it as metadata 
corruption: import pool read-only and evacuate the data.  If you have 
recent enough snapshots you may be able to easily replicate the pool 
with all the settings to some other disk.  ZIL is not replicated, so 
corruptions there should not be a problem.  If there are no snapshots, 
then either copy on file level, or you may be able to create snapshot 
for replication in 13 (on 14 without INVARIANTS), importing pool read-write.


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that 
debugging

enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log 
records
(or false positive), that could happen so due to use of -F 
recovery
option on `zpool import`, that supposed to try import pool at 
earlier
transaction group if there is some metadata corruption found.  It 
is
not supposed to work 100% and only a last resort.  Though may be 
that
assertion is just excessively strict for that specific recovery 
case.
If as you say pool can be imported and scrubbed on 13, then I'd 
expect

following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot 
at https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access 
to the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think 
you

may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on mount points topology,
whether /mnt is writable, etc.  What happen if you export it now and
try to import it in normal way on 14 without -F?


It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log 
records

(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at 
earlier
transaction group if there is some metadata corruption found.  It 
is
not supposed to work 100% and only a last resort.  Though may be 
that
assertion is just excessively strict for that specific recovery 
case.
If as you say pool can be imported and scrubbed on 13, then I'd 
expect

following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think you
may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on mount points topology,
whether /mnt is writable, etc.  What happen if you export it now and
try to import it in normal way on 14 without -F?


It crashes just after root mount (this is the boot pool and only pool on 
the system),

seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd expect
following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think you
may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on mount points topology, 
whether /mnt is writable, etc.  What happen if you export it now and try 
to import it in normal way on 14 without -F?


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd 
expect

following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think you
may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd expect
following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think you 
may actually wish them to be mounted to replay their ZILs.  Just use -R 
option to mount file systems in some different place.


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd expect
following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built with 
INVARIANTS option.  On 13 you may just not have that debugging enabled 
to hit the issue.  But that may be only a consequence.  Original problem 
I guess in possibly corrupted ZFS intent log records (or false 
positive), that could happen so due to use of -F recovery option on 
`zpool import`, that supposed to try import pool at earlier transaction 
group if there is some metadata corruption found.  It is not supposed to 
work 100% and only a last resort.  Though may be that assertion is just 
excessively strict for that specific recovery case.  If as you say pool 
can be imported and scrubbed on 13, then I'd expect following clean 
export should allow later import on 14 without -F.


On 23.02.2022 21:21, Larry Rosenman wrote:


've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.




--
Alexander Motin



ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman



've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106