Another ZFS Panic -- buffer modified while frozen

2023-08-30 Thread Cy Schubert
A different panic, on a different amd64 machine that was also running 
poudriere building amd64 packages. Exmh had just been started, displaying 
back to my laptop, at the time of the panic.

panic: buffer modified while frozen!
cpuid = 1
time = 1693417762
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe008e67fba0
vpanic() at vpanic+0x132/frame 0xfe008e67fcd0
panic() at panic+0x43/frame 0xfe008e67fd30
arc_cksum_verify() at arc_cksum_verify+0x12c/frame 0xfe008e67fd80
arc_buf_destroy_impl() at arc_buf_destroy_impl+0x6f/frame 0xfe008e67fdc0
arc_buf_destroy() at arc_buf_destroy+0xd5/frame 0xfe008e67fdf0
dbuf_destroy() at dbuf_destroy+0x60/frame 0xfe008e67fe40
dbuf_evict_one() at dbuf_evict_one+0x176/frame 0xfe008e67fe70
dbuf_evict_thread() at dbuf_evict_thread+0x345/frame 0xfe008e67fef0
fork_exit() at fork_exit+0x82/frame 0xfe008e67ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe008e67ff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 3h46m10s
Dumping 1962 out of 8122 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1)
at /opt/src/git-src/sys/kern/kern_shutdown.c:405
#2  0x806c1b30 in kern_reboot (howto=260)
at /opt/src/git-src/sys/kern/kern_shutdown.c:526
#3  0x806c202f in vpanic (
fmt=0x83d82b7c "buffer modified while frozen!", 
ap=ap@entry=0xfe008e67fd10)
at /opt/src/git-src/sys/kern/kern_shutdown.c:970
#4  0x806c1dd3 in panic (fmt=)
at /opt/src/git-src/sys/kern/kern_shutdown.c:894
#5  0x83ae5f2c in arc_cksum_verify (buf=0xf80188cde180)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:1475
#6  0x83ae99ff in arc_buf_destroy_impl (
buf=buf@entry=0xf80188cde180)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:3113
#7  0x83ae9625 in arc_buf_destroy (buf=0xf80188cde180, 
tag=tag@entry=0xf80104a534c8)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:3889
#8  0x83b0eee0 in dbuf_destroy (db=db@entry=0xf80104a534c8)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:2983
#9  0x83b17996 in dbuf_evict_one ()
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:781
#10 0x83b0c345 in dbuf_evict_thread (unused=)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:819
#11 0x80677ab2 in fork_exit (
callout=0x83b0c000 , arg=0x0, 
frame=0xfe008e67ff40) at /opt/src/git-src/sys/kern/kern_fork.c:1160
#12 
(kgdb) 


FreeBSD cwsys 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #4 komquats-n265089-b22aae410bc7: Wed Aug 30 04:38:24 PDT 2023 root@cwsys:/export/obj/opt/src/git-src/amd64.amd64/sys/BREAK2 amd64

Almost the same configuration as the other machine.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0






Re: ZFS: panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed

2023-06-10 Thread Graham Perrin

On 27/05/2023 16:30, Graham Perrin wrote:

Three panics with c2c9ac88c2bb (2023-05-26, 1400089):

…


Martin, if you'd like to take , please go ahead.


Thanks



OpenPGP_signature
Description: OpenPGP digital signature


ZFS: panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed

2023-05-27 Thread Graham Perrin

Three panics with c2c9ac88c2bb (2023-05-26, 1400089):

 Dumptime: 2023-05-27 03:17:16 +0100

 Dumptime: 2023-05-27 03:41:03 +0100

 Dumptime: 2023-05-27 14:03:32 +0100

Are they symptomatic of  (fixed 2023-05-09 with ), or should I treat 
today's panics as separate?


The first panic occurred during installworld, uptime five minutes.

The second was in single user mode, when I took the third of three L2ARC 
devices online, uptime twenty-two minutes.


The third panic was whilst using Plasma, without the device that 
featured in the second panic, uptime twenty-three minutes.


The result of a probe at 07:49 UTC: 





OpenPGP_signature
Description: OpenPGP digital signature


Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/27/2022 3:58 pm, Mark Johnston wrote:

On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:

On 02/26/2022 11:08 am, Larry Rosenman wrote:
> On 02/26/2022 10:57 am, Larry Rosenman wrote:
>> On 02/26/2022 10:37 am, Juraj Lutter wrote:
 On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
 I'm running this script:
 #!/bin/sh
 for i in $(zfs list -H | awk '{print $1}')
 do
   FS=$1
   FN=$(echo ${FS} | sed -e s@/@_@g)
   sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh
 l...@freenas.lerctr.org cat - \> $FN
 done



>>> I’d put, like:
>>>
>>> echo ${FS}
>>>
>>> before “sudo zfs send”, to get at least a bit of a clue on where it
>>> can get to.
>>>
>>> otis
>>>
>>>
>>> —
>>> Juraj Lutter
>>> o...@freebsd.org
>> I just looked at the destination to see where it died (it did!) and I
>> bectl destroy'd the
>> BE that crashed it, and am running a new scrub -- we'll see whether
>> that was sufficient.
>>
>> Thanks, all!
> Well, it was NOT sufficient More zfs export fun to come :(

I was able to export the rest of the datasets, and re-install 14-CURRENT 
from a recent snapshot, and restore the datasets I care about.

I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the
controller device node, about the size of a disk.  Presumably this is
the result of some kind of misconfiguration somewhere, and /dev/mfid0
was meant instead.



per advice from markj@ I deleted the /{etc,boot}/zfs/zpool.cache files, 
and this issue went away.  They were stale cache files which are no 
longer needed.
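That cleanup can be sketched as a small script. This is a hypothetical illustration, not something from the thread: the `prune_zpool_cache` helper name and the `root` prefix argument are inventions here, added so the routine can be rehearsed against a scratch directory before touching a live system. The two cache paths are the ones named above.

```shell
#!/bin/sh
# Hypothetical sketch of removing stale zpool.cache files, per the
# advice above.  Pass a path prefix ("" for the live system) so the
# routine can be tried safely against a scratch directory first.
prune_zpool_cache() {
    root=$1
    for f in "$root/boot/zfs/zpool.cache" "$root/etc/zfs/zpool.cache"; do
        if [ -e "$f" ]; then
            echo "removing stale cache: $f"
            rm -f "$f"
        fi
    done
}

# prune_zpool_cache ""   # uncomment to act on the live system
```

On import, ZFS rebuilds the cache as needed, so removing a stale copy is safe in a way that editing it by hand is not.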

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-27 Thread Mark Johnston
On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote:
> On 02/26/2022 11:08 am, Larry Rosenman wrote:
> > On 02/26/2022 10:57 am, Larry Rosenman wrote:
> >> On 02/26/2022 10:37 am, Juraj Lutter wrote:
>  On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
>  I'm running this script:
>  #!/bin/sh
>  for i in $(zfs list -H | awk '{print $1}')
>  do
>    FS=$1
>    FN=$(echo ${FS} | sed -e s@/@_@g)
>    sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
>  l...@freenas.lerctr.org cat - \> $FN
>  done
>  
>  
>  
> >>> I’d put, like:
> >>> 
> >>> echo ${FS}
> >>> 
> >>> before “sudo zfs send”, to get at least a bit of a clue on where it 
> >>> can get to.
> >>> 
> >>> otis
> >>> 
> >>> 
> >>> —
> >>> Juraj Lutter
> >>> o...@freebsd.org
> >> I just looked at the destination to see where it died (it did!) and I
> >> bectl destroy'd the
> >> BE that crashed it, and am running a new scrub -- we'll see whether
> >> that was sufficient.
> >> 
> >> Thanks, all!
> > Well, it was NOT sufficient More zfs export fun to come :(
> 
> I was able to export the rest of the datasets, and re-install 14-CURRENT 
> from a recent snapshot, and restore the datasets I care about.
> 
> I'm now seeing:
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> pid 48 (zpool), jid 0, uid 0: exited on signal 6
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> mfi0: IOCTL 0x40086481 not handled
> pid 54 (zpool), jid 0, uid 0: exited on signal 6
> 
> On boot.  Ideas?

That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the
controller device node, about the size of a disk.  Presumably this is
the result of some kind of misconfiguration somewhere, and /dev/mfid0
was meant instead.
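The identification of 0x40086481 as DIOCGMEDIASIZE can be checked mechanically: FreeBSD packs ioctl request numbers as direction, payload length, group character, and command number (the bit layout below follows FreeBSD's sys/ioccom.h). A small sketch, with field names chosen here for illustration:

```python
# Decode a FreeBSD ioctl request number into its component fields.
# Bit layout (from sys/ioccom.h): bits 29-31 direction (2 = IOC_OUT,
# i.e. data copied out of the kernel), bits 16-28 payload length,
# bits 8-15 group character, bits 0-7 command number.

def decode_ioctl(req: int):
    direction = (req >> 29) & 0x7
    length = (req >> 16) & 0x1FFF
    group = chr((req >> 8) & 0xFF)
    num = req & 0xFF
    return direction, length, group, num

# 0x40086481 decodes to IOC_OUT, 8-byte payload, group 'd', number 129,
# which matches _IOR('d', 129, off_t) -- how sys/disk.h defines
# DIOCGMEDIASIZE (the media size is returned as a 64-bit off_t).
print(decode_ioctl(0x40086481))  # → (2, 8, 'd', 129)
```

The 8-byte out-parameter is the giveaway: something is asking the device node for a 64-bit media size, which the controller node /dev/mfi0 does not implement.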



Re: ZFS PANIC: HELP.

2022-02-27 Thread Michael Butler

On 2/27/22 16:09, Larry Rosenman wrote:

On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 
14-CURRENT from a recent snapshot, and restore the datasets I care 
about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and
mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended
up setting the debug flag for mrsas in /etc/sysctl.conf ..

dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael


I don't see it:
✖1 ❯ sysctl dev.mfi
dev.mfi.0.keep_deleted_volumes: 0
dev.mfi.0.delete_busy_volumes: 0
dev.mfi.0.%parent: pci3
dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 
subdevice=0x1f17 class=0x010400

dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mfi.0.%driver: mfi
dev.mfi.0.%desc: Dell PERC H700 Integrated
dev.mfi.%parent:


My brain-fade - you're right; it is only there and tunable in the 
mrsas driver.


My apologies :-(

Michael





Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/27/2022 3:03 pm, Michael Butler wrote:

[ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 
14-CURRENT from a recent snapshot, and restore the datasets I care 
about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and
mrsas drivers to be 'chatty' in this way - IOCTL complaints. I ended
up setting the debug flag for mrsas in /etc/sysctl.conf ..

dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael


I don't see it:
✖1 ❯ sysctl dev.mfi
dev.mfi.0.keep_deleted_volumes: 0
dev.mfi.0.delete_busy_volumes: 0
dev.mfi.0.%parent: pci3
dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 
subdevice=0x1f17 class=0x010400

dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mfi.0.%driver: mfi
dev.mfi.0.%desc: Dell PERC H700 Integrated
dev.mfi.%parent:

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-27 Thread Michael Butler

 [ cc list trimmed ]

On 2/27/22 14:16, Larry Rosenman wrote:


I was able to export the rest of the datasets, and re-install 14-CURRENT 
from a recent snapshot, and restore the datasets I care about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?


These messages may or may not be related. I found both the mfi and mrsas 
drivers to be 'chatty' in this way - IOCTL complaints. I ended up 
setting the debug flag for mrsas in /etc/sysctl.conf ..


dev.mrsas.0.mrsas_debug=0

There's an equivalent for mfi

Michael






Re: ZFS PANIC: HELP.

2022-02-27 Thread Larry Rosenman

On 02/26/2022 11:08 am, Larry Rosenman wrote:

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
l...@freenas.lerctr.org cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it 
can get to.


otis


—
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether
that was sufficient.

Thanks, all!

Well, it was NOT sufficient More zfs export fun to come :(


I was able to export the rest of the datasets, and re-install 14-CURRENT 
from a recent snapshot, and restore the datasets I care about.


I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6

On boot.  Ideas?



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:57 am, Larry Rosenman wrote:

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh 
l...@freenas.lerctr.org cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it 
can get to.


otis


—
Juraj Lutter
o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether
that was sufficient.

Thanks, all!

Well, it was NOT sufficient.  More zfs export fun to come :(

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Larry Rosenman

On 02/26/2022 10:37 am, Juraj Lutter wrote:

On 26 Feb 2022, at 03:03, Larry Rosenman  wrote:
I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org 
cat - \> $FN

done




I’d put, like:

echo ${FS}

before “sudo zfs send”, to get at least a bit of a clue on where it can 
get to.


otis


—
Juraj Lutter
o...@freebsd.org
I just looked at the destination to see where it died (it did!) and I 
bectl destroy'd the
BE that crashed it, and am running a new scrub -- we'll see whether that 
was sufficient.


Thanks, all!
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-26 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Fri, 25 Feb 2022  
20:03:51 -0600):



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


 The safest / cleanest (but not fastest) is data export and  
pool re-creation. If you export dataset by dataset (instead of  
recursively all), you can even see which dataset is causing the  
issue. In case this per dataset export narrows down the issue and  
it is a dataset you don't care about (as in: 1) no issue to  
recreate from scratch or 2) there is a backup available) you could  
delete this (or each such) dataset and re-create it in-place (= not  
re-creating the entire pool).


Bye,
Alexander.
 http://www.Leidinger.net alexan...@leidinger.net: PGP  
0x8F31830F9F2772BF

http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


  I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh  
l...@freenas.lerctr.org cat - \> $FN

done

   

  How will I know a "Problem" dataset?


You said a scrub is panicking the system.  A scrub only touches occupied  
blocks, so a problem dataset should panic your system.  If it doesn't  
panic at all, the problem may be within a snapshot which contains data  
that was deleted in later versions of the dataset.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpYqsU391ZUr.pgp
Description: Digitale PGP-Signatur


Re: ZFS PANIC: HELP.

2022-02-25 Thread Larry Rosenman



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman  (from Thu, 24 Feb 2022 20:19:45 
-0600):



I tried a scrub -- it panic'd on a fatal double fault.

Suggestions?


The safest / cleanest (but not fastest) is data export and pool 
re-creation. If you export dataset by dataset (instead of recursively 
all), you can even see which dataset is causing the issue. In case this 
per dataset export narrows down the issue and it is a dataset you don't 
care about (as in: 1) no issue to recreate from scratch or 2) there is 
a backup available) you could delete this (or each such) dataset and 
re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.

http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 
0x8F31830F9F2772BF


I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$1
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org 
cat - \> $FN

done
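A side note on the loop above: as posted it appears to have a bug. `FS=$1` picks up the script's first argument (usually empty) rather than the loop variable `$i`, so every iteration would try to send the same dataset. A hedged corrected sketch follows; the dry-run stub and the `backuphost` name are inventions here (the real destination is elided in the archive), so the loop can be exercised without `zfs` or `ssh` present:

```shell
#!/bin/sh
# Hypothetical corrected version of the per-dataset export loop.
# The posted script sets FS=$1 (the script's first argument) where it
# almost certainly means FS=$i (the current dataset from the loop).
# DRYRUN=1 (the default here) substitutes a canned dataset list so the
# loop can be tested without zfs/ssh; set DRYRUN=0 on a real system.
DRYRUN=${DRYRUN:-1}

datasets() { zfs list -H -o name; }
if [ "$DRYRUN" -eq 1 ]; then
    datasets() { printf 'zroot\nzroot/usr\nzroot/var/log\n'; }
fi

for i in $(datasets); do
    FS=$i                                    # was: FS=$1
    FN=$(echo "${FS}" | sed -e 's@/@_@g')    # flatten / to _ for a flat filename
    if [ "$DRYRUN" -eq 1 ]; then
        echo "would send ${FS}@REPAIR_SNAP -> ${FN}"
    else
        # backuphost is a placeholder for the real destination
        sudo zfs send -vecLep "${FS}@REPAIR_SNAP" | ssh backuphost "cat > ${FN}"
    fi
done
```

With `FS=$1`, the per-dataset isolation Alexander suggests would never happen, which may explain why the first pass gave no useful signal.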

How will I know a "Problem" dataset?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-25 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


The safest / cleanest (but not fastest) is data export and pool  
re-creation. If you export dataset by dataset (instead of recursively  
all), you can even see which dataset is causing the issue. In case  
this per dataset export narrows down the issue and it is a dataset you  
don't care about (as in: 1) no issue to recreate from scratch or 2)  
there is a backup available) you could delete this (or each such)  
dataset and re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpbleK3b3rSl.pgp
Description: Digitale PGP-Signatur


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 8:07 pm, Larry Rosenman wrote:


On 02/24/2022 1:27 pm, Larry Rosenman wrote:

On 02/24/2022 10:48 am, Rob Wing wrote:

even with those set, I still get the panid. :(

Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.


I chroot'd to the pool, and built a no invariants kernel.  It booted and 
seems(!) to be running.


Is there any diagnostics/clearing the crappy ZIL?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

I tried a scrub -- it panic'd on a fatal double fault.

Suggestions?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 1:27 pm, Larry Rosenman wrote:


On 02/24/2022 10:48 am, Rob Wing wrote:


even with those set, I still get the panid. :(


Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.


I chroot'd to the pool, and built a no invariants kernel.  It booted and 
seems(!) to be running.


Is there any diagnostics/clearing the crappy ZIL?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 10:48 am, Rob Wing wrote:


Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman  wrote:

On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl 
vfs.zfs.spa.load_verify_metadata=0`.


I had a similar error the other day (couple months ago). The best I did 
was being able to import the pool read only. I ended up restoring from 
backup.


Are those tunables that I can set in loader.conf?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

even with those set, I still get the panic. :(

Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL 
system.


UGH.

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
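Following Rob's confirmation that the sysctls can also be set as boot-time tunables, a sketch of the corresponding loader.conf entries (an assumption based on the sysctl names above; whether both remain loader-settable may depend on the kernel version in question):

```
# /boot/loader.conf -- hypothetical sketch; names taken from the
# sysctls discussed above
vfs.zfs.recover=1
vfs.zfs.spa.load_verify_metadata=0
```

Setting them in loader.conf matters here because the import that panics happens at boot, before sysctl.conf is processed.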

Re: ZFS PANIC: HELP.

2022-02-24 Thread Rob Wing
Yes, I believe so.

On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman  wrote:

> On 02/24/2022 10:36 am, Rob Wing wrote:
>
> You might try setting `sysctl vfs.zfs.recover=1` and `sysctl
> vfs.zfs.spa.load_verify_metadata=0`.
>
> I had a similar error the other day (couple months ago). The best I did
> was being able to import the pool read only. I ended up restoring from
> backup.
>
>
>
> Are those tunables that I can set in loader.conf?
>
>
> --
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman



On 02/24/2022 10:36 am, Rob Wing wrote:

You might try setting `sysctl vfs.zfs.recover=1` and `sysctl 
vfs.zfs.spa.load_verify_metadata=0`.


I had a similar error the other day (couple months ago). The best I did 
was being able to import the pool read only. I ended up restoring from 
backup.






Are those tunables that I can set in loader.conf?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106

Re: ZFS PANIC: HELP.

2022-02-24 Thread Rob Wing
You might try setting `sysctl vfs.zfs.recover=1` and `sysctl
vfs.zfs.spa.load_verify_metadata=0`.

I had a similar error the other day (couple months ago). The best I did was
being able to import the pool read only. I ended up restoring from backup.

On Thu, Feb 24, 2022 at 7:30 AM Alexander Motin  wrote:

> On 24.02.2022 10:57, Larry Rosenman wrote:
> > On 02/23/2022 9:27 pm, Larry Rosenman wrote:
> >> It crashes just after root mount (this is the boot pool and only pool
> >> on the system),
> >> seeL
> >> https://www.lerctr.org/~ler/14-BOOT-Crash.png
> >
> > Where do I go from here?
>
> I see 2 ways: 1) Since it is only an assertion and 13 is working (so
> far), you may just build 14 kernel without INVARIANTS option and later
> recreate the pool when you have time.  2) You may treat it as metadata
> corruption: import pool read-only and evacuate the data.  If you have
> recent enough snapshots you may be able to easily replicate the pool
> with all the settings to some other disk.  ZIL is not replicated, so
> corruptions there should not be a problem.  If there are no snapshots,
> then either copy on file level, or you may be able to create snapshot
> for replication in 13 (on 14 without INVARIANTS), importing pool
> read-write.
>
> --
> Alexander Motin
>
>


Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman

On 02/24/2022 10:29 am, Alexander Motin wrote:

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


I see 2 ways: 1) Since it is only an assertion and 13 is working (so
far), you may just build 14 kernel without INVARIANTS option and later
recreate the pool when you have time.  2) You may treat it as metadata
corruption: import pool read-only and evacuate the data.  If you have
recent enough snapshots you may be able to easily replicate the pool
with all the settings to some other disk.  ZIL is not replicated, so
corruptions there should not be a problem.  If there are no snapshots,
then either copy on file level, or you may be able to create snapshot
for replication in 13 (on 14 without INVARIANTS), importing pool
read-write.


Ugh.  The box is a 6 disk R710, and all 6 disks are in the pool.

I do have a FreeNAS box with enough space to copy the data out.  There 
ARE snaps of MOST filesystems that are taken regularly.


The 13 I'm booting from is the 13 memstick image.

There are ~70 filesystems (IIRC) with poudriere, ports, et al.

I'm not sure how to build the 14 kernel from the 13 booted box.

Ideas?  Methods?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-24 Thread Alexander Motin

On 24.02.2022 10:57, Larry Rosenman wrote:

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


I see 2 ways: 1) Since it is only an assertion and 13 is working (so 
far), you may just build 14 kernel without INVARIANTS option and later 
recreate the pool when you have time.  2) You may treat it as metadata 
corruption: import pool read-only and evacuate the data.  If you have 
recent enough snapshots you may be able to easily replicate the pool 
with all the settings to some other disk.  ZIL is not replicated, so 
corruptions there should not be a problem.  If there are no snapshots, 
then either copy on file level, or you may be able to create snapshot 
for replication in 13 (on 14 without INVARIANTS), importing pool read-write.


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-24 Thread Larry Rosenman

On 02/23/2022 9:27 pm, Larry Rosenman wrote:

On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd expect
following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot 
at https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access 
to the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think 
you

may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on mount points topology,
whether /mnt is writable, etc.  What happen if you export it now and
try to import it in normal way on 14 without -F?


It crashes just after root mount (this is the boot pool and only pool
on the system),
seeL
https://www.lerctr.org/~ler/14-BOOT-Crash.png


Where do I go from here?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 9:15 pm, Alexander Motin wrote:

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by kernel built
with INVARIANTS option.  On 13 you may just not have that debugging
enabled to hit the issue.  But that may be only a consequence.
Original problem I guess in possibly corrupted ZFS intent log records
(or false positive), that could happen so due to use of -F recovery
option on `zpool import`, that supposed to try import pool at earlier
transaction group if there is some metadata corruption found.  It is
not supposed to work 100% and only a last resort.  Though may be that
assertion is just excessively strict for that specific recovery case.
If as you say pool can be imported and scrubbed on 13, then I'd expect
following clean export should allow later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc welcome I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting file systems?  I think you
may actually wish them to be mounted to replay their ZILs.  Just use
-R option to mount file systems in some different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on mount points topology,
whether /mnt is writable, etc.  What happen if you export it now and
try to import it in normal way on 14 without -F?


It crashes just after root mount (this is the boot pool and only pool on 
the system),

see:
https://www.lerctr.org/~ler/14-BOOT-Crash.png


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

On 23.02.2022 22:01, Larry Rosenman wrote:

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built
with the INVARIANTS option.  On 13 you may just not have that debugging
enabled, so you do not hit the issue.  But that may be only a consequence.
The original problem, I guess, is possibly corrupted ZFS intent log records
(or a false positive), which could happen due to use of the -F recovery
option on `zpool import`, which is supposed to try importing the pool at an
earlier transaction group if some metadata corruption is found.  It is not
guaranteed to work and is only a last resort.  Though maybe that assertion
is just excessively strict for that specific recovery case.  If, as you
say, the pool can be imported and scrubbed on 13, then I'd expect a
following clean export should allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I 
can export it cleanly on the 13 system?


Why do you need to import without mounting the file systems?  I think you
may actually want them mounted so that their ZILs are replayed.  Just use
the -R option to mount the file systems in a different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


This looks weird, but may possibly depend on the mount-point topology,
whether /mnt is writable, etc.  What happens if you export it now and
try to import it the normal way on 14 without -F?


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 8:58 pm, Alexander Motin wrote:

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built
with the INVARIANTS option.  On 13 you may just not have that debugging
enabled, so you do not hit the issue.  But that may be only a consequence.
The original problem, I guess, is possibly corrupted ZFS intent log records
(or a false positive), which could happen due to use of the -F recovery
option on `zpool import`, which is supposed to try importing the pool at an
earlier transaction group if some metadata corruption is found.  It is not
guaranteed to work and is only a last resort.  Though maybe that assertion
is just excessively strict for that specific recovery case.  If, as you
say, the pool can be imported and scrubbed on 13, then I'd expect a
following clean export should allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?


Why do you need to import without mounting the file systems?  I think you
may actually want them mounted so that their ZILs are replayed.  Just use
the -R option to mount the file systems in a different place.


I get the errors shown at:
https://www.lerctr.org/~ler/14-mount-R-output.png

Should I worry?  Or do something(tm) here?


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

On 23.02.2022 21:52, Larry Rosenman wrote:

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built
with the INVARIANTS option.  On 13 you may just not have that debugging
enabled, so you do not hit the issue.  But that may be only a consequence.
The original problem, I guess, is possibly corrupted ZFS intent log records
(or a false positive), which could happen due to use of the -F recovery
option on `zpool import`, which is supposed to try importing the pool at an
earlier transaction group if some metadata corruption is found.  It is not
guaranteed to work and is only a last resort.  Though maybe that assertion
is just excessively strict for that specific recovery case.  If, as you
say, the pool can be imported and scrubbed on 13, then I'd expect a
following clean export should allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to 
the console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?


Why do you need to import without mounting the file systems?  I think you 
may actually want them mounted so that their ZILs are replayed.  Just use 
the -R option to mount the file systems in a different place.


--
Alexander Motin



Re: ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman

On 02/23/2022 8:41 pm, Alexander Motin wrote:

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built
with the INVARIANTS option.  On 13 you may just not have that debugging
enabled, so you do not hit the issue.  But that may be only a consequence.
The original problem, I guess, is possibly corrupted ZFS intent log records
(or a false positive), which could happen due to use of the -F recovery
option on `zpool import`, which is supposed to try importing the pool at an
earlier transaction group if some metadata corruption is found.  It is not
guaranteed to work and is only a last resort.  Though maybe that assertion
is just excessively strict for that specific recovery case.  If, as you
say, the pool can be imported and scrubbed on 13, then I'd expect a
following clean export should allow a later import on 14 without -F.

On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.


How can I import the pool withOUT it mounting the FileSystems so I can 
export it cleanly on the 13 system?



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: ZFS PANIC: HELP.

2022-02-23 Thread Alexander Motin

Hi Larry,

The panic you are getting is an assertion, enabled by a kernel built with 
the INVARIANTS option.  On 13 you may just not have that debugging enabled, 
so you do not hit the issue.  But that may be only a consequence.  The 
original problem, I guess, is possibly corrupted ZFS intent log records (or 
a false positive), which could happen due to use of the -F recovery option 
on `zpool import`, which is supposed to try importing the pool at an 
earlier transaction group if some metadata corruption is found.  It is not 
guaranteed to work and is only a last resort.  Though maybe that assertion 
is just excessively strict for that specific recovery case.  If, as you 
say, the pool can be imported and scrubbed on 13, then I'd expect a 
following clean export should allow a later import on 14 without -F.


On 23.02.2022 21:21, Larry Rosenman wrote:


I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.




--
Alexander Motin



ZFS PANIC: HELP.

2022-02-23 Thread Larry Rosenman



I've got my main dev box that crashes on 14 with the screen shot at 
https://www.lerctr.org/~ler/14-zfs-crash.png.

Booting from a 13-REL USB installer it imports and scrubs.

Ideas?

I can either video conference with shared screen or give access to the 
console via my Dominion KVM.


Any help/ideas/etc. welcome; I really need to get this box back.


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106



Re: zfs panic when 'make buildworld buildkernel'

2020-10-14 Thread Mateusz Guzik
Please try https://svnweb.freebsd.org/changeset/base/366717

On 10/15/20, YAMAMOTO Shigeru  wrote:
>
> Thank you for your help.
>
> After updating the kernel to
> https://svnweb.freebsd.org/base?view=revision&revision=366685
>
> I tried 'make buildworld buildkernel' again.
>
> The first time, 'make buildworld buildkernel' finished.
>
> The second time, it panicked:
>
> ```
> [root@jenkins-02 ~]# uname -a
> FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD 13.0-CURRENT
> #0 r366693: Wed Oct 14 21:04:42 JST 2020
> r...@jenkins-02.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
> [root@jenkins-02 ~]#
> ```
>
> ```
> flags ()
> lock type zfs: UNLOCKED
> panic: No vop_fplookup_vexec(0xf80057f061f0, 0xfe004b286688)
> cpuid = 1
> time = 1602733349
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe004b286570
> vpanic() at vpanic+0x182/frame 0xfe004b2865c0
> panic() at panic+0x43/frame 0xfe004b286620
> VOP_FPLOOKUP_VEXEC_APV() at VOP_FPLOOKUP_VEXEC_APV+0x125/frame
> 0xfe004b286640
> cache_fplookup() at cache_fplookup+0x437/frame 0xfe004b286750
> namei() at namei+0x149/frame 0xfe004b286810
> vn_open_cred() at vn_open_cred+0x336/frame 0xfe004b286970
> kern_openat() at kern_openat+0x25a/frame 0xfe004b286ad0
> amd64_syscall() at amd64_syscall+0x135/frame 0xfe004b286bf0
> fast_syscall_common() at fast_syscall_common+0xf8/frame
> 0xfe004b286bf0
> --- syscall (499, FreeBSD ELF64, sys_openat), rip = 0x805909caa, rsp =
> 0x7fff71a8, rbp = 0x7fff7220 ---
> KDB: enter: panic
> [ thread pid 84431 tid 100664 ]
> Stopped at  kdb_enter+0x37: movq$0,0x10b0116(%rip)
> db>
> ```
>
> ```
> db> trace
> Tracing pid 84431 tid 100664 td 0xfe004b1e8000
> kdb_enter() at kdb_enter+0x37/frame 0xfe004b286570
> vpanic() at vpanic+0x19e/frame 0xfe004b2865c0
> panic() at panic+0x43/frame 0xfe004b286620
> VOP_FPLOOKUP_VEXEC_APV() at VOP_FPLOOKUP_VEXEC_APV+0x125/frame
> 0xfe004b286640
> cache_fplookup() at cache_fplookup+0x437/frame 0xfe004b286750
> namei() at namei+0x149/frame 0xfe004b286810
> vn_open_cred() at vn_open_cred+0x336/frame 0xfe004b286970
> kern_openat() at kern_openat+0x25a/frame 0xfe004b286ad0
> amd64_syscall() at amd64_syscall+0x135/frame 0xfe004b286bf0
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe004b286bf0
> --- syscall (499, FreeBSD ELF64, sys_openat), rip = 0x805909caa, rsp =
> 0x7fff71a8, rbp = 0x7fff7220 ---
> db>
> ```
>
> Thanks,
> ---
> YAMAMOTO Shigeru 
>


-- 
Mateusz Guzik 
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: zfs panic when 'make buildworld buildkernel'

2020-10-14 Thread YAMAMOTO Shigeru


Thank you for your help.

After updating the kernel to
https://svnweb.freebsd.org/base?view=revision&revision=366685

I tried 'make buildworld buildkernel' again.

The first time, 'make buildworld buildkernel' finished.

The second time, it panicked:

```
[root@jenkins-02 ~]# uname -a
FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD 13.0-CURRENT
#0 r366693: Wed Oct 14 21:04:42 JST 2020
r...@jenkins-02.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64
[root@jenkins-02 ~]#
```

```
flags ()
lock type zfs: UNLOCKED
panic: No vop_fplookup_vexec(0xf80057f061f0, 0xfe004b286688)
cpuid = 1
time = 1602733349
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe004b286570
vpanic() at vpanic+0x182/frame 0xfe004b2865c0
panic() at panic+0x43/frame 0xfe004b286620
VOP_FPLOOKUP_VEXEC_APV() at VOP_FPLOOKUP_VEXEC_APV+0x125/frame
0xfe004b286640
cache_fplookup() at cache_fplookup+0x437/frame 0xfe004b286750
namei() at namei+0x149/frame 0xfe004b286810
vn_open_cred() at vn_open_cred+0x336/frame 0xfe004b286970
kern_openat() at kern_openat+0x25a/frame 0xfe004b286ad0
amd64_syscall() at amd64_syscall+0x135/frame 0xfe004b286bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame
0xfe004b286bf0
--- syscall (499, FreeBSD ELF64, sys_openat), rip = 0x805909caa, rsp =
0x7fff71a8, rbp = 0x7fff7220 ---
KDB: enter: panic
[ thread pid 84431 tid 100664 ]
Stopped at  kdb_enter+0x37: movq$0,0x10b0116(%rip)
db>
```

```
db> trace
Tracing pid 84431 tid 100664 td 0xfe004b1e8000
kdb_enter() at kdb_enter+0x37/frame 0xfe004b286570
vpanic() at vpanic+0x19e/frame 0xfe004b2865c0
panic() at panic+0x43/frame 0xfe004b286620
VOP_FPLOOKUP_VEXEC_APV() at VOP_FPLOOKUP_VEXEC_APV+0x125/frame
0xfe004b286640
cache_fplookup() at cache_fplookup+0x437/frame 0xfe004b286750
namei() at namei+0x149/frame 0xfe004b286810
vn_open_cred() at vn_open_cred+0x336/frame 0xfe004b286970
kern_openat() at kern_openat+0x25a/frame 0xfe004b286ad0
amd64_syscall() at amd64_syscall+0x135/frame 0xfe004b286bf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe004b286bf0
--- syscall (499, FreeBSD ELF64, sys_openat), rip = 0x805909caa, rsp =
0x7fff71a8, rbp = 0x7fff7220 ---
db>
```

Thanks,
---
YAMAMOTO Shigeru 


Re: zfs panic when 'make buildworld buildkernel'

2020-10-13 Thread Mateusz Guzik
On 10/13/20, Mateusz Guzik  wrote:
> On 10/13/20, YAMAMOTO Shigeru  wrote:
>>
>> Hi,
>>
>> I am trying 'make buildworld buildkernel' in a full-ZFS environment,
>> but I can't finish buildworld/buildkernel without a panic.
>> Is anyone else seeing the same trouble?
>>
>> uname -a:
>> ```
>> FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD
>> 13.0-CURRENT
>> #0 r366657: Tue Oct 13 13:07:15 JST 2020
>> r...@jenkins-02.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>> amd64
>> ```
>>
>> panic message:
>> ```
>> login: panic: VERIFY(tid) failed
>>
>> cpuid = 2
>> time = 1602582381
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe002abaa9f0
>> vpanic() at vpanic+0x182/frame 0xfe002abaaa40
>> spl_panic() at spl_panic+0x3a/frame 0xfe002ab0
>> taskq_dispatch() at taskq_dispatch+0xe8/frame 0xfe002abaaae0
>> arc_prune_async() at arc_prune_async+0x3f/frame 0xfe002abaab00
>> arc_evict_cb() at arc_evict_cb+0x1f6/frame 0xfe002abaab60
>> zthr_procedure() at zthr_procedure+0x8f/frame 0xfe002abaabb0
>> fork_exit() at fork_exit+0x80/frame 0xfe002abaabf0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe002abaabf0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> KDB: enter: panic
>> [ thread pid 19 tid 100070 ]
>> Stopped at  kdb_enter+0x37: movq$0,0x10b0116(%rip)
>> db>
>> ```
>>
>
> The issue is pretty apparent:
>
> taskqid_t tqid = atomic_fetchadd_int(&tqidnext, 1);
>
> this eventually wraps to 0 and then you get the crash.
>
> Probably the thing to do is to bump it to 64 bits and add a zero check
> on other platforms.
>

This should do it for the time being:

diff --git a/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c
b/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c
index 1050816cd968..a8e53aba3915 100644
--- a/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c
+++ b/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c
@@ -67,7 +67,7 @@ static unsigned long tqenthash;
 static unsigned long tqenthashlock;
 static struct sx *tqenthashtbl_lock;

-static uint32_t tqidnext = 1;
+static uint32_t tqidnext;

 #defineTQIDHASH(tqid) (&tqenthashtbl[(tqid) & tqenthash])
 #defineTQIDHASHLOCK(tqid) (&tqenthashtbl_lock[((tqid) &
tqenthashlock)])
@@ -90,7 +90,6 @@ system_taskq_init(void *arg)
M_TASKQ, M_WAITOK | M_ZERO);
for (i = 0; i < tqenthashlock + 1; i++)
sx_init_flags(&tqenthashtbl_lock[i], "tqenthash", SX_DUPOK);
-   tqidnext = 1;
taskq_zone = uma_zcreate("taskq_zone", sizeof (taskq_ent_t),
NULL, NULL, NULL, NULL,
UMA_ALIGN_CACHE, 0);
@@ -137,10 +136,23 @@ taskq_lookup(taskqid_t tqid)
return (ent);
 }

+static taskqid_t
+__taskq_nextgen(void)
+{
+   taskqid_t tqid;
+
+   for (;;) {
+   tqid = atomic_fetchadd_int(&tqidnext, 1) + 1;
+   if (__predict_true(tqid != 0))
+   break;
+   }
+   return (tqid);
+}
+
 static taskqid_t
 taskq_insert(taskq_ent_t *ent)
 {
-   taskqid_t tqid = atomic_fetchadd_int(&tqidnext, 1);
+   taskqid_t tqid = __taskq_nextgen();

ent->tqent_id = tqid;
ent->tqent_registered = B_TRUE;
@@ -345,9 +357,9 @@ taskq_dispatch(taskq_t *tq, task_func_t func, void
*arg, uint_t flags)
task->tqent_cancelled = B_FALSE;
task->tqent_type = NORMAL_TASK;
tid = taskq_insert(task);
+   VERIFY(tid);
TASK_INIT(&task->tqent_task, prio, taskq_run, task);
taskqueue_enqueue(tq->tq_queue, &task->tqent_task);
-   VERIFY(tid);
return (tid);
 }

-- 
Mateusz Guzik 


Re: zfs panic when 'make buildworld buildkernel'

2020-10-13 Thread Mateusz Guzik
On 10/13/20, YAMAMOTO Shigeru  wrote:
>
> Hi,
>
> I am trying 'make buildworld buildkernel' in a full-ZFS environment,
> but I can't finish buildworld/buildkernel without a panic.
> Is anyone else seeing the same trouble?
>
> uname -a:
> ```
> FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD 13.0-CURRENT
> #0 r366657: Tue Oct 13 13:07:15 JST 2020
> r...@jenkins-02.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> amd64
> ```
>
> panic message:
> ```
> login: panic: VERIFY(tid) failed
>
> cpuid = 2
> time = 1602582381
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe002abaa9f0
> vpanic() at vpanic+0x182/frame 0xfe002abaaa40
> spl_panic() at spl_panic+0x3a/frame 0xfe002ab0
> taskq_dispatch() at taskq_dispatch+0xe8/frame 0xfe002abaaae0
> arc_prune_async() at arc_prune_async+0x3f/frame 0xfe002abaab00
> arc_evict_cb() at arc_evict_cb+0x1f6/frame 0xfe002abaab60
> zthr_procedure() at zthr_procedure+0x8f/frame 0xfe002abaabb0
> fork_exit() at fork_exit+0x80/frame 0xfe002abaabf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe002abaabf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 19 tid 100070 ]
> Stopped at  kdb_enter+0x37: movq$0,0x10b0116(%rip)
> db>
> ```
>

The issue is pretty apparent:

taskqid_t tqid = atomic_fetchadd_int(&tqidnext, 1);

this eventually wraps to 0 and then you get the crash.

Probably the thing to do is to bump it to 64 bits and add a zero check on other platforms.

-- 
Mateusz Guzik 


zfs panic when 'make buildworld buildkernel'

2020-10-13 Thread YAMAMOTO Shigeru


Hi,

I am trying 'make buildworld buildkernel' in a full-ZFS environment,
but I can't finish buildworld/buildkernel without a panic.
Is anyone else seeing the same trouble?

uname -a:
```
FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD 13.0-CURRENT
#0 r366657: Tue Oct 13 13:07:15 JST 2020
r...@jenkins-02.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64
```

panic message:
```
login: panic: VERIFY(tid) failed

cpuid = 2
time = 1602582381
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe002abaa9f0
vpanic() at vpanic+0x182/frame 0xfe002abaaa40
spl_panic() at spl_panic+0x3a/frame 0xfe002ab0
taskq_dispatch() at taskq_dispatch+0xe8/frame 0xfe002abaaae0
arc_prune_async() at arc_prune_async+0x3f/frame 0xfe002abaab00
arc_evict_cb() at arc_evict_cb+0x1f6/frame 0xfe002abaab60
zthr_procedure() at zthr_procedure+0x8f/frame 0xfe002abaabb0
fork_exit() at fork_exit+0x80/frame 0xfe002abaabf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe002abaabf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 19 tid 100070 ]
Stopped at  kdb_enter+0x37: movq$0,0x10b0116(%rip)
db>
```

The latest kernel that has been stable for me is:
```
FreeBSD jenkins-02.current.os-hackers.jp 13.0-CURRENT FreeBSD 13.0-CURRENT
#0 r363746: Sat Aug  1 14:25:06 JST 2020
r...@freebsd-00.current.os-hackers.jp:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
amd64
```

Thanks,
---
YAMAMOTO Shigeru 


Re: ZFS Panic: Current: r354843: panic: solaris assert: error || lr->lr_length <= size, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, line: 1324

2019-11-19 Thread Dennis Clarke

On 11/19/19 3:51 PM, Larry Rosenman wrote:

Ideas?  Core *IS* available, and I can give access.

Unread portion of the kernel message buffer:
panic: solaris assert: error || lr->lr_length <= size, file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, 
line: 1324

cpuid = 20
time = 1574159903

...

#16 0x00080030d7aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffe138
(kgdb)



Looks similar to:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238076

However that assert was in 
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zrlock.c and on RISC-V.


Dennis





ZFS Panic: Current: r354843: panic: solaris assert: error || lr->lr_length <= size, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, line: 1324

2019-11-19 Thread Larry Rosenman

Ideas?  Core *IS* available, and I can give access.

Unread portion of the kernel message buffer:
panic: solaris assert: error || lr->lr_length <= size, file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, 
line: 1324

cpuid = 20
time = 1574159903
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfe028c4d1920

vpanic() at vpanic+0x17e/frame 0xfe028c4d1980
panic() at panic+0x43/frame 0xfe028c4d19e0
assfail() at assfail+0x1a/frame 0xfe028c4d19f0
zfs_get_data() at zfs_get_data+0x358/frame 0xfe028c4d1a60
zil_commit_impl() at zil_commit_impl+0xfa5/frame 0xfe028c4d1bb0
zfs_sync() at zfs_sync+0xa2/frame 0xfe028c4d1bd0
sys_sync() at sys_sync+0xf5/frame 0xfe028c4d1c00
amd64_syscall() at amd64_syscall+0x29b/frame 0xfe028c4d1d30
fast_syscall_common() at fast_syscall_common+0x101/frame 
0xfe028c4d1d30
--- syscall (36, FreeBSD ELF64, sys_sync), rip = 0x80030d7aa, rsp = 
0x7fffe138, rbp = 0x7fffe260 ---

Uptime: 4h32m18s
Dumping 24794 out of 131029 
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" 
(offsetof(struct pcpu,

(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0x804bbc20 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:479
#3  0x804bc076 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:908
#4  0x804bbdd3 in panic (fmt=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:835
#5  0x8177021a in assfail (a=<optimized out>, f=<optimized out>,
    l=<optimized out>)
    at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#6  0x81418e98 in zfs_get_data (arg=<optimized out>,
    lr=0xfe0365716b60, buf=<optimized out>, lwb=0xf813d468a000,
    zio=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1324
#7  0x813e1775 in zil_lwb_commit (zilog=0xf81044baa800,
    itx=<optimized out>, lwb=0xf813d468a000)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1610
#8  zil_process_commit_list (zilog=0xf81044baa800)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2188
#9  zil_commit_writer (zilog=0xf81044baa800, zcw=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2321
#10 zil_commit_impl (zilog=<optimized out>, foid=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2835
#11 0x81415752 in zfs_sync (vfsp=<optimized out>,
    waitfor=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:331
#12 0x80593e35 in sys_sync (td=<optimized out>, uap=<optimized out>)
    at /usr/src/sys/kern/vfs_syscalls.c:142
#13 0x8080c75b in syscallenter (td=0xf816486ce000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#14 amd64_syscall (td=0xf816486ce000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1163
#15 
#16 0x00080030d7aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffe138
(kgdb)

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106


Re: ZFS panic at boot when mounting root on r330386

2018-03-04 Thread Andriy Gapon
On 05/03/2018 02:59, Bryan Drewery wrote:
>> panic: solaris assert: refcount_count(&spa->spa_refcount) > spa->spa_minref 
>> || MUTEX_HELD(&spa_namespace_lock), file: 
>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c, line: 952
>> cpuid = 10
>> time = 1520207367
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
>> 0xfe23f57a2420
>> vpanic() at vpanic+0x18d/frame 0xfe23f57a2480
>> panic() at panic+0x43/frame 0xfe23f57a24e0
>> assfail() at assfail+0x1a/frame 0xfe23f57a24f0
>> spa_close() at spa_close+0x5d/frame 0xfe23f57a2520
>> spa_get_stats() at spa_get_stats+0x481/frame 0xfe23f57a2700
>> zfs_ioc_pool_stats() at zfs_ioc_pool_stats+0x25/frame 0xfe23f57a2740
>> zfsdev_ioctl() at zfsdev_ioctl+0x76b/frame 0xfe23f57a27e0
>> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe23f57a2830
>> VOP_IOCTL_APV() at VOP_IOCTL_APV+0x102/frame 0xfe23f57a2860
>> vn_ioctl() at vn_ioctl+0x124/frame 0xfe23f57a2970
>> devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfe23f57a2990
>> kern_ioctl() at kern_ioctl+0x2c2/frame 0xfe23f57a29f0
>> sys_ioctl() at sys_ioctl+0x15c/frame 0xfe23f57a2ac0
>> amd64_syscall() at amd64_syscall+0x786/frame 0xfe23f57a2bf0
>> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe23f57a2bf0
>> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80049afda, rsp = 
>> 0x7fffbd18, rbp = 0x7fffbd90 ---
>> KDB: enter: panic
>> [ thread pid 56 tid 100606 ]
>> Stopped at  kdb_enter+0x3b: movq$0,kdb_why
>> db>
> 
> It seems like a race as I can get it to boot sometimes.

Yes, it does.  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210409

-- 
Andriy Gapon


ZFS panic at boot when mounting root on r330386

2018-03-04 Thread Bryan Drewery
> panic: solaris assert: refcount_count(&spa->spa_refcount) > spa->spa_minref 
> || MUTEX_HELD(&spa_namespace_lock), file: 
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_misc.c, line: 952
> cpuid = 10
> time = 1520207367
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe23f57a2420
> vpanic() at vpanic+0x18d/frame 0xfe23f57a2480
> panic() at panic+0x43/frame 0xfe23f57a24e0
> assfail() at assfail+0x1a/frame 0xfe23f57a24f0
> spa_close() at spa_close+0x5d/frame 0xfe23f57a2520
> spa_get_stats() at spa_get_stats+0x481/frame 0xfe23f57a2700
> zfs_ioc_pool_stats() at zfs_ioc_pool_stats+0x25/frame 0xfe23f57a2740
> zfsdev_ioctl() at zfsdev_ioctl+0x76b/frame 0xfe23f57a27e0
> devfs_ioctl() at devfs_ioctl+0xcb/frame 0xfe23f57a2830
> VOP_IOCTL_APV() at VOP_IOCTL_APV+0x102/frame 0xfe23f57a2860
> vn_ioctl() at vn_ioctl+0x124/frame 0xfe23f57a2970
> devfs_ioctl_f() at devfs_ioctl_f+0x1f/frame 0xfe23f57a2990
> kern_ioctl() at kern_ioctl+0x2c2/frame 0xfe23f57a29f0
> sys_ioctl() at sys_ioctl+0x15c/frame 0xfe23f57a2ac0
> amd64_syscall() at amd64_syscall+0x786/frame 0xfe23f57a2bf0
> fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe23f57a2bf0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80049afda, rsp = 
> 0x7fffbd18, rbp = 0x7fffbd90 ---
> KDB: enter: panic
> [ thread pid 56 tid 100606 ]
> Stopped at  kdb_enter+0x3b: movq$0,kdb_why
> db>

It seems like a race as I can get it to boot sometimes.

-- 
Regards,
Bryan Drewery





random ZFS panic...

2016-03-02 Thread Larry Rosenman
Was rebooting my laptop, and got the following:


trivet dumped core - see /var/crash/vmcore.0

Wed Mar  2 06:09:19 CST 2016

FreeBSD trivet 11.0-CURRENT FreeBSD 11.0-CURRENT #3 r296287: Tue Mar  1 
19:13:37 CST 2016 root@trivet:/usr/obj/usr/src/sys/GENERIC  amd64

panic: from debugger

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>.
<118>Terminated
<118>Mar  2 06:07:22 trivet syslogd: exiting on signal 15
<5>wlan0: link state changed to DOWN
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...0 0 0 0 done
All buffers synced.
lock order reversal:
 1st 0xf80012855d50 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1222
 2nd 0xf800128557c8 zfs_gfs (zfs_gfs) @ 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/gfs.c:494
stack backtrace:
#0 0x80a825a0 at witness_debugger+0x70
#1 0x80a824a1 at witness_checkorder+0xe71
#2 0x80a00a6b at __lockmgr_args+0xd3b
#3 0x80ace2bc at vop_stdlock+0x3c
#4 0x80fcb010 at VOP_LOCK1_APV+0x100
#5 0x80aeef9a at _vn_lock+0x9a
#6 0x820c8b13 at gfs_file_create+0x73
#7 0x820c8bbd at gfs_dir_create+0x1d
#8 0x82191f57 at zfsctl_mknode_snapdir+0x47
#9 0x820c9135 at gfs_dir_lookup+0x185
#10 0x820c961d at gfs_vop_lookup+0x1d
#11 0x82190f75 at zfsctl_root_lookup+0xf5
#12 0x82191e13 at zfsctl_umount_snapshots+0x83
#13 0x821aacfb at zfs_umount+0x7b
#14 0x80ad7c70 at dounmount+0x530
#15 0x80ae127b at vfs_unmountall+0x6b
#16 0x80ac24f9 at bufshutdown+0x3b9
#17 0x80a26fe9 at kern_reboot+0x189
lock order reversal:
 1st 0xf800125197c8 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1222
 2nd 0xf8000ffbc240 devfs (devfs) @ /usr/src/sys/kern/vfs_subr.c:2498
stack backtrace:
#0 0x80a825a0 at witness_debugger+0x70
#1 0x80a824a1 at witness_checkorder+0xe71
#2 0x80a00a6b at __lockmgr_args+0xd3b
#3 0x80ace2bc at vop_stdlock+0x3c
#4 0x80fcb010 at VOP_LOCK1_APV+0x100
#5 0x80aeef9a at _vn_lock+0x9a
#6 0x80adf553 at vget+0x63
#7 0x808fcb4d at devfs_allocv+0xcd
#8 0x808fc653 at devfs_root+0x43
#9 0x80ad7b8f at dounmount+0x44f
#10 0x80ae12d4 at vfs_unmountall+0xc4
#11 0x80ac24f9 at bufshutdown+0x3b9
#12 0x80a26fe9 at kern_reboot+0x189
#13 0x80a26e03 at sys_reboot+0x3e3
#14 0x80e7a15b at amd64_syscall+0x2db
#15 0x80e5926b at Xfast_syscall+0xfb
Uptime: 7h20m6s


Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer = 0x20:0x8215369b
stack pointer       = 0x28:0xfe02329189b0
frame pointer       = 0x28:0xfe02329189c0
code segment        = base 0x0, limit 0xfffff, type 0x1b
                    = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process     = 0 (dbu_evict)
Uptime: 7h20m7s
Dumping 2338 out of 8050 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from /usr/lib/debug//boot/kernel/coretemp.ko.debug...done.
done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/ichsmb.ko...Reading symbols from /usr/lib/debug//boot/kernel/ichsmb.ko.debug...done.
done.
Loaded symbols for /boot/kernel/ichsmb.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
Reading symbols from /boot/kernel/hwpmc.ko...Reading symbols from /usr/lib/debug//boot/kernel/hwpmc.ko.debug...done.
done.
Loaded symbols for /boot/kernel/hwpmc.ko
Reading symbols from /boot/kernel/iwn135fw.ko...Reading symbols from /usr/lib/debug//boot/kernel/iwn135fw.ko.debug...done.
done.
Loaded symbols for /boot/kernel/iwn135fw.ko
Reading symbols from /boot/kernel/aesni.ko...Reading symbols from /usr/lib/debug//boot/kernel/aesni.ko.debug...done.
done.
Loaded symbols for /boot/kernel/aesni.ko
Reading symbols from /boot

Re: ZFS panic

2015-10-01 Thread Oliver Pinter
CC+= swills

On 9/17/15, Oliver Pinter  wrote:
> Hi All!
>
> We got this panic on modified FreeBSD (we have not touched the ZFS part).
>
> panic: solaris assert: error || lr->lr_length <= zp->z_blksz, file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c,
> line: 1355
>
> [...]

ZFS panic

2015-09-17 Thread Oliver Pinter
Hi All!

We got this panic on modified FreeBSD (we have not touched the ZFS part).

panic: solaris assert: error || lr->lr_length <= zp->z_blksz, file:
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c,
line: 1355
cpuid = 6
KDB: stack backtrace:
#0 0x80639527 at kdb_backtrace+0x67
#1 0x805fd509 at vpanic+0x189
#2 0x805fd593 at panic+0x43
#3 0x802ce3aa at assfail+0x1a
#4 0x8039c391 at zfs_get_data+0x391
#5 0x803afeac at zil_commit+0x94c
#6 0x803a39d8 at zfs_freebsd_fsync+0xc8
#7 0x8089a8a7 at VOP_FSYNC_APV+0xf7
#8 0x806afc40 at sys_fsync+0x170
#9 0x808311bc at amd64_syscall+0x2bc
#10 0x8081285b at Xfast_syscall+0xfb
Uptime: 7d5h19m13s
Dumping 8207 out of 32742 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 6


(kgdb) bt
#0  doadump (textdump=<optimized out>) at pcpu.h:221
#1  0x805fcf70 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:329
#2  0x805fd548 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:626
#3  0x805fd593 in panic (fmt=0x0) at
/usr/src/sys/kern/kern_shutdown.c:557
#4  0x802ce3aa in assfail (a=<optimized out>, f=<optimized out>, l=<optimized out>) at
/usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0x8039c391 in zfs_get_data (arg=<optimized out>,
lr=<optimized out>, buf=<optimized out>,
zio=0xf8019eeb1760) at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1355
#6  0x803afeac in zil_commit (zilog=0xf8001d518800,
foid=<optimized out>) at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1107
#7  0x803a39d8 in zfs_freebsd_fsync (ap=<optimized out>)
at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2797
#8  0x8089a8a7 in VOP_FSYNC_APV (vop=<optimized out>,
a=<optimized out>) at vnode_if.c:1328
#9  0x806afc40 in sys_fsync (td=0xf8001d0429c0, uap=<optimized out>) at vnode_if.h:549
#10 0x808311bc in amd64_syscall (td=0xf8001d0429c0,
traced=0) at subr_syscall.c:139
#11 0x8081285b in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:394
#12 0x0058d23a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) f 5
#5  0x8039c391 in zfs_get_data (arg=<optimized out>,
lr=<optimized out>, buf=<optimized out>,
zio=0xf8019eeb1760) at
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1355
1355                            ASSERT(error || lr->lr_length <= zp->z_blksz);
(kgdb) l
1350                            ASSERT(db->db_offset == offset);
1351                            ASSERT(db->db_size == size);
1352
1353                            error = dmu_sync(zio, lr->lr_common.lrc_txg,
1354                                zfs_get_done, zgd);
1355                            ASSERT(error || lr->lr_length <= zp->z_blksz);
1356
1357                            /*
1358                             * On success, we need to wait for the write I/O
1359                             * initiated by dmu_sync() to complete before we can
(kgdb) p *lr
Cannot access memory at address 0xa5a5a5a5a5a5a5a5
(kgdb) p *zp
Cannot access memory at address 0xa5a5a5a5a5a5a5a5
(kgdb)


Undefined info command: "regs".  Try "help info".
(kgdb) info registers
rax            0x0                 0
rbx            0xf804aab14e00      -8776049406464
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0xfe085f78e8f0      0xfe085f78e8f0
rsp            0xfe085f78e890      0xfe085f78e890
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0x0                 0
r13            0xfe034cecd0b8      -2184847765320
r14            0x20000             131072
r15            0x0                 0
rip            0x8039c391          0x8039c391 <zfs_get_data+0x391>
eflags         0x0                 0
cs             0x0                 0
ss             0x0                 0
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

[...]
8039c2f9:   48 8b 7d b0             mov    -0x50(%rbp),%rdi
8039c2fd:   48 89 d9                mov    %rbx,%rcx
8039c300:   e8 db 50 f6 ff          callq  803013e0
8039c305:   41 89 c4                mov    %eax,%r12d
8039c308:   41 83 fc 25             cmp    $0x25,%r12d
8039c30c:   75 53                   jne    8039c361
8039c30e:   49 c7 45 00 14 00 00    movq   $0x14,0x0(%r13)
8039c315:   00
8039c316:   45 31 e4                xor    %r12d,%r12d
8039c319:   eb 29                   jmp    8039c344
8039c31b:   48 8b 3c 25 38 a4 c1    mov    0x80c1a438,%rdi
8039c322:   80
8039c323:   41 bc 02 00 00 00       mov    $0x2,%r12d
8039c329:   48 85 ff                test   %rdi,%rdi
8039c32c:

Re: [ZFS] [panic] Fatal trap 12: page fault while in kernel mode.

2015-02-12 Thread Ivan Klymenko
On Tue, 10 Feb 2015 22:01:29 +0200, Ivan Klymenko wrote:

> I do not know the conditions - it just happened.
> 
> http://pastebin.com/BASJB599

Next one:
http://pastebin.com/hY8GYpjd
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: [ZFS] [panic] Fatal trap 12: page fault while in kernel mode.

2015-02-11 Thread Ivan Klymenko
On Tue, 10 Feb 2015 22:01:29 +0200, Ivan Klymenko wrote:

> I do not know the conditions - it just happened.
> 
> http://pastebin.com/BASJB599


http://pastebin.com/8HjebJKB

http://pastebin.com/dx8PJKNr

[ZFS] [panic] Fatal trap 12: page fault while in kernel mode.

2015-02-10 Thread Ivan Klymenko
I do not know the conditions - it just happened.

http://pastebin.com/BASJB599


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-22 Thread Florian Smeets
On 20/07/14 16:03, Larry Rosenman wrote:
> 
> panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line:
> 2874
> 

This was fixed by r268980.

Florian





Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-22 Thread Xin Li

I think this is my stupid.  Feeling ashamed.

Please try r268980+ and report back if it's fixed or not, thanks!

Cheers,


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-21 Thread Larry Rosenman

On 2014-07-21 00:24, Florian Smeets wrote:

On 21/07/14 01:46, Steven Hartland wrote:

- Original Message - From: "Larry Rosenman" 
To: "Steven Hartland" 
Cc: ; 
Sent: Monday, July 21, 2014 12:22 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On 2014-07-20 18:21, Steven Hartland wrote:

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like
mine.


If you don't have a reproducible scenario I'd hold off.

Florian, is yours reproducible and can you send me
a pretty print of the crashing zio?




can you set print pretty on
and then reprint the zio?

That makes it "pretty" for Steve.
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Florian Smeets
On 21/07/14 01:46, Steven Hartland wrote:
> - Original Message - From: "Larry Rosenman" 
> To: "Steven Hartland" 
> Cc: ; 
> Sent: Monday, July 21, 2014 12:22 AM
> Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548
> 
> 
>> On 2014-07-20 18:21, Steven Hartland wrote:
>>> Can you try reverting r265321 and see if you still see the
>>> same crash?
>>>
>>>Regards
>>>Steve
>> I'll do the revert, but it's been a ONE TIME hit.
>>
>> There was a followup to mine with a reproducible poudriere crash like
>> mine.
> 
> If you don't have a reproducible scenario I'd hold off.
> 
> Florian, is yours reproducible and can you send me
> a pretty print of the crashing zio?
> 

My backtrace looks a little different.

panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
line: 2874
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
0xfe2e97f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe2e98a0
vpanic() at vpanic+0x126/frame 0xfe2e98e0
panic() at panic+0x43/frame 0xfe2e9940
assfail() at assfail+0x1d/frame 0xfe2e9950
zio_vdev_io_assess() at zio_vdev_io_assess+0x2e8/frame 0xfe2e9980
zio_execute() at zio_execute+0x1e9/frame 0xfe2e99e0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe2e9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame
0xfe2e9a70
fork_exit() at fork_exit+0x84/frame 0xfe2e9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe2e9ab0
--- trap 0, rip = 0, rsp = 0xfe2e9b70, rbp = 0 ---
KDB: enter: panic
(kgdb) where
#0  doadump (textdump=-2125462752) at pcpu.h:219
#1  0x80347655 in db_fncall (dummy1=<optimized out>,
dummy2=<optimized out>, dummy3=<optimized out>,
dummy4=<optimized out>) at /usr/src/sys/ddb/db_command.c:578
#2  0x8034733d in db_command (cmd_table=0x0)
at /usr/src/sys/ddb/db_command.c:449
#3  0x803470b4 in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:502
#4  0x80349a90 in db_trap (type=<optimized out>, code=0)
at /usr/src/sys/ddb/db_main.c:231
#5  0x80944159 in kdb_trap (type=3, code=0, tf=<optimized out>)
at /usr/src/sys/kern/subr_kdb.c:654
#6  0x80d1e532 in trap (frame=0xfe2e97d0)
at /usr/src/sys/amd64/amd64/trap.c:542
#7  0x80d01202 in calltrap ()
at /usr/src/sys/amd64/amd64/exception.S:231
#8  0x809438be in kdb_enter (why=0x80f9ce38 "panic",
msg=<optimized out>) at cpufunc.h:63
#9  0x8090bb66 in vpanic (fmt=<optimized out>,
ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:737
#10 0x8090bbd3 in panic (fmt=0x815a59a0 "\004")
at /usr/src/sys/kern/kern_shutdown.c:673
#11 0x81fb821d in assfail (a=<optimized out>,
---Type <return> to continue, or q <return> to quit---
f=<optimized out>, l=<optimized out>)
at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#12 0x81eca848 in zio_vdev_io_assess (ziop=<optimized out>)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2874
#13 0x81ec58b9 in zio_execute (zio=0xf801a8abc398)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:1416
#14 0x80954150 in taskqueue_run_locked (queue=0xf80009249b00)
at /usr/src/sys/kern/subr_taskqueue.c:356
#15 0x80954c1b in taskqueue_thread_loop (arg=<optimized out>)
at /usr/src/sys/kern/subr_taskqueue.c:623
#16 0x808d9834 in fork_exit (
callout=0x80954b80 <taskqueue_thread_loop>,
arg=0xf80003dfeed0, frame=0xfe2e9ac0)
at /usr/src/sys/kern/kern_fork.c:977
#17 0x80d0173e in fork_trampoline ()
at /usr/src/sys/amd64/amd64/exception.S:605
#18 0x in ?? ()
(kgdb) frame 12
#12 0x81eca848 in zio_vdev_io_assess (ziop=<optimized out>)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c:2874
2874            ASSERT(!(zio->io_flags & ZIO_FLAG_DELEGATED));
(kgdb) print zio
$3 = (zio_t *) 0xf801a8abc398
(kgdb) print *zio
$4 = {io_bookmark = {zb_objset = 4339, zb_object = 327827, zb_level = 0,
    zb_blkid = 0}, io_prop = {zp_checksum = ZIO_CHECKSUM_INHERIT,
    zp_compress = ZIO_COMPRESS_INHERIT, zp_type = DMU_OT_NONE,
    zp_level = 0 '\0', zp_copies = 0 '\0', zp_dedup = 0, zp_dedup_verify = 0,
    zp_nopwrite = 0}, io_type = ZIO_TYPE_WRITE,
  io_child_type = ZIO_CHILD_VDEV, io_cmd = 0,
  io_priority = ZIO_PRIORITY_ASYNC_WRITE, io_reexecute = 0 '\0',
  io_state = "\001", io_txg = 1312558, io_spa = 0xfe00022e6000,
  io_bp = 0xfe000a94a640, io_bp_override = 0x0, io_bp_copy = {blk_dva = {{
      dva_word = {1, 58754170}}, {dva_word = {1, 69614673}}, {dva_word = {0,
      0}}}, blk_prop = 9229009297394892802, blk_pad = {0, 0},
    blk_phys_birth = 0, blk_birth = 

Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland


- Original Message - 
From: "Dan Mack" 

To: "Steven Hartland" 
Cc: ; ; "Larry Rosenman" 

Sent: Monday, July 21, 2014 2:29 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On Mon, 21 Jul 2014, Steven Hartland wrote:


I just updated to I think 268921 earlier today and this is the first
time I've had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.


That doesn't look like a related trace tbh.

  Regards
  Steve


After rebooting with a dumpdev; I got this :

kbd2 at ukbd0
Trying to mount root from zfs:tank []...
panic: deadlkres: possible deadlock detected for 0xf8000e089000, blocked 
for 1801216 ticks

cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe085ef1d8d0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe085ef1d980
vpanic() at vpanic+0x126/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100070 ]
Stopped at  kdb_enter+0x3e: movq$0,kdb_why

I cannot seem to get past this yet so I'm open to suggestions.  I'm
still at the db> prompt if you'd like me to attempt to collect more
info.


Just spotted an interesting message on a recent commit which may be
relevant:


URL: http://svnweb.freebsd.org/changeset/base/268855
This specific commit makes boot hang just before mounting the root 
dataset for me when vfs.zfs.vdev.cache.size tunable is set. Unsetting 
this tunable or reverting this commit (currently running r268933 minus 
r268855) fixes the boot for me.


Please let me know if I can provide any more information.

- Nikolai Lifanov


The current code disables vdev caching by default so this will only
occur if manually enabled.

The code details the reason for this as:-
* TODO: Note that with the current ZFS code, it turns out that the
* vdev cache is not helpful, and in some cases actually harmful.  It
* is better if we disable this.  Once some time has passed, we should
* actually remove this to simplify the code.  For now we just disable
* it by setting the zfs_vdev_cache_size to zero.  Note that Solaris 11
* has made these same changes.

   Regards
   Steve



Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland


- Original Message - 
From: "Dan Mack" 

To: "Steven Hartland" 
Cc: ; ; "Larry Rosenman" 

Sent: Monday, July 21, 2014 2:29 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On Mon, 21 Jul 2014, Steven Hartland wrote:


I just updated to I think 268921 earlier today and this is the first
time I've had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.


That doesn't look like a related trace tbh.

  Regards
  Steve


After rebooting with a dumpdev; I got this :

kbd2 at ukbd0
Trying to mount root from zfs:tank []...
panic: deadlkres: possible deadlock detected for 0xf8000e089000, blocked 
for 1801216 ticks

cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe085ef1d8d0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe085ef1d980
vpanic() at vpanic+0x126/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100070 ]
Stopped at  kdb_enter+0x3e: movq$0,kdb_why

I cannot seem to get past this yet so I'm open to suggestions.  I'm
still at the db> prompt if you'd like me to attempt to collect more
info.


For some reason the deadlock detector is triggering, not sure why.

I'd recommend starting a new thread to discuss this as it doesn't
appear to be related to this thread.

The only thing I could suggest is disabling it to see if it truly
is a deadlock or if something is being really slow.
vfs.zfs.deadman_enabled=0

If this is new then it would be good for you to try and identify
which of the changes introduced it, so do a binary chop on versions
back to your last known good.

   Regards
   Steve


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Dan Mack

On Mon, 21 Jul 2014, Steven Hartland wrote:


I just updated to I think 268921 earlier today and this is the first
time I've had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.


That doesn't look like a related trace tbh.

  Regards
  Steve


After rebooting with a dumpdev; I got this :

kbd2 at ukbd0
Trying to mount root from zfs:tank []...
panic: deadlkres: possible deadlock detected for 0xf8000e089000, blocked 
for 1801216 ticks

cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe085ef1d8d0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe085ef1d980
vpanic() at vpanic+0x126/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100070 ]
Stopped at  kdb_enter+0x3e: movq$0,kdb_why

I cannot seem to get past this yet so I'm open to suggestions.  I'm
still at the db> prompt if you'd like me to attempt to collect more
info.

dan
--
Dan Mack



Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Dan Mack

On Mon, 21 Jul 2014, Steven Hartland wrote:



- Original Message - From: "Dan Mack" 


I think I may have hit the same problem; I'm going to stay connected
to the console and see if it happens again; this is what I see
currently with the back-trace:

db> bt
Tracing pid 0 tid 100070 td 0xf8000e088920
kdb_enter() at kdb_enter+0x3e/frame 0xfe085ef1d980
vpanic() at vpanic+0x146/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---

I just updated to I think 268921 earlier today and this is the first
time I've had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.


That doesn't look like a related trace tbh.

  Regards
  Steve


Awesome, something else perhaps :-)   Thanks,

dan
--
Dan Mack



Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Dan Mack

On Mon, 21 Jul 2014, Steven Hartland wrote:


- Original Message - From: "Larry Rosenman" 
To: "Steven Hartland" 
Cc: ; 
Sent: Monday, July 21, 2014 12:22 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On 2014-07-20 18:21, Steven Hartland wrote:

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like mine.


If you don't have a reproducible scenario I'd hold off.

Florian, is yours reproducible and can you send me
a pretty print of the crashing zio?

  Regards
  Steve


I think I may have hit the same problem; I'm going to stay connected to the 
console and see if it happens again; this is what I see currently with the 
back-trace:

db> bt
Tracing pid 0 tid 100070 td 0xf8000e088920
kdb_enter() at kdb_enter+0x3e/frame 0xfe085ef1d980
vpanic() at vpanic+0x146/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---

I just updated to I think 268921 earlier today and this is the first time I've 
had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.

dan
--
Dan Mack



Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland


- Original Message - 
From: "Dan Mack" 



I think I may have hit the same problem; I'm going to stay connected
to the console and see if it happens again; this is what I see
currently with the back-trace:

db> bt
Tracing pid 0 tid 100070 td 0xf8000e088920
kdb_enter() at kdb_enter+0x3e/frame 0xfe085ef1d980
vpanic() at vpanic+0x146/frame 0xfe085ef1d9c0
panic() at panic+0x43/frame 0xfe085ef1da20
deadlkres() at deadlkres+0x35c/frame 0xfe085ef1da70
fork_exit() at fork_exit+0x84/frame 0xfe085ef1dab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe085ef1dab0
--- trap 0, rip = 0, rsp = 0xfe085ef1db70, rbp = 0 ---

I just updated to I think 268921 earlier today and this is the first
time I've had a panic (HEAD-268921 that is)

I'll try to get some more data if I can get it back up and running.


That doesn't look like a related trace tbh.

   Regards
   Steve


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Larry Rosenman

On 2014-07-20 18:46, Steven Hartland wrote:

- Original Message - From: "Larry Rosenman" 
To: "Steven Hartland" 
Cc: ; 
Sent: Monday, July 21, 2014 12:22 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On 2014-07-20 18:21, Steven Hartland wrote:

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like 
mine.


If you don't have a reproducible scenario I'd hold off.

Florian, is yours reproducible and can you send me
a pretty print of the crashing zio?

   Regards
   Steve

running on the reverted kernel.

We'll see if it stays up, crashes or what.

Haven't seen the crash again regardless.


--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland
- Original Message - 
From: "Larry Rosenman" 

To: "Steven Hartland" 
Cc: ; 
Sent: Monday, July 21, 2014 12:22 AM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On 2014-07-20 18:21, Steven Hartland wrote:

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like 
mine.


If you don't have a reproducible scenario I'd hold off.

Florian, is yours reproducible and can you send me
a pretty print of the crashing zio?

   Regards
   Steve


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Larry Rosenman

On 2014-07-20 18:21, Steven Hartland wrote:

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve

I'll do the revert, but it's been a ONE TIME hit.

There was a followup to mine with a reproducible poudriere crash like 
mine.



--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: l...@lerctr.org
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland

Can you try reverting r265321 and see if you still see the
same crash?

   Regards
   Steve


Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland

Something like the following should allow you to get the zio details,
assuming the compiler hasn't optimised it out:

cd /var/crash
kgdb /boot/kernel/kernel /var/crash/vmcore.5
kgdb> frame 5
kgdb> print zio

   Regards
   Steve

- Original Message - 
From: "Larry Rosenman" 

To: "Steven Hartland" 
Cc: ; 
Sent: Sunday, July 20, 2014 8:20 PM
Subject: Re: [ZFS][PANIC] Solaris Assert/zio.c:2548



On 2014-07-20 14:18, Steven Hartland wrote:

Can you provide the details of the zio which caused the panic?

Also does any of your pools support trim?


No, on the trim.  Can you walk me through getting the zio you need?




Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Larry Rosenman

On 2014-07-20 14:18, Steven Hartland wrote:

Can you provide the details of the zio which caused the panic?

Also does any of your pools support trim?


No, on the trim.  Can you walk me through getting the zio you need?



   Regards
   Steve

- Original Message - From: "Larry Rosenman" 
To: ; 
Sent: Sunday, July 20, 2014 3:03 PM
Subject: [ZFS][PANIC] Solaris Assert/zio.c:2548


Got the following panic overnight (I think while a nightly rsync was 
running):


Dump header from device /dev/gpt/swap0
 Architecture: amd64
 Architecture Version: 2
 Dump Length: 8122101760B (7745 MB)
 Blocksize: 512
 Dumptime: Sun Jul 20 03:22:18 2014
 Hostname: borg.lerctr.org
 Magic: FreeBSD Kernel Dump
 Version String: FreeBSD 11.0-CURRENT #50 r268894M: Sat Jul 19 18:06:08 CDT 2014
   r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER
 Panic String: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), 
file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, 
line: 2874

 Dump Parity: 763150733
 Bounds: 5
 Dump Status: good


borg.lerctr.org dumped core - see /var/crash/vmcore.5

Sun Jul 20 03:28:12 CDT 2014

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 
r268894M: Sat Jul 19 18:06:08 CDT 2014 
r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64


panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 
2874


GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 
2874

cpuid = 7
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100c49f930
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100c49f9e0
vpanic() at vpanic+0x126/frame 0xfe100c49fa20
panic() at panic+0x43/frame 0xfe100c49fa80
assfail() at assfail+0x1d/frame 0xfe100c49fa90
zio_vdev_io_assess() at zio_vdev_io_assess+0x2ed/frame 0xfe100c49fac0
zio_execute() at zio_execute+0x1e9/frame 0xfe100c49fb20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe100c49fb80
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe100c49fbb0
fork_exit() at fork_exit+0x84/frame 0xfe100c49fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe100c49fbf0
--- trap 0, rip = 0, rsp = 0xfe100c49fcb0, rbp = 0 ---
Uptime: 8h57m17s
(ada2:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada2:ahcich2:0:0:0): CAM status: Command timeout
(ada2:ahcich2:0:0:0): Error 5, Retries exhausted
(ada2:ahcich2:0:0:0): Synchronize cache failed
Dumping 7745 out of 64463 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/snd_envy24ht.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_envy24ht.ko.symbols
Reading symbols from /boot/kernel/snd_spicds.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_spicds.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ichwd.ko.symbols...done.
Loaded symbols for /boot/kernel/ichwd.ko.symbols
Reading symbols from /boot/kernel/cpuctl.ko.symbols...done.
Loaded symbols for /boot/kernel/cpuctl.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/cryptodev.ko.symbols...done.
Loaded symbols for /boot/kernel/cryptodev.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/cyclic.ko.symbols...done.
Loaded symbols for /boot/kernel/cyclic.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /b

Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Steven Hartland

Can you provide the details of the zio which caused the panic?

Also does any of your pools support trim?

   Regards
   Steve

- Original Message - 
From: "Larry Rosenman" 

To: ; 
Sent: Sunday, July 20, 2014 3:03 PM
Subject: [ZFS][PANIC] Solaris Assert/zio.c:2548



Got the following panic overnight (I think while a nightly rsync was running):

Dump header from device /dev/gpt/swap0
 Architecture: amd64
 Architecture Version: 2
 Dump Length: 8122101760B (7745 MB)
 Blocksize: 512
 Dumptime: Sun Jul 20 03:22:18 2014
 Hostname: borg.lerctr.org
 Magic: FreeBSD Kernel Dump
 Version String: FreeBSD 11.0-CURRENT #50 r268894M: Sat Jul 19 18:06:08 CDT 2014
   r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER
 Panic String: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874

 Dump Parity: 763150733
 Bounds: 5
 Dump Status: good


borg.lerctr.org dumped core - see /var/crash/vmcore.5

Sun Jul 20 03:28:12 CDT 2014

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r268894M: Sat Jul 19 18:06:08 CDT 2014 
r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64


panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874


GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874

cpuid = 7
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100c49f930
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100c49f9e0
vpanic() at vpanic+0x126/frame 0xfe100c49fa20
panic() at panic+0x43/frame 0xfe100c49fa80
assfail() at assfail+0x1d/frame 0xfe100c49fa90
zio_vdev_io_assess() at zio_vdev_io_assess+0x2ed/frame 0xfe100c49fac0
zio_execute() at zio_execute+0x1e9/frame 0xfe100c49fb20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe100c49fb80
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe100c49fbb0
fork_exit() at fork_exit+0x84/frame 0xfe100c49fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe100c49fbf0
--- trap 0, rip = 0, rsp = 0xfe100c49fcb0, rbp = 0 ---
Uptime: 8h57m17s
(ada2:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada2:ahcich2:0:0:0): CAM status: Command timeout
(ada2:ahcich2:0:0:0): Error 5, Retries exhausted
(ada2:ahcich2:0:0:0): Synchronize cache failed
Dumping 7745 out of 64463 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/snd_envy24ht.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_envy24ht.ko.symbols
Reading symbols from /boot/kernel/snd_spicds.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_spicds.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ichwd.ko.symbols...done.
Loaded symbols for /boot/kernel/ichwd.ko.symbols
Reading symbols from /boot/kernel/cpuctl.ko.symbols...done.
Loaded symbols for /boot/kernel/cpuctl.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/cryptodev.ko.symbols...done.
Loaded symbols for /boot/kernel/cryptodev.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/cyclic.ko.symbols...done.
Loaded symbols for /boot/kernel/cyclic.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrac

Re: [ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Florian Smeets
On 20/07/14 16:03, Larry Rosenman wrote:

> Panic String: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED),
> file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
> line: 2874


> 
> Unread portion of the kernel message buffer: panic: solaris assert:
> !(zio->io_flags & ZIO_FLAG_DELEGATED), file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line:
> 2874 cpuid = 7 KDB: stack backtrace: db_trace_self_wrapper() at
> db_trace_self_wrapper+0x2b/frame 0xfe100c49f930 kdb_backtrace()
> at kdb_backtrace+0x39/frame 0xfe100c49f9e0 vpanic() at
> vpanic+0x126/frame 0xfe100c49fa20 panic() at panic+0x43/frame
> 0xfe100c49fa80 assfail() at assfail+0x1d/frame
> 0xfe100c49fa90 zio_vdev_io_assess() at
> zio_vdev_io_assess+0x2ed/frame 0xfe100c49fac0 zio_execute() at
> zio_execute+0x1e9/frame 0xfe100c49fb20 taskqueue_run_locked() at
> taskqueue_run_locked+0xf0/frame 0xfe100c49fb80 
> taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame
> 0xfe100c49fbb0 fork_exit() at fork_exit+0x84/frame
> 0xfe100c49fbf0 fork_trampoline() at fork_trampoline+0xe/frame
> 0xfe100c49fbf0 --- trap 0, rip = 0, rsp = 0xfe100c49fcb0, rbp
> = 0 --- Uptime: 8h57m17s (ada2:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea
> 00 00 00 00 40 00 00 00 00 00 00 (ada2:ahcich2:0:0:0): CAM status:


Same here: running poudriere, the box panics reproducibly within 2-5 seconds.

panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
line: 2874
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe2e97f0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe2e98a0
vpanic() at vpanic+0x126/frame 0xfe2e98e0
panic() at panic+0x43/frame 0xfe2e9940
assfail() at assfail+0x1d/frame 0xfe2e9950
zio_vdev_io_assess() at zio_vdev_io_assess+0x2e8/frame 0xfe2e9980
zio_execute() at zio_execute+0x1e9/frame 0xfe2e99e0
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe2e9a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe2e9a70
fork_exit() at fork_exit+0x84/frame 0xfe2e9ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe2e9ab0
--- trap 0, rip = 0, rsp = 0xfe2e9b70, rbp = 0 ---
KDB: enter: panic
[ thread pid 0 tid 100422 ]
Stopped at  kdb_enter+0x3e: movq    $0,kdb_why

Florian





[ZFS][PANIC] Solaris Assert/zio.c:2548

2014-07-20 Thread Larry Rosenman
Got the following panic overnight (I think while a nightly rsync was running):

Dump header from device /dev/gpt/swap0
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 8122101760B (7745 MB)
  Blocksize: 512
  Dumptime: Sun Jul 20 03:22:18 2014
  Hostname: borg.lerctr.org
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 11.0-CURRENT #50 r268894M: Sat Jul 19 18:06:08 CDT 
2014
r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER
  Panic String: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874
  Dump Parity: 763150733
  Bounds: 5
  Dump Status: good


borg.lerctr.org dumped core - see /var/crash/vmcore.5

Sun Jul 20 03:28:12 CDT 2014

FreeBSD borg.lerctr.org 11.0-CURRENT FreeBSD 11.0-CURRENT #50 r268894M: Sat Jul 
19 18:06:08 CDT 2014 r...@borg.lerctr.org:/usr/obj/usr/src/sys/VT-LER  amd64

panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: !(zio->io_flags & ZIO_FLAG_DELEGATED), file: 
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line: 2874
cpuid = 7
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe100c49f930
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe100c49f9e0
vpanic() at vpanic+0x126/frame 0xfe100c49fa20
panic() at panic+0x43/frame 0xfe100c49fa80
assfail() at assfail+0x1d/frame 0xfe100c49fa90
zio_vdev_io_assess() at zio_vdev_io_assess+0x2ed/frame 0xfe100c49fac0
zio_execute() at zio_execute+0x1e9/frame 0xfe100c49fb20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfe100c49fb80
taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame 0xfe100c49fbb0
fork_exit() at fork_exit+0x84/frame 0xfe100c49fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe100c49fbf0
--- trap 0, rip = 0, rsp = 0xfe100c49fcb0, rbp = 0 ---
Uptime: 8h57m17s
(ada2:ahcich2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
(ada2:ahcich2:0:0:0): CAM status: Command timeout
(ada2:ahcich2:0:0:0): Error 5, Retries exhausted
(ada2:ahcich2:0:0:0): Synchronize cache failed
Dumping 7745 out of 64463 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/linux.ko.symbols...done.
Loaded symbols for /boot/kernel/linux.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/snd_envy24ht.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_envy24ht.ko.symbols
Reading symbols from /boot/kernel/snd_spicds.ko.symbols...done.
Loaded symbols for /boot/kernel/snd_spicds.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ichsmb.ko.symbols...done.
Loaded symbols for /boot/kernel/ichsmb.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/ichwd.ko.symbols...done.
Loaded symbols for /boot/kernel/ichwd.ko.symbols
Reading symbols from /boot/kernel/cpuctl.ko.symbols...done.
Loaded symbols for /boot/kernel/cpuctl.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/cryptodev.ko.symbols...done.
Loaded symbols for /boot/kernel/cryptodev.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/cyclic.ko.symbols...done.
Loaded symbols for /boot/kernel/cyclic.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/lockstat.ko.symbols...done.
Loaded symbols for /boot/kernel/lockstat.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fas

Re: ZFS panic in -CURRENT

2014-04-15 Thread R. Tyler Croy

(follow up below)

On 04/01/2014 06:57, R. Tyler Croy wrote:

On Tue, 01 Apr 2014 09:41:45 +0300
Andriy Gapon  wrote:


on 01/04/2014 02:22 R. Tyler Croy said the following:

Bumping this with more details

On Fri, 28 Mar 2014 09:53:32 -0700
R Tyler Croy  wrote:


Apologies for the rough format here, I had to take a picture of
this failure because I didn't know what else to do.



I'm building off of the GitHub freebsd.git mirror here, and the
latest commit in the tree is neel@'s "Add an ioctl to suspend.."

My dmesg/pciconf are here:
https://gist.github.com/rtyler/1faa854dff7c4396d9e8

As linked before, the dmesg and `pciconf -lv` output can be found
here: 

Also in addition to the photo from before of the panic, here's
another reproduction photo:


Are you or have you ever been running with any ZFS-related kernel
patches?

Negative, I've never run any specific ZFS patches on this machine (or
any machine for that matter!)

One other unique clue might be that I'm running with an encrypted
zpool, other than that, nothing fancy here.



I've upgraded my machine to r264387 and I still experience the issue; 
here's the latest pretty picture of my panicked laptop :) 



The issue still seems to stem from a failed assertion in 
zap_leaf_lookup_closest() 
(http://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c?revision=249195&view=markup#l446) 
but I'm not sure which assertion might be failing.


This is somewhat problematic because I cannot perform *any* FS 
operations with the tainted directory tree, not even a `du -hcs *` to 
find out how much space I can never access again :P



I can reproduce this consistently; if anybody has the time to get onto 
IRC (rtyler on Freenode and EFNet) and debug this, I can certainly act 
as remote hands with kdb to help ascertain more information about the panic.



Cheers




Re: ZFS panic in -CURRENT

2014-04-02 Thread Andriy Gapon
on 02/04/2014 19:48 R. Tyler Croy said the following:
> On Wed, 02 Apr 2014 09:58:37 +0300
> Andriy Gapon  wrote:
> 
>> on 01/04/2014 16:57 R. Tyler Croy said the following:
>>> On Tue, 01 Apr 2014 09:41:45 +0300
>>> Andriy Gapon  wrote:
>>>
 on 01/04/2014 02:22 R. Tyler Croy said the following:
>> ...
> Also in addition to the photo from before of the panic, here's
> another reproduction photo:
> 

 Are you or have you ever been running with any ZFS-related kernel
 patches?
>>>
>>>
>>> Negative, I've never run any specific ZFS patches on this machine
>>> (or any machine for that matter!)
>>>
>>> One other unique clue might be that I'm running with an encrypted
>>> zpool, other than that, nothing fancy here.
>>
>> Your problem looks like a corruption of on-disk data.
>> I cannot say how it came to be or how to fix it now.
>>
> 
> 
> This is concerning to me, I'm using an intel 128GB SSD which is less
> than 6 months old. If there is an actual disk-level corruption,
> shouldn't that manifest itself as a zpool error?

I am afraid that this is a different kind of corruption.  Either a bug (possibly
an old one, already fixed) in ZFS, or a corruption that happened in RAM before a
buffer was sent to a disk.


-- 
Andriy Gapon


Re: ZFS panic in -CURRENT

2014-04-02 Thread R. Tyler Croy
On Wed, 02 Apr 2014 09:58:37 +0300
Andriy Gapon  wrote:

> on 01/04/2014 16:57 R. Tyler Croy said the following:
> > On Tue, 01 Apr 2014 09:41:45 +0300
> > Andriy Gapon  wrote:
> > 
> >> on 01/04/2014 02:22 R. Tyler Croy said the following:
> ...
> >>> Also in addition to the photo from before of the panic, here's
> >>> another reproduction photo:
> >>> 
> >>
>> Are you or have you ever been running with any ZFS-related kernel
>> patches?
> > 
> > 
> > Negative, I've never run any specific ZFS patches on this machine
> > (or any machine for that matter!)
> > 
> > One other unique clue might be that I'm running with an encrypted
> > zpool, other than that, nothing fancy here.
> 
> Your problem looks like a corruption of on-disk data.
> I cannot say how it came to be or how to fix it now.
> 


This is concerning to me, I'm using an intel 128GB SSD which is less
than 6 months old. If there is an actual disk-level corruption,
shouldn't that manifest itself as a zpool error?


:/

-- 

- R. Tyler Croy

--
 Code: 
  Chatter: 

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
--


Re: ZFS panic in -CURRENT

2014-04-02 Thread Andriy Gapon
on 01/04/2014 16:57 R. Tyler Croy said the following:
> On Tue, 01 Apr 2014 09:41:45 +0300
> Andriy Gapon  wrote:
> 
>> on 01/04/2014 02:22 R. Tyler Croy said the following:
...
>>> Also in addition to the photo from before of the panic, here's
>>> another reproduction photo:
>>> 
>>
>> Are you or have you ever been running with any ZFS-related kernel
>> patches?
> 
> 
> Negative, I've never run any specific ZFS patches on this machine (or
> any machine for that matter!)
> 
> One other unique clue might be that I'm running with an encrypted
> zpool, other than that, nothing fancy here.

Your problem looks like a corruption of on-disk data.
I cannot say how it came to be or how to fix it now.

-- 
Andriy Gapon


Re: ZFS panic in -CURRENT

2014-04-01 Thread R. Tyler Croy
On Tue, 01 Apr 2014 09:41:45 +0300
Andriy Gapon  wrote:

> on 01/04/2014 02:22 R. Tyler Croy said the following:
> > Bumping this with more details
> > 
> > On Fri, 28 Mar 2014 09:53:32 -0700
> > R Tyler Croy  wrote:
> > 
> >> Apologies for the rough format here, I had to take a picture of
> >> this failure because I didn't know what else to do.
> >>
> >> 
> >>
> >> I'm building off of the GitHub freebsd.git mirror here, and the
> >> latest commit in the tree is neel@'s "Add an ioctl to suspend.."
> >>
> >> My dmesg/pciconf are here:
> >> https://gist.github.com/rtyler/1faa854dff7c4396d9e8
> > 
> > 
> > As linked before, the dmesg and `pciconf -lv` output can be found
> > here: 
> > 
> > Also in addition to the photo from before of the panic, here's
> > another reproduction photo:
> > 
> 
> Are you or have you ever been running with any ZFS-related kernel
> patches?


Negative, I've never run any specific ZFS patches on this machine (or
any machine for that matter!)

One other unique clue might be that I'm running with an encrypted
zpool, other than that, nothing fancy here.



- R. Tyler Croy

--
 Code: 
  Chatter: 

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
--


Re: ZFS panic in -CURRENT

2014-03-31 Thread Andriy Gapon
on 01/04/2014 02:22 R. Tyler Croy said the following:
> Bumping this with more details
> 
> On Fri, 28 Mar 2014 09:53:32 -0700
> R Tyler Croy  wrote:
> 
>> Apologies for the rough format here, I had to take a picture of this
>> failure because I didn't know what else to do.
>>
>> 
>>
>> I'm building off of the GitHub freebsd.git mirror here, and the
>> latest commit in the tree is neel@'s "Add an ioctl to suspend.."
>>
>> My dmesg/pciconf are here:
>> https://gist.github.com/rtyler/1faa854dff7c4396d9e8
> 
> 
> As linked before, the dmesg and `pciconf -lv` output can be found here:
> 
> 
> Also in addition to the photo from before of the panic, here's another
> reproduction photo:
> 

Are you or have you ever been running with any ZFS-related kernel patches?

> I'm running -CURRENT as of r263881 right now, with a custom kernel
> which is built on top of the VT kernel
> (https://github.com/rtyler/freebsd/blob/5e324960f1f2b7079de369204fe228db4a2ec99d/sys/amd64/conf/KIWI)
> 
> I'm able to get this panic *consistently* whenever a process accesses
> my maildir folder which I sync with the mbsync program (isync package),
> such as `mbsync personal` or when I back up the maildir with duplicity.
> The commonality seems to be listing or accessing portions of this file
> tree. Curiously enough it only seems to be isolated to that single
> portion of the filesystem tree.
> 
> The zpool is also clean as far as errors go:
> 
>> [16:11:03] tyler:freebsd git:(master*) $ zpool status zroot
>>   pool: zroot
>>  state: ONLINE
>> status: Some supported features are not enabled on the pool. The pool
>> can still be used, but some features are unavailable.
>> action: Enable all features using 'zpool upgrade'. Once this is done,
>> the pool may no longer be accessible by software that does not
>> support the features. See zpool-features(7) for details.
>>   scan: scrub repaired 0 in 0h18m with 0 errors on Fri Mar 28 11:55:03
>> 2014 config:
>>
>> NAME  STATE READ WRITE CKSUM
>> zroot ONLINE   0 0 0
>>   ada0p3.eli  ONLINE   0 0 0
>>
>> errors: No known data errors
>> [16:19:57] tyler:freebsd git:(master*) $ 
> 
> 
> I'm not sure what other data would be useful here, I can consistently
> see the panic, but this data is highly personal, so I'm not sure how
> much of a "repro case" I can give folks. :(
> 
> Cheers
> 


-- 
Andriy Gapon


Re: ZFS panic in -CURRENT

2014-03-31 Thread R. Tyler Croy
Bumping this with more details

On Fri, 28 Mar 2014 09:53:32 -0700
R Tyler Croy  wrote:

> Apologies for the rough format here, I had to take a picture of this
> failure because I didn't know what else to do.
> 
> 
> 
> I'm building off of the GitHub freebsd.git mirror here, and the
> latest commit in the tree is neel@'s "Add an ioctl to suspend.."
> 
> My dmesg/pciconf are here:
> https://gist.github.com/rtyler/1faa854dff7c4396d9e8


As linked before, the dmesg and `pciconf -lv` output can be found here:


Also in addition to the photo from before of the panic, here's another
reproduction photo:


I'm running -CURRENT as of r263881 right now, with a custom kernel
which is built on top of the VT kernel
(https://github.com/rtyler/freebsd/blob/5e324960f1f2b7079de369204fe228db4a2ec99d/sys/amd64/conf/KIWI)

I'm able to get this panic *consistently* whenever a process accesses
my maildir folder which I sync with the mbsync program (isync package),
such as `mbsync personal` or when I back up the maildir with duplicity.
The commonality seems to be listing or accessing portions of this file
tree. Curiously enough it only seems to be isolated to that single
portion of the filesystem tree.

The zpool is also clean as far as errors go:

> [16:11:03] tyler:freebsd git:(master*) $ zpool status zroot
>   pool: zroot
>  state: ONLINE
> status: Some supported features are not enabled on the pool. The pool
> can still be used, but some features are unavailable.
> action: Enable all features using 'zpool upgrade'. Once this is done,
> the pool may no longer be accessible by software that does not
> support the features. See zpool-features(7) for details.
>   scan: scrub repaired 0 in 0h18m with 0 errors on Fri Mar 28 11:55:03
> 2014 config:
> 
> NAME  STATE READ WRITE CKSUM
> zroot ONLINE   0 0 0
>   ada0p3.eli  ONLINE   0 0 0
> 
> errors: No known data errors
> [16:19:57] tyler:freebsd git:(master*) $ 


I'm not sure what other data would be useful here, I can consistently
see the panic, but this data is highly personal, so I'm not sure how
much of a "repro case" I can give folks. :(

Cheers
-- 

- R. Tyler Croy

--
 Code: 
  Chatter: 

  % gpg --keyserver keys.gnupg.net --recv-key 3F51E16F
--


ZFS panic in -CURRENT

2014-03-28 Thread R Tyler Croy
Apologies for the rough format here, I had to take a picture of this failure 
because I didn't know what else to do.



I'm building off of the GitHub freebsd.git mirror here, and the latest commit 
in the tree is neel@'s "Add an ioctl to suspend.."

My dmesg/pciconf are here: https://gist.github.com/rtyler/1faa854dff7c4396d9e8


Re: ZFS panic (dn->dn_datablkshift != 0) with r256304 and send/recv

2013-10-14 Thread Keith White

On Mon, 14 Oct 2013, Andriy Gapon wrote:


on 14/10/2013 03:34 Keith White said the following:

I get the following assert failure with a recent current:

panic: solaris assert: dn->dn_datablkshift != 0, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
line: 638


Please see https://www.illumos.org/issues/4188
The current best known fix is to simply drop the assertion.
...


Thanks!  It works for me.  The receive completes, and the filesystems compare the same.

Index: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
===
--- /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
(revision 256304)
+++ /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
(working copy)
@@ -635,7 +635,7 @@
uint64_t start = off >> shift;
uint64_t end = (off + len) >> shift;

-   ASSERT(dn->dn_datablkshift != 0);
+   /* XXX may be false alarm: ASSERT(dn->dn_datablkshift != 0); XXX */
ASSERT(dn->dn_indblkshift != 0);

zio = zio_root(tx->tx_pool->dp_spa,


Though, I am not entirely sure if this will be the final solution.  I'll
double-check with Matt.



...keith
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: ZFS panic (dn->dn_datablkshift != 0) with r256304 and send/recv

2013-10-13 Thread Andriy Gapon
on 14/10/2013 03:34 Keith White said the following:
> I get the following assert failure with a recent current:
> 
> panic: solaris assert: dn->dn_datablkshift != 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
> line: 638

Please see https://www.illumos.org/issues/4188
The current best known fix is to simply drop the assertion.
Though, I am not entirely sure if this will be the final solution.  I'll
double-check with Matt.

> # uname -a
> FreeBSD freebsd10 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r256304: Thu Oct 10
> 19:38:55 EDT 2013 kwhite@freebsd10:/tank/obj/usr/src/sys/GENERIC  amd64
> 
> # kgdb /boot/kernel/kernel /var/crash/vmcore.last
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> ...
> <118># zfs send -vi tank/RPI@20131004 tank/RPI@20131013 | zfs recv -vF
> m_tank/RPI@20131013
> <118>send from @20131004 to tank/RPI@20131013 estimated size is 85.0M
> <118>total estimated size is 85.0M
> <118>TIMESENT   SNAPSHOT
> <118>receiving incremental stream of tank/RPI@20131013 into 
> m_tank/RPI@20131013
> <118>19:45:12   5.90M   tank/RPI@20131013
> <118>19:45:13   36.4M   tank/RPI@20131013
> <118>19:45:15   38.4M   tank/RPI@20131013
> <118>19:45:16   41.3M   tank/RPI@20131013
> panic: solaris assert: dn->dn_datablkshift != 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
> line: 638
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00977711a0
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0097771250
> vpanic() at vpanic+0x126/frame 0xfe0097771290
> panic() at panic+0x43/frame 0xfe00977712f0
> assfail() at assfail+0x22/frame 0xfe0097771300
> dmu_tx_hold_free() at dmu_tx_hold_free+0x162/frame 0xfe00977713e0
> dmu_free_long_range() at dmu_free_long_range+0x244/frame 0xfe0097771450
> dmu_free_long_object() at dmu_free_long_object+0x1f/frame 0xfe0097771480
> dmu_recv_stream() at dmu_recv_stream+0x86e/frame 0xfe00977716b0
> zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfe0097771920
> zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfe00977719c0
> devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfe0097771a20
> kern_ioctl() at kern_ioctl+0x2ca/frame 0xfe0097771a90
> sys_ioctl() at sys_ioctl+0x11f/frame 0xfe0097771ae0
> amd64_syscall() at amd64_syscall+0x265/frame 0xfe0097771bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe0097771bf0
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019f49ca, rsp =
> 0x7fff9438, rbp = 0x7fff94c0 ---
> KDB: enter: panic
> Uptime: 7m30s
> ...
> (kgdb) where
> #0  doadump (textdump=1) at pcpu.h:219
> #1  0x808b88b7 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:447
> #2  0x808b8dc5 in vpanic (fmt=, ap= optimized
> out>) at /usr/src/sys/kern/kern_shutdown.c:754
> #3  0x808b8e13 in panic (fmt=) at
> /usr/src/sys/kern/kern_shutdown.c:683
> #4  0x81dd1222 in assfail (a=, f= out>, l=) at
> /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
> 
> #5  0x81c847c2 in dmu_tx_hold_free (tx=0xf800118bda00, 
> object= optimized out>, off=, len=)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:638
> 
> #6  0x81c78124 in dmu_free_long_range (os=0xf8000580f000,
> object=, offset=0, length=18446744073709551615)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:654
> #7  0x81c781df in dmu_free_long_object (os=0xf8000580f000,
> object=66055) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:700
> #8  0x81c7c39e in dmu_recv_stream (drc=0xfe0097771728, fp= optimized out>, voffp=0xfe0097771718, cleanup_fd=8, action_handlep= optimized out>)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:1289
> 
> #9  0x81d0a1fc in zfs_ioc_recv (zc=0xfe000183) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4102
> 
> #10 0x81d054ea in zfsdev_ioctl (dev=, zcmd= optimized out>, arg=, flag=, 
> td= optimized out>)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5960
> 
> #11 0x807b0d40 in devfs_ioctl_f (fp=0xf80005304dc0, 
> com=3222821403,
> data=0xf800027abc40, cred=, td=0xf8000524f000) at
> /usr/src/sys/fs/devfs/devfs_vnops.c:757
> #12 0x809

ZFS panic (dn->dn_datablkshift != 0) with r256304 and send/recv

2013-10-13 Thread Keith White

I get the following assert failure with a recent current:

panic: solaris assert: dn->dn_datablkshift != 0, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
 line: 638

# uname -a
FreeBSD freebsd10 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r256304: Thu Oct 10 
19:38:55 EDT 2013 kwhite@freebsd10:/tank/obj/usr/src/sys/GENERIC  amd64

# kgdb /boot/kernel/kernel /var/crash/vmcore.last
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
...
<118># zfs send -vi tank/RPI@20131004 tank/RPI@20131013 | zfs recv -vF 
m_tank/RPI@20131013
<118>send from @20131004 to tank/RPI@20131013 estimated size is 85.0M
<118>total estimated size is 85.0M
<118>TIMESENT   SNAPSHOT
<118>receiving incremental stream of tank/RPI@20131013 into m_tank/RPI@20131013
<118>19:45:12   5.90M   tank/RPI@20131013
<118>19:45:13   36.4M   tank/RPI@20131013
<118>19:45:15   38.4M   tank/RPI@20131013
<118>19:45:16   41.3M   tank/RPI@20131013
panic: solaris assert: dn->dn_datablkshift != 0, file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c,
 line: 638
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00977711a0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe0097771250
vpanic() at vpanic+0x126/frame 0xfe0097771290
panic() at panic+0x43/frame 0xfe00977712f0
assfail() at assfail+0x22/frame 0xfe0097771300
dmu_tx_hold_free() at dmu_tx_hold_free+0x162/frame 0xfe00977713e0
dmu_free_long_range() at dmu_free_long_range+0x244/frame 0xfe0097771450
dmu_free_long_object() at dmu_free_long_object+0x1f/frame 0xfe0097771480
dmu_recv_stream() at dmu_recv_stream+0x86e/frame 0xfe00977716b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfe0097771920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfe00977719c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfe0097771a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfe0097771a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfe0097771ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfe0097771bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe0097771bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019f49ca, rsp = 
0x7fff9438, rbp = 0x7fff94c0 ---
KDB: enter: panic
Uptime: 7m30s
...
(kgdb) where
#0  doadump (textdump=1) at pcpu.h:219
#1  0x808b88b7 in kern_reboot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:447
#2  0x808b8dc5 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:754
#3  0x808b8e13 in panic (fmt=) at 
/usr/src/sys/kern/kern_shutdown.c:683
#4  0x81dd1222 in assfail (a=, f=, 
l=) at 
/usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0x81c847c2 in dmu_tx_hold_free (tx=0xf800118bda00, object=, off=, len=)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:638
#6  0x81c78124 in dmu_free_long_range (os=0xf8000580f000, 
object=, offset=0, length=18446744073709551615)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:654
#7  0x81c781df in dmu_free_long_object (os=0xf8000580f000, 
object=66055) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c:700
#8  0x81c7c39e in dmu_recv_stream (drc=0xfe0097771728, fp=, voffp=0xfe0097771718, cleanup_fd=8, action_handlep=)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:1289
#9  0x81d0a1fc in zfs_ioc_recv (zc=0xfe000183) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4102
#10 0x81d054ea in zfsdev_ioctl (dev=, zcmd=, 
arg=, flag=, td=)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5960
#11 0x807b0d40 in devfs_ioctl_f (fp=0xf80005304dc0, com=3222821403, 
data=0xf800027abc40, cred=, td=0xf8000524f000) at 
/usr/src/sys/fs/devfs/devfs_vnops.c:757
#12 0x8090ffea in kern_ioctl (td=0xf8000524f000, fd=, com=0) at file.h:319
#13 0x8090fccf in sys_ioctl (td=0xf8000524f000, 
uap=0xfe0097771b80) at /usr/src/sys/kern/sys_generic.c:698
#14 0x80cb2e05 in amd64_syscall (td=0xf8000524f000, traced=0) at 
subr_syscall.c:134
#15 0x80c979fb in Xfast_syscall () at 
/usr/src/sys/amd64/amd64/exception.S:391
#16 0x0008019f49ca in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)

...keith

Re: ZFS panic with r255937

2013-10-03 Thread Keith White

On Thu, 3 Oct 2013, Andriy Gapon wrote:


on 02/10/2013 20:59 Keith White said the following:

On Wed, 2 Oct 2013, Andriy Gapon wrote:


on 30/09/2013 02:11 kwh...@site.uottawa.ca said the following:

Sorry, debugging this is *way* beyond me.  Any hints, patches to try?


Please share the stack trace.

--
Andriy Gapon


There's now a pr for this panic: kern/182570

Here's the stack trace:

root@freebsd10:/usr/src # kgdb /boot/kernel/kernel /var/crash/vmcore.last
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: dn->dn_maxblkid == 0 &&
(BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)), file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
line: 598
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00992b3280
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe00992b3330
vpanic() at vpanic+0x126/frame 0xfe00992b3370
panic() at panic+0x43/frame 0xfe00992b33d0
assfail() at assfail+0x22/frame 0xfe00992b33e0
dnode_reallocate() at dnode_reallocate+0x225/frame 0xfe00992b3430
dmu_object_reclaim() at dmu_object_reclaim+0x123/frame 0xfe00992b3480
dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfe00992b36b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfe00992b3920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfe00992b39c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfe00992b3a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfe00992b3a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfe00992b3ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfe00992b3bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe00992b3bf0



Thank you very much.
To me this looks very similar to a problem discovered and fixed in illumos some
time ago.  Please check if the following change improves the situation for you.

https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659

Raw:
https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659.patch
...


Yes, it does.  send/recv completes with no panic.  That patch fixes kern/182570 
for me.

Thanks!

...keith

Once the patch is applied "svn diff" gives me:

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c(revision 
255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c(working copy)
@@ -677,6 +677,16 @@
if (err != 0)
return (err);
err = dmu_free_long_range_impl(os, dn, offset, length);
+
+   /*
+* It is important to zero out the maxblkid when freeing the entire
+* file, so that (a) subsequent calls to dmu_free_long_range_impl()
+* will take the fast path, and (b) dnode_reallocate() can verify
+* that the entire file has been freed.
+*/
+   if (offset == 0 && length == DMU_OBJECT_END)
+   dn->dn_maxblkid = 0;
+
dnode_rele(dn, FTAG);
return (err);
 }
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c (revision 
255986)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c (working copy)
@@ -616,7 +616,7 @@
 */
if (dn->dn_datablkshift == 0) {
if (off != 0 || len < dn->dn_datablksz)
-   dmu_tx_count_write(txh, off, len);
+   dmu_tx_count_write(txh, 0, dn->dn_datablksz);
} else {
/* first block will be modified if it is not aligned */
if (!IS_P2ALIGNED(off, 1 << dn->dn_datablkshift))



Re: ZFS panic with r255937

2013-10-02 Thread Andriy Gapon
on 02/10/2013 20:59 Keith White said the following:
> On Wed, 2 Oct 2013, Andriy Gapon wrote:
> 
>> on 30/09/2013 02:11 kwh...@site.uottawa.ca said the following:
>>> Sorry, debugging this is *way* beyond me.  Any hints, patches to try?
>>
>> Please share the stack trace.
>>
>> -- 
>> Andriy Gapon
> 
> There's now a pr for this panic: kern/182570
> 
> Here's the stack trace:
> 
> root@freebsd10:/usr/src # kgdb /boot/kernel/kernel /var/crash/vmcore.last
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: solaris assert: dn->dn_maxblkid == 0 &&
> (BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)), file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
> line: 598
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00992b3280
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe00992b3330
> vpanic() at vpanic+0x126/frame 0xfe00992b3370
> panic() at panic+0x43/frame 0xfe00992b33d0
> assfail() at assfail+0x22/frame 0xfe00992b33e0
> dnode_reallocate() at dnode_reallocate+0x225/frame 0xfe00992b3430
> dmu_object_reclaim() at dmu_object_reclaim+0x123/frame 0xfe00992b3480
> dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfe00992b36b0
> zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfe00992b3920
> zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfe00992b39c0
> devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfe00992b3a20
> kern_ioctl() at kern_ioctl+0x2ca/frame 0xfe00992b3a90
> sys_ioctl() at sys_ioctl+0x11f/frame 0xfe00992b3ae0
> amd64_syscall() at amd64_syscall+0x265/frame 0xfe00992b3bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe00992b3bf0


Thank you very much.
To me this looks very similar to a problem discovered and fixed in illumos some
time ago.  Please check if the following change improves the situation for you.

https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659

Raw:
https://github.com/avg-I/freebsd/commit/a7e7dece215bc5d00077e9c7f4db34d9e5c30659.patch

> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019dc9ca, rsp =
> 0x7fff5ad8, rbp = 0x7fff5b60 ---
> KDB: enter: panic
> Uptime: 37m10s
> Dumping 874 out of 2023 MB:
> 
> ...
> 
> (kgdb) where
> #0  doadump (textdump=1) at pcpu.h:218
> #1  0x808b7217 in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:447
> #2  0x808b7725 in vpanic (fmt=, ap= optimized
> out>) at /usr/src/sys/kern/kern_shutdown.c:754
> #3  0x808b7773 in panic (fmt=) at
> /usr/src/sys/kern/kern_shutdown.c:683
> #4  0x81e5 in assfail (a=, f= out>, l=) at
> /usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
> 
> #5  0x81d09735 in dnode_reallocate (dn=0xf8006dde3000,
> ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=1024, bonustype=DMU_OT_SA,
> bonuslen=168, tx=0xf8006d7a2600)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:596
> 
> #6  0x81cff463 in dmu_object_reclaim (os=,
> object=, ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=1024,
> bonustype=DMU_OT_SA, bonuslen=168)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c:155
> 
> #7  0x81cfd849 in dmu_recv_stream (drc=0xfe00992b3728, fp= optimized out>, voffp=0xfe00992b3718, cleanup_fd=8, action_handlep= optimized out>)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:1231
> 
> #8  0x81d8b1fc in zfs_ioc_recv (zc=0xfe00372e1000) at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4102
> 
> #9  0x81d864ea in zfsdev_ioctl (dev=, zcmd= optimized out>, arg=, flag=, 
> td= optimized out>)
> at
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5960
> 
> #10 0x807af840 in devfs_ioctl_f (fp=0xf800052e1460, 
> com=3222821403,
> data=0xf8006ff1faa0, cred=, td=0xf8003f7ba920) at
> /usr/src/sys/fs/devfs/devfs_vnops.c:757
> #11 0x8090e94a in kern_ioctl (td=0xf8003f7ba920, fd= optimized
> out>, com=0) at file.h:319
> #12 0x8090e62f in sys_ioctl (td=0xf8003f7ba920,
> uap=0xfe00992b3b80) at /usr/src/sys/kern/sys_generic.c:698
> #13 0x80caee35 in amd64_syscall (td=0xf8003f7ba920, traced=0) at
> subr_syscall.c:134
> #14 0x80c961ab in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:391
> #15 0x0008019dc9ca in ?? (

Re: ZFS panic with r255937

2013-10-02 Thread Keith White

On Wed, 2 Oct 2013, Andriy Gapon wrote:


on 30/09/2013 02:11 kwh...@site.uottawa.ca said the following:

Sorry, debugging this is *way* beyond me.  Any hints, patches to try?


Please share the stack trace.

--
Andriy Gapon


There's now a pr for this panic: kern/182570

Here's the stack trace:

root@freebsd10:/usr/src # kgdb /boot/kernel/kernel /var/crash/vmcore.last
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: solaris assert: dn->dn_maxblkid == 0 && 
(BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)), file: 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c, line: 598
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00992b3280
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe00992b3330
vpanic() at vpanic+0x126/frame 0xfe00992b3370
panic() at panic+0x43/frame 0xfe00992b33d0
assfail() at assfail+0x22/frame 0xfe00992b33e0
dnode_reallocate() at dnode_reallocate+0x225/frame 0xfe00992b3430
dmu_object_reclaim() at dmu_object_reclaim+0x123/frame 0xfe00992b3480
dmu_recv_stream() at dmu_recv_stream+0xd79/frame 0xfe00992b36b0
zfs_ioc_recv() at zfs_ioc_recv+0x96c/frame 0xfe00992b3920
zfsdev_ioctl() at zfsdev_ioctl+0x54a/frame 0xfe00992b39c0
devfs_ioctl_f() at devfs_ioctl_f+0xf0/frame 0xfe00992b3a20
kern_ioctl() at kern_ioctl+0x2ca/frame 0xfe00992b3a90
sys_ioctl() at sys_ioctl+0x11f/frame 0xfe00992b3ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xfe00992b3bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe00992b3bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8019dc9ca, rsp = 
0x7fff5ad8, rbp = 0x7fff5b60 ---
KDB: enter: panic
Uptime: 37m10s
Dumping 874 out of 2023 MB:

...

(kgdb) where
#0  doadump (textdump=1) at pcpu.h:218
#1  0x808b7217 in kern_reboot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:447
#2  0x808b7725 in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:754
#3  0x808b7773 in panic (fmt=) at 
/usr/src/sys/kern/kern_shutdown.c:683
#4  0x81e5 in assfail (a=, f=, 
l=) at 
/usr/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81
#5  0x81d09735 in dnode_reallocate (dn=0xf8006dde3000, 
ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=1024, bonustype=DMU_OT_SA, 
bonuslen=168, tx=0xf8006d7a2600)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:596
#6  0x81cff463 in dmu_object_reclaim (os=, 
object=, ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=1024, 
bonustype=DMU_OT_SA, bonuslen=168)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_object.c:155
#7  0x81cfd849 in dmu_recv_stream (drc=0xfe00992b3728, fp=, voffp=0xfe00992b3718, cleanup_fd=8, action_handlep=)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_send.c:1231
#8  0x81d8b1fc in zfs_ioc_recv (zc=0xfe00372e1000) at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:4102
#9  0x81d864ea in zfsdev_ioctl (dev=, zcmd=, 
arg=, flag=, td=)
at 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5960
#10 0x807af840 in devfs_ioctl_f (fp=0xf800052e1460, com=3222821403, 
data=0xf8006ff1faa0, cred=, td=0xf8003f7ba920) at 
/usr/src/sys/fs/devfs/devfs_vnops.c:757
#11 0x8090e94a in kern_ioctl (td=0xf8003f7ba920, fd=, com=0) at file.h:319
#12 0x8090e62f in sys_ioctl (td=0xf8003f7ba920, 
uap=0xfe00992b3b80) at /usr/src/sys/kern/sys_generic.c:698
#13 0x80caee35 in amd64_syscall (td=0xf8003f7ba920, traced=0) at 
subr_syscall.c:134
#14 0x80c961ab in Xfast_syscall () at 
/usr/src/sys/amd64/amd64/exception.S:391
#15 0x0008019dc9ca in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)


Here's a how-to-repeat:

Assuming pool "tank" and non-existent tank/nobj tank/xobj

=== cut here ===
#!/bin/sh

zfs create tank/nobj
zfs snapshot tank/nobj@0
__MAKECONF=/dev/null SRCCONF=/dev/null MAKEOBJDIRPREFIX=/tank/nobj make -j6 
buildkernel
zfs snapshot tank/nobj@1
#find /tank/nobj -name '.h' -print | xargs rm
zfs snapshot tank/nobj@2
#rm -rf /tank/nobj/*
zfs snapshot tank/nobj@3
__MAKECONF=/dev/null SRCCONF=/dev/null MAKEOBJDIRPREFIX=/tank/nobj make -j6 
buildkernel
zfs snapshot tank/nobj@4
#rm -rf /tank/nobj/*
zfs snapshot tank/nobj@5

zfs send -vR tank/nobj@5 | zfs recv -vF 

Re: ZFS panic with r255937

2013-10-02 Thread Andriy Gapon
on 30/09/2013 02:11 kwh...@site.uottawa.ca said the following:
> Sorry, debugging this is *way* beyond me.  Any hints, patches to try?

Please share the stack trace.

-- 
Andriy Gapon


ZFS panic with r255937

2013-09-29 Thread kwhite
I get the following reproducible panic when doing a zfs send -R | zfs recv
of a well-churned file system (snapshots of the ports tree before and
after libiconv update) running a recent current:

panic: solaris assert: dn->dn_maxblkid == 0 &&
(BP_IS_HOLE(&dn->dn_phys->dn_blkptr[0]) || dnode_block_freed(dn, 0)),
file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c,
line: 598

coredump available.

# uname -a
FreeBSD freebsd10 10.0-ALPHA4 FreeBSD 10.0-ALPHA4 #0 r255937: Sun Sep 29
09:45:21 EDT 2013 kwhite@freebsd10:/usr/obj/usr/src/sys/GENERIC  amd64


# zfs list -t snapshot -r tank/ports
NAME  USED  AVAIL  REFER  MOUNTPOINT
tank/ports@20130915  72.5M  -  3.64G  -
tank/ports@20130917  0  -  3.65G  -
tank/ports@20130918  0  -  3.65G  -
tank/ports@20130921  88.6M  -  3.66G  -
tank/ports@20130928   352K  -  3.66G  -

# zpool list -v
NAME SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
m_tank  1.81T   831G  1.00T44%  1.00x  ONLINE  -
  ada4.nop.eli  1.81T   831G  1.00T -
tank 928G   831G  96.9G89%  1.00x  ONLINE  -
  ada3.nop.eli   928G   831G  96.9G -

# zfs send -vR tank/ports@20130928 | zfs recv -vF m_tank/xports
... panic eventually ...

# cd /boot/kernel; kgdb kernel /var/crash/vmcore.last
...
(kgdb) up
#5  0x81d09735 in dnode_reallocate (dn=0xf8006dde3000,
ot=DMU_OT_PLAIN_FILE_CONTENTS, blocksize=1024, bonustype=DMU_OT_SA,
bonuslen=168, tx=0xf8006d7a2600)
at
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c:596

596 ASSERT(dn->dn_maxblkid == 0 &&

(kgdb) p dn->dn_maxblkid
$1 = 2

So, no question about why the ASSERT failed.

Sorry, debugging this is *way* beyond me.  Any hints, patches to try?

...keith



Re: ZFS panic with concurrent recv and read-heavy workload

2011-06-08 Thread Marius Strobl
On Fri, Jun 03, 2011 at 03:03:56AM -0400, Nathaniel W Filardo wrote:
> I just got this on another machine, no heavy workload needed, just booting
> and starting some jails.  Of interest, perhaps, both this and the machine
> triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
> confess that the machine in the original report may have had bad RAM).  I
> have run a UP 1.2GHz V240 for months and never seen this panic.
> 
> This time the kernel is
> > FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
> csup'd immediately before building.  The full panic this time is
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > l2arc_feed_thread() at l2arc_feed_thread+0xeac
> > fork_exit() at fork_exit+0x9c
> > fork_trampoline() at fork_trampoline+0x8
> >
> > SC Alert: SC Request to send Break to host.
> > KDB: enter: Line break on console
> > [ thread pid 27 tid 100121 ]
> > Stopped at  kdb_enter+0x80: ta  %xcc, 1
> > db> reset
> > ttiimmeeoouutt  sshhuuiinngg  ddoowwnn  CCPPUUss..
> 
> Half of the memory in this machine is new (well, came with the machine) and
> half is from the aforementioned UP V240 which seemed to work fine (I was
> attempting an upgrade when this happened); none of it (or indeed any of the
> hardware save the disk controller and disks) are common between this and the
> machine reporting below.
> 
> Thoughts?  Any help would be greatly appreciated.
> Thanks.
> --nwf;
> 
> On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
> >[...]
> > panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ 
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
> >
> > cpuid = 1
> > KDB: stack backtrace:
> > panic() at panic+0x1c8
> > _sx_assert() at _sx_assert+0xc4
> > _sx_xunlock() at _sx_xunlock+0x98
> > arc_evict() at arc_evict+0x614
> > arc_get_data_buf() at arc_get_data_buf+0x360
> > arc_buf_alloc() at arc_buf_alloc+0x94
> > dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
> > dmu_write() at dmu_write+0xec
> > dmu_recv_stream() at dmu_recv_stream+0x8a8
> > zfs_ioc_recv() at zfs_ioc_recv+0x354
> > zfsdev_ioctl() at zfsdev_ioctl+0xe0
> > devfs_ioctl_f() at devfs_ioctl_f+0xe8
> > kern_ioctl() at kern_ioctl+0x294
> > ioctl() at ioctl+0x198
> > syscallenter() at syscallenter+0x270
> > syscall() at syscall+0x74
> > -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
> > userland() at 0x40e72cc8
> > user trace: trap %o7=0x40c13e24
> > pc 0x40e72cc8, sp 0x7fd4641
> > pc 0x40c158f4, sp 0x7fd4721
> > pc 0x40c1e878, sp 0x7fd47f1
> > pc 0x40c1ce54, sp 0x7fd8b01
> > pc 0x40c1dbe0, sp 0x7fd9431
> > pc 0x40c1f718, sp 0x7fdd741
> > pc 0x10731c, sp 0x7fdd831
> > pc 0x10c90c, sp 0x7fdd8f1
> > pc 0x103ef0, sp 0x7fde1d1
> > pc 0x4021aff4, sp 0x7fde291
> > done
> >[...]

Apparently this is a locking issue in the ARC code; the ZFS people should
be able to help you.

Marius



Re: ZFS panic with concurrent recv and read-heavy workload

2011-06-03 Thread Nathaniel W Filardo
I just got this on another machine, no heavy workload needed, just booting
and starting some jails.  Of interest, perhaps, both this and the machine
triggering the below panic are SMP V240s with 1.5GHz CPUs (though I will
confess that the machine in the original report may have had bad RAM).  I
have run a UP 1.2GHz V240 for months and never seen this panic.

This time the kernel is
> FreeBSD 9.0-CURRENT #9: Fri Jun  3 02:32:13 EDT 2011
csup'd immediately before building.  The full panic this time is
> panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4659
>
> cpuid = 1
> KDB: stack backtrace:
> panic() at panic+0x1c8
> _sx_assert() at _sx_assert+0xc4
> _sx_xunlock() at _sx_xunlock+0x98
> l2arc_feed_thread() at l2arc_feed_thread+0xeac
> fork_exit() at fork_exit+0x9c
> fork_trampoline() at fork_trampoline+0x8
>
> SC Alert: SC Request to send Break to host.
> KDB: enter: Line break on console
> [ thread pid 27 tid 100121 ]
> Stopped at  kdb_enter+0x80: ta  %xcc, 1
> db> reset
> ttiimmeeoouutt  sshhuuiinngg  ddoowwnn  CCPPUUss..

Half of the memory in this machine is new (well, came with the machine) and
half is from the aforementioned UP V240 which seemed to work fine (I was
attempting an upgrade when this happened); none of it (or indeed any of the
hardware save the disk controller and disks) are common between this and the
machine reporting below.

Thoughts?  Any help would be greatly appreciated.
Thanks.
--nwf;

On Wed, Apr 06, 2011 at 04:00:43AM -0400, Nathaniel W Filardo wrote:
>[...]
> panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ 
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
>
> cpuid = 1
> KDB: stack backtrace:
> panic() at panic+0x1c8
> _sx_assert() at _sx_assert+0xc4
> _sx_xunlock() at _sx_xunlock+0x98
> arc_evict() at arc_evict+0x614
> arc_get_data_buf() at arc_get_data_buf+0x360
> arc_buf_alloc() at arc_buf_alloc+0x94
> dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
> dmu_write() at dmu_write+0xec
> dmu_recv_stream() at dmu_recv_stream+0x8a8
> zfs_ioc_recv() at zfs_ioc_recv+0x354
> zfsdev_ioctl() at zfsdev_ioctl+0xe0
> devfs_ioctl_f() at devfs_ioctl_f+0xe8
> kern_ioctl() at kern_ioctl+0x294
> ioctl() at ioctl+0x198
> syscallenter() at syscallenter+0x270
> syscall() at syscall+0x74
> -- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
> userland() at 0x40e72cc8
> user trace: trap %o7=0x40c13e24
> pc 0x40e72cc8, sp 0x7fd4641
> pc 0x40c158f4, sp 0x7fd4721
> pc 0x40c1e878, sp 0x7fd47f1
> pc 0x40c1ce54, sp 0x7fd8b01
> pc 0x40c1dbe0, sp 0x7fd9431
> pc 0x40c1f718, sp 0x7fdd741
> pc 0x10731c, sp 0x7fdd831
> pc 0x10c90c, sp 0x7fdd8f1
> pc 0x103ef0, sp 0x7fde1d1
> pc 0x4021aff4, sp 0x7fde291
> done
>[...]




ZFS panic with concurrent recv and read-heavy workload

2011-04-06 Thread Nathaniel W Filardo
When racing two workloads, one doing
>  zfs recv -v -d testpool
and the other
>  find /testpool -type f -print0 | xargs -0 sha1
I can (seemingly reliably) trigger this panic:

panic: Lock buf_hash_table.ht_locks[i].ht_lock not exclusively locked @ 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1869
cpuid = 1
KDB: stack backtrace:
panic() at panic+0x1c8
_sx_assert() at _sx_assert+0xc4
_sx_xunlock() at _sx_xunlock+0x98
arc_evict() at arc_evict+0x614
arc_get_data_buf() at arc_get_data_buf+0x360
arc_buf_alloc() at arc_buf_alloc+0x94
dmu_buf_will_fill() at dmu_buf_will_fill+0xfc
dmu_write() at dmu_write+0xec
dmu_recv_stream() at dmu_recv_stream+0x8a8
zfs_ioc_recv() at zfs_ioc_recv+0x354
zfsdev_ioctl() at zfsdev_ioctl+0xe0
devfs_ioctl_f() at devfs_ioctl_f+0xe8
kern_ioctl() at kern_ioctl+0x294
ioctl() at ioctl+0x198
syscallenter() at syscallenter+0x270
syscall() at syscall+0x74
-- syscall (54, FreeBSD ELF64, ioctl) %o7=0x40c13e24 --
userland() at 0x40e72cc8
user trace: trap %o7=0x40c13e24
pc 0x40e72cc8, sp 0x7fd4641
pc 0x40c158f4, sp 0x7fd4721
pc 0x40c1e878, sp 0x7fd47f1
pc 0x40c1ce54, sp 0x7fd8b01
pc 0x40c1dbe0, sp 0x7fd9431
pc 0x40c1f718, sp 0x7fdd741
pc 0x10731c, sp 0x7fdd831
pc 0x10c90c, sp 0x7fdd8f1
pc 0x103ef0, sp 0x7fde1d1
pc 0x4021aff4, sp 0x7fde291
done

The machine is a freshly installed 2-way SMP sparc64 box, built today from
-CURRENT with
http://people.freebsd.org/~mm/patches/zfs/zfs_ioctl_compat_bugfix.patch
applied.  Of note, it has only 1 GB of RAM, so kmem_max <= 512M.
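The race is just the two commands quoted at the top of this message run
concurrently; as a self-contained sketch (the pool name "testpool" and the
send-stream path are assumptions, not taken from the report):

```shell
#!/bin/sh
# Sketch of the racing workloads from the report above. Hypothetical
# names: pool "testpool", previously generated send stream "stream.zfs".
# Intended for a FreeBSD box with ZFS; it is a no-op where zfs is absent.

run_race() {
    # Receiver: replay a send stream into the pool in the background.
    zfs recv -v -d testpool < stream.zfs &
    recv_pid=$!

    # Reader: checksum every file in the pool concurrently.
    find /testpool -type f -print0 | xargs -0 sha1 > /dev/null &
    read_pid=$!

    wait "$recv_pid" "$read_pid"
}

if command -v zfs >/dev/null 2>&1; then
    run_race
else
    echo "zfs not available; sketch only"
fi
```

On the affected kernel the panic reportedly fires in arc_evict() on the
receive side while the read side keeps the ARC under eviction pressure.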

Thoughts?  More information?  Thanks in advance.
--nwf;




Re: zfs panic

2010-06-24 Thread Jaakko Heinonen
On 2010-06-23, ben wilber wrote:
> > > panic: _sx_xlock_hard: recursed on non-recursive sx 
> > > buf_hash_table.ht_locks[i].ht_lock @ 
> > > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/c
> > > ommon/fs/zfs/arc.c:1626
> > 
> > Any chance to obtain a backtrace for the panic?
> 
> From r209229:
> 
> db:0:kdb.enter.default>  bt
> Tracing pid 3233 tid 100396 td 0xff013600f000
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x1c8
> _sx_xlock_hard() at _sx_xlock_hard+0x93
> _sx_xlock() at _sx_xlock+0xaa
> arc_buf_remove_ref() at arc_buf_remove_ref+0x7b
> dbuf_rele() at dbuf_rele+0x15d
> zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xe1
> VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xe8
> vgonel() at vgonel+0x186
> vnlru_free() at vnlru_free+0x2f4
> vfs_lowmem() at vfs_lowmem+0x31

I believe that this has been fixed in r209260.

-- 
Jaakko
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: zfs panic

2010-06-24 Thread Tom Evans
On Wed, Jun 23, 2010 at 10:01 PM, ben wilber  wrote:
> On Wed, Jun 23, 2010 at 01:47:33PM -0700, Xin LI wrote:
>> >
>> > panic: _sx_xlock_hard: recursed on non-recursive sx 
>> > buf_hash_table.ht_locks[i].ht_lock @ 
>> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/c
>> > ommon/fs/zfs/arc.c:1626
>>
>> Any chance to obtain a backtrace for the panic?
>
> From r209229:
>
> db:0:kdb.enter.default>  bt
> Tracing pid 3233 tid 100396 td 0xff013600f000
> kdb_enter() at kdb_enter+0x3d
> panic() at panic+0x1c8
> _sx_xlock_hard() at _sx_xlock_hard+0x93
> _sx_xlock() at _sx_xlock+0xaa
> arc_buf_remove_ref() at arc_buf_remove_ref+0x7b
> dbuf_rele() at dbuf_rele+0x15d
> zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xe1
> VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xe8
> vgonel() at vgonel+0x186
> vnlru_free() at vnlru_free+0x2f4
> vfs_lowmem() at vfs_lowmem+0x31
> kmem_malloc() at kmem_malloc+0x12c
> page_alloc() at page_alloc+0x18
> keg_alloc_slab() at keg_alloc_slab+0xe6
> keg_fetch_slab() at keg_fetch_slab+0x18e
> zone_fetch_slab() at zone_fetch_slab+0x4f
> uma_zalloc_arg() at uma_zalloc_arg+0x56b
> arc_get_data_buf() at arc_get_data_buf+0xaa
> arc_read_nolock() at arc_read_nolock+0x1cb
> arc_read() at arc_read+0x71
> dbuf_read() at dbuf_read+0x4ea
> dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x119
> dmu_buf_hold_array() at dmu_buf_hold_array+0x57
> dmu_read_uio() at dmu_read_uio+0x3f
> zfs_freebsd_read() at zfs_freebsd_read+0x558
> VOP_READ_APV() at VOP_READ_APV+0xe2
> vn_read() at vn_read+0x1d0
> dofileread() at dofileread+0x97
> kern_preadv() at kern_preadv+0x6a
> pread() at pread+0x52
> syscallenter() at syscallenter+0x217
> syscall() at syscall+0x39
> Xfast_syscall() at Xfast_syscall+0xe1
> --- syscall (475, FreeBSD ELF64, pread), rip = 0x80100c14c, rsp = 
> 0x7fbfeb48, rbp = 0x2 ---
>
> Thanks.

I notice the backtrace is in the UMA zone allocator. Did the machine use to
be stable before? ZFS recently changed to using the UMA allocator, and I
found this made my system less reliable. Does disabling it help? Add this to
/boot/loader.conf:

vfs.zfs.zio.use_uma=0
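To confirm after reboot whether the setting took effect, the matching sysctl
can be queried (assuming the running kernel exposes the tunable under the
same name, as kernels of this era did):

```shell
#!/bin/sh
# Print whether ZFS ZIO buffers come from UMA (1) or plain kmem (0);
# the sysctl mirrors the loader.conf tunable vfs.zfs.zio.use_uma.
check_use_uma() {
    sysctl -n vfs.zfs.zio.use_uma 2>/dev/null \
        || echo "vfs.zfs.zio.use_uma not present on this kernel"
}
check_use_uma
```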

Cheers

Tom


Re: zfs panic

2010-06-23 Thread ben wilber
On Wed, Jun 23, 2010 at 01:47:33PM -0700, Xin LI wrote:
> > 
> > panic: _sx_xlock_hard: recursed on non-recursive sx 
> > buf_hash_table.ht_locks[i].ht_lock @ 
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/c
> > ommon/fs/zfs/arc.c:1626
> 
> Any chance to obtain a backtrace for the panic?

From r209229:

db:0:kdb.enter.default>  bt
Tracing pid 3233 tid 100396 td 0xff013600f000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x1c8
_sx_xlock_hard() at _sx_xlock_hard+0x93
_sx_xlock() at _sx_xlock+0xaa
arc_buf_remove_ref() at arc_buf_remove_ref+0x7b
dbuf_rele() at dbuf_rele+0x15d
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0xe1
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0xe8
vgonel() at vgonel+0x186
vnlru_free() at vnlru_free+0x2f4
vfs_lowmem() at vfs_lowmem+0x31
kmem_malloc() at kmem_malloc+0x12c
page_alloc() at page_alloc+0x18
keg_alloc_slab() at keg_alloc_slab+0xe6
keg_fetch_slab() at keg_fetch_slab+0x18e
zone_fetch_slab() at zone_fetch_slab+0x4f
uma_zalloc_arg() at uma_zalloc_arg+0x56b
arc_get_data_buf() at arc_get_data_buf+0xaa
arc_read_nolock() at arc_read_nolock+0x1cb
arc_read() at arc_read+0x71
dbuf_read() at dbuf_read+0x4ea
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x119
dmu_buf_hold_array() at dmu_buf_hold_array+0x57
dmu_read_uio() at dmu_read_uio+0x3f
zfs_freebsd_read() at zfs_freebsd_read+0x558
VOP_READ_APV() at VOP_READ_APV+0xe2
vn_read() at vn_read+0x1d0
dofileread() at dofileread+0x97
kern_preadv() at kern_preadv+0x6a
pread() at pread+0x52
syscallenter() at syscallenter+0x217
syscall() at syscall+0x39
Xfast_syscall() at Xfast_syscall+0xe1
--- syscall (475, FreeBSD ELF64, pread), rip = 0x80100c14c, rsp = 
0x7fbfeb48, rbp = 0x2 ---

Thanks.


Re: zfs panic

2010-06-23 Thread Xin LI

Hi, Ben,

On 2010/06/23 09:44, ben wilber wrote:
> Hi,
> 
> Since at least r208174, I've been getting the following panic every few days 
> on
> my fairly heavily loaded amd64 machine:
> 
> panic: _sx_xlock_hard: recursed on non-recursive sx 
> buf_hash_table.ht_locks[i].ht_lock @ 
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/c
> ommon/fs/zfs/arc.c:1626
> 
> The machine has 36 disks behind mfi(4), plus some SATA L2ARC.  This
> panic is sometimes preempted by "kmem_map too small", so I apologize in
> advance if this is caused by memory shortage.

Any chance to obtain a backtrace for the panic?

Cheers,
-- 
Xin LI http://www.delphij.net/
FreeBSD - The Power to Serve!  Live free or die


zfs panic

2010-06-23 Thread ben wilber
Hi,

Since at least r208174, I've been getting the following panic every few days on
my fairly heavily loaded amd64 machine:

panic: _sx_xlock_hard: recursed on non-recursive sx 
buf_hash_table.ht_locks[i].ht_lock @ 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/c
ommon/fs/zfs/arc.c:1626

The machine has 36 disks behind mfi(4), plus some SATA L2ARC.  This
panic is sometimes preempted by "kmem_map too small", so I apologize in
advance if this is caused by memory shortage.
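The preceding "kmem_map too small" panics on 8.x amd64 were typically worked
around by enlarging the kernel memory map and capping the ARC from
/boot/loader.conf; the values below are illustrative assumptions for a
machine of this size, not figures from the report:

```
# /boot/loader.conf -- hypothetical tuning; scale to the installed RAM
vm.kmem_size="4G"          # enlarge the kernel memory map
vm.kmem_size_max="4G"
vfs.zfs.arc_max="2G"       # keep the ARC well below kmem_size
```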



ZFS panic import

2010-06-02 Thread Danilo Baio
Hi,

I have (had) a FreeBSD 8.0-RELEASE-p2 system running a ZFS pool with raidz:
three 250 GB SATA disks and one IDE disk. It ran for a while without
problems, but a few days ago, after a power outage, the pool would not
import anymore.

The whole system was on ZFS, so I had to use a fixit CD to try the import.

zpool import -o ro ID..
cannot import 'tank': pool may be in use from other system
use -f to import anyway

with -f:

panic: solaris assert: size != 0, file
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line: 91
cpuid = 0
Uptime: 10m55s
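For the record, a lower-risk variant of the forced import is read-only under
an altroot, so nothing is mounted over the rescue system; a sketch (the
numeric pool id and /mnt are placeholders, and the option set is what ZFS
v13 offers):

```shell
#!/bin/sh
# Sketch: read-only import of a damaged pool from a fixit shell.
# POOL_ID is a placeholder for the id printed by "zpool import";
# /mnt is a hypothetical altroot.
POOL_ID="18269788674034674773"

import_ro() {
    # -o ro  : import the pool read-only
    # -R /mnt: relocate all mountpoints under an altroot
    # -f     : override the "pool may be in use from other system" check
    zpool import -o ro -R /mnt -f "$POOL_ID"
}

if command -v zpool >/dev/null 2>&1; then
    import_ro
else
    echo "zpool not available; sketch only"
fi
```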

I saved the coredump:

kgdb kernel.debug /home/dbaio/zfs.vmcore.0
...
...
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is
present;
to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version 13
ZFS storage pool version 13
panic: solaris assert: size != 0, file:
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line: 91
cpuid = 0
Uptime: 7m13s
Physical memory: 3949 MB
Dumping 1288 MB: (CTRL-C to abort) (CTRL-C to abort) 1273 1257 1241 1225
1209 1193 1177 1161 1145 1129 1113 1097 1081 1065 1049 1033 1017 1001 985
969 953 937 921 905 889 873 857 841 825 809 793 777 761 745 729 713 697 681
665 649 633 617 601 585 569 553 537 521 505 489 473 457 441 425 409 393 377
361 345 329 313 297 281 265 249 233 217 201 185 169 153 137 121 105 89 73 57
41 25 9

Reading symbols from /mnt2/boot/kernel/opensolaris.ko...done.
Loaded symbols for /mnt2/boot/kernel/opensolaris.ko
Reading symbols from /mnt2/boot/kernel/zfs.ko...done.
Loaded symbols for /mnt2/boot/kernel/zfs.ko
#0 doadump () at pcpu.h:223
223 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) backtrace
#0 doadump () at pcpu.h:223
#1 0x8057f8c9 in boot (howto=260)
at /usr/src/sys/kern/kern_shutdown.c:416
#2 0x8057fcfc in panic (
fmt=0x814fdce8 "solaris assert: %s, file: %s, line: %d")
at /usr/src/sys/kern/kern_shutdown.c:579
#3 0x8146f1d3 in space_map_add () from /mnt2/boot/kernel/zfs.ko
#4 0x81461195 in metaslab_free_dva () from /mnt2/boot/kernel/zfs.ko
#5 0x814612ea in metaslab_free () from /mnt2/boot/kernel/zfs.ko
#6 0x in ?? ()
#7 0x00053732 in ?? ()
#8 0x0008 in ?? ()
#9 0xff0004866870 in ?? ()
#10 0xff000411d000 in ?? ()
#11 0x in ?? ()
#12 0x in ?? ()
#13 0xff80756ff660 in ?? ()
#14 0x81487d67 in zio_dva_free () from /mnt2/boot/kernel/zfs.ko
#15 0x81488f07 in zio_execute () from /mnt2/boot/kernel/zfs.ko
#16 0x8143a675 in arc_free () from /mnt2/boot/kernel/zfs.ko
#17 0x814557a2 in dsl_dataset_block_kill ()
from /mnt2/boot/kernel/zfs.ko
#18 0x8144fbdf in free_blocks () from /mnt2/boot/kernel/zfs.ko
---Type <return> to continue, or q <return> to quit---

Now, with the 8.1-BETA fixit CD, I get the same error...

panic: solaris assert: size != 0, file
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c,
line: 90

...

With OpenSolaris (Indiana), I can import this pool with -f.
I have already saved the data, and now I can do some tests if you
want..

r...@opensolaris:~# zpool import
pool: tank
id: 18269788674034674773
state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier,
though
some features will not be available without an explicit 'zpool upgrade'.
config:

tank ONLINE
raidz1 ONLINE
c7d0p1 ONLINE
c9d0p1 ONLINE
c7d1p1 ONLINE
c8d0p1 ONLINE

This machine is a AMD Athlon(tm) 64 Processor 3000+ with 4gb ram.


If there is anything I can do to help, let me know... maybe the problem is me
=)

Regards.

-- 
Danilo Gonçalves Baio (dbaio)
danilobaio  (*) gmail . com
+55 (44) 8801 1257