Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-07 Thread Gary Jennejohn
On Wed, 7 Jul 2021 09:38:05 +0100
Edward Tomasz Napierała  wrote:

> On 0705T1833, Gary Jennejohn wrote:
> > On Mon, 5 Jul 2021 15:04:48 +0100
> > Edward Tomasz Napierała  wrote:
> >   
> > > On 0701T1330, Gary Jennejohn wrote:  
> > > > Gary Jennejohn  wrote:
> > > > > I noticed that the value of vm.debug.divisor affects what value is
> > > > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > > > different values.
> > > > > 
> > > > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > > > 
> > > > > The default is vm.debug.divisor=1.
> > > > > 
> > > > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > > > 
> > > > > skipdbg eventually affects the value of freei.
> > > > > 
> > > > > With these values:
> > > > > vm.debug.divisor: 0
> > > > > kern.cam.da.enable_uma_ccbs: 1
> > > > > I can turn on the disk and it comes up without a panic!
> > > > > 
> > > > > However, I didn't try to do any large data transfers to the disk.
> > > > > 
> > > > > So, it appears that at least vm.debug.divisor is a big factor in
> > > > > whether or not a panic happens with INVARIANTS.
> > > > > 
> > > > 
> > > > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > > > installed it to /boot/test.
> > > > 
> > > > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > > > and put a filesystem on it.
> > > > 
> > > > Then I booted the new kernel from /boot/test and set the sysctls so:
> > > > kern.cam.da.enable_uma_ccbs: 1
> > > > kern.cam.ada.enable_uma_ccbs: 1
> > > > 
> > > > After that I plugged in the external USB3 enclosure and copied about
> > > > 114GiB of data from an internal SSD to it - without a kernel panic:
> > > > Filesystem   Size   Used  Avail Capacity  Mounted on
> > > > /dev/da0p1   144G   114G    18G    86%    /mnt
> > > > 
> > > > I'm pretty sure that's more than I could copy without a kernel panic
> > > > prior to the recent changes made in cam and umass.
> > > > 
> > > > My test may not be real proof that all bugs have been squashed, but it
> > > > certainly seems to be a better situation than we had before.
> > > 
> > > I think the vm.debug.divisor simply masks the problem; the underlying
> > > bug is still there.
> > > 
> > > Could you go back to the setup which panics, and then test the patch
> > > at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> > > by Warner.
> > >   
> > 
> > It looks like this patch fixes things.
> > 
> > I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
> > (which are now the default values on my system).
> > 
> > I used the 8TiB disk, which spins up very slowly and usually resulted very
> > quickly in a panic - no panic with the patch.
> > 
> > Then using dd to /dev/null (bs=1m) I transferred:
> > 
> > 308755+0 records in
> > 308755+0 records out
> > 323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)
> > 
> > from the disk, so about 324GB (roughly 301.5GiB) without a panic.
> 
> Perfect, I've committed the fix.  Thank you!
> 

Thanks to you!  I built a new kernel as soon as I saw the commit and
have been running it since yesterday.

-- 
Gary Jennejohn



Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-07 Thread Edward Tomasz Napierała
On 0705T1833, Gary Jennejohn wrote:
> On Mon, 5 Jul 2021 15:04:48 +0100
> Edward Tomasz Napierała  wrote:
> 
> > On 0701T1330, Gary Jennejohn wrote:
> > > Gary Jennejohn  wrote:  
> > > > I noticed that the value of vm.debug.divisor affects what value is
> > > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > > different values.
> > > > 
> > > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > > 
> > > > The default is vm.debug.divisor=1.
> > > > 
> > > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > > 
> > > > skipdbg eventually affects the value of freei.
> > > > 
> > > > With these values:
> > > > vm.debug.divisor: 0
> > > > kern.cam.da.enable_uma_ccbs: 1
> > > > I can turn on the disk and it comes up without a panic!
> > > > 
> > > > However, I didn't try to do any large data transfers to the disk.
> > > > 
> > > > So, it appears that at least vm.debug.divisor is a big factor in
> > > > whether or not a panic happens with INVARIANTS.
> > > >   
> > > 
> > > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > > installed it to /boot/test.
> > > 
> > > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > > and put a filesystem on it.
> > > 
> > > Then I booted the new kernel from /boot/test and set the sysctls so:
> > > kern.cam.da.enable_uma_ccbs: 1
> > > kern.cam.ada.enable_uma_ccbs: 1
> > > 
> > > After that I plugged in the external USB3 enclosure and copied about
> > > 114GiB of data from an internal SSD to it - without a kernel panic:
> > > Filesystem   Size   Used  Avail Capacity  Mounted on
> > > /dev/da0p1   144G   114G    18G    86%    /mnt
> > > 
> > > I'm pretty sure that's more than I could copy without a kernel panic
> > > prior to the recent changes made in cam and umass.
> > > 
> > > My test may not be real proof that all bugs have been squashed, but it
> > > certainly seems to be a better situation than we had before.  
> > 
> > I think the vm.debug.divisor simply masks the problem; the underlying
> > bug is still there.
> > 
> > Could you go back to the setup which panics, and then test the patch
> > at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> > by Warner.
> > 
> 
> It looks like this patch fixes things.
> 
> I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
> (which are now the default values on my system).
> 
> I used the 8TiB disk, which spins up very slowly and usually resulted very
> quickly in a panic - no panic with the patch.
> 
> Then using dd to /dev/null (bs=1m) I transferred:
> 
> 308755+0 records in
> 308755+0 records out
> 323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)
> 
> from the disk, so about 324GB (roughly 301.5GiB) without a panic.

Perfect, I've committed the fix.  Thank you!




Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-05 Thread Gary Jennejohn
On Mon, 5 Jul 2021 15:04:48 +0100
Edward Tomasz Napierała  wrote:

> On 0701T1330, Gary Jennejohn wrote:
> > Gary Jennejohn  wrote:  
> > > I noticed that the value of vm.debug.divisor affects what value is
> > > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > > different values.
> > > 
> > > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > > 
> > > The default is vm.debug.divisor=1.
> > > 
> > > vm.debug.divisor is only present when INVARIANTS is defined.
> > > 
> > > skipdbg eventually affects the value of freei.
> > > 
> > > With these values:
> > > vm.debug.divisor: 0
> > > kern.cam.da.enable_uma_ccbs: 1
> > > I can turn on the disk and it comes up without a panic!
> > > 
> > > However, I didn't try to do any large data transfers to the disk.
> > > 
> > > So, it appears that at least vm.debug.divisor is a big factor in
> > > whether or not a panic happens with INVARIANTS.
> > >   
> > 
> > I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> > installed it to /boot/test.
> > 
> > Then I stuck a 160GB disk I had around into an external USB3 enclosure
> > and put a filesystem on it.
> > 
> > Then I booted the new kernel from /boot/test and set the sysctls so:
> > kern.cam.da.enable_uma_ccbs: 1
> > kern.cam.ada.enable_uma_ccbs: 1
> > 
> > After that I plugged in the external USB3 enclosure and copied about
> > 114GiB of data from an internal SSD to it - without a kernel panic:
> > Filesystem   Size   Used  Avail Capacity  Mounted on
> > /dev/da0p1   144G   114G    18G    86%    /mnt
> > 
> > I'm pretty sure that's more than I could copy without a kernel panic
> > prior to the recent changes made in cam and umass.
> > 
> > My test may not be real proof that all bugs have been squashed, but it
> > certainly seems to be a better situation than we had before.  
> 
> I think the vm.debug.divisor simply masks the problem; the underlying
> bug is still there.
> 
> Could you go back to the setup which panics, and then test the patch
> at https://reviews.freebsd.org/D31054?  It fixes the scenario described
> by Warner.
> 

It looks like this patch fixes things.

I used the default value vm.debug.divisor=1 and both enable_uma_ccbs=1
(which are now the default values on my system).

I used the 8TiB disk, which spins up very slowly and usually resulted very
quickly in a panic - no panic with the patch.

Then using dd to /dev/null (bs=1m) I transferred:

308755+0 records in
308755+0 records out
323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)

from the disk, so about 324GB (roughly 301.5GiB) without a panic.
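A quick way to double-check the dd numbers is to redo the arithmetic. The sketch below assumes that bs=1m means 2^20 bytes, which is consistent with the byte count dd printed; the helper names are illustrative only and not part of any FreeBSD API.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes moved by dd: whole records times the block size (bs=1m is 2^20). */
static uint64_t
dd_total_bytes(uint64_t records, uint64_t block_size)
{
	return (records * block_size);
}

/* Binary gibibytes (2^30 bytes). */
static double
bytes_to_gib(uint64_t bytes)
{
	return ((double)bytes / (1024.0 * 1024.0 * 1024.0));
}

/* Decimal gigabytes (10^9 bytes). */
static double
bytes_to_gb(uint64_t bytes)
{
	return ((double)bytes / 1e9);
}

/* Average throughput in bytes per second over the whole transfer. */
static double
bytes_per_sec(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return ((double)bytes / seconds);
}
```

For the run above, 308755 records of 1048576 bytes give 323753082880 bytes (matching dd's own count), which is about 323.8 GB in decimal units or about 301.5 GiB in binary units, at roughly 237 MB/s (about 226 MiB/s).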

-- 
Gary Jennejohn



Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-05 Thread Edward Tomasz Napierała
On 0701T1330, Gary Jennejohn wrote:
> Gary Jennejohn  wrote:
> > I noticed that the value of vm.debug.divisor affects what value is
> > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > different values.
> > 
> > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> > 
> > The default is vm.debug.divisor=1.
> > 
> > vm.debug.divisor is only present when INVARIANTS is defined.
> > 
> > skipdbg eventually affects the value of freei.
> > 
> > With these values:
> > vm.debug.divisor: 0
> > kern.cam.da.enable_uma_ccbs: 1
> > I can turn on the disk and it comes up without a panic!
> > 
> > However, I didn't try to do any large data transfers to the disk.
> > 
> > So, it appears that at least vm.debug.divisor is a big factor in
> > whether or not a panic happens with INVARIANTS.
> > 
> 
> I decided to do a real test.  So I built a kernel w/o INVARIANTS and
> installed it to /boot/test.
> 
> Then I stuck a 160GB disk I had around into an external USB3 enclosure
> and put a filesystem on it.
> 
> Then I booted the new kernel from /boot/test and set the sysctls so:
> kern.cam.da.enable_uma_ccbs: 1
> kern.cam.ada.enable_uma_ccbs: 1
> 
> After that I plugged in the external USB3 enclosure and copied about
> 114GiB of data from an internal SSD to it - without a kernel panic:
> Filesystem   Size   Used  Avail Capacity  Mounted on
> /dev/da0p1   144G   114G    18G    86%    /mnt
> 
> I'm pretty sure that's more than I could copy without a kernel panic
> prior to the recent changes made in cam and umass.
> 
> My test may not be real proof that all bugs have been squashed, but it
> certainly seems to be a better situation than we had before.

I think the vm.debug.divisor simply masks the problem; the underlying
bug is still there.
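Why a divisor of 0 would hide the panic rather than fix it can be illustrated with a small userspace model of the skip decision Gary describes. This is a sketch assuming the logic resembles uma_dbg_kskip() in sys/vm/uma_core.c (0 skips the debug checks for every item, 1 checks every item, N checks roughly one item in N); the model_* names, the 4 KiB page size, and the items-per-slab and stride values used below are hypothetical, and the kernel's real code may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page geometry for the model (4 KiB pages). */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_MASK (((uintptr_t)1 << MODEL_PAGE_SHIFT) - 1)

/*
 * Model of the per-item skip decision.  Returns true when the debug
 * checks should be skipped for the item at address `mem`.
 *   divisor == 0: skip every item (checks effectively disabled)
 *   divisor == 1: never skip (check every item, the default)
 *   divisor == N: check only items whose global index is a multiple of N
 * `ipers` is the number of items per slab and `rsize` the item stride.
 */
static bool
model_kskip(unsigned divisor, uintptr_t mem, unsigned ipers, unsigned rsize)
{
	uintptr_t idx;

	if (divisor == 0)
		return (true);
	if (divisor == 1)
		return (false);

	/* Derive a stable per-item index from the page number and slot. */
	idx = mem >> MODEL_PAGE_SHIFT;
	if (ipers > 1) {
		assert(rsize != 0);
		idx *= ipers;
		idx += (mem & MODEL_PAGE_MASK) / rsize;
	}
	return ((idx % divisor) != 0);
}

/*
 * Model of the gate in the destructor path: the free-time consistency
 * check (the one that reports "Unaligned free") only runs when the item
 * is not skipped.
 */
static bool
model_free_check_runs(unsigned divisor, uintptr_t item, unsigned ipers,
    unsigned rsize)
{
	bool skipdbg = model_kskip(divisor, item, ipers, rsize);

	return (!skipdbg);
}
```

With these helpers, model_free_check_runs(1, item, 7, 576) returns true and model_free_check_runs(0, item, 7, 576) returns false for any item: with a divisor of 0 the check is never reached, so a bad free goes unreported instead of being corrected.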

Could you go back to the setup which panics, and then test the patch
at https://reviews.freebsd.org/D31054?  It fixes the scenario described
by Warner.




Re: panic: Unaligned free (was: kernel panic while copying files)

2021-07-01 Thread Gary Jennejohn


Gary Jennejohn  wrote:

> On Wed, 30 Jun 2021 10:35:14 -0600
> Warner Losh  wrote:
> 
> > On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn  wrote:
> >   
> > > On Wed, 30 Jun 2021 06:02:59 +0100
> > > Graham Perrin  wrote:
> > >
> > > > On 29/06/2021 10:42, Gary Jennejohn wrote:
> > > > > … panic is now the result of an unaligned free.
> > > > >
> > > > > panic: Unaligned free of 0xf800259e2800 from zone
> > > > >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > > > >
> > > > > I have the crash dump and a debug kernel in case anyone wants more
> > > info.
> > > > Can you post the backtrace etc. here? Thanks
> > > >
> > >
> > > Sure.  As can be seen from the uma zone being da_ccb, the panic
> > > resulted from setting kern.cam.da.enable_uma_ccbs=1.
> > >
> > > Unread portion of the kernel message buffer:
> > > panic: Unaligned free of 0xf800259e2800 from zone
> > > 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > > cpuid = 2
> > > time = 1624958650
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> > > 0xfe00c62687a0
> > > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
> > > vpanic() at vpanic+0x227/frame 0xfe00c62688f0
> > > panic() at panic+0x4e/frame 0xfe00c6268950
> > > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
> > > item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
> > > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
> > > uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
> > > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
> > > camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
> > > xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
> > > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
> > > fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > > KDB: enter: panic
> > >
> > > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > > 399 dumptid = curthread->td_tid;
> > > (kgdb) bt
> > > #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > > #1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> > > dummy3=-1,
> > > dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> > > #2  0x804d5bf4 in db_command (
> > > last_cmdp=0x8114ce80 , cmd_table=0x0,
> > > dopager=1)
> > > at /usr/src/sys/ddb/db_command.c:482
> > > #3  0x804d583c in db_command_loop ()
> > > at /usr/src/sys/ddb/db_command.c:535
> > > #4  0x804da27c in db_trap (type=3, code=0)
> > > at /usr/src/sys/ddb/db_main.c:270
> > > #5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
> > > at /usr/src/sys/kern/subr_kdb.c:727
> > > #6  0x80d31494 in trap (frame=0xfe00c6268770)
> > > at /usr/src/sys/amd64/amd64/trap.c:604
> > > #7  0x80d32628 in trap_check (frame=0xfe00c6268770)
> > > at /usr/src/sys/amd64/amd64/trap.c:664
> > > #8  
> > > #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> > > #10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
> > > msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> > > #11 0x807d1725 in vpanic (
> > > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > > %p(%d)", ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> > > #12 0x807d120e in panic (
> > > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > > %p(%d)")
> > > at /usr/src/sys/kern/kern_shutdown.c:843
> > > #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> > > slab=0xf800259e2fd8, item=0xf800259e2800)
> > > at /usr/src/sys/vm/uma_core.c:5659
> > > #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> > > at /usr/src/sys/vm/uma_core.c:3418
> > > #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > > #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> > > item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> > > #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> > > at /usr/src/sys/cam/cam_xpt.c:4676
> > > #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> > > done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > > #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> > > at /usr/src/sys/cam/cam_xpt.c:5493
> > > #20 0x802e68e0 in xpt_done_td (arg=0x81143700 
> > > )
> > > at /usr/src/sys/cam/cam_xpt.c:5548
> > > #21 0x807673c7 in fork_exit 

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Gary Jennejohn
On Wed, 30 Jun 2021 10:35:14 -0600
Warner Losh  wrote:

> On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn  wrote:
> 
> > On Wed, 30 Jun 2021 06:02:59 +0100
> > Graham Perrin  wrote:
> >  
> > > On 29/06/2021 10:42, Gary Jennejohn wrote:  
> > > > … panic is now the result of an unaligned free.
> > > >
> > > > panic: Unaligned free of 0xf800259e2800 from zone
> > > >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > > >
> > > > I have the crash dump and a debug kernel in case anyone wants more  
> > info.  
> > > Can you post the backtrace etc. here? Thanks
> > >  
> >
> > Sure.  As can be seen from the uma zone being da_ccb, the panic
> > resulted from setting kern.cam.da.enable_uma_ccbs=1.
> >
> > Unread portion of the kernel message buffer:
> > panic: Unaligned free of 0xf800259e2800 from zone
> > 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > cpuid = 2
> > time = 1624958650
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> > 0xfe00c62687a0
> > kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
> > vpanic() at vpanic+0x227/frame 0xfe00c62688f0
> > panic() at panic+0x4e/frame 0xfe00c6268950
> > uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
> > item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
> > uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
> > uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
> > xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
> > camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
> > xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
> > xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
> > fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> > doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > 399 dumptid = curthread->td_tid;
> > (kgdb) bt
> > #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> > #1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> > dummy3=-1,
> > dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> > #2  0x804d5bf4 in db_command (
> > last_cmdp=0x8114ce80 , cmd_table=0x0,
> > dopager=1)
> > at /usr/src/sys/ddb/db_command.c:482
> > #3  0x804d583c in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:535
> > #4  0x804da27c in db_trap (type=3, code=0)
> > at /usr/src/sys/ddb/db_main.c:270
> > #5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
> > at /usr/src/sys/kern/subr_kdb.c:727
> > #6  0x80d31494 in trap (frame=0xfe00c6268770)
> > at /usr/src/sys/amd64/amd64/trap.c:604
> > #7  0x80d32628 in trap_check (frame=0xfe00c6268770)
> > at /usr/src/sys/amd64/amd64/trap.c:664
> > #8  
> > #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> > #10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
> > msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> > #11 0x807d1725 in vpanic (
> > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > %p(%d)", ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> > #12 0x807d120e in panic (
> > fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> > %p(%d)")
> > at /usr/src/sys/kern/kern_shutdown.c:843
> > #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> > slab=0xf800259e2fd8, item=0xf800259e2800)
> > at /usr/src/sys/vm/uma_core.c:5659
> > #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> > item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> > at /usr/src/sys/vm/uma_core.c:3418
> > #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> > item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> > item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> > #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> > at /usr/src/sys/cam/cam_xpt.c:4676
> > #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> > done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> > at /usr/src/sys/cam/cam_xpt.c:5493
> > #20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
> > at /usr/src/sys/cam/cam_xpt.c:5548
> > #21 0x807673c7 in fork_exit (callout=0x802e6720
> > ,
> > arg=0x81143700 , frame=0xfe00c6268c00)
> > at /usr/src/sys/kern/kern_fork.c:1083
> > #22 
> >
> > [kgdb stuff removed]
> >
> > (kgdb) down
> > #15 0x80c0ba60 in uma_zfree_arg 

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Warner Losh
On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn  wrote:

> On Wed, 30 Jun 2021 06:02:59 +0100
> Graham Perrin  wrote:
>
> > On 29/06/2021 10:42, Gary Jennejohn wrote:
> > > … panic is now the result of an unaligned free.
> > >
> > > panic: Unaligned free of 0xf800259e2800 from zone
> > >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> > >
> > > I have the crash dump and a debug kernel in case anyone wants more
> info.
> > Can you post the backtrace etc. here? Thanks
> >
>
> Sure.  As can be seen from the uma zone being da_ccb, the panic
> resulted from setting kern.cam.da.enable_uma_ccbs=1.
>
> Unread portion of the kernel message buffer:
> panic: Unaligned free of 0xf800259e2800 from zone
> 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> cpuid = 2
> time = 1624958650
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> 0xfe00c62687a0
> kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
> vpanic() at vpanic+0x227/frame 0xfe00c62688f0
> panic() at panic+0x4e/frame 0xfe00c6268950
> uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
> item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
> uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
> uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
> xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
> camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
> xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
> xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
> fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
> doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> 399 dumptid = curthread->td_tid;
> (kgdb) bt
> #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> #1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> dummy3=-1,
> dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0x804d5bf4 in db_command (
> last_cmdp=0x8114ce80 , cmd_table=0x0,
> dopager=1)
> at /usr/src/sys/ddb/db_command.c:482
> #3  0x804d583c in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:535
> #4  0x804da27c in db_trap (type=3, code=0)
> at /usr/src/sys/ddb/db_main.c:270
> #5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
> at /usr/src/sys/kern/subr_kdb.c:727
> #6  0x80d31494 in trap (frame=0xfe00c6268770)
> at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0x80d32628 in trap_check (frame=0xfe00c6268770)
> at /usr/src/sys/amd64/amd64/trap.c:664
> #8  
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
> msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0x807d1725 in vpanic (
> fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)", ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0x807d120e in panic (
> fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)")
> at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
> slab=0xf800259e2fd8, item=0xf800259e2800)
> at /usr/src/sys/vm/uma_core.c:5659
> #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
> at /usr/src/sys/vm/uma_core.c:3418
> #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
> item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
> #17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
> at /usr/src/sys/cam/cam_xpt.c:4676
> #18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
> done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
> at /usr/src/sys/cam/cam_xpt.c:5493
> #20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
> at /usr/src/sys/cam/cam_xpt.c:5548
> #21 0x807673c7 in fork_exit (callout=0x802e6720
> ,
> arg=0x81143700 , frame=0xfe00c6268c00)
> at /usr/src/sys/kern/kern_fork.c:1083
> #22 
>
> [kgdb stuff removed]
>
> (kgdb) down
> #15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
> item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> 4374            item_dtor(zone, item, cache_uz_size(cache), udata,
> SKIP_NONE);
> (kgdb) down
> #14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
> 

Re: panic: Unaligned free (was: kernel panic while copying files)

2021-06-30 Thread Gary Jennejohn
On Wed, 30 Jun 2021 06:02:59 +0100
Graham Perrin  wrote:

> On 29/06/2021 10:42, Gary Jennejohn wrote:
> > … panic is now the result of an unaligned free.
> >
> > panic: Unaligned free of 0xf800259e2800 from zone
> >  0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> >
> > I have the crash dump and a debug kernel in case anyone wants more info.  
> Can you post the backtrace etc. here? Thanks
> 

Sure.  As can be seen from the uma zone being da_ccb, the panic
resulted from setting kern.cam.da.enable_uma_ccbs=1.

Unread portion of the kernel message buffer:
panic: Unaligned free of 0xf800259e2800 from zone 
0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
cpuid = 2
time = 1624958650
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe00c62687a0
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
vpanic() at vpanic+0x227/frame 0xfe00c62688f0
panic() at panic+0x4e/frame 0xfe00c6268950
uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
399 dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
#1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, dummy3=-1,
dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
#2  0x804d5bf4 in db_command (
last_cmdp=0x8114ce80 , cmd_table=0x0, dopager=1)
at /usr/src/sys/ddb/db_command.c:482
#3  0x804d583c in db_command_loop ()
at /usr/src/sys/ddb/db_command.c:535
#4  0x804da27c in db_trap (type=3, code=0)
at /usr/src/sys/ddb/db_main.c:270
#5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
at /usr/src/sys/kern/subr_kdb.c:727
#6  0x80d31494 in trap (frame=0xfe00c6268770)
at /usr/src/sys/amd64/amd64/trap.c:604
#7  0x80d32628 in trap_check (frame=0xfe00c6268770)
at /usr/src/sys/amd64/amd64/trap.c:664
#8  
#9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
#10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
#11 0x807d1725 in vpanic (
fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)", 
ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
#12 0x807d120e in panic (
fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)")
at /usr/src/sys/kern/kern_shutdown.c:843
#13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
slab=0xf800259e2fd8, item=0xf800259e2800)
at /usr/src/sys/vm/uma_core.c:5659
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
at /usr/src/sys/vm/uma_core.c:3418
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
#16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
#17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
at /usr/src/sys/cam/cam_xpt.c:4676
#18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
#19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
at /usr/src/sys/cam/cam_xpt.c:5493
#20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
at /usr/src/sys/cam/cam_xpt.c:5548
#21 0x807673c7 in fork_exit (callout=0x802e6720 ,
arg=0x81143700 , frame=0xfe00c6268c00)
at /usr/src/sys/kern/kern_fork.c:1083
#22 

[kgdb stuff removed]

(kgdb) down
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
4374            item_dtor(zone, item, cache_uz_size(cache), udata,
SKIP_NONE);
(kgdb) down
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
at /usr/src/sys/vm/uma_core.c:3418
3418                    uma_dbg_free(zone, NULL, item);
(kgdb) p/x skipdbg
$26 = 0x0
(kgdb) p/x zone->uz_flags
$27 = 0x4100 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR)
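The "Unaligned free" message comes from a consistency check on the pointer being freed. The sketch below assumes the check computes the item's index within its slab and then verifies that the pointer sits exactly at the start of that slot; the model_* helpers are hypothetical and simplified. It further assumes, because the slab address ends in 0xfd8, that the slab header sits at the end of a 4 KiB page so the items start at the page base, and that the (3) in the panic message is the computed index. Only the low bits of the addresses are used, which is enough for the offset arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Index of the slot containing `item`, given the address of the first
 * item in the slab (`data`) and the per-item stride (`rsize`).
 */
static unsigned
model_item_index(uintptr_t data, uintptr_t item, unsigned rsize)
{
	assert(rsize > 0 && item >= data);
	return ((unsigned)((item - data) / rsize));
}

/* Start address of slot `index`. */
static uintptr_t
model_item_addr(uintptr_t data, unsigned index, unsigned rsize)
{
	return (data + (uintptr_t)index * rsize);
}

/* A free is "aligned" only if the pointer is exactly at its slot start. */
static bool
model_free_is_aligned(uintptr_t data, uintptr_t item, unsigned rsize)
{
	unsigned idx = model_item_index(data, item, rsize);

	return (model_item_addr(data, idx, rsize) == item);
}
```

Here the freed pointer 0x259e2800 lies 0x800 (2048) bytes into the page starting at 0x259e2000. Using the 544-byte size shown for item_dtor() as the stride, the computed index is 3, matching the (3) in the message, but slot 3 starts at offset 1632 (0x660) rather than 2048, so the check fails. A padded stride of 576 bytes would also give index 3 and also fail.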

panic: Unaligned free (was: kernel panic while copying files)

2021-06-29 Thread Graham Perrin

On 29/06/2021 10:42, Gary Jennejohn wrote:

… panic is now the result of an unaligned free.

panic: Unaligned free of 0xf800259e2800 from zone
 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)

I have the crash dump and a debug kernel in case anyone wants more info.

Can you post the backtrace etc. here? Thanks



Re: kernel panic while copying files

2021-06-29 Thread Gary Jennejohn
I was sort of hoping that all the recent changes made by imp@ in cam and
umass may have fixed the cause of the kernel crash.

Unfortunately not.

But there is a change - instead of a duplicate free the panic is now the
result of an unaligned free.

panic: Unaligned free of 0xf800259e2800 from zone
0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)

I have the crash dump and a debug kernel in case anyone wants more info.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-12 Thread Gary Jennejohn
On Sat, 12 Jun 2021 14:10:36 +0100
Edward Tomasz Napierała  wrote:

> On 0610T1150, Gary Jennejohn wrote:
> > On Tue, 8 Jun 2021 17:54:05 +0200
> > Gary Jennejohn  wrote:
> > 
> > [big snip]  
> 
> [..]
> 
> > So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
> > which was the penultimate commit made by trasz to clear CCBs on the stack
> > after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
> > to allocate CCBs in UMA.
> > 
> > Note that I only built the kernel and not world.
> > 
> > I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
> > but without the following commits for CCBs on the stack the kernel
> > panicked during startup in AHCI.
> > 
> > Anyway, this is the minimum set of changes relevant to the uma_ccbs
> > story and also results in a panic identical to the one listed above
> > when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
> > disk.
> > 
> > So, Warner is probably right and at least the da_uma_ccbs commits
> > should be reverted until more research can be done on why the panic
> > happens.
> > 
> > The ada_uma_ccbs commits do not cause any problems in my experience and
> > could probably be left in the kernel.  
> 
> Thank you, I'm working on a fix.  Meanwhile - does the current code
> cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
> If it doesn't, it probably doesn't require backing off, since 0 is
> the default, and will keep being the default until bugs such as this
> one are fixed.
> 

No, with the sysctl set to 0 it works really well.  I've been running
it that way for several days and have transferred large amounts of
data to an external USB3 disk with no problems.

I didn't mention it, but I also tested the reset kernel (with INVARIANTS)
with the sysctl set to 0 and the kernel did not panic.

I've had kern.cam.ada.enable_uma_ccbs set to 1 the whole time and never
saw any problems.

I agree, as long as the default is 0 all the code can stay in the tree.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-12 Thread Edward Tomasz Napierała
On 0610T1150, Gary Jennejohn wrote:
> On Tue, 8 Jun 2021 17:54:05 +0200
> Gary Jennejohn  wrote:
> 
> [big snip]

[..]

> So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
> which was the penultimate commit made by trasz to clear CCBs on the stack
> after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
> to allocate CCBs in UMA.
> 
> Note that I only built the kernel and not world.
> 
> I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
> but without the following commits for CCBs on the stack the kernel
> panicked during startup in AHCI.
> 
> Anyway, this is the minimum set of changes relevant to the uma_ccbs
> story and also results in a panic identical to the one listed above
> when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
> disk.
> 
> So, Warner is probably right and at least the da_uma_ccbs commits
> should be reverted until more research can be done on why the panic
> happens.
> 
> The ada_uma_ccbs commits do not cause any problems in my experience and
> could probably be left in the kernel.

Thank you, I'm working on a fix.  Meanwhile - does the current code
cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
If it doesn't, it probably doesn't require backing off, since 0 is
the default, and will keep being the default until bugs such as this
one are fixed.




Re: kernel panic while copying files

2021-06-10 Thread Gary Jennejohn
On Tue, 8 Jun 2021 17:54:05 +0200
Gary Jennejohn  wrote:

[big snip]
> Here's the kgdb backtrace with the -O0 kernel:
> 
> (kgdb) bt
> #0  0x8081d706 in doadump (textdump=0)
> at /usr/src/sys/kern/kern_shutdown.c:398
> #1  0x804ef15a in db_dump (dummy=-2138500043, dummy2=false, dummy3=-1,
> dummy4=0xfe00c62a11b0 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0x804eef5f in db_command (
> last_cmdp=0x8114d380 , cmd_table=0x0, dopager=1)
> at /usr/src/sys/ddb/db_command.c:482
> #3  0x804eeb38 in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:535
> #4  0x804f38ef in db_trap (type=3, code=0)
> at /usr/src/sys/ddb/db_main.c:270
> #5  0x80891d02 in kdb_trap (type=3, code=0, tf=0xfe00c62a1680)
> at /usr/src/sys/kern/subr_kdb.c:727
> #6  0x80dd53c3 in trap (frame=0xfe00c62a1680)
> at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0x80dd6718 in trap_check (frame=0xfe00c62a1680)
> at /usr/src/sys/amd64/amd64/trap.c:664
> #8  
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0x808910d0 in kdb_enter (why=0x80eaaf0b "panic",
> msg=0x80eaaf0b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0x8081dbfe in vpanic (
> fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)",
> ap=0xfe00c62a1850) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0x8081d6b0 in panic (
> fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)")
> at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0x80caaec5 in uma_dbg_free (zone=0xfe00dc9d9800,
> slab=0xf80007ee0fd8, item=0xf80007ee)
> at /usr/src/sys/vm/uma_core.c:5664
> #14 0x80c9faf5 in item_dtor (zone=0xfe00dc9d9800,
> item=0xf80007ee, size=544, udata=0x0, skip=SKIP_NONE)
> at /usr/src/sys/vm/uma_core.c:3418
> #15 0x80c9eec7 in uma_zfree_arg (zone=0xfe00dc9d9800,
> item=0xf80007ee, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0x802e5a89 in uma_zfree (zone=0xfe00dc9d9800,
> item=0xf80007ee) at /usr/src/sys/vm/uma.h:404
> #17 0x802dcfa6 in xpt_free_ccb (free_ccb=0xf80007ee)
> at /usr/src/sys/cam/cam_xpt.c:4674
> #18 0x802db639 in camperiphdone (periph=0xf8005d68bd00,
> done_ccb=0xf80007797cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0x802e59b6 in xpt_done_process (ccb_h=0xf80007797cc0)
> at /usr/src/sys/cam/cam_xpt.c:5491
> #20 0x802e811e in xpt_done_td (arg=0x81143c00 )
> at /usr/src/sys/cam/cam_xpt.c:5546
> #21 0x807ac0ea in fork_exit (callout=0x802e7f20 ,
> arg=0x81143c00 , frame=0xfe00c62a1c00)
> at /usr/src/sys/kern/kern_fork.c:1083
> #22 
> 
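The -O0 trace quoted above shows the xpt_free_ccb() frame (#17) between camperiphdone() and uma_zfree(), which was missing from the optimized trace Warner called a bad stack trace. A minimal user-space sketch of the code shape that produces such a gap follows. All names are hypothetical stand-ins, not the real CAM or UMA functions, and whether the compiler actually turns the final call into a jump depends on the compiler, target and flags; building with -O0 or -fno-optimize-sibling-calls, as suggested in the thread, keeps every frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the frames in the traces; not CAM/UMA code. */
static int free_calls;

/* Plays the role of uma_zfree(): the function the backtrace stops in. */
__attribute__((noinline)) static void
toy_zone_free(void *item)
{
	(void)item;
	free_calls++;
}

/*
 * Plays the role of xpt_free_ccb(): its last action is a call, so an
 * optimizing compiler may emit a jump instead ("sibling call").  This
 * frame is then already gone while toy_zone_free() runs, and a backtrace
 * taken there skips straight to toy_periph_done().
 */
__attribute__((noinline)) static void
toy_free_ccb(void *ccb)
{
	toy_zone_free(ccb);
}

/*
 * Plays the role of camperiphdone(): it still has work to do after the
 * call (reading free_calls), so that call cannot become a jump.
 */
__attribute__((noinline)) static int
toy_periph_done(void *ccb)
{
	toy_free_ccb(ccb);
	return free_calls;
}
```

Comparing `cc -O2 -S` output with and without `-fno-optimize-sibling-calls` for toy_free_ccb() would typically show a `jmp` in the first case and a `call` in the second, though the exact result is compiler-dependent. `__attribute__((noinline))` is a GCC/Clang extension used here only to keep the three functions as separate frames.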

So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
which was the penultimate commit made by trasz to clear CCBs on the stack
after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
to allocate CCBs in UMA.

Note that I only built the kernel and not world.

I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
but without the following commits for CCBs on the stack the kernel
panicked during startup in AHCI.

Anyway, this is the minimum set of changes relevant to the uma_ccbs
story and also results in a panic identical to the one listed above
when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
disk.

So, Warner is probably right and at least the da_uma_ccbs commits
should be reverted until more research can be done on why the panic
happens.

The ada_uma_ccbs commits do not cause any problems in my experience and
could probably be left in the kernel.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 06:27:04 -0600
Warner Losh  wrote:

> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn  wrote:
> 
[snip old stuff]
> > Here is the kgdb backtrace:
> >
> > Unread portion of the kernel message buffer:
> > panic: Duplicate free of 0xf800356b9000 from zone
> > 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> > cpuid = 8
> > time = 1623140519
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfe00c5f398c0
> > vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> > panic() at panic+0x43/frame 0xfe00c5f39970
> > uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> > uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> > camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> > xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> > xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> > (offsetof(struct pcpu,
> > (kgdb) bt
> > #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > #1  doadump (textdump=textdump@entry=0)
> > at /usr/src/sys/kern/kern_shutdown.c:399
> > #2  0x8040c39a in db_dump (dummy=,
> > dummy2=, dummy3=, dummy4=)
> > at /usr/src/sys/ddb/db_command.c:575
> > #3  0x8040c192 in db_command (last_cmdp=,
> > cmd_table=, dopager=dopager@entry=1)
> > at /usr/src/sys/ddb/db_command.c:482
> > #4  0x8040beed in db_command_loop ()
> > at /usr/src/sys/ddb/db_command.c:535
> > #5  0x8040f616 in db_trap (type=, code=<optimized out>)
> > at /usr/src/sys/ddb/db_main.c:270
> > #6  0x8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0,
> > tf=, tf@entry=0xfe00c5f397f0)
> > at /usr/src/sys/kern/subr_kdb.c:727
> > #7  0x809a4e96 in trap (frame=0xfe00c5f397f0)
> > at /usr/src/sys/amd64/amd64/trap.c:604
> > #8  
> > #9  kdb_enter (why=0x80a61a23 "panic", msg=)
> > at /usr/src/sys/kern/subr_kdb.c:506
> > #10 0x806207a2 in vpanic (fmt=, ap=,
> > ap@entry=0xfe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907
> > #11 0x80620533 in panic (
> > fmt=0x80d635c8  ".\024\244\200\377\377\377\377")
> > at /usr/src/sys/kern/kern_shutdown.c:843
> > #12 0x808e12b1 in uma_dbg_free (zone=0xfe00dcbdd800,
> > slab=0xf800356b9fd8, item=0xf800356b9000)
> > at /usr/src/sys/vm/uma_core.c:5664
> > #13 0x808d9de7 in item_dtor (zone=0xfe00dcbdd800,
> > item=0xf800356b9000, size=544, udata=0x0, skip=SKIP_NONE)
> > at /usr/src/sys/vm/uma_core.c:3418
> > #14 uma_zfree_arg (zone=0xfe00dcbdd800, item=0xf800356b9000,
> > udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > #15 0x802da503 in uma_zfree (zone=0x80d635c8 ,
> > item=0x200) at /usr/src/sys/vm/uma.h:404
> >  
> 
> OK. This is a bad stack trace. camperiphdone doesn't call uma_zfree()...
> It does call xpt_free_ccb, though, and that's likely what's going wrong.
> And that matches the line numbers. Most likely this is llvm's tail call
> optimizations...  Can you compile the kernel with either -O0 or
> -fno-optimize-sibling-calls? That will give a better call stack.
>
> However, it's likely the new UMA stuff trasz committed (or it's providing
> better diagnostics than the old malloc based code, which seems more
> likely) that can be disabled by the tunable kern.cam.da.enable_uma_ccbs=0.
> 
> The lines in question:
> saved_ccb = (union ccb *)done_ccb->ccb_h.saved_ccb_ptr;
> bcopy(saved_ccb, done_ccb, sizeof(*done_ccb));
> xpt_free_ccb(saved_ccb);
> 
> So we overwrite the done_ccb with the saved_ccb's contents and then free
> the saved ccb. That's likely OKish, though.
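The three quoted lines can be modeled in user space to show what the whole-struct copy does. This is a hedged sketch: `struct fake_ccb` and its `from_pool` field are hypothetical simplifications of `union ccb`, and the ownership field only illustrates a general property of copying one entire object over another (every destination field, including any that describes the destination itself, takes the source's value). It is not a diagnosis of this panic. Note also that `bcopy()` takes (src, dst, len) while `memcpy()`, used here, takes (dst, src, len).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OP_READ_10	0x28	/* SCSI READ(10) opcode */
#define OP_START_STOP	0x1b	/* SCSI START STOP UNIT opcode */

/* Hypothetical, heavily simplified stand-in for union ccb. */
struct fake_ccb {
	struct fake_ccb	*saved_ccb_ptr;	/* original request saved during recovery */
	int		 from_pool;	/* hypothetical: which allocator owns this object */
	int		 opcode;	/* command currently carried by this CCB */
};

/*
 * Model of the quoted lines:
 *   saved_ccb = done_ccb->ccb_h.saved_ccb_ptr;
 *   bcopy(saved_ccb, done_ccb, sizeof(*done_ccb));
 *   xpt_free_ccb(saved_ccb);
 */
static void
restore_saved_ccb(struct fake_ccb *done_ccb)
{
	struct fake_ccb *saved_ccb = done_ccb->saved_ccb_ptr;

	memcpy(done_ccb, saved_ccb, sizeof(*done_ccb));
	free(saved_ccb);
}

/* Same, but keeps done_ccb's own ownership field across the copy. */
static void
restore_saved_ccb_keep_owner(struct fake_ccb *done_ccb)
{
	struct fake_ccb *saved_ccb = done_ccb->saved_ccb_ptr;
	int owner = done_ccb->from_pool;

	memcpy(done_ccb, saved_ccb, sizeof(*done_ccb));
	done_ccb->from_pool = owner;
	free(saved_ccb);
}

/* Builds a CCB reused for START STOP UNIT, with the original READ saved. */
static struct fake_ccb *
make_recovery_ccb(void)
{
	struct fake_ccb *done = calloc(1, sizeof(*done));
	struct fake_ccb *saved = calloc(1, sizeof(*saved));

	assert(done != NULL && saved != NULL);
	done->from_pool = 1;		/* pretend done_ccb came from the pool */
	done->opcode = OP_START_STOP;
	done->saved_ccb_ptr = saved;
	saved->from_pool = 0;		/* pretend the saved copy was malloc'ed */
	saved->opcode = OP_READ_10;
	return done;
}
```

After the plain copy the restored CCB carries the original READ and no longer points at the freed copy, but in this model it also reports `from_pool == 0` although it was allocated from the pool; any later free that trusted that field would go to the wrong allocator. The second variant shows one way to keep allocator bookkeeping out of the copied state.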
> 
> We copy entire CCBs around in this code a lot, and I've not traced
> through it. But we're sending a SCSI START UNIT in response to some
> error that is being reported via cam_periph_error().
> 
> > #16 0x802d9117 in camperiphdone (periph=0xf800061e2c00,
> > done_ccb=0xf800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > #17 0x802dfebd in xpt_done_process (ccb_h=0xf800355d6cc0)
> > at /usr/src/sys/cam/cam_xpt.c:5491
> > #18 0x802e1ec5 in xpt_done_td (
> > arg=arg@entry=0x80d33d80 )
> > at /usr/src/sys/cam/cam_xpt.c:5546
> > #19 0x805dad80 in fork_exit (callout=0x802e1dd0
> > ,
> > arg=0x80d33d80 , frame=0xfe00c5f39c00)
> > at /usr/src/sys/kern/kern_fork.c:1083
> > #20 
> >
> > Apparently caused by recent changes to CAM.
> >
> > Let me know if you want more information.
> >  
> 
> what's 

Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
On Tue, Jun 8, 2021 at 8:42 AM Gary Jennejohn  wrote:

> On Tue, 8 Jun 2021 06:48:19 -0600
> Warner Losh  wrote:
>
> > Sorry to reply to myself... had a thought as my brain rested while making
> > tea...
> >
> > I think we may need to consider reverting (or at least not yet enabling)
> > the uma stuff.
> >
>
> I tested and enabled the UMA CCB stuff immediately after trasz had
> committed it.  I was able to copy files panic-free over USB until
> recently AFAICR.
>
> I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
> then and have never seen a problem there.  Only with USB.
>

Yes. This specific bug only affects SCSI. And it only affects it when
there's an error that requires a restart. I've not yet had the time to do
an audit for where else the copying is done...


> I'll try booting a new kernel with the uma_ccbs sysctls set to 0
> and see what happens.
>
> BTW I now have a kernel compiled with -O0 ready to test.
>

Great!

Warner


Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 06:48:19 -0600
Warner Losh  wrote:

> Sorry to reply to myself... had a thought as my brain rested while making
> tea...
> 
> I think we may need to consider reverting (or at least not yet enabling)
> the uma stuff.
> 

I tested and enabled the UMA CCB stuff immediately after trasz had
committed it.  I was able to copy files panic-free over USB until
recently AFAICR.

I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
then and have never seen a problem there.  Only with USB.

I'll try booting a new kernel with the uma_ccbs sysctls set to 0
and see what happens.

BTW I now have a kernel compiled with -O0 ready to test.

[snip lots of extraneous stuff]

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
Sorry to reply to myself... had a thought as my brain rested while making
tea...

I think we may need to consider reverting (or at least not yet enabling)
the uma stuff.

On Tue, Jun 8, 2021 at 6:27 AM Warner Losh  wrote:

>
>
> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn 
> wrote:
>
>> On Mon, 7 Jun 2021 16:54:11 -0400
>> Mark Johnston  wrote:
>>
>> > On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
>> > > I've seen this panic three times in the last two days:
>> > >
>> > > [first panic]
>> > > Unread portion of the kernel message buffer:
>> > >
>> > >
>> > > Fatal trap 12: page fault while in kernel mode
>> > > cpuid = 3; apic id = 03
>> > > fault virtual address   = 0x801118000
>> > > fault code  = supervisor write data, page not present
>> > > instruction pointer = 0x20:0x808d2212
>> > > stack pointer   = 0x28:0xfe00dbc8c760
>> > > frame pointer   = 0x28:0xfe00dbc8c7a0
>> > > code segment= base 0x0, limit 0xf, type 0x1b
>> > > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > > processor eflags= interrupt enabled, resume, IOPL = 0
>> > > current process = 28 (dom0)
>> > > trap number = 12
>> > > panic: page fault
>> > > cpuid = 3
>> > > time = 1622963058
>> > > KDB: stack backtrace:
>> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
>> > > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
>> > > panic() at panic+0x43/frame 0xfe00dbc8c4c0
>> > > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
>> > > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
>> > > trap() at trap+0x253/frame 0xfe00dbc8c690
>> > > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
>> > > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
>> > > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
>> > > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
>> > > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
>> > > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
>> > > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
>> > > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
>> > > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
>> > > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
>> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
>> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> > > KDB: enter: panic
>> > >
>> > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> > > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
>> > >pc_curthread)));
>> > >
>> > > One difference was that in the second and third panics the fault virtual
>> > > address was 0x0.  But the backtrace was the same.
>> > >
>> > > Relevant info from the info.x files:
>> > > Architecture: amd64
>> > > Architecture Version: 2
>> > > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
>> > > 5 09:58:55 CEST 2021
>> > >
>> > > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
>> > >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>> > >   AMD Features=0x2e500800
>> > >   AMD Features2=0x35c233ff
>> > >   AMD Extended Feature Extensions ID EBX=0x1007
>> > >
>> > > I have 16GiB of memory in the box.
>> > >
>> > > The panic occurred while copying files from an internal SATA SSD to a
>> > > SATA 8TB disk in an external USB3 docking station.  The panic seems to
>> > > occur quite quickly, after only a few files have been copied.
>> > >
>> > > swap is on a different internal disk.
>> > >
>> > > I can poke around in the crash dumps with kgdb if anyone wants more
>> > > information.
>> >
>> > Are you running with invariants configured in the kernel?  If not,
>> > please try to reproduce this in a kernel with
>> >
>> > options INVARIANT_SUPPORT
>> > options INVARIANTS
>> >
>> > configured.
>> >
>> > A stack trace with line numbers would also be helpful.
>>
>> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
>> soon as I turn on the external USB3 disk.  No user disk access required.
>>
>> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
>> 8 09:34:32 CEST 2021
>>
>> Here is the kgdb backtrace:
>>
>> Unread portion of the kernel message buffer:
>> panic: Duplicate free of 0xf800356b9000 from zone
>> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
>> cpuid = 8
>> time = 1623140519
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe00c5f398c0
>> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
>> panic() at panic+0x43/frame 0xfe00c5f39970
>> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
>> uma_zfree_arg() at 

Re: kernel panic while copying files

2021-06-08 Thread Warner Losh
On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn  wrote:

> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston  wrote:
>
> > On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> > > I've seen this panic three times in the last two days:
> > >
> > > [first panic]
> > > Unread portion of the kernel message buffer:
> > >
> > >
> > > Fatal trap 12: page fault while in kernel mode
> > > cpuid = 3; apic id = 03
> > > fault virtual address   = 0x801118000
> > > fault code  = supervisor write data, page not present
> > > instruction pointer = 0x20:0x808d2212
> > > stack pointer   = 0x28:0xfe00dbc8c760
> > > frame pointer   = 0x28:0xfe00dbc8c7a0
> > > code segment= base 0x0, limit 0xf, type 0x1b
> > > = DPL 0, pres 1, long 1, def32 0, gran 1
> > > processor eflags= interrupt enabled, resume, IOPL = 0
> > > current process = 28 (dom0)
> > > trap number = 12
> > > panic: page fault
> > > cpuid = 3
> > > time = 1622963058
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
> > > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> > > panic() at panic+0x43/frame 0xfe00dbc8c4c0
> > > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> > > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> > > trap() at trap+0x253/frame 0xfe00dbc8c690
> > > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> > > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
> > > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> > > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> > > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
> > > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> > > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
> > > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> > > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> > > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> > > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> > > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > > KDB: enter: panic
> > >
> > > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> > >pc_curthread)));
> > >
> > > One difference was that in the second and third panics the fault virtual
> > > address was 0x0.  But the backtrace was the same.
> > >
> > > Relevant info from the info.x files:
> > > Architecture: amd64
> > > Architecture Version: 2
> > > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
> > > 5 09:58:55 CEST 2021
> > >
> > > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
> > >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
> > >   AMD Features=0x2e500800
> > >   AMD Features2=0x35c233ff
> > >   AMD Extended Feature Extensions ID EBX=0x1007
> > >
> > > I have 16GiB of memory in the box.
> > >
> > > The panic occurred while copying files from an internal SATA SSD to a
> > > SATA 8TB disk in an external USB3 docking station.  The panic seems to
> > > occur quite quickly, after only a few files have been copied.
> > >
> > > swap is on a different internal disk.
> > >
> > > I can poke around in the crash dumps with kgdb if anyone wants more
> > > information.
> >
> > Are you running with invariants configured in the kernel?  If not,
> > please try to reproduce this in a kernel with
> >
> > options INVARIANT_SUPPORT
> > options INVARIANTS
> >
> > configured.
> >
> > A stack trace with line numbers would also be helpful.
>
> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
> soon as I turn on the external USB3 disk.  No user disk access required.
>
> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here is the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xf800356b9000 from zone
> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> panic() at panic+0x43/frame 0xfe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> fork_trampoline() at fork_trampoline+0xe/frame 

Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 11:04:33 +0200
Mateusz Guzik  wrote:

> Given how easy it is to reproduce perhaps you can spend a little bit
> of time narrowing it down to a specific commit. You can do it with
> git-bisect.
> 

Ok, I'll give it a try.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Hans Petter Selasky

On 6/8/21 1:34 PM, Gary Jennejohn wrote:

> Fields in the ccb like periph_name, unit_number and dev_name are filled
> with zeroes.


Smells like a double free, as the panic message indicates, but it would
be nice to know exactly which driver is doing this, whether it is "ATA"
or "UMASS", so to speak.
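The "Duplicate free" panic comes from INVARIANTS bookkeeping that, roughly, records which items of a slab are currently allocated and checks that record in uma_dbg_free(). A toy user-space model of the idea is sketched below; the zone layout, names and return codes are hypothetical and far simpler than UMA's, and the 544-byte item size is taken from the item_dtor() frames in the backtraces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ITEM_SIZE	544	/* size seen in the item_dtor() frames */
#define TOY_NITEMS	8

enum toy_free_result {
	TOY_FREE_OK,
	TOY_FREE_DUPLICATE,	/* item is not currently allocated */
	TOY_FREE_UNALIGNED,	/* inside the slab but not at an item start */
	TOY_FREE_FOREIGN,	/* does not belong to this slab at all */
};

/* Hypothetical one-slab "zone" with one debug flag per item. */
struct toy_zone {
	unsigned char	slab[TOY_NITEMS][TOY_ITEM_SIZE];
	bool		allocated[TOY_NITEMS];
};

static void *
toy_zalloc(struct toy_zone *z)
{
	for (size_t i = 0; i < TOY_NITEMS; i++) {
		if (!z->allocated[i]) {
			z->allocated[i] = true;
			return z->slab[i];
		}
	}
	return NULL;	/* zone exhausted */
}

static enum toy_free_result
toy_zfree(struct toy_zone *z, void *item)
{
	uintptr_t base = (uintptr_t)z->slab;
	uintptr_t p = (uintptr_t)item;
	size_t idx;

	if (p < base || p >= base + sizeof(z->slab))
		return TOY_FREE_FOREIGN;
	if ((p - base) % TOY_ITEM_SIZE != 0)
		return TOY_FREE_UNALIGNED;
	idx = (p - base) / TOY_ITEM_SIZE;
	if (!z->allocated[idx])
		return TOY_FREE_DUPLICATE;
	z->allocated[idx] = false;
	return TOY_FREE_OK;
}
```

The kernel panics where this model returns TOY_FREE_DUPLICATE or TOY_FREE_UNALIGNED. The check only shows that the item was freed while already free (or at a bad address), not which caller freed it first, which is why knowing the driver involved still matters.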


Maybe you need to do a quick bisect, as suggested.

--HPS



Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Tue, 8 Jun 2021 11:20:37 +0200
Hans Petter Selasky  wrote:

> On 6/8/21 11:04 AM, Mateusz Guzik wrote:
> > Apparently caused by recent changes to CAM.
> > 
> > Let me know if you want more information.  
> 
> Maybe you can print the *ccb being freed and figure out which device
> it belongs to.
> 

I'm now running a kernel without INVARIANTS, so I can check:

Jun  8 13:23:52 ernst kernel: ugen2.4:  at usbus2
Jun  8 13:23:52 ernst kernel: umass0 on uhub5
Jun  8 13:23:52 ernst kernel: umass0:  on usbus2
Jun  8 13:23:52 ernst kernel: umass0:  SCSI over Bulk-Only; quirks = 0xc101
Jun  8 13:23:52 ernst kernel: umass0:6:0: Attached to scbus6
Jun  8 13:24:37 ernst kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Jun  8 13:24:37 ernst kernel: da0:  Fixed Direct Access SPC-4 SCSI device
Jun  8 13:24:37 ernst kernel: da0: Serial Number 0001
Jun  8 13:24:37 ernst kernel: da0: 400.000MB/s transfers
Jun  8 13:24:37 ernst kernel: da0: 7630885MB (15628053168 512 byte sectors)
Jun  8 13:24:37 ernst kernel: da0: quirks=0x2

This is the only USB device I turned on.

Fields in the ccb like periph_name, unit_number and dev_name are filled
with zeroes.

The ccb is enormous and really hard to parse.

-- 
Gary Jennejohn



Re: kernel panic while copying files

2021-06-08 Thread Hans Petter Selasky

On 6/8/21 11:04 AM, Mateusz Guzik wrote:

> Apparently caused by recent changes to CAM.
>
> Let me know if you want more information.


Maybe you can print the *ccb being freed and figure out which device it 
belongs to.
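A small helper along the lines suggested here might pull the identifying fields out of a CCB-like object and notice when they have been zeroed, as Gary later found. The struct below is a hypothetical stand-in carrying only the three fields Gary names, with an assumed name length; in the real `union ccb` these fields live in specific member structures, so a real check would have to look at the right member.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_NAME_LEN	16	/* hypothetical size, not taken from CAM headers */

/* Hypothetical stand-in carrying the fields named in the thread. */
struct toy_ccb_ident {
	char		periph_name[TOY_NAME_LEN];	/* e.g. "da" */
	unsigned int	unit_number;			/* e.g. 0 */
	char		dev_name[TOY_NAME_LEN];		/* e.g. "umass-sim" */
};

/* Returns 1 if every byte of the object is zero, else 0. */
static int
all_zero(const void *p, size_t len)
{
	const unsigned char *b = p;

	for (size_t i = 0; i < len; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}

/* Writes a label such as "da0 on umass-sim" into buf. */
static void
describe_ccb(const struct toy_ccb_ident *id, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	if (all_zero(id, sizeof(*id)))
		snprintf(buf, buflen, "unidentified (fields zeroed)");
	else
		/* Precision bounds each name in case it is not NUL-terminated. */
		snprintf(buf, buflen, "%.*s%u on %.*s",
		    TOY_NAME_LEN, id->periph_name, id->unit_number,
		    TOY_NAME_LEN, id->dev_name);
}
```

When the fields are all zero, as in Gary's dump, the CCB itself cannot say which driver it belonged to, and the device has to be inferred from elsewhere (here, the only attached USB disk, da0 on umass-sim0).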


--HPS



Re: kernel panic while copying files

2021-06-08 Thread Mateusz Guzik
Given how easy it is to reproduce perhaps you can spend a little bit
of time narrowing it down to a specific commit. You can do it with
git-bisect.

On 6/8/21, Gary Jennejohn  wrote:
> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston  wrote:
>
>> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
>> > I've seen this panic three times in the last two days:
>> >
>> > [first panic]
>> > Unread portion of the kernel message buffer:
>> >
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 3; apic id = 03
>> > fault virtual address   = 0x801118000
>> > fault code  = supervisor write data, page not present
>> > instruction pointer = 0x20:0x808d2212
>> > stack pointer   = 0x28:0xfe00dbc8c760
>> > frame pointer   = 0x28:0xfe00dbc8c7a0
>> > code segment= base 0x0, limit 0xf, type 0x1b
>> > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags= interrupt enabled, resume, IOPL = 0
>> > current process = 28 (dom0)
>> > trap number = 12
>> > panic: page fault
>> > cpuid = 3
>> > time = 1622963058
>> > KDB: stack backtrace:
>> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> > 0xfe00dbc8c410
>> > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
>> > panic() at panic+0x43/frame 0xfe00dbc8c4c0
>> > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
>> > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
>> > trap() at trap+0x253/frame 0xfe00dbc8c690
>> > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
>> > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp =
>> > 0xfe00dbc8c7a0 ---
>> > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
>> > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
>> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame
>> > 0xfe00dbc8c830
>> > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
>> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame
>> > 0xfe00dbc8c8b0
>> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
>> > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
>> > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
>> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
>> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> > KDB: enter: panic
>> >
>> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
>> > pcpu,
>> >   pc_curthread)));
>> >
>> > One difference was that in the second and third panics the fault
>> > virtual
>> > address was 0x0.  But the backtrace was the same.
>> >
>> > Relevant info from the info.x files:
>> > Architecture: amd64
>> > Architecture Version: 2
>> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat
>> > Jun
>> > 5 09:58:55 CEST 2021
>> >
>> > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz
>> > K8-class CPU)
>> >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1
>> > Stepping=1
>> >   AMD Features=0x2e500800
>> >   AMD
>> > Features2=0x35c233ff
>> >   AMD Extended Feature Extensions ID
>> > EBX=0x1007
>> >
>> > I have 16GiB of memory in the box.
>> >
>> > The panic occurred while copying files from an internal SATA SSD to a
>> > SATA 8TB disk in an external USB3 docking station.  The panic seems to
>> > occur quite quickly, after only a few files have been copied.
>> >
>> > swap is on a different internal disk.
>> >
>> > I can poke around in the crash dumps with kgdb if anyone wants more
>> > information.
>>
>> Are you running with invariants configured in the kernel?  If not,
>> please try to reproduce this in a kernel with
>>
>> options INVARIANT_SUPPORT
>> options INVARIANTS
>>
>> configured.
>>
>> A stack trace with line numbers would also be helpful.
>
> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
> soon as I turn on the external USB3 disk.  No user disk access required.
>
> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here is the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xf800356b9000 from zone
> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> panic() at panic+0x43/frame 0xfe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 

Re: kernel panic while copying files

2021-06-08 Thread Gary Jennejohn
On Mon, 7 Jun 2021 16:54:11 -0400
Mark Johnston  wrote:

> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> > I've seen this panic three times in the last two days:
> > 
> > [first panic]
> > Unread portion of the kernel message buffer:
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 3; apic id = 03
> > fault virtual address   = 0x801118000
> > fault code  = supervisor write data, page not present
> > instruction pointer = 0x20:0x808d2212
> > stack pointer   = 0x28:0xfe00dbc8c760
> > frame pointer   = 0x28:0xfe00dbc8c7a0
> > code segment= base 0x0, limit 0xf, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags= interrupt enabled, resume, IOPL = 0
> > current process = 28 (dom0)
> > trap number = 12
> > panic: page fault
> > cpuid = 3
> > time = 1622963058
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
> > 0xfe00dbc8c410
> > vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> > panic() at panic+0x43/frame 0xfe00dbc8c4c0
> > trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> > trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> > trap() at trap+0x253/frame 0xfe00dbc8c690
> > calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> > --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 
> > 0xfe00dbc8c7a0 ---
> > zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> > bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 
> > 0xfe00dbc8c830
> > zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> > vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> > fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> > 
> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> >pc_curthread)));
> > 
> > One difference was that in the second and third panics the fault virtual
> > address was 0x0.  But the backtrace was the same.
> > 
> > Relevant info from the info.x files:
> > Architecture: amd64
> > Architecture Version: 2
> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
> > 5 09:58:55 CEST 2021
> > 
> > CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class 
> > CPU)
> >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
> >   AMD Features=0x2e500800
> >   AMD 
> > Features2=0x35c233ff
> >   AMD Extended Feature Extensions ID 
> > EBX=0x1007
> > 
> > I have 16GiB of memory in the box.
> > 
> > The panic occurred while copying files from an internal SATA SSD to a
> > SATA 8TB disk in an external USB3 docking station.  The panic seems to
> > occur quite quickly, after only a few files have been copied.
> > 
> > swap is on a different internal disk.
> > 
> > I can poke around in the crash dumps with kgdb if anyone wants more
> > information.  
> 
> Are you running with invariants configured in the kernel?  If not,
> please try to reproduce this in a kernel with
> 
> options INVARIANT_SUPPORT
> options INVARIANTS
> 
> configured.
> 
> A stack trace with line numbers would also be helpful.

Thanks for the hint.  After enabling INVARIANTS the kernel panics as
soon as I turn on the external USB3 disk.  No user disk access required.

Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
8 09:34:32 CEST 2021

Here the kgdb backtrace:

Unread portion of the kernel message buffer:
panic: Duplicate free of 0xf800356b9000 from zone 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
cpuid = 8
time = 1623140519
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c5f398c0
vpanic() at vpanic+0x181/frame 0xfe00c5f39910
panic() at panic+0x43/frame 0xfe00c5f39970
uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at 
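The INVARIANTS panic above ("Duplicate free of ... from zone ...(da_ccb)") is raised by UMA's debugging checks on free. As a rough mental model only, here is a user-space toy in C, loosely modelled on the checks in uma_core.c:uma_dbg_free(): the freed pointer is mapped to an item index within its slab, rejected if it does not point at the start of an item ("Unaligned free"), and rejected if that item is already marked free ("Duplicate free"). All toy_* names, the fixed 64-item slab and the single 64-bit bitmap are assumptions for illustration, not the real UMA data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy slab: SLAB_ITEMS equally sized items starting at base (assumption). */
#define SLAB_ITEMS 64

struct toy_slab {
    unsigned char *base;   /* address of item 0 */
    size_t item_size;      /* size of one item in bytes */
    uint64_t dbg_bits;     /* bit i set => item i is currently free */
};

enum toy_free_result {
    TOY_FREE_OK,
    TOY_FREE_INVALID,      /* pointer lies outside the slab */
    TOY_FREE_UNALIGNED,    /* pointer is not the start of an item */
    TOY_FREE_DUPLICATE     /* item is already marked free */
};

/* Start with every item allocated, i.e. no debug bits set. */
void toy_slab_init(struct toy_slab *s, unsigned char *base, size_t item_size)
{
    assert(item_size > 0);
    s->base = base;
    s->item_size = item_size;
    s->dbg_bits = 0;
}

/* Hand item `index` out again: clear its "free" bit. */
void toy_slab_alloc(struct toy_slab *s, size_t index)
{
    assert(index < SLAB_ITEMS);
    s->dbg_bits &= ~(UINT64_C(1) << index);
}

/* Debug checks on free, in this order: bounds, alignment, duplicate. */
enum toy_free_result toy_dbg_free(struct toy_slab *s, const void *item)
{
    uintptr_t p = (uintptr_t)item;
    uintptr_t base = (uintptr_t)s->base;

    if (p < base)
        return TOY_FREE_INVALID;
    size_t freei = (size_t)(p - base) / s->item_size;
    if (freei >= SLAB_ITEMS)
        return TOY_FREE_INVALID;
    if (base + freei * s->item_size != p)
        return TOY_FREE_UNALIGNED;

    uint64_t bit = UINT64_C(1) << freei;
    if (s->dbg_bits & bit)
        return TOY_FREE_DUPLICATE;
    s->dbg_bits |= bit;    /* remember that this item is now free */
    return TOY_FREE_OK;
}
```

Freeing the same item twice without an intervening toy_slab_alloc() returns TOY_FREE_DUPLICATE, the toy analogue of the da_ccb panic; a pointer into the middle of an item returns TOY_FREE_UNALIGNED.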

Re: kernel panic while copying files

2021-06-07 Thread Mark Johnston
On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> I've seen this panic three times in the last two days:
> 
> [first panic]
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address   = 0x801118000
> fault code  = supervisor write data, page not present
> instruction pointer = 0x20:0x808d2212
> stack pointer   = 0x28:0xfe00dbc8c760
> frame pointer   = 0x28:0xfe00dbc8c7a0
> code segment = base 0x0, limit 0xf, type 0x1b
>              = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 28 (dom0)
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1622963058
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
> vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
> panic() at panic+0x43/frame 0xfe00dbc8c4c0
> trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
> trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
> trap() at trap+0x253/frame 0xfe00dbc8c690
> calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
> --- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
> zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
> bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
> bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
> zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
> uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
> vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
> vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
> fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> 
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, pc_curthread)));
> 
> One difference was that in the second and third panics the fault virtual
> address was 0x0.  But the backtrace was the same.
> 
> Relevant info from the info.x files:
> Architecture: amd64
> Architecture Version: 2
> Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun 5 09:58:55 CEST 2021
> 
> CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   AMD Features=0x2e500800
>   AMD Features2=0x35c233ff
>   AMD Extended Feature Extensions ID EBX=0x1007
> 
> I have 16GiB of memory in the box.
> 
> The panic occurred while copying files from an internal SATA SSD to a
> SATA 8TB disk in an external USB3 docking station.  The panic seems to
> occur quite quickly, after only a few files have been copied.
> 
> Swap is on a different internal disk.
> 
> I can poke around in the crash dumps with kgdb if anyone wants more
> information.

Are you running with invariants configured in the kernel?  If not,
please try to reproduce this in a kernel with

options INVARIANT_SUPPORT
options INVARIANTS

configured.

A stack trace with line numbers would also be helpful.
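The point of rebuilding with INVARIANTS is that many of the kernel's consistency checks (KASSERT() and the UMA debugging checks among them) are compiled in only when that option is set; without it they expand to nothing. A toy C sketch of that compile-time gating, with hypothetical TOY_* names standing in for the real kernel macros (the real KASSERT() takes a parenthesized printf-style message and panics rather than counting failures):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Comment this out to compile the checks away, as a kernel built
 * without "options INVARIANTS" would.
 */
#define TOY_INVARIANTS 1

/* Failed checks so far; a real kernel would panic on the first one. */
static int toy_check_failures = 0;

#ifdef TOY_INVARIANTS
#define TOY_KASSERT(exp, msg) do {                              \
        if (!(exp)) {                                           \
            fprintf(stderr, "check failed: %s\n", (msg));       \
            toy_check_failures++;                               \
        }                                                       \
    } while (0)
#else
#define TOY_KASSERT(exp, msg) do { } while (0)
#endif

/* Hypothetical reference-counted object. */
struct toy_obj {
    int refs;
};

/* Drop one reference; releasing an unreferenced object is a bug. */
void toy_obj_release(struct toy_obj *o)
{
    TOY_KASSERT(o->refs > 0, "release of an object with no references");
    o->refs--;
}
```

With TOY_INVARIANTS defined, a second toy_obj_release() on an object whose count is already zero is caught; with it commented out, the same bug goes unnoticed and the count silently goes negative.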



kernel panic while copying files

2021-06-07 Thread Gary Jennejohn
I've seen this panic three times in the last two days:

[first panic]
Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x801118000
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x808d2212
stack pointer   = 0x28:0xfe00dbc8c760
frame pointer   = 0x28:0xfe00dbc8c7a0
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 28 (dom0)
trap number = 12
panic: page fault
cpuid = 3
time = 1622963058
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
panic() at panic+0x43/frame 0xfe00dbc8c4c0
trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
trap() at trap+0x253/frame 0xfe00dbc8c690
calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
--- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, pc_curthread)));
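The "time =" line in each panic is the wall-clock time in seconds since the Unix epoch. A small C helper (the name format_panic_time is hypothetical) turns it into a readable UTC date with ISO C gmtime() and strftime(); gmtime_r() is the thread-safe POSIX alternative:

```c
#include <assert.h>
#include <string.h>
#include <time.h>

/*
 * Format a panic "time =" value (seconds since the Unix epoch) as UTC.
 * Returns 0 on success, -1 if the time cannot be converted or buf is
 * too small for the formatted string.
 */
int format_panic_time(time_t t, char *buf, size_t len)
{
    struct tm *tm;

    assert(buf != NULL && len > 0);
    tm = gmtime(&t);              /* broken-down UTC time, static storage */
    if (tm == NULL)
        return -1;
    if (strftime(buf, len, "%Y-%m-%d %H:%M:%S UTC", tm) == 0)
        return -1;
    return 0;
}
```

For the first panic, time = 1622963058 formats as 2021-06-06 07:04:18 UTC, which fits the "last two days" before this message.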

One difference was that in the second and third panics the fault virtual
address was 0x0.  But the backtrace was the same.
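The "fault code" line is a decoded form of the x86 page-fault error code: bit 0 clear means the page was not present (set means a protection violation), bit 1 set means a write, bit 2 clear means supervisor (kernel) mode, and bit 4 set means an instruction fetch. A minimal decoder sketch that produces strings in the same shape as the panic output above; the macro and function names are hypothetical, and FreeBSD's own trap code handles more bits than this:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error code bits (hypothetical macro names). */
#define PF_PRESENT (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1) /* 0: read access, 1: write access */
#define PF_USER    (1u << 2) /* 0: supervisor (kernel) mode, 1: user mode */
#define PF_IFETCH  (1u << 4) /* 1: fault was an instruction fetch */

/*
 * Decode an error code into "<mode> <access> <kind>, <cause>".
 * Returns 0 on success, -1 if buf is too small.
 */
int decode_pf_code(uint32_t code, char *buf, size_t len)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%s %s %s, %s",
        (code & PF_USER) ? "user" : "supervisor",
        (code & PF_WRITE) ? "write" : "read",
        (code & PF_IFETCH) ? "instruction" : "data",
        (code & PF_PRESENT) ? "protection violation" : "page not present");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Error code 0x2 decodes to "supervisor write data, page not present", i.e. a kernel-mode write to an unmapped page. Together with a fault virtual address of 0x0 in the second and third panics, that is consistent with zone_release() writing through a NULL or corrupted pointer.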

Relevant info from the info.x files:
Architecture: amd64
Architecture Version: 2
Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun 5 09:58:55 CEST 2021

CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  AMD Features=0x2e500800
  AMD Features2=0x35c233ff
  AMD Extended Feature Extensions ID EBX=0x1007

I have 16GiB of memory in the box.

The panic occurred while copying files from an internal SATA SSD to a
SATA 8TB disk in an external USB3 docking station.  The panic seems to
occur quite quickly, after only a few files have been copied.

Swap is on a different internal disk.

I can poke around in the crash dumps with kgdb if anyone wants more
information.

-- 
Gary Jennejohn