Re: panic: Unaligned free (was: kernel panic while copying files)
On Wed, 7 Jul 2021 09:38:05 +0100
Edward Tomasz Napierała wrote:

> On 0705T1833, Gary Jennejohn wrote:
> > [big snip]
> > It looks like this patch fixes things.
> > [big snip]
>
> Perfect, I've committed the fix. Thank you!
>

Thanks to you! I built a new kernel as soon as I saw the commit and have been running it since yesterday.

-- 
Gary Jennejohn
Re: panic: Unaligned free (was: kernel panic while copying files)
On 0705T1833, Gary Jennejohn wrote:
> On Mon, 5 Jul 2021 15:04:48 +0100
> Edward Tomasz Napierała wrote:
> [..]
> It looks like this patch fixes things.
> [..]

Perfect, I've committed the fix. Thank you!
Re: panic: Unaligned free (was: kernel panic while copying files)
On Mon, 5 Jul 2021 15:04:48 +0100
Edward Tomasz Napierała wrote:

> On 0701T1330, Gary Jennejohn wrote:
> [big snip]
>
> I think the vm.debug.divisor simply masks the problem; the underlying
> bug is still there.
>
> Could you go back to the setup which panics, and then test the patch
> at https://reviews.freebsd.org/D31054? It fixes the scenario described
> by Warner.
>

It looks like this patch fixes things.

I used the default value vm.debug.divisor=1 and set both enable_uma_ccbs sysctls to 1 (which are now the default values on my system).

I used the 8TiB disk, which spins up very slowly and usually caused a panic very quickly - no panic with the patch.

Then, using dd to /dev/null (bs=1m), I transferred:

308755+0 records in
308755+0 records out
323753082880 bytes transferred in 1366.162410 secs (236979938 bytes/sec)

from the disk, so about 324GB (roughly 301.5GiB) without a panic.

-- 
Gary Jennejohn
Re: panic: Unaligned free (was: kernel panic while copying files)
On 0701T1330, Gary Jennejohn wrote:
> Gary Jennejohn wrote:
> > I noticed that the value of vm.debug.divisor affects what value is
> > returned in uma_core.c:uma_dbg_kskip(), so I decided to try a few
> > different values.
> >
> > The returned value is used to set skipdbg in uma_core.c:item_dtor().
> >
> > The default is vm.debug.divisor=1.
> >
> > vm.debug.divisor is only present when INVARIANTS is defined.
> >
> > skipdbg eventually affects the value of freei.
> >
> > With these values:
> > vm.debug.divisor: 0
> > kern.cam.da.enable_uma_ccbs: 1
> > I can turn on the disk and it comes up without a panic!
> >
> > However, I didn't try to do any large data transfers to the disk.
> >
> > So, it appears that at least vm.debug.divisor is a big factor in
> > whether or not a panic happens with INVARIANTS.
> >
>
> I decided to do a real test. So I built a kernel w/o INVARIANTS and
> installed it to /boot/test.
>
> Then I stuck a 160GB disk I had around into an external USB3 enclosure
> and put a filesystem on it.
>
> Then I booted the new kernel from /boot/test and set the sysctls as follows:
> kern.cam.da.enable_uma_ccbs: 1
> kern.cam.ada.enable_uma_ccbs: 1
>
> After that I plugged in the external USB3 enclosure and copied about
> 114GiB of data from an internal SSD to it - without a kernel panic:
> Filesystem    Size    Used   Avail Capacity  Mounted on
> /dev/da0p1    144G    114G     18G    86%    /mnt
>
> I'm pretty sure that's more than I could copy without a kernel panic
> prior to the recent changes made in cam and umass.
>
> My test may not be real proof that all bugs have been squashed, but it
> certainly seems to be a better situation than we had before.

I think the vm.debug.divisor simply masks the problem; the underlying
bug is still there.

Could you go back to the setup which panics, and then test the patch
at https://reviews.freebsd.org/D31054? It fixes the scenario described
by Warner.
Re: panic: Unaligned free (was: kernel panic while copying files)
Gary Jennejohn wrote:
> On Wed, 30 Jun 2021 10:35:14 -0600
> Warner Losh wrote:
>
> > On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote:
> >
> > > Sure. As can be seen from the uma zone being da_ccb, the panic
> > > resulted from setting kern.cam.da.enable_uma_ccbs=1.
> > > [big snip]
Re: panic: Unaligned free (was: kernel panic while copying files)
On Wed, 30 Jun 2021 10:35:14 -0600
Warner Losh wrote:

> On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote:
>
> > Sure. As can be seen from the uma zone being da_ccb, the panic
> > resulted from setting kern.cam.da.enable_uma_ccbs=1.
> > [big snip]
Re: panic: Unaligned free (was: kernel panic while copying files)
On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn wrote:

> Sure. As can be seen from the uma zone being da_ccb, the panic
> resulted from setting kern.cam.da.enable_uma_ccbs=1.
> [..]
Re: panic: Unaligned free (was: kernel panic while copying files)
On Wed, 30 Jun 2021 06:02:59 +0100
Graham Perrin wrote:

> On 29/06/2021 10:42, Gary Jennejohn wrote:
> > … panic is now the result of an unaligned free.
> >
> > panic: Unaligned free of 0xf800259e2800 from zone
> > 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
> >
> > I have the crash dump and a debug kernel in case anyone wants more info.
> Can you post the backtrace etc. here? Thanks
>

Sure. As can be seen from the uma zone being da_ccb, the panic resulted from setting kern.cam.da.enable_uma_ccbs=1.

Unread portion of the kernel message buffer:
panic: Unaligned free of 0xf800259e2800 from zone 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
cpuid = 2
time = 1624958650
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe00c62687a0
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe00c6268850
vpanic() at vpanic+0x227/frame 0xfe00c62688f0
panic() at panic+0x4e/frame 0xfe00c6268950
uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfe00c62689a0
item_dtor() at item_dtor+0x7c/frame 0xfe00c62689e0
uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfe00c6268a50
uma_zfree() at uma_zfree+0x23/frame 0xfe00c6268a70
xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfe00c6268a90
camperiphdone() at camperiphdone+0x211/frame 0xfe00c6268ae0
xpt_done_process() at xpt_done_process+0x550/frame 0xfe00c6268b40
xpt_done_td() at xpt_done_td+0x1c0/frame 0xfe00c6268b80
fork_exit() at fork_exit+0x117/frame 0xfe00c6268bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c6268bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
399             dumptid = curthread->td_tid;
(kgdb) bt
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
#1  0x804d5dd7 in db_dump (dummy=-2138843371, dummy2=false, dummy3=-1,
    dummy4=0xfe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
#2  0x804d5bf4 in db_command (
    last_cmdp=0x8114ce80 , cmd_table=0x0, dopager=1)
    at /usr/src/sys/ddb/db_command.c:482
#3  0x804d583c in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:535
#4  0x804da27c in db_trap (type=3, code=0)
    at /usr/src/sys/ddb/db_main.c:270
#5  0x8083df9d in kdb_trap (type=3, code=0, tf=0xfe00c6268770)
    at /usr/src/sys/kern/subr_kdb.c:727
#6  0x80d31494 in trap (frame=0xfe00c6268770)
    at /usr/src/sys/amd64/amd64/trap.c:604
#7  0x80d32628 in trap_check (frame=0xfe00c6268770)
    at /usr/src/sys/amd64/amd64/trap.c:664
#8
#9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
#10 0x8083d3d0 in kdb_enter (why=0x80e0355b "panic",
    msg=0x80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
#11 0x807d1725 in vpanic (
    fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)",
    ap=0xfe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
#12 0x807d120e in panic (
    fmt=0x80dbca46 "Unaligned free of %p from zone %p(%s) slab %p(%d)")
    at /usr/src/sys/kern/kern_shutdown.c:843
#13 0x80c16a8c in uma_dbg_free (zone=0xfe00dc9d2000,
    slab=0xf800259e2fd8, item=0xf800259e2800)
    at /usr/src/sys/vm/uma_core.c:5659
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
    item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
    at /usr/src/sys/vm/uma_core.c:3418
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
    item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
#16 0x802e45d3 in uma_zfree (zone=0xfe00dc9d2000,
    item=0xf800259e2800) at /usr/src/sys/vm/uma.h:404
#17 0x802dc3c3 in xpt_free_ccb (free_ccb=0xf800259e2800)
    at /usr/src/sys/cam/cam_xpt.c:4676
#18 0x802dacf1 in camperiphdone (periph=0xf80025329b00,
    done_ccb=0xf80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
#19 0x802e4520 in xpt_done_process (ccb_h=0xf80025a24cc0)
    at /usr/src/sys/cam/cam_xpt.c:5493
#20 0x802e68e0 in xpt_done_td (arg=0x81143700 )
    at /usr/src/sys/cam/cam_xpt.c:5548
#21 0x807673c7 in fork_exit (callout=0x802e6720 ,
    arg=0x81143700 , frame=0xfe00c6268c00)
    at /usr/src/sys/kern/kern_fork.c:1083
#22

[kgdb stuff removed]

(kgdb) down
#15 0x80c0ba60 in uma_zfree_arg (zone=0xfe00dc9d2000,
    item=0xf800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
4374            item_dtor(zone, item, cache_uz_size(cache), udata, SKIP_NONE);
(kgdb) down
#14 0x80c0c5dc in item_dtor (zone=0xfe00dc9d2000,
    item=0xf800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
    at /usr/src/sys/vm/uma_core.c:3418
3418            uma_dbg_free(zone, NULL, item);
(kgdb) p/x skipdbg
$26 = 0x0
(kgdb) p/x zone->uz_flags
$27 = 0x4100 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR)
panic: Unaligned free (was: kernel panic while copying files)
On 29/06/2021 10:42, Gary Jennejohn wrote:
> … panic is now the result of an unaligned free.
>
> panic: Unaligned free of 0xf800259e2800 from zone
> 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)
>
> I have the crash dump and a debug kernel in case anyone wants more info.

Can you post the backtrace etc. here? Thanks
Re: kernel panic while copying files
I was sort of hoping that all the recent changes made by imp@ in cam and umass might have fixed the cause of the kernel crash. Unfortunately not.

But there is a change - instead of a duplicate free, the panic is now the result of an unaligned free.

panic: Unaligned free of 0xf800259e2800 from zone 0xfe00dc9d2000(da_ccb) slab 0xf800259e2fd8(3)

I have the crash dump and a debug kernel in case anyone wants more info.

-- 
Gary Jennejohn
Re: kernel panic while copying files
On Sat, 12 Jun 2021 14:10:36 +0100
Edward Tomasz Napierała wrote:

> On 0610T1150, Gary Jennejohn wrote:
> [big snip]
>
> Thank you, I'm working on a fix. Meanwhile - does the current code
> cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
> If it doesn't, it probably doesn't require backing off, since 0 is
> the default, and will keep being the default until bugs such as this
> one are fixed.
>

No, with the sysctl set to 0 it works really well. I've been running it that way for several days and have transferred large amounts of data to an external USB3 disk with no problems.

I didn't mention it, but I also tested the reset kernel (with INVARIANTS) with the sysctl set to 0, and the kernel did not panic.

I've had kern.cam.ada.enable_uma_ccbs set to 1 the whole time and never saw any problems.

I agree: as long as the default is 0, all the code can stay in the tree.

-- 
Gary Jennejohn
Re: kernel panic while copying files
On 0610T1150, Gary Jennejohn wrote:
> On Tue, 8 Jun 2021 17:54:05 +0200
> Gary Jennejohn wrote:
>
> [big snip]
[..]
> So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
> which was the penultimate commit made by trasz to clear CCBs on the stack
> after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
> to allocate CCBs in UMA.
>
> Note that I only built the kernel and not world.
>
> I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
> but without the following commits for CCBs on the stack the kernel
> panicked during startup in AHCI.
>
> Anyway, this is the minimum set of changes relevant to the uma_ccbs
> story and also results in a panic identical to the one listed above
> when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
> disk.
>
> So, Warner is probably right and at least the da_uma_ccbs commits
> should be reverted until more research can be done on why the panic
> happens.
>
> The ada_uma_ccbs commits do not cause any problems in my experience and
> could probably be left in the kernel.

Thank you, I'm working on a fix. Meanwhile - does the current code
cause any problems with kern.cam.da.enable_uma_ccbs set to 0?
If it doesn't, it probably doesn't require backing off, since 0 is
the default, and will keep being the default until bugs such as this
one are fixed.
Re: kernel panic while copying files
On Tue, 8 Jun 2021 17:54:05 +0200
Gary Jennejohn wrote:

[big snip]
> Here's the kgdb backtrace with the -O0 kernel:
>
> (kgdb) bt
> #0  0x8081d706 in doadump (textdump=0)
>     at /usr/src/sys/kern/kern_shutdown.c:398
> #1  0x804ef15a in db_dump (dummy=-2138500043, dummy2=false, dummy3=-1,
>     dummy4=0xfe00c62a11b0 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0x804eef5f in db_command (
>     last_cmdp=0x8114d380 , cmd_table=0x0, dopager=1)
>     at /usr/src/sys/ddb/db_command.c:482
> #3  0x804eeb38 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:535
> #4  0x804f38ef in db_trap (type=3, code=0)
>     at /usr/src/sys/ddb/db_main.c:270
> #5  0x80891d02 in kdb_trap (type=3, code=0, tf=0xfe00c62a1680)
>     at /usr/src/sys/kern/subr_kdb.c:727
> #6  0x80dd53c3 in trap (frame=0xfe00c62a1680)
>     at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0x80dd6718 in trap_check (frame=0xfe00c62a1680)
>     at /usr/src/sys/amd64/amd64/trap.c:664
> #8
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0x808910d0 in kdb_enter (why=0x80eaaf0b "panic",
>     msg=0x80eaaf0b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0x8081dbfe in vpanic (
>     fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)",
>     ap=0xfe00c62a1850) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0x8081d6b0 in panic (
>     fmt=0x80e80f73 "Duplicate free of %p from zone %p(%s) slab %p(%d)")
>     at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0x80caaec5 in uma_dbg_free (zone=0xfe00dc9d9800,
>     slab=0xf80007ee0fd8, item=0xf80007ee)
>     at /usr/src/sys/vm/uma_core.c:5664
> #14 0x80c9faf5 in item_dtor (zone=0xfe00dc9d9800,
>     item=0xf80007ee, size=544, udata=0x0, skip=SKIP_NONE)
>     at /usr/src/sys/vm/uma_core.c:3418
> #15 0x80c9eec7 in uma_zfree_arg (zone=0xfe00dc9d9800,
>     item=0xf80007ee, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0x802e5a89 in uma_zfree (zone=0xfe00dc9d9800,
>     item=0xf80007ee) at /usr/src/sys/vm/uma.h:404
> #17 0x802dcfa6 in xpt_free_ccb (free_ccb=0xf80007ee)
>     at /usr/src/sys/cam/cam_xpt.c:4674
> #18 0x802db639 in camperiphdone (periph=0xf8005d68bd00,
>     done_ccb=0xf80007797cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0x802e59b6 in xpt_done_process (ccb_h=0xf80007797cc0)
>     at /usr/src/sys/cam/cam_xpt.c:5491
> #20 0x802e811e in xpt_done_td (arg=0x81143c00 )
>     at /usr/src/sys/cam/cam_xpt.c:5546
> #21 0x807ac0ea in fork_exit (callout=0x802e7f20 ,
>     arg=0x81143c00 , frame=0xfe00c62a1c00)
>     at /usr/src/sys/kern/kern_fork.c:1083
> #22
>

So, I did ``git reset --hard 8dc96b74edb844bb621afeba38fe4af104b13120'',
which was the penultimate commit made by trasz to clear CCBs on the stack
after he committed 3394d4239b85b5577845d9e6de4e97b18d3dba58, the change
to allocate CCBs in UMA.

Note that I only built the kernel and not world.

I tried to reset to 3394d4239b85b5577845d9e6de4e97b18d3dba58 itself,
but without the following commits for CCBs on the stack the kernel
panicked during startup in AHCI.

Anyway, this is the minimum set of changes relevant to the uma_ccbs
story and also results in a panic identical to the one listed above
when I set kern.cam.da.enable_uma_ccbs=1 and turn on the external USB
disk.

So, Warner is probably right and at least the da_uma_ccbs commits
should be reverted until more research can be done on why the panic
happens.

The ada_uma_ccbs commits do not cause any problems in my experience and
could probably be left in the kernel.

--
Gary Jennejohn
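In the earlier optimized trace, camperiphdone() appeared to call uma_zfree_arg() directly, and Warner attributed the missing frame to tail-call optimization; in the -O0 trace above, xpt_free_ccb() does show up as frame #17. The C sketch below is a hypothetical illustration of that shape only (release_item() and zone_free_item() are invented names, not CAM or UMA functions): release_item() ends with a call in tail position, which GCC and clang typically compile to a jump at -O2 ("sibling call" optimization), so a backtrace taken inside zone_free_item() may not show release_item() at all. Building with -O0 or -fno-optimize-sibling-calls keeps an ordinary call and therefore the frame.

```c
#include <assert.h>
#include <stdlib.h>

static int items_freed;	/* lets a caller check that the free happened */

/* noinline keeps each function a real call target, as in a kernel build. */
__attribute__((noinline)) static void zone_free_item(void *item)
{
	items_freed++;
	free(item);
}

/*
 * Hypothetical wrapper in the style of xpt_free_ccb(): the call below is
 * the last thing it does, i.e. a tail (sibling) call.  With optimization
 * the compiler may emit it as a jump, leaving no release_item() frame in
 * a backtrace taken from inside zone_free_item().
 */
__attribute__((noinline)) void release_item(void *item)
{
	assert(item != NULL);
	zone_free_item(item);
}
```

Comparing the assembly from `cc -O2 -S` with that from `cc -O2 -fno-optimize-sibling-calls -S` for release_item() is a quick way to see the difference; the exact instructions depend on the compiler version and target.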
Re: kernel panic while copying files
On Tue, 8 Jun 2021 06:27:04 -0600
Warner Losh wrote:

> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn wrote:
> [snip old stuff]
> > Here the kgdb backtrace:
> >
> > Unread portion of the kernel message buffer:
> > panic: Duplicate free of 0xf800356b9000 from zone
> > 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> > cpuid = 8
> > time = 1623140519
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> > 0xfe00c5f398c0
> > vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> > panic() at panic+0x43/frame 0xfe00c5f39970
> > uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> > uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> > camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> > xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> > xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> > fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> > fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> > KDB: enter: panic
> >
> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > 55          __asm("movq %%gs:%P1,%0" : "=r" (td) : "n"
> > (offsetof(struct pcpu,
> > (kgdb) bt
> > #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> > #1  doadump (textdump=textdump@entry=0)
> >     at /usr/src/sys/kern/kern_shutdown.c:399
> > #2  0x8040c39a in db_dump (dummy=,
> >     dummy2=, dummy3=, dummy4=)
> >     at /usr/src/sys/ddb/db_command.c:575
> > #3  0x8040c192 in db_command (last_cmdp=,
> >     cmd_table=, dopager=dopager@entry=1)
> >     at /usr/src/sys/ddb/db_command.c:482
> > #4  0x8040beed in db_command_loop ()
> >     at /usr/src/sys/ddb/db_command.c:535
> > #5  0x8040f616 in db_trap (type=, code=)
> >     at /usr/src/sys/ddb/db_main.c:270
> > #6  0x8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0,
> >     tf=, tf@entry=0xfe00c5f397f0)
> >     at /usr/src/sys/kern/subr_kdb.c:727
> > #7  0x809a4e96 in trap (frame=0xfe00c5f397f0)
> >     at /usr/src/sys/amd64/amd64/trap.c:604
> > #8
> > #9  kdb_enter (why=0x80a61a23 "panic", msg=)
> >     at /usr/src/sys/kern/subr_kdb.c:506
> > #10 0x806207a2 in vpanic (fmt=, ap=,
> >     ap@entry=0xfe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907
> > #11 0x80620533 in panic (
> >     fmt=0x80d635c8 ".\024\244\200\377\377\377\377")
> >     at /usr/src/sys/kern/kern_shutdown.c:843
> > #12 0x808e12b1 in uma_dbg_free (zone=0xfe00dcbdd800,
> >     slab=0xf800356b9fd8, item=0xf800356b9000)
> >     at /usr/src/sys/vm/uma_core.c:5664
> > #13 0x808d9de7 in item_dtor (zone=0xfe00dcbdd800,
> >     item=0xf800356b9000, size=544, udata=0x0, skip=SKIP_NONE)
> >     at /usr/src/sys/vm/uma_core.c:3418
> > #14 uma_zfree_arg (zone=0xfe00dcbdd800, item=0xf800356b9000,
> >     udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374
> > #15 0x802da503 in uma_zfree (zone=0x80d635c8 ,
> >     item=0x200) at /usr/src/sys/vm/uma.h:404
> >
>
> OK. This is a bad stack trace. camperiphdone doesn't call uma_zfree()...
> It does call xpt_free_ccb, though, and that's likely what's going wrong.
> And that matches the line numbers. Most likely this is llvm's tail call
> optimizations... Can you compile the kernel either -O0 or with
> -fno-optimize-sibling-calls? That will give a better call stack.
>
> However, it's likely the new UMA stuff trasz committed (or it's providing
> better diagnostics than the old malloc based code which seems more
> likely) that can be disabled by the tunable kern.cam.da.enable_uma_ccbs=0.
>
> The lines in question:
>         saved_ccb = (union ccb *)done_ccb->ccb_h.saved_ccb_ptr;
>         bcopy(saved_ccb, done_ccb, sizeof(*done_ccb));
>         xpt_free_ccb(saved_ccb);
>
> So we overwrite the done_ccb with the saved_ccb's contents and then free
> the saved ccb. That's likely OKish, though.
>
> We copy entire CCBs around in this code a lot, and I've not traced
> through it. But we're sending a scsi start unit in response to some
> error that is being reported via cam_periph_error()
>
> > #16 0x802d9117 in camperiphdone (periph=0xf800061e2c00,
> >     done_ccb=0xf800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427
> > #17 0x802dfebd in xpt_done_process (ccb_h=0xf800355d6cc0)
> >     at /usr/src/sys/cam/cam_xpt.c:5491
> > #18 0x802e1ec5 in xpt_done_td (
> >     arg=arg@entry=0x80d33d80 )
> >     at /usr/src/sys/cam/cam_xpt.c:5546
> > #19 0x805dad80 in fork_exit (callout=0x802e1dd0 ,
> >     arg=0x80d33d80 , frame=0xfe00c5f39c00)
> >     at /usr/src/sys/kern/kern_fork.c:1083
> > #20
> >
> > Apparently caused by recent changes to CAM.
> >
> > Let me know if you want more information.
> >
> > what's
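The lines Warner quotes follow a save, reuse, restore, free pattern, which is only safe if nothing allocator-specific travels along with a whole-structure copy. The C sketch below is a hypothetical, simplified model of one way such a copy could go wrong, offered as an illustration and not as the CAM code or the fix that was eventually committed: `struct req`, its `from_pool` field, the tiny pool, and every function name are invented. A `memcpy` of the whole object makes a heap-allocated backup claim to belong to the pool, so a free routine that trusts that field sends it to the wrong allocator; preserving the ownership field across both copies avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a CCB: from_pool records which allocator owns
 * the object; payload stands in for the rest of the request state. */
struct req {
	bool from_pool;
	int  payload;
};

/* A tiny fixed pool standing in for a UMA zone (an assumption, not UMA). */
#define POOL_SIZE 4
static struct req pool[POOL_SIZE];
static bool pool_used[POOL_SIZE];
static int foreign_frees;	/* "pool" frees of objects the pool never owned */

static struct req *pool_alloc(void)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!pool_used[i]) {
			pool_used[i] = true;
			pool[i].from_pool = true;
			return &pool[i];
		}
	}
	return NULL;
}

/* Frees according to the object's own ownership field. */
static void req_free(struct req *r)
{
	if (!r->from_pool) {
		free(r);
		return;
	}
	for (int i = 0; i < POOL_SIZE; i++) {
		if (r == &pool[i]) {
			assert(pool_used[i]);	/* a second free would trip this */
			pool_used[i] = false;
			return;
		}
	}
	foreign_frees++;	/* wrong allocator: count it and leak the object */
}

/* Whole-struct copy; optionally keep dst's ownership field intact. */
static void req_copy(struct req *dst, const struct req *src, bool keep_owner)
{
	bool owner = dst->from_pool;

	memcpy(dst, src, sizeof(*dst));
	if (keep_owner)
		dst->from_pool = owner;
}

/* Save orig into a heap backup, reuse orig for a recovery command,
 * restore orig from the backup, then free the backup. */
static int save_reuse_restore(struct req *orig, bool keep_owner)
{
	struct req *saved = calloc(1, sizeof(*saved));	/* from_pool == false */

	if (saved == NULL)
		return -1;
	req_copy(saved, orig, keep_owner);
	orig->payload = -1;		/* stand-in for the recovery command */
	req_copy(orig, saved, keep_owner);
	req_free(saved);
	return 0;
}
```

With `keep_owner` false, the restored object is intact but the backup is treated as a pool object, counted as a foreign free, and leaked; with it true, the backup goes back to `free()`.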
Re: kernel panic while copying files
On Tue, Jun 8, 2021 at 8:42 AM Gary Jennejohn wrote:

> On Tue, 8 Jun 2021 06:48:19 -0600
> Warner Losh wrote:
>
> > Sorry to reply to myself... had a thought as my brain rested while making
> > tea...
> >
> > I think we may need to consider reverting (or at least not yet enabling)
> > the uma stuff.
> >
>
> I tested and enabled the UMA CCB stuff immediately after trasz had
> committed it. I was able to copy files panic-free over USB until
> recently AFAICR.
>
> I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
> then and have never seen a problem there. Only with USB.
>

Yes. This specific bug only affects SCSI. And it only affects it when
there's an error that requires a restart. I've not yet had the time to
do an audit for where else the copying is done...

> I'll try booting a new kernel with the uma_ccb sysctl's set to 0
> and see what happens.
>
> BTW I now have a kernel compiled with -O0 ready to test.
>

Great!

Warner
Re: kernel panic while copying files
On Tue, 8 Jun 2021 06:48:19 -0600
Warner Losh wrote:

> Sorry to reply to myself... had a thought as my brain rested while making
> tea...
>
> I think we may need to consider reverting (or at least not yet enabling)
> the uma stuff.
>

I tested and enabled the UMA CCB stuff immediately after trasz had
committed it. I was able to copy files panic-free over USB until
recently AFAICR.

I also have had the kern.cam.ada.enable_uma_ccbs=1 set since
then and have never seen a problem there. Only with USB.

I'll try booting a new kernel with the uma_ccb sysctl's set to 0
and see what happens.

BTW I now have a kernel compiled with -O0 ready to test.

[snip lots of extraneous stuff]

--
Gary Jennejohn
Re: kernel panic while copying files
Sorry to reply to myself... had a thought as my brain rested while making
tea...

I think we may need to consider reverting (or at least not yet enabling)
the uma stuff.

On Tue, Jun 8, 2021 at 6:27 AM Warner Losh wrote:

> On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn wrote:
>
>> On Mon, 7 Jun 2021 16:54:11 -0400
>> Mark Johnston wrote:
>>
>> [snip]
Re: kernel panic while copying files
On Tue, Jun 8, 2021 at 2:47 AM Gary Jennejohn wrote:

> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston wrote:
>
> [snip]
Re: kernel panic while copying files
On Tue, 8 Jun 2021 11:04:33 +0200
Mateusz Guzik wrote:

> Given how easy it is to reproduce perhaps you can spend a little bit
> of time narrowing it down to a specific commit. You can do it with
> git-bisect.
>

Ok, I'll give it a try.

--
Gary Jennejohn
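git bisect is a binary search over the commit history: each step tests the commit halfway between the last known-good and the first known-bad one, so N candidate commits need only about log2(N) kernel builds. The C sketch below models just that search. `threshold_oracle()` is a hypothetical stand-in for "build this commit, boot it, and turn on the USB disk", and the sketch assumes the first commit is good, the last is bad, and the bug stays once it has been introduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Oracle: "does a kernel built at commit index i show the bug?"  In real
 * life each call is a kernel build, boot and test. */
typedef bool (*is_bad_fn)(size_t commit, void *ctx);

/*
 * Returns the index of the first bad commit in [0, n), assuming n >= 2,
 * commit 0 is good, commit n-1 is bad, and every commit after the first
 * bad one is also bad.  *tests receives the number of oracle calls.
 */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, int *tests)
{
	size_t good = 0, bad = n - 1;

	*tests = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		(*tests)++;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Hypothetical demo oracle: the bug appears at commit *(size_t *)ctx. */
static bool threshold_oracle(size_t commit, void *ctx)
{
	return commit >= *(const size_t *)ctx;
}
```

For 100 candidate commits the loop makes at most 7 oracle calls, which is why bisecting is usually cheaper than trying commits one by one, even when every step means building a kernel.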
Re: kernel panic while copying files
On 6/8/21 1:34 PM, Gary Jennejohn wrote:
> Fields in the ccb like periph_name, unit_number and dev_name are
> filled with zeroes.

Smells like a double free, like the panic message indicates, but it
would be nice to know exactly which driver is doing this, if it is
"ATA" or "UMASS", so to speak.

Maybe you need to do a quick bisect, like suggested.

--HPS
Re: kernel panic while copying files
On Tue, 8 Jun 2021 11:20:37 +0200
Hans Petter Selasky wrote:

> On 6/8/21 11:04 AM, Mateusz Guzik wrote:
> > Apparently caused by recent changes to CAM.
> >
> > Let me know if you want more information.
>
> Maybe you can print the *ccb being freed and figure out which device
> it belongs to.
>

I'm now running a kernel without INVARIANTS, so I can check:

Jun 8 13:23:52 ernst kernel: ugen2.4: at usbus2
Jun 8 13:23:52 ernst kernel: umass0 on uhub5
Jun 8 13:23:52 ernst kernel: umass0: on usbus2
Jun 8 13:23:52 ernst kernel: umass0: SCSI over Bulk-Only; quirks = 0xc101
Jun 8 13:23:52 ernst kernel: umass0:6:0: Attached to scbus6
Jun 8 13:24:37 ernst kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Jun 8 13:24:37 ernst kernel: da0: Fixed Direct Access SPC-4 SCSI device
Jun 8 13:24:37 ernst kernel: da0: Serial Number 0001
Jun 8 13:24:37 ernst kernel: da0: 400.000MB/s transfers
Jun 8 13:24:37 ernst kernel: da0: 7630885MB (15628053168 512 byte sectors)
Jun 8 13:24:37 ernst kernel: da0: quirks=0x2

The only USB device which I turned on.

Fields in the ccb like periph_name, unit_number and dev_name are
filled with zeroes.

The ccb is enormous and really hard to parse.

--
Gary Jennejohn
Re: kernel panic while copying files
On 6/8/21 11:04 AM, Mateusz Guzik wrote:
> Apparently caused by recent changes to CAM.
>
> Let me know if you want more information.

Maybe you can print the *ccb being freed and figure out which device it
belongs to.

--HPS
Re: kernel panic while copying files
Given how easy it is to reproduce perhaps you can spend a little bit
of time narrowing it down to a specific commit. You can do it with
git-bisect.

On 6/8/21, Gary Jennejohn wrote:
> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston wrote:
>
> [snip]
Re: kernel panic while copying files
On Mon, 7 Jun 2021 16:54:11 -0400
Mark Johnston wrote:

> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> > I've seen this panic three times in the last two days:
> >
> > [snip]
>
> Are you running with invariants configured in the kernel? If not,
> please try to reproduce this in a kernel with
>
> options INVARIANT_SUPPORT
> options INVARIANTS
>
> configured.
>
> A stack trace with line numbers would also be helpful.

Thanks for the hint. After enabling INVARIANTS the kernel panics as
soon as I turn on the external USB3 disk. No user disk access required.

Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
8 09:34:32 CEST 2021

Here is the kgdb backtrace:

Unread portion of the kernel message buffer:
panic: Duplicate free of 0xf800356b9000 from zone 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
cpuid = 8
time = 1623140519
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c5f398c0
vpanic() at vpanic+0x181/frame 0xfe00c5f39910
panic() at panic+0x43/frame 0xfe00c5f39970
uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55          __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at
Re: kernel panic while copying files
On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
> I've seen this panic three times in the last two days:
>
> [snip]
>
> Relevant info from the info.x files:
> Architecture: amd64
> Architecture Version: 2
> Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun
> 5 09:58:55 CEST 2021
>
> CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
>   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
>   AMD Features=0x2e500800
>   AMD Features2=0x35c233ff
>   AMD Extended Feature Extensions ID EBX=0x1007
>
> I have 16GiB of memory in the box.
>
> The panic occurred while copying files from an internal SATA SSD to a
> SATA 8TB disk in an external USB3 docking station. The panic seems to
> occur quite quickly, after only a few files have been copied.
>
> swap is on a different internal disk.
>
> I can poke around in the crash dumps with kgdb if anyone wants more
> information.

Are you running with invariants configured in the kernel? If not,
please try to reproduce this in a kernel with

options INVARIANT_SUPPORT
options INVARIANTS

configured.

A stack trace with line numbers would also be helpful.
kernel panic while copying files
I've seen this panic three times in the last two days:

[first panic]
Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x801118000
fault code = supervisor write data, page not present
instruction pointer = 0x20:0x808d2212
stack pointer = 0x28:0xfe00dbc8c760
frame pointer = 0x28:0xfe00dbc8c7a0
code segment = base 0x0, limit 0xf, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 28 (dom0)
trap number = 12
panic: page fault
cpuid = 3
time = 1622963058
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00dbc8c410
vpanic() at vpanic+0x181/frame 0xfe00dbc8c460
panic() at panic+0x43/frame 0xfe00dbc8c4c0
trap_fatal() at trap_fatal+0x387/frame 0xfe00dbc8c520
trap_pfault() at trap_pfault+0x4f/frame 0xfe00dbc8c580
trap() at trap+0x253/frame 0xfe00dbc8c690
calltrap() at calltrap+0x8/frame 0xfe00dbc8c690
--- trap 0xc, rip = 0x808d2212, rsp = 0xfe00dbc8c760, rbp = 0xfe00dbc8c7a0 ---
zone_release() at zone_release+0x1f2/frame 0xfe00dbc8c7a0
bucket_drain() at bucket_drain+0xda/frame 0xfe00dbc8c7d0
bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame 0xfe00dbc8c830
zone_reclaim() at zone_reclaim+0x162/frame 0xfe00dbc8c880
uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame 0xfe00dbc8c8b0
vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfe00dbc8cb70
vm_pageout() at vm_pageout+0x21e/frame 0xfe00dbc8cbb0
fork_exit() at fork_exit+0x7e/frame 0xfe00dbc8cbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00dbc8cbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, pc_curthread)));

One difference was that in the second and third panics the fault virtual
address was 0x0. But the backtrace was the same.

Relevant info from the info.x files:
Architecture: amd64
Architecture Version: 2
Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat Jun 5 09:58:55 CEST 2021

CPU: AMD Ryzen 5 1600 Six-Core Processor (3194.09-MHz K8-class CPU)
Origin="AuthenticAMD" Id=0x800f11 Family=0x17 Model=0x1 Stepping=1
AMD Features=0x2e500800
AMD Features2=0x35c233ff
AMD Extended Feature Extensions ID EBX=0x1007

I have 16GiB of memory in the box.

The panic occurred while copying files from an internal SATA SSD to a
SATA 8TB disk in an external USB3 docking station. The panic seems to
occur quite quickly, after only a few files have been copied.

swap is on a different internal disk.

I can poke around in the crash dumps with kgdb if anyone wants more
information.

--
Gary Jennejohn