Re: Cam panic on r271170

2014-09-25 Thread Bryan Drewery
On 9/17/2014 10:39 AM, Bryan Drewery wrote:
> On 9/16/2014 9:28 PM, Bryan Drewery wrote:
>> I've been getting this quite frequently on head recently. I have dumps
>> if anyone is interested in more information.
>>
>>> Fatal trap 9: general protection fault while in kernel mode
>>> cpuid = 10; Memory modified after free 0xf8003e0b0800(2040)
>>> val= @ 0xf8003e0b0808
>>> apanic: Most recently used by CAM CCB
>>>
>>> cpuid = 6
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfe124735b4c0
>>> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe124735b570
>>> vpanic() at vpanic+0x189/frame 0xfe124735b5f0
>>> panic() at panic+0x43/frame 0xfe124735b650
>>> mtrash_ctor() at mtrash_ctor+0x8a/frame 0xfe124735b680
>>> uma_zalloc_arg() at uma_zalloc_arg+0x4f1/frame 0xfe124735b6f0
>>> malloc() at malloc+0x192/frame 0xfe124735b740
>>> xpt_run_allocq() at xpt_run_allocq+0xb5/frame 0xfe124735b780
>>> adastrategy() at adastrategy+0x117/frame 0xfe124735b7b0
>>> g_io_request() at g_io_request+0x3b7/frame 0xfe124735b810
>>> g_part_start() at g_part_start+0x2b7/frame 0xfe124735b890
>>> g_io_request() at g_io_request+0x3b7/frame 0xfe124735b8f0
>>> g_io_request() at g_io_request+0x3b7/frame 0xfe124735b950
>>> vdev_geom_io_start() at vdev_geom_io_start+0x137/frame 0xfe124735b970
>>> zio_vdev_io_start() at zio_vdev_io_start+0x49f/frame 0xfe124735b9d0
>>> zio_execute() at zio_execute+0x204/frame 0xfe124735ba30
>>> vdev_queue_io_done() at vdev_queue_io_done+0x180/frame 0xfe124735ba80
>>> zio_vdev_io_done() at zio_vdev_io_done+0x11d/frame 0xfe124735bac0
>>> zio_execute() at zio_execute+0x204/frame 0xfe124735bb20
>>> taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame
>>> 0xfe124735bb80
>>> taskqueue_thread_loop() at taskqueue_thread_loop+0x9b/frame
>>> 0xfe124735bbb0
>>> fork_exit() at fork_exit+0x84/frame 0xfe124735bbf0
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe124735bbf0
>>> --- trap 0, rip = 0, rsp = 0xfe124735bcb0, rbp = 0 ---
>>> KDB: enter: panic
>>> [ thread pid 0 tid 100571 ]
>>> Stopped at  kdb_enter+0x3e: movq $0,kdb_why
>>
>>

Not sure about this one. We've been getting it at work as well, on a
stable/10-ish tree.
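The first panic is raised from mtrash_ctor() inside malloc(): in INVARIANTS kernels, freed items are filled with a junk pattern that is verified again when the item is handed out. Here the check found the word at 0xf8003e0b0808, 8 bytes into a 2040-byte item last owned by the "CAM CCB" malloc type, changed after free. Below is a simplified user-space sketch of that fill-and-verify idea, assuming the 0xdeadc0de junk word visible in the second trace; the function names are hypothetical and this is not the kernel's uma_dbg.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Junk word modeled on the 0xdeadc0de value visible in the traces. */
#define TRASH_WORD 0xdeadc0deU

/* On free: overwrite the whole item with the junk word. */
static void trash_on_free(void *mem, size_t size)
{
    uint32_t *p = mem;

    /* The sketch only handles sizes that are a whole number of words. */
    assert(size % sizeof(*p) == 0);
    for (size_t i = 0; i < size / sizeof(*p); i++)
        p[i] = TRASH_WORD;
}

/*
 * On the next allocation of the same item: verify the junk is intact.
 * Returns the byte offset of the first modified word, or -1 if none was.
 */
static long trash_check_on_alloc(const void *mem, size_t size)
{
    const uint32_t *p = mem;

    for (size_t i = 0; i < size / sizeof(*p); i++)
        if (p[i] != TRASH_WORD)
            return (long)(i * sizeof(*p));
    return -1;
}
```

Trashing a buffer and then writing a single byte at offset 8 makes trash_check_on_alloc() report 8, which is the same kind of "val @ address" information the panic message prints.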

> 
> I also had this one recently:
> 
>> #8  0x80d1a162 in calltrap () at 
>> /usr/src/sys/amd64/amd64/exception.S:231
>> #9  0x802e52c4 in xpt_path_periph (path=0xdeadc0dedeadc0de) at 
>> /usr/src/sys/cam/cam_xpt.c:3738
>> #10 0x802dfe62 in cam_periph_error (ccb=0xf8003e0b6000, 
>> camflags=CAM_FLAG_NONE, sense_flags=0, save_ccb=0x0) at 
>> /usr/src/sys/cam/cam_periph.c:1602
>> #11 0x803057e4 in adadone (periph=0xf8003e09b700, 
>> done_ccb=0xf8003e0b6000) at /usr/src/sys/cam/ata/ata_da.c:1877
>> #12 0x802e6e44 in xpt_done_process (ccb_h=0xf8003e0b6000) at 
>> /usr/src/sys/cam/cam_xpt.c:5245
>> #13 0x80394d59 in ahci_ch_intr_direct (arg=) at 
>> /usr/src/sys/dev/ahci/ahci.c:1132
>> #14 0x80390ff1 in ahci_intr (data=) at 
>> /usr/src/sys/dev/ahci/ahci.c:417
>> #15 0x808ea5d3 in intr_event_execute_handlers (p=<value optimized out>, ie=0xf8000f725d00) at /usr/src/sys/kern/kern_intr.c:1252
>> #16 0x808eafb6 in ithread_loop (arg=0xf8000f6dea60) at 
>> /usr/src/sys/kern/kern_intr.c:1265
>> #17 0x808e7fc4 in fork_exit (callout=0x808eaf10 
>> <ithread_loop>, arg=0xf8000f6dea60, frame=0xfe1245083c00) at 
>> /usr/src/sys/kern/kern_fork.c:977
>> #18 0x80d1a69e in fork_trampoline () at 
>> /usr/src/sys/amd64/amd64/exception.S:605
> 

This one, however (and all subsequent traces I posted), was resolved by
r271201.
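The second trace looks like the same class of bug seen from the reader's side: xpt_path_periph() was handed path=0xdeadc0dedeadc0de for the CCB at 0xf8003e0b6000, which is what a 64-bit load returns from memory filled with the 32-bit 0xdeadc0de junk word. That suggests the CCB was freed while still in use. A small sketch for recognizing such values when reading dumps; load_u64() and looks_like_trashed_pointer() are hypothetical helpers, not CAM or kgdb functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TRASH_WORD 0xdeadc0deU

/* Read a pointer-sized field from raw memory, as the CPU would. */
static uint64_t load_u64(const void *field)
{
    uint64_t v;

    assert(field != NULL);
    memcpy(&v, field, sizeof(v));
    return v;
}

/*
 * Memory filled with the 32-bit junk word yields the word repeated twice
 * when read as 64 bits; byte order does not matter since both halves match.
 */
static bool looks_like_trashed_pointer(uint64_t value)
{
    const uint64_t doubled = ((uint64_t)TRASH_WORD << 32) | TRASH_WORD;

    return value == doubled;
}
```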

-- 
Regards,
Bryan Drewery



signature.asc
Description: OpenPGP digital signature


Re: Cam panic on r271170

2014-09-24 Thread Ryan Stone
On Wed, Sep 24, 2014 at 11:48 AM, Bryan Drewery  wrote:
> Another, with much more information here:
> https://people.freebsd.org/~bdrewery/cam.panic.txt
>
> This was with memguard (vm.memguard.desc="CAM CCB") and a KASSERT in
> malloc(9) and uma_zalloc_arg() to prevent M_WAITOK in non-sleepable
> threads. Neither triggered.

Well, that's unfortunate.  Are CCBs ever the target of DMA?

Does your hardware have an Intel chipset, and is it new enough to
support an IOMMU?  You might try enabling the IOMMU, which would catch
out-of-bounds DMA performed by the hardware:

http://svnweb.freebsd.org/changeset/base/257251

One caveat: while I understand that the busdma infrastructure for this
is quite well tested, individual drivers and devices need to be
well-behaved, so it's possible that this won't work on your hardware
right now.  Also, don't enable this if you use bhyve with PCI
passthrough.
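What the IOMMU adds, in the simplest terms: once busdma programs the translations, a device can only reach bus addresses inside the windows its driver mapped, and an access outside them faults instead of silently corrupting memory such as a freed CCB. The sketch below models only that bounds check; struct dma_window and dma_access_permitted() are hypothetical and say nothing about the real DMAR page-table format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one window a driver has mapped for its device. */
struct dma_window {
    uint64_t base; /* first bus address the device may touch */
    uint64_t len;  /* window length in bytes */
};

/*
 * True if the transfer [addr, addr + len) lies entirely inside one mapped
 * window; an IOMMU would fault anything else. Written to avoid overflow.
 */
static bool dma_access_permitted(const struct dma_window *w, size_t nwin,
    uint64_t addr, uint64_t len)
{
    assert(w != NULL || nwin == 0);
    for (size_t i = 0; i < nwin; i++) {
        if (addr >= w[i].base && len <= w[i].len &&
            addr - w[i].base <= w[i].len - len)
            return true;
    }
    return false;
}
```

A transfer that starts inside a window but runs past its end is rejected, which is exactly the stray-DMA case the IOMMU would turn into a fault.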
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Cam panic on r271170

2014-09-24 Thread Bryan Drewery
On 9/17/2014 10:39 AM, Bryan Drewery wrote:
> On 9/16/2014 9:28 PM, Bryan Drewery wrote:
>> I've been getting this quite frequently on head recently. I have dumps
>> if anyone is interested in more information.
>>

Another, with much more information here:
https://people.freebsd.org/~bdrewery/cam.panic.txt

This was with memguard (vm.memguard.desc="CAM CCB") and a KASSERT in
malloc(9) and uma_zalloc_arg() to prevent M_WAITOK in non-sleepable
threads. Neither triggered.

It seems to happen when syncing to the SSD I use as a ZFS log device.
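The second debugging aid above checks, at allocation time, that a blocking (M_WAITOK) request never comes from a thread that is not allowed to sleep. The exact patch is not shown in the thread, so this is only a user-space sketch: the flag names, the per-thread thread_can_sleep variable and checked_alloc() are hypothetical stand-ins for malloc(9) and the kernel's thread state, and assert() plays the role of KASSERT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for malloc(9)'s M_NOWAIT and M_WAITOK flags. */
#define SKETCH_M_NOWAIT 0x0001
#define SKETCH_M_WAITOK 0x0002

/*
 * Hypothetical per-thread flag: false while running in a context that must
 * not sleep (in the kernel, e.g. an interrupt thread or while holding a
 * non-sleepable lock).
 */
static _Thread_local bool thread_can_sleep = true;

/* Allocation wrapper carrying the debugging check. */
static void *checked_alloc(size_t size, int flags)
{
    /* A request that may block is a bug in a thread that cannot sleep. */
    assert(!(flags & SKETCH_M_WAITOK) || thread_can_sleep);
    return malloc(size);
}
```

Since neither check fired, the corruption is unlikely to come from a blocking allocation in the wrong context.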

-- 
Regards,
Bryan Drewery





Re: Cam panic on r271170

2014-09-17 Thread Ryan Stone
Could you try turning on memguard for the CAM CCB malloc type?  That
should locate the problem quite quickly.
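memguard(9) catches use-after-free by backing each allocation of the selected malloc type (chosen with vm.memguard.desc, as done elsewhere in this thread) with its own pages and revoking access to them on free, so the first stray read or write faults at the offending instruction rather than being noticed later by a junk check. A user-space sketch of that core idea with mmap(2) and mprotect(2), assuming a POSIX system; guard_alloc(), guard_free() and read_faults() are hypothetical names, and the real memguard differs in detail.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS and sigsetjmp on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Back each allocation with its own whole pages. */
static void *guard_alloc(size_t size, size_t *mapped_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len;
    void *p;

    assert(size > 0);
    len = (size + page - 1) / page * page;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}

/* On free, revoke all access instead of recycling the pages. */
static int guard_free(void *p, size_t mapped_len)
{
    return mprotect(p, mapped_len, PROT_NONE);
}

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

/* Returns true if reading *p traps, as it does after guard_free(). */
static bool read_faults(const volatile char *p)
{
    struct sigaction sa, old_segv, old_bus;
    volatile bool faulted = false;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);
    if (sigsetjmp(fault_env, 1) == 0)
        (void)*p; /* volatile read, so it is not optimized away */
    else
        faulted = true;
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

The trade-off is memory: every guarded allocation costs at least a page, which is why memguard is limited to one malloc type at a time.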


Re: Cam panic on r271170

2014-09-17 Thread Bryan Drewery
On 9/16/2014 9:28 PM, Bryan Drewery wrote:
> I've been getting this quite frequently on head recently. I have dumps
> if anyone is interested in more information.
> 

I also had this one recently:

> #8  0x80d1a162 in calltrap () at 
> /usr/src/sys/amd64/amd64/exception.S:231
> #9  0x802e52c4 in xpt_path_periph (path=0xdeadc0dedeadc0de) at 
> /usr/src/sys/cam/cam_xpt.c:3738
> #10 0x802dfe62 in cam_periph_error (ccb=0xf8003e0b6000, 
> camflags=CAM_FLAG_NONE, sense_flags=0, save_ccb=0x0) at 
> /usr/src/sys/cam/cam_periph.c:1602
> #11 0x803057e4 in adadone (periph=0xf8003e09b700, 
> done_ccb=0xf8003e0b6000) at /usr/src/sys/cam/ata/ata_da.c:1877
> #12 0x802e6e44 in xpt_done_process (ccb_h=0xf8003e0b6000) at 
> /usr/src/sys/cam/cam_xpt.c:5245
> #13 0x80394d59 in ahci_ch_intr_direct (arg=) at 
> /usr/src/sys/dev/ahci/ahci.c:1132
> #14 0x80390ff1 in ahci_intr (data=) at 
> /usr/src/sys/dev/ahci/ahci.c:417
> #15 0x808ea5d3 in intr_event_execute_handlers (p=<value optimized out>, ie=0xf8000f725d00) at /usr/src/sys/kern/kern_intr.c:1252
> #16 0x808eafb6 in ithread_loop (arg=0xf8000f6dea60) at 
> /usr/src/sys/kern/kern_intr.c:1265
> #17 0x808e7fc4 in fork_exit (callout=0x808eaf10 
> <ithread_loop>, arg=0xf8000f6dea60, frame=0xfe1245083c00) at 
> /usr/src/sys/kern/kern_fork.c:977
> #18 0x80d1a69e in fork_trampoline () at 
> /usr/src/sys/amd64/amd64/exception.S:605


-- 
Regards,
Bryan Drewery


