witness_lock_list_get: witness exhausted
First shame on me as I obviously missed the reply to my former report and did not report back on a proposed patch. https://lists.freebsd.org/pipermail/freebsd-current/2018-January/068136.html I am still running current – the guest has 160GB Ram and 64 vCPU’s assigned under ESXi which mainly runs poudriere for amd64+arm builds and I again noticed "witness_lock_list_get: witness exhausted" on the console (which I don't pay much attention to). UNAME: FreeBSD poudriere.gopai.com 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n259872-f948cb717f50: Wed Dec 28 13:13:43 EST 2022 mi...@poudriere.gopai.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 It seems that LOCK_NCHILDREN and LOCK_CHILDCOUNT never got incremented due to my lack of reporting in /usr/src/sys/kern/subr_witness.c If it's just a matter of incrementing LOCK_CHIDCOUNT 4096 and LOCK_NCHILDREN 20 without adding the sysctl knobs I do that also - I am not kernel savvy. I will make sure and test an updated patch should one be made available or advise whether to increment these values or ignore the warnings. Regards, Michael Jung CONFIDENTIALITY NOTE: This message is intended only for the use of the individual or entity to whom it is addressed and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this transmission in error, please notify us by telephone at (502) 212-4000 or notify us at PAI, Dept. 99, 2101 High Wickham Place, Suite 101, Louisville, KY 40245 Disclaimer The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful. This email has been scanned for viruses and malware, and may have been automatically archived by Mimecast, a leader in email security and cyber resilience. Mimecast integrates email defenses with brand protection, security awareness training, web security, compliance and other essential capabilities. Mimecast helps protect large and small organizations from malicious activity, human error and technology failure; and to lead the movement toward building a more resilient world. To find out more, visit our website.
Re: usbhid panic when switching vt-s (invariants+witness enabled)
On 9/30/22 12:31, Andriy Gapon wrote: On 26/09/2022 18:13, Hans Petter Selasky wrote: On 9/23/22 23:43, Hans Petter Selasky wrote: vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900 panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960 sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0 _sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40 usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 0xfe003590eaa0 usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 0xfe003590eae0 hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 0xfe003590eb20 hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 0xfe003590eb50 hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80 hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 0xfe003590ebc0 hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20 hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80 hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0 kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 0xfe003590ed00 vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 0xfe003590ed40 vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 0xfe003590ed60 Can you test this patch: https://reviews.freebsd.org/D36715 Sorry that it took a while. I cannot reproduce the problem after applying the patch. I see that you already committed the change, but I thought that I'd let you know. Thank you very much! You are welcome! --HPS
Re: usbhid panic when switching vt-s (invariants+witness enabled)
On 26/09/2022 18:13, Hans Petter Selasky wrote: On 9/23/22 23:43, Hans Petter Selasky wrote: vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900 panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960 sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0 _sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40 usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 0xfe003590eaa0 usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 0xfe003590eae0 hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 0xfe003590eb20 hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 0xfe003590eb50 hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80 hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 0xfe003590ebc0 hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20 hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80 hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0 kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 0xfe003590ed00 vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 0xfe003590ed40 vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 0xfe003590ed60 Can you test this patch: https://reviews.freebsd.org/D36715 Sorry that it took a while. I cannot reproduce the problem after applying the patch. I see that you already committed the change, but I thought that I'd let you know. Thank you very much! -- Andriy Gapon
Re: usbhid panic when switching vt-s (invariants+witness enabled)
On 9/23/22 23:43, Hans Petter Selasky wrote: vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900 panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960 sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0 _sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40 usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 0xfe003590eaa0 usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 0xfe003590eae0 hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 0xfe003590eb20 hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 0xfe003590eb50 hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80 hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 0xfe003590ebc0 hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20 hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80 hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0 kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 0xfe003590ed00 vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 0xfe003590ed40 vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 0xfe003590ed60 Can you test this patch: https://reviews.freebsd.org/D36715 --HPS
Re: usbhid panic when switching vt-s (invariants+witness enabled)
On 9/23/22 23:33, Andriy Gapon wrote: It seems that the problem may be related to different keyboard LED states between the VTs. The system is a fresh stable/13. The panic looks like an attempt to sleep while in an interrupt thread (a callout?). Hi, I suspect vt_switch_timer must have a task to handle those requests, which allows sleeping. Vladimir, will you handle this? It is a minor regression issue when using the new usbhid feature. --HPS panic: sleepq_add: td 0xf80006af to sleep on wchan 0xf802ea752e08 with sleeping prohibited cpuid = 5 time = 1663940484 KDB: stack backtrace: db_trace_self_wrapper() at 0x8061555b = db_trace_self_wrapper+0x2b/frame 0xfe003590e7f0 kdb_backtrace() at 0x80942637 = kdb_backtrace+0x37/frame 0xfe003590e8a0 vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900 panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960 sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0 _sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40 usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 0xfe003590eaa0 usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 0xfe003590eae0 hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 0xfe003590eb20 hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 0xfe003590eb50 hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80 hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 0xfe003590ebc0 hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20 hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80 hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0 kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 0xfe003590ed00 vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 0xfe003590ed40 vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 0xfe003590ed60 softclock_call_cc() at 0x809127c4 = softclock_call_cc+0x244/frame 0xfe003590ee20 softclock() at 0x80912c1c = softclock+0x7c/frame 0xfe003590ee50 ithread_loop() at 0x808b662a = ithread_loop+0x2da/frame 0xfe003590eef0 fork_exit() at 0x808b2f85 = fork_exit+0xc5/frame 0xfe003590ef30 fork_trampoline() at 0x80c084fe = fork_trampoline+0xe/frame 0xfe003590ef30
usbhid panic when switching vt-s (invariants+witness enabled)
It seems that the problem may be related to different keyboard LED states between the VTs. The system is a fresh stable/13. The panic looks like an attempt to sleep while in an interrupt thread (a callout?). panic: sleepq_add: td 0xf80006af to sleep on wchan 0xf802ea752e08 with sleeping prohibited cpuid = 5 time = 1663940484 KDB: stack backtrace: db_trace_self_wrapper() at 0x8061555b = db_trace_self_wrapper+0x2b/frame 0xfe003590e7f0 kdb_backtrace() at 0x80942637 = kdb_backtrace+0x37/frame 0xfe003590e8a0 vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900 panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960 sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0 _sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40 usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 0xfe003590eaa0 usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 0xfe003590eae0 hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 0xfe003590eb20 hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 0xfe003590eb50 hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80 hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 0xfe003590ebc0 hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20 hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80 hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0 kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 0xfe003590ed00 vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 0xfe003590ed40 vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 0xfe003590ed60 softclock_call_cc() at 0x809127c4 = softclock_call_cc+0x244/frame 0xfe003590ee20 softclock() at 0x80912c1c = softclock+0x7c/frame 0xfe003590ee50 ithread_loop() at 0x808b662a = ithread_loop+0x2da/frame 0xfe003590eef0 fork_exit() at 0x808b2f85 = fork_exit+0xc5/frame 0xfe003590ef30 fork_trampoline() at 0x80c084fe = fork_trampoline+0xe/frame 0xfe003590ef30 -- Andriy Gapon
Re: Can't build with INVARIANTS but not WITNESS
On 4/27/22, John F Carr wrote: > My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS: > > include GENERIC > ident STRIATUS > nooptions WITNESS > nooptions WITNESS_SKIPSPIN > > My kernel build fails: > > /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable > 'line' set but not used [-Werror,-Wunused-but-set-variable] > int flags, line __diagused; >^ > /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable > 'file' set but not used [-Werror,-Wunused-but-set-variable] > const char *file __diagused; > > The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is > defined, but the variable in vfs_lookup.c is only used if WITNESS is > defined. > > #if defined(INVARIANTS) || defined(WITNESS) > #define __diagused > #else > #define __diagused __unused > #endif > > I think this code is trying to be too clever and causing more trouble than > it prevents. Change the || to &&, or replace __diagused with __unused > everywhere. > I disagree. The entire point is to not end up with actually unused variables even when is enabled. I patched it up in https://cgit.FreeBSD.org/src/commit/?id=b40c0db6f6d61ed594118d81dc691b9263a7e4d7 . This still allows for actually vars when only one of INVARIANTS or WITNESS is defined, but that's a much smaller problem than allowing it in general. -- Mateusz Guzik
Can't build with INVARIANTS but not WITNESS
My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS: include GENERIC ident STRIATUS nooptions WITNESS nooptions WITNESS_SKIPSPIN My kernel build fails: /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable 'line' set but not used [-Werror,-Wunused-but-set-variable] int flags, line __diagused; ^ /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable 'file' set but not used [-Werror,-Wunused-but-set-variable] const char *file __diagused; The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is defined, but the variable in vfs_lookup.c is only used if WITNESS is defined. #if defined(INVARIANTS) || defined(WITNESS) #define __diagused #else #define __diagused __unused #endif I think this code is trying to be too clever and causing more trouble than it prevents. Change the || to &&, or replace __diagused with __unused everywhere.
Re: witness_lock_list_get: witness exhausted
Just take it and change as you see fit, I don't have time to work on it. On 10/4/21, Alan Somers wrote: > On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote: >> >> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote: >> >> > On 2018-01-08 13:39, John Baldwin wrote: >> > >> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote: >> >> >> >>> Hi! >> >>> >> >>> I've recently up'd my processor count on our poudriere box and have >> >>> started noticing the error >> >>> "witness_lock_list_get: witness exhausted" on the console. The >> >>> kernel >> >>> *DOES NOT* crash but I >> >>> thought the report may be useful to someone. >> >>> >> >>> $ uname -a >> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun >> >>> Nov >> >>> 19 18:41:20 EST 2017 >> >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 >> >>> >> >>> The machine is pretty busy running four poudriere build instances. >> >>> >> >>> last pid: 76584; load averages: 115.07, 115.96, 98.30 >> >>> >> >>> up 6+07:32:59 14:44:03 >> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock >> >>> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% >> >>> idle >> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free >> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other >> >>> 25G Compressed, 32G Uncompressed, 1.24:1 Ratio >> >>> >> >>> Let me know what additional information I might supply. >> >>> >> >> >> >> This just means that WITNESS stopped working because it ran out of >> >> pre-allocated objects. In particular the objects used to track how >> >> many locks are held by how many threads: >> >> >> >> /* >> >> * XXX: This is somewhat bogus, as we assume here that at most 2048 >> >> threads >> >> * will hold LOCK_NCHILDREN locks. We handle failure ok, and we >> >> should >> >> * probably be safe for the most part, but it's still a SWAG. >> >> */ >> >> #define LOCK_NCHILDREN 5 >> >> #define LOCK_CHILDCOUNT 2048 >> >> >> >> Probably the '2048' (max number of concurrent threads) needs to scale >> >> with >> >> MAXCPU. 2048 threads is probably a bit low on big x86 boxes. >> >> >> > >> > >> > Thank you for you explanation. We are expanding our ESXi cluster and >> > even >> > though with standard edition I can only assign 64 vCPU's to a guest and >> > as >> > much >> > RAM as I want, I do like to help with edge cases if I can make them >> > occur >> > pushing >> > boundaries as I can towards additianional improvements in FreeBSD. >> > >> >> Can you apply this and re-run the test? >> >> https://people.freebsd.org/~mjg/witness.diff >> >> It bumps the counters to be "high enough" but also starts tracking usage. >> If you get >> the message again, bump the values even higher. >> >> Once you get a complete poudriere run which did not result in the >> problem, >> do: >> $ sysctl debug.witness.list_used debug.witness.list_max_used >> >> to dump the actual usage. > > This is a nice little patch. Can we commit to head? Even better > would be if LOCK_CHILDCOUNT could be a tunable. On my largish system, > here's what I get shortly after boot: > > debug.witness.list_max_used: 8432 > debug.witness.list_used: 8420 > > -Alan > -- Mateusz Guzik
Re: witness_lock_list_get: witness exhausted
On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik wrote: > > On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung wrote: > > > On 2018-01-08 13:39, John Baldwin wrote: > > > >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote: > >> > >>> Hi! > >>> > >>> I've recently up'd my processor count on our poudriere box and have > >>> started noticing the error > >>> "witness_lock_list_get: witness exhausted" on the console. The kernel > >>> *DOES NOT* crash but I > >>> thought the report may be useful to someone. > >>> > >>> $ uname -a > >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov > >>> 19 18:41:20 EST 2017 > >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 > >>> > >>> The machine is pretty busy running four poudriere build instances. > >>> > >>> last pid: 76584; load averages: 115.07, 115.96, 98.30 > >>> > >>> up 6+07:32:59 14:44:03 > >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock > >>> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle > >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free > >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other > >>> 25G Compressed, 32G Uncompressed, 1.24:1 Ratio > >>> > >>> Let me know what additional information I might supply. > >>> > >> > >> This just means that WITNESS stopped working because it ran out of > >> pre-allocated objects. In particular the objects used to track how > >> many locks are held by how many threads: > >> > >> /* > >> * XXX: This is somewhat bogus, as we assume here that at most 2048 > >> threads > >> * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should > >> * probably be safe for the most part, but it's still a SWAG. > >> */ > >> #define LOCK_NCHILDREN 5 > >> #define LOCK_CHILDCOUNT 2048 > >> > >> Probably the '2048' (max number of concurrent threads) needs to scale with > >> MAXCPU. 2048 threads is probably a bit low on big x86 boxes. > >> > > > > > > Thank you for you explanation. We are expanding our ESXi cluster and even > > though with standard edition I can only assign 64 vCPU's to a guest and as > > much > > RAM as I want, I do like to help with edge cases if I can make them occur > > pushing > > boundaries as I can towards additianional improvements in FreeBSD. > > > > Can you apply this and re-run the test? > > https://people.freebsd.org/~mjg/witness.diff > > It bumps the counters to be "high enough" but also starts tracking usage. > If you get > the message again, bump the values even higher. > > Once you get a complete poudriere run which did not result in the problem, > do: > $ sysctl debug.witness.list_used debug.witness.list_max_used > > to dump the actual usage. This is a nice little patch. Can we commit to head? Even better would be if LOCK_CHILDCOUNT could be a tunable. On my largish system, here's what I get shortly after boot: debug.witness.list_max_used: 8432 debug.witness.list_used: 8420 -Alan
Re: witness_lock_list_get: witness exhausted
On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung <mi...@mikej.com> wrote: > On 2018-01-08 13:39, John Baldwin wrote: > >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote: >> >>> Hi! >>> >>> I've recently up'd my processor count on our poudriere box and have >>> started noticing the error >>> "witness_lock_list_get: witness exhausted" on the console. The kernel >>> *DOES NOT* crash but I >>> thought the report may be useful to someone. >>> >>> $ uname -a >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov >>> 19 18:41:20 EST 2017 >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 >>> >>> The machine is pretty busy running four poudriere build instances. >>> >>> last pid: 76584; load averages: 115.07, 115.96, 98.30 >>> >>> up 6+07:32:59 14:44:03 >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock >>> CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other >>> 25G Compressed, 32G Uncompressed, 1.24:1 Ratio >>> >>> Let me know what additional information I might supply. >>> >> >> This just means that WITNESS stopped working because it ran out of >> pre-allocated objects. In particular the objects used to track how >> many locks are held by how many threads: >> >> /* >> * XXX: This is somewhat bogus, as we assume here that at most 2048 >> threads >> * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should >> * probably be safe for the most part, but it's still a SWAG. >> */ >> #define LOCK_NCHILDREN 5 >> #define LOCK_CHILDCOUNT 2048 >> >> Probably the '2048' (max number of concurrent threads) needs to scale with >> MAXCPU. 2048 threads is probably a bit low on big x86 boxes. >> > > > Thank you for you explanation. We are expanding our ESXi cluster and even > though with standard edition I can only assign 64 vCPU's to a guest and as > much > RAM as I want, I do like to help with edge cases if I can make them occur > pushing > boundaries as I can towards additianional improvements in FreeBSD. > Can you apply this and re-run the test? https://people.freebsd.org/~mjg/witness.diff It bumps the counters to be "high enough" but also starts tracking usage. If you get the message again, bump the values even higher. Once you get a complete poudriere run which did not result in the problem, do: $ sysctl debug.witness.list_used debug.witness.list_max_used to dump the actual usage. -- Mateusz Guzik ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: witness_lock_list_get: witness exhausted
On 2018-01-08 13:39, John Baldwin wrote: On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote: Hi! I've recently up'd my processor count on our poudriere box and have started noticing the error "witness_lock_list_get: witness exhausted" on the console. The kernel *DOES NOT* crash but I thought the report may be useful to someone. $ uname -a FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017 mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 The machine is pretty busy running four poudriere build instances. last pid: 76584; load averages: 115.07, 115.96, 98.30 up 6+07:32:59 14:44:03 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other 25G Compressed, 32G Uncompressed, 1.24:1 Ratio Let me know what additional information I might supply. This just means that WITNESS stopped working because it ran out of pre-allocated objects. In particular the objects used to track how many locks are held by how many threads: /* * XXX: This is somewhat bogus, as we assume here that at most 2048 threads * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should * probably be safe for the most part, but it's still a SWAG. */ #define LOCK_NCHILDREN 5 #define LOCK_CHILDCOUNT 2048 Probably the '2048' (max number of concurrent threads) needs to scale with MAXCPU. 2048 threads is probably a bit low on big x86 boxes. Thank you for you explanation. We are expanding our ESXi cluster and even though with standard edition I can only assign 64 vCPU's to a guest and as much RAM as I want, I do like to help with edge cases if I can make them occur pushing boundaries as I can towards additianional improvements in FreeBSD. --mikej ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: witness_lock_list_get: witness exhausted
On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote: > Hi! > > I've recently up'd my processor count on our poudriere box and have > started noticing the error > "witness_lock_list_get: witness exhausted" on the console. The kernel > *DOES NOT* crash but I > thought the report may be useful to someone. > > $ uname -a > FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov > 19 18:41:20 EST 2017 > mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 > > The machine is pretty busy running four poudriere build instances. > > last pid: 76584; load averages: 115.07, 115.96, 98.30 > > up 6+07:32:59 14:44:03 > 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock > CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle > Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free > ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other > 25G Compressed, 32G Uncompressed, 1.24:1 Ratio > > Let me know what additional information I might supply. This just means that WITNESS stopped working because it ran out of pre-allocated objects. In particular the objects used to track how many locks are held by how many threads: /* * XXX: This is somewhat bogus, as we assume here that at most 2048 threads * will hold LOCK_NCHILDREN locks. We handle failure ok, and we should * probably be safe for the most part, but it's still a SWAG. */ #define LOCK_NCHILDREN 5 #define LOCK_CHILDCOUNT 2048 Probably the '2048' (max number of concurrent threads) needs to scale with MAXCPU. 2048 threads is probably a bit low on big x86 boxes. -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
witness_lock_list_get: witness exhausted
Hi! I've recently up'd my processor count on our poudriere box and have started noticing the error "witness_lock_list_get: witness exhausted" on the console. The kernel *DOES NOT* crash but I thought the report may be useful to someone. $ uname -a FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017 mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 The machine is pretty busy running four poudriere build instances. last pid: 76584; load averages: 115.07, 115.96, 98.30 up 6+07:32:59 14:44:03 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock CPU: 59.0% user, 0.0% nice, 40.7% system, 0.1% interrupt, 0.1% idle Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other 25G Compressed, 32G Uncompressed, 1.24:1 Ratio Let me know what additional information I might supply. Thanks! --mikej Copyright (c) 1992-2017 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017 mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn) WARNING: WITNESS option enabled, expect reduced performance. VT(vga): text 80x25 CPU: Intel(R) Xeon(R) CPU E7- 4850 @ 2.00GHz (1995.00-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206f2 Family=0x6 Model=0x2f Stepping=2 Features=0x1fa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS,HTT> Features2=0x83b82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,HV> AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM> AMD Features2=0x1 Structured Extended Features=0x2 TSC: P-state invariant Hypervisor: Origin = "VMwareVMware" real memory = 96636764160 (92160 MB) avail memory = 93873786880 (89525 MB) Event timer "LAPIC" quality 600 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 60 CPUs FreeBSD/SMP: 6 package(s) x 10 core(s) random: unblocking device. MADT: Forcing active-low polarity and level trigger for SCI ioapic0 irqs 0-23 on motherboard SMP: AP CPU #46 Launched! SMP: AP CPU #16 Launched! SMP: AP CPU #7 Launched! SMP: AP CPU #51 Launched! SMP: AP CPU #20 Launched! SMP: AP CPU #15 Launched! SMP: AP CPU #1 Launched! SMP: AP CPU #48 Launched! SMP: AP CPU #23 Launched! SMP: AP CPU #27 Launched! SMP: AP CPU #12 Launched! SMP: AP CPU #2 Launched! SMP: AP CPU #55 Launched! SMP: AP CPU #17 Launched! SMP: AP CPU #52 Launched! SMP: AP CPU #18 Launched! SMP: AP CPU #39 Launched! SMP: AP CPU #11 Launched! SMP: AP CPU #41 Launched! SMP: AP CPU #58 Launched! SMP: AP CPU #3 Launched! SMP: AP CPU #26 Launched! SMP: AP CPU #37 Launched! SMP: AP CPU #45 Launched! SMP: AP CPU #40 Launched! SMP: AP CPU #35 Launched! SMP: AP CPU #53 Launched! SMP: AP CPU #50 Launched! SMP: AP CPU #44 Launched! SMP: AP CPU #34 Launched! SMP: AP CPU #30 Launched! SMP: AP CPU #31 Launched! SMP: AP CPU #47 Launched! SMP: AP CPU #13 Launched! SMP: AP CPU #28 Launched! SMP: AP CPU #49 Launched! SMP: AP CPU #54 Launched! SMP: AP CPU #8 Launched! SMP: AP CPU #57 Launched! SMP: AP CPU #29 Launched! SMP: AP CPU #4 Launched! SMP: AP CPU #24 Launched! SMP: AP CPU #33 Launched! SMP: AP CPU #38 Launched! SMP: AP CPU #59 Launched! SMP: AP CPU #9 Launched! SMP: AP CPU #42 Launched! SMP: AP CPU #5 Launched! SMP: AP CPU #6 Launched! SMP: AP CPU #43 Launched! SMP: AP CPU #14 Launched! SMP: AP CPU #19 Launched! SMP: AP CPU #10 Launched! SMP: AP CPU #25 Launched! SMP: AP CPU #56 Launched! SMP: AP CPU #36 Launched! SMP: AP CPU #22 Launched! SMP: AP CPU #32 Launched! SMP: AP CPU #21 Launched! random: entropy device external interface netmap: loaded module [ath_hal] loaded module_register_init: MOD_LOAD (vesa, 0x80f95c20, 0) error 19 kbd1 at kbdmux0 nexus0 vtvga0: on motherboard cryptosoft0: on motherboard acpi0: on motherboard acpi0: Power Button (fixed) hpet0: iomem 0xfed0-0xfed003ff on acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 cpu0: numa-domain 0 on acpi0 cpu1: numa-domain 5 on acpi0 cpu2: numa-domain 5 on acpi0 cpu3: numa-domain 5 on acpi0 cpu4: numa-domain 5 on acpi0 cpu5: numa-domain 5 on acpi0 cpu6: numa-domain 5 on acpi0 cpu7: numa-domain 5 on acpi0 cpu8: numa-domain 5 on acpi0 cpu9: numa-domain 5 on acpi0 cpu10: numa-domain 5 on acpi0 cpu11: numa-domain 4 on acpi0 cpu12: numa-domain 4 on acpi0 cpu13: numa-domain 4 on acpi0 cpu14: numa-domain 4 on acpi0 cpu15: numa-domain 4 on acpi0 cpu16: numa-domain 4 on acpi0 cpu17: numa-domain 4 on
r324870 breaks boot on amd64 with WITNESS (was "svn commit: r324870 - in head/sys: amd64/include kern")
All, I highly advise not upgrading to this revision if you use WITNESS. Please see the attached message for more details and reply to the commit log. Cheers, -Ngie > Begin forwarded message: > > From: "Ngie Cooper (yaneurabeya)" <yaneurab...@gmail.com> > Subject: Re: svn commit: r324870 - in head/sys: amd64/include kern > Date: October 22, 2017 at 17:19:32 PDT > To: Mateusz Guzik <m...@freebsd.org> > Cc: src-committers <src-committ...@freebsd.org>, svn-src-...@freebsd.org, > svn-src-h...@freebsd.org > > >> On Oct 22, 2017, at 13:43, Mateusz Guzik <m...@freebsd.org> wrote: >> >> Author: mjg >> Date: Sun Oct 22 20:43:50 2017 >> New Revision: 324870 >> URL: https://svnweb.freebsd.org/changeset/base/324870 >> >> Log: >> Make the sleepq chain hash size configurable per-arch and bump on amd64. >> >> While here cache-align chains. >> >> This shortens longest found chain during poudriere -j 80 from 32 to 16. >> >> Pushing this higher up will probably require allocation on boot. > > Hi Mateusz, > This change causes the Jenkins VMs to panic at boot with "panic: > witness_init: pending locks list is too small, increase WITNESS_PENDLIST” > when WITNESS is enabled: > https://ci.freebsd.org/job/FreeBSD-head-amd64-test/4781/console . > Please fix or revert. > Thanks, > -Ngie signature.asc Description: Message signed with OpenPGP using GPGMail
Re: New warnings from WITNESS
> On 6 Nov 2016, at 19:41, Scott Longwrote: > > >> On Nov 6, 2016, at 11:01 AM, Michael Tuexen wrote: >> >>> On 6 Nov 2016, at 15:39, Konstantin Belousov wrote: >>> >>> On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote: > On 6 Nov 2016, at 13:28, Konstantin Belousov wrote: > > On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: >> bus_dmamap_create with the following non-sleepable locks held: >> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ >> dev/mpt/mpt.c:2287 >> stack backtrace: >> #0 0x80ac0300 at witness_debugger+0x70 >> #1 0x80ac15e7 at witness_warn+0x3d7 >> #2 0x81055fef at bus_dmamap_create+0x2f >> #3 0x80678a25 at mpt_configure_ioc+0x3a5 >> #4 0x80677476 at mpt_attach+0x226 >> #5 0x80683299 at mpt_pci_attach+0x9c9 >> #6 0x80a9478d at device_attach+0x41d >> #7 0x80a9595a at bus_generic_attach+0x4a >> #8 0x806ebe75 at pci_attach+0xd5 >> #9 0x80a9478d at device_attach+0x41d >> #10 0x80a9595a at bus_generic_attach+0x4a >> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 >> #12 0x80a9478d at device_attach+0x41d >> #13 0x80a9595a at bus_generic_attach+0x4a >> #14 0x803b4c8f at acpi_attach+0xdbf >> #15 0x80a9478d at device_attach+0x41d >> #16 0x80a9595a at bus_generic_attach+0x4a >> #17 0x80ee03e3 at nexus_acpi_attach+0x73 >> >> ... and so on. Not sure which revision introduced it... > r308268 > > I believe that this is an mpt(4) driver issue, which calls > bus_dmamap_create(9) with the mpt mutex held. OK. Whom to contact? Or are you willing to look into it? I haven't worked in that area... >>> >>> I am really not sure. Looking at the svn history is not very encouraging. >> Hmm. Time to change my setup, I guess... >>> >>> Might be try freebsd-scsi@ as well. >> I dropped them an e-mail... >> >> > > Please see my response on that list. Will do once it shows up in the archive since I'm not subscribed to that list... Best regards Michael > > Scott > > ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New warnings from WITNESS
> On Nov 6, 2016, at 11:01 AM, Michael Tuexenwrote: > >> On 6 Nov 2016, at 15:39, Konstantin Belousov wrote: >> >> On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote: On 6 Nov 2016, at 13:28, Konstantin Belousov wrote: On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: > bus_dmamap_create with the following non-sleepable locks held: > exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ > dev/mpt/mpt.c:2287 > stack backtrace: > #0 0x80ac0300 at witness_debugger+0x70 > #1 0x80ac15e7 at witness_warn+0x3d7 > #2 0x81055fef at bus_dmamap_create+0x2f > #3 0x80678a25 at mpt_configure_ioc+0x3a5 > #4 0x80677476 at mpt_attach+0x226 > #5 0x80683299 at mpt_pci_attach+0x9c9 > #6 0x80a9478d at device_attach+0x41d > #7 0x80a9595a at bus_generic_attach+0x4a > #8 0x806ebe75 at pci_attach+0xd5 > #9 0x80a9478d at device_attach+0x41d > #10 0x80a9595a at bus_generic_attach+0x4a > #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 > #12 0x80a9478d at device_attach+0x41d > #13 0x80a9595a at bus_generic_attach+0x4a > #14 0x803b4c8f at acpi_attach+0xdbf > #15 0x80a9478d at device_attach+0x41d > #16 0x80a9595a at bus_generic_attach+0x4a > #17 0x80ee03e3 at nexus_acpi_attach+0x73 > > ... and so on. Not sure which revision introduced it... r308268 I believe that this is an mpt(4) driver issue, which calls bus_dmamap_create(9) with the mpt mutex held. >>> OK. Whom to contact? Or are you willing to look into it? >>> I haven't worked in that area... >> >> I am really not sure. Looking at the svn history is not very encouraging. > Hmm. Time to change my setup, I guess... >> >> Might be try freebsd-scsi@ as well. > I dropped them an e-mail... > > Please see my response on that list. Scott ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New warnings from WITNESS
> On 6 Nov 2016, at 15:39, Konstantin Belousovwrote: > > On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote: >>> On 6 Nov 2016, at 13:28, Konstantin Belousov wrote: >>> >>> On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: bus_dmamap_create with the following non-sleepable locks held: exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ dev/mpt/mpt.c:2287 stack backtrace: #0 0x80ac0300 at witness_debugger+0x70 #1 0x80ac15e7 at witness_warn+0x3d7 #2 0x81055fef at bus_dmamap_create+0x2f #3 0x80678a25 at mpt_configure_ioc+0x3a5 #4 0x80677476 at mpt_attach+0x226 #5 0x80683299 at mpt_pci_attach+0x9c9 #6 0x80a9478d at device_attach+0x41d #7 0x80a9595a at bus_generic_attach+0x4a #8 0x806ebe75 at pci_attach+0xd5 #9 0x80a9478d at device_attach+0x41d #10 0x80a9595a at bus_generic_attach+0x4a #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 #12 0x80a9478d at device_attach+0x41d #13 0x80a9595a at bus_generic_attach+0x4a #14 0x803b4c8f at acpi_attach+0xdbf #15 0x80a9478d at device_attach+0x41d #16 0x80a9595a at bus_generic_attach+0x4a #17 0x80ee03e3 at nexus_acpi_attach+0x73 ... and so on. Not sure which revision introduced it... >>> r308268 >>> >>> I believe that this is an mpt(4) driver issue, which calls >>> bus_dmamap_create(9) with the mpt mutex held. >> OK. Whom to contact? Or are you willing to look into it? >> I haven't worked in that area... > > I am really not sure. Looking at the svn history is not very encouraging. Hmm. Time to change my setup, I guess... > > Might be try freebsd-scsi@ as well. I dropped them an e-mail... Best regards Michael ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New warnings from WITNESS
On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote: > > On 6 Nov 2016, at 13:28, Konstantin Belousovwrote: > > > > On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: > >> bus_dmamap_create with the following non-sleepable locks held: > >> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ > >> dev/mpt/mpt.c:2287 > >> stack backtrace: > >> #0 0x80ac0300 at witness_debugger+0x70 > >> #1 0x80ac15e7 at witness_warn+0x3d7 > >> #2 0x81055fef at bus_dmamap_create+0x2f > >> #3 0x80678a25 at mpt_configure_ioc+0x3a5 > >> #4 0x80677476 at mpt_attach+0x226 > >> #5 0x80683299 at mpt_pci_attach+0x9c9 > >> #6 0x80a9478d at device_attach+0x41d > >> #7 0x80a9595a at bus_generic_attach+0x4a > >> #8 0x806ebe75 at pci_attach+0xd5 > >> #9 0x80a9478d at device_attach+0x41d > >> #10 0x80a9595a at bus_generic_attach+0x4a > >> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 > >> #12 0x80a9478d at device_attach+0x41d > >> #13 0x80a9595a at bus_generic_attach+0x4a > >> #14 0x803b4c8f at acpi_attach+0xdbf > >> #15 0x80a9478d at device_attach+0x41d > >> #16 0x80a9595a at bus_generic_attach+0x4a > >> #17 0x80ee03e3 at nexus_acpi_attach+0x73 > >> > >> ... and so on. Not sure which revision introduced it... > > r308268 > > > > I believe that this is an mpt(4) driver issue, which calls > > bus_dmamap_create(9) with the mpt mutex held. > OK. Whom to contact? Or are you willing to look into it? > I haven't worked in that area... I am really not sure. Looking at the svn history is not very encouraging. Might be try freebsd-scsi@ as well. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New warnings from WITNESS
> On 6 Nov 2016, at 13:28, Konstantin Belousovwrote: > > On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: >> bus_dmamap_create with the following non-sleepable locks held: >> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ >> dev/mpt/mpt.c:2287 >> stack backtrace: >> #0 0x80ac0300 at witness_debugger+0x70 >> #1 0x80ac15e7 at witness_warn+0x3d7 >> #2 0x81055fef at bus_dmamap_create+0x2f >> #3 0x80678a25 at mpt_configure_ioc+0x3a5 >> #4 0x80677476 at mpt_attach+0x226 >> #5 0x80683299 at mpt_pci_attach+0x9c9 >> #6 0x80a9478d at device_attach+0x41d >> #7 0x80a9595a at bus_generic_attach+0x4a >> #8 0x806ebe75 at pci_attach+0xd5 >> #9 0x80a9478d at device_attach+0x41d >> #10 0x80a9595a at bus_generic_attach+0x4a >> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 >> #12 0x80a9478d at device_attach+0x41d >> #13 0x80a9595a at bus_generic_attach+0x4a >> #14 0x803b4c8f at acpi_attach+0xdbf >> #15 0x80a9478d at device_attach+0x41d >> #16 0x80a9595a at bus_generic_attach+0x4a >> #17 0x80ee03e3 at nexus_acpi_attach+0x73 >> >> ... and so on. Not sure which revision introduced it... > r308268 > > I believe that this is an mpt(4) driver issue, which calls > bus_dmamap_create(9) with the mpt mutex held. OK. Whom to contact? Or are you willing to look into it? I haven't worked in that area... Best regards Michael ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: New warnings from WITNESS
On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote: > bus_dmamap_create with the following non-sleepable locks held: > exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ > dev/mpt/mpt.c:2287 > stack backtrace: > #0 0x80ac0300 at witness_debugger+0x70 > #1 0x80ac15e7 at witness_warn+0x3d7 > #2 0x81055fef at bus_dmamap_create+0x2f > #3 0x80678a25 at mpt_configure_ioc+0x3a5 > #4 0x80677476 at mpt_attach+0x226 > #5 0x80683299 at mpt_pci_attach+0x9c9 > #6 0x80a9478d at device_attach+0x41d > #7 0x80a9595a at bus_generic_attach+0x4a > #8 0x806ebe75 at pci_attach+0xd5 > #9 0x80a9478d at device_attach+0x41d > #10 0x80a9595a at bus_generic_attach+0x4a > #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 > #12 0x80a9478d at device_attach+0x41d > #13 0x80a9595a at bus_generic_attach+0x4a > #14 0x803b4c8f at acpi_attach+0xdbf > #15 0x80a9478d at device_attach+0x41d > #16 0x80a9595a at bus_generic_attach+0x4a > #17 0x80ee03e3 at nexus_acpi_attach+0x73 > > ... and so on. Not sure which revision introduced it... r308268 I believe that this is an mpt(4) driver issue, which calls bus_dmamap_create(9) with the mpt mutex held. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
New warnings from WITNESS
Dear all, when booting a recent kernel [freebsd12:~] tuexen% uname -a FreeBSD freebsd12.testbed 12.0-CURRENT FreeBSD 12.0-CURRENT #702 r308359M: Sun Nov 6 11:55:17 CET 2016 tuexen@freebsd12.testbed:/usr/home/tuexen/head/sys/amd64/compile/SCTP amd64 on a VMWare Fusion VM, I get a lot of warnings like bus_dmamap_create with the following non-sleepable locks held: exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ dev/mpt/mpt.c:2287 stack backtrace: #0 0x80ac0300 at witness_debugger+0x70 #1 0x80ac15e7 at witness_warn+0x3d7 #2 0x81055fef at bus_dmamap_create+0x2f #3 0x80678a25 at mpt_configure_ioc+0x3a5 #4 0x80677476 at mpt_attach+0x226 #5 0x80683299 at mpt_pci_attach+0x9c9 #6 0x80a9478d at device_attach+0x41d #7 0x80a9595a at bus_generic_attach+0x4a #8 0x806ebe75 at pci_attach+0xd5 #9 0x80a9478d at device_attach+0x41d #10 0x80a9595a at bus_generic_attach+0x4a #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 #12 0x80a9478d at device_attach+0x41d #13 0x80a9595a at bus_generic_attach+0x4a #14 0x803b4c8f at acpi_attach+0xdbf #15 0x80a9478d at device_attach+0x41d #16 0x80a9595a at bus_generic_attach+0x4a #17 0x80ee03e3 at nexus_acpi_attach+0x73 bus_dmamap_create with the following non-sleepable locks held: exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ dev/mpt/mpt.c:2287 stack backtrace: #0 0x80ac0300 at witness_debugger+0x70 #1 0x80ac15e7 at witness_warn+0x3d7 #2 0x81055fef at bus_dmamap_create+0x2f #3 0x80678a25 at mpt_configure_ioc+0x3a5 #4 0x80677476 at mpt_attach+0x226 #5 0x80683299 at mpt_pci_attach+0x9c9 #6 0x80a9478d at device_attach+0x41d #7 0x80a9595a at bus_generic_attach+0x4a #8 0x806ebe75 at pci_attach+0xd5 #9 0x80a9478d at device_attach+0x41d #10 0x80a9595a at bus_generic_attach+0x4a #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 #12 0x80a9478d at device_attach+0x41d #13 0x80a9595a at bus_generic_attach+0x4a #14 0x803b4c8f at acpi_attach+0xdbf #15 0x80a9478d at device_attach+0x41d #16 0x80a9595a at bus_generic_attach+0x4a #17 0x80ee03e3 at nexus_acpi_attach+0x73 bus_dmamap_create with the following non-sleepable locks held: exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ dev/mpt/mpt.c:2287 stack backtrace: #0 0x80ac0300 at witness_debugger+0x70 #1 0x80ac15e7 at witness_warn+0x3d7 #2 0x81055fef at bus_dmamap_create+0x2f #3 0x80678a25 at mpt_configure_ioc+0x3a5 #4 0x80677476 at mpt_attach+0x226 #5 0x80683299 at mpt_pci_attach+0x9c9 #6 0x80a9478d at device_attach+0x41d #7 0x80a9595a at bus_generic_attach+0x4a #8 0x806ebe75 at pci_attach+0xd5 #9 0x80a9478d at device_attach+0x41d #10 0x80a9595a at bus_generic_attach+0x4a #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 #12 0x80a9478d at device_attach+0x41d #13 0x80a9595a at bus_generic_attach+0x4a #14 0x803b4c8f at acpi_attach+0xdbf #15 0x80a9478d at device_attach+0x41d #16 0x80a9595a at bus_generic_attach+0x4a #17 0x80ee03e3 at nexus_acpi_attach+0x73 bus_dmamap_create with the following non-sleepable locks held: exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ dev/mpt/mpt.c:2287 stack backtrace: #0 0x80ac0300 at witness_debugger+0x70 #1 0x80ac15e7 at witness_warn+0x3d7 #2 0x81055fef at bus_dmamap_create+0x2f #3 0x80678a25 at mpt_configure_ioc+0x3a5 #4 0x80677476 at mpt_attach+0x226 #5 0x80683299 at mpt_pci_attach+0x9c9 #6 0x80a9478d at device_attach+0x41d #7 0x80a9595a at bus_generic_attach+0x4a #8 0x806ebe75 at pci_attach+0xd5 #9 0x80a9478d at device_attach+0x41d #10 0x80a9595a at bus_generic_attach+0x4a #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402 #12 0x80a9478d at device_attach+0x41d #13 0x80a9595a at bus_generic_attach+0x4a #14 0x803b4c8f at acpi_attach+0xdbf #15 0x80a9478d at device_attach+0x41d #16 0x80a9595a at bus_generic_attach+0x4a #17 0x80ee03e3 at nexus_acpi_attach+0x73 ... and so on. Not sure which revision introduced it... Best regards Michael ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: WITNESS messages on 11
I'm not sure, but maybe this is related to [1]. FWIW, according to [2], LORs of witness means deadlock could be happened in that situation (if it's not a false positive), not necessarily an accurate deadlock happening IMO :) I hope it helps :) [1]http://lists.freebsd.org/pipermail/freebsd-current/2016-May/061195.html [2]https://www.freebsd.org/doc/faq/troubleshoot.html#idp63374416 Best Regards, Mokhi. ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
WITNESS messages on 11
I saw this every time I tried to install 11 from the VM image recently. While copying over the ports tree with the command "nc | tar xzf -" I get the messages, see below. Yuri lock order reversal: 1st 0xfe003ca55570 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3512 2nd 0xf80003db6a00 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:281 stack backtrace: #0 0x80a981b0 at witness_debugger+0x70 #1 0x80a980a4 at witness_checkorder+0xe54 #2 0x80a429a2 at _sx_xlock+0x72 #3 0x80cf32dd at ufsdirhash_add+0x3d #4 0x80cf60ba at ufs_direnter+0x4da #5 0x80cfe6b9 at ufs_mkdir+0x8a9 #6 0x80ff7f47 at VOP_MKDIR_APV+0xf7 #7 0x80b05c88 at kern_mkdirat+0x208 #8 0x80e9cc1b at amd64_syscall+0x2db #9 0x80e7cbdb at Xfast_syscall+0xfb lock order reversal: 1st 0xf800388ed5f0 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:894 2nd 0xfe003cb6ccb0 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:263 3rd 0xf8003b40b7c8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2498 stack backtrace: #0 0x80a981b0 at witness_debugger+0x70 #1 0x80a980a4 at witness_checkorder+0xe54 #2 0x80a12b06 at __lockmgr_args+0x4d6 #3 0x80cedf46 at ffs_lock+0xa6 #4 0x80ff88f0 at VOP_LOCK1_APV+0x100 #5 0x80b0880a at _vn_lock+0x9a #6 0x80af8c63 at vget+0x63 #7 0x80aeb52c at vfs_hash_get+0xcc #8 0x80ce95a0 at ffs_vgetf+0x40 #9 0x80ce0e91 at softdep_sync_buf+0xb51 #10 0x80ceeb36 at ffs_syncvnode+0x256 #11 0x80cedde0 at ffs_fsync+0x20 #12 0x80ff79b7 at VOP_FSYNC_APV+0xf7 #13 0x80ada695 at bufsync+0x35 #14 0x80af6d12 at bufobj_invalbuf+0xa2 #15 0x80af9f4e at vgonel+0x15e #16 0x80afce37 at vnlru_proc+0x577 #17 0x809fd8c4 at fork_exit+0x84 ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: Is a high witness refcount indicative of a missing unlock?
On Sunday, March 29, 2015 02:29:42 PM Lars wrote: Hi, I am poking around for a cause for my repeating deadlock issues on my system based on r 279869. ddb show witness show the “vnode interlock” and the “zfs” locks both with reference counts over 200K. Obviously they are related, and there is a find running (all the filesystems on this machine are zfs ( minus the specialty ones like devfs). I don’t see any other withness entry with reference counts even in the ballpark of these two, so does this indicate that we have a vnode/zfs path were we don’t unlock? The ref count just means that the locks exist (so you have 200k ZFS vnodes), not that the locks are currently held. The reference count on the witness object is bumped when a lock is created that uses that witness object and released when the lock is destroyed. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Is a high witness refcount indicative of a missing unlock?
Hi Shane, While our configs shoulds much the same (ignoring 10.1-stable vs current) and I had the nvidia driver loaded, my lockups did not involve the nvidia driver. That of course does not necessarily mean anything if the issue is squarly in zfs somewhere. You can see the reference counts from the ddb kernel debugger (man 8 ddb) using the “show witness” command Lars On Mar 29, 2015, at 19:14, Shane Ambler free...@shaneware.biz wrote: On 30/03/2015 05:59, Lars wrote: Hi, I am poking around for a cause for my repeating deadlock issues on my system based on r 279869. ddb show witness show the “vnode interlock†and the “zfs†locks both with reference counts over 200K. Obviously they are related, and there is a find running (all the filesystems on this machine are zfs ( minus the specialty ones like devfs). I don’t see any other withness entry with reference counts even in the ballpark of these two, so does this indicate that we have a vnode/zfs path were we don’t unlock? I am running 10.1-STABLE and have bad locking issues. Running a witness kernel I got a duplicate lock from nvidia and lock order reversals involving zfs. Any chance your issue is related to mine? What command can give me the witness lock counts? The debug data I have collected so far is at - http://shaneware.biz/freebsddebugdata/ The lock reversal output I had was (after uptime of about 12 mins) - Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `syncer' to stop... Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 done Mar 24 00:24:25 leader kernel: All buffers synced. Mar 24 00:24:25 leader kernel: lock order reversal: Mar 24 00:24:25 leader kernel: 1st 0xf800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229 Mar 24 00:24:25 leader kernel: 2nd 0xf800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268 Mar 24 00:24:25 leader kernel: KDB: stack backtrace: Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe022df6e4c0 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe022df6e570 Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe022df6e600 Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfe022df6e740 Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfe022df6e760 Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfe022df6e790 Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfe022df6e800 Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfe022df6e860 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 0xfe022df6e8e0 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfe022df6e910 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfe022df6e980 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfe022df6e9a0 Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 --- Mar 24 00:24:25 leader kernel: lock order reversal: Mar 24 00:24:25 leader kernel: 1st 0xf800222d6b78 zfs (zfs) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814 Mar 24 00:24:25 leader kernel: 2nd 0x818514a8 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2872 Mar 24 00:24:25 leader kernel: KDB: stack backtrace: Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe022df6e690 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe022df6e740 Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe022df6e7d0 Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 0xfe022df6e810 Mar 24 00:24:25 leader kernel: mountcheckdirs() at mountcheckdirs+0x47/frame 0xfe022df6e860 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 0xfe022df6e8e0 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfe022df6e910 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfe022df6e980 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfe022df6e9a0 Mar 24 00:24:25 leader kernel: amd64_syscall
Re: Is a high witness refcount indicative of a missing unlock?
On 30/03/2015 05:59, Lars wrote: Hi, I am poking around for a cause for my repeating deadlock issues on my system based on r 279869. ddb show witness show the “vnode interlock†and the “zfs†locks both with reference counts over 200K. Obviously they are related, and there is a find running (all the filesystems on this machine are zfs ( minus the specialty ones like devfs). I don’t see any other withness entry with reference counts even in the ballpark of these two, so does this indicate that we have a vnode/zfs path were we don’t unlock? I am running 10.1-STABLE and have bad locking issues. Running a witness kernel I got a duplicate lock from nvidia and lock order reversals involving zfs. Any chance your issue is related to mine? What command can give me the witness lock counts? The debug data I have collected so far is at - http://shaneware.biz/freebsddebugdata/ The lock reversal output I had was (after uptime of about 12 mins) - Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process `syncer' to stop... Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 0 0 done Mar 24 00:24:25 leader kernel: All buffers synced. Mar 24 00:24:25 leader kernel: lock order reversal: Mar 24 00:24:25 leader kernel: 1st 0xf800224555f0 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229 Mar 24 00:24:25 leader kernel: 2nd 0xf800222d67c8 syncer (syncer) @ /usr/src/sys/kern/vfs_subr.c:2268 Mar 24 00:24:25 leader kernel: KDB: stack backtrace: Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe022df6e4c0 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe022df6e570 Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe022df6e600 Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 0xfe022df6e740 Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 0xfe022df6e760 Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 0xfe022df6e790 Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 0xfe022df6e800 Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfe022df6e860 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 0xfe022df6e8e0 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfe022df6e910 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfe022df6e980 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfe022df6e9a0 Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 --- Mar 24 00:24:25 leader kernel: lock order reversal: Mar 24 00:24:25 leader kernel: 1st 0xf800222d6b78 zfs (zfs) @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814 Mar 24 00:24:25 leader kernel: 2nd 0x818514a8 allproc (allproc) @ /usr/src/sys/kern/kern_descrip.c:2872 Mar 24 00:24:25 leader kernel: KDB: stack backtrace: Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe022df6e690 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 0xfe022df6e740 Mar 24 00:24:25 leader kernel: witness_checkorder() at witness_checkorder+0xdc2/frame 0xfe022df6e7d0 Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 0xfe022df6e810 Mar 24 00:24:25 leader kernel: mountcheckdirs() at mountcheckdirs+0x47/frame 0xfe022df6e860 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 0xfe022df6e8e0 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 0xfe022df6e910 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 0xfe022df6e980 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 0xfe022df6e9a0 Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfe022df6eab0 Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 --- Mar 24 00:24:25 leader kernel: lock order reversal: Mar 24 00:24:25 leader kernel: 1st 0xf8001ca8e240 zfs (zfs) @ /usr/src/sys/kern/vfs_mount.c:1229 Mar 24 00:24:25 leader kernel: 2nd 0xf8001ca8e5f0 devfs (devfs) @ /usr/src/sys/kern
Is a high witness refcount indicative of a missing unlock?
Hi, I am poking around for a cause for my repeating deadlock issues on my system based on r 279869. ddb show witness show the “vnode interlock” and the “zfs” locks both with reference counts over 200K. Obviously they are related, and there is a find running (all the filesystems on this machine are zfs ( minus the specialty ones like devfs). I don’t see any other withness entry with reference counts even in the ballpark of these two, so does this indicate that we have a vnode/zfs path were we don’t unlock? Thanks, Lars ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On 03/12/2014 04:33, Julian Elischer wrote: On 12/3/14, 12:24 AM, Warner Losh wrote: On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. not entirely.. hwpmc.ko: U witness_restore hwpmc.ko: U witness_save zfs.ko: U witness_restore zfs.ko: U witness_save Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On 12/3/14, 7:26 PM, Andriy Gapon wrote: On 03/12/2014 04:33, Julian Elischer wrote: On 12/3/14, 12:24 AM, Warner Losh wrote: On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. not entirely.. hwpmc.ko: U witness_restore hwpmc.ko: U witness_save zfs.ko: U witness_restore zfs.ko: U witness_save Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT. that's a good observation. I'll take a look a that later. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On Wednesday, December 03, 2014 09:53:20 PM Julian Elischer wrote: On 12/3/14, 7:26 PM, Andriy Gapon wrote: On 03/12/2014 04:33, Julian Elischer wrote: On 12/3/14, 12:24 AM, Warner Losh wrote: On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. not entirely.. hwpmc.ko: U witness_restore hwpmc.ko: U witness_save zfs.ko: U witness_restore zfs.ko: U witness_save Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT. that's a good observation. I'll take a look a that later. Yes, that isn't really intended to be used publically. The pmc one is stale as system calls haven't run with Giant in several releases. All the ones for g_topology_lock() also seem to be broken-by-designed. There is no good reason I can think of for g_topology_lock() to assert that Giant isn't held. I suspect phk@ just wanted to force geom to be locked without Giant, but I'm not sure that is the best way to achieve that? Poul, is that correct? If you fix that you can remove almost all of the DROP_GIANT/PICKUP_GIANT in the tree. They should really only be in the _sleep() and cv_wait_*() functions where they are used to give Giant its special property of being dropped while asleep. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. Warner signature.asc Description: Message signed with OpenPGP using GPGMail
Re: witness and modules.
On 12/3/14, 12:24 AM, Warner Losh wrote: On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. Warner it's a 10.0 based system. the zfs kernel module grew a reference to witness_restore and witness_save when it was compiled with witness in opt_global.h. I certainly wouldn't load with a standard kernel. it may have been somethign funny with the zfs compat layer stuff, and I'm pretty sure people don't use zfs as a module much, but it was definitely there. note: it'd probably work the other way around.. a non-witness module would probably load on a witness kernel.. none of the other kernel modules had that extra reference so I think it's specific to zfs. Maybe to the way it was compiled.. the sources are pretty standard but the makefiles are a bit hacked.. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On 12/3/14, 12:24 AM, Warner Losh wrote: On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote: On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. I swap back and forth all the time between the two. Kernel modules don’t change when you compile them with WITNESS or without. not entirely.. hwpmc.ko: U witness_restore hwpmc.ko: U witness_save zfs.ko: U witness_restore zfs.ko: U witness_save Warner ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness and modules.
On 12/1/14, 11:39 PM, John Baldwin wrote: On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote: Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. You should not need this. modules always call functions in the kernel for lock operations and this functions are what invoke WITNESS. that's what I thought but empirical evidence disagrees. I'll try some more cases. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
witness and modules.
Do we need to compile all modules with witness definitions when linking with a kernel compiled with witness? This was true at one stage but I remember some work was done to make them compatible. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS observes 2 LORs on Boot of Release 10.1
On 11/22/2014 03:51 PM, Benjamin Kaduk wrote: On Fri, 21 Nov 2014, Ellis H. Wilson III wrote: Before I start, and this is mainly geared to my responder Benjamin Kaduk, based on your response, are you suggesting that the cnputc WITNESS panic you expected to happen is now completely unavoidable in FreeBSD 10? I.E., is this a spinlock that WITNESS falls over each time but that is provably deadlock free that the developers have decided cannot be BLESSED for some reason? https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html looks to be a better explanation than the previous link I sent ... in short, console output is hard. I guess I just can't wrap my head around why we would ever move to a regime where SKIPSPIN is the default for testing... That just seems like an open invitation for introducing spinlock regressions. I don't think anyone made the conscious decision to do that, it just happened by default as no one spent the time to fix the aforementioned issue. Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS debugging is how exactly to interpret the output if I see a stacktrace and then a LOR output like the following: lock order reversal: 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x813b6208 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2682 Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then #2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without first getting harvest_mtx? Or the reverse (WITNESS previously recorded scrlock and then harvest and the lines it spit out were the offenders?) I believe it is the latter (the ordering being printed is the bad one which caused WITNESS to complain). Thanks so much for the additional info Ben. This fleshes out the history of this issue for me significantly. I have filed a bug on this at: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195262 Xin Li was able to identify the ordering that caused the problem and proposed a possible patch to fix it. I can confirm that now I'm booting with solely WITNESS (i.e., not WITNESS_SKIPSPIN) without panic. Thanks! ellis ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS observes 2 LORs on Boot of Release 10.1
On Fri, 21 Nov 2014, Ellis H. Wilson III wrote: Before I start, and this is mainly geared to my responder Benjamin Kaduk, based on your response, are you suggesting that the cnputc WITNESS panic you expected to happen is now completely unavoidable in FreeBSD 10? I.E., is this a spinlock that WITNESS falls over each time but that is provably deadlock free that the developers have decided cannot be BLESSED for some reason? https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html looks to be a better explanation than the previous link I sent ... in short, console output is hard. I guess I just can't wrap my head around why we would ever move to a regime where SKIPSPIN is the default for testing... That just seems like an open invitation for introducing spinlock regressions. I don't think anyone made the conscious decision to do that, it just happened by default as no one spent the time to fix the aforementioned issue. Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS debugging is how exactly to interpret the output if I see a stacktrace and then a LOR output like the following: lock order reversal: 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x813b6208 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2682 Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then #2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without first getting harvest_mtx? Or the reverse (WITNESS previously recorded scrlock and then harvest and the lines it spit out were the offenders?) I believe it is the latter (the ordering being printed is the bad one which caused WITNESS to complain). -Ben ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS observes 2 LORs on Boot of Release 10.1
On 11/18/2014 08:35 PM, Ellis H. Wilson III wrote: If nobody has seen these before, I'll try and put together fixes for them. Please somebody speak up if you have seen them or have useful information for me to go on in my patches. I've started to dig into the relevant code in sys/dev/random/random_harvestq.c, sys/dev/syscons/syscons.c, and sys/kern/subr_sleepqueue.c. Before I start, and this is mainly geared to my responder Benjamin Kaduk, based on your response, are you suggesting that the cnputc WITNESS panic you expected to happen is now completely unavoidable in FreeBSD 10? I.E., is this a spinlock that WITNESS falls over each time but that is provably deadlock free that the developers have decided cannot be BLESSED for some reason? I guess I just can't wrap my head around why we would ever move to a regime where SKIPSPIN is the default for testing... That just seems like an open invitation for introducing spinlock regressions. Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS debugging is how exactly to interpret the output if I see a stacktrace and then a LOR output like the following: lock order reversal: 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x813b6208 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2682 Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then #2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without first getting harvest_mtx? Or the reverse (WITNESS previously recorded scrlock and then harvest and the lines it spit out were the offenders?) Along those lines, in 10.0 and 10.1 releases I get two LORs showing up almost on-top of each other, with the other LOR showing up as: lock order reversal: 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x81424bb8 sleepq chain (sleepq chain) @ /usr/src/sys/kern/subr_sleepqueue.c:240 This seems like maybe two LORs are detected at the same time, which perhaps suggests that the harvest_mtx should have been taken /after/ both of the other locks mentioned (scrlock and sleepq). I'm happy to do the legwork implementing, testing, and submitting a patch for this, but I would really appreciate a pointer in the right direction from somebody who already has handled some LORs before. Thanks! ellis ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
WITNESS observes 2 LORs on Boot of Release 10.1
Hi all, I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0 and I was tracking current, hoping they would go away by 10.1, but it seems they persist as shown below. I suspect this is because current is being built with WITNESS on but also with SKIPSPIN on. So these issues are unlikely to show up for any devs but those who specifically enable WITNESS and disable SKIPSPIN like myself. At my work we would greatly like to do our debugging with checking of spin-locking order included and panicing upon LOR detection. That's not possible with these in existence. Specifically: snip Event timer RTC frequency 32768 Hz quality 0 ppc0: cannot reserve I/O port range Timecounters tick every 10.000 msec pcm0: measured ac97 link rate at 29212 Hz lock order reversal: lock order reversal: 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x81475108 scrlock (scrlock) @ /usr/src/sys/dev/syscons/syscons.c:2694 KDB: stack backtrace: #0 0x8092f020 at kdb_backtrace+0x60 #1 0x80944204 at witness_checkorder+0xc04 #2 0x808e6d5e at __mtx_lock_spin_flags+0x4e #3 0x80738b80 at sc_puts+0xb0 #4 0x8073c4e5 at sc_cnputc+0xe5 #5 0x808b6cf0 at cnputc+0x80 #6 0x808b6fa8 at cnputs+0x58 #7 0x80934011 at putchar+0x151 #8 0x80932c36 at kvprintf+0xf6 #9 0x809344ad at _vprintf+0x8d #10 0x809346c3 at printf+0x53 #11 0x80943fe0 at witness_checkorder+0x9e0 #12 0x808e6d5e at __mtx_lock_spin_flags+0x4e #13 0x8090019b at msleep_spin_sbt+0x5b #14 0x80698198 at random_kthread+0x78 #15 0x808cd941 at fork_exit+0x71 #16 0x80caee7e at fork_trampoline+0xe 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ /usr/src/sys/dev/random/random_harvestq.c:198 2nd 0x81568390 sleepq chain (sleepq chain) @ /usr/src/sys/kern/subr_sleepqueue.c:237 KDB: stack backtrace: #0 0x8092f020 at kdb_backtrace+0x60 #1 0x80944204 at witness_checkorder+0xc04 #2 0x808e6d5e at __mtx_lock_spin_flags+0x4e #3 0x8090019b at msleep_spin_sbt+0x5b #4 0x80698198 at random_kthread+0x78 #5 0x808cd941 at fork_exit+0x71 #6 0x80caee7e at fork_trampoline+0xe usbus0: 12Mbps Full Speed USB v1.0 ugen0.1: Apple at usbus0 uhub0: Apple OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1 on usbus0 ada0 at ata0 bus 0 scbus0 target 0 lun 0 ada0: VBOX HARDDISK 1.0 ATA-6 device snip This is on a virtual machine via VirtualBox, but we have reproduced it on real hardware. Regarding the latter reversal, I have dug in the code and see the calls go about in the following order: In sys/dev/random/random_harvestq.c: 198 mtx_lock_spin(harvest_mtx); 209 msleep_spin_sbt(random_kthread_control, harvest_mtx, Now in sys/kern/kern_synch.c: 297 sleepq_lock(ident); Now in sys/kern/subr_sleepqueue.c 237 mtx_lock_spin(sc-sc_lock); The above line numbers might be a hair off from head since I'm looking at code synced last week. I'm happy to open a bug on this if that's the desirable course of action, or to even assist in fixing it, but I wanted to first see if anybody knew about these already (they didn't show up on the known LORs list on quick perusal) or if this was simply a case of WITNESS being overly conservative and throwing false positives. If this belongs on a different list just let me know. Thanks! ellis ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS observes 2 LORs on Boot of Release 10.1
On Tue, 18 Nov 2014, Ellis H. Wilson III wrote: This is on a virtual machine via VirtualBox, but we have reproduced it on real hardware. Regarding the latter reversal, I have dug in the code and see the calls go about in the following order: In sys/dev/random/random_harvestq.c: 198 mtx_lock_spin(harvest_mtx); 209 msleep_spin_sbt(random_kthread_control, harvest_mtx, Now in sys/kern/kern_synch.c: 297 sleepq_lock(ident); Now in sys/kern/subr_sleepqueue.c 237 mtx_lock_spin(sc-sc_lock); The above line numbers might be a hair off from head since I'm looking at code synced last week. I'm happy to open a bug on this if that's the desirable course of action, or to even assist in fixing it, but I wanted to first see if anybody knew about these already (they didn't show up on the known LORs list on quick perusal) or if this was simply a case of WITNESS being overly conservative and throwing false positives. If this belongs on a different list just let me know. I don't believe it is known, and this list is fine. I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0 and I was tracking current, hoping they would go away by 10.1, but it seems they persist as shown below. I suspect this is because current is being built with WITNESS on but also with SKIPSPIN on. So these issues are unlikely to show up for any devs but those who specifically enable WITNESS and disable SKIPSPIN like myself. At my work we would greatly like to do our debugging with checking of spin-locking order included and panicing upon LOR detection. That's not possible with these in existence. However, I was under the impression that a kernel built with WITNESS and without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see, e.g., https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html). So, (1) I'm surprised you can boot it, and (2) that would explain why no one else has been using it. -Ben ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS observes 2 LORs on Boot of Release 10.1
On 11/18/2014 05:39 PM, Benjamin Kaduk wrote: On Tue, 18 Nov 2014, Ellis H. Wilson III wrote: I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0 and I was tracking current, hoping they would go away by 10.1, but it seems they persist as shown below. I suspect this is because current is being built with WITNESS on but also with SKIPSPIN on. So these issues are unlikely to show up for any devs but those who specifically enable WITNESS and disable SKIPSPIN like myself. At my work we would greatly like to do our debugging with checking of spin-locking order included and panicing upon LOR detection. That's not possible with these in existence. However, I was under the impression that a kernel built with WITNESS and without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see, e.g., https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html). So, (1) I'm surprised you can boot it, and (2) that would explain why no one else has been using it. That's a very interesting thread. I've seen another where a fellow developer suggested just throwing on the WITNESS_SKIPSPIN flag to solve the issue. I can't say that I agree with the approach, but I understand. I'd be willing to tackle a bit of WITNESS massaging to help it be instructed about known false positives better, if that's desirable. Why I'm able to boot however, is simple: I haven't enabled a full suite of debugging flags, KDB/DDB being the key ones that cause a panic to occur on a failure like the one I've seen. We originally were seeing the panic so WITNESS in its entirety was shut off. I was asked to try and get that back on-track, so to start I at least wanted to see how many LORs we were dealing with on boot. 2 is apparently that magic number, and maybe 3 if the cputs issue you refer to is still possible to be hit in the 10.1 wild. If nobody has seen these before, I'll try and put together fixes for them. Please somebody speak up if you have seen them or have useful information for me to go on in my patches. Thanks, ellis ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
WITNESS: unable to allocate a new witness object
I'm trying to set up textdump(4). On boot I see: WITNESS: unable to allocate a new witness object also Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 0.002434387 s kickstart. Does the first indicate I haven't set up WITNESS correctly? What does the second tell me? Thanks Anton ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: WITNESS: unable to allocate a new witness object
On Tue, Oct 15, 2013 at 4:17 PM, Anton Shterenlikht me...@bris.ac.uk wrote: I'm trying to set up textdump(4). On boot I see: WITNESS: unable to allocate a new witness object also It means that you run out of WITNESS object on the free list. Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 0.002434387 s kickstart. It's output from DIAGNOSTIC, it's triggered when the callout handler execution time exceeds a given threshold. You can safely ignore it. Also, please stop spamming on mailing lists with new posts. They more or less all refers to the same problem. Keeps posting doesn't encourage people to look at it, neither it helps to have it solved more quickly. Thanks, -- Davide There are no solved problems; there are only problems that are more or less solved -- Henri Poincare ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Witness message about lock order reversal on 10 (head)
I got these messages on 10 head, rev.254235, during 'filesystem full' condition. Yuri = log = lock order reversal: 1st 0xff80f7432470 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3054 2nd 0xfe00075b5600 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000284440 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff80002844f0 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000284580 _sx_xlock() at _sx_xlock+0x75/frame 0xff80002845c0 ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xff8000284600 ufs_direnter() at ufs_direnter+0x688/frame 0xff80002846c0 ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80002848c0 VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80002848f0 kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000284ae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff8000284bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000284bf0 --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 0x7fffd798, rbp = 0x7fffdc70 --- lock order reversal: 1st 0xfe0007633840 filedesc structure (filedesc structure) @ /usr/src/sys/kern/kern_descrip.c:1184 2nd 0xfe0007a45240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:4346 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff800031f6b0 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff800031f760 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff800031f7f0 __lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff800031f920 ffs_lock() at ffs_lock+0x84/frame 0xff800031f970 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff800031f9a0 _vn_lock() at _vn_lock+0xab/frame 0xff800031fa10 knlist_remove_kq() at knlist_remove_kq+0x82/frame 0xff800031fa40 knote_fdclose() at knote_fdclose+0xc8/frame 0xff800031fa90 closefp() at closefp+0x64/frame 0xff800031fae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff800031fbf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff800031fbf0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80164537a, rsp = 0x7f7fbf08, rbp = 0x7f7fbf20 --- pid 983 (sendmail), uid 25 inumber 473785 on /: filesystem full pid 1101 (dd), uid 2 inumber 426338 on /: filesystem full lock order reversal: 1st 0xfe0007cbc240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099 2nd 0xff80f7894338 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262 3rd 0xfe003b531418 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000378e60 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8000378f10 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000378fa0 __lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff80003790d0 ffs_lock() at ffs_lock+0x84/frame 0xff8000379120 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff8000379150 _vn_lock() at _vn_lock+0xab/frame 0xff80003791c0 vget() at vget+0x70/frame 0xff8000379210 vfs_hash_get() at vfs_hash_get+0xf5/frame 0xff8000379260 ffs_vgetf() at ffs_vgetf+0x41/frame 0xff80003792f0 softdep_sync_buf() at softdep_sync_buf+0x2e4/frame 0xff80003793a0 ffs_syncvnode() at ffs_syncvnode+0x258/frame 0xff8000379420 ffs_truncate() at ffs_truncate+0x5ca/frame 0xff8000379600 ufs_direnter() at ufs_direnter+0x891/frame 0xff80003796c0 ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80003798c0 VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80003798f0 kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000379ae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff8000379bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000379bf0 --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 0x7fffd898, rbp = 0x7fffd970 --- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Witness message about lock order reversal on 10 (head)
It might be the same false positive I saw a couple weeks ago ... Davide said to me: | The LOR is a false positive. | See the comment in sys/ufs/ufs/ufs_dirhash.c | Also, switching motherboards is not related to this in any way. You'll | eventually hit that LOR report, unless you disabled WITNESS. Thanks, -- Davide On Mon, 19 Aug 2013, Yuri wrote: I got these messages on 10 head, rev.254235, during 'filesystem full' condition. Yuri = log = lock order reversal: 1st 0xff80f7432470 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3054 2nd 0xfe00075b5600 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:284 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000284440 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff80002844f0 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000284580 _sx_xlock() at _sx_xlock+0x75/frame 0xff80002845c0 ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xff8000284600 ufs_direnter() at ufs_direnter+0x688/frame 0xff80002846c0 ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80002848c0 VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80002848f0 kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000284ae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff8000284bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000284bf0 --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 0x7fffd798, rbp = 0x7fffdc70 --- lock order reversal: 1st 0xfe0007633840 filedesc structure (filedesc structure) @ /usr/src/sys/kern/kern_descrip.c:1184 2nd 0xfe0007a45240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:4346 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff800031f6b0 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff800031f760 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff800031f7f0 __lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff800031f920 ffs_lock() at ffs_lock+0x84/frame 0xff800031f970 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff800031f9a0 _vn_lock() at _vn_lock+0xab/frame 0xff800031fa10 knlist_remove_kq() at knlist_remove_kq+0x82/frame 0xff800031fa40 knote_fdclose() at knote_fdclose+0xc8/frame 0xff800031fa90 closefp() at closefp+0x64/frame 0xff800031fae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff800031fbf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff800031fbf0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80164537a, rsp = 0x7f7fbf08, rbp = 0x7f7fbf20 --- pid 983 (sendmail), uid 25 inumber 473785 on /: filesystem full pid 1101 (dd), uid 2 inumber 426338 on /: filesystem full lock order reversal: 1st 0xfe0007cbc240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099 2nd 0xff80f7894338 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262 3rd 0xfe003b531418 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000378e60 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8000378f10 witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000378fa0 __lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff80003790d0 ffs_lock() at ffs_lock+0x84/frame 0xff8000379120 VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff8000379150 _vn_lock() at _vn_lock+0xab/frame 0xff80003791c0 vget() at vget+0x70/frame 0xff8000379210 vfs_hash_get() at vfs_hash_get+0xf5/frame 0xff8000379260 ffs_vgetf() at ffs_vgetf+0x41/frame 0xff80003792f0 softdep_sync_buf() at softdep_sync_buf+0x2e4/frame 0xff80003793a0 ffs_syncvnode() at ffs_syncvnode+0x258/frame 0xff8000379420 ffs_truncate() at ffs_truncate+0x5ca/frame 0xff8000379600 ufs_direnter() at ufs_direnter+0x891/frame 0xff80003796c0 ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80003798c0 VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80003798f0 kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000379ae0 amd64_syscall() at amd64_syscall+0x265/frame 0xff8000379bf0 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000379bf0 --- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 0x7fffd898, rbp = 0x7fffd970 --- ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org dan -- Dan Mack ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Saturday, November 24, 2012 10:01:39 am Attilio Rao wrote: On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote: Today I saw a spurious witness warning for acquiring duplicate lock of same type. The root cause is that when running mtx_destroy on a spinlock that is held by the current thread, mtx_destroy calls spinlock_exit() before calling WITNESS_UNLOCK, which opens up a window in which the CPU can be interrupted and attempt to acquire another spinlock of the same type as the one being destroyed. This patch should fix it: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. That is simply behavior we inherited from BSD/OS. I didn't find it all that useful so all of the other locking primitives I've added since then have not had this behavior. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Friday, November 23, 2012 10:08:28 PM Ryan Stone wrote: Today I saw a spurious witness warning for acquiring duplicate lock of same type. The root cause is that when running mtx_destroy on a spinlock that is held by the current thread, mtx_destroy calls spinlock_exit() before calling WITNESS_UNLOCK, which opens up a window in which the CPU can be interrupted and attempt to acquire another spinlock of the same type as the one being destroyed. This patch should fix it: diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c index 2f13863..96f43f8 100644 --- a/sys/kern/kern_mutex.c +++ b/sys/kern/kern_mutex.c @@ -918,16 +918,16 @@ _mtx_destroy(volatile uintptr_t *c) else { MPASS((m-mtx_lock (MTX_RECURSED|MTX_CONTESTED)) == 0); + lock_profile_release_lock(m-lock_object); + /* Tell witness this isn't locked to make it happy. */ + WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__, + __LINE__); + /* Perform the non-mtx related part of mtx_unlock_spin(). */ if (LOCK_CLASS(m-lock_object) == lock_class_mtx_spin) spinlock_exit(); else curthread-td_locks--; - - lock_profile_release_lock(m-lock_object); - /* Tell witness this isn't locked to make it happy. */ - WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__, - __LINE__); } m-mtx_lock = MTX_DESTROYED Ah, I would tweak this slightly perhaps to match mtx_unlock() and mtx_unlock_spin(): if (LOCK_CLASS() == lock_class_mtx_sleep) curthread-td_locks--; /* lock profile and witness stuff */ if (LOCK_CLASS() == lock_class_mtx_spin) spinlock_exit(); You are correct that it is broken for the spin case. -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Sat, Nov 24, 2012 at 3:01 PM, Attilio Rao atti...@freebsd.org wrote: On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote: Today I saw a spurious witness warning for acquiring duplicate lock of same type. The root cause is that when running mtx_destroy on a spinlock that is held by the current thread, mtx_destroy calls spinlock_exit() before calling WITNESS_UNLOCK, which opens up a window in which the CPU can be interrupted and attempt to acquire another spinlock of the same type as the one being destroyed. This patch should fix it: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. Can you please show which lock triggers the panic you saw? Ryan, however I'm sure this patch would introduce a major switch in our KPI/POLA and eventually it would be a bigger work. Your patch is certainly right and I think you should commit it for the time being. For the long-term, maybe you would like to work on such a patch, rather. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote: Today I saw a spurious witness warning for acquiring duplicate lock of same type. The root cause is that when running mtx_destroy on a spinlock that is held by the current thread, mtx_destroy calls spinlock_exit() before calling WITNESS_UNLOCK, which opens up a window in which the CPU can be interrupted and attempt to acquire another spinlock of the same type as the one being destroyed. This patch should fix it: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. Can you please show which lock triggers the panic you saw? Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. Can you please show which lock triggers the panic you saw? Thanks, Attilio It was taskqueue_free: void taskqueue_free(struct taskqueue *queue) { TQ_LOCK(queue); queue-tq_flags = ~TQ_FLAGS_ACTIVE; taskqueue_terminate(queue-tq_threads, queue); KASSERT(TAILQ_EMPTY(queue-tq_active), (Tasks still running?)); KASSERT(queue-tq_callouts == 0, (Armed timeout tasks)); mtx_destroy(queue-tq_mutex); free(queue-tq_threads, M_TASKQUEUE); free(queue, M_TASKQUEUE); } ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Sat, Nov 24, 2012 at 3:46 PM, Ryan Stone ryst...@gmail.com wrote: On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. Can you please show which lock triggers the panic you saw? Thanks, Attilio It was taskqueue_free: taskqueue_free() must not be called in places where there are still races, so the lock is not really meaningful and should be acquired. Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Spurious witness warning when destroying spin mtx
On Sat, Nov 24, 2012 at 3:51 PM, Attilio Rao atti...@freebsd.org wrote: On Sat, Nov 24, 2012 at 3:46 PM, Ryan Stone ryst...@gmail.com wrote: On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote: I seriously wonder why right now we don't assume the lock is unheld. There are likely historically reasons for that, but I would like to know which one are those and eventually fix them out. FWIK, all the other locking primitives assume the lock is already unheld when destroying and I think it would be good to have that for mutexes as well. Can you please show which lock triggers the panic you saw? Thanks, Attilio It was taskqueue_free: taskqueue_free() must not be called in places where there are still races, so the lock is not really meaningful and should be acquired. Herm, I mean to say after taskqueue_termintate() returns must not be races Attilio -- Peace can only be achieved by understanding - A. Einstein ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Spurious witness warning when destroying spin mtx
Today I saw a spurious witness warning for acquiring duplicate lock of same type. The root cause is that when running mtx_destroy on a spinlock that is held by the current thread, mtx_destroy calls spinlock_exit() before calling WITNESS_UNLOCK, which opens up a window in which the CPU can be interrupted and attempt to acquire another spinlock of the same type as the one being destroyed. This patch should fix it: diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c index 2f13863..96f43f8 100644 --- a/sys/kern/kern_mutex.c +++ b/sys/kern/kern_mutex.c @@ -918,16 +918,16 @@ _mtx_destroy(volatile uintptr_t *c) else { MPASS((m-mtx_lock (MTX_RECURSED|MTX_CONTESTED)) == 0); + lock_profile_release_lock(m-lock_object); + /* Tell witness this isn't locked to make it happy. */ + WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__, + __LINE__); + /* Perform the non-mtx related part of mtx_unlock_spin(). */ if (LOCK_CLASS(m-lock_object) == lock_class_mtx_spin) spinlock_exit(); else curthread-td_locks--; - - lock_profile_release_lock(m-lock_object); - /* Tell witness this isn't locked to make it happy. */ - WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__, - __LINE__); } m-mtx_lock = MTX_DESTROYED ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 25/01/2012 23:52 Andriy Gapon said the following: on 24/01/2012 14:32 Gleb Smirnoff said the following: Yes, now: Rebooting... lock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new details... It's still intriguing to me why the LOR *doesn't* happen [*] with stop_scheduler_on_panic=0. But I don't see a productive way to pursue this investigation further. Salve Glebius! After your recent nudging I took yet another look at this issue and it seems that I have some findings. For those who might get interested here is a convenience reference to the whole thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 A short summary. Prerequisites: an SMP x86 system, its kernel is configured with WITNESS !WITNESS_SKIPSPIN (this is important) and a system uses serial console via uart. Then, if stop_scheduler_on_panic is set to zero the system can be rebooted without a problem. On the other hand, if stop_scheduler_on_panic is enabled, then the system first runs into a LOR when calling printf() in cpu_reset() and then it runs into a panic when printf is recursively called from witness(9) to report the LOR. The panic happens because of the recursion on cnputs_mtx, which should ensure that cnputs() output is not intermingled and which is not flagged to support recursion. There are two things about this report that greatly confused and puzzled me: 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how could it be possible that changing its value affects behavior of the system when panic(9) is not called?! 2. The LOR in question happens between smp rendezvous (smp_ipi_mtx) and uart_hwmtx (sc_hwmtx_s in uart core) spin locks. The order of these locks is actually predefined in witness order_lists[] such that uart_hwmtx must come before smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in shutdown_reset(), then we call cpu_reset, then it calls printf and from there we get to uart_hwmtx for serial console output. So the order between these spinlocks is always violated in the x86 SMP reboot path. How come witness(9) doesn't _always_ detect this LOR? How come it didn't detect this LOR before any stop scheduler commits?! [Spoiler alert :-)] Turns out that the two puzzles above are closely related. Let's first list all the things that change when stop_scheduler_on_panic is enabled and a panic happens: - other CPUs are stopped (forced to spin) - interrupts on current CPU are disabled - by virtue of the above the current thread should be the only thread running (unless it executes a voluntary switch) - all locks are busted, they are completely ignored / bypassed - by virtue of the above no lock invariants and witness checks are performed, so no lock order checking, no recursion checking, etc So, what I observe is this: when stop_scheduler_on_panic is disabled, the LOR is actually detected as it should be. witness(9) works properly here. Once the LOR is detected witness(9) wants to report it using printf(9). That's where we run into the cnputs_mtx recursion panic. It's all exactly as with stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic using printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() detects locks recursion again (this is independent of witness(9)) and calls panic(9) again. Then panic(9) wants to report the panic using printf(9)... I assume that when the stack is exhausted we run into a double fault and dblfault_handler wants to print something again... Likely we eventually run into a triple fault which resets the machine. Although, I recall some old reports of machines hanging during reboot in a place suspiciously close to where the described ordeal happens... But if the machine doesn't hang we get a full appearance of the reset successfully happening (modulo some last messages missing). With stop_scheduler_on_panic enabled and all the locks being ignored we, of course, do not run into any secondary lock recursions and resulting panics. So the system is able to at least panic gracefully (still we should have just reported the LOR gracefully, no panic is required). Some obvious conclusions: - the smp rendezvous and uart_hwmtx LOR is real and it is the true cause of the problem; it should be fixed one way or other - either by correcting witness order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset; - because witness(9) uses printf(9) to report problems, it is very fragile to use witness with any locks that can be acquired under printf(9) - stop_scheduler_on_panic just uncovers the true bug There are probably other conclusions that can
Re: new panic in cpu_reset() with WITNESS
2012/5/17, Andriy Gapon a...@freebsd.org: on 25/01/2012 23:52 Andriy Gapon said the following: on 24/01/2012 14:32 Gleb Smirnoff said the following: Yes, now: Rebooting... lock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new details... It's still intriguing to me why the LOR *doesn't* happen [*] with stop_scheduler_on_panic=0. But I don't see a productive way to pursue this investigation further. Salve Glebius! After your recent nudging I took yet another look at this issue and it seems that I have some findings. For those who might get interested here is a convenience reference to the whole thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307 A short summary. Prerequisites: an SMP x86 system, its kernel is configured with WITNESS !WITNESS_SKIPSPIN (this is important) and a system uses serial console via uart. Then, if stop_scheduler_on_panic is set to zero the system can be rebooted without a problem. On the other hand, if stop_scheduler_on_panic is enabled, then the system first runs into a LOR when calling printf() in cpu_reset() and then it runs into a panic when printf is recursively called from witness(9) to report the LOR. The panic happens because of the recursion on cnputs_mtx, which should ensure that cnputs() output is not intermingled and which is not flagged to support recursion. There are two things about this report that greatly confused and puzzled me: 1. stop_scheduler_on_panic variable is used _only_ in panic(9). So how could it be possible that changing its value affects behavior of the system when panic(9) is not called?! 2. The LOR in question happens between smp rendezvous (smp_ipi_mtx) and uart_hwmtx (sc_hwmtx_s in uart core) spin locks. The order of these locks is actually predefined in witness order_lists[] such that uart_hwmtx must come before smp_ipi_mtx. But in the reboot path we first take smp_ipi_mtx in shutdown_reset(), then we call cpu_reset, then it calls printf and from there we get to uart_hwmtx for serial console output. So the order between these spinlocks is always violated in the x86 SMP reboot path. How come witness(9) doesn't _always_ detect this LOR? How come it didn't detect this LOR before any stop scheduler commits?! [Spoiler alert :-)] Turns out that the two puzzles above are closely related. Let's first list all the things that change when stop_scheduler_on_panic is enabled and a panic happens: - other CPUs are stopped (forced to spin) - interrupts on current CPU are disabled - by virtue of the above the current thread should be the only thread running (unless it executes a voluntary switch) - all locks are busted, they are completely ignored / bypassed - by virtue of the above no lock invariants and witness checks are performed, so no lock order checking, no recursion checking, etc So, what I observe is this: when stop_scheduler_on_panic is disabled, the LOR is actually detected as it should be. witness(9) works properly here. Once the LOR is detected witness(9) wants to report it using printf(9). That's where we run into the cnputs_mtx recursion panic. It's all exactly as with stop_scheduler_on_panic enabled. Then panic(9) wants to report the panic using printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() detects locks recursion again (this is independent of witness(9)) and calls panic(9) again. Then panic(9) wants to report the panic using printf(9)... I assume that when the stack is exhausted we run into a double fault and dblfault_handler wants to print something again... Likely we eventually run into a triple fault which resets the machine. Although, I recall some old reports of machines hanging during reboot in a place suspiciously close to where the described ordeal happens... But if the machine doesn't hang we get a full appearance of the reset successfully happening (modulo some last messages missing). With stop_scheduler_on_panic enabled and all the locks being ignored we, of course, do not run into any secondary lock recursions and resulting panics. So the system is able to at least panic gracefully (still we should have just reported the LOR gracefully, no panic is required). Some obvious conclusions: - the smp rendezvous and uart_hwmtx LOR is real and it is the true cause of the problem; it should be fixed one way or other - either by correcting witness order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset; - because witness(9) uses printf(9) to report problems, it is very fragile to use witness with any locks that can
Re: locks under printf(9) and WITNESS
BTW, I see another LOR with spinlocks, maybe harmless. o sysbeep() is called from syscons code and it calls timeout() which introduces the following order: scrlock - callout. o The callout code programming of event timers introduces the following order (via callout_new_inserted == cpu_new_callout): callout - et_hw_mtx o Eventtimers' doconfigtimer calls loadtimer with et_hw_mtx held, loadtimer calls et_start method of a configured event timer and, e.g. in the case of lapic_et_start and bootverbose it calls printf(9), which gives: et_hw_mtx - scrlock This is just for the information. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 24/01/2012 14:32 Gleb Smirnoff said the following: Yes, now: Rebooting... lock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new details... It's still intriguing to me why the LOR *doesn't* happen [*] with stop_scheduler_on_panic=0. But I don't see a productive way to pursue this investigation further. [*] - or maybe better to say why it isn't detected/reported. cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq$0,0x5159f2(%rip) db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 Maybe kdb_backtrace ought to use db_printf. kdb_backtrace() at kdb_backtrace+0x2d _witness_debugger() at _witness_debugger+0x2c witness_checkorder() at witness_checkorder+0x854 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 sys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote: A on 23/01/2012 18:46 Gleb Smirnoff said the following: A On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote: A A db bt A A Tracing pid 1 tid 11 td 0xfe0001d5e000 A A kdb_enter() at kdb_enter+0x3b A A panic() at panic+0x1c7 A A _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f A A cnputs() at cnputs+0x7a A A vprintf() at vprintf+0xcb A A printf() at printf+0x67 A A db_putc() at db_putc+0x81 A A A A Ah, db_putc does something different from what I expected. A A Can you hack it to never use printf? A A Just cut printfs from db_putc()? A A Make the following condition be always false: A A if (!kdb_active || ddb_use_printf) { A A E.g.: A A if (0) { With this change + s/printf/db_printf/ in subr_witness.c I've got the following during reboot: Rebooting... lllock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0x80b13280 syscons video lock (syscons video lock) @ /usr/src/head/sys/dev/syscons/syscons.c:1921 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq$0,0x5159f2(%rip) db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 kdb_backtrace() at kdb_backtrace+0x2d _witness_debugger() at _witness_debugger+0x2c witness_checkorder() at witness_checkorder+0x854 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 scrn_update() at scrn_update+0x41c sc_cnputc() at sc_cnputc+0x46 cnputc() at cnputc+0x4c db_putc() at db_putc+0x4d kvprintf() at kvprintf+0x83 db_printf() at db_printf+0x86 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 sc_puts() at sc_puts+0x97 sc_cnputc() at sc_cnputc+0x3e cnputc() at cnputc+0x4c db_putc() at db_putc+0x4d kvprintf() at kvprintf+0x83 db_printf() at db_printf+0x86 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 sys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 24/01/2012 13:54 Gleb Smirnoff said the following: On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote: A on 23/01/2012 18:46 Gleb Smirnoff said the following: A On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote: A A db bt A A Tracing pid 1 tid 11 td 0xfe0001d5e000 A A kdb_enter() at kdb_enter+0x3b A A panic() at panic+0x1c7 A A _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f A A cnputs() at cnputs+0x7a A A vprintf() at vprintf+0xcb A A printf() at printf+0x67 A A db_putc() at db_putc+0x81 A A A A Ah, db_putc does something different from what I expected. A A Can you hack it to never use printf? A A Just cut printfs from db_putc()? A A Make the following condition be always false: A A if (!kdb_active || ddb_use_printf) { A A E.g.: A A if (0) { With this change + s/printf/db_printf/ in subr_witness.c I've got the following during reboot: Rebooting... lllock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0x80b13280 syscons video lock (syscons video lock) @ /usr/src/head/sys/dev/syscons/syscons.c:1921 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 Can you disable video console? -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Tue, Jan 24, 2012 at 02:09:03PM +0200, Andriy Gapon wrote: A on 24/01/2012 13:54 Gleb Smirnoff said the following: A On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote: A A on 23/01/2012 18:46 Gleb Smirnoff said the following: A A On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote: A A A db bt A A A Tracing pid 1 tid 11 td 0xfe0001d5e000 A A A kdb_enter() at kdb_enter+0x3b A A A panic() at panic+0x1c7 A A A _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f A A A cnputs() at cnputs+0x7a A A A vprintf() at vprintf+0xcb A A A printf() at printf+0x67 A A A db_putc() at db_putc+0x81 A A A A A A Ah, db_putc does something different from what I expected. A A A Can you hack it to never use printf? A A A A Just cut printfs from db_putc()? A A A A Make the following condition be always false: A A A A if (!kdb_active || ddb_use_printf) { A A A A E.g.: A A A A if (0) { A A With this change + s/printf/db_printf/ in subr_witness.c I've A got the following during reboot: A A Rebooting... A lllock order reversal: A 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 A 2nd 0x80b13280 syscons video lock (syscons video lock) @ /usr/src/head/sys/dev/syscons/syscons.c:1921 A panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 A A Can you disable video console? Yes, now: Rebooting... lock order reversal: 1st 0x80937140 smp rendezvous (smp rendezvous) @ /usr/src/head/sys/kern/kern_shutdown.c:542 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ /usr/src/head/sys/dev/uart/uart_cpu.h:92 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq$0,0x5159f2(%rip) db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 kdb_backtrace() at kdb_backtrace+0x2d _witness_debugger() at _witness_debugger+0x2c witness_checkorder() at witness_checkorder+0x854 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 sys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 22/01/2012 18:35 Gleb Smirnoff said the following: On Sat, Jan 21, 2012 at 03:38:59PM +0200, Andriy Gapon wrote: A on 20/01/2012 17:41 Gleb Smirnoff said the following: A A On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote: A T New panic has been introduced somewhere between A T r229851 and r229932, that happens on shutdown if A T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. A A I've run through binary search and panic was introduced A by r229854. A A A Which means that it was introduced by one of the earlier commits (probably A mine). Thank you for this investigative work! A Although, to be frank and honest, I still don't see how the problem could be A a non-issue before. I can confirm that I can reboot succesfully with a new kernel and kern.stop_scheduler_on_panic=0. This is very puzzling. You observation means that stop_scheduler_on_panic has an effect on the code outside the panic path. I reviewed the SCHEDULER_STOPPED() changes and I still can not see how that could be possible (modulo compiler bugs). Unfortunately, I also can not reproduce the problem here - without WITNESS_SKIPSPIN I am running into a panic[*] during boot. BTW, do you get any other LOR reports _before_ you do a reboot? Can you try to change printfs in witness to db_printfs? Perhaps this will allow to get the details of the LOR in uart_cnputc. Maybe that will reveal some important additional details. Finally, can you try the patch from my other post? [*] The panic: ... Root mount waiting for: usbus0 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/sys/kern/kern_cons.c:500 cpuid = 0 curthread: 0xfe000230b900 stack: 0xff802cda3000 - 0xff802cda7000 stack pointer: 0xff802cda5c50 KDB: stack backtrace: db_trace_self_wrapper() at 0x802bf41a = db_trace_self_wrapper+0x2a kdb_backtrace() at 0x8053382a = kdb_backtrace+0x3a panic() at 0x804f8690 = panic+0x260 _mtx_lock_spin_flags() at 0x804e5eff = _mtx_lock_spin_flags+0xef cnputs() at 0x804b0ea6 = cnputs+0x36 putchar() at 0x8053887f = putchar+0x24f kvprintf() at 0x80536b1b = kvprintf+0x9b vprintf() at 0x80538225 = vprintf+0x85 printf() at 0x805382f7 = printf+0x67 witness_checkorder() at 0x8054c361 = witness_checkorder+0x821 _mtx_lock_spin_flags() at 0x804e5f1d = _mtx_lock_spin_flags+0x10d uart_cnputc() at 0x803c8e6c = uart_cnputc+0x3c cnputc() at 0x804b0acd = cnputc+0x6d cnputs() at 0x804b0ebb = cnputs+0x4b putchar() at 0x8053887f = putchar+0x24f kvprintf() at 0x80536b1b = kvprintf+0x9b vprintf() at 0x80538225 = vprintf+0x85 printf() at 0x805382f7 = printf+0x67 witness_checkorder() at 0x8054c361 = witness_checkorder+0x821 _mtx_lock_spin_flags() at 0x804e5f1d = _mtx_lock_spin_flags+0x10d cpu_new_callout() at 0x806fadbb = cpu_new_callout+0x9b callout_cc_add() at 0x8050e379 = callout_cc_add+0x129 callout_reset_on() at 0x8050efcc = callout_reset_on+0x24c sleepq_set_timeout() at 0x8053ff56 = sleepq_set_timeout+0x106 _sleep() at 0x805036f2 = _sleep+0x342 pause() at 0x80503920 = pause+0x90 usb_pause_mtx() at 0x803ee841 = usb_pause_mtx+0x71 usbd_req_set_address() at 0x803e9f1c = usbd_req_set_address+0xac usb_alloc_device() at 0x803def89 = usb_alloc_device+0x3d9 uhub_explore() at 0x803e5b63 = uhub_explore+0x3a3 usb_bus_explore() at 0x803d4187 = usb_bus_explore+0xd7 usb_process() at 0x803e8bc1 = usb_process+0xa1 fork_exit() at 0x804c801a = fork_exit+0x1aa fork_trampoline() at 0x806bf3ee = fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xff802cda6d00, rbp = 0 --- KDB: enter: panic [ thread pid 15 tid 100031 ] Stopped at 0x8053348d = kdb_enter+0x3d:movq $0,0x11212d0(%rip) db show locks exclusive sx USB config SX lock (USB config SX lock) r = 0 (0xfe0005040070) locked @ /usr/src/sys/dev/usb/usb_device.c:2643 exclusive spin mutex callout (callout) r = 0 (0x8160b3f8) locked @ /usr/src/sys/kern/kern_timeout.c:409 exclusive spin mutex sleepq chain (sleepq chain) r = 0 (0x81655668) locked @ /usr/src/sys/kern/subr_sleepqueue.c:237 -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]
On Sat, Jan 21, 2012 at 07:26:55PM +0200, Andriy Gapon wrote: A BTW, we have a quite strange situation with spin locks in console output path. A cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can be A called in any locking context (even during normal operation). But there are a A number of console-specific locks (scrlock, uart_hwmtx, syscons video lock) A which are acquired under cnputs_mtx, but which are *not* marked as A MTX_NOWITNESS. More, they are specified in the witness order_lists as if we A certainly know a correct order for them. A I think that the msgbuf mutex also deserves mentioning in this context. A A I think that all of these spin locks should be marked as MTX_NOWITNESS (as long A as they stay normal spinlocks), because printf(9) should be usable wherever I A stick it in the code. A A P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not sure A if this combination really matters. A A A Here's my take at it: A diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c A index 5346bc3..97f0f16 100644 ... This patch works. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Mon, Jan 23, 2012 at 10:21:57AM +0200, Andriy Gapon wrote: A I can confirm that I can reboot succesfully with a new kernel A and kern.stop_scheduler_on_panic=0. A A This is very puzzling. You observation means that stop_scheduler_on_panic has A an effect on the code outside the panic path. I reviewed the A SCHEDULER_STOPPED() changes and I still can not see how that could be possible A (modulo compiler bugs). A A Unfortunately, I also can not reproduce the problem here - without A WITNESS_SKIPSPIN I am running into a panic[*] during boot. I suppose your patch (the one in other post) should help against this panic on boot. A BTW, do you get any other LOR reports _before_ you do a reboot? I have some, but I don't think they are related. A Can you try to change printfs in witness to db_printfs? Perhaps this will allow A to get the details of the LOR in uart_cnputc. Maybe that will reveal some A important additional details. Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only in special places? A Finally, can you try the patch from my other post? Yes, it works. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]
on 23/01/2012 15:04 Gleb Smirnoff said the following: On Sat, Jan 21, 2012 at 07:26:55PM +0200, Andriy Gapon wrote: A BTW, we have a quite strange situation with spin locks in console output path. A cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can be A called in any locking context (even during normal operation). But there are a A number of console-specific locks (scrlock, uart_hwmtx, syscons video lock) A which are acquired under cnputs_mtx, but which are *not* marked as A MTX_NOWITNESS. More, they are specified in the witness order_lists as if we A certainly know a correct order for them. A I think that the msgbuf mutex also deserves mentioning in this context. A A I think that all of these spin locks should be marked as MTX_NOWITNESS (as long A as they stay normal spinlocks), because printf(9) should be usable wherever I A stick it in the code. A A P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not sure A if this combination really matters. A A A Here's my take at it: A diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c A index 5346bc3..97f0f16 100644 ... This patch works. Thank you for testing! -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 23/01/2012 15:07 Gleb Smirnoff said the following: On Mon, Jan 23, 2012 at 10:21:57AM +0200, Andriy Gapon wrote: A I can confirm that I can reboot succesfully with a new kernel A and kern.stop_scheduler_on_panic=0. A A This is very puzzling. You observation means that stop_scheduler_on_panic has A an effect on the code outside the panic path. I reviewed the A SCHEDULER_STOPPED() changes and I still can not see how that could be possible A (modulo compiler bugs). A A Unfortunately, I also can not reproduce the problem here - without A WITNESS_SKIPSPIN I am running into a panic[*] during boot. I suppose your patch (the one in other post) should help against this panic on boot. Yes, it does. It also masks/fixes the panic at reboot. A BTW, do you get any other LOR reports _before_ you do a reboot? I have some, but I don't think they are related. OK. A Can you try to change printfs in witness to db_printfs? Perhaps this will allow A to get the details of the LOR in uart_cnputc. Maybe that will reveal some A important additional details. Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only in special places? I think that replace all should work for this test. A Finally, can you try the patch from my other post? Yes, it works. Thank you again. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Mon, Jan 23, 2012 at 04:01:20PM +0200, Andriy Gapon wrote: A A Can you try to change printfs in witness to db_printfs? Perhaps this will allow A A to get the details of the LOR in uart_cnputc. Maybe that will reveal some A A important additional details. A A Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only A in special places? A A I think that replace all should work for this test. Yes, s/ printf/ db_printf/ in subr_witness.c fixes my problem. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Mon, Jan 23, 2012 at 08:24:10PM +0400, Gleb Smirnoff wrote: T On Mon, Jan 23, 2012 at 04:01:20PM +0200, Andriy Gapon wrote: T A A Can you try to change printfs in witness to db_printfs? Perhaps this will allow T A A to get the details of the LOR in uart_cnputc. Maybe that will reveal some T A A important additional details. T A T A Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only T A in special places? T A T A I think that replace all should work for this test. T T Yes, s/ printf/ db_printf/ in subr_witness.c fixes my problem. Sorry. I was testing wrong kernel. The substitute doesn't help. Panic trace is almost the same: db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a vprintf() at vprintf+0xcb printf() at printf+0x67 db_putc() at db_putc+0x81 kvprintf() at kvprintf+0x83 db_printf() at db_printf+0x86 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 sys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 23/01/2012 18:26 Gleb Smirnoff said the following: Sorry. I was testing wrong kernel. The substitute doesn't help. Panic trace is almost the same: db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a vprintf() at vprintf+0xcb printf() at printf+0x67 db_putc() at db_putc+0x81 Ah, db_putc does something different from what I expected. Can you hack it to never use printf? kvprintf() at kvprintf+0x83 db_printf() at db_printf+0x86 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 sys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote: A db bt A Tracing pid 1 tid 11 td 0xfe0001d5e000 A kdb_enter() at kdb_enter+0x3b A panic() at panic+0x1c7 A _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f A cnputs() at cnputs+0x7a A vprintf() at vprintf+0xcb A printf() at printf+0x67 A db_putc() at db_putc+0x81 A A Ah, db_putc does something different from what I expected. A Can you hack it to never use printf? Just cut printfs from db_putc()? -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 23/01/2012 18:46 Gleb Smirnoff said the following: On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote: A db bt A Tracing pid 1 tid 11 td 0xfe0001d5e000 A kdb_enter() at kdb_enter+0x3b A panic() at panic+0x1c7 A _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f A cnputs() at cnputs+0x7a A vprintf() at vprintf+0xcb A printf() at printf+0x67 A db_putc() at db_putc+0x81 A A Ah, db_putc does something different from what I expected. A Can you hack it to never use printf? Just cut printfs from db_putc()? Make the following condition be always false: if (!kdb_active || ddb_use_printf) { E.g.: if (0) { :-) -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Sat, Jan 21, 2012 at 03:38:59PM +0200, Andriy Gapon wrote: A on 20/01/2012 17:41 Gleb Smirnoff said the following: A A On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote: A T New panic has been introduced somewhere between A T r229851 and r229932, that happens on shutdown if A T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. A A I've run through binary search and panic was introduced A by r229854. A A A Which means that it was introduced by one of the earlier commits (probably A mine). Thank you for this investigative work! A Although, to be frank and honest, I still don't see how the problem could be A a non-issue before. I can confirm that I can reboot succesfully with a new kernel and kern.stop_scheduler_on_panic=0. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 20/01/2012 17:41 Gleb Smirnoff said the following: On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote: T New panic has been introduced somewhere between T r229851 and r229932, that happens on shutdown if T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. I've run through binary search and panic was introduced by r229854. Which means that it was introduced by one of the earlier commits (probably mine). Thank you for this investigative work! Although, to be frank and honest, I still don't see how the problem could be a non-issue before. So any help/ideas/guesses are appreciated. P.S. I also have a big suspicion that we have more problems that can be triggered by WITNESS !WITNESS_SKIPSPIN, but which are harmless in practice. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]
BTW, we have a quite strange situation with spin locks in console output path. cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can be called in any locking context (even during normal operation). But there are a number of console-specific locks (scrlock, uart_hwmtx, syscons video lock) which are acquired under cnputs_mtx, but which are *not* marked as MTX_NOWITNESS. More, they are specified in the witness order_lists as if we certainly know a correct order for them. I think that the msgbuf mutex also deserves mentioning in this context. I think that all of these spin locks should be marked as MTX_NOWITNESS (as long as they stay normal spinlocks), because printf(9) should be usable wherever I stick it in the code. P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not sure if this combination really matters. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 21/01/2012 15:38 Andriy Gapon said the following: on 20/01/2012 17:41 Gleb Smirnoff said the following: On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote: T New panic has been introduced somewhere between T r229851 and r229932, that happens on shutdown if T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. I've run through binary search and panic was introduced by r229854. Which means that it was introduced by one of the earlier commits (probably mine). Thank you for this investigative work! Although, to be frank and honest, I still don't see how the problem could be a non-issue before. BTW, just to be sure, does the problem go away when kern.stop_scheduler_on_panic is set to zero? -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]
on 21/01/2012 16:37 Andriy Gapon said the following: BTW, we have a quite strange situation with spin locks in console output path. cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can be called in any locking context (even during normal operation). But there are a number of console-specific locks (scrlock, uart_hwmtx, syscons video lock) which are acquired under cnputs_mtx, but which are *not* marked as MTX_NOWITNESS. More, they are specified in the witness order_lists as if we certainly know a correct order for them. I think that the msgbuf mutex also deserves mentioning in this context. I think that all of these spin locks should be marked as MTX_NOWITNESS (as long as they stay normal spinlocks), because printf(9) should be usable wherever I stick it in the code. P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not sure if this combination really matters. Here's my take at it: diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c index 5346bc3..97f0f16 100644 --- a/sys/kern/kern_cons.c +++ b/sys/kern/kern_cons.c @@ -586,7 +586,7 @@ static void cn_drvinit(void *unused) { - mtx_init(cnputs_mtx, cnputs_mtx, NULL, MTX_SPIN | MTX_NOWITNESS); + mtx_init(cnputs_mtx, cnputs_mtx, NULL, MTX_SPIN); use_cnputs_mtx = 1; } diff --git a/sys/kern/subr_witness.c b/sys/kern/subr_witness.c index 55cb2d7..39dd94d 100644 --- a/sys/kern/subr_witness.c +++ b/sys/kern/subr_witness.c @@ -629,7 +629,6 @@ static struct witness_order_list_entry order_lists[] = { #endif { rm.mutex_mtx, lock_class_mtx_spin }, { sio, lock_class_mtx_spin }, - { scrlock, lock_class_mtx_spin }, #ifdef __i386__ { cy, lock_class_mtx_spin }, #endif @@ -638,7 +637,6 @@ static struct witness_order_list_entry order_lists[] = { { rtc_mtx, lock_class_mtx_spin }, #endif { scc_hwmtx, lock_class_mtx_spin }, - { uart_hwmtx, lock_class_mtx_spin }, { fast_taskqueue, lock_class_mtx_spin }, { intr table, lock_class_mtx_spin }, #ifdef HWPMC_HOOKS @@ -653,8 +651,8 @@ static struct witness_order_list_entry order_lists[] = { { sched lock, lock_class_mtx_spin }, { td_contested, lock_class_mtx_spin }, { callout, lock_class_mtx_spin }, + { et_hw_mtx, lock_class_mtx_spin }, { entropy harvest mutex, lock_class_mtx_spin }, - { syscons video lock, lock_class_mtx_spin }, #ifdef SMP { smp rendezvous, lock_class_mtx_spin }, #endif @@ -662,6 +660,13 @@ static struct witness_order_list_entry order_lists[] = { { tlb0, lock_class_mtx_spin }, #endif /* +* console locks +*/ + { cnputs_mtx, lock_class_mtx_spin }, + { scrlock, lock_class_mtx_spin }, + { syscons video lock, lock_class_mtx_spin }, + { uart_hwmtx, lock_class_mtx_spin }, + /* * leaf locks */ { intrcnt, lock_class_mtx_spin }, How does this look? -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote: T New panic has been introduced somewhere between T r229851 and r229932, that happens on shutdown if T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. I've run through binary search and panic was introduced by r229854. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
new panic in cpu_reset() with WITNESS
New panic has been introduced somewhere between r229851 and r229932, that happens on shutdown if kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. Uptime: 1h0m17s Rebooting... panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq$0,0x514d32(%rip) db db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 --More--^M^Msys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db db show locks exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 db So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). No mutexes should be obtained after it. However, since cpu_reset() does priting we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in the subr_witness.c as mutex to obtain before smp rendezvous, this triggers yet another printf from witness, that finally panics due to recursing on cnputs_mtx. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
2012/1/17 Gleb Smirnoff gleb...@freebsd.org: New panic has been introduced somewhere between r229851 and r229932, that happens on shutdown if kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. Uptime: 1h0m17s Rebooting... panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq $0,0x514d32(%rip) db db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 --More--^M ^Msys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db db show locks exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 db So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). No mutexes should be obtained after it. However, since cpu_reset() does priting we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in the subr_witness.c as mutex to obtain before smp rendezvous, this triggers yet another printf from witness, that finally panics due to recursing on cnputs_mtx. At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't seem possible to fit it into the heirarchy in any sane way, since a print can come from basically anywhere. If anyone has a better fix, that'd be great, but I haven't been able to think of one. Thanks, matthew ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
On Tue, Jan 17, 2012 at 07:34:23AM -0800, m...@freebsd.org wrote: m 2012/1/17 Gleb Smirnoff gleb...@freebsd.org: m New panic has been introduced somewhere between m r229851 and r229932, that happens on shutdown if m kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. m m Uptime: 1h0m17s m Rebooting... m panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 m cpuid = 0 m KDB: enter: panic m [ thread pid 1 tid 11 ] m Stopped at kdb_enter+0x3b: movq $0,0x514d32(%rip) m db m db bt m Tracing pid 1 tid 11 td 0xfe0001d5e000 m kdb_enter() at kdb_enter+0x3b m panic() at panic+0x1c7 m _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f m cnputs() at cnputs+0x7a m putchar() at putchar+0x11f m kvprintf() at kvprintf+0x83 m vprintf() at vprintf+0x85 m printf() at printf+0x67 m witness_checkorder() at witness_checkorder+0x773 m _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 m uart_cnputc() at uart_cnputc+0x3e m cnputc() at cnputc+0x4c m cnputs() at cnputs+0x26 m putchar() at putchar+0x11f m kvprintf() at kvprintf+0x83 m vprintf() at vprintf+0x85 m printf() at printf+0x67 m cpu_reset() at cpu_reset+0x81 m kern_reboot() at kern_reboot+0x3a5 m --More--^M ^Msys_reboot() at sys_reboot+0x42 m amd64_syscall() at amd64_syscall+0x39e m Xfast_syscall() at Xfast_syscall+0xf7 m --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- m db m db show locks m exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 m exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 m db m m So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). m No mutexes should be obtained after it. However, since cpu_reset() does priting m we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in m the subr_witness.c as mutex to obtain before smp rendezvous, this triggers m yet another printf from witness, that finally panics due to recursing on m cnputs_mtx. m m At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't m seem possible to fit it into the heirarchy in any sane way, since a m print can come from basically anywhere. m m If anyone has a better fix, that'd be great, but I haven't been able m to think of one. Setting NO_WITNESS on cnputs_mtx won't help for the above problem, IMHO. -- Totus tuus, Glebius. ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: new panic in cpu_reset() with WITNESS
on 17/01/2012 17:34 m...@freebsd.org said the following: 2012/1/17 Gleb Smirnoff gleb...@freebsd.org: New panic has been introduced somewhere between r229851 and r229932, that happens on shutdown if kernel has WITNESS and doesn't have WITNESS_SKIPSPIN. [**] Uptime: 1h0m17s Rebooting... panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ /usr/src/head/sys/kern/kern_cons.c:500 cpuid = 0 KDB: enter: panic [ thread pid 1 tid 11 ] Stopped at kdb_enter+0x3b: movq$0,0x514d32(%rip) db db bt Tracing pid 1 tid 11 td 0xfe0001d5e000 kdb_enter() at kdb_enter+0x3b panic() at panic+0x1c7 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f cnputs() at cnputs+0x7a putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 witness_checkorder() at witness_checkorder+0x773 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99 uart_cnputc() at uart_cnputc+0x3e cnputc() at cnputc+0x4c cnputs() at cnputs+0x26 putchar() at putchar+0x11f kvprintf() at kvprintf+0x83 vprintf() at vprintf+0x85 printf() at printf+0x67 cpu_reset() at cpu_reset+0x81 kern_reboot() at kern_reboot+0x3a5 --More--^M^Msys_reboot() at sys_reboot+0x42 amd64_syscall() at amd64_syscall+0x39e Xfast_syscall() at Xfast_syscall+0xf7 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 0x7fffd6d8, rbp = 0x49 --- db db show locks exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ /usr/src/head/sys/kern/kern_module.c:101 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542 db So the problem is that we are holding smp rendezvous mutex during the cpu_reset(). No mutexes should be obtained after it. However, since cpu_reset() does priting we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in the subr_witness.c as mutex to obtain before smp rendezvous, this triggers yet another printf from witness, that finally panics due to recursing on cnputs_mtx. At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't seem possible to fit it into the heirarchy in any sane way, since a print can come from basically anywhere. This is the case for the FreeBSD tree too (at least in head). If anyone has a better fix, that'd be great, but I haven't been able to think of one. Please note that while the panic happened on cnputs_mtx[*] (because of the recursion), it seems to have been provoked by uart_hwmtx actually. The root cause of the panic at hand is that smp_ipi_mtx is supposed to be a terminal lock, i.e. no other lock is supposed to be acquired while smp_ipi_mtx is held. This is the case for all the normal code, but the rule is violated for cpu_reset() in the SMP case, because smp_ipi_mtx is acquired in shutdown_reset() and is held till the end. Thus, while normally smp_ipi_mtx and cnputs_mtx/uart_hwmtx should be completely independent, they have become dependent in this reset path. I think that normally this problem is hidden by people not using WITNESS or by them using WITNESS_SKIPSPIN. One obvious solution is to change the relative order of smp_ipi_mtx (aka smp rendezvous) and uart_hwmtx in order_lists of subr_witness.c, which predefine some order that is expected between the locks. Another solution is to look into the reason why smp_ipi_mtx is acquired in shutdown_reset(). This call has been added to avoid [what used to be] a deadlock between the stop_cpus() call with interrupts disabled on a shutdown CPU and smp_rendezvous or some tlb ipi call on another CPU. As such it should be sufficient to hold smp_ipi_mtx around the stop_cpus() call. In fact, I would argue that stop_cpus() must hold smp_ipi_mtx for the above reason. That is, all inter-CPU operations should be initiated using the same lock. [*] And there is, of course, a different issue (pointed out many times by Bruce Evans) of cnputs_mtx and now msg_lock (msgbuf lock) being inadequate for their jobs as this example demonstrates. Multiplied by the problem of printf being used where db_printf would have been a better choice. Since printf (or db_printf) can be called truly in any situation, the locking must handle recursion, must handle being stuck on another CPU, etc. In general, these functions should strive to be lockless, but that's probably not really practical for the normal situations in SMP environment. Still the special situations should be detected and handled. [**] I would expect this panic/problem to be there for a long time already (definitely before r229851). It's surprising to me that it seems to be a recent phenomenon. -- Andriy Gapon ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: witness_lock_list_get: witness exhausted
On Tuesday, August 16, 2011 5:59:53 pm Peter Jeremy wrote: I'm getting the above message when running Peter Holm's stress test with INCARNATIONS=150 on a 16-core sparc. Does this mean LOCK_CHILDCOUNT is too low or does it indicate a leak in witness lock_list_entry's somewhere? Most likely the former. The comment on LOCK_CHILDCOUNT indicates it is dimensioned to allow 2048 threads to hold 5 locks each but it's not clear how many threads will get started at a given INCARNATIONS count. Also, each lle holds a bucket of instances, so the same count would only let 1024 threads hold 6 locks each perhaps (depending on the bucket size.. it was originally 4 locks per lle). -- John Baldwin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
witness_lock_list_get: witness exhausted
I'm getting the above message when running Peter Holm's stress test with INCARNATIONS=150 on a 16-core sparc. Does this mean LOCK_CHILDCOUNT is too low or does it indicate a leak in witness lock_list_entry's somewhere? The comment on LOCK_CHILDCOUNT indicates it is dimensioned to allow 2048 threads to hold 5 locks each but it's not clear how many threads will get started at a given INCARNATIONS count. -- Peter Jeremy pgpCkToMhjPEH.pgp Description: PGP signature
Questions about witness reports on sparc64
Folks, I'm pretty new here, so feel free to use the clue stick :) Playing with current- on Sun Fire V210 I see more or less consistent lock reversal reports like the following: lock order reversal: 1st 0xc1c85fb8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559 2nd 0xf80002784600 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:285 KDB: stack backtrace: _witness_debugger() at _witness_debugger+0x38 witness_checkorder() at witness_checkorder+0xcf8 _sx_xlock() at _sx_xlock+0x9c ufsdirhash_acquire() at ufsdirhash_acquire+0x30 ufsdirhash_add() at ufsdirhash_add+0x4 ufs_direnter() at ufs_direnter+0x75c ufs_makeinode() at ufs_makeinode+0x4dc ufs_create() at ufs_create+0x40 VOP_CREATE_APV() at VOP_CREATE_APV+0x108 vn_open_cred() at vn_open_cred+0x1fc vn_open() at vn_open+0x1c kern_openat() at kern_openat+0x10c kern_open() at kern_open+0x18 open() at open+0x14 syscall() at syscall+0x25c -- syscall (5, FreeBSD ELF64, open) %o7=0x412bfda8 -- userland() at 0x41344808 user trace: trap %o7=0x412bfda8 pc 0x41344808, sp 0x7fd99d1 pc 0x120008, sp 0x7fd9aa1 pc 0x11216c, sp 0x7fd9b61 pc 0x10d32c, sp 0x7fddd11 pc 0x10d9e4, sp 0x7fdde01 pc 0x10cf2c, sp 0x7fdded1 pc 0x10eea4, sp 0x7fddfa1 pc 0x15a0a0, sp 0x7fde061 pc 0x13024c, sp 0x7fde191 pc 0x103c10, sp 0x7fde281 pc 0x4028ef34, sp 0x7fde341 done That's just on startup, nothing fancy going on. My questions are - is it worth spamming the list with these? If so, what other info needed? Full dmesg bellow. Thanks, -- Nikolai GDB: no debug ports present KDB: debugger backends: ddb KDB: current backend: ddb Copyright (c) 1992-2010 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-CURRENT #0: Sun Feb 28 23:19:48 EST 2010 r...@moon.ipv6.fetissov.org:/usr/obj/usr/src/sys/GENERIC sparc64 WARNING: WITNESS option enabled, expect reduced performance. real memory = 4294967296 (4096 MB) avail memory = 4174233600 (3980 MB) cpu0: Sun Microsystems UltraSparc-IIIi Processor (1002.00 MHz CPU) cpu1: Sun Microsystems UltraSparc-IIIi Processor (1002.00 MHz CPU) FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs ispfw: registered firmware isp_1000 ispfw: registered firmware isp_1040 ispfw: registered firmware isp_1040_it ispfw: registered firmware isp_1080 ispfw: registered firmware isp_1080_it ispfw: registered firmware isp_12160 ispfw: registered firmware isp_12160_it ispfw: registered firmware isp_2100 ispfw: registered firmware isp_2200 ispfw: registered firmware isp_2300 ispfw: registered firmware isp_2322 ispfw: registered firmware isp_2400 ispfw: registered firmware isp_2400_multi ispfw: registered firmware isp_2500 ispfw: registered firmware isp_2500_multi kbd0 at kbdmux0 nexus0: Open Firmware Nexus device nexus0: memory-controller mem 0x400-0x407 type memory-controller (no driver attached) nexus0: memory-controller mem 0x480-0x487 type memory-controller (no driver attached) pcib0: Sun Host-PCI bridge mem 0x4000ff0-0x4000ff0afff,0x4000fc1-0x4000fc1701f,0x7f6-0x7f600ff,0x4000ff8-0x4000ff8 irq 2035,2032,2033,2036,2019 on nexus0 pcib0: Tomatillo, version 4, IGN 0x1f, bus B, 66MHz pcib0: DVMA map: 0xc000 to 0xdfff 65536 entries pcib0: [FILTER] pcib0: [FILTER] pcib0: [FILTER] pcib0: [FILTER] pci0: OFW PCI bus on pcib0 bge0: Broadcom BCM5704 A3, ASIC rev. 0x002003 mem 0x20-0x20,0x11-0x11 at device 2.0 on pci0 miibus0: MII bus on bge0 brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge0: Ethernet address: 00:03:ba:45:87:2d bge0: [ITHREAD] bge1: Broadcom BCM5704 A3, ASIC rev. 0x002003 mem 0x40-0x40,0x12-0x12 at device 2.1 on pci0 miibus1: MII bus on bge1 brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto bge1: Ethernet address: 00:03:ba:45:87:2e bge1: [ITHREAD] pcib1: Sun Host-PCI bridge mem 0x4000f60-0x4000f60afff,0x4000f41-0x4000f41701f,0x7fe-0x7fe00ff,0x4000f78-0x4000f78 irq 1970,1968,1969,1972,1953 on nexus0 pcib1: Tomatillo, version 4, IGN 0x1e, bus A, 33MHz pcib1: DVMA map: 0xc000 to 0xdfff 65536 entries pcib1: [FILTER] pci1: OFW PCI bus on pcib1 isab0: PCI-ISA bridge at device 7.0 on pci1 isa0: ISA bus on isab0 pci1: old, non-VGA display device at device 6.0 (no driver attached) ohci0: AcerLabs M5237 (Aladdin-V) USB controller mem 0x100-0x1000fff at device 10.0 on pci1 ohci0: [ITHREAD] usbus0: AcerLabs M5237 (Aladdin-V) USB controller on ohci0 atapci0: AcerLabs M5229 UDMA100 controller port 0x900-0x907,0x918-0x91b,0x910-0x917,0x908-0x90b,0x920-0x92f at device 13.0 on pci1 atapci0: [ITHREAD] atapci0: using
Re: Questions about witness reports on sparc64
On Thu, Mar 4, 2010 at 7:19 PM, Nikolai Fetissov nikolai-free...@fetissov.org wrote: Folks, I'm pretty new here, so feel free to use the clue stick :) Playing with current- on Sun Fire V210 I see more or less consistent lock reversal reports like the following: : My questions are - is it worth spamming the list with these? If so, what other info needed? It is worth sending it to the list and sending a report as outlined here: http://sources.zabbadoz.net/freebsd/lor.html Scot ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
patch to let witness monitor the mtx pool
I've been running with the patch below for a little while now. It helped me find a situation where a thread attemped to grab a pool mutex while it already held one, which I suspect could have caused a deadlock in certain circumstances. In any case, this was illegal because these mutexes are only supposed be used as leaf mutexes. I had to patch kern_sx.c to quiet warnings about lock order reversal that look bogus to me. I believe the way that kern_sx.c uses the pool mutexes is safe. Index: sys/kern/kern_mtxpool.c === RCS file: /home/ncvs/src/sys/kern/kern_mtxpool.c,v retrieving revision 1.7 diff -u -r1.7 kern_mtxpool.c --- sys/kern/kern_mtxpool.c 13 Jun 2003 19:39:21 - 1.7 +++ sys/kern/kern_mtxpool.c 20 Jun 2003 06:36:55 - @@ -33,6 +33,7 @@ #include sys/lock.h #include sys/malloc.h #include sys/mutex.h +#include sys/sysctl.h #include sys/systm.h #ifndef MTX_POOL_SIZE @@ -44,6 +45,12 @@ int mtx_pool_valid = 0; +int witness_skip_mtx_pool = 0; + +TUNABLE_INT(debug.witness_skip_mtx_pool, witness_skip_mtx_pool); +SYSCTL_INT(_debug, OID_AUTO, witness_skip_mtx_pool, CTLFLAG_RD, +witness_skip_mtx_pool, 0, ); + /* * Inline version of mtx_pool_find(), used to streamline our main API * function calls. @@ -60,11 +67,13 @@ static void mtx_pool_setup(void *dummy __unused) { -int i; +int i, mtx_opts; +mtx_opts = MTX_DEF | MTX_QUIET; +if (witness_skip_mtx_pool) + mtx_opts |= MTX_NOWITNESS; for (i = 0; i MTX_POOL_SIZE; ++i) - mtx_init(mtx_pool_ary[i], pool mutex, NULL, - MTX_DEF | MTX_NOWITNESS | MTX_QUIET); + mtx_init(mtx_pool_ary[i], pool mutex, NULL, mtx_opts); mtx_pool_valid = 1; } Index: sys/kern/kern_sx.c === RCS file: /home/ncvs/src/sys/kern/kern_sx.c,v retrieving revision 1.19 diff -u -r1.19 kern_sx.c --- sys/kern/kern_sx.c 11 Jun 2003 00:56:56 - 1.19 +++ sys/kern/kern_sx.c 19 Jun 2003 20:28:19 - @@ -126,9 +126,8 @@ sx-sx_cnt++; LOCK_LOG_LOCK(SLOCK, sx-sx_object, 0, 0, file, line); - WITNESS_LOCK(sx-sx_object, 0, file, line); - mtx_unlock(sx-sx_lock); + WITNESS_LOCK(sx-sx_object, 0, file, line); } int @@ -139,8 +138,8 @@ if (sx-sx_cnt = 0) { sx-sx_cnt++; LOCK_LOG_TRY(SLOCK, sx-sx_object, 0, 1, file, line); - WITNESS_LOCK(sx-sx_object, LOP_TRYLOCK, file, line); mtx_unlock(sx-sx_lock); + WITNESS_LOCK(sx-sx_object, LOP_TRYLOCK, file, line); return (1); } else { LOCK_LOG_TRY(SLOCK, sx-sx_object, 0, 0, file, line); @@ -180,9 +179,8 @@ sx-sx_xholder = curthread; LOCK_LOG_LOCK(XLOCK, sx-sx_object, 0, 0, file, line); - WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE, file, line); - mtx_unlock(sx-sx_lock); + WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE, file, line); } int @@ -194,9 +192,9 @@ sx-sx_cnt--; sx-sx_xholder = curthread; LOCK_LOG_TRY(XLOCK, sx-sx_object, 0, 1, file, line); + mtx_unlock(sx-sx_lock); WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE | LOP_TRYLOCK, file, line); - mtx_unlock(sx-sx_lock); return (1); } else { LOCK_LOG_TRY(XLOCK, sx-sx_object, 0, 0, file, line); ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
fun with WITNESS and pool mutex
When I was attempting to debug a system deadlock problem where the culprit process was sleeping on a pool mutex, I noticed that show witness in ddb doesn't report anything about this particular mutex flavor. I discovered that witness doesn't monitor these mutexes because mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag. These are a mutexes bit special, because they are supposed to be leaf mutexes and no other mutexes should be grabbed after them. The deadlock in question caused me to discover a violation of this restriction, so I wondered if there were more problems of this type in the code. I suspected there would be, since there haven't been any automatic checks of to verify that these mutexes are being used correctly. Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup() and quickly found the first violation during the boot sequence: Mounting root from ufs:/dev/da0s1a acquiring duplicate lock of same type: pool mutex 1st pool mutex @ /usr/src/sys/kern/vfs_syscalls.c:736 2nd pool mutex @ /usr/src/sys/kern/kern_lock.c:598 Stack backtrace: backtrace(c051f4df,c051c041,c051adcc,256,c05e4808) at backtrace+0x17 witness_lock(c05e0cf0,8,c051adcc,256,c05e2ae0) at witness_lock+0x697 _mtx_lock_flags(c05e0cf0,0,c051adcc,256,c641a248) at _mtx_lock_flags+0xb1 lockstatus(c641a304,0,e4b1ab24,c036f2e8,e4b1ab44) at lockstatus+0x3c vop_stdislocked(e4b1ab44,e4b1ab30,c0464f78,e4b1ab44,e4b1ab58) at vop_stdislocked +0x21 vop_defaultop(e4b1ab44,e4b1ab58,c0375a67,e4b1ab44,c05e4808) at vop_defaultop+0x1 8 ufs_vnoperate(e4b1ab44,c05e4808,c05740ac,c05b67e0,c641a248) at ufs_vnoperate+0x1 8 assert_vop_locked(c641a248,c05197fc,c05b74e0,c641a248,0) at assert_vop_locked+0x 47 VOP_GETVOBJECT(c641a248,0,c0524224,2e0,c05740ac) at VOP_GETVOBJECT+0x3f kern_open(c61c64c0,bfbff6b0,0,1,0) at kern_open+0x44f open(c61c64c0,e4b1ad10,c0537dc3,3fd,3) at open+0x30 syscall(2f,2f,2f,1,0) at syscall+0x26e Xint0x80_syscall() at Xint0x80_syscall+0x1d --- syscall (5), eip = 0x80529ef, esp = 0xbfbff16c, ebp = 0xbfbff208 --- The code in question: FILEDESC_LOCK(fdp); FILE_LOCK(fp); if (fp-f_count == 1) { KASSERT(fdp-fd_ofiles[indx] != fp, (Open file descriptor lost all refs)); FILEDESC_UNLOCK(fdp); FILE_UNLOCK(fp); VOP_UNLOCK(vp, 0, td); vn_close(vp, flags FMASK, fp-f_cred, td); fdrop(fp, td); td-td_retval[0] = indx; return 0; } /* assert that vn_open created a backing object if one is needed */ KASSERT(!vn_canvmio(vp) || VOP_GETVOBJECT(vp, NULL) == 0, (open: vmio vnode has no backing object after vn_open)); fp-f_data = vp; fp-f_flag = flags FMASK; fp-f_ops = vnops; fp-f_type = (vp-v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE); FILEDESC_UNLOCK(fdp); FILE_UNLOCK(fp); This one appears to be easily fixable by moving the second KASSERT down a few lines to below the FILE_UNLOCK() call. Any bets on how many other potential deadlock problems there are in the tree? ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: fun with WITNESS and pool mutex
On 18 Jun, I wrote: When I was attempting to debug a system deadlock problem where the culprit process was sleeping on a pool mutex, I noticed that show witness in ddb doesn't report anything about this particular mutex flavor. I discovered that witness doesn't monitor these mutexes because mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag. These are a mutexes bit special, because they are supposed to be leaf mutexes and no other mutexes should be grabbed after them. The deadlock in question caused me to discover a violation of this restriction, so I wondered if there were more problems of this type in the code. I suspected there would be, since there haven't been any automatic checks of to verify that these mutexes are being used correctly. Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup() and quickly found the first violation during the boot sequence: [ snip - I committed a patch ] Any bets on how many other potential deadlock problems there are in the tree? The only problems I've found so far are in fdrop_locked() and kern_open(), so things might not be as bleak as I initially feared. I also got this LOR message from witness about the sx lock code: lock order reversal 1st 0xc05e1020 pool mutex (pool mutex) @ /usr/src/sys/kern/kern_sx.c:111 2nd 0xc05dfa00 module subsystem sx lock (module subsystem sx lock) @ /usr/src/s ys/kern/kern_module.c:126 I *think* this is actually a safe use of pool mutex. What would be the best way to quite the complaint? The two possibilities that I can think of are to handle this as a special case in the witness code or to slightly rearrange the code in sx_lock.c to swap the order of the WITNESS_LOCK() and mtx_unlock() calls in _sx*_lock(). ___ [EMAIL PROTECTED] mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to [EMAIL PROTECTED]
2 tcp/ip witness warnings
hi, Expirementing today with netgraph sockets hit into these two cases: 1: lock order reversal 1st 0xc27658b4 inp (inp) @ /home/phantom/src5/sys/netinet/tcp_input.c:640 2nd 0xc031832c tcp (tcp) @ /home/phantom/src5/sys/netinet/tcp_usrreq.c:621 Stack backtrace: backtrace(c02e9301,c031832c,c02eef23,c02eef23,c02f0154) at backtrace+0x17 witness_lock(c031832c,8,c02f0154,26d,0) at witness_lock+0x66d _mtx_lock_flags(c031832c,0,c02f0154,26d,0) at _mtx_lock_flags+0xb1 tcp_usr_rcvd(c273a200,80,c02e8f74,c294db00,3b9aca00) at tcp_usr_rcvd+0x30 soreceive(c273a200,0,cd3259f0,cd3259e8,0) at soreceive+0x89a ng_ksocket_incoming2(c2960500,0,c273a200,4,c26975d8) at ng_ksocket_incoming2+0x2 65 ng_apply_item(c2960500,c26975c0,c294bcee,7d7,c02e8f74) at ng_apply_item+0x4af ng_snd_item(c26975c0,0,cd325adc,c2960500,c273a24c) at ng_snd_item+0x614 ng_send_fn(c2960500,0,c2549ac0,c273a200,4) at ng_send_fn+0x1b7 ng_ksocket_incoming(c273a200,c2960500,4,c0ee7168,0) at ng_ksocket_incoming+0x2f sowakeup(c273a200,c273a24c,c02ef894,822,5000) at sowakeup+0x97 tcp_input(c0ee7100,14,c01acbbd,c033b500,1) at tcp_input+0x22bb ip_input(c0ee7100,0,c02ee77d,e9,c0eb99c0) at ip_input+0x7d6 swi_net(0,0,c02e43b0,217,c0ec35f4) at swi_net+0x112 ithread_loop(c0ec2100,cd325d48,c02e4219,35f,5ed54) at ithread_loop+0x182 fork_exit(c01a3b40,c0ec2100,cd325d48) at fork_exit+0xc4 fork_trampoline() at fork_trampoline+0x1a 2: malloc() of 16 with the following non-sleepablelocks held: exclusive sleep mutex inp r = 0 (0xc2765194) locked @ /home/phantom/src5/sys/net inet/tcp_input.c:640 exclusive sleep mutex tcp r = 0 (0xc031832c) locked @ /home/phantom/src5/sys/net inet/tcp_usrreq.c:480 exclusive sleep mutex netisr lock r = 0 (0xc0360260) locked @ /home/phantom/src5 /sys/net/netisr.c:215 db tr Debugger(c02d6d9c,cd32587c,3,cd32587c,c0313240) at Debugger+0x54 witness_warn(5,0,c02f6550,c02dc43f,246) at witness_warn+0x1a0 uma_zalloc_arg(c083aa20,0,102,1f6,c02f0154) at uma_zalloc_arg+0x53 malloc(10,c0313240,102,c2760720,cd325904) at malloc+0xd4 in_sockaddr(0,cd3258f4,c02f0154,1f6,0) at in_sockaddr+0x29 tcp_usr_accept(c273a100,cd32593c,cd32594c,c254a013,c273a100) at tcp_usr_accept+0x13a soaccept(c273a100,cd32593c,132,64,4) at soaccept+0x3f ng_ksocket_finish_accept(c2a9fd00,0,7fd,0,cd325988) at ng_ksocket_finish_accept+0x73 ng_ksocket_incoming2(c294d500,0,c273be00,4,c2943d58) at ng_ksocket_incoming2+0x1e0 ng_apply_item(c294d500,c2943d40,c2981cee,7d7,c02e8f74) at ng_apply_item+0x4af ng_snd_item(c2943d40,0,c02f0154,c294d500,c273be4c) at ng_snd_item+0x614 ng_send_fn(c294d500,0,c2549ac0,c273be00,4) at ng_send_fn+0x1b7 ng_ksocket_incoming(c273be00,c294d500,4,c273a100,c273be00) at ng_ksocket_incoming+0x2f sowakeup(c273be00,c273be4c,0,c298dbb0,cd325b74) at sowakeup+0x97 sonewconn(c273be00,2,c033c378,c0ec4c9c,d71d0300) at sonewconn+0x199 syncache_socket(c298dbb0,c273be00,c0ed2e00,c267b528,c27601c8) at syncache_socket+0x1f syncache_expand(cd325b74,c0ed2e68,cd325b70,c0ed2e00,5000) at syncache_expand+0x98 tcp_input(c0ed2e00,14,c01acbbd,c033b500,1) at tcp_input+0x676 ip_input(c0ed2e00,0,c02ee77d,e9,c0eb99c0) at ip_input+0x7d6 swi_net(0,0,c02e43b0,217,c0ec35f4) at swi_net+0x112 ithread_loop(c0ec2100,cd325d48,c02e4219,35f,5ed54) at ithread_loop+0x182 fork_exit(c01a3b40,c0ec2100,cd325d48) at fork_exit+0xc4 fork_trampoline() at fork_trampoline+0x1a --- trap 0x1, eip = 0, esp = 0xcd325d7c, ebp = 0 --- To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re:2 tcp/ip witness warnings
Thanks for reporting these. NFS has this same problem with sowakeup(). You can find an old patch I made for this at http://www.freebsd.org/~hsu/hammer.diff It takes a big hammer to the problem and is less than elegant, so I haven't committed it. I hope to have time to address this problem a better way some time in the future, after some other pressing locking issues are fixed. Jeffrey To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
witness needs DDB
subr_witness.c now needs DDB option enabled, otherwise can not be compiled. please fix it. David Xu To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: Witness problem with sound
On Wednesday 05 March 2003 07:42, Pete Carah wrote: I don't know how system-specific this problem is, but: Sony VAIO R505ES Sound is Intel ICH3 + Yamaha. This or something closely related has been happening for weeks. Several times earlier this week and last week sound panic'd, and also sometimes there was a panic (several different kinds) on boot. Late last week X wouldn't start due to not being able to see the VESA modes. All those except the sound problems currently appear fixed... This may or may not be related to the fact that acpi puts nearly all device interrupts on irq 9 (which causes other problems). There are other problems (mostly with pcmcia, which is left out of this kernel; it causes hangs on init), but this one is obvious and does leave the system running. A remaining one that bothers multi-type usb card readers is that CAM/XPT only scans LUN 0 for any device it finds. All my card readers show as only CF because of this, and I have to give 4 camcontrol commands to get the system to see the rest (at least it does work with manual intervention!!). -- Pete To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message I got the same error message from my R505DC. And the sound also doesn;t work correctly. Jun Su To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
RE: witness: nfs buf queue
On 06-Mar-2003 Jonathan Lemon wrote: Doing a kernel build over NFS on today's -current gives a pile of following error messages during the final link phase: Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 ... Witness didn't used to complain about these until my recent commits, so these could be old bugs that we are just now seeing. It does look like all the lock functions in inmem() use LK_NOWAIT which is exempted from the witness check: if ((flags (LK_NOWAIT|LK_RELEASE)) == 0) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, lkp-lk_interlock-mtx_object, Acquiring lockmgr lock \%s\, lkp-lk_wmesg); A stack trace from one of these would be helpful. -- John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/ Power Users Use the Power to Serve! - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: witness: nfs buf queue
On Thu, Mar 06, 2003 at 01:01:49PM -0500, John Baldwin wrote: On 06-Mar-2003 Jonathan Lemon wrote: Doing a kernel build over NFS on today's -current gives a pile of following error messages during the final link phase: Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 ... Witness didn't used to complain about these until my recent commits, so these could be old bugs that we are just now seeing. It does look like all the lock functions in inmem() use LK_NOWAIT which is exempted from the witness check: if ((flags (LK_NOWAIT|LK_RELEASE)) == 0) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, lkp-lk_interlock-mtx_object, Acquiring lockmgr lock \%s\, lkp-lk_wmesg); A stack trace from one of these would be helpful. Here you go. This is from -current as of roughly an hour ago, I managed to break into ddb in the middle of witness_warn: kvprintf(c03918eb,c0210c20,e4050bfc,a,e4050c48) at kvprintf+0x8d vprintf(c03918eb,e4050c48,0,c0428620,10001) at vprintf+0x57 witness_warn(5,c0400da8,c03918eb,c039fff9,c6a752d0) at witness_warn+0xbe lockmgr(ca438304,10001,ca438248,c6a752d0,12) at lockmgr+0xc8 vop_sharedlock(e4050c98,0,c039a8b0,35c,c01eb1f2) at vop_sharedlock+0x7d vn_lock(ca438248,12,c6a752d0,83b,c04285e0) at vn_lock+0xeb flushbufqueues(c04285e0,0,c0398a32,104,64) at flushbufqueues+0x100 buf_daemon(0,e4050d48,c0390fdb,35f,0) at buf_daemon+0xd5 fork_exit(c0239920,0,e4050d48) at fork_exit+0xc4 fork_trampoline() at fork_trampoline+0x1a --- trap 0x1, eip = 0, esp = 0xe4050d7c, ebp = 0 --- No gdb stack trace, my dump device is too small. :-( -- Jonathan To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: witness: nfs buf queue
On 06-Mar-2003 Jonathan Lemon wrote: On Thu, Mar 06, 2003 at 01:01:49PM -0500, John Baldwin wrote: On 06-Mar-2003 Jonathan Lemon wrote: Doing a kernel build over NFS on today's -current gives a pile of following error messages during the final link phase: Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 ... Witness didn't used to complain about these until my recent commits, so these could be old bugs that we are just now seeing. It does look like all the lock functions in inmem() use LK_NOWAIT which is exempted from the witness check: if ((flags (LK_NOWAIT|LK_RELEASE)) == 0) WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, lkp-lk_interlock-mtx_object, Acquiring lockmgr lock \%s\, lkp-lk_wmesg); A stack trace from one of these would be helpful. Here you go. This is from -current as of roughly an hour ago, I managed to break into ddb in the middle of witness_warn: kvprintf(c03918eb,c0210c20,e4050bfc,a,e4050c48) at kvprintf+0x8d vprintf(c03918eb,e4050c48,0,c0428620,10001) at vprintf+0x57 witness_warn(5,c0400da8,c03918eb,c039fff9,c6a752d0) at witness_warn+0xbe lockmgr(ca438304,10001,ca438248,c6a752d0,12) at lockmgr+0xc8 10001 = LK_INTERLOCK | LK_SHARED, note no LK_NOWAIT flag, hence the bug. :( vop_sharedlock(e4050c98,0,c039a8b0,35c,c01eb1f2) at vop_sharedlock+0x7d The bug seems to be that vop_sharedlock doesn't honor LK_NOWAIT or any other external lockmgr flags. Untested possible patch below. I'd really like jeffr's comments on it though: Index: vfs_default.c === RCS file: /usr/cvs/src/sys/kern/vfs_default.c,v retrieving revision 1.75 diff -u -r1.75 vfs_default.c --- vfs_default.c 3 Mar 2003 23:37:50 - 1.75 +++ vfs_default.c 6 Mar 2003 20:00:40 - @@ -445,8 +445,7 @@ default: panic(vop_sharedlock: bad operation %d, flags LK_TYPE_MASK); } - if (flags LK_INTERLOCK) - vnflags |= LK_INTERLOCK; + vnflags |= flags (LK_INTERLOCK | LK_EXTFLG_MASK); #ifndefDEBUG_LOCKS return (lockmgr(vp-v_vnlock, vnflags, VI_MTX(vp), ap-a_td)); #else @@ -503,8 +502,7 @@ default: panic(vop_nolock: bad operation %d, flags LK_TYPE_MASK); } - if (flags LK_INTERLOCK) - vnflags |= LK_INTERLOCK; + vnflags |= flags (LK_INTERLOCK | LK_EXTFLG_MASK); return(lockmgr(vp-v_vnlock, vnflags, VI_MTX(vp), ap-a_td)); #else /* for now */ /* -- John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/ Power Users Use the Power to Serve! - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
witness: nfs buf queue
Doing a kernel build over NFS on today's -current gives a pile of following error messages during the final link phase: Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 Acquiring lockmgr lock nfs with the following non-sleepablelocks held: exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ ../../../kern/vfs_bio.c:2107 ... -- Jonathan To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
WITNESS panic in netinet/tcp_input.c
It seems like I'm being handed kernel panics on a platter today. I just got a WITNESS-related one in /usr/src/sys/netinet/tcp_input.c:2190. And actually, in the process of writing this message the first time, it happened again. I've since booted to an older kernel. The kernel this comes from is 5.0-CURRENT from Tue Mar 4 20:30:35 CST 2003. Soft updates appears to have turned my public PGP keyring file into a bunch of NULs too : I assume that is an artifact of the kernel panic. Script started on Wed Mar 5 19:51:06 2003 edgemaster# gdb -k /boot/kernel/kernel.debug vmcore.4 GNU gdb 5.2.1 (FreeBSD) Copyright 2002 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-undermydesk-freebsd... panic: bwrite: buffer is not busy??? panic messages: --- panic: lock (sleep mutex) tcp not locked @ /usr/src/sys/netinet/tcp_input.c:2190 Stack backtrace: syncing disks, buffers remaining... panic: bwrite: buffer is not busy??? Uptime: 20m40s Dumping 1279 MB ata1: resetting devices .. done [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 16 32 48[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] 64 80 96 112 128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 800 816 832 848 864 880 896 912 928 944 960 976 992 1008 1024 1040 1056 1072 1088 1104 1120 1136 1152 1168 1184 1200 1216 1232 1248 1264 --- #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:239 239 dumping++; (kgdb) bt full #0 doadump () at /usr/src/sys/kern/kern_shutdown.c:239 No locals. #1 0xc01cd66a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:371 No locals. #2 0xc01cd8d3 in panic () at /usr/src/sys/kern/kern_shutdown.c:542 td = (struct thread *) 0xc281aa50 bootopt = 260 newpanic = 0 buf = bwrite: buffer is not busy???\0ked @ /usr/src/sys/netinet/tcp_input.c:2190, '\0' repeats 182 times #3 0xc020e142 in bwrite (bp=0xd30a4778) at /usr/src/sys/kern/vfs_bio.c:795 oldflags = 537002148 newbp = (struct buf *) 0xc767d5b4 #4 0xc020fe5c in vfs_bio_awrite (bp=0xd30a4778) at /usr/src/sys/kern/vfs_bio.c:1692 i = 1 j = 0 lblkno = 0 vp = (struct vnode *) 0xc767d5b4 ncl = 0 nwritten = 16384 size = 16384 maxcl = 8 #5 0xc02c0dca in ffs_fsync (ap=0xdf976a00) at /usr/src/sys/ufs/ffs/ffs_vnops.c:257 vp = (struct vnode *) 0xc767d5b4 ip = (struct inode *) 0xd30a4778 bp = (struct buf *) 0xd30a4778 nbp = (struct buf *) 0x0 error = 0 wait = 0 passes = 4 skipmeta = 0 lbn = 1 #6 0xc02bff1e in ffs_sync (mp=0xc697a000, waitfor=2, cred=0xc2806e80, td=0xc037f6a0) at vnode_if.h:612 nvp = (struct vnode *) 0xc767d490 vp = (struct vnode *) 0xc767d5b4 devvp = (struct vnode *) 0xc767d5b4 ip = (struct inode *) 0x0 ump = (struct ufsmount *) 0xc699b300 fs = (struct fs *) 0xc697 error = 0 count = 0 wait = 0 lockreq = 18 allerror = 0 #7 0xc022261b in sync (td=0xc037f6a0, uap=0x0) at /usr/src/sys/kern/vfs_syscalls.c:138 mp = (struct mount *) 0xc697a000 nmp = (struct mount *) 0x0 asyncflag = 0 #8 0xc01cd29c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:280 bp = (struct buf *) 0x0 iter = -1031689648 nbusy = -1031700352 pbusy = -1031689648 subiter = -1031700352 #9 0xc01cd8d3 in panic () at /usr/src/sys/kern/kern_shutdown.c:542 td = (struct thread *) 0xc281aa50 bootopt = 256 newpanic = 1 buf = bwrite: buffer is not busy???\0ked @ /usr/src/sys/netinet/tcp_input.c:2190, '\0' repeats 182 times #10 0xc01ee013 in witness_unlock (lock=0xc038f9cc, flags=8, file=0xc035a374 /usr/src/sys/netinet/tcp_input.c, line=2190) at /usr/src/sys/kern/subr_witness.c:951 lock_list = (struct lock_list_entry **) 0xc03e3540 instance = (struct lock_instance *) 0xc03e3554 class = (struct lock_class *) 0xc0384160 s = 1664 i = 0 j = -1070007860 #11 0xc01c4952 in _mtx_unlock_flags (m=0xc03e3554, opts=0, file=0xc038f9cc `A8À\a\2125À\a\2125À, line=-1069664960) at /usr/src/sys/kern/kern_mutex.c:357 No locals. #12 0xc0255ec9 in tcp_input (m=0xc038f9cc, off0=20) at /usr/src/sys/netinet/tcp_input.c:2324 th = (struct tcphdr *) 0xc34c1824 ip = (struct ip *) 0xc34c1810 ipov = (struct ipovly
RE: witness_get: witness exhausted?
On 03-Mar-2003 Andrew Gallatin wrote: I'm developing a character driver which tracks a lot of state on a per-open basis. I've got several mutexes in there which are initialzed at open, and destroyed at close. After a few dozen opens, witness seems to croak with: witness_get: witness exhausted Am I leaking something? Or is the witness code? I looked at subr_witness.c, and I don't see witness_free() being called from witness_destroy(). There's probably some design constraint that I don't understand. Unfortunately dead witnesses may still be stuck in the lock order hierarchy and I haven't figured out yet how to properly handle the case of free'ing a witness structure from the tree while preserving the correct lock orders. You can try http://www.freebsd.org/~jhb/patches/witness.patch but I don't think it will help with your specific case. FWIW, Witness (and the FreeBSD debugging environment in general) is why I've gotten approval to co-develop this driver on FreeBSD (in addition to linux). Its already caught several locking bugs. Glad to hear it is at least working in some cases. :) -- John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/ Power Users Use the Power to Serve! - http://www.FreeBSD.org/ To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
RE: witness_get: witness exhausted?
John Baldwin writes: Unfortunately dead witnesses may still be stuck in the lock order hierarchy and I haven't figured out yet how to properly handle the case of free'ing a witness structure from the tree while preserving the correct lock orders. You can try Ah, I think I see. Thanks for the info. http://www.freebsd.org/~jhb/patches/witness.patch but I don't think it will help with your specific case. It would probably help a little, but not very much.. Thanks, Drew To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message