subject:"Witness This"

witness_lock_list_get: witness exhausted

2023-01-02 Thread Michael Jung

First shame on me as I obviously missed the reply to my former report and did 
not
report back on a proposed patch.

https://lists.freebsd.org/pipermail/freebsd-current/2018-January/068136.html

I am still running current – the guest has 160GB Ram and 64 vCPU’s assigned 
under ESXi
which mainly runs poudriere for amd64+arm builds and I again noticed
"witness_lock_list_get: witness exhausted" on the console (which I don't pay 
much attention to).

UNAME: FreeBSD poudriere.gopai.com 14.0-CURRENT FreeBSD 14.0-CURRENT #0 
main-n259872-f948cb717f50: Wed Dec 28 13:13:43 EST 2022 
mi...@poudriere.gopai.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

It seems that LOCK_NCHILDREN and LOCK_CHILDCOUNT never got incremented due to 
my lack of reporting in
/usr/src/sys/kern/subr_witness.c

If it's just a matter of incrementing LOCK_CHIDCOUNT 4096 and LOCK_NCHILDREN 20 
without adding the
sysctl knobs I do that also - I am not kernel savvy.

I will make sure and test an updated patch should one be made available or 
advise whether to increment these values
or ignore the warnings.

Regards,

Michael Jung





CONFIDENTIALITY NOTE: This message is intended only for the use
of the individual or entity to whom it is addressed and may
contain information that is privileged, confidential, and
exempt from disclosure under applicable law. If the reader
of this message is not the intended recipient, you are hereby
notified that any dissemination, distribution or copying
of this communication is strictly prohibited. If you have
received this transmission in error, please notify us by
telephone at (502) 212-4000 or notify us at PAI, Dept. 99,
2101 High Wickham Place, Suite 101, Louisville, KY 40245

Disclaimer

The information contained in this communication from the sender is 
confidential. It is intended solely for use by the recipient and others 
authorized to receive it. If you are not the recipient, you are hereby notified 
that any disclosure, copying, distribution or taking action in relation of the 
contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been 
automatically archived by Mimecast, a leader in email security and cyber 
resilience. Mimecast integrates email defenses with brand protection, security 
awareness training, web security, compliance and other essential capabilities. 
Mimecast helps protect large and small organizations from malicious activity, 
human error and technology failure; and to lead the movement toward building a 
more resilient world. To find out more, visit our website.

Re: usbhid panic when switching vt-s (invariants+witness enabled)

2022-09-30 Thread Hans Petter Selasky


On 9/30/22 12:31, Andriy Gapon wrote:

On 26/09/2022 18:13, Hans Petter Selasky wrote:

On 9/23/22 23:43, Hans Petter Selasky wrote:

vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900
panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960
sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 
0xfe003590e9b0

_sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40
usbhid_sync_xfer() at 0x8532e071 = 
usbhid_sync_xfer+0x171/frame 0xfe003590eaa0
usbhid_set_report() at 0x8532db26 = 
usbhid_set_report+0x96/frame 0xfe003590eae0
hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 
0xfe003590eb20
hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 
0xfe003590eb50
hid_write() at 0x80686b98 = hid_write+0x58/frame 
0xfe003590eb80
hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 
0xfe003590ebc0
hkbd_ioctl_locked() at 0x85c1cd6b = 
hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20
hkbd_ioctl_locked() at 0x85c1caff = 
hkbd_ioctl_locked+0xcf/frame 0xfe003590ec80
hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 
0xfe003590ecc0
kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 
0xfe003590ed00
vt_window_switch() at 0x8079d969 = 
vt_window_switch+0x229/frame 0xfe003590ed40
vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 
0xfe003590ed60


Can you test this patch:
https://reviews.freebsd.org/D36715


Sorry that it took a while.
I cannot reproduce the problem after applying the patch.
I see that you already committed the change, but I thought that I'd let 
you know.

Thank you very much!



You are welcome!

--HPS

Re: usbhid panic when switching vt-s (invariants+witness enabled)

2022-09-30 Thread Andriy Gapon


On 26/09/2022 18:13, Hans Petter Selasky wrote:

On 9/23/22 23:43, Hans Petter Selasky wrote:

vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900
panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960
sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0
_sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40
usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 
0xfe003590eaa0
usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 
0xfe003590eae0
hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 
0xfe003590eb20

hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 
0xfe003590eb50
hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80
hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 
0xfe003590ebc0
hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 
0xfe003590ec20
hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 
0xfe003590ec80

hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0
kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 
0xfe003590ed00
vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 
0xfe003590ed40
vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 
0xfe003590ed60


Can you test this patch:
https://reviews.freebsd.org/D36715


Sorry that it took a while.
I cannot reproduce the problem after applying the patch.
I see that you already committed the change, but I thought that I'd let you 
know.
Thank you very much!

--
Andriy Gapon

Re: usbhid panic when switching vt-s (invariants+witness enabled)

2022-09-26 Thread Hans Petter Selasky


On 9/23/22 23:43, Hans Petter Selasky wrote:

vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900
panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960
sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 
0xfe003590e9b0

_sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40
usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 
0xfe003590eaa0
usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 
0xfe003590eae0
hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 
0xfe003590eb20
hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 
0xfe003590eb50

hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80
hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 
0xfe003590ebc0
hkbd_ioctl_locked() at 0x85c1cd6b = 
hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20
hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 
0xfe003590ec80
hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 
0xfe003590ecc0
kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 
0xfe003590ed00
vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 
0xfe003590ed40
vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 
0xfe003590ed60


Can you test this patch:
https://reviews.freebsd.org/D36715

--HPS

Re: usbhid panic when switching vt-s (invariants+witness enabled)

2022-09-23 Thread Hans Petter Selasky


On 9/23/22 23:33, Andriy Gapon wrote:


It seems that the problem may be related to different keyboard LED 
states between the VTs.  The system is a fresh stable/13.  The panic 
looks like an attempt to sleep while in an interrupt thread (a callout?).




Hi,

I suspect vt_switch_timer must have a task to handle those requests, 
which allows sleeping.


Vladimir, will you handle this?

It is a minor regression issue when using the new usbhid feature.

--HPS

panic: sleepq_add: td 0xf80006af to sleep on wchan 
0xf802ea752e08 with sleeping prohibited

cpuid = 5
time = 1663940484
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8061555b = 
db_trace_self_wrapper+0x2b/frame 0xfe003590e7f0
kdb_backtrace() at 0x80942637 = kdb_backtrace+0x37/frame 
0xfe003590e8a0

vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900
panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960
sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 
0xfe003590e9b0

_sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40
usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 
0xfe003590eaa0
usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 
0xfe003590eae0
hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 
0xfe003590eb20
hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 
0xfe003590eb50

hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80
hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 
0xfe003590ebc0
hkbd_ioctl_locked() at 0x85c1cd6b = 
hkbd_ioctl_locked+0x33b/frame 0xfe003590ec20
hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 
0xfe003590ec80
hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 
0xfe003590ecc0
kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 
0xfe003590ed00
vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 
0xfe003590ed40
vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 
0xfe003590ed60
softclock_call_cc() at 0x809127c4 = 
softclock_call_cc+0x244/frame 0xfe003590ee20

softclock() at 0x80912c1c = softclock+0x7c/frame 0xfe003590ee50
ithread_loop() at 0x808b662a = ithread_loop+0x2da/frame 
0xfe003590eef0

fork_exit() at 0x808b2f85 = fork_exit+0xc5/frame 0xfe003590ef30
fork_trampoline() at 0x80c084fe = fork_trampoline+0xe/frame 
0xfe003590ef30

usbhid panic when switching vt-s (invariants+witness enabled)

2022-09-23 Thread Andriy Gapon




It seems that the problem may be related to different keyboard LED states 
between the VTs.  The system is a fresh stable/13.  The panic looks like an 
attempt to sleep while in an interrupt thread (a callout?).


panic: sleepq_add: td 0xf80006af to sleep on wchan 0xf802ea752e08 
with sleeping prohibited

cpuid = 5
time = 1663940484
KDB: stack backtrace:
db_trace_self_wrapper() at 0x8061555b = db_trace_self_wrapper+0x2b/frame 
0xfe003590e7f0

kdb_backtrace() at 0x80942637 = kdb_backtrace+0x37/frame 
0xfe003590e8a0
vpanic() at 0x808f4c84 = vpanic+0x184/frame 0xfe003590e900
panic() at 0x808f4a33 = panic+0x43/frame 0xfe003590e960
sleepq_add() at 0x809521ab = sleepq_add+0x37b/frame 0xfe003590e9b0
_sleep() at 0x80902118 = _sleep+0x238/frame 0xfe003590ea40
usbhid_sync_xfer() at 0x8532e071 = usbhid_sync_xfer+0x171/frame 
0xfe003590eaa0
usbhid_set_report() at 0x8532db26 = usbhid_set_report+0x96/frame 
0xfe003590eae0
hid_set_report() at 0x80686caa = hid_set_report+0x6a/frame 
0xfe003590eb20

hidbus_write() at 0x85335a7c = hidbus_write+0x5c/frame 
0xfe003590eb50
hid_write() at 0x80686b98 = hid_write+0x58/frame 0xfe003590eb80
hkbd_set_leds() at 0x85c1cfe6 = hkbd_set_leds+0x206/frame 
0xfe003590ebc0
hkbd_ioctl_locked() at 0x85c1cd6b = hkbd_ioctl_locked+0x33b/frame 
0xfe003590ec20
hkbd_ioctl_locked() at 0x85c1caff = hkbd_ioctl_locked+0xcf/frame 
0xfe003590ec80

hkbd_ioctl() at 0x85c1ba5a = hkbd_ioctl+0xba/frame 0xfe003590ecc0
kbdmux_ioctl() at 0x80695d3b = kbdmux_ioctl+0x12b/frame 
0xfe003590ed00
vt_window_switch() at 0x8079d969 = vt_window_switch+0x229/frame 
0xfe003590ed40
vt_switch_timer() at 0x807a15a1 = vt_switch_timer+0x21/frame 
0xfe003590ed60
softclock_call_cc() at 0x809127c4 = softclock_call_cc+0x244/frame 
0xfe003590ee20

softclock() at 0x80912c1c = softclock+0x7c/frame 0xfe003590ee50
ithread_loop() at 0x808b662a = ithread_loop+0x2da/frame 
0xfe003590eef0
fork_exit() at 0x808b2f85 = fork_exit+0xc5/frame 0xfe003590ef30
fork_trampoline() at 0x80c084fe = fork_trampoline+0xe/frame 
0xfe003590ef30



--
Andriy Gapon

Re: Can't build with INVARIANTS but not WITNESS

2022-04-27 Thread Mateusz Guzik

On 4/27/22, John F Carr  wrote:
> My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:
>
> include GENERIC
> ident   STRIATUS
> nooptions   WITNESS
> nooptions   WITNESS_SKIPSPIN
>
> My kernel build fails:
>
> /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable
> 'line' set but not used [-Werror,-Wunused-but-set-variable]
> int flags, line __diagused;
>^
> /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable
> 'file' set but not used [-Werror,-Wunused-but-set-variable]
> const char *file __diagused;
>
> The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is
> defined, but the variable in vfs_lookup.c is only used if WITNESS is
> defined.
>
> #if defined(INVARIANTS) || defined(WITNESS)
> #define __diagused
> #else
> #define __diagused  __unused
> #endif
>
> I think this code is trying to be too clever and causing more trouble than
> it prevents.  Change the || to &&, or replace __diagused with __unused
> everywhere.
>

I disagree. The entire point is to not end up with actually unused
variables even when is enabled.

I patched it up in
https://cgit.FreeBSD.org/src/commit/?id=b40c0db6f6d61ed594118d81dc691b9263a7e4d7
. This still allows for actually vars when only one of INVARIANTS or
WITNESS is defined, but that's a much smaller problem than allowing it
in general.

-- 
Mateusz Guzik

Can't build with INVARIANTS but not WITNESS

2022-04-27 Thread John F Carr

My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:

include GENERIC
ident   STRIATUS
nooptions   WITNESS
nooptions   WITNESS_SKIPSPIN

My kernel build fails:

/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable 'line' 
set but not used [-Werror,-Wunused-but-set-variable]
int flags, line __diagused;
   ^
/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable 'file' 
set but not used [-Werror,-Wunused-but-set-variable]
const char *file __diagused;

The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is 
defined, but the variable in vfs_lookup.c is only used if WITNESS is defined.

#if defined(INVARIANTS) || defined(WITNESS)
#define __diagused
#else
#define __diagused  __unused
#endif

I think this code is trying to be too clever and causing more trouble than it 
prevents.  Change the || to &&, or replace __diagused with __unused everywhere.

Re: witness_lock_list_get: witness exhausted

2021-10-04 Thread Mateusz Guzik

Just take it and change as you see fit, I don't have time to work on it.

On 10/4/21, Alan Somers  wrote:
> On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik  wrote:
>>
>> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung  wrote:
>>
>> > On 2018-01-08 13:39, John Baldwin wrote:
>> >
>> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>> >>
>> >>> Hi!
>> >>>
>> >>> I've recently up'd my processor count on our poudriere box and have
>> >>> started noticing the error
>> >>> "witness_lock_list_get: witness exhausted" on the console.  The
>> >>> kernel
>> >>> *DOES NOT* crash but I
>> >>> thought the report may be useful to someone.
>> >>>
>> >>> $ uname -a
>> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun
>> >>> Nov
>> >>> 19 18:41:20 EST 2017
>> >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>> >>>
>> >>> The machine is pretty busy running four poudriere build instances.
>> >>>
>> >>> last pid: 76584;  load averages: 115.07, 115.96, 98.30
>> >>>
>> >>>   up 6+07:32:59  14:44:03
>> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>> >>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1%
>> >>> idle
>> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>> >>>   25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>> >>>
>> >>> Let me know what additional information I might supply.
>> >>>
>> >>
>> >> This just means that WITNESS stopped working because it ran out of
>> >> pre-allocated objects.  In particular the objects used to track how
>> >> many locks are held by how many threads:
>> >>
>> >> /*
>> >>  * XXX: This is somewhat bogus, as we assume here that at most 2048
>> >> threads
>> >>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we
>> >> should
>> >>  * probably be safe for the most part, but it's still a SWAG.
>> >>  */
>> >> #define LOCK_NCHILDREN  5
>> >> #define LOCK_CHILDCOUNT 2048
>> >>
>> >> Probably the '2048' (max number of concurrent threads) needs to scale
>> >> with
>> >> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
>> >>
>> >
>> >
>> > Thank you for you explanation.  We are expanding our ESXi cluster and
>> > even
>> > though with standard edition I can only assign 64 vCPU's to a guest and
>> > as
>> > much
>> > RAM as I want, I do like to help with edge cases if I can make them
>> > occur
>> > pushing
>> > boundaries as I can towards additianional improvements in FreeBSD.
>> >
>>
>> Can you apply this and re-run the test?
>>
>> https://people.freebsd.org/~mjg/witness.diff
>>
>> It bumps the counters to be "high enough" but also starts tracking usage.
>> If you get
>> the message again, bump the values even higher.
>>
>> Once you get a complete poudriere run which did not result in the
>> problem,
>> do:
>> $ sysctl debug.witness.list_used debug.witness.list_max_used
>>
>> to dump the actual usage.
>
> This is a nice little patch.  Can we commit to head?  Even better
> would be if LOCK_CHILDCOUNT could be a tunable.  On my largish system,
> here's what I get shortly after boot:
>
> debug.witness.list_max_used: 8432
> debug.witness.list_used: 8420
>
> -Alan
>


-- 
Mateusz Guzik

Re: witness_lock_list_get: witness exhausted

2021-10-03 Thread Alan Somers

On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik  wrote:
>
> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung  wrote:
>
> > On 2018-01-08 13:39, John Baldwin wrote:
> >
> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
> >>
> >>> Hi!
> >>>
> >>> I've recently up'd my processor count on our poudriere box and have
> >>> started noticing the error
> >>> "witness_lock_list_get: witness exhausted" on the console.  The kernel
> >>> *DOES NOT* crash but I
> >>> thought the report may be useful to someone.
> >>>
> >>> $ uname -a
> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov
> >>> 19 18:41:20 EST 2017
> >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
> >>>
> >>> The machine is pretty busy running four poudriere build instances.
> >>>
> >>> last pid: 76584;  load averages: 115.07, 115.96, 98.30
> >>>
> >>>   up 6+07:32:59  14:44:03
> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
> >>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
> >>>   25G Compressed, 32G Uncompressed, 1.24:1 Ratio
> >>>
> >>> Let me know what additional information I might supply.
> >>>
> >>
> >> This just means that WITNESS stopped working because it ran out of
> >> pre-allocated objects.  In particular the objects used to track how
> >> many locks are held by how many threads:
> >>
> >> /*
> >>  * XXX: This is somewhat bogus, as we assume here that at most 2048
> >> threads
> >>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
> >>  * probably be safe for the most part, but it's still a SWAG.
> >>  */
> >> #define LOCK_NCHILDREN  5
> >> #define LOCK_CHILDCOUNT 2048
> >>
> >> Probably the '2048' (max number of concurrent threads) needs to scale with
> >> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
> >>
> >
> >
> > Thank you for you explanation.  We are expanding our ESXi cluster and even
> > though with standard edition I can only assign 64 vCPU's to a guest and as
> > much
> > RAM as I want, I do like to help with edge cases if I can make them occur
> > pushing
> > boundaries as I can towards additianional improvements in FreeBSD.
> >
>
> Can you apply this and re-run the test?
>
> https://people.freebsd.org/~mjg/witness.diff
>
> It bumps the counters to be "high enough" but also starts tracking usage.
> If you get
> the message again, bump the values even higher.
>
> Once you get a complete poudriere run which did not result in the problem,
> do:
> $ sysctl debug.witness.list_used debug.witness.list_max_used
>
> to dump the actual usage.

This is a nice little patch.  Can we commit to head?  Even better
would be if LOCK_CHILDCOUNT could be a tunable.  On my largish system,
here's what I get shortly after boot:

debug.witness.list_max_used: 8432
debug.witness.list_used: 8420

-Alan

Re: witness_lock_list_get: witness exhausted

2018-01-08 Thread Mateusz Guzik

On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung <mi...@mikej.com> wrote:

> On 2018-01-08 13:39, John Baldwin wrote:
>
>> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>>
>>> Hi!
>>>
>>> I've recently up'd my processor count on our poudriere box and have
>>> started noticing the error
>>> "witness_lock_list_get: witness exhausted" on the console.  The kernel
>>> *DOES NOT* crash but I
>>> thought the report may be useful to someone.
>>>
>>> $ uname -a
>>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov
>>> 19 18:41:20 EST 2017
>>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>>>
>>> The machine is pretty busy running four poudriere build instances.
>>>
>>> last pid: 76584;  load averages: 115.07, 115.96, 98.30
>>>
>>>   up 6+07:32:59  14:44:03
>>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
>>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>>>   25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>>>
>>> Let me know what additional information I might supply.
>>>
>>
>> This just means that WITNESS stopped working because it ran out of
>> pre-allocated objects.  In particular the objects used to track how
>> many locks are held by how many threads:
>>
>> /*
>>  * XXX: This is somewhat bogus, as we assume here that at most 2048
>> threads
>>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
>>  * probably be safe for the most part, but it's still a SWAG.
>>  */
>> #define LOCK_NCHILDREN  5
>> #define LOCK_CHILDCOUNT 2048
>>
>> Probably the '2048' (max number of concurrent threads) needs to scale with
>> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
>>
>
>
> Thank you for you explanation.  We are expanding our ESXi cluster and even
> though with standard edition I can only assign 64 vCPU's to a guest and as
> much
> RAM as I want, I do like to help with edge cases if I can make them occur
> pushing
> boundaries as I can towards additianional improvements in FreeBSD.
>

Can you apply this and re-run the test?

https://people.freebsd.org/~mjg/witness.diff

It bumps the counters to be "high enough" but also starts tracking usage.
If you get
the message again, bump the values even higher.

Once you get a complete poudriere run which did not result in the problem,
do:
$ sysctl debug.witness.list_used debug.witness.list_max_used

to dump the actual usage.

-- 
Mateusz Guzik 
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: witness_lock_list_get: witness exhausted

2018-01-08 Thread Michael Jung


On 2018-01-08 13:39, John Baldwin wrote:

On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:

Hi!

I've recently up'd my processor count on our poudriere box and have
started noticing the error
"witness_lock_list_get: witness exhausted" on the console.  The kernel
*DOES NOT* crash but I
thought the report may be useful to someone.

$ uname -a
FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun 
Nov

19 18:41:20 EST 2017
mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

The machine is pretty busy running four poudriere build instances.

last pid: 76584;  load averages: 115.07, 115.96, 98.30

  up 6+07:32:59  14:44:03
763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
  25G Compressed, 32G Uncompressed, 1.24:1 Ratio

Let me know what additional information I might supply.


This just means that WITNESS stopped working because it ran out of
pre-allocated objects.  In particular the objects used to track how
many locks are held by how many threads:

/*
 * XXX: This is somewhat bogus, as we assume here that at most 2048 
threads

 * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
 * probably be safe for the most part, but it's still a SWAG.
 */
#define LOCK_NCHILDREN  5
#define LOCK_CHILDCOUNT 2048

Probably the '2048' (max number of concurrent threads) needs to scale 
with

MAXCPU.  2048 threads is probably a bit low on big x86 boxes.



Thank you for you explanation.  We are expanding our ESXi cluster and 
even
though with standard edition I can only assign 64 vCPU's to a guest and 
as much
RAM as I want, I do like to help with edge cases if I can make them 
occur pushing

boundaries as I can towards additianional improvements in FreeBSD.

--mikej
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: witness_lock_list_get: witness exhausted

2018-01-08 Thread John Baldwin

On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
> Hi!
> 
> I've recently up'd my processor count on our poudriere box and have 
> started noticing the error
> "witness_lock_list_get: witness exhausted" on the console.  The kernel 
> *DOES NOT* crash but I
> thought the report may be useful to someone.
> 
> $ uname -a
> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 
> 19 18:41:20 EST 2017
> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
> 
> The machine is pretty busy running four poudriere build instances.
> 
> last pid: 76584;  load averages: 115.07, 115.96, 98.30   
>   
>   up 6+07:32:59  14:44:03
> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>   25G Compressed, 32G Uncompressed, 1.24:1 Ratio
> 
> Let me know what additional information I might supply.

This just means that WITNESS stopped working because it ran out of
pre-allocated objects.  In particular the objects used to track how
many locks are held by how many threads:

/*
 * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
 * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
 * probably be safe for the most part, but it's still a SWAG.
 */
#define LOCK_NCHILDREN  5
#define LOCK_CHILDCOUNT 2048

Probably the '2048' (max number of concurrent threads) needs to scale with
MAXCPU.  2048 threads is probably a bit low on big x86 boxes.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

witness_lock_list_get: witness exhausted

2017-11-28 Thread Michael Jung


Hi!

I've recently up'd my processor count on our poudriere box and have 
started noticing the error
"witness_lock_list_get: witness exhausted" on the console.  The kernel 
*DOES NOT* crash but I

thought the report may be useful to someone.

$ uname -a
FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 
19 18:41:20 EST 2017

mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

The machine is pretty busy running four poudriere build instances.

last pid: 76584;  load averages: 115.07, 115.96, 98.30   
 
 up 6+07:32:59  14:44:03

763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
 25G Compressed, 32G Uncompressed, 1.24:1 Ratio

Let me know what additional information I might supply.


Thanks!
--mikej
Copyright (c) 1992-2017 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #1 r325999: Sun Nov 19 18:41:20 EST 2017
mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 
5.0.0svn)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): text 80x25
CPU: Intel(R) Xeon(R) CPU E7- 4850  @ 2.00GHz (1995.00-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206f2  Family=0x6  Model=0x2f  Stepping=2
  
Features=0x1fa3fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,DTS,MMX,FXSR,SSE,SSE2,SS,HTT>
  
Features2=0x83b82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1
  Structured Extended Features=0x2
  TSC: P-state invariant
Hypervisor: Origin = "VMwareVMware"
real memory  = 96636764160 (92160 MB)
avail memory = 93873786880 (89525 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 60 CPUs
FreeBSD/SMP: 6 package(s) x 10 core(s)
random: unblocking device.
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0  irqs 0-23 on motherboard
SMP: AP CPU #46 Launched!
SMP: AP CPU #16 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #51 Launched!
SMP: AP CPU #20 Launched!
SMP: AP CPU #15 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #48 Launched!
SMP: AP CPU #23 Launched!
SMP: AP CPU #27 Launched!
SMP: AP CPU #12 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #55 Launched!
SMP: AP CPU #17 Launched!
SMP: AP CPU #52 Launched!
SMP: AP CPU #18 Launched!
SMP: AP CPU #39 Launched!
SMP: AP CPU #11 Launched!
SMP: AP CPU #41 Launched!
SMP: AP CPU #58 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #26 Launched!
SMP: AP CPU #37 Launched!
SMP: AP CPU #45 Launched!
SMP: AP CPU #40 Launched!
SMP: AP CPU #35 Launched!
SMP: AP CPU #53 Launched!
SMP: AP CPU #50 Launched!
SMP: AP CPU #44 Launched!
SMP: AP CPU #34 Launched!
SMP: AP CPU #30 Launched!
SMP: AP CPU #31 Launched!
SMP: AP CPU #47 Launched!
SMP: AP CPU #13 Launched!
SMP: AP CPU #28 Launched!
SMP: AP CPU #49 Launched!
SMP: AP CPU #54 Launched!
SMP: AP CPU #8 Launched!
SMP: AP CPU #57 Launched!
SMP: AP CPU #29 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #24 Launched!
SMP: AP CPU #33 Launched!
SMP: AP CPU #38 Launched!
SMP: AP CPU #59 Launched!
SMP: AP CPU #9 Launched!
SMP: AP CPU #42 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #43 Launched!
SMP: AP CPU #14 Launched!
SMP: AP CPU #19 Launched!
SMP: AP CPU #10 Launched!
SMP: AP CPU #25 Launched!
SMP: AP CPU #56 Launched!
SMP: AP CPU #36 Launched!
SMP: AP CPU #22 Launched!
SMP: AP CPU #32 Launched!
SMP: AP CPU #21 Launched!
random: entropy device external interface
netmap: loaded module
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0x80f95c20, 0) error 19
kbd1 at kbdmux0
nexus0
vtvga0:  on motherboard
cryptosoft0:  on motherboard
acpi0:  on motherboard
acpi0: Power Button (fixed)
hpet0:  iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
cpu0:  numa-domain 0 on acpi0
cpu1:  numa-domain 5 on acpi0
cpu2:  numa-domain 5 on acpi0
cpu3:  numa-domain 5 on acpi0
cpu4:  numa-domain 5 on acpi0
cpu5:  numa-domain 5 on acpi0
cpu6:  numa-domain 5 on acpi0
cpu7:  numa-domain 5 on acpi0
cpu8:  numa-domain 5 on acpi0
cpu9:  numa-domain 5 on acpi0
cpu10:  numa-domain 5 on acpi0
cpu11:  numa-domain 4 on acpi0
cpu12:  numa-domain 4 on acpi0
cpu13:  numa-domain 4 on acpi0
cpu14:  numa-domain 4 on acpi0
cpu15:  numa-domain 4 on acpi0
cpu16:  numa-domain 4 on acpi0
cpu17:  numa-domain 4 on

r324870 breaks boot on amd64 with WITNESS (was "svn commit: r324870 - in head/sys: amd64/include kern")

2017-10-22 Thread Ngie Cooper (yaneurabeya)

All,
I highly advise not upgrading to this revision if you use WITNESS. 
Please see the attached message for more details and reply to the commit log.
Cheers,
-Ngie

> Begin forwarded message:
> 
> From: "Ngie Cooper (yaneurabeya)" <yaneurab...@gmail.com>
> Subject: Re: svn commit: r324870 - in head/sys: amd64/include kern
> Date: October 22, 2017 at 17:19:32 PDT
> To: Mateusz Guzik <m...@freebsd.org>
> Cc: src-committers <src-committ...@freebsd.org>, svn-src-...@freebsd.org, 
> svn-src-h...@freebsd.org
> 
> 
>> On Oct 22, 2017, at 13:43, Mateusz Guzik <m...@freebsd.org> wrote:
>> 
>> Author: mjg
>> Date: Sun Oct 22 20:43:50 2017
>> New Revision: 324870
>> URL: https://svnweb.freebsd.org/changeset/base/324870
>> 
>> Log:
>> Make the sleepq chain hash size configurable per-arch and bump on amd64.
>> 
>> While here cache-align chains.
>> 
>> This shortens longest found chain during poudriere -j 80 from 32 to 16.
>> 
>> Pushing this higher up will probably require allocation on boot.
> 
> Hi Mateusz,
>   This change causes the Jenkins VMs to panic at boot with "panic: 
> witness_init: pending locks list is too small, increase WITNESS_PENDLIST” 
> when WITNESS is enabled: 
> https://ci.freebsd.org/job/FreeBSD-head-amd64-test/4781/console .
>   Please fix or revert.
> Thanks,
> -Ngie



signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: New warnings from WITNESS

2016-11-06 Thread Michael Tuexen


> On 6 Nov 2016, at 19:41, Scott Long  wrote:
> 
> 
>> On Nov 6, 2016, at 11:01 AM, Michael Tuexen  wrote:
>> 
>>> On 6 Nov 2016, at 15:39, Konstantin Belousov  wrote:
>>> 
>>> On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote:
> On 6 Nov 2016, at 13:28, Konstantin Belousov  wrote:
> 
> On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
>> bus_dmamap_create with the following non-sleepable locks held:
>> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
>> dev/mpt/mpt.c:2287
>> stack backtrace:
>> #0 0x80ac0300 at witness_debugger+0x70
>> #1 0x80ac15e7 at witness_warn+0x3d7
>> #2 0x81055fef at bus_dmamap_create+0x2f
>> #3 0x80678a25 at mpt_configure_ioc+0x3a5
>> #4 0x80677476 at mpt_attach+0x226
>> #5 0x80683299 at mpt_pci_attach+0x9c9
>> #6 0x80a9478d at device_attach+0x41d
>> #7 0x80a9595a at bus_generic_attach+0x4a
>> #8 0x806ebe75 at pci_attach+0xd5
>> #9 0x80a9478d at device_attach+0x41d
>> #10 0x80a9595a at bus_generic_attach+0x4a
>> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
>> #12 0x80a9478d at device_attach+0x41d
>> #13 0x80a9595a at bus_generic_attach+0x4a
>> #14 0x803b4c8f at acpi_attach+0xdbf
>> #15 0x80a9478d at device_attach+0x41d
>> #16 0x80a9595a at bus_generic_attach+0x4a
>> #17 0x80ee03e3 at nexus_acpi_attach+0x73
>> 
>> ... and so on. Not sure which revision introduced it...
> r308268
> 
> I believe that this is an mpt(4) driver issue, which calls
> bus_dmamap_create(9) with the mpt mutex held.
 OK. Whom to contact? Or are you willing to look into it?
 I haven't worked in that area...
>>> 
>>> I am really not sure.  Looking at the svn history is not very encouraging.
>> Hmm. Time to change my setup, I guess...
>>> 
>>> Might be try freebsd-scsi@ as well.
>> I dropped them an e-mail...
>> 
>> 
> 
> Please see my response on that list.
Will do once it shows up in the archive since I'm not subscribed
to that list...

Best regards
Michael
> 
> Scott
> 
> 

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: New warnings from WITNESS

2016-11-06 Thread Scott Long


> On Nov 6, 2016, at 11:01 AM, Michael Tuexen  wrote:
> 
>> On 6 Nov 2016, at 15:39, Konstantin Belousov  wrote:
>> 
>> On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote:
 On 6 Nov 2016, at 13:28, Konstantin Belousov  wrote:
 
 On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
> bus_dmamap_create with the following non-sleepable locks held:
> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
> dev/mpt/mpt.c:2287
> stack backtrace:
> #0 0x80ac0300 at witness_debugger+0x70
> #1 0x80ac15e7 at witness_warn+0x3d7
> #2 0x81055fef at bus_dmamap_create+0x2f
> #3 0x80678a25 at mpt_configure_ioc+0x3a5
> #4 0x80677476 at mpt_attach+0x226
> #5 0x80683299 at mpt_pci_attach+0x9c9
> #6 0x80a9478d at device_attach+0x41d
> #7 0x80a9595a at bus_generic_attach+0x4a
> #8 0x806ebe75 at pci_attach+0xd5
> #9 0x80a9478d at device_attach+0x41d
> #10 0x80a9595a at bus_generic_attach+0x4a
> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
> #12 0x80a9478d at device_attach+0x41d
> #13 0x80a9595a at bus_generic_attach+0x4a
> #14 0x803b4c8f at acpi_attach+0xdbf
> #15 0x80a9478d at device_attach+0x41d
> #16 0x80a9595a at bus_generic_attach+0x4a
> #17 0x80ee03e3 at nexus_acpi_attach+0x73
> 
> ... and so on. Not sure which revision introduced it...
 r308268
 
 I believe that this is an mpt(4) driver issue, which calls
 bus_dmamap_create(9) with the mpt mutex held.
>>> OK. Whom to contact? Or are you willing to look into it?
>>> I haven't worked in that area...
>> 
>> I am really not sure.  Looking at the svn history is not very encouraging.
> Hmm. Time to change my setup, I guess...
>> 
>> Might be try freebsd-scsi@ as well.
> I dropped them an e-mail...
> 
> 

Please see my response on that list.

Scott


___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: New warnings from WITNESS

2016-11-06 Thread Michael Tuexen

> On 6 Nov 2016, at 15:39, Konstantin Belousov  wrote:
> 
> On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote:
>>> On 6 Nov 2016, at 13:28, Konstantin Belousov  wrote:
>>> 
>>> On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
 bus_dmamap_create with the following non-sleepable locks held:
 exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
 dev/mpt/mpt.c:2287
 stack backtrace:
 #0 0x80ac0300 at witness_debugger+0x70
 #1 0x80ac15e7 at witness_warn+0x3d7
 #2 0x81055fef at bus_dmamap_create+0x2f
 #3 0x80678a25 at mpt_configure_ioc+0x3a5
 #4 0x80677476 at mpt_attach+0x226
 #5 0x80683299 at mpt_pci_attach+0x9c9
 #6 0x80a9478d at device_attach+0x41d
 #7 0x80a9595a at bus_generic_attach+0x4a
 #8 0x806ebe75 at pci_attach+0xd5
 #9 0x80a9478d at device_attach+0x41d
 #10 0x80a9595a at bus_generic_attach+0x4a
 #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
 #12 0x80a9478d at device_attach+0x41d
 #13 0x80a9595a at bus_generic_attach+0x4a
 #14 0x803b4c8f at acpi_attach+0xdbf
 #15 0x80a9478d at device_attach+0x41d
 #16 0x80a9595a at bus_generic_attach+0x4a
 #17 0x80ee03e3 at nexus_acpi_attach+0x73
 
 ... and so on. Not sure which revision introduced it...
>>> r308268
>>> 
>>> I believe that this is an mpt(4) driver issue, which calls
>>> bus_dmamap_create(9) with the mpt mutex held.
>> OK. Whom to contact? Or are you willing to look into it?
>> I haven't worked in that area...
> 
> I am really not sure.  Looking at the svn history is not very encouraging.
Hmm. Time to change my setup, I guess...
> 
> Might be try freebsd-scsi@ as well.
I dropped them an e-mail...

Best regards
Michael

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: New warnings from WITNESS

2016-11-06 Thread Konstantin Belousov

On Sun, Nov 06, 2016 at 02:17:45PM +0100, Michael Tuexen wrote:
> > On 6 Nov 2016, at 13:28, Konstantin Belousov  wrote:
> > 
> > On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
> >> bus_dmamap_create with the following non-sleepable locks held:
> >> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
> >> dev/mpt/mpt.c:2287
> >> stack backtrace:
> >> #0 0x80ac0300 at witness_debugger+0x70
> >> #1 0x80ac15e7 at witness_warn+0x3d7
> >> #2 0x81055fef at bus_dmamap_create+0x2f
> >> #3 0x80678a25 at mpt_configure_ioc+0x3a5
> >> #4 0x80677476 at mpt_attach+0x226
> >> #5 0x80683299 at mpt_pci_attach+0x9c9
> >> #6 0x80a9478d at device_attach+0x41d
> >> #7 0x80a9595a at bus_generic_attach+0x4a
> >> #8 0x806ebe75 at pci_attach+0xd5
> >> #9 0x80a9478d at device_attach+0x41d
> >> #10 0x80a9595a at bus_generic_attach+0x4a
> >> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
> >> #12 0x80a9478d at device_attach+0x41d
> >> #13 0x80a9595a at bus_generic_attach+0x4a
> >> #14 0x803b4c8f at acpi_attach+0xdbf
> >> #15 0x80a9478d at device_attach+0x41d
> >> #16 0x80a9595a at bus_generic_attach+0x4a
> >> #17 0x80ee03e3 at nexus_acpi_attach+0x73
> >> 
> >> ... and so on. Not sure which revision introduced it...
> > r308268
> > 
> > I believe that this is an mpt(4) driver issue, which calls
> > bus_dmamap_create(9) with the mpt mutex held.
> OK. Whom to contact? Or are you willing to look into it?
> I haven't worked in that area...

I am really not sure.  Looking at the svn history is not very encouraging.

Might be try freebsd-scsi@ as well.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: New warnings from WITNESS

2016-11-06 Thread Michael Tuexen

> On 6 Nov 2016, at 13:28, Konstantin Belousov  wrote:
> 
> On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
>> bus_dmamap_create with the following non-sleepable locks held:
>> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
>> dev/mpt/mpt.c:2287
>> stack backtrace:
>> #0 0x80ac0300 at witness_debugger+0x70
>> #1 0x80ac15e7 at witness_warn+0x3d7
>> #2 0x81055fef at bus_dmamap_create+0x2f
>> #3 0x80678a25 at mpt_configure_ioc+0x3a5
>> #4 0x80677476 at mpt_attach+0x226
>> #5 0x80683299 at mpt_pci_attach+0x9c9
>> #6 0x80a9478d at device_attach+0x41d
>> #7 0x80a9595a at bus_generic_attach+0x4a
>> #8 0x806ebe75 at pci_attach+0xd5
>> #9 0x80a9478d at device_attach+0x41d
>> #10 0x80a9595a at bus_generic_attach+0x4a
>> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
>> #12 0x80a9478d at device_attach+0x41d
>> #13 0x80a9595a at bus_generic_attach+0x4a
>> #14 0x803b4c8f at acpi_attach+0xdbf
>> #15 0x80a9478d at device_attach+0x41d
>> #16 0x80a9595a at bus_generic_attach+0x4a
>> #17 0x80ee03e3 at nexus_acpi_attach+0x73
>> 
>> ... and so on. Not sure which revision introduced it...
> r308268
> 
> I believe that this is an mpt(4) driver issue, which calls
> bus_dmamap_create(9) with the mpt mutex held.
OK. Whom to contact? Or are you willing to look into it?
I haven't worked in that area...

Best regards
Michael

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: New warnings from WITNESS

2016-11-06 Thread Konstantin Belousov

On Sun, Nov 06, 2016 at 12:50:12PM +0100, Michael Tuexen wrote:
> bus_dmamap_create with the following non-sleepable locks held:
> exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
> dev/mpt/mpt.c:2287
> stack backtrace:
> #0 0x80ac0300 at witness_debugger+0x70
> #1 0x80ac15e7 at witness_warn+0x3d7
> #2 0x81055fef at bus_dmamap_create+0x2f
> #3 0x80678a25 at mpt_configure_ioc+0x3a5
> #4 0x80677476 at mpt_attach+0x226
> #5 0x80683299 at mpt_pci_attach+0x9c9
> #6 0x80a9478d at device_attach+0x41d
> #7 0x80a9595a at bus_generic_attach+0x4a
> #8 0x806ebe75 at pci_attach+0xd5
> #9 0x80a9478d at device_attach+0x41d
> #10 0x80a9595a at bus_generic_attach+0x4a
> #11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
> #12 0x80a9478d at device_attach+0x41d
> #13 0x80a9595a at bus_generic_attach+0x4a
> #14 0x803b4c8f at acpi_attach+0xdbf
> #15 0x80a9478d at device_attach+0x41d
> #16 0x80a9595a at bus_generic_attach+0x4a
> #17 0x80ee03e3 at nexus_acpi_attach+0x73
> 
> ... and so on. Not sure which revision introduced it...
r308268

I believe that this is an mpt(4) driver issue, which calls
bus_dmamap_create(9) with the mpt mutex held.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

New warnings from WITNESS

2016-11-06 Thread Michael Tuexen

Dear all,

when booting a recent kernel

[freebsd12:~] tuexen% uname -a
FreeBSD freebsd12.testbed 12.0-CURRENT FreeBSD 12.0-CURRENT #702 r308359M: Sun 
Nov  6 11:55:17 CET 2016 
tuexen@freebsd12.testbed:/usr/home/tuexen/head/sys/amd64/compile/SCTP  amd64

on a VMWare Fusion VM, I get a lot of warnings like

bus_dmamap_create with the following non-sleepable locks held:
exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
dev/mpt/mpt.c:2287
stack backtrace:
#0 0x80ac0300 at witness_debugger+0x70
#1 0x80ac15e7 at witness_warn+0x3d7
#2 0x81055fef at bus_dmamap_create+0x2f
#3 0x80678a25 at mpt_configure_ioc+0x3a5
#4 0x80677476 at mpt_attach+0x226
#5 0x80683299 at mpt_pci_attach+0x9c9
#6 0x80a9478d at device_attach+0x41d
#7 0x80a9595a at bus_generic_attach+0x4a
#8 0x806ebe75 at pci_attach+0xd5
#9 0x80a9478d at device_attach+0x41d
#10 0x80a9595a at bus_generic_attach+0x4a
#11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
#12 0x80a9478d at device_attach+0x41d
#13 0x80a9595a at bus_generic_attach+0x4a
#14 0x803b4c8f at acpi_attach+0xdbf
#15 0x80a9478d at device_attach+0x41d
#16 0x80a9595a at bus_generic_attach+0x4a
#17 0x80ee03e3 at nexus_acpi_attach+0x73
bus_dmamap_create with the following non-sleepable locks held:
exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
dev/mpt/mpt.c:2287
stack backtrace:
#0 0x80ac0300 at witness_debugger+0x70
#1 0x80ac15e7 at witness_warn+0x3d7
#2 0x81055fef at bus_dmamap_create+0x2f
#3 0x80678a25 at mpt_configure_ioc+0x3a5
#4 0x80677476 at mpt_attach+0x226
#5 0x80683299 at mpt_pci_attach+0x9c9
#6 0x80a9478d at device_attach+0x41d
#7 0x80a9595a at bus_generic_attach+0x4a
#8 0x806ebe75 at pci_attach+0xd5
#9 0x80a9478d at device_attach+0x41d
#10 0x80a9595a at bus_generic_attach+0x4a
#11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
#12 0x80a9478d at device_attach+0x41d
#13 0x80a9595a at bus_generic_attach+0x4a
#14 0x803b4c8f at acpi_attach+0xdbf
#15 0x80a9478d at device_attach+0x41d
#16 0x80a9595a at bus_generic_attach+0x4a
#17 0x80ee03e3 at nexus_acpi_attach+0x73
bus_dmamap_create with the following non-sleepable locks held:
exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
dev/mpt/mpt.c:2287
stack backtrace:
#0 0x80ac0300 at witness_debugger+0x70
#1 0x80ac15e7 at witness_warn+0x3d7
#2 0x81055fef at bus_dmamap_create+0x2f
#3 0x80678a25 at mpt_configure_ioc+0x3a5
#4 0x80677476 at mpt_attach+0x226
#5 0x80683299 at mpt_pci_attach+0x9c9
#6 0x80a9478d at device_attach+0x41d
#7 0x80a9595a at bus_generic_attach+0x4a
#8 0x806ebe75 at pci_attach+0xd5
#9 0x80a9478d at device_attach+0x41d
#10 0x80a9595a at bus_generic_attach+0x4a
#11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
#12 0x80a9478d at device_attach+0x41d
#13 0x80a9595a at bus_generic_attach+0x4a
#14 0x803b4c8f at acpi_attach+0xdbf
#15 0x80a9478d at device_attach+0x41d
#16 0x80a9595a at bus_generic_attach+0x4a
#17 0x80ee03e3 at nexus_acpi_attach+0x73
bus_dmamap_create with the following non-sleepable locks held:
exclusive sleep mutex mpt (mpt) r = 0 (0xfee2f008) locked @ 
dev/mpt/mpt.c:2287
stack backtrace:
#0 0x80ac0300 at witness_debugger+0x70
#1 0x80ac15e7 at witness_warn+0x3d7
#2 0x81055fef at bus_dmamap_create+0x2f
#3 0x80678a25 at mpt_configure_ioc+0x3a5
#4 0x80677476 at mpt_attach+0x226
#5 0x80683299 at mpt_pci_attach+0x9c9
#6 0x80a9478d at device_attach+0x41d
#7 0x80a9595a at bus_generic_attach+0x4a
#8 0x806ebe75 at pci_attach+0xd5
#9 0x80a9478d at device_attach+0x41d
#10 0x80a9595a at bus_generic_attach+0x4a
#11 0x803c11a2 at acpi_pcib_acpi_attach+0x402
#12 0x80a9478d at device_attach+0x41d
#13 0x80a9595a at bus_generic_attach+0x4a
#14 0x803b4c8f at acpi_attach+0xdbf
#15 0x80a9478d at device_attach+0x41d
#16 0x80a9595a at bus_generic_attach+0x4a
#17 0x80ee03e3 at nexus_acpi_attach+0x73

... and so on. Not sure which revision introduced it...

Best regards
Michael
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: WITNESS messages on 11

2016-05-28 Thread mokhi

I'm not sure, but maybe this is related to [1].
FWIW, according to [2], LORs of witness means deadlock could be
happened in that situation (if it's not a false positive), not
necessarily an accurate deadlock happening IMO :)

I hope it helps :)

[1]http://lists.freebsd.org/pipermail/freebsd-current/2016-May/061195.html
[2]https://www.freebsd.org/doc/faq/troubleshoot.html#idp63374416

Best Regards, Mokhi.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

WITNESS messages on 11

2016-05-28 Thread Yuri


I saw this every time I tried to install 11 from the VM image recently.


While copying over the ports tree with the command "nc   | tar 
xzf -" I get the messages, see below.



Yuri


lock order reversal:
 1st 0xfe003ca55570 bufwait (bufwait) @ 
/usr/src/sys/kern/vfs_bio.c:3512
 2nd 0xf80003db6a00 dirhash (dirhash) @ 
/usr/src/sys/ufs/ufs/ufs_dirhash.c:281

stack backtrace:
#0 0x80a981b0 at witness_debugger+0x70
#1 0x80a980a4 at witness_checkorder+0xe54
#2 0x80a429a2 at _sx_xlock+0x72
#3 0x80cf32dd at ufsdirhash_add+0x3d
#4 0x80cf60ba at ufs_direnter+0x4da
#5 0x80cfe6b9 at ufs_mkdir+0x8a9
#6 0x80ff7f47 at VOP_MKDIR_APV+0xf7
#7 0x80b05c88 at kern_mkdirat+0x208
#8 0x80e9cc1b at amd64_syscall+0x2db
#9 0x80e7cbdb at Xfast_syscall+0xfb
lock order reversal:
 1st 0xf800388ed5f0 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:894
 2nd 0xfe003cb6ccb0 bufwait (bufwait) @ 
/usr/src/sys/ufs/ffs/ffs_vnops.c:263

 3rd 0xf8003b40b7c8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2498
stack backtrace:
#0 0x80a981b0 at witness_debugger+0x70
#1 0x80a980a4 at witness_checkorder+0xe54
#2 0x80a12b06 at __lockmgr_args+0x4d6
#3 0x80cedf46 at ffs_lock+0xa6
#4 0x80ff88f0 at VOP_LOCK1_APV+0x100
#5 0x80b0880a at _vn_lock+0x9a
#6 0x80af8c63 at vget+0x63
#7 0x80aeb52c at vfs_hash_get+0xcc
#8 0x80ce95a0 at ffs_vgetf+0x40
#9 0x80ce0e91 at softdep_sync_buf+0xb51
#10 0x80ceeb36 at ffs_syncvnode+0x256
#11 0x80cedde0 at ffs_fsync+0x20
#12 0x80ff79b7 at VOP_FSYNC_APV+0xf7
#13 0x80ada695 at bufsync+0x35
#14 0x80af6d12 at bufobj_invalbuf+0xa2
#15 0x80af9f4e at vgonel+0x15e
#16 0x80afce37 at vnlru_proc+0x577
#17 0x809fd8c4 at fork_exit+0x84

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Is a high witness refcount indicative of a missing unlock?

2015-04-06 Thread John Baldwin

On Sunday, March 29, 2015 02:29:42 PM Lars wrote:
 Hi,
 I am poking around for a cause for my repeating deadlock issues on my system 
 based on r 279869. ddb show witness show the “vnode interlock” and the “zfs” 
 locks both with reference counts over 200K. Obviously they are related, and 
 there is a find running (all the filesystems on this machine are zfs ( minus 
 the specialty ones like devfs).
 
 I don’t see any other withness entry with reference counts even in the 
 ballpark of these two, so does this indicate that we have a vnode/zfs path 
 were we don’t unlock?

The ref count just means that the locks exist (so you have 200k ZFS vnodes),
not that the locks are currently held.  The reference count on the witness
object is bumped when a lock is created that uses that witness object and
released when the lock is destroyed.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Is a high witness refcount indicative of a missing unlock?

2015-03-31 Thread Lars

Hi Shane,
While our configs shoulds much the same (ignoring 10.1-stable vs current) and I 
had the nvidia driver loaded, my lockups did not involve the nvidia driver. 
That of course does not necessarily mean anything if the issue is squarly in 
zfs somewhere.

You can see the reference counts from the ddb kernel debugger (man 8 ddb) using 
the “show witness” command

Lars
 On Mar 29, 2015, at 19:14, Shane Ambler free...@shaneware.biz wrote:
 
 On 30/03/2015 05:59, Lars wrote:
 Hi, I am poking around for a cause for my repeating deadlock issues
 on my system based on r 279869. ddb show witness show the â€œvnode
 interlockâ€ and the â€œzfsâ€ locks both with reference counts over
 200K. Obviously they are related, and there is a find running (all
 the filesystems on this machine are zfs ( minus the specialty ones
 like devfs).
 
 I donâ€™t see any other withness entry with reference counts even in
 the ballpark of these two, so does this indicate that we have a
 vnode/zfs path were we donâ€™t unlock?
 
 
 I am running 10.1-STABLE and have bad locking issues. Running a witness
 kernel I got a duplicate lock from nvidia and lock order reversals
 involving zfs. Any chance your issue is related to mine?
 
 What command can give me the witness lock counts?
 
 The debug data I have collected so far is at -
 http://shaneware.biz/freebsddebugdata/
 
 The lock reversal output I had was (after uptime of about 12 mins) -
 
 Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process 
 `vnlru' to stop...done
 Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process 
 `bufdaemon' to stop...done
 Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system process 
 `syncer' to stop...
 Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 0 0 
 0 0 done
 Mar 24 00:24:25 leader kernel: All buffers synced.
 Mar 24 00:24:25 leader kernel: lock order reversal:
 Mar 24 00:24:25 leader kernel: 1st 0xf800224555f0 zfs (zfs) @ 
 /usr/src/sys/kern/vfs_mount.c:1229
 Mar 24 00:24:25 leader kernel: 2nd 0xf800222d67c8 syncer (syncer) @ 
 /usr/src/sys/kern/vfs_subr.c:2268
 Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
 Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at 
 db_trace_self_wrapper+0x2b/frame 0xfe022df6e4c0
 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 
 0xfe022df6e570
 Mar 24 00:24:25 leader kernel: witness_checkorder() at 
 witness_checkorder+0xdc2/frame 0xfe022df6e600
 Mar 24 00:24:25 leader kernel: __lockmgr_args() at __lockmgr_args+0x9ea/frame 
 0xfe022df6e740
 Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 
 0xfe022df6e760
 Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at VOP_LOCK1_APV+0xfc/frame 
 0xfe022df6e790
 Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 
 0xfe022df6e800
 Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 0xfe022df6e860
 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 
 0xfe022df6e8e0
 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 
 0xfe022df6e910
 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 
 0xfe022df6e980
 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 
 0xfe022df6e9a0
 Mar 24 00:24:25 leader kernel: amd64_syscall() at amd64_syscall+0x25a/frame 
 0xfe022df6eab0
 Mar 24 00:24:25 leader kernel: Xfast_syscall() at Xfast_syscall+0xfb/frame 
 0xfe022df6eab0
 Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, sys_reboot), 
 rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 ---
 Mar 24 00:24:25 leader kernel: lock order reversal:
 Mar 24 00:24:25 leader kernel: 1st 0xf800222d6b78 zfs (zfs) @ 
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814
 Mar 24 00:24:25 leader kernel: 2nd 0x818514a8 allproc (allproc) @ 
 /usr/src/sys/kern/kern_descrip.c:2872
 Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
 Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at 
 db_trace_self_wrapper+0x2b/frame 0xfe022df6e690
 Mar 24 00:24:25 leader kernel: kdb_backtrace() at kdb_backtrace+0x39/frame 
 0xfe022df6e740
 Mar 24 00:24:25 leader kernel: witness_checkorder() at 
 witness_checkorder+0xdc2/frame 0xfe022df6e7d0
 Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 
 0xfe022df6e810
 Mar 24 00:24:25 leader kernel: mountcheckdirs() at mountcheckdirs+0x47/frame 
 0xfe022df6e860
 Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 
 0xfe022df6e8e0
 Mar 24 00:24:25 leader kernel: vfs_unmountall() at vfs_unmountall+0x61/frame 
 0xfe022df6e910
 Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 
 0xfe022df6e980
 Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 
 0xfe022df6e9a0
 Mar 24 00:24:25 leader kernel: amd64_syscall

Re: Is a high witness refcount indicative of a missing unlock?

2015-03-29 Thread Shane Ambler


On 30/03/2015 05:59, Lars wrote:

Hi, I am poking around for a cause for my repeating deadlock issues
on my system based on r 279869. ddb show witness show the â€œvnode
interlockâ€ and the â€œzfsâ€ locks both with reference counts over
200K. Obviously they are related, and there is a find running (all
the filesystems on this machine are zfs ( minus the specialty ones
like devfs).

I donâ€™t see any other withness entry with reference counts even in
the ballpark of these two, so does this indicate that we have a
vnode/zfs path were we donâ€™t unlock?



I am running 10.1-STABLE and have bad locking issues. Running a witness
kernel I got a duplicate lock from nvidia and lock order reversals
involving zfs. Any chance your issue is related to mine?

What command can give me the witness lock counts?

The debug data I have collected so far is at -
http://shaneware.biz/freebsddebugdata/

The lock reversal output I had was (after uptime of about 12 mins) -

Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system 
process `vnlru' to stop...done
Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system 
process `bufdaemon' to stop...done
Mar 24 00:24:25 leader kernel: Waiting (max 60 seconds) for system 
process `syncer' to stop...
Mar 24 00:24:25 leader kernel: Syncing disks, vnodes remaining...0 0 0 0 
0 0 0 0 done

Mar 24 00:24:25 leader kernel: All buffers synced.
Mar 24 00:24:25 leader kernel: lock order reversal:
Mar 24 00:24:25 leader kernel: 1st 0xf800224555f0 zfs (zfs) @ 
/usr/src/sys/kern/vfs_mount.c:1229
Mar 24 00:24:25 leader kernel: 2nd 0xf800222d67c8 syncer (syncer) @ 
/usr/src/sys/kern/vfs_subr.c:2268

Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at 
db_trace_self_wrapper+0x2b/frame 0xfe022df6e4c0
Mar 24 00:24:25 leader kernel: kdb_backtrace() at 
kdb_backtrace+0x39/frame 0xfe022df6e570
Mar 24 00:24:25 leader kernel: witness_checkorder() at 
witness_checkorder+0xdc2/frame 0xfe022df6e600
Mar 24 00:24:25 leader kernel: __lockmgr_args() at 
__lockmgr_args+0x9ea/frame 0xfe022df6e740
Mar 24 00:24:25 leader kernel: vop_stdlock() at vop_stdlock+0x3c/frame 
0xfe022df6e760
Mar 24 00:24:25 leader kernel: VOP_LOCK1_APV() at 
VOP_LOCK1_APV+0xfc/frame 0xfe022df6e790
Mar 24 00:24:25 leader kernel: _vn_lock() at _vn_lock+0xaa/frame 
0xfe022df6e800
Mar 24 00:24:25 leader kernel: vputx() at vputx+0x232/frame 
0xfe022df6e860
Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x301/frame 
0xfe022df6e8e0
Mar 24 00:24:25 leader kernel: vfs_unmountall() at 
vfs_unmountall+0x61/frame 0xfe022df6e910
Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 
0xfe022df6e980
Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 
0xfe022df6e9a0
Mar 24 00:24:25 leader kernel: amd64_syscall() at 
amd64_syscall+0x25a/frame 0xfe022df6eab0
Mar 24 00:24:25 leader kernel: Xfast_syscall() at 
Xfast_syscall+0xfb/frame 0xfe022df6eab0
Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, 
sys_reboot), rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 ---

Mar 24 00:24:25 leader kernel: lock order reversal:
Mar 24 00:24:25 leader kernel: 1st 0xf800222d6b78 zfs (zfs) @ 
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1814
Mar 24 00:24:25 leader kernel: 2nd 0x818514a8 allproc (allproc) 
@ /usr/src/sys/kern/kern_descrip.c:2872

Mar 24 00:24:25 leader kernel: KDB: stack backtrace:
Mar 24 00:24:25 leader kernel: db_trace_self_wrapper() at 
db_trace_self_wrapper+0x2b/frame 0xfe022df6e690
Mar 24 00:24:25 leader kernel: kdb_backtrace() at 
kdb_backtrace+0x39/frame 0xfe022df6e740
Mar 24 00:24:25 leader kernel: witness_checkorder() at 
witness_checkorder+0xdc2/frame 0xfe022df6e7d0
Mar 24 00:24:25 leader kernel: _sx_slock() at _sx_slock+0x76/frame 
0xfe022df6e810
Mar 24 00:24:25 leader kernel: mountcheckdirs() at 
mountcheckdirs+0x47/frame 0xfe022df6e860
Mar 24 00:24:25 leader kernel: dounmount() at dounmount+0x36f/frame 
0xfe022df6e8e0
Mar 24 00:24:25 leader kernel: vfs_unmountall() at 
vfs_unmountall+0x61/frame 0xfe022df6e910
Mar 24 00:24:25 leader kernel: kern_reboot() at kern_reboot+0x540/frame 
0xfe022df6e980
Mar 24 00:24:25 leader kernel: sys_reboot() at sys_reboot+0x5a/frame 
0xfe022df6e9a0
Mar 24 00:24:25 leader kernel: amd64_syscall() at 
amd64_syscall+0x25a/frame 0xfe022df6eab0
Mar 24 00:24:25 leader kernel: Xfast_syscall() at 
Xfast_syscall+0xfb/frame 0xfe022df6eab0
Mar 24 00:24:25 leader kernel: --- syscall (55, FreeBSD ELF64, 
sys_reboot), rip = 0x40f1bc, rsp = 0x7fffe6d8, rbp = 0x7fffe7d0 ---

Mar 24 00:24:25 leader kernel: lock order reversal:
Mar 24 00:24:25 leader kernel: 1st 0xf8001ca8e240 zfs (zfs) @ 
/usr/src/sys/kern/vfs_mount.c:1229
Mar 24 00:24:25 leader kernel: 2nd 0xf8001ca8e5f0 devfs (devfs) @ 
/usr/src/sys/kern

Is a high witness refcount indicative of a missing unlock?

2015-03-29 Thread Lars

Hi,
I am poking around for a cause for my repeating deadlock issues on my system 
based on r 279869. ddb show witness show the “vnode interlock” and the “zfs” 
locks both with reference counts over 200K. Obviously they are related, and 
there is a find running (all the filesystems on this machine are zfs ( minus 
the specialty ones like devfs).

I don’t see any other withness entry with reference counts even in the ballpark 
of these two, so does this indicate that we have a vnode/zfs path were we don’t 
unlock?

Thanks,
Lars
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-03 Thread Andriy Gapon

On 03/12/2014 04:33, Julian Elischer wrote:
 On 12/3/14, 12:24 AM, Warner Losh wrote:
 On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote:

 On 12/1/14, 11:39 PM, John Baldwin wrote:
 On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:
 Do we need to compile all modules with witness definitions when
 linking with a kernel compiled with witness?
 This was true at one stage but I remember some work was done to make
 them compatible.
 You should not need this.  modules always call functions in the kernel for
 lock operations and this functions are what invoke WITNESS.

 that's what I thought but empirical evidence disagrees.
 I'll try some more cases.
 I swap back and forth all the time between the two. Kernel modules don’t
 change when you compile them with WITNESS or without.
 not entirely..
 hwpmc.ko: U witness_restore
 hwpmc.ko: U witness_save
 zfs.ko: U witness_restore
 zfs.ko: U witness_save

Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-03 Thread Julian Elischer


On 12/3/14, 7:26 PM, Andriy Gapon wrote:

On 03/12/2014 04:33, Julian Elischer wrote:

On 12/3/14, 12:24 AM, Warner Losh wrote:

On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote:

On 12/1/14, 11:39 PM, John Baldwin wrote:

On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:

Do we need to compile all modules with witness definitions when
linking with a kernel compiled with witness?
This was true at one stage but I remember some work was done to make
them compatible.

You should not need this.  modules always call functions in the kernel for
lock operations and this functions are what invoke WITNESS.


that's what I thought but empirical evidence disagrees.
I'll try some more cases.

I swap back and forth all the time between the two. Kernel modules don’t
change when you compile them with WITNESS or without.

not entirely..
hwpmc.ko: U witness_restore
hwpmc.ko: U witness_save
zfs.ko: U witness_restore
zfs.ko: U witness_save

Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT.


that's a good observation. I'll take a look a that later.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-03 Thread John Baldwin

On Wednesday, December 03, 2014 09:53:20 PM Julian Elischer wrote:
 On 12/3/14, 7:26 PM, Andriy Gapon wrote:
  On 03/12/2014 04:33, Julian Elischer wrote:
  On 12/3/14, 12:24 AM, Warner Losh wrote:
  On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org
  wrote:
  
  On 12/1/14, 11:39 PM, John Baldwin wrote:
  On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:
  Do we need to compile all modules with witness definitions when
  linking with a kernel compiled with witness?
  This was true at one stage but I remember some work was done to make
  them compatible.
  
  You should not need this.  modules always call functions in the kernel
  for
  lock operations and this functions are what invoke WITNESS.
  
  that's what I thought but empirical evidence disagrees.
  I'll try some more cases.
  
  I swap back and forth all the time between the two. Kernel modules don’t
  change when you compile them with WITNESS or without.
  
  not entirely..
  hwpmc.ko: U witness_restore
  hwpmc.ko: U witness_save
  zfs.ko: U witness_restore
  zfs.ko: U witness_save
  
  Seems like the problem affects modules that use DROP_GIANT / PICKUP_GIANT.
 
 that's a good observation. I'll take a look a that later.

Yes, that isn't really intended to be used publically.

The pmc one is stale as system calls haven't run with Giant in several 
releases.

All the ones for g_topology_lock() also seem to be broken-by-designed.  There 
is no good reason I can think of for g_topology_lock() to assert that Giant 
isn't held.  I suspect phk@ just wanted to force geom to be locked without 
Giant, but I'm not sure that is the best way to achieve that?  Poul, is that 
correct?

If you fix that you can remove almost all of the DROP_GIANT/PICKUP_GIANT in 
the tree.  They should really only be in the _sleep() and cv_wait_*() 
functions where they are used to give Giant its special property of being 
dropped while asleep.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-02 Thread Warner Losh


 On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote:
 
 On 12/1/14, 11:39 PM, John Baldwin wrote:
 On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:
 Do we need to compile all modules with witness definitions when
 linking with a kernel compiled with witness?
 This was true at one stage but I remember some work was done to make
 them compatible.
 You should not need this.  modules always call functions in the kernel for
 lock operations and this functions are what invoke WITNESS.
 
 that's what I thought but empirical evidence disagrees.
 I'll try some more cases.

I swap back and forth all the time between the two. Kernel modules don’t
change when you compile them with WITNESS or without.

Warner



signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: witness and modules.

2014-12-02 Thread Julian Elischer


On 12/3/14, 12:24 AM, Warner Losh wrote:

On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote:

On 12/1/14, 11:39 PM, John Baldwin wrote:

On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:

Do we need to compile all modules with witness definitions when
linking with a kernel compiled with witness?
This was true at one stage but I remember some work was done to make
them compatible.

You should not need this.  modules always call functions in the kernel for
lock operations and this functions are what invoke WITNESS.


that's what I thought but empirical evidence disagrees.
I'll try some more cases.

I swap back and forth all the time between the two. Kernel modules don’t
change when you compile them with WITNESS or without.

Warner


it's a 10.0 based system.

the zfs kernel module grew a reference to witness_restore and 
witness_save  when it was compiled with witness in opt_global.h.
I certainly wouldn't load with a standard kernel. it may have been 
somethign funny with the zfs compat layer stuff, and I'm pretty sure 
people don't use zfs as a module much, but it was definitely there. 
note: it'd probably work the other way around.. a non-witness module 
would probably load on a witness kernel.. none of the other kernel 
modules had that extra reference so I think it's specific to zfs.   
Maybe to the way it was compiled.. the sources are pretty standard but 
the makefiles are a bit hacked..




___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-02 Thread Julian Elischer


On 12/3/14, 12:24 AM, Warner Losh wrote:

On Dec 1, 2014, at 10:08 PM, Julian Elischer jul...@freebsd.org wrote:

On 12/1/14, 11:39 PM, John Baldwin wrote:

On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:

Do we need to compile all modules with witness definitions when
linking with a kernel compiled with witness?
This was true at one stage but I remember some work was done to make
them compatible.

You should not need this.  modules always call functions in the kernel for
lock operations and this functions are what invoke WITNESS.


that's what I thought but empirical evidence disagrees.
I'll try some more cases.

I swap back and forth all the time between the two. Kernel modules don’t
change when you compile them with WITNESS or without.

not entirely..
hwpmc.ko: U witness_restore
hwpmc.ko: U witness_save
zfs.ko: U witness_restore
zfs.ko: U witness_save

Warner



___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-01 Thread John Baldwin

On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:
 Do we need to compile all modules with witness definitions when
 linking with a kernel compiled with witness?
 This was true at one stage but I remember some work was done to make
 them compatible.

You should not need this.  modules always call functions in the kernel for 
lock operations and this functions are what invoke WITNESS.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness and modules.

2014-12-01 Thread Julian Elischer


On 12/1/14, 11:39 PM, John Baldwin wrote:

On Friday, November 28, 2014 11:08:35 PM Julian Elischer wrote:

Do we need to compile all modules with witness definitions when
linking with a kernel compiled with witness?
This was true at one stage but I remember some work was done to make
them compatible.

You should not need this.  modules always call functions in the kernel for
lock operations and this functions are what invoke WITNESS.


that's what I thought but empirical evidence disagrees.
I'll try some more cases.

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

witness and modules.

2014-11-28 Thread Julian Elischer

Do we need to compile all modules with witness definitions when 
linking with a kernel compiled with witness?
This was true at one stage but I remember some work was done to make 
them compatible.




___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-24 Thread Ellis H. Wilson III


On 11/22/2014 03:51 PM, Benjamin Kaduk wrote:

On Fri, 21 Nov 2014, Ellis H. Wilson III wrote:


Before I start, and this is mainly geared to my responder Benjamin Kaduk,
based on your response, are you suggesting that the cnputc WITNESS panic you
expected to happen is now completely unavoidable in FreeBSD 10?  I.E., is this
a spinlock that WITNESS falls over each time but that is provably deadlock
free that the developers have decided cannot be BLESSED for some reason?


https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html
looks to be a better explanation than the previous link I sent ... in
short, console output is hard.


I guess I just can't wrap my head around why we would ever move to a regime
where SKIPSPIN is the default for testing...  That just seems like an open
invitation for introducing spinlock regressions.


I don't think anyone made the conscious decision to do that, it just
happened by default as no one spent the time to fix the aforementioned
issue.


Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS
debugging is how exactly to interpret the output if I see a stacktrace and
then a LOR output like the following:

lock order reversal:
   1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @
/usr/src/sys/dev/random/random_harvestq.c:198
   2nd 0x813b6208 scrlock (scrlock) @
/usr/src/sys/dev/syscons/syscons.c:2682

Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then
#2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without
first getting harvest_mtx?  Or the reverse (WITNESS previously recorded
scrlock and then harvest and the lines it spit out were the offenders?)


I believe it is the latter (the ordering being printed is the bad one
which caused WITNESS to complain).


Thanks so much for the additional info Ben.  This fleshes out the 
history of this issue for me significantly.  I have filed a bug on this at:


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195262

Xin Li was able to identify the ordering that caused the problem and 
proposed a possible patch to fix it.  I can confirm that now I'm booting 
with solely WITNESS (i.e., not WITNESS_SKIPSPIN) without panic.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-22 Thread Benjamin Kaduk

On Fri, 21 Nov 2014, Ellis H. Wilson III wrote:

 Before I start, and this is mainly geared to my responder Benjamin Kaduk,
 based on your response, are you suggesting that the cnputc WITNESS panic you
 expected to happen is now completely unavoidable in FreeBSD 10?  I.E., is this
 a spinlock that WITNESS falls over each time but that is provably deadlock
 free that the developers have decided cannot be BLESSED for some reason?

https://lists.freebsd.org/pipermail/freebsd-current/2012-January/031316.html
looks to be a better explanation than the previous link I sent ... in
short, console output is hard.

 I guess I just can't wrap my head around why we would ever move to a regime
 where SKIPSPIN is the default for testing...  That just seems like an open
 invitation for introducing spinlock regressions.

I don't think anyone made the conscious decision to do that, it just
happened by default as no one spent the time to fix the aforementioned
issue.

 Moving onto the LORs I'm seeing, a question I have as a newbie to WITNESS
 debugging is how exactly to interpret the output if I see a stacktrace and
 then a LOR output like the following:

 lock order reversal:
   1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @
 /usr/src/sys/dev/random/random_harvestq.c:198
   2nd 0x813b6208 scrlock (scrlock) @
 /usr/src/sys/dev/syscons/syscons.c:2682

 Does this mean WITNESS has already stored an ordering of #1 harvest_mtx then
 #2 scp-scr_lock, and somewhere somebody tried to lock scp-scr_lock without
 first getting harvest_mtx?  Or the reverse (WITNESS previously recorded
 scrlock and then harvest and the lines it spit out were the offenders?)

I believe it is the latter (the ordering being printed is the bad one
which caused WITNESS to complain).

-Ben
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-21 Thread Ellis H. Wilson III


On 11/18/2014 08:35 PM, Ellis H. Wilson III wrote:

If nobody has seen these before, I'll try and put together fixes for
 them.  Please somebody speak up if you have seen them or have useful
 information for me to go on in my patches.


I've started to dig into the relevant code in 
sys/dev/random/random_harvestq.c, sys/dev/syscons/syscons.c, and 
sys/kern/subr_sleepqueue.c.


Before I start, and this is mainly geared to my responder Benjamin 
Kaduk, based on your response, are you suggesting that the cnputc 
WITNESS panic you expected to happen is now completely unavoidable in 
FreeBSD 10?  I.E., is this a spinlock that WITNESS falls over each time 
but that is provably deadlock free that the developers have decided 
cannot be BLESSED for some reason?


I guess I just can't wrap my head around why we would ever move to a 
regime where SKIPSPIN is the default for testing...  That just seems 
like an open invitation for introducing spinlock regressions.


Moving onto the LORs I'm seeing, a question I have as a newbie to 
WITNESS debugging is how exactly to interpret the output if I see a 
stacktrace and then a LOR output like the following:


lock order reversal:
  1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) 
@ /usr/src/sys/dev/random/random_harvestq.c:198
  2nd 0x813b6208 scrlock (scrlock) @ 
/usr/src/sys/dev/syscons/syscons.c:2682


Does this mean WITNESS has already stored an ordering of #1 harvest_mtx 
then #2 scp-scr_lock, and somewhere somebody tried to lock 
scp-scr_lock without first getting harvest_mtx?  Or the reverse 
(WITNESS previously recorded scrlock and then harvest and the lines it 
spit out were the offenders?)


Along those lines, in 10.0 and 10.1 releases I get two LORs showing up 
almost on-top of each other, with the other LOR showing up as:


lock order reversal:
 1st 0x81633d88 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81424bb8 sleepq chain (sleepq chain) @ 
/usr/src/sys/kern/subr_sleepqueue.c:240


This seems like maybe two LORs are detected at the same time, which 
perhaps suggests that the harvest_mtx should have been taken /after/ 
both of the other locks mentioned (scrlock and sleepq).


I'm happy to do the legwork implementing, testing, and submitting a 
patch for this, but I would really appreciate a pointer in the right 
direction from somebody who already has handled some LORs before.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Ellis H. Wilson III


Hi all,

I'm observing the following two WITNESS LORs being thrown upon boot-up 
of 10.0 and I was tracking current, hoping they would go away by 10.1, 
but it seems they persist as shown below.  I suspect this is because 
current is being built with WITNESS on but also with SKIPSPIN on.  So 
these issues are unlikely to show up for any devs but those who 
specifically enable WITNESS and disable SKIPSPIN like myself.  At my 
work we would greatly like to do our debugging with checking of 
spin-locking order included and panicing upon LOR detection.  That's not 
possible with these in existence.


Specifically:

snip
Event timer RTC frequency 32768 Hz quality 0
ppc0: cannot reserve I/O port range
Timecounters tick every 10.000 msec
pcm0: measured ac97 link rate at 29212 Hz
lock order reversal:
lock order reversal:
 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81475108 scrlock (scrlock) @ 
/usr/src/sys/dev/syscons/syscons.c:2694

KDB: stack backtrace:
#0 0x8092f020 at kdb_backtrace+0x60
#1 0x80944204 at witness_checkorder+0xc04
#2 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#3 0x80738b80 at sc_puts+0xb0
#4 0x8073c4e5 at sc_cnputc+0xe5
#5 0x808b6cf0 at cnputc+0x80
#6 0x808b6fa8 at cnputs+0x58
#7 0x80934011 at putchar+0x151
#8 0x80932c36 at kvprintf+0xf6
#9 0x809344ad at _vprintf+0x8d
#10 0x809346c3 at printf+0x53
#11 0x80943fe0 at witness_checkorder+0x9e0
#12 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#13 0x8090019b at msleep_spin_sbt+0x5b
#14 0x80698198 at random_kthread+0x78
#15 0x808cd941 at fork_exit+0x71
#16 0x80caee7e at fork_trampoline+0xe
 1st 0x8177a1a0 entropy harvest mutex (entropy harvest mutex) @ 
/usr/src/sys/dev/random/random_harvestq.c:198
 2nd 0x81568390 sleepq chain (sleepq chain) @ 
/usr/src/sys/kern/subr_sleepqueue.c:237

KDB: stack backtrace:
#0 0x8092f020 at kdb_backtrace+0x60
#1 0x80944204 at witness_checkorder+0xc04
#2 0x808e6d5e at __mtx_lock_spin_flags+0x4e
#3 0x8090019b at msleep_spin_sbt+0x5b
#4 0x80698198 at random_kthread+0x78
#5 0x808cd941 at fork_exit+0x71
#6 0x80caee7e at fork_trampoline+0xe
usbus0: 12Mbps Full Speed USB v1.0
ugen0.1: Apple at usbus0
uhub0: Apple OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1 on usbus0
ada0 at ata0 bus 0 scbus0 target 0 lun 0
ada0: VBOX HARDDISK 1.0 ATA-6 device
snip

This is on a virtual machine via VirtualBox, but we have reproduced it 
on real hardware.  Regarding the latter reversal, I have dug in the code 
and see the calls go about in the following order:


In sys/dev/random/random_harvestq.c:
 198 mtx_lock_spin(harvest_mtx);
 209 msleep_spin_sbt(random_kthread_control, harvest_mtx,
Now in sys/kern/kern_synch.c:
 297 sleepq_lock(ident);
Now in sys/kern/subr_sleepqueue.c
 237 mtx_lock_spin(sc-sc_lock);

The above line numbers might be a hair off from head since I'm looking 
at code synced last week.


I'm happy to open a bug on this if that's the desirable course of 
action, or to even assist in fixing it, but I wanted to first see if 
anybody knew about these already (they didn't show up on the known LORs 
list on quick perusal) or if this was simply a case of WITNESS being 
overly conservative and throwing false positives.  If this belongs on a 
different list just let me know.


Thanks!

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Benjamin Kaduk

On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

 This is on a virtual machine via VirtualBox, but we have reproduced it on real
 hardware.  Regarding the latter reversal, I have dug in the code and see the
 calls go about in the following order:

 In sys/dev/random/random_harvestq.c:
  198 mtx_lock_spin(harvest_mtx);
  209 msleep_spin_sbt(random_kthread_control, harvest_mtx,
 Now in sys/kern/kern_synch.c:
  297 sleepq_lock(ident);
 Now in sys/kern/subr_sleepqueue.c
  237 mtx_lock_spin(sc-sc_lock);

 The above line numbers might be a hair off from head since I'm looking at code
 synced last week.

 I'm happy to open a bug on this if that's the desirable course of action, or
 to even assist in fixing it, but I wanted to first see if anybody knew about
 these already (they didn't show up on the known LORs list on quick perusal) or
 if this was simply a case of WITNESS being overly conservative and throwing
 false positives.  If this belongs on a different list just let me know.

I don't believe it is known, and this list is fine.

 I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
 and I was tracking current, hoping they would go away by 10.1, but it seems
 they persist as shown below.  I suspect this is because current is being built
 with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
 show up for any devs but those who specifically enable WITNESS and disable
 SKIPSPIN like myself.  At my work we would greatly like to do our debugging
 with checking of spin-locking order included and panicing upon LOR detection.
 That's not possible with these in existence.

However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.

-Ben
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS observes 2 LORs on Boot of Release 10.1

2014-11-18 Thread Ellis H. Wilson III


On 11/18/2014 05:39 PM, Benjamin Kaduk wrote:

On Tue, 18 Nov 2014, Ellis H. Wilson III wrote:

I'm observing the following two WITNESS LORs being thrown upon boot-up of 10.0
and I was tracking current, hoping they would go away by 10.1, but it seems
they persist as shown below.  I suspect this is because current is being built
with WITNESS on but also with SKIPSPIN on.  So these issues are unlikely to
show up for any devs but those who specifically enable WITNESS and disable
SKIPSPIN like myself.  At my work we would greatly like to do our debugging
with checking of spin-locking order included and panicing upon LOR detection.
That's not possible with these in existence.


However, I was under the impression that a kernel built with WITNESS and
without WITNESS_SKIPSPIN would panic on boot on the cnputs_mutx (see,
e.g.,
https://lists.freebsd.org/pipermail/freebsd-stable/2014-January/076864.html).
So, (1) I'm surprised you can boot it, and (2) that would explain why no
one else has been using it.


That's a very interesting thread.  I've seen another where a fellow 
developer suggested just throwing on the WITNESS_SKIPSPIN flag to 
solve the issue.  I can't say that I agree with the approach, but I 
understand.  I'd be willing to tackle a bit of WITNESS massaging to help 
it be instructed about known false positives better, if that's desirable.


Why I'm able to boot however, is simple: I haven't enabled a full suite 
of debugging flags, KDB/DDB being the key ones that cause a panic to 
occur on a failure like the one I've seen.  We originally were seeing 
the panic so WITNESS in its entirety was shut off.  I was asked to try 
and get that back on-track, so to start I at least wanted to see how 
many LORs we were dealing with on boot.  2 is apparently that magic 
number, and maybe 3 if the cputs issue you refer to is still possible to 
be hit in the 10.1 wild.


If nobody has seen these before, I'll try and put together fixes for 
them.  Please somebody speak up if you have seen them or have useful 
information for me to go on in my patches.


Thanks,

ellis
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

WITNESS: unable to allocate a new witness object

2013-10-15 Thread Anton Shterenlikht

I'm trying to set up textdump(4).

On boot I see:

WITNESS: unable to allocate a new witness object

also

Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 
0.002434387 s
 kickstart.

Does the first indicate I haven't set up
WITNESS correctly?

What does the second tell me?

Thanks

Anton
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: WITNESS: unable to allocate a new witness object

2013-10-15 Thread Davide Italiano

On Tue, Oct 15, 2013 at 4:17 PM, Anton Shterenlikht me...@bris.ac.uk wrote:
 I'm trying to set up textdump(4).

 On boot I see:

 WITNESS: unable to allocate a new witness object

 also

It means that you run out of WITNESS object on the free list.


 Expensive timeout(9) function: 0x9ffc00e222e0(0xa09ed320) 
 0.002434387 s
  kickstart.


It's output from DIAGNOSTIC, it's triggered when the callout handler
execution time exceeds a given threshold. You can safely ignore it.


Also, please stop spamming on mailing lists with new posts. They more
or less all refers to the same problem. Keeps posting doesn't
encourage people to look at it, neither it helps to have it solved
more quickly.

Thanks,

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Witness message about lock order reversal on 10 (head)

2013-08-19 Thread Yuri


I got these messages on 10 head, rev.254235, during 'filesystem full' condition.

Yuri


= log =
lock order reversal:
 1st 0xff80f7432470 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3054
 2nd 0xfe00075b5600 dirhash (dirhash) @ 
/usr/src/sys/ufs/ufs/ufs_dirhash.c:284
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000284440
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff80002844f0
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000284580
_sx_xlock() at _sx_xlock+0x75/frame 0xff80002845c0
ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xff8000284600
ufs_direnter() at ufs_direnter+0x688/frame 0xff80002846c0
ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80002848c0
VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80002848f0
kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000284ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff8000284bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000284bf0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 
0x7fffd798, rbp = 0x7fffdc70 ---
lock order reversal:
 1st 0xfe0007633840 filedesc structure (filedesc structure) @ 
/usr/src/sys/kern/kern_descrip.c:1184
 2nd 0xfe0007a45240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:4346
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff800031f6b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff800031f760
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff800031f7f0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff800031f920
ffs_lock() at ffs_lock+0x84/frame 0xff800031f970
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff800031f9a0
_vn_lock() at _vn_lock+0xab/frame 0xff800031fa10
knlist_remove_kq() at knlist_remove_kq+0x82/frame 0xff800031fa40
knote_fdclose() at knote_fdclose+0xc8/frame 0xff800031fa90
closefp() at closefp+0x64/frame 0xff800031fae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff800031fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff800031fbf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80164537a, rsp = 
0x7f7fbf08, rbp = 0x7f7fbf20 ---
pid 983 (sendmail), uid 25 inumber 473785 on /: filesystem full
pid 1101 (dd), uid 2 inumber 426338 on /: filesystem full
lock order reversal:
 1st 0xfe0007cbc240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099
 2nd 0xff80f7894338 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262
 3rd 0xfe003b531418 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff8000378e60
kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8000378f10
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000378fa0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff80003790d0
ffs_lock() at ffs_lock+0x84/frame 0xff8000379120
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff8000379150
_vn_lock() at _vn_lock+0xab/frame 0xff80003791c0
vget() at vget+0x70/frame 0xff8000379210
vfs_hash_get() at vfs_hash_get+0xf5/frame 0xff8000379260
ffs_vgetf() at ffs_vgetf+0x41/frame 0xff80003792f0
softdep_sync_buf() at softdep_sync_buf+0x2e4/frame 0xff80003793a0
ffs_syncvnode() at ffs_syncvnode+0x258/frame 0xff8000379420
ffs_truncate() at ffs_truncate+0x5ca/frame 0xff8000379600
ufs_direnter() at ufs_direnter+0x891/frame 0xff80003796c0
ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80003798c0
VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80003798f0
kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000379ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff8000379bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000379bf0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 
0x7fffd898, rbp = 0x7fffd970 ---

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Witness message about lock order reversal on 10 (head)

2013-08-19 Thread Dan Mack



It might be the same false positive I saw a couple weeks ago ... Davide said to 
me:

 | The LOR is a false positive.
 | See the comment in sys/ufs/ufs/ufs_dirhash.c
 | Also, switching motherboards is not related to this in any way. You'll
 | eventually hit that LOR report, unless you disabled WITNESS.


Thanks,

--
Davide



On Mon, 19 Aug 2013, Yuri wrote:

I got these messages on 10 head, rev.254235, during 'filesystem full' 
condition.


Yuri


= log =
lock order reversal:
1st 0xff80f7432470 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3054
2nd 0xfe00075b5600 dirhash (dirhash) @ 
/usr/src/sys/ufs/ufs/ufs_dirhash.c:284

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xff8000284440

kdb_backtrace() at kdb_backtrace+0x39/frame 0xff80002844f0
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000284580
_sx_xlock() at _sx_xlock+0x75/frame 0xff80002845c0
ufsdirhash_add() at ufsdirhash_add+0x3b/frame 0xff8000284600
ufs_direnter() at ufs_direnter+0x688/frame 0xff80002846c0
ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80002848c0
VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80002848f0
kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000284ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff8000284bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000284bf0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 
0x7fffd798, rbp = 0x7fffdc70 ---

lock order reversal:
1st 0xfe0007633840 filedesc structure (filedesc structure) @ 
/usr/src/sys/kern/kern_descrip.c:1184

2nd 0xfe0007a45240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:4346
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xff800031f6b0

kdb_backtrace() at kdb_backtrace+0x39/frame 0xff800031f760
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff800031f7f0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff800031f920
ffs_lock() at ffs_lock+0x84/frame 0xff800031f970
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff800031f9a0
_vn_lock() at _vn_lock+0xab/frame 0xff800031fa10
knlist_remove_kq() at knlist_remove_kq+0x82/frame 0xff800031fa40
knote_fdclose() at knote_fdclose+0xc8/frame 0xff800031fa90
closefp() at closefp+0x64/frame 0xff800031fae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff800031fbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff800031fbf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80164537a, rsp = 
0x7f7fbf08, rbp = 0x7f7fbf20 ---

pid 983 (sendmail), uid 25 inumber 473785 on /: filesystem full
pid 1101 (dd), uid 2 inumber 426338 on /: filesystem full
lock order reversal:
1st 0xfe0007cbc240 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099
2nd 0xff80f7894338 bufwait (bufwait) @ 
/usr/src/sys/ufs/ffs/ffs_vnops.c:262

3rd 0xfe003b531418 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2099
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xff8000378e60

kdb_backtrace() at kdb_backtrace+0x39/frame 0xff8000378f10
witness_checkorder() at witness_checkorder+0xd4f/frame 0xff8000378fa0
__lockmgr_args() at __lockmgr_args+0x6f2/frame 0xff80003790d0
ffs_lock() at ffs_lock+0x84/frame 0xff8000379120
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xf5/frame 0xff8000379150
_vn_lock() at _vn_lock+0xab/frame 0xff80003791c0
vget() at vget+0x70/frame 0xff8000379210
vfs_hash_get() at vfs_hash_get+0xf5/frame 0xff8000379260
ffs_vgetf() at ffs_vgetf+0x41/frame 0xff80003792f0
softdep_sync_buf() at softdep_sync_buf+0x2e4/frame 0xff80003793a0
ffs_syncvnode() at ffs_syncvnode+0x258/frame 0xff8000379420
ffs_truncate() at ffs_truncate+0x5ca/frame 0xff8000379600
ufs_direnter() at ufs_direnter+0x891/frame 0xff80003796c0
ufs_mkdir() at ufs_mkdir+0x863/frame 0xff80003798c0
VOP_MKDIR_APV() at VOP_MKDIR_APV+0xf0/frame 0xff80003798f0
kern_mkdirat() at kern_mkdirat+0x21a/frame 0xff8000379ae0
amd64_syscall() at amd64_syscall+0x265/frame 0xff8000379bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff8000379bf0
--- syscall (136, FreeBSD ELF64, sys_mkdir), rip = 0x800931e9a, rsp = 
0x7fffd898, rbp = 0x7fffd970 ---


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org




dan
--
Dan Mack

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2013-01-14 Thread John Baldwin

On Saturday, November 24, 2012 10:01:39 am Attilio Rao wrote:
 On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote:
  Today I saw a spurious witness warning for acquiring duplicate lock of
  same type.  The root cause is that when running mtx_destroy on a spinlock
  that is held by the current thread, mtx_destroy calls spinlock_exit()
  before calling WITNESS_UNLOCK, which opens up a window in which the CPU can
  be interrupted and attempt to acquire another spinlock of the same type as
  the one being destroyed.  This patch should fix it:
 
 I seriously wonder why right now we don't assume the lock is unheld.
 There are likely historically reasons for that, but I would like to
 know which one are those and eventually fix them out.
 FWIK, all the other locking primitives assume the lock is already
 unheld when destroying and I think it would be good to have that for
 mutexes as well.

That is simply behavior we inherited from BSD/OS.  I didn't find it
all that useful so all of the other locking primitives I've added
since then have not had this behavior.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2013-01-11 Thread John Baldwin

On Friday, November 23, 2012 10:08:28 PM Ryan Stone wrote:
 Today I saw a spurious witness warning for acquiring duplicate lock of
 same type.  The root cause is that when running mtx_destroy on a spinlock
 that is held by the current thread, mtx_destroy calls spinlock_exit()
 before calling WITNESS_UNLOCK, which opens up a window in which the CPU can
 be interrupted and attempt to acquire another spinlock of the same type as
 the one being destroyed.  This patch should fix it:
 
 diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
 index 2f13863..96f43f8 100644
 --- a/sys/kern/kern_mutex.c
 +++ b/sys/kern/kern_mutex.c
 @@ -918,16 +918,16 @@ _mtx_destroy(volatile uintptr_t *c)
 else {
 MPASS((m-mtx_lock  (MTX_RECURSED|MTX_CONTESTED)) == 0);
 
 +   lock_profile_release_lock(m-lock_object);
 +   /* Tell witness this isn't locked to make it happy. */
 +   WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__,
 +   __LINE__);
 +
 /* Perform the non-mtx related part of mtx_unlock_spin().
 */ if (LOCK_CLASS(m-lock_object) == lock_class_mtx_spin)
 spinlock_exit();
 else
 curthread-td_locks--;
 -
 -   lock_profile_release_lock(m-lock_object);
 -   /* Tell witness this isn't locked to make it happy. */
 -   WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__,
 -   __LINE__);
 }
 
 m-mtx_lock = MTX_DESTROYED

Ah, I would tweak this slightly perhaps to match mtx_unlock() and 
mtx_unlock_spin():

if (LOCK_CLASS() == lock_class_mtx_sleep)
curthread-td_locks--;

/* lock profile and witness stuff */

if (LOCK_CLASS() == lock_class_mtx_spin)
spinlock_exit();

You are correct that it is broken for the spin case.

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2012-11-25 Thread Attilio Rao

On Sat, Nov 24, 2012 at 3:01 PM, Attilio Rao atti...@freebsd.org wrote:
 On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote:
 Today I saw a spurious witness warning for acquiring duplicate lock of
 same type.  The root cause is that when running mtx_destroy on a spinlock
 that is held by the current thread, mtx_destroy calls spinlock_exit()
 before calling WITNESS_UNLOCK, which opens up a window in which the CPU can
 be interrupted and attempt to acquire another spinlock of the same type as
 the one being destroyed.  This patch should fix it:

 I seriously wonder why right now we don't assume the lock is unheld.
 There are likely historically reasons for that, but I would like to
 know which one are those and eventually fix them out.
 FWIK, all the other locking primitives assume the lock is already
 unheld when destroying and I think it would be good to have that for
 mutexes as well.

 Can you please show which lock triggers the panic you saw?

Ryan,
however I'm sure this patch would introduce a major switch in our
KPI/POLA and eventually it would be a bigger work.

Your patch is certainly right and I think you should commit it for the
time being. For the long-term, maybe you would like to work on such a
patch, rather.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2012-11-24 Thread Attilio Rao

On Sat, Nov 24, 2012 at 3:08 AM, Ryan Stone ryst...@gmail.com wrote:
 Today I saw a spurious witness warning for acquiring duplicate lock of
 same type.  The root cause is that when running mtx_destroy on a spinlock
 that is held by the current thread, mtx_destroy calls spinlock_exit()
 before calling WITNESS_UNLOCK, which opens up a window in which the CPU can
 be interrupted and attempt to acquire another spinlock of the same type as
 the one being destroyed.  This patch should fix it:

I seriously wonder why right now we don't assume the lock is unheld.
There are likely historically reasons for that, but I would like to
know which one are those and eventually fix them out.
FWIK, all the other locking primitives assume the lock is already
unheld when destroying and I think it would be good to have that for
mutexes as well.

Can you please show which lock triggers the panic you saw?

Thanks,
Attilio



-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2012-11-24 Thread Ryan Stone

On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote:

 I seriously wonder why right now we don't assume the lock is unheld.
 There are likely historically reasons for that, but I would like to
 know which one are those and eventually fix them out.
 FWIK, all the other locking primitives assume the lock is already
 unheld when destroying and I think it would be good to have that for
 mutexes as well.

 Can you please show which lock triggers the panic you saw?

 Thanks,
 Attilio


It was taskqueue_free:

void
taskqueue_free(struct taskqueue *queue)
{

TQ_LOCK(queue);
queue-tq_flags = ~TQ_FLAGS_ACTIVE;
taskqueue_terminate(queue-tq_threads, queue);
KASSERT(TAILQ_EMPTY(queue-tq_active), (Tasks still running?));
KASSERT(queue-tq_callouts == 0, (Armed timeout tasks));
mtx_destroy(queue-tq_mutex);
free(queue-tq_threads, M_TASKQUEUE);
free(queue, M_TASKQUEUE);
}
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2012-11-24 Thread Attilio Rao

On Sat, Nov 24, 2012 at 3:46 PM, Ryan Stone ryst...@gmail.com wrote:

 On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote:

 I seriously wonder why right now we don't assume the lock is unheld.
 There are likely historically reasons for that, but I would like to
 know which one are those and eventually fix them out.
 FWIK, all the other locking primitives assume the lock is already
 unheld when destroying and I think it would be good to have that for
 mutexes as well.

 Can you please show which lock triggers the panic you saw?

 Thanks,
 Attilio


 It was taskqueue_free:

taskqueue_free() must not be called in places where there are still
races, so the lock is not really meaningful and should be acquired.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: Spurious witness warning when destroying spin mtx

2012-11-24 Thread Attilio Rao

On Sat, Nov 24, 2012 at 3:51 PM, Attilio Rao atti...@freebsd.org wrote:
 On Sat, Nov 24, 2012 at 3:46 PM, Ryan Stone ryst...@gmail.com wrote:

 On Sat, Nov 24, 2012 at 10:01 AM, Attilio Rao atti...@freebsd.org wrote:

 I seriously wonder why right now we don't assume the lock is unheld.
 There are likely historically reasons for that, but I would like to
 know which one are those and eventually fix them out.
 FWIK, all the other locking primitives assume the lock is already
 unheld when destroying and I think it would be good to have that for
 mutexes as well.

 Can you please show which lock triggers the panic you saw?

 Thanks,
 Attilio


 It was taskqueue_free:

 taskqueue_free() must not be called in places where there are still
 races, so the lock is not really meaningful and should be acquired.

Herm, I mean to say after taskqueue_termintate() returns must not be races

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Spurious witness warning when destroying spin mtx

2012-11-23 Thread Ryan Stone

Today I saw a spurious witness warning for acquiring duplicate lock of
same type.  The root cause is that when running mtx_destroy on a spinlock
that is held by the current thread, mtx_destroy calls spinlock_exit()
before calling WITNESS_UNLOCK, which opens up a window in which the CPU can
be interrupted and attempt to acquire another spinlock of the same type as
the one being destroyed.  This patch should fix it:

diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
index 2f13863..96f43f8 100644
--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -918,16 +918,16 @@ _mtx_destroy(volatile uintptr_t *c)
else {
MPASS((m-mtx_lock  (MTX_RECURSED|MTX_CONTESTED)) == 0);

+   lock_profile_release_lock(m-lock_object);
+   /* Tell witness this isn't locked to make it happy. */
+   WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__,
+   __LINE__);
+
/* Perform the non-mtx related part of mtx_unlock_spin(). */
if (LOCK_CLASS(m-lock_object) == lock_class_mtx_spin)
spinlock_exit();
else
curthread-td_locks--;
-
-   lock_profile_release_lock(m-lock_object);
-   /* Tell witness this isn't locked to make it happy. */
-   WITNESS_UNLOCK(m-lock_object, LOP_EXCLUSIVE, __FILE__,
-   __LINE__);
}

m-mtx_lock = MTX_DESTROYED
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-05-17 Thread Andriy Gapon

on 25/01/2012 23:52 Andriy Gapon said the following:
 on 24/01/2012 14:32 Gleb Smirnoff said the following:
 Yes, now:

 Rebooting...
 lock order reversal:
  1st 0x80937140 smp rendezvous (smp rendezvous) @ 
 /usr/src/head/sys/kern/kern_shutdown.c:542
  2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ 
 /usr/src/head/sys/dev/uart/uart_cpu.h:92
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
 /usr/src/head/sys/kern/kern_cons.c:500
 
 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new
 details...
 
 It's still intriguing to me why the LOR *doesn't* happen [*] with
 stop_scheduler_on_panic=0.  But I don't see a productive way to pursue this
 investigation further.

Salve Glebius!
After your recent nudging I took yet another look at this issue and it seems 
that
I have some findings.

For those who might get interested here is a convenience reference to the whole
thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307

A short summary.
Prerequisites: an SMP x86 system, its kernel is configured with WITNESS 
!WITNESS_SKIPSPIN (this is important) and a system uses serial console via uart.
Then, if stop_scheduler_on_panic is set to zero the system can be rebooted 
without
a problem.  On the other hand, if stop_scheduler_on_panic is enabled, then the
system first runs into a LOR when calling printf() in cpu_reset() and then it 
runs
into a panic when printf is recursively called from witness(9) to report the 
LOR.
 The panic happens because of the recursion on cnputs_mtx, which should ensure
that cnputs() output is not intermingled and which is not flagged to support
recursion.

There are two things about this report that greatly confused and puzzled me:
1. stop_scheduler_on_panic variable is used _only_ in panic(9).  So how could it
be possible that changing its value affects behavior of the system when panic(9)
is not called?!

2. The LOR in question happens between smp rendezvous (smp_ipi_mtx) and
uart_hwmtx (sc_hwmtx_s in uart core) spin locks.  The order of these locks is
actually predefined in witness order_lists[] such that uart_hwmtx must come 
before
smp_ipi_mtx.  But in the reboot path we first take smp_ipi_mtx in
shutdown_reset(), then we call cpu_reset, then it calls printf and from there we
get to uart_hwmtx for serial console output.  So the order between these 
spinlocks
is always violated in the x86 SMP reboot path.
How come witness(9) doesn't _always_ detect this LOR?
How come it didn't detect this LOR before any stop scheduler commits?!

[Spoiler alert :-)]

Turns out that the two puzzles above are closely related.
Let's first list all the things that change when stop_scheduler_on_panic is
enabled and a panic happens:
- other CPUs are stopped (forced to spin)
- interrupts on current CPU are disabled
- by virtue of the above the current thread should be the only thread running
(unless it executes a voluntary switch)
- all locks are busted, they are completely ignored / bypassed
- by virtue of the above no lock invariants and witness checks are performed, so
no lock order checking, no recursion checking, etc

So, what I observe is this: when stop_scheduler_on_panic is disabled, the LOR is
actually detected as it should be.  witness(9) works properly here.  Once the 
LOR
is detected witness(9) wants to report it using printf(9).  That's where we run
into the cnputs_mtx recursion panic.  It's all exactly as with
stop_scheduler_on_panic enabled.  Then panic(9) wants to report the panic using
printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags() detects
locks recursion again (this is independent of witness(9)) and calls panic(9)
again.  Then panic(9) wants to report the panic using printf(9)...
I assume that when the stack is exhausted we run into a double fault and
dblfault_handler wants to print something again...  Likely we eventually run 
into
a triple fault which resets the machine.  Although, I recall some old reports of
machines hanging during reboot in a place suspiciously close to where the
described ordeal happens...
But if the machine doesn't hang we get a full appearance of the reset 
successfully
happening (modulo some last messages missing).

With stop_scheduler_on_panic enabled and all the locks being ignored we, of
course, do not run into any secondary lock recursions and resulting panics.  So
the system is able to at least panic gracefully (still we should have just
reported the LOR gracefully, no panic is required).

Some obvious conclusions:
- the smp rendezvous and uart_hwmtx LOR is real and it is the true cause of
the problem; it should be fixed one way or other - either by correcting witness
order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset;
- because witness(9) uses printf(9) to report problems, it is very fragile to 
use
witness with any locks that can be acquired under printf(9)
- stop_scheduler_on_panic just uncovers the true bug

There are probably other conclusions that can

Re: new panic in cpu_reset() with WITNESS

2012-05-17 Thread Attilio Rao

2012/5/17, Andriy Gapon a...@freebsd.org:
 on 25/01/2012 23:52 Andriy Gapon said the following:
 on 24/01/2012 14:32 Gleb Smirnoff said the following:
 Yes, now:

 Rebooting...
 lock order reversal:
  1st 0x80937140 smp rendezvous (smp rendezvous) @
 /usr/src/head/sys/kern/kern_shutdown.c:542
  2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @
 /usr/src/head/sys/dev/uart/uart_cpu.h:92
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @
 /usr/src/head/sys/kern/kern_cons.c:500

 OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no
 new
 details...

 It's still intriguing to me why the LOR *doesn't* happen [*] with
 stop_scheduler_on_panic=0.  But I don't see a productive way to pursue
 this
 investigation further.

 Salve Glebius!
 After your recent nudging I took yet another look at this issue and it seems
 that
 I have some findings.

 For those who might get interested here is a convenience reference to the
 whole
 thread on gmane: http://thread.gmane.org/gmane.os.freebsd.current/139307

 A short summary.
 Prerequisites: an SMP x86 system, its kernel is configured with WITNESS 
 !WITNESS_SKIPSPIN (this is important) and a system uses serial console via
 uart.
 Then, if stop_scheduler_on_panic is set to zero the system can be rebooted
 without
 a problem.  On the other hand, if stop_scheduler_on_panic is enabled, then
 the
 system first runs into a LOR when calling printf() in cpu_reset() and then
 it runs
 into a panic when printf is recursively called from witness(9) to report the
 LOR.
  The panic happens because of the recursion on cnputs_mtx, which should
 ensure
 that cnputs() output is not intermingled and which is not flagged to
 support
 recursion.

 There are two things about this report that greatly confused and puzzled
 me:
 1. stop_scheduler_on_panic variable is used _only_ in panic(9).  So how
 could it
 be possible that changing its value affects behavior of the system when
 panic(9)
 is not called?!

 2. The LOR in question happens between smp rendezvous (smp_ipi_mtx) and
 uart_hwmtx (sc_hwmtx_s in uart core) spin locks.  The order of these locks
 is
 actually predefined in witness order_lists[] such that uart_hwmtx must come
 before
 smp_ipi_mtx.  But in the reboot path we first take smp_ipi_mtx in
 shutdown_reset(), then we call cpu_reset, then it calls printf and from
 there we
 get to uart_hwmtx for serial console output.  So the order between these
 spinlocks
 is always violated in the x86 SMP reboot path.
 How come witness(9) doesn't _always_ detect this LOR?
 How come it didn't detect this LOR before any stop scheduler commits?!

 [Spoiler alert :-)]

 Turns out that the two puzzles above are closely related.
 Let's first list all the things that change when stop_scheduler_on_panic is
 enabled and a panic happens:
 - other CPUs are stopped (forced to spin)
 - interrupts on current CPU are disabled
 - by virtue of the above the current thread should be the only thread
 running
 (unless it executes a voluntary switch)
 - all locks are busted, they are completely ignored / bypassed
 - by virtue of the above no lock invariants and witness checks are
 performed, so
 no lock order checking, no recursion checking, etc

 So, what I observe is this: when stop_scheduler_on_panic is disabled, the
 LOR is
 actually detected as it should be.  witness(9) works properly here.  Once
 the LOR
 is detected witness(9) wants to report it using printf(9).  That's where we
 run
 into the cnputs_mtx recursion panic.  It's all exactly as with
 stop_scheduler_on_panic enabled.  Then panic(9) wants to report the panic
 using
 printf(9), which goes to cnputs() again, where _mtx_lock_spin_flags()
 detects
 locks recursion again (this is independent of witness(9)) and calls
 panic(9)
 again.  Then panic(9) wants to report the panic using printf(9)...
 I assume that when the stack is exhausted we run into a double fault and
 dblfault_handler wants to print something again...  Likely we eventually run
 into
 a triple fault which resets the machine.  Although, I recall some old
 reports of
 machines hanging during reboot in a place suspiciously close to where the
 described ordeal happens...
 But if the machine doesn't hang we get a full appearance of the reset
 successfully
 happening (modulo some last messages missing).

 With stop_scheduler_on_panic enabled and all the locks being ignored we, of
 course, do not run into any secondary lock recursions and resulting panics.
 So
 the system is able to at least panic gracefully (still we should have
 just
 reported the LOR gracefully, no panic is required).

 Some obvious conclusions:
 - the smp rendezvous and uart_hwmtx LOR is real and it is the true cause
 of
 the problem; it should be fixed one way or other - either by correcting
 witness
 order_lists[] or by avoiding the LOR in shutdown_reset/cpu_reset;
 - because witness(9) uses printf(9) to report problems, it is very fragile
 to use
 witness with any locks that can

Re: locks under printf(9) and WITNESS

2012-01-28 Thread Andriy Gapon


BTW, I see another LOR with spinlocks, maybe harmless.

o sysbeep() is called from syscons code and it calls timeout() which introduces
the following order: scrlock - callout.

o The callout code programming of event timers introduces the following order
(via callout_new_inserted == cpu_new_callout):
callout - et_hw_mtx

o Eventtimers' doconfigtimer calls  loadtimer with et_hw_mtx held, loadtimer
calls et_start method of a configured event timer and, e.g. in the case of
lapic_et_start and bootverbose it calls printf(9), which gives:
et_hw_mtx - scrlock

This is just for the information.
-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-25 Thread Andriy Gapon

on 24/01/2012 14:32 Gleb Smirnoff said the following:
 Yes, now:
 
 Rebooting...
 lock order reversal:
  1st 0x80937140 smp rendezvous (smp rendezvous) @ 
 /usr/src/head/sys/kern/kern_shutdown.c:542
  2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ 
 /usr/src/head/sys/dev/uart/uart_cpu.h:92
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
 /usr/src/head/sys/kern/kern_cons.c:500

OK, so it's just a plain LOR between smp rendezvous and uart_hwmtx, no new
details...

It's still intriguing to me why the LOR *doesn't* happen [*] with
stop_scheduler_on_panic=0.  But I don't see a productive way to pursue this
investigation further.

[*] - or maybe better to say why it isn't detected/reported.

 cpuid = 0
 KDB: enter: panic
 [ thread pid 1 tid 11 ]
 Stopped at  kdb_enter+0x3b: movq$0,0x5159f2(%rip)
 db bt
 Tracing pid 1 tid 11 td 0xfe0001d5e000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x1c7
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 cnputs() at cnputs+0x7a
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67

Maybe kdb_backtrace ought to use db_printf.

 kdb_backtrace() at kdb_backtrace+0x2d
 _witness_debugger() at _witness_debugger+0x2c
 witness_checkorder() at witness_checkorder+0x854
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
 uart_cnputc() at uart_cnputc+0x3e
 cnputc() at cnputc+0x4c
 cnputs() at cnputs+0x26
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 cpu_reset() at cpu_reset+0x81
 kern_reboot() at kern_reboot+0x3a5
 sys_reboot() at sys_reboot+0x42
 amd64_syscall() at amd64_syscall+0x39e
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
 0x7fffd6d8, rbp = 0x49 ---

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-24 Thread Gleb Smirnoff

On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote:
A on 23/01/2012 18:46 Gleb Smirnoff said the following:
A  On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote:
A  A  db bt
A  A  Tracing pid 1 tid 11 td 0xfe0001d5e000
A  A  kdb_enter() at kdb_enter+0x3b
A  A  panic() at panic+0x1c7
A  A  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
A  A  cnputs() at cnputs+0x7a
A  A  vprintf() at vprintf+0xcb
A  A  printf() at printf+0x67
A  A  db_putc() at db_putc+0x81
A  A 
A  A Ah, db_putc does something different from what I expected.
A  A Can you hack it to never use printf?
A  
A  Just cut printfs from db_putc()?
A 
A Make the following condition be always false:
A 
A if (!kdb_active || ddb_use_printf) {
A 
A E.g.:
A 
A if (0) {

With this change + s/printf/db_printf/ in subr_witness.c I've
got the following during reboot:

Rebooting...
lllock order reversal:
 1st 0x80937140 smp rendezvous (smp rendezvous) @ 
/usr/src/head/sys/kern/kern_shutdown.c:542
 2nd 0x80b13280 syscons video lock (syscons video lock) @ 
/usr/src/head/sys/dev/syscons/syscons.c:1921
panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
/usr/src/head/sys/kern/kern_cons.c:500

cpuid = 0
KDB: enter: panic
[ thread pid 1 tid 11 ]
Stopped at  kdb_enter+0x3b: movq$0,0x5159f2(%rip)
db bt
Tracing pid 1 tid 11 td 0xfe0001d5e000
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c7
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
cnputs() at cnputs+0x7a
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
kdb_backtrace() at kdb_backtrace+0x2d
_witness_debugger() at _witness_debugger+0x2c
witness_checkorder() at witness_checkorder+0x854
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
scrn_update() at scrn_update+0x41c
sc_cnputc() at sc_cnputc+0x46
cnputc() at cnputc+0x4c
db_putc() at db_putc+0x4d
kvprintf() at kvprintf+0x83
db_printf() at db_printf+0x86
witness_checkorder() at witness_checkorder+0x773
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
sc_puts() at sc_puts+0x97
sc_cnputc() at sc_cnputc+0x3e
cnputc() at cnputc+0x4c
db_putc() at db_putc+0x4d
kvprintf() at kvprintf+0x83
db_printf() at db_printf+0x86
witness_checkorder() at witness_checkorder+0x773
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
uart_cnputc() at uart_cnputc+0x3e
cnputc() at cnputc+0x4c
cnputs() at cnputs+0x26
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
cpu_reset() at cpu_reset+0x81
kern_reboot() at kern_reboot+0x3a5
sys_reboot() at sys_reboot+0x42
amd64_syscall() at amd64_syscall+0x39e
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
0x7fffd6d8, rbp = 0x49 ---
db

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-24 Thread Andriy Gapon

on 24/01/2012 13:54 Gleb Smirnoff said the following:
 On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote:
 A on 23/01/2012 18:46 Gleb Smirnoff said the following:
 A  On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote:
 A  A  db bt
 A  A  Tracing pid 1 tid 11 td 0xfe0001d5e000
 A  A  kdb_enter() at kdb_enter+0x3b
 A  A  panic() at panic+0x1c7
 A  A  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 A  A  cnputs() at cnputs+0x7a
 A  A  vprintf() at vprintf+0xcb
 A  A  printf() at printf+0x67
 A  A  db_putc() at db_putc+0x81
 A  A 
 A  A Ah, db_putc does something different from what I expected.
 A  A Can you hack it to never use printf?
 A  
 A  Just cut printfs from db_putc()?
 A 
 A Make the following condition be always false:
 A 
 A if (!kdb_active || ddb_use_printf) {
 A 
 A E.g.:
 A 
 A if (0) {
 
 With this change + s/printf/db_printf/ in subr_witness.c I've
 got the following during reboot:
 
 Rebooting...
 lllock order reversal:
  1st 0x80937140 smp rendezvous (smp rendezvous) @ 
 /usr/src/head/sys/kern/kern_shutdown.c:542
  2nd 0x80b13280 syscons video lock (syscons video lock) @ 
 /usr/src/head/sys/dev/syscons/syscons.c:1921
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
 /usr/src/head/sys/kern/kern_cons.c:500

Can you disable video console?

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-24 Thread Gleb Smirnoff

On Tue, Jan 24, 2012 at 02:09:03PM +0200, Andriy Gapon wrote:
A on 24/01/2012 13:54 Gleb Smirnoff said the following:
A  On Mon, Jan 23, 2012 at 06:56:08PM +0200, Andriy Gapon wrote:
A  A on 23/01/2012 18:46 Gleb Smirnoff said the following:
A  A  On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote:
A  A  A  db bt
A  A  A  Tracing pid 1 tid 11 td 0xfe0001d5e000
A  A  A  kdb_enter() at kdb_enter+0x3b
A  A  A  panic() at panic+0x1c7
A  A  A  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
A  A  A  cnputs() at cnputs+0x7a
A  A  A  vprintf() at vprintf+0xcb
A  A  A  printf() at printf+0x67
A  A  A  db_putc() at db_putc+0x81
A  A  A 
A  A  A Ah, db_putc does something different from what I expected.
A  A  A Can you hack it to never use printf?
A  A  
A  A  Just cut printfs from db_putc()?
A  A 
A  A Make the following condition be always false:
A  A 
A  A if (!kdb_active || ddb_use_printf) {
A  A 
A  A E.g.:
A  A 
A  A if (0) {
A  
A  With this change + s/printf/db_printf/ in subr_witness.c I've
A  got the following during reboot:
A  
A  Rebooting...
A  lllock order reversal:
A   1st 0x80937140 smp rendezvous (smp rendezvous) @ 
/usr/src/head/sys/kern/kern_shutdown.c:542
A   2nd 0x80b13280 syscons video lock (syscons video lock) @ 
/usr/src/head/sys/dev/syscons/syscons.c:1921
A  panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
/usr/src/head/sys/kern/kern_cons.c:500
A 
A Can you disable video console?

Yes, now:

Rebooting...
lock order reversal:
 1st 0x80937140 smp rendezvous (smp rendezvous) @ 
/usr/src/head/sys/kern/kern_shutdown.c:542
 2nd 0xfe0001f5d838 uart_hwmtx (uart_hwmtx) @ 
/usr/src/head/sys/dev/uart/uart_cpu.h:92
panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
/usr/src/head/sys/kern/kern_cons.c:500

cpuid = 0
KDB: enter: panic
[ thread pid 1 tid 11 ]
Stopped at  kdb_enter+0x3b: movq$0,0x5159f2(%rip)
db bt
Tracing pid 1 tid 11 td 0xfe0001d5e000
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c7
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
cnputs() at cnputs+0x7a
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
kdb_backtrace() at kdb_backtrace+0x2d
_witness_debugger() at _witness_debugger+0x2c
witness_checkorder() at witness_checkorder+0x854
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
uart_cnputc() at uart_cnputc+0x3e
cnputc() at cnputc+0x4c
cnputs() at cnputs+0x26
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
cpu_reset() at cpu_reset+0x81
kern_reboot() at kern_reboot+0x3a5
sys_reboot() at sys_reboot+0x42
amd64_syscall() at amd64_syscall+0x39e
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
0x7fffd6d8, rbp = 0x49 ---


-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Andriy Gapon

on 22/01/2012 18:35 Gleb Smirnoff said the following:
 On Sat, Jan 21, 2012 at 03:38:59PM +0200, Andriy Gapon wrote:
 A on 20/01/2012 17:41 Gleb Smirnoff said the following:
 A  
 A  On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:
 A  T   New panic has been introduced somewhere between
 A  T r229851 and r229932, that happens on shutdown if
 A  T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
 A  
 A  I've run through binary search and panic was introduced
 A  by r229854.
 A  
 A 
 A Which means that it was introduced by one of the earlier commits (probably
 A mine).  Thank you for this investigative work!
 A Although, to be frank and honest, I still don't see how the problem could 
 be
 A a non-issue before.
 
 I can confirm that I can reboot succesfully with a new kernel
 and kern.stop_scheduler_on_panic=0.

This is very puzzling.  You observation means that stop_scheduler_on_panic has
an effect on the code outside the panic path.  I reviewed the
SCHEDULER_STOPPED() changes and I still can not see how that could be possible
(modulo compiler bugs).

Unfortunately, I also can not reproduce the problem here - without
WITNESS_SKIPSPIN I am running into a panic[*] during boot.
BTW, do you get any other LOR reports _before_ you do a reboot?

Can you try to change printfs in witness to db_printfs?  Perhaps this will allow
to get the details of the LOR in uart_cnputc.  Maybe that will reveal some
important additional details.

Finally, can you try the patch from my other post?

[*]
The panic:
...
Root mount waiting for: usbus0
panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @
/usr/src/sys/kern/kern_cons.c:500

cpuid = 0
curthread: 0xfe000230b900
stack: 0xff802cda3000 - 0xff802cda7000
stack pointer: 0xff802cda5c50
KDB: stack backtrace:
db_trace_self_wrapper() at 0x802bf41a = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0x8053382a = kdb_backtrace+0x3a
panic() at 0x804f8690 = panic+0x260
_mtx_lock_spin_flags() at 0x804e5eff = _mtx_lock_spin_flags+0xef
cnputs() at 0x804b0ea6 = cnputs+0x36
putchar() at 0x8053887f = putchar+0x24f
kvprintf() at 0x80536b1b = kvprintf+0x9b
vprintf() at 0x80538225 = vprintf+0x85
printf() at 0x805382f7 = printf+0x67
witness_checkorder() at 0x8054c361 = witness_checkorder+0x821
_mtx_lock_spin_flags() at 0x804e5f1d = _mtx_lock_spin_flags+0x10d
uart_cnputc() at 0x803c8e6c = uart_cnputc+0x3c
cnputc() at 0x804b0acd = cnputc+0x6d
cnputs() at 0x804b0ebb = cnputs+0x4b
putchar() at 0x8053887f = putchar+0x24f
kvprintf() at 0x80536b1b = kvprintf+0x9b
vprintf() at 0x80538225 = vprintf+0x85
printf() at 0x805382f7 = printf+0x67
witness_checkorder() at 0x8054c361 = witness_checkorder+0x821
_mtx_lock_spin_flags() at 0x804e5f1d = _mtx_lock_spin_flags+0x10d
cpu_new_callout() at 0x806fadbb = cpu_new_callout+0x9b
callout_cc_add() at 0x8050e379 = callout_cc_add+0x129
callout_reset_on() at 0x8050efcc = callout_reset_on+0x24c
sleepq_set_timeout() at 0x8053ff56 = sleepq_set_timeout+0x106
_sleep() at 0x805036f2 = _sleep+0x342
pause() at 0x80503920 = pause+0x90
usb_pause_mtx() at 0x803ee841 = usb_pause_mtx+0x71
usbd_req_set_address() at 0x803e9f1c = usbd_req_set_address+0xac
usb_alloc_device() at 0x803def89 = usb_alloc_device+0x3d9
uhub_explore() at 0x803e5b63 = uhub_explore+0x3a3
usb_bus_explore() at 0x803d4187 = usb_bus_explore+0xd7
usb_process() at 0x803e8bc1 = usb_process+0xa1
fork_exit() at 0x804c801a = fork_exit+0x1aa
fork_trampoline() at 0x806bf3ee = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff802cda6d00, rbp = 0 ---
KDB: enter: panic
[ thread pid 15 tid 100031 ]
Stopped at  0x8053348d = kdb_enter+0x3d:movq
$0,0x11212d0(%rip)
db show locks
exclusive sx USB config SX lock (USB config SX lock) r = 0 (0xfe0005040070)
locked @ /usr/src/sys/dev/usb/usb_device.c:2643
exclusive spin mutex callout (callout) r = 0 (0x8160b3f8) locked @
/usr/src/sys/kern/kern_timeout.c:409
exclusive spin mutex sleepq chain (sleepq chain) r = 0 (0x81655668)
locked @ /usr/src/sys/kern/subr_sleepqueue.c:237


-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]

2012-01-23 Thread Gleb Smirnoff

On Sat, Jan 21, 2012 at 07:26:55PM +0200, Andriy Gapon wrote:
A  BTW, we have a quite strange situation with spin locks in console output 
path.
A  cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) 
can be
A  called in any locking context (even during normal operation).  But there 
are a
A  number of console-specific locks (scrlock, uart_hwmtx, syscons video 
lock)
A  which are acquired under cnputs_mtx, but which are *not* marked as
A  MTX_NOWITNESS.  More, they are specified in the witness order_lists as if 
we
A  certainly know a correct order for them.
A  I think that the msgbuf mutex also deserves mentioning in this context.
A  
A  I think that all of these spin locks should be marked as MTX_NOWITNESS (as 
long
A  as they stay normal spinlocks), because printf(9) should be usable 
wherever I
A  stick it in the code.
A  
A  P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not 
sure
A  if this combination really matters.
A  
A 
A Here's my take at it:
A diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c
A index 5346bc3..97f0f16 100644

...

This patch works.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Gleb Smirnoff

On Mon, Jan 23, 2012 at 10:21:57AM +0200, Andriy Gapon wrote:
A  I can confirm that I can reboot succesfully with a new kernel
A  and kern.stop_scheduler_on_panic=0.
A 
A This is very puzzling.  You observation means that stop_scheduler_on_panic 
has
A an effect on the code outside the panic path.  I reviewed the
A SCHEDULER_STOPPED() changes and I still can not see how that could be 
possible
A (modulo compiler bugs).
A 
A Unfortunately, I also can not reproduce the problem here - without
A WITNESS_SKIPSPIN I am running into a panic[*] during boot.

I suppose your patch (the one in other post) should help against
this panic on boot.

A BTW, do you get any other LOR reports _before_ you do a reboot?

I have some, but I don't think they are related.

A Can you try to change printfs in witness to db_printfs?  Perhaps this will 
allow
A to get the details of the LOR in uart_cnputc.  Maybe that will reveal some
A important additional details.

Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only
in special places?

A Finally, can you try the patch from my other post?

Yes, it works.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]

2012-01-23 Thread Andriy Gapon

on 23/01/2012 15:04 Gleb Smirnoff said the following:
 On Sat, Jan 21, 2012 at 07:26:55PM +0200, Andriy Gapon wrote:
 A  BTW, we have a quite strange situation with spin locks in console output 
 path.
 A  cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs 
 (printf) can be
 A  called in any locking context (even during normal operation).  But there 
 are a
 A  number of console-specific locks (scrlock, uart_hwmtx, syscons video 
 lock)
 A  which are acquired under cnputs_mtx, but which are *not* marked as
 A  MTX_NOWITNESS.  More, they are specified in the witness order_lists as 
 if we
 A  certainly know a correct order for them.
 A  I think that the msgbuf mutex also deserves mentioning in this context.
 A  
 A  I think that all of these spin locks should be marked as MTX_NOWITNESS 
 (as long
 A  as they stay normal spinlocks), because printf(9) should be usable 
 wherever I
 A  stick it in the code.
 A  
 A  P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am 
 not sure
 A  if this combination really matters.
 A  
 A 
 A Here's my take at it:
 A diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c
 A index 5346bc3..97f0f16 100644
 
 ...
 
 This patch works.
 

Thank you for testing!

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Andriy Gapon

on 23/01/2012 15:07 Gleb Smirnoff said the following:
 On Mon, Jan 23, 2012 at 10:21:57AM +0200, Andriy Gapon wrote:
 A  I can confirm that I can reboot succesfully with a new kernel
 A  and kern.stop_scheduler_on_panic=0.
 A 
 A This is very puzzling.  You observation means that stop_scheduler_on_panic 
 has
 A an effect on the code outside the panic path.  I reviewed the
 A SCHEDULER_STOPPED() changes and I still can not see how that could be 
 possible
 A (modulo compiler bugs).
 A 
 A Unfortunately, I also can not reproduce the problem here - without
 A WITNESS_SKIPSPIN I am running into a panic[*] during boot.
 
 I suppose your patch (the one in other post) should help against
 this panic on boot.

Yes, it does.  It also masks/fixes the panic at reboot.

 A BTW, do you get any other LOR reports _before_ you do a reboot?
 
 I have some, but I don't think they are related.

OK.

 A Can you try to change printfs in witness to db_printfs?  Perhaps this will 
 allow
 A to get the details of the LOR in uart_cnputc.  Maybe that will reveal some
 A important additional details.
 
 Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only
 in special places?

I think that replace all should work for this test.

 A Finally, can you try the patch from my other post?
 
 Yes, it works.
 

Thank you again.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Gleb Smirnoff

On Mon, Jan 23, 2012 at 04:01:20PM +0200, Andriy Gapon wrote:
A  A Can you try to change printfs in witness to db_printfs?  Perhaps this 
will allow
A  A to get the details of the LOR in uart_cnputc.  Maybe that will reveal 
some
A  A important additional details.
A  
A  Should I do s/printf/db_printf/ throughout entire subr_witness.c, or only
A  in special places?
A 
A I think that replace all should work for this test.

Yes, s/ printf/ db_printf/ in subr_witness.c fixes my problem.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Gleb Smirnoff

On Mon, Jan 23, 2012 at 08:24:10PM +0400, Gleb Smirnoff wrote:
T On Mon, Jan 23, 2012 at 04:01:20PM +0200, Andriy Gapon wrote:
T A  A Can you try to change printfs in witness to db_printfs?  Perhaps 
this will allow
T A  A to get the details of the LOR in uart_cnputc.  Maybe that will 
reveal some
T A  A important additional details.
T A  
T A  Should I do s/printf/db_printf/ throughout entire subr_witness.c, or 
only
T A  in special places?
T A 
T A I think that replace all should work for this test.
T 
T Yes, s/  printf/ db_printf/ in subr_witness.c fixes my problem.

Sorry. I was testing wrong kernel. The substitute doesn't help. Panic
trace is almost the same:

db bt
Tracing pid 1 tid 11 td 0xfe0001d5e000
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c7
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
cnputs() at cnputs+0x7a
vprintf() at vprintf+0xcb
printf() at printf+0x67
db_putc() at db_putc+0x81
kvprintf() at kvprintf+0x83
db_printf() at db_printf+0x86
witness_checkorder() at witness_checkorder+0x773
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
uart_cnputc() at uart_cnputc+0x3e
cnputc() at cnputc+0x4c
cnputs() at cnputs+0x26
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
cpu_reset() at cpu_reset+0x81
kern_reboot() at kern_reboot+0x3a5
sys_reboot() at sys_reboot+0x42
amd64_syscall() at amd64_syscall+0x39e
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
0x7fffd6d8, rbp = 0x49 ---
db 

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Andriy Gapon

on 23/01/2012 18:26 Gleb Smirnoff said the following:
 Sorry. I was testing wrong kernel. The substitute doesn't help. Panic
 trace is almost the same:
 
 db bt
 Tracing pid 1 tid 11 td 0xfe0001d5e000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x1c7
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 cnputs() at cnputs+0x7a
 vprintf() at vprintf+0xcb
 printf() at printf+0x67
 db_putc() at db_putc+0x81

Ah, db_putc does something different from what I expected.
Can you hack it to never use printf?

 kvprintf() at kvprintf+0x83
 db_printf() at db_printf+0x86
 witness_checkorder() at witness_checkorder+0x773
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
 uart_cnputc() at uart_cnputc+0x3e
 cnputc() at cnputc+0x4c
 cnputs() at cnputs+0x26
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 cpu_reset() at cpu_reset+0x81
 kern_reboot() at kern_reboot+0x3a5
 sys_reboot() at sys_reboot+0x42
 amd64_syscall() at amd64_syscall+0x39e
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
 0x7fffd6d8, rbp = 0x49 ---
 db 
 


-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Gleb Smirnoff

On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote:
A  db bt
A  Tracing pid 1 tid 11 td 0xfe0001d5e000
A  kdb_enter() at kdb_enter+0x3b
A  panic() at panic+0x1c7
A  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
A  cnputs() at cnputs+0x7a
A  vprintf() at vprintf+0xcb
A  printf() at printf+0x67
A  db_putc() at db_putc+0x81
A 
A Ah, db_putc does something different from what I expected.
A Can you hack it to never use printf?

Just cut printfs from db_putc()?

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-23 Thread Andriy Gapon

on 23/01/2012 18:46 Gleb Smirnoff said the following:
 On Mon, Jan 23, 2012 at 06:43:23PM +0200, Andriy Gapon wrote:
 A  db bt
 A  Tracing pid 1 tid 11 td 0xfe0001d5e000
 A  kdb_enter() at kdb_enter+0x3b
 A  panic() at panic+0x1c7
 A  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 A  cnputs() at cnputs+0x7a
 A  vprintf() at vprintf+0xcb
 A  printf() at printf+0x67
 A  db_putc() at db_putc+0x81
 A 
 A Ah, db_putc does something different from what I expected.
 A Can you hack it to never use printf?
 
 Just cut printfs from db_putc()?

Make the following condition be always false:

if (!kdb_active || ddb_use_printf) {

E.g.:

if (0) {

:-)
-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-22 Thread Gleb Smirnoff

On Sat, Jan 21, 2012 at 03:38:59PM +0200, Andriy Gapon wrote:
A on 20/01/2012 17:41 Gleb Smirnoff said the following:
A  
A  On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:
A  T   New panic has been introduced somewhere between
A  T r229851 and r229932, that happens on shutdown if
A  T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
A  
A  I've run through binary search and panic was introduced
A  by r229854.
A  
A 
A Which means that it was introduced by one of the earlier commits (probably
A mine).  Thank you for this investigative work!
A Although, to be frank and honest, I still don't see how the problem could be
A a non-issue before.

I can confirm that I can reboot succesfully with a new kernel
and kern.stop_scheduler_on_panic=0.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-21 Thread Andriy Gapon

on 20/01/2012 17:41 Gleb Smirnoff said the following:
 
 On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:
 T   New panic has been introduced somewhere between
 T r229851 and r229932, that happens on shutdown if
 T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
 
 I've run through binary search and panic was introduced
 by r229854.
 

Which means that it was introduced by one of the earlier commits (probably
mine).  Thank you for this investigative work!
Although, to be frank and honest, I still don't see how the problem could be
a non-issue before.

So any help/ideas/guesses are appreciated.


P.S.
I also have a big suspicion that we have more problems that can be triggered by
WITNESS  !WITNESS_SKIPSPIN, but which are harmless in practice.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]

2012-01-21 Thread Andriy Gapon


BTW, we have a quite strange situation with spin locks in console output path.
cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can be
called in any locking context (even during normal operation).  But there are a
number of console-specific locks (scrlock, uart_hwmtx, syscons video lock)
which are acquired under cnputs_mtx, but which are *not* marked as
MTX_NOWITNESS.  More, they are specified in the witness order_lists as if we
certainly know a correct order for them.
I think that the msgbuf mutex also deserves mentioning in this context.

I think that all of these spin locks should be marked as MTX_NOWITNESS (as long
as they stay normal spinlocks), because printf(9) should be usable wherever I
stick it in the code.

P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not sure
if this combination really matters.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-21 Thread Andriy Gapon

on 21/01/2012 15:38 Andriy Gapon said the following:
 on 20/01/2012 17:41 Gleb Smirnoff said the following:

 On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:
 T   New panic has been introduced somewhere between
 T r229851 and r229932, that happens on shutdown if
 T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.

 I've run through binary search and panic was introduced
 by r229854.

 
 Which means that it was introduced by one of the earlier commits (probably
 mine).  Thank you for this investigative work!
 Although, to be frank and honest, I still don't see how the problem could be
 a non-issue before.

BTW, just to be sure, does the problem go away when kern.stop_scheduler_on_panic
is set to zero?

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: locks under printf(9) and WITNESS [Was: new panic in cpu_reset() with WITNESS]

2012-01-21 Thread Andriy Gapon

on 21/01/2012 16:37 Andriy Gapon said the following:
 
 BTW, we have a quite strange situation with spin locks in console output path.
 cnputs_mtx is marked as MTX_NOWITNESS, supposedly because cnputs (printf) can 
 be
 called in any locking context (even during normal operation).  But there are a
 number of console-specific locks (scrlock, uart_hwmtx, syscons video lock)
 which are acquired under cnputs_mtx, but which are *not* marked as
 MTX_NOWITNESS.  More, they are specified in the witness order_lists as if we
 certainly know a correct order for them.
 I think that the msgbuf mutex also deserves mentioning in this context.
 
 I think that all of these spin locks should be marked as MTX_NOWITNESS (as 
 long
 as they stay normal spinlocks), because printf(9) should be usable wherever I
 stick it in the code.
 
 P.S. The above only matters for WITNESS and !WITNESS_SKIPSPIN and I am not 
 sure
 if this combination really matters.
 

Here's my take at it:
diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c
index 5346bc3..97f0f16 100644
--- a/sys/kern/kern_cons.c
+++ b/sys/kern/kern_cons.c
@@ -586,7 +586,7 @@ static void
 cn_drvinit(void *unused)
 {

-   mtx_init(cnputs_mtx, cnputs_mtx, NULL, MTX_SPIN | MTX_NOWITNESS);
+   mtx_init(cnputs_mtx, cnputs_mtx, NULL, MTX_SPIN);
use_cnputs_mtx = 1;
 }

diff --git a/sys/kern/subr_witness.c b/sys/kern/subr_witness.c
index 55cb2d7..39dd94d 100644
--- a/sys/kern/subr_witness.c
+++ b/sys/kern/subr_witness.c
@@ -629,7 +629,6 @@ static struct witness_order_list_entry order_lists[] = {
 #endif
{ rm.mutex_mtx, lock_class_mtx_spin },
{ sio, lock_class_mtx_spin },
-   { scrlock, lock_class_mtx_spin },
 #ifdef __i386__
{ cy, lock_class_mtx_spin },
 #endif
@@ -638,7 +637,6 @@ static struct witness_order_list_entry order_lists[] = {
{ rtc_mtx, lock_class_mtx_spin },
 #endif
{ scc_hwmtx, lock_class_mtx_spin },
-   { uart_hwmtx, lock_class_mtx_spin },
{ fast_taskqueue, lock_class_mtx_spin },
{ intr table, lock_class_mtx_spin },
 #ifdef HWPMC_HOOKS
@@ -653,8 +651,8 @@ static struct witness_order_list_entry order_lists[] = {
{ sched lock, lock_class_mtx_spin },
{ td_contested, lock_class_mtx_spin },
{ callout, lock_class_mtx_spin },
+   { et_hw_mtx, lock_class_mtx_spin },
{ entropy harvest mutex, lock_class_mtx_spin },
-   { syscons video lock, lock_class_mtx_spin },
 #ifdef SMP
{ smp rendezvous, lock_class_mtx_spin },
 #endif
@@ -662,6 +660,13 @@ static struct witness_order_list_entry order_lists[] = {
{ tlb0, lock_class_mtx_spin },
 #endif
/*
+* console locks
+*/
+   { cnputs_mtx, lock_class_mtx_spin },
+   { scrlock, lock_class_mtx_spin },
+   { syscons video lock, lock_class_mtx_spin },
+   { uart_hwmtx, lock_class_mtx_spin },
+   /*
 * leaf locks
 */
{ intrcnt, lock_class_mtx_spin },


How does this look?

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-20 Thread Gleb Smirnoff


On Tue, Jan 17, 2012 at 03:02:42PM +0400, Gleb Smirnoff wrote:
T   New panic has been introduced somewhere between
T r229851 and r229932, that happens on shutdown if
T kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.

I've run through binary search and panic was introduced
by r229854.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

new panic in cpu_reset() with WITNESS

2012-01-17 Thread Gleb Smirnoff

  New panic has been introduced somewhere between
r229851 and r229932, that happens on shutdown if
kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.

Uptime: 1h0m17s
Rebooting...
panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
/usr/src/head/sys/kern/kern_cons.c:500
cpuid = 0
KDB: enter: panic
[ thread pid 1 tid 11 ]
Stopped at  kdb_enter+0x3b: movq$0,0x514d32(%rip)
db
db bt
Tracing pid 1 tid 11 td 0xfe0001d5e000
kdb_enter() at kdb_enter+0x3b
panic() at panic+0x1c7
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
cnputs() at cnputs+0x7a
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
witness_checkorder() at witness_checkorder+0x773
_mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
uart_cnputc() at uart_cnputc+0x3e
cnputc() at cnputc+0x4c
cnputs() at cnputs+0x26
putchar() at putchar+0x11f
kvprintf() at kvprintf+0x83
vprintf() at vprintf+0x85
printf() at printf+0x67
cpu_reset() at cpu_reset+0x81
kern_reboot() at kern_reboot+0x3a5
--More--^M^Msys_reboot() at sys_reboot+0x42
amd64_syscall() at amd64_syscall+0x39e
Xfast_syscall() at Xfast_syscall+0xf7
--- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
0x7fffd6d8, rbp = 0x49 ---
db
db show locks
exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ 
/usr/src/head/sys/kern/kern_module.c:101
exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 (0x80a08840) 
locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
db

So the problem is that we are holding smp rendezvous mutex during the 
cpu_reset().
No mutexes should be obtained after it. However, since cpu_reset() does priting
we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
yet another printf from witness, that finally panics due to recursing on
cnputs_mtx.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-17 Thread mdf

2012/1/17 Gleb Smirnoff gleb...@freebsd.org:
  New panic has been introduced somewhere between
 r229851 and r229932, that happens on shutdown if
 kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.

 Uptime: 1h0m17s
 Rebooting...
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
 /usr/src/head/sys/kern/kern_cons.c:500
 cpuid = 0
 KDB: enter: panic
 [ thread pid 1 tid 11 ]
 Stopped at      kdb_enter+0x3b: movq    $0,0x514d32(%rip)
 db
 db bt
 Tracing pid 1 tid 11 td 0xfe0001d5e000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x1c7
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 cnputs() at cnputs+0x7a
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 witness_checkorder() at witness_checkorder+0x773
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
 uart_cnputc() at uart_cnputc+0x3e
 cnputc() at cnputc+0x4c
 cnputs() at cnputs+0x26
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 cpu_reset() at cpu_reset+0x81
 kern_reboot() at kern_reboot+0x3a5
 --More--^M        ^Msys_reboot() at sys_reboot+0x42
 amd64_syscall() at amd64_syscall+0x39e
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
 0x7fffd6d8, rbp = 0x49 ---
 db
 db show locks
 exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ 
 /usr/src/head/sys/kern/kern_module.c:101
 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 
 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
 db

 So the problem is that we are holding smp rendezvous mutex during the 
 cpu_reset().
 No mutexes should be obtained after it. However, since cpu_reset() does 
 priting
 we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
 the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
 yet another printf from witness, that finally panics due to recursing on
 cnputs_mtx.

At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
seem possible to fit it into the heirarchy in any sane way, since a
print can come from basically anywhere.

If anyone has a better fix, that'd be great, but I haven't been able
to think of one.

Thanks,
matthew
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-17 Thread Gleb Smirnoff

On Tue, Jan 17, 2012 at 07:34:23AM -0800, m...@freebsd.org wrote:
m 2012/1/17 Gleb Smirnoff gleb...@freebsd.org:
m   New panic has been introduced somewhere between
m  r229851 and r229932, that happens on shutdown if
m  kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
m 
m  Uptime: 1h0m17s
m  Rebooting...
m  panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
/usr/src/head/sys/kern/kern_cons.c:500
m  cpuid = 0
m  KDB: enter: panic
m  [ thread pid 1 tid 11 ]
m  Stopped at      kdb_enter+0x3b: movq    $0,0x514d32(%rip)
m  db
m  db bt
m  Tracing pid 1 tid 11 td 0xfe0001d5e000
m  kdb_enter() at kdb_enter+0x3b
m  panic() at panic+0x1c7
m  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
m  cnputs() at cnputs+0x7a
m  putchar() at putchar+0x11f
m  kvprintf() at kvprintf+0x83
m  vprintf() at vprintf+0x85
m  printf() at printf+0x67
m  witness_checkorder() at witness_checkorder+0x773
m  _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
m  uart_cnputc() at uart_cnputc+0x3e
m  cnputc() at cnputc+0x4c
m  cnputs() at cnputs+0x26
m  putchar() at putchar+0x11f
m  kvprintf() at kvprintf+0x83
m  vprintf() at vprintf+0x85
m  printf() at printf+0x67
m  cpu_reset() at cpu_reset+0x81
m  kern_reboot() at kern_reboot+0x3a5
m  --More--^M        ^Msys_reboot() at sys_reboot+0x42
m  amd64_syscall() at amd64_syscall+0x39e
m  Xfast_syscall() at Xfast_syscall+0xf7
m  --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
0x7fffd6d8, rbp = 0x49 ---
m  db
m  db show locks
m  exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ 
/usr/src/head/sys/kern/kern_module.c:101
m  exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 
(0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
m  db
m 
m  So the problem is that we are holding smp rendezvous mutex during the 
cpu_reset().
m  No mutexes should be obtained after it. However, since cpu_reset() does 
priting
m  we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded 
in
m  the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
m  yet another printf from witness, that finally panics due to recursing on
m  cnputs_mtx.
m 
m At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
m seem possible to fit it into the heirarchy in any sane way, since a
m print can come from basically anywhere.
m 
m If anyone has a better fix, that'd be great, but I haven't been able
m to think of one.

Setting NO_WITNESS on cnputs_mtx won't help for the above problem, IMHO.

-- 
Totus tuus, Glebius.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: new panic in cpu_reset() with WITNESS

2012-01-17 Thread Andriy Gapon

on 17/01/2012 17:34 m...@freebsd.org said the following:
 2012/1/17 Gleb Smirnoff gleb...@freebsd.org:
  New panic has been introduced somewhere between
 r229851 and r229932, that happens on shutdown if
 kernel has WITNESS and doesn't have WITNESS_SKIPSPIN.
[**]
 Uptime: 1h0m17s
 Rebooting...
 panic: mtx_lock_spin: recursed on non-recursive mutex cnputs_mtx @ 
 /usr/src/head/sys/kern/kern_cons.c:500
 cpuid = 0
 KDB: enter: panic
 [ thread pid 1 tid 11 ]
 Stopped at  kdb_enter+0x3b: movq$0,0x514d32(%rip)
 db
 db bt
 Tracing pid 1 tid 11 td 0xfe0001d5e000
 kdb_enter() at kdb_enter+0x3b
 panic() at panic+0x1c7
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x10f
 cnputs() at cnputs+0x7a
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 witness_checkorder() at witness_checkorder+0x773
 _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x99
 uart_cnputc() at uart_cnputc+0x3e
 cnputc() at cnputc+0x4c
 cnputs() at cnputs+0x26
 putchar() at putchar+0x11f
 kvprintf() at kvprintf+0x83
 vprintf() at vprintf+0x85
 printf() at printf+0x67
 cpu_reset() at cpu_reset+0x81
 kern_reboot() at kern_reboot+0x3a5
 --More--^M^Msys_reboot() at sys_reboot+0x42
 amd64_syscall() at amd64_syscall+0x39e
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (55, FreeBSD ELF64, sys_reboot), rip = 0x40ea3c, rsp = 
 0x7fffd6d8, rbp = 0x49 ---
 db
 db show locks
 exclusive sleep mutex Giant (Giant) r = 0 (0x809bc560) locked @ 
 /usr/src/head/sys/kern/kern_module.c:101
 exclusive spin mutex smp rendezvous (smp rendezvous) r = 0 
 (0x80a08840) locked @ /usr/src/head/sys/kern/kern_shutdown.c:542
 db

 So the problem is that we are holding smp rendezvous mutex during the 
 cpu_reset().
 No mutexes should be obtained after it. However, since cpu_reset() does 
 priting
 we obtain cnputs_mtx, and later obtain uart_hwmtx. The latter is hardcoded in
 the subr_witness.c as mutex to obtain before smp rendezvous, this triggers
 yet another printf from witness, that finally panics due to recursing on
 cnputs_mtx.
 
 At $WORK we explicitly marked cnputs_mtx as NO_WITNESS since it didn't
 seem possible to fit it into the heirarchy in any sane way, since a
 print can come from basically anywhere.

This is the case for the FreeBSD tree too (at least in head).

 If anyone has a better fix, that'd be great, but I haven't been able
 to think of one.


Please note that while the panic happened on cnputs_mtx[*] (because of the
recursion), it seems to have been provoked by uart_hwmtx actually.

The root cause of the panic at hand is that smp_ipi_mtx is supposed to be a
terminal lock, i.e. no other lock is supposed to be acquired while smp_ipi_mtx
is held.  This is the case for all the normal code, but the rule is violated for
cpu_reset() in the SMP case, because smp_ipi_mtx is acquired in shutdown_reset()
and is held till the end.

Thus, while normally smp_ipi_mtx and cnputs_mtx/uart_hwmtx should be completely
independent, they have become dependent in this reset path.

I think that normally this problem is hidden by people not using WITNESS or by
them using WITNESS_SKIPSPIN.

One obvious solution is to change the relative order of smp_ipi_mtx (aka smp
rendezvous) and uart_hwmtx in order_lists of subr_witness.c, which predefine
some order that is expected between the locks.

Another solution is to look into the reason why smp_ipi_mtx is acquired in
shutdown_reset().  This call has been added to avoid [what used to be] a
deadlock between the stop_cpus() call with interrupts disabled on a shutdown CPU
and smp_rendezvous or some tlb ipi call on another CPU.  As such it should be
sufficient to hold smp_ipi_mtx around the stop_cpus() call.
In fact, I would argue that stop_cpus() must hold smp_ipi_mtx for the above
reason.  That is, all inter-CPU operations should be initiated using the same 
lock.

[*]
And there is, of course, a different issue (pointed out many times by Bruce
Evans) of cnputs_mtx and now msg_lock (msgbuf lock) being inadequate for their
jobs as this example demonstrates.  Multiplied by the problem of printf being
used where db_printf would have been a better choice.
Since printf (or db_printf) can be called truly in any situation, the locking
must handle recursion, must handle being stuck on another CPU, etc.  In
general, these functions should strive to be lockless, but that's probably not
really practical for the normal situations in SMP environment.  Still the
special situations should be detected and handled.

[**] I would expect this panic/problem to be there for a long time already
(definitely before r229851).  It's surprising to me that it seems to be a recent
phenomenon.

-- 
Andriy Gapon
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

Re: witness_lock_list_get: witness exhausted

2011-08-18 Thread John Baldwin

On Tuesday, August 16, 2011 5:59:53 pm Peter Jeremy wrote:
 I'm getting the above message when running Peter Holm's stress
 test with INCARNATIONS=150 on a 16-core sparc.  Does this mean
 LOCK_CHILDCOUNT is too low or does it indicate a leak in witness
 lock_list_entry's somewhere?

Most likely the former.

 The comment on LOCK_CHILDCOUNT indicates it is dimensioned to
 allow 2048 threads to hold 5 locks each but it's not clear how
 many threads will get started at a given INCARNATIONS count.

Also, each lle holds a bucket of instances, so the same count
would only let 1024 threads hold 6 locks each perhaps (depending
on the bucket size.. it was originally 4 locks per lle).

-- 
John Baldwin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

witness_lock_list_get: witness exhausted

2011-08-16 Thread Peter Jeremy

I'm getting the above message when running Peter Holm's stress
test with INCARNATIONS=150 on a 16-core sparc.  Does this mean
LOCK_CHILDCOUNT is too low or does it indicate a leak in witness
lock_list_entry's somewhere?

The comment on LOCK_CHILDCOUNT indicates it is dimensioned to
allow 2048 threads to hold 5 locks each but it's not clear how
many threads will get started at a given INCARNATIONS count.

-- 
Peter Jeremy


pgpCkToMhjPEH.pgp
Description: PGP signature

Questions about witness reports on sparc64

2010-03-04 Thread Nikolai Fetissov

Folks,

I'm pretty new here, so feel free to use the clue stick :)

Playing with current- on Sun Fire V210 I see more or less
consistent lock reversal reports like the following:

lock order reversal:
 1st 0xc1c85fb8 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2559
 2nd 0xf80002784600 dirhash (dirhash) @
/usr/src/sys/ufs/ufs/ufs_dirhash.c:285
KDB: stack backtrace:
_witness_debugger() at _witness_debugger+0x38
witness_checkorder() at witness_checkorder+0xcf8
_sx_xlock() at _sx_xlock+0x9c
ufsdirhash_acquire() at ufsdirhash_acquire+0x30
ufsdirhash_add() at ufsdirhash_add+0x4
ufs_direnter() at ufs_direnter+0x75c
ufs_makeinode() at ufs_makeinode+0x4dc
ufs_create() at ufs_create+0x40
VOP_CREATE_APV() at VOP_CREATE_APV+0x108
vn_open_cred() at vn_open_cred+0x1fc
vn_open() at vn_open+0x1c
kern_openat() at kern_openat+0x10c
kern_open() at kern_open+0x18
open() at open+0x14
syscall() at syscall+0x25c
-- syscall (5, FreeBSD ELF64, open) %o7=0x412bfda8 --
userland() at 0x41344808
user trace: trap %o7=0x412bfda8
pc 0x41344808, sp 0x7fd99d1
pc 0x120008, sp 0x7fd9aa1
pc 0x11216c, sp 0x7fd9b61
pc 0x10d32c, sp 0x7fddd11
pc 0x10d9e4, sp 0x7fdde01
pc 0x10cf2c, sp 0x7fdded1
pc 0x10eea4, sp 0x7fddfa1
pc 0x15a0a0, sp 0x7fde061
pc 0x13024c, sp 0x7fde191
pc 0x103c10, sp 0x7fde281
pc 0x4028ef34, sp 0x7fde341
done

That's just on startup, nothing fancy going on.

My questions are - is it worth spamming the list
with these? If so, what other info needed?

Full dmesg bellow.

Thanks,
--
 Nikolai


GDB: no debug ports present
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-CURRENT #0: Sun Feb 28 23:19:48 EST 2010
r...@moon.ipv6.fetissov.org:/usr/obj/usr/src/sys/GENERIC sparc64
WARNING: WITNESS option enabled, expect reduced performance.
real memory  = 4294967296 (4096 MB)
avail memory = 4174233600 (3980 MB)
cpu0: Sun Microsystems UltraSparc-IIIi Processor (1002.00 MHz CPU)
cpu1: Sun Microsystems UltraSparc-IIIi Processor (1002.00 MHz CPU)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
ispfw: registered firmware isp_1000
ispfw: registered firmware isp_1040
ispfw: registered firmware isp_1040_it
ispfw: registered firmware isp_1080
ispfw: registered firmware isp_1080_it
ispfw: registered firmware isp_12160
ispfw: registered firmware isp_12160_it
ispfw: registered firmware isp_2100
ispfw: registered firmware isp_2200
ispfw: registered firmware isp_2300
ispfw: registered firmware isp_2322
ispfw: registered firmware isp_2400
ispfw: registered firmware isp_2400_multi
ispfw: registered firmware isp_2500
ispfw: registered firmware isp_2500_multi
kbd0 at kbdmux0
nexus0: Open Firmware Nexus device
nexus0: memory-controller mem 0x400-0x407 type
memory-controller (no driver attached)
nexus0: memory-controller mem 0x480-0x487 type
memory-controller (no driver attached)
pcib0: Sun Host-PCI bridge mem
0x4000ff0-0x4000ff0afff,0x4000fc1-0x4000fc1701f,0x7f6-0x7f600ff,0x4000ff8-0x4000ff8
irq 2035,2032,2033,2036,2019 on nexus0
pcib0: Tomatillo, version 4, IGN 0x1f, bus B, 66MHz
pcib0: DVMA map: 0xc000 to 0xdfff 65536 entries
pcib0: [FILTER]
pcib0: [FILTER]
pcib0: [FILTER]
pcib0: [FILTER]
pci0: OFW PCI bus on pcib0
bge0: Broadcom BCM5704 A3, ASIC rev. 0x002003 mem
0x20-0x20,0x11-0x11 at device 2.0 on pci0
miibus0: MII bus on bge0
brgphy0: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
bge0: Ethernet address: 00:03:ba:45:87:2d
bge0: [ITHREAD]
bge1: Broadcom BCM5704 A3, ASIC rev. 0x002003 mem
0x40-0x40,0x12-0x12 at device 2.1 on pci0
miibus1: MII bus on bge1
brgphy1: BCM5704 10/100/1000baseTX PHY PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
bge1: Ethernet address: 00:03:ba:45:87:2e
bge1: [ITHREAD]
pcib1: Sun Host-PCI bridge mem
0x4000f60-0x4000f60afff,0x4000f41-0x4000f41701f,0x7fe-0x7fe00ff,0x4000f78-0x4000f78
irq 1970,1968,1969,1972,1953 on nexus0
pcib1: Tomatillo, version 4, IGN 0x1e, bus A, 33MHz
pcib1: DVMA map: 0xc000 to 0xdfff 65536 entries
pcib1: [FILTER]
pci1: OFW PCI bus on pcib1
isab0: PCI-ISA bridge at device 7.0 on pci1
isa0: ISA bus on isab0
pci1: old, non-VGA display device at device 6.0 (no driver attached)
ohci0: AcerLabs M5237 (Aladdin-V) USB controller mem 0x100-0x1000fff
at device 10.0 on pci1
ohci0: [ITHREAD]
usbus0: AcerLabs M5237 (Aladdin-V) USB controller on ohci0
atapci0: AcerLabs M5229 UDMA100 controller port
0x900-0x907,0x918-0x91b,0x910-0x917,0x908-0x90b,0x920-0x92f at device 13.0
on pci1
atapci0: [ITHREAD]
atapci0: using

Re: Questions about witness reports on sparc64

2010-03-04 Thread Scot Hetzel

On Thu, Mar 4, 2010 at 7:19 PM, Nikolai Fetissov
nikolai-free...@fetissov.org wrote:
 Folks,

 I'm pretty new here, so feel free to use the clue stick :)

 Playing with current- on Sun Fire V210 I see more or less
 consistent lock reversal reports like the following:

:

 My questions are - is it worth spamming the list
 with these? If so, what other info needed?

It is worth sending it to the list and sending a report as outlined here:

http://sources.zabbadoz.net/freebsd/lor.html

Scot
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

patch to let witness monitor the mtx pool

2003-06-24 Thread Don Lewis

I've been running with the patch below for a little while now.  It
helped me find a situation where a thread attemped to grab a pool
mutex while it already held one, which I suspect could have caused a
deadlock in certain circumstances.  In any case, this was illegal
because these mutexes are only supposed be used as leaf mutexes.

I had to patch kern_sx.c to quiet warnings about lock order reversal
that look bogus to me.  I believe the way that kern_sx.c uses the pool
mutexes is safe.

Index: sys/kern/kern_mtxpool.c
===
RCS file: /home/ncvs/src/sys/kern/kern_mtxpool.c,v
retrieving revision 1.7
diff -u -r1.7 kern_mtxpool.c
--- sys/kern/kern_mtxpool.c 13 Jun 2003 19:39:21 -  1.7
+++ sys/kern/kern_mtxpool.c 20 Jun 2003 06:36:55 -
@@ -33,6 +33,7 @@
 #include sys/lock.h
 #include sys/malloc.h
 #include sys/mutex.h
+#include sys/sysctl.h
 #include sys/systm.h
 
 #ifndef MTX_POOL_SIZE
@@ -44,6 +45,12 @@
 
 int mtx_pool_valid = 0;
 
+int witness_skip_mtx_pool = 0;
+
+TUNABLE_INT(debug.witness_skip_mtx_pool, witness_skip_mtx_pool);
+SYSCTL_INT(_debug, OID_AUTO, witness_skip_mtx_pool, CTLFLAG_RD,
+witness_skip_mtx_pool, 0, );
+
 /*
  * Inline version of mtx_pool_find(), used to streamline our main API
  * function calls.
@@ -60,11 +67,13 @@
 static void
 mtx_pool_setup(void *dummy __unused)
 {
-int i;
+int i, mtx_opts;
 
+mtx_opts = MTX_DEF | MTX_QUIET;
+if (witness_skip_mtx_pool)
+   mtx_opts |= MTX_NOWITNESS;
 for (i = 0; i  MTX_POOL_SIZE; ++i)
-   mtx_init(mtx_pool_ary[i], pool mutex, NULL,
-   MTX_DEF | MTX_NOWITNESS | MTX_QUIET);
+   mtx_init(mtx_pool_ary[i], pool mutex, NULL, mtx_opts);
 mtx_pool_valid = 1;
 }
 
Index: sys/kern/kern_sx.c
===
RCS file: /home/ncvs/src/sys/kern/kern_sx.c,v
retrieving revision 1.19
diff -u -r1.19 kern_sx.c
--- sys/kern/kern_sx.c  11 Jun 2003 00:56:56 -  1.19
+++ sys/kern/kern_sx.c  19 Jun 2003 20:28:19 -
@@ -126,9 +126,8 @@
sx-sx_cnt++;
 
LOCK_LOG_LOCK(SLOCK, sx-sx_object, 0, 0, file, line);
-   WITNESS_LOCK(sx-sx_object, 0, file, line);
-
mtx_unlock(sx-sx_lock);
+   WITNESS_LOCK(sx-sx_object, 0, file, line);
 }
 
 int
@@ -139,8 +138,8 @@
if (sx-sx_cnt = 0) {
sx-sx_cnt++;
LOCK_LOG_TRY(SLOCK, sx-sx_object, 0, 1, file, line);
-   WITNESS_LOCK(sx-sx_object, LOP_TRYLOCK, file, line);
mtx_unlock(sx-sx_lock);
+   WITNESS_LOCK(sx-sx_object, LOP_TRYLOCK, file, line);
return (1);
} else {
LOCK_LOG_TRY(SLOCK, sx-sx_object, 0, 0, file, line);
@@ -180,9 +179,8 @@
sx-sx_xholder = curthread;
 
LOCK_LOG_LOCK(XLOCK, sx-sx_object, 0, 0, file, line);
-   WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE, file, line);
-
mtx_unlock(sx-sx_lock);
+   WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE, file, line);
 }
 
 int
@@ -194,9 +192,9 @@
sx-sx_cnt--;
sx-sx_xholder = curthread;
LOCK_LOG_TRY(XLOCK, sx-sx_object, 0, 1, file, line);
+   mtx_unlock(sx-sx_lock);
WITNESS_LOCK(sx-sx_object, LOP_EXCLUSIVE | LOP_TRYLOCK, file,
line);
-   mtx_unlock(sx-sx_lock);
return (1);
} else {
LOCK_LOG_TRY(XLOCK, sx-sx_object, 0, 0, file, line);

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]

fun with WITNESS and pool mutex

2003-06-18 Thread Don Lewis

When I was attempting to debug a system deadlock problem where the
culprit process was sleeping on a pool mutex, I noticed that show
witness in ddb doesn't report anything about this particular mutex
flavor.  I discovered that witness doesn't monitor these mutexes because
mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag.

These are a mutexes bit special, because they are supposed to be leaf
mutexes and no other mutexes should be grabbed after them.  The deadlock
in question caused me to discover a violation of this restriction, so I
wondered if there were more problems of this type in the code.  I
suspected there would be, since there haven't been any automatic checks
of to verify that these mutexes are being used correctly.

Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
and quickly found the first violation during the boot sequence:

Mounting root from ufs:/dev/da0s1a
acquiring duplicate lock of same type: pool mutex
 1st pool mutex @ /usr/src/sys/kern/vfs_syscalls.c:736
 2nd pool mutex @ /usr/src/sys/kern/kern_lock.c:598
Stack backtrace:
backtrace(c051f4df,c051c041,c051adcc,256,c05e4808) at backtrace+0x17
witness_lock(c05e0cf0,8,c051adcc,256,c05e2ae0) at witness_lock+0x697
_mtx_lock_flags(c05e0cf0,0,c051adcc,256,c641a248) at _mtx_lock_flags+0xb1
lockstatus(c641a304,0,e4b1ab24,c036f2e8,e4b1ab44) at lockstatus+0x3c
vop_stdislocked(e4b1ab44,e4b1ab30,c0464f78,e4b1ab44,e4b1ab58) at vop_stdislocked
+0x21
vop_defaultop(e4b1ab44,e4b1ab58,c0375a67,e4b1ab44,c05e4808) at vop_defaultop+0x1
8
ufs_vnoperate(e4b1ab44,c05e4808,c05740ac,c05b67e0,c641a248) at ufs_vnoperate+0x1
8
assert_vop_locked(c641a248,c05197fc,c05b74e0,c641a248,0) at assert_vop_locked+0x
47
VOP_GETVOBJECT(c641a248,0,c0524224,2e0,c05740ac) at VOP_GETVOBJECT+0x3f
kern_open(c61c64c0,bfbff6b0,0,1,0) at kern_open+0x44f
open(c61c64c0,e4b1ad10,c0537dc3,3fd,3) at open+0x30
syscall(2f,2f,2f,1,0) at syscall+0x26e
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (5), eip = 0x80529ef, esp = 0xbfbff16c, ebp = 0xbfbff208 ---


The code in question:

FILEDESC_LOCK(fdp);
FILE_LOCK(fp);
if (fp-f_count == 1) {
KASSERT(fdp-fd_ofiles[indx] != fp,
(Open file descriptor lost all refs));
FILEDESC_UNLOCK(fdp);
FILE_UNLOCK(fp);
VOP_UNLOCK(vp, 0, td);
vn_close(vp, flags  FMASK, fp-f_cred, td);   
fdrop(fp, td);
td-td_retval[0] = indx;
return 0;
}
  
/* assert that vn_open created a backing object if one is needed */
KASSERT(!vn_canvmio(vp) || VOP_GETVOBJECT(vp, NULL) == 0,
(open: vmio vnode has no backing object after vn_open));

fp-f_data = vp;
fp-f_flag = flags  FMASK;
fp-f_ops = vnops;
fp-f_type = (vp-v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
FILEDESC_UNLOCK(fdp);
FILE_UNLOCK(fp);

This one appears to be easily fixable by moving the second KASSERT down
a few lines to below the FILE_UNLOCK() call.

Any bets on how many other potential deadlock problems there are in the
tree?
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: fun with WITNESS and pool mutex

2003-06-18 Thread Don Lewis

On 18 Jun, I wrote:
 When I was attempting to debug a system deadlock problem where the
 culprit process was sleeping on a pool mutex, I noticed that show
 witness in ddb doesn't report anything about this particular mutex
 flavor.  I discovered that witness doesn't monitor these mutexes because
 mtx_pool_setup() calls mtx_init with the MTX_NOWITNESS flag.
 
 These are a mutexes bit special, because they are supposed to be leaf
 mutexes and no other mutexes should be grabbed after them.  The deadlock
 in question caused me to discover a violation of this restriction, so I
 wondered if there were more problems of this type in the code.  I
 suspected there would be, since there haven't been any automatic checks
 of to verify that these mutexes are being used correctly.
 
 Just for grins, I removed the MTX_NOWITNESS flag from mtx_pool_setup()
 and quickly found the first violation during the boot sequence:

[ snip - I committed a patch ]

 Any bets on how many other potential deadlock problems there are in the
 tree?

The only problems I've found so far are in fdrop_locked() and
kern_open(), so things might not be as bleak as I initially feared.

I also got this LOR message from witness about the sx lock code:

lock order reversal
 1st 0xc05e1020 pool mutex (pool mutex) @ /usr/src/sys/kern/kern_sx.c:111
 2nd 0xc05dfa00 module subsystem sx lock (module subsystem sx lock) @ /usr/src/s
ys/kern/kern_module.c:126

I *think* this is actually a safe use of pool mutex.  What would be the
best way to quite the complaint?  The two possibilities that I can think
of are to handle this as a special case in the witness code or to
slightly rearrange the code in sx_lock.c to swap the order of the
WITNESS_LOCK() and mtx_unlock() calls in _sx*_lock().


___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to [EMAIL PROTECTED]

2 tcp/ip witness warnings

2003-03-14 Thread Alexey Zelkin

hi,

Expirementing today with netgraph sockets hit into these two cases:

1:

lock order reversal
 1st 0xc27658b4 inp (inp) @ /home/phantom/src5/sys/netinet/tcp_input.c:640
 2nd 0xc031832c tcp (tcp) @ /home/phantom/src5/sys/netinet/tcp_usrreq.c:621
Stack backtrace:
backtrace(c02e9301,c031832c,c02eef23,c02eef23,c02f0154) at backtrace+0x17
witness_lock(c031832c,8,c02f0154,26d,0) at witness_lock+0x66d
_mtx_lock_flags(c031832c,0,c02f0154,26d,0) at _mtx_lock_flags+0xb1
tcp_usr_rcvd(c273a200,80,c02e8f74,c294db00,3b9aca00) at tcp_usr_rcvd+0x30
soreceive(c273a200,0,cd3259f0,cd3259e8,0) at soreceive+0x89a
ng_ksocket_incoming2(c2960500,0,c273a200,4,c26975d8) at ng_ksocket_incoming2+0x2
65
ng_apply_item(c2960500,c26975c0,c294bcee,7d7,c02e8f74) at ng_apply_item+0x4af
ng_snd_item(c26975c0,0,cd325adc,c2960500,c273a24c) at ng_snd_item+0x614
ng_send_fn(c2960500,0,c2549ac0,c273a200,4) at ng_send_fn+0x1b7
ng_ksocket_incoming(c273a200,c2960500,4,c0ee7168,0) at ng_ksocket_incoming+0x2f
sowakeup(c273a200,c273a24c,c02ef894,822,5000) at sowakeup+0x97
tcp_input(c0ee7100,14,c01acbbd,c033b500,1) at tcp_input+0x22bb
ip_input(c0ee7100,0,c02ee77d,e9,c0eb99c0) at ip_input+0x7d6
swi_net(0,0,c02e43b0,217,c0ec35f4) at swi_net+0x112
ithread_loop(c0ec2100,cd325d48,c02e4219,35f,5ed54) at ithread_loop+0x182
fork_exit(c01a3b40,c0ec2100,cd325d48) at fork_exit+0xc4
fork_trampoline() at fork_trampoline+0x1a

2:

malloc() of 16 with the following non-sleepablelocks held:
exclusive sleep mutex inp r = 0 (0xc2765194) locked @ /home/phantom/src5/sys/net
inet/tcp_input.c:640
exclusive sleep mutex tcp r = 0 (0xc031832c) locked @ /home/phantom/src5/sys/net
inet/tcp_usrreq.c:480
exclusive sleep mutex netisr lock r = 0 (0xc0360260) locked @ /home/phantom/src5
/sys/net/netisr.c:215

db tr
Debugger(c02d6d9c,cd32587c,3,cd32587c,c0313240) at Debugger+0x54
witness_warn(5,0,c02f6550,c02dc43f,246) at witness_warn+0x1a0
uma_zalloc_arg(c083aa20,0,102,1f6,c02f0154) at uma_zalloc_arg+0x53
malloc(10,c0313240,102,c2760720,cd325904) at malloc+0xd4
in_sockaddr(0,cd3258f4,c02f0154,1f6,0) at in_sockaddr+0x29
tcp_usr_accept(c273a100,cd32593c,cd32594c,c254a013,c273a100) at tcp_usr_accept+0x13a
soaccept(c273a100,cd32593c,132,64,4) at soaccept+0x3f
ng_ksocket_finish_accept(c2a9fd00,0,7fd,0,cd325988) at ng_ksocket_finish_accept+0x73
ng_ksocket_incoming2(c294d500,0,c273be00,4,c2943d58) at ng_ksocket_incoming2+0x1e0
ng_apply_item(c294d500,c2943d40,c2981cee,7d7,c02e8f74) at ng_apply_item+0x4af
ng_snd_item(c2943d40,0,c02f0154,c294d500,c273be4c) at ng_snd_item+0x614
ng_send_fn(c294d500,0,c2549ac0,c273be00,4) at ng_send_fn+0x1b7
ng_ksocket_incoming(c273be00,c294d500,4,c273a100,c273be00) at ng_ksocket_incoming+0x2f
sowakeup(c273be00,c273be4c,0,c298dbb0,cd325b74) at sowakeup+0x97
sonewconn(c273be00,2,c033c378,c0ec4c9c,d71d0300) at sonewconn+0x199
syncache_socket(c298dbb0,c273be00,c0ed2e00,c267b528,c27601c8) at syncache_socket+0x1f
syncache_expand(cd325b74,c0ed2e68,cd325b70,c0ed2e00,5000) at syncache_expand+0x98
tcp_input(c0ed2e00,14,c01acbbd,c033b500,1) at tcp_input+0x676
ip_input(c0ed2e00,0,c02ee77d,e9,c0eb99c0) at ip_input+0x7d6
swi_net(0,0,c02e43b0,217,c0ec35f4) at swi_net+0x112
ithread_loop(c0ec2100,cd325d48,c02e4219,35f,5ed54) at ithread_loop+0x182
fork_exit(c01a3b40,c0ec2100,cd325d48) at fork_exit+0xc4
fork_trampoline() at fork_trampoline+0x1a
--- trap 0x1, eip = 0, esp = 0xcd325d7c, ebp = 0 ---


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re:2 tcp/ip witness warnings

2003-03-14 Thread Jeffrey Hsu

Thanks for reporting these.  NFS has this same problem with sowakeup().
You can find an old patch I made for this at
http://www.freebsd.org/~hsu/hammer.diff
It takes a big hammer to the problem and is less than elegant, so I
haven't committed it.  I hope to have time to address this problem
a better way some time in the future, after some other pressing locking
issues are fixed.

Jeffrey


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

witness needs DDB

2003-03-08 Thread David Xu

subr_witness.c now needs DDB option enabled, otherwise can not
be compiled. please fix it.

David Xu


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re: Witness problem with sound

2003-03-07 Thread Jun Su

On Wednesday 05 March 2003 07:42, Pete Carah wrote:
 I don't know how system-specific this problem is, but:

 Sony VAIO R505ES
 Sound is Intel ICH3 + Yamaha.

 This or something closely related has been happening for weeks.
 Several times earlier this week and last week sound panic'd, and
 also sometimes there was a panic (several different kinds) on boot.
 Late last week X wouldn't start due to not being able to see the
 VESA modes.
 All those except the sound problems currently appear fixed...

 This may or may not be related to the fact that acpi puts nearly all
 device interrupts on irq 9 (which causes other problems).


 There are other problems (mostly with pcmcia, which is left out of
 this kernel; it causes hangs on init), but this one is obvious and
 does leave the system running.

 A remaining one that bothers multi-type usb card readers is that CAM/XPT
 only scans LUN 0 for any device it finds.  All my card readers show
 as only CF because of this, and I have to give 4 camcontrol commands to
 get the system to see the rest (at least it does work with manual
 intervention!!).

 -- Pete

 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message

I got the same error message from my R505DC. And the sound also doesn;t work 
correctly. 

Jun Su

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

RE: witness: nfs buf queue

2003-03-06 Thread John Baldwin


On 06-Mar-2003 Jonathan Lemon wrote:
 Doing a kernel build over NFS on today's -current gives a pile of
 following error messages during the final link phase:
 
 Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
 exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
 ../../../kern/vfs_bio.c:2107
 Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
 exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
 ../../../kern/vfs_bio.c:2107
 Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
 exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
 ../../../kern/vfs_bio.c:2107
 ...

Witness didn't used to complain about these until my recent commits,
so these could be old bugs that we are just now seeing.  It does look
like all the lock functions in inmem() use LK_NOWAIT which is exempted
from the witness check:

if ((flags  (LK_NOWAIT|LK_RELEASE)) == 0)
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK,
lkp-lk_interlock-mtx_object,
Acquiring lockmgr lock \%s\, lkp-lk_wmesg);

A stack trace from one of these would be helpful.

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re: witness: nfs buf queue

2003-03-06 Thread Jonathan Lemon

On Thu, Mar 06, 2003 at 01:01:49PM -0500, John Baldwin wrote:
 
 On 06-Mar-2003 Jonathan Lemon wrote:
  Doing a kernel build over NFS on today's -current gives a pile of
  following error messages during the final link phase:
  
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  ...
 
 Witness didn't used to complain about these until my recent commits,
 so these could be old bugs that we are just now seeing.  It does look
 like all the lock functions in inmem() use LK_NOWAIT which is exempted
 from the witness check:
 
 if ((flags  (LK_NOWAIT|LK_RELEASE)) == 0)
 WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK,
 lkp-lk_interlock-mtx_object,
 Acquiring lockmgr lock \%s\, lkp-lk_wmesg);
 
 A stack trace from one of these would be helpful.

Here you go.  This is from -current as of roughly an hour ago, I managed
to break into ddb in the middle of witness_warn:

kvprintf(c03918eb,c0210c20,e4050bfc,a,e4050c48) at kvprintf+0x8d
vprintf(c03918eb,e4050c48,0,c0428620,10001) at vprintf+0x57
witness_warn(5,c0400da8,c03918eb,c039fff9,c6a752d0) at witness_warn+0xbe
lockmgr(ca438304,10001,ca438248,c6a752d0,12) at lockmgr+0xc8
vop_sharedlock(e4050c98,0,c039a8b0,35c,c01eb1f2) at vop_sharedlock+0x7d
vn_lock(ca438248,12,c6a752d0,83b,c04285e0) at vn_lock+0xeb
flushbufqueues(c04285e0,0,c0398a32,104,64) at flushbufqueues+0x100
buf_daemon(0,e4050d48,c0390fdb,35f,0) at buf_daemon+0xd5
fork_exit(c0239920,0,e4050d48) at fork_exit+0xc4
fork_trampoline() at fork_trampoline+0x1a
--- trap 0x1, eip = 0, esp = 0xe4050d7c, ebp = 0 ---

No gdb stack trace, my dump device is too small.  :-(
--
Jonathan

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

Re: witness: nfs buf queue

2003-03-06 Thread John Baldwin


On 06-Mar-2003 Jonathan Lemon wrote:
 On Thu, Mar 06, 2003 at 01:01:49PM -0500, John Baldwin wrote:
 
 On 06-Mar-2003 Jonathan Lemon wrote:
  Doing a kernel build over NFS on today's -current gives a pile of
  following error messages during the final link phase:
  
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
  exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
  ../../../kern/vfs_bio.c:2107
  ...
 
 Witness didn't used to complain about these until my recent commits,
 so these could be old bugs that we are just now seeing.  It does look
 like all the lock functions in inmem() use LK_NOWAIT which is exempted
 from the witness check:
 
 if ((flags  (LK_NOWAIT|LK_RELEASE)) == 0)
 WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK,
 lkp-lk_interlock-mtx_object,
 Acquiring lockmgr lock \%s\, lkp-lk_wmesg);
 
 A stack trace from one of these would be helpful.
 
 Here you go.  This is from -current as of roughly an hour ago, I managed
 to break into ddb in the middle of witness_warn:
 
 kvprintf(c03918eb,c0210c20,e4050bfc,a,e4050c48) at kvprintf+0x8d
 vprintf(c03918eb,e4050c48,0,c0428620,10001) at vprintf+0x57
 witness_warn(5,c0400da8,c03918eb,c039fff9,c6a752d0) at witness_warn+0xbe
 lockmgr(ca438304,10001,ca438248,c6a752d0,12) at lockmgr+0xc8

10001 = LK_INTERLOCK | LK_SHARED, note no LK_NOWAIT flag, hence the bug. :(

 vop_sharedlock(e4050c98,0,c039a8b0,35c,c01eb1f2) at vop_sharedlock+0x7d

The bug seems to be that vop_sharedlock doesn't honor LK_NOWAIT or any
other external lockmgr flags.

Untested possible patch below.  I'd really like jeffr's comments on it
though:

Index: vfs_default.c
===
RCS file: /usr/cvs/src/sys/kern/vfs_default.c,v
retrieving revision 1.75
diff -u -r1.75 vfs_default.c
--- vfs_default.c   3 Mar 2003 23:37:50 -   1.75
+++ vfs_default.c   6 Mar 2003 20:00:40 -
@@ -445,8 +445,7 @@
default:
panic(vop_sharedlock: bad operation %d, flags  LK_TYPE_MASK);
}
-   if (flags  LK_INTERLOCK)
-   vnflags |= LK_INTERLOCK;
+   vnflags |= flags  (LK_INTERLOCK | LK_EXTFLG_MASK);
 #ifndefDEBUG_LOCKS
return (lockmgr(vp-v_vnlock, vnflags, VI_MTX(vp), ap-a_td));
 #else
@@ -503,8 +502,7 @@
default:
panic(vop_nolock: bad operation %d, flags  LK_TYPE_MASK);
}
-   if (flags  LK_INTERLOCK)
-   vnflags |= LK_INTERLOCK;
+   vnflags |= flags  (LK_INTERLOCK | LK_EXTFLG_MASK);
return(lockmgr(vp-v_vnlock, vnflags, VI_MTX(vp), ap-a_td));
 #else /* for now */
/*

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

witness: nfs buf queue

2003-03-05 Thread Jonathan Lemon

Doing a kernel build over NFS on today's -current gives a pile of
following error messages during the final link phase:

Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
../../../kern/vfs_bio.c:2107
Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
../../../kern/vfs_bio.c:2107
Acquiring lockmgr lock nfs with the following non-sleepablelocks held:
exclusive sleep mutex buf queue lock r = 0 (0xc0427b60) locked @ 
../../../kern/vfs_bio.c:2107
...

--
Jonathan

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

WITNESS panic in netinet/tcp_input.c

2003-03-05 Thread Sean Kelly

It seems like I'm being handed kernel panics on a platter today. I just got
a WITNESS-related one in /usr/src/sys/netinet/tcp_input.c:2190. And
actually, in the process of writing this message the first time, it
happened again. I've since booted to an older kernel.

The kernel this comes from is 5.0-CURRENT from Tue Mar 4 20:30:35 CST 2003.

Soft updates appears to have turned my public PGP keyring file into a bunch
of NULs too : I assume that is an artifact of the kernel panic.

Script started on Wed Mar  5 19:51:06 2003
edgemaster# gdb -k /boot/kernel/kernel.debug vmcore.4 
GNU gdb 5.2.1 (FreeBSD)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-undermydesk-freebsd...
panic: bwrite: buffer is not busy???
panic messages:
---
panic: lock (sleep mutex) tcp not locked @ /usr/src/sys/netinet/tcp_input.c:2190
Stack backtrace:

syncing disks, buffers remaining... panic: bwrite: buffer is not busy???
Uptime: 20m40s
Dumping 1279 MB
ata1: resetting devices ..
done
[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to 
abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort] [CTRL-C 
to abort]  16 32 48[CTRL-C to abort] [CTRL-C to abort] [CTRL-C to abort]  64 80 96 112 
128 144 160 176 192 208 224 240 256 272 288 304 320 336 352 368 384 400 416 432 448 
464 480 496 512 528 544 560 576 592 608 624 640 656 672 688 704 720 736 752 768 784 
800 816 832 848 864 880 896 912 928 944 960 976 992 1008 1024 1040 1056 1072 1088 1104 
1120 1136 1152 1168 1184 1200 1216 1232 1248 1264
---
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
239 dumping++;
(kgdb) bt full
#0  doadump () at /usr/src/sys/kern/kern_shutdown.c:239
No locals.
#1  0xc01cd66a in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:371
No locals.
#2  0xc01cd8d3 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
td = (struct thread *) 0xc281aa50
bootopt = 260
newpanic = 0
buf = bwrite: buffer is not busy???\0ked @ 
/usr/src/sys/netinet/tcp_input.c:2190, '\0' repeats 182 times
#3  0xc020e142 in bwrite (bp=0xd30a4778) at /usr/src/sys/kern/vfs_bio.c:795
oldflags = 537002148
newbp = (struct buf *) 0xc767d5b4
#4  0xc020fe5c in vfs_bio_awrite (bp=0xd30a4778)
at /usr/src/sys/kern/vfs_bio.c:1692
i = 1
j = 0
lblkno = 0
vp = (struct vnode *) 0xc767d5b4
ncl = 0
nwritten = 16384
size = 16384
maxcl = 8
#5  0xc02c0dca in ffs_fsync (ap=0xdf976a00)
at /usr/src/sys/ufs/ffs/ffs_vnops.c:257
vp = (struct vnode *) 0xc767d5b4
ip = (struct inode *) 0xd30a4778
bp = (struct buf *) 0xd30a4778
nbp = (struct buf *) 0x0
error = 0
wait = 0
passes = 4
skipmeta = 0
lbn = 1
#6  0xc02bff1e in ffs_sync (mp=0xc697a000, waitfor=2, cred=0xc2806e80, 
td=0xc037f6a0) at vnode_if.h:612
nvp = (struct vnode *) 0xc767d490
vp = (struct vnode *) 0xc767d5b4
devvp = (struct vnode *) 0xc767d5b4
ip = (struct inode *) 0x0
ump = (struct ufsmount *) 0xc699b300
fs = (struct fs *) 0xc697
error = 0
count = 0
wait = 0
lockreq = 18
allerror = 0
#7  0xc022261b in sync (td=0xc037f6a0, uap=0x0)
at /usr/src/sys/kern/vfs_syscalls.c:138
mp = (struct mount *) 0xc697a000
nmp = (struct mount *) 0x0
asyncflag = 0
#8  0xc01cd29c in boot (howto=256) at /usr/src/sys/kern/kern_shutdown.c:280
bp = (struct buf *) 0x0
iter = -1031689648
nbusy = -1031700352
pbusy = -1031689648
subiter = -1031700352
#9  0xc01cd8d3 in panic () at /usr/src/sys/kern/kern_shutdown.c:542
td = (struct thread *) 0xc281aa50
bootopt = 256
newpanic = 1
buf = bwrite: buffer is not busy???\0ked @ 
/usr/src/sys/netinet/tcp_input.c:2190, '\0' repeats 182 times
#10 0xc01ee013 in witness_unlock (lock=0xc038f9cc, flags=8, 
file=0xc035a374 /usr/src/sys/netinet/tcp_input.c, line=2190)
at /usr/src/sys/kern/subr_witness.c:951
lock_list = (struct lock_list_entry **) 0xc03e3540
instance = (struct lock_instance *) 0xc03e3554
class = (struct lock_class *) 0xc0384160
s = 1664
i = 0
j = -1070007860
#11 0xc01c4952 in _mtx_unlock_flags (m=0xc03e3554, opts=0, 
file=0xc038f9cc `A8À\a\2125À\a\2125À, line=-1069664960)
at /usr/src/sys/kern/kern_mutex.c:357
No locals.
#12 0xc0255ec9 in tcp_input (m=0xc038f9cc, off0=20)
at /usr/src/sys/netinet/tcp_input.c:2324
th = (struct tcphdr *) 0xc34c1824
ip = (struct ip *) 0xc34c1810
ipov = (struct ipovly

RE: witness_get: witness exhausted?

2003-03-04 Thread John Baldwin


On 03-Mar-2003 Andrew Gallatin wrote:
 
 I'm developing a character driver which tracks a lot of state on a
 per-open basis.  I've got several mutexes in there which are
 initialzed at open, and destroyed at close.  After a few
 dozen opens, witness seems to croak with:
 
 witness_get: witness exhausted
 
 Am I leaking something?  Or is the witness code?  I looked at
 subr_witness.c, and I don't see witness_free() being called from
 witness_destroy().  There's probably some design constraint that
 I don't understand.

Unfortunately dead witnesses may still be stuck in the lock order
hierarchy and I haven't figured out yet how to properly handle the
case of free'ing a witness structure from the tree while preserving
the correct lock orders.  You can try
http://www.freebsd.org/~jhb/patches/witness.patch  but I don't think
it will help with your specific case.

 FWIW, Witness (and the FreeBSD debugging environment in general) is
 why I've gotten approval to co-develop this driver on FreeBSD (in
 addition to linux).  Its already caught several locking bugs.

Glad to hear it is at least working in some cases. :)

-- 

John Baldwin [EMAIL PROTECTED]http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

RE: witness_get: witness exhausted?

2003-03-04 Thread Andrew Gallatin


John Baldwin writes:
  
  Unfortunately dead witnesses may still be stuck in the lock order
  hierarchy and I haven't figured out yet how to properly handle the
  case of free'ing a witness structure from the tree while preserving
  the correct lock orders.  You can try

Ah, I think I see.  Thanks for the info.

  http://www.freebsd.org/~jhb/patches/witness.patch  but I don't think
  it will help with your specific case.

It would probably help a little, but not very much..

Thanks,

Drew


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message

1 2 >

1 - 100 of 187 matches

Mail list logo