Hi,
On my ThinkPad T430s I am trying to debug multithreaded qemu by
attaching gdb. This crashes the kernel of the host system within
a few minutes. Luckily I managed to attach a serial over LAN
console with Intel AMT.
login: panic: kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0"
failed: file "../../../../kern/kern_lock.c", line 126
Stopped at Debugger+0x5: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0}> trace
Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x86
sys_sigsuspend() at sys_sigsuspend+0x46
syscall() at syscall+0x249
--- syscall (number 111) ---
end of kernel
end trace frame: 0x7fe55fdbef0, count: -9
0x7fe50c4cdcc:
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 28380   5201   5983   1000  3   0x4100080  thrsleep      qemu-system-x86_
* 5825   5201   5983   1000  7   0xc100088  pause         qemu-system-x86_
 18891   5201   5983   1000  3   0xc100080  sigwait       qemu-system-x86_
  5983   5201   5983   1000  3   0x8000080  thrsleep      qemu-system-x86_
 19446  22621  19446   1000  3        0x80  poll          gdb
  5201  12983   5201   1000  3        0x80  wait          gdb
The kernel lock is acquired in mi_syscall() as sys_sigsuspend() needs
it. tsleep() calls sleep_setup() which acquires the sched lock.
Then sleep_setup_signal() calls issignal() via the macro CURSIG().
The function issignal() is full of side effects, especially for a
traced process.
There the kernel lock is acquired again, which should be fine as
it is a recursive lock. But to avoid deadlocks, _kernel_lock()
asserts that it is acquired before the sched lock. This check is too
strict: the condition only has to hold when the lock is taken for
the first time.
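
To illustrate the ordering argument, here is a minimal userland sketch.
It is not kernel code: the toy_* names are hypothetical, and the locks
are modelled as plain recursion counters without any CPUs or atomics.
It replays the sequence mi_syscall() -> sleep_setup() -> issignal() and
compares a check on every acquisition with a check on the first
acquisition only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): a recursive lock is just a recursion depth. */
struct toy_lock {
	int depth;		/* 0 means not held */
};

static struct toy_lock toy_kernel_lock;
static struct toy_lock toy_sched_lock;

static bool
toy_lock_held(const struct toy_lock *l)
{
	return l->depth > 0;
}

static void
toy_lock_enter(struct toy_lock *l)
{
	l->depth++;
}

/*
 * Strict variant: the ordering is checked on every acquisition.
 * Returns false where the real kernel would panic.
 */
static bool
toy_kernel_lock_strict(void)
{
	if (toy_lock_held(&toy_sched_lock))
		return false;
	toy_lock_enter(&toy_kernel_lock);
	return true;
}

/*
 * Relaxed variant: the ordering is checked only when the kernel lock
 * is not yet held, i.e. on the first acquisition.
 */
static bool
toy_kernel_lock_relaxed(void)
{
	if (!toy_lock_held(&toy_kernel_lock) &&
	    toy_lock_held(&toy_sched_lock))
		return false;
	toy_lock_enter(&toy_kernel_lock);
	return true;
}

/* Replay: kernel lock, then sched lock, then kernel lock again. */
static bool
toy_replay(bool (*relock)(void))
{
	toy_kernel_lock.depth = 0;
	toy_sched_lock.depth = 0;

	toy_lock_enter(&toy_kernel_lock);	/* as in mi_syscall() */
	toy_lock_enter(&toy_sched_lock);	/* as in sleep_setup() */
	assert(toy_lock_held(&toy_sched_lock));

	return relock();			/* as in issignal() */
}
```

Calling toy_replay(toy_kernel_lock_strict) reports a failure, while
toy_replay(toy_kernel_lock_relaxed) succeeds. The relaxed variant still
rejects taking the kernel lock for the first time while the sched lock
is held, so the deadlock protection is kept.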
Index: kern/kern_lock.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.42
diff -u -p -u -p -r1.42 kern_lock.c
--- kern/kern_lock.c 6 May 2013 16:37:55 -0000 1.42
+++ kern/kern_lock.c 11 Aug 2013 01:54:06 -0000
@@ -123,7 +123,10 @@ _kernel_lock_init(void)
void
_kernel_lock(void)
{
- SCHED_ASSERT_UNLOCKED();
+#ifdef DIAGNOSTIC
+ if (__mp_lock_held(&kernel_lock) == 0)
+ SCHED_ASSERT_UNLOCKED();
+#endif /* DIAGNOSTIC */
__mp_lock(&kernel_lock);
}
Unfortunately this fix does not solve my problem. With the diff
applied I get a different panic:
login: panic: wakeup: p_stat is 7
Stopped at Debugger+0x5: leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{3}> trace
Debugger() at Debugger+0x5
panic() at panic+0xee
wakeup_n() at wakeup_n+0xfd
sys___thrwakeup() at sys___thrwakeup+0x54
syscall() at syscall+0x249
--- syscall (number 301) ---
end of kernel
end trace frame: 0x684cb9237c0, count: -5
0x684bf834c2a:
   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 10959  11922  10959   1000  3        0x80  wait          gdb
*11287  10959  10043   1000  7   0xc100000                qemu-system-x86_
 11131  10959  10043   1000  3   0xc100080  sigwait       qemu-system-x86_
 10043  10959  10043   1000  7   0x8000000                qemu-system-x86_
I will investigate further.
bluhm