Hi,

On my ThinkPad T430s I am trying to debug multithreaded qemu by
attaching gdb.  This crashes the kernel of the host system within
a few minutes.  Luckily I managed to attach a serial over lan with
Intel AMT.

login: panic: kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" 
failed: file "../../../../kern/kern_lock.c", line 126
Stopped at      Debugger+0x5:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{0}> trace
Debugger() at Debugger+0x5
panic() at panic+0xee
__assert() at __assert+0x21
_kernel_lock_init() at _kernel_lock_init
issignal() at issignal+0x205
sleep_setup_signal() at sleep_setup_signal+0x94
tsleep() at tsleep+0x86
sys_sigsuspend() at sys_sigsuspend+0x46
syscall() at syscall+0x249
--- syscall (number 111) ---
end of kernel
end trace frame: 0x7fe55fdbef0, count: -9
0x7fe50c4cdcc:

   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 28380   5201   5983   1000  3   0x4100080  thrsleep      qemu-system-x86_
* 5825   5201   5983   1000  7   0xc100088  pause         qemu-system-x86_
 18891   5201   5983   1000  3   0xc100080  sigwait       qemu-system-x86_
  5983   5201   5983   1000  3   0x8000080  thrsleep      qemu-system-x86_
 19446  22621  19446   1000  3        0x80  poll          gdb
  5201  12983   5201   1000  3        0x80  wait          gdb

The kernel lock is acquired in mi_syscall() as sys_sigsuspend() needs
it.  tsleep() calls sleep_setup() which acquires the sched lock.
Then sleep_setup_signal() calls issignal() via the macro CURSIG().
The function issignal() is full of side effects, especially for a
traced process.

There the kernel lock is acquired again, which should be fine as
it is a recursive lock.  But to avoid deadlocks, _kernel_lock()
asserts that it is acquired before the sched lock.  This check is too
strict; the ordering only has to hold when the kernel lock is taken
for the first time.

Index: kern/kern_lock.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.42
diff -u -p -u -p -r1.42 kern_lock.c
--- kern/kern_lock.c    6 May 2013 16:37:55 -0000       1.42
+++ kern/kern_lock.c    11 Aug 2013 01:54:06 -0000
@@ -123,7 +123,10 @@ _kernel_lock_init(void)
 void
 _kernel_lock(void)
 {
-       SCHED_ASSERT_UNLOCKED();
+#ifdef DIAGNOSTIC
+       if (__mp_lock_held(&kernel_lock) == 0)
+               SCHED_ASSERT_UNLOCKED();
+#endif /* DIAGNOSTIC */
        __mp_lock(&kernel_lock);
 }

Unfortunately this fix does not solve my problem.  With that I get
another panic: wakeup: p_stat is 7

login: panic: wakeup: p_stat is 7
Stopped at      Debugger+0x5:   leave
RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
ddb{3}> trace
Debugger() at Debugger+0x5
panic() at panic+0xee
wakeup_n() at wakeup_n+0xfd
sys___thrwakeup() at sys___thrwakeup+0x54
syscall() at syscall+0x249
--- syscall (number 301) ---
end of kernel
end trace frame: 0x684cb9237c0, count: -5
0x684bf834c2a:

   PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
 10959  11922  10959   1000  3        0x80  wait          gdb
*11287  10959  10043   1000  7   0xc100000                qemu-system-x86_
 11131  10959  10043   1000  3   0xc100080  sigwait       qemu-system-x86_
 10043  10959  10043   1000  7   0x8000000                qemu-system-x86_

I will investigate further.

bluhm
