> Date: Sun, 11 Aug 2013 11:46:29 +0200
> From: Alexander Bluhm <[email protected]>
>
> Hi,
>
> On my ThinkPad T430s I am trying to debug multithreaded qemu by
> attaching gdb. This crashes the host kernel within a few
> minutes. Luckily I managed to attach a serial-over-LAN console
> with Intel AMT.
>
> login: panic: kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0"
> failed: file "../../../../kern/kern_lock.c", line 126
> Stopped at Debugger+0x5: leave
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{0}> trace
> Debugger() at Debugger+0x5
> panic() at panic+0xee
> __assert() at __assert+0x21
> _kernel_lock_init() at _kernel_lock_init
> issignal() at issignal+0x205
> sleep_setup_signal() at sleep_setup_signal+0x94
> tsleep() at tsleep+0x86
> sys_sigsuspend() at sys_sigsuspend+0x46
> syscall() at syscall+0x249
> --- syscall (number 111) ---
> end of kernel
> end trace frame: 0x7fe55fdbef0, count: -9
> 0x7fe50c4cdcc:
>
> PID PPID PGRP UID S FLAGS WAIT COMMAND
> 28380 5201 5983 1000 3 0x4100080 thrsleep qemu-system-x86_
> * 5825 5201 5983 1000 7 0xc100088 pause qemu-system-x86_
> 18891 5201 5983 1000 3 0xc100080 sigwait qemu-system-x86_
> 5983 5201 5983 1000 3 0x8000080 thrsleep qemu-system-x86_
> 19446 22621 19446 1000 3 0x80 poll gdb
> 5201 12983 5201 1000 3 0x80 wait gdb
>
> The kernel lock is acquired in mi_syscall(), as sys_sigsuspend() needs
> it. tsleep() calls sleep_setup(), which acquires the sched lock.
> Then sleep_setup_signal() calls issignal() via the macro CURSIG().
> The function issignal() is full of side effects, especially for a
> traced process.
>
> There the kernel lock is acquired again, which should be fine as
> it is a recursive lock. But to avoid deadlocks, _kernel_lock()
> asserts that it is acquired before the sched lock. This check is
> too strict: the ordering only has to hold when the kernel lock is
> taken for the first time, not on a recursive acquisition.
I don't think that check is too strict. Rather, the ptrace(2)
functionality is subtly broken.
> Index: kern/kern_lock.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v
> retrieving revision 1.42
> diff -u -p -u -p -r1.42 kern_lock.c
> --- kern/kern_lock.c 6 May 2013 16:37:55 -0000 1.42
> +++ kern/kern_lock.c 11 Aug 2013 01:54:06 -0000
> @@ -123,7 +123,10 @@ _kernel_lock_init(void)
> void
> _kernel_lock(void)
> {
> - SCHED_ASSERT_UNLOCKED();
> +#ifdef DIAGNOSTIC
> + if (__mp_lock_held(&kernel_lock) == 0)
> + SCHED_ASSERT_UNLOCKED();
> +#endif /* DIAGNOSTIC */
> __mp_lock(&kernel_lock);
> }
>
> Unfortunately, this fix does not solve my problem. With it applied
> I get another panic: "wakeup: p_stat is 7".
>
> login: panic: wakeup: p_stat is 7
> Stopped at Debugger+0x5: leave
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{3}> trace
> Debugger() at Debugger+0x5
> panic() at panic+0xee
> wakeup_n() at wakeup_n+0xfd
> sys___thrwakeup() at sys___thrwakeup+0x54
> syscall() at syscall+0x249
> --- syscall (number 301) ---
> end of kernel
> end trace frame: 0x684cb9237c0, count: -5
> 0x684bf834c2a:
>
> PID PPID PGRP UID S FLAGS WAIT COMMAND
> 10959 11922 10959 1000 3 0x80 wait gdb
> *11287 10959 10043 1000 7 0xc100000 qemu-system-x86_
> 11131 10959 10043 1000 3 0xc100080 sigwait qemu-system-x86_
> 10043 10959 10043 1000 7 0x8000000 qemu-system-x86_
>
> I will investigate further.
>
> bluhm
>
>