> Date: Sun, 11 Aug 2013 11:46:29 +0200
> From: Alexander Bluhm <[email protected]>
> 
> Hi,
> 
> On my ThinkPad T430s I am trying to debug multithreaded qemu by
> attaching gdb.  This crashes the kernel of the host system within
> a few minutes.  Luckily I managed to attach a serial over lan with
> Intel AMT.
> 
> login: panic: kernel diagnostic assertion "__mp_lock_held(&sched_lock) == 0" 
> failed: file "../../../../kern/kern_lock.c", line 126
> Stopped at      Debugger+0x5:   leave
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{0}> trace
> Debugger() at Debugger+0x5
> panic() at panic+0xee
> __assert() at __assert+0x21
> _kernel_lock_init() at _kernel_lock_init
> issignal() at issignal+0x205
> sleep_setup_signal() at sleep_setup_signal+0x94
> tsleep() at tsleep+0x86
> sys_sigsuspend() at sys_sigsuspend+0x46
> syscall() at syscall+0x249
> --- syscall (number 111) ---
> end of kernel
> end trace frame: 0x7fe55fdbef0, count: -9
> 0x7fe50c4cdcc:
> 
>    PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>  28380   5201   5983   1000  3   0x4100080  thrsleep      qemu-system-x86_
> * 5825   5201   5983   1000  7   0xc100088  pause         qemu-system-x86_
>  18891   5201   5983   1000  3   0xc100080  sigwait       qemu-system-x86_
>   5983   5201   5983   1000  3   0x8000080  thrsleep      qemu-system-x86_
>  19446  22621  19446   1000  3        0x80  poll          gdb
>   5201  12983   5201   1000  3        0x80  wait          gdb
> 
> The kernel lock is acquired in mi_syscall() as sys_sigsuspend() needs
> it.  tsleep() calls sleep_setup() which acquires the sched lock.
> Then sleep_setup_signal() calls issignal() via the macro CURSIG().
> The function issignal() is full of side effects, especially for a
> traced process.
> 
> There the kernel lock is acquired again, which should be fine as
> it is a recursive lock.  But to avoid deadlocks, _kernel_lock()
> asserts that it is acquired before the sched lock.  This check is too
> strict; the condition is only true when the lock is taken the first
> time.

I don't think that check is too strict.  Rather the ptrace(2)
functionality is subtly broken.
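
For readers following the thread, the two variants of the check under
discussion can be modelled in a few lines.  This is only a toy,
single-threaded sketch: struct toy_state, strict_check_ok(),
relaxed_check_ok() and replay_sigsuspend_path() are hypothetical names
invented here, and none of it is the real kern_lock.c code.  It merely
replays the order of acquisitions from the first trace (kernel lock at
syscall entry, sched lock in sleep_setup(), kernel lock again in
issignal()) and reports whether each check would pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these names are hypothetical, not kernel symbols. */
struct toy_state {
	int	kernel_depth;	/* recursion depth of the toy kernel lock */
	bool	sched_held;	/* true while the toy sched lock is held */
};

/* Check as it stands: sched lock must never be held on entry. */
static bool
strict_check_ok(const struct toy_state *s)
{
	return !s->sched_held;
}

/* Check as in the proposed diff: only enforced on first acquisition. */
static bool
relaxed_check_ok(const struct toy_state *s)
{
	if (s->kernel_depth == 0)
		return !s->sched_held;
	return true;
}

/*
 * Replay the acquisition order from the first trace and return
 * whether the given check passes at the recursive acquisition.
 */
static bool
replay_sigsuspend_path(bool (*check)(const struct toy_state *))
{
	struct toy_state s = { 0, false };

	assert(check != NULL);

	/* syscall entry: kernel lock taken for the first time */
	if (!check(&s))
		return false;
	s.kernel_depth++;

	/* sleep_setup(): sched lock taken */
	s.sched_held = true;

	/* issignal(): kernel lock taken again, sched lock still held */
	if (!check(&s))
		return false;
	s.kernel_depth++;
	return true;
}
```

In this model replay_sigsuspend_path(strict_check_ok) reports a
failure, matching the first panic, while the relaxed variant passes.
The sketch says nothing about which behaviour is correct; it only
shows what each check accepts.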

> Index: kern/kern_lock.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_lock.c,v
> retrieving revision 1.42
> diff -u -p -u -p -r1.42 kern_lock.c
> --- kern/kern_lock.c  6 May 2013 16:37:55 -0000       1.42
> +++ kern/kern_lock.c  11 Aug 2013 01:54:06 -0000
> @@ -123,7 +123,10 @@ _kernel_lock_init(void)
>  void
>  _kernel_lock(void)
>  {
> -     SCHED_ASSERT_UNLOCKED();
> +#ifdef DIAGNOSTIC
> +     if (__mp_lock_held(&kernel_lock) == 0)
> +             SCHED_ASSERT_UNLOCKED();
> +#endif /* DIAGNOSTIC */
>       __mp_lock(&kernel_lock);
>  }
> 
> Unfortunately this fix does not solve my problem.  With that I get
> another panic: wakeup: p_stat is 7
> 
> login: panic: wakeup: p_stat is 7
> Stopped at      Debugger+0x5:   leave
> RUN AT LEAST 'trace' AND 'ps' AND INCLUDE OUTPUT WHEN REPORTING THIS PANIC!
> IF RUNNING SMP, USE 'mach ddbcpu <#>' AND 'trace' ON OTHER PROCESSORS, TOO.
> DO NOT EVEN BOTHER REPORTING THIS WITHOUT INCLUDING THAT INFORMATION!
> ddb{3}> trace
> Debugger() at Debugger+0x5
> panic() at panic+0xee
> wakeup_n() at wakeup_n+0xfd
> sys___thrwakeup() at sys___thrwakeup+0x54
> syscall() at syscall+0x249
> --- syscall (number 301) ---
> end of kernel
> end trace frame: 0x684cb9237c0, count: -5
> 0x684bf834c2a:
> 
>    PID   PPID   PGRP    UID  S       FLAGS  WAIT          COMMAND
>  10959  11922  10959   1000  3        0x80  wait          gdb
> *11287  10959  10043   1000  7   0xc100000                qemu-system-x86_
>  11131  10959  10043   1000  3   0xc100080  sigwait       qemu-system-x86_
>  10043  10959  10043   1000  7   0x8000000                qemu-system-x86_
> 
> I will investigate further.
> 
> bluhm
> 
> 
