Re: kernel_lock not locked

2018-07-01 Thread Martin Pieuchot
On 28/06/18(Thu) 14:53, Visa Hankala wrote:
> On Wed, Jun 27, 2018 at 08:46:04PM +0200, Landry Breuil wrote:
> > On Wed, Jun 27, 2018 at 05:37:54PM +0100, Laurence Tratt wrote:
> > > >Synopsis:    kernel_lock not locked
> > > >Category:kernel
> > > >Environment:
> > >   System  : OpenBSD 6.3
> > >   Details : OpenBSD 6.3-current (GENERIC.MP) #55: Mon Jun 25 23:01:52 
> > > MDT 2018
> > >
> > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > 
> > >   Architecture: OpenBSD.amd64
> > >   Machine : amd64
> > > >Description:
> > >   I just hit the following kernel panic (a locking error in sched_bsd.c):
> > > 
> > >   https://imagebin.ca/v/46kV6Tfqe1sc
> > > 
> > >   I can hit this repeatedly by gdb'ing the new quodlibet 4.1.0 update that
> > >   Stuart just pushed to ports. It crashes at load; exactly at the point I
> > >   quit gdb the kernel panics. Here's the userland trace I get just before
> > >   the kernel panic occurs:
> > 
> > Fwiw, i've hit a similar panic (kernel_lock not locked) this weekend (on an 
> > up
> > to date kernel) when using egdb on ... firefox, of course.
> 
> There is a locking bug that gets triggered when a traced and stopped
> multithreaded process is forced to exit. When the bug hits, a thread
> calls exit1() with the kernel locked recursively:
> 
> sched_exit
> exit1
> single_thread_check
> single_thread_set
> issignal  <-- KERNEL_LOCK()
> userret  <-- KERNEL_LOCK()
> syscall
> Xsyscall_untramp
> 
> sched_exit() assumes that a single KERNEL_UNLOCK() releases the lock
> completely. However, the assumption is wrong in the above case.
> sched_exit() switches to the CPU's idle thread, which in turn calls
> mi_switch(). Then, mi_switch() tries to release the kernel lock (which
> is bound to the CPU, and which should not be locked in the first place).
> That causes a panic with WITNESS because WITNESS had associated the lock
> with the exiting thread and the lock is not found in the idle thread's
> lock list. That is why the panic's stack trace looks peculiar:
> 
> panic
> witness_unlock
> ___mp_release_all
> mi_switch
> sched_idle
> 
> Without WITNESS, the system would hang soon instead.
> 
> The bug can be fixed by making sched_exit() release the kernel lock
> completely. That would also make exit1() more agnostic with regard to
> the state of the lock. As an alternative, issignal() could avoid the
> recursive locking.
> 
> Comments? OK?

Thanks for your analyze.  So this is a regression introduced by the fix
for the previous TOCTOU race.

The kernel is currently grabbing the KERNEL_LOCK() in userret() to
serialize access to `ps_sigact'.  In the future we'll want to use finer
locks.  So my question is which fix goes in that direction?  The one
you posted or not grabbing the KERNEL_LOCK() in userret()?

If it doesn't matter, then I believe you should commit your fix, it is
ok mpi@.

> Index: kern/kern_sched.c
> ===
> RCS file: src/sys/kern/kern_sched.c,v
> retrieving revision 1.48
> diff -u -p -r1.48 kern_sched.c
> --- kern/kern_sched.c 19 Jun 2018 19:29:52 -  1.48
> +++ kern/kern_sched.c 28 Jun 2018 13:47:28 -
> @@ -218,8 +218,11 @@ sched_exit(struct proc *p)
>  
>   LIST_INSERT_HEAD(>spc_deadproc, p, p_hash);
>  
> +#ifdef MULTIPROCESSOR
>   /* This process no longer needs to hold the kernel lock. */
> - KERNEL_UNLOCK();
> + KERNEL_ASSERT_LOCKED();
> + __mp_release_all(_lock);
> +#endif
>  
>   SCHED_LOCK(s);
>   idle = spc->spc_idleproc;
> 



Re: kernel_lock not locked

2018-06-28 Thread Visa Hankala
On Wed, Jun 27, 2018 at 08:46:04PM +0200, Landry Breuil wrote:
> On Wed, Jun 27, 2018 at 05:37:54PM +0100, Laurence Tratt wrote:
> > >Synopsis:  kernel_lock not locked
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 6.3
> > Details : OpenBSD 6.3-current (GENERIC.MP) #55: Mon Jun 25 23:01:52 
> > MDT 2018
> >  
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > I just hit the following kernel panic (a locking error in sched_bsd.c):
> > 
> >   https://imagebin.ca/v/46kV6Tfqe1sc
> > 
> > I can hit this repeatedly by gdb'ing the new quodlibet 4.1.0 update that
> > Stuart just pushed to ports. It crashes at load; exactly at the point I
> > quit gdb the kernel panics. Here's the userland trace I get just before
> > the kernel panic occurs:
> 
> Fwiw, i've hit a similar panic (kernel_lock not locked) this weekend (on an up
> to date kernel) when using egdb on ... firefox, of course.

There is a locking bug that gets triggered when a traced and stopped
multithreaded process is forced to exit. When the bug hits, a thread
calls exit1() with the kernel locked recursively:

sched_exit
exit1
single_thread_check
single_thread_set
issignal  <-- KERNEL_LOCK()
userret  <-- KERNEL_LOCK()
syscall
Xsyscall_untramp

sched_exit() assumes that a single KERNEL_UNLOCK() releases the lock
completely. However, the assumption is wrong in the above case.
sched_exit() switches to the CPU's idle thread, which in turn calls
mi_switch(). Then, mi_switch() tries to release the kernel lock (which
is bound to the CPU, and which should not be locked in the first place).
That causes a panic with WITNESS because WITNESS had associated the lock
with the exiting thread and the lock is not found in the idle thread's
lock list. That is why the panic's stack trace looks peculiar:

panic
witness_unlock
___mp_release_all
mi_switch
sched_idle

Without WITNESS, the system would hang soon instead.

The bug can be fixed by making sched_exit() release the kernel lock
completely. That would also make exit1() more agnostic with regard to
the state of the lock. As an alternative, issignal() could avoid the
recursive locking.

Comments? OK?

Index: kern/kern_sched.c
===
RCS file: src/sys/kern/kern_sched.c,v
retrieving revision 1.48
diff -u -p -r1.48 kern_sched.c
--- kern/kern_sched.c   19 Jun 2018 19:29:52 -  1.48
+++ kern/kern_sched.c   28 Jun 2018 13:47:28 -
@@ -218,8 +218,11 @@ sched_exit(struct proc *p)
 
LIST_INSERT_HEAD(>spc_deadproc, p, p_hash);
 
+#ifdef MULTIPROCESSOR
/* This process no longer needs to hold the kernel lock. */
-   KERNEL_UNLOCK();
+   KERNEL_ASSERT_LOCKED();
+   __mp_release_all(_lock);
+#endif
 
SCHED_LOCK(s);
idle = spc->spc_idleproc;



kernel_lock not locked

2018-06-27 Thread Laurence Tratt
>Synopsis:      kernel_lock not locked
>Category:  kernel
>Environment:
System  : OpenBSD 6.3
Details : OpenBSD 6.3-current (GENERIC.MP) #55: Mon Jun 25 23:01:52 
MDT 2018
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
I just hit the following kernel panic (a locking error in sched_bsd.c):

  https://imagebin.ca/v/46kV6Tfqe1sc

I can hit this repeatedly by gdb'ing the new quodlibet 4.1.0 update that
Stuart just pushed to ports. It crashes at load; exactly at the point I
quit gdb the kernel panics. Here's the userland trace I get just before
the kernel panic occurs:

  $ gdb /usr/local/bin/python3.6
  GNU gdb 6.3
  Copyright 2004 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and 
you are
  welcome to change it and/or distribute copies of it under certain 
conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for 
details.
  This GDB was configured as "amd64-unknown-openbsd6.3"...
  (no debugging symbols found)
  
  (gdb) run /usr/local/bin/quodlibet
  Starting program: /usr/local/bin/python3.6 /usr/local/bin/quodlibet
  [New process 34894]
  
  Program received signal SIGTRAP, Trace/breakpoint trap.
  0x027d5fc0ba27 in _dl_debug_state ()
  at /usr/src/libexec/ld.so/resolve.c:764
  764 {
  Current language:  auto; currently minimal
  (gdb) bt
  #0  0x027d5fc0ba27 in _dl_debug_state ()
  at /usr/src/libexec/ld.so/resolve.c:764
  #1  0x027d5fc00e60 in dlopen (
  libname=0x27d19bf9040 "libgdk_pixbuf-2.0.so.3200.1", 
flags=257)
  at /usr/src/libexec/ld.so/dlfcn.c:76
  #2  0x027cf73a79c9 in g_module_open (
  file_name=0x27d19bf9480 "libgdk_pixbuf-2.0.so.3200.1", 
flags=Variable "flags" is not available.
  )
  at gmodule-dl.c:98
  #3  0x027d30429ead in g_typelib_symbol ()
 from /usr/local/lib/libgirepository-1.0.so.3.0
  #4  0x027d30424dda in g_registered_type_info_get_g_type ()
 from /usr/local/lib/libgirepository-1.0.so.3.0
  #5  0x027d9d0e88d8 in pygi_arg_interface_new_from_info ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #6  0x027d9d0ee241 in pygi_arg_gobject_new_from_info ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #7  0x027d9d0e8b71 in pygi_arg_cache_new ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #8  0x027d9d0e9ae2 in _callable_cache_generate_args_cache_real ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #9  0x027d9d0e9858 in _callable_cache_init ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #10 0x027d9d0e8e9b in _function_cache_init ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  ---Type  to continue, or q  to quit--- 
  #11 0x027d9d0e9359 in pygi_method_cache_new ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #12 0x027d9d0e858c in _wrap_g_callable_info_invoke ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #13 0x027d9d0dd4ae in _callable_info_call ()
 from /usr/local/lib/python3.6/site-packages/gi/_gi.so
  #14 0x027db5e99a02 in _PyObject_FastCallDict ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #15 0x027db5f66be1 in call_function ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #16 0x027db5f63ce9 in _PyEval_EvalFrameDefault ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #17 0x027db5f67dd0 in fast_function ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #18 0x027db5f66be8 in call_function ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #19 0x027db5f63ce9 in _PyEval_EvalFrameDefault ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #20 0x027db5f67529 in _PyEval_EvalCodeWithName ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #21 0x027db5f67d5c in fast_function ()
 from /usr/local/lib/libpython3.6m.so.0.0
  #22 0x027db5f66be8 in call_function ()
 from /usr/local/lib/libpython3.6m.so.0.0
  ---Type  to continue, or q  to quit---
  #23 0x027db5f6