> Date: Sun, 1 Mar 2020 14:02:53 +0100
> From: Alexander Bluhm <alexander.bl...@gmx.net>
> 
> Hi,
> 
> I had a 6.6 machine where a lot of git processes got stuck sleeping
> on "futex".  The process holding the futex rwlock was this one.
> 
>  33293  332235      1   2734  3    0x800483  fsleep        git
> 
> It called mi_switch() from proc_stop() with this trace.
> 
> issignal(ffff80002acc74a8) at issignal+0x2ec
> sleep_setup_signal(120,ffffffff81e2e168) at sleep_setup_signal+0xdf
> rwsleep(12d8,ffff80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94
> futex_wait(2,ffff80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180
> sys_futex(530,ffff80002acc74a8,53) at sys_futex+0x80
> syscall(0) at syscall+0x37d
> Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128
> 
> So I would say the process was stopped instead of sleeping and did
> not release the lock.  Can rwsleep() call rw_exit() before
> sleep_setup_signal()?  This diff survived a full make regress on
> amd64.
> 
> ok?

I think that should work.  We should be able to release the lock as
soon as we grab the scheduler lock, and that happens in sleep_setup().

This probably means that msleep(9) has a similar issue, and maybe so
do other places that use the split sleep_setup()/sleep_finish() pattern.

> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.162
> diff -u -p -r1.162 kern_synch.c
> --- kern/kern_synch.c 30 Jan 2020 08:51:27 -0000      1.162
> +++ kern/kern_synch.c 1 Mar 2020 12:11:30 -0000
> @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru
> 
>       sleep_setup(&sls, ident, priority, wmesg);
>       sleep_setup_timeout(&sls, timo);
> -     sleep_setup_signal(&sls);
> -
>       rw_exit(rwl);
> +     /* signal may stop the process, release rwlock before that */
> +     sleep_setup_signal(&sls);
> 
>       error = sleep_finish_all(&sls, 1);
> 
> 
