> Date: Sun, 1 Mar 2020 14:02:53 +0100
> From: Alexander Bluhm <alexander.bl...@gmx.net>
>
> Hi,
>
> I had a 6.6 machine where a lot of git processes got stuck sleeping
> on "futex".  The process holding the futex rwlock was this one.
>
> 33293  332235  1  2734  3  0x800483  fsleep  git
>
> It called mi_switch() from proc_stop() with this trace.
>
> issignal(ffff80002acc74a8) at issignal+0x2ec
> sleep_setup_signal(120,ffffffff81e2e168) at sleep_setup_signal+0xdf
> rwsleep(12d8,ffff80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94
> futex_wait(2,ffff80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180
> sys_futex(530,ffff80002acc74a8,53) at sys_futex+0x80
> syscall(0) at syscall+0x37d
> Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128
>
> So I would say the process was stopped instead of sleeping and did
> not release the lock.  Can rwsleep() call rw_exit() before
> sleep_setup_signal()?  This diff survived a full make regress on
> amd64.
>
> ok?

I think that should work.  We should be able to release the lock as
soon as we grab the scheduler lock.  And that happens in
sleep_setup().

This probably means that msleep(4) has a similar issue.  And maybe
other places that do the split sleep_setup()/sleep_finish().

> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.162
> diff -u -p -r1.162 kern_synch.c
> --- kern/kern_synch.c	30 Jan 2020 08:51:27 -0000	1.162
> +++ kern/kern_synch.c	1 Mar 2020 12:11:30 -0000
> @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru
>
>  	sleep_setup(&sls, ident, priority, wmesg);
>  	sleep_setup_timeout(&sls, timo);
> -	sleep_setup_signal(&sls);
> -
>  	rw_exit(rwl);
> +	/* signal may stop the process, release rwlock before that */
> +	sleep_setup_signal(&sls);
>
>  	error = sleep_finish_all(&sls, 1);
>
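
For anyone following along, the ordering problem can be sketched with a tiny
userspace model.  This is not kernel code: every mock_* name and the
stop_pending flag are made up for illustration, and the model only assumes
what the trace above suggests, namely that the signal setup step can stop
the process on the spot.  It reports whether the process would be stopped
while it still holds the rwlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rwlock; tracks ownership only. */
struct mock_rwlock {
	bool held;
};

/* Models rw_exit(): the caller must hold the lock it releases. */
static void
mock_rw_exit(struct mock_rwlock *rwl)
{
	assert(rwl->held);
	rwl->held = false;
}

/*
 * Models sleep_setup_signal(): if a stop signal is pending, the process
 * is stopped right here.  Returns true when that happens while the lock
 * is still held, which is the situation that blocks the other waiters.
 */
static bool
mock_sleep_setup_signal(const struct mock_rwlock *rwl, bool stop_pending)
{
	return stop_pending && rwl->held;
}

/* Ordering before the diff: signal setup first, then release. */
static bool
stopped_holding_lock_old(bool stop_pending)
{
	struct mock_rwlock rwl = { .held = true };
	bool stuck = mock_sleep_setup_signal(&rwl, stop_pending);

	mock_rw_exit(&rwl);	/* never reached in time if we were stopped */
	return stuck;
}

/* Ordering after the diff: release first, then signal setup. */
static bool
stopped_holding_lock_new(bool stop_pending)
{
	struct mock_rwlock rwl = { .held = true };

	mock_rw_exit(&rwl);
	return mock_sleep_setup_signal(&rwl, stop_pending);
}
```

With a stop pending, the old ordering reports the lock as still held at the
point of the stop and the new ordering does not.  Without a pending stop the
two orderings behave the same in this model.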