Re: rwsleep and stopped process
On Sun, Mar 01, 2020 at 02:16:20PM +0100, Mark Kettenis wrote: > This probably means that msleep(4) has a similar issue. Here is the diff for msleep() and rwsleep(). bluhm Index: kern/kern_synch.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v retrieving revision 1.162 diff -u -p -r1.162 kern_synch.c --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 +++ kern/kern_synch.c 1 Mar 2020 13:50:29 - @@ -259,7 +259,6 @@ msleep(const volatile void *ident, struc sleep_setup(&sls, ident, priority, wmesg); sleep_setup_timeout(&sls, timo); - sleep_setup_signal(&sls); /* XXX - We need to make sure that the mutex doesn't * unblock splsched. This can be made a bit more @@ -268,6 +267,8 @@ msleep(const volatile void *ident, struc spl = MUTEX_OLDIPL(mtx); MUTEX_OLDIPL(mtx) = splsched(); mtx_leave(mtx); + /* signal may stop the process, release mutex before that */ + sleep_setup_signal(&sls); error = sleep_finish_all(&sls, 1); @@ -320,9 +321,10 @@ rwsleep(const volatile void *ident, stru sleep_setup(&sls, ident, priority, wmesg); sleep_setup_timeout(&sls, timo); - sleep_setup_signal(&sls); rw_exit(rwl); + /* signal may stop the process, release rwlock before that */ + sleep_setup_signal(&sls); error = sleep_finish_all(&sls, 1);
Re: rwsleep and stopped process
> Date: Sun, 1 Mar 2020 14:02:53 +0100 > From: Alexander Bluhm > > Hi, > > I had a 6.6 machine where a lot of git processes got stuck sleeping > on "futex". The process holding the futex rwlock was this one. > > 33293 332235 1 2734 30x800483 fsleepgit > > It called mi_switch() from proc_stop() with this trace. > > issignal(80002acc74a8) at issignal+0x2ec > sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf > rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94 > futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180 > sys_futex(530,80002acc74a8,53) at sys_futex+0x80 > syscall(0) at syscall+0x37d > Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128 > > So I would say the process was stopped instead of sleeping and did > not release the lock. Can rwsleep() call rw_exit() before > sleep_setup_signal()? This diff survived a full make regress on > amd64. > > ok? I think that should work. We should be able to release the lock as soon as we grab the scheduler lock. And that happens in sleep_setup(). This probably means that msleep(4) has a similar issue. And maybe other places that do the split sleep_setup()/sleep_finish(). > Index: kern/kern_synch.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v > retrieving revision 1.162 > diff -u -p -r1.162 kern_synch.c > --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 > +++ kern/kern_synch.c 1 Mar 2020 12:11:30 - > @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru > > sleep_setup(&sls, ident, priority, wmesg); > sleep_setup_timeout(&sls, timo); > - sleep_setup_signal(&sls); > - > rw_exit(rwl); > + /* signal may stop the process, release rwlock before that */ > + sleep_setup_signal(&sls); > > error = sleep_finish_all(&sls, 1); > >
rwsleep and stopped process
Hi, I had a 6.6 machine where a lot of git processes got stuck sleeping on "futex". The process holding the futex rwlock was this one. 33293 332235 1 2734 30x800483 fsleepgit It called mi_switch() from proc_stop() with this trace. issignal(80002acc74a8) at issignal+0x2ec sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94 futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180 sys_futex(530,80002acc74a8,53) at sys_futex+0x80 syscall(0) at syscall+0x37d Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128 So I would say the process was stopped instead of sleeping and did not release the lock. Can rwsleep() call rw_exit() before sleep_setup_signal()? This diff survived a full make regress on amd64. ok? bluhm Index: kern/kern_synch.c === RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v retrieving revision 1.162 diff -u -p -r1.162 kern_synch.c --- kern/kern_synch.c 30 Jan 2020 08:51:27 - 1.162 +++ kern/kern_synch.c 1 Mar 2020 12:11:30 - @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru sleep_setup(&sls, ident, priority, wmesg); sleep_setup_timeout(&sls, timo); - sleep_setup_signal(&sls); - rw_exit(rwl); + /* signal may stop the process, release rwlock before that */ + sleep_setup_signal(&sls); error = sleep_finish_all(&sls, 1);