Re: rwsleep and stopped process

2020-03-01 Thread Alexander Bluhm
On Sun, Mar 01, 2020 at 02:16:20PM +0100, Mark Kettenis wrote:
> This probably means that msleep(4) has a similar issue.

Here is the diff for msleep() and rwsleep().

bluhm

Index: kern/kern_synch.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.162
diff -u -p -r1.162 kern_synch.c
--- kern/kern_synch.c   30 Jan 2020 08:51:27 -  1.162
+++ kern/kern_synch.c   1 Mar 2020 13:50:29 -
@@ -259,7 +259,6 @@ msleep(const volatile void *ident, struc

sleep_setup(&sls, ident, priority, wmesg);
sleep_setup_timeout(&sls, timo);
-   sleep_setup_signal(&sls);

/* XXX - We need to make sure that the mutex doesn't
 * unblock splsched. This can be made a bit more
@@ -268,6 +267,8 @@ msleep(const volatile void *ident, struc
spl = MUTEX_OLDIPL(mtx);
MUTEX_OLDIPL(mtx) = splsched();
mtx_leave(mtx);
+   /* signal may stop the process, release mutex before that */
+   sleep_setup_signal(&sls);

error = sleep_finish_all(&sls, 1);

@@ -320,9 +321,10 @@ rwsleep(const volatile void *ident, stru

sleep_setup(&sls, ident, priority, wmesg);
sleep_setup_timeout(&sls, timo);
-   sleep_setup_signal(&sls);

rw_exit(rwl);
+   /* signal may stop the process, release rwlock before that */
+   sleep_setup_signal(&sls);

error = sleep_finish_all(&sls, 1);



Re: rwsleep and stopped process

2020-03-01 Thread Mark Kettenis
> Date: Sun, 1 Mar 2020 14:02:53 +0100
> From: Alexander Bluhm 
> 
> Hi,
> 
> I had a 6.6 machine where a lot of git processes got stuck sleeping
> on "futex".  The process holding the futex rwlock was this one.
> 
>  33293  332235  1   2734  30x800483  fsleepgit
> 
> It called mi_switch() from proc_stop() with this trace.
> 
> issignal(80002acc74a8) at issignal+0x2ec
> sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf
> rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94
> futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180
> sys_futex(530,80002acc74a8,53) at sys_futex+0x80
> syscall(0) at syscall+0x37d
> Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128
> 
> So I would say the process was stopped instead of sleeping and did
> not release the lock.  Can rwsleep() call rw_exit() before
> sleep_setup_signal()?  This diff survived a full make regress on
> amd64.
> 
> ok?

I think that should work.  We should be able to release the lock as
soon as we grab the scheduler lock.  And that happens in sleep_setup().

This probably means that msleep(4) has a similar issue.  And maybe
other places that do the split sleep_setup()/sleep_finish().

> Index: kern/kern_synch.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.162
> diff -u -p -r1.162 kern_synch.c
> --- kern/kern_synch.c 30 Jan 2020 08:51:27 -  1.162
> +++ kern/kern_synch.c 1 Mar 2020 12:11:30 -
> @@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru
> 
>   sleep_setup(&sls, ident, priority, wmesg);
>   sleep_setup_timeout(&sls, timo);
> - sleep_setup_signal(&sls);
> -
>   rw_exit(rwl);
> + /* signal may stop the process, release rwlock before that */
> + sleep_setup_signal(&sls);
> 
>   error = sleep_finish_all(&sls, 1);
> 
> 



rwsleep and stopped process

2020-03-01 Thread Alexander Bluhm
Hi,

I had a 6.6 machine where a lot of git processes got stuck sleeping
on "futex".  The process holding the futex rwlock was this one.

 33293  332235  1   2734  30x800483  fsleepgit

It called mi_switch() from proc_stop() with this trace.

issignal(80002acc74a8) at issignal+0x2ec
sleep_setup_signal(120,81e2e168) at sleep_setup_signal+0xdf
rwsleep(12d8,80002acc74a8,23,c010e7fabd0,0) at rwsleep+0x94
futex_wait(2,80002b0fb480,c010e7fabd0,12d8) at futex_wait+0x180
sys_futex(530,80002acc74a8,53) at sys_futex+0x80
syscall(0) at syscall+0x37d
Xsyscall(0,53,0,53,0,c015a954200) at Xsyscall+0x128

So I would say the process was stopped instead of sleeping and did
not release the lock.  Can rwsleep() call rw_exit() before
sleep_setup_signal()?  This diff survived a full make regress on
amd64.

ok?

bluhm

Index: kern/kern_synch.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.162
diff -u -p -r1.162 kern_synch.c
--- kern/kern_synch.c   30 Jan 2020 08:51:27 -  1.162
+++ kern/kern_synch.c   1 Mar 2020 12:11:30 -
@@ -320,9 +320,9 @@ rwsleep(const volatile void *ident, stru

sleep_setup(&sls, ident, priority, wmesg);
sleep_setup_timeout(&sls, timo);
-   sleep_setup_signal(&sls);
-
rw_exit(rwl);
+   /* signal may stop the process, release rwlock before that */
+   sleep_setup_signal(&sls);

error = sleep_finish_all(&sls, 1);