Re: mi_switch() & setting `p_stat'

2021-10-03 Thread Martin Pieuchot
On 02/10/21(Sat) 21:09, Mark Kettenis wrote:
> > Date: Sat, 2 Oct 2021 20:35:41 +0200
> > From: Martin Pieuchot 
> > [...] 
> > There's no sleeping point but a call to wakeup().  This wakeup() is
> > supposed to wake a btrace(8) process.  But if the curproc, which just
> > added itself to the global sleep queue, ends up in the same bucket as
> > the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
> > trigger:
> > 
> > /*
> >  * If the rwlock passed to rwsleep() is contended, the
> >  * CPU will end up calling wakeup() between sleep_setup()
> >  * and sleep_finish().
> >  */
> > if (p == curproc) {
> > KASSERT(p->p_stat == SONPROC);
> > continue;
> > }
> 
> Ah, right.  But that means the comment isn't accurate.  At least there
> are other cases that make us hit that codepath.
> 
> How useful is that KASSERT in catching actual bugs?

I added the KASSERT() to limit the scope of the check.  If the test is
true, `curproc' is obviously on the CPU.  Its usefulness is questionable.
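For reference, the case that comment describes is the rwsleep(9) ordering.
Schematically, with argument lists elided (a sketch, not the literal
kern_synch.c code):

	sleep_setup(...);   /* curproc queues itself on the sleep queue,
			     * but its p_stat is still SONPROC */
	rw_exit(rwl);       /* a contended rwlock is released, which does
			     * a wakeup(), so this CPU runs the wakeup
			     * scan while curproc is already queued */
	sleep_finish(...);  /* only here is p_stat set to SSLEEP and
			     * mi_switch() called */

If the channel being woken shares a bucket with the one curproc sleeps on,
the scan runs into curproc while it is still SONPROC, just as in the
btrace(8) case above.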

So a simpler fix would be to remove the assert, diff below does that and
updates the comment, ok?

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.179
diff -u -p -r1.179 kern_synch.c
--- kern/kern_synch.c   9 Sep 2021 18:41:39 -   1.179
+++ kern/kern_synch.c   3 Oct 2021 08:48:28 -
@@ -558,14 +558,11 @@ wakeup_n(const volatile void *ident, int
for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
pnext = TAILQ_NEXT(p, p_runq);
/*
-* If the rwlock passed to rwsleep() is contended, the
-* CPU will end up calling wakeup() between sleep_setup()
-* and sleep_finish().
+* This happens if wakeup(9) is called after enqueuing
+* itself on the sleep queue and both `ident' collide.
 */
-   if (p == curproc) {
-   KASSERT(p->p_stat == SONPROC);
+   if (p == curproc)
continue;
-   }
 #ifdef DIAGNOSTIC
if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
panic("wakeup: p_stat is %d", (int)p->p_stat);



Re: mi_switch() & setting `p_stat'

2021-10-02 Thread Philip Guenther
On Sat, Oct 2, 2021 at 8:57 AM Martin Pieuchot  wrote:

> When a thread running on a CPU schedules itself out, it does the following
> (pseudo_code):
>
> SCHED_LOCK()
> curproc->p_stat = SSLEEP;
> // some more operations
> mi_switch()
>
> The problem with this is that any instrumentation between setting `p_stat'
> and cpu_switchto() is incorrect because 'curproc' is still being executed
> and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.
>
...

> To fix this we should set `p_stat' as late as possible, diff below does that
> just before calling cpu_switchto().
>
...

> --- kern/kern_sig.c 28 Sep 2021 10:00:18 -  1.283
> +++ kern/kern_sig.c 2 Oct 2021 17:00:52 -
> @@ -1347,7 +1347,6 @@ proc_stop(struct proc *p, int sw)
> SCHED_ASSERT_LOCKED();
>  #endif
>
> -   p->p_stat = SSTOP;
> atomic_clearbits_int(&pr->ps_flags, PS_WAITED);
> atomic_setbits_int(&pr->ps_flags, PS_STOPPED);
> atomic_setbits_int(&p->p_flag, P_SUSPSIG);
> @@ -1357,7 +1356,7 @@ proc_stop(struct proc *p, int sw)
>  */
> softintr_schedule(proc_stop_si);
> if (sw)
> -   mi_switch();
> +   mi_switch(SSTOP);
>

This pair of chunks is wrong, as then the proc_stop(p, 0) call in
ptsignal() doesn't change the process from SSLEEP to SSTOP.  Sending a stop
signal to a blocked process needs to change its state.
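Put differently, a sketch of the constraint (not a proposed diff; the flag
and softintr handling in proc_stop() is elided):

void
proc_stop(struct proc *p, int sw)
{
	/* ... PS_WAITED/PS_STOPPED/P_SUSPSIG and softintr_schedule() ... */

	if (sw) {
		/*
		 * The stopping thread itself: it is about to context
		 * switch, so mi_switch(SSTOP) can take care of p_stat.
		 */
		mi_switch(SSTOP);
	} else {
		/*
		 * Called from ptsignal() for a thread that is already
		 * asleep (SSLEEP): nothing will call mi_switch() on its
		 * behalf, so the explicit state change has to stay.
		 */
		p->p_stat = SSTOP;
	}
}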


Philip Guenther


Re: mi_switch() & setting `p_stat'

2021-10-02 Thread Mark Kettenis
> Date: Sat, 2 Oct 2021 20:35:41 +0200
> From: Martin Pieuchot 
> 
> On 02/10/21(Sat) 20:24, Mark Kettenis wrote:
> > > Date: Sat, 2 Oct 2021 19:55:49 +0200
> > > From: Martin Pieuchot 
> > > 
> > > When a thread running on a CPU schedules itself out, it does the following
> > > (pseudo_code):
> > > 
> > >   SCHED_LOCK()
> > >   curproc->p_stat = SSLEEP;
> > >   // some more operations
> > >   mi_switch()
> > > 
> > > The problem with this is that any instrumentation between setting `p_stat'
> > > and cpu_switchto() is incorrect because 'curproc' is still being executed
> > > and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.
> > 
> > Hmm, well, we're holding the scheduler lock, so nothing should really
> > look at our state at this point...
> 
> I added many TRACEPOINT() to investigate the scheduler's behaviour.  They
> look at those states.
> 
> > > It is possible to reproduce the problem with the following btrace(8) 
> > > script:
> > > 
> > >   tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
> > >   tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
> > >   tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
> > > 
> > > At which point the KASSERT() in wakeup_n() triggers if `curproc' is going
> > > to sleep and its sleep channel collides with the running btrace(8) program:
> > > 
> > >   dt_prov_static_hook() at dt_prov_static_hook+0xe4
> > >   remrunqueue() at remrunqueue+0x1a4
> > >   sched_chooseproc() at sched_chooseproc+0x200
> > >   mi_switch() at mi_switch+0x178
> > >   sleep_finish() at sleep_finish+0x1d0
> > >   tsleep() at tsleep+0x100
> > >   biowait() at biowait+0x4c
> > >   ffs_read() at ffs_read+0x1c0
> > >   VOP_READ() at VOP_READ+0x44
> > >   vn_read() at vn_read+0x84
> > >   dofilereadv() at dofilereadv+0x8c
> > >   sys_read() at sys_read+0x5c
> > 
> > which suggests that something fishy is going on here.  Did we
> > accidentally introduce a sleeping point in the scheduler?
> 
> There's no sleeping point but a call to wakeup().  This wakeup() is
> supposed to wake a btrace(8) process.  But if the curproc, which just
> added itself to the global sleep queue, ends up in the same bucket as
> the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
> trigger:
> 
> /*
>  * If the rwlock passed to rwsleep() is contended, the
>  * CPU will end up calling wakeup() between sleep_setup()
>  * and sleep_finish().
>  */
> if (p == curproc) {
> KASSERT(p->p_stat == SONPROC);
> continue;
> }

Ah, right.  But that means the comment isn't accurate.  At least there
are other cases that make us hit that codepath.

How useful is that KASSERT in catching actual bugs?

> > > To fix this we should set `p_stat' as late as possible, diff below does
> > > that just before calling cpu_switchto().
> > > 
> > > Note that there's an exception for SRUN because setrunqueue() changes
> > > `p_stat' to indicate the thread is on a queue.  I'll discuss that in an
> > > upcoming diff.
> > > 
> > > ok?
> > 
> > I'm not necessarily against this diff, but it may hide bugs.  And...
> 
> Updated diff that uses a char.

That looks better.

> Index: kern/kern_sched.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.73
> diff -u -p -r1.73 kern_sched.c
> --- kern/kern_sched.c 9 Sep 2021 18:41:39 -   1.73
> +++ kern/kern_sched.c 2 Oct 2021 17:00:52 -
> @@ -144,10 +144,9 @@ sched_idle(void *v)
>*/
>   SCHED_LOCK(s);
>   cpuset_add(&sched_idle_cpus, ci);
> - p->p_stat = SSLEEP;
>   p->p_cpu = ci;
>   atomic_setbits_int(&p->p_flag, P_CPUPEG);
> - mi_switch();
> + mi_switch(SSLEEP);
>   cpuset_del(&sched_idle_cpus, ci);
>   SCHED_UNLOCK(s);
>  
> @@ -159,8 +158,7 @@ sched_idle(void *v)
>   struct proc *dead;
>  
>   SCHED_LOCK(s);
> - p->p_stat = SSLEEP;
> - mi_switch();
> + mi_switch(SSLEEP);
>   SCHED_UNLOCK(s);
>  
>   while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
> @@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
>   atomic_setbits_int(&p->p_flag, P_CPUPEG);
>   setrunqueue(ci, p, p->p_usrpri);
>   p->p_ru.ru_nvcsw++;
> - mi_switch();
> + mi_switch(SRUN);
>   SCHED_UNLOCK(s);
>  }
>  
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.179
> diff -u -p -r1.179 kern_synch.c
> --- kern/kern_synch.c 9 Sep 2021 18:41:39 -   1.179
> +++ kern/kern_synch.c 2 Oct 2021 17:00:52 -
> @@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
>  

Re: mi_switch() & setting `p_stat'

2021-10-02 Thread Martin Pieuchot
On 02/10/21(Sat) 20:24, Mark Kettenis wrote:
> > Date: Sat, 2 Oct 2021 19:55:49 +0200
> > From: Martin Pieuchot 
> > 
> > When a thread running on a CPU schedules itself out, it does the following
> > (pseudo_code):
> > 
> > SCHED_LOCK()
> > curproc->p_stat = SSLEEP;
> > // some more operations
> > mi_switch()
> > 
> > The problem with this is that any instrumentation between setting `p_stat'
> > and cpu_switchto() is incorrect because 'curproc' is still being executed
> > and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.
> 
> Hmm, well, we're holding the scheduler lock, so nothing should really
> look at our state at this point...

I added many TRACEPOINT() to investigate the scheduler's behaviour.  They
look at those states.

> > It is possible to reproduce the problem with the following btrace(8) script:
> > 
> >   tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
> >   tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
> >   tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
> > 
> > At which point the KASSERT() in wakeup_n() triggers if `curproc' is going to
> > sleep and its sleep channel collides with the running btrace(8) program:
> > 
> >   dt_prov_static_hook() at dt_prov_static_hook+0xe4
> >   remrunqueue() at remrunqueue+0x1a4
> >   sched_chooseproc() at sched_chooseproc+0x200
> >   mi_switch() at mi_switch+0x178
> >   sleep_finish() at sleep_finish+0x1d0
> >   tsleep() at tsleep+0x100
> >   biowait() at biowait+0x4c
> >   ffs_read() at ffs_read+0x1c0
> >   VOP_READ() at VOP_READ+0x44
> >   vn_read() at vn_read+0x84
> >   dofilereadv() at dofilereadv+0x8c
> >   sys_read() at sys_read+0x5c
> 
> which suggests that something fishy is going on here.  Did we
> accidentally introduce a sleeping point in the scheduler?

There's no sleeping point but a call to wakeup().  This wakeup() is
supposed to wake a btrace(8) process.  But if the curproc, which just
added itself to the global sleep queue, ends up in the same bucket as
the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
trigger:

/*
 * If the rwlock passed to rwsleep() is contended, the
 * CPU will end up calling wakeup() between sleep_setup()
 * and sleep_finish().
 */
if (p == curproc) {
KASSERT(p->p_stat == SONPROC);
continue;
}
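The collision itself is just the sleep queue hash: wchan pointers are hashed
into a small bucket array and wakeup_n() walks the whole bucket before it
compares p_wchan against `ident'.  A minimal userland sketch of such a
collision, assuming the TABLESIZE/LOOKUP definitions below match the ones in
kern_synch.c (treat them as illustrative):

#include <assert.h>
#include <stdio.h>

/* Assumed to mirror kern_synch.c's sleep queue hash; illustrative only. */
#define TABLESIZE	128
#define LOOKUP(x)	(((long)(x) >> 8) & (TABLESIZE - 1))

int
main(void)
{
	/* Two distinct sleep channels, TABLESIZE * 256 bytes apart... */
	static char pool[2][TABLESIZE * 256];
	const void *btrace_wchan = &pool[0][0];
	const void *curproc_wchan = &pool[1][0];

	/* ...which nevertheless hash into the same slpque bucket. */
	assert(LOOKUP(btrace_wchan) == LOOKUP(curproc_wchan));
	printf("both channels land in bucket %ld\n", LOOKUP(btrace_wchan));

	/*
	 * So a wakeup() on btrace_wchan walks a queue that can also hold
	 * curproc, freshly queued on curproc_wchan and still SONPROC
	 * until mi_switch()/cpu_switchto() actually runs.
	 */
	return 0;
}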

> > To fix this we should set `p_stat' as late as possible, diff below does that
> > just before calling cpu_switchto().
> > 
> > Note that there's an exception for SRUN because setrunqueue() changes
> > `p_stat' to indicate the thread is on a queue.  I'll discuss that in an
> > upcoming diff.
> > 
> > ok?
> 
> I'm not necessarily against this diff, but it may hide bugs.  And...

Updated diff that uses a char.

Index: kern/kern_sched.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.73
diff -u -p -r1.73 kern_sched.c
--- kern/kern_sched.c   9 Sep 2021 18:41:39 -   1.73
+++ kern/kern_sched.c   2 Oct 2021 17:00:52 -
@@ -144,10 +144,9 @@ sched_idle(void *v)
 */
SCHED_LOCK(s);
cpuset_add(&sched_idle_cpus, ci);
-   p->p_stat = SSLEEP;
p->p_cpu = ci;
atomic_setbits_int(&p->p_flag, P_CPUPEG);
-   mi_switch();
+   mi_switch(SSLEEP);
cpuset_del(&sched_idle_cpus, ci);
SCHED_UNLOCK(s);
 
@@ -159,8 +158,7 @@ sched_idle(void *v)
struct proc *dead;
 
SCHED_LOCK(s);
-   p->p_stat = SSLEEP;
-   mi_switch();
+   mi_switch(SSLEEP);
SCHED_UNLOCK(s);
 
while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
@@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
atomic_setbits_int(&p->p_flag, P_CPUPEG);
setrunqueue(ci, p, p->p_usrpri);
p->p_ru.ru_nvcsw++;
-   mi_switch();
+   mi_switch(SRUN);
SCHED_UNLOCK(s);
 }
 
Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.179
diff -u -p -r1.179 kern_synch.c
--- kern/kern_synch.c   9 Sep 2021 18:41:39 -   1.179
+++ kern/kern_synch.c   2 Oct 2021 17:00:52 -
@@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
}
 
if (do_sleep) {
-   p->p_stat = SSLEEP;
p->p_ru.ru_nvcsw++;
SCHED_ASSERT_LOCKED();
-   mi_switch();
+   mi_switch(SSLEEP);
} else {
unsleep(p);
}
@@ -603,7 +602,7 @@ sys_sched_yield(struct proc *p, void *v,
newprio = max(newprio, q->p_runpri);
setrunqueue(p->p_cpu, p, newprio);
p->p_ru.ru_nvcsw++;
-   mi_switch();
+   mi_switch(SRUN);
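The hunks above only touch the callers; the mi_switch() side of the change
is not shown in this message.  Presumably the prototype grows a state
argument and the assignment moves right next to the context switch, along
these lines (a guess at the shape, not the posted code):

void
mi_switch(int newstat)		/* int here; the updated diff uses a char */
{
	struct proc *p = curproc;
	struct proc *nextproc;

	/* ... accounting, SCHED_ASSERT_LOCKED(), etc. ... */

	nextproc = sched_chooseproc();

	/*
	 * Set the state as late as possible: until this point `p' is
	 * still running, so any instrumentation sees SONPROC (or SRUN
	 * when setrunqueue() already changed it).
	 */
	p->p_stat = newstat;

	cpu_switchto(p, nextproc);

	/* when this thread is switched back in, its p_stat is SONPROC again */
}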

Re: mi_switch() & setting `p_stat'

2021-10-02 Thread Mark Kettenis
> Date: Sat, 2 Oct 2021 19:55:49 +0200
> From: Martin Pieuchot 
> 
> When a thread running on a CPU schedules itself out, it does the following
> (pseudo_code):
> 
>   SCHED_LOCK()
>   curproc->p_stat = SSLEEP;
>   // some more operations
>   mi_switch()
> 
> The problem with this is that any instrumentation between setting `p_stat'
> and cpu_switchto() is incorrect because 'curproc' is still being executed
> and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.

Hmm, well, we're holding the scheduler lock, so nothing should really
look at our state at this point...

> It is possible to reproduce the problem with the following btrace(8) script:
> 
>   tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
>   tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
>   tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
> 
> At which point the KASSERT() in wakeup_n() triggers if `curproc' is going to
> sleep and its sleep channel collides with the running btrace(8) program:
> 
>   dt_prov_static_hook() at dt_prov_static_hook+0xe4
>   remrunqueue() at remrunqueue+0x1a4
>   sched_chooseproc() at sched_chooseproc+0x200
>   mi_switch() at mi_switch+0x178
>   sleep_finish() at sleep_finish+0x1d0
>   tsleep() at tsleep+0x100
>   biowait() at biowait+0x4c
>   ffs_read() at ffs_read+0x1c0
>   VOP_READ() at VOP_READ+0x44
>   vn_read() at vn_read+0x84
>   dofilereadv() at dofilereadv+0x8c
>   sys_read() at sys_read+0x5c

which suggests that something fishy is going on here.  Did we
accidentally introduce a sleeping point in the scheduler?

> To fix this we should set `p_stat' as late as possible, diff below does that
> just before calling cpu_switchto().
> 
> Note that there's an exception for SRUN because setrunqueue() changes
> `p_stat' to indicate the thread is on a queue.  I'll discuss that in an
> upcoming diff.
> 
> ok?

I'm not necessarily against this diff, but it may hide bugs.  And...

> Index: kern/kern_sched.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.73
> diff -u -p -r1.73 kern_sched.c
> --- kern/kern_sched.c 9 Sep 2021 18:41:39 -   1.73
> +++ kern/kern_sched.c 2 Oct 2021 17:00:52 -
> @@ -144,10 +144,9 @@ sched_idle(void *v)
>*/
>   SCHED_LOCK(s);
>   cpuset_add(&sched_idle_cpus, ci);
> - p->p_stat = SSLEEP;
>   p->p_cpu = ci;
>   atomic_setbits_int(&p->p_flag, P_CPUPEG);
> - mi_switch();
> + mi_switch(SSLEEP);
>   cpuset_del(&sched_idle_cpus, ci);
>   SCHED_UNLOCK(s);
>  
> @@ -159,8 +158,7 @@ sched_idle(void *v)
>   struct proc *dead;
>  
>   SCHED_LOCK(s);
> - p->p_stat = SSLEEP;
> - mi_switch();
> + mi_switch(SSLEEP);
>   SCHED_UNLOCK(s);
>  
>   while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
> @@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
>   atomic_setbits_int(&p->p_flag, P_CPUPEG);
>   setrunqueue(ci, p, p->p_usrpri);
>   p->p_ru.ru_nvcsw++;
> - mi_switch();
> + mi_switch(SRUN);
>   SCHED_UNLOCK(s);
>  }
>  
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.179
> diff -u -p -r1.179 kern_synch.c
> --- kern/kern_synch.c 9 Sep 2021 18:41:39 -   1.179
> +++ kern/kern_synch.c 2 Oct 2021 17:00:52 -
> @@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
>   }
>  
>   if (do_sleep) {
> - p->p_stat = SSLEEP;
>   p->p_ru.ru_nvcsw++;
>   SCHED_ASSERT_LOCKED();
> - mi_switch();
> + mi_switch(SSLEEP);
>   } else {
>   unsleep(p);
>   }
> @@ -603,7 +602,7 @@ sys_sched_yield(struct proc *p, void *v,
>   newprio = max(newprio, q->p_runpri);
>   setrunqueue(p->p_cpu, p, newprio);
>   p->p_ru.ru_nvcsw++;
> - mi_switch();
> + mi_switch(SRUN);
>   SCHED_UNLOCK(s);
>  
>   return (0);
> Index: kern/kern_sig.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sig.c,v
> retrieving revision 1.283
> diff -u -p -r1.283 kern_sig.c
> --- kern/kern_sig.c   28 Sep 2021 10:00:18 -  1.283
> +++ kern/kern_sig.c   2 Oct 2021 17:00:52 -
> @@ -1347,7 +1347,6 @@ proc_stop(struct proc *p, int sw)
>   SCHED_ASSERT_LOCKED();
>  #endif
>  
> - p->p_stat = SSTOP;
>   atomic_clearbits_int(&pr->ps_flags, PS_WAITED);
>   atomic_setbits_int(&pr->ps_flags, PS_STOPPED);
>   atomic_setbits_int(&p->p_flag, P_SUSPSIG);
> @@ -1357,7 +1356,7 @@ proc_stop(struct proc *p, int sw)
>*/
>   softintr_schedule(proc_stop_si);
>   if (sw)
> - mi_switch();
> + mi_switch(SSTOP);
>  }
>  
>  /*
> @@