Re: mi_switch() & setting `p_stat'
On 02/10/21(Sat) 21:09, Mark Kettenis wrote:
> > Date: Sat, 2 Oct 2021 20:35:41 +0200
> > From: Martin Pieuchot
> > [...]
> > There's no sleeping point but a call to wakeup().  This wakeup() is
> > supposed to wake a btrace(8) process.  But if the curproc, which just
> > added itself to the global sleep queue, ends up in the same bucket as
> > the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
> > trigger:
> >
> > 	/*
> > 	 * If the rwlock passed to rwsleep() is contended, the
> > 	 * CPU will end up calling wakeup() between sleep_setup()
> > 	 * and sleep_finish().
> > 	 */
> > 	if (p == curproc) {
> > 		KASSERT(p->p_stat == SONPROC);
> > 		continue;
> > 	}
>
> Ah, right.  But that means the comment isn't accurate.  At least there
> are other cases that make us hit that codepath.
>
> How useful is that KASSERT in catching actual bugs?

I added the KASSERT() to limit the scope of the check.  If the test is
true `curproc' is obviously on the CPU.  Its usefulness is questionable.

So a simpler fix would be to remove the assert.  The diff below does that
and updates the comment, ok?

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.179
diff -u -p -r1.179 kern_synch.c
--- kern/kern_synch.c	9 Sep 2021 18:41:39 -0000	1.179
+++ kern/kern_synch.c	3 Oct 2021 08:48:28 -0000
@@ -558,14 +558,11 @@ wakeup_n(const volatile void *ident, int
 	for (p = TAILQ_FIRST(qp); p != NULL && n != 0; p = pnext) {
 		pnext = TAILQ_NEXT(p, p_runq);
 		/*
-		 * If the rwlock passed to rwsleep() is contended, the
-		 * CPU will end up calling wakeup() between sleep_setup()
-		 * and sleep_finish().
+		 * This happens if wakeup(9) is called after enqueuing
+		 * itself on the sleep queue and both `ident' collide.
 		 */
-		if (p == curproc) {
-			KASSERT(p->p_stat == SONPROC);
+		if (p == curproc)
 			continue;
-		}
 #ifdef DIAGNOSTIC
 		if (p->p_stat != SSLEEP && p->p_stat != SSTOP)
 			panic("wakeup: p_stat is %d", (int)p->p_stat);
Re: mi_switch() & setting `p_stat'
On Sat, Oct 2, 2021 at 8:57 AM Martin Pieuchot wrote:
> When a thread running on a CPU schedules itself out, it does the following
> (pseudo_code):
>
> 	SCHED_LOCK()
> 	curproc->p_stat = SSLEEP;
> 	// some more operations
> 	mi_switch()
>
> The problem with this is that any instrumentation between setting `p_stat'
> and cpu_switchto() is incorrect because 'curproc' is still being executed
> and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.
...
> To fix this we should set `p_stat' as late as possible, diff below does
> that just before calling cpu_switchto().
...
> --- kern/kern_sig.c	28 Sep 2021 10:00:18 -0000	1.283
> +++ kern/kern_sig.c	2 Oct 2021 17:00:52 -0000
> @@ -1347,7 +1347,6 @@ proc_stop(struct proc *p, int sw)
>  	SCHED_ASSERT_LOCKED();
>  #endif
>
> -	p->p_stat = SSTOP;
>  	atomic_clearbits_int(&pr->ps_flags, PS_WAITED);
>  	atomic_setbits_int(&pr->ps_flags, PS_STOPPED);
>  	atomic_setbits_int(&p->p_flag, P_SUSPSIG);
> @@ -1357,7 +1356,7 @@ proc_stop(struct proc *p, int sw)
>  	 */
>  	softintr_schedule(proc_stop_si);
>  	if (sw)
> -		mi_switch();
> +		mi_switch(SSTOP);

This pair of chunks is wrong: with them applied, the proc_stop(p, 0)
call in ptsignal() no longer changes the process from SSLEEP to SSTOP.
Sending a stop signal to a blocked process needs to change its state.


Philip Guenther
Re: mi_switch() & setting `p_stat'
> Date: Sat, 2 Oct 2021 20:35:41 +0200
> From: Martin Pieuchot
>
> On 02/10/21(Sat) 20:24, Mark Kettenis wrote:
> > > Date: Sat, 2 Oct 2021 19:55:49 +0200
> > > From: Martin Pieuchot
> > >
> > > When a thread running on a CPU schedules itself out, it does the
> > > following (pseudo_code):
> > >
> > > 	SCHED_LOCK()
> > > 	curproc->p_stat = SSLEEP;
> > > 	// some more operations
> > > 	mi_switch()
> > >
> > > The problem with this is that any instrumentation between setting
> > > `p_stat' and cpu_switchto() is incorrect because 'curproc' is still
> > > being executed and is not yet sleeping.  Its `p_stat' should be
> > > SONPROC and not SSLEEP.
> >
> > Hmm, well, we're holding the scheduler lock, so nothing should really
> > look at our state at this point...
>
> I added many TRACEPOINT() to investigate the scheduler's behaviour.  They
> look at those states.
>
> > > It is possible to reproduce the problem with the following btrace(8)
> > > script:
> > >
> > > 	tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
> > > 	tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
> > > 	tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
> > >
> > > At which point the KASSERT() in wakeup_n() triggers if `curproc' is
> > > going to sleep and its sleep channel collides with the running
> > > btrace(8) program:
> > >
> > > 	dt_prov_static_hook() at dt_prov_static_hook+0xe4
> > > 	remrunqueue() at remrunqueue+0x1a4
> > > 	sched_chooseproc() at sched_chooseproc+0x200
> > > 	mi_switch() at mi_switch+0x178
> > > 	sleep_finish() at sleep_finish+0x1d0
> > > 	tsleep() at tsleep+0x100
> > > 	biowait() at biowait+0x4c
> > > 	ffs_read() at ffs_read+0x1c0
> > > 	VOP_READ() at VOP_READ+0x44
> > > 	vn_read() at vn_read+0x84
> > > 	dofilereadv() at dofilereadv+0x8c
> > > 	sys_read() at sys_read+0x5c
> >
> > which suggests that something fishy is going on here.  Did we
> > accidentally introduce a sleeping point in the scheduler?
>
> There's no sleeping point but a call to wakeup().  This wakeup() is
> supposed to wake a btrace(8) process.  But if the curproc, which just
> added itself to the global sleep queue, ends up in the same bucket as
> the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
> trigger:
>
> 	/*
> 	 * If the rwlock passed to rwsleep() is contended, the
> 	 * CPU will end up calling wakeup() between sleep_setup()
> 	 * and sleep_finish().
> 	 */
> 	if (p == curproc) {
> 		KASSERT(p->p_stat == SONPROC);
> 		continue;
> 	}

Ah, right.  But that means the comment isn't accurate.  At least there
are other cases that make us hit that codepath.

How useful is that KASSERT in catching actual bugs?

> > > To fix this we should set `p_stat' as late as possible, diff below
> > > does that just before calling cpu_switchto().
> > >
> > > Note that there's an exception for SRUN because setrunqueue() changes
> > > `p_stat' to indicate the thread is on a queue.  I'll discuss that in
> > > an upcoming diff.
> > >
> > > ok?
> >
> > I'm not necessarily against this diff, but it may hide bugs.  And...
>
> Updated diff that uses a char.

That looks better.

> Index: kern/kern_sched.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.73
> diff -u -p -r1.73 kern_sched.c
> --- kern/kern_sched.c	9 Sep 2021 18:41:39 -0000	1.73
> +++ kern/kern_sched.c	2 Oct 2021 17:00:52 -0000
> @@ -144,10 +144,9 @@ sched_idle(void *v)
>  	 */
>  	SCHED_LOCK(s);
>  	cpuset_add(&sched_idle_cpus, ci);
> -	p->p_stat = SSLEEP;
>  	p->p_cpu = ci;
>  	atomic_setbits_int(&p->p_flag, P_CPUPEG);
> -	mi_switch();
> +	mi_switch(SSLEEP);
>  	cpuset_del(&sched_idle_cpus, ci);
>  	SCHED_UNLOCK(s);
>
> @@ -159,8 +158,7 @@ sched_idle(void *v)
>  		struct proc *dead;
>
>  		SCHED_LOCK(s);
> -		p->p_stat = SSLEEP;
> -		mi_switch();
> +		mi_switch(SSLEEP);
>  		SCHED_UNLOCK(s);
>
>  		while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
> @@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
>  	atomic_setbits_int(&p->p_flag, P_CPUPEG);
>  	setrunqueue(ci, p, p->p_usrpri);
>  	p->p_ru.ru_nvcsw++;
> -	mi_switch();
> +	mi_switch(SRUN);
>  	SCHED_UNLOCK(s);
>  }
>
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.179
> diff -u -p -r1.179 kern_synch.c
> --- kern/kern_synch.c	9 Sep 2021 18:41:39 -0000	1.179
> +++ kern/kern_synch.c	2 Oct 2021 17:00:52 -0000
> @@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
Re: mi_switch() & setting `p_stat'
On 02/10/21(Sat) 20:24, Mark Kettenis wrote:
> > Date: Sat, 2 Oct 2021 19:55:49 +0200
> > From: Martin Pieuchot
> >
> > When a thread running on a CPU schedules itself out, it does the following
> > (pseudo_code):
> >
> > 	SCHED_LOCK()
> > 	curproc->p_stat = SSLEEP;
> > 	// some more operations
> > 	mi_switch()
> >
> > The problem with this is that any instrumentation between setting `p_stat'
> > and cpu_switchto() is incorrect because 'curproc' is still being executed
> > and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.
>
> Hmm, well, we're holding the scheduler lock, so nothing should really
> look at our state at this point...

I added many TRACEPOINT() to investigate the scheduler's behaviour.  They
look at those states.

> > It is possible to reproduce the problem with the following btrace(8) script:
> >
> > 	tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
> > 	tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
> > 	tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
> >
> > At which point the KASSERT() in wakeup_n() triggers if `curproc' is going
> > to sleep and its sleep channel collides with the running btrace(8) program:
> >
> > 	dt_prov_static_hook() at dt_prov_static_hook+0xe4
> > 	remrunqueue() at remrunqueue+0x1a4
> > 	sched_chooseproc() at sched_chooseproc+0x200
> > 	mi_switch() at mi_switch+0x178
> > 	sleep_finish() at sleep_finish+0x1d0
> > 	tsleep() at tsleep+0x100
> > 	biowait() at biowait+0x4c
> > 	ffs_read() at ffs_read+0x1c0
> > 	VOP_READ() at VOP_READ+0x44
> > 	vn_read() at vn_read+0x84
> > 	dofilereadv() at dofilereadv+0x8c
> > 	sys_read() at sys_read+0x5c
>
> which suggests that something fishy is going on here.  Did we
> accidentally introduce a sleeping point in the scheduler?

There's no sleeping point but a call to wakeup().  This wakeup() is
supposed to wake a btrace(8) process.  But if the curproc, which just
added itself to the global sleep queue, ends up in the same bucket as
the btrace process, the KASSERT() line 565 of kern/kern_synch.c will
trigger:

	/*
	 * If the rwlock passed to rwsleep() is contended, the
	 * CPU will end up calling wakeup() between sleep_setup()
	 * and sleep_finish().
	 */
	if (p == curproc) {
		KASSERT(p->p_stat == SONPROC);
		continue;
	}

> > To fix this we should set `p_stat' as late as possible, diff below does
> > that just before calling cpu_switchto().
> >
> > Note that there's an exception for SRUN because setrunqueue() changes
> > `p_stat' to indicate the thread is on a queue.  I'll discuss that in an
> > upcoming diff.
> >
> > ok?
>
> I'm not necessarily against this diff, but it may hide bugs.  And...

Updated diff that uses a char.

Index: kern/kern_sched.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.73
diff -u -p -r1.73 kern_sched.c
--- kern/kern_sched.c	9 Sep 2021 18:41:39 -0000	1.73
+++ kern/kern_sched.c	2 Oct 2021 17:00:52 -0000
@@ -144,10 +144,9 @@ sched_idle(void *v)
 	 */
 	SCHED_LOCK(s);
 	cpuset_add(&sched_idle_cpus, ci);
-	p->p_stat = SSLEEP;
 	p->p_cpu = ci;
 	atomic_setbits_int(&p->p_flag, P_CPUPEG);
-	mi_switch();
+	mi_switch(SSLEEP);
 	cpuset_del(&sched_idle_cpus, ci);
 	SCHED_UNLOCK(s);

@@ -159,8 +158,7 @@ sched_idle(void *v)
 		struct proc *dead;

 		SCHED_LOCK(s);
-		p->p_stat = SSLEEP;
-		mi_switch();
+		mi_switch(SSLEEP);
 		SCHED_UNLOCK(s);

 		while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
@@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
 	atomic_setbits_int(&p->p_flag, P_CPUPEG);
 	setrunqueue(ci, p, p->p_usrpri);
 	p->p_ru.ru_nvcsw++;
-	mi_switch();
+	mi_switch(SRUN);
 	SCHED_UNLOCK(s);
 }

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.179
diff -u -p -r1.179 kern_synch.c
--- kern/kern_synch.c	9 Sep 2021 18:41:39 -0000	1.179
+++ kern/kern_synch.c	2 Oct 2021 17:00:52 -0000
@@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
 	}

 	if (do_sleep) {
-		p->p_stat = SSLEEP;
 		p->p_ru.ru_nvcsw++;
 		SCHED_ASSERT_LOCKED();
-		mi_switch();
+		mi_switch(SSLEEP);
 	} else {
 		unsleep(p);
 	}
@@ -603,7 +602,7 @@ sys_sched_yield(struct proc *p, void *v,
 	newprio = max(newprio, q->p_runpri);
 	setrunqueue(p->p_cpu, p, newprio);
 	p->p_ru.ru_nvcsw++;
-	mi_switch();
+	mi_switch(SRUN);
 	SCHED_UNLOCK(s);
Re: mi_switch() & setting `p_stat'
> Date: Sat, 2 Oct 2021 19:55:49 +0200
> From: Martin Pieuchot
>
> When a thread running on a CPU schedules itself out, it does the following
> (pseudo_code):
>
> 	SCHED_LOCK()
> 	curproc->p_stat = SSLEEP;
> 	// some more operations
> 	mi_switch()
>
> The problem with this is that any instrumentation between setting `p_stat'
> and cpu_switchto() is incorrect because 'curproc' is still being executed
> and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.

Hmm, well, we're holding the scheduler lock, so nothing should really
look at our state at this point...

> It is possible to reproduce the problem with the following btrace(8) script:
>
> 	tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
> 	tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
> 	tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }
>
> At which point the KASSERT() in wakeup_n() triggers if `curproc' is going to
> sleep and its sleep channel collides with the running btrace(8) program:
>
> 	dt_prov_static_hook() at dt_prov_static_hook+0xe4
> 	remrunqueue() at remrunqueue+0x1a4
> 	sched_chooseproc() at sched_chooseproc+0x200
> 	mi_switch() at mi_switch+0x178
> 	sleep_finish() at sleep_finish+0x1d0
> 	tsleep() at tsleep+0x100
> 	biowait() at biowait+0x4c
> 	ffs_read() at ffs_read+0x1c0
> 	VOP_READ() at VOP_READ+0x44
> 	vn_read() at vn_read+0x84
> 	dofilereadv() at dofilereadv+0x8c
> 	sys_read() at sys_read+0x5c

which suggests that something fishy is going on here.  Did we
accidentally introduce a sleeping point in the scheduler?

> To fix this we should set `p_stat' as late as possible, diff below does that
> just before calling cpu_switchto().
>
> Note that there's an exception for SRUN because setrunqueue() changes
> `p_stat' to indicate the thread is on a queue.  I'll discuss that in an
> upcoming diff.
>
> ok?

I'm not necessarily against this diff, but it may hide bugs.  And...
> Index: kern/kern_sched.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sched.c,v
> retrieving revision 1.73
> diff -u -p -r1.73 kern_sched.c
> --- kern/kern_sched.c	9 Sep 2021 18:41:39 -0000	1.73
> +++ kern/kern_sched.c	2 Oct 2021 17:00:52 -0000
> @@ -144,10 +144,9 @@ sched_idle(void *v)
>  	 */
>  	SCHED_LOCK(s);
>  	cpuset_add(&sched_idle_cpus, ci);
> -	p->p_stat = SSLEEP;
>  	p->p_cpu = ci;
>  	atomic_setbits_int(&p->p_flag, P_CPUPEG);
> -	mi_switch();
> +	mi_switch(SSLEEP);
>  	cpuset_del(&sched_idle_cpus, ci);
>  	SCHED_UNLOCK(s);
>
> @@ -159,8 +158,7 @@ sched_idle(void *v)
>  		struct proc *dead;
>
>  		SCHED_LOCK(s);
> -		p->p_stat = SSLEEP;
> -		mi_switch();
> +		mi_switch(SSLEEP);
>  		SCHED_UNLOCK(s);
>
>  		while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
> @@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
>  	atomic_setbits_int(&p->p_flag, P_CPUPEG);
>  	setrunqueue(ci, p, p->p_usrpri);
>  	p->p_ru.ru_nvcsw++;
> -	mi_switch();
> +	mi_switch(SRUN);
>  	SCHED_UNLOCK(s);
>  }
>
> Index: kern/kern_synch.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_synch.c,v
> retrieving revision 1.179
> diff -u -p -r1.179 kern_synch.c
> --- kern/kern_synch.c	9 Sep 2021 18:41:39 -0000	1.179
> +++ kern/kern_synch.c	2 Oct 2021 17:00:52 -0000
> @@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
>  	}
>
>  	if (do_sleep) {
> -		p->p_stat = SSLEEP;
>  		p->p_ru.ru_nvcsw++;
>  		SCHED_ASSERT_LOCKED();
> -		mi_switch();
> +		mi_switch(SSLEEP);
>  	} else {
>  		unsleep(p);
>  	}
> @@ -603,7 +602,7 @@ sys_sched_yield(struct proc *p, void *v,
>  	newprio = max(newprio, q->p_runpri);
>  	setrunqueue(p->p_cpu, p, newprio);
>  	p->p_ru.ru_nvcsw++;
> -	mi_switch();
> +	mi_switch(SRUN);
>  	SCHED_UNLOCK(s);
>
>  	return (0);
> Index: kern/kern_sig.c
> ===================================================================
> RCS file: /cvs/src/sys/kern/kern_sig.c,v
> retrieving revision 1.283
> diff -u -p -r1.283 kern_sig.c
> --- kern/kern_sig.c	28 Sep 2021 10:00:18 -0000	1.283
> +++ kern/kern_sig.c	2 Oct 2021 17:00:52 -0000
> @@ -1347,7 +1347,6 @@ proc_stop(struct proc *p, int sw)
>  	SCHED_ASSERT_LOCKED();
>  #endif
>
> -	p->p_stat = SSTOP;
>  	atomic_clearbits_int(&pr->ps_flags, PS_WAITED);
>  	atomic_setbits_int(&pr->ps_flags, PS_STOPPED);
>  	atomic_setbits_int(&p->p_flag, P_SUSPSIG);
> @@ -1357,7 +1356,7 @@ proc_stop(struct proc *p, int sw)
>  	 */
>  	softintr_schedule(proc_stop_si);
>  	if (sw)
> -		mi_switch();
> +		mi_switch(SSTOP);
mi_switch() & setting `p_stat'
When a thread running on a CPU schedules itself out, it does the following
(pseudo_code):

	SCHED_LOCK()
	curproc->p_stat = SSLEEP;
	// some more operations
	mi_switch()

The problem with this is that any instrumentation between setting `p_stat'
and cpu_switchto() is incorrect because 'curproc' is still being executed
and is not yet sleeping.  Its `p_stat' should be SONPROC and not SSLEEP.

It is possible to reproduce the problem with the following btrace(8) script:

	tracepoint:sched:enqueue { printf("%d -> enqueue (%d)\n", arg0, arg1); }
	tracepoint:sched:dequeue { printf("%d <- dequeue (%d)\n", arg0, arg1); }
	tracepoint:sched:on__cpu { printf("%d -- on cpu (%d)\n", tid, pid); }

At which point the KASSERT() in wakeup_n() triggers if `curproc' is going to
sleep and its sleep channel collides with the running btrace(8) program:

	dt_prov_static_hook() at dt_prov_static_hook+0xe4
	remrunqueue() at remrunqueue+0x1a4
	sched_chooseproc() at sched_chooseproc+0x200
	mi_switch() at mi_switch+0x178
	sleep_finish() at sleep_finish+0x1d0
	tsleep() at tsleep+0x100
	biowait() at biowait+0x4c
	ffs_read() at ffs_read+0x1c0
	VOP_READ() at VOP_READ+0x44
	vn_read() at vn_read+0x84
	dofilereadv() at dofilereadv+0x8c
	sys_read() at sys_read+0x5c

To fix this we should set `p_stat' as late as possible, diff below does that
just before calling cpu_switchto().

Note that there's an exception for SRUN because setrunqueue() changes `p_stat'
to indicate the thread is on a queue.  I'll discuss that in an upcoming diff.

ok?
Index: kern/kern_sched.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sched.c,v
retrieving revision 1.73
diff -u -p -r1.73 kern_sched.c
--- kern/kern_sched.c	9 Sep 2021 18:41:39 -0000	1.73
+++ kern/kern_sched.c	2 Oct 2021 17:00:52 -0000
@@ -144,10 +144,9 @@ sched_idle(void *v)
 	 */
 	SCHED_LOCK(s);
 	cpuset_add(&sched_idle_cpus, ci);
-	p->p_stat = SSLEEP;
 	p->p_cpu = ci;
 	atomic_setbits_int(&p->p_flag, P_CPUPEG);
-	mi_switch();
+	mi_switch(SSLEEP);
 	cpuset_del(&sched_idle_cpus, ci);
 	SCHED_UNLOCK(s);

@@ -159,8 +158,7 @@ sched_idle(void *v)
 		struct proc *dead;

 		SCHED_LOCK(s);
-		p->p_stat = SSLEEP;
-		mi_switch();
+		mi_switch(SSLEEP);
 		SCHED_UNLOCK(s);

 		while ((dead = LIST_FIRST(&spc->spc_deadproc))) {
@@ -625,7 +623,7 @@ sched_peg_curproc(struct cpu_info *ci)
 	atomic_setbits_int(&p->p_flag, P_CPUPEG);
 	setrunqueue(ci, p, p->p_usrpri);
 	p->p_ru.ru_nvcsw++;
-	mi_switch();
+	mi_switch(SRUN);
 	SCHED_UNLOCK(s);
 }

Index: kern/kern_synch.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_synch.c,v
retrieving revision 1.179
diff -u -p -r1.179 kern_synch.c
--- kern/kern_synch.c	9 Sep 2021 18:41:39 -0000	1.179
+++ kern/kern_synch.c	2 Oct 2021 17:00:52 -0000
@@ -421,10 +421,9 @@ sleep_finish(struct sleep_state *sls, in
 	}

 	if (do_sleep) {
-		p->p_stat = SSLEEP;
 		p->p_ru.ru_nvcsw++;
 		SCHED_ASSERT_LOCKED();
-		mi_switch();
+		mi_switch(SSLEEP);
 	} else {
 		unsleep(p);
 	}
@@ -603,7 +602,7 @@ sys_sched_yield(struct proc *p, void *v,
 	newprio = max(newprio, q->p_runpri);
 	setrunqueue(p->p_cpu, p, newprio);
 	p->p_ru.ru_nvcsw++;
-	mi_switch();
+	mi_switch(SRUN);
 	SCHED_UNLOCK(s);

 	return (0);
Index: kern/kern_sig.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_sig.c,v
retrieving revision 1.283
diff -u -p -r1.283 kern_sig.c
--- kern/kern_sig.c	28 Sep 2021 10:00:18 -0000	1.283
+++ kern/kern_sig.c	2 Oct 2021 17:00:52 -0000
@@ -1347,7 +1347,6 @@ proc_stop(struct proc *p, int sw)
 	SCHED_ASSERT_LOCKED();
 #endif

-	p->p_stat = SSTOP;
 	atomic_clearbits_int(&pr->ps_flags, PS_WAITED);
 	atomic_setbits_int(&pr->ps_flags, PS_STOPPED);
 	atomic_setbits_int(&p->p_flag, P_SUSPSIG);
@@ -1357,7 +1356,7 @@ proc_stop(struct proc *p, int sw)
 	 */
 	softintr_schedule(proc_stop_si);
 	if (sw)
-		mi_switch();
+		mi_switch(SSTOP);
 }

 /*
@@ -1979,8 +1978,7 @@ single_thread_check_locked(struct proc *
 		}

 		/* not exiting and don't need to unwind, so suspend */
-		p->p_stat = SSTOP;
-		mi_switch();
+		mi_switch(SSTOP);
 	} while (pr->ps_single != NULL);
 }
Index: kern/sched_bsd.c
===================================================================
RCS file: /cvs/src/sys/kern/sched_bsd.c,v