On 12/08/23(Sat) 10:57, Visa Hankala wrote:
> On Fri, Aug 11, 2023 at 09:52:15PM +0200, Martin Pieuchot wrote:
> > When stopping a machine, with "halt -p" for example, secondary CPUs are
> > removed from the scheduler before smr_flush() is called.  So there's no
> > need for the SMR thread to peg itself to such CPUs.  This currently
> > isn't a problem because we use per-CPU runqueues but it doesn't work
> > with a global one.  So the diff below skips halted CPUs.  It should also
> > speed up rebooting/halting on machines with a huge number of CPUs.
> 
> Because SPCF_HALTED does not (?) imply that the CPU has stopped
> processing interrupts, this skipping is not safe as is. Interrupt
> handlers might access SMR-protected data.

Interesting.  This is worse than I expected.  It seems we completely
forgot about suspend/resume and rebooting when we started pinning
interrupts on secondary CPUs, no?  Previously sched_stop_secondary_cpus()
was enough to ensure no more code would be executed on secondary CPUs,
no?  Wouldn't it be better to remap interrupts to the primary CPU in
those cases?  Is it easily doable? 
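
To illustrate the hazard: an interrupt handler pinned to a secondary
CPU can enter an SMR read-side section even after the scheduler has set
SPCF_HALTED on that CPU.  A minimal sketch, using a hypothetical driver
(foo_intr(), struct foo_softc and foo_process() are made-up names, the
SMR calls are the real API):

	/* Hypothetical handler still firing on a halted secondary CPU. */
	int
	foo_intr(void *arg)
	{
		struct foo_softc *sc = arg;
		struct foo_state *st;

		smr_read_enter();
		/*
		 * If smr_grace_wait() skipped this CPU, an smr_call()
		 * callback could free "st" while it is still in use here.
		 */
		st = SMR_PTR_GET(&sc->sc_state);
		if (st != NULL)
			foo_process(sc, st);
		smr_read_leave();

		return (1);
	}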

> One possible solution is to spin. When smr_grace_wait() sees
> SPCF_HALTED, it should probably call cpu_unidle(ci) and spin until
> condition READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp becomes true.
> However, for this to work, sched_idle() needs to invoke smr_idle().
> There is a potential problem here, since the cpu_idle_{enter,cycle,leave}()
> logic is not consistent across architectures.

We're trying to move away from per-CPU runqueues.  That's how I found
this issue.  I don't see how this possible solution could work with a
global runqueue.  Do you?
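
For reference, the spin you describe would look roughly like this in
smr_grace_wait() (untested sketch, and as you note it only works if
sched_idle() also invokes smr_idle()):

	CPU_INFO_FOREACH(cii, ci) {
		if (!CPU_IS_RUNNING(ci))
			continue;
		if (READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp)
			continue;
		if (ci->ci_schedstate.spc_schedflags & SPCF_HALTED) {
			/* Wake the halted CPU out of cpu_idle_cycle()... */
			cpu_unidle(ci);
			/* ...and wait for it to run smr_idle(). */
			while (READ_ONCE(ci->ci_schedstate.spc_smrgp) !=
			    smrgp)
				CPU_BUSY_CYCLE();
			continue;
		}
		sched_peg_curproc(ci);
	}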

> I think the intent in sched_idle() was that cpu_idle_enter() should
> block interrupts so that sched_idle() could check without races if
> the CPU can sleep. Now, on some architectures cpu_idle_enter() is
> a no-op. These architectures have to check the idle state in their
> cpu_idle_cycle() function before pausing the CPU.
> 
> To avoid touching architecture-specific code, cpu_is_idle() could
> be redefined to
> 
> ((ci)->ci_schedstate.spc_whichqs == 0 &&
>  (ci)->ci_schedstate.spc_smrgp == READ_ONCE(smr_grace_period))
> 
> Then the loop conditions
> 
>               while (!cpu_is_idle(curcpu())) {
> 
> and
> 
>               while (spc->spc_whichqs == 0) {
> 
> in sched_idle() would have to be changed to
> 
>               while (spc->spc_whichqs != 0) {
> 
> and
> 
>               while (cpu_is_idle(ci)) {
> 
> 
> :(
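
Spelled out, I read the suggested rework like this (sketch only,
untested, and it assumes smr_grace_period is made visible outside
kern_smr.c so that sys/sched.h can reference it):

	/* sys/sched.h: "idle" now also means the grace period was seen. */
	#define cpu_is_idle(ci)						\
		((ci)->ci_schedstate.spc_whichqs == 0 &&		\
		 (ci)->ci_schedstate.spc_smrgp ==			\
		    READ_ONCE(smr_grace_period))

	/*
	 * kern_sched.c, sched_idle(): check only the runqueue here so
	 * the outer loop does not spin on a stale grace period...
	 */
	while (spc->spc_whichqs != 0) {
		/* mi_switch() to queued threads, reap dead procs. */
	}

	smr_idle();

	cpu_idle_enter();
	/*
	 * ...and leave cpu_idle_cycle() as soon as either a thread is
	 * queued or the global grace period has advanced.
	 */
	while (cpu_is_idle(ci))
		cpu_idle_cycle();
	cpu_idle_leave();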
> 
> > Index: kern/kern_smr.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_smr.c,v
> > retrieving revision 1.16
> > diff -u -p -r1.16 kern_smr.c
> > --- kern/kern_smr.c 14 Aug 2022 01:58:27 -0000      1.16
> > +++ kern/kern_smr.c 11 Aug 2023 19:43:54 -0000
> > @@ -158,6 +158,8 @@ smr_grace_wait(void)
> >     CPU_INFO_FOREACH(cii, ci) {
> >             if (!CPU_IS_RUNNING(ci))
> >                     continue;
> > +           if (ci->ci_schedstate.spc_schedflags & SPCF_HALTED)
> > +                   continue;
> >             if (READ_ONCE(ci->ci_schedstate.spc_smrgp) == smrgp)
> >                     continue;
> >             sched_peg_curproc(ci);
> > 
