Visa Hankala <v...@hankala.org> wrote: > On Sat, Aug 12, 2023 at 01:29:10PM +0200, Martin Pieuchot wrote: > > On 12/08/23(Sat) 10:57, Visa Hankala wrote: > > > On Fri, Aug 11, 2023 at 09:52:15PM +0200, Martin Pieuchot wrote: > > > > When stopping a machine, with "halt -p" for example, secondary CPUs are > > > > removed from the scheduler before smr_flush() is called. So there's no > > > > need for the SMR thread to peg itself to such CPUs. This currently > > > > isn't a problem because we use per-CPU runqueues but it doesn't work > > > > with a global one. So the diff below skip halted CPUs. It should also > > > > speed up rebooting/halting on machine with a huge number of CPUs. > > > > > > Because SPCF_HALTED does not (?) imply that the CPU has stopped > > > processing interrupts, this skipping is not safe as is. Interrupt > > > handlers might access SMR-protected data. > > > > Interesting. This is worse than I expected. It seems we completely > > forgot about suspend/resume and rebooting when we started pinning > > interrupts on secondary CPUs, no? Previously sched_stop_secondary_cpus() > > was enough to ensure no more code would be executed on secondary CPUs, > > no? Wouldn't it be better to remap interrupts to the primary CPU in > > those cases? Is it easily doable? > > I think device interrupt stopping already happens through > config_suspend_all().
Right. DVACT_QUIESCE is supposed to stop the device's transaction flow, which implies it stops scheduling more things which will cause interrupts. DVACT_SUSPEND is supposed to stop the device even further, so that it can save a hardware state into memory, which is viable for recovery later on. For hibernate, this is even more strict, because memory side effects must not occur after these phases. I suppose there is nothing which says interrupts must be stopped for DVACT_SUSPEND. A driver could ACK the interrupts, but create no software side effects. Anyways, for the halt/reboot case, config_suspend_all() is not done. We have architectures that don't suspend/resume, but must halt/reboot. We have not reviewed their driver stack to check if config_suspend_all() would do the right thing. This is not trivial, it isn't just in the drivers. Some of our non-suspending architectures could have incorrectly ordered device trees...