> Date: Sun, 9 Jul 2023 17:24:41 -0500
> From: Scott Cheloha <scottchel...@gmail.com>
>
> On Sun, Jul 09, 2023 at 08:11:43PM +0200, Claudio Jeker wrote:
> > On Sun, Jul 09, 2023 at 12:52:20PM -0500, Scott Cheloha wrote:
> > > This patch fixes resume/unhibernate on GPROF kernels where kgmon(8)
> > > has activated kernel profiling.
> > >
> > > I think the problem is that code called from cpu_hatch() does not play
> > > nicely with _mcount(), so GPROF kernels crash during resume. I can't
> > > point you to which code in particular, but keeping all CPUs out of
> > > _mcount() until the primary CPU has completed resume/unhibernate fixes
> > > the crash.
> > >
> > > ok?
> >
> > To be honest, I'm not sure we need something like this. GPROF is already a
> > special case and people running a GPROF kernel should probably stop the
> > collection of profile data before suspend/hibernate.
>
> Sorry, I was a little unclear in my original mail.
>
> When I say "has activated kernel profiling" I mean "has *ever*
> activated kernel profiling".
>
> Regardless of whether or not profiling is active at the moment we
> reach sleep_state(), if kernel profiling has *ever* been activated in
> the past, the resume crashes.

So isn't the real problem that some of the lower-level code involved
in the resume path isn't properly marked to not do the
instrumentation? Traditionally that was assembly code, and we'd use
NENTRY() (on amd64) or ENTRY_NP() (on some other architectures) to
prevent these functions from calling _mcount(). But that was only
ever done for code used during early bootstrap of the kernel. And
these days there may be C code that needs this as well.

With your diff, functions in the suspend/resume path will still call
_mcount(), which may not be safe.
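
For the C side, a minimal user-space sketch of what "marking" a function
could look like, assuming GCC or clang. GCC documents the
no_instrument_function attribute as suppressing the profiling calls
generated for -p, -pg and -finstrument-functions. The NO_PROFILE macro and
the function names below are hypothetical and not taken from the tree, and
the -finstrument-functions hooks only stand in for _mcount() here. Whether
this is the right mechanism for the kernel build is an assumption.

```c
/* Hypothetical macro name for illustration; not taken from the OpenBSD tree. */
#define NO_PROFILE __attribute__((no_instrument_function))

/* Counts how many times the entry hook fired. */
static unsigned long hook_calls;

/*
 * Entry/exit hooks that the compiler calls only when this file is built
 * with -finstrument-functions. They stand in for _mcount() in this sketch
 * and must themselves be uninstrumented to avoid recursion.
 */
NO_PROFILE void
__cyg_profile_func_enter(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
	hook_calls++;
}

NO_PROFILE void
__cyg_profile_func_exit(void *fn, void *call_site)
{
	(void)fn;
	(void)call_site;
}

/* Stand-in for a resume-path function: never instrumented. */
NO_PROFILE int
resume_step(int x)
{
	return x + 1;
}

/* Ordinary function: instrumented whenever instrumentation is enabled. */
int
ordinary_step(int x)
{
	return x * 2;
}

/* Hook calls caused by one resume_step() call: 0 with or without the flag. */
NO_PROFILE unsigned long
hooks_during_resume_step(void)
{
	unsigned long before = hook_calls;

	(void)resume_step(1);
	return hook_calls - before;
}
```

Built with -finstrument-functions, ordinary_step() goes through the hooks
while resume_step() does not; built without it, nothing is instrumented.
The point is only that the exclusion is attached to the function itself
rather than gated at run time.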