On Mon, Jan 30, 2023 at 11:48:23AM -0800, Josh Poimboeuf wrote:
> On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote:
> > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote:
> > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched);
> > > >  static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched);
> > > >  int __sched dynamic_cond_resched(void)
> > > >  {
> > > > -       if (!static_branch_unlikely(&sk_dynamic_cond_resched))
> > > > +       if (!static_branch_unlikely(&sk_dynamic_cond_resched)) {
> > > > +               klp_sched_try_switch();
> > > >                 return 0;
> > > > +       }
> > > >         return __cond_resched();
> > > >  }
> > > >  EXPORT_SYMBOL(dynamic_cond_resched);
> > > 
> > > I would make the klp_sched_try_switch() not depend on
> > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed
> > > pass through __schedule().
> > > 
> > > But you'll probably want to check with Mark here, this all might
> > > generate crap code on arm64.
> > 
> > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll
> > generate at least a load, a conditional branch, and an indirect branch.
> > That's not ideal, but I'd have to benchmark it to find out whether it's a
> > significant overhead relative to the baseline of PREEMPT_DYNAMIC.
> > 
> > For arm64 it'd be a bit nicer to have another static key check, and a call
> > to __klp_sched_try_switch(). That way the static key check gets turned into
> > a NOP in the common case, and the call to __klp_sched_try_switch() can be a
> > direct call (potentially a tail-call if we made it return 0).
> 
> Hm, it might be nice if our out-of-line static call implementation would
> automatically do a static key check as part of static_call_cond() for
> NULL-type static calls.
> 
> But the best answer is probably to just add inline static calls to
> arm64.  Is the lack of objtool the only thing blocking that?

The major issues were branch range limitations (and needing the linker to add
PLTs), and painful instruction patching requirements (e.g. the architecture's
"CMODX" rules for Concurrent MODification and eXecution of instructions). We
went with the static key scheme above because that is what our generated code
would devolve to anyway.
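
For illustration only, here is a rough userspace model of the "second static
key plus direct call" shape suggested earlier in the thread. It is a sketch,
not kernel code: plain bools stand in for static keys (which the kernel would
patch to a NOP or a branch), and sk_klp_sched_try_switch and the model_*()
helpers are hypothetical names invented for this example. The placement of the
klp check follows the quoted patch and may not match what is eventually merged.

```c
#include <stdbool.h>

/* Stand-ins for static keys. In the kernel these are patched branches,
 * so the "false" case costs a NOP rather than a load and compare. */
static bool sk_dynamic_cond_resched;	/* name taken from the quoted patch */
static bool sk_klp_sched_try_switch;	/* hypothetical second key */

static int resched_calls;	/* counts calls to the resched stand-in */
static int klp_calls;		/* counts calls to the klp stand-in */

/* Stand-in for __cond_resched(); returns 1 to mean "did reschedule". */
static int model_cond_resched(void)
{
	resched_calls++;
	return 1;
}

/* Stand-in for __klp_sched_try_switch(); returning 0 lets the caller
 * tail-call it from the "resched disabled" path. */
static int model_klp_sched_try_switch(void)
{
	klp_calls++;
	return 0;
}

int model_dynamic_cond_resched(void)
{
	if (!sk_dynamic_cond_resched) {
		/* Second key check: a NOP in the common case. */
		if (sk_klp_sched_try_switch)
			return model_klp_sched_try_switch(); /* direct call */
		return 0;
	}
	return model_cond_resched();
}
```

With both flags clear the function returns 0 without calling either helper,
which is the fast path the static key scheme is meant to keep cheap.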

If we knew each call-site would only call a particular function or skip the
call, then we could do better (and would probably need something like objtool
to NOP that out at compile time), but since we don't know the callee at build
time we can't ensure we have a PLT in range when necessary.
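
To make the range constraint concrete: the A64 B/BL instructions encode a
signed 26-bit offset in units of 4 bytes, so a direct branch can only reach
targets within roughly +/-128MiB of the call site. The helper below is a
minimal sketch of that check; bl_in_range() and BL_RANGE are names made up for
this example, not existing kernel symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* A64 B/BL: imm26 scaled by 4, i.e. offsets in [-2^27, 2^27 - 4]. */
#define BL_RANGE	((int64_t)1 << 27)	/* 128 MiB */

/* Return true if a direct BL at 'pc' can reach 'target' without a PLT. */
bool bl_in_range(uint64_t pc, uint64_t target)
{
	int64_t off = (int64_t)(target - pc);

	if (off & 3)		/* branch targets must be 4-byte aligned */
		return false;
	return off >= -BL_RANGE && off < BL_RANGE;
}
```

A call site patched at runtime to an arbitrary callee cannot assume this check
passes, which is why a PLT (or an indirect branch) has to be available.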

> Objtool is now modular, so all the controversial CFG reverse engineering
> is now optional, so it shouldn't be too hard to just enable objtool for
> static call inlines.

Funnily enough, I spent some time yesterday looking at enabling a trivial
objtool for arm64 as I wanted some basic ELF rewriting functionality (to
manipulate the mcount_loc table). So I'll likely be looking at that soon
regardless of static calls. :)

Thanks,
Mark.