On Wed, Jun 03, 2026 at 02:17:27PM +0000, Dmitry Ilvokhin wrote:

> > Something a little like so, which is completely untested, except to
> > build kernel/locking/spinlock.o (with clang-23).
> 
> Thanks a lot for taking a look, Peter.
> 
> I like the static_call idea. It's truly zero cost on x86 (and, as you
> note, even a byte smaller). The one caveat is that it relies on
> HAVE_STATIC_CALL_INLINE to stay free. 
> 
> So my plan would be: static_call where HAVE_STATIC_CALL_INLINE is
> available (x86), and a static branch fallback elsewhere, gated behind a
> default-off config so it imposes nothing on arches/kernels that don't
> opt in. I'm mostly interested in x86, but would like arm64 to work too,
> which would use the fallback.

(i386 doesn't have STATIC_CALL_INLINE, but nobody cares about the
performance on that target, so anything goes really ;-)

> 
> Concretely:
> 
> 1. Split the sleepable-lock patches out and send them separately.
>    They're independent of the static call work and look far less
>    controversial.
> 
> 2. Convert the paravirt spinlock unlock to a static_call, as the
>    foundation for the unlock tracepoint. I'm happy to take a stab at it.
>    Let me know if you'd rather do it yourself.

Yeah, I think that patch as-is *should* work, but like I said, I haven't
even tried it, so it could be terribly broken :-)

> 3. Build the unlock tracepoint on top: static_call where it's cheap,
>    config-gated static_branch fallback where it isn't.

Right, so I think we need some sort of custom callback for tracepoint
enable/disable. It's been a minute since I dug through the tracepoint
code, and I don't think it provides that with a convenient wrapper, but
it should be doable.

One thing to note is that when you set the tracepoint unlock function,
it should either tail-call into the original function, or you have to
create two unlock_trace functions, one for native and one for paravirt
and pick the right one.
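
To make the two options concrete, here is a rough userspace sketch; it is
not kernel code. A plain function pointer stands in for the static_call
slot (the real thing would patch the call sites instead), and every name
in it (unlock_slot, saved_unlock, unlock_trace_chain, *_unlock_trace,
trace_enable_*, trace_hits) is made up for illustration:

```c
#include <assert.h>

struct lock { int locked; int kicks; };
typedef void (*unlock_fn)(struct lock *);

static int trace_hits;			/* stand-in for the tracepoint firing */

static void native_unlock(struct lock *l) { l->locked = 0; }
static void pv_unlock(struct lock *l) { l->locked = 0; l->kicks++; }

/* Stand-in for the static_call slot; boot code would pick native or pv. */
static unlock_fn unlock_slot = native_unlock;

/* Option A: one trace function that chains to whatever was installed. */
static unlock_fn saved_unlock;

static void unlock_trace_chain(struct lock *l)
{
	trace_hits++;
	saved_unlock(l);		/* last statement, so it can be a tail-call */
}

/* Option B: one trace function per flavour, chosen at enable time. */
static void native_unlock_trace(struct lock *l) { trace_hits++; native_unlock(l); }
static void pv_unlock_trace(struct lock *l) { trace_hits++; pv_unlock(l); }

static void trace_enable_chain(void)
{
	assert(unlock_slot != unlock_trace_chain);	/* not already enabled */
	saved_unlock = unlock_slot;
	unlock_slot = unlock_trace_chain;
}

static void trace_enable_pair(void)
{
	assert(unlock_slot == native_unlock || unlock_slot == pv_unlock);
	unlock_slot = (unlock_slot == pv_unlock) ? pv_unlock_trace
						 : native_unlock_trace;
}

/* Undo whichever enable variant ran; a no-op if tracing is off. */
static void trace_disable(void)
{
	if (unlock_slot == unlock_trace_chain)
		unlock_slot = saved_unlock;
	else if (unlock_slot == pv_unlock_trace)
		unlock_slot = pv_unlock;
	else if (unlock_slot == native_unlock_trace)
		unlock_slot = native_unlock;
}

/* What a call site would do: one indirect call through the slot. */
static void do_unlock(struct lock *l) { unlock_slot(l); }
```

Option A keeps a single trace function but pays for a second indirect
call through saved_unlock; option B keeps the calls direct at the price
of one extra function per flavour.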

> Does this plan sound reasonable to you?

Yeah, should work.

> > Also, I think someone should go do some performance runs with
> > ARCH_INLINE_SPIN_* set for x86 just like for s390.
> 
> That's a good point, I'll run benchmarks and report back with the
> results.

Thanks!
