On Thu, 14 May 2026 14:13:35 +0000
Dmitry Ilvokhin <[email protected]> wrote:

> > > +void __lockfunc queued_spin_release_traced(struct qspinlock *lock)
> > > +{
> > > + if (queued_spin_is_contended(lock))
> > > +         trace_call__contended_release(lock);
> > > + queued_spin_release(lock);  
> > 
> > And then remove the duplicate call of "queued_spin_release()" here.  
> 
> This is the scenario the comment above the static branch describes.
> Here's what it looks like in practice on x86_64 (defconfig, compiled
> with GCC 11).
> 
> Current design (trace + unlock combined, with return):
>   
>     endbr64
>     xchg %ax,%ax                     ; NOP (static branch)
>     movb $0x0,(%rdi)                 ; unlock
>     decl %gs:__preempt_count
>     je   preempt
>     jmp  __x86_return_thunk
>     call queued_spin_release_traced  ; cold
>     jmp  preempt_handling            ; cold
>     call __SCT__preempt_schedule
>     jmp  __x86_return_thunk
> 
> With the trace-only function (no return, unlock after the call):
>   
>     endbr64
>     push %rbx                        ; saves callee-saved rbx (!)
>     mov  %rdi,%rbx                   ; preserve lock across call (!)
>     xchg %ax,%ax                     ; NOP (static branch)
>     movb $0x0,(%rbx)                 ; unlock
>     decl %gs:__preempt_count
>     je   preempt
>     pop  %rbx                        ; callee-saved restore (!)
>     jmp  __x86_return_thunk
>     call queued_spin_release_traced  ; cold
>     jmp  unlock                      ; cold
>     call __SCT__preempt_schedule
>     pop  %rbx
>     jmp  __x86_return_thunk
> 
> Three extra instructions marked by "!" on the hot path (push, mov, pop),
> all wasted when the tracepoint is off. That's the main reason for
> combining trace and unlock in the same out-of-line function.

Ah, because the return makes it into two tail calls.

I still don't like the duplication, perhaps add some more comments about
needing to update the other location if anything changes here? And perhaps
comment that this duplicate code helps the assembly.

-- Steve


Reply via email to