CVSROOT:        /cvs
Module name:    src
Changes by:     [email protected] 2026/01/14 13:43:56

Modified files:
        sys/arch/amd64/amd64: pmap.c vector.S 
        sys/arch/i386/i386: apicvec.s pmap.c 

Log message:
pmap functions send various TLB shootdown operations by IPI to other cpus.
A lock is grabbed to serialize this. Then recipient cpus get sent an IPI
demanding this work.  The lock is reused as a counter of cpus doing the work,
and each cpu's IPI handler decrements the counter.
The local cpu can do some operations in parallel, before verifying
the TLB operations have completed in pmap_tlb_shootwait() which spins
for the counter to reach 0.  But the counter is also a lock, and 0
means another cpu can grab it.  So if the latency for the local work
exceeds the latency on the recipient cpus, the "counter-lock" can be
grabbed by a different cpu for its own TLB shootdown operations.  The
original cpu will now spin waiting for this second cpu's work to
finish, creating pmap function latency.
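
The race described above can be modelled in a few lines of C11.  This
is only a single-threaded sketch, not the kernel code: the names
old_word, old_shoot_begin, old_ipi_ack and old_shoot_done are
hypothetical, and the real IPI handlers are written in asm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one word is both the lock and the pending count. */
static atomic_int old_word;

/*
 * An initiating cpu takes the "counter-lock" by swinging it from 0 to
 * the number of recipient cpus (assumed >= 1).  Fails if it is nonzero.
 */
static bool
old_shoot_begin(int ntargets)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&old_word, &expected, ntargets);
}

/* What each recipient's IPI handler does once its TLB work is done. */
static void
old_ipi_ack(void)
{
	int prev = atomic_fetch_sub(&old_word, 1);

	assert(prev > 0);
}

/* The condition the initiator spins on in its wait function. */
static bool
old_shoot_done(void)
{
	return atomic_load(&old_word) == 0;
}
```

In this model, if cpu A begins a shootdown, both recipients ack, and
cpu B then begins its own shootdown before A checks, old_shoot_done()
reports false to A even though A's work is complete.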
To fix this, I create per-cpu counters which are separate from the lock.
The IPI functions written in asm now decrement this per-cpu counter, and
when it reaches 0, the shared lock is cleared allowing another cpu to
begin shootdowns tracked by its own per-cpu counter.  The waiting
function only spins on the correct per-cpu counter.
As a bonus, the lock (and new variable indicating the shooting cpu)
are now cache-aligned.
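
A rough C11 model of the fixed scheme, again single-threaded and not
the committed code.  MODEL_NCPU, the 64-byte alignment, struct
shoot_state and the new_* function names are all assumptions made for
illustration; the real handlers are in asm.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NCPU	4	/* hypothetical cpu count */

/* Lock and shooting-cpu index share one (assumed 64-byte) cache line. */
struct shoot_state {
	_Alignas(64) atomic_int lock;	/* 0 = free, 1 = held */
	atomic_int shooter;		/* cpu that holds the lock */
};

static struct shoot_state shoot;
static atomic_int pending[MODEL_NCPU];	/* one counter per initiator */

/* Take the shared lock, then publish our own counter (ntargets >= 1). */
static bool
new_shoot_begin(int self, int ntargets)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&shoot.lock, &expected, 1))
		return false;
	atomic_store(&pending[self], ntargets);
	atomic_store(&shoot.shooter, self);
	return true;
}

/* Recipient: decrement the shooter's counter; last one frees the lock. */
static void
new_ipi_ack(void)
{
	int who = atomic_load(&shoot.shooter);
	int prev = atomic_fetch_sub(&pending[who], 1);

	assert(prev > 0);
	if (prev == 1)
		atomic_store(&shoot.lock, 0);
}

/* The initiator spins only on its own counter. */
static bool
new_shoot_done(int self)
{
	return atomic_load(&pending[self]) == 0;
}
```

Here a second cpu taking the lock after the first cpu's recipients
have finished no longer disturbs the first cpu's wait condition,
because each initiator checks a counter that only its own recipients
decrement.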
In snaps for 2 weeks
Many comments from chris; ok mlarkin chris
