CVSROOT:        /cvs
Module name:    src
Changes by:     d...@cvs.openbsd.org    2025/05/06 18:39:09

Modified files:
        sys/sys        : futex.h proc.h 
        sys/kern       : kern_fork.c sys_futex.c 

Log message:
avoid lock contention in futex syscalls

previously all futexes in the system coordinated using a single
global lock. if you have heavily threaded code building locks in
userland out of futexes, this lock gets hammered. even if userland
thinks it's operating on separate locks, it all ends up serialised
in the kernel, which can reduce the throughput of these heavily
threaded programs.

like the __thrsleep diff, the big change is hashing futex waiters
into an array of sleep queues with separate locks/lists based on
their "id" to try and avoid contending on a single lock.
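
a minimal userland sketch of the hashing idea, not the committed code.
it assumes the "id" is derived from the address of the futex word; the
table size, the hash, and every name below (futex_slpque,
futex_get_slpque, and so on) are hypothetical, and an atomic_flag
stands in for the kernel mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FUTEX_SLPQUES_BITS	7	/* hypothetical: 128 sleep queues */
#define FUTEX_SLPQUES_SIZE	(1U << FUTEX_SLPQUES_BITS)
#define FUTEX_SLPQUES_MASK	(FUTEX_SLPQUES_SIZE - 1)

static_assert((FUTEX_SLPQUES_SIZE & FUTEX_SLPQUES_MASK) == 0,
    "table size must be a power of two");

struct waiter;

/* one lock and one list of waiters per bucket */
struct futex_slpque {
	atomic_flag	 sq_lock;	/* stand-in for a kernel mutex */
	struct waiter	*sq_head;
};

static struct futex_slpque futex_slpques[FUTEX_SLPQUES_SIZE];

static void
futex_slpques_init(void)
{
	size_t i;

	for (i = 0; i < FUTEX_SLPQUES_SIZE; i++) {
		atomic_flag_clear(&futex_slpques[i].sq_lock);
		futex_slpques[i].sq_head = NULL;
	}
}

/* map a futex id to its sleep queue */
static struct futex_slpque *
futex_get_slpque(uint64_t id)
{
	/* assumption: ids are 4-byte aligned, so drop the low two bits */
	uint64_t h = id >> 2;

	h ^= h >> FUTEX_SLPQUES_BITS;	/* fold in some higher bits */
	return (&futex_slpques[h & FUTEX_SLPQUES_MASK]);
}
```

with this hash, waiters on neighbouring futex words land in different
buckets and only take the lock of the bucket they hash to. unrelated
ids can still collide in one bucket, which is why contention is reduced
rather than eliminated.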

like the __thrsleep diff, this change also tries to avoid
having a thread waiting in futex_wait re-take the lock when waking
up. futex_wake is holding the sleep queue lock when waking up
sleeping threads, so having the sleeping thread try to take the sleep
queue lock again would immediately put it back to sleep.
having futex_wait sleep without the lock means it can return
to userland sooner.
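
one way to get this behaviour, sketched in userland C under stated
assumptions: the waker unlinks the waiter while it holds the queue
lock and then publishes that fact through a per-waiter field, so a
woken waiter only has to look at its own state. the w_sq field and all
function names are hypothetical, and the actual sleeping is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct slpque;

struct waiter {
	struct waiter		*w_next;
	_Atomic(struct slpque *) w_sq;	/* non-NULL while still queued */
};

struct slpque {
	atomic_flag	 sq_lock;	/* stand-in for a kernel mutex */
	struct waiter	*sq_head;
};

static void
slpque_init(struct slpque *sq)
{
	atomic_flag_clear(&sq->sq_lock);
	sq->sq_head = NULL;
}

static void
slpque_enter(struct slpque *sq)
{
	while (atomic_flag_test_and_set_explicit(&sq->sq_lock,
	    memory_order_acquire))
		continue;
}

static void
slpque_leave(struct slpque *sq)
{
	atomic_flag_clear_explicit(&sq->sq_lock, memory_order_release);
}

/* waiter side, before sleeping: append to the queue under the lock */
static void
slpque_insert(struct slpque *sq, struct waiter *w)
{
	struct waiter **wp;

	slpque_enter(sq);
	for (wp = &sq->sq_head; *wp != NULL; wp = &(*wp)->w_next)
		continue;
	w->w_next = NULL;
	*wp = w;
	atomic_store_explicit(&w->w_sq, sq, memory_order_release);
	slpque_leave(sq);
}

/* waker side: unlink under the lock, then publish; never touch w after */
static int
slpque_wake_one(struct slpque *sq)
{
	struct waiter *w;

	slpque_enter(sq);
	if ((w = sq->sq_head) != NULL) {
		sq->sq_head = w->w_next;
		atomic_store_explicit(&w->w_sq, NULL, memory_order_release);
	}
	slpque_leave(sq);
	return (w != NULL);
}

/* waiter side, after sleeping: returns 1 only if the lock was needed */
static int
waiter_finish(struct waiter *w)
{
	struct slpque *sq, **unused = NULL;
	struct waiter **wp;

	(void)unused;
	sq = atomic_load_explicit(&w->w_sq, memory_order_acquire);
	if (sq == NULL)
		return (0);	/* already unlinked by the waker: no lock */

	/* timeout or signal path: the waiter has to remove itself */
	slpque_enter(sq);
	if (atomic_load_explicit(&w->w_sq, memory_order_relaxed) != NULL) {
		for (wp = &sq->sq_head; *wp != NULL; wp = &(*wp)->w_next) {
			if (*wp == w) {
				*wp = w->w_next;
				break;
			}
		}
		atomic_store_explicit(&w->w_sq, NULL, memory_order_relaxed);
	}
	slpque_leave(sq);
	return (1);
}
```

the common path (woken by futex_wake) returns 0 from waiter_finish
without touching the queue lock; only a waiter that gave up on its own
pays for the lock.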

this is very similar to the change made to __thrsleep and __thrwakeup
in src/sys/kern/kern_synch.c r1.214.

a feature of futexes is that multiple threads can wait on the same
address and get woken up together. this was previously implemented by
allocating a struct from a pool to represent this userland address,
and then queuing the waiting threads on this struct. while pools
aren't slow, they're not free, so this diff removes this struct and
queues threads directly. this means the futex wakeups may have to
iterate more, but in practice this is amortised by having multiple
lists/locks (which results in shorter lists of threads), and avoiding
the overhead of the pool operations. my observation is that most futex
ops didn't share wait addresses, so every futex wait would result in
a pool get and put anyway.
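
a small single-threaded sketch of what "queue threads directly" can
look like: each waiter records the id it sleeps on, and a wakeup walks
the list and compares ids instead of looking up a per-address struct.
the struct layout and names are hypothetical, and locking and the
actual sleep/wakeup are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct waiter {
	uint64_t	 w_id;		/* futex this thread waits on */
	int		 w_woken;
	struct waiter	*w_next;
};

struct slpque {
	struct waiter	*sq_head;
};

/* append a waiter; no per-address struct is allocated */
static void
slpque_insert(struct slpque *sq, struct waiter *w, uint64_t id)
{
	struct waiter **wp;

	assert(w != NULL);
	w->w_id = id;
	w->w_woken = 0;
	w->w_next = NULL;
	for (wp = &sq->sq_head; *wp != NULL; wp = &(*wp)->w_next)
		continue;
	*wp = w;
}

/* wake up to n waiters on id, skipping waiters on other ids */
static unsigned int
slpque_wake(struct slpque *sq, uint64_t id, unsigned int n)
{
	struct waiter **wp = &sq->sq_head, *w;
	unsigned int count = 0;

	while ((w = *wp) != NULL && count < n) {
		if (w->w_id != id) {
			wp = &w->w_next;	/* not ours, keep iterating */
			continue;
		}
		*wp = w->w_next;		/* unlink and mark woken */
		w->w_next = NULL;
		w->w_woken = 1;
		count++;
	}
	return (count);
}
```

the extra iteration is the cost of skipping unrelated waiters that
hash to the same queue; with many queues the lists stay short.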

another feature of futexes that __thrsleep doesn't have is the ability
to move the address threads are sleeping on. this means that threads
can move between sleep queues in the array. care must be taken to
avoid deadlocks between the locks on each sleep queue, and when a
waiting thread wakes up after a timeout expires it has to be careful
to remove itself from the right sleep queue after such a requeue.
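
a common way to avoid deadlock when two sleep queue locks are needed
at once is to always take them in a fixed order, for example by
address, and to treat the "same queue" case specially. the sketch
below shows that technique in userland C; it is an assumption about
how such a requeue could be made safe, not a copy of the committed
code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct slpque {
	atomic_flag	sq_lock;	/* stand-in for a kernel mutex */
};

static void
slpque_init(struct slpque *sq)
{
	atomic_flag_clear(&sq->sq_lock);
}

static void
slpque_enter(struct slpque *sq)
{
	while (atomic_flag_test_and_set_explicit(&sq->sq_lock,
	    memory_order_acquire))
		continue;
}

static void
slpque_leave(struct slpque *sq)
{
	atomic_flag_clear_explicit(&sq->sq_lock, memory_order_release);
}

/* lock the source and destination queues of a requeue */
static void
slpque_enter_pair(struct slpque *a, struct slpque *b)
{
	struct slpque *t;

	assert(a != NULL && b != NULL);
	if (a == b) {
		slpque_enter(a);	/* both ids hash to one queue */
		return;
	}
	if ((uintptr_t)a > (uintptr_t)b) {
		t = a;			/* lower address is locked first */
		a = b;
		b = t;
	}
	slpque_enter(a);
	slpque_enter(b);
}

static void
slpque_leave_pair(struct slpque *a, struct slpque *b)
{
	slpque_leave(a);
	if (a != b)
		slpque_leave(b);
}
```

because every caller takes the lower-addressed lock first, two
concurrent requeues in opposite directions cannot each hold one lock
while waiting for the other. a waiter that times out still has to
find out which queue it currently sits on before taking that queue's
lock, since a requeue may have moved it.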

testing by many, but especially phessler@
ok mpi@
