CVSROOT:	/cvs
Module name:	src
Changes by:	d...@cvs.openbsd.org	2025/05/06 18:39:09
Modified files:
	sys/sys        : futex.h proc.h
	sys/kern       : kern_fork.c sys_futex.c

Log message:
avoid lock contention in futex syscalls

previously all futexes in the system coordinated using a single
global lock. if you have heavily threaded code building locks in
userland out of futexes, this lock gets hammered. this is true even
if userland thinks it's operating on separate locks: it all ends up
serialised in the kernel. this can reduce the throughput of these
heavily threaded programs.

like the __thrsleep diff, the big change is hashing futex waiters
into an array of sleep queues with separate locks/lists based on
their "id" to try and avoid contending on a single lock (a rough
sketch of this layout follows the log message).

also like the __thrsleep diff, this change tries to avoid having a
thread waiting in futex_wait re-take the lock when waking up.
futex_wake holds the sleep queue lock while waking sleeping threads,
so having a woken thread try to take the sleep queue lock again
would immediately put it back to sleep. having futex_wait sleep
without the lock means it can return to userland sooner. this is
very similar to the change made to __thrsleep and __thrwakeup in
src/sys/kern/kern_synch.c r1.214 (see the futex_wake/futex_wait
sketches below).

a feature of futexes is that multiple threads can wait on the same
address and get woken up together. this was previously implemented
by allocating a struct to represent the userland address, and then
queuing the waiting threads on this struct. while pools aren't slow,
they're not free, so this diff removes that struct and queues
threads directly. this means futex wakeups may have to iterate more,
but in practice this is amortised by having multiple lists/locks
(which results in shorter lists of threads), and by avoiding the
overhead of the pool operations. my observation is that most futex
ops didn't share wait addresses, so every futex wait would result in
a pool get and put anyway.

another feature of futexes that __thrsleep doesn't have is the
ability to move the address threads are sleeping on. this means that
threads can move between sleep queues in the array. care must be
taken to avoid deadlocks between the locks on each sleep queue, and
when a waiting thread wakes up after a timeout expires it has to be
careful to remove itself from the right sleep queue after such a
requeue (see the requeue sketch below).

testing by many, but especially phessler@
ok mpi@
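
a minimal sketch of the hashed sleep queue layout the log message
describes. all the names here (futex_slpque, futex_waiter,
futex_get_slpque, FUTEX_SLPQUES) and the bucket count are made up
for illustration; the real definitions live in sys/sys/futex.h and
sys/kern/sys_futex.c.

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/mutex.h>
    #include <sys/queue.h>

    #define FUTEX_SLPQUES	128	/* assumed power-of-two bucket count */

    struct futex_waiter {
    	TAILQ_ENTRY(futex_waiter)	 ft_entry;
    	volatile uint64_t		 ft_id;	/* hashed wait address */
    	struct proc		*volatile ft_proc;	/* NULL once woken */
    };

    TAILQ_HEAD(futex_list, futex_waiter);

    struct futex_slpque {
    	struct mutex		fsq_lock;
    	struct futex_list	fsq_list;
    };

    static struct futex_slpque futex_slpques[FUTEX_SLPQUES];

    void
    futex_init(void)
    {
    	struct futex_slpque *fsq;
    	size_t i;

    	for (i = 0; i < nitems(futex_slpques); i++) {
    		fsq = &futex_slpques[i];
    		/* IPL_NONE assumes futexes are only used in process context */
    		mtx_init(&fsq->fsq_lock, IPL_NONE);
    		TAILQ_INIT(&fsq->fsq_list);
    	}
    }

    static struct futex_slpque *
    futex_get_slpque(uint64_t id)
    {
    	/* separate locks/lists so unrelated futexes rarely contend */
    	return (&futex_slpques[id & (FUTEX_SLPQUES - 1)]);
    }

a power-of-two bucket count keeps the hash a cheap mask; the actual
size and hash are whatever the real code settled on.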
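
a sketch of the wake side under the same made-up names. waiters are
queued directly on the bucket (no per-address struct), so the wake
path iterates the bucket's list matching ids while holding the
bucket lock, and dequeues and marks each waiter so the woken thread
never has to take the bucket lock itself. wakeup_one() only uses its
argument as a wait channel identity.

    static int
    futex_wake(uint64_t id, int n)
    {
    	struct futex_slpque *fsq = futex_get_slpque(id);
    	struct futex_waiter *f, *nf;
    	int count = 0;

    	mtx_enter(&fsq->fsq_lock);
    	TAILQ_FOREACH_SAFE(f, &fsq->fsq_list, ft_entry, nf) {
    		/* waiters for many addresses share this bucket */
    		if (f->ft_id != id)
    			continue;

    		TAILQ_REMOVE(&fsq->fsq_list, f, ft_entry);
    		f->ft_proc = NULL;	/* tell the waiter it's done */
    		wakeup_one(f);

    		if (++count == n)
    			break;
    	}
    	mtx_leave(&fsq->fsq_lock);

    	return (count);
    }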
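
the matching wait side, again with made-up names. the point is that
the thread sleeps and wakes without re-taking the bucket lock on the
happy path; only a timeout or signal forces it back onto a lock, and
because of requeues it has to chase the bucket its current id hashes
to. the real code also checks the futex word before sleeping and
closes the enqueue/wakeup race (cf sleep_setup()/sleep_finish() in
kern_synch.c), which this sketch glosses over.

    static int
    futex_wait(uint64_t id, uint64_t nsecs)
    {
    	struct futex_slpque *fsq = futex_get_slpque(id);
    	struct futex_waiter f = {
    		.ft_id = id,
    		.ft_proc = curproc,
    	};
    	int error;

    	mtx_enter(&fsq->fsq_lock);
    	/* the real code also compares the futex word here */
    	TAILQ_INSERT_TAIL(&fsq->fsq_list, &f, ft_entry);
    	mtx_leave(&fsq->fsq_lock);

    	/*
    	 * sleep without holding the bucket lock. the race between
    	 * dropping the lock and sleeping is glossed over here; the
    	 * real code closes it.
    	 */
    	error = tsleep_nsec(&f, PWAIT|PCATCH, "fuwait", nsecs);

    	if (f.ft_proc == NULL) {
    		/* futex_wake dequeued us; no bucket lock needed */
    		return (0);
    	}

    	/*
    	 * timeout or signal: we may still be queued, and a requeue
    	 * may have moved us, so chase the bucket our current id
    	 * hashes to before removing ourselves.
    	 */
    	for (;;) {
    		fsq = futex_get_slpque(f.ft_id);
    		mtx_enter(&fsq->fsq_lock);
    		if (fsq == futex_get_slpque(f.ft_id))
    			break;
    		/* requeued while we were taking the lock, try again */
    		mtx_leave(&fsq->fsq_lock);
    	}
    	if (f.ft_proc != NULL)
    		TAILQ_REMOVE(&fsq->fsq_list, &f, ft_entry);
    	mtx_leave(&fsq->fsq_lock);

    	return (error);
    }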
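
finally, one way a requeue can take the two bucket locks in a
consistent order so concurrent requeues in opposite directions can't
deadlock, and how it updates each moved waiter's id so the timeout
path above finds the right bucket. the real requeue op also wakes
some waiters; this sketch only shows the locking and the move.

    static void
    futex_slpque_enter2(struct futex_slpque *a, struct futex_slpque *b)
    {
    	/* take the bucket locks in array order to dodge deadlock */
    	if (a == b)
    		mtx_enter(&a->fsq_lock);
    	else if (a < b) {
    		mtx_enter(&a->fsq_lock);
    		mtx_enter(&b->fsq_lock);
    	} else {
    		mtx_enter(&b->fsq_lock);
    		mtx_enter(&a->fsq_lock);
    	}
    }

    static int
    futex_requeue(uint64_t from_id, uint64_t to_id, int n)
    {
    	struct futex_slpque *sfsq = futex_get_slpque(from_id);
    	struct futex_slpque *dfsq = futex_get_slpque(to_id);
    	struct futex_waiter *f, *nf;
    	int count = 0;

    	futex_slpque_enter2(sfsq, dfsq);
    	TAILQ_FOREACH_SAFE(f, &sfsq->fsq_list, ft_entry, nf) {
    		if (f->ft_id != from_id)
    			continue;

    		TAILQ_REMOVE(&sfsq->fsq_list, f, ft_entry);
    		/*
    		 * update the id under both locks so a waiter that
    		 * times out can find the bucket it's actually on.
    		 */
    		f->ft_id = to_id;
    		TAILQ_INSERT_TAIL(&dfsq->fsq_list, f, ft_entry);

    		if (++count == n)
    			break;
    	}
    	mtx_leave(&sfsq->fsq_lock);
    	if (dfsq != sfsq)
    		mtx_leave(&dfsq->fsq_lock);

    	return (count);
    }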