On Fri, Aug 29, 2025 at 11:52 AM Tomas Vondra <to...@vondra.me> wrote:
> True. But one worker did show up in top, using a fair amount of CPU, so
> why wouldn't the others (if they process the same stream)?

It deliberately concentrates wakeups into the lowest numbered workers
that are marked idle in a bitmap.

* higher numbered workers snooze and eventually time out (with the
  patches for 19 that make the pool size dynamic)
* busy workers have a better chance of staying on CPU between one job
  and the next
* minimised duplication of various caches and descriptors

Every other wakeup routing strategy I've tried so far performed worse in
both avg(latency) and stddev(latency).

I have wondered if we might want to consider per-NUMA-node IO worker
pools with their own submission queues. Not investigated, but I suppose
it might help with the submission queue lock, cache line ping pong for
buffer headers that the worker touches on completion, and inter-process
interrupts. I don't know where to draw the line with potential
optimisations to IO worker mode that would realistically only help on
Linux today, when the main performance plan for Linux is io_uring.
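
To illustrate the routing rule described above (wake the lowest numbered
worker whose idle bit is set), here is a minimal C sketch. It is not the
actual PostgreSQL code: the names IdleWorkerSet, mark_idle and
choose_idle_worker are hypothetical, the pool is assumed to have at most
64 workers, and the locking that a shared bitmap would need is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one 64-bit word covers the whole pool. */
#define MAX_WORKERS 64

/* Hypothetical structure; bit i set means worker i is idle. */
typedef struct IdleWorkerSet
{
    uint64_t idle_mask;
} IdleWorkerSet;

/* A worker calls this when it runs out of jobs and goes to sleep. */
static void
mark_idle(IdleWorkerSet *set, int worker)
{
    assert(worker >= 0 && worker < MAX_WORKERS);
    set->idle_mask |= UINT64_C(1) << worker;
}

/*
 * Pick the lowest numbered idle worker, clear its idle bit, and return
 * its number.  Returns -1 if no worker is idle.  Always scanning from
 * bit 0 is what concentrates wakeups into the low-numbered workers, so
 * the high-numbered ones are rarely chosen and can stay asleep.
 */
static int
choose_idle_worker(IdleWorkerSet *set)
{
    for (int worker = 0; worker < MAX_WORKERS; worker++)
    {
        uint64_t bit = UINT64_C(1) << worker;

        if (set->idle_mask & bit)
        {
            set->idle_mask &= ~bit;
            return worker;
        }
    }
    return -1;
}
```

With workers 2, 5 and 7 marked idle, successive calls to
choose_idle_worker() return 2, then 5, then 7, and a worker that goes
idle again with a lower number is preferred over any higher one still
waiting. A real implementation would likely use a find-first-set
instruction instead of the loop; the loop is used here only to keep the
sketch portable.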