Hi Taylor,
I was looking at this code again. Probably our queue is much faster and
more scalable since pcq is CAS based, which is a more typical/classical
approach. (For example, in our paper, we explain why FAA-based queues
scale much much better, we also compare against CAS-based queues.)
(For example, see pp.13-14 in
https://drops.dagstuhl.de/opus/volltexte/2019/11335/pdf/LIPIcs-DISC-2019-28.pdf
NCQ and MSQUEUE are CAS-based. Our queues (SCQ, SCQP) are FAA-based. You
can see a big gap as we reach 18 per-CPU cores. Basically NCQ and
MSQUEUE won't scale at all.)
The challenge there is to actually have a truly lock-free FAA queue;
most queues are not, they incorrectly claim lock-freedom. We finally
solve this problem in our paper.
Would you like us to explore if pcq can be replaced with our new DISC'19
algorithm potentially? How critical for the overall networking
performance is this code?
Ruslan
On 9/27/20 2:58 PM, Taylor R Campbell wrote:
Date: Fri, 25 Sep 2020 17:58:41 -0400
From: Ruslan Nikolaev <[email protected]>
I came across this page
https://wiki.netbsd.org/projects/project/atomic_fifo_lifo_queues/
Recently, we have designed a high-performant (and what is also very
important -- truly lock-free -- not just "lockless") and memory
efficient FIFO queue based on ring buffers that use FAA (fetch-and-add)
and traditional M&S queue as an outer layer (for unbounded queues).
Cool!
I don't actually know what the author of that project description had
in mind, and he's no longer around, so we should probably just take
that page down unless someone else remembers and it's still relevant.
Parts of the network stack use pcq(9) for inter-CPU packet queueing,
although in new code (and incrementally for old code) we generally try
to minimize inter-CPU communication except to distribute load.
https://man.NetBSD.org/pcq.9
https://nxr.NetBSD.org/xref/src/sys/kern/subr_pcq.c