Hello,
just a quick question:
was pf(4) enabled while running those tests?
if it was, which rules were loaded into pf(4)?
My guess is pf(4) was not enabled when running those tests. If I remember
correctly, I could see a performance boost by a factor of ~1.5 when running
those tests with a similar diff applied to the machines provided by hrvoje@.
I agree it's time to commit such a change.
thanks and
regards
sashan
On Wed, Apr 21, 2021 at 09:36:11PM +0200, Alexander Bluhm wrote:
> Hi,
>
> For a while we have been running the network stack without the kernel
> lock, but with a network lock. The latter is an exclusive sleeping rwlock.
>
> It is possible to run the forwarding path in parallel on multiple
> cores. I use ix(4) interfaces which provide one input queue for
> each CPU. For that we have to start multiple softnet tasks and
> replace the exclusive lock with a shared lock. This works for IP
> and IPv6 input and forwarding, but not for higher protocols.
>
> So I implemented a queue between IP and higher layers. We had that
> before when we were using netlock for IP and kernel lock for TCP.
> Now we have a shared lock for IP and an exclusive lock for TCP. By
> using a queue, we can upgrade the lock once for multiple packets.
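
A minimal user-space sketch of the batching idea in the paragraph above,
written in Python rather than kernel C. It is not the code from the diff:
ProtocolQueue, CountingLock and run_demo are hypothetical names, the worker
threads only stand in for the softnet tasks, and the shared side of the
rwlock is left out because the Python standard library has no reader/writer
lock. The only point it shows is that the exclusive lock is taken once per
batch of queued packets instead of once per packet.

```python
import threading
from collections import deque


class CountingLock:
    """Stand-in for the exclusive lock; counts how often it is taken."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


class ProtocolQueue:
    """Hypothetical queue between the IP layer and higher protocols."""

    def __init__(self, exclusive_lock):
        self._queue = deque()
        self._queue_mutex = threading.Lock()  # protects only the queue itself
        self._exclusive = exclusive_lock
        self.delivered = []

    def enqueue(self, packet):
        # Called from the parallel IP path; the exclusive lock is not needed.
        with self._queue_mutex:
            self._queue.append(packet)

    def drain(self):
        # Detach the whole backlog, then take the exclusive lock once for it.
        with self._queue_mutex:
            batch, self._queue = self._queue, deque()
        if not batch:
            return 0
        with self._exclusive:
            for packet in batch:
                self.delivered.append(packet)  # stand-in for TCP/UDP input
        return len(batch)


def run_demo(workers=4, packets_per_worker=100):
    lock = CountingLock()
    pq = ProtocolQueue(lock)

    def ip_input(worker_id):
        for seq in range(packets_per_worker):
            pq.enqueue((worker_id, seq))

    threads = [threading.Thread(target=ip_input, args=(w,))
               for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # All producers are done, so a single drain sees every packet.
    drained = pq.drain()
    return drained, lock.acquisitions
```

Calling run_demo() returns the number of packets delivered and the number
of exclusive lock acquisitions. Without the queue, each packet handed to
the higher layer would need its own exclusive acquisition.
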
>
> As you can see here, forwarding performance doubles from 4.5x10^9
> to 9x10^9. Left column is current, right column is with my diff.
> The other dots at 2x10^9 are with socket splicing which is not
> affected.
> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/gnuplot/forward.png
>
>
> Here are all numbers with various network tests.
> http://bluhm.genua.de/perform/results/2021-04-21T10%3A50%3A37Z/perform.html
>
> TCP performance gets less deterministic due to the additional queue.
>
> Kernel stack flame graph looks like this. The machine uses 4 CPUs.
> http://bluhm.genua.de/files/kstack-multiqueue-forward.svg
>
>
> Note the kernel lock around nd6_resolve(). I had to put it there
> as I have seen an MP-related crash there. This can be fixed
> independently of this diff.
>
> We need more MP pressure to find such bugs and races. I think now
> is a good time to give this diff broader testing and commit it.
> You need interfaces with multiple queues to see a difference.
>