Re: Optimizing kernel compilation / alignments for network performance

Rafał Miłecki Fri, 06 May 2022 00:53:03 -0700

On 5.05.2022 18:04, Andrew Lunn wrote:

>> you'll see that most used functions are:
>> v7_dma_inv_range
>> __irqentry_text_end
>> l2c210_inv_range
>> v7_dma_clean_range
>> bcma_host_soc_read32
>> __netif_receive_skb_core
>> arch_cpu_idle
>> l2c210_clean_range
>> fib_table_lookup


> There are a lot of cache management functions here. It might sound odd,
> but have you tried disabling SMP? These cache functions need to
> operate across all CPUs, and the communication between CPUs can slow
> them down. If there is only one CPU, these cache functions get simpler
> and faster.
>
> It just depends on your workload. If you have 1 CPU loaded to 100% and
> the other 3 idle, you might see an improvement. If you actually need
> more than one CPU, it will probably be worse.
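One rough way to check whether the load really sits on a single CPU is to
sample the per-CPU lines of /proc/stat twice and compare them. The sketch
below is only an illustration: busy_pct is a hypothetical helper name, and it
assumes the usual field order (user nice system idle iowait irq softirq steal).

```shell
# busy_pct BEFORE AFTER
# Each argument is one "cpuN ..." line taken from /proc/stat.
# Prints the percentage of time the CPU was busy between the two samples.
busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9 && i <= NF; i++) total += $i   # user..steal
            idle = $5 + $6                                    # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else {
                dt = total - t0; di = idle - i0
                if (dt > 0) printf "%.0f\n", 100 * (dt - di) / dt
                else print 0
            }
        }'
}

# Made-up sample lines, only to show the calculation:
busy_pct 'cpu0 100 0 100 700 100 0 0 0' 'cpu0 150 0 150 800 100 0 0 0'   # prints 50
```

On a live system one would capture the line for a given CPU before and after
a short sleep while iperf is running and pass both to the function.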


It seems to lower my NAT speed from ~362 Mbps to 320 Mbps, but it feels
more stable now (less variation). Let me spend some time on more
testing.
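To put that drop in relative terms, here is a small sketch (pct_change is a
hypothetical helper, not something from the thread) that computes the
percentage change between two throughput figures:

```shell
# pct_change OLD NEW: relative change from OLD to NEW, in percent
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (b - a) / a }'
}

pct_change 362 320   # prints -11.6
```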


FWIW, during all my tests I was using:
echo 2 > /sys/class/net/eth0/queues/rx-0/rps_cpus
which is what I need to get similar speeds across iperf sessions.

With
echo 0 > /sys/class/net/eth0/queues/rx-0/rps_cpus
my NAT speeds were jumping between 4 speeds:
273 Mbps / 315 Mbps / 353 Mbps / 425 Mbps
(every time I started iperf, the kernel settled into one state and kept the
 same iperf speed until I stopped it and started another session)

With
echo 1 > /sys/class/net/eth0/queues/rx-0/rps_cpus
my NAT speeds were jumping between 2 speeds:
284 Mbps / 408 Mbps
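
For reference, the value written to rps_cpus is a hexadecimal CPU bitmask:
bit N set means CPU N may process packets for that queue, and 0 disables RPS.
A minimal sketch for decoding such a mask follows; rps_mask_cpus is a
hypothetical helper name, and it only handles a single hex word (no
comma-separated masks as seen on systems with many CPUs).

```shell
# rps_mask_cpus MASK: print the CPU numbers selected by a hex rps_cpus mask
rps_mask_cpus() {
    mask=$((0x$1))      # interpret the argument as hexadecimal
    cpu=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $((mask & 1)) -eq 1 ]; then
            out="${out:+$out }$cpu"
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${out:-none}"
}

rps_mask_cpus 2   # prints 1    (CPU1 only)
rps_mask_cpus 1   # prints 0    (CPU0 only)
rps_mask_cpus 0   # prints none (RPS disabled)
```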

> I've also found that some Ethernet drivers invalidate or flush too
> much. If you are sending a 64-byte TCP ACK, all you need to flush is
> 64 bytes, not the full 1500-byte MTU. If you receive a TCP ACK, and then
> recycle the buffer, all you need to invalidate is the size of the ACK,
> so long as you can guarantee nothing has touched the memory above it.
> But you need to be careful when implementing tricks like this, or you
> can get subtle corruption bugs when you get it wrong.
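
To give a feeling for the difference, here is a rough sketch that counts how
many cache lines a flush or invalidate has to touch for a given length. The
helper name cache_lines is hypothetical, and the 32-byte line size and the
line-aligned buffer start are assumptions, not figures from this thread.

```shell
# cache_lines LEN [LINE_SIZE]
# Number of cache lines covered by LEN bytes starting on a line boundary.
cache_lines() {
    len=$1
    line=${2:-32}   # assumption: 32-byte cache lines unless told otherwise
    echo $(( (len + line - 1) / line ))
}

cache_lines 64     # prints 2
cache_lines 1500   # prints 47
```

Under those assumptions, flushing the full buffer for a 64-byte ACK touches
47 lines where 2 would be enough.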


That was actually bgmac's initial behaviour, see commit 92b9ccd34a90
("bgmac: pass received packet to the netif instead of copying it"):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=92b9ccd34a9053c628d230fe27a7e0c10179910f

I think it was Felix who suggested that I avoid skb_copy*(), and it does
seem to have improved performance.

