Over the years I have seen multiple reports that a new OpenWrt release /
kernel update / netifd change / DSA introduction caused a regression in
router network / NAT speed (masquerade NAT in most cases). I believe
most of those reports remained unresolved.

The problem is that:
1. OpenWrt doesn't have automated testing environments
2. Developers can't figure out anything from reports that lack details
3. Even experienced users don't know how to do proper debugging

I spent almost the last 2 months researching & testing masquerade NAT
performance. I thought I'd share my findings & results. Hopefully this
will get more people involved in tracing & fixing such regressions.

*************************
* Testing method
*************************

In 99% of cases it's a bad idea to use online speed test services. They
tend to be too unreliable. It's better to set up a local server instead.

For actual testing you may use iperf or iperf3. If needed for some
reason, FTP, HTTP or another protocol may be an option too.

*************************
* Testing results
*************************

Network traffic is often not perfectly stable. To avoid getting false
results it may be worth it to:
1. Repeat the test in a few sessions
2. Reject the lowest & highest results
3. Calculate an average speed

Example of my testing:

for i in $(seq 1 5); do
    date
    # 80 s run with a report every 10 s; head drops the final summary line
    iperf -t 80 -i 10 -c 192.168.99.1 | head -n -1 | sed -n 's/.* \([0-9][0-9]*\) Mbits\/sec.*/\1/p' | sort -n
    echo
    sleep 15
done

The above script lists 8 results from each iperf session. I then take
the middle 4 and calculate their average. Finally I calculate the
average of all 5 sessions. It may be overkill but it was meant to deal
with some really unstable cases.

*************************
* Environment setup
*************************

Get some (usually 2) PCs powerful enough to easily handle the maximum
expected router traffic.

Once they are set up, avoid changing anything. A kernel update or
configuration change on a PC may affect results even if the router is
the bottleneck [1].

Disable power saving - I once noticed lower performance whenever the
screen saver got activated.

Connect one PC to the WAN port and set it up to use a static IP. You may
set up a DHCP server too or just make OpenWrt use a static WAN IP &
gateway.

Start the iperf / FTP / HTTP / whatever server.
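
Coming back to the averaging described under "Testing results": the manual steps (take the middle 4 of 8 values, average them, then average the per-session results) can be scripted. Below is a minimal sketch, not the exact tooling I used. The function names middle_mean and mean are made up for this example, and it assumes the input contains only the numbers, one per line (so without the "date" line printed by the loop above).

```shell
#!/bin/sh

# middle_mean: read one number per line on stdin, sort numerically, drop the
# lowest and the highest quarter of the values and print the mean of the rest.
# With 8 values this is the "middle 4" average.
middle_mean() {
	sort -n | awk '
		NF { v[++cnt] = $1 }            # skip empty lines
		END {
			if (cnt == 0) exit 1
			drop = int(cnt / 4)     # values rejected at each end
			for (i = drop + 1; i <= cnt - drop; i++) { sum += v[i]; n++ }
			printf "%.1f\n", sum / n
		}'
}

# mean: plain average of the numbers on stdin (for the per-session results).
mean() {
	awk 'NF { sum += $1; n++ } END { if (n == 0) exit 1; printf "%.1f\n", sum / n }'
}

# Example with made-up values: the middle 4 are 936 938 940 942.
printf '%s\n' 912 930 936 938 940 942 950 960 | middle_mean   # prints 939.0
```

With the 8 numbers of each session saved to separate files (hypothetical names session1.txt ... session5.txt), the final result would come from something like: for f in session*.txt; do middle_mean < "$f"; done | mean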

Connect another PC to a LAN port and install a matching client for
generating network traffic.

*************************
* OpenWrt customizations
*************************

Depending on the setup you may need some custom configuration changes.
To avoid applying them manually on every boot use uci-defaults scripts.

Example of my WAN setup:

mkdir -p files/etc/uci-defaults/
cat << EOF > files/etc/uci-defaults/90-nat.sh
#!/bin/sh

uci set network.wan.proto='static'
uci set network.wan.ipaddr='192.168.99.2'
uci set network.wan.netmask='255.255.255.0'
EOF

*************************
* Finding regressions
*************************

In continuous testing pick an interval (testing every day or every n-th
commit) and look for regressions.

If you notice a regression, the first step is to find the first bad
commit. End users often assume that a regression was caused by a kernel
change as that is the simplest difference to notice. Always find the
exact commit.

Make sure to use git bisect [2] for finding first bad commits.

*************************
* Stabilizing performance
*************************

Probably the most annoying problem in debugging is unstable results.
Speed changing between testing sessions / reboots / recompilations makes
the whole testing unreliable and makes it hard to find a real
regression.

Below are a few tips that may help stabilize network speeds.

1. Repeat tests and get an average

Explained above.

2. Don't change the environment setup

Explained above.

3. Use pfifo qdisc

It should be more stable for simple traffic (e.g. iperf generated).
Include the "tc" package and execute something like:

tc qdisc replace dev eth0 root pfifo

Verify with:

tc qdisc

4. Adjust rps_cpus and xps_cpus

On multi-CPU devices, having multiple CPUs assigned to a single network
device may result in traffic being assigned to a random CPU and in
speeds varying across testing sessions.

5. Disable CONFIG_SMP

This will likely reduce performance but may help find a regression if
testing results vary a lot.

6. Organizing kernel symbols

CPUs of home routers usually have small caches. The way kernel symbols
get organized during compilation may significantly affect network
performance [3]. It's especially annoying as changes unrelated to
networking may move / reorder symbols and affect cache hits & misses.

There isn't a reliable solution for that. It may help to add:
-falign-functions=32
or
-falign-functions=64
(depending on the platform) using e.g. KBUILD_CFLAGS.

*************************
* Profiling
*************************

Profiling with "perf" [4] allows checking what consumes CPU time. It's
very useful for finding code worth optimizing & comparing CPU usage
across changes.

OpenWrt needs to be compiled with the CONFIG_KERNEL_PERF_EVENTS=y option
and the "perf" package needs to be installed.

Example of recording:
1. Start network traffic
2. On the router execute: ( cd /tmp/; perf record -a -g -- sleep 60 )
3. Copy /tmp/perf.data to the machine used for compiling OpenWrt

Example of reporting:
1. perf report -k build_dir/target-*/linux-*/vmlinux.debug --kallsyms build_dir/target-*/linux-*/linux-*/System.map
2. perf report -k build_dir/target-*/linux-*/vmlinux.debug --kallsyms build_dir/target-*/linux-*/linux-*/System.map --no-children
3. perf report -k build_dir/target-*/linux-*/vmlinux.debug --kallsyms build_dir/target-*/linux-*/linux-*/System.map --no-children -g none

For fancier reports a Flame Graph [5] can be used:
1. perf script -k build_dir/target-*/linux-*/vmlinux.debug --kallsyms build_dir/target-*/linux-*/linux-*/System.map > out.perf
2. stackcollapse-perf.pl out.perf > out.folded
3. flamegraph.pl out.folded > out.svg

*************************
* Kernel regressions
*************************

Kernel updates are the most problematic to debug. If the first bad
OpenWrt commit is something like a kernel switch from 5.4 to 5.10, it
means millions of actual changes.

There is no reasonable way to bisect the kernel in OpenWrt.
There are so many kernel patches and so much custom code that it's
impossible to apply all of them to dozens of kernel commits during git
bisect.

There are two ways to handle such cases:
1. Strip OpenWrt of 90+% of its custom patches and then try bisecting
   the kernel
2. Use a non-OpenWrt environment like Buildroot [6]

*************************
* References
*************************

[1] https://lore.kernel.org/netdev/81e63fc9-ac8c-cb35-4572-c808ddab9...@gmail.com/T/#m161113b88568f90fb10106e0c6dc9beadd4861e2
[2] https://git-scm.com/docs/git-bisect
[3] https://lore.kernel.org/netdev/2a338e8e-3288-859c-d2e8-26c5712d3...@nbd.name/T/#m2215fd7b363dc321e5b16d6e192168c510b8ce94
[4] https://perf.wiki.kernel.org/index.php/Main_Page
[5] https://www.brendangregg.com/flamegraphs.html
[6] https://buildroot.org/

_______________________________________________
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel