Hi Rafal,
Thank you for your detailed analysis and for the detailed report.
This is very helpful, as I ran into this problem too.
Can we somehow automate this so that we get notified about a performance
regression a day after a bad change is committed, and not a year later?
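To make the idea concrete, here is a minimal sketch of such a check. It
assumes a nightly job already produces a few throughput measurements (in
Mb/s) for the latest commit and for a known-good baseline; the function
names, the 5% tolerance and the sample numbers are hypothetical, and how
the measurements are collected (test bed, traffic generator) is left open.

```python
from statistics import median


def is_regression(baseline_mbps, current_mbps, tolerance=0.05):
    """Return True if the median current throughput is more than
    `tolerance` (a fraction, e.g. 0.05 = 5%) below the median baseline."""
    base = median(baseline_mbps)
    cur = median(current_mbps)
    return cur < base * (1.0 - tolerance)


def report(commit, baseline_mbps, current_mbps, tolerance=0.05):
    """Build a one-line summary suitable for a notification mail."""
    base = median(baseline_mbps)
    cur = median(current_mbps)
    change = (cur - base) / base * 100.0
    bad = is_regression(baseline_mbps, current_mbps, tolerance)
    status = "REGRESSION" if bad else "ok"
    return (f"{commit}: {cur:.0f} Mb/s vs baseline {base:.0f} Mb/s "
            f"({change:+.1f}%) {status}")


if __name__ == "__main__":
    # Hypothetical measurements; "abc1234" is a placeholder commit id.
    print(report("abc1234", [320, 318, 322], [256, 255, 258]))
```

Using the median of several runs rather than a single run is meant to keep
one noisy measurement from triggering a false alarm.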
On 6/14/22 15:16, Rafał Miłecki wrote:
On 12.06.2022 21:58, Rafał Miłecki wrote:
5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")
Improved network speed by 25% (256 Mb/s → 320 Mb/s).
I didn't have time to bisect this *improvement* to a single kernel
commit. I tried profiling but it isn't obvious to me what caused that
improvement.
Kernel 4.19:
11.94% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
7.06% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.37% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
2.80% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.67% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
2.63% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
2.43% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
2.13% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
1.82% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
1.54% ksoftirqd/0 [kernel.kallsyms] [k] ip_forward
1.50% ksoftirqd/0 [kernel.kallsyms] [k] dma_cache_maint_page
Kernel 5.4:
14.53% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
8.02% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.32% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
3.28% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_clean_range
3.12% ksoftirqd/0 [kernel.kallsyms] [k] __netif_receive_skb_core
2.70% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_clean_range
2.46% ksoftirqd/0 [kernel.kallsyms] [k] __dev_queue_xmit
2.26% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_start_xmit
1.73% ksoftirqd/0 [kernel.kallsyms] [k] __dma_page_dev_to_cpu
1.72% ksoftirqd/0 [kernel.kallsyms] [k] nf_hook_slow
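The two profiles are hard to compare by eye, so here is a small sketch that
diffs them per symbol. It assumes one perf entry per line in the form
"<pct>% <comm> <dso> [k] <symbol>"; the helper names are hypothetical, and
only three entries from each listing above are included as sample input.
Note that these are shares of samples, not absolute time, so a higher share
on the faster kernel does not by itself mean that function got slower.

```python
KERNEL_4_19 = """
11.94% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
7.06% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
2.67% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
"""

KERNEL_5_4 = """
14.53% ksoftirqd/0 [kernel.kallsyms] [k] v7_dma_inv_range
8.02% ksoftirqd/0 [kernel.kallsyms] [k] l2c210_inv_range
3.32% ksoftirqd/0 [kernel.kallsyms] [k] bgmac_poll
"""


def parse_perf(text):
    """Map symbol name -> overhead in percent."""
    result = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip anything that does not look like a perf report entry.
        if len(fields) < 5 or not fields[0].endswith("%"):
            continue
        result[fields[-1]] = float(fields[0].rstrip("%"))
    return result


def diff_profiles(old, new):
    """Return (symbol, old_pct, new_pct, delta) for symbols present in
    both profiles, sorted by largest absolute change first."""
    rows = [(sym, old[sym], new[sym], round(new[sym] - old[sym], 2))
            for sym in old if sym in new]
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


if __name__ == "__main__":
    rows = diff_profiles(parse_perf(KERNEL_4_19), parse_perf(KERNEL_5_4))
    for sym, old_pct, new_pct, delta in rows:
        print(f"{sym:24s} {old_pct:6.2f}% -> {new_pct:6.2f}% ({delta:+.2f})")
```

Symbols that appear in only one listing are skipped, since the listings are
truncated and a missing entry does not mean the function was not called.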
Riddle solved. Change to bless/blame: 4e0c54bc5bc8 ("kernel: add support
for kernel 5.4").
First of all bcm53xx uses
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
OpenWrt's kernel Makefile in kernel 4.19:
ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os $(EXTRA_OPTIMIZATION)
else
KBUILD_CFLAGS += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
endif
OpenWrt's kernel Makefile in 5.4:
ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
KBUILD_CFLAGS += -O2 $(EXTRA_OPTIMIZATION)
else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
KBUILD_CFLAGS += -O3 $(EXTRA_OPTIMIZATION)
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
endif
As you can see 4e0c54bc5bc8 has accidentally moved -fno-reorder-blocks
from !CONFIG_CC_OPTIMIZE_FOR_SIZE to CONFIG_CC_OPTIMIZE_FOR_SIZE.
This looks like an accident to me.
All targets except mediatek/mt7629 are setting
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in master. In OpenWrt 21.02 the
ARCHS38 target set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, but now it is
also set to normal performance.
We should probably switch mediatek/mt7629 to
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, does anyone have such a device and
could test a patch?
I noticed a problem with -fno-reorder-blocks a long time ago, see:
[PATCH RFC] kernel: drop -fno-reorder-blocks
https://patchwork.ozlabs.org/project/openwrt/patch/20190409093046.13401-1-zaj...@gmail.com/
It should really get sorted out...
I would suggest removing the -fno-reorder-blocks -fno-tree-ch options,
as they are not used.
The next step could be profile-guided optimization:
https://lwn.net/Articles/830300/
If the toolchain works properly, I expect big improvements there, as
routing, forwarding and NAT are done entirely in the kernel and we use
devices with small caches. Profile-guided optimization should be able to
avoid many cache misses by laying out the binary better.
Hauke