On Fri, 17 Jun 2022 at 13:51, Hauke Mehrtens <ha...@hauke-m.de> wrote:
>
> Hi Rafal,
>
> Thank you for your detailed analyses and also for the detailed report.
> This is very helpful when I ran into this problem.
>
> Can we somehow automate it so that we get notified a day after a bad
> change was committed about performance regression and not one year after?
>
> On 6/14/22 15:16, Rafał Miłecki wrote:
> > On 12.06.2022 21:58, Rafał Miłecki wrote:
> >> 5. 7125323b81d7 ("bcm53xx: switch to kernel 5.4")
> >>
> >> Improved network speed by 25% (256 Mb/s → 320 Mb/s).
> >>
> >> I didn't have time to bisect this *improvement* to a single kernel
> >> commit. I tried profiling but it isn't obvious to me what caused that
> >> improvement.
> >>
> >> Kernel 4.19:
> >>     11.94%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
> >>      7.06%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_inv_range
> >>      3.37%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_clean_range
> >>      2.80%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_clean_range
> >>      2.67%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_poll
> >>      2.63%  ksoftirqd/0  [kernel.kallsyms]  [k] __dev_queue_xmit
> >>      2.43%  ksoftirqd/0  [kernel.kallsyms]  [k] __netif_receive_skb_core
> >>      2.13%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_start_xmit
> >>      1.82%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_hook_slow
> >>      1.54%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_forward
> >>      1.50%  ksoftirqd/0  [kernel.kallsyms]  [k] dma_cache_maint_page
> >>
> >> Kernel 5.4:
> >>     14.53%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_inv_range
> >>      8.02%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_inv_range
> >>      3.32%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_poll
> >>      3.28%  ksoftirqd/0  [kernel.kallsyms]  [k] v7_dma_clean_range
> >>      3.12%  ksoftirqd/0  [kernel.kallsyms]  [k] __netif_receive_skb_core
> >>      2.70%  ksoftirqd/0  [kernel.kallsyms]  [k] l2c210_clean_range
> >>      2.46%  ksoftirqd/0  [kernel.kallsyms]  [k] __dev_queue_xmit
> >>      2.26%  ksoftirqd/0  [kernel.kallsyms]  [k] bgmac_start_xmit
> >>      1.73%  ksoftirqd/0  [kernel.kallsyms]  [k] __dma_page_dev_to_cpu
> >>      1.72%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_hook_slow
> >
> > Riddle solved. Change to bless/blame: 4e0c54bc5bc8 ("kernel: add support
> > for kernel 5.4").
> >
> > First of all bcm53xx uses
> > CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
> >
> >
> > OpenWrt's kernel Makefile in kernel 4.19:
> >
> > ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> > KBUILD_CFLAGS += -Os $(EXTRA_OPTIMIZATION)
> > else
> > KBUILD_CFLAGS += -O2 -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> > endif
> >
> >
> > OpenWrt's kernel Makefile in 5.4:
> >
> > ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
> > KBUILD_CFLAGS += -O2 $(EXTRA_OPTIMIZATION)
> > else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
> > KBUILD_CFLAGS += -O3 $(EXTRA_OPTIMIZATION)
> > else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
> > KBUILD_CFLAGS += -Os -fno-reorder-blocks -fno-tree-ch $(EXTRA_OPTIMIZATION)
> > endif
> >
> >
> > As you can see 4e0c54bc5bc8 has accidentally moved -fno-reorder-blocks
> > from !CONFIG_CC_OPTIMIZE_FOR_SIZE to CONFIG_CC_OPTIMIZE_FOR_SIZE.
>
> This looks like an accident to me.
> All targets except mediatek/mt7629 are setting
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE in master. In Openwrt 21.02 the
> ARCHS38 target set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3, but now it is
> also to normal performance.
>
> We should probably switch mediatek/mt7629 to
> CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE, does anyone have such a device and
> could test a patch?
>
> > I've noticed problem with -fno-reorder-blocks long time ago, see:
> > [PATCH RFC] kernel: drop -fno-reorder-blocks
> > https://patchwork.ozlabs.org/project/openwrt/patch/20190409093046.13401-1-zaj...@gmail.com/
> >
> >
> > It should really get sorted out...
>
> I would suggest to remove the -fno-reorder-blocks -fno-tree-ch options
> as they are not used.
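
For illustration, dropping the two flags as suggested above might leave the 5.4 hunk of OpenWrt's kernel Makefile looking roughly like this. It is only a sketch derived from the fragment quoted above, not a tested patch, and it assumes the rest of the Makefile stays unchanged:

```makefile
# Sketch only: the 5.4 hunk quoted above with -fno-reorder-blocks and
# -fno-tree-ch removed from the -Os branch. Untested; everything around
# this hunk is assumed to be unchanged.
ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
KBUILD_CFLAGS += -O2 $(EXTRA_OPTIMIZATION)
else ifdef CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3
KBUILD_CFLAGS += -O3 $(EXTRA_OPTIMIZATION)
else ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
KBUILD_CFLAGS += -Os $(EXTRA_OPTIMIZATION)
endif
```

The -O2 and -O3 branches are the same as in the 5.4 fragment; only the size-optimized branch changes, so targets that set CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE should see no difference in their flags.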
>
> The next step could be Profile-guided optimization:
> https://lwn.net/Articles/830300/
> If the toolchain works properly I expect there big improvements as
> routing, forwarding and NAT is completely in the kernel and we use
> devices with small caches. Profile-guided optimization should be able to
> avoid many cache misses by better packaging the binary.
>

PGO would be a dream to accomplish, but it's a nightmare to actually use.
The kernel size grows a lot and it needs to be done correctly... Also, AFAIK
it's not that easy to add support for it, and generating the profile data
is problematic on some devices.

> Hauke
>
> _______________________________________________
> openwrt-devel mailing list
> openwrt-devel@lists.openwrt.org
> https://lists.openwrt.org/mailman/listinfo/openwrt-devel