Hi,

I propose to build i386/amd64 kernels with -falign-functions=16 to reduce performance fluctuations caused by small, unrelated changes.
[Background]
I noticed that IP forwarding performance degraded by 10% between Aug. 1 and Aug. 16. Bisecting the commits in between showed that the degradation was introduced by several commits. Unfortunately, none of those commits is related to IP forwarding performance; one of them, for example, is a change to ip6flow. knakahara and I investigated how these degradations happened and concluded that they are caused by changes in where functions start (the alignment of function code), which probably affects CPU cache hits. (Actually this is just our guess, because we currently have no way to measure cache hit/miss ratios...)

[How does -falign-functions=16 help?]
Currently, functions in i386/amd64 kernels are unaligned, i.e., a function can start at any byte offset, depending on the objects linked before it. If the size of a preceding object changes, the start addresses of all following functions change too. You can see how functions are laid out with nm -n netbsd, or just by looking at the symbol files generated in releasedir.

If you add -falign-functions=16 to COPTS in your kernel config, functions are aligned to 16 bytes. The start address of every function then always has the form 0xXXXXXXX0 on i386 and 0xffffffffXXXXXXX0 on amd64. This ensures that the alignment of functions is not affected by other, unrelated code changes.

[Why not aligned in the first place?]
It seems to be because of -mtune=nocona, which is specified in bsd.own.mk. -mtune=generic aligns functions to 16 bytes, but it gives poorer performance than -mtune=nocona, so I don't propose that kind of change.

[Does -falign-functions=16 solve the issue completely?]
No. It seems that some other cause(s) of performance fluctuations remain. Nonetheless, setting -falign-functions=16 reduces the fluctuations.

[The point of the proposal]
The aim of the proposal isn't to get better performance by aligning kernel functions, but to reduce performance fluctuations caused by small, unrelated changes.
Fluctuations like this make it difficult to measure the small overhead of a change, because we cannot tell whether a given performance change comes from the change itself or from shifts in function alignment.

Any suggestions or comments? Adding -falign-functions=16 is one solution, and there may be a better way to achieve the goal. I'm also not sure where we should add such an option.

Thanks,
ozaki-r
