On Mon, Aug 29, 2016 at 4:43 PM, Ryota Ozaki <[email protected]> wrote:
> Hi,
>
> I propose to set -falign-functions=16 for kernels
> of i386/amd64 to reduce performance fluctuations
> caused by small, unrelated changes.
>
> [Background]
>
> I noticed that the performance of IP forwarding had
> degraded by 10% between Aug. 1 and Aug. 16.
> Bisecting the commits between them shows that the
> degradation was introduced by several commits,
> and unfortunately those commits aren't
> related to the performance of IP forwarding; for
> example, a change to ip6flow.
>
> knakahara and I investigated how these
> degradations happened and concluded that they
> are caused by changes to the start addresses of
> functions (alignment of function code), which
> probably affect CPU cache hits. (Actually this is
> just our guess because we don't have a way to
> measure cache hit/miss ratios for now...)
>
> [How does -falign-functions=16 help?]
>
> Currently the starts of functions in kernels of
> i386/amd64 are unaligned, i.e., a function can
> start at any byte depending on the preceding objects
> linked into the kernel. If the size of the preceding
> objects changes, the starts of all following
> functions also change.
>
> You can see how functions are aligned by running
> nm -n netbsd or just by looking at the symbol files
> generated in releasedir.
>
> If you add -falign-functions=16 to COPTS in
> your kernel config, functions are aligned to
> 16 bytes. By doing so, the start addresses of
> all functions always become 0xXXXXXXX0 for i386 and
> 0xffffffffXXXXXXX0 for amd64. The alignment makes
> sure that functions aren't affected by other,
> unrelated code changes.
>
> [Why not aligned in the first place?]
>
> It seems to be because of -mtune=nocona, which is
> specified in bsd.own.mk. -mtune=generic provides
> functions aligned to 16 bytes, but gives poorer
> performance than -mtune=nocona, so I don't propose
> that kind of change.
>
> [Does -falign-functions=16 solve the issue completely?]
>
> No. It seems there remain some other cause(s) of
> performance fluctuations. Nonetheless,
> setting -falign-functions=16 reduces fluctuations.
>
> [The point of the proposal]
>
> The aim of the proposal isn't to provide good
> performance by aligning functions of a kernel,
> but to reduce performance fluctuations caused by
> small, unrelated changes. Such fluctuations make it
> difficult to measure the small overhead of a change
> because we cannot tell whether a given performance
> change comes from the real change or from
> function alignment changes.
>
>
> Any suggestions or comments?
>
> Adding -falign-functions=16 is one solution and
> there may be a better way to reach the goal. Also,
> I'm not sure where we should add such an option.
http://www.netbsd.org/~ozaki-r/align-functions-16.diff

The patch adds the option to sys/arch/amd64/conf/Makefile.amd64.
Is that a reasonable place to add it?

  ozaki-r
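
P.S. For anyone who wants to check the alignment of a built kernel without
reading the nm -n output by eye, here is a rough sketch in Python. It is
not part of the patch. The function names (unaligned_functions,
check_kernel) are made up for illustration, and it assumes the usual
three-column "address type name" output of nm. Note that T/t symbols also
include assembly labels, which -falign-functions does not affect, so a few
unaligned entries can remain even with the option set.

```python
import subprocess

ALIGN = 16  # matches -falign-functions=16


def unaligned_functions(nm_lines, align=ALIGN):
    """Return (address, name) for text symbols not aligned to `align` bytes."""
    result = []
    for line in nm_lines:
        fields = line.split()
        # Defined symbols look like "<hex address> <type> <name>";
        # undefined symbols have no address and are skipped here.
        if len(fields) != 3:
            continue
        addr, kind, name = fields
        # T/t are global/local symbols in the text (code) section.
        if kind not in ("T", "t"):
            continue
        value = int(addr, 16)
        if value % align != 0:
            result.append((value, name))
    return result


def check_kernel(path="netbsd"):
    """Run `nm -n` on a kernel image and list its unaligned text symbols."""
    out = subprocess.run(["nm", "-n", path], check=True,
                         capture_output=True, text=True).stdout
    return unaligned_functions(out.splitlines())
```

Calling check_kernel("netbsd") in the kernel build directory returns the
list of text symbols whose address is not a multiple of 16; comparing the
length of that list before and after setting the option gives a quick idea
of whether it took effect.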
