On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> Mark Kettenis <[email protected]> writes:
>
> >> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> >> From: Mark Kettenis <[email protected]>
> >>
> >> We already do this on some architectures, but not on amd64 for
> >> example. The main reason is that this disables memcpy() optimizations
> >> that have a measurable impact on the network stack performance.
> >>
> >> We can get those optimizations back by doing:
> >>
> >> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> >>
> >> I verified that gcc still does proper bounds checking on
> >> __builtin_memcpy(), so we don't lose that.
> >>
> >> The nice thing about this solution is that we can choose explicitly
> >> which optimizations we want. And as you can see the kernel makefile
> >> gets simpler ;).
> >>
> >> Of course the real reason why I'm looking into this is that clang
> >> makes it really hard to build kernels without -ffreestanding.
> >>
> >> The diff below implements this strategy, and enabled the optimizations
> >> for memcpy() and memset(). We can add others if we think there is a
> >> benefit. I've tested the diff on amd64. We may need to put an #undef
> >> memcpy somewhere for platforms that use the generic C code for memcpy.
> >>
> >> Thoughts?
> >
> > So those #undefs are necessary. New diff below. Tested on armv7,
> > hppa and sparc64 now as well.
>
> I think this is the way to go; can't help tests on other archs, though.
> ok jca@ fwiw

For the archives, Hrvoje Popovski measured a performance impact when using a kernel with this diff to forward packets. I guess we're missing some defines.
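
For anyone reading this in the archives without the diff at hand, here is a minimal sketch of the macro approach quoted above. It is not the committed diff: struct hdr, hdr_copy() and hdr_clear() are hypothetical names made up for illustration, and the memset() define is only assumed to mirror the memcpy() one.

```c
#include <stddef.h>

/*
 * Sketch of a header fragment (assumed, not the actual diff).  With
 * -ffreestanding the compiler stops treating memcpy()/memset() as
 * builtins, so the calls are mapped back to the builtins explicitly.
 */
#define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
#define memset(b, c, n)	__builtin_memset((b), (c), (n))

/*
 * A file that implements memcpy() itself in C would need
 * "#undef memcpy" before the definition, otherwise the macro
 * rewrites the function's own name.
 */

/* Hypothetical fixed-size header, for illustration only. */
struct hdr {
	unsigned int	src;
	unsigned int	dst;
	unsigned short	len;
	unsigned short	sum;
};

/* Constant-size copy: the compiler is free to expand this inline. */
void
hdr_copy(struct hdr *to, const struct hdr *from)
{
	memcpy(to, from, sizeof(*to));
}

/* Constant-size clear, same idea for memset(). */
void
hdr_clear(struct hdr *h)
{
	memset(h, 0, sizeof(*h));
}
```

Any other function the compiler would otherwise have handled as a builtin stays an ordinary call under -ffreestanding unless it gets a define of its own.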
