On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> Mark Kettenis <[email protected]> writes:
> 
> >> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> >> From: Mark Kettenis <[email protected]>
> >> 
> >> We already do this on some architectures, but not on amd64 for
> >> example.  The main reason is that this disables memcpy() optimizations
> >> that have a measurable impact on the network stack performance.
> >> 
> >> We can get those optimizations back by doing:
> >> 
> >> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> >> 
> >> I verified that gcc still does proper bounds checking on
> >> __builtin_memcpy(), so we don't lose that.
> >> 
> >> The nice thing about this solution is that we can choose explicitly
> >> which optimizations we want.  And as you can see the kernel makefile
> >> gets simpler ;).
> >> 
> >> Of course the real reason why I'm looking into this is that clang
> >> makes it really hard to build kernels without -ffreestanding.
> >> 
> >> The diff below implements this strategy, and enables the optimizations
> >> for memcpy() and memset().  We can add others if we think there is a
> >> benefit.  I've tested the diff on amd64.  We may need to put an #undef
> >> memcpy somewhere for platforms that use the generic C code for memcpy.
> >> 
> >> Thoughts?
> >
> > So those #undefs are necessary.  New diff below.  Tested on armv7,
> > hppa and sparc64 now as well.
> 
> I think this is the way to go; I can't help test on other archs, though.
> ok jca@ fwiw

For the archives, Hrvoje Popovski measured a performance impact when using
a kernel with this diff to forward packets.  I guess we're missing some
defines.
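
For anyone reading this in the archives, here is a minimal sketch of the
approach quoted above.  It is not the committed diff.  The memcpy() define
is the one from Mark's mail; the memset() define is assumed by analogy, and
struct hdr and the hdr_*() helpers are hypothetical names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Prototypes, as a freestanding kernel header might declare them. */
void	*memcpy(void *, const void *, size_t);
void	*memset(void *, int, size_t);

/*
 * Map the calls onto the compiler builtins so the compiler is free to
 * expand them inline.  The memcpy line is taken from the mail; the
 * memset line is an assumption made by analogy.
 */
#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
#define memset(b, c, n) __builtin_memset((b), (c), (n))

/* Hypothetical fixed-size header, for illustration only. */
struct hdr {
	unsigned char	dst[6];
	unsigned char	src[6];
	unsigned short	type;
};

void
hdr_copy(struct hdr *to, const struct hdr *from)
{
	assert(to != NULL && from != NULL);
	/* Constant size: a candidate for inline expansion. */
	memcpy(to, from, sizeof(*to));
}

void
hdr_clear(struct hdr *h)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));
}

/* Returns 1 if copy and clear behave as expected, 0 otherwise. */
int
hdr_selfcheck(void)
{
	struct hdr a, b;
	size_t i;

	hdr_clear(&a);
	hdr_clear(&b);
	for (i = 0; i < sizeof(a.dst); i++) {
		a.dst[i] = (unsigned char)(i + 1);
		a.src[i] = (unsigned char)(i + 7);
	}
	a.type = 0x0800;

	hdr_copy(&b, &a);
	for (i = 0; i < sizeof(a.dst); i++)
		if (b.dst[i] != a.dst[i] || b.src[i] != a.src[i])
			return 0;
	if (b.type != a.type)
		return 0;

	hdr_clear(&b);
	for (i = 0; i < sizeof(b.dst); i++)
		if (b.dst[i] != 0 || b.src[i] != 0)
			return 0;
	return b.type == 0;
}
```

A file that provides the generic C implementation of memcpy() would need an
#undef memcpy before the function definition, since otherwise the macro
rewrites the definition itself.  That is presumably why the #undefs turned
out to be necessary on some platforms.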
