>> On 28.12.2016 at 08:29, Martin Pieuchot <m...@openbsd.org> wrote:
>> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
>> Mark Kettenis <mark.kette...@xs4all.nl> writes:
>>>> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
>>>> From: Mark Kettenis <mark.kette...@xs4all.nl>
>>>> We already do this on some architectures, but not on amd64 for
>>>> example.  The main reason is that this disables memcpy() optimizations
>>>> that have a measurable impact on the network stack performance.
>>>> We can get those optimizations back by doing:
>>>> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
>>>> I verified that gcc still does proper bounds checking on
>>>> __builtin_memcpy(), so we don't lose that.
>>>> The nice thing about this solution is that we can choose explicitly
>>>> which optimizations we want.  And as you can see the kernel makefile
>>>> gets simpler ;).
>>>> Of course the real reason why I'm looking into this is that clang
>>>> makes it really hard to build kernels without -ffreestanding.
>>>> The diff below implements this strategy, and enables the optimizations
>>>> for memcpy() and memset().  We can add others if we think there is a
>>>> benefit.  I've tested the diff on amd64.  We may need to put an #undef
>>>> memcpy somewhere for platforms that use the generic C code for memcpy.
>>>> Thoughts?
>>> So those #undefs are necessary.  New diff below.  Tested on armv7,
>>> hppa and sparc64 now as well.
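For readers of the archive, here is a minimal sketch of the macro approach
described above. It is not the actual diff. The struct pkt_hdr, hdr_copy(),
hdr_clear() and generic_memcpy() are hypothetical names made up for
illustration, and the memset() macro is assumed to follow the same pattern as
the memcpy() one quoted above.

```c
#include <stddef.h>

/*
 * With -ffreestanding the compiler stops treating memcpy()/memset() as
 * builtins.  Mapping the names back by hand re-enables the optimization
 * for exactly the functions we choose.
 */
#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
#define memset(d, c, n) __builtin_memset((d), (c), (n))

/* Hypothetical fixed-size header, standing in for network stack data. */
struct pkt_hdr {
	unsigned char	dst[6];
	unsigned char	src[6];
	unsigned short	type;
};

/* Constant-size copy: the compiler is free to inline this as plain moves. */
static void
hdr_copy(struct pkt_hdr *to, const struct pkt_hdr *from)
{
	memcpy(to, from, sizeof(*to));
}

/* Constant-size clear, going through the memset() macro. */
static void
hdr_clear(struct pkt_hdr *h)
{
	memset(h, 0, sizeof(*h));
}

/*
 * A file that provides the generic C implementation has to drop the macro
 * first.  Otherwise "void *memcpy(void *d, ...)" would be expanded by the
 * function-like macro and no longer parse as a definition.
 */
#undef memcpy

/*
 * In a kernel this function would be named memcpy; it is called
 * generic_memcpy here only so the sketch also links in a hosted program.
 */
void *
generic_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n-- > 0)
		*d++ = *s++;
	return dst;
}
```

Any function not covered by such a macro stays an ordinary call in a
freestanding build, which is relevant to the question about the other
functions below.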
>> I think this is the way to go; I can't help with testing on other archs, though.
>> ok jca@ fwiw
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets.  I guess we're missing some
> defines.

I'm late to the game, but does this diff remove all the other optimizations as
well (e.g. bcopy, memcmp, memmove, strchr, ...)? I did some performance testing
when I added them for amd64 in libc, and it made a noticeable difference, not
just for memcpy and memset.

