>> On 28.12.2016 at 08:29, Martin Pieuchot <[email protected]> wrote:
>>
>> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
>> Mark Kettenis <[email protected]> writes:
>>
>>>> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
>>>> From: Mark Kettenis <[email protected]>
>>>>
>>>> We already do this on some architectures, but not on amd64 for
>>>> example. The main reason is that this disables memcpy() optimizations
>>>> that have a measurable impact on the network stack performance.
>>>>
>>>> We can get those optimizations back by doing:
>>>>
>>>> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
>>>>
>>>> I verified that gcc still does proper bounds checking on
>>>> __builtin_memcpy(), so we don't lose that.
>>>>
>>>> The nice thing about this solution is that we can choose explicitly
>>>> which optimizations we want. And as you can see the kernel makefile
>>>> gets simpler ;).
>>>>
>>>> Of course the real reason why I'm looking into this is that clang
>>>> makes it really hard to build kernels without -ffreestanding.
>>>>
>>>> The diff below implements this strategy, and enables the optimizations
>>>> for memcpy() and memset(). We can add others if we think there is a
>>>> benefit. I've tested the diff on amd64. We may need to put an #undef
>>>> memcpy somewhere for platforms that use the generic C code for memcpy.
>>>>
>>>> Thoughts?
>>>
>>> So those #undefs are necessary. New diff below. Tested on armv7,
>>> hppa and sparc64 now as well.
>>
>> I think this is the way to go; can't help tests on other archs, though.
>> ok jca@ fwiw
>
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets. I guess we're missing some
> defines.

I'm late to the game - but does this diff remove all the other
optimizations as well? (e.g. bcopy, memcmp, memmove, strchr, ...)

I did some performance testing when I added them for amd64 in libc and
it made a noticeable difference - not just for memcpy+memset.

Reyk
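
For the archives, a minimal sketch of the macro approach discussed in
the quoted mail, written as a userland test file rather than kernel
code. Only the memcpy and memset mappings come from the thread; the
memcmp and memmove lines are hypothetical additions showing how further
functions could be mapped the same way, and struct pkt_hdr, hdr_copy()
and hdr_equal() are made up for illustration. <string.h> stands in for
whatever kernel header declares these functions. A gcc- or
clang-compatible compiler is assumed.

```c
/* Sketch only, not the actual diff. */
#include <stddef.h>
#include <string.h>	/* stand-in for the kernel header; include first */

/* Drop any earlier macro definitions before redefining them. */
#undef memcpy
#undef memset
#undef memcmp
#undef memmove

/* Mappings mentioned in the thread. */
#define memcpy(d, s, n)		__builtin_memcpy((d), (s), (n))
#define memset(b, c, n)		__builtin_memset((b), (c), (n))

/* Hypothetical additions, not taken from the diff. */
#define memcmp(a, b, n)		__builtin_memcmp((a), (b), (n))
#define memmove(d, s, n)	__builtin_memmove((d), (s), (n))

/* Hypothetical fixed-size header, 14 bytes with no padding. */
struct pkt_hdr {
	unsigned char	dst[6];
	unsigned char	src[6];
	unsigned short	type;
};

/* Constant-size copy: the compiler may expand this inline. */
static void
hdr_copy(struct pkt_hdr *to, const struct pkt_hdr *from)
{
	memcpy(to, from, sizeof(*to));
}

/* Returns 1 if both headers are bytewise identical, 0 otherwise. */
static int
hdr_equal(const struct pkt_hdr *a, const struct pkt_hdr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Whether each extra mapping actually pays off would need measuring per
function and per architecture; the sketch only shows the mechanism.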
