Re: Build kernels with -ffreestanding?
> Date: Wed, 28 Dec 2016 22:59:18 +0100 (CET)
> From: Mark Kettenis <mark.kette...@xs4all.nl>
>
> > Date: Wed, 28 Dec 2016 08:29:05 +0100
> > From: Martin Pieuchot <m...@openbsd.org>
> >
> > On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> > > [...]
> > > I think this is the way to go; can't help tests on other archs, though.
> > > ok jca@ fwiw
> >
> > For the archives, Hrvoje Popovski measured a performance impact when using
> > a kernel with this diff to forward packets. I guess we're missing some
> > defines.
>
> The most likely candidate is memmove. Here is a diff that adds it.

Scrap that; memcmp needs this too. New diff below.

Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64 29 Nov 2016 09:08:34 - 1.74
+++ arch/amd64/conf/Makefile.amd64 28 Dec 2016 22:19:20 -
@@ -29,9 +29,7 @@ CWARNFLAGS= -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS= -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
 	-mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+= -fno-builtin-printf -fno-builtin-snprintf \
-	-fno-builtin-vsnprintf -fno-builtin-log \
-	-fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+= -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+= -fno-stack-protector
 .endif
Index: lib/libkern/memcmp.c
===
RCS file: /cvs/src/sys/lib/libkern/memcmp.c,v
retrieving revision 1.6
diff -u -p -r1.6 memcmp.c
--- lib/libkern/memcmp.c 10 Jun 2014 04:16:57 - 1.6
+++ lib/libkern/memcmp.c 28 Dec 2016 22:19:21 -
@@ -34,6 +34,8 @@
 
 #include
 
+#undef memcmp
+
 /*
  * Compare memory regions.
  */
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 - 1.3
+++ lib/libkern/memcpy.c 28 Dec 2016 22:19:21 -
@@ -32,6 +32,8 @@
 #include
 #include
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memmove.c
===
RCS file: /cvs/src/sys/lib/libkern/memmove.c,v
retrieving revision 1.1
diff -u -p -r1.1 memmove.c
--- lib/libkern/memmove.c 11 Jun 2013 18:04:41 - 1.1
+++ lib/libkern/memmove.c 28 Dec 2016 22:19:21 -
@@ -32,6 +32,8 @@
 #include
 #include
 
+#undef memmove
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libk
Re: Build kernels with -ffreestanding?
> Date: Wed, 28 Dec 2016 08:29:05 +0100
> From: Martin Pieuchot <m...@openbsd.org>
>
> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> > [...]
> > I think this is the way to go; can't help tests on other archs, though.
> > ok jca@ fwiw
>
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets. I guess we're missing some
> defines.

The most likely candidate is memmove. Here is a diff that adds it.
Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64 29 Nov 2016 09:08:34 - 1.74
+++ arch/amd64/conf/Makefile.amd64 28 Dec 2016 21:48:52 -
@@ -29,9 +29,7 @@ CWARNFLAGS= -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS= -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
 	-mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+= -fno-builtin-printf -fno-builtin-snprintf \
-	-fno-builtin-vsnprintf -fno-builtin-log \
-	-fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+= -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+= -fno-stack-protector
 .endif
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 - 1.3
+++ lib/libkern/memcpy.c 28 Dec 2016 21:48:53 -
@@ -32,6 +32,8 @@
 #include
 #include
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memmove.c
===
RCS file: /cvs/src/sys/lib/libkern/memmove.c,v
retrieving revision 1.1
diff -u -p -r1.1 memmove.c
--- lib/libkern/memmove.c 11 Jun 2013 18:04:41 - 1.1
+++ lib/libkern/memmove.c 28 Dec 2016 21:48:53 -
@@ -32,6 +32,8 @@
 #include
 #include
 
+#undef memmove
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libkern/memset.c,v
retrieving revision 1.7
diff -u -p -r1.7 memset.c
--- lib/libkern/memset.c 10 Jun 2014 04:16:57 - 1.7
+++ lib/libkern/memset.c 28 Dec 2016 21:48:53 -
@@ -39,6 +39,8 @@
 #include
 #include
 
+#undef memset
+
 #define wsize sizeof(u_int)
 #define wmask (wsize - 1)
 
Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.120
diff -u -p -r1.120 systm.h
--- sys/systm.h 19 Dec 2016 08:36:50 - 1.120
+++ sys/systm.h 28 Dec 2016 21:48:53 -
@@ -330,6 +330,10 @@ extern int (*mountroot)(void);
 
 #include
 
+#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
+#define memmove(d, s, n) __builtin_memmove((d), (s), (n))
+#define memset(b, c, n) __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void Debugger(void); /* in DDB only */
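A likely reason memmove() matters for forwarding is that packet code often slides data around inside a single buffer, where source and destination overlap and memcpy() is not allowed. The sketch below routes memmove() through the builtin in the same way as the diff; the helper `buf_trim_front()` is hypothetical and only illustrates an overlapping copy, it is not kernel code.

```c
#include <stddef.h>

/* Prototype first, then the function-like macro, as in the systm.h diff. */
void	*memmove(void *, const void *, size_t);
#define memmove(d, s, n)	__builtin_memmove((d), (s), (n))

/*
 * Hypothetical helper: drop the first 'off' bytes of a buffer holding
 * 'len' bytes by sliding the remainder down.  Source and destination
 * overlap, so memmove() (not memcpy()) is required.  Returns the new
 * length.
 */
size_t
buf_trim_front(unsigned char *buf, size_t len, size_t off)
{
	if (off >= len)
		return 0;
	memmove(buf, buf + off, len - off);
	return len - off;
}
```

With a run-time length like this, the compiler may still emit an out-of-line call, so the libkern memmove.c needs its `#undef` just like memcpy.c.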
Re: Build kernels with -ffreestanding?
>> On 28.12.2016 at 08:29, Martin Pieuchot <m...@openbsd.org> wrote:
>>
>> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
>> [...]
>> I think this is the way to go; can't help tests on other archs, though.
>> ok jca@ fwiw
>
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets. I guess we're missing some
> defines.

I'm late to the game, but does this diff remove all the other
optimizations as well (e.g. bcopy, memcmp, memmove, strchr, ...)?

I did some performance testing when I added them for amd64 in libc and
it made a noticeable difference, not just for memcpy+memset.

Reyk
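Extending the systm.h pattern to the other functions Reyk lists might look like the sketch below. It is hypothetical and not taken from any diff in this thread. The one non-obvious mapping is bcopy(), which takes (src, dst, len) and must handle overlap, so it is routed to `__builtin_memmove()` with the first two arguments swapped. The helpers `addr_equal()` and `shift_up()` are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical extra prototypes and builtin mappings. */
int	memcmp(const void *, const void *, size_t);
void	bcopy(const void *, void *, size_t);

#define memcmp(b1, b2, n)	__builtin_memcmp((b1), (b2), (n))
/* bcopy(src, dst, len): swap the arguments and keep overlap safety. */
#define bcopy(s, d, n)		__builtin_memmove((d), (s), (n))

/* Compare two 6-byte hardware addresses. */
int
addr_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, 6) == 0;
}

/* Shift a buffer up by one byte in place (an overlapping copy). */
void
shift_up(unsigned char *buf, size_t len)
{
	if (len < 2)
		return;
	bcopy(buf, buf + 1, len - 1);
}
```

Whether each extra mapping is worth it would, as Reyk suggests, need to be settled by measurement.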
Re: Build kernels with -ffreestanding?
On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> Mark Kettenis <mark.kette...@xs4all.nl> writes:
>
> > [...]
> > So those #undefs are necessary. New diff below. Tested on armv7,
> > hppa and sparc64 now as well.
>
> I think this is the way to go; can't help tests on other archs, though.
> ok jca@ fwiw

For the archives, Hrvoje Popovski measured a performance impact when using
a kernel with this diff to forward packets. I guess we're missing some
defines.
Re: Build kernels with -ffreestanding?
Mark Kettenis <mark.kette...@xs4all.nl> writes:

>> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
>> From: Mark Kettenis <mark.kette...@xs4all.nl>
>>
>> [...]
>> Thoughts?
>
> So those #undefs are necessary. New diff below. Tested on armv7,
> hppa and sparc64 now as well.

I think this is the way to go; can't help tests on other archs, though.
ok jca@ fwiw

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: Build kernels with -ffreestanding?
On Sat, Dec 24, 2016 at 12:08:35AM +0100, Mark Kettenis wrote:
> We can get those optimizations back by doing:
>
> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

You might still want to put a prototype in, just before the define.

Joerg
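One concrete benefit of the prototype is that a function-like macro only expands when its name is followed by `(`. With a declaration in scope, `memcpy` can still be used as a plain identifier, for example to take its address, while ordinary calls go through the builtin. The typedef `copyfn_t`, the variable `default_copy` and the function `copy_words()` below are hypothetical illustrations.

```c
#include <stddef.h>

/* Prototype just before the define, as suggested above. */
void	*memcpy(void *, const void *, size_t);
#define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))

/* Hypothetical pluggable copy routine. */
typedef void *(*copyfn_t)(void *, const void *, size_t);

/*
 * No '(' follows the name here, so the macro does not expand and this
 * refers to the out-of-line memcpy.  In a file that includes no libc
 * header, this line compiles only because of the prototype above.
 */
copyfn_t default_copy = memcpy;

/* A normal call site: the macro expands to __builtin_memcpy(). */
void
copy_words(unsigned int *dst, const unsigned int *src, size_t nwords)
{
	memcpy(dst, src, nwords * sizeof(*dst));
}
```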
Re: Build kernels with -ffreestanding?
> Date: Sun, 25 Dec 2016 09:40:05 +1100
> From: Jonathan Gray <j...@jsg.id.au>
>
> On Sat, Dec 24, 2016 at 05:07:11PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > > From: Mark Kettenis <mark.kette...@xs4all.nl>
> > >
> > > [...]
> > > Thoughts?
> >
> > So those #undefs are necessary. New diff below. Tested on armv7,
> > hppa and sparc64 now as well.

macppc tested now as well

> I agree this is the way we want to go. It also avoids having
> to expand the list to -fno-builtin-free etc for newer versions of gcc.
>
> Why build the memcpy/memset in libkern at all if we go this route?

__builtin_memcpy() may still expand to an explicit memcpy() call if
the compiler decides not to inline it.
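The sketch below shows why libkern still needs out-of-line copies. Whether a builtin is inlined depends on the compiler, the flags and the arguments, so the comments describe typical optimized behaviour, not a guarantee. The function names `copy16()` and `copyn()` are hypothetical.

```c
#include <stddef.h>

void	*memcpy(void *, const void *, size_t);
#define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))

/*
 * Small constant size: with optimization enabled, compilers typically
 * expand this inline as a few load/store instructions.
 */
void
copy16(void *dst, const void *src)
{
	memcpy(dst, src, 16);
}

/*
 * Size known only at run time: compilers typically emit a call to the
 * external memcpy symbol here, so the kernel must still link an
 * out-of-line memcpy (the generic libkern C code or an MD version).
 */
void
copyn(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
}
```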
Re: Build kernels with -ffreestanding?
On Sat, Dec 24, 2016 at 05:07:11PM +0100, Mark Kettenis wrote:
> > Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > From: Mark Kettenis <mark.kette...@xs4all.nl>
> >
> > [...]
> > Thoughts?
>
> So those #undefs are necessary. New diff below. Tested on armv7,
> hppa and sparc64 now as well.

I agree this is the way we want to go. It also avoids having
to expand the list to -fno-builtin-free etc for newer versions of gcc.

Why build the memcpy/memset in libkern at all if we go this route?
Re: Build kernels with -ffreestanding?
> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> From: Mark Kettenis <mark.kette...@xs4all.nl>
>
> [...] I've tested the diff on amd64. We may need to put an #undef
> memcpy somewhere for platforms that use the generic C code for memcpy.
>
> Thoughts?

So those #undefs are necessary. New diff below. Tested on armv7,
hppa and sparc64 now as well.
Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.119
diff -u -p -r1.119 systm.h
--- sys/systm.h 24 Sep 2016 18:35:52 - 1.119
+++ sys/systm.h 24 Dec 2016 16:05:48 -
@@ -306,6 +306,9 @@ extern int (*mountroot)(void);
 
 #include
 
+#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
+#define memset(b, c, n) __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void Debugger(void); /* in DDB only */
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 - 1.3
+++ lib/libkern/memcpy.c 24 Dec 2016 16:05:48 -
@@ -32,6 +32,8 @@
 #include
 #include
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libkern/memset.c,v
retrieving revision 1.7
diff -u -p -r1.7 memset.c
--- lib/libkern/memset.c 10 Jun 2014 04:16:57 - 1.7
+++ lib/libkern/memset.c 24 Dec 2016 16:05:48 -
@@ -39,6 +39,8 @@
 #include
 #include
 
+#undef memset
+
 #define wsize sizeof(u_int)
 #define wmask (wsize - 1)
 
Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64 29 Nov 2016 09:08:34 - 1.74
+++ arch/amd64/conf/Makefile.amd64 24 Dec 2016 16:05:49 -
@@ -29,9 +29,7 @@ CWARNFLAGS= -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS= -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
 	-mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+= -fno-builtin-printf -fno-builtin-snprintf \
-	-fno-builtin-vsnprintf -fno-builtin-log \
-	-fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+= -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+= -fno-stack-protector
 .endif
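The `#undef` lines are needed because, once systm.h defines a function-like macro, the definition line in the generic C file, for example `memcpy(void *dst0, const void *src0, size_t length)`, is itself a macro invocation. The preprocessor would turn it into a `__builtin_memcpy(...)` expression with declarations as arguments, which does not compile. The single-file sketch below reproduces the pattern with the hypothetical name `kmemzero`, to avoid clashing with libc's own memset in a hosted build.

```c
#include <stddef.h>

/* What a shared header might provide (hypothetical name). */
void	*kmemzero(void *, size_t);
#define kmemzero(b, n)	__builtin_memset((b), 0, (n))

/*
 * What the out-of-line implementation file must do.  Without this
 * #undef, the definition below would be preprocessed into
 * "__builtin_memset((void *b), 0, (size_t n))", a syntax error.
 */
#undef kmemzero

/* Generic C fallback: small, not fast. */
void *
kmemzero(void *b, size_t n)
{
	unsigned char *p = b;

	while (n-- > 0)
		*p++ = 0;
	return b;
}
```

Writing the declarator as `(kmemzero)(void *b, size_t n)` would also suppress the expansion; the diffs in this thread use `#undef` instead.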
Build kernels with -ffreestanding?
We already do this on some architectures, but not on amd64 for
example. The main reason is that this disables memcpy() optimizations
that have a measurable impact on the network stack performance.

We can get those optimizations back by doing:

#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

I verified that gcc still does proper bounds checking on
__builtin_memcpy(), so we don't lose that.

The nice thing about this solution is that we can choose explicitly
which optimizations we want. And as you can see the kernel makefile
gets simpler ;).

Of course the real reason why I'm looking into this is that clang
makes it really hard to build kernels without -ffreestanding.

The diff below implements this strategy, and enables the optimizations
for memcpy() and memset(). We can add others if we think there is a
benefit. I've tested the diff on amd64. We may need to put an #undef
memcpy somewhere for platforms that use the generic C code for memcpy.

Thoughts?

Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.120
diff -u -p -r1.120 systm.h
--- sys/systm.h 19 Dec 2016 08:36:50 - 1.120
+++ sys/systm.h 23 Dec 2016 22:53:15 -
@@ -330,6 +330,9 @@ extern int (*mountroot)(void);
 
 #include
 
+#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
+#define memset(b, c, n) __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void Debugger(void); /* in DDB only */
Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64 29 Nov 2016 09:08:34 - 1.74
+++ arch/amd64/conf/Makefile.amd64 23 Dec 2016 22:53:15 -
@@ -29,9 +29,7 @@ CWARNFLAGS= -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS= -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
 	-mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+= -fno-builtin-printf -fno-builtin-snprintf \
-	-fno-builtin-vsnprintf -fno-builtin-log \
-	-fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+= -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+= -fno-stack-protector
 .endif
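A minimal sketch of the proposed systm.h approach, as it might look in a file compiled with -ffreestanding: the macros route calls to the compiler builtins, so small fixed-size copies and clears can still be inlined. The prototypes are an addition that the diff does not contain, and `struct hdr`, `hdr_copy()` and `hdr_clear()` are hypothetical stand-ins for network-stack code.

```c
#include <stddef.h>

/* Prototypes (not in the diff above), then the macros from systm.h. */
void	*memcpy(void *, const void *, size_t);
void	*memset(void *, int, size_t);

#define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
#define memset(b, c, n)	__builtin_memset((b), (c), (n))

/* Hypothetical fixed-size header, standing in for a packet header. */
struct hdr {
	unsigned char	dst[6];
	unsigned char	src[6];
	unsigned short	type;
};

/* Constant size: a good candidate for an inlined copy. */
void
hdr_copy(struct hdr *to, const struct hdr *from)
{
	memcpy(to, from, sizeof(*to));
}

/* Constant size: a good candidate for an inlined clear. */
void
hdr_clear(struct hdr *h)
{
	memset(h, 0, sizeof(*h));
}
```

Only the functions given a macro get this treatment, which is the "choose explicitly which optimizations we want" property described above.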