Re: Build kernels with -ffreestanding?

2016-12-28 Thread Mark Kettenis
> Date: Wed, 28 Dec 2016 22:59:18 +0100 (CET)
> From: Mark Kettenis <mark.kette...@xs4all.nl>
> 
> > Date: Wed, 28 Dec 2016 08:29:05 +0100
> > From: Martin Pieuchot <m...@openbsd.org>
> > 
> > On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> > > Mark Kettenis <mark.kette...@xs4all.nl> writes:
> > > 
> > > >> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > > >> From: Mark Kettenis <mark.kette...@xs4all.nl>
> > > >> 
> > > >> We already do this on some architectures, but not on amd64 for
> > > >> example.  The main reason is that this disables memcpy() optimizations
> > > >> that have a measurable impact on the network stack performance.
> > > >> 
> > > >> We can get those optimizations back by doing:
> > > >> 
> > > >> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> > > >> 
> > > >> I verified that gcc still does proper bounds checking on
> > > >> __builtin_memcpy(), so we don't lose that.
> > > >> 
> > > >> The nice thing about this solution is that we can choose explicitly
> > > >> which optimizations we want.  And as you can see the kernel makefile
> > > >> gets simpler ;).
> > > >> 
> > > >> Of course the real reason why I'm looking into this is that clang
> > > >> makes it really hard to build kernels without -ffreestanding.
> > > >> 
> > > >> The diff below implements this strategy, and enabled the optimizations
> > > >> for memcpy() and memset().  We can add others if we think there is a
> > > >> benefit.  I've tested the diff on amd64.  We may need to put an #undef
> > > >> memcpy somewhere for platforms that use the generic C code for memcpy.
> > > >> 
> > > >> Thoughts?
> > > >
> > > > So those #undefs are necessary.  New diff below.  Tested on armv7,
> > > > hppa and sparc64 now as well.
> > > 
> > > I think this is the way to go; can't help tests on other archs, though.
> > > ok jca@ fwiw
> > 
> > For the archives, Hrvoje Popovski measured a performance impact when using
> > a kernel with this diff to forward packets.  I guess we're missing some
> > defines.
> 
> The most likely candidate is memmove.  Here is a diff that adds it.

Scrap that; memcmp needs this too.  New diff below.

Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64  29 Nov 2016 09:08:34 -  1.74
+++ arch/amd64/conf/Makefile.amd64  28 Dec 2016 22:19:20 -
@@ -29,9 +29,7 @@ CWARNFLAGS=   -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS=    -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
                -mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
-   -fno-builtin-vsnprintf -fno-builtin-log \
-   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+=   -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=   -fno-stack-protector
 .endif
Index: lib/libkern/memcmp.c
===
RCS file: /cvs/src/sys/lib/libkern/memcmp.c,v
retrieving revision 1.6
diff -u -p -r1.6 memcmp.c
--- lib/libkern/memcmp.c 10 Jun 2014 04:16:57 -  1.6
+++ lib/libkern/memcmp.c 28 Dec 2016 22:19:21 -
@@ -34,6 +34,8 @@
 
 #include 
 
+#undef memcmp
+
 /*
  * Compare memory regions.
  */
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 -  1.3
+++ lib/libkern/memcpy.c 28 Dec 2016 22:19:21 -
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memmove.c
===
RCS file: /cvs/src/sys/lib/libkern/memmove.c,v
retrieving revision 1.1
diff -u -p -r1.1 memmove.c
--- lib/libkern/memmove.c   11 Jun 2013 18:04:41 -  1.1
+++ lib/libkern/memmove.c   28 Dec 2016 22:19:21 -
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#undef memmove
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libk

Re: Build kernels with -ffreestanding?

2016-12-28 Thread Mark Kettenis
> Date: Wed, 28 Dec 2016 08:29:05 +0100
> From: Martin Pieuchot <m...@openbsd.org>
> 
> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> > Mark Kettenis <mark.kette...@xs4all.nl> writes:
> > 
> > >> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > >> From: Mark Kettenis <mark.kette...@xs4all.nl>
> > >> 
> > >> We already do this on some architectures, but not on amd64 for
> > >> example.  The main reason is that this disables memcpy() optimizations
> > >> that have a measurable impact on the network stack performance.
> > >> 
> > >> We can get those optimizations back by doing:
> > >> 
> > >> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> > >> 
> > >> I verified that gcc still does proper bounds checking on
> > >> __builtin_memcpy(), so we don't lose that.
> > >> 
> > >> The nice thing about this solution is that we can choose explicitly
> > >> which optimizations we want.  And as you can see the kernel makefile
> > >> gets simpler ;).
> > >> 
> > >> Of course the real reason why I'm looking into this is that clang
> > >> makes it really hard to build kernels without -ffreestanding.
> > >> 
> > >> The diff below implements this strategy, and enabled the optimizations
> > >> for memcpy() and memset().  We can add others if we think there is a
> > >> benefit.  I've tested the diff on amd64.  We may need to put an #undef
> > >> memcpy somewhere for platforms that use the generic C code for memcpy.
> > >> 
> > >> Thoughts?
> > >
> > > So those #undefs are necessary.  New diff below.  Tested on armv7,
> > > hppa and sparc64 now as well.
> > 
> > I think this is the way to go; can't help tests on other archs, though.
> > ok jca@ fwiw
> 
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets.  I guess we're missing some
> defines.

The most likely candidate is memmove.  Here is a diff that adds it.

Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64  29 Nov 2016 09:08:34 -  1.74
+++ arch/amd64/conf/Makefile.amd64  28 Dec 2016 21:48:52 -
@@ -29,9 +29,7 @@ CWARNFLAGS=   -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS=    -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
                -mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
-   -fno-builtin-vsnprintf -fno-builtin-log \
-   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+=   -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=   -fno-stack-protector
 .endif
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 -  1.3
+++ lib/libkern/memcpy.c 28 Dec 2016 21:48:53 -
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memmove.c
===
RCS file: /cvs/src/sys/lib/libkern/memmove.c,v
retrieving revision 1.1
diff -u -p -r1.1 memmove.c
--- lib/libkern/memmove.c   11 Jun 2013 18:04:41 -  1.1
+++ lib/libkern/memmove.c   28 Dec 2016 21:48:53 -
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#undef memmove
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libkern/memset.c,v
retrieving revision 1.7
diff -u -p -r1.7 memset.c
--- lib/libkern/memset.c 10 Jun 2014 04:16:57 -  1.7
+++ lib/libkern/memset.c 28 Dec 2016 21:48:53 -
@@ -39,6 +39,8 @@
 #include 
 #include 
 
+#undef memset
+
 #define wsize   sizeof(u_int)
 #define wmask   (wsize - 1)
 
Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.120
diff -u -p -r1.120 systm.h
--- sys/systm.h 19 Dec 2016 08:36:50 -  1.120
+++ sys/systm.h 28 Dec 2016 21:48:53 -
@@ -330,6 +330,10 @@ extern int (*mountroot)(void);
 
 #include 
 
+#define memcpy(d, s, n)    __builtin_memcpy((d), (s), (n))
+#define memmove(d, s, n)   __builtin_memmove((d), (s), (n))
+#define memset(b, c, n)    __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void   Debugger(void); /* in DDB only */
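
For illustration, a minimal sketch of how the memmove() and memset() defines behave at a call site. The helper buf_trim_front() is hypothetical and not part of the diff; only the two defines come from it.

```c
#include <stddef.h>

#define memmove(d, s, n) __builtin_memmove((d), (s), (n))
#define memset(b, c, n)  __builtin_memset((b), (c), (n))

/*
 * Hypothetical helper: drop the first "cut" bytes of a buffer and zero
 * the tail.  Source and destination overlap, so this has to be
 * memmove() rather than memcpy().
 */
static void
buf_trim_front(unsigned char *buf, size_t len, size_t cut)
{
	if (cut > len)
		cut = len;
	memmove(buf, buf + cut, len - cut);
	memset(buf + len - cut, 0, cut);
}
```

Since both calls go through the builtins, the compiler can still optimize them under -ffreestanding, where it would otherwise treat memmove() and memset() as ordinary external functions.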



Re: Build kernels with -ffreestanding?

2016-12-28 Thread Reyk Floeter

>> Am 28.12.2016 um 08:29 schrieb Martin Pieuchot <m...@openbsd.org>:
>> 
>> On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
>> Mark Kettenis <mark.kette...@xs4all.nl> writes:
>> 
>>>> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
>>>> From: Mark Kettenis <mark.kette...@xs4all.nl>
>>>> 
>>>> We already do this on some architectures, but not on amd64 for
>>>> example.  The main reason is that this disables memcpy() optimizations
>>>> that have a measurable impact on the network stack performance.
>>>> 
>>>> We can get those optimizations back by doing:
>>>> 
>>>> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
>>>> 
>>>> I verified that gcc still does proper bounds checking on
>>>> __builtin_memcpy(), so we don't lose that.
>>>> 
>>>> The nice thing about this solution is that we can choose explicitly
>>>> which optimizations we want.  And as you can see the kernel makefile
>>>> gets simpler ;).
>>>> 
>>>> Of course the real reason why I'm looking into this is that clang
>>>> makes it really hard to build kernels without -ffreestanding.
>>>> 
>>>> The diff below implements this strategy, and enabled the optimizations
>>>> for memcpy() and memset().  We can add others if we think there is a
>>>> benefit.  I've tested the diff on amd64.  We may need to put an #undef
>>>> memcpy somewhere for platforms that use the generic C code for memcpy.
>>>> 
>>>> Thoughts?
>>> 
>>> So those #undefs are necessary.  New diff below.  Tested on armv7,
>>> hppa and sparc64 now as well.
>> 
>> I think this is the way to go; can't help tests on other archs, though.
>> ok jca@ fwiw
> 
> For the archives, Hrvoje Popovski measured a performance impact when using
> a kernel with this diff to forward packets.  I guess we're missing some
> defines.

I'm late to the game - but does this diff remove all the other optimizations as 
well? (eg. bcopy, memcmp, memmove, strchr, ...) I did some performance testing 
when I added them for amd64 in libc and it made a noticeable difference - not 
just for memcpy+memset.

Reyk


Re: Build kernels with -ffreestanding?

2016-12-27 Thread Martin Pieuchot
On 28/12/16(Wed) 01:05, Jeremie Courreges-Anglas wrote:
> Mark Kettenis <mark.kette...@xs4all.nl> writes:
> 
> >> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> >> From: Mark Kettenis <mark.kette...@xs4all.nl>
> >> 
> >> We already do this on some architectures, but not on amd64 for
> >> example.  The main reason is that this disables memcpy() optimizations
> >> that have a measurable impact on the network stack performance.
> >> 
> >> We can get those optimizations back by doing:
> >> 
> >> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> >> 
> >> I verified that gcc still does proper bounds checking on
> >> __builtin_memcpy(), so we don't lose that.
> >> 
> >> The nice thing about this solution is that we can choose explicitly
> >> which optimizations we want.  And as you can see the kernel makefile
> >> gets simpler ;).
> >> 
> >> Of course the real reason why I'm looking into this is that clang
> >> makes it really hard to build kernels without -ffreestanding.
> >> 
> >> The diff below implements this strategy, and enabled the optimizations
> >> for memcpy() and memset().  We can add others if we think there is a
> >> benefit.  I've tested the diff on amd64.  We may need to put an #undef
> >> memcpy somewhere for platforms that use the generic C code for memcpy.
> >> 
> >> Thoughts?
> >
> > So those #undefs are necessary.  New diff below.  Tested on armv7,
> > hppa and sparc64 now as well.
> 
> I think this is the way to go; can't help tests on other archs, though.
> ok jca@ fwiw

For the archives, Hrvoje Popovski measured a performance impact when using
a kernel with this diff to forward packets.  I guess we're missing some
defines.



Re: Build kernels with -ffreestanding?

2016-12-27 Thread Jeremie Courreges-Anglas
Mark Kettenis <mark.kette...@xs4all.nl> writes:

>> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
>> From: Mark Kettenis <mark.kette...@xs4all.nl>
>> 
>> We already do this on some architectures, but not on amd64 for
>> example.  The main reason is that this disables memcpy() optimizations
>> that have a measurable impact on the network stack performance.
>> 
>> We can get those optimizations back by doing:
>> 
>> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
>> 
>> I verified that gcc still does proper bounds checking on
>> __builtin_memcpy(), so we don't lose that.
>> 
>> The nice thing about this solution is that we can choose explicitly
>> which optimizations we want.  And as you can see the kernel makefile
>> gets simpler ;).
>> 
>> Of course the real reason why I'm looking into this is that clang
>> makes it really hard to build kernels without -ffreestanding.
>> 
>> The diff below implements this strategy, and enabled the optimizations
>> for memcpy() and memset().  We can add others if we think there is a
>> benefit.  I've tested the diff on amd64.  We may need to put an #undef
>> memcpy somewhere for platforms that use the generic C code for memcpy.
>> 
>> Thoughts?
>
> So those #undefs are necessary.  New diff below.  Tested on armv7,
> hppa and sparc64 now as well.

I think this is the way to go; can't help tests on other archs, though.
ok jca@ fwiw

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: Build kernels with -ffreestanding?

2016-12-24 Thread Joerg Sonnenberger
On Sat, Dec 24, 2016 at 12:08:35AM +0100, Mark Kettenis wrote:
> We can get those optimizations back by doing:
> 
> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

You might still want to put a prototype in, just before the define.

Joerg
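
A minimal sketch of what that could look like, and of one case where the prototype matters. The copy_fn type and pick_copy() are made up for the example; only the define is from the diff.

```c
#include <stddef.h>

/* Prototype first, so the plain identifier is still declared. */
void	*memcpy(void *, const void *, size_t);
#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

/* Hypothetical callback type, for illustration only. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/*
 * "memcpy" without a following "(" is not a macro invocation, so this
 * names the real function and relies on the prototype above.
 */
static copy_fn
pick_copy(void)
{
	return memcpy;
}
```

Without the prototype, any code that takes the address of memcpy or passes it as a function pointer would refer to an undeclared identifier.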



Re: Build kernels with -ffreestanding?

2016-12-24 Thread Mark Kettenis
> Date: Sun, 25 Dec 2016 09:40:05 +1100
> From: Jonathan Gray <j...@jsg.id.au>
> 
> On Sat, Dec 24, 2016 at 05:07:11PM +0100, Mark Kettenis wrote:
> > > Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > > From: Mark Kettenis <mark.kette...@xs4all.nl>
> > > 
> > > We already do this on some architectures, but not on amd64 for
> > > example.  The main reason is that this disables memcpy() optimizations
> > > that have a measurable impact on the network stack performance.
> > > 
> > > We can get those optimizations back by doing:
> > > 
> > > #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> > > 
> > > I verified that gcc still does proper bounds checking on
> > > __builtin_memcpy(), so we don't lose that.
> > > 
> > > The nice thing about this solution is that we can choose explicitly
> > > which optimizations we want.  And as you can see the kernel makefile
> > > gets simpler ;).
> > > 
> > > Of course the real reason why I'm looking into this is that clang
> > > makes it really hard to build kernels without -ffreestanding.
> > > 
> > > The diff below implements this strategy, and enabled the optimizations
> > > for memcpy() and memset().  We can add others if we think there is a
> > > benefit.  I've tested the diff on amd64.  We may need to put an #undef
> > > memcpy somewhere for platforms that use the generic C code for memcpy.
> > > 
> > > Thoughts?
> > 
> > So those #undefs are necessary.  New diff below.  Tested on armv7,
> > hppa and sparc64 now as well.

macppc tested now as well

> I agree this is the way we want to go.  It also avoids having
> to expand the list to -fno-builtin-free etc for newer versions of gcc.
> 
> Why build the memcpy/memset in libkern at all if we go this route?

__builtin_memcpy() may still expand to an explicit memcpy() call if
the compiler decides not to inline it.
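
A rough sketch of the two cases; copy_fixed() and copy_var() are hypothetical names, and what a given compiler actually emits depends on its version and optimization level.

```c
#include <stddef.h>

#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

/* Constant size: a compiler will typically expand this inline. */
static void
copy_fixed(unsigned char dst[16], const unsigned char src[16])
{
	memcpy(dst, src, 16);
}

/*
 * Run-time size: the compiler may instead emit an ordinary call to
 * memcpy(), which the linker then has to resolve.
 */
static void
copy_var(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
```

In the second case the kernel still has to provide a memcpy symbol at link time, which is why the libkern (or per-arch assembly) versions stay.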

> > Index: sys/systm.h
> > ===
> > RCS file: /cvs/src/sys/sys/systm.h,v
> > retrieving revision 1.119
> > diff -u -p -r1.119 systm.h
> > --- sys/systm.h 24 Sep 2016 18:35:52 -  1.119
> > +++ sys/systm.h 24 Dec 2016 16:05:48 -
> > @@ -306,6 +306,9 @@ extern int (*mountroot)(void);
> >  
> >  #include 
> >  
> > +#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> > +#define memset(b, c, n) __builtin_memset((b), (c), (n))
> > +
> >  #if defined(DDB) || defined(KGDB)
> >  /* debugger entry points */
> >  void   Debugger(void); /* in DDB only */
> > Index: lib/libkern/memcpy.c
> > ===
> > RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
> > retrieving revision 1.3
> > diff -u -p -r1.3 memcpy.c
> > > --- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 -  1.3
> > > +++ lib/libkern/memcpy.c 24 Dec 2016 16:05:48 -
> > @@ -32,6 +32,8 @@
> >  #include 
> >  #include 
> >  
> > +#undef memcpy
> > +
> >  /*
> >   * This is designed to be small, not fast.
> >   */
> > Index: lib/libkern/memset.c
> > ===
> > RCS file: /cvs/src/sys/lib/libkern/memset.c,v
> > retrieving revision 1.7
> > diff -u -p -r1.7 memset.c
> > > --- lib/libkern/memset.c 10 Jun 2014 04:16:57 -  1.7
> > > +++ lib/libkern/memset.c 24 Dec 2016 16:05:48 -
> > @@ -39,6 +39,8 @@
> >  #include 
> >  #include 
> >  
> > +#undef memset
> > +
> > >  #define wsize   sizeof(u_int)
> > >  #define wmask   (wsize - 1)
> >  
> > Index: arch/amd64/conf/Makefile.amd64
> > ===
> > RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
> > retrieving revision 1.74
> > diff -u -p -r1.74 Makefile.amd64
> > --- arch/amd64/conf/Makefile.amd64  29 Nov 2016 09:08:34 -  1.74
> > +++ arch/amd64/conf/Makefile.amd64  24 Dec 2016 16:05:49 -
> > @@ -29,9 +29,7 @@ CWARNFLAGS=   -Werror -Wall -Wimplicit-fun
> >  
> > >  CMACHFLAGS=    -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
> > >                 -mno-mmx -msoft-float -fno-omit-frame-pointer
> > -CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
> > -   -fno-builtin-vsnprintf -fno-builtin-log \
> > -   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
> > +CMACHFLAGS+=   -ffreestanding ${NOPIE_FLAGS}
> >  .if ${IDENT:M-DNO_PROPOLICE}
> >  CMACHFLAGS+=   -fno-stack-protector
> >  .endif
> > 
> 



Re: Build kernels with -ffreestanding?

2016-12-24 Thread Jonathan Gray
On Sat, Dec 24, 2016 at 05:07:11PM +0100, Mark Kettenis wrote:
> > Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> > From: Mark Kettenis <mark.kette...@xs4all.nl>
> > 
> > We already do this on some architectures, but not on amd64 for
> > example.  The main reason is that this disables memcpy() optimizations
> > that have a measurable impact on the network stack performance.
> > 
> > We can get those optimizations back by doing:
> > 
> > #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> > 
> > I verified that gcc still does proper bounds checking on
> > __builtin_memcpy(), so we don't lose that.
> > 
> > The nice thing about this solution is that we can choose explicitly
> > which optimizations we want.  And as you can see the kernel makefile
> > gets simpler ;).
> > 
> > Of course the real reason why I'm looking into this is that clang
> > makes it really hard to build kernels without -ffreestanding.
> > 
> > The diff below implements this strategy, and enabled the optimizations
> > for memcpy() and memset().  We can add others if we think there is a
> > benefit.  I've tested the diff on amd64.  We may need to put an #undef
> > memcpy somewhere for platforms that use the generic C code for memcpy.
> > 
> > Thoughts?
> 
> So those #undefs are necessary.  New diff below.  Tested on armv7,
> hppa and sparc64 now as well.

I agree this is the way we want to go.  It also avoids having
to expand the list to -fno-builtin-free etc for newer versions of gcc.

Why build the memcpy/memset in libkern at all if we go this route?

> 
> Index: sys/systm.h
> ===
> RCS file: /cvs/src/sys/sys/systm.h,v
> retrieving revision 1.119
> diff -u -p -r1.119 systm.h
> --- sys/systm.h   24 Sep 2016 18:35:52 -  1.119
> +++ sys/systm.h   24 Dec 2016 16:05:48 -
> @@ -306,6 +306,9 @@ extern int (*mountroot)(void);
>  
>  #include 
>  
> +#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> +#define memset(b, c, n) __builtin_memset((b), (c), (n))
> +
>  #if defined(DDB) || defined(KGDB)
>  /* debugger entry points */
>  void Debugger(void); /* in DDB only */
> Index: lib/libkern/memcpy.c
> ===
> RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
> retrieving revision 1.3
> diff -u -p -r1.3 memcpy.c
> --- lib/libkern/memcpy.c  12 Jun 2013 16:44:22 -  1.3
> +++ lib/libkern/memcpy.c  24 Dec 2016 16:05:48 -
> @@ -32,6 +32,8 @@
>  #include 
>  #include 
>  
> +#undef memcpy
> +
>  /*
>   * This is designed to be small, not fast.
>   */
> Index: lib/libkern/memset.c
> ===
> RCS file: /cvs/src/sys/lib/libkern/memset.c,v
> retrieving revision 1.7
> diff -u -p -r1.7 memset.c
> --- lib/libkern/memset.c  10 Jun 2014 04:16:57 -  1.7
> +++ lib/libkern/memset.c  24 Dec 2016 16:05:48 -
> @@ -39,6 +39,8 @@
>  #include 
>  #include 
>  
> +#undef memset
> +
>  #define  wsize   sizeof(u_int)
>  #define  wmask   (wsize - 1)
>  
> Index: arch/amd64/conf/Makefile.amd64
> ===
> RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
> retrieving revision 1.74
> diff -u -p -r1.74 Makefile.amd64
> --- arch/amd64/conf/Makefile.amd64 29 Nov 2016 09:08:34 -  1.74
> +++ arch/amd64/conf/Makefile.amd64 24 Dec 2016 16:05:49 -
> @@ -29,9 +29,7 @@ CWARNFLAGS= -Werror -Wall -Wimplicit-fun
>  
>  CMACHFLAGS=  -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
>   -mno-mmx -msoft-float -fno-omit-frame-pointer
> -CMACHFLAGS+= -fno-builtin-printf -fno-builtin-snprintf \
> - -fno-builtin-vsnprintf -fno-builtin-log \
> - -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
> +CMACHFLAGS+= -ffreestanding ${NOPIE_FLAGS}
>  .if ${IDENT:M-DNO_PROPOLICE}
>  CMACHFLAGS+= -fno-stack-protector
>  .endif
> 



Re: Build kernels with -ffreestanding?

2016-12-24 Thread Mark Kettenis
> Date: Sat, 24 Dec 2016 00:08:35 +0100 (CET)
> From: Mark Kettenis <mark.kette...@xs4all.nl>
> 
> We already do this on some architectures, but not on amd64 for
> example.  The main reason is that this disables memcpy() optimizations
> that have a measurable impact on the network stack performance.
> 
> We can get those optimizations back by doing:
> 
> #define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
> 
> I verified that gcc still does proper bounds checking on
> __builtin_memcpy(), so we don't lose that.
> 
> The nice thing about this solution is that we can choose explicitly
> which optimizations we want.  And as you can see the kernel makefile
> gets simpler ;).
> 
> Of course the real reason why I'm looking into this is that clang
> makes it really hard to build kernels without -ffreestanding.
> 
> The diff below implements this strategy, and enabled the optimizations
> for memcpy() and memset().  We can add others if we think there is a
> benefit.  I've tested the diff on amd64.  We may need to put an #undef
> memcpy somewhere for platforms that use the generic C code for memcpy.
> 
> Thoughts?

So those #undefs are necessary.  New diff below.  Tested on armv7,
hppa and sparc64 now as well.

Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.119
diff -u -p -r1.119 systm.h
--- sys/systm.h 24 Sep 2016 18:35:52 -  1.119
+++ sys/systm.h 24 Dec 2016 16:05:48 -
@@ -306,6 +306,9 @@ extern int (*mountroot)(void);
 
 #include 
 
+#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
+#define memset(b, c, n) __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void   Debugger(void); /* in DDB only */
Index: lib/libkern/memcpy.c
===
RCS file: /cvs/src/sys/lib/libkern/memcpy.c,v
retrieving revision 1.3
diff -u -p -r1.3 memcpy.c
--- lib/libkern/memcpy.c 12 Jun 2013 16:44:22 -  1.3
+++ lib/libkern/memcpy.c 24 Dec 2016 16:05:48 -
@@ -32,6 +32,8 @@
 #include 
 #include 
 
+#undef memcpy
+
 /*
  * This is designed to be small, not fast.
  */
Index: lib/libkern/memset.c
===
RCS file: /cvs/src/sys/lib/libkern/memset.c,v
retrieving revision 1.7
diff -u -p -r1.7 memset.c
--- lib/libkern/memset.c 10 Jun 2014 04:16:57 -  1.7
+++ lib/libkern/memset.c 24 Dec 2016 16:05:48 -
@@ -39,6 +39,8 @@
 #include 
 #include 
 
+#undef memset
+
 #define wsize   sizeof(u_int)
 #define wmask   (wsize - 1)
 
Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64  29 Nov 2016 09:08:34 -  1.74
+++ arch/amd64/conf/Makefile.amd64  24 Dec 2016 16:05:49 -
@@ -29,9 +29,7 @@ CWARNFLAGS=   -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS=    -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
                -mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
-   -fno-builtin-vsnprintf -fno-builtin-log \
-   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+=   -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=   -fno-stack-protector
 .endif
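
To spell out why the #undef is needed in the generic C files, here is a minimal sketch. It uses a hypothetical name, kcopy, in place of memcpy so it can be compiled outside the kernel; the mechanism is the same.

```c
#include <stddef.h>

/* Hypothetical stand-in for the memcpy define in sys/systm.h. */
#define kcopy(d, s, n) __builtin_memcpy((d), (s), (n))

/*
 * Without this #undef the definition below would be macro-expanded into
 * "void *__builtin_memcpy((void *dst), ...)" and fail to compile.
 */
#undef kcopy

/* Byte-at-a-time fallback, small rather than fast. */
void *
kcopy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n-- > 0)
		*d++ = *s++;
	return dst;
}
```

Platforms with an assembly memcpy never run their definition through the C preprocessor with this header, so only the generic C implementations need the #undef.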



Build kernels with -ffreestanding?

2016-12-23 Thread Mark Kettenis
We already do this on some architectures, but not on amd64 for
example.  The main reason is that this disables memcpy() optimizations
that have a measurable impact on the network stack performance.

We can get those optimizations back by doing:

#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

I verified that gcc still does proper bounds checking on
__builtin_memcpy(), so we don't lose that.

The nice thing about this solution is that we can choose explicitly
which optimizations we want.  And as you can see the kernel makefile
gets simpler ;).

Of course the real reason why I'm looking into this is that clang
makes it really hard to build kernels without -ffreestanding.

The diff below implements this strategy, and enables the optimizations
for memcpy() and memset().  We can add others if we think there is a
benefit.  I've tested the diff on amd64.  We may need to put an #undef
memcpy somewhere for platforms that use the generic C code for memcpy.

Thoughts?


Index: sys/systm.h
===
RCS file: /cvs/src/sys/sys/systm.h,v
retrieving revision 1.120
diff -u -p -r1.120 systm.h
--- sys/systm.h 19 Dec 2016 08:36:50 -  1.120
+++ sys/systm.h 23 Dec 2016 22:53:15 -
@@ -330,6 +330,9 @@ extern int (*mountroot)(void);
 
 #include 
 
+#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))
+#define memset(b, c, n) __builtin_memset((b), (c), (n))
+
 #if defined(DDB) || defined(KGDB)
 /* debugger entry points */
 void   Debugger(void); /* in DDB only */
Index: arch/amd64/conf/Makefile.amd64
===
RCS file: /cvs/src/sys/arch/amd64/conf/Makefile.amd64,v
retrieving revision 1.74
diff -u -p -r1.74 Makefile.amd64
--- arch/amd64/conf/Makefile.amd64  29 Nov 2016 09:08:34 -  1.74
+++ arch/amd64/conf/Makefile.amd64  23 Dec 2016 22:53:15 -
@@ -29,9 +29,7 @@ CWARNFLAGS=   -Werror -Wall -Wimplicit-fun
 
 CMACHFLAGS=    -mcmodel=kernel -mno-red-zone -mno-sse2 -mno-sse -mno-3dnow \
                -mno-mmx -msoft-float -fno-omit-frame-pointer
-CMACHFLAGS+=   -fno-builtin-printf -fno-builtin-snprintf \
-   -fno-builtin-vsnprintf -fno-builtin-log \
-   -fno-builtin-log2 -fno-builtin-malloc ${NOPIE_FLAGS}
+CMACHFLAGS+=   -ffreestanding ${NOPIE_FLAGS}
 .if ${IDENT:M-DNO_PROPOLICE}
 CMACHFLAGS+=   -fno-stack-protector
 .endif
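
For readers who want to try the idea outside the kernel tree, a minimal sketch of the define in use. struct hdr and hdr_copy() are hypothetical; only the define itself is taken from the diff. Building with -ffreestanding is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the define in the diff: route the call to the builtin. */
#define memcpy(d, s, n) __builtin_memcpy((d), (s), (n))

/* Hypothetical header struct, only for illustration. */
struct hdr {
	uint32_t src;
	uint32_t dst;
};

/*
 * The size is a compile-time constant, so the compiler is free to
 * expand the copy inline instead of emitting a function call.
 */
static void
hdr_copy(struct hdr *to, const struct hdr *from)
{
	memcpy(to, from, sizeof(*to));
}
```

With -ffreestanding and no define, the compiler has to treat memcpy() as an unknown external function and emit a call every time; the define opts this one function back in to builtin handling.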