[PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-13 Thread Artem S. Tashkinov

> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

It's a strong "no" from me.

1) Aside from rare Gentoo users no one has extensively tested -O3 with
the kernel - even Gentoo defaults to -O2 for kernel compilation

2) -O3 _always_ bloats the code by a large amount which means both
vmlinux/bzImage and modules will become bigger, and slower to load from
the disk

3) -O3 does _not_ necessarily make the code run faster

4) If GCC 10 has removed certain options from the -O2 optimization level
you could just re-add them as compilation flags without forcing -O3 by
default on everyone

5) If you still insist on -O3, I guess everyone would be happy if you
just made two Kconfig options:

OPTIMIZE_O2 (-O2)
OPTIMIZE_O3_EVEN_MOAR (-O3)

Best regards,
Artem


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-11 Thread Masahiro Yamada
On Sun, May 10, 2020 at 9:47 PM David Laight  wrote:
>
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> >
> > Any opinion on precompiled header support?
>
> Whenever I've been anywhere near it, it has always been a disaster.
> It may make sense for C++ where there is lots of complicated
> code to parse in .h files. Parsing C headers is usually easier.
>
> One thing I have done that significantly speeds up .h file
> processing is to take the long list of '-I directory' parameters
> that are passed to the compiler and copy the first version
> of each file into a separate 'object headers' directory.
> This saves the compiler doing lots of 'failed opens'.
>
> If each fragment makefile lists its 'public' headers make
> can generate dependency rules that do the copies.
>
> FWIW make is much faster if you delete all the builtin and
> suffix rules and rely on explicit rules for each file.


Kbuild disables Make's builtin rules at least.


# Do not use make's built-in rules and variables
# (this increases performance and avoids hard-to-debug behaviour)
MAKEFLAGS += -rR



--
Best Regards
Masahiro Yamada


RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-10 Thread David Laight
From: Joe Perches
> Sent: 10 May 2020 18:45
> 
> On Sun, 2020-05-10 at 12:47 +, David Laight wrote:
> > From: Joe Perches
> > > Sent: 08 May 2020 16:06
> > > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > > Personally, I'm more interested in improving compile speed of the kernel
> > >
> > > Any opinion on precompiled header support?
> >
> > Whenever I've been anywhere near it, it has always been a disaster.
> 
> A disaster? Why?

The only times I've had systems that used them, they always got
out of step with the headers - probably due to #define changes.
If they are auto-generated by the compiler then parallel makes also
give problems.

> For a large commercial c only project, it worked well
> by reducing a combined multi-include file, similar to
> kernel.h here, to a single file.

Certainly reducing the number of directories searched
can make a big difference.

I've also compiled .so by merging all the sources into a
single file.

> That was before SSDs though and the file open times
> might have been rather larger then.

The real killer is lots of directory names in the -I list,
especially over NFS.

I've also looked at system call stats during a kernel compile.
open() dominated and my 'gut feeling' was that most were
failing opens.

I also suspect that modern compilers remember that an include
file contained an include guard - and don't even bother looking
for it a second time.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)
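
One way to check the "most opens fail" impression is to record a build
with strace and count the failures afterwards. The following is only a
rough sketch: it assumes a log captured with something like
`strace -f -e trace=open,openat -o build.log make`, and the names
count_opens and build.log are made up for illustration.

```python
import re
from collections import Counter

# Matches a completed open()/openat() line and captures its return value
# and, for failures, the errno name (e.g. ENOENT).
OPEN_RE = re.compile(r'\bopen(?:at)?\(.*\)\s+=\s+(-?\d+)(?:\s+([A-Z0-9]+))?')


def count_opens(lines):
    """Return (successful, failed, Counter of errno names) for strace lines."""
    ok = failed = 0
    errnos = Counter()
    for line in lines:
        m = OPEN_RE.search(line)
        if m is None:
            continue  # other syscalls, or unfinished/resumed fragments
        if int(m.group(1)) < 0:
            failed += 1
            errnos[m.group(2) or "UNKNOWN"] += 1
        else:
            ok += 1
    return ok, failed, errnos
```

Feeding it `open("build.log")` gives the three values for a whole build.
Calls that strace splits into "unfinished"/"resumed" pairs are skipped, so
the totals are a lower bound rather than an exact count.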



Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-10 Thread Joe Perches
On Sun, 2020-05-10 at 12:47 +, David Laight wrote:
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> > 
> > Any opinion on precompiled header support?
> 
> > Whenever I've been anywhere near it, it has always been a disaster.

A disaster? Why?

For a large commercial c only project, it worked well
by reducing a combined multi-include file, similar to
kernel.h here, to a single file.

That was before SSDs though and the file open times
might have been rather larger then.




RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-10 Thread David Laight
From: Joe Perches
> Sent: 08 May 2020 16:06
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
> 
> Any opinion on precompiled header support?

Whenever I've been anywhere near it, it has always been a disaster.
It may make sense for C++ where there is lots of complicated
code to parse in .h files. Parsing C headers is usually easier.

One thing I have done that significantly speeds up .h file
processing is to take the long list of '-I directory' parameters
that are passed to the compiler and copy the first version
of each file into a separate 'object headers' directory.
This saves the compiler doing lots of 'failed opens'.

If each fragment makefile lists its 'public' headers make
can generate dependency rules that do the copies.

FWIW make is much faster if you delete all the builtin and
suffix rules and rely on explicit rules for each file.

David
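
The header-copying idea above can be sketched roughly as follows. This is
not David's actual tooling, just a minimal illustration: the function name
flatten_headers and the destination directory are hypothetical, and it
assumes dest lies outside every directory in the -I list.

```python
import os
import shutil


def flatten_headers(include_dirs, dest):
    """Copy the first-found version of every .h file into dest.

    include_dirs is walked in order, like the compiler's -I list, so a
    header present in several directories is taken from the earliest one.
    Returns the relative paths that were copied.
    """
    copied = []
    for inc in include_dirs:
        for root, _dirs, files in os.walk(inc):
            for name in sorted(files):
                if not name.endswith(".h"):
                    continue
                src = os.path.join(root, name)
                rel = os.path.relpath(src, inc)
                target = os.path.join(dest, rel)
                if os.path.exists(target):
                    continue  # an earlier -I directory already provided it
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(src, target)
                copied.append(rel)
    return copied
```

The compiler would then be given a single `-I dest` instead of the long
list. The sketch does not notice when a source header changes later; that
is the part the per-makefile dependency rules mentioned above would cover.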




Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Arnd Bergmann
On Fri, May 8, 2020 at 5:06 PM Joe Perches  wrote:
>
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
>
> Any opinion on precompiled header support?

I have not tried it. IIRC precompiled headers usually work best for projects
that have a large header with all the global declarations that gets included
everywhere, while Linux has always tried (with different amounts of success)
to minimize the number of headers that get included per file.

   Arnd


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Joe Perches
On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> Personally, I'm more interested in improving compile speed of the kernel

Any opinion on precompiled header support?




Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Arnd Bergmann
On Fri, May 8, 2020 at 2:07 PM Jason A. Donenfeld  wrote:
> On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann  wrote:
>
> The other significant thing -- and what prompted this patchset -- is
> it looks like gcc 10 has lowered the inlining degree for -O2, and put
> gcc 9's inlining parameters from -O2 into gcc-10's -O3.

I suspect it is more complicated than that, as there are a number of
parameters that determine inlining decisions. It's also not clear whether
the ones for -O3 are generally better than the ones with -O2, or if it's
just that whatever changed caused a few surprises but is otherwise
preferable.

Did you see regressions in specific modules, or just a general slowdown
or growth in object size as the result?

  Arnd


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Jason A. Donenfeld
On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann  wrote:
>
> On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko wrote:
> >
> > On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > > Should we untangle -O3 from depending on ARC first maybe?
> > >
> > > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > > day for feedback first.
> >
> > Just keep in mind that my previous attempt [1] failed because of too
> > many false positive warnings, even though -O3 really did uncover a
> > couple of bugs in the codebase.
>
> I think my warning fixes were mostly picked up in the meantime, but
> if there are any remaining, they would be mixed in with random other
> fixes in my testing tree, so it's hard to know for sure.
>
> I also want to hear the feedback from the gcc developers about what
> the general recommendations are between O2 and O3, and how
> they may have changed over time. According to the gcc-10 documentation,
> the difference between -O2 and -O3 is exactly this set of options:
>
> -fgcse-after-reload
> -fipa-cp-clone
> -floop-interchange
> -floop-unroll-and-jam
> -fpeel-loops
> -fpredictive-commoning
> -fsplit-loops
> -fsplit-paths
> -ftree-loop-distribution
> -ftree-loop-vectorize
> -ftree-partial-pre
> -ftree-slp-vectorize
> -funswitch-loops
> -fvect-cost-model
> -fvect-cost-model=dynamic
> -fversion-loops-for-strides

The other significant thing -- and what prompted this patchset -- is
it looks like gcc 10 has lowered the inlining degree for -O2, and put
gcc 9's inlining parameters from -O2 into gcc-10's -O3.


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Arnd Bergmann
On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko  wrote:
>
> On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > Should we untangle -O3 from depending on ARC first maybe?
> >
> > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > day for feedback first.
>
> Just keep in mind that my previous attempt [1] failed because of too
> many false positive warnings, even though -O3 really did uncover a
> couple of bugs in the codebase.

I think my warning fixes were mostly picked up in the meantime, but
if there are any remaining, they would be mixed in with random other
fixes in my testing tree, so it's hard to know for sure.

I also want to hear the feedback from the gcc developers about what
the general recommendations are between O2 and O3, and how
they may have changed over time. According to the gcc-10 documentation,
the difference between -O2 and -O3 is exactly this set of options:

-fgcse-after-reload
-fipa-cp-clone
-floop-interchange
-floop-unroll-and-jam
-fpeel-loops
-fpredictive-commoning
-fsplit-loops
-fsplit-paths
-ftree-loop-distribution
-ftree-loop-vectorize
-ftree-partial-pre
-ftree-slp-vectorize
-funswitch-loops
-fvect-cost-model
-fvect-cost-model=dynamic
-fversion-loops-for-strides

It's a relatively short list, so someone familiar with the options could
perhaps look into whether we want to change the default for all
of them, or if it makes sense to be more selective.

Personally, I'm more interested in improving compile speed of the kernel
and eventually supporting -Og or some variant of it for my own build
testing, but of course I also want to make sure that the other optimization
levels do not produce warnings, and -Og leads to more problems than
-O3 at the moment.

   Arnd
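
For anyone who wants to check the list above against a locally installed
compiler, gcc can print the state of each optimizer flag with
`-Q --help=optimizers`. The sketch below compares two such dumps; the
helper names are made up, and it assumes each option and its state appear
on one line of the output.

```python
import subprocess


def parse_optimizers(text):
    """Map each -f option in `gcc -Q --help=optimizers` output to its state."""
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("-f"):
            flags[parts[0]] = " ".join(parts[1:])
    return flags


def changed_flags(low_text, high_text):
    """Return {flag: (state at low level, state at high level)} where they differ."""
    low = parse_optimizers(low_text)
    high = parse_optimizers(high_text)
    return {f: (low.get(f), high[f]) for f in sorted(high) if low.get(f) != high[f]}


def query_gcc(level, gcc="gcc"):
    """Run gcc and return its optimizer dump for one -O level."""
    result = subprocess.run([gcc, "-Q", "--help=optimizers", level],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `changed_flags(query_gcc("-O2"), query_gcc("-O3"))` lists what
differs for the compiler at hand. Note that this only covers flags; the
inlining parameters discussed elsewhere in the thread are not -f options
and would not show up here.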


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Oleksandr Natalenko
On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > Should we untangle -O3 from depending on ARC first maybe?
> 
> Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> day for feedback first.

Just keep in mind that my previous attempt [1] failed because of too
many false positive warnings, even though -O3 really did uncover a
couple of bugs in the codebase.

Let's hope your attempt will be more successful. I'll happily offer my
review tag ;).

Also Cc'ing Andrew, who (IIRC) tried to take my submission, and Arnd, who
tried to clean up the mess afterwards.

[1] https://lore.kernel.org/lkml/20191211104619.114557-1-oleksa...@redhat.com/

-- 
  Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer



Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Jason A. Donenfeld
On Fri, May 8, 2020 at 3:08 AM Oleksandr Natalenko  wrote:
>
> Should we untangle -O3 from depending on ARC first maybe?

Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
day for feedback first.

Jason


Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Oleksandr Natalenko
On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.
> 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>  
>  choice
>   prompt "Compiler optimization level"
> - default CC_OPTIMIZE_FOR_PERFORMANCE
> + default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
> + default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || CC_IS_CLANG)
>  
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>   bool "Optimize for performance (-O2)"
> -- 
> 2.26.2
> 

Should we untangle -O3 from depending on ARC first maybe?

-- 
  Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer



Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-08 Thread Peter Zijlstra
On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

Would be nice to get some GCC person's feedback on this. But in general,
I think you're right in that O3 isn't the code-gen disaster it used to
be.

> Signed-off-by: Jason A. Donenfeld 
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>  
>  choice
>   prompt "Compiler optimization level"
> - default CC_OPTIMIZE_FOR_PERFORMANCE
> + default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
> + default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || CC_IS_CLANG)
>  
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>   bool "Optimize for performance (-O2)"
> -- 
> 2.26.2
> 


[PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10

2020-05-07 Thread Jason A. Donenfeld
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Signed-off-by: Jason A. Donenfeld 
---
 init/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..fab3f810a68d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
 
 choice
 	prompt "Compiler optimization level"
-	default CC_OPTIMIZE_FOR_PERFORMANCE
+	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
+	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || CC_IS_CLANG)
 
 config CC_OPTIMIZE_FOR_PERFORMANCE
 	bool "Optimize for performance (-O2)"
-- 
2.26.2
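
A note on the version comparison in the hunk above, with a small sketch.
It assumes that GCC_VERSION carries the value printed by
scripts/gcc-version.sh, that is major * 10000 + minor * 100 + patchlevel.
That is an assumption about the tree at the time, not something stated in
this thread, and the helper names below are made up for illustration.

```python
def kconfig_gcc_version(major, minor=0, patchlevel=0):
    """Encode a gcc version the way GCC_VERSION is assumed to be encoded."""
    return major * 10000 + minor * 100 + patchlevel


def picks_o3(gcc_version, threshold):
    """Mirror a 'default ... if GCC_VERSION >= threshold' condition."""
    return gcc_version >= threshold


gcc9 = kconfig_gcc_version(9, 3, 0)    # 90300
gcc10 = kconfig_gcc_version(10, 1, 0)  # 100100

# With the threshold written as 10, gcc 9 is selected as well.
print(picks_o3(gcc9, 10), picks_o3(gcc10, 10))
# With the threshold written as 100000, only gcc 10 is selected.
print(picks_o3(gcc9, 100000), picks_o3(gcc10, 100000))
```

If the encoding assumption holds, `GCC_VERSION >= 10` would be true for
every gcc, and 100000 would be the value that singles out gcc 10 and
later.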