Re: Old compiler versions (was Re: v4.10: kernel stack frame pointer .. has bad value (null))

2017-03-09 Thread Linus Torvalds
On Thu, Mar 9, 2017 at 2:49 AM, Pavel Machek wrote:
>
> (On thinkpad X220, compiling bzip2)

You really shouldn't assume that the zlib build tracks the kernel build.

At least at some point, a noticeable part of the build cost for the
kernel was just parsing the fairly big source code. We have honking
big include files and deep nesting, and there is a lot of preprocessor
and just general parsing overhead for stuff that in most files doesn't
even generate code.

All those inline functions and type declarations for things that then
aren't actually used in most files means that you spend relatively
more time just parsing files than you spend on generating and
optimizing code.
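
One rough way to see how much of a compile is parsing is to time a parse-only run against a full optimized compile of the same file. The sketch below is only an illustration and was not part of the original mail: the helper names are made up, it assumes a `gcc` binary on the PATH, and a real kernel source file would also need the include paths and defines that the kernel build system passes.

```python
import shutil
import subprocess
import sys
import time


def parse_only_cmd(source, cc="gcc"):
    """Command that preprocesses and parses `source` but generates no code."""
    return [cc, "-fsyntax-only", source]


def full_compile_cmd(source, cc="gcc", opt="-O2"):
    """Command that fully compiles `source`, discarding the object file."""
    return [cc, opt, "-c", source, "-o", "/dev/null"]


def time_cmd(cmd):
    """Run `cmd` and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Only measure when a source file is given and gcc is actually installed.
    if len(sys.argv) == 2 and shutil.which("gcc"):
        source = sys.argv[1]
        parse = time_cmd(parse_only_cmd(source))
        full = time_cmd(full_compile_cmd(source))
        print(f"parse only: {parse:.3f}s, full -O2 compile: {full:.3f}s")
        print(f"parsing share of the full compile: {parse / full:.0%}")
    else:
        print("usage: script.py FILE.c (requires gcc on PATH)")
```

A single run is noisy, so repeating the measurement and taking the minimum would give a steadier number.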

So the trade-offs between different projects can be very different.
Some projects have huge tables with static initializers that gcc at
some point had serious quadratic-time issues with, and other code has
big functions where the actual optimization phase is the bulk of it.
And some projects have a lot of big and nested include files.

It's not nearly as bad as some C++ projects (where the header file
mess can often _easily_ be the dominant factor by far), but it's still
potentially completely different from something like building zlib.

Oh, and don't even bother looking at -O0 times. That's almost purely
parsing, but more importantly, the kernel has never in its lifetime
built without optimizations.

We basically rely on the compiler not being moronic crap. Always have,
always will.

> Unfortunately, 4.11-rc1 fails to compile on gcc 3.3.5.
>
>> 1. None (CC_STACKPROTECTOR_NONE) (NEW)
>
> is needed. Easy. But then I get
>
>   AS  arch/x86/entry/entry_32.o
>   arch/x86/entry/entry_32.S: Assembler messages:
>   arch/x86/entry/entry_32.S:440: Error: invalid character '"' in
>   operand 1
>
> from the ALTERNATIVE macro. It seems 3.3 just does not like " in macro
> arguments.

Ok. Clearly our compiler version checks are outdated, and we
"allow" compilers that don't actually work.

> But that looks fixable. But when I force the compilation, it is
> actually _slower_ than recent gcc (23 minutes vs. 13
> minutes). Interesting.

I forget when gcc got the "integrated preprocessor". It's a long time
ago. But that actually sped things up, because it basically halves (or
more) the overhead of parsing.

With an external preprocessor you obviously first have cpp doing its
parsing, then writing the preprocessed results out, and then you had
cc1 doing parsing again.

So yes, gcc has gotten a lot slower over time, but some things have
actually improved.

 Linus


Old compiler versions (was Re: v4.10: kernel stack frame pointer .. has bad value (null))

2017-03-09 Thread Pavel Machek
Hi!

> > > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> > >   where the stack gets aligned before the mcount call.  This issue
> > >   should be mostly obsolete as most modern compilers now have -mfentry.
> > >   We could make it dependent on CC_USING_FENTRY.
> > 
> > Yeah. At some point we might even upgrade the compiler requirements to
> > no longer accept the mcount model.
> > 
> > I think the fentry model is gcc-4.6.0 and up. Currently I guess we
> > support gcc-3.2+, which is fairly ridiculous considering that 4.6.0 is
> > from March, 2011. So it's over five years ago already.
> > 
> > gcc-3.2.0 is from 2002, I think. At some point you just have to say
> > "caring about a 15 year old compiler is ridiculous"
> > 
> > The main reason we have fairly aggressively supported old compilers
> > tends to be some odder architectures that don't have good support, so
> > people use various random "this works for me" versions.
> > 
> > We could easily make the gcc version checks much more strict on x86,
> > I suspect.
> 
> Well, I have fast CPUs, but most of the time they just compile
> stuff. Especially bisect is compile-heavy. I suspect going back to
> gcc-3.2 would bring me bigger advantages than CPU upgrade...

Okay, wouldn't it be nice if we supported gcc-3.3? It compiles at about
twice the speed of gcc-4.9, across the board. (If we could compile at
-O1, we'd get 4 times the speed. At -O0, we'd be at circa 9 times the
speed; that would be useful for a bisect!)

The good news is that -Os is quite significantly faster than -O2 (and
already supported), so that should be a simple way to optimize bisect
performance.

(On thinkpad X220, compiling bzip2)

| mach | gcc      | opt | file           |  real |  user |   sys |
|------+----------+-----+----------------+-------+-------+-------|
| x220 | 4.9.2-10 | -O0 | bzip2.c caf036 | 0.644 |  0.54 |  0.03 |
|      |          | -O1 |                | 1.501 |       |       |
|      |          | -O2 |                | 2.607 |       |       |
|      |          | -O3 |                | 3.052 |       |       |
|      |          | -Os |                | 1.839 |       |       |
|      | 3.3.5-13 | -O0 |                | 0.343 | 0.300 | 0.028 |
|      |          | -O1 |                | 0.721 |       |       |
|      |          | -O2 |                | 1.238 |       |       |
|      |          | -O3 |                | 1.598 | 1.508 | 0.032 |
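
The ratios quoted above can be checked against the "real" column of the table. The sketch below is an illustration only: it assumes gcc 4.9.2 at -O2 is the baseline (the mail does not say which baseline was used), and the `speedup` helper is a made-up name.

```python
# Wall-clock ("real") times copied from the table above, assumed to be seconds.
real_seconds = {
    ("4.9.2", "-O0"): 0.644,
    ("4.9.2", "-O1"): 1.501,
    ("4.9.2", "-O2"): 2.607,
    ("4.9.2", "-O3"): 3.052,
    ("4.9.2", "-Os"): 1.839,
    ("3.3.5", "-O0"): 0.343,
    ("3.3.5", "-O1"): 0.721,
    ("3.3.5", "-O2"): 1.238,
    ("3.3.5", "-O3"): 1.598,
}


def speedup(baseline, candidate):
    """Return how many times faster `candidate` compiles than `baseline`."""
    return real_seconds[baseline] / real_seconds[candidate]


# Assumed baseline: the newer compiler at -O2.
baseline = ("4.9.2", "-O2")
for version, opt in sorted(real_seconds):
    ratio = speedup(baseline, (version, opt))
    print(f"gcc {version} {opt}: {ratio:.1f}x relative to gcc 4.9.2 -O2")
```

Against that baseline, gcc 3.3.5 comes out at about 2.1x for -O2, 3.6x for -O1 and 7.6x for -O0, and gcc 4.9.2 -Os at about 1.4x. The -O0 figure gets closer to 9x only if gcc 4.9.2 -O3 is taken as the baseline.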


Unfortunately, 4.11-rc1 fails to compile on gcc 3.3.5.

> 1. None (CC_STACKPROTECTOR_NONE) (NEW)

is needed. Easy. But then I get

  AS  arch/x86/entry/entry_32.o
  arch/x86/entry/entry_32.S: Assembler messages:
  arch/x86/entry/entry_32.S:440: Error: invalid character '"' in
  operand 1

from the ALTERNATIVE macro. It seems 3.3 just does not like " in macro
arguments.

arch/x86/boot/bioscall.S: Assembler messages:
arch/x86/boot/bioscall.S:68: Error: `68(%esp)' is not a valid 16 bit
base/index expression

Plus I get about a million of

 from fs/fs-writeback.c:23:
 include/linux/irq.h:419: warning: parameter has
 incomplete type
 include/linux/irq.h:420: warning: parameter has
 incomplete type
 
... and a problem with builtin_ffs in drm_blend.c, and others with
function alignment in drm.

lzo1x_compress needs __builtin_ctz. In the end, compilation fails with

mm/built-in.o(.text+0x2b714): In function `do_set_pmd':
: undefined reference to `__compiletime_assert_3034'
mm/built-in.o(.text+0x2c09a): In function `create_huge_pmd':
: undefined reference to `do_huge_pmd_anonymous_page'
mm/built-in.o(.text+0x2c0ca): In function `wp_huge_pmd':
: undefined reference to `do_huge_pmd_wp_page'
drivers/built-in.o(.text+0xe5a2b): In function
`cea_mode_alternate_timings':
: undefined reference to `__compiletime_assert_2638'
drivers/built-in.o(.text+0x3c969f): In function `sg_ioctl':
: undefined reference to `__divdi3'

But that looks fixable. But when I force the compilation, it is
actually _slower_ than recent gcc (23 minutes vs. 13
minutes). Interesting. If someone knows what old gcc versions actually
compile recent kernels, I'd like to know.

Best regards,

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

