[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-02 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #16 from Andrew Roberts  ---
That kicks a memory loose, from my build script:

# sed needed for GMP >=5.1 && < 6.2.0 on ARM otherwise isl build fails with
# undefined symbol __gmpn_invert_limb
sed -ixx "s/none-/${uname_m}-/" Makefile
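The same fix can be wrapped in a small function. This is a minimal sketch that assumes it is run in the top-level GCC build directory after configure and before make. The function name `retarget_gmp_host` is hypothetical. It uses a `.orig` backup suffix instead of the `xx` suffix in the one-liner above.

```shell
# Hypothetical helper mirroring the sed one-liner above (an assumption, not part
# of GCC). $1: Makefile to edit (default ./Makefile in the GCC build directory)
# $2: machine name to substitute (default: `uname -m`, e.g. armv7l)
retarget_gmp_host() {
    makefile=${1:-Makefile}
    uname_m=${2:-$(uname -m)}
    # Replace the first "none-" on each line with "<machine>-", as the
    # original does, keeping the unedited file as <Makefile>.orig.
    sed -i.orig "s/none-/${uname_m}-/" "$makefile"
}
```

For example, run `retarget_gmp_host Makefile armv7l` from the build directory.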

Building natively on ARM was failing with the host set to none-*-*.

none-*-* seems to work OK with Raspbian on the Pi 4, and the build fails if you
alter it. Since I'm altering it to the output of uname -m, which is armv7l, that
would explain some things. It does not explain why a vanilla gcc-8.3.0 or 9.1.0
built with the system gmp can rebuild itself using the above sed and in-tree gmp.

However, all my other ARM machines (ODROID-C2, ODROID-XU4, RPi Zero, RPi B,
RPi 3B) need this fix to build. They are running Arch Linux ARM.


I'll recheck on the faster ones in the next few days, with both 8.3.0 and
9.1.0, to confirm if that is still the case.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #14 from Andrew Roberts  ---
One final point: even vanilla gcc 9.1.0 fails to build gmp standalone if
CFLAGS is set. So the issue with the Raspbian compiler is probably that it is
setting CFLAGS, which messes up the gmp build.

To cause standalone gmp 6.1.2 build to fail with vanilla gcc release:
CFLAGS="-v" ; export CFLAGS
tar -xf gmp-6.1.2.tar.*
cd gmp-6.1.2
./configure
make

but ./configure --enable-assembly=no is fine.

Of course the system binutils could be out of whack also, but rebuilding that
is a job for another day.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #13 from Andrew Roberts  ---
Just tried --enable-assembly=no with the standalone build of gmp
and this does seem to work as advertised. Everything symlinked to .c rather
than .asm files, and no .asm or .s files built at all.
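One way to confirm this after configure is to count which kind of source each symlink in the build tree's mpn/ directory points to. This is a small sketch. It assumes the links sit directly in mpn/ with .asm/.s or .c names, as described above. `count_mpn_links` is a hypothetical name.

```shell
# Count symlinked mpn sources by type after gmp's ./configure.
# Assumption: configure leaves links such as mpn/divrem_1.asm or mpn/add_n.c.
count_mpn_links() {
    dir=${1:-mpn}
    asm=0
    c=0
    for f in "$dir"/*; do
        [ -L "$f" ] || continue          # only look at symlinks
        case $f in
            *.asm|*.s|*.S) asm=$((asm + 1)) ;;
            *.c) c=$((c + 1)) ;;
        esac
    done
    echo "asm=$asm c=$c"
}
```

With --enable-assembly=no, the expectation from the observation above is `asm=0`.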

Building gmp standalone with the broken Raspbian compiler and CFLAGS="-v" set
works when configured with --enable-assembly=no, but fails without it.

Long term this is probably the best way forward...

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #12 from Andrew Roberts  ---
GMP 6.1.0 and later support the following configure option:
--enable-assembly   enable the use of assembly loops [default=yes]

Not sure whether this could be used to stop gmp from using assembler.
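If the option were used, a build script would have to check the GMP version first, because the option only exists from 6.1.0 onwards. This is a hedged sketch. It assumes version strings of the form MAJOR.MINOR[.PATCH][-suffix]. Both function names are hypothetical.

```shell
# Succeeds if the GMP version is 6.1.0 or later (per the comment above,
# the first release accepting --enable-assembly). Assumes dotted versions.
gmp_supports_enable_assembly() {
    ver=$1
    major=${ver%%.*}          # "6.1.2" -> "6"
    rest=${ver#*.}            # "6.1.2" -> "1.2"
    minor=${rest%%[.-]*}      # "1.2" -> "1", "1.99-20190630" -> "1"
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 1 ]; }
}

# Print the flag that disables assembly loops, or nothing for older GMP.
gmp_no_asm_flag() {
    if gmp_supports_enable_assembly "$1"; then
        echo "--enable-assembly=no"
    fi
}
```

For example, `./configure $(gmp_no_asm_flag 6.1.2)` would expand to `./configure --enable-assembly=no`.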

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #11 from Andrew Roberts  ---
Richard,
the CPU supports mls (it's an ARM Cortex A72).

Comment 2 shows the -v output for both building gmp within gcc and standalone.
When building gmp in tree using Raspbian compiler:

as --gdwarf2 -v -I . -I ../../../gcc-9.1.0/gmp/mpn -I .. -I
../../../gcc-9.1.0/gmp -march=armv6 -mfloat-abi=hard -mfpu=vfp -meabi=5
--noexecstack -o divrem_1.o tmp-divrem_1.s

The issue is that when building gmp in tree no assembler code is supposed to be
used.
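You can check whether an in-tree build went down the assembler path by looking for m4-generated tmp-*.s files in the build log. This is a sketch. It assumes the make output was captured to a file, for example with `make 2>&1 | tee build.log`. The function name is hypothetical.

```shell
# List the distinct m4-generated tmp-*.s files mentioned in a build log.
# For GCC's in-tree gmp this list is expected to be empty.
asm_sources_in_log() {
    grep -o 'tmp-[A-Za-z0-9_]*\.s' "$1" | LC_ALL=C sort -u
}
```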

This looks like a problem with the Raspbian build of gcc breaking something. 
Possibly relating to CFLAGS, as setting CFLAGS to -v also causes the standalone
build of gmp to break with similar assembler errors.

When an original gcc-8.3.0 or 9.1.0 release is built using the system gmp (not
in tree), that compiler can then rebuild itself with gmp in tree with no
issues. So the problem is specific to the Raspbian host compiler.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #9 from Andrew Roberts  ---
For completeness, I've also built gcc 8.3.0 with in-tree gmp 6.1.2 using the
newly built 9.1.0, and then in turn used this gcc 8.3.0 to rebuild gcc 9.1.0
with in-tree gmp.

So the host (Raspbian) gcc 8.3.0 can't build gcc with in-tree gmp, but all the
versions I have built myself (9.1.0 and 8.3.0) build it correctly.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #8 from Andrew Roberts  ---
Build of 9.1.0 using the system gmp worked fine.
A rebuild of 9.1.0 with in-tree gmp-6.1.2 using that version of gcc
also worked fine.

So this is probably a host gcc compiler problem;
I'll report it to Raspbian.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-07-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #6 from Andrew Roberts  ---
I'm now building gcc 9.1.0 with the system gmp (using --with-gmp-lib and
--with-gmp-include). If this is successful, I'll use this compiler to rebuild
itself with an in-tree gmp and see where that gets me. Hopefully this will
allow the host gcc to be eliminated as a culprit.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-06-30 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #5 from Andrew Roberts  ---
OK I tried again using the latest gmp snapshot:
 gmp-6.1.99-20190630.tar.lz 

I still get the same error.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-06-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #3 from Andrew Roberts  ---
Created attachment 46535
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46535&action=edit
tmp-divrem_1.s file generated using m4 from divrem_1.asm in gmp/mpn

This is the gmp 6.1.2 version, from the gcc 9.1.0 build directory.

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-06-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #2 from Andrew Roberts  ---
Building 9.1.0 with gmp 6.1.2 using "-v -save-temps" gives:

gcc -v -save-temps -c -DHAVE_CONFIG_H -I. -I../../../gcc-9.1.0/gmp/mpn -I..
-D__GMP_WITHIN_GMP  -I../../../gcc-9.1.0/gmp -DOPERATION_divrem_1 -DNO_ASM -O2
-g -pipe -Wa,--noexecstack tmp-divrem_1.s -o divrem_1.o
gcc: warning: -pipe ignored because -save-temps specified
Using built-in specs.
COLLECT_GCC=gcc
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1'
--with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-8
--program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp
--with-float=hard --disable-werror --enable-checking=release
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-D' 'HAVE_CONFIG_H' '-I' '.' '-I'
'../../../gcc-9.1.0/gmp/mpn' '-I' '..' '-D' '__GMP_WITHIN_GMP' '-I'
'../../../gcc-9.1.0/gmp' '-D' 'OPERATION_divrem_1' '-D' 'NO_ASM' '-O2' '-g'
'-pipe' '-o' 'divrem_1.o'  '-mfloat-abi=hard' '-mfpu=vfp' '-mtls-dialect=gnu'
'-marm' '-march=armv6+fp'
 as --gdwarf2 -v -I . -I ../../../gcc-9.1.0/gmp/mpn -I .. -I
../../../gcc-9.1.0/gmp -march=armv6 -mfloat-abi=hard -mfpu=vfp -meabi=5
--noexecstack -o divrem_1.o tmp-divrem_1.s
GNU assembler version 2.31.1 (arm-linux-gnueabihf) using BFD version (GNU
Binutils for Raspbian) 2.31.1
tmp-divrem_1.s: Assembler messages:
tmp-divrem_1.s:129: Error: selected processor does not support `mls
r1,r4,r8,r11' in ARM mode
tmp-divrem_1.s:145: Error: selected processor does not support `mls
r1,r4,r8,r11' in ARM mode
tmp-divrem_1.s:158: Error: selected processor does not support `mls
r1,r4,r8,r11' in ARM mode
tmp-divrem_1.s:175: Error: selected processor does not support `mls
r1,r4,r3,r8' in ARM mode
tmp-divrem_1.s:209: Error: selected processor does not support `mls
r11,r4,r12,r3' in ARM mode

Whereas building gmp 6.1.2 standalone (./configure ; make) using -v -save-temps
gives:

gcc -v -save-temps -c -DHAVE_CONFIG_H -I. -I.. -D__GMP_WITHIN_GMP -I..
-DOPERATION_divrem_1 -O2 -pedantic -fomit-frame-pointer -march=armv8-a
-mfloat-abi=hard -mfpu=neon -mtune=cortex-a72 -Wa,--noexecstack tmp-divrem_1.s
-fPIC -DPIC -o .libs/divrem_1.o
Using built-in specs.
COLLECT_GCC=gcc
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1'
--with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-8
--program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp
--with-float=hard --disable-werror --enable-checking=release
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-c' '-D' 'HAVE_CONFIG_H' '-I' '.' '-I'
'..' '-D' '__GMP_WITHIN_GMP' '-I' '..' '-D' 'OPERATION_divrem_1' '-O2'
'-Wpedantic' '-fomit-frame-pointer'  '-mfloat-abi=hard' '-mfpu=neon'
'-mtune=cortex-a72' '-fPIC' '-D' 'PIC' '-o' '.libs/divrem_1.o'
'-mtls-dialect=gnu' '-marm' '-march=armv8-a+simd'
 as -v -I . -I .. -I .. -march=armv8-a -mfloat-abi=hard -mfpu=neon -meabi=5
--noexecstack -o .libs/divrem_1.o tmp-divrem_1.s
GNU assembler version 2.31.1 (arm-linux-gnueabihf) using BFD version (GNU
Binutils for Raspbian) 2.31.1
COMPILER_PATH=/usr/lib/gcc/arm-linux-gnueabihf/8/:/usr/lib/gcc/arm-linux-gnueabihf/8/:/usr/lib/gcc/arm-linux-gnueabihf/:/usr/lib/gcc/arm-linux-gnueabihf/8/:/usr/lib/gcc/arm-linux-gnueabihf/
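The key difference between the two runs is the -march value the driver passes to as: armv6 for the in-tree build versus armv8-a for the standalone one. mls is not part of plain ARMv6 (it arrived with ARMv6T2/ARMv7). This is a sketch for pulling that value out of captured `gcc -v` output. It assumes each command sits on a single, unwrapped line. The function name is hypothetical.

```shell
# Print the distinct -march= values passed to the assembler in a captured
# "gcc -v" log. Assumes the as command line is not wrapped.
assembler_march() {
    grep -E '^[[:space:]]*([^[:space:]]*/)?as[[:space:]]' "$1" \
        | grep -o -e '-march=[^[:space:]]*' \
        | LC_ALL=C sort -u
}
```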

[Bug bootstrap/91034] In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-06-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

--- Comment #1 from Andrew Roberts  ---
Configure line used for building gcc 9.1.0:

../gcc-9.1.0/configure --prefix=/usr/local/gcc-9.1.0 --program-suffix=
--disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=arm-linux-gnueabihf --build=arm-linux-gnueabihf --with-arch=armv6
--with-float=hard --with-fpu=vfp --disable-bootstrap

The --with-arch=armv6 setting is the same as the host compiler's.

I would normally build on Arch Linux ARM (but that's not available for the Pi 4 yet).
One difference is Raspbian configures its gcc using:

--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf --with-arch=armv6

Basically, one gcc configuration for all Raspberry Pis.

whereas Arch Linux ARM uses:

--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a (RPI 3b 32bit)

 --host=armv6l-unknown-linux-gnueabihf --build=armv6l-unknown-linux-gnueabihf
--with-arch=armv6 (RPI Zero)

I normally copy the triplet/arch from the host compiler when building.
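Copying these settings can be scripted. `gcc -dumpmachine` prints the triplet, and options such as --with-arch can be read from the "Configured with:" line of `gcc -v`. This is a sketch that assumes unwrapped `gcc -v` text on stdin. `configured_value` is a hypothetical helper.

```shell
# Print the value of --KEY=VALUE from the "Configured with:" line of
# "gcc -v" output read on stdin (e.g. KEY=with-arch gives armv6 on Raspbian).
configured_value() {
    grep '^Configured with:' | tr ' ' '\n' | sed -n "s/^--$1=//p" | head -n 1
}
```

For example: `gcc -v 2>&1 | configured_value with-arch`. `gcc -v` writes to stderr, hence the redirect.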

If I use the Arch Linux ARM triplet armv7l-unknown-linux-gnueabihf
and "--with-arch=armv7-a", it still fails in the same way, so it's not the
--with-arch or the triplet that is breaking things.

Alternate (Arch Linux ARM style) configuration, which also fails:

../gcc-9.1.0/configure --prefix=/usr/local/gcc-9.1.0 --program-suffix=
--disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfp --disable-bootstrap

[Bug bootstrap/91034] New: In tree build of gmp fails on Raspberry Pi4 (ARM Cortex A72) with `mls r1,r4,r8,r11' not supported in ARM mode

2019-06-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91034

Bug ID: 91034
   Summary: In tree build of gmp fails on Raspberry Pi4 (ARM
Cortex A72) with `mls r1,r4,r8,r11' not supported in
ARM mode
   Product: gcc
   Version: 9.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

When building gcc 9.1.0 (or 8.3.0) on a Raspberry Pi 4 (ARM Cortex A72 in 32-bit
mode), the in-tree build of gmp 6.1.2 (or 6.1.0) fails with:

tmp-divrem_1.s:129: Error: selected processor does not support `mls
r1,r4,r8,r11' in ARM mode

gmp 6.1.2 builds fine outside of gcc, so it's not a straight binutils issue,
but something to do with how gcc is configuring/building gmp.

The same configurations and versions build gcc-9.1.0/8.3.0 fine on the
Raspberry Pi Zero, Pi 3B, etc.

/proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 3 (v7l)
BogoMIPS: 108.00
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd08
CPU revision: 3
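The CPU part field identifies the core even though the kernel reports an ARMv7 processor in 32-bit mode. This is a sketch that maps a few ARM (implementer 0x41) part numbers to GCC -mcpu names. The table is deliberately partial and the function name is hypothetical. On big.LITTLE boards such as the ODROID-XU4, cpuinfo lists more than one part, and this sketch only looks at the first one.

```shell
# Map the first "CPU part" in a cpuinfo file to a GCC -mcpu name.
# Partial table (assumption): only the cores mentioned in these reports.
cpu_part_name() {
    part=$(sed -n 's/^CPU part[[:space:]]*:[[:space:]]*//p' "${1:-/proc/cpuinfo}" | head -n 1)
    case $part in
        0xb76) echo arm1176jzf-s ;;   # RPi Zero / RPi B
        0xc07) echo cortex-a7 ;;
        0xc0f) echo cortex-a15 ;;
        0xd03) echo cortex-a53 ;;     # RPi 3, ODROID-C2
        0xd08) echo cortex-a72 ;;     # RPi 4
        *) echo "unknown${part:+ ($part)}" ;;
    esac
}
```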

uname -a
Linux raspberrypi 4.19.50-v7l+ #895 SMP Thu Jun 20 16:03:42 BST 2019 armv7l
GNU/Linux

OS: Raspbian GNU/Linux 10 (buster)

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1'
--with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr
--with-gcc-major-version-only --program-suffix=-8
--program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--with-system-zlib --with-target-system-zlib --enable-objc-gc=auto
--enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp
--with-float=hard --disable-werror --enable-checking=release
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)

gcc -march=native -Q --help=target
The following options are target specific:
  -mabi=aapcs-linux
  -mabort-on-noreturn   [disabled]
  -mandroid [disabled]
  -mapcs[disabled]
  -mapcs-frame  [disabled]
  -mapcs-reentrant  [disabled]
  -mapcs-stack-check[disabled]
  -march=   armv8-a+crc+simd
  -marm [enabled]
  -masm-syntax-unified  [disabled]
  -mbe32[enabled]
  -mbe8 [disabled]
  -mbig-endian  [disabled]
  -mbionic  [disabled]
  -mbranch-cost=-1
  -mcallee-super-interworking   [disabled]
  -mcaller-super-interworking   [disabled]
  -mcmse[disabled]
  -mcpu=
  -mfix-cortex-m3-ldrd  [disabled]
  -mflip-thumb  [disabled]
  -mfloat-abi=  hard
  -mfp16-format=none
  -mfpu=vfp
  -mglibc   [enabled]
  -mhard-float
  -mlittle-endian   [enabled]
  -mlong-calls  [disabled]
  -mmusl[disabled]
  -mneon-for-64bits [disabled]
  -mpic-data-is-text-relative   [enabled]
  -mpic-register=
  -mpoke-function-name  [disabled]
  -mprint-tune-info [disabled]
  -mpure-code   [disabled]
  -mrestrict-it [disabled]
  -msched-prolog[enabled]
  -msingle-pic-base [disabled]
  -mslow-flash-data [disabled]
  -msoft-float
  -mstructure-size-boundary=8
  -mthumb   [disabled]
  -mthumb-interwork [disabled]
  -mtls-dialect=gnu
  -mtp= cp15
  -mtpcs-frame  [disabled]
  -mtpcs-leaf-frame [disabled]
  -mtune=
  -muclibc  [disabled]
  -munaligned-access

[Bug target/89508] New: gcc snapshot 9.0.1 20190127 generates invalid assembler options on aarch64-unknown-linux-gnu with -march=native

2019-02-26 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89508

Bug ID: 89508
   Summary: gcc snapshot 9.0.1 20190127 generates invalid
assembler options on aarch64-unknown-linux-gnu with
-march=native
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

The latest gcc 9 snapshot (gcc version 9.0.1 20190127 (experimental)) fails
to compile files (due to assembler option errors) when used with -march=native
on aarch64-unknown-linux-gnu.

The gcc-8.3.0 release built with the same options accepts -march=native and
generates options the assembler can accept.

If -march=native is no longer accepted, it should generate a warning or error
as appropriate, or be silently ignored.

/usr/local/gcc/bin/gcc - is the gcc 9 snapshot
/usr/local/gcc-8.3.0/bin/gcc - is the gcc 8.3.0 release

All testing was done on a Raspberry Pi 3B (BCM2837), ARM Cortex A53, running
64-bit Arch Linux ARM:
uname -a
Linux alarm 4.20.11-1-ARCH #1 SMP Wed Feb 20 19:23:26 MST 2019 aarch64
GNU/Linux

The system binutils is used (2.31.1)

as --version
GNU assembler (GNU Binutils) 2.31.1
Copyright (C) 2018 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `aarch64-unknown-linux-gnu'.


Output of gcc-9.0.1 --help=target with -march=native:
(the guessed -march line is not accepted by the assembler)

/usr/local/gcc/bin/gcc -march=native -Q --help=target
The following options are target specific:
  -mabi=lp64
  -march=                     armv8-a+crc+profile+rng+memtag+sb+ssbs+predres
  -mbig-endian  [disabled]
  -mbionic  [disabled]
  -mbranch-protection=
  -mcmodel= small
  -mcpu=generic
  -mfix-cortex-a53-835769   [enabled]
  -mfix-cortex-a53-843419   [enabled]
  -mgeneral-regs-only   [disabled]
  -mglibc   [enabled]
  -mlittle-endian   [enabled]
  -mlow-precision-div   [disabled]
  -mlow-precision-recip-sqrt[disabled]
  -mlow-precision-sqrt  [disabled]
  -mmusl[disabled]
  -momit-leaf-frame-pointer [enabled]
  -moverride=
  -mpc-relative-literal-loads   [enabled]
  -msign-return-address=none
  -mstack-protector-guard-offset=
  -mstack-protector-guard-reg=
  -mstack-protector-guard=  global
  -mstrict-align[disabled]
  -msve-vector-bits=scalable
  -mtls-dialect=desc
  -mtls-size=   24
  -mtrack-speculation   [disabled]
  -mtune=   generic
  -muclibc  [disabled]
  -mverbose-cost-dump   [disabled]

  Known AArch64 ABIs (for use with the -mabi= option):
ilp32 lp64

  Supported AArch64 return address signing scope (for use with
-msign-return-address= option):
all non-leaf none

  The code model option names for -mcmodel:
large small tiny

  Valid arguments to -mstack-protector-guard=:
global sysreg

  The possible SVE vector lengths:
1024 128 2048 256 512 scalable

  The possible TLS dialects:
desc trad

Assembler messages:
Error: unknown architectural extension `rng+memtag+sb+ssbs+predres'
Error: unrecognized option -march=armv8-a+crc+profile+rng+memtag+sb+ssbs+predres
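Until the driver and binutils agree, one possible workaround is to pass an explicit -march with the rejected extensions removed. This is a sketch in which the rejected names are supplied by hand from the error above. The function is hypothetical and does not query the assembler. Whether binutils 2.31.1 then accepts armv8-a+crc+profile is an assumption, based only on the error naming the later extensions.

```shell
# Remove the named extensions from an -march value, e.g.
# armv8-a+crc+profile+rng+... minus "rng memtag sb ssbs predres".
strip_march_extensions() {
    march=$1; shift
    out=${march%%+*}                      # base architecture, e.g. armv8-a
    case $march in
        *+*) rest=${march#*+} ;;          # remaining "ext+ext+..."
        *) rest= ;;
    esac
    while [ -n "$rest" ]; do
        ext=${rest%%+*}                   # first remaining extension
        case $rest in
            *+*) rest=${rest#*+} ;;
            *) rest= ;;
        esac
        keep=1
        for bad in "$@"; do
            if [ "$ext" = "$bad" ]; then keep=0; fi
        done
        if [ "$keep" -eq 1 ] && [ -n "$ext" ]; then out="$out+$ext"; fi
    done
    echo "$out"
}
```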

gcc-9.0.1 build options:

 /usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-9.0.0/libexec/gcc/aarch64-unknown-linux-gnu/9.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-9.0.0/configure --prefix=/usr/local/gcc-9.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--enable-shared --with-arch-directory=aarch64 --enable-multiarch
--disable-libssp --enable-default-pie --enable-default-ssp
--host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu
--with-arch=armv8-a --disable-bootstrap
Thread model: posix
gcc version 9.0.1 20190127 (experimental) (GCC)

Output of gcc

[Bug web/85578] broken links in gcc-8.0.1-RC-20180427/INSTALL/specific.html, and out of date prerequisites.html

2018-05-01 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85578

--- Comment #2 from Andrew Roberts  ---
OK, thanks, just checking on the prerequisites front.

[Bug web/85578] New: broken links in gcc-8.0.1-RC-20180427/INSTALL/specific.html, and out of date prerequisites.html

2018-04-30 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85578

Bug ID: 85578
   Summary: broken links in
gcc-8.0.1-RC-20180427/INSTALL/specific.html, and out
of date prerequisites.html
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: web
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

The file INSTALL/specific.html in gcc-8.0.1-RC-20180427
contains many broken links. All links that include target-x-x are broken;
only the simple ones like avr work.

avr link is: 
file:///home/aroberts/gcc/gcc/gcc-8.0.1-RC-20180427/INSTALL/specific.html#avr
which references:


aarch64*-*-* link is:
file:///home/aroberts/gcc/gcc/gcc-8.0.1-RC-20180427/INSTALL/specific.html#aarch64-x-x
which fails to reference:


This is obviously broken, and seems to apply to all the non-trivial links.

Also in prerequisites.html, are the versions for mpc, mpfr, gmp, isl etc ok,
or are they out of date?

I see that the download_prerequisites is referencing:
gmp='gmp-6.1.0.tar.bz2'
mpfr='mpfr-3.1.4.tar.bz2'
mpc='mpc-1.0.3.tar.gz'
isl='isl-0.18.tar.bz2'
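To compare these against prerequisites.html, you can pull the versions out of contrib/download_prerequisites. This is a sketch that assumes each name='name-version.tar.ext' assignment sits on its own line, as shown above. `prereq_versions` is a hypothetical name.

```shell
# Print "name version" for each name='name-version.tar.*' line in the file.
prereq_versions() {
    sed -n "s/^\([a-z]*\)='\1-\(.*\)\.tar\..*'\$/\1 \2/p" "${1:-contrib/download_prerequisites}"
}
```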

[Bug bootstrap/84800] ICE building gcc in isl_factorization.c with xgcc on SPARC Solaris with 8-20180304 snapshot

2018-03-14 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84800

--- Comment #2 from Andrew Roberts  ---
Rebuilt with 8-20180311 snapshot, and it now builds successfully:

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8-20180311/libexec/gcc/sparc-sun-solaris2.10/8.0.1/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: ../gcc-8-20180311/configure --prefix=/usr/local/gcc-8-20180311
--with-ld=/usr/ccs/bin/ld --without-gnu-ld --with-as=/usr/ccs/bin/as
--without-gnu-as --build=sparc-sun-solaris2.10 --target=sparc-sun-solaris2.10
--enable-languages=c,c++,fortran --enable-shared --enable-libssp --enable-nls
--enable-threads=posix --with-included-gettext --with-libiconv-prefix=/opt/csw
--with-system-zlib=/opt/csw --with-isl
Thread model: posix
gcc version 8.0.1 20180311 (experimental) (GCC)

[Bug bootstrap/84800] New: ICE building gcc in isl_factorization.c with xgcc on SPARC Solaris with 8-20180304 snapshot

2018-03-09 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84800

Bug ID: 84800
   Summary: ICE building gcc in isl_factorization.c with xgcc on
SPARC Solaris with 8-20180304 snapshot
   Product: gcc
   Version: 8.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Building gcc-8-20180304 failed on native SPARC Solaris 10
(sparc-sun-solaris2.10).
I had previously built the 20180114 snapshot in the same way (the only
difference is using mpfr-4.0.1 vs mpfr-4.0.0). I'm retrying the 20180114
snapshot with mpfr-4.0.1. 

The host compiler is the OpenCSW  gcc compiler:

/opt/csw/bin/gcc -v
Reading specs from /opt/csw/lib/gcc/sparc-sun-solaris2.10/5.5.0/specs
COLLECT_GCC=/opt/csw/bin/gcc
COLLECT_LTO_WRAPPER=/opt/csw/libexec/gcc/sparc-sun-solaris2.10/5.5.0/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with:
/home/dam/mgar/pkg/gcc5/trunk/work/solaris10-sparc/build-isa-sparcv8plus/gcc-5.5.0/configure
--prefix=/opt/csw --exec_prefix=/opt/csw --bindir=/opt/csw/bin
--sbindir=/opt/csw/sbin --libexecdir=/opt/csw/libexec --datadir=/opt/csw/share
--sysconfdir=/etc/opt/csw --sharedstatedir=/opt/csw/share
--localstatedir=/var/opt/csw --libdir=/opt/csw/lib
--infodir=/opt/csw/share/info --includedir=/opt/csw/include
--mandir=/opt/csw/share/man --enable-cloog-backend=isl --enable-java-awt=xlib
--enable-languages=ada,c,c++,fortran,go,java,objc --enable-libada
--enable-libssp --enable-nls --enable-objc-gc --enable-threads=posix
--program-suffix=-5.5 --with-cloog=/opt/csw --with-gmp=/opt/csw
--with-included-gettext --with-ld=/usr/ccs/bin/ld --without-gnu-ld
--with-libiconv-prefix=/opt/csw --with-mpfr=/opt/csw --with-ppl=/opt/csw
--with-system-zlib=/opt/csw --with-as=/usr/ccs/bin/as --without-gnu-as
Thread model: posix
gcc version 5.5.0 (GCC)

The gcc configuration used for the build is gmp 6.1.2, mpc 1.1.0, mpfr 4.0.1,
isl 0.18, all built in tree as follows:

gccver=8-20180304
gmpver=6.1.2
mpcver=1.1.0
mpfrver=4.0.1
islver=0.18
gtar -xf gcc-$gccver.tar.*
cd gcc-$gccver
gtar -xf ../gmp-$gmpver.tar.*
gtar -xf ../mpc-$mpcver.tar.*
gtar -xf ../mpfr-$mpfrver.tar.*
gtar -xf ../isl-$islver.tar.*
ln -s gmp-$gmpver gmp
ln -s mpc-$mpcver mpc
ln -s mpfr-$mpfrver mpfr
ln -s isl-$islver isl
cd ..
mkdir gcc-build
cd gcc-build
CC=/opt/csw/bin/gcc ; export CC
CXX=/opt/csw/bin/g++ ; export CXX
../gcc-$gccver/configure \
 --prefix=/usr/local/gcc-$gccver \
 --with-ld=/usr/ccs/bin/ld --without-gnu-ld \
 --with-as=/usr/ccs/bin/as --without-gnu-as \
 --build=sparc-sun-solaris2.10 \
 --target=sparc-sun-solaris2.10 \
 --enable-languages=c,c++,fortran --enable-shared \
 --enable-libssp --enable-nls --enable-threads=posix \
 --with-included-gettext \
 --with-libiconv-prefix=/opt/csw --with-system-zlib=/opt/csw \
 --with-isl
gmake
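Before running gmake, it is worth checking that the four symlinks actually resolve, because a mistyped version variable still lets `ln -s` create a dangling link. This is a sketch that assumes it is given the gcc-$gccver source directory. `check_intree_links` is a hypothetical helper. It uses readlink, which is widely available but not in POSIX.

```shell
# Check that gmp, mpc, mpfr and isl in the GCC source tree are symlinks to
# existing directories; print each target, fail if any is missing/dangling.
check_intree_links() {
    srcdir=$1
    status=0
    for lib in gmp mpc mpfr isl; do
        if [ -L "$srcdir/$lib" ] && [ -d "$srcdir/$lib" ]; then
            echo "$lib -> $(readlink "$srcdir/$lib")"
        else
            echo "missing or dangling: $srcdir/$lib" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, run `check_intree_links gcc-$gccver` just after the `ln -s` lines.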

The error is:
libtool: compile:  /export/home/aroberts/Public/gcc-build/./prev-gcc/xgcc
-B/export/home/aroberts/Public/gcc-build/./prev-gcc/
-B/usr/local/gcc-8-20180304/sparc-sun-solaris2.10/bin/
-B/usr/local/gcc-8-20180304/sparc-sun-solaris2.10/bin/
-B/usr/local/gcc-8-20180304/sparc-sun-solaris2.10/lib/ -isystem
/usr/local/gcc-8-20180304/sparc-sun-solaris2.10/include -isystem
/usr/local/gcc-8-20180304/sparc-sun-solaris2.10/sys-include -DHAVE_CONFIG_H -I.
-I../../gcc-8-20180304/isl -I../../gcc-8-20180304/isl/include -Iinclude/
-I/export/home/aroberts/Public/gcc-build/gmp/../../gcc-8-20180304/gmp
-I/export/home/aroberts/Public/gcc-build/./gmp -g -O2 -MT isl_factorization.lo
-MD -MP -MF .deps/isl_factorization.Tpo -c
../../gcc-8-20180304/isl/isl_factorization.c -o isl_factorization.o
during GIMPLE pass: pre
../../gcc-8-20180304/isl/isl_factorization.c: In function
‘isl_basic_set_factorizer’:
../../gcc-8-20180304/isl/isl_factorization.c:256:28: internal compiler error:
in compute_antic_aux, at tree-ssa-pre.c:2148
 __isl_give isl_factorizer *isl_basic_set_factorizer(
^~~~
0x1222833 compute_antic_aux
../../gcc-8-20180304/gcc/tree-ssa-pre.c:2148
0x1223633 compute_antic
../../gcc-8-20180304/gcc/tree-ssa-pre.c:2364
0x122a3c7 execute
../../gcc-8-20180304/gcc/tree-ssa-pre.c:4131
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
gmake[5]: *** [isl_factorization.lo] Error 1
gmake[5]: Leaving directory `/export/home/aroberts/Public/gcc-build/isl'
gmake[4]: *** [all-recursive] Error 1
gmake[4]: Leaving directory `/export/home/aroberts/Public/gcc-build/isl'
gmake[3]: *** [all] Error 2
gmake[3]: Leaving directory `/export/home/aroberts/Public/gcc-build/isl'
gmake[2]: *** [all-stage2-isl] Error 2
gmake[2]: Leaving directory `/export/home/aroberts/Public/gcc-build'
gmake[1]: *** [stage2-bubble] E

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2018-03-07 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #23 from Andrew Roberts  ---
RPI Zero still looks ok with latest snapshot. 

/usr/local/gcc/bin/gcc -mfpu=auto -O3 -o matrix matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

/usr/local/gcc/bin/gcc -mcpu=native -mfpu=auto -O3 -o matrix matrix.c
Is ok.

/usr/local/gcc/bin/gcc -march=native -mcpu=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv6zk+fp
  -mcpu=arm1176jzf-s
  -mfpu=vfp

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv6l-unknown-linux-gnueabihf/8.0.1/lto-wrapper
Target: armv6l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv6l-unknown-linux-gnueabihf --build=armv6l-unknown-linux-gnueabihf
--with-arch=armv6 --with-float=hard --with-fpu=vfp --disable-bootstrap
Thread model: posix
gcc version 8.0.1 20180304 (experimental) (GCC)

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2018-03-07 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #22 from Andrew Roberts  ---
The RPI Zero bug was fixed; I'm retesting with the latest snapshot (8.0.1
20180304) just to be sure it is OK. There are still a number of inconsistencies
and things which could be improved.

On ODROID-XU4 (Cortex-A15/A7 big.LITTLE, AArch32)
-------------------------------------------------

/usr/local/gcc/bin/gcc -mfpu=auto -O3 -o matrix matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

It would be better if this error told the user they need to select a CPU
manually, rather than incorrectly stating that the processor lacks an FPU. As it
stands, this is going to confuse people.

/usr/local/gcc/bin/gcc -mcpu=native -mfpu=auto -O3 -o matrix matrix.c
This is fine.

/usr/local/gcc/bin/gcc -march=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7ve+vfpv3-d16
  -mcpu=
  -mfpu=vfpv3-d16
/usr/local/gcc/bin/gcc -march=native -mcpu=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7ve+vfpv3-d16
  -mcpu=cortex-a7
  -mfpu=vfpv3-d16

This is still not detecting big.LITTLE CPU combinations (I raised a separate PR
about this [83207]).

On ODROID-C2 (Cortex-A53, AArch64)
----------------------------------
/usr/local/gcc/bin/gcc -march=native -mcpu=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=ARCH   armv8-a+crc
  -mcpu=CPU cortex-a53

The output is inconsistent with the AArch32 output (=ARCH, =CPU); I also raised
a PR about this [83193].

On RPI 3 (Cortex-A53, AArch32)
------------------------------
No issues here that I can see.

I'll update again tomorrow when the RPI Zero build has completed.

[Bug driver/83193] Help for invalid -march= options from cc1 omits -march=native on x86-64, arm. aarch64, output also inconsistent

2018-02-19 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83193

--- Comment #4 from Andrew Roberts  ---
Correct, it does not show native in the list of valid options presented to the
user.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #50 from Andrew Roberts  ---
with the matrix.c benchmark on Ryzen and looking at the other options when
using -march=znver1 and -mtune=znver1

mult took 225281 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128
mult took 185961 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256
mult took 187577 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512

Adding -mno-avx2 has no effect on the above baseline.

Adding -mno-fma:

mult took 223302 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 123773 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 124690 clocks -march=znver1 -mtune=znver1 -mprefer-vector-width=512 -mno-fma

Is the patch in trunk yet? I was assuming it was from the other comments.

Using -march=ivybridge but keeping the rest of the options:
mult took 215052 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=128 -mno-fma
mult took 121661 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=256 -mno-fma
mult took 131763 clocks -march=ivybridge -mtune=znver1 -mprefer-vector-width=512 -mno-fma

Switching to -march=ivybridge -mtune=skylake-avx512 and dropping the other
options (still on Ryzen):
mult took 119195 clocks -march=ivybridge -mtune=skylake-avx512

With -march=znver1 -mtune=skylake-avx512 and dropping the other options:
mult took 182799 clocks -march=znver1 -mtune=skylake-avx512

So the combination of -march=ivybridge -mtune=skylake-avx512 is doing something
right.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #48 from Andrew Roberts  ---
Correction: that should be 23, not 23000, for the Haswell drop in
performance.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #47 from Andrew Roberts  ---
Again with the latest snapshot:
gcc version 8.0.1 20180121

matrix.c still needs additional options to get the best out of the Ryzen
processor. It is better than before (223029 clocks vs 371978 originally), but
122677 is achievable with the right options. The same can also be said for
Haswell as things stand: Haswell (-march=haswell -mtune=haswell) performance
has dropped, with the time going from 19 to 23000. Do we put that down to the
Meltdown/Spectre updates or to compiler updates?

With just -O3 on Ryzen:

Top 5
mult took 115669 clocks -march=ivybridge -mtune=skylake-avx512
mult took 118403 clocks -march=corei7-avx -mtune=skylake-avx512
mult took 119379 clocks -march=core-avx-i -mtune=skylake-avx512
mult took 119735 clocks -march=corei7-avx -mtune=skylake
mult took 119901 clocks -march=sandybridge -mtune=broadwell

mult took 120023 clocks -march=sandybridge -mtune=haswell
mult took 121010 clocks -march=corei7-avx -mtune=haswell
mult took 127371 clocks -march=sandybridge -mtune=x86-64
mult took 151208 clocks -march=btver2 -mtune=generic
mult took 152360 clocks -march=ivybridge -mtune=generic
mult took 173926 clocks -march=haswell -mtune=haswell
mult took 177359 clocks -march=znver1 -mtune=athlon64
mult took 18 clocks -march=ivybridge -mtune=znver1
mult took 188219 clocks -march=znver1 -mtune=generic
mult took 199721 clocks -march=znver1 -mtune=x86-64
mult took 223029 clocks -march=znver1 -mtune=znver1

Bot 5
mult took 377398 clocks -march=znver1 -mtune=bdver3
mult took 377650 clocks -march=knl -mtune=bdver3
mult took 378600 clocks -march=core-avx2 -mtune=bonnell
mult took 381447 clocks -march=skylake-avx512 -mtune=haswell
mult took 388837 clocks -march=skylake-avx512 -mtune=bdver4

On Haswell 

Top 5
mult took 133704 clocks -march=ivybridge -mtune=k8-sse3
mult took 15 clocks -march=btver2 -mtune=k8
mult took 15 clocks -march=core-avx-i -mtune=x86-64
mult took 15 clocks -march=corei7-avx -mtune=nano
mult took 15 clocks -march=corei7-avx -mtune=opteron

mult took 16 clocks -march=core-avx-i -mtune=haswell
mult took 19 clocks -march=haswell -mtune=eden-x4
mult took 19 clocks -march=ivybridge -mtune=generic
mult took 20 clocks -march=haswell -mtune=x86-64
mult took 23 clocks -march=haswell -mtune=haswell
mult took 27 clocks -march=haswell -mtune=generic

Bot 5
mult took 42 clocks -march=skylake-avx512 -mtune=bdver2
mult took 42 clocks -march=znver1 -mtune=bdver3
mult took 42 clocks -march=znver1 -mtune=bdver4
mult took 43 clocks -march=bdver2 -mtune=bdver2
mult took 43 clocks -march=knl -mtune=bdver2

Using -mprefer-vector-width=none -mno-fma -mno-avx2 -O3:

On Ryzen
Top 5
mult took 116558 clocks -march=haswell -mtune=bdver3
mult took 116673 clocks -march=haswell -mtune=skylake
mult took 117268 clocks -march=sandybridge -mtune=skylake-avx512
mult took 117288 clocks -march=broadwell -mtune=nocona
mult took 118450 clocks -march=corei7-avx -mtune=haswell

mult took 119719 clocks -march=core-avx-i -mtune=znver1
mult took 120028 clocks -march=znver1 -mtune=skylake
mult took 122677 clocks -march=znver1 -mtune=znver1
mult took 123423 clocks -march=haswell -mtune=haswell
mult took 127388 clocks -march=skylake -mtune=x86-64
mult took 130475 clocks -march=znver1 -mtune=x86-64
mult took 132374 clocks -march=sandybridge -mtune=generic
mult took 162317 clocks -march=znver1 -mtune=generic

Bot 5
mult took 30 clocks -march=nano-x2 -mtune=btver2
mult took 31 clocks -march=skylake-avx512 -mtune=westmere
mult took 319772 clocks -march=knl -mtune=sandybridge
mult took 32 clocks -march=eden-x2 -mtune=amdfam10
mult took 33 clocks -march=atom -mtune=broadwell

On Haswell

Top 5
mult took 123148 clocks -march=bonnell -mtune=ivybridge
mult took 130262 clocks -march=ivybridge -mtune=silvermont
mult took 135299 clocks -march=core-avx2 -mtune=nano-3000
mult took 15 clocks -march=core-avx2 -mtune=intel
mult took 15 clocks -march=haswell -mtune=btver1

mult took 17 clocks -march=core-avx-i -mtune=haswell
mult took 17 clocks -march=znver1 -mtune=x86-64
mult took 18 clocks -march=haswell -mtune=haswell
mult took 18 clocks -march=znver1 -mtune=generic
mult took 21 clocks -march=haswell -mtune=generic
mult took 23 clocks -march=haswell -mtune=x86-64

Bot 5
mult took 35 clocks -march=nano-x4 -mtune=nano-2000
mult took 35 clocks -march=slm -mtune=skylake-avx512
mult took 36 clocks -march=barcelona -mtune=broadwell
mult took 36 clocks -march=nano -mtune=corei7
mult took 36 clocks -march=nocona -mtune=btver2

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2018-01-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #46 from Andrew Roberts  ---
With the latest snapshot:
gcc version 8.0.1 20180121

For mt19937ar, things now look reasonable on Ryzen without any unusual
options.

Top 5
mt19937ar took 226849 clocks -march=amdfam10 -mtune=btver2
mt19937ar took 228970 clocks -march=amdfam10 -mtune=barcelona
mt19937ar took 229494 clocks -march=bdver1 -mtune=btver1
mt19937ar took 229524 clocks -march=nano -mtune=nano
mt19937ar took 230003 clocks -march=opteron-sse3 -mtune=athlon64-sse3

mt19937ar took 233793 clocks -march=k8-sse3 -mtune=x86-64
mt19937ar took 241700 clocks -march=corei7 -mtune=generic
mt19937ar took 242373 clocks -march=nano-3000 -mtune=znver1
mt19937ar took 245550 clocks -march=k8-sse3 -mtune=haswell
mt19937ar took 251431 clocks -march=znver1 -mtune=generic
mt19937ar took 262200 clocks -march=znver1 -mtune=znver1
mt19937ar took 276993 clocks -march=haswell -mtune=haswell

Bot 5
mt19937ar took 341326 clocks -march=nano-x4 -mtune=silvermont
mt19937ar took 341750 clocks -march=core-avx-i -mtune=nocona
mt19937ar took 342457 clocks -march=k8 -mtune=znver1
mt19937ar took 347453 clocks -march=ivybridge -mtune=bonnell
mt19937ar took 364041 clocks -march=haswell -mtune=core-avx-i

With -mno-avx2:
mt19937ar took 235997 clocks -march=znver1 -mtune=opteron
mt19937ar took 233921 clocks -march=nano-1000 -mtune=x86-64
mt19937ar took 243452 clocks -march=znver1 -mtune=x86-64
mt19937ar took 243540 clocks -march=silvermont -mtune=generic
mt19937ar took 247113 clocks -march=znver1 -mtune=generic
mt19937ar took 241368 clocks -march=nano-2000 -mtune=haswell
mt19937ar took 247806 clocks -march=znver1 -mtune=znver1

Compare this with the 430875 clocks it originally took for -march=znver1
-mtune=znver1.

On Haswell 

Top 5

mt19937ar took 22 clocks -march=amdfam10 -mtune=amdfam10
mt19937ar took 22 clocks -march=amdfam10 -mtune=athlon64
mt19937ar took 22 clocks -march=amdfam10 -mtune=athlon64-sse3
mt19937ar took 22 clocks -march=amdfam10 -mtune=athlon-fx
mt19937ar took 22 clocks -march=amdfam10 -mtune=barcelona

mt19937ar took 22 clocks -march=corei7-avx -mtune=x86-64
mt19937ar took 23 clocks -march=haswell -mtune=haswell
mt19937ar took 24 clocks -march=haswell -mtune=generic
mt19937ar took 26 clocks -march=haswell -mtune=x86-64

Bot 5 (all various shades of mtune=bdverZ or mtune=btverZ)
mt19937ar took 31 clocks -march=core-avx2 -mtune=bdver1
mt19937ar took 31 clocks -march=haswell -mtune=bdver1
mt19937ar took 31 clocks -march=skylake -mtune=bdver1

[Bug bootstrap/83903] New: gcc 8.0.0 20180114 fails to bootstrap on Darwin x86_64, undeclared ASM_OUTPUT_DEF

2018-01-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83903

Bug ID: 83903
   Summary: gcc 8.0.0 20180114 fails to bootstrap on Darwin
x86_64, undeclared ASM_OUTPUT_DEF
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Building gcc 8.0.0 20180114 on Darwin (OS X) with llvm fails with an undeclared
identifier:

.../../gcc-8.0.0/gcc/config/i386/i386.c:10961:7: error: use of undeclared identifier 'ASM_OUTPUT_DEF'
  ASM_OUTPUT_DEF (asm_out_file, alias, name);

clang -v
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

uname -a
Darwin Andrews-Mac-mini.local 17.3.0 Darwin Kernel Version 17.3.0: Thu Nov  9 18:09:22 PST 2017; root:xnu-4570.31.3~1/RELEASE_X86_64 x86_64

gcc is configured with:

../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0 --program-suffix=
--disable-werror --enable-checking=release --enable-languages=c,c++,fortran,lto
--disable-libgcj --host=x86_64-apple-darwin17.3.0
--build=x86_64-apple-darwin17.3.0 --disable-bootstrap

gcc 7.2.0 builds fine on the same system using the same configuration. This is
an up-to-date Mac with the current Apple toolchain.

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #20 from Andrew Roberts  ---
The patch in the latest snapshot is working OK on the Raspberry Pi Zero, and
--help=target now returns:

/usr/local/gcc/bin/gcc -march=native -mcpu=native -mfpu=auto -Q --help=target | grep "march\|mcpu\|mfpu"
  -march=   armv6zk
  -mcpu=arm1176jzf-s
  -mfpu=auto

gcc version 8.0.0 20171210 (experimental) (GCC)

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-10 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #18 from Andrew Roberts  ---
Richard, I'm giving the latest snapshot a test; the armv6 version will be
ready in 16 hrs or so...

Meanwhile, a question about the consistency of gcc -Q --help=target output,
and also about what happens if you don't specify -mcpu= with -mfpu=auto.
Tested with gcc version 8.0.0 20171210 (experimental) (GCC):

/usr/local/gcc/bin/gcc  -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7-a+fp
  -mcpu=
  -mfpu=vfpv3-d16

/usr/local/gcc/bin/gcc -march=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7ve+vfpv3-d16
  -mcpu=
  -mfpu=vfpv3-d16

/usr/local/gcc/bin/gcc -march=native -mcpu=native -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7ve+vfpv3-d16
  -mcpu=cortex-a7
  -mfpu=vfpv3-d16

/usr/local/gcc/bin/gcc -march=native -mcpu=native -mfpu=auto -Q --help=target | grep "mcpu\|mfpu\|march"
  -march=   armv7ve
  -mcpu=cortex-a7
  -mfpu=auto

So with nothing specified, a generic arch is reported along with an FPU.
-march=native fills in the correct arch, whereas -mfpu=auto just reports
"auto" rather than what was actually selected.

-march=native also gives different results depending on whether -mfpu=auto is set.

How does -mfpu=auto impact cross compilers? Will it just not be available? What
will happen in the future when it's the default?

This was all on the big.LITTLE ODROID-XU4, so once the patches for that issue
land it should really report:
  -mcpu=cortex-a15.cortex-a7

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #15 from Andrew Roberts  ---
Created attachment 42792
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42792&action=edit
/proc/cpuinfo from rpi3 (Cortex-A53) on aarch64

While this is the same CPU as the ODROID-C2 running AArch64, it has a much
newer kernel:
rpi: 4.14.3-1-ARCH
odroid-c2: 3.14.79-28-ARCH

Newer AArch64 kernels expose MIDR directly at:
/sys/devices/system/cpu/cpu0/regs/identification/midr_el1

but not the other control registers needed for FPU detection.
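For illustration, a minimal C sketch of splitting a MIDR value into its fields, whether it comes from the midr_el1 file above or from the CPU implementer/part/revision lines of /proc/cpuinfo. It assumes the architected MIDR layout (implementer in bits [31:24], variant [23:20], architecture [19:16], part number [15:4], revision [3:0]) and that the sysfs file holds a hex string such as 0x00000000410fd034. The struct and function names are hypothetical; this is not GCC's driver code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Fields of the ARM Main ID Register (MIDR / MIDR_EL1). */
struct midr_fields {
    unsigned implementer;  /* bits [31:24], 0x41 = Arm Ltd */
    unsigned variant;      /* bits [23:20] */
    unsigned architecture; /* bits [19:16] */
    unsigned part;         /* bits [15:4], e.g. 0xd03 = Cortex-A53 */
    unsigned revision;     /* bits [3:0] */
};

/* Split a raw MIDR value into its fields. Only the low 32 bits are
   architected; the upper half of MIDR_EL1 is reserved. */
static struct midr_fields midr_decode(uint64_t midr)
{
    struct midr_fields f;
    f.implementer  = (unsigned)((midr >> 24) & 0xff);
    f.variant      = (unsigned)((midr >> 20) & 0xf);
    f.architecture = (unsigned)((midr >> 16) & 0xf);
    f.part         = (unsigned)((midr >> 4) & 0xfff);
    f.revision     = (unsigned)(midr & 0xf);
    return f;
}

/* Parse the text of .../regs/identification/midr_el1 (assumed to be a
   hex string such as "0x00000000410fd034\n"). strtoull with base 16
   accepts the optional 0x prefix and stops at the newline. */
static struct midr_fields midr_from_sysfs_text(const char *text)
{
    return midr_decode((uint64_t)strtoull(text, NULL, 16));
}
```

Decoding the 0x4100b760 value quoted for the Raspberry Pi B elsewhere in this thread gives implementer 0x41 and part 0xb76, matching that board's CPU implementer and CPU part lines; none of these fields say whether an FPU is present.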

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #14 from Andrew Roberts  ---
Richard, I have checked with the latest snapshot (20171203) and the problem persists.

I think the issue is that the CPU on the original Raspberry Pi and Pi Zero is
not detected properly by gcc. 

/usr/local/gcc/bin/gcc -mcpu=native -Q --help=target | grep mcpu=
  -mcpu=arm1176jz-s

But the processor is actually an arm1176jzf-s.

Using:
/usr/local/gcc/bin/gcc -o matrix-v6 -mcpu=arm1176jzf-s -mfpu=auto -O3 matrix.c
works,

whereas using -mcpu=native or -mcpu=arm1176jz-s fails (no FPU).

gcc seems to parse /proc/cpuinfo to get the MIDR details, and this is correct
(as far as it goes). But it doesn't parse the Features line to get the FPU
details, which is the only way of telling the arm1176jz-s from the arm1176jzf-s
(as Linux doesn't give access to the control registers).

On Raspberry Pi B/Zero:
Features: half thumb fastmult vfp edsp java tls
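A minimal sketch of the Features-line check described above, assuming the Features line has already been read from /proc/cpuinfo and the MIDR part number has already identified an ARM1176 (0xb76). The function names are hypothetical; this illustrates the idea and is not GCC's actual driver logic. Matching whole tokens matters so that, for example, a vfpv3 entry is not mistaken for vfp.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if 'feature' appears as a whole whitespace-separated token
   in a /proc/cpuinfo Features line such as
   "Features\t: half thumb fastmult vfp edsp java tls". */
static bool cpuinfo_has_feature(const char *line, const char *feature)
{
    const char *p = strchr(line, ':');   /* skip the "Features" label */
    size_t n = strlen(feature);

    p = p ? p + 1 : line;
    while (*p) {
        while (*p == ' ' || *p == '\t')  /* skip separators */
            p++;
        size_t len = strcspn(p, " \t\n"); /* length of this token */
        if (len == n && strncmp(p, feature, n) == 0)
            return true;
        p += len;
        if (*p == '\n')                  /* stop at the end of the line */
            break;
    }
    return false;
}

/* Hypothetical mapping for the ARM1176 (MIDR part 0xb76): the part number
   is the same with or without the FPU, so the Features line decides. */
static const char *arm1176_cpu_name(const char *features_line)
{
    return cpuinfo_has_feature(features_line, "vfp") ? "arm1176jzf-s"
                                                     : "arm1176jz-s";
}
```

With the Raspberry Pi B/Zero Features line above, arm1176_cpu_name returns "arm1176jzf-s".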

I've attached /proc/cpuinfo for all arm processors I have.

While looking at this, it might also be worth looking at bug 83207 (big.LITTLE
CPU detection), as that is just a case of parsing out both processors from the
/proc/cpuinfo file (see the odroid-xu4 file).
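The big.LITTLE case can be sketched the same way: collect every CPU part line rather than stopping at the first one. This assumes the usual "CPU part\t: 0xc07" line format and a small hypothetical lookup table covering only cores mentioned in this thread; GCC's real tables and naming logic may differ. When two known core types are present, the big core is named first, matching the cortex-a15.cortex-a7 form.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of MIDR part numbers for cores mentioned in this
   thread; 'big' marks the performance core so it is named first. */
struct arm_core { unsigned part; const char *name; int big; };
static const struct arm_core known_cores[] = {
    { 0xc0f, "cortex-a15", 1 },
    { 0xc07, "cortex-a7",  0 },
    { 0xd03, "cortex-a53", 0 },
};

static const struct arm_core *find_core(unsigned part)
{
    for (size_t i = 0; i < sizeof known_cores / sizeof known_cores[0]; i++)
        if (known_cores[i].part == part)
            return &known_cores[i];
    return NULL;
}

/* Scan every "CPU part" line of /proc/cpuinfo text and write an -mcpu=
   value into 'out': "big.little" if both a big and a little known core
   are present, otherwise the single core name. Returns 0 on success,
   -1 if no known part was found. */
static int detect_mcpu(const char *cpuinfo, char *out, size_t outlen)
{
    const struct arm_core *big = NULL, *little = NULL;
    const char *line = cpuinfo;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        const char *colon = memchr(line, ':', len); /* colon on this line only */

        if (strncmp(line, "CPU part", 8) == 0 && colon != NULL) {
            const struct arm_core *c =
                find_core((unsigned)strtoul(colon + 1, NULL, 16));
            if (c != NULL && c->big)
                big = c;
            else if (c != NULL)
                little = c;
        }
        line = eol ? eol + 1 : NULL;
    }

    if (big && little)
        snprintf(out, outlen, "%s.%s", big->name, little->name);
    else if (big || little)
        snprintf(out, outlen, "%s", big ? big->name : little->name);
    else
        return -1;
    return 0;
}
```

Fed the ODROID-XU4's /proc/cpuinfo (parts 0xc07 and 0xc0f), this should produce cortex-a15.cortex-a7, while a board with only Cortex-A53 cores produces cortex-a53. A real implementation would need a table covering every supported part.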

It might be worth soliciting additional /proc/cpuinfo files from the mailing
list, if anybody has them.

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #13 from Andrew Roberts  ---
Created attachment 42791
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42791&action=edit
/proc/cpuinfo from odroid-c2 (cortex-A53) aarch64 mode

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #12 from Andrew Roberts  ---
Created attachment 42790
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42790&action=edit
/proc/cpuinfo from Raspberry Pi 3 (cortex-A53) arm mode

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #11 from Andrew Roberts  ---
Created attachment 42789
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42789&action=edit
/proc/cpuinfo from rpi b (arm1176jzf-s)

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #10 from Andrew Roberts  ---
Created attachment 42788
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42788&action=edit
/proc/cpuinfo from odroid-xu4 big/little cortex-a15/cortex-a7

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #9 from Andrew Roberts  ---
Created attachment 42787
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42787&action=edit
/proc/cpuinfo from cortex-a7 Raspberry Pi 2b v1.1

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-04 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #7 from Andrew Roberts  ---
I get the same thing if I just use -mcpu=native:

/usr/local/gcc/bin/gcc -o matrix-v6 -mcpu=native -mfpu=auto -O3 matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

I realize the aarch64 compiler does not need -mfpu=auto, but I was wondering if
it was worth at least not rejecting it so makefiles can be portable between arm
and aarch64. At present you get:

gcc: error: unrecognized command line option ‘-mfpu=auto’

and the compile fails.

A Raspberry Pi Zero is the cheapest and easiest armv6 option, although it does
take 24 hours to build the compiler.

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-12-03 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #5 from Andrew Roberts  ---
It looks like I was right about this all along; it's just that armv6l isn't
working. armv7l seems OK:

On RaspberryPi B - ARM1176 rev 7 (0x4100b760)
cat /proc/cpuinfo
processor   : 0
model name  : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS: 697.95
Features: half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xb76
CPU revision: 7

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv6l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv6l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv6l-unknown-linux-gnueabihf --build=armv6l-unknown-linux-gnueabihf
--with-arch=armv6 --with-float=hard --with-fpu=vfp --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC)

/usr/local/gcc/bin/gcc -march=native -mcpu=native -mtune=native -Q --target-help | grep "march=\|mtune=\|mcpu=\|mfpu="
  -march=   armv6zk+fp
  -mcpu=arm1176jz-s
  -mfpu=vfp
  -mtune=   arm1176jz-s

/usr/local/gcc/bin/gcc -o matrix-v6 -march=native -mcpu=native -mtune=native -mfpu=auto -O3 matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

whereas:
/usr/local/gcc/bin/gcc -o matrix-v6 -march=native -mcpu=native -mtune=native -mfpu=vfp -O3 matrix.c

is fine.

-mfpu=auto works on:
Raspberry Pi 3B - 4 x Cortex-A53 rev 4 (0x4100d030)
ODROID-XU4 - 4 x Cortex-A15 rev 3 (0x4100c0f0)/4 x Cortex-A7 rev 3 (0x4100c070)

On aarch64 -mfpu=auto gives:
gcc: error: unrecognized command line option ‘-mfpu=auto’

which is correct, but would it be better to silently accept it for
compatibility with 32-bit ARM?

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #33 from Andrew Roberts  ---
That second llvm command line should read:

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast mt19937ar.c -o mt19937ar

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #32 from Andrew Roberts  ---
For what it's worth, here's what the latest and greatest from the competition
has to offer:

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 887141 clocks

/usr/local/llvm-5.0.1-rc2/biznver1 -O3 mt19937ar.c -o mt19937ar
mt19937ar took 402282 clocks

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast matrix.c -o matrix
mult took 760913 clocks

/usr/local/llvm-5.0.1-rc2/bin/clang -march=znver1 -mtune=znver1 -Ofast mt19937ar.c -o mt19937ar
mt19937ar took 392527 clocks


Current gcc-8 snapshot:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1  -Ofast matrix.c -o matrix
mult took 364775 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -Ofast -o mt19937ar mt19937ar.c
mt19937ar took 430804 clocks

Current gcc-8 snapshot + extra options to improve znver1 performance:
/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none -mno-fma -Ofast matrix.c -o matrix
mult took 130329 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mno-avx2 -Ofast -o mt19937ar mt19937ar.c
mt19937ar took 387728 clocks

So gcc loses on mt19937ar.c without -mno-avx2, but wins big on matrix.c,
especially with -mprefer-vector-width=none -mno-fma.

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-11-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #3 from Andrew Roberts  ---
OK, confirmed: this bug is still present on the gcc-7 branch with the current
snapshot:

/usr/local/gcc-7.2.1/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-7.2.1/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.1/bin/../libexec/gcc/armv7l-unknown-linux-gnueabihf/7.2.1/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-7.3.0/configure --prefix=/usr/local/gcc-7.3.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 7.2.1 20171123 (GCC)

/usr/local/gcc-7.2.1/bin/gcc -march=native -mcpu=cortex-a53 -mfpu=auto -Ofast -o matrix matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

Also, the gcc man pages for 7.2.1 lack documentation for the -mfpu=auto option,
although it is accepted as an argument (gcc 8 does document it).

On the 7.2.1 man page:
  -mfpu=name
   This specifies what floating-point hardware (or hardware emulation)
   is available on the target.  Permissible names are: vfpv2, vfpv3,
   vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16,
   neon-vfpv3, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16, neon-vfpv4,
   fpv5-d16, fpv5-sp-d16, fp-armv8, neon-fp-armv8 and
   crypto-neon-fp-armv8.  Note that neon is an alias for neon-vfpv3
   and vfp is an alias for vfpv2.

On the 8.0.0 man page:
   -mfpu=name
   This specifies what floating-point hardware (or hardware emulation)
   is available on the target.  Permissible names are: auto, vfpv2,
   vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd,
   vfpv3xd-fp16, neon-vfpv3, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16,
   neon-vfpv4, fpv5-d16, fpv5-sp-d16, fp-armv8, neon-fp-armv8 and
   crypto-neon-fp-armv8.  Note that neon is an alias for neon-vfpv3
   and vfp is an alias for vfpv2.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #31 from Andrew Roberts  ---
Or for mt19937ar with -mno-avx2:

/usr/local/gcc/bin/gcc -march=$amarch -mtune=$amtune -mno-avx2 -O3 -o mt19937ar mt19937ar.c

Top 2:
mt19937ar took 358493 clocks -march=silvermont -mtune=bdver1
mt19937ar took 359933 clocks -march=corei7 -mtune=btver2

Top znver1:
mt19937ar took 363177 clocks -march=znver1 -mtune=k8-sse3
mt19937ar took 373751 clocks -march=slm -mtune=znver1
mt19937ar took 379094 clocks -march=znver1 -mtune=znver1

Worst cases:
mt19937ar took 683339 clocks -march=bdver3 -mtune=btver1
mt19937ar took 687566 clocks -march=btver2 -mtune=haswell
mt19937ar took 695629 clocks -march=athlon64-sse3 -mtune=sandybridge
mt19937ar took 697349 clocks -march=k8-sse3 -mtune=knl
mt19937ar took 697831 clocks -march=knl -mtune=core2
mt19937ar took 798283 clocks -march=opteron -mtune=athlon64-sse3

Running just for: -march=znver1 -mtune=znver1  -Ofast
mt19937ar took 445136 clocks
mt19937ar took 449784 clocks
mt19937ar took 460105 clocks

Running just for: -march=znver1 -mtune=znver1 -mno-avx2 -Ofast
mt19937ar took 416937 clocks
mt19937ar took 389458 clocks
mt19937ar took 389154 clocks

So -mno-avx2 gives a 13-14% gain, depending on how you look at it.
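"Depending on how you look at it" can be made concrete: the same pair of timings gives two different percentages depending on whether you report the fraction of run time saved or the speedup. A minimal sketch, using the best run from each set above (445136 vs 389154 clocks) as the assumed comparison; picking averages or other runs would shift the figures.

```c
/* Percentage of the original run time that was saved:
   100 * (before - after) / before */
static double pct_time_saved(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Speedup expressed as a percentage: 100 * (before / after - 1) */
static double pct_speedup(double before, double after)
{
    return 100.0 * (before / after - 1.0);
}
```

For 445136 vs 389154 clocks, pct_time_saved returns about 12.6 and pct_speedup about 14.4, bracketing the 13-14% quoted.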

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-11-29 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #2 from Andrew Roberts  ---
Correction:

1) This works on the gcc 8 snapshot; it doesn't work on gcc-7.2.0:

/usr/local/gcc-7.2.0/bin/gcc -march=native -mcpu=cortex-a53 -mfpu=auto -Ofast -o matrix matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

2) The current message when you do not select a CPU explicitly could be
improved to prompt you to do so:

/usr/local/gcc/bin/gcc -march=native -mfpu=auto -o matrix matrix.c
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU

It should really prompt the user to use -mcpu= to select a CPU.

3) This is the gcc version it doesn't work with. I'll check the latest gcc-7
snapshot to see if the gcc-8 fix has been backported.

 /usr/local/gcc-7.2.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-7.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.2.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-7.2.0/configure --prefix=/usr/local/gcc-7.2.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 7.2.0 (GCC)

[Bug driver/83206] -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

--- Comment #1 from Andrew Roberts  ---
This was tested using:

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC)

And it wasn't an ODROID-XU3; it was a Hardkernel ODROID-XU4.

[Bug driver/83207] New: On ARM -mcpu=native does not detect ARM big/little cpu combinations correctly (armv7l-unknown-linux-gnueabihf)

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83207

Bug ID: 83207
   Summary: On ARM -mcpu=native does not detect ARM big/little cpu
combinations correctly
(armv7l-unknown-linux-gnueabihf)
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

On ARM, autodetection of the CPU using -mcpu=native does not give the expected
results on ARM big/little combinations.

/usr/local/gcc/bin/gcc -mcpu=native -Q --help=target | grep mcpu
  -mcpu=cortex-a7

So it didn't pick:
  cortex-a15.cortex-a7

Tested on Hardkernel Odroid XU4
CPU Model:
4 x Cortex-A15 rev 3 (0x4100c0f0)
4 x Cortex-A7 rev 3 (0x4100c070)
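For reference, the part numbers embedded in the IDs above are 0xc0f (Cortex-A15) and 0xc07 (Cortex-A7), and each core type shows up as its own "CPU part" field in /proc/cpuinfo. Below is a minimal sketch of the kind of detection that would produce cortex-a15.cortex-a7 from that. It assumes a small hand-written lookup table, and guess_mcpu, lookup_core and the table are hypothetical names. This is not GCC's driver code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of ARM Ltd (implementer 0x41) part numbers seen in
 * this thread; 'big' marks the faster core of a big/little pair. */
struct arm_core { unsigned part; const char *name; int big; };
static const struct arm_core cores[] = {
    { 0xc07, "cortex-a7",  0 },
    { 0xc0f, "cortex-a15", 1 },
    { 0xd03, "cortex-a53", 0 },
};

static const struct arm_core *lookup_core(unsigned part)
{
    for (size_t i = 0; i < sizeof cores / sizeof cores[0]; i++)
        if (cores[i].part == part)
            return &cores[i];
    return NULL;
}

/* Scan /proc/cpuinfo-style text for "CPU part" fields and write an -mcpu
 * style name to out: one core name, or "big.little" when exactly two
 * distinct known parts are present. Returns 0 on success, -1 otherwise. */
int guess_mcpu(const char *cpuinfo, char *out, size_t outlen)
{
    assert(cpuinfo != NULL && out != NULL && outlen > 0);
    unsigned parts[2];
    int nparts = 0;
    const char *p = cpuinfo;

    while ((p = strstr(p, "CPU part")) != NULL) {
        const char *colon = strchr(p, ':');
        if (colon == NULL)
            break;
        /* base 16 also accepts the "0x" prefix used in cpuinfo */
        unsigned part = (unsigned)strtoul(colon + 1, NULL, 16);
        int seen = 0;
        for (int i = 0; i < nparts; i++)
            if (parts[i] == part)
                seen = 1;
        if (!seen) {
            if (nparts == 2)
                return -1;          /* more than two core types */
            parts[nparts++] = part;
        }
        p = colon + 1;
    }
    if (nparts == 0)
        return -1;

    const struct arm_core *a = lookup_core(parts[0]);
    if (a == NULL)
        return -1;
    if (nparts == 1) {
        snprintf(out, outlen, "%s", a->name);
        return 0;
    }
    const struct arm_core *b = lookup_core(parts[1]);
    if (b == NULL || a->big == b->big)
        return -1;                  /* unknown core, or not a big/little pair */
    const struct arm_core *big = a->big ? a : b;
    const struct arm_core *little = a->big ? b : a;
    snprintf(out, outlen, "%s.%s", big->name, little->name);
    return 0;
}
```

Given cpuinfo text from the XU4 containing both part numbers, this sketch returns cortex-a15.cortex-a7. By contrast, the report above shows -mcpu=native resolving to cortex-a7 alone.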

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC)

[Bug driver/83206] New: -mfpu=auto does not work on ARM (armv7l-unknown-linux-gnueabihf)

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83206

Bug ID: 83206
   Summary: -mfpu=auto does not work on ARM
(armv7l-unknown-linux-gnueabihf)
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

On ARM, one of the accepted values for -mfpu is auto, as listed when you do:

/usr/local/gcc/bin/gcc -mcpu=native -Q --help=target
...
  Known ARM FPUs (for use with the -mfpu= option):
auto crypto-neon-fp-armv8 fp-armv8 fpv4-sp-d16 fpv5-d16 fpv5-sp-d16 neon
neon-fp-armv8 neon-fp16 neon-vfpv3 neon-vfpv4 vfp vfp3 vfpv2 vfpv3 vfpv3-d16
vfpv3-d16-fp16 vfpv3-fp16 vfpv3xd vfpv3xd-fp16 vfpv4 vfpv4-d16

If you try:
/usr/local/gcc/bin/gcc -mcpu=native -mfpu=auto -Q --help=target
You get:
  -mfpu=auto

But if you try to use it:
gcc -march=native -mcpu=native -mtune=native -mfpu=auto -Ofast -o matrix
matrix.c
You get:
cc1: error: -mfloat-abi=hard: selected processor lacks an FPU
which isn't true as:
gcc -march=native -mcpu=native -mtune=native -mfpu=neon -Ofast -o matrix
matrix.c
works
as does
-mfpu=vfpv3-d16
etc

This is true on:
armv7l and armv6l at least, tested on:
ODROID-XU3: (ARM big/little Cortex-A15/A7)
Raspberry Pi B: (ARM ARM1176)
Raspberry Pi 2B v1: (ARM Cortex-A7)
Raspberry Pi 3B: (ARM Cortex-A53)
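When an -mfpu setting does compile, you can check which floating-point features it actually enabled by looking at the ARM C Language Extensions (ACLE) feature macros that GCC predefines (__ARM_FP, __ARM_NEON, __ARM_PCS_VFP). Below is a minimal sketch that assumes an ACLE-conforming compiler. The names fp_config and uses_hard_float_abi are hypothetical.

```c
/* Sketch: report the FP configuration this translation unit was compiled
 * for, using ACLE predefined macros. On non-ARM hosts none are defined. */
const char *fp_config(void)
{
#if defined(__ARM_NEON)
    return "hardware FP with Advanced SIMD (NEON)";
#elif defined(__ARM_FP)
    return "hardware FP (VFP), no NEON";
#else
    return "no hardware FP macros defined";
#endif
}

/* 1 when the 32-bit ARM hard-float calling convention is in use
 * (__ARM_PCS_VFP, as with -mfloat-abi=hard), else 0. */
int uses_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}
```

For example, compiling this with -mfpu=neon and then with -mfpu=vfpv3-d16 and printing fp_config() should show the difference. With -mfpu=auto the check above never gets that far, because cc1 rejects the option combination.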

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #29 from Andrew Roberts  ---
And rerunning all the tests for matrix.c on Ryzen using:
-march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -O3

The winners were:
mult took 118145 clocks -march=broadwell -mtune=broadwell
mult took 118912 clocks -march=core-avx2 -mtune=core-avx2

Top -mtune=znver1
mult took 121845 clocks -march=core-avx2 -mtune=znver1
mult took 129241 clocks -march=znver1 -mtune=znver1

And the bottom of the list no longer has a cluster of -mtune= btverX, bdverX,
znver1

Worst cases:
mult took 253400 clocks -march=x86-64 -mtune=haswell
mult took 254006 clocks -march=bonnell -mtune=westmere
mult took 254624 clocks -march=bonnell -mtune=silvermont
mult took 258577 clocks -march=bonnell -mtune=nehalem
mult took 260612 clocks -march=bonnell -mtune=corei7
mult took 277789 clocks -march=nocona -mtune=nano-x4

-

And rerunning all the tests for matrix.c on Ryzen using:
-march=$amarch -mtune=$amtune -mprefer-vector-width=none -mno-fma -mno-avx2
-Ofast

The winners were:
mult took 116405 clocks -march=broadwell -mtune=broadwell
mult took 117314 clocks -march=ivybridge -mtune=haswell
mult took 117551 clocks -march=broadwell -mtune=bdver2

Top znver1:
mult took 119951 clocks -march=knl -mtune=znver1
mult took 120442 clocks -march=znver1 -mtune=znver1

Worst cases:
mult took 239640 clocks -march=nehalem -mtune=bdver3
mult took 240623 clocks -march=athlon64-sse3 -mtune=silvermont
mult took 241143 clocks -march=eden-x2 -mtune=nano-2000
mult took 241547 clocks -march=core2 -mtune=intel
mult took 241870 clocks -march=nehalem -mtune=bdver2
mult took 248251 clocks -march=nocona -mtune=intel

The difference between broadwell and znver1 is, I would suggest, within the
margin of error with these options.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #28 from Andrew Roberts  ---
Adding -mno-avx2 into the mix was a marginal win, but only just rising above
the noise:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -mno-avx2 -O3 matrix.c -o matrix
   mult took 121397 clocks
   mult took 124373 clocks
   mult took 125345 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -O3 matrix.c -o matrix
mult took 123262 clocks
mult took 128193 clocks
mult took 125891 clocks

Using -Ofast instead of -O3

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -Ofast matrix.c -o matrix
mult took 125163 clocks
mult took 123799 clocks
mult took 122808 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -mno-avx2 -Ofast matrix.c -o matrix
mult took 130189 clocks
mult took 122726 clocks
mult took 123686 clocks
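With only three runs per configuration and run-to-run spread of several thousand clocks, one simple way to compare settings is to take the median of each set. Below is a minimal sketch. median_clocks is a hypothetical helper and is not part of the attached test script. For the -O3 runs above it gives 124373 clocks with -mno-avx2 against 125891 without, a difference of about 1.2%. For -Ofast the medians (123686 against 123799) are essentially equal.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Median of n clock() timings; note that it sorts the array in place. */
long median_clocks(long *t, size_t n)
{
    assert(t != NULL && n > 0);
    qsort(t, n, sizeof *t, cmp_long);
    return n % 2 ? t[n / 2] : (t[n / 2 - 1] + t[n / 2]) / 2;
}
```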

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #24 from Andrew Roberts  ---
For the mt19937ar test:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 mt19937ar.c -o mt19937ar
  mt19937ar took 462062 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-O3 mt19937ar.c -o mt19937ar
  mt19937ar took 412449 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -O3 mt19937ar.c -o mt19937ar
  mt19937ar took 419284 clocks

/usr/local/gcc/bin/gcc -march=haswell -mtune=haswell -mprefer-vector-width=none
-mno-fma -O3 mt19937ar.c -o mt19937ar
  mt19937ar took 436768 clocks

/usr/local/gcc/bin/gcc -march=corei7-avx -mtune=skylake -O3 mt19937ar.c -o
mt19937ar
  mt19937ar took 410302 clocks

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-28 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #23 from Andrew Roberts  ---
Thanks Honza,

getting closer, with original matrix.c on Ryzen:

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -O3 matrix.c -o matrix
mult took 364850 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-O3 matrix.c -o matrix
   mult took 194517 clocks

/usr/local/gcc/bin/gcc -march=znver1 -mtune=znver1 -mprefer-vector-width=none
-mno-fma -O3 matrix.c -o matrix
mult took 130343 clocks

/usr/local/gcc/bin/gcc -march=haswell -mtune=haswell -mprefer-vector-width=none
-mno-fma -O3 matrix.c -o matrix
mult took 130129 clocks

These last two are comparable with the fastest obtained from trying all
combinations of -march and -mtune

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #20 from Andrew Roberts  ---
Again those latest mt19937ar results above were with the current snapshot:

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/x86_64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-lto
--enable-multilib --with-tune=generic --with-arch_32=i686
--host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu
--disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC)

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #19 from Andrew Roberts  ---
Created attachment 42735
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42735&action=edit
modified mt19937ar test program, test script and results

tar -tf mt19937ar-test.tar.gz
./doit.csh               <= Test script, change path to gcc!
./mt19937ar.c            <= main function altered to give test results
./mt19937ar-haswell.txt  <= full results on Intel Core i5-4570S
./mt19937ar-ryzen.txt    <= full results on AMD Ryzen 7 1700 Eight-Core Processor

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #18 from Andrew Roberts  ---
OK, trying an entirely different algorithm gives the same results:

Using Mersenne Twister algorithm from here:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html

I altered the main program to comment out the original test harness, and
replaced main with:

#include <time.h>   /* for clock() and clock_t, if not already included */

int main(void)
{
    int i;
    unsigned long init[4]={0x123, 0x234, 0x345, 0x456}, length=4;
    init_by_array(init, length);
    clock_t e, s=clock();
    int j=genrand_int32();
    for(i=0; i<1; i++)
    {
        j ^= genrand_int32();
    }
    e=clock();
    /* sanity check that the generator output is unchanged */
    if (j != -549769613) printf("Error j != -549769613 (%d)\n", j);
    printf("mt19937ar took %ld clocks ", (long)(e-s));
    return 0;
}

So nothing complicated.
On Ryzen:


Top 5:
mt19937ar took 354877 clocks -march=amdfam10 -mtune=k8
mt19937ar took 356203 clocks -march=bdver2 -mtune=eden-x2
mt19937ar took 356534 clocks -march=nano-x2 -mtune=nano-1000
mt19937ar took 357321 clocks -march=athlon-fx -mtune=nano-x4
mt19937ar took 357634 clocks -march=bdver3 -mtune=nano-x2

Bot 5:
mt19937ar took 675052 clocks -march=nano -mtune=btver1
mt19937ar took 679826 clocks -march=k8 -mtune=nocona
mt19937ar took 681118 clocks -march=opteron -mtune=atom
mt19937ar took 689604 clocks -march=core2 -mtune=broadwell
mt19937ar took 699840 clocks -march=skylake -mtune=generic

Top -mtune=znver1
mt19937ar took 369722 clocks -march=nano-x2 -mtune=znver1

Top -march=znver1
mt19937ar took 375286 clocks -march=znver1 -mtune=silvermont

-march=znver1 -mtune=znver1 (aka native)
mt19937ar took 430875 clocks -march=znver1 -mtune=znver1

-march=haswell -mtune=haswell
mt19937ar took 402963 clocks -march=haswell -mtune=haswell

-march=k8 -mtune=k8
mt19937ar took 367890 clocks -march=k8 -mtune=k8

so -march=znver1 -mtune=znver1 is:
7% slower than tuning for haswell
17% slower than tuning for k8
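The percentages above are relative to the faster configuration: (430875 - 402963) / 402963 ≈ 6.9% and (430875 - 367890) / 367890 ≈ 17.1%. Below is a tiny helper for the same arithmetic. percent_slower is a hypothetical name.

```c
#include <assert.h>

/* How much slower, in percent, a run taking 'clocks' is than a
 * 'baseline' run; both are clock() counts from the same machine. */
double percent_slower(long clocks, long baseline)
{
    assert(baseline > 0);
    return 100.0 * (double)(clocks - baseline) / (double)baseline;
}
```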

Again -mtune=znver1, -mtune=bdverX, -mtune=btverX all cluster at the bottom

On Haswell:
--

Top 5:
mt19937ar took 29 clocks -march=amdfam10 -mtune=barcelona
mt19937ar took 29 clocks -march=amdfam10 -mtune=bdver1
mt19937ar took 29 clocks -march=amdfam10 -mtune=bdver2
mt19937ar took 29 clocks -march=amdfam10 -mtune=bdver3
mt19937ar took 29 clocks -march=amdfam10 -mtune=bdver4

Bot 5:
mt19937ar took 37 clocks -march=znver1 -mtune=bdver3
mt19937ar took 37 clocks -march=znver1 -mtune=bdver4
mt19937ar took 37 clocks -march=znver1 -mtune=btver2
mt19937ar took 37 clocks -march=znver1 -mtune=znver1
mt19937ar took 38 clocks -march=knl -mtune=bdver1

Top -mtune=haswell
mt19937ar took 30 clocks -march=bdver4 -mtune=haswell

Top -march=haswell
mt19937ar took 30 clocks -march=haswell -mtune=broadwell

-march=haswell -mtune=haswell (aka native)
mt19937ar took 30 clocks -march=haswell -mtune=haswell

Best performing pair:
mt19937ar took 29 clocks -march=barcelona -mtune=barcelona

So the haswell options are pretty much optimal on that hardware, as in the
other test.
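The Top 5 and Bot 5 lists above come from sorting one result line per -march/-mtune pair. The doit.csh script attached in comment #19 is not reproduced here, so the following is only a sketch of that ranking step. It assumes result lines in the format shown above, and struct result, parse_result and rank_results are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One benchmark result, parsed from a line such as
 * "mt19937ar took 354877 clocks -march=amdfam10 -mtune=k8". */
struct result {
    long clocks;
    char march[32];
    char mtune[32];
};

/* Parse one result line; returns 1 on success, 0 otherwise. The leading
 * program name ("mt19937ar", "mult", ...) is skipped. */
int parse_result(const char *line, struct result *r)
{
    assert(line != NULL && r != NULL);
    return sscanf(line, "%*s took %ld clocks -march=%31s -mtune=%31s",
                  &r->clocks, r->march, r->mtune) == 3;
}

static int by_clocks(const void *a, const void *b)
{
    long x = ((const struct result *)a)->clocks;
    long y = ((const struct result *)b)->clocks;
    return (x > y) - (x < y);
}

/* Sort fastest first: rs[0..k-1] is the "Top k", rs[n-k..n-1] the "Bot k". */
void rank_results(struct result *rs, size_t n)
{
    qsort(rs, n, sizeof *rs, by_clocks);
}
```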

[Bug driver/83193] Help for invalid -march= options from cc1 omits -march=native on x86-64, arm. aarch64, output also inconsistent

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83193

--- Comment #1 from Andrew Roberts  ---
The same comments also apply to the -mcpu and -mtune options as well. Double
output on arm for -mcpu, and missing help for native.

also:

gcc -Q --help=target
used to document the allowable -mcpu/-mtune options, but now only documents the
allowable -mfpu/-mfpmath= options (across ARM, AARCH64 and X86-64). This was
really helpful.

And on aarch64 the -Q --help=target option doesn't display -march, -mcpu and
-mtune consistently: it shows -march=ARCH, -mcpu=CPU and -mtune=CPU rather than
-march=, -mcpu= and -mtune= as other targets do.

AARCH64
/usr/local/gcc/bin/gcc -Q --help=target
The following options are target specific:
...  
  -march=ARCH   armv8-a
...
  -mcpu=CPU
...
  -mtune=CPU

ARM
/usr/local/gcc/bin/gcc -Q --help=target
The following options are target specific:
...
  -march=   armv7-a+fp
...
  -mcpu=
...
  -mtune=



X86-64
/usr/local/gcc/bin/gcc -Q --help=target
The following options are target specific:
...
  -march=   x86-64
...
  -mcpu= 
... 
  -mtune=   generic

Sorry to be so pedantic.

[Bug driver/83193] New: Help for invalid -march= options from cc1 omits -march=native on x86-64, arm. aarch64, output also inconsistent

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83193

Bug ID: 83193
   Summary: Help for invalid -march= options from cc1 omits
-march=native on x86-64, arm. aarch64, output also
inconsistent
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: driver
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

-march=native is no longer documented in the cc1 help message, and the help
output is buggy and inconsistent (missing on aarch64, given twice on arm).

ON X86-64
-

The cc1 help message when invalid -march= values are passed omits "native" as
an option on x86-64. This is happening on at least 8.0 snapshots and 7.2
branch.

/usr/local/gcc/bin/gcc -march=fdsfks -E - < /dev/null
# 1 "<stdin>"
cc1: error: bad value (‘fdsfks’) for ‘-march=’ switch
cc1: note: valid arguments to ‘-march=’ switch are: nocona core2 nehalem corei7
westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2
broadwell skylake skylake-avx512 cannonlake bonnell atom silvermont slm knl knm
x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8
k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10
barcelona bdver1 bdver2 bdver3 bdver4 znver1 btver1 btver2

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/x86_64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-lto
--enable-multilib --with-tune=generic --with-arch_32=i686
--host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu
--disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC) 

ON ARM
--

On arm, native was included in the list as of 7.x; now it is also missing,
AND THE INFO IS DISPLAYED TWICE:

/usr/local/gcc/bin/gcc -march=fdsfks -E - < /dev/null
gcc: error: unrecognized -march target: fdsfks
gcc: note: valid arguments are: armv2 armv2a armv3 armv3m armv4 armv4t armv5
armv5t armv5e armv5te armv5tej armv6 armv6j armv6k armv6z armv6kz armv6zk
armv6t2 armv6-m armv6s-m armv7 armv7-a armv7ve armv7-r armv7-m armv7e-m armv8-a
armv8.1-a armv8.2-a armv8.3-a armv8-m.base armv8-m.main armv8-r iwmmxt iwmmxt2
gcc: error: unrecognized -march target: fdsfks
gcc: note: valid arguments are: armv2 armv2a armv3 armv3m armv4 armv4t armv5
armv5t armv5e armv5te armv5tej armv6 armv6j armv6k armv6z armv6kz armv6zk
armv6t2 armv6-m armv6s-m armv7 armv7-a armv7ve armv7-r armv7-m armv7e-m armv8-a
armv8.1-a armv8.2-a armv8.3-a armv8-m.base armv8-m.main armv8-r iwmmxt iwmmxt2
gcc: error: missing argument to ‘-march=’

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171126 (experimental) (GCC)

ON AARCH64
--

On aarch64 no help is given:

/usr/local/gcc/bin/gcc -march=fdsfks -E - < /dev/null
# 1 "<stdin>"
cc1: error: unknown value ‘fdsfks’ for -march

/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/aarch64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linke

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #17 from Andrew Roberts  ---
The general consensus in userland is that the znver1 optimization is much worse
than 0.5%, or even 2% off. Most people are using -march=haswell if they care
about performance.

Just taking one part of one of my apps I see a 5% difference with
-march=haswell vs -march=znver1, and this is just general code (loading GL
extensions). 

The trick is to remove system dependencies from things I could benchmark. If
there are no recommendations, I'll come up with some tests myself for various
workloads, and try across various march/tune combos.

I'll also look at some other real world benchmarks that are available online.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #14 from Andrew Roberts  ---
It would be nice if znver1 for -march and -mtune could be improved before the
gcc 8 release. At present -march=znver1 -mtune=znver1 looks to be about the
worst thing you could do, and not just on this vectorizable code. And given we
tell people to use -march=native, which selects exactly this, it would be nice
to improve.

With the attached example, switching to larger vectors still only gets to 20
clocks, whereas other combinations get down to 116045:

mult took 116045 clocks -march=corei7-avx -mtune=skylake

So there is more going on here than just the vector length.

If there is any testing to isolate other options I would be happy to help, just
point me in the right direction. If there are good (open) benchmarks I can
routinely test on a range of targets I would be happy to. I have ryzen,
haswell, skylake, arm, aarch64, etc.

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-26 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #12 from Andrew Roberts  ---
OK, I've tried again with this week's snapshot:

gcc version 8.0.0 20171126 (experimental) (GCC) 

Taking combination of -march and -mtune which works well on Ryzen:

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=nocona -O3 matrix.c -o matrix
./matrix
mult took 131153 clocks

Then switching to -mtune=znver1

/usr/local/gcc/bin/gcc -march=core-avx-i -mtune=znver1 -O3 matrix.c -o matrix
./matrix
 mult took 231309 clocks

Then looking at the differences in the -Q --help=target output for these two
and eliminating each difference at a time, I found that:

gcc -march=core-avx-i -mtune=znver1 -mprefer-vector-width=none -O3 matrix.c -o
matrix
[aroberts@ryzen share]$ ./matrix
mult took 132295 clocks

The default for znver1 is: -mprefer-vector-width=128

So is this option still helping with the latest microcode? Not in this case at
least.

cat /proc/cpuinfo : 
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 23
model   : 1
model name  : AMD Ryzen 7 1700 Eight-Core Processor
stepping: 1
microcode   : 0x8001129

with -march=znver1 -mtune=znver1
with default of -mprefer-vector-width=128
mult took 386291 clocks

with -march=znver1 -mtune=znver1 -mprefer-vector-width=none
mult took 201455 clocks

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #10 from Andrew Roberts  ---
Created attachment 42691
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42691&action=edit
Script for matrix.c test program

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #9 from Andrew Roberts  ---
Created attachment 42690
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42690&action=edit
Test results for Skylake system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #8 from Andrew Roberts  ---
Created attachment 42689
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42689&action=edit
Test results for Haswell system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #7 from Andrew Roberts  ---
Created attachment 42688
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42688&action=edit
Test results for Ryzen system with matrix.c

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #6 from Andrew Roberts  ---
Created attachment 42687
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42687&action=edit
Test program used for the attached performance results (matrix.c)

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

--- Comment #5 from Andrew Roberts  ---
I've been testing on a Ryzen system and also comparing with Haswell and
Skylake. From my testing -mtune=znver1 does not perform well and never has,
including as of last snapshot:
gcc version 8.0.0 20171119 (experimental) (GCC)

-mtune=generic seems a better option for all three systems as a default for
-march=native

This is only with one test case (attached), but I've seen the same across many
other tests.

See the attached test case (matrix.c) and performance logs:
Ryzen - znver1-tunebug.txt
Haswell - znver1-tunebug2.txt
Skylake - znver1-tunebug3.txt

[Bug target/81616] Update -mtune=generic for the current Intel and AMD processors

2017-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81616

Andrew Roberts  changed:

   What|Removed |Added

 CC||andrewm.roberts at sky dot com

--- Comment #4 from Andrew Roberts  ---
I've been testing on a Ryzen system and also comparing with Haswell and
Skylake. From my testing -mtune=znver1 does not perform well and never has,
including as of last snapshot:
gcc version 8.0.0 20171119 (experimental) (GCC)

-mtune=generic seems a better option for all three systems as a default for
-march=native

This is only with one test case (attached), but I've seen the same across many
other tests.

See the attached test case (matrix.c) and performance logs:
Ryzen - znver1-tunebug.txt
Haswell - znver1-tunebug2.txt
Skylake - znver1-tunebug3.txt

[Bug target/82175] [8 Regression] -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-10-02 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

--- Comment #10 from Andrew Roberts  ---
Created attachment 42276
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42276&action=edit
Output of gcc -march=native -Q --help=target for gcc 7.2.0

Output of gcc -march=native -Q --help=target for gcc 7.2.0
generated on ODROIDXU4, armv7

[Bug target/82175] [8 Regression] -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-10-02 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

--- Comment #9 from Andrew Roberts  ---
Created attachment 42275
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42275&action=edit
Output of gcc -march=native -Q --help=target for gcc 8.0.0.20171001

[Bug target/82175] [8 Regression] -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-10-02 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

--- Comment #8 from Andrew Roberts  ---
I generated it using:
/usr/local/gcc-7.2.0/bin/gcc -march=native -Q --help=target
and
/usr/local/gcc-8.0.0/bin/gcc -march=native -Q --help=target

on each of the systems. Using:

/usr/local/gcc-8.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-8.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20171001 (experimental) (GCC)

and

/usr/local/gcc-7.2.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-7.2.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.2.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-7.2.0/configure --prefix=/usr/local/gcc-7.2.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 7.2.0 (GCC)

I've attached the full output from the ODROIDXU4 system.

[Bug target/82175] [8 Regression] -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-10-02 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

--- Comment #6 from Andrew Roberts  ---
Thanks Richard, this is now ok, tested on armv7 and aarch64. 

However, I do see differences in what is selected by -march=native on arm
between 7.2.0 and 8.0.0.20171001. Is this expected, or is it a work in progress?
There seem to be significant changes...

On aarch64: The only difference is: (< is gcc-7.2.0, > is gcc-8)

<   -mtls-size= [default]
---
>   -mtls-size= 24

On armv7: (tested on RPI, and ODROID XU4)
RPI:
<   -march= armv8-a+crc
---
>   -march= armv8-a+crc+simd (RPI)

ODROID XU4:
<   -march= armv7ve
---
>   -march= armv7ve+vfpv3-d16

Differences common to both RPI and ODROID XU4:
>   -mbe32  [enabled]
>   -mbe8   [disabled]

<   -mcpu=  [default]
<   -mfix-cortex-m3-ldrd[enabled]
---
>   -mcpu=  
>   -mfix-cortex-m3-ldrd[disabled]

<   -mrestrict-it   [enabled]
---
>   -mrestrict-it   [disabled]

<   -mstructure-size-boundary=  32
---
>   -mstructure-size-boundary=  8

<   -mthumb-interwork   [enabled]
---
>   -mthumb-interwork   [disabled]

<   -mtp=   auto
---
>   -mtp=   cp15

<   -mtune= [default]
---
>   -mtune=

[Bug target/82175] -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-09-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

--- Comment #1 from Andrew Roberts  ---
This also fails on a Raspberry PI 3 running armv7 in the same way.
Looks like the armv8 code has got mixed up with the armv7 code...

The RPI3 has:
4x ARM Cortex-A53 rev 4 (0x4100d030)

cat /proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 38.40
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4
...

It does work ok on a different armv8 (aarch64) system, the Odroid-C2, which uses
the same processor as the RPI3 but runs an aarch64 Linux OS.
4 x ARM Cortex-A53 rev 4 (0x4100d030)

cat /proc/cpuinfo
processor   : 0
BogoMIPS: 2.00
Features: fp asimd crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4
...

On the ODROID-C2 in aarch64 mode:
/usr/local/gcc/bin/gcc -Q --help=target | grep march
  -march=ARCH   armv8-a

[Bug c/82175] New: -march=native fails on armv7 big/little system armv7l-unknown-linux-gnueabihf with gcc 8.0.0

2017-09-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82175

Bug ID: 82175
   Summary: -march=native fails on armv7 big/little system
armv7l-unknown-linux-gnueabihf with gcc 8.0.0
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

gcc-7.2.0 is ok on this target, but gcc-8.0.0 fails to detect the native target.

cat > test.c
#include <stdio.h>

int main(void)
{
printf("Hello World\n");
return 0;
}
^D
/usr/local/gcc-8.0.0/bin/gcc -march=native -o test800 test.c
gcc: error: unrecognized -march target: native
gcc: note: valid arguments are: armv2 armv2a armv3 armv3m armv4 armv4t armv5
armv5t armv5e armv5te armv5tej armv6 armv6j armv6k armv6z armv6kz armv6zk
armv6t2 armv6-m armv6s-m armv7 armv7-a armv7ve armv7-r armv7-m armv7e-m armv8-a
armv8.1-a armv8.2-a armv8-m.base armv8-m.main armv8-r iwmmxt iwmmxt2
gcc: error: unrecognized -march target: native
gcc: note: valid arguments are: armv2 armv2a armv3 armv3m armv4 armv4t armv5
armv5t armv5e armv5te armv5tej armv6 armv6j armv6k armv6z armv6kz armv6zk
armv6t2 armv6-m armv6s-m armv7 armv7-a armv7ve armv7-r armv7-m armv7e-m armv8-a
armv8.1-a armv8.2-a armv8-m.base armv8-m.main armv8-r iwmmxt iwmmxt2
gcc: error: missing argument to ‘-march=’

However, --help=target does give a result, though it seems to use armv8 syntax.
/usr/local/gcc-8.0.0/bin/gcc -Q --help=target  |& grep march
  -march=   armv7-a+fp

gcc-7.2.0 gives:
/usr/local/gcc-7.2.0/bin/gcc -Q --help=target  |& grep march
  -march=   armv7-a

Both versions of gcc are configured identically apart from --prefix=
/usr/local/gcc-8.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-8.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --enable-default-pie --enable-default-ssp
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 8.0.0 20170910 (experimental) (GCC) 


Target/Host system is an ODROID-XU4 with 8 cores:
Cores 0..3: ARM Cortex-A7 rev 3 (0x4100c070)
Cores 4..7: ARM Cortex-A15 rev 3 (0x4100c0f0)
cat /proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 3 (v7l)
BogoMIPS: 18.00
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xc07
CPU revision: 3
...
processor   : 4
model name  : ARMv7 Processor rev 3 (v7l)
BogoMIPS: 18.00
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt 
vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part: 0xc0f
CPU revision: 3
...

[Bug bootstrap/81864] building gcc 8 with --enable-gather-detailed-mem-stats fails on x86-64, arm and aarch64 under gnu linux

2017-08-17 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81864

--- Comment #8 from Andrew Roberts  ---
aarch64 also ok with gcc-8.0.0 for me.

[Bug bootstrap/81864] building gcc 8 with --enable-gather-detailed-mem-stats fails on x86-64, arm and aarch64 under gnu linux

2017-08-17 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81864

--- Comment #7 from Andrew Roberts  ---
Works for me on x86-64, trying aarch64 now.

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-17 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #11 from Andrew Roberts  ---
Created attachment 41992
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41992=edit
gcc-7.2.0 -fmem-report output for arm, aarch64, and x86-64

Output for gcc 7.2.0 with -fmem-report (as gcc-7.2.0-fmem-report.tar.bz2).

g++ -Ox -fmem-report -c testmap.cpp
where -Ox is one of: -O0, -O1, -O2, -O3, or -O1 -fgcse

This is across: x64 (x86-64), arm, aarch64-rpi3 (aarch64).
Both Raspberry Pi 3 systems are identical; one runs a 32-bit OS, the other a
64-bit OS (Arch Linux ARM).

The files are named: gcc-7.2.0-[arch]-[opt].txt.

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-17 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #10 from Andrew Roberts  ---
I've attached the output for gcc 7.2.0 with -fmem-report (as
gcc-7.2.0-fmem-report.tar.bz2).

g++ -Ox -fmem-report -c testmap.cpp
where -Ox is one of: -O0, -O1, -O2, -O3, or -O1 -fgcse

This is across: x64 (x86-64), arm, aarch64-rpi3 (aarch64).
Both Raspberry Pi 3 systems are identical; one runs a 32-bit OS, the other a
64-bit OS (Arch Linux ARM).

The files are named: gcc-7.2.0-[arch]-[opt].txt.

The original issue was large memory usage increase for aarch64 vs arm, on -O2
and above. So looking at -O1 vs -O2 for the above.

There seem to be leaks in the Bitmaps:

              Total       Memory      Percentage
              Memory      Leaked      Leaked
arm -O1:      54067992    10582346    19.57%
arm -O2:      43536148    15595746    35.82%
aarch64 -O1:  39788848     9005047    22.63%
aarch64 -O2:  74521688    42694630    57.29% <= big increase on aarch64 at -O2

47% of the leaks at -O2 on aarch64 are in:
df-problems.c:1912 (df_mir_alloc)   543920:  0.7%  202813600  10167911: 23.8%   0   0  heap
df-problems.c:1913 (df_mir_alloc)   544080:  0.7%  202798720  10167165: 23.8%   0   0  heap

32% of the leaks at -O2 on x86-64 are also in the same place, so I guess this
is a 64bit code path.

I don't see anything else that stands out as different between arm and
aarch64 as they move from -O1 to -O2.
There are plenty of other leaks, though I have no idea how significant they
are.

The arm gcc is configured with:
/usr/local/gcc/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.2.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-7.2.0/configure --prefix=/usr/local/gcc-7.2.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --host=armv7l-unknown-linux-gnueabihf
--build=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 --disable-bootstrap --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 7.2.0 (GCC)

The aarch64 gcc is configured with:
/usr/local/gcc/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.0/libexec/gcc/aarch64-unknown-linux-gnu/7.2.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-7.2.0/configure --prefix=/usr/local/gcc-7.2.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--enable-shared --enable-clocale=gnu --with-arch-directory=aarch64
--enable-multiarch --disable-libssp --host=aarch64-unknown-linux-gnu
--build=aarch64-unknown-linux-gnu --with-arch=armv8-a --disable-bootstrap
--enable-gather-detailed-mem-stats
Thread model: posix
gcc version 7.2.0 (GCC)

The x86-64 gcc is configured with:
/usr/local/gcc/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/gcc-7.2.0/libexec/gcc/x86_64-unknown-linux-gnu/7.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-7.2.0/configure --prefix=/usr/local/gcc-7.2.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-lto
--enable-multilib --with-tune=generic --with-arch_32=i686
--host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu
--with-ld=/usr/local/bin/ld --with-gnu-ld --with-as=/usr/local/bin/as
--with-gnu-as --disable-bootstrap --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 7.2.0 (GCC)

[Bug bootstrap/81864] building gcc 8 with --enable-gather-detailed-mem-stats fails on x86-64, arm and aarch64 under gnu linux

2017-08-17 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81864

--- Comment #2 from Andrew Roberts  ---
I can confirm gcc 7.2.0 builds ok on x86-64, arm and aarch64 with
--enable-gather-detailed-mem-stats. 

So it's just 8.0.0 that is failing.

[Bug bootstrap/81864] New: building gcc 8 with --enable-gather-detailed-mem-stats fails on x86-64, arm and aarch64 under gnu linux

2017-08-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81864

Bug ID: 81864
   Summary: building gcc 8 with --enable-gather-detailed-mem-stats
fails on x86-64, arm and aarch64 under gnu linux
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Building gcc-8-20170806 with --enable-gather-detailed-mem-stats fails
on x64, arm and aarch64.

gcc-7.2.0 (release version) builds ok (at least on x64) with same options.
gcc-8.0.0 20170813 also fails on all.

on x64:

/home/aroberts/gcc/gcc-build/./gcc/xgcc -B/home/aroberts/gcc/gcc-build/./gcc/
-xc -nostdinc /dev/null -S -o /dev/null
-fself-test=../../gcc-8.0.0/gcc/testsuite/selftests
xgcc: internal compiler error: Segmentation fault (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [Makefile:1952: s-selftest-c] Error 4
rm fsf-funding.pod gcov.pod gpl.pod cpp.pod gfdl.pod gcc.pod gcov-dump.pod
gfortran.pod gcov-tool.pod
make[2]: Leaving directory '/home/aroberts/gcc/gcc-build/gcc'
make[1]: *** [Makefile:4305: all-gcc] Error 2
make[1]: Leaving directory '/home/aroberts/gcc/gcc-build'
make: *** [Makefile:918: all] Error 2

/home/aroberts/gcc/gcc-build/./gcc/xgcc -v -save-temps
-B/home/aroberts/gcc/gcc-build/./gcc/ -xc -nostdinc /dev/null -S -o /dev/null
-fself-test=../../gcc-8.0.0/gcc/testsuite/selftests
Reading specs from /home/aroberts/gcc/gcc-build/./gcc/specs
COLLECT_GCC=/home/aroberts/gcc/gcc-build/./gcc/xgcc
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --with-isl
--enable-languages=c,c++,fortran,lto --disable-libgcj --enable-lto
--enable-multilib --with-tune=generic --with-arch_32=i686
--host=x86_64-unknown-linux-gnu --build=x86_64-unknown-linux-gnu
--with-ld=/usr/local/bin/ld --with-gnu-ld --with-as=/usr/local/bin/as
--with-gnu-as --disable-bootstrap --enable-gather-detailed-mem-stats
Thread model: posix
gcc version 8.0.0 20170806 (experimental) (GCC) 
COLLECT_GCC_OPTIONS='-v' '-save-temps' '-B'
'/home/aroberts/gcc/gcc-build/./gcc/' '-nostdinc' '-S' '-o' '/dev/null'
'-fself-test=../../gcc-8.0.0/gcc/testsuite/selftests' '-mtune=generic'
'-march=x86-64'
 /home/aroberts/gcc/gcc-build/./gcc/cc1 -E -quiet -nostdinc -v -iprefix
/home/aroberts/gcc/gcc-build/gcc/../lib/gcc/x86_64-unknown-linux-gnu/8.0.0/
-isystem /home/aroberts/gcc/gcc-build/./gcc/include -isystem
/home/aroberts/gcc/gcc-build/./gcc/include-fixed /dev/null -mtune=generic
-march=x86-64 -fself-test=../../gcc-8.0.0/gcc/testsuite/selftests
-fpch-preprocess -o null.i
xgcc: internal compiler error: Segmentation fault (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Host OS:
Fedora 26 - x64

host gcc: 
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared
--enable-threads=posix --enable-checking=release --enable-multilib
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --with-isl --enable-libmpx
--enable-offload-targets=nvptx-none --without-cuda-driver
--enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686
--build=x86_64-redhat-linux
Thread model: posix
gcc version 7.1.1 20170622 (Red Hat 7.1.1-3) (GCC) 

host ld:
ld -v
GNU ld (GNU Binutils) 2.29

uname -a
Linux ryzen 4.12.5-300.fc26.x86_64 #1 SMP Mon Aug 7 15:27:25 UTC 2017 x86_64
x86_64 x86_64 GNU/Linux

cat /proc/cpuinfo
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 23
model   : 1
model name  : AMD Ryzen 7 1700 Eight-Core Processor
stepping: 1
microcode   : 0x8001126
cpu MHz : 1550.000
cache size  : 512 KB
physical id : 0
siblings: 16
core id : 0
cpu cores   : 8
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 13
wp   

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #8 from Andrew Roberts  ---
I've tried building gcc-8-20170806 and gcc-8-20170813 with
--enable-gather-detailed-mem-stats

This fails on x86-64, arm and aarch64 with the same error.

The recently released 7.2.0 builds ok on x86-64 at least; I'm still testing the
rest.

Shall I file a separate bug report for gcc-8?

The error is:
/home/aroberts/gcc/gcc-build/./gcc/xgcc -B/home/aroberts/gcc/gcc-build/./gcc/
-xc -nostdinc /dev/null -S -o /dev/null
-fself-test=../../gcc-8.0.0/gcc/testsuite/selftests
xgcc: internal compiler error: Segmentation fault (program cc1)
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [Makefile:1952: s-selftest-c] Error 4
rm fsf-funding.pod gcov.pod gpl.pod cpp.pod gfdl.pod gcc.pod gcov-dump.pod
gfortran.pod gcov-tool.pod
make[2]: Leaving directory '/home/aroberts/gcc/gcc-build/gcc'
make[1]: *** [Makefile:4305: all-gcc] Error 2
make[1]: Leaving directory '/home/aroberts/gcc/gcc-build'
make: *** [Makefile:918: all] Error 2

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #7 from Andrew Roberts  ---
I'll try the memory testing on both arm and aarch64.

I've also tried -fopt-info-all-optall, I was hoping this would provide some
info on what was happening, but it only seems to give any output under -O3.

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #6 from Andrew Roberts  ---
Looks like this info got purged by the bugzilla failure, here it is again:

Ok, I've done some more digging. 

Looking at the optimization options enabled by -O2 vs -O1, I built the test
program at -O1 and enabled each optimization in turn, on both ARM and AARCH64.

It looks like -fgcse is using the most memory of all the optimizations.
On ARM "-O1 -fgcse" is using MORE memory than "-O2". 

This suggests to me that on ARM the gcse optimization is not being run at -O2,
due to some cost/benefit analysis or similar, whereas it is on AARCH64. Is
there any way to get some info out of gcc to prove this?

On AARCH64 -fgcse results in a huge compile time increase, because the additional
memory usage causes massive swapping. ARM compile time increased by 14%, but
AARCH64 compile time increased by 400%. When there is enough RAM to avoid
swapping, -fgcse looks ok (2GB on the odroid-c2).

Tested using: gcc version 8.0.0 20170806 (experimental) (GCC) on
Raspberry PI 3 with 1GB RAM (both armv7l and aarch64).

For ARM:

Optimization Level: -O1 -falign-functions
Time=1:20.76 Mem=320040 PageFaults=0
Optimization Level: -O1 -falign-jumps
Time=1:21.10 Mem=319940 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=1:21.00 Mem=320028 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=1:20.62 Mem=320028 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=1:20.45 Mem=319884 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=1:22.01 Mem=320832 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=1:21.28 Mem=320164 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=1:20.47 Mem=32 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=1:42.07 Mem=320032 PageFaults=0
Optimization Level: -O1 -fdevirtualize-speculatively
Time=1:20.44 Mem=320008 PageFaults=0
Optimization Level: -O1 -fexpensive-optimizations
Time=1:22.92 Mem=321752 PageFaults=0
Optimization Level: -O1 -fgcse
Time=1:34.12 Mem=556640 PageFaults=0 <
Optimization Level: -O1 -fhoist-adjacent-loads
Time=1:20.45 Mem=319940 PageFaults=0
Optimization Level: -O1 -findirect-inlining
Time=1:21.31 Mem=320020 PageFaults=0
Optimization Level: -O1 -finline-small-functions
Time=1:32.36 Mem=319992 PageFaults=0
Optimization Level: -O1 -fipa-bit-cp
Time=1:21.13 Mem=320008 PageFaults=0
Optimization Level: -O1 -fipa-cp
Time=1:19.94 Mem=322140 PageFaults=0
Optimization Level: -O1 -fipa-icf
Time=1:21.50 Mem=319940 PageFaults=0
Optimization Level: -O1 -fipa-icf-functions
Time=1:20.93 Mem=320060 PageFaults=0
Optimization Level: -O1 -fipa-icf-variables
Time=1:20.48 Mem=320044 PageFaults=0
Optimization Level: -O1 -fipa-ra
Time=1:20.58 Mem=320284 PageFaults=0
Optimization Level: -O1 -fipa-sra
Time=1:12.69 Mem=310648 PageFaults=0
Optimization Level: -O1 -fipa-vrp
Time=1:20.45 Mem=319836 PageFaults=0
Optimization Level: -O1 -fisolate-erroneous-paths-dereference
Time=1:20.61 Mem=320024 PageFaults=0
Optimization Level: -O1 -flra-remat
Time=1:20.56 Mem=319944 PageFaults=0
Optimization Level: -O1 -foptimize-sibling-calls
Time=1:20.69 Mem=320012 PageFaults=0
Optimization Level: -O1 -foptimize-strlen
Time=1:21.10 Mem=320024 PageFaults=0
Optimization Level: -O1 -fpartial-inlining
Time=1:21.19 Mem=319888 PageFaults=0
Optimization Level: -O1 -fpeephole2
Time=1:20.75 Mem=319888 PageFaults=0
Optimization Level: -O1 -freorder-functions
Time=1:20.63 Mem=319884 PageFaults=0
Optimization Level: -O1 -frerun-cse-after-loop
Time=1:21.96 Mem=320984 PageFaults=0
Optimization Level: -O1 -fschedule-insns2
Time=1:24.68 Mem=343916 PageFaults=0
Optimization Level: -O1 -fschedule-insns
Time=1:52.77 Mem=324696 PageFaults=0
Optimization Level: -O1 -fstore-merging
Time=1:20.47 Mem=320208 PageFaults=0
Optimization Level: -O1 -fstrict-aliasing
Time=1:20.86 Mem=319880 PageFaults=0
Optimization Level: -O1 -fthread-jumps
Time=1:20.31 Mem=319900 PageFaults=0
Optimization Level: -O1 -ftree-pre
Time=1:21.38 Mem=320696 PageFaults=0
Optimization Level: -O1 -ftree-switch-conversion
Time=1:20.51 Mem=320004 PageFaults=0
Optimization Level: -O1 -ftree-tail-merge
Time=1:21.13 Mem=320040 PageFaults=0
Optimization Level: -O1 -ftree-vrp
Time=1:21.01 Mem=323032 PageFaults=0

For AARCH64:

Optimization Level: -O1 -falign-functions
Time=2:22.49 Mem=393844 PageFaults=150
Optimization Level: -O1 -falign-jumps
Time=2:20.70 Mem=393952 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=2:21.09 Mem=393880 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=2:20.68 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=2:20.98 Mem=393968 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=2:22.60 Mem=395656 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=2:21.69 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=2:21.12 Mem=393968 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=2:58.68 Mem=393412 PageFaults=0
Optimization Level: 

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-13 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #5 from Andrew Roberts  ---
Ok, I've done some more digging. 

Looking at the optimization options enabled by -O2 vs -O1, I built the test
program at -O1 and enabled each optimization in turn, on both ARM and AARCH64.

It looks like -fgcse is using the most memory of all the optimizations.
On ARM "-O1 -fgcse" is using MORE memory than "-O2". 

This suggests to me that on ARM the gcse optimization is not being run at -O2,
due to some cost/benefit analysis or similar, whereas it is on AARCH64. Is
there any way to get some info out of gcc to prove this?

On AARCH64 -fgcse results in a huge compile time increase, because the additional
memory usage causes massive swapping. ARM compile time increased by 14%, but
AARCH64 compile time increased by 400%. When there is enough RAM to avoid
swapping, -fgcse looks ok (2GB on the odroid-c2).

Tested using: gcc version 8.0.0 20170806 (experimental) (GCC) on
Raspberry PI 3 with 1GB RAM (both armv7l and aarch64).

For ARM:

Optimization Level: -O1 -falign-functions
Time=1:20.76 Mem=320040 PageFaults=0
Optimization Level: -O1 -falign-jumps
Time=1:21.10 Mem=319940 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=1:21.00 Mem=320028 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=1:20.62 Mem=320028 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=1:20.45 Mem=319884 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=1:22.01 Mem=320832 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=1:21.28 Mem=320164 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=1:20.47 Mem=32 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=1:42.07 Mem=320032 PageFaults=0
Optimization Level: -O1 -fdevirtualize-speculatively
Time=1:20.44 Mem=320008 PageFaults=0
Optimization Level: -O1 -fexpensive-optimizations
Time=1:22.92 Mem=321752 PageFaults=0
Optimization Level: -O1 -fgcse
Time=1:34.12 Mem=556640 PageFaults=0 <
Optimization Level: -O1 -fhoist-adjacent-loads
Time=1:20.45 Mem=319940 PageFaults=0
Optimization Level: -O1 -findirect-inlining
Time=1:21.31 Mem=320020 PageFaults=0
Optimization Level: -O1 -finline-small-functions
Time=1:32.36 Mem=319992 PageFaults=0
Optimization Level: -O1 -fipa-bit-cp
Time=1:21.13 Mem=320008 PageFaults=0
Optimization Level: -O1 -fipa-cp
Time=1:19.94 Mem=322140 PageFaults=0
Optimization Level: -O1 -fipa-icf
Time=1:21.50 Mem=319940 PageFaults=0
Optimization Level: -O1 -fipa-icf-functions
Time=1:20.93 Mem=320060 PageFaults=0
Optimization Level: -O1 -fipa-icf-variables
Time=1:20.48 Mem=320044 PageFaults=0
Optimization Level: -O1 -fipa-ra
Time=1:20.58 Mem=320284 PageFaults=0
Optimization Level: -O1 -fipa-sra
Time=1:12.69 Mem=310648 PageFaults=0
Optimization Level: -O1 -fipa-vrp
Time=1:20.45 Mem=319836 PageFaults=0
Optimization Level: -O1 -fisolate-erroneous-paths-dereference
Time=1:20.61 Mem=320024 PageFaults=0
Optimization Level: -O1 -flra-remat
Time=1:20.56 Mem=319944 PageFaults=0
Optimization Level: -O1 -foptimize-sibling-calls
Time=1:20.69 Mem=320012 PageFaults=0
Optimization Level: -O1 -foptimize-strlen
Time=1:21.10 Mem=320024 PageFaults=0
Optimization Level: -O1 -fpartial-inlining
Time=1:21.19 Mem=319888 PageFaults=0
Optimization Level: -O1 -fpeephole2
Time=1:20.75 Mem=319888 PageFaults=0
Optimization Level: -O1 -freorder-functions
Time=1:20.63 Mem=319884 PageFaults=0
Optimization Level: -O1 -frerun-cse-after-loop
Time=1:21.96 Mem=320984 PageFaults=0
Optimization Level: -O1 -fschedule-insns2
Time=1:24.68 Mem=343916 PageFaults=0
Optimization Level: -O1 -fschedule-insns
Time=1:52.77 Mem=324696 PageFaults=0
Optimization Level: -O1 -fstore-merging
Time=1:20.47 Mem=320208 PageFaults=0
Optimization Level: -O1 -fstrict-aliasing
Time=1:20.86 Mem=319880 PageFaults=0
Optimization Level: -O1 -fthread-jumps
Time=1:20.31 Mem=319900 PageFaults=0
Optimization Level: -O1 -ftree-pre
Time=1:21.38 Mem=320696 PageFaults=0
Optimization Level: -O1 -ftree-switch-conversion
Time=1:20.51 Mem=320004 PageFaults=0
Optimization Level: -O1 -ftree-tail-merge
Time=1:21.13 Mem=320040 PageFaults=0
Optimization Level: -O1 -ftree-vrp
Time=1:21.01 Mem=323032 PageFaults=0

For AARCH64:

Optimization Level: -O1 -falign-functions
Time=2:22.49 Mem=393844 PageFaults=150
Optimization Level: -O1 -falign-jumps
Time=2:20.70 Mem=393952 PageFaults=0
Optimization Level: -O1 -falign-labels
Time=2:21.09 Mem=393880 PageFaults=0
Optimization Level: -O1 -falign-loops
Time=2:20.68 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcaller-saves
Time=2:20.98 Mem=393968 PageFaults=0
Optimization Level: -O1 -fcode-hoisting
Time=2:22.60 Mem=395656 PageFaults=0
Optimization Level: -O1 -fcrossjumping
Time=2:21.69 Mem=393956 PageFaults=0
Optimization Level: -O1 -fcse-follow-jumps
Time=2:21.12 Mem=393968 PageFaults=0
Optimization Level: -O1 -fdevirtualize
Time=2:58.68 Mem=393412 PageFaults=0
Optimization Level: -O1 -fdevirtualize-speculatively
Time=2:20.83 Mem=393968 PageFaults=0

[Bug middle-end/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3

2017-08-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #4 from Andrew Roberts  ---
Looking at --param ggc-min-expand and --param ggc-min-heapsize

For gcc 8.0.0:
on arm with 1Gb RAM:
GGC heuristics: --param ggc-min-expand=93 --param ggc-min-heapsize=119808
on aarch64 with 1Gb RAM:
GGC heuristics: --param ggc-min-expand=88 --param ggc-min-heapsize=109859

So these are already slightly lower on aarch64 than on arm (presumably due to
less RAM being free after kernel usage: 789M on aarch64 vs 889M on arm).

Looking at individual optimizations:

As -O2 uses much more memory than -O1, I worked out which optimizations
differ between the two, and tried building at -O2 with each of these
optimizations disabled one by one.
The most any one optimization reduced the memory footprint by was 4%. So no
smoking gun there. The optimizations for -O2 are the same for arm and aarch64.

[Bug c++/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3 (memory-hog)

2017-08-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #3 from Andrew Roberts  ---
I've added the test results for the arm and aarch64 builds on Raspberry Pi3.
These show compilation time, memory used, and object file size for:
-O0, -Os, -O1, -O2, -O3
using gcc 5.4.0, 6.4.0, 7.2.0, and 8.0.0

7.2.0 is the rc2 version; 8.0.0 is the latest weekly snapshot.

The memory and compilation time issues are not specific to any of the above
gcc versions. It seems to have always been an issue with aarch64.

[Bug c++/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3 (memory-hog)

2017-08-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #2 from Andrew Roberts  ---
Created attachment 41975
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41975=edit
Full test results for aarch64 on Raspberry Pi3

Test results for -O0, -Os, -O1, -O2, -O3 for gcc 5.4.0, 6.4.0, 7.2.0, 8.0.0
on Raspberry Pi3 running 64 bit kernel (aarch64).

[Bug c++/81818] aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3 (memory-hog)

2017-08-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

--- Comment #1 from Andrew Roberts  ---
Created attachment 41974
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41974=edit
Full test report for Raspberry Pi ARM

Test results for -O0, -Os, -O1, -O2, -O3 for gcc 5.4.0, 6.4.0, 7.2.0, 8.0.0
on Raspberry Pi3 running 32 bit kernel (arm).

[Bug c++/81818] New: aarch64 uses 2-3x memory and 2x time of arm at -Os, -O2, -O3 (memory-hog)

2017-08-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81818

Bug ID: 81818
   Summary: aarch64  uses 2-3x memory and 2x time of arm at -Os,
-O2, -O3 (memory-hog)
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Created attachment 41973
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41973=edit
System independent test program to demonstrate the issue.

I've run into problems building both gcc itself and my application
on aarch64. The system was running out of memory, and the compiler was
aborting with an ICE:

g++: internal compiler error: Killed (program cc1plus)

I've raised the issue on the gcc list (gcc behavior on memory exhaustion), and 
people are looking at getting a better error message indicating that out of 
memory may be an issue.

The remaining issue is what can be done about the memory usage of the aarch64 
version of gcc, which seems much worse than arm, and x64.

--
gcc on aarch64 uses 3x the memory of arm, and is 2.2x slower in compiling.
This is apparent at -Os, -O2 and -O3.
--

I've cut my program down and made it system independent (testmap.cpp,
attached).
The program consists of two functions, one of which populates a multimap
with 2400 inserts. Basically it's just doing:

#include <map>
typedef std::multimap EnumMap_t;
static EnumMap_t EnumMap;
...
EnumMap.insert(EnumMap_t::value_type(0u, "0"));
EnumMap.insert(EnumMap_t::value_type(1u, "1"));
...
EnumMap.insert(EnumMap_t::value_type(2399u, "2399"));
... 

I've built this across x64 (Ryzen), arm (Raspberry Pi3), and aarch64 (Raspberry
Pi3, and Odroid-C2). x64 is on Fedora; the rest are on Arch Linux Arm. The
Raspberry Pis have 1GB RAM, the ODroid 2GB, and the x64 machine 32GB.

Compiling this single file exhausts most of the RAM on the Raspberry Pi, so
any parallel build fails, or slows right down if a swap file is used.

I've attached log files for builds at -O0, -O1, -O2, -O3 and -Os on all the
systems, using gcc 5.4.0, 6.4.0, 7.2.0rc2 and 8.0.0 snapshot.

Here is a summary of the results: 
All were built using:
gcc -Ox -c testmap.cpp
where -Ox is one of -O0, -Os, -O1, -O2, -O3

Memory Usage (KB)
-O0         5.4.0   6.4.0   7.2rc2  8.0.0
x64         223676  223688  223736  223728
arm         156204  156336  156336  156292
pi aarch64  224324  224596  224424  224572
od aarch64  217492  217604  217492  217540

-Os         5.4.0   6.4.0   7.2rc2  8.0.0
x64         392448  392512  392688  392680
arm         205724  205792  205896  205664
pi aarch64  422520  422636  422208  422604  <= Higher than x64, 2x arm
od aarch64  416776  416260  416684  416708  <= Higher than x64, 2x arm

-O1         5.4.0   6.4.0   7.2rc2  8.0.0
x64         394596  394568  394352  394232
arm         319976  319896  319900  319840
pi aarch64  393944  393996  393836  394000
od aarch64  391636  391652  391636  391640

-O2         5.4.0   6.4.0   7.2rc2  8.0.0
x64         628816  628972  628772  628896
arm         267832  267860  267716  267836
pi aarch64  815260  784288  799196  812504  <= Higher than x64, 3x arm
od aarch64  813252  813068  813052  813084  <= Higher than x64, 3x arm

-O3         5.4.0   6.4.0   7.2rc2  8.0.0
x64         629284  629472  629116  629236
arm         266364  266264  266240  266412
pi aarch64  724168  723760  724000  724148  <= Higher than x64, 2.7x arm
od aarch64  718628  718388  718608  718608  <= Higher than x64, 2.7x arm
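The multipliers flagged in the tables can be checked directly. This sketch hard-codes the gcc 8.0.0 column for arm and pi aarch64 (same Raspberry Pi 3 hardware) and divides one by the other; the struct and function names are illustrative only.

```cpp
#include <array>
#include <string>

// Memory figures (KB) from the gcc 8.0.0 column of the tables above,
// comparing 32-bit arm with aarch64 on the same Raspberry Pi 3 hardware.
struct MemRow {
    std::string level;
    long arm_kb;
    long pi_aarch64_kb;
};

const std::array<MemRow, 5> kGcc8Rows = {{
    {"-O0", 156292, 224572},
    {"-Os", 205664, 422604},
    {"-O1", 319840, 394000},
    {"-O2", 267836, 812504},
    {"-O3", 266412, 724148},
}};

// How many times more memory the aarch64 compiler used than the arm one.
double aarch64_vs_arm(const MemRow& row) {
    return static_cast<double>(row.pi_aarch64_kb) /
           static_cast<double>(row.arm_kb);
}
```

This gives roughly 3.03x at -O2, 2.72x at -O3 and 2.05x at -Os, matching the annotations, while the gap is much smaller at -O0 (about 1.44x) and -O1 (about 1.23x).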

It's a similar story with compile times. I'll just compare apples with apples
here (identical hardware, just arm vs aarch64 distribution/compiler):

-Os         5.4.0    6.4.0    7.2rc2   8.0.0
arm         3:05.82  3:06.41  3:03.30  3:05.58
pi aarch64  5:59.43  6:07.95  6:04.69  5:55.98  <= 2.0x arm

-O3         5.4.0    6.4.0    7.2rc2   8.0.0
arm         2:14.83  2:15.77  2:14.87  2:15.94
pi aarch64  5:02.46  5:02.44  5:02.47  5:02.46  <= 2.2x arm
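The compile-time tables use m:ss.xx elapsed times, so computing a slowdown needs a conversion to seconds first. A small sketch (the function names are illustrative):

```cpp
#include <cstddef>
#include <string>

// Converts an elapsed time written as "m:ss.xx" (the format in the tables
// above) into seconds. A string without a colon is read as plain seconds.
// Malformed input makes std::stod throw std::invalid_argument.
double elapsed_seconds(const std::string& text) {
    const std::size_t colon = text.find(':');
    if (colon == std::string::npos) {
        return std::stod(text);
    }
    const double minutes = std::stod(text.substr(0, colon));
    const double seconds = std::stod(text.substr(colon + 1));
    return minutes * 60.0 + seconds;
}

// Ratio of the aarch64 compile time to the arm compile time.
double slowdown(const std::string& aarch64_time, const std::string& arm_time) {
    return elapsed_seconds(aarch64_time) / elapsed_seconds(arm_time);
}
```

With the gcc 5.4.0 column, slowdown("5:59.43", "3:05.82") is about 1.93 for -Os and slowdown("5:02.46", "2:14.83") about 2.24 for -O3.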

Both arm and aarch64 versions are using the same binutils:
GNU ld (GNU Binutils) 2.28.0.20170506

I built the compilers myself using same options for all versions:

ARM:
/usr/local/gcc/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-8.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/8.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-8.0.0/configure --prefix=/usr/local/gcc-8.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
-

[Bug bootstrap/78471] New: gcc-7-20161120 and truck fail to build on armv7l with ICE in cp-demangle.c, earlier snapshots ok

2016-11-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78471

Bug ID: 78471
   Summary: gcc-7-20161120 and truck fail to build on armv7l with
ICE in cp-demangle.c, earlier snapshots ok
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Created attachment 40110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40110&action=edit
Preprocessed gcc source file

Both gcc-7-20161120 and trunk as of 20161122 fail to build on armv7l
(Raspberry Pi 3) with an ICE. Earlier weekly snapshots have been fine (I tested
the previous snapshot again with the current Arch toolchain and it's fine).
The 20161120 snapshot builds OK on x86_64 (CentOS 7) and aarch64 (Arch,
ODROID-C2).

Host GCC:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/armv7l-unknown-linux-gnueabihf/6.2.1/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib
--libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://github.com/archlinuxarm/PKGBUILDs/issues
--enable-languages=c,c++,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id
--enable-lto --enable-plugin --enable-install-libiberty
--with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib
--disable-werror --enable-checking=release
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16
Thread model: posix
gcc version 6.2.1 20160830 (GCC)

Host ld:
ld -v
GNU ld (GNU Binutils) 2.27

uname -a
Linux alarmpi 4.4.33-1-ARCH #1 SMP Sat Nov 19 14:09:17 MST 2016 armv7l
GNU/Linux

Host is a Raspberry Pi 3 running Arch Linux.

Configured with:
../gcc-7.0.0/configure --prefix=/usr/local/gcc-7.0.0 --program-suffix=
--disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --host=armv7l-unknown-linux-gnueabihf
--build=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 --disable-bootstrap

Built using:
make

Intree libraries used:
gmpver=6.1.1
mpcver=1.0.3
mpfrver=3.1.5
cloogver=0.18.1
islver=0.16.1

The build fails in:
armv7l-unknown-linux-gnueabihf/libsanitizer/libbacktrace
building cp-demangle.c

Compiler output:
cd
/home/alarm/gcc/gcc-build/armv7l-unknown-linux-gnueabihf/libsanitizer/libbacktrace
[root@alarmpi libbacktrace]#  /home/alarm/gcc/gcc-build/./gcc/xgcc -v
-save-temps -B/home/alarm/gcc/gcc-build/./gcc/
-B/usr/local/gcc-7.0.0/armv7l-unknown-linux-gnueabihf/bin/
-B/usr/local/gcc-7.0.0/armv7l-unknown-linux-gnueabihf/lib/ -isystem
/usr/local/gcc-7.0.0/armv7l-unknown-linux-gnueabihf/include -isystem
/usr/local/gcc-7.0.0/armv7l-unknown-linux-gnueabihf/sys-include -DHAVE_CONFIG_H
-I. -I../../../../gcc-7.0.0/libsanitizer/libbacktrace -I.. -I
../../../../gcc-7.0.0/libsanitizer/../include -I
../../../../gcc-7.0.0/libsanitizer/../libgcc -I ../../libgcc -I .. -I
../../../../gcc-7.0.0/libsanitizer -I
../../../../gcc-7.0.0/libsanitizer/../libbacktrace -W -Wall -Wwrite-strings
-Wmissing-format-attribute -Wcast-qual -Werror -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition -g -O2 -march=armv7-a -pipe -MT
cp-demangle.lo -MD -MP -MF .deps/cp-demangle.Tpo -c
../../../../gcc-7.0.0/libsanitizer/libbacktrace/../../libiberty/cp-demangle.c
-o cp-demangle.o
xgcc: warning: -pipe ignored because -save-temps specified
Reading specs from /home/alarm/gcc/gcc-build/./gcc/specs
COLLECT_GCC=/home/alarm/gcc/gcc-build/./gcc/xgcc
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-7.0.0/configure --prefix=/usr/local/gcc-7.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--disable-libssp --host=armv7l-unknown-linux-gnueabihf
--build=armv7l-unknown-linux-gnueabihf --with-a

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-27 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #25 from Andrew Roberts  ---
The patch works on native armv7l-unknown-linux-gnueabihf with:
gcc-6-20160320
and in tree
gmp 6.1.0
mpc 1.0.3
mpfr 3.1.4
isl 0.16.1

Although I wasn't seeing a problem with check-mpc, at least the build now
completes without needing the GMP snapshot or using sed to change none- to
`uname -m`- in the Makefile.

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-22 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #22 from Andrew Roberts  ---
Tested with:
gcc-6-20160313
and in-tree:
gmp-6.1.99-20160321
mpc-1.0.3
mpfr-3.1.4
isl-0.16.1

On:
armv7l Arch Linux Arm (Raspberry Pi 3) (not bootstrapped yet due to build time)
This also builds ok with new GMP snapshot.

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/6.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix
--with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp
--enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function
--disable-multilib --disable-werror --enable-checking=release
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--target=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16
--disable-bootstrap
Thread model: posix
gcc version 6.0.0 20160313 (experimental) (GCC)

ld -v
GNU ld (GNU Binutils) 2.26.0.20160302

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-21 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #21 from Andrew Roberts  ---
Tested with:
gcc-6-20160313
and in-tree:
gmp-6.1.99-20160321
mpc-1.0.3
mpfr-3.1.4
isl-0.16.1

On:
x86_64 Centos 7 (Full bootstrap)
This is Ok.

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/x86_64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --with-tune=generic
--enable-multilib --with-arch_32=i686 --host=x86_64-unknown-linux-gnu
--build=x86_64-unknown-linux-gnu --with-ld=/usr/local/bin/ld --with-gnu-ld
--with-as=/usr/local/bin/as --with-gnu-as --enable-bootstrap
Thread model: posix
gcc version 6.0.0 20160313 (experimental) (GCC)
/usr/local/bin/ld -v
GNU ld (GNU Binutils) 2.26.20160125


aarch64  Arch Linux Arm (ODroid-C2) (not stable enough to bootstrap)
This is Ok.

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu --enable-plugin
--enable-initfini-array --enable-gnu-indirect-function --enable-lto --with-isl
--enable-languages=c,c++,fortran --disable-libgcj --enable-clocale=gnu
--disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--enable-shared --enable-clocale=gnu --with-arch-directory=aarch64
--enable-multiarch --host=aarch64-unknown-linux-gnu
--build=aarch64-unknown-linux-gnu --with-arch=armv8-a --disable-bootstrap
Thread model: posix
gcc version 6.0.0 20160313 (experimental) (GCC)


armv7l Arch Linux Arm (Raspberry Pi 3) (not bootstrapped yet due to build time)

I haven't got this working yet; I'm not sure whether the new GMP or changes
in my build scripts are the issue. I'll go back to the released GMP 6.1 and my
working script, then post the results later today or early tomorrow.

[Bug target/70133] AArch64 -mtune=native generates improperly formatted -march parameters

2016-03-18 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70133

Andrew Roberts  changed:

   What|Removed |Added

 CC||andrewm.roberts at sky dot com

--- Comment #4 from Andrew Roberts  ---
I've built the latest snapshot on an Arch Linux Arm aarch64 ODROID-C2 system and
see the same thing:

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --disable-werror --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id
--with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --enable-gnu-indirect-function
--enable-lto --with-isl --enable-languages=c,c++,fortran --disable-libgcj
--enable-clocale=gnu --disable-libstdcxx-pch --enable-install-libiberty --disable-multilib
--enable-shared --enable-clocale=gnu --with-arch-directory=aarch64
--enable-multiarch --host=aarch64-unknown-linux-gnu --build=aarch64-unknown-linux-gnu
--target=aarch64-unknown-linux-gnu --with-arch=armv8-a --disable-bootstrap
Thread model: posix
gcc version 6.0.0 20160313 (experimental) (GCC) 

echo "int main(void) { return 0; }" | /usr/local/gcc-6.0.0/bin/gcc -march=native -c -x c -
Assembler messages:
Error: must specify extensions to add before specifying those to remove
Error: unrecognized option -march=armv8-a+fp+simd+nocrypto+crc+nolse

Whereas:
echo "int main(void) { return 0; }" | /usr/local/gcc-6.0.0/bin/gcc -march=armv8-a+simd+crc+nolse -c -x c -

works
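The first assembler error states the rule the auto-generated string breaks: extensions to add must come before those to remove, yet armv8-a+fp+simd+nocrypto+crc+nolse puts +nocrypto ahead of +crc. The hypothetical helper below (not part of GCC or binutils) only illustrates that ordering by moving "no" modifiers to the end; it is an illustration, not a patch.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: given an -march value such as
// "armv8-a+fp+simd+nocrypto+crc+nolse", move every "+noXXX" modifier after
// the positive ones, keeping the relative order within each group.
// ASSUMPTION: any modifier starting with "no" is treated as a removal.
std::string reorder_march(const std::string& march) {
    std::vector<std::string> parts;
    std::stringstream ss(march);
    std::string item;
    while (std::getline(ss, item, '+')) {
        parts.push_back(item);
    }
    if (parts.empty()) return march;

    std::string adds, removes;
    for (std::size_t i = 1; i < parts.size(); ++i) {
        const bool is_removal = parts[i].compare(0, 2, "no") == 0;
        (is_removal ? removes : adds) += "+" + parts[i];
    }
    return parts[0] + adds + removes;  // base architecture first
}
```

reorder_march("armv8-a+fp+simd+nocrypto+crc+nolse") returns "armv8-a+fp+simd+crc+nocrypto+nolse", which satisfies the ordering rule quoted in the error; whether binutils 2.26 accepts every extension in it was not tested here.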

This is with binutils:
ld -v
GNU ld (GNU Binutils) 2.26.0.20160302

cat /proc/cpuinfo
Processor   : AArch64 Processor rev 4 (aarch64)
processor   : 0
processor   : 1
processor   : 2
processor   : 3
Features: fp asimd crc32
CPU implementer : 0x41
CPU architecture: AArch64
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

Hardware: ODROID-C2
Revision: 020b

uname -a
Linux alarm 3.14.29-10-ARCH #1 SMP PREEMPT Wed Mar 16 20:13:56 MDT 2016 aarch64 
GNU/Linux

I also saw the same thing with the Linaro compiler on Ubuntu and with Arch
Linux's gcc 5.3.0. I did try to build the gcc 6 snapshot on Ubuntu to report it,
but it was too flaky; Arch works better.

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-16 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #15 from Andrew Roberts  ---
Marc,

It's not entirely clear what you mean by reproducing the issue without
downloading mpfr, mpc, isl etc. Do you mean the missing symbol in GMP or the
issues with GMP when using assembly code? Could you please clarify?

I've built GMP 6.1.0 on its own on ARM armv7l, using the following
configurations:

../gmp-6.1.0/configure --prefix=/home/alarm/gcc/gmp/gmp-default
# This picks armv7lneon as cpu
make
make check
# No Failures
make install
nm ../gmp-default/lib/libgmp.so | grep gmpn_invert_limb
# 0004cca8 T __gmpn_invert_limb

../gmp-6.1.0/configure --prefix=/home/alarm/gcc/gmp/gmp-armv7l \
   --target=armv7l-linux-gnu \
   --build=armv7l-linux-gnu \
   --host=armv7l-linux-gnu
make
make check
# No Failures
make install
nm ../gmp-armv7l/lib/libgmp.so | grep gmpn_invert_limb
# 0004cca8 T __gmpn_invert_limb

../gmp-6.1.0/configure --prefix=/home/alarm/gcc/gmp/gmp-none \
--target=none-linux-gnu \
--build=none-linux-gnu \
--host=none-linux-gnu
make
make check
# No Failures
make install
nm ../gmp-none/lib/libgmp.a | grep gmpn_invert_limb
# no .so built, gmpn_invert_limb not in .a file


I note that GMP 6.1.0 at least has a configure option, --enable-assembly=no,
which seems to disable the assembly code. But, as with the none-linux-gnu
configuration, this results in no gmpn_invert_limb symbol:

../gmp-6.1.0/configure --prefix=/home/alarm/gcc/gmp/gmp-armv7l-noasm \
   --target=armv7l-linux-gnu \
   --build=armv7l-linux-gnu \
   --host=armv7l-linux-gnu \
   --enable-assembly=no
make
make check
# No Failures
make install
nm ../gmp-armv7l-noasm/lib/libgmp.so | grep gmpn_invert_limb
# gmpn_invert_limb not in .so or .a file
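Piping nm into grep, as above, matches any line containing the name, including undefined references (nm type U). When checking whether a GMP build actually provides __gmpn_invert_limb, it can help to look only for a defined global text symbol (type T). A minimal sketch that parses nm's default output format; the function name is illustrative:

```cpp
#include <sstream>
#include <string>

// Returns true if `nm_output` (the text printed by `nm` for a library or
// object) lists `symbol` as a defined global text symbol, i.e. a line of the
// form "<address> T <name>". Undefined references ("U <name>", no address)
// and archive member headers such as "divrem_1.o:" are ignored.
bool defines_symbol(const std::string& nm_output, const std::string& symbol) {
    std::istringstream lines(nm_output);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string address, type, name;
        if (!(fields >> address >> type >> name)) {
            continue;  // fewer than three fields: header, blank or "U" line
        }
        if (type == "T" && name == symbol) {
            return true;
        }
    }
    return false;
}
```

For example, save the output of nm ../gmp-none/lib/libgmp.a to a file, read it into a string, and call defines_symbol(text, "__gmpn_invert_limb").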

So I think the problem is that ISL is using a symbol which isn't available on
all targets when no assembler is selected. But there might be merit in
investigating the --enable-assembly=no configure flag and which GMP versions
support it.

All of this does raise several issues:
1) If we are trying to disable assembler in GMP for some reason, shouldn't we
tell people building GMP outside of the tree to configure it that way? It seems
strange to have spent time and effort adding all those configuration switches
to find GMP, MPFR, ISL, MPC etc., but then to say you really need to build it in
a special way without documenting it...

2) I've also found that different gcc versions require different versions of
ISL to build. For example, ISL 0.14 does not work with gcc < 5.0.0; you need
ISL 0.12.2. If there are dependencies on which versions gcc needs for a given
release, shouldn't they also be documented (and the download_prerequisites
script updated accordingly)?

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-14 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #11 from Andrew Roberts  ---
On a native ARM platform the bootstrap does work with the old in-tree GMP 4.3.2,
regardless of whether you use none-linux-gnu or armv7l-linux-gnu when
configuring GMP.

Building by patching the top-level Makefile to replace none- with armv7l-:
../gcc-6.0.0/configure ...
uname_m=`uname -m`
sed -i "s/none-/${uname_m}-/" Makefile
make
make install

Using:
gmp -> ../gmp-4.3.2
mpc -> ../mpc-0.8.1
mpfr -> ../mpfr-2.4.2 (plus latest patches to mpfr 2.4.2)
isl -> ../isl-0.16.1

^ Builds Ok and compiles simple program (not bootstrapped due to time taken)

Using:
gmp -> ../gmp-4.3.2
mpc -> ../mpc-0.8.1
mpfr -> ../mpfr-2.4.2 (plus latest patches to mpfr 2.4.2)
isl -> ../isl-0.15 (download_prerequisites version)

^ Builds Ok and compiles simple program (not bootstrapped due to time taken)
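The sed command above rewrites only the first occurrence of none- on each line of the generated Makefile (there is no g flag). That is enough here because --host and --target sit on separate lines of the configure-gmp rule quoted in bug 70211. This sketch reproduces that per-line behavior so the effect can be checked on a sample; like the sed, it would also touch any unrelated line containing none-. The function name is illustrative.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Mimics `sed "s/none-/${machine}-/" Makefile`: on each line, replace only
// the first occurrence of "none-" (no g flag). Every output line ends in
// '\n', even if the input's last line did not.
std::string sed_replace_none(const std::string& makefile,
                             const std::string& machine) {
    const std::string needle = "none-";
    std::istringstream in(makefile);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t pos = line.find(needle);
        if (pos != std::string::npos) {
            line.replace(pos, needle.size(), machine + "-");
        }
        out << line << '\n';
    }
    return out.str();
}
```

Calling sed_replace_none(text, "armv7l") on the configure-gmp lines turns both --host=none-... and --target=none-... into armv7l-..., while a line such as "none-none-" would only have its first match changed.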


Building WITHOUT patching the top-level Makefile, but with bootstrapping:
../gcc-6.0.0/configure ... --enable-bootstrap
make bootstrap-lean
make install

Using:
gmp -> ../gmp-4.3.2
mpc -> ../mpc-0.8.1
mpfr -> ../mpfr-2.4.2 (plus latest patches to mpfr 2.4.2)
isl -> ../isl-0.15 (download_prerequisites version)

^ Builds and bootstraps Ok and compiles simple program

This is all as expected. But as noted, the docs suggest later versions of GMP
are OK as well.

If the build were fixed to use the correct CPU when configuring GMP, it would
build with both old and new versions of GMP. If there are specific tests that
would exercise the GMP/ISL parts of gcc, I could give them a go as well, but
running the entire test suite would take forever due to the slow storage.

[Bug bootstrap/67728] Build fails when cross-compiling with in-tree GMP and ISL

2016-03-13 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67728

--- Comment #8 from Andrew Roberts  ---
The initial bug report was for cross compiling. Bug 70211 is for native builds
on ARM. Given the huge growth in ARM development boards, this at least needs
documenting. Like the original reporter, I spent ages trying to figure this
out before stumbling across a solution (and the solution isn't to build GMP out
of tree either). Building GMP out of tree opens another can of worms (especially
on multiarch machines).

Any documentation fix should mention the targets that need the change (ARM),
and that both cross and native builds are affected. It should also reference
the undefined symbol __gmpn_invert_limb so people know when they have run into it.

Once you do the above, it becomes fairly obvious that building GMP in-tree
with the correct CPU instead of none is the proper solution. Or does that
not work for cross compiles?

The easier it is for people to build gcc themselves, the more testing prerelease
versions will get.

[Bug bootstrap/70211] gcc-6-20160306 fails to build on ARM Linux with in tree ISL due to undefined GMP symbol __gmpn_invert_limb in isl_test

2016-03-13 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70211

--- Comment #1 from Andrew Roberts  ---
Looking at the top-level Makefile.in, the gmp targets (maybe-configure-gmp,
configure-gmp, etc.) use:
--build=${build_alias} --host=none-${host_vendor}-${host_os}
--target=none-${host_vendor}-${host_os}

whereas isl, mpfr, etc. all use:
--build=${build_alias} --host=${host_alias} --target=${target_alias}

Presumably there was a historical reason for this; is it still valid?

As my previous comments say, this seems to have started causing problems on ARM
from GMP 5.1 onwards.

[Bug bootstrap/70211] New: gcc-6-20160306 fails to build on ARM Linux with in tree ISL due to undefined GMP symbol __gmpn_invert_limb in isl_test

2016-03-12 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70211

Bug ID: 70211
   Summary: gcc-6-20160306 fails to build on ARM Linux with in
tree ISL due to undefined GMP symbol
__gmpn_invert_limb in isl_test
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

Build of gcc-6-20160306 fails on ARM Linux when building in tree ISL due to
undefined GMP symbol __gmpn_invert_limb when building isl_test

This failure dates back to at least 2013 as Linux From Scratch references it
here:
https://sourceware.org/bugzilla/attachment.cgi?id=6807
# PiLFS Build Script SVN-20130102 v1.0
...
# Workaround for a problem introduced with GMP 5.1.0.
# If configured by gcc with the "none" host & target, it will result in undefined references to '__gmpn_invert_limb' during linking.
# Should be fixed by next version of gcc, but let me know if you have any more ideas on this.
sed -i 's/none-/armv6l-/' Makefile

Make Error:
...
/bin/sh ./libtool  --tag=CC   --mode=link armv7l-unknown-linux-gnueabihf-gcc 
-O2 -march=armv7-a -pipe  -static-libstdc++ -static-libgcc  -o isl_test
isl_test.o libisl.la /home/alarm/gcc/gcc-build/./gmp/libgmp.la
libtool: link: armv7l-unknown-linux-gnueabihf-gcc -O2 -march=armv7-a -pipe
-static-libstdc++ -static-libgcc -o isl_test isl_test.o  ./.libs/libisl.a
/home/alarm/gcc/gcc-build/./gmp/.libs/libgmp.a
/home/alarm/gcc/gcc-build/./gmp/.libs/libgmp.a(divrem_1.o): In function
`__gmpn_divrem_1':
divrem_1.c:(.text+0xb0): undefined reference to `__gmpn_invert_limb'
divrem_1.c:(.text+0x1d4): undefined reference to `__gmpn_invert_limb'
/home/alarm/gcc/gcc-build/./gmp/.libs/libgmp.a(mod_1.o): In function
`__gmpn_mod_1':
mod_1.c:(.text+0x60): undefined reference to `__gmpn_invert_limb'
mod_1.c:(.text+0x170): undefined reference to `__gmpn_invert_limb'
/home/alarm/gcc/gcc-build/./gmp/.libs/libgmp.a(div_q.o): In function
`__gmpn_div_q':
div_q.c:(.text+0x174): undefined reference to `__gmpn_invert_limb'
/home/alarm/gcc/gcc-build/./gmp/.libs/libgmp.a(div_q.o):div_q.c:(.text+0x460):
more undefined references to `__gmpn_invert_limb' follow
collect2: error: ld returned 1 exit status
Makefile:1276: recipe for target 'isl_test' failed
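Comment 8 in bug 67728 asks that documentation mention __gmpn_invert_limb so people can recognize this failure. As an illustration, this sketch tallies "undefined reference to `SYMBOL'" lines in a linker log like the one above. It assumes the backtick-and-apostrophe quoting shown in this GNU ld output; lines of the form "more undefined references to ... follow" are deliberately not counted, so the totals are lower bounds. The function name is hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Counts "undefined reference to `SYMBOL'" messages per symbol in a linker
// log. ASSUMPTION: the `name' quoting used by the GNU ld output above; other
// linker versions may quote symbols differently.
std::map<std::string, int> count_undefined_refs(const std::string& log) {
    const std::string marker = "undefined reference to `";
    std::map<std::string, int> counts;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t start = line.find(marker);
        if (start == std::string::npos) continue;
        const std::size_t name_begin = start + marker.size();
        const std::size_t name_end = line.find('\'', name_begin);
        if (name_end == std::string::npos) continue;
        ++counts[line.substr(name_begin, name_end - name_begin)];
    }
    return counts;
}
```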

The failure is due to --host=none-... and --target=none-... appearing in the
gcc-build/gcc/Makefile for the configure-gmp target after configure.

This happens on (at least) armv6l (Raspberry Pi) and armv7l (Raspberry Pi 3).
Editing the Makefile to replace 'none-' with 'armv6l-' or 'armv7l-' allows the
build to complete.

Built Using:
tar -xjf gcc-6-20160306.tar.bz2
mv gcc-6-20160306 gcc-6.0.0
cd gcc-6.0.0
# From ftp://gcc.gnu.org/pub/gcc/infrastructure/
tar -xjf isl-0.16.1.tar.bz2
ln -sf isl-0.16.1 isl
cd isl
# This version of ISL was generated with an older automake, so it needs reconfiguring
autoreconf
cd ..
# From MPFR website
tar -xjf mpfr-3.1.4.tar.bz2
ln -sf mpfr-3.1.4 mpfr
# From MPC website
tar -xzf mpc-1.0.3.tar.gz
ln -sf mpc-1.0.3 mpc
# From GMP website
tar -xjf gmp-6.1.0.tar.bz2
ln -sf gmp-6.1.0 gmp
cd ..
mkdir gcc-build
cd gcc-build
../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0 --program-suffix= \
--enable-languages=c,c++,fortran --enable-shared --enable-threads=posix \
--with-system-zlib --with-isl --enable-__cxa_atexit \
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch \
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id \
--enable-lto --enable-plugin --enable-install-libiberty \
--with-linker-hash-style=gnu --enable-gnu-indirect-function \
--disable-multilib --disable-werror --enable-checking=release \
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf \
--target=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a \
--with-float=hard --with-fpu=vfpv3-d16 --disable-bootstrap
# then run make

Resulting Makefile:
...
.PHONY: configure-gmp maybe-configure-gmp
maybe-configure-gmp:
maybe-configure-gmp: configure-gmp
configure-gmp:
...
$(SHELL) \
  $$s/$$module_srcdir/configure \
  --srcdir=$${topdir}/$$module_srcdir \
  $(HOST_CONFIGARGS) --build=${build_alias} --host=none-${host_vendor}-${host_os} \
  --target=none-${host_vendor}-${host_os} --disable-shared LEX="touch lex.yy.c" \
  || exit 1
...

The --host=none- and --target=none- options cause the build of in-tree ISL to fail.

Built on Arch Linux Arm:
uname -a
Linux alarmpi 4.1.19-4-ARCH #1 SMP Wed Mar 9 18:23:02 MST 2016 armv7l GNU/Linux

Using tools:
automake: 1.15-1
autoconf 2.69-2
libtool: 2.4.6-4
binutils: 2.26-3
m4 1.4.17-1

Using in tree libraries:
isl-0.16.1 (need to run autoreconf in isl directory before configuring)
mpfr-3.1.4
mpc-1.0.3
gmp-6.1.0

Host gcc:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRA

[Bug c/70210] -march=native and -mcpu=native do not detect ARM cortex-a53 in 32 bit mode on Linux

2016-03-12 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70210

--- Comment #1 from Andrew Roberts  ---
Note that this was causing bug 70132 (ARM -mcpu=native can cause a double free
abort). A patch has been submitted to fix the double free, but it doesn't
address the failure to detect the CPU.

[Bug c/70210] New: -march=native and -mcpu=native do not detect ARM cortex-a53 in 32 bit mode on Linux

2016-03-12 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70210

Bug ID: 70210
   Summary: -march=native and -mcpu=native do not detect ARM
cortex-a53 in 32 bit mode on Linux
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

When using -march=native or -mcpu=native with gcc-6-20160306 snapshot (and also
on previous released versions), the ARM cortex-a53 CPU is not detected in 32
bit mode. This CPU is used on the Raspberry Pi 3 (BCM2834) amongst others.

Tested on Arch Linux ARM:

uname -a
Linux alarmpi 4.1.19-2-ARCH #1 SMP Sat Mar 5 22:22:01 MST 2016 armv7l GNU/Linux

cat /proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 1
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 2
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 3
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

Hardware: BCM2709
Revision: a02082
Serial  : 

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/6.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix
--with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp
--enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function
--disable-multilib --disable-werror --enable-checking=release
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--target=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16
--disable-bootstrap
Thread model: posix
gcc version 6.0.0 20160306 (experimental) (GCC) 

The CPU part table in gcc/config/arm/driver-arm.c does not include the
cortex-a53 part number (0xd03).
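The missing piece is a lookup from the CPU part number to a core name. The sketch below shows the idea using the /proc/cpuinfo fields above; it is hypothetical and does not mirror the structure of gcc/config/arm/driver-arm.c. Its small table lists only a few ARM Ltd. (implementer 0x41) cores.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Strips leading and trailing spaces/tabs (cpuinfo pads keys with tabs).
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// HYPOTHETICAL sketch, not GCC's driver-arm.c: map the first "CPU part" of an
// ARM Ltd. core (implementer 0x41) to a -mcpu name. Returns "" if unknown.
std::string detect_arm_core(const std::string& cpuinfo) {
    static const std::map<std::string, std::string> kArmParts = {
        {"0xc07", "cortex-a7"},  {"0xc09", "cortex-a9"},
        {"0xc0f", "cortex-a15"}, {"0xd03", "cortex-a53"},
        {"0xd07", "cortex-a57"}, {"0xd08", "cortex-a72"},
    };
    std::string implementer, part;
    std::istringstream in(cpuinfo);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string key = trim(line.substr(0, colon));
        const std::string value = trim(line.substr(colon + 1));
        if (key == "CPU implementer" && implementer.empty()) implementer = value;
        if (key == "CPU part" && part.empty()) part = value;
    }
    if (implementer != "0x41") return "";
    const auto it = kArmParts.find(part);
    return it == kArmParts.end() ? "" : it->second;
}
```

Fed the cpuinfo text above, detect_arm_core returns "cortex-a53", which would correspond to -mcpu=cortex-a53.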

[Bug driver/70132] ARM -mcpu=native can cause a double free abort.

2016-03-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70132

--- Comment #5 from Andrew Roberts  ---
Do I need to raise another bug report to get -march=native to actually
generate native code, or has one already been raised?

My original report (Bug 70136) included the full /proc/cpuinfo for the BCM2834
as used on the Raspberry Pi 3 in 32 bit mode.

CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

[Bug driver/70132] ARM -mcpu=native can cause a double free abort.

2016-03-11 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70132

--- Comment #4 from Andrew Roberts  ---
Patch tested OK on a Raspberry Pi 3 running Arch Linux, using the latest
gcc 6 snapshot:

/usr/local/gcc-6.0.0/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/local/gcc-6.0.0/bin/gcc
COLLECT_LTO_WRAPPER=/usr/local/gcc-6.0.0/libexec/gcc/armv7l-unknown-linux-gnueabihf/6.0.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: ../gcc-6.0.0/configure --prefix=/usr/local/gcc-6.0.0
--program-suffix= --enable-languages=c,c++,fortran --enable-shared
--enable-threads=posix --with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id
--enable-lto --enable-plugin --enable-install-libiberty
--with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib
--disable-werror --enable-checking=release
--host=armv7l-unknown-linux-gnueabihf --build=armv7l-unknown-linux-gnueabihf
--target=armv7l-unknown-linux-gnueabihf --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 --disable-bootstrap
Thread model: posix
gcc version 6.0.0 20160306 (experimental) (GCC)

[Bug c/70136] New: -march=native causes SIGABRT due to double close of FILE on certain ARM systems (BCM2834, armv8 cortex-a53)

2016-03-08 Thread andrewm.roberts at sky dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70136

Bug ID: 70136
   Summary: -march=native causes SIGABRT due to double close of
FILE on certain ARM systems (BCM2834, armv8
cortex-a53)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: andrewm.roberts at sky dot com
  Target Milestone: ---

gcc 4.9.1 (Raspbian Linux)
gcc 5.3.0 (Arch Linux)
gcc 6-20160306 (Arch Linux)
all crash on the Raspberry Pi 3 (BCM2834, armv8 cortex-a53) when using the
-march=native compiler flag.

To reproduce:
echo "int main(void) {return 0;}" | gcc -c -x c -march=native -

Example output (gcc 5.3.0):

[alarm@alarmpi ~]$ echo "int main(void) {return 0;}" | gcc -c -x c -march=native -
*** Error in `gcc': double free or corruption (!prev): 0x016486d8 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x649a4)[0x76e399a4]
/usr/lib/libc.so.6(+0x6ad2c)[0x76e3fd2c]
/usr/lib/libc.so.6(+0x6b6bc)[0x76e406bc]
/usr/lib/libc.so.6(fclose+0x110)[0x76e2f118]
gcc[0x20898]
gcc[0x1da3c]
gcc[0x1bfb4]
gcc[0x1e484]
gcc[0x1c6a0]
gcc[0x1d4c8]
gcc[0x1e808]
gcc[0x1ed08]
gcc[0x127a0]
gcc[0x12834]
/usr/lib/libc.so.6(__libc_start_main+0x114)[0x76debcf8]
======= Memory map: ========
00010000-000b9000 r-xp 00000000 b3:02 1201092 /usr/bin/gcc
000c8000-000ca000 rw-p 000a8000 b3:02 1201092 /usr/bin/gcc
000ca000-000cc000 rw-p 00000000 00:00 0
01640000-01665000 rw-p 00000000 00:00 0 [heap]
76b00000-76b21000 rw-p 00000000 00:00 0
76b21000-76c00000 ---p 00000000 00:00 0
76c06000-76c22000 r-xp 00000000 b3:02 1198730 /usr/lib/libgcc_s.so.1
76c22000-76c32000 ---p 0001c000 b3:02 1198730 /usr/lib/libgcc_s.so.1
76c32000-76c33000 rw-p 0001c000 b3:02 1198730 /usr/lib/libgcc_s.so.1
76c3d000-76dd5000 r--p 00000000 b3:02 1317130 /usr/lib/locale/locale-archive
76dd5000-76efc000 r-xp 00000000 b3:02 1198747 /usr/lib/libc-2.23.so
76efc000-76f0c000 ---p 00127000 b3:02 1198747 /usr/lib/libc-2.23.so
76f0c000-76f0e000 r--p 00127000 b3:02 1198747 /usr/lib/libc-2.23.so
76f0e000-76f0f000 rw-p 00129000 b3:02 1198747 /usr/lib/libc-2.23.so
76f0f000-76f12000 rw-p 00000000 00:00 0
76f12000-76f82000 r-xp 00000000 b3:02 1198805 /usr/lib/libm-2.23.so
76f82000-76f91000 ---p 00070000 b3:02 1198805 /usr/lib/libm-2.23.so
76f91000-76f92000 r--p 0006f000 b3:02 1198805 /usr/lib/libm-2.23.so
76f92000-76f93000 rw-p 00070000 b3:02 1198805 /usr/lib/libm-2.23.so
76f93000-76fb3000 r-xp 00000000 b3:02 1198581 /usr/lib/ld-2.23.so
76fb6000-76fb7000 rw-p 00000000 00:00 0
76fc0000-76fc2000 rw-p 00000000 00:00 0
76fc2000-76fc3000 r--p 0001f000 b3:02 1198581 /usr/lib/ld-2.23.so
76fc3000-76fc4000 rw-p 00020000 b3:02 1198581 /usr/lib/ld-2.23.so
7e828000-7e849000 rw-p 00000000 00:00 0 [stack]
7eede000-7eedf000 r-xp 00000000 00:00 0 [sigpage]
7eedf000-7eee0000 r--p 00000000 00:00 0 [vvar]
7eee0000-7eee1000 r-xp 00000000 00:00 0 [vdso]
ffff0000-ffff1000 r-xp 00000000 00:00 0 [vectors]
Aborted (core dumped)

Reproduced on:
Arch Linux ARM for Raspberry Pi 3
uname -a
Linux alarmpi 4.1.19-2-ARCH #1 SMP Sat Mar 5 22:22:01 MST 2016 armv7l GNU/Linux
cat /proc/cpuinfo
processor   : 0
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 1
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 2
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

processor   : 3
model name  : ARMv7 Processor rev 4 (v7l)
BogoMIPS: 76.80
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part: 0xd03
CPU revision: 4

Hardware: BCM2709
Revision: a02082
Serial  : 

Host Compiler:
[alarm@alarmpi ~]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/armv7l-unknown-linux-gnueabihf/5.3.0/lto-wrapper
Target: armv7l-unknown-linux-gnueabihf
Configured with: /build/gcc/src/gcc-5-20160209/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
