[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509

--- Comment #4 from PeteVine  ---
Created attachment 42934
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42934&action=edit
corresponding C/gcda/gcno files

[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509

--- Comment #3 from PeteVine  ---
OK, the following command was used to obtain the gcno/gcda files:

$ gcc-8 -O3 -fprofile-generate -ftest-coverage sudoku.c && ./a.out

Unlike the gcda file, gcno is dumpable with gcov-dump-8.

[Bug gcov-profile/83509] New: gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509

Bug ID: 83509
   Summary: gcov-dump-8 unable to dump any gcda files
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42931
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42931&action=edit
example gcda file created with gcc 8.0

As per https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614#c16 

not being able to dump previous versions' format is to be expected but
gcov-dump-8 fails at dumping files created by gcc-8 as well:

$ gcov-dump-8 sudoku.gcda 

sudoku.gcda:data:magic `gcda':version `A80e'
sudoku.gcda:stamp 1935438558
sudoku.gcda:tag `0052' is invalid
sudoku.gcda:0052:2583412121:UNKNOWN

The file can still be dumped with previous versions of gcov-dump.
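
To check the header fields independently of gcov-dump, here is a minimal Python sketch. It assumes the gcda file starts with three 32-bit words (magic, version, stamp) written in the byte order of the machine that produced it. The function names are hypothetical, and the sample bytes are synthesized from the values printed above rather than read from the attachment.

```python
import struct

GCDA_MAGIC = 0x67636461  # the characters "gcda" packed into one 32-bit word


def _word_to_text(word):
    """Render a 32-bit word as the four characters gcov-dump prints."""
    return word.to_bytes(4, "big").decode("latin-1")


def read_gcda_header(data):
    """Return (magic, version, stamp) from the first 12 bytes of a gcda file.

    Both byte orders are tried; the one where the magic word matches wins.
    """
    if len(data) < 12:
        raise ValueError("need at least 12 bytes for a gcda header")
    for order in ("<", ">"):
        magic, version, stamp = struct.unpack(order + "III", data[:12])
        if magic == GCDA_MAGIC:
            return _word_to_text(magic), _word_to_text(version), stamp
    raise ValueError("not a gcda file: magic word does not match")


# Synthetic little-endian header using the values from the dump above.
sample = struct.pack(
    "<III", GCDA_MAGIC, int.from_bytes(b"A80e", "big"), 1935438558
)
print(read_gcda_header(sample))  # ('gcda', 'A80e', 1935438558)
```

For a real file, the first 12 bytes (for example `open("sudoku.gcda", "rb").read(12)`) would replace the synthetic sample. A header that decodes cleanly while the first tag is still rejected would point at the tag parsing rather than at the file itself.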

[Bug gcov-profile/82614] GCOV crashes while parsing gcda file

2017-12-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614

--- Comment #15 from PeteVine  ---
No, that's not it - gcov-dump 6/7 have no problem dumping previous versions.
I'm just not sure if the problem with gcov-dump-8 is architecture specific
(ARM) or it's something to do with my setup. I'm going to leave it there.

[Bug gcov-profile/82614] GCOV crashes while parsing gcda file

2017-12-06 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614

--- Comment #13 from PeteVine  ---
Almost certainly not related, but there's been some sort of regression in
gcov-dump from GCC 8 branch. Trying to dump any *.gcda file (ver. 8 included)
ends like this:

$ gcov-dump-8 Unified_cpp_js_src25.gcda 
Unified_cpp_js_src25.gcda:data:magic `gcda':version `504*'
Unified_cpp_js_src25.gcda:warning:current version is `A80e'
Unified_cpp_js_src25.gcda:stamp 532248120
Unified_cpp_js_src25.gcda:tag `01ba' is invalid
Unified_cpp_js_src25.gcda:01ba:3336454216:UNKNOWN

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #25 from PeteVine  ---
So, the profile data is probably fine, and judging from the size of the final
binary, it's being used. The fix could be real after all :)

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #24 from PeteVine  ---
Or maybe not, gcov-dump-6 is able to read the file.

$ gcov-dump-6 sudoku.gcda.good

sudoku.gcda.good:data:magic `gcda':version `A80e'
sudoku.gcda.good:warning:current version is `603*'
sudoku.gcda.good:stamp 46451024
sudoku.gcda.good: a300:  82:PROGRAM_SUMMARY checksum=0x1b6601f6
sudoku.gcda.good:   counts=29, runs=1, sum_all=283905002,
run_max=58689000, sum_max=58689000
sudoku.gcda.good:   counter histogram:
sudoku.gcda.good:   0: num counts=2, min counter=0, cum_counter=0
sudoku.gcda.good:   1: num counts=2, min counter=1, cum_counter=2
sudoku.gcda.good:   35: num counts=6, min counter=1000,
cum_counter=6000
sudoku.gcda.good:   41: num counts=2, min counter=3000,
cum_counter=6000
sudoku.gcda.good:   48: num counts=2, min counter=9000,
cum_counter=18000
sudoku.gcda.good:   54: num counts=1, min counter=27000,
cum_counter=27000
sudoku.gcda.good:   55: num counts=1, min counter=29000,
cum_counter=29000
sudoku.gcda.good:   58: num counts=1, min counter=52000,
cum_counter=52000
sudoku.gcda.good:   60: num counts=2, min counter=81000,
cum_counter=162000
sudoku.gcda.good:   82: num counts=1, min counter=3531000,
cum_counter=3531000
sudoku.gcda.good:   86: num counts=4, min counter=6469000,
cum_counter=26033000
sudoku.gcda.good:   92: num counts=1, min counter=19563000,
cum_counter=19563000
sudoku.gcda.good:   98: num counts=4, min counter=58411000,
cum_counter=234478000
sudoku.gcda.good: 0100:   3:FUNCTION ident=108032747,
lineno_checksum=0x0ceca33f, cfg_checksum=0x73ff2042
sudoku.gcda.good:  01a1:   6:COUNTERS arcs 3 counts
sudoku.gcda.good:  01af:   2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100:   3:FUNCTION ident=82881,
lineno_checksum=0x3ae31d81, cfg_checksum=0x707619b8
sudoku.gcda.good:  01a1:  14:COUNTERS arcs 7 counts
sudoku.gcda.good:  01af:   2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100:   3:FUNCTION ident=1633341470,
lineno_checksum=0xf25ea178, cfg_checksum=0x1bd90f34
sudoku.gcda.good:  01a1:  22:COUNTERS arcs 11 counts
sudoku.gcda.good:  01af:   2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100:   3:FUNCTION ident=535938890,
lineno_checksum=0x375a9f34, cfg_checksum=0x5d41b59e
sudoku.gcda.good:  01a1:  16:COUNTERS arcs 8 counts
sudoku.gcda.good:  01af:   2:COUNTERS ior 1 counts

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #23 from PeteVine  ---
$ gcov-dump-6 sudoku.gcda.bad

sudoku.gcda.bad:data:magic `gcda':version `603*'
sudoku.gcda.bad:stamp 46515746
sudoku.gcda.bad: a300:  77:PROGRAM_SUMMARY checksum=0x12ec1c02
sudoku.gcda.bad:counts=29, runs=1, sum_all=342403001,
run_max=58689000, sum_max=58689000
sudoku.gcda.bad:counter histogram:
sudoku.gcda.bad:0: num counts=2, min counter=0, cum_counter=0
sudoku.gcda.bad:1: num counts=1, min counter=1, cum_counter=1
sudoku.gcda.bad:35: num counts=6, min counter=1000,
cum_counter=6000
sudoku.gcda.bad:41: num counts=1, min counter=3000,
cum_counter=3000
sudoku.gcda.bad:48: num counts=3, min counter=9000,
cum_counter=27000
sudoku.gcda.bad:54: num counts=1, min counter=27000,
cum_counter=27000
sudoku.gcda.bad:55: num counts=1, min counter=29000,
cum_counter=29000
sudoku.gcda.bad:60: num counts=3, min counter=81000,
cum_counter=243000
sudoku.gcda.bad:82: num counts=1, min counter=3531000,
cum_counter=3531000
sudoku.gcda.bad:86: num counts=4, min counter=6469000,
cum_counter=26033000
sudoku.gcda.bad:92: num counts=1, min counter=19563000,
cum_counter=19563000
sudoku.gcda.bad:98: num counts=5, min counter=58411000,
cum_counter=292941000
sudoku.gcda.bad: 0100:   3:FUNCTION ident=108032747,
lineno_checksum=0x0ceca33f, cfg_checksum=0x73ff2042
sudoku.gcda.bad:  01a1:   6:COUNTERS arcs 3 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100:   3:FUNCTION ident=82881,
lineno_checksum=0x3ae31d81, cfg_checksum=0x707619b8
sudoku.gcda.bad:  01a1:  14:COUNTERS arcs 7 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100:   3:FUNCTION ident=1633341470,
lineno_checksum=0xf25ea178, cfg_checksum=0x88a084d7
sudoku.gcda.bad:  01a1:  22:COUNTERS arcs 11 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100:   3:FUNCTION ident=535938890,
lineno_checksum=0x375a9f34, cfg_checksum=0x5d41b59e
sudoku.gcda.bad:  01a1:  16:COUNTERS arcs 8 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts

whereas:
$ gcov-dump-8 sudoku.gcda.good 
sudoku.gcda.good:data:magic `gcda':version `A80e'
sudoku.gcda.good:stamp 46451024
sudoku.gcda.good:tag `0052' is invalid
sudoku.gcda.good:0052:459670006:UNKNOWN

so it looks like the profile data is not usable and hence no pessimization?
That's probably not the fix I was hoping for, oops!
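
A quick way to sanity-check a dump like the gcov-dump-6 one above is to add up the per-function counters and compare the total with the `counts=` field of the program summary. The following is a minimal Python sketch; the regular expressions are assumptions based only on the output format shown in this thread, and the helper names are made up for illustration.

```python
import re

# Matches lines such as "...:  01a1:   6:COUNTERS arcs 3 counts"
COUNTERS_RE = re.compile(r"COUNTERS\s+(\w+)\s+(\d+)\s+counts")
# Matches the summary line "...counts=29, runs=1, ..."
SUMMARY_RE = re.compile(r"counts=(\d+),\s*runs=(\d+)")


def counter_totals(dump_text):
    """Sum the number of counters per counter kind (arcs, ior, ...)."""
    totals = {}
    for kind, count in COUNTERS_RE.findall(dump_text):
        totals[kind] = totals.get(kind, 0) + int(count)
    return totals


def summary_counts(dump_text):
    """Return the counts= value of the program summary, or None."""
    match = SUMMARY_RE.search(dump_text)
    return int(match.group(1)) if match else None


# Excerpt of the sudoku.gcda.bad dump above.
sample = """\
sudoku.gcda.bad:counts=29, runs=1, sum_all=342403001,
sudoku.gcda.bad:  01a1:   6:COUNTERS arcs 3 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad:  01a1:  14:COUNTERS arcs 7 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad:  01a1:  22:COUNTERS arcs 11 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad:  01a1:  16:COUNTERS arcs 8 counts
sudoku.gcda.bad:  01b1:   2:COUNTERS time_profiler 1 counts
"""

print(counter_totals(sample))   # {'arcs': 29, 'time_profiler': 4}
print(summary_counts(sample))   # 29
```

The arc counters add up to 29, the same as the summary's `counts=29`, so the file is at least internally consistent as far as gcov-dump-6 reads it.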

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #22 from PeteVine  ---
> I don't know what exactly "fixed" this

That would be nice to know. This I can say for sure: gcc 7.2.1 20171116 still
produces slower profiled code on the target system. 

I've also discovered that compiling and profiling on a binary-compatible Cortex A17
system (same flags) produces binaries that don't run any slower on the target
system.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from PeteVine  ---
The bug doesn't reproduce in a recent GCC 8 build (profiling on a Cortex A5
system).

The generated assembly contains no __aeabi_idiv calls whatsoever. Well done.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #19 from PeteVine  ---
Created attachment 42694
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42694&action=edit
Better assembly after profiling

[Bug target/79964] Cortex A53 codegen still not optimal

2017-11-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from PeteVine  ---
In case the changed behaviour of -frename-registers is not actually a feature,
please reopen.

[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark

2017-11-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933

--- Comment #5 from PeteVine  ---
You're right, sorry for the confusion. It seems I skimmed over the wall of
errors too quickly while the last one came from a different source file.

According to my own results in bug #77730, I was somehow able to build the
benchmark with earlier versions of gcc 6/7 using the -std=f95 flag. Could you
try the following steps?

wget http://www.phoronix-test-suite.com/benchmark-files/dolfyn-cfd_0.527.tgz 
tar xvf dolfyn-cfd_0.527.tgz
cd dolfyn-cfd_0.527/src/
sed -i 's/F90FLAGS = -O2/F90FLAGS = -O2 -std=f95/' Makefile
make

[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark

2017-11-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933

--- Comment #3 from PeteVine  ---
In gcc 8, -std=f2003 is required to overcome the issue but there's another
failure later on:

gfortran -c -O2 -std=f2003 solverinterface.f90
solverinterface.f90:108:9:

real*4  fpar(16)  ! hint by Shibo
 1
Error: GNU Extension: Nonstandard type declaration REAL*4 at (1)
solverinterface.f90:153:7:

fpar(1) = RTOL(VarSC)
   1
Error: Unclassifiable statement at (1)
solverinterface.f90:154:7:

fpar(2) = ATOL(VarSC)
   1
Error: Unclassifiable statement at (1)
solverinterface.f90:156:7:

fpar(1) = RTOL(ivar) ! relative tolerance, must be between (0, 1)
   1
Error: Unclassifiable statement at (1)
solverinterface.f90:157:7:

fpar(2) = ATOL(ivar) ! absolute tolerance, must be positive
   1
Error: Unclassifiable statement at (1)
solverinterface.f90:176:1:

 NNZ,  Alu,Jlu,Ju,Jw )
 1

Are there any gfortran switches that would enable compiling this code base as
is?

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #7 from PeteVine  ---
Thanks for pointing that out! I was using my bash history to change the CFLAGS
and when I was flipping the crc switch I didn't notice I'd picked a version
without -frename-registers, hence this wrong conclusion :)

Definitely then, -frename-registers it is!

http://openbenchmarking.org/result/1707307-RI-CORTEXA5313

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #5 from PeteVine  ---
Turns out the GCC 8 regression is caused by the +crc switch in
-march=armv8-a+crc. Interesting, eh?

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #8 from PeteVine  ---
I've just confirmed the result on a newer Linux distribution (Ubuntu 16.04) and
the difference between VFPv3 and v4 is clearly there (2330 vs 2560) using gcc
5.4. 

Unless the CPU itself requires an erratum, that probably leaves suboptimal
codegen as the main suspect.

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #7 from PeteVine  ---
Thanks, I promise to test any patches without delay :)

[Bug target/79964] Cortex A53 codegen still not optimal

2017-05-02 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #4 from PeteVine  ---
> I'm not sure what you're trying to measure here - it's very confusing with 
> multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it 
> related to -fipa-pta or is that not relevant?

All the relevant flags have been kept constant (-Ofast -mcpu), so you should
only look at this result side by side with the previous one.

I'll summarise the findings for you:

To get the best c-ray performance out of gcc7 it's necessary to either use
-mcpu/mtune=cortex-a57 or -mcpu=cortex-a53 -frename-registers (depessimizing
with -mno-fix-cortex-a53-843419 if necessary)

However, in gcc8, neither produces the expected best performance. No
combination does: a clear regression.

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #5 from PeteVine  ---
Unchanged in gcc version 8.0.0 20170501.

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #2 from PeteVine  ---
I can confirm the first part of the issue gets fixed with this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html

but there's a regression in gcc8 concerning the second part. (or rather the
workarounds don't work any more) 

http://openbenchmarking.org/result/1704298-RI-CRAYREGRE13

("basic flags" didn't deactivate -mfix-cortex-a53-843419, hence the difference)

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #18 from PeteVine  ---
> Well that sounds like the same issue.

> Note -fprofile-generate simply inserts counters in the generated code. In 
> fact the generated code is practically identical between Cortex-A5 and 
> Cortex-A7.

As long as the gcda file is not present, -fprofile-use yields an equally good
binary (obviously!), so clearly it's about the profile data somehow. If you
have any ideas or debugging suggestions, go ahead, I'll gladly test them.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #16 from PeteVine  ---
Also, I'd like to repeat that using -mcpu=cortex-a7 fixes the issue (no
library calls present). 

Incidentally, having run that A7 profiled binary on a Cortex-A53, I'm seeing a
10% hit compared to a vanilla A7 binary. Hopefully that's just an artifact of
profiling a different CPU architecture.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #15 from PeteVine  ---
I don't have a cross-compiler built/installed.

If you're positive the bug doesn't reproduce on your end (targeting generic or
A5 codegen), then maybe it's about some interaction between gcc instrumentation
and the slightly dated system libraries.  

I think my little A5->A53 experiment shows that once the instrumented binary is
built, it doesn't matter how the profile data is gathered.

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #41239|0                           |1
        is obsolete|                            |

--- Comment #37 from PeteVine  ---
Comment on attachment 41239
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239
Assembly files produced with -fverbose-asm

Wrong bug, sorry. I can't get used to bugzilla jumping to the next issue on
posting a comment.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #13 from PeteVine  ---
Created attachment 41240
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41240&action=edit
Assembly files produced with -fverbose-asm

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004

--- Comment #36 from PeteVine  ---
Created attachment 41239
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239&action=edit
Assembly files produced with -fverbose-asm

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #12 from PeteVine  ---
It even reproduces the following way:

I built an instrumented ARMv7 binary natively, ran it on a Cortex-A53, copied
the gcda file back, recompiled with -fprofile-use and got the same 20%
slowdown.

Surely, that must count (pun intended) for something, as both CPUs are
in-order designs.

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

--- Comment #11 from PeteVine  ---
I've just retested gcc7 on both ARM platforms. 

AArch64 gets a 3% improvement now, while ARMv7 reproduces the issue, just as
before. I'm compiling/profiling on a Cortex A5 which could be the main reason
behind all this, as it doesn't have hard division.

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #1 from PeteVine  ---
Turns out -frename-registers fixes this issue as well, thanks for the tip!

http://openbenchmarking.org/result/1704142-RI-1703089RI22

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #9 from PeteVine  ---
Well, yes, that fixes the -Ofast issue for me:

-mcpu=cortex-a53 -frename-registers
iir:    65952 ns per loop
iir_2:  63098 ns per loop

-mcpu=cortex-a57 (-frename-registers)
iir:    62839 ns per loop
iir_2:  62677 ns per loop
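
To put the timings above in relative terms, a small Python sketch follows. The helper name is hypothetical; the numbers are copied from the two runs above.

```python
def percent_slower(slow_ns, fast_ns):
    """How much longer slow_ns is than fast_ns, as a percentage of fast_ns."""
    return (slow_ns - fast_ns) / fast_ns * 100.0


# (cortex-a53 + -frename-registers, cortex-a57) in ns per loop
timings = {"iir": (65952, 62839), "iir_2": (63098, 62677)}

for name, (a53_ns, a57_ns) in timings.items():
    print(f"{name}: a53 tuning is {percent_slower(a53_ns, a57_ns):.1f}% slower")
```

With -frename-registers the gap to A57 tuning is about 5% for iir and under 1% for iir_2.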

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #9 from PeteVine  ---
Correction, it was about -fomit-frame-pointer period! Setting the environment
C(XX)FLAGS to that flag alone triggers the bug.

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #8 from PeteVine  ---
It was about -O3 -fomit-frame-pointer, but yeah, I don't care one bit either.

Just make sure `--enable-languages=ada` works. (c++ is not being inferred so
you end up with no xg++)

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #6 from PeteVine  ---
Turns out it's a miscompilation bug as I was using the same set of C(XX)FLAGS
that work fine for those other languages. 

Removing the -fomit-frame-pointer flag while leaving the rest unchanged (-O3
-mtune=cortex-a57 -fipa-pta -march=armv8-a+crc) fixes the issue.
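
The same kind of isolation can be scripted by dropping one flag at a time and rebuilding. Below is a minimal Python sketch of that idea, not the procedure actually used here: `build_fails` is a hypothetical callback that would run the real build, and the stand-in below only mimics the behaviour reported in this comment.

```python
def flags_whose_removal_fixes(flags, build_fails):
    """Return every flag whose removal alone turns a failing build into a passing one."""
    if not build_fails(flags):
        return []  # nothing to isolate: the full set already builds
    culprits = []
    for i, flag in enumerate(flags):
        trial = flags[:i] + flags[i + 1:]  # all flags except this one
        if not build_fails(trial):
            culprits.append(flag)
    return culprits


def fake_build_fails(flags):
    # Stand-in for a real build: fails whenever -fomit-frame-pointer is present.
    return "-fomit-frame-pointer" in flags


flags = [
    "-O3",
    "-mtune=cortex-a57",
    "-fipa-pta",
    "-march=armv8-a+crc",
    "-fomit-frame-pointer",
]
print(flags_whose_removal_fixes(flags, fake_build_fails))
# ['-fomit-frame-pointer']
```

This only finds single-flag culprits; a failure that needs two flags together would return an empty list and call for a proper delta-debugging pass.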

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #5 from PeteVine  ---
The repeated full ada bootstrap was successful at the same revision, using
identical flags and GNAT 5.4.0.

On the other hand, the failing build prints two warnings during the ada part:

g-debpoo.adb: In function ‘GNAT.DEBUG_POOLS.GET_SIZE’:
g-debpoo.adb:1418:8: warning: ‘SIZE_IN_STORAGE_ELEMENTS’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
g-debpoo.ads:295:7: note: ‘SIZE_IN_STORAGE_ELEMENTS’ was declared here

and

g-comlin.adb: In function ‘GNAT.COMMAND_LINE.FIND_LONGEST_MATCHING_SWITCH’:
g-comlin.adb:583:8: warning: ‘PARAM’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
g-comlin.adb:107:7: note: ‘PARAM’ was declared here

The error message read differently this time:

 7.0.1 20170311 (experimental) (aarch64-linux-gnu) Storage_Error stack overflow
or erroneous memory access

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #4 from PeteVine  ---
> Can you try again without --disable-bootstrap ?

It's GNAT 5.4.0. OK, I'll try again.

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

--- Comment #2 from PeteVine  ---
Right, I definitely used the same setup a few days ago minus
--disable-bootstrap.

[Bug ada/80007] New: --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

Bug ID: 80007
   Summary: --disable-bootstrap with gnat-5 leads to failed gnat-7
build on aarch64
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Never tried bootstrapping ada this way before (a full bootstrap succeeded a few
days ago), so I'm not entirely sure if that's allowed/advisable but I got the
following failure:

/mnt/odroid/gcc-master/build/./gcc/xgcc -B/mnt/odroid/gcc-master/build/./gcc/
-B/usr/gcc7/aarch64-linux-gnu/bin/ -B/usr/gcc7/aarch64-linux-gnu/lib/ -isystem
/usr/gcc7/aarch64-linux-gnu/include -isystem
/usr/gcc7/aarch64-linux-gnu/sys-include -c -g -O2  -fPIC  -W -Wall -gnatpg
-nostdinc   g-exptty.adb -o g-exptty.o
+===========================GNAT BUG DETECTED==============================+
| 7.0.1 20170311 (experimental) (aarch64-linux-gnu) Program_Error unhandled signal|
| Error detected at s-stoele.adb:36:20                                     |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .              |
| Use a subject line meaningful to you and us to track the bug.            |
| Include the entire contents of this bug box in the report.               |
| Include the exact command that you entered.                              |
| Also include sources listed below.                                       |
+==========================================================================+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).

system.ads
g-exptty.adb
g-exptty.ads
g-expect.ads
gnat.ads
g-os_lib.ads
s-os_lib.ads
s-string.ads
ada.ads
a-uncdea.ads
g-regpat.ads
s-regpat.ads
g-tty.ads
s-oscons.ads
interfac.ads
i-c.ads
s-parame.ads
s-exctab.ads
s-stalib.ads
a-unccon.ads
a-tags.ads
s-stoele.ads
a-stream.ads
s-soflin.ads
a-except.ads
s-traent.ads
s-stache.ads
s-stratt.ads
s-unstyp.ads
s-secsta.ads
s-finmas.ads
a-finali.ads
s-finroo.ads
s-stopoo.ads
s-pooglo.ads
a-caldel.ads
a-calend.ads
s-stoele.adb

compilation abandoned
../gcc-interface/Makefile:296: recipe for target 'g-exptty.o' failed

[Bug target/79964] New: Cortex A53 codegen still not optimal

2017-03-08 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

Bug ID: 79964
   Summary: Cortex A53 codegen still not optimal
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Two data points:

- the integer benchmark from PR79665 runs about 7% slower with -mcpu=cortex-a53
vs other targets, equalling generic codegen. It was still indistinguishable on
20170220, so the regression must have happened shortly after as per PR79665#c13

- c-ray again, wherein dispensing with -mfix-cortex-a53-843419 removes the
handicap, but e.g. A57 codegen produces the best result ever on this machine:

http://openbenchmarking.org/result/1703089-RI-1703040RI07

Coming from a Cortex A53 with 32kB of L1 cache.

[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)

2017-03-07 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #9 from PeteVine  ---
So, with the whole -mfix-cortex-a53-843419 debacle out of the way, there's
probably not much left to worry about:

http://openbenchmarking.org/result/1703071-RI-1609258LO61

[Bug fortran/79933] New: gfortran no longer able to compile dolfyn benchmark

2017-03-06 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933

Bug ID: 79933
   Summary: gfortran no longer able to compile dolfyn benchmark
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40900
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40900&action=edit
fortran source

Used to work fine a few months ago.

$ gfortran-7 -O2 gmsh2dolfyn.f90
gmsh2dolfyn.f90:159:9:

  stop'bug: error in dimensions of array v'
 1
Error: Blank required in STOP statement near (1)

Adding -std=f95 solves the issue so maybe some sort of suggestion is in order?

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #15 from PeteVine  ---
Sorry wrong number :) I meant --enable-fix-cortex-a53-843419

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #14 from PeteVine  ---
I've finally put two and two together and identified the cause,
--fix-cortex-a53-835769, which was kicking in for -mcpu=cortex-a53. Looks like
an LTO gold linker bug, period.

FWIW, an LTO bootstrapped gcc is actually slower, taking 2 more minutes to
complete a --disable-bootstrap build (3% slowdown).

[Bug middle-end/77546] [6/7 regression] C++ software renderer performance drop

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77546

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #7 from PeteVine  ---
Passing -mno-fix-cortex-a53-843419 fixes the issue; gcc 6.3 yields a 38.2 score
while gcc 7.0.1 an even better 39.5 using the same workaround.

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #7 from PeteVine  ---
Not affected by -mno-fix-cortex-a53-843419 which gives the issue full validity.
-Ofast pessimizes Cortex A53 codegen somehow and switching to e.g.
-mcpu=cortex-a57 fixes it. (tested on trunk)

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #31 from PeteVine  ---
Indeed, that was it! I've probably found the source of my A53 issues:

http://openbenchmarking.org/result/1703040-RI-CRAYERRAT99

This means comment #29 exposes a different issue and Cortex A53 codegen still
is suboptimal.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #30 from PeteVine  ---
Or rather, the difference observed in:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468#c7

is still there @ -Ofast, but the Cortex-A53 result is in the same range now.
I'll have to investigate the effect of --fix-cortex-a53-835769 that was always
passed by default in the other image.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #29 from PeteVine  ---
I used a different distribution image (binutils 2.25, no
--fix-cortex-a53-835769 option) but the results haven't changed (thunderx
tuning must have improved though as it stopped offering any benefit over A53):

http://openbenchmarking.org/result/1703043-RI-CRAYDEBIA96

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-03-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from PeteVine  ---
Closing in favour of PR79581.

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #8 from PeteVine  ---
Seeing as unrolling does such a great job on aarch64, surpassing clang, should
we leave the ARM issue bunched together with this one?

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #6 from PeteVine  ---
The difference between clang and gcc is even greater on ARMv7 Cortex A5 but
there's no way to catch up through unrolling (no effect): 

gcc version 7.0.1 20170225: 1227.2 Kpos/sec
clang 3.6:                  1540.4 Kpos/sec

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #5 from PeteVine  ---
Clang however gets no further improvement from -funroll-loops meaning a simple
`-O3 -mcpu=cortex-a53` produces much better performance than gcc without
unrolling.

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #4 from PeteVine  ---
It's a gcc version 7.0.1 20170220 (experimental) (GCC) configured with:

--enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7
--enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --enable-plugin --with-system-zlib
--disable-browser-plugin --with-arch-directory=aarch64 --enable-multiarch
--enable-fix-cortex-a53-843419 --disable-werror --build=aarch64-linux-gnu
--host=aarch64-linux-gnu --target=aarch64-linux-gnu --enable-checking=release

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #2 from PeteVine  ---
Created attachment 40831
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40831&action=edit
inputs

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

--- Comment #1 from PeteVine  ---
Created attachment 40830
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40830&action=edit
C source

[Bug middle-end/79712] New: Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

Bug ID: 79712
   Summary: Clang smarter about unrolling in fhourstones benchmark
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40829&action=edit
preprocessed source

It seems clang is probably doing a better job at unrolling in the fhourstones
benchmark:

$ gcc -Wextra -Wall -Ofast -mcpu=cortex-a53 -march=armv8-a+crc -ftree-vectorize
SearchGame.i (-funroll-loops -fvariable-expansion-in-unroller
-ftree-loop-ivcanon -fivopts)
$ ./a.out < inputs

- clang 3.8 result: 3358 kpos/s
- gcc result: 3220 kpos/s
- gcc result with unrolling: 3473 kpos/s 

It would be nice if gcc could achieve similar performance to clang's -O3 out of
the box.

BTW, running the benchmark on 32-bit requires changing the %lu's to %llu's at
line 200 in the C source.
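The %lu/%llu mismatch mentioned above is a common portability snag: unsigned long is 64 bits wide on LP64 targets such as aarch64 but only 32 bits on 32-bit ARM. A minimal sketch of a width-independent alternative follows, assuming the counters fit in uint64_t. The helper name format_positions is hypothetical and is not taken from the fhourstones source.

```c
#include <inttypes.h>  /* PRIu64 */
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats a 64-bit counter identically on 32-bit
   and 64-bit targets. Returns whatever snprintf returns, i.e. the
   number of characters the full output needs (excluding the NUL). */
int format_positions(char *buf, size_t size, uint64_t positions)
{
    /* PRIu64 expands to the correct conversion for uint64_t on the
       current target, so no source change is needed per platform. */
    return snprintf(buf, size, "%" PRIu64 " pos", positions);
}
```

A caller would pass a buffer and its size, for example format_positions(buf, sizeof buf, count), and print buf afterwards.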

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665

--- Comment #13 from PeteVine  ---
Still, the 5% regression must have happened very recently. The fast gcc was
built on 20170220 and the slow one yesterday, using the original patch. Once
again, switching away from Cortex-A53 codegen restores the expected
performance.

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665

--- Comment #6 from PeteVine  ---
But that's related to -mcpu=cortex-a53 again, so never mind I guess.

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665

PeteVine  changed:

   What|Removed |Added

 CC||tulipawn at gmail dot com

--- Comment #5 from PeteVine  ---
Psst! GCC 7 was already 1.75x faster than Clang 3.8 on my aarch64 machine when
I benchmarked this code 3 weeks ago, but with this patch, it seems to take a 5%
hit.

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #4 from PeteVine  ---
> Judging by your -mcpu option is this on a Cortex-A5?

Yes, if you look at the results on a Cortex A53 running armv7 code, it doesn't
reproduce either, and A5-codegen is king :) (hopefully due to in-order design
or something)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659#c12

A quick question regarding -mcpu=cortex-a5 codegen; is there a similar switch
to llvm's `-slowfpvmlx` feature? (disable slow vmla/vmls), which the nice ARM
guy divulged here:

https://bugs.llvm.org//show_bug.cgi?id=26135#c9  

or is it a non-issue in gcc?

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-18 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #2 from PeteVine  ---
Created attachment 40769
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40769&action=edit
sphfract

The other file required to run the benchmark straight from bugzilla! :)

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

PeteVine  changed:

   What|Removed |Added

 Target||armv7

--- Comment #1 from PeteVine  ---
Distilled from PR79105 as a separate issue, not related to NEON and
autovectorization.

[Bug target/79581] New: VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

Bug ID: 79581
   Summary: VFP4 slower than VFP3 in C-ray
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40762
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40762&action=edit
preprocessed source

$ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv3 c-ray-mt.i -lm -lpthread 

$ ./a.out -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm

Rendering took: 2 seconds (2393 milliseconds)

$ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv4 c-ray-mt.i -lm -lpthread

$ ./a.out -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm

Rendering took: 2 seconds (2494 milliseconds)

This defect dates back to gcc 4.9 (or earlier) but at least gcc 7 provides a
big speedup in vfpv4 code. (roughly 2500 now vs 2700 previously)

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #28 from PeteVine  ---
Lesson learnt, thanks! 

If you look at the last -Ofast result (or 1702153-RI-CRAYFAST467), the suspect
difference is there (the compiler had been rebuilt from scratch with all the
patches), and I even managed to set a record performance with -mtune=thunderx.
I'm using an S905 SoC.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #26 from PeteVine  ---
OK, maybe this SoC is kinky, I give up:

http://openbenchmarking.org/result/1702154-RI-CRAYFAST326

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #25 from PeteVine  ---
The original issue never mentioned -Ofast or -ffast-math and I see no
difference at -Ofast, indeed:

http://openbenchmarking.org/result/1702153-RI-CRAYFAST424

@jgreenhalgh Can you confirm there's no regression @ -O3 as well? Thanks.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #24 from PeteVine  ---
I did a git pull and restarted the build so unless something didn't get
reconfigured, it definitely should've been included. If you see the
improvement, never mind then.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #12 from PeteVine  ---
Nice, PR68664 patch has fixed the issue.

FWIW, unlike previously, running on a Cortex-A53 showed perfect alignment with
core type (-mfpu=vfpv3) on the first run:

Cortex-A8
Rendering took: 1 seconds (1801 milliseconds)

Cortex-A5
Rendering took: 1 seconds (1708 milliseconds)

Cortex-A7
Rendering took: 1 seconds (1699 milliseconds)

Cortex-A9
Rendering took: 1 seconds (1644 milliseconds)

Cortex-A15
Rendering took: 1 seconds (1637 milliseconds)

whereas using -mfpu=vfpv4 favours Cortex-A5 code's execution:

Cortex-A8
Rendering took: 1 seconds (1803 milliseconds)

Cortex-A5
Rendering took: 1 seconds (1506 milliseconds)

Cortex-A7
Rendering took: 1 seconds (1636 milliseconds)

Cortex-A9
Rendering took: 1 seconds (1645 milliseconds)

Cortex-A15
Rendering took: 1 seconds (1643 milliseconds)

but that's probably expected. Not sure about A8's codegen performance though.

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

PeteVine  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|DUPLICATE   |---

--- Comment #22 from PeteVine  ---
@ktkachov I've just retested with your patches from PR68664 and it looks like
this could be a different issue, concerning Cortex-A53 codegen exclusively. 

Using -mcpu=thunderx demonstrates the problem (as well as gcc7 superiority
probably):

http://openbenchmarking.org/result/1702146-RI-1609039HA18

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480

--- Comment #4 from PeteVine  ---
Whereas `-fsanitize=address` aborts all the same:

==28821==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs
operator delete) on 0xaf012100
#0 0xb6af76fb in operator delete(void*, unsigned int)
(/usr/gcc7/lib/libasan.so.4+0xd86fb)
#1 0xe5e53 in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*)
[clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe5e53)
#2 0xe6773 in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*)
[clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe6773)
#3 0xf8ae7 in CLoad3DS::ProcessNextChunk(CModel*, Chunk*) [clone
.constprop.30] (/tmp/gl-117-1.3.2-src/src/gl-117+0xf8ae7)
#4 0xf8ecf in CLoad3DS::ProcessNextChunk(CModel*, Chunk*) [clone
.constprop.30] (/tmp/gl-117-1.3.2-src/src/gl-117+0xf8ecf)
#5 0x80537 in CLoad3DS::Import3DS(CModel*, char*) [clone .constprop.29]
(/tmp/gl-117-1.3.2-src/src/gl-117+0x80537)
#6 0x3e403 in myFirstInit() (/tmp/gl-117-1.3.2-src/src/gl-117+0x3e403)
#7 0x1b2a7 in main (/tmp/gl-117-1.3.2-src/src/gl-117+0x1b2a7)
#8 0xb659d66f in __libc_start_main
(/lib/arm-linux-gnueabihf/libc.so.6+0x1766f)

0xaf012100 is located 0 bytes inside of 7680-byte region
[0xaf012100,0xaf013f00)
allocated by thread T0 here:
#0 0xb6af66cf in operator new[](unsigned int)
(/usr/gcc7/lib/libasan.so.4+0xd76cf)
#1 0xe5a5b in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*)
[clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe5a5b)

SUMMARY: AddressSanitizer: alloc-dealloc-mismatch
(/usr/gcc7/lib/libasan.so.4+0xd86fb) in operator delete(void*, unsigned int)
==28821==HINT: if you don't care about these errors you may set
ASAN_OPTIONS=alloc_dealloc_mismatch=0
==28821==ABORTING

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480

--- Comment #3 from PeteVine  ---
That's the same command line that leads to an immediate crash (uninstrumented).

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480

--- Comment #2 from PeteVine  ---
OK, a binary built with:

-mcpu=cortex-a5 -O3 -ffast-math -marm -fomit-frame-pointer -fipa-pta
-mfpu=neon-vfpv4 -ftree-vectorize -flto -fsanitize=undefined

doesn't crash but prints many errors, e.g.:

3ds.cpp:111:8: runtime error: load of misaligned address 0x007de59a for type
'Uint32', which requires 4 byte alignment
0x007de59a: note: pointer points here
 00 00  4d 4d 0d be 00 00 02 00  0a 00 00 00 03 00 00 00  3d 3d b9 ab 00 00 3e
3d  0a 00 00 00 03 00
  ^ 
3ds.cpp:111:8: runtime error: load of misaligned address 0x007de5aa for type
'Uint32', which requires 4 byte alignment
0x007de5aa: note: pointer points here
 00 00  3d 3d b9 ab 00 00 3e 3d  0a 00 00 00 03 00 00 00  ff af fa 00 00 00 00
a0  14 00 00 00 30 31
  ^ 
3ds.cpp:125:8: runtime error: load of misaligned address 0x007de5e1 for type
'Uint16', which requires 2 byte alignment
0x007de5e1: note: pointer points here
 00 96 96  96 20 a0 0f 00 00 00 11  00 09 00 00 00 96 96 96  30 a0 0f 00 00 00
11 00  09 00 00 00 e5
  ^ 
3ds.cpp:111:8: runtime error: load of misaligned address 0x007de5e3 for type
'Uint32', which requires 4 byte alignment
0x007de5e3: note: pointer points here
 96  96 20 a0 0f 00 00 00 11  00 09 00 00 00 96 96 96  30 a0 0f 00 00 00 11 00 
09 00 00 00 e5 e5 e5
  ^ 
3ds.cpp:125:8: runtime error: load of misaligned address 0x007de5e7 for type
'Uint16', which requires 2 byte alignment
0x007de5e7: note: pointer points here
 0f 00 00 00 11  00 09 00 00 00 96 96 96  30 a0 0f 00 00 00 11 00  09 00 00 00
e5 e5 e5 40  a0 0e 00

LIBGL: warning, gles_glBlendFuncSeparate is NULL
main.cpp:4762:29: runtime error: index 256 out of bounds for type 'int [256]'
main.cpp:4762:10: runtime error: load of address 0x00427180 with insufficient
space for an object of type 'int'
0x00427180: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  01 00 00 00 00
00 00 00  d8 2a 61 00
  ^ 
main.cpp:4783:35: runtime error: index 65536 out of bounds for type 'unsigned
char [65536]'
main.cpp:4783:37: runtime error: store to address 0x00407180 with insufficient
space for an object of type 'unsigned char'
0x00407180: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00
00 00 00  00 00 00 00
  ^ 
main.cpp:4784:39: runtime error: index 65537 out of bounds for type 'unsigned
char [65536]'
main.cpp:4784:41: runtime error: store to address 0x00407181 with insufficient
space for an object of type 'unsigned char'
0x00407181: note: pointer points here
 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00
00 00  00 00 00 00 00
  ^ 
main.cpp:4785:39: runtime error: index 65538 out of bounds for type 'unsigned
char [65536]'
main.cpp:4785:41: runtime error: store to address 0x00407182 with insufficient
space for an object of type 'unsigned char'
0x00407182: note: pointer points here
 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00
00  00 00 00 00 00 00
  ^ 
main.cpp:4786:39: runtime error: index 65539 out of bounds for type 'unsigned
char [65536]'
main.cpp:4786:41: runtime error: store to address 0x00407183 with insufficient
space for an object of type 'unsigned char'
0x00407183: note: pointer points here
 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00
  ^ 
main.cpp:3384:44: runtime error: downcast of address 0x00cb3488 which does not
point to an object of type 'AIObj'
0x00cb3488: note: object is of type 'DynamicObj'
 61 01 00 00  c4 2f 29 00 00 00 00 00  00 00 00 00 01 1b c8 00  00 00 80 3f 00
00 80 3f  00 00 00 3f
  ^~~
  vptr for 'DynamicObj'
main.cpp:3385:19: runtime error: member access within address 0x00cc67d8 which
does not point to an object of type 'DynamicObj'
0x00cc67d8: note: object is of type 'CExplosion'
 61 00 00 00  e4 43 29 00 00 00 00 00  00 00 00 00 00 00 00 00  cd cc cc 3d 00
00 80 3f  00 00 00 3f
  ^~~
  vptr for 'CExplosion'
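The misaligned-load reports above point at 3ds.cpp reading Uint32 and Uint16 values directly from a byte buffer at odd addresses. Such loads are undefined behaviour, and they may be what surfaces as the alignment trap once NEON code is generated. A hedged sketch of the usual remedy is to assemble the value byte by byte, assuming the file data is little-endian (which the 4d 4d / 0d be 00 00 bytes in the dump suggest). The names load_le16 and load_le32 are hypothetical and are not functions from gl-117.

```c
#include <stdint.h>

/* Reads a 16-bit little-endian value from any address, aligned or not. */
uint16_t load_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Reads a 32-bit little-endian value from any address, aligned or not. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because only single bytes are read, the compiler is free to combine them into one load where the target allows unaligned access, and the result does not depend on host byte order.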

[Bug target/79480] New: -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480

Bug ID: 79480
   Summary: -O3 and -mfpu=neon produces crashing code on ARM
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

The gl-117 binary (source link attached) compiled with:

-mcpu=cortex-a5 -O3 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize

crashes with a SIGBUS plus this kernel info:

 Alignment trap: not handling instruction f4620adf at [<00067c8c>]
 Unhandled fault: alignment exception (0x001) at 0x0048fcb1

Moreover, using LTO prints a warning about undefined behaviour:

main.cpp: In function ‘_ZL11myTimerFunci.isra.26’:
main.cpp:4762:29: warning: iteration 256 invokes undefined behavior
[-Waggressive-loop-optimizations]
  int h = heat [yind] [i2];
 ^
main.cpp:4757:5: note: containing loop
 for (i2 = 0; i2 < maxfx + 1; i2 ++)
 ^

The code is rather old C++ (g++ warns: ISO C++ forbids converting a string
constant to ‘char*’) but as long as -mfpu=neon is not used it doesn't crash,
even with the above warning. Reporting just in case.
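The warning above says that iteration 256 of the loop reads past the end of a heat row, which implies a row of 256 elements while maxfx + 1 can exceed that. A minimal sketch of a bounded version of such a loop follows, under that assumption. HEAT_COLS and sum_heat_row are hypothetical names, and the summation only stands in for whatever the real loop body does.

```c
#include <assert.h>

#define HEAT_COLS 256  /* assumed row length, inferred from "iteration 256" */

/* Sums the first maxfx + 1 entries of one row without ever reading
   past the end of the row. */
long sum_heat_row(const int row[HEAT_COLS], int maxfx)
{
    assert(maxfx >= 0);

    /* Clamp the bound: index HEAT_COLS itself would be out of range. */
    int limit = (maxfx < HEAT_COLS) ? maxfx + 1 : HEAT_COLS;

    long sum = 0;
    for (int i2 = 0; i2 < limit; i2++)
        sum += row[i2];
    return sum;
}
```

Whether clamping or enlarging the array is the right fix depends on what the original code intended, which the report does not say.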

[Bug target/79370] Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370

PeteVine  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from PeteVine  ---
Oops, the meaning of -mcpu and -mtune must have switched places in my head :)

[Bug target/79370] New: Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370

Bug ID: 79370
   Summary: Cortex-A7 hardware division switched on for -mcpu but
not -mtune
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40667
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40667&action=edit
Preprocessed source

Compiling the attachment (produced from http://pastebin.com/X3ZTp1c0) only
emits sdiv instructions for -mcpu=cortex-a7 but not for the more logical case
of -mtune=cortex-a7

FWIW, running the produced "tuned" code on a Cortex-A53 (32bit mode) incurs a
6x penalty compared with -mcpu which is fully consistent with targeting a
soft-division CPU. 

gcc version 7.0.1 20170127, switches used:
-Ofast -mtune=cortex-a7 -marm -mfloat-abi=hard -mfpu=vfpv4
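For reference, the distinction behind this report: in GCC, -mcpu selects both the instruction-set features and the tuning of the named core, whereas -mtune only changes scheduling and cost decisions and leaves the architecture at its configured default. A small test case is sketched below. The file name div.c and the function name are made up, and the instructions named in the comments are what one would normally expect with an armv7-a default, not verified output.

```c
/* div.c: hypothetical test case for observing the -mcpu / -mtune difference.
 *
 *   gcc -O2 -marm -mcpu=cortex-a7  -S div.c
 *       sdiv expected, since the named core has hardware divide
 *   gcc -O2 -marm -mtune=cortex-a7 -S div.c
 *       with an armv7-a default architecture, a call to the
 *       __aeabi_idiv library routine is expected instead
 */
int quotient(int numerator, int denominator)
{
    /* Division by a run-time value, so the compiler cannot replace it
       with a multiply-and-shift sequence. */
    return numerator / denominator;
}
```

Inspecting the generated div.s for sdiv versus a branch to __aeabi_idiv shows which case applies on a given toolchain.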

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #11 from PeteVine  ---
Super cool, thanks! That makes the OP a true prophet before his time ;)

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659

--- Comment #9 from PeteVine  ---
@jgreenhalgh Please have a look at the profiled assembly for both fast and slow
codegen. (attached)

According to @aldyh's bisection in #68664 this probably isn't the same issue.

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239

--- Comment #5 from PeteVine  ---
Yes, this came from the gl4es project, and compiling the whole thing normally,
only gcc7 is affected.

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239

--- Comment #2 from PeteVine  ---
gcc -O2 or above elicits the ICE, configured with:

--enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7
--enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib
--without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=gcc4-compatible --disable-libstdcxx-dual-abi
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--enable-plugin --with-system-zlib --disable-browser-plugin
--with-arch-directory=arm --enable-multiarch --enable-multilib
--disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3
--with-float=hard --with-mode=arm --disable-werror --enable-multilib
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf --enable-checking=release

[Bug target/79239] New: [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239

Bug ID: 79239
   Summary: [7 regression] ICE in extract_insn, at recog.c:2311
(error: unrecognizable insn)
   Product: gcc
   Version: 7.0.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40586
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40586&action=edit
Preprocessed source

gcc version 7.0.1 20170122 (experimental) (GCC) ICEs on the attached
preprocessed source:

../src/gl/buffers.c: In function ‘gl4es_glBindBuffer’:
../src/gl/buffers.c:113:1: error: unrecognizable insn:
 }
 ^
(insn 130 129 131 12 (set (reg:DF 612)
(fma:DF (reg:DF 611)
(reg:DF 613)
(reg:DF 614))) "../src/gl/buffers.h":17 -1
 (nil))
../src/gl/buffers.c:113:1: internal compiler error: in extract_insn, at
recog.c:2311

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-01-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468

--- Comment #21 from PeteVine  ---
It would be great if https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 could
get squashed in one fell swoop.

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105

--- Comment #2 from PeteVine  ---
$ gcc -v

Configured with: ../configure -v --enable-languages=c,c++,fortran
--prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id
--libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=gcc4-compatible
--enable-gnu-unique-object --disable-libitm --disable-libquadmath
--enable-plugin --with-system-zlib --disable-browser-plugin
--with-arch-directory=arm --enable-multiarch --enable-multilib
--disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3
--with-float=hard --with-mode=arm --disable-werror --enable-multilib
--build=arm-linux-gnueabihf --host=arm-linux-gnueabihf
--target=arm-linux-gnueabihf --enable-checking=release
Thread model: posix
gcc version 7.0.0 20170114 (experimental) (GCC)

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105

--- Comment #1 from PeteVine  ---
Updated to include an explicit -mfpu=neon-vfpv4. 

http://openbenchmarking.org/result/1701179-TA-1701143TA49

I'm not sure whether -mcpu=cortex-a5 with -mfpu=neon should already imply VFPv4,
but adding it explicitly has fixed a few results while making others worse.

[Bug target/79105] New: Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-16 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105

Bug ID: 79105
   Summary: Autovectorized NEON code slower than vfpv4 on Cortex
A5
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

As the title says, many results seem to suffer from switching to -mfpu=neon,
etc.

http://openbenchmarking.org/result/1701165-TA-1701143TA78

Could anyone explain the abnormally small difference between armv7 and aarch64
in OpenSSL?

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #6 from PeteVine  ---
It's possible I already had that patch included in my build, but 
in case I didn't, here's a quick addition to the previous result:

http://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66

The c-ray thunderx result suggests A53 codegen is still suboptimal. The patch
has had no effect on the original issue.

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #4 from PeteVine  ---
I'm delighted to report **not** targeting Cortex-A53 actually incurs a
performance penalty sometimes ;)

http://openbenchmarking.org/result/1701128-TA-GCCCOMPAR79

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #3 from PeteVine  ---
Hey, that works for me too! (62565 vs 70758 in favour of -Ofast). Usefully
strange :)

[Bug middle-end/78994] New: -Ofast makes aarch64 C++ benchmark slower

2017-01-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

Bug ID: 78994
   Summary: -Ofast makes aarch64 C++ benchmark slower
   Product: gcc
   Version: 5.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40463
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40463&action=edit
Preprocessed source + assembly files

After make && ./build/dsp-bench, the results look as follows

@ -O3 -mcpu=cortex-a53 -ftree-vectorize

iir:    67945 ns per loop
iir_2:  67952 ns per loop

@ -Ofast -mcpu=cortex-a53 -ftree-vectorize

iir:    73367 ns per loop
iir_2:  73349 ns per loop

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #13 from PeteVine  ---
Also, could these (sample) warnings actually matter when using ld.gold? NB,
lra-constraints.c features in the previously provided backtrace:

../../libdecnumber/decNumber.c:3582:0: note: code may be misoptimized unless
-fno-strict-aliasing is used
../../gcc/../libdecnumber/decNumber.h:118:0: warning: type of
‘decNumberFromString’ does not match original declaration [-Wlto-type-mismatch]
   decNumber * decNumberFromString(decNumber *, const char *, decContext *);

../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’
does not match original declaration [-Wlto-type-mismatch]
   decNumber * decNumberFromString(decNumber *, const char *, decContext *);

../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’
does not match original declaration [-Wlto-type-mismatch]
   decNumber * decNumberFromString(decNumber *, const char *, decContext *);

../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’
does not match original declaration [-Wlto-type-mismatch]
   decNumber * decNumberFromString(decNumber *, const char *, decContext *);

../../libdecnumber/decNumber.c:489:0: note: ‘decNumberFromString’ was
previously declared here
 decNumber * decNumberFromString(decNumber *dn, const char chars[],

../../libdecnumber/decNumber.c:489:0: note: code may be misoptimized unless
-fno-strict-aliasing is used
../../gcc/vec.h:1552:1: warning: ‘safe_push’ violates the C++ One Definition
Rule  [-Wodr]
 vec::safe_push (const T  MEM_STAT_DECL)
 ^
../../gcc/vec.h:1552:1: note: return value type mismatch
 vec::safe_push (const T  MEM_STAT_DECL)
 ^
../../gcc/loop-invariant.c:100:8: note: type ‘struct invariant’ itself violate
the C++ One Definition Rule
 struct invariant
^
../../gcc/lra-constraints.c:4742:8: note: the incompatible type is defined here
 struct invariant
^
../../gcc/vec.h:1552:1: note: ‘safe_push’ was previously declared here
 vec::safe_push (const T  MEM_STAT_DECL)
 ^
../../gcc/vec.h:1552:1: note: code may be misoptimized unless
-fno-strict-aliasing is used
../../gcc/vec.h:1540:1: warning: ‘quick_push’ violates the C++ One Definition
Rule  [-Wodr]
 vec::quick_push (const T )
 ^
../../gcc/vec.h:1540:1: note: return value type mismatch
 vec::quick_push (const T )
 ^
../../gcc/loop-invariant.c:100:8: note: type ‘struct invariant’ itself violate
the C++ One Definition Rule
 struct invariant
^
../../gcc/lra-constraints.c:4742:8: note: the incompatible type is defined here
 struct invariant
^
../../gcc/vec.h:1540:1: note: ‘quick_push’ was previously declared here
 vec::quick_push (const T )
 ^
../../gcc/vec.h:1540:1: note: code may be misoptimized unless
-fno-strict-aliasing is used

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

PeteVine  changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #12 from PeteVine  ---
The crash still reproduces on aarch64. Any suggestions on the ld.gold
connection?

[Bug c++/69481] ICE with C++11 alias using with templates

2016-12-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481

PeteVine  changed:

   What|Removed |Added

 CC||tulipawn at gmail dot com

--- Comment #10 from PeteVine  ---
I've just hit the same ICE trying to build HHVM on aarch64 with g++7 20161202:

[  5%] Building CXX object
third-party/folly/CMakeFiles/folly.dir/src/folly/ThreadCachedArena.cpp.o
In file included from
/mnt/odroid/hhvm/third-party/folly/folly/detail/ThreadLocalDetail.h:32:0,
 from
/mnt/odroid/hhvm/third-party/folly/folly/ThreadLocal.h:59,
 from
/mnt/odroid/hhvm/third-party/folly/folly/ThreadCachedArena.h:24,
 from
/mnt/odroid/hhvm/third-party/folly/src/folly/ThreadCachedArena.cpp:17:
/mnt/odroid/hhvm/third-party/folly/folly/Function.h:653:3: internal compiler
error: same canonical type node for different types
folly::Function::Traits and
folly::detail::function::FunctionTraits
   SharedProxy asSharedProxy() && {
   ^~~

[Bug bootstrap/78220] New: Add 'remounting exec' suggestion

2016-11-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78220

Bug ID: 78220
   Summary: Add 'remounting exec' suggestion
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Restarting a build on a `noexec` partition fails with:

checking whether the C compiler works... configure: error: in
`/mnt/gcc-svn-master/build-lto/gcc':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.

The above suggestion could include the most obvious, i.e. exec permissions, for
example: 

"Check if the partition hasn't been (re)mounted noexec."

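One way the hint suggested above could be backed by an actual check is to look for noexec among the mount options of the file system holding the build directory, for example the fourth field of the matching line in /proc/mounts on Linux. A minimal sketch of the option test only follows, assuming the caller has already extracted that field. has_mount_option is a hypothetical helper and is not part of GCC's configure machinery.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `option` appears as a whole entry in a comma-separated
   mount option string such as "rw,nosuid,noexec,relatime". */
bool has_mount_option(const char *options, const char *option)
{
    size_t want = strlen(option);
    const char *p = options;

    while (*p != '\0') {
        size_t len = strcspn(p, ",");  /* length of the current entry */
        if (len == want && strncmp(p, option, len) == 0)
            return true;
        p += len;
        if (*p == ',')
            p++;                       /* skip the separator */
    }
    return false;
}
```

Matching whole entries avoids false positives on longer options that merely contain the string being searched for.
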
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-10-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|ICE during LTO bootstrap on |ICE during LTO bootstrap on
                   |AARCH64 with extra options  |AARCH64 with gold linker

--- Comment #11 from PeteVine  ---
OK, it was about neither binutils nor the special flags. I was able to complete
the build using binutils 2.27 and the full set of flags, provided ld.bfd was
used and --enable-checking=release was not.

The latter option was leading to discrepancies on 32/64-bit ARM platforms.

[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730

--- Comment #8 from PeteVine  ---
I thought I had made it clear this was just a heads-up. All relevant data is
already there, and anyone willing to look closer can simply run the benchmark
on any machine/platform, e.g.:

$ phoronix-test-suite benchmark 1609257-LO-FORTRANAA01

and so on.

[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917

PeteVine  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |WORKSFORME

[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917

--- Comment #11 from PeteVine  ---
Well, I finally managed to complete an LTO bootstrap on ARM (even leaving the
full complement of C(XX)FLAGS in place, bar -flto) but it seems using ld.bfd is
a must.

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #10 from PeteVine  ---
Hold on, I'm investigating the effect of a binutils downgrade (2.27 -> 2.26)
and of switching to ld.bfd. Strange that it all works fine during a normal
bootstrap, regardless of the codegen options.

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #9 from PeteVine  ---
Ha! It's not about the extra options, thank you! ;)

Completely vanilla environment and gcc 5.4 produce this after a while:

../build-lto-noflags/./gcc/xgcc -B../build-lto-noflags/./gcc/
-B/usr/gcc7/aarch64-linux-gnu/bin/ -B/usr/gcc7/aarch64-linux-gnu/lib/ -isystem
/usr/gcc7/aarch64-linux-gnu/include -isystem
/usr/gcc7/aarch64-linux-gnu/sys-include -g -O2 -O2  -g -O2 -DIN_GCC -W
-Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition  -isystem ./include   -fPIC -g
-DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector   -fPIC -I. -I.
-I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc
-I../../../libgcc/../include  -DHAVE_CC_TLS  -o _muldi3.o -MT _muldi3.o -MD -MP
-MF _muldi3.dep -DL_muldi3 -c ../../../libgcc/libgcc2.c -fvisibility=hidden
-DHIDE_EXPORTS
In file included from /usr/include/string.h:630:0,
 from ../../../libgcc/../gcc/tsystem.h:100,
 from ../../../libgcc/libgcc2.c:27:
/usr/include/aarch64-linux-gnu/bits/string2.h: In function ‘__strspn_c2’:
/usr/include/aarch64-linux-gnu/bits/string2.h:1039:3: internal compiler error:
Segmentation fault
   while (__s[__result] == __accept1 || __s[__result] == __accept2)
   ^
0x84173b crash_signal
../../gcc/toplev.c:338
0x5d9020 c_parser_binary_expression
../../gcc/c/c-parser.c:6855
0x5d9a23 c_parser_conditional_expression
../../gcc/c/c-parser.c:6495
0x5d9f4b c_parser_expr_no_commas
../../gcc/c/c-parser.c:6412
0x5dc2bf c_parser_expression
../../gcc/c/c-parser.c:8608
0x5dd823 c_parser_expression_conv
../../gcc/c/c-parser.c:8641
0x5dd89b c_parser_condition
../../gcc/c/c-parser.c:5458
0x5df1bf c_parser_paren_condition
../../gcc/c/c-parser.c:5477
0x5ebd1f c_parser_while_statement
../../gcc/c/c-parser.c:5789
0x5e7293 c_parser_statement_after_labels
../../gcc/c/c-parser.c:5265
0x5e608f c_parser_compound_statement_nostart
../../gcc/c/c-parser.c:4944
0x5e68b3 c_parser_compound_statement
../../gcc/c/c-parser.c:4777
0x5e5853 c_parser_declaration_or_fndef
../../gcc/c/c-parser.c:2176
0x5f3ecb c_parser_external_declaration
../../gcc/c/c-parser.c:1574
0x5f4907 c_parser_translation_unit
../../gcc/c/c-parser.c:1454
0x5f4907 c_parse_file()
../../gcc/c/c-parser.c:18173
0xe3f497 c_common_parse_file()
../../gcc/c-family/c-opts.c:1087

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #8 from PeteVine  ---
FWIW, here's the corresponding backtrace:

#0  0x00afa00c in df_get_live_out () at ../../gcc/df.h:1159
#1  update_ebb_live_info (tail=, head=0x137b838
) at ../../gcc/lra-constraints.c:5612
#2  lra_inheritance () at ../../gcc/lra-constraints.c:6251
#3  0x00b02da0 in lra (f=) at ../../gcc/lra.c:2403
#4  0x00e8cb60 in do_reload () at ../../gcc/ira.c:5374
#5  execute (this=) at ../../gcc/ira.c:5558
#6  0x00abdd80 in execute_one_pass (pass=pass@entry=0x1451ac0) at
../../gcc/passes.c:2341
#7  0x00abe29c in execute_pass_list_1 (pass=0x1451ac0) at
../../gcc/passes.c:2430
#8  0x00abe2b0 in execute_pass_list_1 (pass=0x1450a40) at
../../gcc/passes.c:2431
#9  0x00abe308 in execute_pass_list (fn=,
pass=) at ../../gcc/passes.c:2441
#10 0x00d1e148 in cgraph_node::expand (this=this@entry=0x7fb7431000) at
../../gcc/cgraphunit.c:2001
#11 0x00d1f1f8 in expand_all_functions () at
../../gcc/cgraphunit.c:2137
#12 symbol_table::_ZN12symbol_table7compileEv.part.49(void)
(this=this@entry=0x7fb7798000) at ../../gcc/cgraphunit.c:2494
#13 0x00d1f674 in symbol_table::compile (this=0x7fb7798000) at
../../gcc/cgraphunit.c:2400
#14 symbol_table::finalize_compilation_unit (this=0x7fb7798000) at
../../gcc/cgraphunit.c:2584
#15 0x00863ef8 in compile_file () at ../../gcc/toplev.c:493
#16 0x00573f80 in do_compile () at ../../gcc/toplev.c:2012
#17 toplev::main (this=this@entry=0x7fdf68, argc=,
argc@entry=92, argv=, argv@entry=0x7fe0b8) at
../../gcc/toplev.c:2146
#18 0x0057345c in main (argc=92, argv=0x7fe0b8) at
../../gcc/main.c:39

I'm going to clear all flags in the future if that's the expected way.

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

--- Comment #7 from PeteVine  ---
Restarted the whole thing from scratch using gcc 5.4 and it segfaulted again.

../../../libgcc/libgcc2.c: In function ‘__powitf2’:
../../../libgcc/libgcc2.c:1851:1: internal compiler error: Segmentation fault
 }
 ^
CXX/CFLAGS as above, configured as follows:

../configure --enable-languages=c,c++,fortran --prefix=/usr/gcc7
--program-suffix=-7 --enable-shared --enable-linker-build-id
--libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-libquadmath --enable-plugin
--with-system-zlib --disable-browser-plugin --with-arch-directory=aarch64
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror
--build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
--enable-checking=release --with-build-config=bootstrap-lto
