[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #4 from PeteVine --- Created attachment 42934 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42934&action=edit corresponding C/gcda/gcno files
[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #3 from PeteVine ---
OK, the following command was used to obtain the gcno/gcda files:

$ gcc-8 -O3 -fprofile-generate -ftest-coverage sudoku.c && ./a.out

Unlike the gcda file, gcno is dumpable with gcov-dump-8.
[Bug gcov-profile/83509] New: gcov-dump-8 unable to dump any gcda files
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509

            Bug ID: 83509
           Summary: gcov-dump-8 unable to dump any gcda files
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Created attachment 42931 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42931&action=edit
example gcda file created with gcc 8.0

As per https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614#c16 not being able to dump previous versions' format is to be expected but gcov-dump-8 fails at dumping files created by gcc-8 as well:

$ gcov-dump-8 sudoku.gcda
sudoku.gcda:data:magic `gcda':version `A80e'
sudoku.gcda:stamp 1935438558
sudoku.gcda:tag `0052' is invalid
sudoku.gcda:0052:2583412121:UNKNOWN

The file can still be dumped with previous versions of gcov-dump.
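
The first three fields in the dump above (magic, version, stamp) can be checked without gcov-dump at all. The following is a minimal sketch, not gcov-dump's actual code. It assumes the layout described in GCC's gcov-io.h for this era, where the file starts with three 32-bit words (magic, version, stamp) in the byte order of the host that wrote it. The function names are hypothetical and the sample bytes are synthetic, built from the values printed above rather than read from the attachment.

```python
import struct

GCDA_MAGIC = 0x67636461  # the ASCII bytes "gcda" packed into one 32-bit word


def word_to_text(word):
    """Render a 32-bit word as four characters, most significant byte first."""
    return word.to_bytes(4, "big").decode("latin-1")


def read_gcda_header(data):
    """Return (magic, version, stamp) from the first 12 bytes of a gcda file.

    Assumption: the header is three 32-bit words. Both byte orders are
    tried, and the one whose first word equals the gcda magic is used.
    """
    if len(data) < 12:
        raise ValueError("need at least 12 bytes for the header")
    for fmt in ("<3I", ">3I"):
        magic, version, stamp = struct.unpack_from(fmt, data)
        if magic == GCDA_MAGIC:
            return word_to_text(magic), word_to_text(version), stamp
    raise ValueError("not a gcda file: bad magic")


# Synthetic little-endian header carrying the values from the dump above.
sample = struct.pack("<3I", GCDA_MAGIC, 0x41383065, 1935438558)
print(read_gcda_header(sample))  # ('gcda', 'A80e', 1935438558)
```

If the header decodes cleanly, as it does here, the failure reported by gcov-dump-8 lies in how the records after the header are interpreted, not in the header itself.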
[Bug gcov-profile/82614] GCOV crashes while parsing gcda file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614 --- Comment #15 from PeteVine --- No, that's not it - gcov-dump 6/7 have no problem dumping previous versions. I'm just not sure if the problem with gcov-dump-8 is architecture specific (ARM) or it's something to do with my setup. I'm going to leave it there.
[Bug gcov-profile/82614] GCOV crashes while parsing gcda file
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614 --- Comment #13 from PeteVine ---
Almost certainly not related, but there's been some sort of regression in gcov-dump from GCC 8 branch. Trying to dump any *.gcda file (ver. 8 included) ends like this:

$ gcov-dump-8 Unified_cpp_js_src25.gcda
Unified_cpp_js_src25.gcda:data:magic `gcda':version `504*'
Unified_cpp_js_src25.gcda:warning:current version is `A80e'
Unified_cpp_js_src25.gcda:stamp 532248120
Unified_cpp_js_src25.gcda:tag `01ba' is invalid
Unified_cpp_js_src25.gcda:01ba:3336454216:UNKNOWN
[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #25 from PeteVine --- So, the profile data is probably fine, and judging from the size of the final binary, it's being used. The fix could be real after all :)
[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #24 from PeteVine ---
Or maybe not, gcov-dump-6 is able to read the file.

$ gcov-dump-6 sudoku.gcda.good
sudoku.gcda.good:data:magic `gcda':version `A80e'
sudoku.gcda.good:warning:current version is `603*'
sudoku.gcda.good:stamp 46451024
sudoku.gcda.good: a300: 82:PROGRAM_SUMMARY checksum=0x1b6601f6
sudoku.gcda.good: counts=29, runs=1, sum_all=283905002, run_max=58689000, sum_max=58689000
sudoku.gcda.good: counter histogram:
sudoku.gcda.good: 0: num counts=2, min counter=0, cum_counter=0
sudoku.gcda.good: 1: num counts=2, min counter=1, cum_counter=2
sudoku.gcda.good: 35: num counts=6, min counter=1000, cum_counter=6000
sudoku.gcda.good: 41: num counts=2, min counter=3000, cum_counter=6000
sudoku.gcda.good: 48: num counts=2, min counter=9000, cum_counter=18000
sudoku.gcda.good: 54: num counts=1, min counter=27000, cum_counter=27000
sudoku.gcda.good: 55: num counts=1, min counter=29000, cum_counter=29000
sudoku.gcda.good: 58: num counts=1, min counter=52000, cum_counter=52000
sudoku.gcda.good: 60: num counts=2, min counter=81000, cum_counter=162000
sudoku.gcda.good: 82: num counts=1, min counter=3531000, cum_counter=3531000
sudoku.gcda.good: 86: num counts=4, min counter=6469000, cum_counter=26033000
sudoku.gcda.good: 92: num counts=1, min counter=19563000, cum_counter=19563000
sudoku.gcda.good: 98: num counts=4, min counter=58411000, cum_counter=234478000
sudoku.gcda.good: 0100: 3:FUNCTION ident=108032747, lineno_checksum=0x0ceca33f, cfg_checksum=0x73ff2042
sudoku.gcda.good: 01a1: 6:COUNTERS arcs 3 counts
sudoku.gcda.good: 01af: 2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100: 3:FUNCTION ident=82881, lineno_checksum=0x3ae31d81, cfg_checksum=0x707619b8
sudoku.gcda.good: 01a1: 14:COUNTERS arcs 7 counts
sudoku.gcda.good: 01af: 2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100: 3:FUNCTION ident=1633341470, lineno_checksum=0xf25ea178, cfg_checksum=0x1bd90f34
sudoku.gcda.good: 01a1: 22:COUNTERS arcs 11 counts
sudoku.gcda.good: 01af: 2:COUNTERS ior 1 counts
sudoku.gcda.good: 0100: 3:FUNCTION ident=535938890, lineno_checksum=0x375a9f34, cfg_checksum=0x5d41b59e
sudoku.gcda.good: 01a1: 16:COUNTERS arcs 8 counts
sudoku.gcda.good: 01af: 2:COUNTERS ior 1 counts
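
One quick sanity check on a dump like the one above is whether the histogram agrees with the PROGRAM_SUMMARY line: the per-bucket "num counts" should add up to counts, and the per-bucket cum_counter values should add up to sum_all. The sketch below parses the text that gcov-dump-6 printed and does that arithmetic. It is only an illustration; the function name is hypothetical, and the relationship between the fields is inferred from the numbers in this dump, not taken from gcov documentation.

```python
import re

# Summary and histogram lines copied from the gcov-dump-6 output above.
DUMP = """\
sudoku.gcda.good: counts=29, runs=1, sum_all=283905002, run_max=58689000, sum_max=58689000
sudoku.gcda.good: 0: num counts=2, min counter=0, cum_counter=0
sudoku.gcda.good: 1: num counts=2, min counter=1, cum_counter=2
sudoku.gcda.good: 35: num counts=6, min counter=1000, cum_counter=6000
sudoku.gcda.good: 41: num counts=2, min counter=3000, cum_counter=6000
sudoku.gcda.good: 48: num counts=2, min counter=9000, cum_counter=18000
sudoku.gcda.good: 54: num counts=1, min counter=27000, cum_counter=27000
sudoku.gcda.good: 55: num counts=1, min counter=29000, cum_counter=29000
sudoku.gcda.good: 58: num counts=1, min counter=52000, cum_counter=52000
sudoku.gcda.good: 60: num counts=2, min counter=81000, cum_counter=162000
sudoku.gcda.good: 82: num counts=1, min counter=3531000, cum_counter=3531000
sudoku.gcda.good: 86: num counts=4, min counter=6469000, cum_counter=26033000
sudoku.gcda.good: 92: num counts=1, min counter=19563000, cum_counter=19563000
sudoku.gcda.good: 98: num counts=4, min counter=58411000, cum_counter=234478000
"""


def check_histogram(dump_text):
    """Compare histogram totals with the summary line of a gcov-dump listing.

    Returns a dict with the summary values and the totals recomputed from
    the histogram buckets.
    """
    # "counts=" on the summary line, but not the "num counts=" of a bucket.
    counts = int(re.search(r"(?<!num )counts=(\d+)", dump_text).group(1))
    sum_all = int(re.search(r"sum_all=(\d+)", dump_text).group(1))
    bucket_counts = [int(v) for v in re.findall(r"num counts=(\d+)", dump_text)]
    bucket_sums = [int(v) for v in re.findall(r"cum_counter=(\d+)", dump_text)]
    return {
        "counts": counts,
        "histogram_counts": sum(bucket_counts),
        "sum_all": sum_all,
        "histogram_sum": sum(bucket_sums),
    }


result = check_histogram(DUMP)
print(result["counts"] == result["histogram_counts"])  # True
print(result["sum_all"] == result["histogram_sum"])    # True
```

Both checks pass for this file, which fits the observation that gcov-dump-6 reads it without trouble.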
[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #23 from PeteVine ---
$ gcov-dump-6 sudoku.gcda.bad
sudoku.gcda.bad:data:magic `gcda':version `603*'
sudoku.gcda.bad:stamp 46515746
sudoku.gcda.bad: a300: 77:PROGRAM_SUMMARY checksum=0x12ec1c02
sudoku.gcda.bad:counts=29, runs=1, sum_all=342403001, run_max=58689000, sum_max=58689000
sudoku.gcda.bad:counter histogram:
sudoku.gcda.bad:0: num counts=2, min counter=0, cum_counter=0
sudoku.gcda.bad:1: num counts=1, min counter=1, cum_counter=1
sudoku.gcda.bad:35: num counts=6, min counter=1000, cum_counter=6000
sudoku.gcda.bad:41: num counts=1, min counter=3000, cum_counter=3000
sudoku.gcda.bad:48: num counts=3, min counter=9000, cum_counter=27000
sudoku.gcda.bad:54: num counts=1, min counter=27000, cum_counter=27000
sudoku.gcda.bad:55: num counts=1, min counter=29000, cum_counter=29000
sudoku.gcda.bad:60: num counts=3, min counter=81000, cum_counter=243000
sudoku.gcda.bad:82: num counts=1, min counter=3531000, cum_counter=3531000
sudoku.gcda.bad:86: num counts=4, min counter=6469000, cum_counter=26033000
sudoku.gcda.bad:92: num counts=1, min counter=19563000, cum_counter=19563000
sudoku.gcda.bad:98: num counts=5, min counter=58411000, cum_counter=292941000
sudoku.gcda.bad: 0100: 3:FUNCTION ident=108032747, lineno_checksum=0x0ceca33f, cfg_checksum=0x73ff2042
sudoku.gcda.bad: 01a1: 6:COUNTERS arcs 3 counts
sudoku.gcda.bad: 01b1: 2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100: 3:FUNCTION ident=82881, lineno_checksum=0x3ae31d81, cfg_checksum=0x707619b8
sudoku.gcda.bad: 01a1: 14:COUNTERS arcs 7 counts
sudoku.gcda.bad: 01b1: 2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100: 3:FUNCTION ident=1633341470, lineno_checksum=0xf25ea178, cfg_checksum=0x88a084d7
sudoku.gcda.bad: 01a1: 22:COUNTERS arcs 11 counts
sudoku.gcda.bad: 01b1: 2:COUNTERS time_profiler 1 counts
sudoku.gcda.bad: 0100: 3:FUNCTION ident=535938890, lineno_checksum=0x375a9f34, cfg_checksum=0x5d41b59e
sudoku.gcda.bad: 01a1: 16:COUNTERS arcs 8 counts
sudoku.gcda.bad: 01b1: 2:COUNTERS time_profiler 1 counts

whereas:

$ gcov-dump-8 sudoku.gcda.good
sudoku.gcda.good:data:magic `gcda':version `A80e'
sudoku.gcda.good:stamp 46451024
sudoku.gcda.good:tag `0052' is invalid
sudoku.gcda.good:0052:459670006:UNKNOWN

so it looks like the profile data is not usable and hence no pessimization? That's probably not the fix I was hoping for, oops!
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #22 from PeteVine ---
> I don't know what exactly "fixed" this

That would be nice to know. This I can say for sure: gcc 7.2.1 20171116 still produces slower profiled code on the target system. I've also discovered that compiling and profiling on a binary-compatible Cortex A17 system (same flags) produces binaries that don't run any slower on the target system.
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from PeteVine ---
The bug doesn't reproduce in a recent GCC 8 build (profiling on a Cortex A5 system). The generated assembly contains no __aeabi_idiv calls whatsoever. Well done.
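
Checking an assembly listing for division library calls, as described in the comment above, can be automated. This is a minimal sketch under stated assumptions: the input is GCC-style ARM assembly text in which calls to the EABI integer division helpers (__aeabi_idiv, __aeabi_uidiv and their "mod" variants) appear as bl or blx instructions. The function name and the sample listing are made up for illustration and are not taken from the attachments.

```python
import re

# Branch-and-link instructions targeting the ARM EABI integer division
# helpers: __aeabi_idiv, __aeabi_uidiv, __aeabi_idivmod, __aeabi_uidivmod.
DIV_CALL = re.compile(r"^\s*blx?\s+(__aeabi_u?idiv(?:mod)?)\b", re.MULTILINE)


def count_division_calls(asm_text):
    """Count calls to each division helper found in an assembly listing."""
    counts = {}
    for name in DIV_CALL.findall(asm_text):
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical listing: two library calls and one hardware sdiv.
sample = """\
solve:
        mov     r1, #9
        bl      __aeabi_idiv
        bl      __aeabi_idivmod
        sdiv    r0, r0, r1
        bl      printf
"""
print(count_division_calls(sample))
# {'__aeabi_idiv': 1, '__aeabi_idivmod': 1}
```

An empty result for a real listing would correspond to the "no __aeabi_idiv calls whatsoever" observation.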
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #19 from PeteVine --- Created attachment 42694 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42694&action=edit Better assembly after profiling
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from PeteVine ---
In case the changed behaviour of -frename-registers is not actually a feature, please reopen.
[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933 --- Comment #5 from PeteVine ---
You're right, sorry for the confusion. It seems I skimmed over the wall of errors too quickly while the last one came from a different source file.

According to my own results in bug #77730, I was somehow able to build the benchmark with earlier versions of gcc 6/7 using the -std=f95 flag.

Could you try the following steps?

wget http://www.phoronix-test-suite.com/benchmark-files/dolfyn-cfd_0.527.tgz
tar xvf dolfyn-cfd_0.527.tgz
cd dolfyn-cfd_0.527/src/
sed -i 's/F90FLAGS = -O2/F90FLAGS = -O2 -std=f95/' Makefile
make
[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933 --- Comment #3 from PeteVine ---
In gcc 8, -std=f2003 is required to overcome the issue but there's another failure later on:

gfortran -c -O2 -std=f2003 solverinterface.f90
solverinterface.f90:108:9:
 real*4 fpar(16) ! hint by Shibo
 1
Error: GNU Extension: Nonstandard type declaration REAL*4 at (1)
solverinterface.f90:153:7:
 fpar(1) = RTOL(VarSC)
 1
Error: Unclassifiable statement at (1)
solverinterface.f90:154:7:
 fpar(2) = ATOL(VarSC)
 1
Error: Unclassifiable statement at (1)
solverinterface.f90:156:7:
 fpar(1) = RTOL(ivar) ! relative tolerance, must be between (0, 1)
 1
Error: Unclassifiable statement at (1)
solverinterface.f90:157:7:
 fpar(2) = ATOL(ivar) ! absolute tolerance, must be positive
 1
Error: Unclassifiable statement at (1)
solverinterface.f90:176:1:
 NNZ, Alu,Jlu,Ju,Jw )
 1

Are there any gfortran switches that would enable compiling this code base as is?
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #7 from PeteVine --- Thanks for pointing that out! I was using my bash history to change the CFLAGS and when I was flipping the crc switch I didn't notice I'd picked a version without -frename-registers, hence this wrong conclusion :) Definitely then, -frename-registers it is! http://openbenchmarking.org/result/1707307-RI-CORTEXA5313
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #5 from PeteVine --- Turns out the GCC 8 regression is caused by the +crc switch in -march=armv8-a+crc. Interesting, eh?
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #8 from PeteVine --- I've just confirmed the result on a newer Linux distribution (Ubuntu 16.04) and the difference between VFPv3 and v4 is clearly there (2330 vs 2560) using gcc 5.4. Unless the CPU itself requires an erratum, that probably leaves suboptimal codegen as the main suspect.
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #7 from PeteVine --- Thanks, I promise to test any patches without delay :)
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #4 from PeteVine ---
> I'm not sure what you're trying to measure here - it's very confusing with
> multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it
> related to -fipa-pta or is that not relevant?

All the relevant flags have been kept constant (-Ofast -mcpu), so you should only look at this result side by side with the previous one. I'll summarise the findings for you:

To get the best c-ray performance out of gcc7 it's necessary to either use -mcpu/mtune=cortex-a57 or -mcpu=cortex-a53 -frename-registers (depessimizing with -mno-fix-cortex-a53-843419 if necessary).

However, in gcc8, neither produces the expected, best performance. No combination does, a clear regression.
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #5 from PeteVine --- Unchanged in gcc version 8.0.0 20170501.
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #2 from PeteVine --- I can confirm the first part of the issue gets fixed with this patch: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html but there's a regression in gcc8 concerning the second part. (or rather the workarounds don't work any more) http://openbenchmarking.org/result/1704298-RI-CRAYREGRE13 ("basic flags" didn't deactivate -mfix-cortex-a53-843419, hence the difference)
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #18 from PeteVine ---
> Well that sounds like the same issue.
> Note -fprofile-generate simple inserts counters in the generated code. In
> fact the generated code is practically identical between Cortex-A5 and
> Cortex-A7.

As long as the gcda file is not present, -fprofile-use yields an equally good binary (obviously!), so clearly it's about the profile data somehow.

If you have any ideas or debugging suggestions, go ahead, I'll gladly test them.
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #16 from PeteVine --- Also, I'd like to repeat that using -mcpu=cortex-a7 fixes the issue (no library calls present). Incidentally, having run that A7 profiled binary on a Cortex-A53, I'm seeing a 10% hit compared to a vanilla A7 binary. Hopefully that's just an artifact of profiling a different CPU architecture.
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #15 from PeteVine --- I don't have a cross-compiler built/installed. If you're positive the bug doesn't reproduce on your end (targeting generic or A5 codegen), then maybe it's about some interaction between gcc instrumentation and the slightly dated system libraries. I think my little A5->A53 experiment shows once the instrumented binary is built, it doesn't matter how the profile data is gathered.
[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #41239|0                           |1
        is obsolete|                            |

--- Comment #37 from PeteVine ---
Comment on attachment 41239 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239
Assembly files produced with -fverbose-asm

Wrong bug, sorry. I can't get used to bugzilla jumping to the next issue on posting a comment.
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #13 from PeteVine --- Created attachment 41240 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41240&action=edit Assembly files produced with -fverbose-asm
[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 --- Comment #36 from PeteVine --- Created attachment 41239 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239&action=edit Assembly files produced with -fverbose-asm
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #12 from PeteVine --- It even reproduces the following way: I built an instrumented ARMv7 binary natively, ran it on a Cortex-A53, copied the gcda file back, recompiled with -fprofile-use and got the same 20% slowdown. Surely, that must count (pun intended) for something, as both CPUs are in-order designs.
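
For readers unfamiliar with the cycle described above (instrument, run, recompile with the profile), the three stages can be written down as command lines. The sketch below only builds the argument lists and prints them; it does not run anything. The helper name, the default flags and the output name are assumptions chosen for illustration. The only GCC options used are -fprofile-generate and -fprofile-use, which appear in this thread.

```python
def pgo_commands(source, binary="a.out", extra_flags=("-O3",)):
    """Return the three command lines of a profile-guided build.

    1. compile with instrumentation (-fprofile-generate)
    2. run the instrumented binary, which writes the gcda profile data
    3. recompile using that profile data (-fprofile-use)
    """
    flags = list(extra_flags)
    return [
        ["gcc", *flags, "-fprofile-generate", source, "-o", binary],
        ["./" + binary],
        ["gcc", *flags, "-fprofile-use", source, "-o", binary],
    ]


for stage in pgo_commands("sudoku.c"):
    print(" ".join(stage))
```

In the experiment above, the second stage ran on a different machine (a Cortex-A53) and the resulting gcda file was copied back before the third stage.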
[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #11 from PeteVine --- I've just retested gcc7 on both ARM platforms. AArch64 gets a 3% improvement now, while ARMv7 reproduces the issue, just as before. I'm compiling/profiling on a Cortex A5 which could be the main reason behind all this, as it doesn't have hard division.
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #1 from PeteVine ---
Turns out -frename-registers fixes this issue as well, thanks for the tip!

http://openbenchmarking.org/result/1704142-RI-1703089RI22
[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #9 from PeteVine ---
Well, yes, that fixes the -Ofast issue for me:

-mcpu=cortex-a53 -frename-registers
iir: 65952 ns per loop
iir_2: 63098 ns per loop

-mcpu=cortex-a57 (-frename-registers)
iir: 62839 ns per loop
iir_2: 62677 ns per loop
[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #9 from PeteVine --- Correction, it was about -fomit-frame-pointer, period! Setting the environment C(XX)FLAGS to that flag alone triggers the bug.
[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #8 from PeteVine --- It was about -O3 -fomit-frame-pointer, but yeah, I don't care one bit either. Just make sure `--enable-languages=ada` works. (c++ is not being inferred so you end up with no xg++)
[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #6 from PeteVine --- Turns out it's a miscompilation bug as I was using the same set of C(XX)FLAGS that work fine for those other languages. Removing the -fomit-frame-pointer flag while leaving the rest unchanged (-O3 -mtune=cortex-a57 -fipa-pta -march=armv8-a+crc) fixes the issue.
[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #5 from PeteVine ---
The repeated full ada bootstrap was successful at the same revision, using identical flags and GNAT 5.4.0.

On the other hand, the failing build prints two warnings during the ada part:

g-debpoo.adb: In function ‘GNAT.DEBUG_POOLS.GET_SIZE’:
g-debpoo.adb:1418:8: warning: ‘SIZE_IN_STORAGE_ELEMENTS’ may be used uninitialized in this function [-Wmaybe-uninitialized]
g-debpoo.ads:295:7: note: ‘SIZE_IN_STORAGE_ELEMENTS’ was declared here

and

g-comlin.adb: In function ‘GNAT.COMMAND_LINE.FIND_LONGEST_MATCHING_SWITCH’:
g-comlin.adb:583:8: warning: ‘PARAM’ may be used uninitialized in this function [-Wmaybe-uninitialized]
g-comlin.adb:107:7: note: ‘PARAM’ was declared here

The error message read differently this time:

7.0.1 20170311 (experimental) (aarch64-linux-gnu) Storage_Error stack overflow or erroneous memory access
[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #4 from PeteVine ---
> Can you try again without --disable-bootstrap ?

It's GNAT 5.4.0. OK, I'll try again.
[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #2 from PeteVine --- Right, I definitely used the same setup a few days ago minus --disable-bootstrap.
[Bug ada/80007] New: --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007

            Bug ID: 80007
           Summary: --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Never tried bootstrapping ada this way before (a full bootstrap succeeded a few days ago), so I'm not entirely sure if that's allowed/advisable but I got the following failure:

/mnt/odroid/gcc-master/build/./gcc/xgcc -B/mnt/odroid/gcc-master/build/./gcc/ -B/usr/gcc7/aarch64-linux-gnu/bin/ -B/usr/gcc7/aarch64-linux-gnu/lib/ -isystem /usr/gcc7/aarch64-linux-gnu/include -isystem /usr/gcc7/aarch64-linux-gnu/sys-include -c -g -O2 -fPIC -W -Wall -gnatpg -nostdinc g-exptty.adb -o g-exptty.o
+===========================GNAT BUG DETECTED==============================+
| 7.0.1 20170311 (experimental) (aarch64-linux-gnu) Program_Error unhandled signal|
| Error detected at s-stoele.adb:36:20                                     |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .              |
| Use a subject line meaningful to you and us to track the bug.            |
| Include the entire contents of this bug box in the report.               |
| Include the exact command that you entered.                              |
| Also include sources listed below.                                       |
+==========================================================================+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).

system.ads
g-exptty.adb
g-exptty.ads
g-expect.ads
gnat.ads
g-os_lib.ads
s-os_lib.ads
s-string.ads
ada.ads
a-uncdea.ads
g-regpat.ads
s-regpat.ads
g-tty.ads
s-oscons.ads
interfac.ads
i-c.ads
s-parame.ads
s-exctab.ads
s-stalib.ads
a-unccon.ads
a-tags.ads
s-stoele.ads
a-stream.ads
s-soflin.ads
a-except.ads
s-traent.ads
s-stache.ads
s-stratt.ads
s-unstyp.ads
s-secsta.ads
s-finmas.ads
a-finali.ads
s-finroo.ads
s-stopoo.ads
s-pooglo.ads
a-caldel.ads
a-calend.ads
s-stoele.adb

compilation abandoned
../gcc-interface/Makefile:296: recipe for target 'g-exptty.o' failed
[Bug target/79964] New: Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

            Bug ID: 79964
           Summary: Cortex A53 codegen still not optimal
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Two data points:

- the integer benchmark from PR79665 runs about 7% slower with -mcpu=cortex-a53 vs other targets, equalling generic codegen. It was still indistinguishable on 20170220, so the regression must have happened shortly after as per PR79665#c13

- c-ray again, wherein dispensing with -mfix-cortex-a53-843419 removes the handicap, but e.g. A57 codegen produces the best result ever on this machine:

http://openbenchmarking.org/result/1703089-RI-1703040RI07

Coming from a Cortex A53 with 32kB of L1 cache.
[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #9 from PeteVine ---
So, with the whole -mfix-cortex-a53-843419 debacle out of the way, there's probably not much left to worry about:

http://openbenchmarking.org/result/1703071-RI-1609258LO61
[Bug fortran/79933] New: gfortran no longer able to compile dolfyn benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933

            Bug ID: 79933
           Summary: gfortran no longer able to compile dolfyn benchmark
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40900 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40900&action=edit
fortran source

Used to work fine a few months ago.

$ gfortran-7 -O2 gmsh2dolfyn.f90
gmsh2dolfyn.f90:159:9:
 stop'bug: error in dimensions of array v'
 1
Error: Blank required in STOP statement near (1)

Adding -std=f95 solves the issue so maybe some sort of suggestion is in order?
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #15 from PeteVine --- Sorry wrong number :) I meant --enable-fix-cortex-a53-843419
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #14 from PeteVine ---
I've finally put two and two together and identified the cause, --fix-cortex-a53-835769, which was kicking in for -mcpu=cortex-a53. Looks like an LTO gold linker bug, period.

FWIW, an LTO bootstrapped gcc is actually slower, taking 2 more minutes to complete a --disable-bootstrap build (3% slowdown).
[Bug middle-end/77546] [6/7 regression] C++ software renderer performance drop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77546

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WORKSFORME

--- Comment #7 from PeteVine ---
Passing -mno-fix-cortex-a53-843419 fixes the issue; gcc 6.3 yields a 38.2 score while gcc 7.0.1 an even better 39.5 using the same workaround.
[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #7 from PeteVine --- Not affected by -mno-fix-cortex-a53-843419 which gives the issue full validity. -Ofast pessimizes Cortex A53 codegen somehow and switching to e.g. -mcpu=cortex-a57 fixes it. (tested on trunk)
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #31 from PeteVine --- Indeed, that was it! I've probably found the source of my A53 issues: http://openbenchmarking.org/result/1703040-RI-CRAYERRAT99 This means comment #29 exposes a different issue and Cortex A53 codegen still is suboptimal.
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #30 from PeteVine --- Or rather, the difference observed in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468#c7 is still there @ -Ofast, but the Cortex-A53 result is in the same range now. I'll have to investigate the effect of --fix-cortex-a53-835769 that was always passed by default in the other image.
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #29 from PeteVine --- I used a different distribution image (binutils 2.25, no --fix-cortex-a53-835769 option) but the results haven't changed (thunderx tuning must have improved though as it stopped offering any benefit over A53): http://openbenchmarking.org/result/1703043-RI-CRAYDEBIA96
[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from PeteVine ---
Closing in favour of PR79581.
[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #8 from PeteVine --- Seeing as unrolling does such a great job on aarch64, surpassing clang, should we leave the ARM issue bunched together with this one?
[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #6 from PeteVine ---
The difference between clang and gcc is even greater on ARMv7 Cortex A5 but there's no way to catch up through unrolling (no effect):

gcc version 7.0.1 20170225: 1227.2 Kpos/sec
clang 3.6: 1540.4 Kpos/sec
[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #5 from PeteVine --- Clang, however, gets no further improvement from -funroll-loops, meaning a simple `-O3 -mcpu=cortex-a53` produces much better performance than gcc without unrolling.
[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #4 from PeteVine --- It's a gcc version 7.0.1 20170220 (experimental) (GCC) configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --with-arch-directory=aarch64 --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --enable-checking=release
[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #2 from PeteVine --- Created attachment 40831 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40831&action=edit inputs
[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #1 from PeteVine --- Created attachment 40830 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40830&action=edit C source
[Bug middle-end/79712] New: Clang smarter about unrolling in fhourstones benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712

            Bug ID: 79712
           Summary: Clang smarter about unrolling in fhourstones benchmark
           Product: gcc
           Version: 7.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tulipawn at gmail dot com
  Target Milestone: ---

Created attachment 40829 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40829&action=edit
preprocessed source

It seems clang is probably doing a better job at unrolling in the fhourstones benchmark:

$ gcc -Wextra -Wall -Ofast -mcpu=cortex-a53 -march=armv8-a+crc -ftree-vectorize SearchGame.i (-funroll-loops -fvariable-expansion-in-unroller -ftree-loop-ivcanon -fivopts)
$ ./a.out < inputs

- clang 3.8 result: 3358 kpos/s
- gcc result: 3220 kpos/s
- gcc result with unrolling: 3473 kpos/s

It would be nice if gcc could achieve similar performance to clang's -O3 out of the box.

BTW, running the benchmark on 32-bit requires changing the %lu's to %llu's at line 200 in the C source.
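
To put the three throughput figures above on a common scale, the relative differences can be worked out directly. This is plain arithmetic on the numbers quoted in the report; the helper name is made up for illustration.

```python
def percent_change(value, baseline):
    """Relative difference of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Throughput in kpos/s, as reported above (higher is better).
clang = 3358.0
gcc_plain = 3220.0
gcc_unrolled = 3473.0

print(f"gcc vs clang:             {percent_change(gcc_plain, clang):+.1f}%")
print(f"gcc unrolled vs clang:    {percent_change(gcc_unrolled, clang):+.1f}%")
print(f"gcc unrolled vs gcc:      {percent_change(gcc_unrolled, gcc_plain):+.1f}%")
# gcc vs clang:             -4.1%
# gcc unrolled vs clang:    +3.4%
# gcc unrolled vs gcc:      +7.9%
```

So the extra unrolling flags are worth roughly 8% to gcc on this benchmark, enough to move it from about 4% behind clang to about 3% ahead.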
[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #13 from PeteVine --- Still, the 5% regression must have happened very recently. The fast gcc was built on 20170220 and the slow one yesterday, using the original patch. Once again, switching away from Cortex-A53 codegen restores the expected performance.
[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #6 from PeteVine --- But that's related to -mcpu=cortex-a53 again, so never mind I guess.
[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tulipawn at gmail dot com

--- Comment #5 from PeteVine ---
Psst! GCC 7 was already 1.75x faster than Clang 3.8 on my aarch64 machine when I benchmarked this code 3 weeks ago, but with this patch, it seems to take a 5% hit.
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #4 from PeteVine --- > Judging by your -mcpu option is this on a Cortex-A5? Yes, if you look at the results on a Cortex A53 running armv7 code, it doesn't reproduce either, and A5-codegen is king :) (hopefully due to in-order design or sth) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659#c12 A quick question regarding -mcpu=cortex-a5 codegen; is there a similar switch to llvm's `-slowfpvmlx` feature? (disable slow vmla/vmls), which the nice ARM guy divulged here: https://bugs.llvm.org//show_bug.cgi?id=26135#c9 or is it a non-issue in gcc?
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #2 from PeteVine --- Created attachment 40769 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40769=edit sphfract The other file required to run the benchmark straight from bugzilla! :)
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 PeteVine changed: What|Removed |Added Target||armv7 --- Comment #1 from PeteVine --- Distilled from PR79105 as a separate issue, not related to NEON and autovectorization.
[Bug target/79581] New: VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 Bug ID: 79581 Summary: VFP4 slower than VFP3 in C-ray Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40762 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40762=edit preprocessed source
$ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv3 c-ray-mt.i -lm -lpthread
$ ./a.out -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm ; done
Rendering took: 2 seconds (2393 milliseconds)
$ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv4 c-ray-mt.i -lm -lpthread
$ ./a.out -t 32 -s 160x120 -r 8 -i sphfract -o output.ppm ; done
Rendering took: 2 seconds (2494 milliseconds)
This defect dates back to gcc 4.9 (or earlier), but at least gcc 7 provides a big speedup in vfpv4 code (roughly 2500 ms now vs 2700 ms previously).
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #28 from PeteVine --- Lesson learnt, thanks! If you look at the last -Ofast result (or 1702153-RI-CRAYFAST467), the suspect difference is there (the compiler had been rebuilt from scratch with all the patches), and I even managed to set a record performance with -mtune=thunderx. I'm using an S905 SoC.
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #26 from PeteVine --- OK, maybe this SoC is kinky, I give up: http://openbenchmarking.org/result/1702154-RI-CRAYFAST326
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #25 from PeteVine --- The original issue never mentioned -Ofast or -ffast-math and I see no difference at -Ofast, indeed: http://openbenchmarking.org/result/1702153-RI-CRAYFAST424 @jgreenhalgh Can you confirm there's no regression @ -O3 as well? Thanks.
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #24 from PeteVine --- I did a git pull and restarted the build so unless something didn't get reconfigured, it definitely should've been included. If you see the improvement, never mind then.
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #12 from PeteVine --- Nice, the PR68664 patch has fixed the issue. FWIW, unlike previously, running on a Cortex-A53 showed perfect alignment with core type (-mfpu=vfpv3) on the first run:
Cortex-A8 Rendering took: 1 seconds (1801 milliseconds)
Cortex-A5 Rendering took: 1 seconds (1708 milliseconds)
Cortex-A7 Rendering took: 1 seconds (1699 milliseconds)
Cortex-A9 Rendering took: 1 seconds (1644 milliseconds)
Cortex-A15 Rendering took: 1 seconds (1637 milliseconds)
whereas using -mfpu=vfpv4 favours Cortex-A5 code's execution:
Cortex-A8 Rendering took: 1 seconds (1803 milliseconds)
Cortex-A5 Rendering took: 1 seconds (1506 milliseconds)
Cortex-A7 Rendering took: 1 seconds (1636 milliseconds)
Cortex-A9 Rendering took: 1 seconds (1645 milliseconds)
Cortex-A15 Rendering took: 1 seconds (1643 milliseconds)
but that's probably expected. Not sure about A8's codegen performance though.
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 PeteVine changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE |--- --- Comment #22 from PeteVine --- @ktkachov I've just retested with your patches from PR68664 and it looks like this could be a different issue, concerning Cortex-A53 codegen exclusively. Using -mcpu=thunderx demonstrates the problem (as well as gcc7 superiority probably): http://openbenchmarking.org/result/1702146-RI-1609039HA18
[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #4 from PeteVine --- Whereas `-fsanitize=address` aborts all the same: ==28821==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0xaf012100 #0 0xb6af76fb in operator delete(void*, unsigned int) (/usr/gcc7/lib/libasan.so.4+0xd86fb) #1 0xe5e53 in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*) [clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe5e53) #2 0xe6773 in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*) [clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe6773) #3 0xf8ae7 in CLoad3DS::ProcessNextChunk(CModel*, Chunk*) [clone .constprop.30] (/tmp/gl-117-1.3.2-src/src/gl-117+0xf8ae7) #4 0xf8ecf in CLoad3DS::ProcessNextChunk(CModel*, Chunk*) [clone .constprop.30] (/tmp/gl-117-1.3.2-src/src/gl-117+0xf8ecf) #5 0x80537 in CLoad3DS::Import3DS(CModel*, char*) [clone .constprop.29] (/tmp/gl-117-1.3.2-src/src/gl-117+0x80537) #6 0x3e403 in myFirstInit() (/tmp/gl-117-1.3.2-src/src/gl-117+0x3e403) #7 0x1b2a7 in main (/tmp/gl-117-1.3.2-src/src/gl-117+0x1b2a7) #8 0xb659d66f in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1766f) 0xaf012100 is located 0 bytes inside of 7680-byte region [0xaf012100,0xaf013f00) allocated by thread T0 here: #0 0xb6af66cf in operator new[](unsigned int) (/usr/gcc7/lib/libasan.so.4+0xd76cf) #1 0xe5a5b in CLoad3DS::ProcessNextObjectChunk(CModel*, CObject*, Chunk*) [clone .constprop.38] (/tmp/gl-117-1.3.2-src/src/gl-117+0xe5a5b) SUMMARY: AddressSanitizer: alloc-dealloc-mismatch (/usr/gcc7/lib/libasan.so.4+0xd86fb) in operator delete(void*, unsigned int) ==28821==HINT: if you don't care about these errors you may set ASAN_OPTIONS=alloc_dealloc_mismatch=0 ==28821==ABORTING
[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #3 from PeteVine --- That's the same command line that leads to an immediate crash (uninstrumented).
[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #2 from PeteVine --- OK, having been built with: -mcpu=cortex-a5 -O3 -ffast-math -marm -fomit-frame-pointer -fipa-pta -mfpu=neon-vfpv4 -ftree-vectorize -flto -fsanitize=undefined doesn't crash but prints many errors, e.g.: 3ds.cpp:111:8: runtime error: load of misaligned address 0x007de59a for type 'Uint32', which requires 4 byte alignment 0x007de59a: note: pointer points here 00 00 4d 4d 0d be 00 00 02 00 0a 00 00 00 03 00 00 00 3d 3d b9 ab 00 00 3e 3d 0a 00 00 00 03 00 ^ 3ds.cpp:111:8: runtime error: load of misaligned address 0x007de5aa for type 'Uint32', which requires 4 byte alignment 0x007de5aa: note: pointer points here 00 00 3d 3d b9 ab 00 00 3e 3d 0a 00 00 00 03 00 00 00 ff af fa 00 00 00 00 a0 14 00 00 00 30 31 ^ 3ds.cpp:125:8: runtime error: load of misaligned address 0x007de5e1 for type 'Uint16', which requires 2 byte alignment 0x007de5e1: note: pointer points here 00 96 96 96 20 a0 0f 00 00 00 11 00 09 00 00 00 96 96 96 30 a0 0f 00 00 00 11 00 09 00 00 00 e5 ^ 3ds.cpp:111:8: runtime error: load of misaligned address 0x007de5e3 for type 'Uint32', which requires 4 byte alignment 0x007de5e3: note: pointer points here 96 96 20 a0 0f 00 00 00 11 00 09 00 00 00 96 96 96 30 a0 0f 00 00 00 11 00 09 00 00 00 e5 e5 e5 ^ 3ds.cpp:125:8: runtime error: load of misaligned address 0x007de5e7 for type 'Uint16', which requires 2 byte alignment 0x007de5e7: note: pointer points here 0f 00 00 00 11 00 09 00 00 00 96 96 96 30 a0 0f 00 00 00 11 00 09 00 00 00 e5 e5 e5 40 a0 0e 00 LIBGL: warning, gles_glBlendFuncSeparate is NULL main.cpp:4762:29: runtime error: index 256 out of bounds for type 'int [256]' main.cpp:4762:10: runtime error: load of address 0x00427180 with insufficient space for an object of type 'int' 0x00427180: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 d8 2a 61 00 ^ main.cpp:4783:35: runtime error: index 65536 out of bounds for 
type 'unsigned char [65536]' main.cpp:4783:37: runtime error: store to address 0x00407180 with insufficient space for an object of type 'unsigned char' 0x00407180: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ main.cpp:4784:39: runtime error: index 65537 out of bounds for type 'unsigned char [65536]' main.cpp:4784:41: runtime error: store to address 0x00407181 with insufficient space for an object of type 'unsigned char' 0x00407181: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ main.cpp:4785:39: runtime error: index 65538 out of bounds for type 'unsigned char [65536]' main.cpp:4785:41: runtime error: store to address 0x00407182 with insufficient space for an object of type 'unsigned char' 0x00407182: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ main.cpp:4786:39: runtime error: index 65539 out of bounds for type 'unsigned char [65536]' main.cpp:4786:41: runtime error: store to address 0x00407183 with insufficient space for an object of type 'unsigned char' 0x00407183: note: pointer points here 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ main.cpp:3384:44: runtime error: downcast of address 0x00cb3488 which does not point to an object of type 'AIObj' 0x00cb3488: note: object is of type 'DynamicObj' 61 01 00 00 c4 2f 29 00 00 00 00 00 00 00 00 00 01 1b c8 00 00 00 80 3f 00 00 80 3f 00 00 00 3f ^~~ vptr for 'DynamicObj' main.cpp:3385:19: runtime error: member access within address 0x00cc67d8 which does not point to an object of type 'DynamicObj' 0x00cc67d8: note: object is of type 'CExplosion' 61 00 00 00 e4 43 29 00 00 00 00 00 00 00 00 00 00 00 00 00 cd cc cc 3d 00 00 80 3f 00 00 00 3f ^~~ vptr for 'CExplosion'
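The misaligned-load reports from 3ds.cpp are the usual symptom of reading a multi-byte value through a cast pointer into a byte buffer. A common way to avoid them, sketched here under the assumption that the loader reads fields out of a raw byte buffer (the 3ds.cpp source is not shown in this report, and load_u32/load_u16 are hypothetical names), is to copy the bytes with memcpy:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not gl-117 code. Each reads a value in host byte
 * order from an arbitrarily aligned address. memcpy carries no alignment
 * requirement, so the compiler cannot emit an alignment-sensitive load for
 * it the way it may for *(const uint32_t *)p. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

uint16_t load_u16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers typically turn these small fixed-size copies into plain loads where the target allows unaligned access, so the change should cost little.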
[Bug target/79480] New: -O3 and -mfpu=neon produces crashing code on ARM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 Bug ID: 79480 Summary: -O3 and -mfpu=neon produces crashing code on ARM Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- The gl-117 binary (source link attached) compiled with: -mcpu=cortex-a5 -O3 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize crashes with a SIGBUS plus this kernel info: Alignment trap: not handling instruction f4620adf at [<00067c8c>] Unhandled fault: alignment exception (0x001) at 0x0048fcb1 Moreover, using LTO prints a warning about undefined behaviour: main.cpp: In function ‘_ZL11myTimerFunci.isra.26’: main.cpp:4762:29: warning: iteration 256 invokes undefined behavior [-Waggressive-loop-optimizations] int h = heat [yind] [i2]; ^ main.cpp:4757:5: note: containing loop for (i2 = 0; i2 < maxfx + 1; i2 ++) ^ The code is rather old C++ (g++ warns: ISO C++ forbids converting a string constant to ‘char*’) but as long as -mfpu=neon is not used it doesn't crash, even with the above warning. Reporting just in case.
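The -Waggressive-loop-optimizations warning and the UBSan report in comment #2 (index 256 out of bounds for type 'int [256]') point at the same off-by-one: the loop condition i2 < maxfx + 1 lets i2 reach maxfx. A minimal sketch of the pattern and its fix, assuming maxfx equals the row length of 256; ROW_LEN and sum_row are hypothetical names, not gl-117 code:

```c
#define ROW_LEN 256  /* assumed: matches the 'int [256]' in the UBSan output */

/* Sums one row. A condition of `i < ROW_LEN + 1`, as in the reported loop,
 * would read row[256], one element past the end; `i < ROW_LEN` stays in
 * bounds. */
int sum_row(const int row[ROW_LEN])
{
    int sum = 0;
    for (int i = 0; i < ROW_LEN; i++)
        sum += row[i];
    return sum;
}
```

Whether this out-of-bounds access is related to the SIGBUS is not established in the report; the sketch only addresses the warning.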
[Bug target/79370] Cortex-A7 hardware division switched on for -mcpu but not -mtune
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #1 from PeteVine --- Oops, the meaning of -mcpu and -mtune must have switched places in my head :)
[Bug target/79370] New: Cortex-A7 hardware division switched on for -mcpu but not -mtune
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370 Bug ID: 79370 Summary: Cortex-A7 hardware division switched on for -mcpu but not -mtune Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40667 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40667=edit Preprocessed source Compiling the attachment (produced from http://pastebin.com/X3ZTp1c0) only emits sdiv instructions for -mcpu=cortex-a7 but not for the more logical case of -mtune=cortex-a7 FWIW, running the produced "tuned" code on a Cortex-A53 (32bit mode) incurs a 6x penalty compared with -mcpu which is fully consistent with targeting a soft-division CPU. gcc version 7.0.1 20170127, switches used: -Ofast -mtune=cortex-a7 -marm -mfloat-abi=hard -mfpu=vfpv4
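As comment #1 acknowledges, the observed behaviour is expected: -mtune only adjusts scheduling and cost decisions, while the available instructions come from -mcpu or -march. A tiny probe makes the difference easy to see in the generated assembly. The function name is hypothetical; on an ARM EABI target without hardware divide one would expect a call to the __aeabi_idiv helper rather than an sdiv instruction.

```c
/* Hypothetical probe: compile with -S and compare the output for
 * -mcpu=cortex-a7 against -mtune=cortex-a7. */
int quotient(int a, int b)
{
    return a / b;
}
```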
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #11 from PeteVine --- Super cool, thanks! That makes the OP a true prophet before his time ;)
[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #9 from PeteVine --- @jgreenhalgh Please have a look at the profiled assembly for both fast and slow codegen. (attached) According to @aldyh's bisection in #68664 this probably isn't the same issue.
[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #5 from PeteVine --- Yes, this came from the gl4es project, and compiling the whole thing normally, only gcc7 is affected.
[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #2 from PeteVine --- gcc -O2 or above elicits the ICE, configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=gcc4-compatible --disable-libstdcxx-dual-abi --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --with-arch-directory=arm --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3 --with-float=hard --with-mode=arm --disable-werror --enable-multilib --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --enable-checking=release
[Bug target/79239] New: [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 Bug ID: 79239 Summary: [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn) Product: gcc Version: 7.0.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40586 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40586=edit Preprocessed source gcc version 7.0.1 20170122 (experimental) (GCC) ICEs on the attached preprocessed source: ../src/gl/buffers.c: In function ‘gl4es_glBindBuffer’: ../src/gl/buffers.c:113:1: error: unrecognizable insn: } ^ (insn 130 129 131 12 (set (reg:DF 612) (fma:DF (reg:DF 611) (reg:DF 613) (reg:DF 614))) "../src/gl/buffers.h":17 -1 (nil)) ../src/gl/buffers.c:113:1: internal compiler error: in extract_insn, at recog.c:2311
[Bug target/77468] [7 Regression] C-ray regression on Aarch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #21 from PeteVine --- It would be great if https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 could get squashed in one fell swoop.
[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #2 from PeteVine --- $ gcc -v Configured with: ../configure -v --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=gcc4-compatible --enable-gnu-unique-object --disable-libitm --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --with-arch-directory=arm --enable-multiarch --enable-multilib --disable-sjlj-exceptions --with-arch=armv7-a --with-fpu=vfpv3 --with-float=hard --with-mode=arm --disable-werror --enable-multilib --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf --enable-checking=release Thread model: posix gcc version 7.0.0 20170114 (experimental) (GCC)
[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #1 from PeteVine --- Updated to include an explicit -mfpu=neon-vfpv4. http://openbenchmarking.org/result/1701179-TA-1701143TA49 I'm not sure whether -mcpu=cortex-a5 with -mfpu=neon should already have implied VFPv4, but adding it explicitly has fixed a few results while making others worse.
[Bug target/79105] New: Autovectorized NEON code slower than vfpv4 on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 Bug ID: 79105 Summary: Autovectorized NEON code slower than vfpv4 on Cortex A5 Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- As the title says, many results seem to suffer from switching to -mfpu=neon, etc. http://openbenchmarking.org/result/1701165-TA-1701143TA78 Could anyone explain the abnormally small difference between armv7 and aarch64 in OpenSSL?
[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #6 from PeteVine --- It's possible I already had that patch included in my build, but in case I didn't, here's a quick addition to the previous result: http://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66 The c-ray thunderx result suggests A53 codegen is still suboptimal. The patch has had no effect on the original issue.
[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #4 from PeteVine --- I'm delighted to report **not** targeting Cortex-A53 actually incurs a performance penalty sometimes ;) http://openbenchmarking.org/result/1701128-TA-GCCCOMPAR79
[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #3 from PeteVine --- Hey, that works for me too! (62565 vs 70758 in favour of -Ofast). Usefully strange :)
[Bug middle-end/78994] New: -Ofast makes aarch64 C++ benchmark slower
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 Bug ID: 78994 Summary: -Ofast makes aarch64 C++ benchmark slower Product: gcc Version: 5.4.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40463 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40463=edit Preprocessed source + assembly files After make && ./build/dsp-bench, the results look as follows @ -O3 -mcpu=cortex-a53 -ftree-vectorize iir:67945 ns per loop iir_2: 67952 ns per loop @ -Ofast -mcpu=cortex-a53 -ftree-vectorize iir:73367 ns per loop iir_2: 73349 ns per loop
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #13 from PeteVine --- Also, could these (sample) warnings actually matter when using ld.gold? NB, lra-constraints.c features in the previously provided backtrace: ../../libdecnumber/decNumber.c:3582:0: note: code may be misoptimized unless -fno-strict-aliasing is used ../../gcc/../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’ does not match original declaration [-Wlto-type-mismatch] decNumber * decNumberFromString(decNumber *, const char *, decContext *); ../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’ does not match original declaration [-Wlto-type-mismatch] decNumber * decNumberFromString(decNumber *, const char *, decContext *); ../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’ does not match original declaration [-Wlto-type-mismatch] decNumber * decNumberFromString(decNumber *, const char *, decContext *); ../../libdecnumber/decNumber.h:118:0: warning: type of ‘decNumberFromString’ does not match original declaration [-Wlto-type-mismatch] decNumber * decNumberFromString(decNumber *, const char *, decContext *); ../../libdecnumber/decNumber.c:489:0: note: ‘decNumberFromString’ was previously declared here decNumber * decNumberFromString(decNumber *dn, const char chars[], ../../libdecnumber/decNumber.c:489:0: note: code may be misoptimized unless -fno-strict-aliasing is used ../../gcc/vec.h:1552:1: warning: ‘safe_push’ violates the C++ One Definition Rule [-Wodr] vec::safe_push (const T MEM_STAT_DECL) ^ ../../gcc/vec.h:1552:1: note: return value type mismatch vec ::safe_push (const T MEM_STAT_DECL) ^ ../../gcc/loop-invariant.c:100:8: note: type ‘struct invariant’ itself violate the C++ One Definition Rule struct invariant ^ ../../gcc/lra-constraints.c:4742:8: note: the incompatible type is defined here struct invariant ^ ../../gcc/vec.h:1552:1: note: ‘safe_push’ was previously declared here vec ::safe_push (const T 
MEM_STAT_DECL) ^ ../../gcc/vec.h:1552:1: note: code may be misoptimized unless -fno-strict-aliasing is used ../../gcc/vec.h:1540:1: warning: ‘quick_push’ violates the C++ One Definition Rule [-Wodr] vec ::quick_push (const T ) ^ ../../gcc/vec.h:1540:1: note: return value type mismatch vec ::quick_push (const T ) ^ ../../gcc/loop-invariant.c:100:8: note: type ‘struct invariant’ itself violate the C++ One Definition Rule struct invariant ^ ../../gcc/lra-constraints.c:4742:8: note: the incompatible type is defined here struct invariant ^ ../../gcc/vec.h:1540:1: note: ‘quick_push’ was previously declared here vec ::quick_push (const T ) ^ ../../gcc/vec.h:1540:1: note: code may be misoptimized unless -fno-strict-aliasing is used
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added CC||ktkachov at gcc dot gnu.org --- Comment #12 from PeteVine --- The crash still reproduces on aarch64. Any suggestions on the ld.gold connection?
[Bug c++/69481] ICE with C++11 alias using with templates
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 PeteVine changed: What|Removed |Added CC||tulipawn at gmail dot com --- Comment #10 from PeteVine --- I've just hit the same ICE trying to build HHVM on aarch64 with g++7 20161202: [ 5%] Building CXX object third-party/folly/CMakeFiles/folly.dir/src/folly/ThreadCachedArena.cpp.o In file included from /mnt/odroid/hhvm/third-party/folly/folly/detail/ThreadLocalDetail.h:32:0, from /mnt/odroid/hhvm/third-party/folly/folly/ThreadLocal.h:59, from /mnt/odroid/hhvm/third-party/folly/folly/ThreadCachedArena.h:24, from /mnt/odroid/hhvm/third-party/folly/src/folly/ThreadCachedArena.cpp:17: /mnt/odroid/hhvm/third-party/folly/folly/Function.h:653:3: internal compiler error: same canonical type node for different types folly::Function::Traits and folly::detail::function::FunctionTraits SharedProxy asSharedProxy() && { ^~~
[Bug bootstrap/78220] New: Add 'remounting exec' suggestion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78220 Bug ID: 78220 Summary: Add 'remounting exec' suggestion Product: gcc Version: 7.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: bootstrap Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Restarting a build on a `noexec` partition fails with: checking whether the C compiler works... configure: error: in `/mnt/gcc-svn-master/build-lto/gcc': configure: error: cannot run C compiled programs. If you meant to cross compile, use `--host'. The above suggestion could include the most obvious, i.e. exec permissions, for example: "Check if the partition hasn't been (re)mounted noexec."
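A check along the lines of the proposed hint could look like the following. This is a sketch only, not configure's actual logic: it assumes a glibc system, where statvfs() reports the ST_NOEXEC mount flag as a GNU extension, and mount_is_noexec is a hypothetical helper name.

```c
#define _GNU_SOURCE  /* ST_NOEXEC is a GNU extension in <sys/statvfs.h> */
#include <sys/statvfs.h>

/* Returns 1 if the filesystem holding `path` is mounted noexec, 0 if it is
 * not, and -1 if the query fails or the flag is unavailable on this libc. */
int mount_is_noexec(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
#ifdef ST_NOEXEC
    return (sv.f_flag & ST_NOEXEC) ? 1 : 0;
#else
    return -1;
#endif
}
```

Calling it on the build directory before the "cannot run C compiled programs" error is printed would be enough to decide whether the noexec hint applies.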
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added Summary|ICE during LTO bootstrap on |ICE during LTO bootstrap on |AARCH64 with extra options |AARCH64 with gold linker --- Comment #11 from PeteVine --- OK, it was about neither binutils nor special flags. I was able to complete the build using 2.27 and full flags, provided ld.bfd was used and --enable-checking=release was never used. The latter option was leading to discrepancies on 32/64-bit ARM platforms.
[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730 --- Comment #8 from PeteVine --- I thought I was clear it was just a heads-up. All relevant data is already inside and anyone willing to look closer should just run the benchmark on any machine/platform like this, e.g.: $ phoronix-test-suite benchmark 1609257-LO-FORTRANAA01 , and so on.
[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917 PeteVine changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |WORKSFORME
[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917 --- Comment #11 from PeteVine --- Well, I finally managed to complete an LTO bootstrap on ARM (even leaving the full complement of C(XX)FLAGS in place, bar -flto) but it seems using ld.bfd is a must.
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #10 from PeteVine --- Hold on, I'm investigating the effect of binutils downgrade (2.27 -> 2.26) and switching to ld.bfd. Strange that it all works fine during normal bootstrap, regardless of the codegen options.
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #9 from PeteVine --- Ha! It's not about the extra options, thank you! ;) Completely vanilla environment and gcc 5.4 produce this after a while: ../build-lto-noflags/./gcc/xgcc -B../build-lto-noflags/./gcc/ -B/usr/gcc7/aarch64-linux-gnu/bin/ -B/usr/gcc7/aarch64-linux-gnu/lib/ -isystem /usr/gcc7/aarch64-linux-gnu/include -isystem /usr/gcc7/aarch64-linux-gnu/sys-include-g -O2 -O2 -g -O2 -DIN_GCC-W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fPIC -I. -I. -I../.././gcc -I../../../libgcc -I../../../libgcc/. -I../../../libgcc/../gcc -I../../../libgcc/../include -DHAVE_CC_TLS -o _muldi3.o -MT _muldi3.o -MD -MP -MF _muldi3.dep -DL_muldi3 -c ../../../libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS In file included from /usr/include/string.h:630:0, from ../../../libgcc/../gcc/tsystem.h:100, from ../../../libgcc/libgcc2.c:27: /usr/include/aarch64-linux-gnu/bits/string2.h: In function ‘__strspn_c2’: /usr/include/aarch64-linux-gnu/bits/string2.h:1039:3: internal compiler error: Segmentation fault while (__s[__result] == __accept1 || __s[__result] == __accept2) ^ 0x84173b crash_signal ../../gcc/toplev.c:338 0x5d9020 c_parser_binary_expression ../../gcc/c/c-parser.c:6855 0x5d9a23 c_parser_conditional_expression ../../gcc/c/c-parser.c:6495 0x5d9f4b c_parser_expr_no_commas ../../gcc/c/c-parser.c:6412 0x5dc2bf c_parser_expression ../../gcc/c/c-parser.c:8608 0x5dd823 c_parser_expression_conv ../../gcc/c/c-parser.c:8641 0x5dd89b c_parser_condition ../../gcc/c/c-parser.c:5458 0x5df1bf c_parser_paren_condition ../../gcc/c/c-parser.c:5477 0x5ebd1f c_parser_while_statement ../../gcc/c/c-parser.c:5789 0x5e7293 c_parser_statement_after_labels ../../gcc/c/c-parser.c:5265 0x5e608f c_parser_compound_statement_nostart ../../gcc/c/c-parser.c:4944 0x5e68b3 c_parser_compound_statement 
../../gcc/c/c-parser.c:4777 0x5e5853 c_parser_declaration_or_fndef ../../gcc/c/c-parser.c:2176 0x5f3ecb c_parser_external_declaration ../../gcc/c/c-parser.c:1574 0x5f4907 c_parser_translation_unit ../../gcc/c/c-parser.c:1454 0x5f4907 c_parse_file() ../../gcc/c/c-parser.c:18173 0xe3f497 c_common_parse_file() ../../gcc/c-family/c-opts.c:1087
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #8 from PeteVine --- FWIW, here's the corresponding backtrace: #0 0x00afa00c in df_get_live_out () at ../../gcc/df.h:1159 #1 update_ebb_live_info (tail=, head=0x137b838 ) at ../../gcc/lra-constraints.c:5612 #2 lra_inheritance () at ../../gcc/lra-constraints.c:6251 #3 0x00b02da0 in lra (f=) at ../../gcc/lra.c:2403 #4 0x00e8cb60 in do_reload () at ../../gcc/ira.c:5374 #5 execute (this=) at ../../gcc/ira.c:5558 #6 0x00abdd80 in execute_one_pass (pass=pass@entry=0x1451ac0) at ../../gcc/passes.c:2341 #7 0x00abe29c in execute_pass_list_1 (pass=0x1451ac0) at ../../gcc/passes.c:2430 #8 0x00abe2b0 in execute_pass_list_1 (pass=0x1450a40) at ../../gcc/passes.c:2431 #9 0x00abe308 in execute_pass_list (fn=, pass=) at ../../gcc/passes.c:2441 #10 0x00d1e148 in cgraph_node::expand (this=this@entry=0x7fb7431000) at ../../gcc/cgraphunit.c:2001 #11 0x00d1f1f8 in expand_all_functions () at ../../gcc/cgraphunit.c:2137 #12 symbol_table::_ZN12symbol_table7compileEv.part.49(void) (this=this@entry=0x7fb7798000) at ../../gcc/cgraphunit.c:2494 #13 0x00d1f674 in symbol_table::compile (this=0x7fb7798000) at ../../gcc/cgraphunit.c:2400 #14 symbol_table::finalize_compilation_unit (this=0x7fb7798000) at ../../gcc/cgraphunit.c:2584 #15 0x00863ef8 in compile_file () at ../../gcc/toplev.c:493 #16 0x00573f80 in do_compile () at ../../gcc/toplev.c:2012 #17 toplev::main (this=this@entry=0x7fdf68, argc=, argc@entry=92, argv=, argv@entry=0x7fe0b8) at ../../gcc/toplev.c:2146 #18 0x0057345c in main (argc=92, argv=0x7fe0b8) at ../../gcc/main.c:39 I'm going to clear all flags in the future if that's the expected way.
[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #7 from PeteVine --- Restarted the whole thing from scratch using gcc 5.4 and it segfaulted again. ../../../libgcc/libgcc2.c: In function ‘__powitf2’: ../../../libgcc/libgcc2.c:1851:1: internal compiler error: Segmentation fault } ^ CXX/CFLAGS as above, configured as follows: ../configure --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext --enable-threads=posix --libdir=/usr/gcc7/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --with-arch-directory=aarch64 --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --enable-checking=release --with-build-config=bootstrap-lto