[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #4 from PeteVine --- Created attachment 42934 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42934=edit corresponding C/gcda/gcno files

[Bug gcov-profile/83509] gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83509 --- Comment #3 from PeteVine --- OK, the following command was used to obtain the gcno/gcda files: $ gcc-8 -O3 -fprofile-generate -ftest-coverage sudoku.c && ./a.out Unlike the gcda file, gcno is dumpable with gcov-dump-8.

[Bug gcov-profile/83509] New: gcov-dump-8 unable to dump any gcda files

2017-12-20 Thread tulipawn at gmail dot com
Component: gcov-profile Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com CC: marxin at gcc dot gnu.org Target Milestone: --- Created attachment 42931 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42931=edit example gcda file created with

[Bug gcov-profile/82614] GCOV crashes while parsing gcda file

2017-12-19 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614 --- Comment #15 from PeteVine --- No, that's not it - gcov-dump 6/7 have no problem dumping previous versions. I'm just not sure if the problem with gcov-dump-8 is architecture specific (ARM) or it's something to do with my setup. I'm going to

[Bug gcov-profile/82614] GCOV crashes while parsing gcda file

2017-12-06 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82614 --- Comment #13 from PeteVine --- Almost certainly not related, but there's been some sort of regression in gcov-dump from the GCC 8 branch. Trying to dump any *.gcda file (ver. 8 included) ends like this: $ gcov-dump-8 Unified_cpp_js_src25.gcda

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #25 from PeteVine --- So, the profile data is probably fine, and judging from the size of the final binary, it's being used. The fix could be real after all :)

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #24 from PeteVine --- Or maybe not, gcov-dump-6 is able to read the file. $ gcov-dump-6 sudoku.gcda.good sudoku.gcda.good:data:magic `gcda':version `A80e' sudoku.gcda.good:warning:current version is `603*' sudoku.gcda.good:stamp

[Bug middle-end/70773] Cortex A5 profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-28 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #23 from PeteVine --- $ gcov-dump-6 sudoku.gcda.bad sudoku.gcda.bad:data:magic `gcda':version `603*' sudoku.gcda.bad:stamp 46515746 sudoku.gcda.bad: a300: 77:PROGRAM_SUMMARY checksum=0x12ec1c02 sudoku.gcda.bad:
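For reference, the gcov-dump-6 output quoted in comments #23 and #24 prints each gcda header field as four characters (`gcda`, `603*`, `A80e`). A minimal Python sketch of how such a header can be decoded, assuming (as that output suggests) that each field is a 32-bit word stored in the byte order of the machine that wrote the file, whose bytes read most significant first spell the characters. `read_header` and `word_to_tag` are hypothetical names, not gcov-dump code, and only the first two words are read.

```python
import struct
from typing import Tuple

# Hypothetical helper names; this is not GCC or gcov-dump code.
KNOWN_MAGIC = ("gcda", "gcno")


def word_to_tag(word: int) -> str:
    """Spell a 32-bit header word as four characters, most significant byte first."""
    # latin-1 maps every byte to a character, so arbitrary data never raises here
    return struct.pack(">I", word).decode("latin-1")


def read_header(data: bytes) -> Tuple[str, str]:
    """Return (magic, version) strings from the first two words of a gcov file.

    Assumption: the words use the writing machine's byte order, so both
    little-endian and big-endian layouts are tried.
    """
    if len(data) < 8:
        raise ValueError("file too short for a gcov header")
    for fmt in ("<II", ">II"):
        magic, version = struct.unpack(fmt, data[:8])
        if word_to_tag(magic) in KNOWN_MAGIC:
            return word_to_tag(magic), word_to_tag(version)
    raise ValueError("no gcda/gcno magic found")


# magic, version, stamp as a little-endian (e.g. ARM Linux) file would hold them
print(read_header(struct.pack("<III", 0x67636461, 0x41383065, 0)))  # ('gcda', 'A80e')
```

Trying both byte orders keeps the sketch usable on files copied between hosts of different endianness, as was done with the gcda files in this thread.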

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #22 from PeteVine --- > I don't know what exactly "fixed" this That would be nice to know. This I can say for sure: gcc 7.2.1 20171116 still produces slower profiled code on the target system. I've also discovered, compiling and

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 PeteVine changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-11-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #19 from PeteVine --- Created attachment 42694 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42694=edit Better assembly after profiling

[Bug target/79964] Cortex A53 codegen still not optimal

2017-11-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark

2017-11-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933 --- Comment #5 from PeteVine --- You're right, sorry for the confusion. It seems I skimmed over the wall of errors too quickly while the last one came from a different source file. According to my own results in bug #77730, I was somehow able

[Bug fortran/79933] gfortran no longer able to compile dolfyn benchmark

2017-11-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79933 --- Comment #3 from PeteVine --- In gcc 8, -std=f2003 is required to overcome the issue but there's another failure later on: gfortran -c -O2 -std=f2003 solverinterface.f90 solverinterface.f90:108:9: real*4 fpar(16) ! hint by Shibo

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #7 from PeteVine --- Thanks for pointing that out! I was using my bash history to change the CFLAGS and when I was flipping the crc switch I didn't notice I'd picked a version without -frename-registers, hence this wrong conclusion

[Bug target/79964] Cortex A53 codegen still not optimal

2017-07-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #5 from PeteVine --- Turns out the GCC 8 regression is caused by the +crc switch in -march=armv8-a+crc. Interesting, eh?

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #8 from PeteVine --- I've just confirmed the result on a newer Linux distribution (Ubuntu 16.04) and the difference between VFPv3 and v4 is clearly there (2330 vs 2560) using gcc 5.4. Unless the CPU itself requires an erratum, that
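To put the VFPv3 vs VFPv4 gap in proportion: assuming 2330 and 2560 are C-ray render times in the same unit where lower is better (consistent with the bug title), VFPv4 comes out roughly 9.9% slower. A trivial Python sketch of that calculation; `relative_slowdown` is a hypothetical helper.

```python
def relative_slowdown(fast: float, slow: float) -> float:
    """Return how much slower `slow` is than `fast`, as a fraction of `fast`.

    Assumes both values are run times (lower is better) in the same unit.
    """
    if fast <= 0:
        raise ValueError("baseline time must be positive")
    return (slow - fast) / fast


# VFPv3 (2330) vs VFPv4 (2560) C-ray figures quoted above, gcc 5.4
print(f"{relative_slowdown(2330, 2560):.1%}")  # 9.9%
```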

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-06-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #7 from PeteVine --- Thanks, I promise to test any patches without delay :)

[Bug target/79964] Cortex A53 codegen still not optimal

2017-05-02 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #4 from PeteVine --- > I'm not sure what you're trying to measure here - it's very confusing with > multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it > related to -fipa-pta or is that not relevant? All

[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5

2017-05-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #5 from PeteVine --- Unchanged in gcc version 8.0.0 20170501.

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 --- Comment #2 from PeteVine --- I can confirm the first part of the issue gets fixed with this patch: https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html but there's a regression in gcc8 concerning the second part. (or rather the

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #18 from PeteVine --- > Well that sounds like the same issue. > Note -fprofile-generate simple inserts counters in the generated code. In > fact the generated code is practically identical between Cortex-A5 and > Cortex-A7. As

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #16 from PeteVine --- Also, I'd like to repeat that using -mcpu=cortex-a7 fixes the issue (no library calls present). Incidentally, having run that A7 profiled binary on a Cortex-A53, I'm seeing a 10% hit compared to a vanilla

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-21 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #15 from PeteVine --- I don't have a cross-compiler built/installed. If you're positive the bug doesn't reproduce on your end (targeting generic or A5 codegen), then maybe it's about some interaction between gcc instrumentation and

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 PeteVine changed: What|Removed |Added Attachment #41239|0 |1 is obsolete|

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #13 from PeteVine --- Created attachment 41240 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41240=edit Assembly files produced with -fverbose-asm

[Bug gcov-profile/69004] Building t-engine on ARM fails during -fprofile-use stage

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69004 --- Comment #36 from PeteVine --- Created attachment 41239 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41239=edit Assembly files produced with -fverbose-asm

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #12 from PeteVine --- It even reproduces the following way: I built an instrumented ARMv7 binary natively, ran it on a Cortex-A53, copied the gcda file back, recompiled with -fprofile-use and got the same 20% slowdown. Surely, that

[Bug middle-end/70773] Profiled sudoku solver slower due to lack of sdiv/udiv

2017-04-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70773 --- Comment #11 from PeteVine --- I've just retested gcc7 on both ARM platforms. AArch64 gets a 3% improvement now, while ARMv7 reproduces the issue, just as before. I'm compiling/profiling on a Cortex A5 which could be the main reason behind

[Bug target/79964] Cortex A53 codegen still not optimal

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964 PeteVine changed: What|Removed |Added CC||wilco at gcc dot gnu.org --- Comment #1 from

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #9 from PeteVine --- Well, yes, that fixes the -Ofast issue for me: -mcpu=cortex-a53 -frename-registers iir:65952 ns per loop iir_2: 63098 ns per loop -mcpu=cortex-a57 (-frename-registers) iir:62839 ns per loop iir_2:

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #9 from PeteVine --- Correction, it was about -fomit-frame-pointer period! Setting the environment C(XX)FLAGS to that flag alone triggers the bug.

[Bug target/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-13 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #8 from PeteVine --- It was about -O3 -fomit-frame-pointer, but yeah, I don't care one bit either. Just make sure `--enable-languages=ada` works. (c++ is not being inferred so you end up with no xg++)

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #6 from PeteVine --- Turns out it's a miscompilation bug as I was using the same set of C(XX)FLAGS that work fine for those other languages. Removing the -fomit-frame-pointer flag while leaving the rest unchanged (-O3

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #5 from PeteVine --- The repeated full ada bootstrap was successful at the same revision, using identical flags and GNAT 5.4.0. On the other hand, the failing build prints two warnings during the ada part: g-debpoo.adb: In function

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #4 from PeteVine --- > Can you try again without --disable-bootstrap ? It's GNAT 5.4.0. OK, I'll try again.

[Bug ada/80007] --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80007 --- Comment #2 from PeteVine --- Right, I definitely used the same setup a few days ago minus --disable-bootstrap.

[Bug ada/80007] New: --disable-bootstrap with gnat-5 leads to failed gnat-7 build on aarch64

2017-03-11 Thread tulipawn at gmail dot com
Priority: P3 Component: ada Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Never tried bootstrapping ada this way before (a full bootstrap succeeded a few days ago), so I'm not entirely sure if that's

[Bug target/79964] New: Cortex A53 codegen still not optimal

2017-03-08 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Two data points: - the integer benchmark from PR79665 runs about 7% slower with -mcpu=cortex-a53 vs other targets, equalling generic codegen. It was still indistinguishable

[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)

2017-03-07 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug fortran/79933] New: gfortran no longer able to compile dolfyn benchmark

2017-03-06 Thread tulipawn at gmail dot com
Component: fortran Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40900 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40900=edit fortran source Used to work fine a few months ago. $ gfortran-7

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #15 from PeteVine --- Sorry wrong number :) I meant --enable-fix-cortex-a53-843419

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2017-03-05 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug middle-end/77546] [6/7 regression] C++ software renderer performance drop

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77546 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #7 from PeteVine --- Not affected by -mno-fix-cortex-a53-843419 which gives the issue full validity. -Ofast pessimizes Cortex A53 codegen somehow and switching to e.g. -mcpu=cortex-a57 fixes it. (tested on trunk)

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #31 from PeteVine --- Indeed, that was it! I've probably found the source of my A53 issues: http://openbenchmarking.org/result/1703040-RI-CRAYERRAT99 This means comment #29 exposes a different issue and Cortex A53 codegen still is

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #30 from PeteVine --- Or rather, the difference observed in: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468#c7 is still there @ -Ofast, but the Cortex-A53 result is in the same range now. I'll have to investigate the effect of

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #29 from PeteVine --- I used a different distribution image (binutils 2.25, no --fix-cortex-a53-835769 option) but the results haven't changed (thunderx tuning must have improved though as it stopped offering any benefit over A53):

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-03-01 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #8 from PeteVine --- Seeing as unrolling does such a great job on aarch64, surpassing clang, should we leave the ARM issue bunched together with this one?

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #6 from PeteVine --- The difference between clang and gcc is even greater on ARMv7 Cortex A5 but there's no way to catch up through unrolling (no effect): gcc version 7.0.1 20170225: 1227.2 Kpos/sec clang 3.6:

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #5 from PeteVine --- Clang however gets no further improvement from -funroll-loops meaning a simple `-O3 -mcpu=cortex-a53` produces much better performance than gcc without unrolling.

[Bug target/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #4 from PeteVine --- It's a gcc version 7.0.1 20170220 (experimental) (GCC) configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #2 from PeteVine --- Created attachment 40831 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40831=edit inputs

[Bug middle-end/79712] Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79712 --- Comment #1 from PeteVine --- Created attachment 40830 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40830=edit C source

[Bug middle-end/79712] New: Clang smarter about unrolling in fhourstones benchmark

2017-02-25 Thread tulipawn at gmail dot com
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40829 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40829=edit preprocessed source It seems clang is probably doing a better

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-23 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #13 from PeteVine --- Still, the 5% regression must have happened very recently. The fast gcc was built on 20170220 and the slow one yesterday, using the original patch. Once again, switching away from Cortex-A53 codegen restores the

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 --- Comment #6 from PeteVine --- But that's related to -mcpu=cortex-a53 again, so never mind I guess.

[Bug middle-end/79665] gcc's signed (x*x)/200 is slower than clang's

2017-02-22 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79665 PeteVine changed: What|Removed |Added CC||tulipawn at gmail dot com --- Comment #5
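PR79665 is about how a signed `(x*x)/200` gets lowered. Compilers normally replace division by a constant with a multiply, a keep-the-high-bits shift, and, for signed operands, a sign correction. A Python sketch of that arithmetic for a 32-bit signed divisor of 200, assuming the standard magic-number construction with M = ceil(2^38 / 200); it illustrates the technique only and is not the instruction sequence either gcc or clang emitted in the bug. `div200`, `MAGIC` and `SHIFT` are hypothetical names.

```python
# Magic constant for 32-bit signed division by 200: MAGIC = ceil(2**38 / 200).
# It is the well-known /100 multiplier used with one extra bit of shift.
MAGIC = 0x51EB851F  # 1374389535
SHIFT = 32 + 6      # high 32 bits of the 64-bit product, then shift right by 6


def div200(n: int) -> int:
    """Compute C's n / 200 for a 32-bit signed int using multiply and shift only."""
    assert -2**31 <= n < 2**31, "n must fit in int32"
    q = (MAGIC * n) >> SHIFT  # Python's >> floors, like an arithmetic shift
    if n < 0:
        q += 1  # C division truncates toward zero, so bump the floored result
    return q


print(div200(-401), div200(40000))  # -2 200
```

The `n < 0` correction is the extra work that signed division needs over unsigned division; compilers typically implement it branch-free by subtracting the sign bit.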

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-20 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #4 from PeteVine --- > Judging by your -mcpu option is this on a Cortex-A5? Yes, if you look at the results on a Cortex A53 running armv7 code, it doesn't reproduce either, and A5-codegen is king :) (hopefully due to in-order design

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-18 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #2 from PeteVine --- Created attachment 40769 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40769=edit sphract The other file required to run the benchmark straight from bugzilla! :)

[Bug target/79581] VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 PeteVine changed: What|Removed |Added Target||armv7 --- Comment #1 from PeteVine ---

[Bug target/79581] New: VFP4 slower than VFP3 in C-ray

2017-02-17 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40762 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40762=edit preprocessed source $ gcc -marm -Ofast -mcpu=cortex-a5 -mfpu=vfpv3 c-ray-mt.i -lm -lpthr

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #28 from PeteVine --- Lesson learnt, thanks! If you look at the last -Ofast result (or 1702153-RI-CRAYFAST467), the suspect difference is there (the compiler had been rebuilt from scratch with all the patches), and I even managed

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-15 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #26 from PeteVine --- OK, maybe this SoC is kinky, I give up: http://openbenchmarking.org/result/1702154-RI-CRAYFAST326

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #25 from PeteVine --- The original issue never mentioned -Ofast or -ffast-math and I see no difference at -Ofast, indeed: http://openbenchmarking.org/result/1702153-RI-CRAYFAST424 @jgreenhalgh Can you confirm there's no regression

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #24 from PeteVine --- I did a git pull and restarted the build so unless something didn't get reconfigured, it definitely should've been included. If you see the improvement, never mind then.

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #12 from PeteVine --- Nice, the PR68664 patch has fixed the issue. FWIW, unlike previously, running on a Cortex-A53 showed perfect alignment with core type (-mfpu=vfpv3) on the first run: Cortex-A8 Rendering took: 1 seconds (1801

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-02-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 PeteVine changed: What|Removed |Added Status|RESOLVED|REOPENED Resolution|DUPLICATE

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #4 from PeteVine --- Whereas `-fsanitize=address` aborts all the same: ==28821==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0xaf012100 #0 0xb6af76fb in operator delete(void*, unsigned

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #3 from PeteVine --- That's the same command line that leads to an immediate crash (uninstrumented).

[Bug target/79480] -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79480 --- Comment #2 from PeteVine --- OK, having been built with: -mcpu=cortex-a5 -O3 -ffast-math -marm -fomit-frame-pointer -fipa-pta -mfpu=neon-vfpv4 -ftree-vectorize -flto -fsanitize=undefined doesn't crash but prints many errors, e.g.:

[Bug target/79480] New: -O3 and -mfpu=neon produces crashing code on ARM

2017-02-12 Thread tulipawn at gmail dot com
Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- The gl-117 binary (source link attached) compiled with: -mcpu=cortex-a5 -O3 -marm -fomit-frame-pointer -mfpu=neon -ftree-vectorize crashes with a SIGBUS plus this kernel

[Bug target/79370] Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79370 PeteVine changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|---

[Bug target/79370] New: Cortex-A7 hardware division switched on for -mcpu but not -mtune

2017-02-03 Thread tulipawn at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40667 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40667=edit Preprocessed source Compiling the attachm

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-30 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #11 from PeteVine --- Super cool, thanks! That makes the OP a true prophet before his time ;)

[Bug target/53659] ARM: Using -mcpu=cortex-a9 option results in bad performance for Cortex-A9 processor in C-Ray phoronix benchmark

2017-01-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 --- Comment #9 from PeteVine --- @jgreenhalgh Please have a look at the profiled assembly for both fast and slow codegen. (attached) According to @aldyh's bisection in #68664 this probably isn't the same issue.

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #5 from PeteVine --- Yes, this came from the gl4es project, and compiling the whole thing normally, only gcc7 is affected.

[Bug target/79239] [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79239 --- Comment #2 from PeteVine --- gcc -O2 or above elicits the ICE, configured with: --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib

[Bug target/79239] New: [7 regression] ICE in extract_insn, at recog.c:2311 (error: unrecognizable insn)

2017-01-26 Thread tulipawn at gmail dot com
Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40586 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40586=edit Preprocessed source

[Bug target/77468] [7 Regression] C-ray regression on Aarch64

2017-01-25 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77468 --- Comment #21 from PeteVine --- It would be great if https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659 could get squashed in one fell swoop.

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #2 from PeteVine --- $ gcc -v Configured with: ../configure -v --enable-languages=c,c++,fortran --prefix=/usr/gcc7 --program-suffix=-7 --enable-shared --enable-linker-build-id --libexecdir=/usr/gcc7/lib --without-included-gettext

[Bug target/79105] Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79105 --- Comment #1 from PeteVine --- Updated to include an explicit -mfpu=neon-vfpv4. http://openbenchmarking.org/result/1701179-TA-1701143TA49 Not sure if -mcpu=cortex-a5 and -mfpu=neon shouldn't have implied VFPv4 but the explicit addition has

[Bug target/79105] New: Autovectorized NEON code slower than vfpv4 on Cortex A5

2017-01-16 Thread tulipawn at gmail dot com
Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- As the title says, many results seem to suffer from switching to -mfpu=neon, etc. http://openbenchmarking.org/result/1701165-TA-1701143TA78

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #6 from PeteVine --- It's possible I already had that patch included in my build, but in case I didn't, here's a quick addition to the previous result: http://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66 The c-ray thunderx

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #4 from PeteVine --- I'm delighted to report **not** targeting Cortex-A53 actually incurs a performance penalty sometimes ;) http://openbenchmarking.org/result/1701128-TA-GCCCOMPAR79

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994 --- Comment #3 from PeteVine --- Hey, that works for me too! (62565 vs 70758 in favour of -Ofast). Usefully strange :)

[Bug middle-end/78994] New: -Ofast makes aarch64 C++ benchmark slower

2017-01-04 Thread tulipawn at gmail dot com
Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Created attachment 40463 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40463=edit Preprocessed source + assembly files After make && ./build/dsp-bench,

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #13 from PeteVine --- Also, could these (sample) warnings actually matter when using ld.gold? NB, lra-constraints.c features in the previously provided backtrace: ../../libdecnumber/decNumber.c:3582:0: note: code may be misoptimized

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-12-17 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added CC||ktkachov at gcc dot gnu.org --- Comment #12

[Bug c++/69481] ICE with C++11 alias using with templates

2016-12-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69481 PeteVine changed: What|Removed |Added CC||tulipawn at gmail dot com --- Comment #10

[Bug bootstrap/78220] New: Add 'remounting exec' suggestion

2016-11-05 Thread tulipawn at gmail dot com
Assignee: unassigned at gcc dot gnu.org Reporter: tulipawn at gmail dot com Target Milestone: --- Restarting a build on a `noexec` partition fails with: checking whether the C compiler works... configure: error: in `/mnt/gcc-svn-master/build-lto/gcc': configure: error: cannot run C
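The failure above comes from configure being unable to run the test programs it compiles when the build tree sits on a `noexec` mount. A minimal Python sketch of the kind of check that could back the suggested remount hint, assuming Linux, where `os.statvfs` reports the `ST_NOEXEC` mount flag; `mounted_noexec` is a hypothetical helper, not part of GCC's configure.

```python
import os


def mounted_noexec(path: str) -> bool:
    """Return True if the filesystem holding `path` is mounted noexec.

    Relies on the ST_NOEXEC bit of statvfs().f_flag, which Python exposes on
    Linux; where the constant is missing this sketch simply reports False.
    """
    flag = getattr(os, "ST_NOEXEC", None)
    if flag is None:
        return False
    return bool(os.statvfs(path).f_flag & flag)


if mounted_noexec("."):
    print("build directory is on a noexec mount; try: mount -o remount,exec <mountpoint>")
```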

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with gold linker

2016-10-29 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 PeteVine changed: What|Removed |Added Summary|ICE during LTO bootstrap on |ICE during LTO bootstrap on

[Bug target/77730] Fortran performance on aarch64 (6/7 regression heads-up)

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77730 --- Comment #8 from PeteVine --- I thought I was clear it was just a heads-up. All relevant data is already inside and anyone willing to look closer should just run the benchmark on any machine/platform like this, e.g.: $ phoronix-test-suite

[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917 PeteVine changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|---

[Bug bootstrap/77917] ARM/AARCH64 bootstrap-lto fails

2016-10-27 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77917 --- Comment #11 from PeteVine --- Well, I finally managed to complete an LTO bootstrap on ARM (even leaving the full complement of C(XX)FLAGS in place, bar -flto) but it seems using ld.bfd is a must.

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #10 from PeteVine --- Hold on, I'm investigating the effect of binutils downgrade (2.27 -> 2.26) and switching to ld.bfd. Strange that it all works fine during normal bootstrap, regardless of the codegen options.

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #9 from PeteVine --- Ha! It's not about the extra options, thank you! ;) Completely vanilla environment and gcc 5.4 produce this after a while: ../build-lto-noflags/./gcc/xgcc -B../build-lto-noflags/./gcc/

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #8 from PeteVine --- FWIW, here's the corresponding backtrace: #0 0x00afa00c in df_get_live_out () at ../../gcc/df.h:1159 #1 update_ebb_live_info (tail=, head=0x137b838 ) at ../../gcc/lra-constraints.c:5612 #2

[Bug target/78105] ICE during LTO bootstrap on AARCH64 with extra options

2016-10-26 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78105 --- Comment #7 from PeteVine --- Restarted the whole thing from scratch using gcc 5.4 and it segfaulted again. ../../../libgcc/libgcc2.c: In function ‘__powitf2’: ../../../libgcc/libgcc2.c:1851:1: internal compiler error: Segmentation fault }
