[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2021-05-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-09-14 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

Wilco  changed:

           What    |Removed                       |Added
------------------------------------------------------------------------------
             Status|UNCONFIRMED                   |NEW
   Last reconfirmed|                              |2017-09-14
           Assignee|unassigned at gcc dot gnu.org |wilco at gcc dot gnu.org
     Ever confirmed|0                             |1

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-04-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #9 from PeteVine  ---
Well, yes, that fixes the -Ofast issue for me:

-mcpu=cortex-a53 -frename-registers
iir:    65952 ns per loop
iir_2:  63098 ns per loop

-mcpu=cortex-a57 (-frename-registers)
iir:    62839 ns per loop
iir_2:  62677 ns per loop

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-04-12 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

wilco at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #8 from wilco at gcc dot gnu.org ---
It looks like a scheduling issue, but not one involving int<->fp transfers.
Try -frename-registers.

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-03-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #7 from PeteVine  ---
The result is not affected by -mno-fix-cortex-a53-843419, so the erratum
workaround is not the cause and the issue is genuine. -Ofast somehow
pessimizes Cortex-A53 codegen, and switching to e.g. -mcpu=cortex-a57 fixes it
(tested on trunk).

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-14 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #6 from PeteVine  ---
It's possible I already had that patch included in my build, but 
in case I didn't, here's a quick addition to the previous result:

http://openbenchmarking.org/result/1701143-TA-GCCCOMPAR66

The c-ray thunderx result suggests A53 codegen is still suboptimal. The patch
has had no effect on the original issue.

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-12 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #5 from Andrew Pinski  ---
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00637.html

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-12 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #4 from PeteVine  ---
I'm delighted to report **not** targeting Cortex-A53 actually incurs a
performance penalty sometimes ;)

http://openbenchmarking.org/result/1701128-TA-GCCCOMPAR79

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-04 Thread tulipawn at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

--- Comment #3 from PeteVine  ---
Hey, that works for me too! (62565 vs 70758 in favour of -Ofast). Usefully
strange :)

[Bug target/78994] -Ofast makes aarch64 C++ benchmark slower for A53

2017-01-04 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78994

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|-Ofast makes aarch64 C++    |-Ofast makes aarch64 C++
                   |benchmark slower            |benchmark slower for A53

--- Comment #2 from Andrew Pinski  ---
For me on ThunderX, -Ofast is faster:
apinski@apinski-ss1:~/src/tests1$ g++ main.ii -Ofast -std=c++14 -mcpu=thunderx
apinski@apinski-ss1:~/src/tests1$ ./a.out
DSP Bench C++
Biquad coefficients:
b0=1.0045897513638202,
b1=-1.9900946111700766,
b2=0.98618704929508227,
a1=-1.9900946111700766,
a2=0.99077680065890239,
iir:    62008 ns per loop
iir_2:  62008 ns per loop
apinski@apinski-ss1:~/src/tests1$ g++ main.ii -O3 -std=c++14 -mcpu=thunderx
apinski@apinski-ss1:~/src/tests1$ ./a.out
DSP Bench C++
Biquad coefficients:
b0=1.0045897513638202,
b1=-1.9900946111700766,
b2=0.98618704929508227,
a1=-1.9900946111700766,
a2=0.99077680065890239,
iir:    80256 ns per loop
iir_2:  80256 ns per loop
apinski@apinski-ss1:~/src/tests1$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/5/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
5.4.0-6ubuntu1~16.04.1' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --enable-plugin --with-system-zlib
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64
--with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror
--enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu
--target=aarch64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.1)


On the trunk:
apinski@apinski-ss1:~/src/tests1$ ~/src/local/tools/bin/g++ main.ii -O3 -std=c++14 -mcpu=thunderx
apinski@apinski-ss1:~/src/tests1$ ./a.out
DSP Bench C++
Biquad coefficients:
b0=1.0045897513638202,
b1=-1.9900946111700766,
b2=0.98618704929508227,
a1=-1.9900946111700766,
a2=0.99077680065890239,
iir:    80630 ns per loop
iir_2:  80629 ns per loop
apinski@apinski-ss1:~/src/tests1$ ~/src/local/tools/bin/g++ main.ii -Ofast -std=c++14 -mcpu=thunderx
apinski@apinski-ss1:~/src/tests1$ ./a.out
DSP Bench C++
Biquad coefficients:
b0=1.0045897513638202,
b1=-1.9900946111700766,
b2=0.98618704929508227,
a1=-1.9900946111700766,
a2=0.99077680065890239,
iir:    62108 ns per loop
iir_2:  62118 ns per loop
apinski@apinski-ss1:~/src/tests1$ ~/src/local/tools/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/apinski/src/local/tools/bin/g++
COLLECT_LTO_WRAPPER=/data/src/local/tools/bin/../libexec/gcc/aarch64-unknown-linux-gnu/7.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure
--prefix=/home/apinski/src/local/objdir/../tools
--enable-languages=c,c++,fortran,go --disable-werror --with-sysroot=/
--enable-plugins --enable-gnu-indirect-function --with-pkgversion='My build'
Thread model: posix
gcc version 7.0.0 20161110 (experimental) [trunk revision 242061] (My build)
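
Note for readers without the attachment: the benchmark source (main.ii) is not
reproduced in this thread. Judging only from the printed output (five biquad
coefficients and an "ns per loop" figure), the timed kernel is presumably
something like a direct form I biquad IIR filter. The following is a minimal
sketch under that assumption, not the actual benchmark. The names Biquad,
filter_block, ns_per_loop and run_demo are hypothetical, and the buffer size and
loop count are arbitrary choices.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the benchmark kernel (the real main.ii is not
// shown in the thread). Direct form I biquad:
//   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
struct Biquad {
    double b0, b1, b2, a1, a2;   // coefficients, assumed normalised so a0 == 1
    double in1, in2, out1, out2; // previous two inputs and outputs

    Biquad(double b0_, double b1_, double b2_, double a1_, double a2_)
        : b0(b0_), b1(b1_), b2(b2_), a1(a1_), a2(a2_),
          in1(0.0), in2(0.0), out1(0.0), out2(0.0) {}

    // Process one sample. Each output depends on the previous two outputs,
    // so the loop carries a serial floating-point dependency chain.
    double step(double x) {
        double y = b0 * x + b1 * in1 + b2 * in2 - a1 * out1 - a2 * out2;
        in2 = in1;   in1 = x;
        out2 = out1; out1 = y;
        return y;
    }
};

// Filter a whole buffer; out must be at least as large as in.
void filter_block(Biquad& f, const std::vector<double>& in,
                  std::vector<double>& out) {
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = f.step(in[i]);
}

// Average wall-clock time of one pass over the buffer, in nanoseconds.
double ns_per_loop(Biquad f, const std::vector<double>& in,
                   std::vector<double>& out, int loops) {
    typedef std::chrono::steady_clock clock_type;
    clock_type::time_point start = clock_type::now();
    for (int i = 0; i < loops; ++i)
        filter_block(f, in, out);
    clock_type::time_point stop = clock_type::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / loops;
}

// Prints one timing line in the same style as the logs in this thread.
void run_demo() {
    // Coefficients copied from the benchmark output quoted above.
    Biquad f(1.0045897513638202, -1.9900946111700766, 0.98618704929508227,
             -1.9900946111700766, 0.99077680065890239);
    std::vector<double> in(4096, 0.0), out(4096, 0.0); // arbitrary size
    in[0] = 1.0;                                       // unit impulse
    std::printf("iir:    %.0f ns per loop\n", ns_per_loop(f, in, out, 1000));
}
```

To compare settings in the way the thread does, call run_demo() from main and
build the same file once with -O3 and once with -Ofast, keeping -std=c++14 and
the -mcpu value under test identical between the two builds.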