[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from PeteVine ---
In case the changed behaviour of -frename-registers is not actually a feature, please reopen.
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #7 from PeteVine ---
Thanks for pointing that out! I was using my bash history to change the CFLAGS, and when I was flipping the crc switch I didn't notice I'd picked a version without -frename-registers, hence the wrong conclusion :)

Definitely then, -frename-registers it is!

http://openbenchmarking.org/result/1707307-RI-CORTEXA5313
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #6 from Andrew Pinski ---
(In reply to PeteVine from comment #5)
> Turns out the GCC 8 regression is caused by the +crc switch in
> -march=armv8-a+crc. Interesting, eh?

+crc should not cause any code generation difference ...
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #5 from PeteVine ---
Turns out the GCC 8 regression is caused by the +crc switch in -march=armv8-a+crc. Interesting, eh?
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #4 from PeteVine ---
> I'm not sure what you're trying to measure here - it's very confusing with
> multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it
> related to -fipa-pta or is that not relevant?

All the relevant flags have been kept constant (-Ofast -mcpu), so you should only look at this result side by side with the previous one. I'll summarise the findings for you:

To get the best c-ray performance out of gcc7, it's necessary to use either -mcpu/mtune=cortex-a57 or -mcpu=cortex-a53 -frename-registers (depessimizing with -mno-fix-cortex-a53-843419 if necessary).

However, in gcc8, neither produces the expected best performance. No combination does, which is a clear regression.
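
For anyone wanting to reproduce the comparison described in comment #4, the flag combinations can be written down as a small script. This is only a sketch: the flags are the ones quoted in the comment, while the source file name (c-ray.c), the output names, the variant labels, the -lm link flag and the helper build_command are assumptions made for illustration, not taken from the benchmark profile.

```python
# Hypothetical helper: assembles the GCC command lines compared in comment #4.
# The source file name, output names, variant labels and -lm are assumptions.

BASE = ["-Ofast"]  # kept constant across all runs, as stated in the comment

VARIANTS = {
    # Option 1: tune for Cortex-A57 instead of Cortex-A53.
    "a57-tuning": ["-mcpu=cortex-a57"],
    # Option 2: keep Cortex-A53 tuning but enable register renaming.
    "a53-rename": ["-mcpu=cortex-a53", "-frename-registers"],
    # Option 2 with the erratum 843419 workaround switched off.
    "a53-rename-no-erratum-fix": [
        "-mcpu=cortex-a53",
        "-frename-registers",
        "-mno-fix-cortex-a53-843419",
    ],
}


def build_command(variant, source="c-ray.c", compiler="gcc"):
    """Return the argument list for one variant; nothing is executed."""
    flags = BASE + VARIANTS[variant]
    return [compiler, *flags, source, "-o", f"c-ray-{variant}", "-lm"]


if __name__ == "__main__":
    for name in VARIANTS:
        print(" ".join(build_command(name)))
```

Keeping the flag sets in one place like this makes it harder to pick up a stale command line from shell history when only one switch is meant to change between runs.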
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #3 from wilco at gcc dot gnu.org ---
(In reply to PeteVine from comment #2)
> I can confirm the first part of the issue gets fixed with this patch:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html

There are a few more division patches on the way, e.g. https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01312.html is one of them, another should remove the redundant shift.

> but there's a regression in gcc8 concerning the second part. (or rather the
> workarounds don't work any more)
>
> http://openbenchmarking.org/result/1704298-RI-CRAYREGRE13
>
> ("basic flags" didn't deactivate -mfix-cortex-a53-843419, hence the
> difference)

I'm not sure what you're trying to measure here - it's very confusing with multiple overlapping options (O3/Ofast/tree-vectorize), -mcpu/-march. Is it related to -fipa-pta or is that not relevant?
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

--- Comment #2 from PeteVine ---
I can confirm the first part of the issue gets fixed with this patch:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01415.html

but there's a regression in gcc8 concerning the second part. (or rather the workarounds don't work any more)

http://openbenchmarking.org/result/1704298-RI-CRAYREGRE13

("basic flags" didn't deactivate -mfix-cortex-a53-843419, hence the difference)
[Bug target/79964] Cortex A53 codegen still not optimal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79964

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #1 from PeteVine ---
Turns out -frename-registers fixes this issue as well, thanks for the tip!

http://openbenchmarking.org/result/1704142-RI-1703089RI22