[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #8 from PeteVine --- I've just confirmed the result on a newer Linux distribution (Ubuntu 16.04) and the difference between VFPv3 and v4 is clearly there (2330 vs 2560) using gcc 5.4. Unless the CPU itself requires an erratum, that probably leaves suboptimal codegen as the main suspect.
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #7 from PeteVine --- Thanks, I promise to test any patches without delay :)
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

Ramana Radhakrishnan changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2017-06-15
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #6 from Ramana Radhakrishnan ---
(In reply to PeteVine from comment #4)
> > Judging by your -mcpu option is this on a Cortex-A5?
>
> Yes, if you look at the results on a Cortex A53 running armv7 code, it
> doesn't reproduce either, and A5-codegen is king :) (hopefully due to
> in-order design or sth)
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659#c12
>
> A quick question regarding -mcpu=cortex-a5 codegen; is there a similar
> switch to llvm's `-slowfpvmlx` feature? (disable slow vmla/vmls), which the
> nice ARM guy divulged here:
>
> https://bugs.llvm.org//show_bug.cgi?id=26135#c9
>
> or is it a non-issue in gcc?

No, there isn't a similar switch to that in GCC that I'm aware of. It's also not yet clear what causes the change; the only difference found so far is the one in kyrill's analysis in comment #3.

I'm going to confirm this, but I don't have Cortex-A5 hardware to investigate / play with this any further.
[Bug target/79581] VFP4 slower than VFP3 in C-ray on Cortex A5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581 --- Comment #5 from PeteVine --- Unchanged in gcc version 8.0.0 20170501.
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #4 from PeteVine ---
> Judging by your -mcpu option is this on a Cortex-A5?

Yes, if you look at the results on a Cortex A53 running armv7 code, it doesn't reproduce either, and A5-codegen is king :) (hopefully due to in-order design or sth)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53659#c12

A quick question regarding -mcpu=cortex-a5 codegen; is there a similar switch to llvm's `-slowfpvmlx` feature? (disable slow vmla/vmls), which the nice ARM guy divulged here:

https://bugs.llvm.org//show_bug.cgi?id=26135#c9

or is it a non-issue in gcc?
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #3 from ktkachov at gcc dot gnu.org ---
I can't reproduce the difference on my machine. Judging by your -mcpu option, is this on a Cortex-A5?

As far as codegen goes, the major difference I can see is that the vfpv4 version generates vfma instructions instead of vmla ones. Also, there are cases where the vfpv3 version will generate multiple vmls instructions, whereas the vfpv4 one will generate an explicit vneg followed by vfma instructions.
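To make the codegen difference described in comment #3 concrete, here is a minimal C sketch of the kind of source patterns involved. The function names are hypothetical and are not taken from c-ray; the instruction names in the comments are what GCC is generally expected to pick for these shapes, not verified output for this bug. The cross-compiler name in the header comment is likewise an assumption about the toolchain.

```c
/*
 * Sketch only: hypothetical helpers, not code from c-ray.
 *
 * One way to compare the two FPU settings (toolchain prefix is an assumption):
 *   arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a5 -mfpu=vfpv3 -S sketch.c
 *   arm-linux-gnueabihf-gcc -O2 -mcpu=cortex-a5 -mfpu=vfpv4 -S sketch.c
 * Adding -ffp-contract=off asks GCC not to contract a*b+c into a fused
 * operation, which may be worth trying as an experiment on VFPv4.
 */
#include <assert.h>

/* a * b + c: a candidate for the chained vmla.f32 on VFPv3,
 * and for the fused vfma.f32 once VFPv4 is enabled. */
float mul_add(float a, float b, float c)
{
    return a * b + c;
}

/* c - a * b: a candidate for vmls.f32 on VFPv3. Comment #3 reports that
 * the VFPv4 build sometimes uses an explicit vneg followed by vfma here. */
float mul_sub(float a, float b, float c)
{
    return c - a * b;
}

/* Three-element dot product: a short chain of multiply-accumulates,
 * a shape that is common in ray-tracer vector math. */
float dot3(const float *x, const float *y)
{
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2];
}

/* Small exact cases: the results are the same whether or not the
 * compiler fuses the multiply and the add. */
void check_small_cases(void)
{
    const float x[3] = {1.0f, 2.0f, 3.0f};
    const float y[3] = {4.0f, 5.0f, 6.0f};

    assert(mul_add(2.0f, 3.0f, 4.0f) == 10.0f);
    assert(mul_sub(2.0f, 3.0f, 10.0f) == 4.0f);
    assert(dot3(x, y) == 32.0f);
}
```

Diffing the two assembly outputs for functions like these should show whether the vmla/vfma and vmls/vneg+vfma substitutions appear in isolation, which could help separate a codegen issue from a core-specific latency issue.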
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

--- Comment #2 from PeteVine ---
Created attachment 40769
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40769&action=edit
sphract

The other file required to run the benchmark straight from bugzilla! :)
[Bug target/79581] VFP4 slower than VFP3 in C-ray
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79581

PeteVine changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |armv7

--- Comment #1 from PeteVine ---
Distilled from PR79105 as a separate issue, not related to NEON and autovectorization.