[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #13 from wilco at gcc dot gnu.org --- It looks like the x64 issue is unrelated. It starts with a bad schedule that the scheduler could improve, but the scheduler is off by default, while the ARM version starts with a good schedule that the scheduler completely messes up.
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
wilco at gcc dot gnu.org changed:
What|Removed|Added
CC||wilco at gcc dot gnu.org
--- Comment #12 from wilco at gcc dot gnu.org ---
There are 2 separate issues in the ARMv7 case. The first is scheduling: the -S output goes down from 437 lines to 305 lines with -fno-schedule-insns (stack size 276 rather than 448 bytes). So basically the "register pressure aware" scheduler introduces lots of unnecessary spills. The 2nd issue is related to the use of single-element operations within vectors. If I change the define to do an explicit dup, e.g. vmulq_f32((b), vdupq_n_f32(a)), I get 211 lines and no spills at all. Switching scheduling on again gives 326 lines, so it's spilling like crazy. Both issues seem to have been present since at least 4.8.2.
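For readers who have not seen the testcase, the change to the define described above might look roughly like the sketch below. The thread does not show the original macro, so the "before" form using vmulq_n_f32 is an assumption, as are the names MUL_SCALAR, MUL_DUP, vec4, load4, store4 and forms_agree. The non-NEON branch is only a scalar stand-in so the sketch builds on other targets; it says nothing about register allocation there.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef float32x4_t vec4;
static vec4 load4(const float *p) { return vld1q_f32(p); }
static void store4(float *p, vec4 v) { vst1q_f32(p, v); }
/* Assumed "before" form: the scalar is passed as a single-element operand. */
#define MUL_SCALAR(a, b) vmulq_n_f32((b), (a))
/* Form quoted in the comment: broadcast the scalar with an explicit dup. */
#define MUL_DUP(a, b) vmulq_f32((b), vdupq_n_f32(a))
#else
/* Hypothetical scalar stand-ins for targets without NEON. */
typedef struct { float f[4]; } vec4;
static vec4 load4(const float *p)
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.f[i] = p[i];
    return v;
}
static void store4(float *p, vec4 v)
{
    for (int i = 0; i < 4; i++) p[i] = v.f[i];
}
static vec4 dup4(float a)
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.f[i] = a;
    return v;
}
static vec4 mul4(vec4 x, vec4 y)
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.f[i] = x.f[i] * y.f[i];
    return v;
}
static vec4 mul4_n(vec4 x, float a)
{
    vec4 v;
    for (int i = 0; i < 4; i++) v.f[i] = x.f[i] * a;
    return v;
}
#define MUL_SCALAR(a, b) mul4_n((b), (a))
#define MUL_DUP(a, b) mul4((b), dup4(a))
#endif

/* Returns 1 if both macro forms produce identical lanes for this input. */
int forms_agree(float a, const float *in)
{
    float r1[4], r2[4];
    store4(r1, MUL_SCALAR(a, load4(in)));
    store4(r2, MUL_DUP(a, load4(in)));
    for (int i = 0; i < 4; i++)
        if (r1[i] != r2[i])
            return 0;
    return 1;
}
```

Both forms compute the same values; the difference reported in the comment is only in the code GCC generates for them, which can be compared by building each variant with -S, with and without -fno-schedule-insns.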
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #11 from Michael_S --- Created attachment 41128 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41128&action=edit
ARMv7 case
ARMv7 - very similar to x64
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #10 from Michael_S --- Created attachment 41127 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41127&action=edit
bad reg allocation despite no-tree-ter
No problems
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #9 from Bernd Schmidt --- (In reply to Michael_S from comment #8)
> Here is a variant that makes an issue to show on x64 with -fno-tree-ter.
> https://godbolt.org/g/mSLiRZ
Could you attach this here as well? I've been trying to get the testcase out of godbolt, but there seems to be no save option, and copy & paste doesn't work either. In general, the problem is that ter makes pessimal scheduling decisions, increasing register pressure. The patch I have adds a little mini-scheduler into the expand stage to try to prevent this. In theory that should also work for testcases where the bad scheduling was done manually, but I'd like to check.
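As a rough illustration of how instruction order affects register pressure, the sketch below computes the same sum in two source orders. It does not reproduce what ter or the proposed mini-scheduler does, and the compiler is free to reorder either function; the function names are hypothetical. It only shows how the order in which values are produced and consumed changes how many of them are live at once.

```c
/* Order A: all eight products are computed before any is consumed, so in
   source order eight temporaries are live at the same time. */
float dot8_all_live(const float *a, const float *b)
{
    float t0 = a[0] * b[0];
    float t1 = a[1] * b[1];
    float t2 = a[2] * b[2];
    float t3 = a[3] * b[3];
    float t4 = a[4] * b[4];
    float t5 = a[5] * b[5];
    float t6 = a[6] * b[6];
    float t7 = a[7] * b[7];
    return ((((((t0 + t1) + t2) + t3) + t4) + t5) + t6) + t7;
}

/* Order B: each product is added to the running sum right away, so only
   the sum and one product need to be live at any point. */
float dot8_interleaved(const float *a, const float *b)
{
    float sum = a[0] * b[0];
    for (int i = 1; i < 8; i++)
        sum += a[i] * b[i];
    return sum;
}
```

With 16 SIMD registers and wider expressions than this one, an order like A is the kind of schedule that can force spills, whether it was produced by the compiler or written by hand.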
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
Michael_S changed:
What|Removed|Added
CC||already5chosen at yahoo dot com
--- Comment #8 from Michael_S ---
Hi, I am the person who originated the issue. I didn't want to take part in the discussion, but Markus convinced me. I want to add a couple of points:
1. It seems to me that the issue is not specific to x64. It is more general and could happen on any machine with 16 SIMD registers.
2. Here is a demonstration of the issue on ARMv7 Neon: https://godbolt.org/g/e9A5Yi As an example of proper code generation, you can look (on the same Godbolt) at the code generated by Visual C.
3. Markus argues that the ARMv7 Neon issue differs from the x64 one. He appears to think so because the x64 issue is cured by -fno-tree-ter and the ARMv7 issue is not. I disagree. As I understand it, tree-ter processing is just a trigger of the problem, not its cause. The cause has to be a broken optimizer heuristic.
4. To prove my point that the relationship between the problem and tree-ter on x64 is incidental, I reformatted the original code in a slightly different manner (was I imitating tree-ter? Maybe, but I didn't look at the tree-ter source code). Here is a variant that makes the issue show up on x64 even with -fno-tree-ter: https://godbolt.org/g/mSLiRZ
Best regards, Michael
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #7 from Bernd Schmidt --- Well, I've made a small tweak to the patch I have for PR78972, and I've got what at a glance looks like optimal code (no spills).
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
Richard Biener changed:
What|Removed|Added
Priority|P3|P2
Target Milestone|---|5.5
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283 --- Comment #6 from Markus Trippelsdorf --- (In reply to Andrew Pinski from comment #5)
> Try -fno-tree-ter .
Yes, this works, too.
[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283
Andrew Pinski changed:
What|Removed|Added
Component|tree-optimization|middle-end
--- Comment #5 from Andrew Pinski ---
Try -fno-tree-ter.