[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-11 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #13 from wilco at gcc dot gnu.org ---
It looks like the x64 issue is unrelated. It starts with a bad schedule which
could be improved by the scheduler, but that is off by default, while the ARM
version starts with a good schedule which is completely messed up by the
scheduler.

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-10 Thread wilco at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

wilco at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #12 from wilco at gcc dot gnu.org ---
There are 2 separate issues in the ARMv7 case. One is scheduling: the -S output
goes down from 437 lines to 305 lines with -fno-schedule-insns (stack size 276
rather than 448 bytes). So basically the "register pressure aware" scheduler
introduces lots of unnecessary spills.

The 2nd issue is related to the use of single-element operations within
vectors. If I change the define to do an explicit dup, e.g.
vmulq_f32((b), vdupq_n_f32(a)), I get 211 lines and no spills at all. Switching
scheduling on again gives 326 lines, so it's spilling like crazy.
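
The explicit-dup rewrite described above might look like the following sketch.
The macro names and scale4_dup are hypothetical, and the assumption that the
original define used vmulq_n_f32 is a guess, since the testcase itself is not
quoted in this comment. The scalar fallback exists only so the sketch builds on
hosts without NEON.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Assumed original form: vector-by-scalar multiply (hypothetical macro name). */
#define MUL_SCALAR_N(a, b)   vmulq_n_f32((b), (a))

/* Form from the comment: broadcast the scalar into all four lanes first,
 * then do a plain vector-by-vector multiply. */
#define MUL_SCALAR_DUP(a, b) vmulq_f32((b), vdupq_n_f32(a))

/* Multiply four floats from src by k and store them to dst. */
void scale4_dup(float *dst, const float *src, float k)
{
    assert(dst != NULL && src != NULL);
    float32x4_t v = vld1q_f32(src);
    vst1q_f32(dst, MUL_SCALAR_DUP(k, v));
}
#else
/* Scalar fallback with the same result, for non-NEON hosts. */
void scale4_dup(float *dst, const float *src, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < 4; ++i)
        dst[i] = src[i] * k;
}
#endif
```

Building such a file once with each macro and comparing the -S output is one
way to repeat the line-count comparison; the exact numbers depend on the
testcase and compiler version.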

Both issues seem to have been present since at least 4.8.2.

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-04 Thread already5chosen at yahoo dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #11 from Michael_S  ---
Created attachment 41128
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41128&action=edit
ARMv7 case

ARMv7 - very similar to x64

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-04 Thread already5chosen at yahoo dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #10 from Michael_S  ---
Created attachment 41127
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41127&action=edit
bad reg allocation despite no-tree-ter

No problems

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-04 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #9 from Bernd Schmidt  ---
(In reply to Michael_S from comment #8)
> Here is a variant that makes the issue show up on x64 with -fno-tree-ter.
> https://godbolt.org/g/mSLiRZ

Could you attach this here as well? I've been trying to get the testcase out of
godbolt, but there seems not to be a save option and copy & paste doesn't work
either.

In general, the problem is that ter makes pessimal scheduling decisions,
increasing register pressure. The patch I have adds a little mini-scheduler
into the expand stage to try to prevent this. In theory that should also work
for testcases where the bad scheduling was done manually, but I'd like to
check.
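
As a rough illustration of why instruction order matters for register
pressure, here is a toy model. It is not GCC's TER and not the patch mentioned
above. It only counts how many values are live at once for two orderings of the
same computation. The Insn type, peak_live, and both schedules are hypothetical
names made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALUES 64

/* One instruction in the toy model: defines one value, reads up to two. */
typedef struct {
    int def;     /* id of the value this instruction defines */
    int use[2];  /* ids of the values it reads, -1 for an unused slot */
} Insn;

/* Return the largest number of values that are simultaneously live
 * (defined already, with a use still to come) after any instruction. */
size_t peak_live(const Insn *sched, size_t n)
{
    int def_pos[MAX_VALUES];
    int last_use[MAX_VALUES];

    for (int v = 0; v < MAX_VALUES; ++v)
        def_pos[v] = last_use[v] = -1;

    /* Record where each value is defined and where it is last read. */
    for (size_t p = 0; p < n; ++p) {
        assert(sched[p].def >= 0 && sched[p].def < MAX_VALUES);
        def_pos[sched[p].def] = (int)p;
        for (int k = 0; k < 2; ++k) {
            int id = sched[p].use[k];
            if (id >= 0) {
                assert(id < MAX_VALUES);
                last_use[id] = (int)p;
            }
        }
    }

    size_t peak = 0;
    for (size_t p = 0; p < n; ++p) {
        size_t live = 0;
        for (int v = 0; v < MAX_VALUES; ++v)
            if (def_pos[v] >= 0 && def_pos[v] <= (int)p && last_use[v] > (int)p)
                ++live;
        if (live > peak)
            peak = live;
    }
    return peak;
}

/* Values 0..3 are loads; 4 = 0+1, 5 = 4+2, 6 = 5+3. */

/* All loads hoisted to the top: every loaded value stays live at once. */
const Insn HOISTED[7] = {
    {0, {-1, -1}}, {1, {-1, -1}}, {2, {-1, -1}}, {3, {-1, -1}},
    {4, {0, 1}}, {5, {4, 2}}, {6, {5, 3}},
};

/* Each load placed right before its use: fewer values live at once. */
const Insn INTERLEAVED[7] = {
    {0, {-1, -1}}, {1, {-1, -1}}, {4, {0, 1}},
    {2, {-1, -1}}, {5, {4, 2}},
    {3, {-1, -1}}, {6, {5, 3}},
};
```

In this model the hoisted order peaks at 4 live values and the interleaved
order at 2. With only 16 SIMD registers and a much larger computation, the same
effect is what turns into spills.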

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-04 Thread already5chosen at yahoo dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Michael_S  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |already5chosen at yahoo dot com

--- Comment #8 from Michael_S  ---
Hi,
I am the person who originated the issue. I didn't want to take part in the
discussion, but Markus convinced me.

I want to add a couple of points:
1. It seems to me that the issue is not specific to x64. It is more general and
could happen on any machine with 16 SIMD registers.

2. Here is a demonstration of the issue on ARMv7 Neon.
https://godbolt.org/g/e9A5Yi
As an example of proper code generation, you can look (on the same Godbolt
page) at the code generated by Visual C.

3. Markus argues that the ARMv7 Neon issue differs from the x64 one. He appears
to think so because the x64 issue is cured by -fno-tree-ter and the ARMv7 issue
is not.
I disagree. As I understand it, tree-ter processing is just a trigger of the
problem, not its cause. The cause has to be a broken optimizer heuristic.

4. To prove my point that the relationship between the problem and tree-ter on
x64 is incidental, I reformatted the original code in a slightly different
manner (was I imitating tree-ter? Maybe. But I didn't look at the tree-ter
source code).
Here is a variant that makes the issue show up on x64 with -fno-tree-ter.
https://godbolt.org/g/mSLiRZ

Best regards,
Michael

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-03 Thread bernds at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #7 from Bernd Schmidt  ---
Well, I've made a small tweak to the patch I have for PR78972, and I've got
what at a glance looks like optimal code (no spills).

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
   Target Milestone|---                         |5.5

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-02 Thread trippels at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

--- Comment #6 from Markus Trippelsdorf  ---
(In reply to Andrew Pinski from comment #5)
> Try -fno-tree-ter .

Yes, this works, too.

[Bug middle-end/80283] [5/6/7 Regression] bad SIMD register allocation

2017-04-02 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80283

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |middle-end

--- Comment #5 from Andrew Pinski  ---
Try -fno-tree-ter .