[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-05 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

--- Comment #7 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:aed445b0fd0c7ed16124c61e7eb732992426f103

commit r14-9315-gaed445b0fd0c7ed16124c61e7eb732992426f103
Author: Jakub Jelinek 
Date:   Tue Mar 5 10:32:38 2024 +0100

lower-subreg: Fix ROTATE handling [PR114211]

On the following testcase, we have
(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":8:5 1042 {rotl64ti2_doubleword}
     (nil))
before subreg1 and the pass decides to use
(reg:DI 127 [ h ]) / (reg:DI 128 [ h+8 ])
register pair instead of (reg/v:TI 106 [ h ]).
resolve_operand_for_swap_move_operator implements it by pretending it is
an assignment from
(concatn (reg:DI 127 [ h ]) (reg:DI 128 [ h+8 ]))
to
(concatn (reg:DI 128 [ h+8 ]) (reg:DI 127 [ h ]))
The problem is that if the rotate argument is the same as the destination,
or even if the first half of the destination merely overlaps the second
half of the source, we emit incorrect code, because the store to
(reg:DI 128 [ h+8 ]) overwrites the value needed as the source of the
second move.  The following patch detects that case and uses a temporary
pseudo to hold the original (reg:DI 128 [ h+8 ]) value across the first
store.
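
The overlap check described above can be illustrated with a small
standalone model.  This is only a sketch and not the code in
lower-subreg.cc: the register file, the function rotate_by_word and its
parameters are hypothetical names made up for the illustration, and the
scratch register is assumed to be distinct from the other four.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a tiny "register file" indexed by pseudo number.  */
enum { NUM_REGS = 8 };

/* Rotate the double-word value held in (src_lo, src_hi) by one word and
   store the result in (dst_lo, dst_hi): dst_hi receives src_lo and
   dst_lo receives src_hi.  tmp is a scratch register that is assumed to
   differ from all four operand registers.  */
static void
rotate_by_word (uint64_t regs[NUM_REGS], int dst_lo, int dst_hi,
                int src_lo, int src_hi, int tmp)
{
  assert (tmp != dst_lo && tmp != dst_hi && tmp != src_lo && tmp != src_hi);

  /* The first move stores into dst_hi.  If the second move still needs
     to read that same register, save its old value beforehand.  */
  if (dst_hi == src_hi)
    {
      regs[tmp] = regs[src_hi];
      src_hi = tmp;
    }

  regs[dst_hi] = regs[src_lo];  /* first move */
  regs[dst_lo] = regs[src_hi];  /* second move, possibly from tmp */
}
```

With source and destination both in registers 0 and 1, a call such as
rotate_by_word (regs, 0, 1, 0, 1, 7) swaps the two words; with disjoint
register pairs the temporary is not used at all.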

2024-03-05  Jakub Jelinek  

PR rtl-optimization/114211
* lower-subreg.cc (resolve_simple_move): For double-word
rotates by BITS_PER_WORD if there is overlap between source
and destination use a temporary.

* gcc.dg/pr114211.c: New test.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Jakub Jelinek changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek ---
Created attachment 57603
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57603&action=edit
gcc14-pr114211.patch

Untested fix.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

--- Comment #5 from Jakub Jelinek ---
Anyway, the actual bug is in the
r9-4082-g38e601118ca88adf0a472750b0da83f0ef1798a7
PR87507 change.
Either we need to punt if the rotate input and output overlap, or handle that
case correctly.

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars since r13-1907

2024-03-04 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[13/14 Regression] wrong    |[13/14 Regression] wrong
                   |code with -O                |code with -O
                   |-fno-tree-coalesce-vars     |-fno-tree-coalesce-vars
                   |                            |since r13-1907
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Started with r13-1907-g525a1a73a5a563c829a5f76858fe122c9b39f254

[Bug rtl-optimization/114211] [13/14 Regression] wrong code with -O -fno-tree-coalesce-vars

2024-03-04 Thread ubizjak at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114211

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization
           Keywords|needs-bisection             |

--- Comment #3 from Uroš Bizjak ---
(In reply to Richard Biener from comment #2)
> Possibly target independent rtl-optimization issue.

It is the subreg1 pass that converts:

(insn 10 7 11 2 (set (reg/v:TI 106 [ h ])
        (rotate:TI (reg/v:TI 106 [ h ])
            (const_int 64 [0x40]))) "pr114211.c":9:5 1042 {rotl64ti2_doubleword}
     (nil))

to:

(insn 39 7 40 2 (set (reg:DI 128 [ h+8 ])
        (reg:DI 127 [ h ])) "pr114211.c":9:5 84 {*movdi_internal}
     (nil))
(insn 40 39 11 2 (set (reg:DI 127 [ h ])
        (reg:DI 128 [ h+8 ])) "pr114211.c":9:5 84 {*movdi_internal}
     (nil))

Well... this won't swap: insn 39 overwrites (reg:DI 128 [ h+8 ]) before
insn 40 reads it. Either a parallel should be emitted, or a temporary
should be used.
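
To make the failure concrete, here is a minimal C sketch of the two-move
sequence next to a variant that uses a temporary. The struct word_pair
and both function names are hypothetical stand-ins for the
(reg:DI 127) / (reg:DI 128) pair, not anything taken from GCC.

```c
#include <stdint.h>

/* Hypothetical stand-in for the (reg:DI 127) / (reg:DI 128) pair.  */
struct word_pair
{
  uint64_t lo;  /* models (reg:DI 127 [ h ]) */
  uint64_t hi;  /* models (reg:DI 128 [ h+8 ]) */
};

/* Mirrors insns 39 and 40: two moves executed one after the other.  */
static void
sequential_moves (struct word_pair *p)
{
  p->hi = p->lo;  /* insn 39: the old hi value is lost here */
  p->lo = p->hi;  /* insn 40: reads the value that was just written */
}

/* One of the two remedies mentioned above: keep the old hi value in a
   temporary across the first store.  */
static void
swap_with_temporary (struct word_pair *p)
{
  uint64_t tmp = p->hi;
  p->hi = p->lo;
  p->lo = tmp;
}
```

Starting from lo = 1 and hi = 2, sequential_moves leaves both words equal
to 1, while swap_with_temporary yields lo = 2 and hi = 1.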

Adding -fno-split-wide-types fixes the testcase.

Re-confirmed as rtl-optimization problem.