[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-23 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

Richard Sandiford  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Sandiford  ---
Fixed.

[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

--- Comment #8 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:9f105cfdc1bca6c9224384b3044c4ca5894e1e4c

commit r14-9156-g9f105cfdc1bca6c9224384b3044c4ca5894e1e4c
Author: Richard Sandiford 
Date:   Fri Feb 23 14:12:55 2024 +

aarch64: Tighten early-ra chain test for wide registers [PR113295]

Most code in early-ra used is_chain_candidate to check whether we
should chain two allocnos.  This included both tests that matter
for correctness and tests for certain heuristics.

Once that test passes for one pair of allocnos, we test whether
it's safe to chain the containing groups (which might contain
multiple allocnos for x2, x3 and x4 modes).  This test used an
inline test for correctness only, deliberately skipping the
heuristics.  However, this instance of the test was missing
some handling of equivalent allocnos.

This patch fixes things by making is_chain_candidate take a
strictness parameter: correctness only, or correctness + heuristics.
It then makes the group-chaining test use the correctness version
rather than trying to replicate it inline.

gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::test_strictness): New enum.
(early_ra::is_chain_candidate): Add a strictness parameter to
control whether only correctness matters, or whether both
correctness
and heuristics should be used.  Handle multiple levels of
equivalence.
(early_ra::find_related_start): Update call accordingly.
(early_ra::strided_polarity_pref): Likewise.
(early_ra::form_chains): Likewise.
(early_ra::try_to_chain_allocnos): Use is_chain_candidate in
correctness mode rather than trying to inline the test.

gcc/testsuite/
PR target/113295
* gcc.target/aarch64/pr113295-2.c: New test.

[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

--- Comment #7 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:8a16e06da97f51574cfad17e2cece2e58571305d

commit r14-9155-g8a16e06da97f51574cfad17e2cece2e58571305d
Author: Richard Sandiford 
Date:   Fri Feb 23 14:12:54 2024 +

aarch64: Add missing early-ra bookkeeping [PR113295]

416.gamess showed up two wrong-code bugs in early-ra.  This patch
fixes the first of them.  It was difficult to reduce the source code
to something that would meaningfully show the situation, so the
testcase uses a direct RTL sequence instead.

In the sequence:

(a) register <2> is set more than once
(b) register <2> is copied to a temporary (<4>)
(c) register <2> is the destination of an FCSEL between <4> and
another value (<5>)
(d) <4> and <2> are equivalent for <4>'s live range
(e) <5>'s and <2>'s live ranges do not intersect, and there is
a pseudo-copy between <5> and <2>

On its own, (d) implies that <4> can be treated as equivalent to <2>.
And on its own, (e) implies that <5> can share <2>'s register.  But
<4>'s and <5>'s live ranges conflict, meaning that they cannot both
share the register together.  A bit of missing bookkeeping meant that
the mechanism for detecting this didn't fire.  We therefore ended up
with an FCSEL in which both inputs were the same register.

gcc/
PR target/113295
* config/aarch64/aarch64-early-ra.cc
(early_ra::find_related_start): Account for definitions by shared
registers when testing for a single register definition.
(early_ra::accumulate_defs): New function.
(early_ra::record_copy): If A shares B's register, fold A's
definition information into B's.  Fold A's use information into
B's.

gcc/testsuite/
PR target/113295
* gcc.dg/rtl/aarch64/pr113295-1.c: New test.

[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-21 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

--- Comment #6 from Richard Sandiford  ---
For me the miscompilation is in jkdmem_, where we end up allocating the same
registers to both arms of an fcsel.  It sounds like it occurs elsewhere too.

I have a candidate fix, but need to think a bit more about it.

[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-21 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

Richard Sandiford  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from Richard Sandiford  ---
Mine.  Could be the same as PR112922.

[Bug target/113295] [14 Regression] SPEC 2006 416.gamess miscompares on Aarch64 when built with -Ofast -mcpu=native since g:2f46e3578d45ff060a0a329cb39d4f52878f9d5a

2024-02-21 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113295

Tamar Christina  changed:

   What|Removed |Added

   Keywords|needs-bisection |
Summary|[14 Regression] SPEC 2006   |[14 Regression] SPEC 2006
   |416.gamess miscompares on   |416.gamess miscompares on
   |Aarch64 when built with |Aarch64 when built with
   |-Ofast -march=native -flto  |-Ofast -mcpu=native since
   ||g:2f46e3578d45ff060a0a329cb
   ||39d4f52878f9d5a

--- Comment #4 from Tamar Christina  ---
most of the changes seem to be scheduling and register renaming, so I guess one
should compile with scheduling off to reduce the noise a bit.

detinp_ does have a weird transformation where a cluster of csel go missing and
replaced with mov, but the function is too big for me to quickly tell if this
is the issue.