[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-23 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Alex Coplan  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Alex Coplan  ---
Should be fixed, sorry for the delay, and thanks for the report.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #11 from GCC Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:6dd613df59060fb54c4e3f66f39cf59bc44d118a

commit r14-8360-g6dd613df59060fb54c4e3f66f39cf59bc44d118a
Author: Alex Coplan 
Date:   Fri Jan 12 09:09:10 2024 +

rtl-ssa: Ensure new defs get inserted [PR113070]

In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to
RTL-SSA for inserting new insns, which included support for users
creating new defs.

However, I missed that apply_changes_to_insn needed updating to ensure
that the new defs actually got inserted into the main def chain.  This
meant that when the aarch64 ldp/stp pass inserted a new stp insn, the
stp would just get skipped over during subsequent alias analysis, as its
def never got inserted into the memory def chain.  This (unsurprisingly)
led to wrong code.

This patch fixes the issue by ensuring new user-created defs get
inserted.  I would have preferred to have used a flag internal to the
defs instead of a separate data structure to keep track of them, but since
machine_mode increased to 16 bits we're already at 64 bits in access_info,
and we can't really reuse m_is_temp as the logic in finalize_new_accesses
requires it to get cleared.

gcc/ChangeLog:

PR target/113070
* rtl-ssa.h: Include hash-set.h.
* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add
new_sets parameter and use it to keep track of new user-created
sets.
(function_info::apply_changes_to_insn): Also call add_def on new
sets.
(function_info::change_insns): Add hash_set to keep track of new
user-created defs.  Plumb it through.
* rtl-ssa/functions.h: Add hash_set parameter to
finalize_new_accesses and
apply_changes_to_insn.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #10 from GCC Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:fce3994d04fc5d7d1c91f6db5a1f144aa291439a

commit r14-8359-gfce3994d04fc5d7d1c91f6db5a1f144aa291439a
Author: Alex Coplan 
Date:   Fri Jan 12 10:14:33 2024 +

rtl-ssa: Support for creating new uses [PR113070]

This exposes an interface for users to create new uses in RTL-SSA.
This is needed for updating uses after inserting a new store pair insn
in the aarch64 load/store pair fusion pass.

gcc/ChangeLog:

PR target/113070
* rtl-ssa/accesses.cc (function_info::create_use): New.
* rtl-ssa/changes.cc (function_info::finalize_new_accesses):
Ensure new uses end up referring to permanent defs.
* rtl-ssa/functions.h (function_info::create_use): Declare.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:ef86659da9de59896fb0128eef418224299267e9

commit r14-8361-gef86659da9de59896fb0128eef418224299267e9
Author: Alex Coplan 
Date:   Fri Jan 12 10:15:36 2024 +

aarch64: Fix up uses of mem following stp insert [PR113070]

As the PR shows (specifically #c7) we are missing updating uses of mem
when inserting an stp in the aarch64 load/store pair fusion pass.  This
patch fixes that.

RTL-SSA has a simple view of memory and by default doesn't allow stores
to be re-ordered w.r.t. other stores.  In the ldp fusion pass, we do our
own alias analysis and so can re-order stores over other accesses when
we deem this is safe.  If neither store can be re-purposed (moved into
the required position to form the stp while respecting the RTL-SSA
constraints), then we turn both the candidate stores into "tombstone"
insns (logically delete them) and insert a new stp insn.

As it stands, we implement the insert case separately (after dealing
with the candidate stores) in fuse_pair by inserting into the middle of
the vector of changes.  This is OK when we only have to insert one
change, but with this fix we would need to insert the change for the new
stp plus multiple changes to fix up uses of mem (note the number of
fix-ups is naturally bounded by the alias limit param to prevent
quadratic behaviour).  If we kept the code structured as is and inserted
into the middle of the vector, that would lead to repeated moving of
elements in the vector which seems inefficient.  The structure of the
code would also be a little unwieldy.

To improve on that situation, this patch introduces a helper class,
stp_change_builder, which implements a state machine that helps to build
the required changes directly in program order.  That state machine is
reponsible for deciding what changes need to be made in what order, and
the code in fuse_pair then simply follows those steps.

Together with the fix in the previous patch for installing new defs
correctly in RTL-SSA, this fixes PR113070.

We take the opportunity to rename the function decide_stp_strategy to
try_repurpose_store, as that seems more descriptive of what it actually
does, since stp_change_builder is now responsible for the overall change
strategy.

gcc/ChangeLog:

PR target/113070
* config/aarch64/aarch64-ldp-fusion.cc
(struct stp_change_builder): New.
(decide_stp_strategy): Reanme to ...
(try_repurpose_store): ... this.
(ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to
construct stp changes.  Fix up uses when inserting new stp insns.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-23 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #9 from GCC Commits  ---
The master branch has been updated by Alex Coplan :

https://gcc.gnu.org/g:e0374b028a665a2ea8d6eb2b4e5862774e9e85c2

commit r14-8358-ge0374b028a665a2ea8d6eb2b4e5862774e9e85c2
Author: Alex Coplan 
Date:   Thu Jan 11 16:17:37 2024 +

rtl-ssa: Run finalize_new_accesses forwards [PR113070]

The next patch in this series exposes an interface for creating new uses
in RTL-SSA.  The intent is that new user-created uses can consume new
user-created defs in the same change group.  This is so that we can
correctly update uses of memory when inserting a new store pair insn in
the aarch64 load/store pair fusion pass (the affected uses need to
consume the new store pair insn).

As it stands, finalize_new_accesses is called as part of the backwards
insn placement loop within change_insns, but if we want new uses to be
able to depend on new defs in the same change group, we need
finalize_new_accesses to be called on earlier insns first.  This is so
that when we process temporary uses and turn them into permanent uses,
we can follow the last_def link on the temporary def to ensure we end up
with a permanent use consuming a permanent def.

gcc/ChangeLog:

PR target/113070
* rtl-ssa/changes.cc (function_info::change_insns): Split out the
call
to finalize_new_accesses from the backwards placement loop, run it
forwards in a separate loop.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-13 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Alex Coplan  changed:

   What|Removed |Added

   Keywords||patch
URL||https://patchwork.sourcewar
   ||e.org/project/gcc/list/?ser
   ||ies=29671

--- Comment #8 from Alex Coplan  ---
Patch series posted:
https://patchwork.sourceware.org/project/gcc/list/?series=29671

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-12 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #7 from Alex Coplan  ---
Just to give a concrete example / reduced testcase where this goes wrong (to
aid review).  For the following testcase (reduced from libiberty) with -O2
-mlate-ldp-fusion:

struct {
  unsigned D;
  int E;
} * sha1_process_block_ctx;
void *sha1_process_block_buffer;
int sha1_process_block_ctx_1, sha1_process_block_ctx_0,
sha1_process_block_ctx_3, sha1_process_block_d, sha1_process_block_e,
sha1_process_block_tm, sha1_process_block_a, sha1_process_block_x_6,
sha1_process_block_x_14, sha1_process_block_x_15;
unsigned sha1_process_block_ctx_2;
void sha1_process_block() {
  int *words = sha1_process_block_buffer;
  int endp = *words, x_0;
  int x[6];
  unsigned b, c;
  while (endp) {
int t = 0;
for (; t < 6;)
  t = *words;
sha1_process_block_a +=
sha1_process_block_ctx_2 + 8348 + sha1_process_block_tm;
x_0 += sha1_process_block_tm = x[73];
b += sha1_process_block_x_15 = sha1_process_block_tm;
sha1_process_block_a += b | 1;
sha1_process_block_tm = sha1_process_block_x_14 ^ 8;
sha1_process_block_e = sha1_process_block_tm;
sha1_process_block_tm = x[8];
c += sha1_process_block_x_14 = sha1_process_block_tm;
b += sha1_process_block_x_15;
sha1_process_block_tm = x_0 ^ x[3];
sha1_process_block_a += sha1_process_block_tm;
sha1_process_block_tm = x[4] ^ x[15];
sha1_process_block_e +=
sha1_process_block_a + b ^ sha1_process_block_d +
sha1_process_block_tm;
sha1_process_block_tm = sha1_process_block_x_6 ^ x[15];
sha1_process_block_d +=
sha1_process_block_e >>
5 + (sha1_process_block_x_6 = sha1_process_block_tm);
sha1_process_block_ctx_0 += sha1_process_block_ctx_1 +=
sha1_process_block_ctx_2 += c;
sha1_process_block_ctx_3 += sha1_process_block_ctx->E +=
sha1_process_block_e;
  }
}

we try to do this:

fusing pair [L=0] (200,199), base=31, hazards: (27,54), move_range: (54,54)

with the initial IR:

insn i200 in bb3 [ebb3] at point 102:
  +---
  |  200: [sp:DI+0x64]=x0:SI
  |  REG_DEAD x0:SI
  +---
  uses:
use of set r0:i37 (x0:SI)
use of phi node r31:a12 (sp:DI)
  appears inside an address
  defines:
set mem:i200

insn i198 in bb3 [ebb3] at point 104:
  +---
  |  198: [sp:DI+0x6c]=x2:SI
  |  REG_DEAD x2:SI
  +---
  uses:
use of set r2:i81 (x2:SI)
use of phi node r31:a12 (sp:DI)
  appears inside an address
  defines:
set mem:i198
  used by insn i27 in bb3 [ebb3] at point 108

insn i54 in bb3 [ebb3] at point 106:
  +--
  |   54: x2:SI=x16:SI<<0x1
  +--
  uses:
SI use of set r16:i28 (x16:DI)
  defines:
set r2:i54 (x2:SI)
  used by insn i199 in bb3 [ebb3] at point 110

insn i27 in bb3 [ebb3] at point 108:
  +
  |   27: x0:DI=zero_extend([x1:DI+0x18])
  |  REG_EQUAL [const(`*.LANCHOR0'+0x18)]
  +
  uses:
use of set r1:i223 (x1:DI)
  appears inside an address
use of set mem:i198
  defines:
set r0:i27 (x0:DI)
  live out from bb3 [ebb3] at point 114
  used by phi node r0:a15 (x0:DI) in ebb6 at point 116

insn i199 in bb3 [ebb3] at point 110:
  +---
  |  199: [sp:DI+0x68]=x2:SI
  |  REG_DEAD x2:SI
  +---
  uses:
use of set r2:i54 (x2:SI)
use of phi node r31:a12 (sp:DI)
  appears inside an address
  defines:
set mem:i199
  used by phi node mem:a15 in ebb6 at point 116

as it stands, after fusing that pair, we have:

insn i200 in bb3 [ebb3] at point 102:
  +--
  |  200: clobber [scratch]
  +--
  defines:
set mem:i200

insn i198 in bb3 [ebb3] at point 104:
  +---
  |  198: [sp:DI+0x6c]=x2:SI
  |  REG_DEAD x2:SI
  +---
  uses:
use of set r2:i81 (x2:SI)
use of phi node r31:a12 (sp:DI)
  appears inside an address
  defines:
set mem:i198
  used by insn i27 in bb3 [ebb3] at point 108

insn i54 in bb3 [ebb3] at point 106:
  +--
  |   54: x2:SI=x16:SI<<0x1
  +--
  uses:
SI use of set r16:i28 (x16:DI)
  defines:
set r2:i54 (x2:SI)
  used by insn i244 in bb3 [ebb3] at point 107

insn i244 in bb3 [ebb3] at point 107:
  +
  |  244: [sp:DI+0x64]=unspec[x0:SI,x2:SI] 38
  

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-12 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #6 from Alex Coplan  ---
And with those fixes it indeed looks like profiledbootstrap + LTO with all
frontends on aarch64 is working again (with the passes enabled).

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-12 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #5 from Alex Coplan  ---
(In reply to Alex Coplan from comment #4)
> So debugging the PGO/LTO failure of cactuBSSN (from SPEC CPU 2017) shows
> that we can miss updating uses immediately following an stp insn in the case
> that we insert a new stp insn (as opposed to updating an existing one). 
> That can then lead to wrong code as alias analysis goes wrong as a result.
> 
> I'm planning on fixing that and hopefully this will turn out to be the same
> issue.

Further to this it turns out that when I added support to rtl-ssa for inserting
new insns I missed that changes.cc:apply_changes_to_insn does:

  // Add all clobbers.  Sets and call clobbers never move relative to
  // other definitions, so are OK as-is.
  for (def_info *def : change.new_defs)
if (is_a (def) && !def->is_call_clobber ())
  add_def (def);

so it won't insert new user-created defs into the def chain (which again causes
alias analysis to skip over the stps when looking at future load pair
candidates).  Fixing that (together with a fix for the use issue mentioned
above) fixes the wrong code in cactuBSSN_r.

It seems likely that the same issue is causing the wrong code during LTO
profiledbootstrap, I'll test that shortly.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-09 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

--- Comment #4 from Alex Coplan  ---
So debugging the PGO/LTO failure of cactuBSSN (from SPEC CPU 2017) shows that
we can miss updating uses immediately following an stp insn in the case that we
insert a new stp insn (as opposed to updating an existing one).  That can then
lead to wrong code as alias analysis goes wrong as a result.

I'm planning on fixing that and hopefully this will turn out to be the same
issue.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2024-01-08 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||jakub at gcc dot gnu.org

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2023-12-29 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Andreas Schwab  changed:

   What|Removed |Added

 CC||doko at gcc dot gnu.org

--- Comment #3 from Andreas Schwab  ---
*** Bug 113173 has been marked as a duplicate of this bug. ***

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2023-12-19 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Alex Coplan  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |acoplan at gcc dot 
gnu.org

--- Comment #2 from Alex Coplan  ---
Adding -mno-early-ldp-fusion -mno-late-ldp-fusion to the compile flags fixes
the bootstrap, so it seems to be triggered by the new load/store pair fusion
pass.  I'll take a look.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2023-12-19 Thread acoplan at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Alex Coplan  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||acoplan at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2023-12-19

--- Comment #1 from Alex Coplan  ---
Confirmed, I can reproduce the bootstrap failure.  I will investigate further.
I'm also looking at another LTO + PGO runtime failure which seems to be
triggered by the new load/store pair fusion pass, these might be related.

[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler

2023-12-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.0