[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Alex Coplan changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #13 from Alex Coplan --- Should be fixed, sorry for the delay, and thanks for the report.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #11 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:6dd613df59060fb54c4e3f66f39cf59bc44d118a commit r14-8360-g6dd613df59060fb54c4e3f66f39cf59bc44d118a Author: Alex Coplan Date: Fri Jan 12 09:09:10 2024 + rtl-ssa: Ensure new defs get inserted [PR113070] In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to RTL-SSA for inserting new insns, which included support for users creating new defs. However, I missed that apply_changes_to_insn needed updating to ensure that the new defs actually got inserted into the main def chain. This meant that when the aarch64 ldp/stp pass inserted a new stp insn, the stp would just get skipped over during subsequent alias analysis, as its def never got inserted into the memory def chain. This (unsurprisingly) led to wrong code. This patch fixes the issue by ensuring new user-created defs get inserted. I would have preferred to have used a flag internal to the defs instead of a separate data structure to keep track of them, but since machine_mode increased to 16 bits we're already at 64 bits in access_info, and we can't really reuse m_is_temp as the logic in finalize_new_accesses requires it to get cleared. gcc/ChangeLog: PR target/113070 * rtl-ssa.h: Include hash-set.h. * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new_sets parameter and use it to keep track of new user-created sets. (function_info::apply_changes_to_insn): Also call add_def on new sets. (function_info::change_insns): Add hash_set to keep track of new user-created defs. Plumb it through. * rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses and apply_changes_to_insn.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #10 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:fce3994d04fc5d7d1c91f6db5a1f144aa291439a commit r14-8359-gfce3994d04fc5d7d1c91f6db5a1f144aa291439a Author: Alex Coplan Date: Fri Jan 12 10:14:33 2024 + rtl-ssa: Support for creating new uses [PR113070] This exposes an interface for users to create new uses in RTL-SSA. This is needed for updating uses after inserting a new store pair insn in the aarch64 load/store pair fusion pass. gcc/ChangeLog: PR target/113070 * rtl-ssa/accesses.cc (function_info::create_use): New. * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Ensure new uses end up referring to permanent defs. * rtl-ssa/functions.h (function_info::create_use): Declare.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #12 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:ef86659da9de59896fb0128eef418224299267e9 commit r14-8361-gef86659da9de59896fb0128eef418224299267e9 Author: Alex Coplan Date: Fri Jan 12 10:15:36 2024 + aarch64: Fix up uses of mem following stp insert [PR113070] As the PR shows (specifically #c7) we are missing updating uses of mem when inserting an stp in the aarch64 load/store pair fusion pass. This patch fixes that. RTL-SSA has a simple view of memory and by default doesn't allow stores to be re-ordered w.r.t. other stores. In the ldp fusion pass, we do our own alias analysis and so can re-order stores over other accesses when we deem this is safe. If neither store can be re-purposed (moved into the required position to form the stp while respecting the RTL-SSA constraints), then we turn both the candidate stores into "tombstone" insns (logically delete them) and insert a new stp insn. As it stands, we implement the insert case separately (after dealing with the candidate stores) in fuse_pair by inserting into the middle of the vector of changes. This is OK when we only have to insert one change, but with this fix we would need to insert the change for the new stp plus multiple changes to fix up uses of mem (note the number of fix-ups is naturally bounded by the alias limit param to prevent quadratic behaviour). If we kept the code structured as is and inserted into the middle of the vector, that would lead to repeated moving of elements in the vector which seems inefficient. The structure of the code would also be a little unwieldy. To improve on that situation, this patch introduces a helper class, stp_change_builder, which implements a state machine that helps to build the required changes directly in program order. That state machine is reponsible for deciding what changes need to be made in what order, and the code in fuse_pair then simply follows those steps. Together with the fix in the previous patch for installing new defs correctly in RTL-SSA, this fixes PR113070. We take the opportunity to rename the function decide_stp_strategy to try_repurpose_store, as that seems more descriptive of what it actually does, since stp_change_builder is now responsible for the overall change strategy. gcc/ChangeLog: PR target/113070 * config/aarch64/aarch64-ldp-fusion.cc (struct stp_change_builder): New. (decide_stp_strategy): Reanme to ... (try_repurpose_store): ... this. (ldp_bb_info::fuse_pair): Refactor to use stp_change_builder to construct stp changes. Fix up uses when inserting new stp insns.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #9 from GCC Commits --- The master branch has been updated by Alex Coplan : https://gcc.gnu.org/g:e0374b028a665a2ea8d6eb2b4e5862774e9e85c2 commit r14-8358-ge0374b028a665a2ea8d6eb2b4e5862774e9e85c2 Author: Alex Coplan Date: Thu Jan 11 16:17:37 2024 + rtl-ssa: Run finalize_new_accesses forwards [PR113070] The next patch in this series exposes an interface for creating new uses in RTL-SSA. The intent is that new user-created uses can consume new user-created defs in the same change group. This is so that we can correctly update uses of memory when inserting a new store pair insn in the aarch64 load/store pair fusion pass (the affected uses need to consume the new store pair insn). As it stands, finalize_new_accesses is called as part of the backwards insn placement loop within change_insns, but if we want new uses to be able to depend on new defs in the same change group, we need finalize_new_accesses to be called on earlier insns first. This is so that when we process temporary uses and turn them into permanent uses, we can follow the last_def link on the temporary def to ensure we end up with a permanent use consuming a permanent def. gcc/ChangeLog: PR target/113070 * rtl-ssa/changes.cc (function_info::change_insns): Split out the call to finalize_new_accesses from the backwards placement loop, run it forwards in a separate loop.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Alex Coplan changed: What|Removed |Added Keywords||patch URL||https://patchwork.sourcewar ||e.org/project/gcc/list/?ser ||ies=29671 --- Comment #8 from Alex Coplan --- Patch series posted: https://patchwork.sourceware.org/project/gcc/list/?series=29671
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #7 from Alex Coplan --- Just to give a concrete example / reduced testcase where this goes wrong (to aid review). For the following testcase (reduced from libiberty) with -O2 -mlate-ldp-fusion: struct { unsigned D; int E; } * sha1_process_block_ctx; void *sha1_process_block_buffer; int sha1_process_block_ctx_1, sha1_process_block_ctx_0, sha1_process_block_ctx_3, sha1_process_block_d, sha1_process_block_e, sha1_process_block_tm, sha1_process_block_a, sha1_process_block_x_6, sha1_process_block_x_14, sha1_process_block_x_15; unsigned sha1_process_block_ctx_2; void sha1_process_block() { int *words = sha1_process_block_buffer; int endp = *words, x_0; int x[6]; unsigned b, c; while (endp) { int t = 0; for (; t < 6;) t = *words; sha1_process_block_a += sha1_process_block_ctx_2 + 8348 + sha1_process_block_tm; x_0 += sha1_process_block_tm = x[73]; b += sha1_process_block_x_15 = sha1_process_block_tm; sha1_process_block_a += b | 1; sha1_process_block_tm = sha1_process_block_x_14 ^ 8; sha1_process_block_e = sha1_process_block_tm; sha1_process_block_tm = x[8]; c += sha1_process_block_x_14 = sha1_process_block_tm; b += sha1_process_block_x_15; sha1_process_block_tm = x_0 ^ x[3]; sha1_process_block_a += sha1_process_block_tm; sha1_process_block_tm = x[4] ^ x[15]; sha1_process_block_e += sha1_process_block_a + b ^ sha1_process_block_d + sha1_process_block_tm; sha1_process_block_tm = sha1_process_block_x_6 ^ x[15]; sha1_process_block_d += sha1_process_block_e >> 5 + (sha1_process_block_x_6 = sha1_process_block_tm); sha1_process_block_ctx_0 += sha1_process_block_ctx_1 += sha1_process_block_ctx_2 += c; sha1_process_block_ctx_3 += sha1_process_block_ctx->E += sha1_process_block_e; } } we try to do this: fusing pair [L=0] (200,199), base=31, hazards: (27,54), move_range: (54,54) with the initial IR: insn i200 in bb3 [ebb3] at point 102: +--- | 200: [sp:DI+0x64]=x0:SI | REG_DEAD x0:SI +--- uses: use of set r0:i37 (x0:SI) use of phi node r31:a12 (sp:DI) appears inside an address defines: set mem:i200 insn i198 in bb3 [ebb3] at point 104: +--- | 198: [sp:DI+0x6c]=x2:SI | REG_DEAD x2:SI +--- uses: use of set r2:i81 (x2:SI) use of phi node r31:a12 (sp:DI) appears inside an address defines: set mem:i198 used by insn i27 in bb3 [ebb3] at point 108 insn i54 in bb3 [ebb3] at point 106: +-- | 54: x2:SI=x16:SI<<0x1 +-- uses: SI use of set r16:i28 (x16:DI) defines: set r2:i54 (x2:SI) used by insn i199 in bb3 [ebb3] at point 110 insn i27 in bb3 [ebb3] at point 108: + | 27: x0:DI=zero_extend([x1:DI+0x18]) | REG_EQUAL [const(`*.LANCHOR0'+0x18)] + uses: use of set r1:i223 (x1:DI) appears inside an address use of set mem:i198 defines: set r0:i27 (x0:DI) live out from bb3 [ebb3] at point 114 used by phi node r0:a15 (x0:DI) in ebb6 at point 116 insn i199 in bb3 [ebb3] at point 110: +--- | 199: [sp:DI+0x68]=x2:SI | REG_DEAD x2:SI +--- uses: use of set r2:i54 (x2:SI) use of phi node r31:a12 (sp:DI) appears inside an address defines: set mem:i199 used by phi node mem:a15 in ebb6 at point 116 as it stands, after fusing that pair, we have: insn i200 in bb3 [ebb3] at point 102: +-- | 200: clobber [scratch] +-- defines: set mem:i200 insn i198 in bb3 [ebb3] at point 104: +--- | 198: [sp:DI+0x6c]=x2:SI | REG_DEAD x2:SI +--- uses: use of set r2:i81 (x2:SI) use of phi node r31:a12 (sp:DI) appears inside an address defines: set mem:i198 used by insn i27 in bb3 [ebb3] at point 108 insn i54 in bb3 [ebb3] at point 106: +-- | 54: x2:SI=x16:SI<<0x1 +-- uses: SI use of set r16:i28 (x16:DI) defines: set r2:i54 (x2:SI) used by insn i244 in bb3 [ebb3] at point 107 insn i244 in bb3 [ebb3] at point 107: + | 244: [sp:DI+0x64]=unspec[x0:SI,x2:SI] 38
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #6 from Alex Coplan --- And with those fixes it indeed looks like profiledbootstrap + LTO with all frontends on aarch64 is working again (with the passes enabled).
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #5 from Alex Coplan --- (In reply to Alex Coplan from comment #4) > So debugging the PGO/LTO failure of cactuBSSN (from SPEC CPU 2017) shows > that we can miss updating uses immediately following an stp insn in the case > that we insert a new stp insn (as opposed to updating an existing one). > That can then lead to wrong code as alias analysis goes wrong as a result. > > I'm planning on fixing that and hopefully this will turn out to be the same > issue. Further to this it turns out that when I added support to rtl-ssa for inserting new insns I missed that changes.cc:apply_changes_to_insn does: // Add all clobbers. Sets and call clobbers never move relative to // other definitions, so are OK as-is. for (def_info *def : change.new_defs) if (is_a (def) && !def->is_call_clobber ()) add_def (def); so it won't insert new user-created defs into the def chain (which again causes alias analysis to skip over the stps when looking at future load pair candidates). Fixing that (together with a fix for the use issue mentioned above) fixes the wrong code in cactuBSSN_r. It seems likely that the same issue is causing the wrong code during LTO profiledbootstrap, I'll test that shortly.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 --- Comment #4 from Alex Coplan --- So debugging the PGO/LTO failure of cactuBSSN (from SPEC CPU 2017) shows that we can miss updating uses immediately following an stp insn in the case that we insert a new stp insn (as opposed to updating an existing one). That can then lead to wrong code as alias analysis goes wrong as a result. I'm planning on fixing that and hopefully this will turn out to be the same issue.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P1 CC||jakub at gcc dot gnu.org
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Andreas Schwab changed: What|Removed |Added CC||doko at gcc dot gnu.org --- Comment #3 from Andreas Schwab --- *** Bug 113173 has been marked as a duplicate of this bug. ***
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Alex Coplan changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |acoplan at gcc dot gnu.org --- Comment #2 from Alex Coplan --- Adding -mno-early-ldp-fusion -mno-late-ldp-fusion to the compile flags fixes the bootstrap, so it seems to be triggered by the new load/store pair fusion pass. I'll take a look.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Alex Coplan changed: What|Removed |Added Status|UNCONFIRMED |NEW CC||acoplan at gcc dot gnu.org Ever confirmed|0 |1 Last reconfirmed||2023-12-19 --- Comment #1 from Alex Coplan --- Confirmed, I can reproduce the bootstrap failure. I will investigate further. I'm also looking at another LTO + PGO runtime failure which seems to be triggered by the new load/store pair fusion pass, these might be related.
[Bug target/113070] [14 regression] [AArch64] [PGO/LTO] Miscompilation of go compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113070 Richard Biener changed: What|Removed |Added Target Milestone|--- |14.0