[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #12 from GCC Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:a98d5130a6dcff2ed4db371e500550134777b8cf commit r14-8346-ga98d5130a6dcff2ed4db371e500550134777b8cf Author: Richard Biener Date: Mon Jan 15 12:55:20 2024 +0100 rtl-optimization/113255 - base_alias_check vs. pointer difference When the x86 backend generates code for cpymem with the rep_8byte strathegy for the 8 byte aligned main rep movq it needs to compute an adjusted pointer to the source after doing a prologue aligning the destination. It computes that via src_ptr + (dest_ptr - orig_dest_ptr) which is perfectly fine. On RTL this is then 8: r134:DI=const(`g'+0x44) 9: {r133:DI=frame:DI-0x4c;clobber flags:CC;} REG_UNUSED flags:CC 56: r129:DI=const(`g'+0x4c) 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;} REG_UNUSED flags:CC REG_EQUAL const(`g'+0x4c)&0xfff8 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;} REG_DEAD r134:DI REG_UNUSED flags:CC REG_EQUAL const(`g'+0x44)-r129:DI 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;} REG_DEAD r133:DI REG_UNUSED flags:CC but as written find_base_term happily picks the first candidate it finds for the MINUS which means it picks const(`g') rather than the correct frame:DI. This way find_base_term (but also the unfixed find_base_value used by init_alias_analysis to initialize REG_BASE_VALUE) performs pointer analysis isn't sound. The following restricts the handling of multi-operand operations to the case we know only one can be a pointer. This for example causes gcc.dg/tree-ssa/pr94969.c to miss some RTL PRE (I've opened PR113395 for this). A more drastic patch, removing base_alias_check results in only gcc.dg/guality/pr41447-1.c regressing (so testsuite coverage is bad). I've looked at gcc.dg/tree-ssa tests and mostly scheduling changes are present, the cc1plus .text size is only 230 bytes worse. With the this less drastic patch below most scheduling changes are gone. x86_64 might not the very best target to test for impact, but test coverage on other targets is unlikely to be very much better. PR rtl-optimization/113255 * alias.cc (find_base_term): Remove PLUS/MINUS handling when both operands are not CONST_INT_P. * gcc.dg/torture/pr113255.c: New testcase.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #11 from Richard Biener --- diff --git a/gcc/alias.cc b/gcc/alias.cc index b2ec4806d22..0150dd699db 100644 --- a/gcc/alias.cc +++ b/gcc/alias.cc @@ -2272,6 +2272,8 @@ static bool base_alias_check (rtx x, rtx x_base, rtx y, rtx y_base, machine_mode x_mode, machine_mode y_mode) { + return 1; + /* If the address itself has no known base see if a known equivalent value has one. If either address still has no known base, nothing is known about aliasing. */ (an experiment I did many years ago already) gives clean testresults besides +FAIL: gcc.dg/guality/pr41447-1.c -O2 -DPREVENT_OPTIMIZATION execution test +FAIL: gcc.dg/guality/pr41447-1.c -O3 -g -DPREVENT_OPTIMIZATION execution test +FAIL: gcc.dg/guality/pr41447-1.c -Os -DPREVENT_OPTIMIZATION execution test +FAIL: gcc.dg/guality/pr41447-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none -DPREVENT_OPTIMIZATION execution test and only minor change in code generation: textdata bss dec hex filename 48374841 64824 1939512 50379177300b9a9 ../obj2/gcc/cc1plus 48375065 64824 1939512 50379401300ba89 gcc/cc1plus where the larger binary is the patched one. Assembly-wise there are scheduling changes, missed scheduling over spills. Recovering this and similar cases should be as easy as marking spill MEMs with a flag (in MEM_EXPR) for example, distinguishing (classes of) stack memory from the rest. We should have this already (spill_slot_decl, set_mem_attrs_for_spill), not sure why it doesn't look effective. Improving test coverage for desired transforms would be nice as well.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #10 from Richard Biener --- Hmm, trying to fix find_base_term isn't enough, init_alias_analysis find_base_value needs to be adjusted as well. One "obvious" mistake there is a missing diff --git a/gcc/alias.cc b/gcc/alias.cc index b2ec4806d22..6aeb2167520 100644 --- a/gcc/alias.cc +++ b/gcc/alias.cc @@ -1492,6 +1492,13 @@ find_base_value (rtx src) { rtx temp, src_0 = XEXP (src, 0), src_1 = XEXP (src, 1); + /* If both operands of a MINUS are known pointers then the + base is not either of them. */ + if (GET_CODE (src) == MINUS + && REG_P (src_0) && REG_POINTER (src_0) + && REG_P (src_1) && REG_POINTER (src_1)) + return 0; + /* If either operand is a REG that is a known pointer, then it is the base. */ if (REG_P (src_0) && REG_POINTER (src_0)) but of course that's not conservative - not having REG_POINTER set doesn't mean it's not a pointer. But even when we assume REG_POINTER is correct the minus operands might not be REG_P. This is really all totally wrong for what it is (pointer analysis on RTL). On RTL we also lost constraints that arithmetic stays within an object. It should likely be scrapped completely and re-done, possibly having SET_DEST_POINTS_TO to be able to put SSA points-to info to SETs (REG_ATTRs are too coarse, but would be possible as well, losing some of the flow sensitivity). Incoming args & frame analysis would need to be implemented of course. As said, I'm not sure analyzing RTL will yield anything good while being conservative - and optimistic points-to is what leads us to these kind of bugs ...
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #9 from Richard Biener --- So this: static void expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx srcmem, ... /* See how many bytes we skipped. */ saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest, *destptr, saveddest, 1, OPTAB_DIRECT); /* Adjust srcptr and count. */ if (!issetmem) *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr, saveddest, *srcptr, 1, OPTAB_DIRECT); is the problematical op, but I'm not seeing an easy way to avoid the MINUS here. It's also difficult to match (minus (..) (and ... aligning-cst)) in find_base_term as the AND is exposed via CSELIB values only. find_base_term is known broken but base_alias_check continues to be "useful" for aliasing with spill slots mostly. find_base_term tries to do ad-hoc points-to analysis but is not conservative in any way - it doesn't even have a way to say a final "I don't know" which means there's no way to perform a conservative correction to it. In fact I don't think we can make it conservative.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 Richard Biener changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=49330, ||https://gcc.gnu.org/bugzill ||a/show_bug.cgi?id=53705 --- Comment #8 from Richard Biener --- 3: NOTE_INSN_FUNCTION_BEG 8: r134:DI=const(`g'+0x44) 9: {r133:DI=frame:DI-0x4c;clobber flags:CC;} REG_UNUSED flags:CC ... 56: r129:DI=const(`g'+0x4c) 57: {r129:DI=r129:DI&0xfff8;clobber flags:CC;} REG_UNUSED flags:CC REG_EQUAL const(`g'+0x4c)&0xfff8 58: {r118:DI=r134:DI-r129:DI;clobber flags:CC;} REG_DEAD r134:DI REG_UNUSED flags:CC REG_EQUAL const(`g'+0x44)-r129:DI 59: {r119:DI=r133:DI-r118:DI;clobber flags:CC;} REG_DEAD r133:DI REG_UNUSED flags:CC so the source is - ((+0x44) - (+0x44)&0xfff...8) that is, the original source adjusted by the alignment prologue amount which is aligning the destination. That's a classical example for where find_base_term is confused. I'm not sure there's a way to "obfuscate" things here. I think find_base_term ignores paths with & (align ops) but it goes down the subtract base path here. So maybe instead of subtracting using & (align-1) for that would work.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #7 from Jakub Jelinek --- Slightly cleaned up testcase: struct S { unsigned a[10]; unsigned y; unsigned b[6]; } g[2]; __attribute__((noinline, noclone)) int test (int x) { struct S e[2] = { g[0], g[1] }; int r = 0; if (x >= 0) { r++; e[1].y++; } g[1] = e[1]; return r; } int main () { test (1); if (g[1].y != 1) __builtin_abort (); return 0; }
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 Uroš Bizjak changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #6 from Uroš Bizjak --- (In reply to Richard Biener from comment #4) > I can't decipher this from what expand generates but the problem lies > there (the rep_8bytes expansion). Let's ask Honza.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #5 from Jakub Jelinek --- (In reply to Richard Biener from comment #4) > I'm unsure the above parallel is valid, isn't parallel executing stmts > in "parallel" (unspecified order)? I don't see anything invalid on it. In addition to the memory copying, it describes the other effects at the end of the pattern (that the counter goes to 0 and the di/si pointers are incremented by the original counter times 8). All the uses of pseudos in the pattern are the values of those pseudos before the instruction, then all the sets at the end set the updated pseudos.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #4 from Richard Biener --- DSE thinks the store is dead because it falls off the function. (insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32]) (reg:SI 98 [ e$1$y ])) "t.c":21:9 85 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ]) (nil))) ... (insn 64 63 68 4 (parallel [ (set (reg:DI 131) (const_int 0 [0])) (set (reg/f:DI 129) (plus:DI (ashift:DI (reg:DI 131) (const_int 3 [0x3])) (reg/f:DI 129))) (set (reg/f:DI 119) (plus:DI (ashift:DI (reg:DI 131) (const_int 3 [0x3])) (reg/f:DI 119))) (set (mem/c:BLK (reg/f:DI 129) [1 A64]) (mem/c:BLK (reg/f:DI 119) [1 A8])) (use (reg:DI 131)) ]) "t.c":21:9 1416 {*rep_movdi_rex64} (expr_list:REG_UNUSED (reg:DI 131) (expr_list:REG_UNUSED (reg/f:DI 129) (expr_list:REG_UNUSED (reg/f:DI 119) (nil) it appears to be a bit convoluted how we compute the source of the block copy (reg/f:DI 119) and this is probably confusing "frame related" vs. non-frame-related MEM disambiguation. I'm unsure the above parallel is valid, isn't parallel executing stmts in "parallel" (unspecified order)? **scanning insn=64 mem: (reg/f:DI 119) after canon_rtx address: (reg/f:DI 119) after cselib_expand address: (minus:DI (plus:DI (reg/f:DI 129) (reg/f:DI 133)) (reg/f:DI 134)) after canon_rtx address: (minus:DI (plus:DI (reg/f:DI 129) (reg/f:DI 133)) (reg/f:DI 134)) varying cselib base=17:1741806 offset = 0 processing cselib load mem:(mem/c:BLK (reg/f:DI 119) [1 A8]) processing cselib load against insn 55 removing from active insn=55 has store processing cselib load against insn 47 removing from active insn=47 has store processing cselib load against insn 41 mems_found = 0, cannot_delete = true we're calling canon_true_dependence with (gdb) p debug_rtx (store_info->mem) (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32]) (gdb) p debug_rtx (mem) (mem/c:BLK (reg/f:DI 119) [1 A8]) (gdb) p debug_rtx (mem_addr) (minus:DI (plus:DI (value:DI 8:1741643 @0x49c8ba8/0x49b8e80) (value:DI 11:11 @0x49c8bf0/0x49b8f10)) (value:DI 9:9 @0x49c8bc0/0x49b8eb0)) and find_base_term of the mem_addr is returning (symbol_ref:DI ("g_e") [flags 0x2] ) which is because of the weird way we compute the source address (involving the address of the destination). There's a very old bug about find_base_term which tends to pick up a wrong base if multiple candidates are involved. I can't decipher this from what expand generates but the problem lies there (the rep_8bytes expansion).
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 --- Comment #3 from Uroš Bizjak --- _.dse1 pass is removing the store for some reason, -fno-dse "fixes" the testcase. Before _.dse1 pass, we have: (insn 41 40 46 4 (set (mem/c:SI (plus:DI (reg/f:DI 19 frame) (const_int -36 [0xffdc])) [2 e[1].y+0 S4 A32]) (reg:SI 98 [ e$1$y ])) "pr113255.c":21:9 85 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 98 [ e$1$y ]) (nil))) But _.dse1 pass decides that: **scanning insn=41 mem: (plus:DI (reg/f:DI 19 frame) (const_int -36 [0xffdc])) after canon_rtx address: (plus:DI (reg/f:DI 19 frame) (const_int -36 [0xffdc])) gid=1 offset=-36 processing const base store gid=1[-36..-32) mems_found = 1, cannot_delete = false ... Locally deleting insn 41 deferring deletion of insn with uid = 41.
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P2 Keywords|needs-bisection | CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- Started with r11-963-g80d6f89e78fc3b772701988cc73aa8e8006283be
[Bug target/113255] [11/12/13/14 Regression] wrong code with -O2 -mtune=k8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113255 Andrew Pinski changed: What|Removed |Added Target Milestone|--- |11.5 Known to fail||10.2.0 Last reconfirmed||2024-01-06 Known to work||10.1.0 Component|c |target Ever confirmed|0 |1 Summary|wrong code with -O2 |[11/12/13/14 Regression] |-mtune=k8 |wrong code with -O2 ||-mtune=k8 Status|UNCONFIRMED |NEW --- Comment #1 from Andrew Pinski --- Confirmed. This looks to be a target issue. `-O2 -mtune=k8 -mstringop-strategy=libcall` works but `-O2 -mtune=k8 -mstringop-strategy=rep_8byte` does not.