[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 --- Comment #11 from CVS Commits --- The master branch has been updated by Richard Biener : https://gcc.gnu.org/g:9f4f833455bb35c11d03e93f802604ac7cd8b740 commit r14-2344-g9f4f833455bb35c11d03e93f802604ac7cd8b740 Author: Richard Biener Date: Wed Jul 5 15:57:49 2023 +0200 tree-optimization/110515 - wrong code with LIM + PRE In this PR we face the issue that LIM speculates a load when hoisting it out of the loop (since it knows it cannot trap). Unfortunately this exposes undefined behavior when the load accesses memory with the wrong dynamic type. This later makes PRE use that representation instead of the original which accesses the same memory location but using a different dynamic type leading to a wrong disambiguation of that original access against another and thus a wrong-code transform. Fortunately there already is code in PRE dealing with a similar situation for code hoisting but that left a small gap which when fixed also fixes the wrong-code transform in this bug even if it doesn't address the underlying issue of LIM speculating that load. The upside is this fix is trivially safe to backport and chances of code generation regressions are very low. PR tree-optimization/110515 * tree-ssa-pre.cc (compute_avail): Make code dealing with hoisting loads with different alias-sets more robust. * g++.dg/opt/pr110515.C: New testcase.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 Richard Biener changed: What|Removed |Added Keywords|needs-bisection | Priority|P3 |P2 --- Comment #10 from Richard Biener --- So compute_avail in PRE tries to detect this situation and it ends up modifying the alias set of the load to zero (and is successfully in doing that). I'm testing a patch to plug a hole in that logic.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 --- Comment #9 from Richard Biener --- So IMHO it's the fault of invariant motion moving _145 = MEM[(struct LargeRep *) + 8B].Slots; and _34 = MEM[(struct LargeRep *) + 8B].Capacity; out of loop 10 as these loads are executed conditional. They are know to not trap and that's why LIM doesn't require them unconditionally executed but as we can see here CSE later takes advantage of undefinedness if there's a load using the wrong TBAA type not matching the dynamic type of the storage. For example char storage[4] __attribute__((aligned(4))); double foo (int n, int kind) { double res; for (int i = 0; i < n; ++i) if (kind) res = *(int *)storage; else res = *(float *)storage; return res; } is transformed to [local count: 118111600]: if (n_7(D) > 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = MEM[(float *)]; _1 = MEM[(int *)]; [local count: 955630225]: # i_16 = PHI if (kind_10(D) != 0) goto ; [50.00%] else goto ; [50.00%] [local count: 477815112]: res_12 = (double) _1; goto ; [100.00%] [local count: 477815112]: res_11 = (double) _2; [local count: 955630225]: # res_3 = PHI i_13 = i_16 + 1; if (n_7(D) > i_13) goto ; [89.00%] else goto ; [11.00%] [local count: 850510901]: goto ; [100.00%] [local count: 118111600]: # res_15 = PHI return res_15; and then eventually to the following, unconditionally using float as alias set. [local count: 118111600]: if (n_7(D) > 0) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _2 = MEM[(float *)]; if (kind_10(D) != 0) goto ; [50.00%] else goto ; [50.00%] [local count: 52559662]: _4 = VIEW_CONVERT_EXPR(_2); res_12 = (double) _4; goto ; [100.00%] [local count: 52559662]: res_11 = (double) _2; [local count: 118111600]: # res_15 = PHI return res_15;
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 Richard Biener changed: What|Removed |Added Last reconfirmed||2023-07-05 Status|UNCONFIRMED |ASSIGNED Ever confirmed|0 |1 Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org --- Comment #8 from Richard Biener --- (In reply to Sergei Trofimovich from comment #7) > (In reply to Richard Biener from comment #6) > > Turning off PRE also allows it to pass but I guess it's the same as with > > SRA. > > -fno-strict-aliasing also helps, -fno-ipa-modref doesn't. I suspect > > bisection > > isn't going to help much, but maybe it points to a small set of changes on > > the testcase itself. > > Bisect landed on the r12-4790-g4b3a325f07aceb : > > $ git bisect bad > 4b3a325f07acebf47e82de227ce1d5ba62f5bcae is the first bad commit > commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae > Author: Aldy Hernandez > Date: Thu Oct 28 15:35:21 2021 +0200 > > Remove VRP threader passes in exchange for better threading pre-VRP. After this commit the bug still appears with -fdisable-tree-threadfull1 -fdisable-tree-threadfull2 and before the commit it appears with -fdisable-tree-vrp-thread1 -fdisable-tree-vrp-thread2 so it seems removing the threading passes after VRP exposes the issue. Looking at the source and the IL of grow() I think the issue is that loop invariant motion hoists _134 and _104 in _31 = Visited.Small; if (_31 != 0) goto ; [50.00%] else goto ; [50.00%] [local count: 621159684]: _145 = MEM[(struct LargeRep *) + 8B].Slots; _34 = MEM[(struct LargeRep *) + 8B].Capacity; if (_34 != 0) goto ; [0.00%] else goto ; [0.00%] [local count: 955630290]: # _134 = PHI <_34(4), 2(3)> # _104 = PHI <_145(4), (3)> [local count: 8933211531]: out of the inner first loop which I think is OK even though it makes the _145 and _34 reads unconditional. Then PRE hoists those further, but correctly identifying Visited.Small = 0; _94 = operator new [] (1024); Visited.u.o.Slots = _94; Visited.u.o.Capacity = 128; as where they change. Interestingly I can disable (almost) all passes after tree PRE and still get the failure. One change PRE does is - _18 = MEM[(struct V *) + 8B].v; - if (_18 != 790526) -goto ; [66.00%] + _196 = (long unsigned int) prephitmp_173; + if (prephitmp_173 != 790526B) +goto ; [66.00%] where it replaces the read with _94 (or EmptyKey as from entry). That's phishy. It's the != EmptyKey check in grow when copying inline slots into temporary storage. So on 4b3a325f07acebf47e82de227ce1d5ba62f5bcae^ (one before the bisected rev.) it fails with g++ t.C -O2 -fdisable-tree-vrp-thread1 -fdump-tree-pre-details -fdump-tree-all -fno-tree-tail-merge -fno-tree-vectorize -fno-tree-loop-optimize -fdisable-tree-sink1 -fdisable-tree-dse3 -fdisable-tree-dce4 -fdisable-tree-fre5 -fdisable-tree-dom3 -fdisable-tree-thread3 -fdisable-tree-thread4 -fdisable-tree-vrp2 -fdisable-tree-vrp-thread2 -fdisable-tree-dse5 -fdisable-tree-cddce3 -fdisable-tree-sink2 -fdisable-tree-store-merging -fdisable-tree-dce7 -fdisable-tree-modref2 -fdisable-tree-local-pure-const2 -fdisable-tree-split-paths -fdisable-tree-ccp4 -fdisable-tree-slsr -fdisable-tree-reassoc2 -fdisable-tree-switchlower1 -fdisable-tree-forwprop4 -fdisable-tree-phiopt4 -fno-gcse -fno-dse -fno-gcse-after-reload -fdbg-cnt=treepre_insert:1-3:16-21:24-24 -fno-tree-partial-pre So it goes that the above load is CSEd to the LIM inserted load plus a conversion from pointer to unsigned long: Value numbering stmt = _145 = MEM[(struct LargeRep *) + 8B].Slots; Setting value number of _145 to _145 (changed) ... Value numbering stmt = _18 = MEM[(struct V *) + 8B].v; Inserting name _47 for expression (long unsigned int) _145 Setting value number of _18 to _47 (changed) we seem to disregard TBAA when CSEing two loads without an intervening possibly aliasing stores. But this then causes PRE (well, that's a long-long-standing issue...) to record the later load to EXP_GEN as exp_gen[25] := { {component_ref,mem_ref<8B>,addr_expr<>}@.MEM_129 (0084), _18 (0031) } so to PRE it appears that we do _18 = (long unsigned int) MEM[(struct LargeRep *) + 8B].Slots; and thus the wrong ref is used for disambiguation which means we disambiguate against TheSlot_172->v = Item$f_20; which eventually stores to that slot. *sigh*
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 --- Comment #7 from Sergei Trofimovich --- (In reply to Richard Biener from comment #6) > Turning off PRE also allows it to pass but I guess it's the same as with SRA. > -fno-strict-aliasing also helps, -fno-ipa-modref doesn't. I suspect > bisection > isn't going to help much, but maybe it points to a small set of changes on > the testcase itself. Bisect landed on the r12-4790-g4b3a325f07aceb : $ git bisect bad 4b3a325f07acebf47e82de227ce1d5ba62f5bcae is the first bad commit commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae Author: Aldy Hernandez Date: Thu Oct 28 15:35:21 2021 +0200 Remove VRP threader passes in exchange for better threading pre-VRP.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 Richard Biener changed: What|Removed |Added CC||rguenth at gcc dot gnu.org Keywords||needs-bisection --- Comment #6 from Richard Biener --- Turning off PRE also allows it to pass but I guess it's the same as with SRA. -fno-strict-aliasing also helps, -fno-ipa-modref doesn't. I suspect bisection isn't going to help much, but maybe it points to a small set of changes on the testcase itself.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 Andrew Pinski changed: What|Removed |Added Keywords||alias --- Comment #5 from Andrew Pinski --- I am not 100% sure but I think the problem is related to this code (note I added that asm comment there): ``` V *P = u.i; V *E = P + InlineSlots; for (; P != E; ++P) { if (P->v != EmptyKey) { TmpEnd->v = P->v; ++TmpEnd; } } Small = false; //asm("":"+m"(*this)::"memory"); u.o = LargeRep{new V[128], 128}; ``` The problem is most likely the read from u.i is happening after the store to u.o.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 --- Comment #4 from Andrew Pinski --- Right away I can say it is not phiopt that is causing the issue: That is: doing `__attribute__((optimize(3, "no-ssa-phiopt")))` (instead of just 3) still fails. Turning off SRA allows it to work but I suspect SRA is just exposing it.
[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515 Andrew Pinski changed: What|Removed |Added Summary|llvm-15.0.7 possibly|[12/13/14 Regression] |invalid code on -O3 |llvm-15.0.7 possibly ||invalid code on -O3 Target Milestone|--- |12.4 Keywords||wrong-code