[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-06 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:9f4f833455bb35c11d03e93f802604ac7cd8b740

commit r14-2344-g9f4f833455bb35c11d03e93f802604ac7cd8b740
Author: Richard Biener 
Date:   Wed Jul 5 15:57:49 2023 +0200

tree-optimization/110515 - wrong code with LIM + PRE

In this PR we face the issue that LIM speculates a load when
hoisting it out of the loop (since it knows it cannot trap).
Unfortunately this exposes undefined behavior when the load
accesses memory with the wrong dynamic type.  This later
makes PRE use that representation instead of the original
which accesses the same memory location but using a different
dynamic type leading to a wrong disambiguation of that
original access against another and thus a wrong-code transform.

Fortunately there already is code in PRE dealing with a similar
situation for code hoisting but that left a small gap which
when fixed also fixes the wrong-code transform in this bug even
if it doesn't address the underlying issue of LIM speculating
that load.

The upside is this fix is trivially safe to backport and chances
of code generation regressions are very low.

PR tree-optimization/110515
* tree-ssa-pre.cc (compute_avail): Make code dealing
with hoisting loads with different alias-sets more
robust.

* g++.dg/opt/pr110515.C: New testcase.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

Richard Biener  changed:

   What|Removed |Added

   Keywords|needs-bisection |
   Priority|P3  |P2

--- Comment #10 from Richard Biener  ---
So compute_avail in PRE tries to detect this situation and it ends up modifying
the alias set of the load to zero (and is successfully in doing that).  I'm
testing a patch to plug a hole in that logic.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

--- Comment #9 from Richard Biener  ---
So IMHO it's the fault of invariant motion moving

_145 = MEM[(struct LargeRep *) + 8B].Slots;

and

_34 = MEM[(struct LargeRep *) + 8B].Capacity;

out of loop 10 as these loads are executed conditional.  They are know
to not trap and that's why LIM doesn't require them unconditionally
executed but as we can see here CSE later takes advantage of undefinedness
if there's a load using the wrong TBAA type not matching the dynamic type
of the storage.

For example

char storage[4] __attribute__((aligned(4)));
double foo (int n, int kind)
{
  double res;
  for (int i = 0; i < n; ++i)
if (kind)
  res = *(int *)storage;
else
  res = *(float *)storage;
  return res;
}

is transformed to

   [local count: 118111600]:
  if (n_7(D) > 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _2 = MEM[(float *)];
  _1 = MEM[(int *)];

   [local count: 955630225]:
  # i_16 = PHI 
  if (kind_10(D) != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 477815112]:
  res_12 = (double) _1;
  goto ; [100.00%]

   [local count: 477815112]:
  res_11 = (double) _2;

   [local count: 955630225]:
  # res_3 = PHI 
  i_13 = i_16 + 1;
  if (n_7(D) > i_13)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 850510901]:
  goto ; [100.00%]

   [local count: 118111600]:
  # res_15 = PHI 
  return res_15;

and then eventually to the following, unconditionally using float as alias set.

   [local count: 118111600]:
  if (n_7(D) > 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _2 = MEM[(float *)];
  if (kind_10(D) != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 52559662]:
  _4 = VIEW_CONVERT_EXPR(_2);
  res_12 = (double) _4;
  goto ; [100.00%]

   [local count: 52559662]:
  res_11 = (double) _2;

   [local count: 118111600]:
  # res_15 = PHI 
  return res_15;

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-07-05
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #8 from Richard Biener  ---
(In reply to Sergei Trofimovich from comment #7)
> (In reply to Richard Biener from comment #6)
> > Turning off PRE also allows it to pass but I guess it's the same as with 
> > SRA.
> > -fno-strict-aliasing also helps, -fno-ipa-modref doesn't.  I suspect
> > bisection
> > isn't going to help much, but maybe it points to a small set of changes on
> > the testcase itself.
> 
> Bisect landed on the r12-4790-g4b3a325f07aceb :
> 
> $ git bisect bad
> 4b3a325f07acebf47e82de227ce1d5ba62f5bcae is the first bad commit
> commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae
> Author: Aldy Hernandez 
> Date:   Thu Oct 28 15:35:21 2021 +0200
> 
> Remove VRP threader passes in exchange for better threading pre-VRP.

After this commit the bug still appears with -fdisable-tree-threadfull1
-fdisable-tree-threadfull2 and before the commit it appears with
-fdisable-tree-vrp-thread1 -fdisable-tree-vrp-thread2 so it seems
removing the threading passes after VRP exposes the issue.

Looking at the source and the IL of grow() I think the issue is that
loop invariant motion hoists _134 and _104 in

  _31 = Visited.Small;
  if (_31 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 621159684]:
  _145 = MEM[(struct LargeRep *) + 8B].Slots;
  _34 = MEM[(struct LargeRep *) + 8B].Capacity;
  if (_34 != 0)
goto ; [0.00%]
  else
goto ; [0.00%]

   [local count: 955630290]:
  # _134 = PHI <_34(4), 2(3)>
  # _104 = PHI <_145(4), (3)>

   [local count: 8933211531]:

out of the inner first loop which I think is OK even though it makes
the _145 and _34 reads unconditional.  Then PRE hoists those further,
but correctly identifying

  Visited.Small = 0;
  _94 = operator new [] (1024);
  Visited.u.o.Slots = _94;
  Visited.u.o.Capacity = 128;

as where they change.

Interestingly I can disable (almost) all passes after tree PRE and still
get the failure.  One change PRE does is

-  _18 = MEM[(struct V *) + 8B].v;
-  if (_18 != 790526)
-goto ; [66.00%]
+  _196 = (long unsigned int) prephitmp_173;
+  if (prephitmp_173 != 790526B)
+goto ; [66.00%]

where it replaces the read with _94 (or EmptyKey as from entry).  That's
phishy.  It's the != EmptyKey check in grow when copying inline slots into
temporary storage.

So on 4b3a325f07acebf47e82de227ce1d5ba62f5bcae^ (one before the bisected
rev.) it fails with

g++ t.C -O2 -fdisable-tree-vrp-thread1 -fdump-tree-pre-details -fdump-tree-all
-fno-tree-tail-merge -fno-tree-vectorize -fno-tree-loop-optimize
-fdisable-tree-sink1 -fdisable-tree-dse3 -fdisable-tree-dce4
-fdisable-tree-fre5 -fdisable-tree-dom3 -fdisable-tree-thread3
-fdisable-tree-thread4 -fdisable-tree-vrp2 -fdisable-tree-vrp-thread2
-fdisable-tree-dse5 -fdisable-tree-cddce3 -fdisable-tree-sink2
-fdisable-tree-store-merging -fdisable-tree-dce7 -fdisable-tree-modref2
-fdisable-tree-local-pure-const2 -fdisable-tree-split-paths -fdisable-tree-ccp4
-fdisable-tree-slsr -fdisable-tree-reassoc2 -fdisable-tree-switchlower1
-fdisable-tree-forwprop4 -fdisable-tree-phiopt4 -fno-gcse -fno-dse
-fno-gcse-after-reload -fdbg-cnt=treepre_insert:1-3:16-21:24-24
-fno-tree-partial-pre

So it goes that the above load is CSEd to the LIM inserted load plus
a conversion from pointer to unsigned long:

Value numbering stmt = _145 = MEM[(struct LargeRep *) + 8B].Slots;
Setting value number of _145 to _145 (changed)
...
Value numbering stmt = _18 = MEM[(struct V *) + 8B].v;
Inserting name _47 for expression (long unsigned int) _145
Setting value number of _18 to _47 (changed)

we seem to disregard TBAA when CSEing two loads without an intervening
possibly aliasing stores.  But this then causes PRE (well, that's a
long-long-standing issue...) to record the later load to EXP_GEN as

exp_gen[25] := {
{component_ref,mem_ref<8B>,addr_expr<>}@.MEM_129 (0084), _18
(0031) }

so to PRE it appears that we do

  _18 = (long unsigned int) MEM[(struct LargeRep *) + 8B].Slots;

and thus the wrong ref is used for disambiguation which means we
disambiguate against

  TheSlot_172->v = Item$f_20;

which eventually stores to that slot.

*sigh*

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-04 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

--- Comment #7 from Sergei Trofimovich  ---
(In reply to Richard Biener from comment #6)
> Turning off PRE also allows it to pass but I guess it's the same as with SRA.
> -fno-strict-aliasing also helps, -fno-ipa-modref doesn't.  I suspect
> bisection
> isn't going to help much, but maybe it points to a small set of changes on
> the testcase itself.

Bisect landed on the r12-4790-g4b3a325f07aceb :

$ git bisect bad
4b3a325f07acebf47e82de227ce1d5ba62f5bcae is the first bad commit
commit 4b3a325f07acebf47e82de227ce1d5ba62f5bcae
Author: Aldy Hernandez 
Date:   Thu Oct 28 15:35:21 2021 +0200

Remove VRP threader passes in exchange for better threading pre-VRP.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
   Keywords||needs-bisection

--- Comment #6 from Richard Biener  ---
Turning off PRE also allows it to pass but I guess it's the same as with SRA.
-fno-strict-aliasing also helps, -fno-ipa-modref doesn't.  I suspect bisection
isn't going to help much, but maybe it points to a small set of changes on the
testcase itself.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||alias

--- Comment #5 from Andrew Pinski  ---
I am not 100% sure but I think the problem is related to this code (note I
added that asm comment there):
```
V *P = u.i;
V *E = P + InlineSlots;
for (; P != E; ++P) {
if (P->v != EmptyKey) {
TmpEnd->v = P->v;
++TmpEnd;
}
}

Small = false;
//asm("":"+m"(*this)::"memory");
u.o = LargeRep{new V[128], 128};
```

The problem is most likely the read from u.i is happening after the store to
u.o.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

--- Comment #4 from Andrew Pinski  ---
Right away I can say it is not phiopt that is causing the issue:
That is:
doing `__attribute__((optimize(3, "no-ssa-phiopt")))` (instead of just 3) still
fails.

Turning off SRA allows it to work but I suspect SRA is just exposing it.

[Bug middle-end/110515] [12/13/14 Regression] llvm-15.0.7 possibly invalid code on -O3

2023-07-01 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110515

Andrew Pinski  changed:

   What|Removed |Added

Summary|llvm-15.0.7 possibly|[12/13/14 Regression]
   |invalid code on -O3 |llvm-15.0.7 possibly
   ||invalid code on -O3
   Target Milestone|--- |12.4
   Keywords||wrong-code