[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-20 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #12 from GCC Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:ff5a14abeb31cd6bd0ca55e7043d05c8141a8c7f

commit r15-5496-gff5a14abeb31cd6bd0ca55e7043d05c8141a8c7f
Author: Richard Biener 
Date:   Fri Nov 15 11:56:14 2024 +0100

tree-optimization/117574 - bougs niter lt-to-ne

When trying to change a IV from IV0 < IV1 to IV0' != IV1' we apply
fancy adjustments to the may_be_zero condition we compute rather
than using the obvious IV0->base >= IV1->base expression (to be
able to use > instead of >=?).  This doesn't seem to go well.

PR tree-optimization/117574
* tree-ssa-loop-niter.cc (number_of_iterations_lt_to_ne):
Use the obvious may_be_zero condition.

* gcc.dg/torture/pr117574-1.c: New testcase.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=81388

--- Comment #11 from Richard Biener  ---
The odd condition exists in this form since at least niter analysis was
refactored, splitting out functions in r0-72975-g7f17528ab58fa0.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #10 from Richard Biener  ---
And we compute the "noloop" condition as

noloop = fold_build2 (GT_EXPR, boolean_type_node,
  iv0->base,
  fold_build2 (PLUS_EXPR, type1,
   iv1->base, tmod));

where tmod is 47 (that's STEP - DELTA %[fl] STEP in the IVs type - mind,
DELTA is unsigned!).

So the noloop condition is

(long int) ((int) d.3_5 + 40) + 1 >= (long int) ((int) d.3_5 + 76)

which is wrong, it should compute true.

What's fishy here is the negative DELTA (so there is IV overflow in that
compute) and then the use of unsigned FLOOR_MOD to compute the adjustment
for the final value and the noloop condition (I'm not sure why that needs
the mod adjustment at all?!).

I'm going to test the naiive

diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 9518bf969cd..1be4b552206 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -1200,17 +1200,6 @@ number_of_iterations_lt_to_ne (tree type, affine_iv
*iv0,
 affine_iv *iv1,
  if (integer_zerop (assumption))
return false;
}
-  if (mpz_cmp (mmod, bnds->below) < 0)
-   noloop = boolean_false_node;
-  else if (POINTER_TYPE_P (type))
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- iv0->base,
- fold_build_pointer_plus (iv1->base, tmod));
-  else
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- iv0->base,
- fold_build2 (PLUS_EXPR, type1,
-  iv1->base, tmod));
 }
   else
 {
@@ -1226,21 +1215,15 @@ number_of_iterations_lt_to_ne (tree type, affine_iv
*iv0, affine_iv *iv1,
  if (integer_zerop (assumption))
return false;
}
-  if (mpz_cmp (mmod, bnds->below) < 0)
-   noloop = boolean_false_node;
-  else if (POINTER_TYPE_P (type))
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- fold_build_pointer_plus (iv0->base,
-  fold_build1
(NEGATE_EXPR,
-   type1,
tmod)),
- iv1->base);
-  else
-   noloop = fold_build2 (GT_EXPR, boolean_type_node,
- fold_build2 (MINUS_EXPR, type1,
-  iv0->base, tmod),
- iv1->base);
 }

+  /* IV0 < IV1 does not loop if IV0->base >= IV1->base.  */
+  if (mpz_cmp (mmod, bnds->below) < 0)
+noloop = boolean_false_node;
+  else
+noloop = fold_build2 (GE_EXPR, boolean_type_node,
+ iv0->base, iv1->base);
+
   if (!integer_nonzerop (assumption))
 niter->assumptions = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
  niter->assumptions,

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #9 from Richard Biener  ---
So before IVCANON we have a loop-header copy that will enter the loop:

if (_8 >= _10) // 76 >= 40
  goto ; [89.00%] // enter loop
else
  goto ; [11.00%] // skip it

and the first loop jump 

 [local count: 955630224]:
# g_33 = PHI 
# b_lsm.12_28 = PHI <_20(15), b_lsm.12_30(13)>
_20 = b_lsm.12_28 + g_33;
g_21 = g_33 + 50;
if (_8 >= g_21)  // 76 >= 90
  goto ; [89.00%]  // loop
else
  goto ; [11.00%]  // exit

should exit the loop.  We enter number_of_iterations_lt with

{ (long int) ((int) d.3_5 + 40) + 50, +, 50 } and (long int) ((int) d.3_5 + 76)
+ 1

computing delta == 18446744073709551603 (aka -13u) and then
number_of_iterations_lt_to_ne computes we can convert this to NE with
a final value of 34 and a { 0, +, 50 } IV.  I will dig further.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #8 from Richard Biener  ---
for (g = 40; g != 124; g += 50)

.optimized has

  d.3_5 = d; // 0
  _6 = (int) d.3_5;
  _9 = _6 + 40;
  _10 = (long int) _9;  // 40

  _43 = (unsigned int) d.3_5;
  _23 = _43 + 41;
  _29 = (int) _23;
  _50 = _43 + 76;
  _39 = (int) _50;
  _18 = _29 >= _39;  // 41 >= 76
  _37 = _18 ? 1 : 368934881474191034;  // 368934881474191034
  _42 = (unsigned long) d.3_5;
  _41 = _37 * 50;  // 84
  _17 = _42 + 40;
  _38 = _17 + _41; // 124
  _46 = (long int) _38;


   [local count: 955630224]:
  # g_33 = PHI 
  # b_lsm.12_28 = PHI <_20(4), b_lsm.12_30(3)>
  _20 = b_lsm.12_28 + g_33;
  g_21 = g_33 + 50;
  if (g_21 != _46)
goto ; [89.00%]

it is IVCANON replacing the inner loop IV and thus in the end this looks like
a niter issue.  Before IVCANON we have

  d.3_5 = d;
  _6 = (int) d.3_5;
  _7 = _6 + 76;
  _8 = (long int) _7;
  _9 = _6 + 40;
  _10 = (long int) _9;

   [local count: 955630224]:
  # g_33 = PHI 
  # b_lsm.12_28 = PHI <_20(15), b_lsm.12_30(13)>
  _20 = b_lsm.12_28 + g_33;
  g_21 = g_33 + 50;
  if (_8 >= g_21)  // 76 >= { 40, +, 50 }
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 850510900]:
  goto ; [100.00%]

so we wrongly replaced a >= IV check with a != one, the generated code suggests
a flipped condition somewhere:

  _18 = _29 >= _39;  // 41 >= 76
  _37 = _18 ? 1 : 368934881474191034;  // 368934881474191034

with _18 = _29 <= _39 it would have been correct.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #7 from Sam James  ---
For completeness: 9 started with r9-2287-g47ca20b4f69986 (i.e. we got better at
optimising and exposed it).

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1 since r6-4133-ga8fc25795155d4

2024-11-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

Sam James  changed:

   What|Removed |Added

Summary|[12/13/14/15 Regression]|[12/13/14/15 Regression]
   |Miscompile with -O2/3 and   |Miscompile with -O2/3 and
   |-O0/1   |-O0/1 since
   ||r6-4133-ga8fc25795155d4
   Keywords|needs-bisection |
 CC||rguenth at gcc dot gnu.org

--- Comment #6 from Sam James  ---
Started with r6-4133-ga8fc25795155d4.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1

2024-11-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

Sam James  changed:

   What|Removed |Added

  Known to work|6.5.0   |5.4.0
  Known to fail||6.1.0

--- Comment #5 from Sam James  ---
5.4 works then and 6.1 doesn't

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1

2024-11-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #4 from Andrew Pinski  ---
(In reply to Sam James from comment #3)
> ```
> $ /tmp/gcc-pfx/bin/gcc /tmp/a.c -o /tmp/a -O2 -fwrapv && /tmp/a
> 0
> $ /tmp/gcc-pfx/bin/gcc /tmp/a2.c -o /tmp/a -O2 -fwrapv && /tmp/a
> 0
> ```
> 
> What am I missing?

I was just testing with `-O3` (and not -fwrapv). And that failed in GCC 7.4.0.

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1

2024-11-14 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #3 from Sam James  ---
(In reply to Andrew Pinski from comment #1)
> Confirmed.
> 
> You can hit the bug in GCC 7/8 by `s/c = 1/c = -1/`.

I don't see it with 7.4.1 20191114 when bisecting with either of the files.

/tmp/a.c:
```
int printf(const char *, ...);
int a, c;
long b;
short d;
long e(long f, long h, long i) {
  for (long g = f; g <= h; g += i)
b += g;
  return b;
}
int main() {
  c = 1;
  for (; c >= 0; c--)
;
  for (; e(d + 40, d + 76, c + 51) < 4;)
;
  printf("%X\n", a);
}
``

/tmp/a2.c:
```
int printf(const char *, ...);
int a, c;
long b;
short d;
long e(long f, long h, long i) {
  for (long g = f; g <= h; g += i)
b += g;
  return b;
}
int main() {
  c = -1;
  for (; c >= 0; c--)
;
  for (; e(d + 40, d + 76, c + 51) < 4;)
;
  printf("%X\n", a);
}
```

```
$ /tmp/gcc-pfx/bin/gcc /tmp/a.c -o /tmp/a -O2 -fwrapv && /tmp/a
0
$ /tmp/gcc-pfx/bin/gcc /tmp/a2.c -o /tmp/a -O2 -fwrapv && /tmp/a
0
```

What am I missing?

[Bug tree-optimization/117574] [12/13/14/15 Regression] Miscompile with -O2/3 and -O0/1

2024-11-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117574

--- Comment #2 from Richard Biener  ---
On trunk -fno-tree-ch fixes/avoids the issue.  -fno-tree-vrp also works around.