[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-09-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #26 from Jiu Fu Guo  ---
Patch was committed.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-09-03 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #25 from CVS Commits  ---
The master branch has been updated by Jiu Fu Guo :

https://gcc.gnu.org/g:1aceceb1e2d6e86ce183c8cc448750fa03b6f79e

commit r14-3644-g1aceceb1e2d6e86ce183c8cc448750fa03b6f79e
Author: Jiufu Guo 
Date:   Mon Sep 4 10:31:04 2023 +0800

Optimize '(X - N * M) / N' to 'X / N - M' if valid

Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with
the below conditions:
1. There is no wrap/overflow/underflow.
   wrap/overflow/underflow breaks the arithmetic operation.
2. "X - N * M" and "X" are not of opposite sign.
   Here, the operation "/" would be "trunc_div", the fractional part is
   discarded. If "X - N * M" and "X" are in different signs, then trunc_div
   discards the fractional parts (of /N) in different directions.

PR tree-optimization/108757

gcc/ChangeLog:

* match.pd ((X - N * M) / N): New pattern.
((X + N * M) / N): New pattern.
((X + C) div_rshift N): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/pr108757-1.c: New test.
* gcc.dg/pr108757-2.c: New test.
* gcc.dg/pr108757.h: New test.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #24 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #23)
> /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N
> */
> (for div (trunc_div exact_div)
div was not used in this matcher, yet.  Here rshift is used:  t/(1<>N".

>  (simplify
>   (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
>(wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
>(if (TYPE_OVERFLOW_UNDEFINED (type))
> (div @0 @2)
This should be "(rshift @0 @2)", otherwise it will be error if relax
"TYPE_UNSIGNED (type)"

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #23 from Jiu Fu Guo  ---
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div exact_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) &&
   (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
(div @0 @2)
#if GIMPLE
(with
 {
   bool overflowed = true;
   value_range vr0;
   if (get_range_query (cfun)->range_of_expr (vr0, @0)
   && !vr0.varying_p () && !vr0.undefined_p ())
 {
   wide_int wmin0 = vr0.lower_bound ();
   wide_int wmax0 = vr0.upper_bound ();
   wide_int w1 = -wi::to_wide (@1);
   wi::overflow_type min_ovf, max_ovf;
   wi::sub (wmin0, w1, TYPE_SIGN (type), _ovf);
   wi::sub (wmax0, w1, TYPE_SIGN (type), _ovf);
   if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
 overflowed = false;
 }
 }
(if (!overflowed)
 (rshift @0 @2)))
#endif
   

Got one match for the case.
Checking if it is safe(condition) or how to support other forms:
signed type, negative N, non-power2 N, negative M ...

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #22 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #21)
> (In reply to Jiu Fu Guo from comment #20)
> > Interesting thing:
> > the VR is always VR_VARYING, even for the below simple case:
> > 
> > typedef unsigned long INT;
> > INT __attribute__ ((noinline)) foo (INT x)
> > {
> >   if (x < 4)
> > return 0;
> >   INT a = x + 18446744073709551612ULL;
> >   INT b = a >> 2;
> >   return b + 1;
> > }
> 
> Yes that is because x does not have a "global" range.

I also tried "get_range_query (cfun)->range_of_expr (vr0, @0)", 
> 
> You could try the following testcase:
> ```
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
> __builtin_unreachable();
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }
> ```
> 
> Which gets a (global) range for x_1(D) during forwprop3:
>   # RANGE [irange] INT [4, +INF]
>   INTD.2750 x_1(D) = xD.2751;

(In reply to Andrew Pinski from comment #21)
> (In reply to Jiu Fu Guo from comment #20)
> > Interesting thing:
> > the VR is always VR_VARYING, even for the below simple case:
> > 
> > typedef unsigned long INT;
> > INT __attribute__ ((noinline)) foo (INT x)
> > {
> >   if (x < 4)
> > return 0;
> >   INT a = x + 18446744073709551612ULL;
> >   INT b = a >> 2;
> >   return b + 1;
> > }
> 
> Yes that is because x does not have a "global" range.
> 
> You could try the following testcase:
> ```
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
> __builtin_unreachable();
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }
> ```
> 
> Which gets a (global) range for x_1(D) during forwprop3:
>   # RANGE [irange] INT [4, +INF]
>   INTD.2750 x_1(D) = xD.2751;

Thanks so much!
"get_range_query (cfun)->range_of_expr (vr0, @0)" works for both the case!

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #21 from Andrew Pinski  ---
(In reply to Jiu Fu Guo from comment #20)
> Interesting thing:
> the VR is always VR_VARYING, even for the below simple case:
> 
> typedef unsigned long INT;
> INT __attribute__ ((noinline)) foo (INT x)
> {
>   if (x < 4)
> return 0;
>   INT a = x + 18446744073709551612ULL;
>   INT b = a >> 2;
>   return b + 1;
> }

Yes that is because x does not have a "global" range.

You could try the following testcase:
```
typedef unsigned long INT;
INT __attribute__ ((noinline)) foo (INT x)
{
  if (x < 4)
__builtin_unreachable();
  INT a = x + 18446744073709551612ULL;
  INT b = a >> 2;
  return b + 1;
}
```

Which gets a (global) range for x_1(D) during forwprop3:
  # RANGE [irange] INT [4, +INF]
  INTD.2750 x_1(D) = xD.2751;

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #20 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #19)
> Note in the loop case we know it does not wrap because there is a check
> already:
>[local count: 118111600]:
>   if (rows_8(D) > 3)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _13 = rows_8(D) + 18446744073709551612;
>   _15 = _13 / 4;
>   doloop.6_5 = _15 + 1;

Checking why below code does not work:
/* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */
(for div (trunc_div round_div)
 (simplify
  (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3)
  (if (ANY_INTEGRAL_TYPE_P (type) &&
   (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1))
   (if (TYPE_OVERFLOW_UNDEFINED (type))
(div @0 @2)
#if GIMPLE
(with
 {
   bool overflowed = true;
   value_range vr0;
   if (INTEGRAL_TYPE_P (type)
   && get_global_range_query ()->range_of_expr (vr0, @0)
   && !vr0.varying_p () && !vr0.undefined_p ())
 {
   wide_int wmin0 = vr0.lower_bound ();
   wide_int wmax0 = vr0.upper_bound ();
   wide_int w1 = -wi::to_wide (@1);
   wi::overflow_type min_ovf, max_ovf;
   wi::add (wmin0, w1, TYPE_SIGN (type), _ovf);
   wi::add (wmax0, w1, TYPE_SIGN (type), _ovf);
   if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE)
 overflowed = false;
 }
 }
(if (!overflowed)
 (div @0 @2)))
#endif
   


Interesting thing:
the VR is always VR_VARYING, even for the below simple case:

typedef unsigned long INT;
INT __attribute__ ((noinline)) foo (INT x)
{
  if (x < 4)
return 0;
  INT a = x + 18446744073709551612ULL;
  INT b = a >> 2;
  return b + 1;
}

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #19 from Andrew Pinski  ---
Note in the loop case we know it does not wrap because there is a check
already:
   [local count: 118111600]:
  if (rows_8(D) > 3)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 105119324]:
  _13 = rows_8(D) + 18446744073709551612;
  _15 = _13 / 4;
  doloop.6_5 = _15 + 1;

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski  changed:

   What|Removed |Added

 Target|powerpc64le-linux   |powerpc64le-linux,
   ||aarch64-*-*

--- Comment #18 from Andrew Pinski  ---
Note this shows up on aarch64 too with my reduced testcase.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski  changed:

   What|Removed |Added

  Component|rtl-optimization|tree-optimization

--- Comment #17 from Andrew Pinski  ---
  _13 = rows_8(D) + 18446744073709551612;
  _15 = _13 >> 2;
  doloop.8_5 = _15 + 1;


This is what IV-OPTS produces.

Reduced testcase:
typedef __SIZE_TYPE__ size_t;

void convert(size_t rows, float*src, float *result)
{
  for(size_t i = 0; i + 4 <= rows; i+=4) {
float t = src[i];
result[i] = t;
  }
}

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-13 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #13 from Andrew Pinski  ---
IIRC this is a doloop issue and has been reported before. Maybe even by myself
while I was working at Sony. I think I tried to fix it too.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-13 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #12 from Chip Kerchner  ---
Here is an example of the original problem

#define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline

typedef __vector float Packet4f;
typedef size_t Index;

EIGEN_ALWAYS_INLINE Packet4f ploadu(const float* from)
{
  return vec_xl(0, const_cast(from));
}

EIGEN_ALWAYS_INLINE void pstoreu(float* to, const Packet4f )
{
  vec_xst(from, 0, to);
}

void convert(Index rows, float*src, float *result)
{
  for(Index i = 0; i + 4 <= rows; i+=4) {
Packet4f r32_0 = ploadu(src + i +  0);
pstoreu(result + i +  0, r32_0);
  }
}

And the output (with notation on the lines in question)

cmpldi 0,3,3
blelr 0
addi 3,3,-4  <- i = rows - 4
li 9,0
srdi 3,3,2   <- i >>= 2
addi 8,3,1   <- i = i + 1
andi. 7,8,0x3
mr 10,8
beq 0,.L10
cmpdi 0,7,1
beq 0,.L14
cmpdi 0,7,2
beq 0,.L15
lxv 0,0(4)
mr 8,3
li 9,16
stxv 0,0(5)
.L15:
lxvx 0,4,9
addi 8,8,-1
stxvx 0,5,9
addi 9,9,16
.L14:
lxvx 0,4,9
cmpdi 0,8,1
stxvx 0,5,9
addi 9,9,16
beqlr 0
.L10:
srdi 10,10,2
mtctr 10
.L3:
lxvx 0,4,9
addi 10,9,16
addi 7,9,32
addi 8,9,48
stxvx 0,5,9
lxvx 0,4,10
addi 9,9,64
stxvx 0,5,10
lxvx 0,4,7
stxvx 0,5,7
lxvx 0,4,8
stxvx 0,5,8
bdnz .L3
blr

In this case the 3 lines notated can be replaced a simple `srdi 8,3,2`

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #11 from Chip Kerchner  ---
Nevermind, using a similar example that Segher gave, it would failed too.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #10 from Chip Kerchner  ---
Oops that should be 31 * -2, not 33.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread chip.kerchner at ibm dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #9 from Chip Kerchner  ---
Doesn't this work for powers of two (N) and signed values (for A, N and M)?

(59 - (33 * -2)) / -2 + 31 = -62 + 31 = -29

and

59 / -2 = -29

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #8 from Segher Boessenkool  ---
No, addition and subtraction are well defined for all inputs, for unsigned
integers.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #7 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #6)
> No?  Take a=59 as counterexample:
> 
> (a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2

For unsigned integers, isn't having a < N*M UB so we're free to do as we wish
for either the optimized and non-optimized sequences?  Meaning, can't we assume
here that a >= N*M?

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #6 from Segher Boessenkool  ---
No?  Take a=59 as counterexample:

(a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2

but

a / N = 59/30 = 1

Integer division in C is division towards zero, almost no normal algebraic
simplifications apply there.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #5 from Peter Bergner  ---
(In reply to Segher Boessenkool from comment #4)
> If N is a power of two optimising this to a/N is valid, but for other values
> of N it is not (division is not the inverse of multiplication in C).  It also
> only works for unsigned of course.

Isn't this valid for any N & M, such that M is a factor of N?  Meaning, as long
as N / M has no remainder, it seems like this should be ok.  For example, N =
30 & M = 2 should be just as ok as the N = 32 & M = 2 case, correct?

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Segher Boessenkool  changed:

   What|Removed |Added

 CC||segher at gcc dot gnu.org

--- Comment #4 from Segher Boessenkool  ---
If N is a power of two optimising this to a/N is valid, but for other values
of N it is not (division is not the inverse of multiplication in C).  It also
only works for unsigned of course.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-11 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> I am not sure this can be done in the normal case unless you know the range
> of a to be [64...INF] .
> The wrap around case might be an issue ...
> But I am not 100% sure.

It is.  Just print foo(0).

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

--- Comment #2 from Andrew Pinski  ---
I am not sure this can be done in the normal case unless you know the range of
a to be [64...INF] .
The wrap around case might be an issue ...
But I am not 100% sure.

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N

2023-02-10 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757

Peter Bergner  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-02-10
 CC||chip.kerchner at ibm dot com
 Target||powerpc64le-linux
 Status|UNCONFIRMED |NEW

--- Comment #1 from Peter Bergner  ---
Reported by our Eigen developers and confirmed by me.