[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Jiu Fu Guo changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #26 from Jiu Fu Guo --- Patch was committed.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #25 from CVS Commits --- The master branch has been updated by Jiu Fu Guo : https://gcc.gnu.org/g:1aceceb1e2d6e86ce183c8cc448750fa03b6f79e commit r14-3644-g1aceceb1e2d6e86ce183c8cc448750fa03b6f79e Author: Jiufu Guo Date: Mon Sep 4 10:31:04 2023 +0800 Optimize '(X - N * M) / N' to 'X / N - M' if valid Integer expression "(X - N * M) / N" can be optimized to "X / N - M" with the below conditions: 1. There is no wrap/overflow/underflow. wrap/overflow/underflow breaks the arithmetic operation. 2. "X - N * M" and "X" are not of opposite sign. Here, the operation "/" would be "trunc_div", the fractional part is discarded. If "X - N * M" and "X" are in different signs, then trunc_div discards the fractional parts (of /N) in different directions. PR tree-optimization/108757 gcc/ChangeLog: * match.pd ((X - N * M) / N): New pattern. ((X + N * M) / N): New pattern. ((X + C) div_rshift N): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/pr108757-1.c: New test. * gcc.dg/pr108757-2.c: New test. * gcc.dg/pr108757.h: New test.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #24 from Jiu Fu Guo --- (In reply to Jiu Fu Guo from comment #23) > /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N > */ > (for div (trunc_div exact_div) div was not used in this matcher, yet. Here rshift is used: t/(1<>N". > (simplify > (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) > (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && >(wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1)) >(if (TYPE_OVERFLOW_UNDEFINED (type)) > (div @0 @2) This should be "(rshift @0 @2)", otherwise it will be error if relax "TYPE_UNSIGNED (type)"
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #23 from Jiu Fu Guo --- /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */ (for div (trunc_div exact_div) (simplify (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type) && (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1)) (if (TYPE_OVERFLOW_UNDEFINED (type)) (div @0 @2) #if GIMPLE (with { bool overflowed = true; value_range vr0; if (get_range_query (cfun)->range_of_expr (vr0, @0) && !vr0.varying_p () && !vr0.undefined_p ()) { wide_int wmin0 = vr0.lower_bound (); wide_int wmax0 = vr0.upper_bound (); wide_int w1 = -wi::to_wide (@1); wi::overflow_type min_ovf, max_ovf; wi::sub (wmin0, w1, TYPE_SIGN (type), _ovf); wi::sub (wmax0, w1, TYPE_SIGN (type), _ovf); if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) overflowed = false; } } (if (!overflowed) (rshift @0 @2))) #endif Got one match for the case. Checking if it is safe(condition) or how to support other forms: signed type, negative N, non-power2 N, negative M ...
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #22 from Jiu Fu Guo --- (In reply to Andrew Pinski from comment #21) > (In reply to Jiu Fu Guo from comment #20) > > Interesting thing: > > the VR is always VR_VARYING, even for the below simple case: > > > > typedef unsigned long INT; > > INT __attribute__ ((noinline)) foo (INT x) > > { > > if (x < 4) > > return 0; > > INT a = x + 18446744073709551612ULL; > > INT b = a >> 2; > > return b + 1; > > } > > Yes that is because x does not have a "global" range. I also tried "get_range_query (cfun)->range_of_expr (vr0, @0)", > > You could try the following testcase: > ``` > typedef unsigned long INT; > INT __attribute__ ((noinline)) foo (INT x) > { > if (x < 4) > __builtin_unreachable(); > INT a = x + 18446744073709551612ULL; > INT b = a >> 2; > return b + 1; > } > ``` > > Which gets a (global) range for x_1(D) during forwprop3: > # RANGE [irange] INT [4, +INF] > INTD.2750 x_1(D) = xD.2751; (In reply to Andrew Pinski from comment #21) > (In reply to Jiu Fu Guo from comment #20) > > Interesting thing: > > the VR is always VR_VARYING, even for the below simple case: > > > > typedef unsigned long INT; > > INT __attribute__ ((noinline)) foo (INT x) > > { > > if (x < 4) > > return 0; > > INT a = x + 18446744073709551612ULL; > > INT b = a >> 2; > > return b + 1; > > } > > Yes that is because x does not have a "global" range. > > You could try the following testcase: > ``` > typedef unsigned long INT; > INT __attribute__ ((noinline)) foo (INT x) > { > if (x < 4) > __builtin_unreachable(); > INT a = x + 18446744073709551612ULL; > INT b = a >> 2; > return b + 1; > } > ``` > > Which gets a (global) range for x_1(D) during forwprop3: > # RANGE [irange] INT [4, +INF] > INTD.2750 x_1(D) = xD.2751; Thanks so much! "get_range_query (cfun)->range_of_expr (vr0, @0)" works for both the case!
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #21 from Andrew Pinski --- (In reply to Jiu Fu Guo from comment #20) > Interesting thing: > the VR is always VR_VARYING, even for the below simple case: > > typedef unsigned long INT; > INT __attribute__ ((noinline)) foo (INT x) > { > if (x < 4) > return 0; > INT a = x + 18446744073709551612ULL; > INT b = a >> 2; > return b + 1; > } Yes that is because x does not have a "global" range. You could try the following testcase: ``` typedef unsigned long INT; INT __attribute__ ((noinline)) foo (INT x) { if (x < 4) __builtin_unreachable(); INT a = x + 18446744073709551612ULL; INT b = a >> 2; return b + 1; } ``` Which gets a (global) range for x_1(D) during forwprop3: # RANGE [irange] INT [4, +INF] INTD.2750 x_1(D) = xD.2751;
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Jiu Fu Guo changed: What|Removed |Added CC||guojiufu at gcc dot gnu.org --- Comment #20 from Jiu Fu Guo --- (In reply to Andrew Pinski from comment #19) > Note in the loop case we know it does not wrap because there is a check > already: >[local count: 118111600]: > if (rows_8(D) > 3) > goto ; [89.00%] > else > goto ; [11.00%] > >[local count: 105119324]: > _13 = rows_8(D) + 18446744073709551612; > _15 = _13 / 4; > doloop.6_5 = _15 + 1; Checking why below code does not work: /* Simplify ((t + -N*M) / N + M) -> t / N: (t + -C) >> N + (C>>N) ==> t >> N */ (for div (trunc_div round_div) (simplify (plus (rshift (plus @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) (if (ANY_INTEGRAL_TYPE_P (type) && (wi::to_wide (@3) << wi::to_wide (@2)) == -wi::to_wide (@1)) (if (TYPE_OVERFLOW_UNDEFINED (type)) (div @0 @2) #if GIMPLE (with { bool overflowed = true; value_range vr0; if (INTEGRAL_TYPE_P (type) && get_global_range_query ()->range_of_expr (vr0, @0) && !vr0.varying_p () && !vr0.undefined_p ()) { wide_int wmin0 = vr0.lower_bound (); wide_int wmax0 = vr0.upper_bound (); wide_int w1 = -wi::to_wide (@1); wi::overflow_type min_ovf, max_ovf; wi::add (wmin0, w1, TYPE_SIGN (type), _ovf); wi::add (wmax0, w1, TYPE_SIGN (type), _ovf); if (min_ovf == wi::OVF_NONE && max_ovf == wi::OVF_NONE) overflowed = false; } } (if (!overflowed) (div @0 @2))) #endif Interesting thing: the VR is always VR_VARYING, even for the below simple case: typedef unsigned long INT; INT __attribute__ ((noinline)) foo (INT x) { if (x < 4) return 0; INT a = x + 18446744073709551612ULL; INT b = a >> 2; return b + 1; }
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #19 from Andrew Pinski --- Note in the loop case we know it does not wrap because there is a check already: [local count: 118111600]: if (rows_8(D) > 3) goto ; [89.00%] else goto ; [11.00%] [local count: 105119324]: _13 = rows_8(D) + 18446744073709551612; _15 = _13 / 4; doloop.6_5 = _15 + 1;
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Andrew Pinski changed: What|Removed |Added Target|powerpc64le-linux |powerpc64le-linux, ||aarch64-*-* --- Comment #18 from Andrew Pinski --- Note this shows up on aarch64 too with my reduced testcase.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Andrew Pinski changed: What|Removed |Added Component|rtl-optimization|tree-optimization --- Comment #17 from Andrew Pinski --- _13 = rows_8(D) + 18446744073709551612; _15 = _13 >> 2; doloop.8_5 = _15 + 1; This is what IV-OPTS produces. Reduced testcase: typedef __SIZE_TYPE__ size_t; void convert(size_t rows, float*src, float *result) { for(size_t i = 0; i + 4 <= rows; i+=4) { float t = src[i]; result[i] = t; } }
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #13 from Andrew Pinski --- IIRC this is a doloop issue and has been reported before. Maybe even by myself while I was working at Sony. I think I tried to fix it too.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #12 from Chip Kerchner --- Here is an example of the original problem #define EIGEN_ALWAYS_INLINE __attribute__((always_inline)) inline typedef __vector float Packet4f; typedef size_t Index; EIGEN_ALWAYS_INLINE Packet4f ploadu(const float* from) { return vec_xl(0, const_cast(from)); } EIGEN_ALWAYS_INLINE void pstoreu(float* to, const Packet4f ) { vec_xst(from, 0, to); } void convert(Index rows, float*src, float *result) { for(Index i = 0; i + 4 <= rows; i+=4) { Packet4f r32_0 = ploadu(src + i + 0); pstoreu(result + i + 0, r32_0); } } And the output (with notation on the lines in question) cmpldi 0,3,3 blelr 0 addi 3,3,-4 <- i = rows - 4 li 9,0 srdi 3,3,2 <- i >>= 2 addi 8,3,1 <- i = i + 1 andi. 7,8,0x3 mr 10,8 beq 0,.L10 cmpdi 0,7,1 beq 0,.L14 cmpdi 0,7,2 beq 0,.L15 lxv 0,0(4) mr 8,3 li 9,16 stxv 0,0(5) .L15: lxvx 0,4,9 addi 8,8,-1 stxvx 0,5,9 addi 9,9,16 .L14: lxvx 0,4,9 cmpdi 0,8,1 stxvx 0,5,9 addi 9,9,16 beqlr 0 .L10: srdi 10,10,2 mtctr 10 .L3: lxvx 0,4,9 addi 10,9,16 addi 7,9,32 addi 8,9,48 stxvx 0,5,9 lxvx 0,4,10 addi 9,9,64 stxvx 0,5,10 lxvx 0,4,7 stxvx 0,5,7 lxvx 0,4,8 stxvx 0,5,8 bdnz .L3 blr In this case the 3 lines notated can be replaced a simple `srdi 8,3,2`
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #11 from Chip Kerchner --- Nevermind, using a similar example that Segher gave, it would failed too.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #10 from Chip Kerchner --- Oops that should be 31 * -2, not 33.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #9 from Chip Kerchner --- Doesn't this work for powers of two (N) and signed values (for A, N and M)? (59 - (33 * -2)) / -2 + 31 = -62 + 31 = -29 and 59 / -2 = -29
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #8 from Segher Boessenkool --- No, addition and subtraction are well defined for all inputs, for unsigned integers.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #7 from Peter Bergner --- (In reply to Segher Boessenkool from comment #6) > No? Take a=59 as counterexample: > > (a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2 For unsigned integers, isn't having a < N*M UB so we're free to do as we wish for either the optimized and non-optimized sequences? Meaning, can't we assume here that a >= N*M?
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #6 from Segher Boessenkool --- No? Take a=59 as counterexample: (a - (N*M)) / N + M = (59 - 2*30)/30 + 2 = ~0UL/30 + 2 but a / N = 59/30 = 1 Integer division in C is division towards zero, almost no normal algebraic simplifications apply there.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #5 from Peter Bergner --- (In reply to Segher Boessenkool from comment #4) > If N is a power of two optimising this to a/N is valid, but for other values > of N it is not (division is not the inverse of multiplication in C). It also > only works for unsigned of course. Isn't this valid for any N & M, such that M is a factor of N? Meaning, as long as N / M has no remainder, it seems like this should be ok. For example, N = 30 & M = 2 should be just as ok as the N = 32 & M = 2 case, correct?
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Segher Boessenkool changed: What|Removed |Added CC||segher at gcc dot gnu.org --- Comment #4 from Segher Boessenkool --- If N is a power of two optimising this to a/N is valid, but for other values of N it is not (division is not the inverse of multiplication in C). It also only works for unsigned of course.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #3 from anlauf at gcc dot gnu.org --- (In reply to Andrew Pinski from comment #2) > I am not sure this can be done in the normal case unless you know the range > of a to be [64...INF] . > The wrap around case might be an issue ... > But I am not 100% sure. It is. Just print foo(0).
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 --- Comment #2 from Andrew Pinski --- I am not sure this can be done in the normal case unless you know the range of a to be [64...INF] . The wrap around case might be an issue ... But I am not 100% sure.
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug tree-optimization/108757] We do not simplify (a - (N*M)) / N + M -> a / N
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108757 Peter Bergner changed: What|Removed |Added Ever confirmed|0 |1 Last reconfirmed||2023-02-10 CC||chip.kerchner at ibm dot com Target||powerpc64le-linux Status|UNCONFIRMED |NEW --- Comment #1 from Peter Bergner --- Reported by our Eigen developers and confirmed by me.