Re: Simplify X * C1 == C2 with undefined overflow
On Fri, Aug 07, 2020 at 11:36:59PM +0200, Marc Glisse wrote: > > > https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most > > > common way to compute the modular multiplicative inverse of a number. For > > > 3 > > > and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 > > > 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, > > > i.e. X==1431655767. > > > > wi::gcd ? > > That's the first place I looked, but this is only the regular Euclid > algorithm, not the extended one. It tells you what the gcd is, but does not > give you the coefficients of the Bézout identity. For 3 and 2^32, it would > tell me the gcd is 1, while I want the number 2863311531 (inverse of 3). > > Ah, found it, there is mod_inv hidden in expr.c! It can be certainly moved to wide-int.{h,cc} if you need it elsewhere. Jakub
Re: Simplify X * C1 == C2 with undefined overflow
On 07/08/20 21:57, Marc Glisse wrote: On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote: However, there are cases were the second division will not be possible without rest. Consider : 7*X == 3 3/7 is 0 rest 3. 0x1 / 7 is 0x24924924 rest 4 Since 3 cannot be represented as an integer multiple of -4, we can conclude that the predicate is always false. 613566757*7-2^32==3 Ah, right. I thought to decompose the variable factor into a non-overflowing and an overflowing part. But now I see that the former will not always be found with a standard division. Here we'd have to consider: 3 by 7 is 1 rest -4 .. and then I get your result above. But of course it's no longer an efficient algorithm if I have to test lots of non-canonical division results.
Re: Simplify X * C1 == C2 with undefined overflow
On Fri, 7 Aug 2020, Jakub Jelinek wrote: On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote: On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote: On 07/08/20 19:21, Marc Glisse wrote: If we are going to handle the wrapping case, we shouldn't limit to the non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 (for a 32 bit type), etc. Do we have an extended gcd somewhere in gcc, to help compute 1431655767? I don't quite see yet how this relates to gcd, https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most common way to compute the modular multiplicative inverse of a number. For 3 and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, i.e. X==1431655767. wi::gcd ? That's the first place I looked, but this is only the regular Euclid algorithm, not the extended one. It tells you what the gcd is, but does not give you the coefficients of the Bézout identity. For 3 and 2^32, it would tell me the gcd is 1, while I want the number 2863311531 (inverse of 3). Ah, found it, there is mod_inv hidden in expr.c! -- Marc Glisse
Re: Simplify X * C1 == C2 with undefined overflow
On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote: > On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote: > > > > > On 07/08/20 19:21, Marc Glisse wrote: > > > > > > If we are going to handle the wrapping case, we shouldn't limit to > > > the non-wrapping meaning of multiplicity. 3*X==5 should become > > > X==1431655767 (for a 32 bit type), etc. > > > > > > Do we have an extended gcd somewhere in gcc, to help compute 1431655767? > > I don't quite see yet how this relates to gcd, > > https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most > common way to compute the modular multiplicative inverse of a number. For 3 > and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 > 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, > i.e. X==1431655767. wi::gcd ? Jakub
Re: Simplify X * C1 == C2 with undefined overflow
On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote: On 07/08/20 19:21, Marc Glisse wrote: If we are going to handle the wrapping case, we shouldn't limit to the non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 (for a 32 bit type), etc. Do we have an extended gcd somewhere in gcc, to help compute 1431655767? I don't quite see yet how this relates to gcd, https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most common way to compute the modular multiplicative inverse of a number. For 3 and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, i.e. X==1431655767. However, there are cases were the second division will not be possible without rest. Consider : 7*X == 3 3/7 is 0 rest 3. 0x1 / 7 is 0x24924924 rest 4 Since 3 cannot be represented as an integer multiple of -4, we can conclude that the predicate is always false. 613566757*7-2^32==3 Also: 14*X == 28 has a non-zero power of two in the constant factor, so we have to factor out the power of two (if the right hand side has a lower power of two, the predicate is always false) and consider in modulo arithmetic with a number of bits less by the factored out power of two, i.e. 31 bit here: 7*X == 14 which is solved as above - but in 32 bit arithmetic - to X == 2 and to convert back to 32 bit arithmetic, we get: X & 0x7fff == 2 or 2*x == 4 Yes, we can reduce the size of the numbers a bit, but the gains won't be as nice for even numbers. I think the always-false case is already handled by CCP, tracking which bits can be 0 (I didn't check). -- Marc Glisse
Re: Simplify X * C1 == C2 with undefined overflow
On 07/08/20 19:21, Marc Glisse wrote: If we are going to handle the wrapping case, we shouldn't limit to the non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 (for a 32 bit type), etc. Do we have an extended gcd somewhere in gcc, to help compute 1431655767? I don't quite see yet how this relates to gcd, but how how about this: First, we divide the right hand side of the comparison by the constant factor. 5 divided by 3 is one, with rest 2. Now we want to express this rest two by wrapping overflow. 0x1 divided by 3 is 0x rest 1. When we multiply 0x again by three, the rest is missing, so we get -1 in 32 bit. So we try to express 2 as a multiple of -1: 2/-1 = -2 Thus, the quotient is 1 + 0x * -2 , which, using 32 bit wrapping arithmetic, is 0x5557, i.e. 1431655767 . Again, because the constant factor (3) is odd, there can be no other solution. However, there are cases were the second division will not be possible without rest. Consider : 7*X == 3 3/7 is 0 rest 3. 0x1 / 7 is 0x24924924 rest 4 Since 3 cannot be represented as an integer multiple of -4, we can conclude that the predicate is always false. Also: 14*X == 28 has a non-zero power of two in the constant factor, so we have to factor out the power of two (if the right hand side has a lower power of two, the predicate is always false) and consider in modulo arithmetic with a number of bits less by the factored out power of two, i.e. 31 bit here: 7*X == 14 which is solved as above - but in 32 bit arithmetic - to X == 2 and to convert back to 32 bit arithmetic, we get: X & 0x7fff == 2 or 2*x == 4 Likewise, 14*X == 30 would be reduced to 7*X == 15 in 31 bit arithmetic, then we find that we can't express the rest of 1 as an integer multiple of -2, so the predicate is always false. . (Even if the variable factor is wider than equality comparison, that is not a problem as long as the comparison is not widened by the transformation.) On the other hand, the following generalizations would work only without overflow: - handling of inequality-comparisons - merely have to account for the sign of the factor reversing the sense of the inequality, e.g. -3*X >= 15 ---> X <= 5 And again for this one, we have to be careful how we round the division, but we don't have to limit to the case where 15 is a multiple of 3. 3*X>7 can be replaced with X>2. Those are two nice suggestions. Do you intend to write a patch? No, I just couldn't resist applying a smidge of number theory to a compiler problem. I still have to herd other patches. Otherwise I'll try to do it eventually (no promise). Would be nice.
Re: Simplify X * C1 == C2 with undefined overflow
On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote: this transformation is quite straightforward, without overflow, 3*X==15 is the same as X==5 and 3*X==5 cannot happen. Actually, with binary and decimal computers, this transformation (with these specific numbers) is also valid for wrapping overflow. More generally, it is valid for wrapping overflow if the right hand side of the comparison is divisible without rest by the constant factor, and the constant factor has no sub-factor that is a zero divisor for the ring defined by the wrapping operation. For binary computers, the latter condition can be more simply be restated as: The constant factor has to be odd. (For decimal computers, it's: must not be divisible by two and/or five.) If we are going to handle the wrapping case, we shouldn't limit to the non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 (for a 32 bit type), etc. Do we have an extended gcd somewhere in gcc, to help compute 1431655767? (Even if the variable factor is wider than equality comparison, that is not a problem as long as the comparison is not widened by the transformation.) On the other hand, the following generalizations would work only without overflow: - handling of inequality-comparisons - merely have to account for the sign of the factor reversing the sense of the inequality, e.g. -3*X >= 15 ---> X <= 5 And again for this one, we have to be careful how we round the division, but we don't have to limit to the case where 15 is a multiple of 3. 3*X>7 can be replaced with X>2. Those are two nice suggestions. Do you intend to write a patch? Otherwise I'll try to do it eventually (no promise). -- Marc Glisse
Re: Simplify X * C1 == C2 with undefined overflow
this transformation is quite straightforward, without overflow, 3*X==15 is the same as X==5 and 3*X==5 cannot happen. Actually, with binary and decimal computers, this transformation (with these specific numbers) is also valid for wrapping overflow. More generally, it is valid for wrapping overflow if the right hand side of the comparison is divisible without rest by the constant factor, and the constant factor has no sub-factor that is a zero divisor for the ring defined by the wrapping operation. For binary computers, the latter condition can be more simply be restated as: The constant factor has to be odd. (For decimal computers, it's: must not be divisible by two and/or five.) (Even if the variable factor is wider than equality comparison, that is not a problem as long as the comparison is not widened by the transformation.) On the other hand, the following generalizations would work only without overflow: - handling of inequality-comparisons - merely have to account for the sign of the factor reversing the sense of the inequality, e.g. -3*X >= 15 ---> X <= 5 - If the right-hand-side constant is not a multiple of the constant factor, the product is always unequal, i.e. an EQ test would be always false, NE would be always true.
Re: Simplify X * C1 == C2 with undefined overflow
On Mon, 3 Aug 2020, Richard Biener wrote: On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse wrote: Hello, this transformation is quite straightforward, without overflow, 3*X==15 is the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction for the first case didn't seem necessary, although of course it can slightly increase register pressure in some cases. Bootstrap+regtest on x86_64-pc-linux-gnu. OK with using constant_boolean_node (cmp == NE_EXPR, type). ISTR we had the x * 0 == CST simplification somewhere but maybe it was x * y == 0 ... ah, yes: /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the signed arithmetic case. That form is created by the compiler often enough for folding it to be of value. One example is in computing loop trip counts after Operator Strength Reduction. */ (for cmp (simple_comparison) scmp (swapped_simple_comparison) (simplify (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2) As it is placed after your pattern it will be never matched I think (but we don't warn because of INTEGER_CST vs. integer_zerop). But I think your pattern subsumes it besides of the X * 0 == 0 compare - oh, and the other pattern also handles relational compares (those will still trigger). Maybe place the patterns next to each other? Also see whether moving yours after the above will cease the testcases to be handled because it's no longer matched - if not this might be the better order. I moved it after, it still works, so I pushed the patch. Note that the other transformation has a single_use restriction, while this one doesn't, that's not very consistent, but also hopefully not so important... -- Marc Glisse
Re: Simplify X * C1 == C2 with undefined overflow
On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse wrote: > > Hello, > > this transformation is quite straightforward, without overflow, 3*X==15 is > the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction > for the first case didn't seem necessary, although of course it can > slightly increase register pressure in some cases. > > Bootstrap+regtest on x86_64-pc-linux-gnu. OK with using constant_boolean_node (cmp == NE_EXPR, type). ISTR we had the x * 0 == CST simplification somewhere but maybe it was x * y == 0 ... ah, yes: /* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the signed arithmetic case. That form is created by the compiler often enough for folding it to be of value. One example is in computing loop trip counts after Operator Strength Reduction. */ (for cmp (simple_comparison) scmp (swapped_simple_comparison) (simplify (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2) As it is placed after your pattern it will be never matched I think (but we don't warn because of INTEGER_CST vs. integer_zerop). But I think your pattern subsumes it besides of the X * 0 == 0 compare - oh, and the other pattern also handles relational compares (those will still trigger). Maybe place the patterns next to each other? Also see whether moving yours after the above will cease the testcases to be handled because it's no longer matched - if not this might be the better order. Richard. > 2020-08-03 Marc Glisse > > PR tree-optimization/95433 > * match.pd (X * C1 == C2): New transformation. > > * gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid > undefined behavior. > * gcc.dg/tree-ssa/pr95433.c: New file. > > -- > Marc Glisse
Simplify X * C1 == C2 with undefined overflow
Hello, this transformation is quite straightforward, without overflow, 3*X==15 is the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction for the first case didn't seem necessary, although of course it can slightly increase register pressure in some cases. Bootstrap+regtest on x86_64-pc-linux-gnu. 2020-08-03 Marc Glisse PR tree-optimization/95433 * match.pd (X * C1 == C2): New transformation. * gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid undefined behavior. * gcc.dg/tree-ssa/pr95433.c: New file. -- Marc Glissediff --git a/gcc/match.pd b/gcc/match.pd index a052c9e3dbc..78fd8cf5d9e 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -1578,6 +1578,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && wi::neg_p (wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1 (cmp @2 @0)) +/* For integral types with undefined overflow fold + x * C1 == C2 into x == C2 / C1 or false. */ +(for cmp (eq ne) + (simplify + (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)) + && wi::to_wide (@1) != 0) + (with { widest_int quot; } +(if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1), + TYPE_SIGN (TREE_TYPE (@0)), )) + (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); }) + { build_int_cst (type, cmp == NE_EXPR); }) + /* (X - 1U) <= INT_MAX-1U into (int) X > 0. */ (for cmp (le gt) icmp (gt le) diff --git a/gcc/testsuite/gcc.c-torture/execute/pr23135.c b/gcc/testsuite/gcc.c-torture/execute/pr23135.c index e740ff52874..ef9b7efc9c4 100644 --- a/gcc/testsuite/gcc.c-torture/execute/pr23135.c +++ b/gcc/testsuite/gcc.c-torture/execute/pr23135.c @@ -1,7 +1,7 @@ /* Based on execute/simd-1.c, modified by joern.renne...@st.com to trigger a reload bug. Verified for gcc mainline from 20050722 13:00 UTC for sh-elf -m4 -O2. */ -/* { dg-options "-Wno-psabi" } */ +/* { dg-options "-Wno-psabi -fwrapv" } */ /* { dg-add-options stack_size } */ #ifndef STACK_SIZE diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c new file mode 100644 index 000..4e161ee26cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdump-tree-optimized" } */ + +int f(int x){return x*7==17;} +int g(int x){return x*3==15;} + +/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */ +/* { dg-final { scan-tree-dump "== 5;" "optimized" } } */