Re: Simplify X * C1 == C2 with undefined overflow

2020-08-08 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 07, 2020 at 11:36:59PM +0200, Marc Glisse wrote:
> > > https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
> > > common way to compute the modular multiplicative inverse of a number. For 
> > > 3
> > > and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
> > > 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
> > > i.e. X==1431655767.
> > 
> > wi::gcd ?
> 
> That's the first place I looked, but this is only the regular Euclid
> algorithm, not the extended one. It tells you what the gcd is, but does not
> give you the coefficients of the Bézout identity. For 3 and 2^32, it would
> tell me the gcd is 1, while I want the number 2863311531 (inverse of 3).
> 
> Ah, found it, there is mod_inv hidden in expr.c!

It can be certainly moved to wide-int.{h,cc} if you need it elsewhere.

Jakub



Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Joern Wolfgang Rennecke



On 07/08/20 21:57, Marc Glisse wrote:

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:


However, there are cases were the second division will not be 
possible without rest.

Consider : 7*X == 3
3/7 is 0 rest 3.
0x1 / 7 is 0x24924924 rest 4
Since 3 cannot be represented as an integer multiple of -4, we can 
conclude that the predicate

is always false.


613566757*7-2^32==3
Ah, right.  I thought to decompose the variable factor into a 
non-overflowing and an overflowing part.
But now I see that the former will not always be found with a standard 
division.  Here we'd have to

consider:
3 by 7 is 1 rest -4 .. and then I get your result above.  But of course 
it's no longer an efficient algorithm

if I have to test lots of non-canonical division results.




Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Marc Glisse

On Fri, 7 Aug 2020, Jakub Jelinek wrote:


On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote:

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:



On 07/08/20 19:21, Marc Glisse wrote:


If we are going to handle the wrapping case, we shouldn't limit to
the non-wrapping meaning of multiplicity. 3*X==5 should become
X==1431655767 (for a 32 bit type), etc.

Do we have an extended gcd somewhere in gcc, to help compute 1431655767?

I don't quite see yet how this relates to gcd,


https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
common way to compute the modular multiplicative inverse of a number. For 3
and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
i.e. X==1431655767.


wi::gcd ?


That's the first place I looked, but this is only the regular Euclid 
algorithm, not the extended one. It tells you what the gcd is, but does 
not give you the coefficients of the Bézout identity. For 3 and 2^32, it 
would tell me the gcd is 1, while I want the number 2863311531 (inverse of 
3).


Ah, found it, there is mod_inv hidden in expr.c!

--
Marc Glisse


Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Jakub Jelinek via Gcc-patches
On Fri, Aug 07, 2020 at 10:57:54PM +0200, Marc Glisse wrote:
> On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:
> 
> > 
> > On 07/08/20 19:21, Marc Glisse wrote:
> > > 
> > > If we are going to handle the wrapping case, we shouldn't limit to
> > > the non-wrapping meaning of multiplicity. 3*X==5 should become
> > > X==1431655767 (for a 32 bit type), etc.
> > > 
> > > Do we have an extended gcd somewhere in gcc, to help compute 1431655767?
> > I don't quite see yet how this relates to gcd,
> 
> https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most
> common way to compute the modular multiplicative inverse of a number. For 3
> and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32
> 2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5,
> i.e. X==1431655767.

wi::gcd ?

Jakub



Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Marc Glisse

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:



On 07/08/20 19:21, Marc Glisse wrote:


If we are going to handle the wrapping case, we shouldn't limit to the 
non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 
(for a 32 bit type), etc.


Do we have an extended gcd somewhere in gcc, to help compute 1431655767?

I don't quite see yet how this relates to gcd,


https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm is the most 
common way to compute the modular multiplicative inverse of a number. For 
3 and 2^32, it could tell us that 2863311531*3-2*2^32=1, so modulo 2^32 
2863311531*3==1, and 3*X==5 is the same as 2863311531*3*X==2863311531*5, 
i.e. X==1431655767.


However, there are cases were the second division will not be possible 
without rest.

Consider : 7*X == 3
3/7 is 0 rest 3.
0x1 / 7 is 0x24924924 rest 4
Since 3 cannot be represented as an integer multiple of -4, we can conclude 
that the predicate

is always false.


613566757*7-2^32==3


Also: 14*X == 28
has a non-zero power of two in the constant factor, so we have to factor out 
the power of two
(if the right hand side has a lower power of two, the predicate is always 
false) and consider
in modulo arithmetic with a number of bits less by the factored out power of 
two, i.e. 31 bit here:

7*X == 14
which is solved as above - but in 32 bit arithmetic - to
X == 2
and to convert back to 32 bit arithmetic, we get:
X & 0x7fff == 2
or
2*x == 4


Yes, we can reduce the size of the numbers a bit, but the gains won't be 
as nice for even numbers. I think the always-false case is already handled 
by CCP, tracking which bits can be 0 (I didn't check).


--
Marc Glisse


Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Joern Wolfgang Rennecke



On 07/08/20 19:21, Marc Glisse wrote:


If we are going to handle the wrapping case, we shouldn't limit to the 
non-wrapping meaning of multiplicity. 3*X==5 should become 
X==1431655767 (for a 32 bit type), etc.


Do we have an extended gcd somewhere in gcc, to help compute 1431655767?

I don't quite see yet how this relates to gcd, but how how about this:

First, we divide the right hand side of the comparison by the constant 
factor.  5 divided by 3
is one, with rest 2.  Now we want to express this rest two by wrapping 
overflow.
0x1 divided by 3 is 0x rest 1.  When we multiply 
0x again by three,
the rest is missing, so we get -1 in 32 bit.  So we try to express 2 as 
a multiple of -1:

2/-1 = -2

Thus, the quotient is 1 + 0x * -2 , which, using 32 bit wrapping 
arithmetic,

is 0x5557, i.e. 1431655767 .
Again, because the constant factor (3) is odd, there can be no other 
solution.


However, there are cases were the second division will not be possible 
without rest.

Consider : 7*X == 3
3/7 is 0 rest 3.
0x1 / 7 is 0x24924924 rest 4
Since 3 cannot be represented as an integer multiple of -4, we can 
conclude that the predicate

is always false.

Also: 14*X == 28
has a non-zero power of two in the constant factor, so we have to factor 
out the power of two
(if the right hand side has a lower power of two, the predicate is 
always false) and consider
in modulo arithmetic with a number of bits less by the factored out 
power of two, i.e. 31 bit here:

7*X == 14
which is solved as above - but in 32 bit arithmetic - to
X == 2
and to convert back to 32 bit arithmetic, we get:
X & 0x7fff == 2
or
2*x == 4


Likewise, 14*X == 30 would be reduced to 7*X == 15 in 31 bit arithmetic, 
then we find that
we can't express the rest of 1 as an integer multiple of -2, so the 
predicate is always false.

 .


(Even if the variable factor is wider than equality comparison, that 
is not a problem as long as the comparison is not widened by the 
transformation.)


On the other hand, the following generalizations would work only 
without overflow:
- handling of inequality-comparisons - merely have to account for the 
sign of the factor reversing the sense of

 the inequality, e.g. -3*X >= 15 ---> X <= 5


And again for this one, we have to be careful how we round the 
division, but we don't have to limit to the case where 15 is a 
multiple of 3. 3*X>7 can be replaced with X>2.



Those are two nice suggestions. Do you intend to write a patch?
No, I just couldn't resist applying a smidge of number theory to a 
compiler problem.

I still have to herd other patches.

Otherwise I'll try to do it eventually (no promise).

Would be nice.


Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Marc Glisse

On Fri, 7 Aug 2020, Joern Wolfgang Rennecke wrote:


this transformation is quite straightforward, without overflow, 3*X==15 is
the same as X==5 and 3*X==5 cannot happen.


Actually, with binary and decimal computers, this transformation (with 
these specific numbers) is also valid for wrapping overflow.  More 
generally, it is valid for wrapping overflow if the right hand side of 
the comparison is divisible without rest by the constant factor, and the 
constant factor has no sub-factor that is a zero divisor for the ring 
defined by the wrapping operation. For binary computers, the latter 
condition can be more simply be restated as: The constant factor has to 
be odd. (For decimal computers, it's: must not be divisible by two 
and/or five.)


If we are going to handle the wrapping case, we shouldn't limit to the 
non-wrapping meaning of multiplicity. 3*X==5 should become X==1431655767 
(for a 32 bit type), etc.


Do we have an extended gcd somewhere in gcc, to help compute 1431655767?

(Even if the variable factor is wider than equality comparison, that is 
not a problem as long as the comparison is not widened by the 
transformation.)


On the other hand, the following generalizations would work only without 
overflow:
- handling of inequality-comparisons - merely have to account for the sign of 
the factor reversing the sense of

 the inequality, e.g. -3*X >= 15 ---> X <= 5


And again for this one, we have to be careful how we round the division, 
but we don't have to limit to the case where 15 is a multiple of 3. 3*X>7 
can be replaced with X>2.



Those are two nice suggestions. Do you intend to write a patch? Otherwise 
I'll try to do it eventually (no promise).


--
Marc Glisse


Re: Simplify X * C1 == C2 with undefined overflow

2020-08-07 Thread Joern Wolfgang Rennecke

this transformation is quite straightforward, without overflow, 3*X==15 is
the same as X==5 and 3*X==5 cannot happen.


Actually, with binary and decimal computers, this transformation (with these 
specific numbers)
is also valid for wrapping overflow.  More generally, it is valid for wrapping 
overflow
if the right hand side of the comparison is divisible without rest by the 
constant factor,
and the constant factor has no sub-factor that is a zero divisor for the ring 
defined by the wrapping operation.
For binary computers, the latter condition can be more simply be restated as: 
The constant factor has to be odd.
(For decimal computers, it's: must not be divisible by two and/or five.)

(Even if the variable factor is wider than equality comparison, that is not a 
problem
 as long as the comparison is not widened by the transformation.)

On the other hand, the following generalizations would work only without 
overflow:
- handling of inequality-comparisons - merely have to account for the sign of 
the factor reversing the sense of
  the inequality, e.g. -3*X >= 15 ---> X <= 5
- If the right-hand-side constant is not a multiple of the constant factor, the 
product is always unequal, i.e.
  an EQ test would be always false, NE would be always true.



Re: Simplify X * C1 == C2 with undefined overflow

2020-08-04 Thread Marc Glisse

On Mon, 3 Aug 2020, Richard Biener wrote:


On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse  wrote:


Hello,

this transformation is quite straightforward, without overflow, 3*X==15 is
the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction
for the first case didn't seem necessary, although of course it can
slightly increase register pressure in some cases.

Bootstrap+regtest on x86_64-pc-linux-gnu.


OK with using constant_boolean_node (cmp == NE_EXPR, type).

ISTR we had the x * 0 == CST simplification somewhere
but maybe it was x * y == 0 ...  ah, yes:

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
  signed arithmetic case.  That form is created by the compiler
  often enough for folding it to be of value.  One example is in
  computing loop trip counts after Operator Strength Reduction.  */
(for cmp (simple_comparison)
scmp (swapped_simple_comparison)
(simplify
 (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)

As it is placed after your pattern it will be never matched I think
(but we don't warn because of INTEGER_CST vs. integer_zerop).

But I think your pattern subsumes it besides of the X * 0 == 0
compare - oh, and the other pattern also handles relational compares
(those will still trigger).

Maybe place the patterns next to each other?  Also see whether
moving yours after the above will cease the testcases to be handled
because it's no longer matched - if not this might be the better
order.


I moved it after, it still works, so I pushed the patch. Note that the 
other transformation has a single_use restriction, while this one doesn't, 
that's not very consistent, but also hopefully not so important...


--
Marc Glisse


Re: Simplify X * C1 == C2 with undefined overflow

2020-08-03 Thread Richard Biener via Gcc-patches
On Sat, Aug 1, 2020 at 9:29 AM Marc Glisse  wrote:
>
> Hello,
>
> this transformation is quite straightforward, without overflow, 3*X==15 is
> the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction
> for the first case didn't seem necessary, although of course it can
> slightly increase register pressure in some cases.
>
> Bootstrap+regtest on x86_64-pc-linux-gnu.

OK with using constant_boolean_node (cmp == NE_EXPR, type).

ISTR we had the x * 0 == CST simplification somewhere
but maybe it was x * y == 0 ...  ah, yes:

/* Transform comparisons of the form X * C1 CMP 0 to X CMP 0 in the
   signed arithmetic case.  That form is created by the compiler
   often enough for folding it to be of value.  One example is in
   computing loop trip counts after Operator Strength Reduction.  */
(for cmp (simple_comparison)
 scmp (swapped_simple_comparison)
 (simplify
  (cmp (mult@3 @0 INTEGER_CST@1) integer_zerop@2)

As it is placed after your pattern it will be never matched I think
(but we don't warn because of INTEGER_CST vs. integer_zerop).

But I think your pattern subsumes it besides of the X * 0 == 0
compare - oh, and the other pattern also handles relational compares
(those will still trigger).

Maybe place the patterns next to each other?  Also see whether
moving yours after the above will cease the testcases to be handled
because it's no longer matched - if not this might be the better
order.

Richard.

> 2020-08-03  Marc Glisse  
>
> PR tree-optimization/95433
> * match.pd (X * C1 == C2): New transformation.
>
> * gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid
> undefined behavior.
> * gcc.dg/tree-ssa/pr95433.c: New file.
>
> --
> Marc Glisse


Simplify X * C1 == C2 with undefined overflow

2020-08-01 Thread Marc Glisse

Hello,

this transformation is quite straightforward, without overflow, 3*X==15 is 
the same as X==5 and 3*X==5 cannot happen. Adding a single_use restriction 
for the first case didn't seem necessary, although of course it can 
slightly increase register pressure in some cases.


Bootstrap+regtest on x86_64-pc-linux-gnu.

2020-08-03  Marc Glisse  

PR tree-optimization/95433
* match.pd (X * C1 == C2): New transformation.

* gcc.c-torture/execute/pr23135.c: Add -fwrapv to avoid
undefined behavior.
* gcc.dg/tree-ssa/pr95433.c: New file.

--
Marc Glissediff --git a/gcc/match.pd b/gcc/match.pd
index a052c9e3dbc..78fd8cf5d9e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1578,6 +1578,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	&& wi::neg_p (wi::to_wide (@1), TYPE_SIGN (TREE_TYPE (@1
 (cmp @2 @0))
 
+/* For integral types with undefined overflow fold
+   x * C1 == C2 into x == C2 / C1 or false.  */
+(for cmp (eq ne)
+ (simplify
+  (cmp (mult @0 INTEGER_CST@1) INTEGER_CST@2)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
+   && wi::to_wide (@1) != 0)
+   (with { widest_int quot; }
+(if (wi::multiple_of_p (wi::to_widest (@2), wi::to_widest (@1),
+			TYPE_SIGN (TREE_TYPE (@0)), ))
+ (cmp @0 { wide_int_to_tree (TREE_TYPE (@0), quot); })
+ { build_int_cst (type, cmp == NE_EXPR); })
+
 /* (X - 1U) <= INT_MAX-1U into (int) X > 0.  */
 (for cmp (le gt)
  icmp (gt le)
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr23135.c b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
index e740ff52874..ef9b7efc9c4 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr23135.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr23135.c
@@ -1,7 +1,7 @@
 /* Based on execute/simd-1.c, modified by joern.renne...@st.com to
trigger a reload bug.  Verified for gcc mainline from 20050722 13:00 UTC
for sh-elf -m4 -O2.  */
-/* { dg-options "-Wno-psabi" } */
+/* { dg-options "-Wno-psabi -fwrapv" } */
 /* { dg-add-options stack_size } */
 
 #ifndef STACK_SIZE
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c
new file mode 100644
index 000..4e161ee26cc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr95433.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-optimized" } */
+
+int f(int x){return x*7==17;}
+int g(int x){return x*3==15;}
+
+/* { dg-final { scan-tree-dump "return 0;" "optimized" } } */
+/* { dg-final { scan-tree-dump "== 5;" "optimized" } } */