Re: [PATCH] Tree-level fix for PR 69526

2017-05-18 Thread Robin Dapp
> Hmm, won't "(uint32_t + uint32_t-CST) doesn't overflow" be a sufficient
> condition for such a transformation?

Yes, in principle this should suffice.  What we're actually looking for
is something like a "proper" (or no) overflow, i.e. an overflow in both
min and max of the value range.  In

 (a + cst1) + cst2

an overflow of (a + cst1) will still produce a valid range as long as
overflow(a_min + cst1) == overflow(a_max + cst1).  When max overflows
but min does not we must not simplify.  Currently, I'm checking if the
range of (a + cst1) is still a VR_RANGE, for now disregarding
ANTI_RANGEs which could most likely be dealt with as well.

A major discussion point about my last attempt was that it wrongly always
used sign extension when extending cst1 to the outer type.  This is now
changed to use sign extension when (a + cst1) "properly" overflows, as in

 ASSUME (a > 0);
 (unsigned long)(a + UINT_MAX) + 1;

resulting in (unsigned long)(a) + (unsigned long)0, and to use the type sign
of the constant otherwise, as in

 ASSUME (a < 2);
 (unsigned long)(a + 4294967294u) + 10;

resulting in (unsigned long)(a) + (unsigned long)((unsigned
long)4294967294 + (unsigned long)10).  The additional flag from the last
patch is not necessary.

The test suite is clean on s390x and x86; bootstrap was successful.

Regards
 Robin



Re: [PATCH] Tree-level fix for PR 69526

2017-05-11 Thread Bin.Cheng
On Tue, Jan 17, 2017 at 9:48 AM, Richard Biener
 wrote:
> On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp  wrote:
>> Perhaps I'm still missing how some cases are handled or not handled,
>> sorry for the noise.
>>
>>> I'm not sure there is anything to "interpret" -- the operation is unsigned
>>> and overflow is when the operation may wrap around zero.  There might
>>> be clever ways of re-writing the expression to
>>> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
>>> avoiding the overflow and thus allowing the transform but I'm not sure
>>> that's good.
>>
>> The extra work I introduced was to discern between
>>
>>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),
>>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),
>>
>> For a's range of [1,1] there is an overflow in both cases.
>> We still want to simplify the first case by combining UINT_MAX + 1 -> 0.
>
> So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not 
> zero.
>
> I think we're still searching for the proper condition on when it is
> allowed to re-write
>
>  (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST
>
> to
>
>  (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST
Hmm, won't "(uint32_t + uint32_t-CST) doesn't overflow" be a sufficient
condition for such a transformation?
>
> or to
>
>  (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST)
We don't actually need to prove this?  What complicates this is deciding
when it's safe to transform:
 (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST
into
 (uint64_t)(uint32_t + uint32_t-CSTx)
where
 uint32_t-CSTx = uint32_t-CST + (uint32_t)uint64_t-CST

Am I misunderstanding the whole thing?

Thanks,
bin
>
>> If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
>> (uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the
>> outer constant is larger than UINT_MAX. What else can we do here?
>> Do we see cases like the second one at all? If it's not needed, the
>> extra work is likely not needed.
>
> We do have the need to associate and simplify across conversions but of
> course we have to do it conservatively correct.
>
>>> A related thing would be canonicalizing unsigned X plus CST to
>>> unsigned X minus CST'
>>> if CST' has a smaller absolute value than CST.  I think currently we
>>> simply canonicalize
>>> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
>>> do, but only in a simplified manner).
>>
>> I can imagine this could simplify the treatment of some cases, yet I'm
>> already a bit lost with the current cases :)
>
> Yes, but I am lost a bit as well.  I don't know the correct conditions to test
> offhand -- I know we may not introduce new undefined overflow ops and
> of course we need to not compute wrong numbers either.
>
>>> That said, can we leave that "trick" out of the patch?  I think your
>>> more complicated
>>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply
>>> to all ops (like MULT_EXPR where more complicated cases can arise).
>>
>> There is certainly additional work to be done for MULT_EXPR, I
>> disregarded it so far. For this patch, I'd rather conservatively assume
>> overflow.
>
> Ok...
>
> Richard.
>
>> Regards
>>  Robin
>>


Re: [PATCH] Tree-level fix for PR 69526

2017-05-09 Thread Robin Dapp
Ping.



Re: [PATCH] Tree-level fix for PR 69526

2017-02-02 Thread Robin Dapp
I skimmed through the code to see where transformations like
(a - 1) -> (a + UINT_MAX) are performed.  It seems there are only two
places, match.pd (/* A - B -> A + (-B) if B is easily negatable.  */)
and fold-const.c.

In order to be able to reliably know whether to zero-extend or to
sign-extend the constant (1) in

 (ulong)(a - 1) + 1 -> (ulong)(a) + 0
 or -> (ulong)(a) + (ulong)(UINT_MAX + 1)

we'd have to know if the constant was converted by a transformation like
above.

I first tried removing the two transformations altogether but this
causes around 20 new regressions on s390x and I didn't go through all of
them to see whether they can be fixed. Is there a rationale for applying
the transformations other than canonicalization? I saw some
optimizations that only apply to PLUS_EXPRs but that can easily be
changed to also include MINUS_EXPR.

My other attempt entails an additional flag, TREE_WAS_SIGNED, for lack of
a better name.  It is set when the above transformation takes place.
I used (NODE)->base.deprecated_flag but there may be better choices.
A tentative/example patch is attached for reference.

Using this, the type extension in my original patch can be changed to
w1 = w1.from (w1, TYPE_PRECISION (type), TREE_WAS_SIGNED (@1)
  ? SIGNED : TYPE_SIGN (inner_type));
which works for me and does not introduce regressions on s390x.

Is this a viable approach?

Regards
 Robin
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c649e54..cbb7469 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9648,9 +9648,15 @@ fold_binary_loc (location_t loc,
 	   && (TREE_CODE (op1) != REAL_CST
 		   || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1))))
 	  || INTEGRAL_TYPE_P (type)))
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-fold_convert_loc (loc, type, arg0),
-negate_expr (op1));
+	{
+	  tree negtem = negate_expr (op1);
+	  if (CONSTANT_CLASS_P (negtem))
+	TREE_WAS_SIGNED (negtem) = 1;
+	  tree tem = fold_build2_loc (loc, PLUS_EXPR, type,
+  fold_convert_loc (loc, type, arg0),
+  negtem);
+	  return tem;
+	}
 
   /* Fold [i] - [j] to i-j.  */
   if (TREE_CODE (arg0) == ADDR_EXPR
@@ -13620,6 +13626,7 @@ fold_negate_const (tree arg0, tree type)
 	t = force_fit_type (type, val, 1,
 			(overflow | TREE_OVERFLOW (arg0))
 			&& !TYPE_UNSIGNED (type));
+	TREE_WAS_SIGNED (t) = TREE_WAS_SIGNED (arg0);
 	break;
   }
 
diff --git a/gcc/match.pd b/gcc/match.pd
index b3f2063..e791942 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -870,7 +870,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (minus @0 negate_expr_p@1)
  (if (!FIXED_POINT_TYPE_P (type))
- (plus @0 (negate @1
+  (with {
+   if (CONSTANT_CLASS_P (@1))
+ TREE_WAS_SIGNED (@1) = 1;
+   }
+ (plus @0 (negate @1)
 
 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
when profitable.
@@ -1223,8 +1227,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 		/* Extend @1 to TYPE.  Perform sign extension if the range
 		   overflowed but did not split.  */
-		w1 = w1.from (w1, TYPE_PRECISION (type), ovf.ovf ? SIGNED :
-		TYPE_SIGN (inner_type));
+		w1 = w1.from (w1, TYPE_PRECISION (type), TREE_WAS_SIGNED (@1)
+			  ? SIGNED : TYPE_SIGN (inner_type));
 		/*
 		w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
 		*/
diff --git a/gcc/tree.h b/gcc/tree.h
index 62cd7bb..1e7efd9 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -735,11 +735,18 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
+/* In a constant node, nonzero means the constant resulted from negating a signed value during folding (tentative).  */
+
+#define TREE_WAS_SIGNED(NODE) (CST_CHECK (NODE)->base.deprecated_flag)
+
 /* TREE_OVERFLOW can only be true for EXPR of CONSTANT_CLASS_P.  */
 
 #define TREE_OVERFLOW_P(EXPR) \
  (CONSTANT_CLASS_P (EXPR) && TREE_OVERFLOW (EXPR))
 
+#define TREE_WAS_SIGNED_P(EXPR) \
+ (CONSTANT_CLASS_P (EXPR) && TREE_WAS_SIGNED (EXPR))
+
 /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL,
nonzero means name is to be accessible from outside this translation unit.
In an IDENTIFIER_NODE, nonzero means an external declaration


Re: [PATCH] Tree-level fix for PR 69526

2017-01-17 Thread Richard Biener
On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp  wrote:
> Perhaps I'm still missing how some cases are handled or not handled,
> sorry for the noise.
>
>> I'm not sure there is anything to "interpret" -- the operation is unsigned
>> and overflow is when the operation may wrap around zero.  There might
>> be clever ways of re-writing the expression to
>> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
>> avoiding the overflow and thus allowing the transform but I'm not sure that's
>> good.
>
> The extra work I introduced was to discern between
>
>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),
>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),
>
> For a's range of [1,1] there is an overflow in both cases.
> We still want to simplify the first case by combining UINT_MAX + 1 -> 0.

So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not zero.

I think we're still searching for the proper condition on when it is allowed to
re-write

 (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST

to

 (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST

or to

 (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST)

> If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
> (uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the
> outer constant is larger than UINT_MAX. What else can we do here?
> Do we see cases like the second one at all? If it's not needed, the
> extra work is likely not needed.

We do have the need to associate and simplify across conversions but of
course we have to do it conservatively correct.

>> A related thing would be canonicalizing unsigned X plus CST to
>> unsigned X minus CST'
>> if CST' has a smaller absolute value than CST.  I think currently we
>> simply canonicalize
>> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
>> do, but only in a simplified manner).
>
> I can imagine this could simplify the treatment of some cases, yet I'm
> already a bit lost with the current cases :)

Yes, but I am lost a bit as well.  I don't know the correct conditions to test
offhand -- I know we may not introduce new undefined overflow ops and
of course we need to not compute wrong numbers either.

>> That said, can we leave that "trick" out of the patch?  I think your
>> more complicated
>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
>> ops (like MULT_EXPR where more complicated cases can arise).
>
> There is certainly additional work to be done for MULT_EXPR, I
> disregarded it so far. For this patch, I'd rather conservatively assume
> overflow.

Ok...

Richard.

> Regards
>  Robin
>


Re: [PATCH] Tree-level fix for PR 69526

2017-01-16 Thread Robin Dapp
Ping.

To put it shortly, I'm not sure how to differentiate between:

example range of a: [3,3]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend

example range of a: [0,0]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no
sign extend

In this case, there is an "ok" overflow in the first example's range and
no overflow in the second, but I don't think this suffices as criterion.
As said, your proposed canonicalization ("unsigned X plus CST to
unsigned X minus CST' if CST' has a smaller absolute value than CST")
might help here, but I'm unsure how to do it currently.



Re: [PATCH] Tree-level fix for PR 69526

2017-01-10 Thread Robin Dapp
Perhaps I'm still missing how some cases are handled or not handled,
sorry for the noise.

> I'm not sure there is anything to "interpret" -- the operation is unsigned
> and overflow is when the operation may wrap around zero.  There might
> be clever ways of re-writing the expression to
> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
> avoiding the overflow and thus allowing the transform but I'm not sure that's
> good.

The extra work I introduced was to discern between

 (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),
 (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),

For a's range of [1,1] there is an overflow in both cases.
We still want to simplify the first case by combining UINT_MAX + 1 -> 0.
If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
(uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the
outer constant is larger than UINT_MAX. What else can we do here?
Do we see cases like the second one at all? If it's not needed, the
extra work is likely not needed.

> A related thing would be canonicalizing unsigned X plus CST to
> unsigned X minus CST'
> if CST' has a smaller absolute value than CST.  I think currently we
> simply canonicalize
> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
> do, but only in a simplified manner).

I can imagine this could simplify the treatment of some cases, yet I'm
already a bit lost with the current cases :)

> That said, can we leave that "trick" out of the patch?  I think your
> more complicated
> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
> ops (like MULT_EXPR where more complicated cases can arise).

There is certainly additional work to be done for MULT_EXPR, I
disregarded it so far. For this patch, I'd rather conservatively assume
overflow.

Regards
 Robin



Re: [PATCH] Tree-level fix for PR 69526

2016-12-13 Thread Richard Biener
On Wed, Dec 7, 2016 at 5:14 PM, Robin Dapp  wrote:
>> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
>> produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore
>> overflow of the inner operation and for some reason your change
>> to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295
>> overflows but we ignored that.
>
> In this case the range of _8 was [1,1] so no overflow.
> I think the problem is rather about the interpretation of the inner
> constant. I tried discerning two cases now, a range split (i.e. when the
> single range of a binop variable becomes an anti range or two ranges
> after combining it with the binop's other range) and an overflow of the
> range's min and max.
> If the range didn't split, we can perform the simplification. If there
> was a min and max overflow, we have to interpret the inner constant as
> signed and perform a sign extension before converting it to the outer
> type. Without overflow we can use TYPE_SIGN (inner_type).
> Does this make sense?

I'm not sure there is anything to "interpret" -- the operation is unsigned
and overflow is when the operation may wrap around zero.  There might
be clever ways of re-writing the expression to
(uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
avoiding the overflow and thus allowing the transform but I'm not sure that's
good.

A related thing would be canonicalizing unsigned X plus CST to
unsigned X minus CST'
if CST' has a smaller absolute value than CST.  I think currently we
simply canonicalize
to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
do, but only in a simplified manner).

That said, can we leave that "trick" out of the patch?  I think your
more complicated
"overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
ops (like MULT_EXPR where more complicated cases can arise).

Richard.


> Included the remarks and attached the new version.
>
> Regards
>  Robin


Re: [PATCH] Tree-level fix for PR 69526

2016-12-07 Thread Robin Dapp
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
> produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore
> overflow of the inner operation and for some reason your change
> to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295
> overflows but we ignored that.

In this case the range of _8 was [1,1] so no overflow.
I think the problem is rather about the interpretation of the inner
constant. I tried discerning two cases now, a range split (i.e. when the
single range of a binop variable becomes an anti range or two ranges
after combining it with the binop's other range) and an overflow of the
range's min and max.
If the range didn't split, we can perform the simplification. If there
was a min and max overflow, we have to interpret the inner constant as
signed and perform a sign extension before converting it to the outer
type. Without overflow we can use TYPE_SIGN (inner_type).
Does this make sense?

Included the remarks and attached the new version.

Regards
 Robin
diff --git a/gcc/match.pd b/gcc/match.pd
index feaa4a1..19df142 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1195,6 +1195,81 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minus @0 (minus @0 @1))
@1)
 
+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (TREE_CODE (type) == INTEGER_TYPE
+		&& TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+	(with
+	{
+	  struct vr_binop_overflow ovf;
+	  tree cst = NULL_TREE;
+	  tree inner_type = TREE_TYPE (@3);
+	  value_range vr = VR_INITIALIZER;
+
+	  /* Convert combined constant to tree of outer type if
+		 there was no value range split in the original operation.  */
+	  if (TYPE_OVERFLOW_UNDEFINED (inner_type)
+		  || (extract_range_from_binary_expr (, inner_op,
+		inner_type, @0, @1, ), !ovf.range_split))
+	  {
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		wide_int combined_cst;
+
+		/* Extend @1 to TYPE.  Perform sign extension if the range
+		   overflowed but did not split.  */
+		w1 = w1.from (w1, TYPE_PRECISION (type), ovf.ovf ? SIGNED :
+		TYPE_SIGN (inner_type));
+
+		if (inner_op == MINUS_EXPR)
+		  w1 = wi::neg (w1);
+
+		if (outer_op == MINUS_EXPR)
+		  w2 = wi::neg (w2);
+
+		/* Combine in outer, larger type.  */
+		bool combine_ovf = true;
+		combined_cst = wi::add (w1, w2);
+
+		cst = wide_int_to_tree (type, combined_cst);
+	  }
+	}
+	(if (cst)
+	 (outer_op (convert @0) { cst; }))
+	)
+#endif
+
+  /* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert @0) INTEGER_CST@2)
+  (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	   && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	   && TREE_CODE (type) == INTEGER_TYPE)
+   /* Perform binary operation inside the cast if the constant fits
+	  and there is no overflow.  */
+   (with
+	{
+	  struct vr_binop_overflow ovf;
+
+	  int_fits_type_p (@2, TREE_TYPE (@0));
+	  tree cst_inner = fold_convert (TREE_TYPE (@0), @2);
+
+	  value_range vr = VR_INITIALIZER;
+	  extract_range_from_binary_expr (, outer_op, TREE_TYPE (@0),
+	  @0, cst_inner, );
+	}
+	(if (!ovf.range_split)
+	 (convert (outer_op @0 { cst_inner; })))
+	
+#endif
+
   /* (A +- CST1) +- CST2 -> A + CST3  */
   (for outer_op (plus minus)
(for inner_op (plus minus)
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
new file mode 100644
index 000..19b787b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 11 "ccp1" } } */
+
+#include 
+
+long foo(int a)
+{
+  return (long)(a - 2) + 1;
+}
+
+long bar(int a)
+{
+  return (long)(a + 3) - 1;
+}
+
+long baz(int a)
+{
+  return (long)(a - 1) + 2;
+}
+
+long baf(int a)
+{
+  return (long)(a + 1) - 2;
+}
+
+long bak(int a)
+{
+  return (long)(a + 1) + 3;
+}
+
+long bal(int a)
+{
+  return (long)(a - 7) - 4;
+}
+
+long bam(int a)
+{
+  return (long)(a - 1) - INT_MAX;
+}
+
+long bam2(int a)
+{
+  return (long)(a + 1) + INT_MAX;
+}
+
+long ban(int a)
+{
+  return (long)(a - 1) + INT_MIN;
+}
+
+long ban2(int a)
+{
+  return (long)(a + 1) - INT_MIN;
+}
+
+unsigned long baq(int a)
+{
+  return (unsigned long)(a + 1) - 1;
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
new file mode 100644
index 000..8befc96
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp1-details -fdump-tree-evrp-details 

Re: [PATCH] Tree-level fix for PR 69526

2016-12-06 Thread Richard Biener
On Mon, Nov 28, 2016 at 2:26 PM, Robin Dapp  wrote:
>>> + /* Sign-extend @1 to TYPE.  */
>>> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>>
>>> not sure why you do always sign-extend.  If the inner op is unsigned
>>> and we widen then that's certainly bogus considering your UINT_MAX
>>> example above.  Does
>>>
>>>w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN 
>>> (inner_type));
>>>
>>> not work for some reason?
>
> With TYPE_SIGN (inner_type), I encountered situations like this in the
> testsuite (mostly Fortran but also 2422-1.c):
>
>   Folding statement: _1 = _9 + 1;
>   Applying pattern match.pd:1214, gimple-match.c:8719
>   gimple_simplified to _93 = (sizetype) _8;
>   _1 = _93 + 4294967296;
>   Folded into: _1 = _93 + 4294967296;
>
> in
>   _8 = (unsigned int) i_73;
>   _5 = _8 + 4294967295;
>   _9 = (sizetype) _5;
>   _93 = (sizetype) _8;
>   _1 = _93 + 4294967296;
>
> With TYPE_SIGN (inner_type) the constant is zero-extended (stays
> +4294967295) here, although it would need to be sign-extended to -1 in
> order to combine -1 and +1 to 0. Perhaps this can be handled differently
> with keeping TYPE_SIGN (inner_type)?

So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore
overflow of the inner operation and for some reason your change
to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295
overflows but we ignored that.

> On the other hand, using SIGNED instead of TYPE_SIGN (inner_type) worked
> for all cases I looked at, like
>   if (a > 1 && a < 10)
> return (unsigned long)(a + UINT_MAX) + 1;
> For
>   if (a > 0 && a < 10)
> extract_range...() would not find a non-VR_VARYING range although we
> should probably still be able to simplify this.
>
>   if (a > 0)
> Here, vrp figures out [0,4294967294] and the simplification takes place.
>
> For
>   if (a <= 10)
> return (unsigned long)(a + (UINT_MAX - 10)) + 1;
> 003t.original already reads
> return (long unsigned int) a + 4294967286
>
> From this I hand-wavingly deduced, we'd only see instances where a
> sign-extension of the constant is fine (test suites on s390x and x86
> affirm this for what it's worth) but I don't have a cogent reason, hence
> my doubts :)

Well, even if sign-extending you can probably construct a testcase where
that's still wrong with respect to overflow.

Richard.

> I'm ok with omitting the sign-changing case (I hadn't thought of it
> anyway) and adapted the patch. I haven't attached the updated version
> though, because I'm still unsure about the issue above (despite the
> clean test suite).
>
> Regards
>  Robin
>


Re: [PATCH] Tree-level fix for PR 69526

2016-12-04 Thread Robin Dapp
Ping. Any idea how to tackle this?



Re: [PATCH] Tree-level fix for PR 69526

2016-11-28 Thread Robin Dapp
>> + /* Sign-extend @1 to TYPE.  */
>> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>
>> not sure why you do always sign-extend.  If the inner op is unsigned
>> and we widen then that's certainly bogus considering your UINT_MAX
>> example above.  Does
>>
>>w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN 
>> (inner_type));
>>
>> not work for some reason?

With TYPE_SIGN (inner_type), I encountered situations like this in the
testsuite (mostly Fortran but also 2422-1.c):

  Folding statement: _1 = _9 + 1;
  Applying pattern match.pd:1214, gimple-match.c:8719
  gimple_simplified to _93 = (sizetype) _8;
  _1 = _93 + 4294967296;
  Folded into: _1 = _93 + 4294967296;

in
  _8 = (unsigned int) i_73;
  _5 = _8 + 4294967295;
  _9 = (sizetype) _5;
  _93 = (sizetype) _8;
  _1 = _93 + 4294967296;

With TYPE_SIGN (inner_type) the constant is zero-extended (stays
+4294967295) here, although it would need to be sign-extended to -1 in
order to combine -1 and +1 to 0. Perhaps this can be handled differently
with keeping TYPE_SIGN (inner_type)?

On the other hand, using SIGNED instead of TYPE_SIGN (inner_type) worked
for all cases I looked at, like
  if (a > 1 && a < 10)
return (unsigned long)(a + UINT_MAX) + 1;
For
  if (a > 0 && a < 10)
extract_range...() would not find a non-VR_VARYING range although we
should probably still be able to simplify this.

  if (a > 0)
Here, vrp figures out [0,4294967294] and the simplification takes place.

For
  if (a <= 10)
return (unsigned long)(a + (UINT_MAX - 10)) + 1;
003t.original already reads
return (long unsigned int) a + 4294967286

From this I hand-wavingly deduced we'd only see instances where a
sign-extension of the constant is fine (test suites on s390x and x86
affirm this for what it's worth) but I don't have a cogent reason, hence
my doubts :)

I'm ok with omitting the sign-changing case (I hadn't thought of it
anyway) and adapted the patch. I haven't attached the updated version
though, because I'm still unsure about the issue above (despite the
clean test suite).

Regards
 Robin



Re: [PATCH] Tree-level fix for PR 69526

2016-11-28 Thread Richard Biener
On Fri, Nov 25, 2016 at 2:46 PM, Richard Biener
 wrote:
> On Wed, Nov 16, 2016 at 4:54 PM, Robin Dapp  wrote:
>> Found some time to look into this again.
>>
>>> Index: tree-ssa-propagate.c
>>> ===
>>> --- tree-ssa-propagate.c(revision 240133)
>>> +++ tree-ssa-propagate.c(working copy)
>>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>>/* Replace real uses in the statement.  */
>>>did_replace |= replace_uses_in (stmt, get_value_fn);
>>>
>>> -  /* If we made a replacement, fold the statement.  */
>>> -  if (did_replace)
>>> +  /* Fold the statement.  */
>>> +  if (fold_stmt (&i, follow_single_use_edges))
>>> {
>>> - fold_stmt (&i, follow_single_use_edges);
>>> + did_replace = true;
>>>   stmt = gsi_stmt (i);
>>> }
>>>
>>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>>> folding part
>>> of your patch).
>>
>> This causes an ICE and bootstrap errors with newer revisions. I tested
>> my patch on r240691 where it still works. How can this be done properly now?
>
> Not sure, I'd have to investigate.  It should still just work
> (throwing off a bootstrap
> with just that change over the weekend).

Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 242875)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -1065,13 +1065,15 @@ substitute_and_fold_dom_walker::before_d

   /* Replace real uses in the statement.  */
   did_replace |= replace_uses_in (stmt, get_value_fn);
+  if (did_replace)
+ gimple_set_modified (stmt, true);

   /* If we made a replacement, fold the statement.  */
-  if (did_replace)
+  if (fold_stmt (&i, follow_single_use_edges))
{
- fold_stmt (&i, follow_single_use_edges);
  stmt = gsi_stmt (i);
  gimple_set_modified (stmt, true);
+ did_replace = true;
}

   /* Some statements may be simplified using propagator

works fine for me.

> OTOH given that you use this to decide whether you can use a combined
> constant that doesn't look necessary (if the constant is correct, that
> is) -- what you need to make sure is that the final operation,
> (T)(A) +- CST, does not overflow if ! TYPE_OVERFLOW_WRAPS and there
> wasn't any overflow in the original operation.  I think that always
> holds, thus the combine_ovf check looks superfluous to me.
>>
>> Removed the check and addressed the other remarks.
>>
>>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
>>> does not overflow.  But we do not really care for that, we want to know
>>> whether (T)X + CST' might invoke undefined behavior when the original
>>> expression did not.  This involves range information on X.  I don't
>>> see how we ensure this here.
>>
>> I guess I'm still missing an undefined behavior case. In which case can
>> (T)X + CST' trigger undefined behavior where the original statement did
>> not? I see the need for checking in the second pattern ((T)(X) + CST' ->
>> (T)(X + CST')), of course.
>
> Looking at
>
> +  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
> +#if GIMPLE
> +   (for outer_op (plus minus)
> + (for inner_op (plus minus)
> +   (simplify
> +(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
> +  (if (TREE_CODE (type) == INTEGER_TYPE &&
> +   TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))
>
> so the conversion (T) is widening or sign-changing.  (&& go to the next line)
>
> If A + CST overflows we cannot do the transform (you check that
> with extract_range_from_binary_expr setting 'range_split').
>
> If A + CST does not overflow but is unsigned and we are just changing sign
> (precision ==) then (T)A + (CST + CST) might overflow.  Consider
> (int)(INT_MAX + 1) + 1 -> INT_MAX + 2.  I think here the important part
> is whether A + CST fits in T for the case where we just change the type
> to a type with !TYPE_OVERFLOW_WRAPS.  Certainly restricting to
> widenings would avoid the issue.
>
>>> But that's "easily fixable" by computing it in unsigned arithmetic, that is
>>> doing
>>>
>>>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))
>>
>> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
>> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
>> implementation-defined and not undefined so it should work?
>
> Yes, implementation-defined behavior is fine.
>
>> Revised patch version attached. One thing I'm still not sure about is
>> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
>> In the current patch version I always do a sign-extend of the first
>> constant (UINT_MAX here) which seems to cause no problems in the
>> testsuite and some custom tests still 

Re: [PATCH] Tree-level fix for PR 69526

2016-11-25 Thread Richard Biener
On Wed, Nov 16, 2016 at 4:54 PM, Robin Dapp  wrote:
> Found some time to look into this again.
>
>> Index: tree-ssa-propagate.c
>> ===
>> --- tree-ssa-propagate.c(revision 240133)
>> +++ tree-ssa-propagate.c(working copy)
>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>/* Replace real uses in the statement.  */
>>did_replace |= replace_uses_in (stmt, get_value_fn);
>>
>> -  /* If we made a replacement, fold the statement.  */
>> -  if (did_replace)
>> +  /* Fold the statement.  */
>> +  if (fold_stmt (&i, follow_single_use_edges))
>> {
>> - fold_stmt (&i, follow_single_use_edges);
>> + did_replace = true;
>>   stmt = gsi_stmt (i);
>> }
>>
>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>> folding part
>> of your patch).
>
> This causes an ICE and bootstrap errors with newer revisions. I tested
> my patch on r240691 where it still works. How can this be done properly now?

Not sure, I'd have to investigate.  It should still just work
(throwing off a bootstrap
with just that change over the weekend).

>> OTOH given that you use this to decide whether you can use a combined
>> constant that doesn't look necessary (if the constant is correct, that
>> is) -- what you need to make sure is that the final operation,
>> (T)(A) +- CST, does not overflow if ! TYPE_OVERFLOW_WRAPS and there
>> wasn't any overflow in the original operation.  I think that always
>> holds, thus the combine_ovf check looks superfluous to me.
>
> Removed the check and addressed the other remarks.
>
>> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
>> does not overflow.  But we do not really care for that, we want to know
>> whether (T)X + CST' might invoke undefined behavior when the original
>> expression did not.  This involves range information on X.  I don't
>> see how we ensure this here.
>
> I guess I'm still missing an undefined behavior case. In which case can
> (T)X + CST' trigger undefined behavior where the original statement did
> not? I see the need for checking in the second pattern ((T)(X) + CST' ->
> (T)(X + CST')), of course.

Looking at

+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+  (if (TREE_CODE (type) == INTEGER_TYPE &&
+   TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))

so the conversion (T) is widening or sign-changing.  (Style nit: the && belongs at the start of the next line.)

If A + CST overflows we cannot do the transform (you check that
with extract_range_from_binary_expr setting 'range_split').

If A + CST does not overflow but is unsigned and we are just changing sign
(precision ==) then (T)A + (CST + CST) might overflow.  Consider
(int)(INT_MAX + 1) + 1 -> INT_MAX + 2.  I think here the important part
is whether A + CST fits in T for the case where we just change the type
to a type with !TYPE_OVERFLOW_WRAPS.  Certainly restricting to
widenings would avoid the issue.

>> But that's "easily fixable" by computing it in unsigned arithmetic, that is
>> doing
>>
>>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))
>
> Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
> into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
> implementation-defined and not undefined so it should work?

Yes, implementation-defined behavior is fine.

> Revised patch version attached. One thing I'm still not sure about is
> the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
> In the current patch version I always do a sign-extend of the first
> constant (UINT_MAX here) which seems to cause no problems in the
> testsuite and some custom tests still worked.

+ /* Sign-extend @1 to TYPE.  */
+ w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);

not sure why you do always sign-extend.  If the inner op is unsigned
and we widen then that's certainly bogus considering your UINT_MAX
example above.  Does

   w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));

not work for some reason?

+ /* Combine in outer, larger type.  */
+ bool combine_ovf = true;
+ combined_cst = wi::add (w1, w2, SIGNED, &combine_ovf);

as you ignore combine_ovf you can simply use

  combined_cst = wi::add (w1, w2);

+ /* Convert combined constant to tree of outer type if
+there was no value range split in the original operation.  */
+ if (!range_split)
+   {

I'd say you want to condition on range_split early, like with

bool range_split;
if (TYPE_OVERFLOW_UNDEFINED (inner_type)
|| ! (extract_range_from_binary_expr (&vr, &range_split), range_split))
  {

Re: [PATCH] Tree-level fix for PR 69526

2016-11-24 Thread Robin Dapp
Ping.



Re: [PATCH] Tree-level fix for PR 69526

2016-11-16 Thread Robin Dapp
Found some time to look into this again.

> Index: tree-ssa-propagate.c
> ===
> --- tree-ssa-propagate.c(revision 240133)
> +++ tree-ssa-propagate.c(working copy)
> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>/* Replace real uses in the statement.  */
>did_replace |= replace_uses_in (stmt, get_value_fn);
> 
> -  /* If we made a replacement, fold the statement.  */
> -  if (did_replace)
> +  /* Fold the statement.  */
> +  if (fold_stmt (&i, follow_single_use_edges))
> {
> - fold_stmt (&i, follow_single_use_edges);
> + did_replace = true;
>   stmt = gsi_stmt (i);
> }
> 
> this would need compile-time cost evaluation (and avoid the tree-vrp.c
> folding part
> of your patch).

This causes an ICE and bootstrap errors with newer revisions. I tested
my patch on r240691 where it still works. How can this be done properly now?

> OTOH given that you use this to decide whether you can use a combined constant
> that doesn't look necessary (if the constant is correct, that is) --
> what you need
> to make sure is that the final operation, (T)(A) +- CST, does not overflow
> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
> original operation.  I think that always holds, thus the combine_ovf check 
> looks
> superfluous to me.

Removed the check and addressed the other remarks.

> So now we know that for (T)(X + CST1) + CST2, (T)CST1 + CST2
> does not overflow.  But we do not really care for that, we want to know
> whether (T)X + CST' might invoke undefined behavior when the original
> expression did not.  This involves range information on X.  I don't
> see how we ensure this here.

I guess I'm still missing an undefined behavior case. In which case can
(T)X + CST' trigger undefined behavior where the original statement did
not? I see the need for checking in the second pattern ((T)(X) + CST' ->
(T)(X + CST')), of course.

> But that's "easily fixable" by computing it in unsigned arithmetic, that is
> doing
> 
>   (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))

Does this also work if (unsigned long)a + (LONG_MAX + 2) does not fit
into [0,LONG_MAX]? IIRC (unsigned long)(LONG_MAX + 2) is
implementation-defined and not undefined so it should work?

Revised patch version attached. One thing I'm still not sure about is
the handling of sth. like (unsigned long)(a + UINT_MAX) + 1 for a = 0.
In the current patch version I always do a sign-extend of the first
constant (UINT_MAX here) which seems to cause no problems in the
testsuite and some custom tests still worked. Can UINT_MAX, -1 and
similar cases be disambiguated (and correctly converted to the outer
type) when done in unsigned arithmetic?

Thinking about the second pattern, on s390x it introduces more casts
than just using the first one (e.g. in cases where the value will be
sign-extended after the operation, which wouldn't have happened when
performing the operation in the larger type).  Can we catch this
with a cost function?

On a side note: Can/should VRP infer ranges assuming no undefined
behavior will take place when -fstrict-overflow is in use? I.e.
inferring ~[INT_MIN,INT_MIN] for (long)(a - 1)? Would this even make sense?

Regards
 Robin
diff --git a/gcc/match.pd b/gcc/match.pd
index ba7e013..3d3f5bb 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1150,6 +1150,86 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minus @0 (minus @0 @1))
@1)
 
+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (TREE_CODE (type) == INTEGER_TYPE &&
+		TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))
+	(with
+	{
+	  bool range_split = false;
+	  tree cst = NULL_TREE;
+	  tree inner_type = TREE_TYPE (@3);
+	  value_range vr = VR_INITIALIZER;
+
+	  if (TYPE_OVERFLOW_UNDEFINED (inner_type))
+		range_split = false;
+	  else
+		{
+		  /* Check the inner binop using VRP information.  */
+		  extract_range_from_binary_expr (&vr, inner_op,
+			  inner_type, @0, @1, &range_split);
+		}
+
+	  wide_int w1 = @1;
+	  wide_int w2 = @2;
+
+	  wide_int combined_cst;
+
+	  /* Sign-extend @1 to TYPE.  */
+	  w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
+
+	  if (inner_op == MINUS_EXPR)
+		w1 = wi::neg (w1);
+
+	  if (outer_op == MINUS_EXPR)
+		w2 = wi::neg (w2);
+
+	  /* Combine in outer, larger type.  */
+	  bool combine_ovf = true;
+	  combined_cst = wi::add (w1, w2, SIGNED, &combine_ovf);
+
+	  /* Convert combined constant to tree of outer type if
+		 there was no value range split in the original operation.  */
+	  if (!range_split)
+		{
+		  cst = wide_int_to_tree (type, combined_cst);
+		}
+	}
+	(if (cst)
+	 (outer_op (convert { @0; }) { cst; }))

Re: [PATCH] Tree-level fix for PR 69526

2016-10-14 Thread Richard Biener
On Tue, Sep 20, 2016 at 2:31 PM, Robin Dapp  wrote:
> Hi,
>
>> I meant to do sth like
>>
>> Index: tree-ssa-propagate.c
>> ===
>> --- tree-ssa-propagate.c(revision 240133)
>> +++ tree-ssa-propagate.c(working copy)
>> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>>/* Replace real uses in the statement.  */
>>did_replace |= replace_uses_in (stmt, get_value_fn);
>>
>> -  /* If we made a replacement, fold the statement.  */
>> -  if (did_replace)
>> +  /* Fold the statement.  */
>> +  if (fold_stmt (&i, follow_single_use_edges))
>> {
>> - fold_stmt (&i, follow_single_use_edges);
>> + did_replace = true;
>>   stmt = gsi_stmt (i);
>> }
>>
>> this would need compile-time cost evaluation (and avoid the tree-vrp.c
>> folding part
>> of your patch).
>
> Using this causes the simplifications to be performed in ccp1 instead of
> fwprop1. I noticed two tests failing that rely on propagation being
> performed in fwprop. Should these be adapted or rather the patch be changed?
>
>> Heh, but I think duplicating code is even worse.
>
> Ok, I changed extract_range_... after all, it's not so bad as I had
> thought.  Imho it should be simplified nevertheless, perhaps I'll do it
> in the future if I find some time.
>
>> + tree cst_inner = wide_int_to_tree (inner_type, cst);
>>
>> What prevents CST being truncated here?  It looks like
>> (long)intvar + 0xL will be mishandled that way.
>>
>
> Right, it may be truncated here, but the following TYPE_PRECISION ()
> guard prevents the value from being used in that case.  I pulled the
> line into the condition for clarity.
>
>> OTOH given that you use this to decide whether you can use a combined 
>> constant
>> that doesn't look necessary (if the constant is correct, that is) --
>> what you need
>> to make sure is that the final operation, (T)(A) +- CST, does not overflow
>> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
>> original operation.  I think that always holds, thus the combine_ovf check 
>> looks
>> superfluous to me.
>>
>
> I included this to account for cases like
>  return (long)(a + 2) + LONG_MAX
> which should not simplify as opposed to
>  return (unsigned long)(a + 2) + LONG_MAX
> where the constant is usable despite the overflow. Do you think it
> should be handled differently?

Well, (long)(a + 2) + LONG_MAX -> (long)a + (LONG_MAX + 2) is
indeed invalid because the resulting expression may overflow.  But
that's "easily fixable" by computing it in unsigned arithmetic, that is
doing

  (long)(a + 2) + LONG_MAX -> (long)((unsigned long)a + (LONG_MAX + 2))

which I believe is still desired?  Anyway, since the desired transform in the
end is (long)a + CST -> (long)(a + CST) this constant will not fit
here anyway...

Few comments on patch details:

+   if (TYPE_PRECISION (type) >= TYPE_PRECISION (inner_type))

please put the precision tests as a pattern if, thus

   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
  (if (TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@3)))
   (with
{
...

that makes it more obvious you are matching a widening conversion.

+   if (@0 == NULL_TREE || @1 == NULL_TREE
+   || inner_type == NULL_TREE

captures cannot be NULL_TREE nor can their type be NULL_TREE.  If you run
into such cases sth is broken elsewhere.

You have tests against INTEGER_TYPE - please put those alongside (even before)
the precision test.  Note that you probably want to use INTEGRAL_TYPE_P there
to also catch ENUM_TYPE and BOOLEAN_TYPE.

+(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)

you can use @0 to get at the type of inner_op.

For my taste there is too many temporaries, like 'check_inner_ovf'
where at the single
point of use a !TYPE_OVERFLOW_UNDEFINED (inner_type); would be much more
descriptive.

+   /* Convert to TYPE keeping possible signedness even when
+  dealing with unsigned types. */
+   wide_int v1;
+   v1 = v1.from (w1, w2.get_precision(), tree_int_cst_sgn (@1) ?
+ SIGNED : UNSIGNED);

better to comment it as /* Extend @1 to TYPE according to its sign.  */
I also think you want to use TYPE_SIGN (inner_type) instead of
tree_int_cst_sgn (@1) just to simplify the code.  You also want to
use TYPE_PRECISION (inner_type) instead of w2.get_precision ().

+   if (inner_op == MINUS_EXPR)
+ w1 = wi::neg (w1);

I think you want to negate v1 here?  (better not introduce v1 but use w1
for the extension?)

+   /* Combine in outer, larger type.  */
+   combined_cst = wi::add (v1, w2
+ 

Re: [PATCH] Tree-level fix for PR 69526

2016-10-14 Thread Robin Dapp
Ping :)



Re: [PATCH] Tree-level fix for PR 69526

2016-10-05 Thread Robin Dapp
Ping.



Re: [PATCH] Tree-level fix for PR 69526

2016-09-20 Thread Jeff Law

On 09/20/2016 06:31 AM, Robin Dapp wrote:

Hi,


I meant to do sth like

Index: tree-ssa-propagate.c
===
--- tree-ssa-propagate.c(revision 240133)
+++ tree-ssa-propagate.c(working copy)
@@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
   /* Replace real uses in the statement.  */
   did_replace |= replace_uses_in (stmt, get_value_fn);

-  /* If we made a replacement, fold the statement.  */
-  if (did_replace)
+  /* Fold the statement.  */
+  if (fold_stmt (&i, follow_single_use_edges))
{
- fold_stmt (&i, follow_single_use_edges);
+ did_replace = true;
  stmt = gsi_stmt (i);
}

this would need compile-time cost evaluation (and avoid the tree-vrp.c
folding part
of your patch).


Using this causes the simplifications to be performed in ccp1 instead of
fwprop1. I noticed two tests failing that rely on propagation being
performed in fwprop. Should these be adapted or rather the patch be changed?
ISTM this is an indication that something changed the IL without folding 
prior to ccp & forwprop.  That's not in and of itself bad, but may be 
worth investigating to see if whatever prior pass made this change 
ought to be adjusted.


That investigation would likely guide you with what to do with the 
testcase.  If you find that the earlier pass should have folded, then 
you fix it and change the testcase to verify the earlier pass folded 
properly.  Else you change the testcase to verify ccp folds the statement.


I'm going to let Richi own the review on this.  Just thought I'd chime 
in on that one topic.

jeff


Re: [PATCH] Tree-level fix for PR 69526

2016-09-20 Thread Robin Dapp
Hi,

> I meant to do sth like
> 
> Index: tree-ssa-propagate.c
> ===
> --- tree-ssa-propagate.c(revision 240133)
> +++ tree-ssa-propagate.c(working copy)
> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
>/* Replace real uses in the statement.  */
>did_replace |= replace_uses_in (stmt, get_value_fn);
> 
> -  /* If we made a replacement, fold the statement.  */
> -  if (did_replace)
> +  /* Fold the statement.  */
> +  if (fold_stmt (&i, follow_single_use_edges))
> {
> - fold_stmt (&i, follow_single_use_edges);
> + did_replace = true;
>   stmt = gsi_stmt (i);
> }
> 
> this would need compile-time cost evaluation (and avoid the tree-vrp.c
> folding part
> of your patch).

Using this causes the simplifications to be performed in ccp1 instead of
fwprop1. I noticed two tests failing that rely on propagation being
performed in fwprop. Should these be adapted or rather the patch be changed?

> Heh, but I think duplicating code is even worse.

Ok, I changed extract_range_... after all, it's not so bad as I had
thought.  Imho it should be simplified nevertheless, perhaps I'll do it
in the future if I find some time.

> + tree cst_inner = wide_int_to_tree (inner_type, cst);
> 
> What prevents CST being truncated here?  It looks like
> (long)intvar + 0xL will be mishandled that way.
> 

Right, it may be truncated here, but the following TYPE_PRECISION ()
guard prevents the value from being used in that case.  I pulled the
line into the condition for clarity.

> OTOH given that you use this to decide whether you can use a combined constant
> that doesn't look necessary (if the constant is correct, that is) --
> what you need
> to make sure is that the final operation, (T)(A) +- CST, does not overflow
> if ! TYPE_OVERFLOW_WRAPS and there wasn't any overflow in the
> original operation.  I think that always holds, thus the combine_ovf check 
> looks
> superfluous to me.
> 

I included this to account for cases like
 return (long)(a + 2) + LONG_MAX
which should not simplify as opposed to
 return (unsigned long)(a + 2) + LONG_MAX
where the constant is usable despite the overflow. Do you think it
should be handled differently?

Revised version attached.

Regards
 Robin

--

gcc/ChangeLog:

2016-09-20  Robin Dapp  

PR middle-end/69526
This enables combining of wrapped binary operations and fixes
the tree level part of the PR.

* match.pd: Introduce patterns to simplify binary operations
wrapped inside a cast.
* tree-vrp.h: Export extract_range_from_binary_expr ().
* tree-ssa-propagate.c
(substitute_and_fold_dom_walker::before_dom_children):
Try to fold regardless of prior use propagations.
* tree-vrp.c (extract_range_from_binary_expr_1):
Add overflow check arguments and wrapper function without them.
(extract_range_from_binary_expr):
 Add wrapper function.


gcc/testsuite/ChangeLog:

2016-09-20  Robin Dapp  

* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-signed-2.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.
diff --git a/gcc/match.pd b/gcc/match.pd
index 123e754..f624e7e 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1138,6 +1138,130 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minus @0 (minus @0 @1))
@1)
 
+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (with
+	{
+	   /* If the inner operation is wrapped inside a conversion, we have
+	  to check it overflows/underflows and can only then perform the
+	  simplification, i.e. add the second constant to the first
+	  (wrapped) and convert afterwards.  */
+	bool inner_ovf = false;
+
+	bool combine_ovf = true;
+	tree cst = NULL_TREE;
+	bool combine_constants = false;
+	bool types_ok = false;
+
+	tree inner_type = TREE_TYPE (@3);
+	/* Check overflow in wrapped binop? */
+	bool check_inner_ovf = !TYPE_OVERFLOW_UNDEFINED (inner_type);
+	bool check_combine_ovf = !TYPE_OVERFLOW_WRAPS (type);
+
+	signop s1 = TYPE_SIGN (type);
+
+	if (TYPE_PRECISION (type) >= TYPE_PRECISION (inner_type))
+	  {
+		types_ok = true;
+
+		/* Check if the inner binop does not overflow i.e. we have VRP
+		   information and can prove it.  */
+		if (check_inner_ovf)
+		  {
+		value_range vr = VR_INITIALIZER;
+		if (@0 == NULL_TREE || @1 == NULL_TREE
+			|| inner_type == NULL_TREE
+			|| TREE_CODE (type) != INTEGER_TYPE)
+		  {
+			inner_ovf = true;
+		  }
+		else
+		  

Re: [PATCH] Tree-level fix for PR 69526

2016-09-14 Thread Jeff Law

[ Another patch I'd started working through, but hadn't completed... ]

On 09/14/2016 07:05 AM, Richard Biener wrote:

On Mon, Aug 22, 2016 at 4:58 PM, Robin Dapp  wrote:

if !inner_ovf (just set that to false if !check_inner_ovf to simplify
checks please).
you know it's valid to transform the op to (T)@0 innerop (T)@1 outerop @2
(if T is wider than the inner type which I think you should check and
which should
simplify things).


I simplified the control flow a little and split both parts of the
combination into separate patterns.


You can then combine (T)@1 and @2 where I think you fail to handle mixed
MINUS_EXPR/PLUS_EXPR (practically you'll see only PLUS_EXPRs for

integers).

Is it ok to do something like this (see patch) in order to deal with
MINUS_EXPR and then perform a wi::add?

if (inner_op == MINUS_EXPR)
w1 = wi::neg (w1);

if (outer_op == MINUS_EXPR)
w2 = wi::neg (w2);


Yes.

I was concerned about the case where w1 or w2 might be the minimum 
(negative) integer for its type.  That creates a constant that can't be 
represented.  I'm not familiar enough with the rest of hte code yet to 
know if that's problematical.






tree.c doesn't look like the best fit.  I think putting it into
tree-vrp.c is better
and I think that extract_range_from_binary_expr_1 itself should

compute what we

want here as additional output.  Conservative handling for all but

plus/minus is

ok with me.


I kept the extra function for now because I find
extract_range_from_binary_expr_1 somewhat lengthy and hard to follow
already :)


Heh, but I think duplicating code is even worse.
I was going to suggest a pre-patch that just refactored the various 
cases in extract_range_from_binary_expr_1 into their own functions. 
That might make it easier to keep things manageable.


[ ... ]



Now looking at binop_overflow (that I don't like very much, but ...)
Note that the naked "return true" in there makes 95% of that function 
unreachable code.  That's where I stopped without looking deeply at the 
rest of the code.


Jeff


Re: [PATCH] Tree-level fix for PR 69526

2016-09-14 Thread Richard Biener
On Mon, Aug 22, 2016 at 4:58 PM, Robin Dapp  wrote:
>> if !inner_ovf (just set that to false if !check_inner_ovf to simplify
>> checks please).
>> you know it's valid to transform the op to (T)@0 innerop (T)@1 outerop @2
>> (if T is wider than the inner type which I think you should check and
>> which should
>> simplify things).
>
> I simplified the control flow a little and split both parts of the
> combination into separate patterns.
>
>> You can then combine (T)@1 and @2 where I think you fail to handle mixed
>> MINUS_EXPR/PLUS_EXPR (practically you'll see only PLUS_EXPRs for
> integers).
>
> Is it ok to do something like this (see patch) in order to deal with
> MINUS_EXPR and then perform a wi::add?
>
> if (inner_op == MINUS_EXPR)
> w1 = wi::neg (w1);
>
> if (outer_op == MINUS_EXPR)
> w2 = wi::neg (w2);

Yes.

>> The testcase is rather unspecific as to what testcases should fold and
> what not
>> given your very simple scan and mixing should/should-not simplify
> cases.  Please
>> consider splitting it up and make it a run test that verifies the
>> bogus transforms
>> do not take place.
>
> I divided the testcases into signed/unsigned and should simplify/should
> not simplify. The bogus unsigned transforms are now checked via assert().
>
>> Doing
> [...]
>> causes such stmts to be folded twice as substitute_and_fold does
> [...]
>> which is less than ideal.  I think that given we have fold_stmt following
>> SSA edges and thus not only stmts we propagated into are possibly
>> interesting to fold (but also their single-uses, recursively), we should
>> evaluate the compile-time cost of doing the fold_stmt unconditionally.
>
> Only using update_stmt seems to avoid the double call to fold_stmt but
> is obviously the wrong thing to do as this then tries to update
> everything that matches the pattern in tree-vrp.c. Copying some checks
> from match.pd to tree-vrp.c would help but isn't there a proper way to
> prevent duplicating the checking logic?

I think what triggers it is that you return 'true', not that you call
update_stmt.

> In addition, is the compile-time cost checking something I could do for
> this patch or a general thing to be done later?

I meant to do sth like

Index: tree-ssa-propagate.c
===
--- tree-ssa-propagate.c(revision 240133)
+++ tree-ssa-propagate.c(working copy)
@@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker::before_d
   /* Replace real uses in the statement.  */
   did_replace |= replace_uses_in (stmt, get_value_fn);

-  /* If we made a replacement, fold the statement.  */
-  if (did_replace)
+  /* Fold the statement.  */
+  if (fold_stmt (&i, follow_single_use_edges))
{
- fold_stmt (&i, follow_single_use_edges);
+ did_replace = true;
  stmt = gsi_stmt (i);
}

this would need compile-time cost evaluation (and avoid the tree-vrp.c
folding part
of your patch).

>> tree.c doesn't look like the best fit.  I think putting it into
>> tree-vrp.c is better
>> and I think that extract_range_from_binary_expr_1 itself should
> compute what we
>> want here as additional output.  Conservative handling for all but
> plus/minus is
>> ok with me.
>
> I kept the extra function for now because I find
> extract_range_from_binary_expr_1 somewhat lengthy and hard to follow
> already :)

Heh, but I think duplicating code is even worse.

Just looking at the simpler pattern right now:

+  /* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert @0) INTEGER_CST@2)
+ /* Perform binary operation inside the cast if the constant fits
+   and there is no overflow.  */
+   (with
+   {
+ bool simplify = false;
+
+ wide_int cst = @2;
+ tree inner_type = TREE_TYPE (@0);
+ tree cst_inner = wide_int_to_tree (inner_type, cst);

What prevents CST being truncated here?  It looks like
(long)intvar + 0xL will be mishandled that way.

+ bool inner_ovf = true;
+
+ if (TYPE_PRECISION (inner_type) >= TYPE_PRECISION (type)
+ || TREE_CODE (inner_type) != INTEGER_TYPE)
+   simplify = false;
+ else
+ {
+   inner_ovf = binop_overflow (outer_op, @0, cst_inner, inner_type);
+   if (!inner_ovf)
+ simplify = true;
+ }
+   }
+   (if (simplify && @0)

@0 is always "true"

+(convert (outer_op @0 (convert { @2; }

You already have @2 converted -- cst_inner. Thus just use

(convert (outer_op @0 { cst_inner; }))

+   )))
+#endif


Now onto the other pattern.

+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+(outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+  

Re: [PATCH] Tree-level fix for PR 69526

2016-09-05 Thread Robin Dapp
Ping.
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2beadbc..d66fcb1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
 #include "gimplify.h"
+#include "tree-vrp.h"
 
 
 /* Forward declarations of the private auto-generated matchers.
diff --git a/gcc/match.pd b/gcc/match.pd
index b24bfb4..f54b734 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1119,6 +1119,113 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minus @0 (minus @0 @1))
@1)
 
+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (with
+	{
+	   /* If the inner operation is wrapped inside a conversion, we have to
+	  check it overflows/underflows and can only then perform the
+	  simplification, i.e. add the second constant to the first
+	  (wrapped) and convert afterwards.  This fixes
+	  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526 */
+	bool inner_ovf = false;
+
+	bool combine_ovf = true;
+	tree cst = NULL_TREE;
+	bool combine_constants = false;
+	bool types_ok = false;
+
+	tree inner_type = TREE_TYPE (@3);
+	/* Check overflow in wrapped binop? */
+	bool check_inner_ovf = !TYPE_OVERFLOW_UNDEFINED (inner_type);
+	/* Check overflow when combining constants? */
+	bool check_combine_ovf = TYPE_OVERFLOW_UNDEFINED (type);
+
+	signop s1 = TYPE_SIGN (type);
+
+	if (TYPE_PRECISION (type) >= TYPE_PRECISION (inner_type))
+	  {
+		types_ok = true;
+
+		/* Check if the inner binop does not overflow i.e. we have VRP
+		   information and can prove it.  */
+		if (check_inner_ovf)
+		  inner_ovf = binop_overflow (inner_op, @0, @1, inner_type);
+		else
+		  inner_ovf = false;
+
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		wide_int combined_cst;
+
+		/* Convert to TYPE keeping possible signedness even when
+		   dealing with unsigned types. */
+		wide_int v1;
+		v1 = v1.from (w1, w2.get_precision(), tree_int_cst_sign_bit
+			  (@1) ? SIGNED : UNSIGNED);
+
+		if (inner_op == MINUS_EXPR)
+		  w1 = wi::neg (w1);
+
+		if (outer_op == MINUS_EXPR)
+		  w2 = wi::neg (w2);
+
+		/* Combine in outer, larger type */
+		combined_cst = wi::add (v1, w2, TYPE_SIGN (type), &combine_ovf);
+
+		/* The combined constant is ok if its type does not exhibit
+		   undefined overflow or there was no overflow when
+		   combining.  */
+		bool combined_cst_ok = !check_combine_ovf || !combine_ovf;
+		if (!inner_ovf && combined_cst_ok)
+		  {
+		/* convert to tree of outer type */
+		cst = wide_int_to_tree (type, combined_cst);
+		combine_constants = true;
+		  }
+	  }
+	  }
+	(if (types_ok && combine_constants && cst)
+	 (outer_op (convert { @0; }) { cst; }))
+	
+#endif
+
+  /* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert @0) INTEGER_CST@2)
+ /* Perform binary operation inside the cast if the constant fits
+	and there is no overflow.  */
+   (with
+	{
+	  bool simplify = false;
+
+	  wide_int cst = @2;
+	  tree inner_type = TREE_TYPE (@0);
+	  tree cst_inner = wide_int_to_tree (inner_type, cst);
+	  bool inner_ovf = true;
+
+	  if (TYPE_PRECISION (inner_type) >= TYPE_PRECISION (type)
+	  || TREE_CODE (inner_type) != INTEGER_TYPE)
+	simplify = false;
+	  else
+	  {
+	inner_ovf = binop_overflow (outer_op, @0, cst_inner, inner_type);
+	if (!inner_ovf)
+	  simplify = true;
+	  }
+	}
+	(if (simplify && @0)
+	 (convert (outer_op @0 (convert { @2; }
+	)))
+#endif
+
   /* (A +- CST) +- CST -> A + CST  */
   (for outer_op (plus minus)
(for inner_op (plus minus)
@@ -1132,6 +1239,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (cst && !TREE_OVERFLOW (cst))
(inner_op @0 { cst; } ))
 
+
   /* (CST - A) +- CST -> CST - A  */
   (for outer_op (plus minus)
(simplify
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
new file mode 100644
index 000..0afe99a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
@@ -0,0 +1,76 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 11 "forwprop1" } } */
+
+#include 
+
+// should simplify, ignore possible signed overflow in (a - 2) and combine
+// constants
+long foo(int a)
+{
+  return (long)(a - 2) + 1;
+}
+
+// should simplify
+long bar(int a)
+{
+  return (long)(a + 3) - 1;
+}
+
+// should simplify with combined constant in outer type
+long baz(int a)
+{
+  return (long)(a - 1) + 2;
+}
+
+// should simplify
+long baf(int a)
+{
+  return (long)(a + 1) - 2;
+}
+
+// should simplify (same as above)
+long bak(int a)
+{

Re: [PATCH] Tree-level fix for PR 69526

2016-08-23 Thread Robin Dapp
gah, this

+  return true;
+  if (TREE_CODE (t1) != SSA_NAME)

should of course be like this

+  if (TREE_CODE (t1) != SSA_NAME)
+  return true;

in the last patch.



Re: [PATCH] Tree-level fix for PR 69526

2016-08-22 Thread Robin Dapp
> if !inner_ovf (just set that to false if !check_inner_ovf to simplify
> checks please).
> you know it's valid to transform the op to (T)@0 innerop (T)@1 outerop @2
> (if T is wider than the inner type which I think you should check and
> which should
> simplify things).

I simplified the control flow a little and split both parts of the
combination into separate patterns.

> You can then combine (T)@1 and @2 where I think you fail to handle mixed
> MINUS_EXPR/PLUS_EXPR (practically you'll see only PLUS_EXPRs for
integers).

Is it ok to do something like this (see patch) in order to deal with
MINUS_EXPR and then perform a wi::add?

if (inner_op == MINUS_EXPR)
w1 = wi::neg (w1);

if (outer_op == MINUS_EXPR)
w2 = wi::neg (w2);

> The testcase is rather unspecific as to what testcases should fold and
> what not, given your very simple scan and mixing of should/should-not
> simplify cases.  Please consider splitting it up and making it a run
> test that verifies the bogus transforms do not take place.

I divided the testcases into signed/unsigned and should simplify/should
not simplify. The bogus unsigned transforms are now checked via assert().

> Doing
[...]
> causes such stmts to be folded twice as substitute_and_fold does
[...]
> which is less than ideal.  I think that given we have fold_stmt following
> SSA edges and thus not only stmts we propagated into are possibly
> interesting to fold (but also their single-uses, recursively), we should
> evaluate the compile-time cost of doing the fold_stmt unconditionally.

Only using update_stmt seems to avoid the double call to fold_stmt but
is obviously the wrong thing to do as this then tries to update
everything that matches the pattern in tree-vrp.c. Copying some checks
from match.pd to tree-vrp.c would help but isn't there a proper way to
prevent duplicating the checking logic?
In addition, is the compile-time cost checking something I could do for
this patch or a general thing to be done later?

> tree.c doesn't look like the best fit.  I think putting it into
> tree-vrp.c is better, and I think that extract_range_from_binary_expr_1
> itself should compute what we want here as additional output.
> Conservative handling for all but plus/minus is ok with me.

I kept the extra function for now because I find
extract_range_from_binary_expr_1 somewhat lengthy and hard to follow
already :) Wouldn't it be better to "separate concerns"/split it up in
the long run and merge the functionality needed here at some time?

Bootstrapped and reg-tested on s390x, bootstrap on x86 running.

Regards
 Robin


gcc/ChangeLog:

2016-08-22  Robin Dapp  

* match.pd: Introduce patterns to simplify binary operations wrapped
inside a cast.
* tree-vrp.c (binop_overflow): New function.
(simplify_plus_or_minus_using_ranges): Make use of newly introduced
patterns.
(simplify_stmt_using_ranges): Call simplify_plus_or_minus_using_ranges.
* tree-vrp.h: New file.
* gimple-match-head.c: Include tree-vrp.h.


gcc/testsuite/ChangeLog:

2016-08-22  Robin Dapp  

* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-signed-2.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2beadbc..d66fcb1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
 #include "gimplify.h"
+#include "tree-vrp.h"
 
 
 /* Forward declarations of the private auto-generated matchers.
diff --git a/gcc/match.pd b/gcc/match.pd
index b24bfb4..f54b734 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1119,6 +1119,113 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(minus @0 (minus @0 @1))
@1)
 
+  /* ((T)(A +- CST)) +- CST -> (T)(A) +- CST  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (with
+	{
+	   /* If the inner operation is wrapped inside a conversion, we have to
+	  check whether it overflows/underflows and can only then perform the
+	  simplification, i.e. add the second constant to the first
+	  (wrapped) and convert afterwards.  This fixes
+	  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526 */
+	bool inner_ovf = false;
+
+	bool combine_ovf = true;
+	tree cst = NULL_TREE;
+	bool combine_constants = false;
+	bool types_ok = false;
+
+	tree inner_type = TREE_TYPE (@3);
+	/* Check overflow in wrapped binop? */
+	bool check_inner_ovf = !TYPE_OVERFLOW_UNDEFINED (inner_type);
+	/* Check overflow when combining constants? */
+	bool check_combine_ovf = TYPE_OVERFLOW_UNDEFINED 

Re: [PATCH] Tree-level fix for PR 69526

2016-07-21 Thread Richard Biener
On Thu, Jul 21, 2016 at 12:42 PM, Robin Dapp  wrote:
> As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526, we
> currently fail to simplify cases like
>
> (unsigned long)(a - 1) + 1
>
> to
>
> (unsigned long)a
>
> when VRP knows that (a - 1) does not overflow.
>
> This patch introduces a match.pd pattern as well as a helper function
> that checks for overflow in a binary operation using VRP information and
> simplifies when no overflow is present.
>
> Some effort was put in to stay in the inner type in cases like this
>
> (unsigned long)(a + CST1) - CST2
> ->
> (unsigned long)(a + CST3) rather than
> (unsigned long)a + CST3
>
> where abs(CST3) = abs(CST1 - CST) <= abs(CST1). I wonder if this is
> warranted, i.e. if it is always advantageous or if the evaluation should
> rather involve a cost estimation - e.g. distinguish between costs for
> operations in int vs. in long.
>
> Absence of signed overflow is also exploited:
>
> (long)(a + 2) - 1
> ->
> (long)(a + 1)
>
> Bootstrapped and regression tested on s390x and x86_64.

I find this a bit hard to follow (looking at the match.pd pattern).

+   if (check_inner_ovf)
+ {
+   // check if the inner binop does not overflow i.e. we have VRP
+   // information and can prove it
+   inner_ovf = binop_overflow (inner_op, @0, @1, inner_type);
+ }

If !inner_ovf (just set that to false if !check_inner_ovf, to simplify
the checks please), you know it's valid to transform the op to
(T)@0 innerop (T)@1 outerop @2 (if T is wider than the inner type,
which I think you should check and which should simplify things).

You can then combine (T)@1 and @2 where I think you fail to handle mixed
MINUS_EXPR/PLUS_EXPR (practically you'll see only PLUS_EXPRs for integers).

So you have (T)@0 + combined-in-T, which you can then either emit, or
check whether combined-in-T fits in the inner type and
@0 + combined-in-T does not overflow, in which case
(T)(@0 + combined-in-T) is safe.

I believe that for simplicity we want to split this transform into two:
one doing (T)(a + CST) - CST -> (T)a + CST', and one doing
(T)a + CST -> (T)(a + CST).

The testcase is rather unspecific as to what testcases should fold and
what not, given your very simple scan and mixing of should/should-not
simplify cases.  Please consider splitting it up and making it a run
test that verifies the bogus transforms do not take place.

Doing

+static bool
+simplify_plus_or_minus_using_ranges (gimple_stmt_iterator *gsi, gimple *stmt)
+{
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+
+  if ((code == PLUS_EXPR || code == MINUS_EXPR) &&
+  op0 != NULL && op1 != NULL)
+{
+  gimple *stmt1 = SSA_NAME_DEF_STMT (op0);
+  if (gassign *def = dyn_cast  (stmt1))
+   {
+ if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))
+ && TREE_CODE (op1) == INTEGER_CST)
+   {
+ if (fold_stmt (gsi, follow_single_use_edges))
+   return true;

causes such stmts to be folded twice as substitute_and_fold does

  /* Some statements may be simplified using propagator
 specific information.  Do this before propagating
 into the stmt to not disturb pass specific information.  */
  if (fold_fn
  && (*fold_fn) (&i))
{
  did_replace = true;
  prop_stats.num_stmts_folded++;
  stmt = gsi_stmt (i);
  update_stmt (stmt);
}

  /* Replace real uses in the statement.  */
  did_replace |= replace_uses_in (stmt, get_value_fn);

  /* If we made a replacement, fold the statement.  */
  if (did_replace)
{
  fold_stmt (&i, follow_single_use_edges);
  stmt = gsi_stmt (i);

which is less than ideal.  I think that given we have fold_stmt following
SSA edges and thus not only stmts we propagated into are possibly
interesting to fold (but also their single-uses, recursively), we should
evaluate the compile-time cost of doing the fold_stmt unconditionally.

diff --git a/gcc/tree.c b/gcc/tree.c
index 2295111..bc477fa 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -1358,6 +1358,108 @@ force_fit_type (tree type, const wide_int_ref &cst,
   return wide_int_to_tree (type, cst);
 }

+bool binop_overflow (enum tree_code op, tree t1, tree t2, tree type)
+{

tree.c doesn't look like the best fit.  I think putting it into
tree-vrp.c is better, and I think that extract_range_from_binary_expr_1
itself should compute what we want here as additional output.
Conservative handling for all but plus/minus is ok with me.

+  if (t1 == NULL_TREE || t2 == NULL_TREE || type == NULL_TREE)
+return true;
+
+  if (TYPE_OVERFLOW_UNDEFINED (type))
+return false;
+
+  if (!INTEGRAL_TYPE_P (type))
+return true;

note that we'll ICE if you call TYPE_OVERFLOW_UNDEFINED on a type
that is not ANY_INTEGRAL_TYPE_P, so better