Re: PR81635: Use chrecs to help find related data refs

2017-08-24 Thread Richard Sandiford
Richard Biener  writes:
> @@ -787,14 +821,14 @@ canonicalize_base_object_address (tree a
>
>  bool
>  dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
> - struct loop *loop)
> + gimple *stmt, struct loop *loop)
>  {
>
> please amend the function comment with what STMT is about
> (DR_STMT I suppose).

I'd done that with:

--
@@ -765,7 +778,28 @@ canonicalize_base_object_address (tree a
   return build_fold_addr_expr (TREE_OPERAND (addr, 0));
 }
 
-/* Analyze the behavior of memory reference REF.  There are two modes:
[...]
+/* Analyze the behavior of memory reference REF, which occurs in STMT.
+   There are two modes:
 
- BB analysis.  In this case we simply split the address into base,
  init and offset components, without reference to any containing loop.
--

but it wasn't obvious because of the elided stuff.

> @@ -893,14 +927,14 @@ dr_analyze_innermost (innermost_loop_beh
>init = size_binop (PLUS_EXPR, init, dinit);
>base_misalignment -= TREE_INT_CST_LOW (dinit);
>
> -  split_constant_offset (offset_iv.base, _iv.base, );
> -  init = size_binop (PLUS_EXPR, init, dinit);
> -
> -  step = size_binop (PLUS_EXPR,
> -fold_convert (ssizetype, base_iv.step),
> -fold_convert (ssizetype, offset_iv.step));
> -
>base = canonicalize_base_object_address (base_iv.base);
> +  tree offset = size_binop (PLUS_EXPR,
> +   fold_convert (ssizetype, offset_iv.base),
> +   init);
>
> so you remove the split_constant_offset handling on offset_iv.base.
> This may end up no longer walking and expanding def stmts of
> SSA names contained therein.  I suppose this is fully intended
> so that re-computing the ref address using DR_BASE/DR_OFFSET
> doesn't end up expanding that redundant code?

Yeah.  I think DR_OFFSET is only going to be used by code that
wants to construct a new address, so there doesn't seem much
point taking apart the offset only to regimplify it when building
a new reference.

> For analysis relying on this one now needs to resort to
> DR_VAR/CONST_OFFSET where SCEV analysis hopefully performs similar
> expansions?

We still apply split_constant_offset to the DR_VAR_OFFSET as well;
the scev analysis applies on top of that rather than replacing it.

> Just sth to watch at ...
>
> @@ -921,12 +955,12 @@ dr_analyze_innermost (innermost_loop_beh
>  }
>
>drb->base_address = base;
> -  drb->offset = fold_convert (ssizetype, offset_iv.base);
> -  drb->init = init;
> +  drb->offset = offset;
>drb->step = step;
> +  split_constant_offset (scev, >var_offset, >const_offset);
>
> so is the full-fledged split_constant_offset (with its SSA name handling)
> still needed here?  Sth to eventually address with a followup.

Yeah, I think we still need it.  The SCEV stuff is only really useful
if the starting offset has a simple evolution in a containing loop.
For other cases we punt and just use the original generic expression.
In particular, if we're doing SLP on a single-block function, we need
everything that split_constant_offset currently does.

> @@ -1490,6 +1482,7 @@ ref_at_iteration (data_reference_p dr, i
>tree ref_op2 = NULL_TREE;
>tree new_offset;
>
> +  split_constant_offset (DR_OFFSET (dr), , );
>if (iter != 0)
>  {
>new_offset = size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter));
>
> likewise here?  Why do you think ref_at_iteration cares?  Is that because
> of codegen quality?  I'd have done with coff == size_zero_node plus
> simplifications that arise from that.

Yeah, I assumed it was codegen quality.  But maybe it was just a way to
compensate for the fact that the MEM_REF is built directly (rather than
via a fold) and so this was the easiest way of getting a nonzero second
operand to the MEM_REF.

Thanks,
Richard


Re: PR81635: Use chrecs to help find related data refs

2017-08-24 Thread Richard Biener
On Tue, Aug 22, 2017 at 4:19 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
>>  wrote:
>>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
 On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>  wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: 0
>>>DR_INIT: 0 or 4
>>>DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd 
>>> have:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: (intptr_t) i
>>>DR_INIT: 0 or 4
>>>DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood
>> the issue.
>> I made changes in scev to better fold type conversion by
>> various sources
>> of information, for example, vrp, niters, undefined overflow
>> behavior etc.
>> In theory these information should be available for other
>> optimizers without
>> querying scev.  For the mentioned test, vrp should compute
>> accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>> worrying
>> overflow.  Note we don't do it in generic folding because
>> (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or
>> because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To
>> provide such
>> an interface, we changed tree-affine and made it do aggressive
>> fold.  I am
>> curious if it's possible to use aff_tree to implement
>> strip_constant_offset
>> here since aggressive folding is wanted.  After all, using
>> additional chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   double a = q[i] + p[i];
>   double b = q[i + 1] + p[i + 1];
>   q[i] = a;
>   q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] 
> can't
> overflow, even though "n" is unconstrained.  The patch as posted 
> handles
> this case too.
 BTW is this a missed optimization in value range analysis?  The range
 information for i should flow in a way like: array boundary -> niters
 -> scev/vrp.
 I think that's what niters/scev do in analysis.
>>>
>>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>>> the previous VRP pass came before loop header copying, so the (single)
>>> header has to cope with n == 0 case.  Thus we get:
>> Ah, there are several passes in between vrp and pass_ch, not sure if
>> any such pass depends on vrp intensively.  I would suggestion reorder
>> the two passes, or standalone VRP interface updating information for
>> loop region after header copied?   This is a non-trivial issue that
>> needs to be fixed.  Niters analyzer rely on
>> simplify_using_initial_conditions heavily to get the same information,
>> which in my opinion should be provided by VRP.  Though this won't be

Re: PR81635: Use chrecs to help find related data refs

2017-08-22 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
>  wrote:
>> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
>>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>  wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: 0
>>DR_INIT: 0 or 4
>>DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd 
>> have:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: (intptr_t) i
>>DR_INIT: 0 or 4
>>DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood
> the issue.
> I made changes in scev to better fold type conversion by
> various sources
> of information, for example, vrp, niters, undefined overflow
> behavior etc.
> In theory these information should be available for other
> optimizers without
> querying scev.  For the mentioned test, vrp should compute
> accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) it without
> worrying
> overflow.  Note we don't do it in generic folding because
> (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or
> because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To
> provide such
> an interface, we changed tree-affine and made it do aggressive
> fold.  I am
> curious if it's possible to use aff_tree to implement
> strip_constant_offset
> here since aggressive folding is wanted.  After all, using
> additional chrec
> looks like a little heavy wrto the simple test.

 Yeah, using aff_tree does work here when the bounds are constant.
 It doesn't look like it works for things like:

 double p[1000];
 double q[1000];

 void
 f4 (unsigned int n)
 {
   for (unsigned int i = 0; i < n; i += 4)
 {
   double a = q[i] + p[i];
   double b = q[i + 1] + p[i + 1];
   q[i] = a;
   q[i + 1] = b;
 }
 }

 though, where the bounds on the global arrays guarantee that [i + 1] 
 can't
 overflow, even though "n" is unconstrained.  The patch as posted 
 handles
 this case too.
>>> BTW is this a missed optimization in value range analysis?  The range
>>> information for i should flow in a way like: array boundary -> niters
>>> -> scev/vrp.
>>> I think that's what niters/scev do in analysis.
>>
>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>> the previous VRP pass came before loop header copying, so the (single)
>> header has to cope with n == 0 case.  Thus we get:
> Ah, there are several passes in between vrp and pass_ch, not sure if
> any such pass depends on vrp intensively.  I would suggestion reorder
> the two passes, or standalone VRP interface updating information for
> loop region after header copied?   This is a non-trivial issue that
> needs to be fixed.  Niters analyzer rely on
> simplify_using_initial_conditions heavily to get the same information,
> which in my opinion should be provided by VRP.  Though this won't be
> able to obsolete simplify_using_initial_conditions because VRP is weak
> in symbolic range...
>
>>
>>   Visiting statement:
>>   i_15 = ASSERT_EXPR 

Re: PR81635: Use chrecs to help find related data refs

2017-08-18 Thread Richard Biener
On Fri, Aug 18, 2017 at 12:30 PM, Richard Biener
 wrote:
> On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
>> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
  wrote:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

 Didn't read the change in detail, so sorry if I mis-understood the 
 issue.
 I made changes in scev to better fold type conversion by various 
 sources
 of information, for example, vrp, niters, undefined overflow behavior 
 etc.
 In theory these information should be available for other
 optimizers without
 querying scev.  For the mentioned test, vrp should compute accurate 
 range
 information for "i" so that we can fold (intptr_t) (i + 1) it without
 worrying
 overflow.  Note we don't do it in generic folding because
 (intptr_t) (i) + 1
 could be more expensive (especially in case of (T)(i + j)), or because 
 the
 CST part is in bigger precision after conversion.
 But such folding is wanted in several places, e.g, IVOPTs.  To provide 
 such
 an interface, we changed tree-affine and made it do aggressive fold.  
 I am
 curious if it's possible to use aff_tree to implement 
 strip_constant_offset
 here since aggressive folding is wanted.  After all, using additional 
 chrec
 looks like a little heavy wrto the simple test.
>>>
>>> Yeah, using aff_tree does work here when the bounds are constant.
>>> It doesn't look like it works for things like:
>>>
>>> double p[1000];
>>> double q[1000];
>>>
>>> void
>>> f4 (unsigned int n)
>>> {
>>>   for (unsigned int i = 0; i < n; i += 4)
>>> {
>>>   double a = q[i] + p[i];
>>>   double b = q[i + 1] + p[i + 1];
>>>   q[i] = a;
>>>   q[i + 1] = b;
>>> }
>>> }
>>>
>>> though, where the bounds on the global arrays guarantee that [i + 1] 
>>> can't
>>> overflow, even though "n" is unconstrained.  The patch as posted handles
>>> this case too.
>> BTW is this a missed optimization in value range analysis?  The range
>> information for i should flow in a way like: array boundary -> niters
>> -> scev/vrp.
>> I think that's what niters/scev do in analysis.
>
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with n == 0 case.  Thus we get:
 Ah, there are several passes in between vrp and pass_ch, not sure if
 any such pass depends on vrp intensively.  I would suggestion reorder
 the two passes, or standalone VRP interface updating information for
 loop region after header copied?   This is a non-trivial issue that
 needs to be fixed.  Niters analyzer rely on
 simplify_using_initial_conditions heavily to get the same information,
 which in my opinion should be provided by VRP.  Though this won't be
 able to obsolete simplify_using_initial_conditions because VRP is weak
 in symbolic range...

>
>   Visiting statement:
>   i_15 = ASSERT_EXPR ;
>   Intersecting
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
> [0, 0]
>   to
> [0, 0]  EQUIVALENCES: { i_6 } 

Re: PR81635: Use chrecs to help find related data refs

2017-08-18 Thread Richard Biener
On Thu, Aug 17, 2017 at 2:24 PM, Bin.Cheng  wrote:
> On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>  wrote:
 The first loop in the testcase regressed after my recent changes to
 dr_analyze_innermost.  Previously we would treat "i" as an iv even
 for bb analysis and end up with:

DR_BASE_ADDRESS: p or q
DR_OFFSET: 0
DR_INIT: 0 or 4
DR_STEP: 16

 We now always keep the step as 0 instead, so for an int "i" we'd have:

DR_BASE_ADDRESS: p or q
DR_OFFSET: (intptr_t) i
DR_INIT: 0 or 4
DR_STEP: 0

 This is also what we'd like to have for the unsigned "i", but the
 problem is that strip_constant_offset thinks that the "i + 1" in
 "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
 The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
 name that holds "(intptr_t) (i + 1)", meaning that the accesses no
 longer seem to be related to the [i] ones.
>>>
>>> Didn't read the change in detail, so sorry if I mis-understood the 
>>> issue.
>>> I made changes in scev to better fold type conversion by various sources
>>> of information, for example, vrp, niters, undefined overflow behavior 
>>> etc.
>>> In theory these information should be available for other
>>> optimizers without
>>> querying scev.  For the mentioned test, vrp should compute accurate 
>>> range
>>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>>> worrying
>>> overflow.  Note we don't do it in generic folding because
>>> (intptr_t) (i) + 1
>>> could be more expensive (especially in case of (T)(i + j)), or because 
>>> the
>>> CST part is in bigger precision after conversion.
>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide 
>>> such
>>> an interface, we changed tree-affine and made it do aggressive fold.  I 
>>> am
>>> curious if it's possible to use aff_tree to implement 
>>> strip_constant_offset
>>> here since aggressive folding is wanted.  After all, using additional 
>>> chrec
>>> looks like a little heavy wrto the simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>> double p[1000];
>> double q[1000];
>>
>> void
>> f4 (unsigned int n)
>> {
>>   for (unsigned int i = 0; i < n; i += 4)
>> {
>>   double a = q[i] + p[i];
>>   double b = q[i + 1] + p[i + 1];
>>   q[i] = a;
>>   q[i + 1] = b;
>> }
>> }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1] 
>> can't
>> overflow, even though "n" is unconstrained.  The patch as posted handles
>> this case too.
> BTW is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array boundary -> niters
> -> scev/vrp.
> I think that's what niters/scev do in analysis.

 Yeah, maybe :-)  It looks like the problem is that when SLP runs,
 the previous VRP pass came before loop header copying, so the (single)
 header has to cope with n == 0 case.  Thus we get:
>>> Ah, there are several passes in between vrp and pass_ch, not sure if
>>> any such pass depends on vrp intensively.  I would suggestion reorder
>>> the two passes, or standalone VRP interface updating information for
>>> loop region after header copied?   This is a non-trivial issue that
>>> needs to be fixed.  Niters analyzer rely on
>>> simplify_using_initial_conditions heavily to get the same information,
>>> which in my opinion should be provided by VRP.  Though this won't be
>>> able to obsolete simplify_using_initial_conditions because VRP is weak
>>> in symbolic range...
>>>

   Visiting statement:
   i_15 = ASSERT_EXPR ;
   Intersecting
 [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
   and
 [0, 0]
   to
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   Intersecting
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   and
 [0, 1000]
   to
 [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
   Found new range for 

Re: PR81635: Use chrecs to help find related data refs

2017-08-17 Thread Bin.Cheng
On Thu, Aug 17, 2017 at 12:35 PM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
  wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>  wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: 0
>>>DR_INIT: 0 or 4
>>>DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: (intptr_t) i
>>>DR_INIT: 0 or 4
>>>DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>> I made changes in scev to better fold type conversion by various sources
>> of information, for example, vrp, niters, undefined overflow behavior 
>> etc.
>> In theory these information should be available for other
>> optimizers without
>> querying scev.  For the mentioned test, vrp should compute accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>> worrying
>> overflow.  Note we don't do it in generic folding because
>> (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or because 
>> the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To provide 
>> such
>> an interface, we changed tree-affine and made it do aggressive fold.  I 
>> am
>> curious if it's possible to use aff_tree to implement 
>> strip_constant_offset
>> here since aggressive folding is wanted.  After all, using additional 
>> chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   double a = q[i] + p[i];
>   double b = q[i + 1] + p[i + 1];
>   q[i] = a;
>   q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted handles
> this case too.
 BTW is this a missed optimization in value range analysis?  The range
 information for i should flow in a way like: array boundary -> niters
 -> scev/vrp.
 I think that's what niters/scev do in analysis.
>>>
>>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>>> the previous VRP pass came before loop header copying, so the (single)
>>> header has to cope with n == 0 case.  Thus we get:
>> Ah, there are several passes in between vrp and pass_ch, not sure if
>> any such pass depends on vrp intensively.  I would suggestion reorder
>> the two passes, or standalone VRP interface updating information for
>> loop region after header copied?   This is a non-trivial issue that
>> needs to be fixed.  Niters analyzer rely on
>> simplify_using_initial_conditions heavily to get the same information,
>> which in my opinion should be provided by VRP.  Though this won't be
>> able to obsolete simplify_using_initial_conditions because VRP is weak
>> in symbolic range...
>>
>>>
>>>   Visiting statement:
>>>   i_15 = ASSERT_EXPR ;
>>>   Intersecting
>>> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>>   and
>>> [0, 0]
>>>   to
>>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>   Intersecting
>>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>   and
>>> [0, 1000]
>>>   to
>>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>>   Found new range for i_15: [0, 0]
>>>
>>>   Visiting statement:
>>>   _3 = i_15 + 1;
>>>   Match-and-simplified i_15 + 1 to 1
>>>   Intersecting
>>> [1, 1]
>>>   and
>>> [0, +INF]
>>>   to
>>> [1, 1]
>>>   Found new range for 

Re: PR81635: Use chrecs to help find related data refs

2017-08-17 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>>  wrote:
 "Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>  wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: 0
>>DR_INIT: 0 or 4
>>DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: (intptr_t) i
>>DR_INIT: 0 or 4
>>DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood the issue.
> I made changes in scev to better fold type conversion by various sources
> of information, for example, vrp, niters, undefined overflow behavior etc.
> In theory these information should be available for other
> optimizers without
> querying scev.  For the mentioned test, vrp should compute accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) it without
> worrying
> overflow.  Note we don't do it in generic folding because
> (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To provide 
> such
> an interface, we changed tree-affine and made it do aggressive fold.  I am
> curious if it's possible to use aff_tree to implement 
> strip_constant_offset
> here since aggressive folding is wanted.  After all, using additional 
> chrec
> looks like a little heavy wrto the simple test.

 Yeah, using aff_tree does work here when the bounds are constant.
 It doesn't look like it works for things like:

 double p[1000];
 double q[1000];

 void
 f4 (unsigned int n)
 {
   for (unsigned int i = 0; i < n; i += 4)
 {
   double a = q[i] + p[i];
   double b = q[i + 1] + p[i + 1];
   q[i] = a;
   q[i + 1] = b;
 }
 }

 though, where the bounds on the global arrays guarantee that [i + 1] can't
 overflow, even though "n" is unconstrained.  The patch as posted handles
 this case too.
>>> BTW is this a missed optimization in value range analysis?  The range
>>> information for i should flow in a way like: array boundary -> niters
>>> -> scev/vrp.
>>> I think that's what niters/scev do in analysis.
>>
>> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
>> the previous VRP pass came before loop header copying, so the (single)
>> header has to cope with n == 0 case.  Thus we get:
> Ah, there are several passes in between vrp and pass_ch, not sure if
> any such pass depends on vrp intensively.  I would suggestion reorder
> the two passes, or standalone VRP interface updating information for
> loop region after header copied?   This is a non-trivial issue that
> needs to be fixed.  Niters analyzer rely on
> simplify_using_initial_conditions heavily to get the same information,
> which in my opinion should be provided by VRP.  Though this won't be
> able to obsolete simplify_using_initial_conditions because VRP is weak
> in symbolic range...
>
>>
>>   Visiting statement:
>>   i_15 = ASSERT_EXPR ;
>>   Intersecting
>> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>>   and
>> [0, 0]
>>   to
>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>   Intersecting
>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>   and
>> [0, 1000]
>>   to
>> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>>   Found new range for i_15: [0, 0]
>>
>>   Visiting statement:
>>   _3 = i_15 + 1;
>>   Match-and-simplified i_15 + 1 to 1
>>   Intersecting
>> [1, 1]
>>   and
>> [0, +INF]
>>   to
>> [1, 1]
>>   Found new range for _3: [1, 1]
>>
>> (where _3 is the index we care about), followed by:
>>
>>   Visiting statement:
>>   i_15 = ASSERT_EXPR ;
>>   Intersectings/aarch64-linux/trunk-orig/debug/gcc'
>> [0, n_9(D) + 4294967295]  

Re: PR81635: Use chrecs to help find related data refs

2017-08-17 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 6:50 PM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>>  wrote:
>>> "Bin.Cheng"  writes:
 On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
  wrote:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

 Didn't read the change in detail, so sorry if I mis-understood the issue.
 I made changes in scev to better fold type conversion by various sources
 of information, for example, vrp, niters, undefined overflow behavior etc.
 In theory these information should be available for other optimizers 
 without
 querying scev.  For the mentioned test, vrp should compute accurate range
 information for "i" so that we can fold (intptr_t) (i + 1) it without
 worrying
 overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 
 1
 could be more expensive (especially in case of (T)(i + j)), or because the
 CST part is in bigger precision after conversion.
 But such folding is wanted in several places, e.g, IVOPTs.  To provide such
 an interface, we changed tree-affine and made it do aggressive fold.  I am
 curious if it's possible to use aff_tree to implement strip_constant_offset
 here since aggressive folding is wanted.  After all, using additional chrec
 looks like a little heavy wrto the simple test.
>>>
>>> Yeah, using aff_tree does work here when the bounds are constant.
>>> It doesn't look like it works for things like:
>>>
>>> double p[1000];
>>> double q[1000];
>>>
>>> void
>>> f4 (unsigned int n)
>>> {
>>>   for (unsigned int i = 0; i < n; i += 4)
>>> {
>>>   double a = q[i] + p[i];
>>>   double b = q[i + 1] + p[i + 1];
>>>   q[i] = a;
>>>   q[i + 1] = b;
>>> }
>>> }
>>>
>>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>>> overflow, even though "n" is unconstrained.  The patch as posted handles
>>> this case too.
>> BTW is this a missed optimization in value range analysis?  The range
>> information for i should flow in a way like: array boundary -> niters
>> -> scev/vrp.
>> I think that's what niters/scev do in analysis.
>
> Yeah, maybe :-)  It looks like the problem is that when SLP runs,
> the previous VRP pass came before loop header copying, so the (single)
> header has to cope with n == 0 case.  Thus we get:
Ah, there are several passes in between vrp and pass_ch, not sure if
any such pass depends on vrp intensively.  I would suggestion reorder
the two passes, or standalone VRP interface updating information for
loop region after header copied?   This is a non-trivial issue that
needs to be fixed.  Niters analyzer rely on
simplify_using_initial_conditions heavily to get the same information,
which in my opinion should be provided by VRP.  Though this won't be
able to obsolete simplify_using_initial_conditions because VRP is weak
in symbolic range...

>
>   Visiting statement:
>   i_15 = ASSERT_EXPR ;
>   Intersecting
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
> [0, 0]
>   to
> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   Intersecting
> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   and
> [0, 1000]
>   to
> [0, 0]  EQUIVALENCES: { i_6 } (1 elements)
>   Found new range for i_15: [0, 0]
>
>   Visiting statement:
>   _3 = i_15 + 1;
>   Match-and-simplified i_15 + 1 to 1
>   Intersecting
> [1, 1]
>   and
> [0, +INF]
>   to
> [1, 1]
>   Found new range for _3: [1, 1]
>
> (where _3 is the index we care about), followed by:
>
>   Visiting statement:
>   i_15 = ASSERT_EXPR ;
>   Intersectings/aarch64-linux/trunk-orig/debug/gcc'
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   and
> [0, 4]
>   to
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
>   Intersecting
> [0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } 

Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
>  wrote:
>> "Bin.Cheng"  writes:
>>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>>  wrote:
 The first loop in the testcase regressed after my recent changes to
 dr_analyze_innermost.  Previously we would treat "i" as an iv even
 for bb analysis and end up with:

DR_BASE_ADDRESS: p or q
DR_OFFSET: 0
DR_INIT: 0 or 4
DR_STEP: 16

 We now always keep the step as 0 instead, so for an int "i" we'd have:

DR_BASE_ADDRESS: p or q
DR_OFFSET: (intptr_t) i
DR_INIT: 0 or 4
DR_STEP: 0

 This is also what we'd like to have for the unsigned "i", but the
 problem is that strip_constant_offset thinks that the "i + 1" in
 "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
 The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
 name that holds "(intptr_t) (i + 1)", meaning that the accesses no
 longer seem to be related to the [i] ones.
>>>
>>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>>> I made changes in scev to better fold type conversion by various sources
>>> of information, for example, vrp, niters, undefined overflow behavior etc.
>>> In theory these information should be available for other optimizers without
>>> querying scev.  For the mentioned test, vrp should compute accurate range
>>> information for "i" so that we can fold (intptr_t) (i + 1) it without
>>> worrying
>>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>>> could be more expensive (especially in case of (T)(i + j)), or because the
>>> CST part is in bigger precision after conversion.
>>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>>> curious if it's possible to use aff_tree to implement strip_constant_offset
>>> here since aggressive folding is wanted.  After all, using additional chrec
>>> looks like a little heavy wrto the simple test.
>>
>> Yeah, using aff_tree does work here when the bounds are constant.
>> It doesn't look like it works for things like:
>>
>> double p[1000];
>> double q[1000];
>>
>> void
>> f4 (unsigned int n)
>> {
>>   for (unsigned int i = 0; i < n; i += 4)
>> {
>>   double a = q[i] + p[i];
>>   double b = q[i + 1] + p[i + 1];
>>   q[i] = a;
>>   q[i + 1] = b;
>> }
>> }
>>
>> though, where the bounds on the global arrays guarantee that [i + 1] can't
>> overflow, even though "n" is unconstrained.  The patch as posted handles
>> this case too.
> BTW is this a missed optimization in value range analysis?  The range
> information for i should flow in a way like: array boundary -> niters
> -> scev/vrp.
> I think that's what niters/scev do in analysis.

Yeah, maybe :-)  It looks like the problem is that when SLP runs,
the previous VRP pass came before loop header copying, so the (single)
header has to cope with n == 0 case.  Thus we get:

  Visiting statement:
  i_15 = ASSERT_EXPR ;
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 0]
  to
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 1000]
  to
[0, 0]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, 0]

  Visiting statement:
  _3 = i_15 + 1;
  Match-and-simplified i_15 + 1 to 1
  Intersecting
[1, 1]
  and
[0, +INF]
  to
[1, 1]
  Found new range for _3: [1, 1]

(where _3 is the index we care about), followed by:

  Visiting statement:
  i_15 = ASSERT_EXPR ;
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 4]
  to
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Intersecting
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  and
[0, 1000]
  to
[0, n_9(D) + 4294967295]  EQUIVALENCES: { i_6 } (1 elements)
  Found new range for i_15: [0, n_9(D) + 4294967295]

  Visiting statement:
  _3 = i_15 + 1;
  Intersecting
[0, +INF]
  and
[0, +INF]
  to
[0, +INF]
  Found new range for _3: [0, +INF]

I guess in this case it would be better to intersect the i_15 ranges
to [0, 1000] rather than [0, n_9(D) + 4294967295].

It does work if another VRP pass runs after CH.  But even then,
is it a good idea to rely on the range info being kept up-to-date
all the way through to SLP?  A lot happens inbetween.

FWIW, the old simple_iv check that I removed for bb data-ref analysis
relies on SCEV analysis too, so I don't think this is more expensive
than what we had before.

Thanks,
Richard


Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 5:00 PM, Richard Sandiford
 wrote:
> "Bin.Cheng"  writes:
>> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>>  wrote:
>>> The first loop in the testcase regressed after my recent changes to
>>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>>> for bb analysis and end up with:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: 0
>>>DR_INIT: 0 or 4
>>>DR_STEP: 16
>>>
>>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>>
>>>DR_BASE_ADDRESS: p or q
>>>DR_OFFSET: (intptr_t) i
>>>DR_INIT: 0 or 4
>>>DR_STEP: 0
>>>
>>> This is also what we'd like to have for the unsigned "i", but the
>>> problem is that strip_constant_offset thinks that the "i + 1" in
>>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>>> longer seem to be related to the [i] ones.
>>
>> Didn't read the change in detail, so sorry if I mis-understood the issue.
>> I made changes in scev to better fold type conversion by various sources
>> of information, for example, vrp, niters, undefined overflow behavior etc.
>> In theory these information should be available for other optimizers without
>> querying scev.  For the mentioned test, vrp should compute accurate range
>> information for "i" so that we can fold (intptr_t) (i + 1) it without 
>> worrying
>> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
>> could be more expensive (especially in case of (T)(i + j)), or because the
>> CST part is in bigger precision after conversion.
>> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
>> an interface, we changed tree-affine and made it do aggressive fold.  I am
>> curious if it's possible to use aff_tree to implement strip_constant_offset
>> here since aggressive folding is wanted.  After all, using additional chrec
>> looks like a little heavy wrto the simple test.
>
> Yeah, using aff_tree does work here when the bounds are constant.
> It doesn't look like it works for things like:
>
> double p[1000];
> double q[1000];
>
> void
> f4 (unsigned int n)
> {
>   for (unsigned int i = 0; i < n; i += 4)
> {
>   double a = q[i] + p[i];
>   double b = q[i + 1] + p[i + 1];
>   q[i] = a;
>   q[i + 1] = b;
> }
> }
>
> though, where the bounds on the global arrays guarantee that [i + 1] can't
> overflow, even though "n" is unconstrained.  The patch as posted handles
> this case too.
BTW is this a missed optimization in value range analysis?  The range
information for i should flow in a way like: array boundary -> niters
-> scev/vrp.
I think that's what niters/scev do in analysis.

Thanks,
bin
>
> Thanks,
> Richard
>
>>
>> Thanks,
>> bin
>>>
>>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>>> references, with the distance between their start addresses being
>>> the difference between their DR_INITs.  For this second type, the
>>> exact form of DR_OFFSET doesn't really matter, and the more it is
>>> able to canonicalise the non-constant offset, the better.
>>>
>>> The patch fixes the PR by adding another offset/init pair to the
>>> innermost loop behaviour for this second group.  The new pair use chrecs
>>> rather than generic exprs for the offset part, with the chrec being
>>> analysed in the innermost loop for which the offset isn't invariant.
>>> This allows us to vectorise the second function in the testcase
>>> as well, which wasn't possible before the earlier patch.
>>>
>>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>>
>>> Richard
>>>
>>>
>>> gcc/
>>> PR tree-optimization/81635
>>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>>> chrec_init.
>>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>>> (dr_equal_offsets_p): Delete.
>>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>>> (split_constant_offset): Handle POLYNOMIAL_CHREC.
>>> (dr_analyze_innermost): Initialize dr_chrec_offset and 
>>> dr_chrec_init.
>>> (operator ==): Use dr_chrec_offsets_equal_p and compare the
>>> DR_CHREC_INITs.
>>> (prune_runtime_alias_test_list): Likewise.
>>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and 
>>> compare
>>> the DR_CHREC_INITs.
>>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>>> (analyze_offset_scev): New function.

Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
"Bin.Cheng"  writes:
> On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
>  wrote:
>> The first loop in the testcase regressed after my recent changes to
>> dr_analyze_innermost.  Previously we would treat "i" as an iv even
>> for bb analysis and end up with:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: 0
>>DR_INIT: 0 or 4
>>DR_STEP: 16
>>
>> We now always keep the step as 0 instead, so for an int "i" we'd have:
>>
>>DR_BASE_ADDRESS: p or q
>>DR_OFFSET: (intptr_t) i
>>DR_INIT: 0 or 4
>>DR_STEP: 0
>>
>> This is also what we'd like to have for the unsigned "i", but the
>> problem is that strip_constant_offset thinks that the "i + 1" in
>> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
>> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
>> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
>> longer seem to be related to the [i] ones.
>
> Didn't read the change in detail, so sorry if I mis-understood the issue.
> I made changes in scev to better fold type conversion by various sources
> of information, for example, vrp, niters, undefined overflow behavior etc.
> In theory these information should be available for other optimizers without
> querying scev.  For the mentioned test, vrp should compute accurate range
> information for "i" so that we can fold (intptr_t) (i + 1) it without worrying
> overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
> could be more expensive (especially in case of (T)(i + j)), or because the
> CST part is in bigger precision after conversion.
> But such folding is wanted in several places, e.g, IVOPTs.  To provide such
> an interface, we changed tree-affine and made it do aggressive fold.  I am
> curious if it's possible to use aff_tree to implement strip_constant_offset
> here since aggressive folding is wanted.  After all, using additional chrec
> looks like a little heavy wrto the simple test.

Yeah, using aff_tree does work here when the bounds are constant.
It doesn't look like it works for things like:

double p[1000];
double q[1000];

void
f4 (unsigned int n)
{
  for (unsigned int i = 0; i < n; i += 4)
{
  double a = q[i] + p[i];
  double b = q[i + 1] + p[i + 1];
  q[i] = a;
  q[i + 1] = b;
}
}

though, where the bounds on the global arrays guarantee that [i + 1] can't
overflow, even though "n" is unconstrained.  The patch as posted handles
this case too.

Thanks,
Richard

>
> Thanks,
> bin
>>
>> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
>> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
>> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
>> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
>> references, with the distance between their start addresses being
>> the difference between their DR_INITs.  For this second type, the
>> exact form of DR_OFFSET doesn't really matter, and the more it is
>> able to canonicalise the non-constant offset, the better.
>>
>> The patch fixes the PR by adding another offset/init pair to the
>> innermost loop behaviour for this second group.  The new pair use chrecs
>> rather than generic exprs for the offset part, with the chrec being
>> analysed in the innermost loop for which the offset isn't invariant.
>> This allows us to vectorise the second function in the testcase
>> as well, which wasn't possible before the earlier patch.
>>
>> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> gcc/
>> PR tree-optimization/81635
>> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>> chrec_init.
>> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>> (dr_equal_offsets_p): Delete.
>> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>> (split_constant_offset): Handle POLYNOMIAL_CHREC.
>> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>> (operator ==): Use dr_chrec_offsets_equal_p and compare the
>> DR_CHREC_INITs.
>> (prune_runtime_alias_test_list): Likewise.
>> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>> the DR_CHREC_INITs.
>> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>> (analyze_offset_scev): New function.
>> (compute_offset_chrecs): Likewise.
>> (dr_chrec_offsets_equal_p): Likewise.
>> (dr_chrec_offsets_compare): Likewise.
>> * tree-loop-distribution.c (compute_alias_check_pairs): Use
>> dr_chrec_offsets_compare.
>> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>> dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>> (dr_group_sort_cmp): Likewise.
>> 

Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Bin.Cheng
On Wed, Aug 16, 2017 at 2:38 PM, Richard Sandiford
 wrote:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.

Didn't read the change in detail, so sorry if I mis-understood the issue.
I made changes in scev to better fold type conversion by various sources
of information, for example, vrp, niters, undefined overflow behavior etc.
In theory these information should be available for other optimizers without
querying scev.  For the mentioned test, vrp should compute accurate range
information for "i" so that we can fold (intptr_t) (i + 1) it without worrying
overflow.  Note we don't do it in generic folding because (intptr_t) (i) + 1
could be more expensive (especially in case of (T)(i + j)), or because the
CST part is in bigger precision after conversion.
But such folding is wanted in several places, e.g, IVOPTs.  To provide such
an interface, we changed tree-affine and made it do aggressive fold.  I am
curious if it's possible to use aff_tree to implement strip_constant_offset
here since aggressive folding is wanted.  After all, using additional chrec
looks like a little heavy wrto the simple test.

Thanks,
bin
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair use chrecs
> rather than generic exprs for the offset part, with the chrec being
> analysed in the innermost loop for which the offset isn't invariant.
> This allows us to vectorise the second function in the testcase
> as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
> PR tree-optimization/81635
> * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
> chrec_init.
> (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
> (dr_equal_offsets_p): Delete.
> (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
> * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
> (split_constant_offset): Handle POLYNOMIAL_CHREC.
> (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
> (operator ==): Use dr_chrec_offsets_equal_p and compare the
> DR_CHREC_INITs.
> (prune_runtime_alias_test_list): Likewise.
> (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
> the DR_CHREC_INITs.
> (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
> (analyze_offset_scev): New function.
> (compute_offset_chrecs): Likewise.
> (dr_chrec_offsets_equal_p): Likewise.
> (dr_chrec_offsets_compare): Likewise.
> * tree-loop-distribution.c (compute_alias_check_pairs): Use
> dr_chrec_offsets_compare.
> * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
> dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
> (dr_group_sort_cmp): Likewise.
> (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
> (vect_no_alias_p): Likewise.
> (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
> compare the DR_CHREC_INITs.
> (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
> * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
> of DR_INIT.
>
> gcc/testsuite/
> PR tree-optimization/81635
> * gcc.dg/vect/bb-slp-pr81635.c: New test.
>


Re: PR81635: Use chrecs to help find related data refs

2017-08-16 Thread Richard Sandiford
Richard Sandiford  writes:
> The first loop in the testcase regressed after my recent changes to
> dr_analyze_innermost.  Previously we would treat "i" as an iv even
> for bb analysis and end up with:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: 0
>DR_INIT: 0 or 4
>DR_STEP: 16
>
> We now always keep the step as 0 instead, so for an int "i" we'd have:
>
>DR_BASE_ADDRESS: p or q
>DR_OFFSET: (intptr_t) i
>DR_INIT: 0 or 4
>DR_STEP: 0
>
> This is also what we'd like to have for the unsigned "i", but the
> problem is that strip_constant_offset thinks that the "i + 1" in
> "(intptr_t) (i + 1)" could wrap and so doesn't peel off the "+ 1".
> The [i + 1] accesses therefore have a DR_OFFSET equal to the SSA
> name that holds "(intptr_t) (i + 1)", meaning that the accesses no
> longer seem to be related to the [i] ones.
>
> There seem to be two main uses of DR_OFFSET and DR_INIT.  One type
> expects DR_OFFSET and DR_INIT to be generic expressions whose sum
> gives the initial offset from DR_BASE_ADDRESS.  The other type uses
> the pair (DR_BASE_ADDRESS, DR_OFFSET) to identify related data
> references, with the distance between their start addresses being
> the difference between their DR_INITs.  For this second type, the
> exact form of DR_OFFSET doesn't really matter, and the more it is
> able to canonicalise the non-constant offset, the better.
>
> The patch fixes the PR by adding another offset/init pair to the
> innermost loop behaviour for this second group.  The new pair use chrecs
> rather than generic exprs for the offset part, with the chrec being
> analysed in the innermost loop for which the offset isn't invariant.
> This allows us to vectorise the second function in the testcase
> as well, which wasn't possible before the earlier patch.
>
> Tested on x86_64-linux-gnu and aarch64-linux-gnu.  OK to install?
>
> Richard
>
>
> gcc/
>   PR tree-optimization/81635
>   * tree-data-ref.h (innermost_loop_behavior): Add chrec_offset and
>   chrec_init.
>   (DR_CHREC_OFFSET, DR_CHREC_INIT): New macros.
>   (dr_equal_offsets_p): Delete.
>   (dr_chrec_offsets_equal_p, dr_chrec_offsets_compare): Declare.
>   * tree-data-ref.c: Include tree-ssa-loop-ivopts.h.
>   (split_constant_offset): Handle POLYNOMIAL_CHREC.
>   (dr_analyze_innermost): Initialize dr_chrec_offset and dr_chrec_init.
>   (operator ==): Use dr_chrec_offsets_equal_p and compare the
>   DR_CHREC_INITs.
>   (prune_runtime_alias_test_list): Likewise.
>   (comp_dr_with_seg_len_pair): Use dr_chrec_offsets_compare and compare
>   the DR_CHREC_INITs.
>   (dr_equal_offsets_p1, dr_equal_offsets_p): Delete.
>   (analyze_offset_scev): New function.
>   (compute_offset_chrecs): Likewise.
>   (dr_chrec_offsets_equal_p): Likewise.
>   (dr_chrec_offsets_compare): Likewise.
>   * tree-loop-distribution.c (compute_alias_check_pairs): Use
>   dr_chrec_offsets_compare.
>   * tree-vect-data-refs.c (vect_find_same_alignment_drs): Use
>   dr_chrec_offsets_compare and compare the DR_CHREC_INITs.
>   (dr_group_sort_cmp): Likewise.
>   (vect_analyze_group_access_1): Use DR_CHREC_INIT instead of DR_INIT.
>   (vect_no_alias_p): Likewise.
>   (vect_analyze_data_ref_accesses): Use dr_chrec_offsets_equal_p and
>   compare the DR_CHREC_INITs.
>   (vect_prune_runtime_alias_test_list): Use dr_chrec_offsets_compare.
>   * tree-vect-stmts.c (vectorizable_load): Use DR_CHREC_INIT instead
>   of DR_INIT.
>
> gcc/testsuite/
>   PR tree-optimization/81635
>   * gcc.dg/vect/bb-slp-pr81635.c: New test.

Sorry, forgot I was going to convert this to a context diff...

Index: gcc/tree-data-ref.h
===
*** gcc/tree-data-ref.h 2017-08-04 11:42:45.938134820 +0100
--- gcc/tree-data-ref.h 2017-08-16 14:34:56.611554296 +0100
*** struct innermost_loop_behavior
*** 52,57 
--- 52,78 
tree init;
tree step;
  
+   /* The scalar evolution of OFFSET + INIT, split into non-constant and
+  constant terms in the same way as OFFSET and INIT.  For example,
+  if OFFSET + INIT is {x + 4, +, 3}_1, the fields would be:
+ 
+  chrec_offset  {x, +, 3}_1
+  chrec_init4
+ 
+  If OFFSET + INIT cannot be represented as a scalar evolution,
+  the fields are simply a copy of OFFSET and INIT.
+ 
+  This split is useful when analyzing the relationship between two
+  data references with the same base.  OFFSET and INIT are generic
+  expressions that can be used to build related data references,
+  while CHREC_OFFSET and CHREC_INIT are our best attempt at
+  canonicalizing the value of OFFSET + INIT.
+ 
+  These fields are only initialized lazily, by a call to
+  compute_offset_chrecs.  */
+   tree chrec_offset;
+   tree chrec_init;
+ 
/* BASE_ADDRESS is known to be misaligned by