Re: [RFC] Induction variable candidates not sufficiently general

2018-07-23 Thread Richard Biener
On Sat, Jul 21, 2018 at 3:28 AM Bin.Cheng  wrote:
>
> On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen  wrote:
> > Thanks for looking at this for me.  In simplifying the test case for a bug 
> > report, I've narrowed the "problem" to integer overflow considerations.  My 
> > len variable is declared int, and the target has 64-bit pointers.  I'm 
> > gathering that the "manual transformation" I quoted below is not considered 
> > "equivalent" to the original source code due to different integer overflow 
> > behaviors.  If I redeclare len to be unsigned long long, then I 
> > automatically get the optimizations that I was originally expecting.
> >
> > I suppose this is really NOT a bug?
> As your test case demonstrates, it is caused by wrapping unsigned int32.
> >
> > Is there a compiler optimization flag that allows the optimizer to ignore 
> > array index integer overflow in considering legal optimizations?
> I am not aware of one for unsigned integer, and I guess it won't be
> introduced in the future either?

We've had -funsafe-loop-optimizations in the past but that only
concerned niter analysis, not scalar evolution analysis
which is likely required here.

And no, there's no plan to re-introduce those.

Richard.

> Thanks,
> bin
> >
> >
> >
> > On 7/13/18 9:14 PM, Bin.Cheng wrote:
> >> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  
> >> wrote:
> >>> A somewhat old "issue report" pointed me to the code generated for a 
> >>> 4-fold manually unrolled version of the following loop:
> >>>
>    while (++len != len_limit) /* this is loop */
>    if (pb[len] != cur[len])
>    break;
> >>>
> >>> As unrolled, the loop appears as:
> >>>
>  while (++len != len_limit) /* this is loop */ {
>    if (pb[len] != cur[len])
>  break;
>    if (++len == len_limit)  /* unrolled 2nd iteration */
>  break;
>    if (pb[len] != cur[len])
>  break;
>    if (++len == len_limit)  /* unrolled 3rd iteration */
>  break;
>    if (pb[len] != cur[len])
>  break;
>    if (++len == len_limit)  /* unrolled 4th iteration */
>  break;
>    if (pb[len] != cur[len])
>  break;
>  }
> >>>
> >>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the 
> >>> only induction variable candidates that are being considered are all 
> >>> forms of the len variable.  We are not considering any induction 
> >>> variables to represent the address expressions [len] and [len].
> >>>
> >>> I rewrote the source code for this loop to make the addressing 
> >>> expressions more explicit, as in the following:
> >>>
>    cur++;
>    while (++pb != last_pb) /* this is loop */ {
>    if (*pb != *cur)
>  break;
>    ++cur;
>    if (++pb == last_pb)  /* unrolled 2nd iteration */
>  break;
>    if (*pb != *cur)
>  break;
>    ++cur;
>    if (++pb == last_pb)  /* unrolled 3rd iteration */
>  break;
>    if (*pb != *cur)
>  break;
>    ++cur;
>    if (++pb == last_pb)  /* unrolled 4th iteration */
>  break;
>    if (*pb != *cur)
>  break;
>    ++cur;
>    }
> >>>
> >>> Now, gcc does a better job of identifying the "address expression 
> >>> induction variables".  This version of the loop runs about 10% faster 
> >>> than the original on my target architecture.
> >>>
> >>> This would seem to be a textbook pattern for the induction variable 
> >>> analysis.  Does anyone have any thoughts on the best way to add these 
> >>> candidates to the set of induction variables that are considered by 
> >>> tree-ssa-loop-ivopts.c?
> >>>
> >>> Thanks in advance for any suggestions.
> >>>
> >> Hi,
> >> Could you please file a bug with your original slow test code
> >> attached?  I tried to construct meaningful test case from your code
> >> snippet but not successful.  There is difference in generated
> >> assembly, but it's not that fundamental.  So a bug with preprocessed
> >> test would be high appreciated.
> >> I think there are two potential issues in cost computation for such
> >> case: invariant expression and iv uses outside of loop handled as
> >> inside uses.
> >>
> >> Thanks,
> >> bin
> >>
> >>


Re: [RFC] Induction variable candidates not sufficiently general

2018-07-20 Thread Bin.Cheng
On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen  wrote:
> Thanks for looking at this for me.  In simplifying the test case for a bug 
> report, I've narrowed the "problem" to integer overflow considerations.  My 
> len variable is declared int, and the target has 64-bit pointers.  I'm 
> gathering that the "manual transformation" I quoted below is not considered 
> "equivalent" to the original source code due to different integer overflow 
> behaviors.  If I redeclare len to be unsigned long long, then I automatically 
> get the optimizations that I was originally expecting.
>
> I suppose this is really NOT a bug?
As your test case demonstrates, it is caused by wrapping unsigned int32.
>
> Is there a compiler optimization flag that allows the optimizer to ignore 
> array index integer overflow in considering legal optimizations?
I am not aware of one for unsigned integer, and I guess it won't be
introduced in the future either?

Thanks,
bin
>
>
>
> On 7/13/18 9:14 PM, Bin.Cheng wrote:
>> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  
>> wrote:
>>> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
>>> manually unrolled version of the following loop:
>>>
   while (++len != len_limit) /* this is loop */
   if (pb[len] != cur[len])
   break;
>>>
>>> As unrolled, the loop appears as:
>>>
 while (++len != len_limit) /* this is loop */ {
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 2nd iteration */
 break;
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 3rd iteration */
 break;
   if (pb[len] != cur[len])
 break;
   if (++len == len_limit)  /* unrolled 4th iteration */
 break;
   if (pb[len] != cur[len])
 break;
 }
>>>
>>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the 
>>> only induction variable candidates that are being considered are all forms 
>>> of the len variable.  We are not considering any induction variables to 
>>> represent the address expressions [len] and [len].
>>>
>>> I rewrote the source code for this loop to make the addressing expressions 
>>> more explicit, as in the following:
>>>
   cur++;
   while (++pb != last_pb) /* this is loop */ {
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 2nd iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 3rd iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   if (++pb == last_pb)  /* unrolled 4th iteration */
 break;
   if (*pb != *cur)
 break;
   ++cur;
   }
>>>
>>> Now, gcc does a better job of identifying the "address expression induction 
>>> variables".  This version of the loop runs about 10% faster than the 
>>> original on my target architecture.
>>>
>>> This would seem to be a textbook pattern for the induction variable 
>>> analysis.  Does anyone have any thoughts on the best way to add these 
>>> candidates to the set of induction variables that are considered by 
>>> tree-ssa-loop-ivopts.c?
>>>
>>> Thanks in advance for any suggestions.
>>>
>> Hi,
>> Could you please file a bug with your original slow test code
>> attached?  I tried to construct meaningful test case from your code
>> snippet but not successful.  There is difference in generated
>> assembly, but it's not that fundamental.  So a bug with preprocessed
>> test would be high appreciated.
>> I think there are two potential issues in cost computation for such
>> case: invariant expression and iv uses outside of loop handled as
>> inside uses.
>>
>> Thanks,
>> bin
>>
>>


Re: [RFC] Induction variable candidates not sufficiently general

2018-07-16 Thread Kelvin Nilsen
Thanks for looking at this for me.  In simplifying the test case for a bug 
report, I've narrowed the "problem" to integer overflow considerations.  My len 
variable is declared int, and the target has 64-bit pointers.  I'm gathering 
that the "manual transformation" I quoted below is not considered "equivalent" 
to the original source code due to different integer overflow behaviors.  If I 
redeclare len to be unsigned long long, then I automatically get the 
optimizations that I was originally expecting.

I suppose this is really NOT a bug?

Is there a compiler optimization flag that allows the optimizer to ignore array 
index integer overflow in considering legal optimizations?



On 7/13/18 9:14 PM, Bin.Cheng wrote:
> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  wrote:
>> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
>> manually unrolled version of the following loop:
>>
>>>   while (++len != len_limit) /* this is loop */
>>>   if (pb[len] != cur[len])
>>>   break;
>>
>> As unrolled, the loop appears as:
>>
>>> while (++len != len_limit) /* this is loop */ {
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 2nd iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 3rd iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>>   if (++len == len_limit)  /* unrolled 4th iteration */
>>> break;
>>>   if (pb[len] != cur[len])
>>> break;
>>> }
>>
>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the 
>> only induction variable candidates that are being considered are all forms 
>> of the len variable.  We are not considering any induction variables to 
>> represent the address expressions [len] and [len].
>>
>> I rewrote the source code for this loop to make the addressing expressions 
>> more explicit, as in the following:
>>
>>>   cur++;
>>>   while (++pb != last_pb) /* this is loop */ {
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 2nd iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 3rd iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   if (++pb == last_pb)  /* unrolled 4th iteration */
>>> break;
>>>   if (*pb != *cur)
>>> break;
>>>   ++cur;
>>>   }
>>
>> Now, gcc does a better job of identifying the "address expression induction 
>> variables".  This version of the loop runs about 10% faster than the 
>> original on my target architecture.
>>
>> This would seem to be a textbook pattern for the induction variable 
>> analysis.  Does anyone have any thoughts on the best way to add these 
>> candidates to the set of induction variables that are considered by 
>> tree-ssa-loop-ivopts.c?
>>
>> Thanks in advance for any suggestions.
>>
> Hi,
> Could you please file a bug with your original slow test code
> attached?  I tried to construct meaningful test case from your code
> snippet but not successful.  There is difference in generated
> assembly, but it's not that fundamental.  So a bug with preprocessed
> test would be high appreciated.
> I think there are two potential issues in cost computation for such
> case: invariant expression and iv uses outside of loop handled as
> inside uses.
> 
> Thanks,
> bin
> 
> 

#include 
#include 

int
bt_skip_func(const __uint64_t len_limit, const __uint8_t *cur,
 long long int delta, __uint64_t len) {

  const __uint8_t *pb = cur - delta;

  while (++len != len_limit) {
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
if (++len == len_limit)
  break;
if (pb[len] != cur[len])
  break;
  }

  return len;
}

int main (int argc, char *argv[]) {

  char *text_input = "this is some text that should be longer";
  unsigned long long int len_limit = strlen (text_input);
  int pos = 0;

  int cur_match = 0;
  int depth = 0;

  int result = bt_skip_func(len_limit, text_input + 3, (long long) 3, 
(unsigned long long) 1);
}
.file   "ivsimple.long.c"
.abiversion 2
.section".text"
.align 2
.p2align 4,,15
.globl bt_skip_func
.type   bt_skip_func, @function
bt_skip_func:
.LFB22:
.cfi_startproc
subf 5,5,4   # 13   [c=4 l=4]  

Re: [RFC] Induction variable candidates not sufficiently general

2018-07-13 Thread Bin.Cheng
On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen  wrote:
> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
> manually unrolled version of the following loop:
>
>>   while (++len != len_limit) /* this is loop */
>>   if (pb[len] != cur[len])
>>   break;
>
> As unrolled, the loop appears as:
>
>> while (++len != len_limit) /* this is loop */ {
>>   if (pb[len] != cur[len])
>> break;
>>   if (++len == len_limit)  /* unrolled 2nd iteration */
>> break;
>>   if (pb[len] != cur[len])
>> break;
>>   if (++len == len_limit)  /* unrolled 3rd iteration */
>> break;
>>   if (pb[len] != cur[len])
>> break;
>>   if (++len == len_limit)  /* unrolled 4th iteration */
>> break;
>>   if (pb[len] != cur[len])
>> break;
>> }
>
> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only 
> induction variable candidates that are being considered are all forms of the 
> len variable.  We are not considering any induction variables to represent 
> the address expressions [len] and [len].
>
> I rewrote the source code for this loop to make the addressing expressions 
> more explicit, as in the following:
>
>>   cur++;
>>   while (++pb != last_pb) /* this is loop */ {
>>   if (*pb != *cur)
>> break;
>>   ++cur;
>>   if (++pb == last_pb)  /* unrolled 2nd iteration */
>> break;
>>   if (*pb != *cur)
>> break;
>>   ++cur;
>>   if (++pb == last_pb)  /* unrolled 3rd iteration */
>> break;
>>   if (*pb != *cur)
>> break;
>>   ++cur;
>>   if (++pb == last_pb)  /* unrolled 4th iteration */
>> break;
>>   if (*pb != *cur)
>> break;
>>   ++cur;
>>   }
>
> Now, gcc does a better job of identifying the "address expression induction 
> variables".  This version of the loop runs about 10% faster than the original 
> on my target architecture.
>
> This would seem to be a textbook pattern for the induction variable analysis. 
>  Does anyone have any thoughts on the best way to add these candidates to the 
> set of induction variables that are considered by tree-ssa-loop-ivopts.c?
>
> Thanks in advance for any suggestions.
>
Hi,
Could you please file a bug with your original slow test code
attached?  I tried to construct meaningful test case from your code
snippet but not successful.  There is difference in generated
assembly, but it's not that fundamental.  So a bug with preprocessed
test would be high appreciated.
I think there are two potential issues in cost computation for such
case: invariant expression and iv uses outside of loop handled as
inside uses.

Thanks,
bin


Re: [RFC] Induction variable candidates not sufficiently general

2018-07-13 Thread Richard Biener
On Fri, Jul 13, 2018 at 12:05 AM Kelvin Nilsen  wrote:
>
> A somewhat old "issue report" pointed me to the code generated for a 4-fold 
> manually unrolled version of the following loop:
>
> >   while (++len != len_limit) /* this is loop */
> >   if (pb[len] != cur[len])
> >   break;
>
> As unrolled, the loop appears as:
>
> > while (++len != len_limit) /* this is loop */ {
> >   if (pb[len] != cur[len])
> > break;
> >   if (++len == len_limit)  /* unrolled 2nd iteration */
> > break;
> >   if (pb[len] != cur[len])
> > break;
> >   if (++len == len_limit)  /* unrolled 3rd iteration */
> > break;
> >   if (pb[len] != cur[len])
> > break;
> >   if (++len == len_limit)  /* unrolled 4th iteration */
> > break;
> >   if (pb[len] != cur[len])
> > break;
> > }
>
> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only 
> induction variable candidates that are being considered are all forms of the 
> len variable.  We are not considering any induction variables to represent 
> the address expressions [len] and [len].

I am surprised - did you dig down why?  Because generally IVOPTs does
consider pointer IVs.

Richard.

> I rewrote the source code for this loop to make the addressing expressions 
> more explicit, as in the following:
>
> >   cur++;
> >   while (++pb != last_pb) /* this is loop */ {
> >   if (*pb != *cur)
> > break;
> >   ++cur;
> >   if (++pb == last_pb)  /* unrolled 2nd iteration */
> > break;
> >   if (*pb != *cur)
> > break;
> >   ++cur;
> >   if (++pb == last_pb)  /* unrolled 3rd iteration */
> > break;
> >   if (*pb != *cur)
> > break;
> >   ++cur;
> >   if (++pb == last_pb)  /* unrolled 4th iteration */
> > break;
> >   if (*pb != *cur)
> > break;
> >   ++cur;
> >   }
>
> Now, gcc does a better job of identifying the "address expression induction 
> variables".  This version of the loop runs about 10% faster than the original 
> on my target architecture.
>
> This would seem to be a textbook pattern for the induction variable analysis. 
>  Does anyone have any thoughts on the best way to add these candidates to the 
> set of induction variables that are considered by tree-ssa-loop-ivopts.c?
>
> Thanks in advance for any suggestions.
>


[RFC] Induction variable candidates not sufficiently general

2018-07-12 Thread Kelvin Nilsen
A somewhat old "issue report" pointed me to the code generated for a 4-fold 
manually unrolled version of the following loop:

>   while (++len != len_limit) /* this is loop */
>   if (pb[len] != cur[len])
>   break;

As unrolled, the loop appears as:

> while (++len != len_limit) /* this is loop */ {
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 2nd iteration */
> break;
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 3rd iteration */
> break;
>   if (pb[len] != cur[len])
> break;
>   if (++len == len_limit)  /* unrolled 4th iteration */
> break;
>   if (pb[len] != cur[len])
> break;
> }

In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the only 
induction variable candidates that are being considered are all forms of the 
len variable.  We are not considering any induction variables to represent the 
address expressions [len] and [len].

I rewrote the source code for this loop to make the addressing expressions more 
explicit, as in the following:

>   cur++;
>   while (++pb != last_pb) /* this is loop */ {
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 2nd iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 3rd iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   if (++pb == last_pb)  /* unrolled 4th iteration */
> break;
>   if (*pb != *cur)
> break;
>   ++cur;
>   }

Now, gcc does a better job of identifying the "address expression induction 
variables".  This version of the loop runs about 10% faster than the original 
on my target architecture.

This would seem to be a textbook pattern for the induction variable analysis.  
Does anyone have any thoughts on the best way to add these candidates to the 
set of induction variables that are considered by tree-ssa-loop-ivopts.c?

Thanks in advance for any suggestions.