Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 10:48 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
  wrote:
> Normally we adjust the vector loop so that it iterates:
>
>(original number of scalar iterations - number of peels) / VF
>
> times, enforcing this using an IV that starts at zero and increments
> by one each iteration.  However, dividing by VF would be expensive
> for variable VF, so this patch adds an alternative in which the IV
> increments by VF each iteration instead.  We then need to take care
> to handle possible overflow in the IV.

 Hmm, why do you need to handle possible overflow?  Doesn't the
 original loop have a natural IV that evolves like this?  After all we
 can compute an expression for niters of the scalar loop.
>>>
>>> The problem comes with loops like:
>>>
>>>unsigned char i = 0;
>>>do
>>>  {
>>>...
>>>i--;
>>>  }
>>>while (i != 0);
>>>
>>> The loop statements execute 256 times and the latch executes 255 times.
>>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>>> unsigned char) is 0.
>>
>> Yes, that's an existing issue and the reason why I introduced
>> NITERSM1.  All remaining uses of NITERS should really go away
>> because of this corner-case.  So you are introducing a new user?
>
> It's not really an NITERSM1 vs. NITERS thing.  We'd get the same
> result/have the same problem with NITERSM1 - (STEP - 1) instead
> of NITERS - STEP, namely:
>
> - the new IV uses the same type as NITERS
> - we only want the loop to iterate if there are at least STEP scalar
>   iterations to go
> - this means that the natural limit is "IV <= NITERS - STEP"
>   or "IV <= NITERSM1 - (STEP - 1)" (both equivalent)
> - the loop is only guaranteed to terminate if the IV can hit
>   a value STEP times higher than that, i.e. "IV == NITERS - STEP"
>   must be followed by an iteration in which the branch-back
>   condition is false
> - but if NITERS can't represent the actual number of iterations,
>   then there is no value STEP times higher than that
> - we cope with this by starting the IV at -1 and using a limit
>   of "IV < NITERS - STEP" i.e. "IV <= NITERSM1 - STEP".
>
> So you could see this as using a limit based on NITERSM1 with a
> start of -1, although the "< NITERS - STEP" avoids the need to
> subtract 1 at runtime.
>
> But it seems better to use a 0-based IV when we can, since that
> leads to more natural ivopts opportunities.  That's why the loop
> tests for the overflow case and only uses the -1 based IV when
> necessary.

I see.  Thanks for the clarification.

Richard.

> Thanks,
> Richard
>
>>
>> Richard.
>>
>>> This leads to things like:
>>>
>>>   /* Constant case.  */
>>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>> {
>>>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>>
>>>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>>> return true;
>>> }
>>>
>>> in loop_niters_no_overflow.
>>>
> The new mechanism isn't used yet; a later patch replaces the
> "if (1)" with a check for variable VF.  If the patch is OK, I'll
> hold off applying it until the follow-on is ready to go in.

 I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>>>
>>> Thanks.
>>>
>>> Richard
>>>
 Thanks,
 Richard.

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install when the time comes?
>
> Richard
>
>
> 2017-10-13  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c: Include gimple-fold.h.
> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
> niters_maybe_zero parameters.  Handle other cases besides a step of
>> 1.
> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
> Add a path that uses a step of VF instead of 1, but disable it
> for now.
> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
> and niters_no_overflow parameters.  Update calls to
> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
> Create a new SSA name if the latter choses to use a ste other
> than zero, and return it via niters_vector_mult_vf_var.
> * tree-vect-loop.c (vect_transform_loop): Update calls to
> vect_do_peeling, vect_gen_vector_loop_niters and
> 

Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>>  wrote:
 Normally we adjust the vector loop so that it iterates:

(original number of scalar iterations - number of peels) / VF

 times, enforcing this using an IV that starts at zero and increments
 by one each iteration.  However, dividing by VF would be expensive
 for variable VF, so this patch adds an alternative in which the IV
 increments by VF each iteration instead.  We then need to take care
 to handle possible overflow in the IV.
>>>
>>> Hmm, why do you need to handle possible overflow?  Doesn't the
>>> original loop have a natural IV that evolves like this?  After all we
>>> can compute an expression for niters of the scalar loop.
>>
>> The problem comes with loops like:
>>
>>unsigned char i = 0;
>>do
>>  {
>>...
>>i--;
>>  }
>>while (i != 0);
>>
>> The loop statements execute 256 times and the latch executes 255 times.
>> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
>> unsigned char) is 0.
>
> Yes, that's an existing issue and the reason why I introduced
> NITERSM1.  All remaining uses of NITERS should really go away
> because of this corner-case.  So you are introducing a new user?

It's not really an NITERSM1 vs. NITERS thing.  We'd get the same
result/have the same problem with NITERSM1 - (STEP - 1) instead
of NITERS - STEP, namely:

- the new IV uses the same type as NITERS
- we only want the loop to iterate if there are at least STEP scalar
  iterations to go
- this means that the natural limit is "IV <= NITERS - STEP"
  or "IV <= NITERSM1 - (STEP - 1)" (both equivalent)
- the loop is only guaranteed to terminate if the IV can hit
  a value STEP times higher than that, i.e. "IV == NITERS - STEP"
  must be followed by an iteration in which the branch-back
  condition is false
- but if NITERS can't represent the actual number of iterations,
  then there is no value STEP times higher than that
- we cope with this by starting the IV at -1 and using a limit
  of "IV < NITERS - STEP" i.e. "IV <= NITERSM1 - STEP".

So you could see this as using a limit based on NITERSM1 with a
start of -1, although the "< NITERS - STEP" avoids the need to
subtract 1 at runtime.

But it seems better to use a 0-based IV when we can, since that
leads to more natural ivopts opportunities.  That's why the loop
tests for the overflow case and only uses the -1 based IV when
necessary.

Thanks,
Richard

>
> Richard.
>
>> This leads to things like:
>>
>>   /* Constant case.  */
>>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> {
>>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>>
>>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
>> return true;
>> }
>>
>> in loop_niters_no_overflow.
>>
 The new mechanism isn't used yet; a later patch replaces the
 "if (1)" with a check for variable VF.  If the patch is OK, I'll
 hold off applying it until the follow-on is ready to go in.
>>>
>>> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>>
>> Thanks.
>>
>> Richard
>>
>>> Thanks,
>>> Richard.
>>>
 Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
 OK to install when the time comes?

 Richard


 2017-10-13  Richard Sandiford  

 gcc/
 * tree-vect-loop-manip.c: Include gimple-fold.h.
 (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
 niters_maybe_zero parameters.  Handle other cases besides a step of
> 1.
 (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
 Add a path that uses a step of VF instead of 1, but disable it
 for now.
 (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
 and niters_no_overflow parameters.  Update calls to
 slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
 Create a new SSA name if the latter choses to use a ste other
 than zero, and return it via niters_vector_mult_vf_var.
 * tree-vect-loop.c (vect_transform_loop): Update calls to
 vect_do_peeling, vect_gen_vector_loop_niters and
 slpeel_make_loop_iterate_ntimes.
 * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes,
> vect_do_peeling)
 (vect_gen_vector_loop_niters): Update declarations after above
>>> changes.

 Index: gcc/tree-vect-loop-manip.c
 

Re: Add an alternative vector loop iv mechanism

2017-10-19 Thread Richard Biener
On Thu, Oct 19, 2017 at 12:28 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>>  wrote:
>>> Normally we adjust the vector loop so that it iterates:
>>>
>>>(original number of scalar iterations - number of peels) / VF
>>>
>>> times, enforcing this using an IV that starts at zero and increments
>>> by one each iteration.  However, dividing by VF would be expensive
>>> for variable VF, so this patch adds an alternative in which the IV
>>> increments by VF each iteration instead.  We then need to take care
>>> to handle possible overflow in the IV.
>>
>> Hmm, why do you need to handle possible overflow?  Doesn't the
>> original loop have a natural IV that evolves like this?  After all we
>> can compute an expression for niters of the scalar loop.
>
> The problem comes with loops like:
>
>unsigned char i = 0;
>do
>  {
>...
>i--;
>  }
>while (i != 0);
>
> The loop statements execute 256 times and the latch executes 255 times.
> LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
> unsigned char) is 0.

Yes, that's an existing issue and the reason why I introduced
NITERSM1.  All remaining uses of NITERS should really go away
because of this corner-case.  So you are introducing a new user?

Richard.

> This leads to things like:
>
>   /* Constant case.  */
>   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> {
>   tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
>   tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);
>
>   gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
>   gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
>   if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
> return true;
> }
>
> in loop_niters_no_overflow.
>
>>> The new mechanism isn't used yet; a later patch replaces the
>>> "if (1)" with a check for variable VF.  If the patch is OK, I'll
>>> hold off applying it until the follow-on is ready to go in.
>>
>> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.
>
> Thanks.
>
> Richard
>
>> Thanks,
>> Richard.
>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install when the time comes?
>>>
>>> Richard
>>>
>>>
>>> 2017-10-13  Richard Sandiford  
>>>
>>> gcc/
>>> * tree-vect-loop-manip.c: Include gimple-fold.h.
>>> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>>> niters_maybe_zero parameters.  Handle other cases besides a step of 
>>> 1.
>>> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>>> Add a path that uses a step of VF instead of 1, but disable it
>>> for now.
>>> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>>> and niters_no_overflow parameters.  Update calls to
>>> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>>> Create a new SSA name if the latter choses to use a ste other
>>> than zero, and return it via niters_vector_mult_vf_var.
>>> * tree-vect-loop.c (vect_transform_loop): Update calls to
>>> vect_do_peeling, vect_gen_vector_loop_niters and
>>> slpeel_make_loop_iterate_ntimes.
>>> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, 
>>> vect_do_peeling)
>>> (vect_gen_vector_loop_niters): Update declarations after above
>> changes.
>>>
>>> Index: gcc/tree-vect-loop-manip.c
>>> ===
>>> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
>>> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
>>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>>  #include "tree-scalar-evolution.h"
>>>  #include "tree-vectorizer.h"
>>>  #include "tree-ssa-loop-ivopts.h"
>>> +#include "gimple-fold.h"
>>>
>>>  /*
>>>Simple Loop Peeling Utilities
>>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>>> gimple_bb (update_phi));
>>>  }
>>>
>>> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
>>> -   that starts at zero, increases by one and its limit is NITERS.
>>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>>> +   where NITERS is known to be outside the range [1, STEP - 1].
>>> +   This is equivalent to making the loop execute NITERS / STEP
>>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>>> +   where M is the precision of NITERS.
>>> +
>>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
>>> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
>>> +
>>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>>> +   N * STEP on exit from the 

Re: Add an alternative vector loop iv mechanism

2017-10-18 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
>  wrote:
>> Normally we adjust the vector loop so that it iterates:
>>
>>(original number of scalar iterations - number of peels) / VF
>>
>> times, enforcing this using an IV that starts at zero and increments
>> by one each iteration.  However, dividing by VF would be expensive
>> for variable VF, so this patch adds an alternative in which the IV
>> increments by VF each iteration instead.  We then need to take care
>> to handle possible overflow in the IV.
>
> Hmm, why do you need to handle possible overflow?  Doesn't the
> original loop have a natural IV that evolves like this?  After all we
> can compute an expression for niters of the scalar loop.

The problem comes with loops like:

   unsigned char i = 0;
   do
 {
   ...
   i--;
 }
   while (i != 0);

The loop statements execute 256 times and the latch executes 255 times.
LOOP_VINFO_NITERSM1 is then 255 but LOOP_VINFO_NITERS (stored as an
unsigned char) is 0.

This leads to things like:

  /* Constant case.  */
  if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
{
  tree cst_niters = LOOP_VINFO_NITERS (loop_vinfo);
  tree cst_nitersm1 = LOOP_VINFO_NITERSM1 (loop_vinfo);

  gcc_assert (TREE_CODE (cst_niters) == INTEGER_CST);
  gcc_assert (TREE_CODE (cst_nitersm1) == INTEGER_CST);
  if (wi::to_widest (cst_nitersm1) < wi::to_widest (cst_niters))
return true;
}

in loop_niters_no_overflow.

>> The new mechanism isn't used yet; a later patch replaces the
>> "if (1)" with a check for variable VF.  If the patch is OK, I'll
>> hold off applying it until the follow-on is ready to go in.
>
> I indeed don't like code that isn't exercised.  Otherwise looks reasonable.

Thanks.

Richard

> Thanks,
> Richard.
>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install when the time comes?
>>
>> Richard
>>
>>
>> 2017-10-13  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-loop-manip.c: Include gimple-fold.h.
>> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
>> niters_maybe_zero parameters.  Handle other cases besides a step of 
>> 1.
>> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
>> Add a path that uses a step of VF instead of 1, but disable it
>> for now.
>> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
>> and niters_no_overflow parameters.  Update calls to
>> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
>> Create a new SSA name if the latter choses to use a ste other
>> than zero, and return it via niters_vector_mult_vf_var.
>> * tree-vect-loop.c (vect_transform_loop): Update calls to
>> vect_do_peeling, vect_gen_vector_loop_niters and
>> slpeel_make_loop_iterate_ntimes.
>> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, 
>> vect_do_peeling)
>> (vect_gen_vector_loop_niters): Update declarations after above
> changes.
>>
>> Index: gcc/tree-vect-loop-manip.c
>> ===
>> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
>> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
>> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>>  #include "tree-scalar-evolution.h"
>>  #include "tree-vectorizer.h"
>>  #include "tree-ssa-loop-ivopts.h"
>> +#include "gimple-fold.h"
>>
>>  /*
>>Simple Loop Peeling Utilities
>> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
>> gimple_bb (update_phi));
>>  }
>>
>> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
>> -   that starts at zero, increases by one and its limit is NITERS.
>> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
>> +   where NITERS is known to be outside the range [1, STEP - 1].
>> +   This is equivalent to making the loop execute NITERS / STEP
>> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
>> +   where M is the precision of NITERS.
>> +
>> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
>> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
>> +
>> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
>> +   N * STEP on exit from the loop.
>>
>> Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
>>
>>  void
>> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
>> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
>> +tree final_iv, bool niters_maybe_zero)
>>  {
>>tree indx_before_incr, indx_after_incr;
>>gcond *cond_stmt;
>>gcond *orig_cond;
>> +  

Re: Add an alternative vector loop iv mechanism

2017-10-18 Thread Richard Biener
On Fri, Oct 13, 2017 at 4:10 PM, Richard Sandiford
 wrote:
> Normally we adjust the vector loop so that it iterates:
>
>(original number of scalar iterations - number of peels) / VF
>
> times, enforcing this using an IV that starts at zero and increments
> by one each iteration.  However, dividing by VF would be expensive
> for variable VF, so this patch adds an alternative in which the IV
> increments by VF each iteration instead.  We then need to take care
> to handle possible overflow in the IV.

Hmm, why do you need to handle possible overflow?  Doesn't the
original loop have a natural IV that evolves like this?  After all we
can compute an expression for niters of the scalar loop.

> The new mechanism isn't used yet; a later patch replaces the
> "if (1)" with a check for variable VF.  If the patch is OK, I'll
> hold off applying it until the follow-on is ready to go in.

I indeed don't like code that isn't exercised.  Otherwise looks reasonable.

Thanks,
Richard.

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install when the time comes?
>
> Richard
>
>
> 2017-10-13  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c: Include gimple-fold.h.
> (slpeel_make_loop_iterate_ntimes): Add step, final_iv and
> niters_maybe_zero parameters.  Handle other cases besides a step of 1.
> (vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
> Add a path that uses a step of VF instead of 1, but disable it
> for now.
> (vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
> and niters_no_overflow parameters.  Update calls to
> slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
> Create a new SSA name if the latter choses to use a ste other
> than zero, and return it via niters_vector_mult_vf_var.
> * tree-vect-loop.c (vect_transform_loop): Update calls to
> vect_do_peeling, vect_gen_vector_loop_niters and
> slpeel_make_loop_iterate_ntimes.
> * tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
> (vect_gen_vector_loop_niters): Update declarations after above 
> changes.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
> +++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
> @@ -41,6 +41,7 @@ Software Foundation; either version 3, o
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
>  #include "tree-ssa-loop-ivopts.h"
> +#include "gimple-fold.h"
>
>  /*
>Simple Loop Peeling Utilities
> @@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
> gimple_bb (update_phi));
>  }
>
> -/* Make the LOOP iterate NITERS times. This is done by adding a new IV
> -   that starts at zero, increases by one and its limit is NITERS.
> +/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
> +   where NITERS is known to be outside the range [1, STEP - 1].
> +   This is equivalent to making the loop execute NITERS / STEP
> +   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
> +   where M is the precision of NITERS.
> +
> +   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
> +   to be >= STEP.  In the latter case N is always NITERS / STEP.
> +
> +   If FINAL_IV is nonnull, it is an SSA name that should be set to
> +   N * STEP on exit from the loop.
>
> Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
>
>  void
> -slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
> +slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
> +tree final_iv, bool niters_maybe_zero)
>  {
>tree indx_before_incr, indx_after_incr;
>gcond *cond_stmt;
>gcond *orig_cond;
> +  edge pe = loop_preheader_edge (loop);
>edge exit_edge = single_exit (loop);
>gimple_stmt_iterator loop_cond_gsi;
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
> -  tree init = build_int_cst (TREE_TYPE (niters), 0);
> -  tree step = build_int_cst (TREE_TYPE (niters), 1);
>source_location loop_loc;
>enum tree_code code;
> +  tree niters_type = TREE_TYPE (niters);
>
>orig_cond = get_loop_exit_condition (loop);
>gcc_assert (orig_cond);
>loop_cond_gsi = gsi_for_stmt (orig_cond);
>
> +  tree init, limit;
> +  if (!niters_maybe_zero && integer_onep (step))
> +{
> +  /* In this case we can use a simple 0-based IV:
> +
> +A:
> +  x = 0;
> +  do
> +{
> +  ...
> +  x += 1;
> +}
> +  while (x < NITERS);  */
> +  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
> +  

Add an alternative vector loop iv mechanism

2017-10-13 Thread Richard Sandiford
Normally we adjust the vector loop so that it iterates:

   (original number of scalar iterations - number of peels) / VF

times, enforcing this using an IV that starts at zero and increments
by one each iteration.  However, dividing by VF would be expensive
for variable VF, so this patch adds an alternative in which the IV
increments by VF each iteration instead.  We then need to take care
to handle possible overflow in the IV.

The new mechanism isn't used yet; a later patch replaces the
"if (1)" with a check for variable VF.  If the patch is OK, I'll
hold off applying it until the follow-on is ready to go in.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install when the time comes?

Richard


2017-10-13  Richard Sandiford  

gcc/
* tree-vect-loop-manip.c: Include gimple-fold.h.
(slpeel_make_loop_iterate_ntimes): Add step, final_iv and
niters_maybe_zero parameters.  Handle other cases besides a step of 1.
(vect_gen_vector_loop_niters): Add a step_vector_ptr parameter.
Add a path that uses a step of VF instead of 1, but disable it
for now.
(vect_do_peeling): Add step_vector, niters_vector_mult_vf_var
and niters_no_overflow parameters.  Update calls to
slpeel_make_loop_iterate_ntimes and vect_gen_vector_loop_niters.
Create a new SSA name if the latter choses to use a ste other
than zero, and return it via niters_vector_mult_vf_var.
* tree-vect-loop.c (vect_transform_loop): Update calls to
vect_do_peeling, vect_gen_vector_loop_niters and
slpeel_make_loop_iterate_ntimes.
* tree-vectorizer.h (slpeel_make_loop_iterate_ntimes, vect_do_peeling)
(vect_gen_vector_loop_niters): Update declarations after above changes.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.144777367 +0100
+++ gcc/tree-vect-loop-manip.c  2017-10-13 15:01:40.296014347 +0100
@@ -41,6 +41,7 @@ Software Foundation; either version 3, o
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "tree-ssa-loop-ivopts.h"
+#include "gimple-fold.h"
 
 /*
   Simple Loop Peeling Utilities
@@ -247,30 +248,115 @@ adjust_phi_and_debug_stmts (gimple *upda
gimple_bb (update_phi));
 }
 
-/* Make the LOOP iterate NITERS times. This is done by adding a new IV
-   that starts at zero, increases by one and its limit is NITERS.
+/* Make LOOP iterate N == (NITERS - STEP) / STEP + 1 times,
+   where NITERS is known to be outside the range [1, STEP - 1].
+   This is equivalent to making the loop execute NITERS / STEP
+   times when NITERS is nonzero and (1 << M) / STEP times otherwise,
+   where M is the precision of NITERS.
+
+   NITERS_MAYBE_ZERO is true if NITERS can be zero, false it is known
+   to be >= STEP.  In the latter case N is always NITERS / STEP.
+
+   If FINAL_IV is nonnull, it is an SSA name that should be set to
+   N * STEP on exit from the loop.
 
Assumption: the exit-condition of LOOP is the last stmt in the loop.  */
 
 void
-slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters)
+slpeel_make_loop_iterate_ntimes (struct loop *loop, tree niters, tree step,
+tree final_iv, bool niters_maybe_zero)
 {
   tree indx_before_incr, indx_after_incr;
   gcond *cond_stmt;
   gcond *orig_cond;
+  edge pe = loop_preheader_edge (loop);
   edge exit_edge = single_exit (loop);
   gimple_stmt_iterator loop_cond_gsi;
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
-  tree init = build_int_cst (TREE_TYPE (niters), 0);
-  tree step = build_int_cst (TREE_TYPE (niters), 1);
   source_location loop_loc;
   enum tree_code code;
+  tree niters_type = TREE_TYPE (niters);
 
   orig_cond = get_loop_exit_condition (loop);
   gcc_assert (orig_cond);
   loop_cond_gsi = gsi_for_stmt (orig_cond);
 
+  tree init, limit;
+  if (!niters_maybe_zero && integer_onep (step))
+{
+  /* In this case we can use a simple 0-based IV:
+
+A:
+  x = 0;
+  do
+{
+  ...
+  x += 1;
+}
+  while (x < NITERS);  */
+  code = (exit_edge->flags & EDGE_TRUE_VALUE) ? GE_EXPR : LT_EXPR;
+  init = build_zero_cst (niters_type);
+  limit = niters;
+}
+  else
+{
+  /* The following works for all values of NITERS except 0:
+
+B:
+  x = 0;
+  do
+{
+  ...
+  x += STEP;
+}
+  while (x <= NITERS - STEP);
+
+so that the loop continues to iterate if x + STEP - 1 < NITERS
+but stops if x + STEP - 1 >= NITERS.
+
+However, if NITERS is zero, x never hits a value above NITERS - STEP
+before wrapping around.  There are two obvious ways of