Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-08 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Nov 7, 2019 at 6:15 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> Richard Biener  writes:
>> >> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
>> >> >  wrote:
>> >> >>
>> >> >> This patch adds a mode in which the vectoriser tries each available
>> >> >> base vector mode and picks the one with the lowest cost.  For now
>> >> >> the behaviour is behind a default-off --param, but a later patch
>> >> >> enables it by default for SVE.
>> >> >>
>> >> >> The patch keeps the current behaviour of preferring a VF of
>> >> >> loop->simdlen over any larger or smaller VF, regardless of costs
>> >> >> or target preferences.
>> >> >
>> >> > Can you avoid using a --param for this?  Instead I'd suggest to
>> >> > amend the vectorize_modes target hook to return some
>> >> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
>> >> > to make the target able to say do-not-vectorize-epiloges-of-MODE
>> >> > (I think we may not want to vectorize SSE vectorized loop
>> >> > epilogues with MMX-with-SSE or GPRs for example).  I guess
>> >> > for the latter we'd use a new target hook.
>> >>
>> >> The reason for using a --param was that I wanted a way of turning
>> >> this on and off on the command line, so that users can experiment
>> >> with it if necessary.  E.g. enabling the --param could be a viable
>> >> alternative to -mprefix-* in some cases.  Disabling it would be
>> >> a way of working around a bad cost model decision without going
>> >> all the way to -fno-vect-cost-model.
>> >>
>> >> These kinds of --params can become useful workarounds until an
>> >> optimisation bug is fixed.
>> >
>> > I'm arguing that the default depends on the actual ISAs so there isn't
>> > a one-fits all and given we have OMP SIMD and target cloning for
>> > multiple ISAs this looks like a wrong approach.  For sure the
>> > target can use its own switches to override defaults here, or alternatively
>> > we might want to have a #pragma GCC simdlen mimicing OMP behavior
>> > here.
>>
>> I agree there's no one-size-fits-all choice here, but that's true for
>> other --params too.  The problem with using target switches is that we
>> have to explain them and to keep accepting them "forever" (or at least
>> with a long deprecation period).
>
> Fortunately next week you'll be able to add target specific --params
> to your targets .opt file ;)

Nice!  That definitely sounds like a good compromise. :-)  I'll hold off
on 6/6 until Martin's patches have gone in.  There are a couple of other
SVE things that would benefit from that too.

Thanks,
Richard


Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-08 Thread Richard Biener
On Thu, Nov 7, 2019 at 6:15 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
> >  wrote:
> >>
> >> Richard Biener  writes:
> >> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> This patch adds a mode in which the vectoriser tries each available
> >> >> base vector mode and picks the one with the lowest cost.  For now
> >> >> the behaviour is behind a default-off --param, but a later patch
> >> >> enables it by default for SVE.
> >> >>
> >> >> The patch keeps the current behaviour of preferring a VF of
> >> >> loop->simdlen over any larger or smaller VF, regardless of costs
> >> >> or target preferences.
> >> >
> >> > Can you avoid using a --param for this?  Instead I'd suggest to
> >> > amend the vectorize_modes target hook to return some
> >> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
> >> > to make the target able to say do-not-vectorize-epiloges-of-MODE
> >> > (I think we may not want to vectorize SSE vectorized loop
> >> > epilogues with MMX-with-SSE or GPRs for example).  I guess
> >> > for the latter we'd use a new target hook.
> >>
> >> The reason for using a --param was that I wanted a way of turning
> >> this on and off on the command line, so that users can experiment
> >> with it if necessary.  E.g. enabling the --param could be a viable
> >> alternative to -mprefix-* in some cases.  Disabling it would be
> >> a way of working around a bad cost model decision without going
> >> all the way to -fno-vect-cost-model.
> >>
> >> These kinds of --params can become useful workarounds until an
> >> optimisation bug is fixed.
> >
> > I'm arguing that the default depends on the actual ISAs so there isn't
> > a one-fits all and given we have OMP SIMD and target cloning for
> > multiple ISAs this looks like a wrong approach.  For sure the
> > target can use its own switches to override defaults here, or alternatively
> > we might want to have a #pragma GCC simdlen mimicing OMP behavior
> > here.
>
> I agree there's no one-size-fits-all choice here, but that's true for
> other --params too.  The problem with using target switches is that we
> have to explain them and to keep accepting them "forever" (or at least
> with a long deprecation period).

Fortunately next week you'll be able to add target specific --params
to your targets .opt file ;)

>  Whereas the --param was just something
> that people could play with or perhaps use to work around problems
> temporarily.  It would come with no guarantees attached.  And what the
> --param did applied to any targets that support multiple modes,
> regardless of what the targets do by default.
>
> All that said, here's a version that returns the bitmask you suggested.
> I ended up making the flag select the new behaviour and 0 select the
> current behaviour, rather than have a flag for "first mode wins".
> Tested as before.

OK.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2019-11-07  Richard Sandiford  
>
> gcc/
> * target.h (VECT_COMPARE_COSTS): New constant.
> * target.def (autovectorize_vector_modes): Return a bitmask of flags.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_autovectorize_vector_modes): Update 
> accordingly.
> * targhooks.c (default_autovectorize_vector_modes): Likewise.
> * config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
> Likewise.
> * config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
> * config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
> * config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
> * config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
> * tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
> (_loop_vec_info::vec_inside_cost): New member variables.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
> (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
> (vect_analyze_loop): When autovectorize_vector_modes returns
> VECT_COMPARE_COSTS, try vectorizing the loop with each available
> vector mode and picking the one with the lowest cost.
> (vect_estimate_min_profitable_iters): Record the computed costs
> in the loop_vec_info.
>
> Index: gcc/target.h
> ===
> --- gcc/target.h2019-11-07 15:11:15.831017985 +
> +++ gcc/target.h2019-11-07 16:52:30.037198353 +
> @@ -218,6 +218,14 @@ enum omp_device_kind_arch_isa {
>omp_device_isa
>  };
>
> +/* Flags returned by TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES:
> +
> +   VECT_COMPARE_COSTS
> +   Tells the loop vectorizer to try all the provided modes and
> +   pick the one with the lowest cost.  By default the vectorizer
> +   will choose the first mode that works.  */
> +const unsigned int VECT_COMPARE_COSTS = 1U << 0;
> +
>  

Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-07 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
>> >  wrote:
>> >>
>> >> This patch adds a mode in which the vectoriser tries each available
>> >> base vector mode and picks the one with the lowest cost.  For now
>> >> the behaviour is behind a default-off --param, but a later patch
>> >> enables it by default for SVE.
>> >>
>> >> The patch keeps the current behaviour of preferring a VF of
>> >> loop->simdlen over any larger or smaller VF, regardless of costs
>> >> or target preferences.
>> >
>> > Can you avoid using a --param for this?  Instead I'd suggest to
>> > amend the vectorize_modes target hook to return some
>> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
>> > to make the target able to say do-not-vectorize-epiloges-of-MODE
>> > (I think we may not want to vectorize SSE vectorized loop
>> > epilogues with MMX-with-SSE or GPRs for example).  I guess
>> > for the latter we'd use a new target hook.
>>
>> The reason for using a --param was that I wanted a way of turning
>> this on and off on the command line, so that users can experiment
>> with it if necessary.  E.g. enabling the --param could be a viable
>> alternative to -mprefix-* in some cases.  Disabling it would be
>> a way of working around a bad cost model decision without going
>> all the way to -fno-vect-cost-model.
>>
>> These kinds of --params can become useful workarounds until an
>> optimisation bug is fixed.
>
> I'm arguing that the default depends on the actual ISAs so there isn't
> a one-fits all and given we have OMP SIMD and target cloning for
> multiple ISAs this looks like a wrong approach.  For sure the
> target can use its own switches to override defaults here, or alternatively
> we might want to have a #pragma GCC simdlen mimicing OMP behavior
> here.

I agree there's no one-size-fits-all choice here, but that's true for
other --params too.  The problem with using target switches is that we
have to explain them and to keep accepting them "forever" (or at least
with a long deprecation period).  Whereas the --param was just something
that people could play with or perhaps use to work around problems
temporarily.  It would come with no guarantees attached.  And what the
--param did applied to any targets that support multiple modes,
regardless of what the targets do by default.

All that said, here's a version that returns the bitmask you suggested.
I ended up making the flag select the new behaviour and 0 select the
current behaviour, rather than have a flag for "first mode wins".
Tested as before.

Thanks,
Richard


2019-11-07  Richard Sandiford  

gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and picking the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.

Index: gcc/target.h
===
--- gcc/target.h2019-11-07 15:11:15.831017985 +
+++ gcc/target.h2019-11-07 16:52:30.037198353 +
@@ -218,6 +218,14 @@ enum omp_device_kind_arch_isa {
   omp_device_isa
 };
 
+/* Flags returned by TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES:
+
+   VECT_COMPARE_COSTS
+   Tells the loop vectorizer to try all the provided modes and
+   pick the one with the lowest cost.  By default the vectorizer
+   will choose the first mode that works.  */
+const unsigned int VECT_COMPARE_COSTS = 1U << 0;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/target.def
===
--- gcc/target.def  2019-11-07 15:11:15.819018071 +
+++ gcc/target.def  2019-11-07 

Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Biener
On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
> >  wrote:
> >>
> >> This patch adds a mode in which the vectoriser tries each available
> >> base vector mode and picks the one with the lowest cost.  For now
> >> the behaviour is behind a default-off --param, but a later patch
> >> enables it by default for SVE.
> >>
> >> The patch keeps the current behaviour of preferring a VF of
> >> loop->simdlen over any larger or smaller VF, regardless of costs
> >> or target preferences.
> >
> > Can you avoid using a --param for this?  Instead I'd suggest to
> > amend the vectorize_modes target hook to return some
> > flags like VECT_FIRST_MODE_WINS.  We'd eventually want
> > to make the target able to say do-not-vectorize-epiloges-of-MODE
> > (I think we may not want to vectorize SSE vectorized loop
> > epilogues with MMX-with-SSE or GPRs for example).  I guess
> > for the latter we'd use a new target hook.
>
> The reason for using a --param was that I wanted a way of turning
> this on and off on the command line, so that users can experiment
> with it if necessary.  E.g. enabling the --param could be a viable
> alternative to -mprefix-* in some cases.  Disabling it would be
> a way of working around a bad cost model decision without going
> all the way to -fno-vect-cost-model.
>
> These kinds of --params can become useful workarounds until an
> optimisation bug is fixed.

I'm arguing that the default depends on the actual ISAs so there isn't
a one-fits all and given we have OMP SIMD and target cloning for
multiple ISAs this looks like a wrong approach.  For sure the
target can use its own switches to override defaults here, or alternatively
we might want to have a #pragma GCC simdlen mimicing OMP behavior
here.

Richard.

>
> Thanks,
> Richard


Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
>  wrote:
>>
>> This patch adds a mode in which the vectoriser tries each available
>> base vector mode and picks the one with the lowest cost.  For now
>> the behaviour is behind a default-off --param, but a later patch
>> enables it by default for SVE.
>>
>> The patch keeps the current behaviour of preferring a VF of
>> loop->simdlen over any larger or smaller VF, regardless of costs
>> or target preferences.
>
> Can you avoid using a --param for this?  Instead I'd suggest to
> amend the vectorize_modes target hook to return some
> flags like VECT_FIRST_MODE_WINS.  We'd eventually want
> to make the target able to say do-not-vectorize-epiloges-of-MODE
> (I think we may not want to vectorize SSE vectorized loop
> epilogues with MMX-with-SSE or GPRs for example).  I guess
> for the latter we'd use a new target hook.

The reason for using a --param was that I wanted a way of turning
this on and off on the command line, so that users can experiment
with it if necessary.  E.g. enabling the --param could be a viable
alternative to -mprefix-* in some cases.  Disabling it would be
a way of working around a bad cost model decision without going
all the way to -fno-vect-cost-model.

These kinds of --params can become useful workarounds until an
optimisation bug is fixed.

Thanks,
Richard


Re: [4/6] Optionally pick the cheapest loop_vec_info

2019-11-06 Thread Richard Biener
On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford
 wrote:
>
> This patch adds a mode in which the vectoriser tries each available
> base vector mode and picks the one with the lowest cost.  For now
> the behaviour is behind a default-off --param, but a later patch
> enables it by default for SVE.
>
> The patch keeps the current behaviour of preferring a VF of
> loop->simdlen over any larger or smaller VF, regardless of costs
> or target preferences.

Can you avoid using a --param for this?  Instead I'd suggest to
amend the vectorize_modes target hook to return some
flags like VECT_FIRST_MODE_WINS.  We'd eventually want
to make the target able to say do-not-vectorize-epiloges-of-MODE
(I think we may not want to vectorize SSE vectorized loop
epilogues with MMX-with-SSE or GPRs for example).  I guess
for the latter we'd use a new target hook.

Otherwise looks reasonable.

Richard.

>
> 2019-11-05  Richard Sandiford  
>
> gcc/
> * params.def (vect-compare-loop-costs): New param.
> * doc/invoke.texi: Document it.
> * tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
> (_loop_vec_info::vec_inside_cost): New member variables.
> * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
> (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
> (vect_analyze_loop): When the new parameter allows, try vectorizing
> the loop with each available vector mode and picking the one with
> the lowest cost.
> (vect_estimate_min_profitable_iters): Record the computed costs
> in the loop_vec_info.
>
> Index: gcc/params.def
> ===
> --- gcc/params.def  2019-10-31 17:15:25.470517368 +
> +++ gcc/params.def  2019-11-05 14:19:58.781197820 +
> @@ -661,6 +661,13 @@ DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIG
>   "Maximum number of loop peels to enhance alignment of data 
> references in a loop.",
>   -1, -1, 64)
>
> +DEFPARAM(PARAM_VECT_COMPARE_LOOP_COSTS,
> +"vect-compare-loop-costs",
> +"Whether to try vectorizing a loop using each supported"
> +" combination of vector types and picking the version with the"
> +" lowest cost.",
> +0, 0, 1)
> +
>  DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
>  "max-cselib-memory-locations",
>  "The maximum memory locations recorded by cselib.",
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi 2019-11-04 21:13:57.611756365 +
> +++ gcc/doc/invoke.texi 2019-11-05 14:19:58.777197850 +
> @@ -11563,6 +11563,12 @@ doing loop versioning for alias in the v
>  The maximum number of loop peels to enhance access alignment
>  for vectorizer. Value -1 means no limit.
>
> +@item vect-compare-loop-costs
> +Whether to try vectorizing a loop using each supported combination of
> +vector types and picking the version with the lowest cost.  This parameter
> +has no effect when @option{-fno-vect-cost-model} or
> +@option{-fvect-cost-model=unlimited} are used.
> +
>  @item max-iterations-to-track
>  The maximum number of iterations of a loop the brute-force algorithm
>  for analysis of the number of iterations of the loop tries to evaluate.
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-05 14:19:33.829371745 +
> +++ gcc/tree-vectorizer.h   2019-11-05 14:19:58.781197820 +
> @@ -601,6 +601,13 @@ typedef class _loop_vec_info : public ve
>/* Cost of a single scalar iteration.  */
>int single_scalar_iteration_cost;
>
> +  /* The cost of the vector prologue and epilogue, including peeled
> + iterations and set-up code.  */
> +  int vec_outside_cost;
> +
> +  /* The cost of the vector loop body.  */
> +  int vec_inside_cost;
> +
>/* Is the loop vectorizable? */
>bool vectorizable;
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-05 14:19:33.829371745 +
> +++ gcc/tree-vect-loop.c2019-11-05 14:19:58.781197820 +
> @@ -830,6 +830,8 @@ _loop_vec_info::_loop_vec_info (class lo
>  scan_map (NULL),
>  slp_unrolling_factor (1),
>  single_scalar_iteration_cost (0),
> +vec_outside_cost (0),
> +vec_inside_cost (0),
>  vectorizable (false),
>  can_fully_mask_p (true),
>  fully_masked_p (false),
> @@ -2373,6 +2375,80 @@ vect_analyze_loop_2 (loop_vec_info loop_
>goto start_over;
>  }
>
> +/* Return true if vectorizing a loop using NEW_LOOP_VINFO appears
> +   to be better than vectorizing it using OLD_LOOP_VINFO.  Assume that
> +   OLD_LOOP_VINFO is better unless something specifically indicates
> +   otherwise.
> +
> +   Note that this deliberately isn't a partial order.  */
> +
> +static bool
> +vect_better_loop_vinfo_p