Re: [4/6] Optionally pick the cheapest loop_vec_info
Richard Biener writes: > On Thu, Nov 7, 2019 at 6:15 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford >> > wrote: >> >> >> >> Richard Biener writes: >> >> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford >> >> > wrote: >> >> >> >> >> >> This patch adds a mode in which the vectoriser tries each available >> >> >> base vector mode and picks the one with the lowest cost. For now >> >> >> the behaviour is behind a default-off --param, but a later patch >> >> >> enables it by default for SVE. >> >> >> >> >> >> The patch keeps the current behaviour of preferring a VF of >> >> >> loop->simdlen over any larger or smaller VF, regardless of costs >> >> >> or target preferences. >> >> > >> >> > Can you avoid using a --param for this? Instead I'd suggest to >> >> > amend the vectorize_modes target hook to return some >> >> > flags like VECT_FIRST_MODE_WINS. We'd eventually want >> >> > to make the target able to say do-not-vectorize-epiloges-of-MODE >> >> > (I think we may not want to vectorize SSE vectorized loop >> >> > epilogues with MMX-with-SSE or GPRs for example). I guess >> >> > for the latter we'd use a new target hook. >> >> >> >> The reason for using a --param was that I wanted a way of turning >> >> this on and off on the command line, so that users can experiment >> >> with it if necessary. E.g. enabling the --param could be a viable >> >> alternative to -mprefix-* in some cases. Disabling it would be >> >> a way of working around a bad cost model decision without going >> >> all the way to -fno-vect-cost-model. >> >> >> >> These kinds of --params can become useful workarounds until an >> >> optimisation bug is fixed. >> > >> > I'm arguing that the default depends on the actual ISAs so there isn't >> > a one-fits all and given we have OMP SIMD and target cloning for >> > multiple ISAs this looks like a wrong approach. For sure the >> > target can use its own switches to override defaults here, or alternatively >> > we might want to have a #pragma GCC simdlen mimicing OMP behavior >> > here. >> >> I agree there's no one-size-fits-all choice here, but that's true for >> other --params too. The problem with using target switches is that we >> have to explain them and to keep accepting them "forever" (or at least >> with a long deprecation period). > > Fortunately next week you'll be able to add target specific --params > to your targets .opt file ;) Nice! That definitely sounds like a good compromise. :-) I'll hold off on 6/6 until Martin's patches have gone in. There are a couple of other SVE things that would benefit from that too. Thanks, Richard
Re: [4/6] Optionally pick the cheapest loop_vec_info
On Thu, Nov 7, 2019 at 6:15 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford > > wrote: > >> > >> Richard Biener writes: > >> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford > >> > wrote: > >> >> > >> >> This patch adds a mode in which the vectoriser tries each available > >> >> base vector mode and picks the one with the lowest cost. For now > >> >> the behaviour is behind a default-off --param, but a later patch > >> >> enables it by default for SVE. > >> >> > >> >> The patch keeps the current behaviour of preferring a VF of > >> >> loop->simdlen over any larger or smaller VF, regardless of costs > >> >> or target preferences. > >> > > >> > Can you avoid using a --param for this? Instead I'd suggest to > >> > amend the vectorize_modes target hook to return some > >> > flags like VECT_FIRST_MODE_WINS. We'd eventually want > >> > to make the target able to say do-not-vectorize-epiloges-of-MODE > >> > (I think we may not want to vectorize SSE vectorized loop > >> > epilogues with MMX-with-SSE or GPRs for example). I guess > >> > for the latter we'd use a new target hook. > >> > >> The reason for using a --param was that I wanted a way of turning > >> this on and off on the command line, so that users can experiment > >> with it if necessary. E.g. enabling the --param could be a viable > >> alternative to -mprefix-* in some cases. Disabling it would be > >> a way of working around a bad cost model decision without going > >> all the way to -fno-vect-cost-model. > >> > >> These kinds of --params can become useful workarounds until an > >> optimisation bug is fixed. > > > > I'm arguing that the default depends on the actual ISAs so there isn't > > a one-fits all and given we have OMP SIMD and target cloning for > > multiple ISAs this looks like a wrong approach. For sure the > > target can use its own switches to override defaults here, or alternatively > > we might want to have a #pragma GCC simdlen mimicing OMP behavior > > here. > > I agree there's no one-size-fits-all choice here, but that's true for > other --params too. The problem with using target switches is that we > have to explain them and to keep accepting them "forever" (or at least > with a long deprecation period). Fortunately next week you'll be able to add target specific --params to your targets .opt file ;) > Whereas the --param was just something > that people could play with or perhaps use to work around problems > temporarily. It would come with no guarantees attached. And what the > --param did applied to any targets that support multiple modes, > regardless of what the targets do by default. > > All that said, here's a version that returns the bitmask you suggested. > I ended up making the flag select the new behaviour and 0 select the > current behaviour, rather than have a flag for "first mode wins". > Tested as before. OK. Thanks, Richard. > Thanks, > Richard > > > 2019-11-07 Richard Sandiford > > gcc/ > * target.h (VECT_COMPARE_COSTS): New constant. > * target.def (autovectorize_vector_modes): Return a bitmask of flags. > * doc/tm.texi: Regenerate. > * targhooks.h (default_autovectorize_vector_modes): Update > accordingly. > * targhooks.c (default_autovectorize_vector_modes): Likewise. > * config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes): > Likewise. > * config/arc/arc.c (arc_autovectorize_vector_modes): Likewise. > * config/arm/arm.c (arm_autovectorize_vector_modes): Likewise. > * config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise. > * config/mips/mips.c (mips_autovectorize_vector_modes): Likewise. > * tree-vectorizer.h (_loop_vec_info::vec_outside_cost) > (_loop_vec_info::vec_inside_cost): New member variables. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them. > (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions. > (vect_analyze_loop): When autovectorize_vector_modes returns > VECT_COMPARE_COSTS, try vectorizing the loop with each available > vector mode and picking the one with the lowest cost. > (vect_estimate_min_profitable_iters): Record the computed costs > in the loop_vec_info. > > Index: gcc/target.h > === > --- gcc/target.h2019-11-07 15:11:15.831017985 + > +++ gcc/target.h2019-11-07 16:52:30.037198353 + > @@ -218,6 +218,14 @@ enum omp_device_kind_arch_isa { >omp_device_isa > }; > > +/* Flags returned by TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES: > + > + VECT_COMPARE_COSTS > + Tells the loop vectorizer to try all the provided modes and > + pick the one with the lowest cost. By default the vectorizer > + will choose the first mode that works. */ > +const unsigned int VECT_COMPARE_COSTS = 1U << 0; > + > /*
Re: [4/6] Optionally pick the cheapest loop_vec_info
Richard Biener writes: > On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford > wrote: >> >> Richard Biener writes: >> > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford >> > wrote: >> >> >> >> This patch adds a mode in which the vectoriser tries each available >> >> base vector mode and picks the one with the lowest cost. For now >> >> the behaviour is behind a default-off --param, but a later patch >> >> enables it by default for SVE. >> >> >> >> The patch keeps the current behaviour of preferring a VF of >> >> loop->simdlen over any larger or smaller VF, regardless of costs >> >> or target preferences. >> > >> > Can you avoid using a --param for this? Instead I'd suggest to >> > amend the vectorize_modes target hook to return some >> > flags like VECT_FIRST_MODE_WINS. We'd eventually want >> > to make the target able to say do-not-vectorize-epiloges-of-MODE >> > (I think we may not want to vectorize SSE vectorized loop >> > epilogues with MMX-with-SSE or GPRs for example). I guess >> > for the latter we'd use a new target hook. >> >> The reason for using a --param was that I wanted a way of turning >> this on and off on the command line, so that users can experiment >> with it if necessary. E.g. enabling the --param could be a viable >> alternative to -mprefix-* in some cases. Disabling it would be >> a way of working around a bad cost model decision without going >> all the way to -fno-vect-cost-model. >> >> These kinds of --params can become useful workarounds until an >> optimisation bug is fixed. > > I'm arguing that the default depends on the actual ISAs so there isn't > a one-fits all and given we have OMP SIMD and target cloning for > multiple ISAs this looks like a wrong approach. For sure the > target can use its own switches to override defaults here, or alternatively > we might want to have a #pragma GCC simdlen mimicing OMP behavior > here. I agree there's no one-size-fits-all choice here, but that's true for other --params too. The problem with using target switches is that we have to explain them and to keep accepting them "forever" (or at least with a long deprecation period). Whereas the --param was just something that people could play with or perhaps use to work around problems temporarily. It would come with no guarantees attached. And what the --param did applied to any targets that support multiple modes, regardless of what the targets do by default. All that said, here's a version that returns the bitmask you suggested. I ended up making the flag select the new behaviour and 0 select the current behaviour, rather than have a flag for "first mode wins". Tested as before. Thanks, Richard 2019-11-07 Richard Sandiford gcc/ * target.h (VECT_COMPARE_COSTS): New constant. * target.def (autovectorize_vector_modes): Return a bitmask of flags. * doc/tm.texi: Regenerate. * targhooks.h (default_autovectorize_vector_modes): Update accordingly. * targhooks.c (default_autovectorize_vector_modes): Likewise. * config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes): Likewise. * config/arc/arc.c (arc_autovectorize_vector_modes): Likewise. * config/arm/arm.c (arm_autovectorize_vector_modes): Likewise. * config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise. * config/mips/mips.c (mips_autovectorize_vector_modes): Likewise. * tree-vectorizer.h (_loop_vec_info::vec_outside_cost) (_loop_vec_info::vec_inside_cost): New member variables. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them. (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions. (vect_analyze_loop): When autovectorize_vector_modes returns VECT_COMPARE_COSTS, try vectorizing the loop with each available vector mode and picking the one with the lowest cost. (vect_estimate_min_profitable_iters): Record the computed costs in the loop_vec_info. Index: gcc/target.h === --- gcc/target.h2019-11-07 15:11:15.831017985 + +++ gcc/target.h2019-11-07 16:52:30.037198353 + @@ -218,6 +218,14 @@ enum omp_device_kind_arch_isa { omp_device_isa }; +/* Flags returned by TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES: + + VECT_COMPARE_COSTS + Tells the loop vectorizer to try all the provided modes and + pick the one with the lowest cost. By default the vectorizer + will choose the first mode that works. */ +const unsigned int VECT_COMPARE_COSTS = 1U << 0; + /* The target structure. This holds all the backend hooks. */ #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME; #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS; Index: gcc/target.def === --- gcc/target.def 2019-11-07 15:11:15.819018071 + +++ gcc/target.def 2019-11-07 16:52:30
Re: [4/6] Optionally pick the cheapest loop_vec_info
On Wed, Nov 6, 2019 at 3:01 PM Richard Sandiford wrote: > > Richard Biener writes: > > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford > > wrote: > >> > >> This patch adds a mode in which the vectoriser tries each available > >> base vector mode and picks the one with the lowest cost. For now > >> the behaviour is behind a default-off --param, but a later patch > >> enables it by default for SVE. > >> > >> The patch keeps the current behaviour of preferring a VF of > >> loop->simdlen over any larger or smaller VF, regardless of costs > >> or target preferences. > > > > Can you avoid using a --param for this? Instead I'd suggest to > > amend the vectorize_modes target hook to return some > > flags like VECT_FIRST_MODE_WINS. We'd eventually want > > to make the target able to say do-not-vectorize-epiloges-of-MODE > > (I think we may not want to vectorize SSE vectorized loop > > epilogues with MMX-with-SSE or GPRs for example). I guess > > for the latter we'd use a new target hook. > > The reason for using a --param was that I wanted a way of turning > this on and off on the command line, so that users can experiment > with it if necessary. E.g. enabling the --param could be a viable > alternative to -mprefix-* in some cases. Disabling it would be > a way of working around a bad cost model decision without going > all the way to -fno-vect-cost-model. > > These kinds of --params can become useful workarounds until an > optimisation bug is fixed. I'm arguing that the default depends on the actual ISAs so there isn't a one-fits all and given we have OMP SIMD and target cloning for multiple ISAs this looks like a wrong approach. For sure the target can use its own switches to override defaults here, or alternatively we might want to have a #pragma GCC simdlen mimicing OMP behavior here. Richard. > > Thanks, > Richard
Re: [4/6] Optionally pick the cheapest loop_vec_info
Richard Biener writes: > On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford > wrote: >> >> This patch adds a mode in which the vectoriser tries each available >> base vector mode and picks the one with the lowest cost. For now >> the behaviour is behind a default-off --param, but a later patch >> enables it by default for SVE. >> >> The patch keeps the current behaviour of preferring a VF of >> loop->simdlen over any larger or smaller VF, regardless of costs >> or target preferences. > > Can you avoid using a --param for this? Instead I'd suggest to > amend the vectorize_modes target hook to return some > flags like VECT_FIRST_MODE_WINS. We'd eventually want > to make the target able to say do-not-vectorize-epiloges-of-MODE > (I think we may not want to vectorize SSE vectorized loop > epilogues with MMX-with-SSE or GPRs for example). I guess > for the latter we'd use a new target hook. The reason for using a --param was that I wanted a way of turning this on and off on the command line, so that users can experiment with it if necessary. E.g. enabling the --param could be a viable alternative to -mprefix-* in some cases. Disabling it would be a way of working around a bad cost model decision without going all the way to -fno-vect-cost-model. These kinds of --params can become useful workarounds until an optimisation bug is fixed. Thanks, Richard
Re: [4/6] Optionally pick the cheapest loop_vec_info
On Tue, Nov 5, 2019 at 3:29 PM Richard Sandiford wrote: > > This patch adds a mode in which the vectoriser tries each available > base vector mode and picks the one with the lowest cost. For now > the behaviour is behind a default-off --param, but a later patch > enables it by default for SVE. > > The patch keeps the current behaviour of preferring a VF of > loop->simdlen over any larger or smaller VF, regardless of costs > or target preferences. Can you avoid using a --param for this? Instead I'd suggest to amend the vectorize_modes target hook to return some flags like VECT_FIRST_MODE_WINS. We'd eventually want to make the target able to say do-not-vectorize-epiloges-of-MODE (I think we may not want to vectorize SSE vectorized loop epilogues with MMX-with-SSE or GPRs for example). I guess for the latter we'd use a new target hook. Otherwise looks reasonable. Richard. > > 2019-11-05 Richard Sandiford > > gcc/ > * params.def (vect-compare-loop-costs): New param. > * doc/invoke.texi: Document it. > * tree-vectorizer.h (_loop_vec_info::vec_outside_cost) > (_loop_vec_info::vec_inside_cost): New member variables. > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them. > (vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions. > (vect_analyze_loop): When the new parameter allows, try vectorizing > the loop with each available vector mode and picking the one with > the lowest cost. > (vect_estimate_min_profitable_iters): Record the computed costs > in the loop_vec_info. > > Index: gcc/params.def > === > --- gcc/params.def 2019-10-31 17:15:25.470517368 + > +++ gcc/params.def 2019-11-05 14:19:58.781197820 + > @@ -661,6 +661,13 @@ DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIG > "Maximum number of loop peels to enhance alignment of data > references in a loop.", > -1, -1, 64) > > +DEFPARAM(PARAM_VECT_COMPARE_LOOP_COSTS, > +"vect-compare-loop-costs", > +"Whether to try vectorizing a loop using each supported" > +" combination of vector types and picking the version with the" > +" lowest cost.", > +0, 0, 1) > + > DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS, > "max-cselib-memory-locations", > "The maximum memory locations recorded by cselib.", > Index: gcc/doc/invoke.texi > === > --- gcc/doc/invoke.texi 2019-11-04 21:13:57.611756365 + > +++ gcc/doc/invoke.texi 2019-11-05 14:19:58.777197850 + > @@ -11563,6 +11563,12 @@ doing loop versioning for alias in the v > The maximum number of loop peels to enhance access alignment > for vectorizer. Value -1 means no limit. > > +@item vect-compare-loop-costs > +Whether to try vectorizing a loop using each supported combination of > +vector types and picking the version with the lowest cost. This parameter > +has no effect when @option{-fno-vect-cost-model} or > +@option{-fvect-cost-model=unlimited} are used. > + > @item max-iterations-to-track > The maximum number of iterations of a loop the brute-force algorithm > for analysis of the number of iterations of the loop tries to evaluate. > Index: gcc/tree-vectorizer.h > === > --- gcc/tree-vectorizer.h 2019-11-05 14:19:33.829371745 + > +++ gcc/tree-vectorizer.h 2019-11-05 14:19:58.781197820 + > @@ -601,6 +601,13 @@ typedef class _loop_vec_info : public ve >/* Cost of a single scalar iteration. */ >int single_scalar_iteration_cost; > > + /* The cost of the vector prologue and epilogue, including peeled > + iterations and set-up code. */ > + int vec_outside_cost; > + > + /* The cost of the vector loop body. */ > + int vec_inside_cost; > + >/* Is the loop vectorizable? */ >bool vectorizable; > > Index: gcc/tree-vect-loop.c > === > --- gcc/tree-vect-loop.c2019-11-05 14:19:33.829371745 + > +++ gcc/tree-vect-loop.c2019-11-05 14:19:58.781197820 + > @@ -830,6 +830,8 @@ _loop_vec_info::_loop_vec_info (class lo > scan_map (NULL), > slp_unrolling_factor (1), > single_scalar_iteration_cost (0), > +vec_outside_cost (0), > +vec_inside_cost (0), > vectorizable (false), > can_fully_mask_p (true), > fully_masked_p (false), > @@ -2373,6 +2375,80 @@ vect_analyze_loop_2 (loop_vec_info loop_ >goto start_over; > } > > +/* Return true if vectorizing a loop using NEW_LOOP_VINFO appears > + to be better than vectorizing it using OLD_LOOP_VINFO. Assume that > + OLD_LOOP_VINFO is better unless something specifically indicates > + otherwise. > + > + Note that this deliberately isn't a partial order. */ > + > +static bool > +vect_better_loop_vinfo_p (loop_vec_in