Re: Add support for fully-predicated loops

2018-01-15 Thread Christophe Lyon
Hi Richard,


On 7 January 2018 at 18:08, James Greenhalgh wrote:
> On Mon, Dec 18, 2017 at 07:40:00PM +, Jeff Law wrote:
>> On 11/17/2017 07:56 AM, Richard Sandiford wrote:
>> > This patch adds support for using a single fully-predicated loop instead
>> > of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
>> > the predicate for each iteration of the loop, given the current scalar
>> > iv value and the loop bound.  This operation is wrapped up in a new internal
>> > function called WHILE_ULT.  E.g.:
>> >
>> >    WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
>> >    WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
>> >
>> > The third WHILE_ULT argument is needed to make the operation
>> > unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
>> > seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
>> > different numbers of elements.
>> >
>> > Note that the patch uses "mask" and "fully-masked" instead of
>> > "predicate" and "fully-predicated", to follow existing GCC terminology.
>> >
>> > This patch just handles the simple cases, punting for things like
>> > reductions and live-out values.  Later patches remove most of these
>> > restrictions.
>> >
>> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> > and powerpc64le-linux-gnu.  OK to install?
>> >
>> > Richard
>> >
>> >
>> > 2017-11-17  Richard Sandiford  
>> > Alan Hayward  
>> > David Sherwood  
>> >
>> > gcc/
>> > * optabs.def (while_ult_optab): New optab.
>> > * doc/md.texi (while_ult@var{m}@var{n}): Document.
>> > * internal-fn.def (WHILE_ULT): New internal function.
>> > * internal-fn.h (direct_internal_fn_supported_p): New override
>> > that takes two types as argument.
>> > * internal-fn.c (while_direct): New macro.
>> > (expand_while_optab_fn): New function.
>> > (convert_optab_supported_p): Likewise.
>> > (direct_while_optab_supported_p): New macro.
>> > * wide-int.h (wi::udiv_ceil): New function.
>> > * tree-vectorizer.h (rgroup_masks): New structure.
>> > (vec_loop_masks): New typedef.
>> > (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
>> > and fully_masked_p.
>> > (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
>> > (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
>> > (vect_max_vf): New function.
>> > (slpeel_make_loop_iterate_ntimes): Delete.
>> > (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
>> > (vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
>> > (vect_record_loop_mask, vect_get_loop_mask): Likewise.
>> > * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
>> > internal-fn.h, stor-layout.h and optabs-query.h.
>> > (vect_set_loop_mask): New function.
>> > (add_preheader_seq): Likewise.
>> > (add_header_seq): Likewise.
>> > (vect_maybe_permute_loop_masks): Likewise.
>> > (vect_set_loop_masks_directly): Likewise.
>> > (vect_set_loop_condition_masked): Likewise.
>> > (vect_set_loop_condition_unmasked): New function, split out from
>> > slpeel_make_loop_iterate_ntimes.
>> > (slpeel_make_loop_iterate_ntimes): Rename to...
>> > (vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
>> > for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
>> > (vect_do_peeling): Update call accordingly.
>> > (vect_gen_vector_loop_niters): Use VF as the step for fully-masked
>> > loops.
>> > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>> > mask_compare_type, can_fully_mask_p and fully_masked_p.
>> > (release_vec_loop_masks): New function.
>> > (_loop_vec_info): Use it to free the loop masks.
>> > (can_produce_all_loop_masks_p): New function.
>> > (vect_get_max_nscalars_per_iter): Likewise.
>> > (vect_verify_full_masking): Likewise.
>> > (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
>> > retries, and free the mask rgroups before retrying.  Check loop-wide
>> > reasons for disallowing fully-masked loops.  Make the final decision
>> > about whether to use a fully-masked loop or not.
>> > (vect_estimate_min_profitable_iters): Do not assume that peeling
>> > for the number of iterations will be needed for fully-masked loops.
>> > (vectorizable_reduction): Disable fully-masked loops.
>> > (vectorizable_live_operation): Likewise.
>> > (vect_halve_mask_nunits): New function.
>> > (vect_double_mask_nunits): Likewise.
>> > (vect_record_loop_mask): Likewise.
>> > (vect_get_loop_mask): Likewise.
>> > (vect_transform_loop): Handle the case in which the final loop
>> > iteration might handle a partial vector.  Call vect_set_loop_condition
>> > instead of 

Re: Add support for fully-predicated loops

2018-01-07 Thread James Greenhalgh
On Mon, Dec 18, 2017 at 07:40:00PM +, Jeff Law wrote:
> On 11/17/2017 07:56 AM, Richard Sandiford wrote:
> > This patch adds support for using a single fully-predicated loop instead
> > of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
> > the predicate for each iteration of the loop, given the current scalar
> > iv value and the loop bound.  This operation is wrapped up in a new internal
> > function called WHILE_ULT.  E.g.:
> > 
> >    WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
> >    WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
> > 
> > The third WHILE_ULT argument is needed to make the operation
> > unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
> > seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
> > different numbers of elements.
> > 
> > Note that the patch uses "mask" and "fully-masked" instead of
> > "predicate" and "fully-predicated", to follow existing GCC terminology.
> > 
> > This patch just handles the simple cases, punting for things like
> > reductions and live-out values.  Later patches remove most of these
> > restrictions.
> > 
> > Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> > and powerpc64le-linux-gnu.  OK to install?
> > 
> > Richard
> > 
> > 
> > 2017-11-17  Richard Sandiford  
> > Alan Hayward  
> > David Sherwood  
> > 
> > gcc/
> > * optabs.def (while_ult_optab): New optab.
> > * doc/md.texi (while_ult@var{m}@var{n}): Document.
> > * internal-fn.def (WHILE_ULT): New internal function.
> > * internal-fn.h (direct_internal_fn_supported_p): New override
> > that takes two types as argument.
> > * internal-fn.c (while_direct): New macro.
> > (expand_while_optab_fn): New function.
> > (convert_optab_supported_p): Likewise.
> > (direct_while_optab_supported_p): New macro.
> > * wide-int.h (wi::udiv_ceil): New function.
> > * tree-vectorizer.h (rgroup_masks): New structure.
> > (vec_loop_masks): New typedef.
> > (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
> > and fully_masked_p.
> > (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
> > (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
> > (vect_max_vf): New function.
> > (slpeel_make_loop_iterate_ntimes): Delete.
> > (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
> > (vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
> > (vect_record_loop_mask, vect_get_loop_mask): Likewise.
> > * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
> > internal-fn.h, stor-layout.h and optabs-query.h.
> > (vect_set_loop_mask): New function.
> > (add_preheader_seq): Likewise.
> > (add_header_seq): Likewise.
> > (vect_maybe_permute_loop_masks): Likewise.
> > (vect_set_loop_masks_directly): Likewise.
> > (vect_set_loop_condition_masked): Likewise.
> > (vect_set_loop_condition_unmasked): New function, split out from
> > slpeel_make_loop_iterate_ntimes.
> > (slpeel_make_loop_iterate_ntimes): Rename to...
> > (vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
> > for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
> > (vect_do_peeling): Update call accordingly.
> > (vect_gen_vector_loop_niters): Use VF as the step for fully-masked
> > loops.
> > * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> > mask_compare_type, can_fully_mask_p and fully_masked_p.
> > (release_vec_loop_masks): New function.
> > (_loop_vec_info): Use it to free the loop masks.
> > (can_produce_all_loop_masks_p): New function.
> > (vect_get_max_nscalars_per_iter): Likewise.
> > (vect_verify_full_masking): Likewise.
> > (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
> > retries, and free the mask rgroups before retrying.  Check loop-wide
> > reasons for disallowing fully-masked loops.  Make the final decision
> > about whether to use a fully-masked loop or not.
> > (vect_estimate_min_profitable_iters): Do not assume that peeling
> > for the number of iterations will be needed for fully-masked loops.
> > (vectorizable_reduction): Disable fully-masked loops.
> > (vectorizable_live_operation): Likewise.
> > (vect_halve_mask_nunits): New function.
> > (vect_double_mask_nunits): Likewise.
> > (vect_record_loop_mask): Likewise.
> > (vect_get_loop_mask): Likewise.
> > (vect_transform_loop): Handle the case in which the final loop
> > iteration might handle a partial vector.  Call vect_set_loop_condition
> > instead of slpeel_make_loop_iterate_ntimes.
> > * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
> > (check_load_store_masking): New function.
> > (prepare_load_store_mask): 

Re: Add support for fully-predicated loops

2017-12-18 Thread Jeff Law
On 11/17/2017 07:56 AM, Richard Sandiford wrote:
> This patch adds support for using a single fully-predicated loop instead
> of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
> the predicate for each iteration of the loop, given the current scalar
> iv value and the loop bound.  This operation is wrapped up in a new internal
> function called WHILE_ULT.  E.g.:
> 
>    WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
>    WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
> 
> The third WHILE_ULT argument is needed to make the operation
> unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
> seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
> different numbers of elements.
> 
> Note that the patch uses "mask" and "fully-masked" instead of
> "predicate" and "fully-predicated", to follow existing GCC terminology.
> 
> This patch just handles the simple cases, punting for things like
> reductions and live-out values.  Later patches remove most of these
> restrictions.
> 
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2017-11-17  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * optabs.def (while_ult_optab): New optab.
>   * doc/md.texi (while_ult@var{m}@var{n}): Document.
>   * internal-fn.def (WHILE_ULT): New internal function.
>   * internal-fn.h (direct_internal_fn_supported_p): New override
>   that takes two types as argument.
>   * internal-fn.c (while_direct): New macro.
>   (expand_while_optab_fn): New function.
>   (convert_optab_supported_p): Likewise.
>   (direct_while_optab_supported_p): New macro.
>   * wide-int.h (wi::udiv_ceil): New function.
>   * tree-vectorizer.h (rgroup_masks): New structure.
>   (vec_loop_masks): New typedef.
>   (_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
>   and fully_masked_p.
>   (LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
>   (LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
>   (vect_max_vf): New function.
>   (slpeel_make_loop_iterate_ntimes): Delete.
>   (vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
>   (vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
>   (vect_record_loop_mask, vect_get_loop_mask): Likewise.
>   * tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
>   internal-fn.h, stor-layout.h and optabs-query.h.
>   (vect_set_loop_mask): New function.
>   (add_preheader_seq): Likewise.
>   (add_header_seq): Likewise.
>   (vect_maybe_permute_loop_masks): Likewise.
>   (vect_set_loop_masks_directly): Likewise.
>   (vect_set_loop_condition_masked): Likewise.
>   (vect_set_loop_condition_unmasked): New function, split out from
>   slpeel_make_loop_iterate_ntimes.
>   (slpeel_make_loop_iterate_ntimes): Rename to...
>   (vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
>   for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
>   (vect_do_peeling): Update call accordingly.
>   (vect_gen_vector_loop_niters): Use VF as the step for fully-masked
>   loops.
>   * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
>   mask_compare_type, can_fully_mask_p and fully_masked_p.
>   (release_vec_loop_masks): New function.
>   (_loop_vec_info): Use it to free the loop masks.
>   (can_produce_all_loop_masks_p): New function.
>   (vect_get_max_nscalars_per_iter): Likewise.
>   (vect_verify_full_masking): Likewise.
>   (vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
>   retries, and free the mask rgroups before retrying.  Check loop-wide
>   reasons for disallowing fully-masked loops.  Make the final decision
>   about whether to use a fully-masked loop or not.
>   (vect_estimate_min_profitable_iters): Do not assume that peeling
>   for the number of iterations will be needed for fully-masked loops.
>   (vectorizable_reduction): Disable fully-masked loops.
>   (vectorizable_live_operation): Likewise.
>   (vect_halve_mask_nunits): New function.
>   (vect_double_mask_nunits): Likewise.
>   (vect_record_loop_mask): Likewise.
>   (vect_get_loop_mask): Likewise.
>   (vect_transform_loop): Handle the case in which the final loop
>   iteration might handle a partial vector.  Call vect_set_loop_condition
>   instead of slpeel_make_loop_iterate_ntimes.
>   * tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
>   (check_load_store_masking): New function.
>   (prepare_load_store_mask): Likewise.
>   (vectorizable_store): Handle fully-masked loops.
>   (vectorizable_load): Likewise.
>   

Add support for fully-predicated loops

2017-11-17 Thread Richard Sandiford
This patch adds support for using a single fully-predicated loop instead
of a vector loop and a scalar tail.  An SVE WHILELO instruction generates
the predicate for each iteration of the loop, given the current scalar
iv value and the loop bound.  This operation is wrapped up in a new internal
function called WHILE_ULT.  E.g.:

   WHILE_ULT (0, 3, { 0, 0, 0, 0 }) -> { 1, 1, 1, 0 }
   WHILE_ULT (UINT_MAX - 1, UINT_MAX, { 0, 0, 0, 0 }) -> { 1, 0, 0, 0 }
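
As a rough illustration of the semantics in the two examples above, here is
a minimal Python sketch.  It is not GCC code: the names while_ult and nunits
are hypothetical, the third argument is modelled as a plain element count
rather than a zero vector, and a 32-bit unsigned scalar type is assumed.
The sketch also assumes that a lane stays inactive once the comparison has
first failed, which is what the second example implies.

```python
UINT_MAX = 2**32 - 1  # assumption: 32-bit unsigned scalar type


def while_ult(start, end, nunits):
    """Return a list of nunits mask bits; lane i is 1 while start + i < end."""
    mask = []
    active = True
    for i in range(nunits):
        # The addition wraps like 32-bit unsigned arithmetic, but a lane can
        # only be active if every earlier lane was active too.
        active = active and ((start + i) & UINT_MAX) < end
        mask.append(1 if active else 0)
    return mask


print(while_ult(0, 3, 4))
print(while_ult(UINT_MAX - 1, UINT_MAX, 4))
```

Without the "earlier lanes must be active" condition, the third and fourth
lanes of the second call would wrap around to small values and wrongly
compare as less than the bound.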

The third WHILE_ULT argument is needed to make the operation
unambiguous: without it, WHILE_ULT (0, 3) for one vector type would
seem equivalent to WHILE_ULT (0, 3) for another, even if the types have
different numbers of elements.
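
To make the ambiguity concrete, the following Python sketch (again only an
illustration, with the hypothetical name while_ult_vec) takes a dummy vector
as its third argument and uses nothing but its length.  The same two scalars
then give masks of different shapes.

```python
def while_ult_vec(start, end, like):
    """Build a mask with one entry per element of the dummy vector `like`."""
    nunits = len(like)  # only the element count of `like` is used
    return [1 if start + i < end else 0 for i in range(nunits)]


four_lanes = while_ult_vec(0, 3, [0, 0, 0, 0])
eight_lanes = while_ult_vec(0, 3, [0] * 8)
print(four_lanes)
print(eight_lanes)
```

The scalar operands are identical in both calls; only the dummy argument
tells the two results apart.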

Note that the patch uses "mask" and "fully-masked" instead of
"predicate" and "fully-predicated", to follow existing GCC terminology.

This patch just handles the simple cases, punting for things like
reductions and live-out values.  Later patches remove most of these
restrictions.
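
For orientation, a simple loop of the kind covered here (an elementwise add
with no reductions and no live-out values) might be modelled as below.  This
is a Python sketch under stated assumptions, not what the vectorizer emits:
the function names, the fixed vf of 4 and the list-based "vectors" are all
hypothetical.

```python
def while_ult(start, end, nunits):
    """Lane i is 1 while start + i < end (no wraparound modelled here)."""
    return [1 if start + i < end else 0 for i in range(nunits)]


def add_fully_masked(b, c, vf=4):
    """a[i] = b[i] + c[i] using one masked loop and no scalar tail."""
    n = len(b)
    a = [0] * n
    i = 0
    mask = while_ult(i, n, vf)
    while mask[0]:  # continue while at least the first lane is active
        for lane in range(vf):
            if mask[lane]:  # stands in for a masked load, add and store
                a[i + lane] = b[i + lane] + c[i + lane]
        i += vf
        mask = while_ult(i, n, vf)
    return a


def add_vector_plus_tail(b, c, vf=4):
    """The same computation as a full-vector loop followed by a scalar tail."""
    n = len(b)
    a = [0] * n
    i = 0
    while i + vf <= n:  # iterations that fill a whole vector
        for lane in range(vf):
            a[i + lane] = b[i + lane] + c[i + lane]
        i += vf
    while i < n:  # leftover elements, one at a time
        a[i] = b[i] + c[i]
        i += 1
    return a
```

With 10 elements and vf of 4, the masked version runs three iterations, the
last with only two active lanes, where the other version needs two vector
iterations plus two scalar ones.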

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-17  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* optabs.def (while_ult_optab): New optab.
* doc/md.texi (while_ult@var{m}@var{n}): Document.
* internal-fn.def (WHILE_ULT): New internal function.
* internal-fn.h (direct_internal_fn_supported_p): New override
that takes two types as argument.
* internal-fn.c (while_direct): New macro.
(expand_while_optab_fn): New function.
(convert_optab_supported_p): Likewise.
(direct_while_optab_supported_p): New macro.
* wide-int.h (wi::udiv_ceil): New function.
* tree-vectorizer.h (rgroup_masks): New structure.
(vec_loop_masks): New typedef.
(_loop_vec_info): Add masks, mask_compare_type, can_fully_mask_p
and fully_masked_p.
(LOOP_VINFO_CAN_FULLY_MASK_P, LOOP_VINFO_FULLY_MASKED_P)
(LOOP_VINFO_MASKS, LOOP_VINFO_MASK_COMPARE_TYPE): New macros.
(vect_max_vf): New function.
(slpeel_make_loop_iterate_ntimes): Delete.
(vect_set_loop_condition, vect_get_loop_mask_type, vect_gen_while)
(vect_halve_mask_nunits, vect_double_mask_nunits): Declare.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-loop-manip.c: Include tree-ssa-loop-niter.h,
internal-fn.h, stor-layout.h and optabs-query.h.
(vect_set_loop_mask): New function.
(add_preheader_seq): Likewise.
(add_header_seq): Likewise.
(vect_maybe_permute_loop_masks): Likewise.
(vect_set_loop_masks_directly): Likewise.
(vect_set_loop_condition_masked): Likewise.
(vect_set_loop_condition_unmasked): New function, split out from
slpeel_make_loop_iterate_ntimes.
(slpeel_make_loop_iterate_ntimes): Rename to...
(vect_set_loop_condition): ...this.  Use vect_set_loop_condition_masked
for fully-masked loops and vect_set_loop_condition_unmasked otherwise.
(vect_do_peeling): Update call accordingly.
(vect_gen_vector_loop_niters): Use VF as the step for fully-masked
loops.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
mask_compare_type, can_fully_mask_p and fully_masked_p.
(release_vec_loop_masks): New function.
(_loop_vec_info): Use it to free the loop masks.
(can_produce_all_loop_masks_p): New function.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_verify_full_masking): Likewise.
(vect_analyze_loop_2): Save LOOP_VINFO_CAN_FULLY_MASK_P around
retries, and free the mask rgroups before retrying.  Check loop-wide
reasons for disallowing fully-masked loops.  Make the final decision
about whether to use a fully-masked loop or not.
(vect_estimate_min_profitable_iters): Do not assume that peeling
for the number of iterations will be needed for fully-masked loops.
(vectorizable_reduction): Disable fully-masked loops.
(vectorizable_live_operation): Likewise.
(vect_halve_mask_nunits): New function.
(vect_double_mask_nunits): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_transform_loop): Handle the case in which the final loop
iteration might handle a partial vector.  Call vect_set_loop_condition
instead of slpeel_make_loop_iterate_ntimes.
* tree-vect-stmts.c: Include tree-ssa-loop-niter.h and gimple-fold.h.
(check_load_store_masking): New function.
(prepare_load_store_mask): Likewise.
(vectorizable_store): Handle fully-masked loops.
(vectorizable_load): Likewise.
(supportable_widening_operation): Use vect_halve_mask_nunits for
booleans.