Re: Let the target choose a vectorisation alignment

2017-10-06 Thread Richard Sandiford
[+arm maintainers]

Christophe Lyon  writes:
> On 18 September 2017 at 15:57, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
>>>  wrote:
 The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
 there was also a hard-coded assumption that this was equal to the type
 size.  This was inconvenient for SVE for two reasons:

 - When compiling for a specific power-of-2 SVE vector length, we might
   want to align to a full vector.  However, the TYPE_ALIGN is governed
   by the ABI alignment, which is 128 bits regardless of size.

 - For vector-length-agnostic code it doesn't usually make sense to align,
   since the runtime vector length might not be a power of two.  Even for
   power of two sizes, there's no guarantee that aligning to the previous
   16 bytes will be an improveent.

 This patch therefore adds a target hook to control the preferred
 vectoriser (as opposed to ABI) alignment.

 Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
 Also tested by comparing the testsuite assembly output on at least one
 target per CPU directory.  OK to install?
>>>
>>> Did you specifically choose to pass the hook a vector type rather than
>>> a mode?
>>
>> It seemed like the safest thing to do for the default implementation,
>> e.g. in case we're vectorising "without SIMD" and thus without an
>> underlying vector mode.  I agree it probably doesn't make much
>> difference for non-default implementations.
>>
>>> I suppose in peeling for alignment the target should be able to
>>> prevent peeling by returning element alignment from the hook?
>>
>> Yeah.  This is what the SVE port does in the default vector-length
>> agnostic mode, and might also make sense in fixed-length mode.
>> Maybe it would be useful for other targets too, if unaligned accesses
>> have a negligible penalty for them.
>>
>
> Since this was committed (r253101), I've noticed a regression
> on arm-none-linux-gnueabihf:
> FAIL:gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "Vectorizing an unaligned access" 4
> FAIL:gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect
> "Vectorizing an unaligned access" 4
>
> for instance, with
> --target arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16

Thanks for letting me know.  I hadn't realised that AArch32 had an
ABI alignment of 64 bits for 128-bit vectors.

The problem is that, before the patch, we were inconsistent about
using the type alignment and type size for alignment calculations.
vect_compute_data_ref_alignment used the type alignment, so for
128-bit vectors would calculate misalignment relative to 64 bits
rather than 128 bits.  But vect_enhance_data_refs_alignment
calculated the number of peels based on the type size rather
than the type alignment, so would act as though the misalignment
calculated by vect_compute_data_ref_alignment was relative to
128 bits rather than 64 bits.

So for one loop in the test case, we pick a "vector short (8)".
vect_compute_data_ref_alignment realised that one of the original scalar
short accesses was misaligned by 6 bytes (3 elements) relative to the
ABI-defined 8-byte boundary.  However, vect_enhance_data_refs_alignment
applied that misalignment to the vector size of 16 bytes.  We would
then peel 5 iterations to achieve alignment, even though only one
iteration was needed to achieve 64-bit alignment.

(Peeling 5 iterations doesn't achieve 128-bit alignment, since for this
loop we can't compute the misalignment relative to 128 bits at compile
time.  All we were doing was peeling an extra vector iteration to
achieve 64-bit alignment.)

After the patch we base everything on the ABI's 64-bit alignment,
so peel once rather than 5 times.  A combination of these effects
mean that we end up with fewer accesses being reported as unaligned.

In that sense the patch is behaving as designed.  But the question is:
what should the preferred alignment for vectorisaton purposes be on Arm?
The options seem to be:

1) Prefer to align to the ABI alignment, as current trunk does.
2) Prefer to align to the vector size.

Both are changes from the pre-patch behaviour for this testcase,
since as I mentioned above, we cannot compute the misalignment
wrt the vector size at compile time (but before the patch we
acted as though we could).

The attached patch would do 2).  (Minus changelog since it's
just an illustration.)  Changes to the testsuite markup are
needed both ways.

Thanks,
Richard


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3e5438a..6cee000 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -273,6 +273,7 @@ static bool arm_array_mode_supported_p (machine_mode,
 static machine_mode arm_preferred_simd_mode (scalar_mode);
 static bool arm_class_likely_spilled_p (reg_class_t);
 static HOST_WIDE_INT 

Re: Let the target choose a vectorisation alignment

2017-10-05 Thread Christophe Lyon
Hi Richard,


On 18 September 2017 at 15:57, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
>>  wrote:
>>> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
>>> there was also a hard-coded assumption that this was equal to the type
>>> size.  This was inconvenient for SVE for two reasons:
>>>
>>> - When compiling for a specific power-of-2 SVE vector length, we might
>>>   want to align to a full vector.  However, the TYPE_ALIGN is governed
>>>   by the ABI alignment, which is 128 bits regardless of size.
>>>
>>> - For vector-length-agnostic code it doesn't usually make sense to align,
>>>   since the runtime vector length might not be a power of two.  Even for
>>>   power of two sizes, there's no guarantee that aligning to the previous
>>>   16 bytes will be an improveent.
>>>
>>> This patch therefore adds a target hook to control the preferred
>>> vectoriser (as opposed to ABI) alignment.
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>>> Also tested by comparing the testsuite assembly output on at least one
>>> target per CPU directory.  OK to install?
>>
>> Did you specifically choose to pass the hook a vector type rather than
>> a mode?
>
> It seemed like the safest thing to do for the default implementation,
> e.g. in case we're vectorising "without SIMD" and thus without an
> underlying vector mode.  I agree it probably doesn't make much
> difference for non-default implementations.
>
>> I suppose in peeling for alignment the target should be able to
>> prevent peeling by returning element alignment from the hook?
>
> Yeah.  This is what the SVE port does in the default vector-length
> agnostic mode, and might also make sense in fixed-length mode.
> Maybe it would be useful for other targets too, if unaligned accesses
> have a negligible penalty for them.
>

Since this was committed (r253101), I've noticed a regression
on arm-none-linux-gnueabihf:
FAIL:gcc.dg/vect/vect-multitypes-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "Vectorizing an unaligned access" 4
FAIL:gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect
"Vectorizing an unaligned access" 4

for instance, with
--target arm-none-linux-gnueabihf
--with-mode arm
--with-cpu cortex-a9
--with-fpu neon-fp16

Thanks,

Christophe

> Thanks,
> Richard
>> Ok.
>>
>> Thanks,
>> Richard.


Re: Let the target choose a vectorisation alignment

2017-09-18 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
>  wrote:
>> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
>> there was also a hard-coded assumption that this was equal to the type
>> size.  This was inconvenient for SVE for two reasons:
>>
>> - When compiling for a specific power-of-2 SVE vector length, we might
>>   want to align to a full vector.  However, the TYPE_ALIGN is governed
>>   by the ABI alignment, which is 128 bits regardless of size.
>>
>> - For vector-length-agnostic code it doesn't usually make sense to align,
>>   since the runtime vector length might not be a power of two.  Even for
>>   power of two sizes, there's no guarantee that aligning to the previous
>>   16 bytes will be an improveent.
>>
>> This patch therefore adds a target hook to control the preferred
>> vectoriser (as opposed to ABI) alignment.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> Also tested by comparing the testsuite assembly output on at least one
>> target per CPU directory.  OK to install?
>
> Did you specifically choose to pass the hook a vector type rather than
> a mode?

It seemed like the safest thing to do for the default implementation,
e.g. in case we're vectorising "without SIMD" and thus without an
underlying vector mode.  I agree it probably doesn't make much
difference for non-default implementations.

> I suppose in peeling for alignment the target should be able to
> prevent peeling by returning element alignment from the hook?

Yeah.  This is what the SVE port does in the default vector-length
agnostic mode, and might also make sense in fixed-length mode.
Maybe it would be useful for other targets too, if unaligned accesses
have a negligible penalty for them.

Thanks,
Richard
> Ok.
>
> Thanks,
> Richard.


Re: Let the target choose a vectorisation alignment

2017-09-18 Thread Richard Biener
On Mon, Sep 18, 2017 at 1:58 PM, Richard Sandiford
 wrote:
> The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
> there was also a hard-coded assumption that this was equal to the type
> size.  This was inconvenient for SVE for two reasons:
>
> - When compiling for a specific power-of-2 SVE vector length, we might
>   want to align to a full vector.  However, the TYPE_ALIGN is governed
>   by the ABI alignment, which is 128 bits regardless of size.
>
> - For vector-length-agnostic code it doesn't usually make sense to align,
>   since the runtime vector length might not be a power of two.  Even for
>   power of two sizes, there's no guarantee that aligning to the previous
>   16 bytes will be an improveent.
>
> This patch therefore adds a target hook to control the preferred
> vectoriser (as opposed to ABI) alignment.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by comparing the testsuite assembly output on at least one
> target per CPU directory.  OK to install?

Did you specifically choose to pass the hook a vector type rather than
a mode?  I suppose in peeling for alignment the target should be able to
prevent peeling by returning element alignment from the hook?

Ok.

Thanks,
Richard.

> Richard
>
>
> 2017-09-18  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * target.def (preferred_vector_alignment): New hook.
> * doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New
> hook.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_preferred_vector_alignment): Declare.
> * targhooks.c (default_preferred_vector_alignment): New function.
> * tree-vectorizer.h (dataref_aux): Add a target_alignment field.
> Expand commentary.
> (DR_TARGET_ALIGNMENT): New macro.
> (aligned_access_p): Update commentary.
> (vect_known_alignment_in_bytes): New function.
> * tree-vect-data-refs.c (vect_calculate_required_alignment): New
> function.
> (vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT.
> Calculate the misalignment based on the target alignment rather than
> the vector size.
> (vect_update_misalignment_for_peel): Use DR_TARGET_ALIGMENT
> rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment.
> (vect_enhance_data_refs_alignment): Mask the byte misalignment with
> the target alignment, rather than masking the element misalignment
> with the number of elements in a vector.  Also use the target
> alignment when calculating the maximum number of peels.
> (vect_find_same_alignment_drs): Use vect_calculate_required_alignment
> instead of TYPE_ALIGN_UNIT.
> (vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter.
> Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT.
> (vect_create_addr_base_for_vector_ref): Update call accordingly.
> (vect_create_data_ref_ptr): Likewise.
> (vect_setup_realignment): Realign by ANDing with
> -DR_TARGET_MISALIGNMENT.
> * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate
> the number of peels based on DR_TARGET_ALIGNMENT.
> * tree-vect-stmts.c (get_group_load_store_type): Compare the gap
> with the guaranteed alignment boundary when deciding whether
> overrun is OK.
> (vectorizable_mask_load_store): Interpret DR_MISALIGNMENT
> relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT.
> (ensure_base_align): Remove stmt_info parameter.  Get the
> target base alignment from DR_TARGET_ALIGNMENT.
> (vectorizable_store): Update call accordingly.   Interpret
> DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of
> TYPE_ALIGN_UNIT.
> (vectorizable_load): Likewise.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording
> of alignment message.
> * gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.
>
> Index: gcc/target.def
> ===
> *** gcc/target.def  2017-09-18 12:56:24.635070853 +0100
> --- gcc/target.def  2017-09-18 12:56:24.847378559 +0100
> *** misalignment value (@var{misalign}).",
> *** 1820,1825 
> --- 1820,1839 
>int, (enum vect_cost_for_stmt type_of_cost, tree vectype, int misalign),
>default_builtin_vectorization_cost)
>
> + DEFHOOK
> + (preferred_vector_alignment,
> +  "This hook returns the preferred alignment in bits for accesses to\n\
> + vectors of type @var{type} in vectorized code.  This might be less than\n\
> + or greater than the ABI-defined value returned by\n\
> + @code{TARGET_VECTOR_ALIGNMENT}.  It can be equal to the alignment of\n\
> + a single element, in which case the vectorizer will not try to optimize\n\
> + for ali

Let the target choose a vectorisation alignment

2017-09-18 Thread Richard Sandiford
The vectoriser aligned vectors to TYPE_ALIGN unconditionally, although
there was also a hard-coded assumption that this was equal to the type
size.  This was inconvenient for SVE for two reasons:

- When compiling for a specific power-of-2 SVE vector length, we might
  want to align to a full vector.  However, the TYPE_ALIGN is governed
  by the ABI alignment, which is 128 bits regardless of size.

- For vector-length-agnostic code it doesn't usually make sense to align,
  since the runtime vector length might not be a power of two.  Even for
  power of two sizes, there's no guarantee that aligning to the previous
  16 bytes will be an improveent.

This patch therefore adds a target hook to control the preferred
vectoriser (as opposed to ABI) alignment.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-18  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* target.def (preferred_vector_alignment): New hook.
* doc/tm.texi.in (TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT): New
hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_preferred_vector_alignment): Declare.
* targhooks.c (default_preferred_vector_alignment): New function.
* tree-vectorizer.h (dataref_aux): Add a target_alignment field.
Expand commentary.
(DR_TARGET_ALIGNMENT): New macro.
(aligned_access_p): Update commentary.
(vect_known_alignment_in_bytes): New function.
* tree-vect-data-refs.c (vect_calculate_required_alignment): New
function.
(vect_compute_data_ref_alignment): Set DR_TARGET_ALIGNMENT.
Calculate the misalignment based on the target alignment rather than
the vector size.
(vect_update_misalignment_for_peel): Use DR_TARGET_ALIGMENT
rather than TYPE_ALIGN / BITS_PER_UNIT to update the misalignment.
(vect_enhance_data_refs_alignment): Mask the byte misalignment with
the target alignment, rather than masking the element misalignment
with the number of elements in a vector.  Also use the target
alignment when calculating the maximum number of peels.
(vect_find_same_alignment_drs): Use vect_calculate_required_alignment
instead of TYPE_ALIGN_UNIT.
(vect_duplicate_ssa_name_ptr_info): Remove stmt_info parameter.
Measure DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT.
(vect_create_addr_base_for_vector_ref): Update call accordingly.
(vect_create_data_ref_ptr): Likewise.
(vect_setup_realignment): Realign by ANDing with
-DR_TARGET_MISALIGNMENT.
* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Calculate
the number of peels based on DR_TARGET_ALIGNMENT.
* tree-vect-stmts.c (get_group_load_store_type): Compare the gap
with the guaranteed alignment boundary when deciding whether
overrun is OK.
(vectorizable_mask_load_store): Interpret DR_MISALIGNMENT
relative to DR_TARGET_ALIGNMENT instead of TYPE_ALIGN_UNIT.
(ensure_base_align): Remove stmt_info parameter.  Get the
target base alignment from DR_TARGET_ALIGNMENT.
(vectorizable_store): Update call accordingly.   Interpret
DR_MISALIGNMENT relative to DR_TARGET_ALIGNMENT instead of
TYPE_ALIGN_UNIT.
(vectorizable_load): Likewise.

gcc/testsuite/
* gcc.dg/vect/vect-outer-3a.c: Adjust dump scan for new wording
of alignment message.
* gcc.dg/vect/vect-outer-3a-big-array.c: Likewise.

Index: gcc/target.def
===
*** gcc/target.def  2017-09-18 12:56:24.635070853 +0100
--- gcc/target.def  2017-09-18 12:56:24.847378559 +0100
*** misalignment value (@var{misalign}).",
*** 1820,1825 
--- 1820,1839 
   int, (enum vect_cost_for_stmt type_of_cost, tree vectype, int misalign),
   default_builtin_vectorization_cost)
  
+ DEFHOOK
+ (preferred_vector_alignment,
+  "This hook returns the preferred alignment in bits for accesses to\n\
+ vectors of type @var{type} in vectorized code.  This might be less than\n\
+ or greater than the ABI-defined value returned by\n\
+ @code{TARGET_VECTOR_ALIGNMENT}.  It can be equal to the alignment of\n\
+ a single element, in which case the vectorizer will not try to optimize\n\
+ for alignment.\n\
+ \n\
+ The default hook returns @code{TYPE_ALIGN (@var{type})}, which is\n\
+ correct for most targets.",
+  HOST_WIDE_INT, (const_tree type),
+  default_preferred_vector_alignment)
+ 
  /* Return true if vector alignment is reachable (by peeling N
 iterations) for the given scalar type.  */
  DEFHOOK
Index: gcc/doc/tm.texi.in
===
*** gcc/doc/tm.texi.in  2017-09-18 12:56:24.63507085