Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-21 Thread Jakub Jelinek via Gcc-patches
On Fri, Apr 21, 2023 at 10:54:51AM +0100, Richard Sandiford wrote:
> > I'm guessing the keyword here is 'trait' which I'm guessing is different 
> > from a omp declare simd directive, which is why it's not required to 
> > have a simdlen clause in an omp declare simd (see Jakub's comment).
> 
> Sure.  The thread above is about whether we need extension("scalable")
> or should drop it.  And extension("scalable") is only used in omp
> declare variant.  This was in response to "I also do not see a need
> for the 'omp declare variant' scalable extension constructs".

I'm not sure extension("scalable") in context selectors is what you want
to handle declare variant.  While extension trait is allowed and it is
implementation defined what is accepted as its arguments (within the
boundaries of allowed syntax), in this case you really want to adjust
behavior of the simd trait, so it would be better to specify you want
scalable simdlen using a simd trait property.

There will be OpenMP F2F meeting next month, I think this should be
discussed there and agreed on how to do this, after all, seems ARM
won't be the only architecture that needs it, RISC-V might be another.

Jakub



Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-21 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 20/04/2023 17:13, Richard Sandiford wrote:
>> "Andre Vieira (lists)"  writes:
>>> On 20/04/2023 15:51, Richard Sandiford wrote:
 "Andre Vieira (lists)"  writes:
> Hi all,
>
> This is a series of patches/RFCs to implement support in GCC to be able
> to target AArch64's libmvec functions that will be/are being added to 
> glibc.
> We have chosen to use the omp pragma '#pragma omp declare variant ...'
> with a simd construct as the way for glibc to inform GCC what functions
> are available.
>
> For example, if we would like to supply a vector version of the scalar
> 'cosf' we would have an include file with something like:
> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
> typedef __SVFloat32_t __sv_f32_t;
> typedef __SVBool_t __sv_bool_t;
> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
> #pragma omp declare variant(_ZGVnN4v_cosf) \
>match(construct = {simd(notinbranch, simdlen(4))}, device =
> {isa("simd")})
> #pragma omp declare variant(_ZGVnN2v_cosf) \
>match(construct = {simd(notinbranch, simdlen(2))}, device =
> {isa("simd")})
> #pragma omp declare variant(_ZGVsMxv_cosf) \
>match(construct = {simd(inbranch)}, device = {isa("sve")})
> extern float cosf (float);
>
> The BETA ABI can be found in the vfabia64 subdir of
> https://github.com/ARM-software/abi-aa/
> This currently disagrees with how this patch series implements 'omp
> declare simd' for SVE and I also do not see a need for the 'omp declare
> variant' scalable extension constructs. I will make changes to the ABI
> once we've finalized the co-design of the ABI and this implementation.

 I don't see a good reason for dropping the extension("scalable").
 The problem is that since the base spec requires a simdlen clause,
 GCC should in general raise an error if simdlen is omitted.
>>> Where can you find this in the specs? I tried to find it but couldn't.
>>>
>>> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector
>>> ABI defines behaviour for this. But I couldn't find what it meant for a
>>> omp declare variant, obviously can't be the same as for declare simd, as
>>> that is defined to mean 'define a set of clones' and only one clone can
>>> be associated to a declare variant.
>> 
>> I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
>> which says:
>> 
>>The simd trait can be further defined with properties that match the
>>clauses accepted by the declare simd directive with the same name and
>>semantics. The simd trait must define at least the simdlen property and
>>one of the inbranch or notinbranch properties.
>> 
>> (probably best to read it in the original -- it's almost incomprehensible
>> without markup)
>> 
> I'm guessing the keyword here is 'trait' which I'm guessing is different 
> from a omp declare simd directive, which is why it's not required to 
> have a simdlen clause in an omp declare simd (see Jakub's comment).

Sure.  The thread above is about whether we need extension("scalable")
or should drop it.  And extension("scalable") is only used in omp
declare variant.  This was in response to "I also do not see a need
for the 'omp declare variant' scalable extension constructs".

Not having a simdlen on an omp declare simd is of course OK (and the
VFABI defines behaviour for that case).

Richard


Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-21 Thread Andre Vieira (lists) via Gcc-patches




On 20/04/2023 17:13, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

On 20/04/2023 15:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi all,

This is a series of patches/RFCs to implement support in GCC to be able
to target AArch64's libmvec functions that will be/are being added to glibc.
We have chosen to use the omp pragma '#pragma omp declare variant ...'
with a simd construct as the way for glibc to inform GCC what functions
are available.

For example, if we would like to supply a vector version of the scalar
'cosf' we would have an include file with something like:
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
typedef __SVFloat32_t __sv_f32_t;
typedef __SVBool_t __sv_bool_t;
__f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__f32x2_t _ZGVnN2v_cosf (__f32x2_t);
__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
#pragma omp declare variant(_ZGVnN4v_cosf) \
   match(construct = {simd(notinbranch, simdlen(4))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVnN2v_cosf) \
   match(construct = {simd(notinbranch, simdlen(2))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVsMxv_cosf) \
   match(construct = {simd(inbranch)}, device = {isa("sve")})
extern float cosf (float);

The BETA ABI can be found in the vfabia64 subdir of
https://github.com/ARM-software/abi-aa/
This currently disagrees with how this patch series implements 'omp
declare simd' for SVE and I also do not see a need for the 'omp declare
variant' scalable extension constructs. I will make changes to the ABI
once we've finalized the co-design of the ABI and this implementation.


I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.

Where can you find this in the specs? I tried to find it but couldn't.

Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector
ABI defines behaviour for this. But I couldn't find what it meant for a
omp declare variant, obviously can't be the same as for declare simd, as
that is defined to mean 'define a set of clones' and only one clone can
be associated to a declare variant.


I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
which says:

   The simd trait can be further defined with properties that match the
   clauses accepted by the declare simd directive with the same name and
   semantics. The simd trait must define at least the simdlen property and
   one of the inbranch or notinbranch properties.

(probably best to read it in the original -- it's almost incomprehensible
without markup)

I'm guessing the keyword here is 'trait' which I'm guessing is different 
from a omp declare simd directive, which is why it's not required to 
have a simdlen clause in an omp declare simd (see Jakub's comment).


But for declare variants I guess it does require you to? It doesn't 
'break' anything, just means I need to add support for parsing the 
extension clause as was originally planned.

Richard


Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> On 20/04/2023 15:51, Richard Sandiford wrote:
>> "Andre Vieira (lists)"  writes:
>>> Hi all,
>>>
>>> This is a series of patches/RFCs to implement support in GCC to be able
>>> to target AArch64's libmvec functions that will be/are being added to glibc.
>>> We have chosen to use the omp pragma '#pragma omp declare variant ...'
>>> with a simd construct as the way for glibc to inform GCC what functions
>>> are available.
>>>
>>> For example, if we would like to supply a vector version of the scalar
>>> 'cosf' we would have an include file with something like:
>>> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
>>> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
>>> typedef __SVFloat32_t __sv_f32_t;
>>> typedef __SVBool_t __sv_bool_t;
>>> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
>>> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
>>> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
>>> #pragma omp declare variant(_ZGVnN4v_cosf) \
>>>   match(construct = {simd(notinbranch, simdlen(4))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVnN2v_cosf) \
>>>   match(construct = {simd(notinbranch, simdlen(2))}, device =
>>> {isa("simd")})
>>> #pragma omp declare variant(_ZGVsMxv_cosf) \
>>>   match(construct = {simd(inbranch)}, device = {isa("sve")})
>>> extern float cosf (float);
>>>
>>> The BETA ABI can be found in the vfabia64 subdir of
>>> https://github.com/ARM-software/abi-aa/
>>> This currently disagrees with how this patch series implements 'omp
>>> declare simd' for SVE and I also do not see a need for the 'omp declare
>>> variant' scalable extension constructs. I will make changes to the ABI
>>> once we've finalized the co-design of the ABI and this implementation.
>> 
>> I don't see a good reason for dropping the extension("scalable").
>> The problem is that since the base spec requires a simdlen clause,
>> GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
>
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector 
> ABI defines behaviour for this. But I couldn't find what it meant for a 
> omp declare variant, obviously can't be the same as for declare simd, as 
> that is defined to mean 'define a set of clones' and only one clone can 
> be associated to a declare variant.

I was going from https://www.openmp.org/spec-html/5.0/openmpsu25.html ,
which says:

  The simd trait can be further defined with properties that match the
  clauses accepted by the declare simd directive with the same name and
  semantics. The simd trait must define at least the simdlen property and
  one of the inbranch or notinbranch properties.

(probably best to read it in the original -- it's almost incomprehensible
without markup)

Richard


Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Jakub Jelinek via Gcc-patches
On Thu, Apr 20, 2023 at 04:22:50PM +0100, Andre Vieira (lists) wrote:
> > I don't see a good reason for dropping the extension("scalable").
> > The problem is that since the base spec requires a simdlen clause,
> > GCC should in general raise an error if simdlen is omitted.
> Where can you find this in the specs? I tried to find it but couldn't.
> 
> Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector ABI
> defines behaviour for this. But I couldn't find what it meant for a omp
> declare variant, obviously can't be the same as for declare simd, as that is
> defined to mean 'define a set of clones' and only one clone can be
> associated to a declare variant.

For missing simdlen on omp declare simd, OpenMP 5.2 says [202:14-15]:
"If a SIMD version is created and the simdlen clause is not specified, the 
number of concurrent
arguments for the function is implementation defined."
Nobody says it must be a constant when not specified, when specified it has
to be a constant.
declare variant is function call specialization based on lots of different
aspects.  If you specify simd among construct selectors, then the
implementation is allowed (and kind of expected but not currently implemented in
GCC) to change the calling convention based on the declare simd ABIs, but
again, simdlen might be specified (then it has to have constant number in
it) or not, then I bet it is supposed to be derived from the actual
differences in the calling convention to which match it is.
But as I said, this part isn't implemented yet even on other targets.

Jakub



Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Andre Vieira (lists) via Gcc-patches




On 20/04/2023 15:51, Richard Sandiford wrote:

"Andre Vieira (lists)"  writes:

Hi all,

This is a series of patches/RFCs to implement support in GCC to be able
to target AArch64's libmvec functions that will be/are being added to glibc.
We have chosen to use the omp pragma '#pragma omp declare variant ...'
with a simd construct as the way for glibc to inform GCC what functions
are available.

For example, if we would like to supply a vector version of the scalar
'cosf' we would have an include file with something like:
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
typedef __SVFloat32_t __sv_f32_t;
typedef __SVBool_t __sv_bool_t;
__f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__f32x2_t _ZGVnN2v_cosf (__f32x2_t);
__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
#pragma omp declare variant(_ZGVnN4v_cosf) \
  match(construct = {simd(notinbranch, simdlen(4))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVnN2v_cosf) \
  match(construct = {simd(notinbranch, simdlen(2))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVsMxv_cosf) \
  match(construct = {simd(inbranch)}, device = {isa("sve")})
extern float cosf (float);

The BETA ABI can be found in the vfabia64 subdir of
https://github.com/ARM-software/abi-aa/
This currently disagrees with how this patch series implements 'omp
declare simd' for SVE and I also do not see a need for the 'omp declare
variant' scalable extension constructs. I will make changes to the ABI
once we've finalized the co-design of the ABI and this implementation.


I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.

Where can you find this in the specs? I tried to find it but couldn't.

Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector 
ABI defines behaviour for this. But I couldn't find what it meant for a 
omp declare variant, obviously can't be the same as for declare simd, as 
that is defined to mean 'define a set of clones' and only one clone can 
be associated to a declare variant.


But I'm not sure it makes sense to ignore -msve-vector-bits= when
compiling the SVE version (which is what patch 4 seems to do).
If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
features in the Advanced SIMD routines.  Why should we ignore
SVE-related target information for the SVE routines?
Not sure I understand what you mean.  The vector ABI defines that if a 
simdlen is omitted that (other than the NEON clones) a SVE VLA clone is 
available. So how would I take -msve-vector-bits into consideration? Do 
you mean I ought to add them as options to pass to the function so that 
it gets used when doing the codegen for the clone (if a function body is 
available)?


This is where things get a bit iffy for me though... We purposefully 
generate a SVE simdclone regardless of command-line options, just like 
x86 does, so why would these options affect simd clone generation but 
not the actual availability of SVE? Just seems a bit odd...


A viable alternative would be to rely on declare variant for such 
behaviour, where we could use function attributes to pass specific 
target options to the variant's prototype to be able to add more 
specific tuning options per variant.  Not sure it will work but I can 
try it with my rebased patches at some point. I have to admit though, it 
is not a feature we are looking to use, so not sure it's worth the 
effort. The SVE simdclone codegen (with function bodies) is already 
pretty bad, so if we do believe there is a usecase for these, that might 
be something we should focus on before this sort of more specific tuning.


Of course, the fact that we take command-line options into account
means that omp simd/variant clauses on linkonce/comdat group functions
are an ODR violation waiting to happen.  But the same is true for the
original scalar functions that the clauses are attached to.
Can't find proper definitions of linkonce/comdat group functions so 
can't comment.




Thanks,
Richard


The patch series has three main steps:
1) Add SVE support for 'omp declare simd', see PR 96342
2) Enable GCC to use omp declare variants with simd constructs as simd
clones during auto-vectorization.
3) Add SLP support for vectorizable_simd_clone_call (This sounded like a
nice thing to add as we want to move away from non-slp vectorization).

Below you can see the list of current Patches/RFCs, the difference being
on how confident I am of the proposed changes. For the RFC I am hoping
to get early comments on the approach, rather than more indepth
code-reviews.

I appreciate we are still in Stage 4, so I can completely understand if
you don't have time to review this now, but I thought it can't hurt to
post these early.

Andre Vieira:
[PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS

Re: [RFC 0/X] Implement GCC support for AArch64 libmvec

2023-04-20 Thread Richard Sandiford via Gcc-patches
"Andre Vieira (lists)"  writes:
> Hi all,
>
> This is a series of patches/RFCs to implement support in GCC to be able 
> to target AArch64's libmvec functions that will be/are being added to glibc.
> We have chosen to use the omp pragma '#pragma omp declare variant ...' 
> with a simd construct as the way for glibc to inform GCC what functions 
> are available.
>
> For example, if we would like to supply a vector version of the scalar 
> 'cosf' we would have an include file with something like:
> typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
> typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
> typedef __SVFloat32_t __sv_f32_t;
> typedef __SVBool_t __sv_bool_t;
> __f32x4_t _ZGVnN4v_cosf (__f32x4_t);
> __f32x2_t _ZGVnN2v_cosf (__f32x2_t);
> __sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
> #pragma omp declare variant(_ZGVnN4v_cosf) \
>  match(construct = {simd(notinbranch, simdlen(4))}, device = 
> {isa("simd")})
> #pragma omp declare variant(_ZGVnN2v_cosf) \
>  match(construct = {simd(notinbranch, simdlen(2))}, device = 
> {isa("simd")})
> #pragma omp declare variant(_ZGVsMxv_cosf) \
>  match(construct = {simd(inbranch)}, device = {isa("sve")})
> extern float cosf (float);
>
> The BETA ABI can be found in the vfabia64 subdir of 
> https://github.com/ARM-software/abi-aa/
> This currently disagrees with how this patch series implements 'omp 
> declare simd' for SVE and I also do not see a need for the 'omp declare 
> variant' scalable extension constructs. I will make changes to the ABI 
> once we've finalized the co-design of the ABI and this implementation.

I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.
Relaxing that for an explicit extension seems better than doing it
only based on the ISA (which should in general be a free-form string).
Having "scalable" in the definition also helps to make the intent clearer.

Any change to the declare simd behaviour should probably be agreed
with the LLVM folks first.  Like you say, we already know that GCC
can do your version, since it already does the equivalent thing for x86.

I'm not sure, but I'm guessing the declare simd VFABI was written
that way because, at the time (several years ago), there were
concerns about switching SVE on and off on a function-by-function
basis in LLVM.

But I'm not sure it makes sense to ignore -msve-vector-bits= when
compiling the SVE version (which is what patch 4 seems to do).
If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
features in the Advanced SIMD routines.  Why should we ignore
SVE-related target information for the SVE routines?

Of course, the fact that we take command-line options into account
means that omp simd/variant clauses on linkonce/comdat group functions
are an ODR violation waiting to happen.  But the same is true for the
original scalar functions that the clauses are attached to.

Thanks,
Richard

> The patch series has three main steps:
> 1) Add SVE support for 'omp declare simd', see PR 96342
> 2) Enable GCC to use omp declare variants with simd constructs as simd 
> clones during auto-vectorization.
> 3) Add SLP support for vectorizable_simd_clone_call (This sounded like a 
> nice thing to add as we want to move away from non-slp vectorization).
>
> Below you can see the list of current Patches/RFCs, the difference being 
> on how confident I am of the proposed changes. For the RFC I am hoping 
> to get early comments on the approach, rather than more indepth 
> code-reviews.
>
> I appreciate we are still in Stage 4, so I can completely understand if 
> you don't have time to review this now, but I thought it can't hurt to 
> post these early.
>
> Andre Vieira:
> [PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
> [PATCH] parloops: Copy target and optimizations when creating a function 
> clone
> [PATCH] parloops: Allow poly nit and bound
> [RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
> [RFC] omp: Create simd clones from 'omp declare variant's
> [RFC] omp: Allow creation of simd clones from omp declare variant with 
> -fopenmp-simd flag
>
> Work in progress:
> [RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call