Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 28 Feb 2024, Andre Vieira (lists) wrote: > > > On 27/02/2024 08:47, Richard Biener wrote: > > On Mon, 26 Feb 2024, Andre Vieira (lists) wrote: > > > >> > >> > >> On 05/02/2024 09:56, Richard Biener wrote: > >>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote: > >>> > > > On 01/02/2024 07:19, Richard Biener wrote: > > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: > > > > > > The patch didn't come with a testcase so it's really hard to tell > > what goes wrong now and how it is fixed ... > > My bad! I had a testcase locally but never added it... > > However... now I look at it and ran it past Richard S, the codegen isn't > 'wrong', but it does have the potential to lead to some pretty slow > codegen, > especially for inbranch simdclones where it transforms the SVE predicate > into > an Advanced SIMD vector by inserting the elements one at a time... > > An example of which can be seen if you do: > > gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S > > with the following t.c: > #pragma omp declare simd simdlen(4) inbranch > int __attribute__ ((const)) fn5(int); > > void fn4 (int *a, int *b, int n) > { > for (int i = 0; i < n; ++i) > b[i] = fn5(a[i]); > } > > Now I do have to say, for our main usecase of libmvec we won't have any > 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course > that > doesn't mean user-code will. > >>> > >>> It seems to use SVE masks with vector(4) and the > >>> ABI says the mask is vector(4) int. You say that's because we choose > >>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5). > >>> > >>> The vectorizer creates > >>> > >>> _44 = VEC_COND_EXPR ; > >>> > >>> and then vector lowering decomposes this. That means the vectorizer > >>> lacks a check that the target handles this VEC_COND_EXPR. > >>> > >>> Of course I would expect that SVE with VLS vectors is able to > >>> code generate this operation, so it's missing patterns in the end. > >>> > >>> Richard. 
> >>> > >> > >> What should we do for GCC-14? Going forward I think the right thing to do > >> is > >> to add these patterns. But I am not even going to try to do that right now > >> and > >> even though we can codegen for this, the result doesn't feel like it would > >> ever be profitable which means I'd rather not vectorize, or well pick a > >> different vector mode if possible. > >> > >> This would be achieved with the change to the targethook. If I change the > >> hook > >> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now? > > > > Passing in a mode is OK. I'm still not fully understanding why the > > clone isn't fully specifying 'mode' and if it does not why the > > vectorizer itself can not disregard it. > > > We could check that the modes of the parameters & return type are the same as > the vector operands & result in the vectorizer. But then we'd also want to > make sure we don't reject cases where we have simdclones with compatible > modes, aka same element type, but a multiple element count. Which is where'd > we get in trouble again I think, because we'd want to accept V8SI -> 2x V4SI, > but not V8SI -> 2x VNx4SI (with VLS and aarch64_sve_vg = 2), not because it's > invalid, but because right now the codegen is bad. And it's easier to do this > in the targethook, which we can technically also use to 'rank' simdclones by > setting a target_badness value, so in the future we could decide to assign > some 'badness' to influence the rank a SVE simdclone for Advanced SIMD loops > vs an Advanced SIMD clone for Advanced SIMD loops. > > This does touch another issue of simdclone costing, which is a larger issue in > general and one we (arm) might want to approach in the future. 
It's a complex > issue, because the vectorizer doesn't know the performance impact of a > simdclone, we assume (as we should) that its faster than the original scalar, > though we currently don't record costs for either, but we don't know by how > much or how much impact it has, so the vectorizer can't reason whether it's > beneficial to use a simdclone if it has to do a lot of operand preparation, we > can merely tell it to use it, or not and all the other operations in the loop > will determine costing. > > > > From the past discussion I understood the existing situation isn't > > as bad as initially thought and no bad things happen right now? > Nope, I thought they compiler would fall apart, but it seems to be able to > transform the operands from one mode into the other, so without the targethook > it just generates slower loops in certain cases, which we'd rather avoid given > the usecase for simdclones is to speed things up ;) > > > Attached reworked patch. > > > This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to make > sure
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 27/02/2024 08:47, Richard Biener wrote:
> On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
>> On 05/02/2024 09:56, Richard Biener wrote:
>>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
>>>> On 01/02/2024 07:19, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>>>
>>>>> The patch didn't come with a testcase so it's really hard to tell
>>>>> what goes wrong now and how it is fixed ...
>>>>
>>>> My bad! I had a testcase locally but never added it...
>>>>
>>>> However... now I look at it and ran it past Richard S, the codegen
>>>> isn't 'wrong', but it does have the potential to lead to some pretty
>>>> slow codegen, especially for inbranch simdclones where it transforms
>>>> the SVE predicate into an Advanced SIMD vector by inserting the
>>>> elements one at a time...
>>>>
>>>> An example of which can be seen if you do:
>>>>
>>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S
>>>>
>>>> with the following t.c:
>>>> #pragma omp declare simd simdlen(4) inbranch
>>>> int __attribute__ ((const)) fn5(int);
>>>>
>>>> void fn4 (int *a, int *b, int n)
>>>> {
>>>>   for (int i = 0; i < n; ++i)
>>>>     b[i] = fn5(a[i]);
>>>> }
>>>>
>>>> Now I do have to say, for our main usecase of libmvec we won't have
>>>> any 'inbranch' Advanced SIMD clones, so we avoid that issue... But of
>>>> course that doesn't mean user-code will.
>>>
>>> It seems to use SVE masks with vector(4) and the
>>> ABI says the mask is vector(4) int. You say that's because we choose
>>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
>>>
>>> The vectorizer creates
>>>
>>>   _44 = VEC_COND_EXPR ;
>>>
>>> and then vector lowering decomposes this. That means the vectorizer
>>> lacks a check that the target handles this VEC_COND_EXPR.
>>>
>>> Of course I would expect that SVE with VLS vectors is able to
>>> code generate this operation, so it's missing patterns in the end.
>>>
>>> Richard.
>>
>> What should we do for GCC-14? Going forward I think the right thing to
>> do is to add these patterns. But I am not even going to try to do that
>> right now and even though we can codegen for this, the result doesn't
>> feel like it would ever be profitable which means I'd rather not
>> vectorize, or well pick a different vector mode if possible.
>>
>> This would be achieved with the change to the targethook. If I change
>> the hook to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that
>> OK for now?
>
> Passing in a mode is OK. I'm still not fully understanding why the
> clone isn't fully specifying 'mode' and if it does not why the
> vectorizer itself can not disregard it.

We could check in the vectorizer that the modes of the parameters and
return type are the same as the vector operands and result. But then
we'd also want to make sure we don't reject cases where we have
simdclones with compatible modes, i.e. the same element type but a
multiple of the element count. Which is where we'd get in trouble again
I think, because we'd want to accept V8SI -> 2x V4SI, but not
V8SI -> 2x VNx4SI (with VLS and aarch64_sve_vg = 2), not because it's
invalid, but because right now the codegen is bad. It's easier to do
this in the targethook, which we can technically also use to 'rank'
simdclones by setting a target_badness value, so in the future we could
decide to assign some 'badness' to influence the rank of an SVE
simdclone for Advanced SIMD loops vs an Advanced SIMD clone for
Advanced SIMD loops.

This does touch another issue of simdclone costing, which is a larger
issue in general and one we (arm) might want to approach in the future.
It's a complex issue, because the vectorizer doesn't know the
performance impact of a simdclone. We assume (as we should) that it's
faster than the original scalar, though we currently don't record costs
for either, but we don't know by how much or how much impact it has. So
the vectorizer can't reason about whether it's beneficial to use a
simdclone if it has to do a lot of operand preparation; we can merely
tell it to use it or not, and all the other operations in the loop will
determine costing.

> From the past discussion I understood the existing situation isn't
> as bad as initially thought and no bad things happen right now?

Nope, I thought the compiler would fall apart, but it seems to be able
to transform the operands from one mode into the other, so without the
targethook it just generates slower loops in certain cases, which we'd
rather avoid given the usecase for simdclones is to speed things up ;)

Attached reworked patch.

This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to
make sure the target can reject a simd_clone based on the vector mode
it is using. This is needed because for VLS SVE vectorization the
vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE
types because the simdlens might match; this currently leads to
suboptimal codegen.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode to call
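[Editorial sketch, not part of the patch.] The accept/reject behaviour described in this message (V8SI -> 2x V4SI is wanted, V8SI -> 2x VNx4SI is rejected for now) can be modelled in a few lines of stand-alone C. Nothing here is GCC code: `vec_isa`, `vec_mode`, `simd_clone` and `clone_usable` are hypothetical names, and the badness formula is an assumption made up for illustration. Only the convention of returning -1 for "not usable" is taken from the hook diff quoted elsewhere in the thread.

```c
/* Hypothetical stand-ins for GCC's machine modes; only the distinction
   discussed in the thread is modelled.  */
enum vec_isa { ISA_ADVSIMD, ISA_SVE };

struct vec_mode
{
  enum vec_isa isa;     /* ISA of the vectors the loop is using.  */
  unsigned elt_bits;    /* Element width in bits.  */
  unsigned nelts;       /* Element count (fixed, as with VLS SVE).  */
};

struct simd_clone
{
  char vecsize_mangle;  /* 'n' = Advanced SIMD, 's' = SVE.  */
  unsigned elt_bits;    /* Element width of the vector arguments.  */
  unsigned simdlen;     /* Lanes handled per call.  */
};

/* Return -1 if CLONE should not be used in a loop vectorized with
   LOOP_MODE, otherwise a badness value where lower is better.  The
   ranking rule is an assumption of this sketch.  */
static int
clone_usable (const struct simd_clone *clone,
	      const struct vec_mode *loop_mode)
{
  enum vec_isa clone_isa
    = clone->vecsize_mangle == 's' ? ISA_SVE : ISA_ADVSIMD;

  /* Mixing ISAs is valid in principle, but rejected here because the
     generated code is currently poor.  */
  if (clone_isa != loop_mode->isa)
    return -1;

  /* Compatible modes: same element type, and the loop's element count
     must be a whole multiple of the clone's simdlen.  */
  if (clone->elt_bits != loop_mode->elt_bits
      || clone->simdlen == 0
      || loop_mode->nelts % clone->simdlen != 0)
    return -1;

  /* One call per simdlen-sized chunk; fewer calls ranks better.  */
  return (int) (loop_mode->nelts / clone->simdlen) - 1;
}
```

With a loop mode of eight 32-bit Advanced SIMD elements, a 4-lane 'n' clone is accepted with badness 1 (two calls), while a 4-lane 's' clone is rejected. The model deliberately ignores cases the vectorizer can also handle, such as a simdlen larger than the loop's element count.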
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
> On 05/02/2024 09:56, Richard Biener wrote:
>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
>>> On 01/02/2024 07:19, Richard Biener wrote:
>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>>
>>>> The patch didn't come with a testcase so it's really hard to tell
>>>> what goes wrong now and how it is fixed ...
>>>
>>> My bad! I had a testcase locally but never added it...
>>>
>>> However... now I look at it and ran it past Richard S, the codegen
>>> isn't 'wrong', but it does have the potential to lead to some pretty
>>> slow codegen, especially for inbranch simdclones where it transforms
>>> the SVE predicate into an Advanced SIMD vector by inserting the
>>> elements one at a time...
>>>
>>> An example of which can be seen if you do:
>>>
>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S
>>>
>>> with the following t.c:
>>> #pragma omp declare simd simdlen(4) inbranch
>>> int __attribute__ ((const)) fn5(int);
>>>
>>> void fn4 (int *a, int *b, int n)
>>> {
>>>   for (int i = 0; i < n; ++i)
>>>     b[i] = fn5(a[i]);
>>> }
>>>
>>> Now I do have to say, for our main usecase of libmvec we won't have
>>> any 'inbranch' Advanced SIMD clones, so we avoid that issue... But of
>>> course that doesn't mean user-code will.
>>
>> It seems to use SVE masks with vector(4) and the
>> ABI says the mask is vector(4) int. You say that's because we choose
>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
>>
>> The vectorizer creates
>>
>>   _44 = VEC_COND_EXPR ;
>>
>> and then vector lowering decomposes this. That means the vectorizer
>> lacks a check that the target handles this VEC_COND_EXPR.
>>
>> Of course I would expect that SVE with VLS vectors is able to
>> code generate this operation, so it's missing patterns in the end.
>>
>> Richard.
>
> What should we do for GCC-14? Going forward I think the right thing to
> do is to add these patterns. But I am not even going to try to do that
> right now and even though we can codegen for this, the result doesn't
> feel like it would ever be profitable which means I'd rather not
> vectorize, or well pick a different vector mode if possible.
>
> This would be achieved with the change to the targethook. If I change
> the hook to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that
> OK for now?

Passing in a mode is OK. I'm still not fully understanding why the
clone isn't fully specifying 'mode' and if it does not why the
vectorizer itself can not disregard it.

From the past discussion I understood the existing situation isn't
as bad as initially thought and no bad things happen right now?

Thanks,
Richard.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 05/02/2024 09:56, Richard Biener wrote:
> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
>> On 01/02/2024 07:19, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>
>>> The patch didn't come with a testcase so it's really hard to tell
>>> what goes wrong now and how it is fixed ...
>>
>> My bad! I had a testcase locally but never added it...
>>
>> However... now I look at it and ran it past Richard S, the codegen
>> isn't 'wrong', but it does have the potential to lead to some pretty
>> slow codegen, especially for inbranch simdclones where it transforms
>> the SVE predicate into an Advanced SIMD vector by inserting the
>> elements one at a time...
>>
>> An example of which can be seen if you do:
>>
>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S
>>
>> with the following t.c:
>> #pragma omp declare simd simdlen(4) inbranch
>> int __attribute__ ((const)) fn5(int);
>>
>> void fn4 (int *a, int *b, int n)
>> {
>>   for (int i = 0; i < n; ++i)
>>     b[i] = fn5(a[i]);
>> }
>>
>> Now I do have to say, for our main usecase of libmvec we won't have
>> any 'inbranch' Advanced SIMD clones, so we avoid that issue... But of
>> course that doesn't mean user-code will.
>
> It seems to use SVE masks with vector(4) and the
> ABI says the mask is vector(4) int. You say that's because we choose
> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
>
> The vectorizer creates
>
>   _44 = VEC_COND_EXPR ;
>
> and then vector lowering decomposes this. That means the vectorizer
> lacks a check that the target handles this VEC_COND_EXPR.
>
> Of course I would expect that SVE with VLS vectors is able to
> code generate this operation, so it's missing patterns in the end.
>
> Richard.

What should we do for GCC-14? Going forward I think the right thing to
do is to add these patterns. But I am not even going to try to do that
right now and even though we can codegen for this, the result doesn't
feel like it would ever be profitable which means I'd rather not
vectorize, or well pick a different vector mode if possible.

This would be achieved with the change to the targethook. If I change
the hook to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that
OK for now?

Kind regards,
Andre
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
> On 01/02/2024 07:19, Richard Biener wrote:
>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>
>> The patch didn't come with a testcase so it's really hard to tell
>> what goes wrong now and how it is fixed ...
>
> My bad! I had a testcase locally but never added it...
>
> However... now I look at it and ran it past Richard S, the codegen
> isn't 'wrong', but it does have the potential to lead to some pretty
> slow codegen, especially for inbranch simdclones where it transforms
> the SVE predicate into an Advanced SIMD vector by inserting the
> elements one at a time...
>
> An example of which can be seen if you do:
>
> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S
>
> with the following t.c:
> #pragma omp declare simd simdlen(4) inbranch
> int __attribute__ ((const)) fn5(int);
>
> void fn4 (int *a, int *b, int n)
> {
>   for (int i = 0; i < n; ++i)
>     b[i] = fn5(a[i]);
> }
>
> Now I do have to say, for our main usecase of libmvec we won't have
> any 'inbranch' Advanced SIMD clones, so we avoid that issue... But of
> course that doesn't mean user-code will.

It seems to use SVE masks with vector(4) and the
ABI says the mask is vector(4) int. You say that's because we choose
a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).

The vectorizer creates

  _44 = VEC_COND_EXPR ;

and then vector lowering decomposes this. That means the vectorizer
lacks a check that the target handles this VEC_COND_EXPR.

Of course I would expect that SVE with VLS vectors is able to
code generate this operation, so it's missing patterns in the end.

Richard.

> I'm gonna remove this patch and run another test regression to see if
> it catches anything weird, but if not then I guess we do have the
> option to not use this patch and aim to solve the costing or codegen
> issue in GCC-15. We don't currently do any simdclone costing and I
> don't have a clear suggestion for how, given OpenMP has no mechanism
> that I know of to expose the speedup of a simdclone over its scalar
> variant, so how would we 'compare' a simdclone call with extra overhead
> of argument preparation vs scalar, though at least we could prefer a
> call to a different simdclone with less argument preparation.
> Anyways I digress.
>
> Other tests, these require aarch64-autovec-preference=2 so that also
> has me worried less...
>
> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param
> aarch64-autovec-preference=2 -fopenmp-simd t.c -S
>
> t.c:
> #pragma omp declare simd simdlen(2) notinbranch
> float __attribute__ ((const)) fn1(double);
>
> void fn0 (float *a, float *b, int n)
> {
>   for (int i = 0; i < n; ++i)
>     b[i] = fn1((double) a[i]);
> }
>
> #pragma omp declare simd simdlen(2) notinbranch
> float __attribute__ ((const)) fn3(float);
>
> void fn2 (float *a, double *b, int n)
> {
>   for (int i = 0; i < n; ++i)
>     b[i] = (double) fn3(a[i]);
> }
>
>> Richard.
>>
>>>> That said, I wonder how we end up mixing things up in the first place.
>>>>
>>>> Richard.

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 01/02/2024 07:19, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>
> The patch didn't come with a testcase so it's really hard to tell
> what goes wrong now and how it is fixed ...

My bad! I had a testcase locally but never added it...

However... now I look at it and ran it past Richard S, the codegen
isn't 'wrong', but it does have the potential to lead to some pretty
slow codegen, especially for inbranch simdclones where it transforms
the SVE predicate into an Advanced SIMD vector by inserting the
elements one at a time...

An example of which can be seen if you do:

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 -fopenmp-simd t.c -S

with the following t.c:
#pragma omp declare simd simdlen(4) inbranch
int __attribute__ ((const)) fn5(int);

void fn4 (int *a, int *b, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = fn5(a[i]);
}

Now I do have to say, for our main usecase of libmvec we won't have any
'inbranch' Advanced SIMD clones, so we avoid that issue... But of
course that doesn't mean user-code will.

I'm gonna remove this patch and run another test regression to see if
it catches anything weird, but if not then I guess we do have the
option to not use this patch and aim to solve the costing or codegen
issue in GCC-15. We don't currently do any simdclone costing and I
don't have a clear suggestion for how, given OpenMP has no mechanism
that I know of to expose the speedup of a simdclone over its scalar
variant, so how would we 'compare' a simdclone call with extra overhead
of argument preparation vs scalar, though at least we could prefer a
call to a different simdclone with less argument preparation.
Anyways I digress.

Other tests, these require aarch64-autovec-preference=2 so that also
has me worried less...

gcc -O3 -march=armv8-a+sve -msve-vector-bits=128 --param aarch64-autovec-preference=2 -fopenmp-simd t.c -S

t.c:
#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn1(double);

void fn0 (float *a, float *b, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = fn1((double) a[i]);
}

#pragma omp declare simd simdlen(2) notinbranch
float __attribute__ ((const)) fn3(float);

void fn2 (float *a, double *b, int n)
{
  for (int i = 0; i < n; ++i)
    b[i] = (double) fn3(a[i]);
}

> Richard.
>
>>> That said, I wonder how we end up mixing things up in the first place.
>>>
>>> Richard.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
"Andre Vieira (lists)" writes: > [...] The question at hand > here is, what can the vectorizer use for a specific loop. If we are > using Advanced SIMD modes then it needs to call an Advanced SIMD clone, > and if we are using SVE modes then it needs to call an SVE clone. At > least until we support the ABI conversion, because like I said for an > unpacked argument they behave differently. Probably also worth noting that multi-byte elements are laid out differently for big-endian. E.g. V4SI is loaded as a 128-bit integer whereas VNx4SI is loaded as an array of 4 32-bit integers, with the first 32-bit integer going in the least significant bits of the register. So it would only be possible to use Advanced SIMD clones for SVE modes and vice versa for little-endian, or if the elements are all bytes, or if we add some reverses to the inputs and outputs. Richard
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: > > > On 31/01/2024 14:35, Richard Biener wrote: > > On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: > > > >> > >> > >> On 31/01/2024 13:58, Richard Biener wrote: > >>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote: > >>> > > > On 31/01/2024 12:13, Richard Biener wrote: > > On Wed, 31 Jan 2024, Richard Biener wrote: > > > >> On Tue, 30 Jan 2024, Andre Vieira wrote: > >> > >>> > >>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure > >>> the > >>> target can reject a simd_clone based on the vector mode it is using. > >>> This is needed because for VLS SVE vectorization the vectorizer > >>> accepts > >>> Advanced SIMD simd clones when vectorizing using SVE types because the > >>> simdlens > >>> might match. This will cause type errors later on. > >>> > >>> Other targets do not currently need to use this argument. > >> > >> Can you instead pass down the mode? > > > > Thinking about that again the cgraph_simd_clone info in the clone > > should have sufficient information to disambiguate. If it doesn't > > then we should amend it. > > > > Richard. > > Hi Richard, > > Thanks for the review, I don't think cgraph_simd_clone_info is the right > place > to pass down this information, since this is information about the caller > rather than the simdclone itself. What we are trying to achieve here is > making > the vectorizer being able to accept or reject simdclones based on the ISA > we > are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we > use > modes, I am also not sure that's ideal but it is what we currently use. > So > to > answer your earlier question, yes I can also pass down mode if that's > preferable. > >>> > >>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere > >>> whether that's POLY or constant. I wonder how aarch64_sve_mode_p > >>> comes into play here which in the end classifies VLS SVE modes as > >>> non-SVE? 
> >>> > >> > >> Using -msve-vector-bits=128 > >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)) > >> $4 = E_VNx4SImode > >> (gdb) p TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)) > >> $5 = (tree) 0xf741c1b0 > >> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))) > >> 128 > >> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))) > >> $5 = true > >> > >> and for reference without vls codegen: > >> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)) > >> $1 = E_VNx4SImode > >> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))) > >> POLY_INT_CST [128, 128] > >> > >> Having said that I believe that the USABLE targethook implementation for > >> aarch64 should also block other uses, like an Advanced SIMD mode being used > >> as > >> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half' > >> registers like VNx2SI are packed differently from V2SI. > >> > >> We could teach the vectorizer to support these of course, but that requires > >> more work and is not extremely useful just yet. I'll add the extra check > >> that > >> to the patch once we agree on how to pass down the information we need. > >> Happy > >> to use either mode, or stmt_vec_info and extract the mode from it like it > >> does > >> now. > > > > As said, please pass down 'mode'. But I wonder how to document it, > > which mode is that supposed to be? Any of result or any argument > > mode that happens to be a vector? I think that we might be able > > to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128 > > in the same loop? > > > > Are the simd clones you don't want to use with -msve-vector-bits=128 > > having constant simdlen? If so why do you generate them in the first > > place? > > So this is where things get a bit confusing and I will write up some text for > these cases to put in our ABI document (currently in Beta and in need of some > tlc). 
> > Our intended behaviour is for a 'declare simd' without a simdlen to generate > simdclones for: > * Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow for > simdlen 1, Tamar fixed that in gcc recently), > * SVE VLA vectors. > > Let me illustrate this with an example: > > __attribute__ ((simd (notinbranch), const)) float cosf(float); > > Should tell the compiler the following simd clones are available: > __ZGVnN4v_cosf 128-bit 4x4 float Advanced SIMD clone > __ZGVnN2v_cosf 64-bit 4x2 float Advanced SIMD clone > __ZGVsMxv_cosf [128, 128]-bit 4x4xN SVE SIMD clone > > [To save you looking into the abi let me break this down, _ZGV is prefix, then > 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M' picks between Not > Masked and Masked (SVE is always masked even if we ask for notinbranch), then > a digit or 'x' picks between Vector Length or VLA, and after that you get
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 31/01/2024 14:35, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>> On 31/01/2024 13:58, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make
>>>>>>> sure the target can reject a simd_clone based on the vector mode
>>>>>>> it is using. This is needed because for VLS SVE vectorization the
>>>>>>> vectorizer accepts Advanced SIMD simd clones when vectorizing
>>>>>>> using SVE types because the simdlens might match. This will cause
>>>>>>> type errors later on.
>>>>>>>
>>>>>>> Other targets do not currently need to use this argument.
>>>>>>
>>>>>> Can you instead pass down the mode?
>>>>>
>>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>>> should have sufficient information to disambiguate. If it doesn't
>>>>> then we should amend it.
>>>>>
>>>>> Richard.
>>>>
>>>> Hi Richard,
>>>>
>>>> Thanks for the review, I don't think cgraph_simd_clone_info is the
>>>> right place to pass down this information, since this is information
>>>> about the caller rather than the simdclone itself. What we are trying
>>>> to achieve here is making the vectorizer being able to accept or
>>>> reject simdclones based on the ISA we are vectorizing for. To
>>>> distinguish between SVE and Advanced SIMD ISAs we use modes, I am
>>>> also not sure that's ideal but it is what we currently use. So to
>>>> answer your earlier question, yes I can also pass down mode if that's
>>>> preferable.
>>>
>>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>>> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
>>> comes into play here which in the end classifies VLS SVE modes as
>>> non-SVE?
>>
>> Using -msve-vector-bits=128
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $4 = E_VNx4SImode
>> (gdb) p TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $5 = (tree) 0xf741c1b0
>> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> 128
>> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> $5 = true
>>
>> and for reference without vls codegen:
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $1 = E_VNx4SImode
>> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> POLY_INT_CST [128, 128]
>>
>> Having said that I believe that the USABLE targethook implementation
>> for aarch64 should also block other uses, like an Advanced SIMD mode
>> being used as input for a SVE VLS SIMDCLONE. The reason being that for
>> instance 'half' registers like VNx2SI are packed differently from V2SI.
>>
>> We could teach the vectorizer to support these of course, but that
>> requires more work and is not extremely useful just yet. I'll add the
>> extra check to the patch once we agree on how to pass down the
>> information we need. Happy to use either mode, or stmt_vec_info and
>> extract the mode from it like it does now.
>
> As said, please pass down 'mode'. But I wonder how to document it,
> which mode is that supposed to be? Any of result or any argument
> mode that happens to be a vector? I think that we might be able
> to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
> in the same loop?
>
> Are the simd clones you don't want to use with -msve-vector-bits=128
> having constant simdlen? If so why do you generate them in the first
> place?

So this is where things get a bit confusing and I will write up some
text for these cases to put in our ABI document (currently in Beta and
in need of some tlc).

Our intended behaviour is for a 'declare simd' without a simdlen to
generate simdclones for:
* Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow
  for simdlen 1, Tamar fixed that in gcc recently),
* SVE VLA vectors.

Let me illustrate this with an example:

__attribute__ ((simd (notinbranch), const)) float cosf(float);

Should tell the compiler the following simd clones are available:
__ZGVnN4v_cosf  128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_cosf  64-bit 4x2 float Advanced SIMD clone
__ZGVsMxv_cosf  [128, 128]-bit 4x4xN SVE SIMD clone

[To save you looking into the abi let me break this down, _ZGV is
prefix, then 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M'
picks between Not Masked and Masked (SVE is always masked even if we ask
for notinbranch), then a digit or 'x' picks between Vector Length or
VLA, and after that you get a letter per argument, where v = vector
mapped]

Regardless of -msve-vector-bits, however, the vectorizer (and any other
part of the compiler) may assume that the VL of the VLA SVE clone is
that specified by -msve-vector-bits, which if the clone is written in a
VLA way will still work.

If the attribute is used with a function definition rather than
declaration, so:

__attribute__ ((simd (notinbranch), const)) float fn0(float a)
{
  return a + 1.0f;
}

the compiler should again generate the three simd clones:
__ZGVnN4v_fn0  128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_fn0  64-bit 4x2 float Advanced SIMD clone
__ZGVsMxv_fn0  [128, 128]-bit 4x4xN SVE SIMD
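[Editorial sketch.] The bracketed breakdown of the clone names in this message can be turned into a tiny stand-alone C parser. It is only an illustration of that description, not code from GCC or from the ABI document. `struct vfn_name` and `parse_vfn_name` are hypothetical names, and the parser assumes the single-underscore `_ZGV` prefix given in the breakdown (and used in `_ZGVnM4v_fn5` elsewhere in the thread), so the double-underscore spellings in the lists are not accepted as written.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: the fields recovered from a vector
   function name such as "_ZGVnN4v_cosf".  */
struct vfn_name
{
  char isa;            /* 'n' = Advanced SIMD, 's' = SVE.  */
  bool masked;         /* 'M' = masked, 'N' = not masked.  */
  bool vla;            /* True when the length field is 'x'.  */
  unsigned simdlen;    /* Lane count; 0 when VLA is true.  */
  char params[16];     /* Argument letters, e.g. "v".  */
  const char *scalar;  /* Scalar name, pointing into the input.  */
};

/* Parse NAME into OUT.  Returns false for anything outside the subset
   described in the thread.  */
static bool
parse_vfn_name (const char *name, struct vfn_name *out)
{
  if (strncmp (name, "_ZGV", 4) != 0)
    return false;
  const char *p = name + 4;

  /* ISA letter.  */
  if (*p != 'n' && *p != 's')
    return false;
  out->isa = *p++;

  /* Mask letter.  */
  if (*p != 'N' && *p != 'M')
    return false;
  out->masked = (*p++ == 'M');

  /* Vector length: a number, or 'x' for vector-length agnostic.  */
  if (*p == 'x')
    {
      out->vla = true;
      out->simdlen = 0;
      p++;
    }
  else if (isdigit ((unsigned char) *p))
    {
      char *end;
      out->vla = false;
      out->simdlen = (unsigned) strtoul (p, &end, 10);
      p = end;
    }
  else
    return false;

  /* Argument letters run up to the '_' that precedes the scalar name.  */
  size_t n = 0;
  while (*p != '\0' && *p != '_')
    {
      if (n + 1 >= sizeof out->params)
	return false;
      out->params[n++] = *p++;
    }
  out->params[n] = '\0';

  if (*p != '_' || n == 0)
    return false;
  out->scalar = p + 1;
  return true;
}
```

For `_ZGVnN4v_cosf` this yields ISA 'n', not masked, simdlen 4, one vector argument and scalar name `cosf`; for `_ZGVsMxv_cosf` it yields ISA 's', masked and VLA.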
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 31/01/2024 14:03, Richard Biener wrote:
> On Wed, 31 Jan 2024, Richard Biener wrote:
>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make
>>>>>> sure the target can reject a simd_clone based on the vector mode
>>>>>> it is using. This is needed because for VLS SVE vectorization the
>>>>>> vectorizer accepts Advanced SIMD simd clones when vectorizing
>>>>>> using SVE types because the simdlens might match. This will cause
>>>>>> type errors later on.
>>>>>>
>>>>>> Other targets do not currently need to use this argument.
>>>>>
>>>>> Can you instead pass down the mode?
>>>>
>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>> should have sufficient information to disambiguate. If it doesn't
>>>> then we should amend it.
>>>>
>>>> Richard.
>>>
>>> Hi Richard,
>>>
>>> Thanks for the review, I don't think cgraph_simd_clone_info is the
>>> right place to pass down this information, since this is information
>>> about the caller rather than the simdclone itself. What we are trying
>>> to achieve here is making the vectorizer being able to accept or
>>> reject simdclones based on the ISA we are vectorizing for. To
>>> distinguish between SVE and Advanced SIMD ISAs we use modes, I am
>>> also not sure that's ideal but it is what we currently use. So to
>>> answer your earlier question, yes I can also pass down mode if that's
>>> preferable.
>>
>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
>> comes into play here which in the end classifies VLS SVE modes as
>> non-SVE?
>
> Maybe it's just a bit non-obvious as you key on mangling:
>
>  static int
> -aarch64_simd_clone_usable (struct cgraph_node *node)
> +aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
>  {
>    switch (node->simdclone->vecsize_mangle)
>      {
>      case 'n':
>        if (!TARGET_SIMD)
>         return -1;
> +      if (STMT_VINFO_VECTYPE (stmt_vinfo)
> +         && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
> +       return -1;
>
> ?
>
> What does 'n' mean? It's documented as
>
>   /* The mangling character for a given vector size. This is used
>      to determine the ISA mangling bit as specified in the Intel
>      Vector ABI. */
>   unsigned char vecsize_mangle;

I'll update the comment, but yeh 'n' is for Advanced SIMD, 's' is for
SVE.

> which is slightly misleading.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> On 31/01/2024 13:58, Richard Biener wrote:
>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>>> simdlens might match. This will cause type errors later on.
>>>>>>
>>>>>> Other targets do not currently need to use this argument.
>>>>>
>>>>> Can you instead pass down the mode?
>>>>
>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>> should have sufficient information to disambiguate. If it doesn't
>>>> then we should amend it.
>>>>
>>>> Richard.
>>>
>>> Hi Richard,
>>>
>>> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
>>> to pass down this information, since this is information about the caller
>>> rather than the simdclone itself. What we are trying to achieve here is making
>>> the vectorizer being able to accept or reject simdclones based on the ISA we
>>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
>>> modes, I am also not sure that's ideal but it is what we currently use. So to
>>> answer your earlier question, yes I can also pass down mode if that's
>>> preferable.
>>
>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
>> comes into play here which in the end classifies VLS SVE modes as
>> non-SVE?
>
> Using -msve-vector-bits=128
> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $4 = E_VNx4SImode
> (gdb) p TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $5 = (tree) 0xf741c1b0
> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> 128
> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> $5 = true
>
> and for reference without vls codegen:
> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
> $1 = E_VNx4SImode
> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
> POLY_INT_CST [128, 128]
>
> Having said that I believe that the USABLE targethook implementation for
> aarch64 should also block other uses, like an Advanced SIMD mode being used as
> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
> registers like VNx2SI are packed differently from V2SI.
>
> We could teach the vectorizer to support these of course, but that requires
> more work and is not extremely useful just yet. I'll add that extra check
> to the patch once we agree on how to pass down the information we need. Happy
> to use either mode, or stmt_vec_info and extract the mode from it like it does
> now.

As said, please pass down 'mode'. But I wonder how to document it,
which mode is that supposed to be? Any of result or any argument
mode that happens to be a vector? I think that we might be able
to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
in the same loop?

Are the simd clones you don't want to use with -msve-vector-bits=128
having constant simdlen? If so why do you generate them in the first
place?

That said, I wonder how we end up mixing things up in the first place.

Richard.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 31/01/2024 13:58, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>> On 31/01/2024 12:13, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>> simdlens might match. This will cause type errors later on.
>>>>>
>>>>> Other targets do not currently need to use this argument.
>>>>
>>>> Can you instead pass down the mode?
>>>
>>> Thinking about that again the cgraph_simd_clone info in the clone
>>> should have sufficient information to disambiguate. If it doesn't
>>> then we should amend it.
>>>
>>> Richard.
>>
>> Hi Richard,
>>
>> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
>> to pass down this information, since this is information about the caller
>> rather than the simdclone itself. What we are trying to achieve here is making
>> the vectorizer being able to accept or reject simdclones based on the ISA we
>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
>> modes, I am also not sure that's ideal but it is what we currently use. So to
>> answer your earlier question, yes I can also pass down mode if that's
>> preferable.
>
> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
> comes into play here which in the end classifies VLS SVE modes as
> non-SVE?

Using -msve-vector-bits=128
(gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
$4 = E_VNx4SImode
(gdb) p TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
$5 = (tree) 0xf741c1b0
(gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
128
(gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
$5 = true

and for reference without vls codegen:
(gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
$1 = E_VNx4SImode
(gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
POLY_INT_CST [128, 128]

Having said that I believe that the USABLE targethook implementation for
aarch64 should also block other uses, like an Advanced SIMD mode being used as
input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
registers like VNx2SI are packed differently from V2SI.

We could teach the vectorizer to support these of course, but that requires
more work and is not extremely useful just yet. I'll add that extra check
to the patch once we agree on how to pass down the information we need. Happy
to use either mode, or stmt_vec_info and extract the mode from it like it does
now.

Regards,
Andre
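To make the "block both directions" idea concrete, here is a rough standalone model of the check being discussed. It is not the aarch64 hook: `enum vec_isa`, `clone_usable_for_isa` and the `have_simd` / `have_sve` flags are hypothetical stand-ins for what the real code would derive from the vector mode (via aarch64_sve_mode_p) and from TARGET_SIMD / TARGET_SVE. It only assumes the return convention from the hook documentation, negative for unusable and non-negative for usable.

```c
#include <assert.h>

/* Hypothetical classification of the vector mode the caller is
   vectorizing with; in GCC this would come from the machine mode.  */
enum vec_isa { VEC_ADVSIMD, VEC_SVE };

/* Model of a usable-style check keyed on the clone's mangling letter
   ('n' = Advanced SIMD clone, 's' = SVE clone) and the caller's ISA.
   Returns -1 to reject the clone and 0 to accept it.  */
static int
clone_usable_for_isa (char vecsize_mangle, enum vec_isa caller_isa,
                      int have_simd, int have_sve)
{
  /* Only the two AArch64 letters are modelled, similar in spirit to
     the gcc_unreachable () default in the patch.  */
  assert (vecsize_mangle == 'n' || vecsize_mangle == 's');

  switch (vecsize_mangle)
    {
    case 'n':
      if (!have_simd)
        return -1;
      /* Advanced SIMD clone called from SVE (including VLS) code.  */
      if (caller_isa == VEC_SVE)
        return -1;
      return 0;
    case 's':
      if (!have_sve)
        return -1;
      /* SVE clone called with Advanced SIMD vectors, the extra check
         proposed above.  */
      if (caller_isa == VEC_ADVSIMD)
        return -1;
      return 0;
    default:
      return -1;
    }
}
```

Taking an ISA classification rather than a stmt_vec_info mirrors the "pass down the mode" suggestion; which mode the vectorizer should pass (result or argument) is still the open question in this thread.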
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 31 Jan 2024, Richard Biener wrote:

> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>> On 31/01/2024 12:13, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>> simdlens might match. This will cause type errors later on.
>>>>>
>>>>> Other targets do not currently need to use this argument.
>>>>
>>>> Can you instead pass down the mode?
>>>
>>> Thinking about that again the cgraph_simd_clone info in the clone
>>> should have sufficient information to disambiguate. If it doesn't
>>> then we should amend it.
>>>
>>> Richard.
>>
>> Hi Richard,
>>
>> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
>> to pass down this information, since this is information about the caller
>> rather than the simdclone itself. What we are trying to achieve here is making
>> the vectorizer being able to accept or reject simdclones based on the ISA we
>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
>> modes, I am also not sure that's ideal but it is what we currently use. So to
>> answer your earlier question, yes I can also pass down mode if that's
>> preferable.
>
> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
> whether that's POLY or constant. I wonder how aarch64_sve_mode_p
> comes into play here which in the end classifies VLS SVE modes as
> non-SVE?

Maybe it's just a bit non-obvious as you key on mangling:

 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
       if (!TARGET_SIMD)
         return -1;
+      if (STMT_VINFO_VECTYPE (stmt_vinfo)
+          && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
+        return -1;

? What does 'n' mean? It's documented as

  /* The mangling character for a given vector size. This is used
     to determine the ISA mangling bit as specified in the Intel
     Vector ABI. */
  unsigned char vecsize_mangle;

which is slightly misleading.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:

> On 31/01/2024 12:13, Richard Biener wrote:
>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>>> target can reject a simd_clone based on the vector mode it is using.
>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>> simdlens might match. This will cause type errors later on.
>>>>
>>>> Other targets do not currently need to use this argument.
>>>
>>> Can you instead pass down the mode?
>>
>> Thinking about that again the cgraph_simd_clone info in the clone
>> should have sufficient information to disambiguate. If it doesn't
>> then we should amend it.
>>
>> Richard.
>
> Hi Richard,
>
> Thanks for the review, I don't think cgraph_simd_clone_info is the right place
> to pass down this information, since this is information about the caller
> rather than the simdclone itself. What we are trying to achieve here is making
> the vectorizer being able to accept or reject simdclones based on the ISA we
> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
> modes, I am also not sure that's ideal but it is what we currently use. So to
> answer your earlier question, yes I can also pass down mode if that's
> preferable.

Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
whether that's POLY or constant. I wonder how aarch64_sve_mode_p
comes into play here which in the end classifies VLS SVE modes as
non-SVE?

> Regards,
> Andre

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On 31/01/2024 12:13, Richard Biener wrote:
> On Wed, 31 Jan 2024, Richard Biener wrote:
>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>>> target can reject a simd_clone based on the vector mode it is using.
>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>> simdlens might match. This will cause type errors later on.
>>>
>>> Other targets do not currently need to use this argument.
>>
>> Can you instead pass down the mode?
>
> Thinking about that again the cgraph_simd_clone info in the clone
> should have sufficient information to disambiguate. If it doesn't
> then we should amend it.
>
> Richard.

Hi Richard,

Thanks for the review, I don't think cgraph_simd_clone_info is the right place
to pass down this information, since this is information about the caller
rather than the simdclone itself. What we are trying to achieve here is making
the vectorizer being able to accept or reject simdclones based on the ISA we
are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we use
modes, I am also not sure that's ideal but it is what we currently use. So to
answer your earlier question, yes I can also pass down mode if that's
preferable.

Regards,
Andre
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Wed, 31 Jan 2024, Richard Biener wrote:

> On Tue, 30 Jan 2024, Andre Vieira wrote:
>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
>> target can reject a simd_clone based on the vector mode it is using.
>> This is needed because for VLS SVE vectorization the vectorizer accepts
>> Advanced SIMD simd clones when vectorizing using SVE types because the
>> simdlens might match. This will cause type errors later on.
>>
>> Other targets do not currently need to use this argument.
>
> Can you instead pass down the mode?

Thinking about that again the cgraph_simd_clone info in the clone
should have sufficient information to disambiguate. If it doesn't
then we should amend it.

Richard.
Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
On Tue, 30 Jan 2024, Andre Vieira wrote:

> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
> target can reject a simd_clone based on the vector mode it is using.
> This is needed because for VLS SVE vectorization the vectorizer accepts
> Advanced SIMD simd clones when vectorizing using SVE types because the
> simdlens might match. This will cause type errors later on.
>
> Other targets do not currently need to use this argument.

Can you instead pass down the mode?

> gcc/ChangeLog:
>
> * target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
> * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
> call TARGET_SIMD_CLONE_USABLE.
> * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
> and use it to reject the use of SVE simd clones with Advanced SIMD
> modes.
> * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
> * config/i386/i386.cc (ix86_simd_clone_usable): Likewise.

--
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
[PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
target can reject a simd_clone based on the vector mode it is using.
This is needed because for VLS SVE vectorization the vectorizer accepts
Advanced SIMD simd clones when vectorizing using SVE types because the
simdlens might match. This will cause type errors later on.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
	call TARGET_SIMD_CLONE_USABLE.
	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
	and use it to reject the use of SVE simd clones with Advanced SIMD
	modes.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a37d47b243e..31617510160 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28694,13 +28694,16 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE. */
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
       if (!TARGET_SIMD)
         return -1;
+      if (STMT_VINFO_VECTYPE (stmt_vinfo)
+          && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
+        return -1;
       return 0;
     default:
       gcc_unreachable ();
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e80de2ce056..c48b212d9e6 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5658,7 +5658,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE. */
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+                       stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..63e6b9d2643 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25193,7 +25193,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.). */
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+                        stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e..4fade9c4eec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1648,7 +1648,7 @@ DEFHOOK
 in vectorized loops in current function, or non-negative number if it is\n\
 usable. In that case, the smaller the number is, the more desirable it is\n\
 to use it.",
-int, (struct cgraph_node *), NULL)
+int, (struct cgraph_node *, _stmt_vec_info *), NULL)
 HOOK_VECTOR_END (simd_clone)
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4..da02082c034 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4074,7 +4074,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
       this_badness += floor_log2 (num_calls) * 4096;
     if (n->simdclone->inbranch)
       this_badness += 8192;
-    int target_badness = targetm.simd_clone.usable (n);
+    int target_badness = targetm.simd_clone.usable (n, stmt_info);
     if (target_badness < 0)
       continue;
     this_badness += target_badness * 512;
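For readers unfamiliar with how the hook's return value is consumed, the tree-vect-stmts.cc hunk above suggests the following shape. This is a standalone sketch of only the three terms visible in that hunk (the real vectorizable_simd_clone_call has further terms that are not shown in this patch); `clone_badness` and `floor_log2_u` are hypothetical names introduced here, and the early rejection is folded into a -1 return instead of the `continue` in the loop.

```c
#include <assert.h>

/* Stand-in for GCC's floor_log2 on a positive value.  */
static int
floor_log2_u (unsigned x)
{
  int r = -1;
  while (x != 0)
    {
      x >>= 1;
      r++;
    }
  return r;
}

/* Combine the terms visible in the hunk: the number of calls needed,
   whether the clone is an inbranch (masked) one, and the value returned
   by the usable hook.  Returns -1 when the target rejected the clone,
   otherwise a badness where smaller means more desirable.  */
static int
clone_badness (unsigned num_calls, int inbranch, int target_badness)
{
  assert (num_calls > 0);

  /* A negative hook result means the clone must not be used at all.  */
  if (target_badness < 0)
    return -1;

  int badness = floor_log2_u (num_calls) * 4096;
  if (inbranch)
    badness += 8192;
  badness += target_badness * 512;
  return badness;
}
```

With these weights a clone the target merely dislikes (a small positive hook result) can still win over one that needs more calls, whereas returning -1 from the hook, as the aarch64 change does for Advanced SIMD clones under SVE modes, removes the candidate entirely.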