[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #20 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:830bead4859cd00da87e1304ba249cf0d3eb5a5a
commit r15-6597-g830bead4859cd00da87e1304ba249cf0d3eb5a5a
Author: Tamar Christina
Date: Mon Jan 6 09:24:36 2025 +
AArch64: Implement four and eight chunk VLA concats [PR118272]
The following testcase
#pragma GCC target ("+sve")
extern char __attribute__ ((simd, const)) fn3 (int, short);
void test_fn3 (float *a, float *b, double *c, int n)
{
for (int i = 0; i < n; ++i)
a[i] = fn3 (b[i], c[i]);
}
at -Ofast ICEs because my previous patch only added support for combining 2
partial SVE vectors into a bigger vector. However, there can also be 4 and 8
piece subvectors.
This patch fixes this by implementing the missing expansions.
gcc/ChangeLog:
PR target/96342
PR target/118272
* config/aarch64/aarch64-sve.md (vec_init,
vec_initvnx16qivnx2qi): New.
* config/aarch64/aarch64.cc
(aarch64_sve_expand_vector_init_subvector):
Rewrite to support any arbitrary combinations.
* config/aarch64/iterators.md (SVE_NO2E): Update to use SVE_NO4E
(SVE_NO2E, Vquad): New.
gcc/testsuite/ChangeLog:
PR target/96342
PR target/118272
* gcc.target/aarch64/vect-simd-clone-3.c: New test.
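As a rough illustration of where the 2, 4 and 8 piece cases in the commit above come from, the sketch below counts how many equally sized partial vectors are needed to fill a full one. It is not code from the patch: `subvector_count` is a hypothetical helper, and its arguments are element counts per 128-bit SVE granule (16 for VNx16QI, 8 for VNx8QI, 4 for VNx4QI, 2 for VNx2QI).

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: number of equally sized partial
   vectors needed to fill a full vector.  Both arguments are element
   counts per 128-bit SVE granule, e.g. 16 for VNx16QI, 2 for VNx2QI.  */
unsigned
subvector_count (unsigned full_elts, unsigned partial_elts)
{
  /* The full vector must be an exact multiple of the partial one.  */
  assert (partial_elts != 0 && full_elts % partial_elts == 0);
  return full_elts / partial_elts;
}
```

With these assumptions, filling a VNx16QI from VNx8QI, VNx4QI or VNx2QI pieces takes 2, 4 or 8 pieces respectively; the last case corresponds to the vec_initvnx16qivnx2qi pattern named in the ChangeLog.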
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #19 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:89b2c7dc96c4944c306131b665a4738a8a99413e
commit r15-6393-g89b2c7dc96c4944c306131b665a4738a8a99413e
Author: Tamar Christina
Date: Fri Dec 20 14:34:32 2024 +
AArch64: Implement vector concat of partial SVE vectors [PR96342]
This patch adds support for constructing a full SVE vector from two
partial SVE vectors. It also implements support for the standard vec_init
optab to do this.
gcc/ChangeLog:
PR target/96342
* config/aarch64/aarch64-protos.h
(aarch64_sve_expand_vector_init_subvector): New.
* config/aarch64/aarch64-sve.md (vec_init): New.
(@aarch64_pack_partial): New.
* config/aarch64/aarch64.cc
(aarch64_sve_expand_vector_init_subvector): New.
* config/aarch64/iterators.md (SVE_NO2E): New.
(VHALF, Vhalf): Add SVE partial vectors.
gcc/testsuite/ChangeLog:
PR target/96342
* gcc.target/aarch64/vect-simd-clone-2.c: New test.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #18 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:d7d3dfe7a2a26e370805ddf835bfd00c51d32f1b
commit r15-6392-gd7d3dfe7a2a26e370805ddf835bfd00c51d32f1b
Author: Tamar Christina
Date: Fri Dec 20 14:27:25 2024 +
AArch64: Add SVE support for simd clones [PR96342]
This patch finalizes adding support for the generation of SVE simd clones
when no simdlen is provided, following the ABI rules where the widest data
type determines the minimum number of elements in a length-agnostic vector.
gcc/ChangeLog:
PR target/96342
* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
visibility global and support use for non_acle types.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
when no simdlen is provided, according to ABI rules.
(simd_clone_adjust_sve_vector_type): New helper function.
(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
and modify types to use SVE types.
* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
(simd_clone_adjust): Adapt safelen check to be compatible with VLA
simdlen.
gcc/testsuite/ChangeLog:
PR target/96342
* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
* gcc.target/aarch64/vect-simd-clone-1.c: New test.
* g++.target/aarch64/vect-simd-clone-1.C: New test.
Co-authored-by: Victor Do Nascimento
Co-authored-by: Tamar Christina
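The "widest data type" rule in the commit above can be sketched as follows. This is only one reading of the commit message, not the compiler's implementation: `min_vla_lanes` is a hypothetical helper, and it assumes that the minimum lane count is taken over a single 128-bit granule, since SVE vectors are always a multiple of 128 bits long.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code: given the bit widths of the return
   type and parameter types of a function, return the number of lanes a
   length-agnostic clone handles per 128-bit granule.  Assumes the widest
   type fills the granule exactly.  */
unsigned
min_vla_lanes (const unsigned *type_bits, unsigned n)
{
  unsigned widest = 0;
  for (unsigned i = 0; i < n; ++i)
    if (type_bits[i] > widest)
      widest = type_bits[i];
  /* At least one type is required and it must divide the granule.  */
  assert (widest != 0 && 128 % widest == 0);
  return 128 / widest;
}
```

Under these assumptions, a function such as `char fn3 (int, short)` has widths 8, 32 and 16, so the widest type is 32 bits and the clone handles at least 4 lanes; the narrower char and short values then occupy partial vectors.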
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #17 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:6ecb365d4c3f36eaf684c38fc5d9008a1409c725
commit r15-6391-g6ecb365d4c3f36eaf684c38fc5d9008a1409c725
Author: Tamar Christina
Date: Fri Dec 20 14:25:50 2024 +
AArch64: Disable `omp declare variant' tests for aarch64 [PR96342]
These tests are x86 specific and shouldn't be run for aarch64.
gcc/testsuite/ChangeLog:
PR target/96342
* c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
only test.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #16 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:b6242bd122757ec6c75c73a4921f24a9a382b090
commit r15-6109-gb6242bd122757ec6c75c73a4921f24a9a382b090
Author: Victor Do Nascimento
Date: Wed Dec 11 12:00:58 2024 +
middle-end: Add initial support for poly_int64 BIT_FIELD_REF in expand pass
[PR96342]
While `poly_int64' has been the default representation of bitfield size
and offset for some time, there was a lack of support for the use of
non-constant `poly_int64' values for those values throughout the
compiler, limiting the applicability of the BIT_FIELD_REF rtl expression
for variable length vectors, such as those used by SVE.
This patch starts work on extending the functionality of relevant
functions in the expand pass such as to enable their use by the compiler
for such vectors.
gcc/ChangeLog:
PR target/96342
* expr.cc (store_constructor): Enable poly_{u}int64 type usage.
(get_inner_reference): Ditto.
Co-authored-by: Tamar Christina
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #15 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:d069eb91d5696a8642bd5fc44a6d47fd7f74d18b
commit r15-6108-gd069eb91d5696a8642bd5fc44a6d47fd7f74d18b
Author: Victor Do Nascimento
Date: Wed Dec 11 12:00:09 2024 +
middle-end: add vec_init support for variable length subvector concatenation.
[PR96342]
For architectures where the vector length is not a compile-time constant
but rather a runtime constant, as is the case with SVE, it is perfectly
reasonable that such a vector be made up of two (or more) subvector
components of a compatible variable sub-length.
One example of this would be the concatenation of two VNx4QI vectors
into a single VNx8QI vector.
This patch adds initial support for the enablement of this feature in
the middle-end, removing the `.is_constant()' constraint on the vector's
number of elements, instead making the constant number of elements the
multiple of the number of subvectors (which must then also be of variable
length, such that their polynomial ratio then results in a compile-time
constant) required to fill the vector.
gcc/ChangeLog:
PR target/96342
* expr.cc (store_constructor): add support for variable-length vectors.
Co-authored-by: Tamar Christina
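The "polynomial ratio" in the commit above can be made concrete with a toy model. The sketch below is not GCC's poly_int implementation: `struct poly_len` and `poly_constant_multiple` are hypothetical names, and a length is modelled as c0 + c1 * x, where x is a runtime value fixed by the hardware. Two such lengths have a compile-time constant ratio only if both coefficients scale by the same factor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a variable length c0 + c1 * x, where x is only known at
   run time.  Hypothetical type, loosely modelled on the idea behind
   GCC's poly_int but not taken from it.  */
struct poly_len
{
  unsigned long c0;
  unsigned long c1;
};

/* Return true and set *mult if a == *mult * b for every value of x,
   i.e. if the ratio a / b is a compile-time constant.  */
bool
poly_constant_multiple (struct poly_len a, struct poly_len b,
                        unsigned long *mult)
{
  assert (b.c0 != 0 && mult != 0);
  if (a.c0 % b.c0 != 0)
    return false;
  unsigned long m = a.c0 / b.c0;
  /* The variable part must scale by the same factor as the constant.  */
  if (a.c1 != m * b.c1)
    return false;
  *mult = m;
  return true;
}
```

In this model VNx8QI has 8 + 8x elements and VNx4QI has 4 + 4x, so the ratio is the constant 2 even though neither element count is constant. A fixed-length 16-element vector, by contrast, is not a constant multiple of VNx4QI.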
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #14 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:240cbd2f26c0f1c1f83cfc3b69cc0271b56172e2
commit r15-6107-g240cbd2f26c0f1c1f83cfc3b69cc0271b56172e2
Author: Victor Do Nascimento
Date: Wed Dec 11 11:58:55 2024 +
middle-end: Fix mask length arg in call to vect_get_loop_mask [PR96342]
When issuing multiple calls to a simdclone in a vectorized loop,
TYPE_VECTOR_SUBPARTS(vectype) gives the incorrect number when compared
to the TYPE_VECTOR_SUBPARTS result we get from the mask type derived
from the relevant `rgroup_controls' entry within `vect_get_loop_mask'.
By passing `masktype' instead, we are able to get the correct number of
vector subparts and thus eliminate the ICE in the call to
`vect_get_loop_mask' when the data type for which we retrieve the mask
is wider than the one used when defining the mask at mask registration
time.
gcc/ChangeLog:
PR target/96342
* tree-vect-stmts.cc (vectorizable_simd_clone_call):
s/vectype/masktype/ in call to vect_get_loop_mask.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #13 from GCC Commits ---
The master branch has been updated by Tamar Christina :
https://gcc.gnu.org/g:561ef7c8477ba58ea64de259af9c2d0870afd9d4
commit r15-6106-g561ef7c8477ba58ea64de259af9c2d0870afd9d4
Author: Andre Vieira
Date: Wed Dec 11 11:50:22 2024 +
middle-end: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE [PR96342]
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the
target can reject a simd_clone based on the vector mode it is using.
This is needed because for VLS SVE vectorization the vectorizer accepts
Advanced SIMD simd clones when vectorizing using SVE types because the
simdlens might match. This will cause type errors later on.
Other targets do not currently need to use this argument.
gcc/ChangeLog:
PR target/96342
* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
call TARGET_SIMD_CLONE_USABLE.
* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
and use it to reject the use of SVE simd clones with Advanced SIMD
modes.
* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.
* doc/tm.texi: Regenerate.
Co-authored-by: Victor Do Nascimento
Co-authored-by: Tamar Christina
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
Tamar Christina changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|11.0                        |15.0
                 CC|                            |tnfchris at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2024-12-03
     Ever confirmed|0                           |1
--- Comment #12 from Tamar Christina ---
Sending respun patches today.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #11 from avieira at gcc dot gnu.org ---
I realized this ticket hadn't been updated in a while. Late in development for
gcc-14 I realized SVE simdclone usage was leading to a regression on a
benchmark. I couldn't get to the bottom of the regression in time, so I decided
to punt this to gcc-15. Someone in our team has picked up this work and we are
planning to get this done in gcc-15.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #10 from avieira at gcc dot gnu.org ---
yang, I assume you are no longer working on this?
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #9 from CVS Commits ---
The master branch has been updated by Richard Sandiford :
https://gcc.gnu.org/g:abe93733a265f8a8b56dbdd307380f8c83dd3ab5
commit r11-4676-gabe93733a265f8a8b56dbdd307380f8c83dd3ab5
Author: Yang Yang
Date: Tue Nov 3 16:13:47 2020 +
PR target/96342 Change field "simdlen" into poly_uint64
This is the first patch of PR96342. In order to add support for
"omp declare simd", change the type of the field "simdlen" of struct
cgraph_simd_clone from unsigned int to poly_uint64 and make the related
adaptations, since the length might be variable for the SVE cases.
2020-11-03 Yang Yang
gcc/ChangeLog:
* cgraph.h (struct cgraph_simd_clone): Change field "simdlen" of
struct cgraph_simd_clone from unsigned int to poly_uint64.
* config/aarch64/aarch64.c
(aarch64_simd_clone_compute_vecsize_and_simdlen): adaptation of
operations on "simdlen".
* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
Printf formats update.
* gengtype.c (main): Handle poly_uint64.
* omp-simd-clone.c (simd_clone_mangle): Likewise.
(simd_clone_adjust_return_type): Likewise.
(create_tmp_simd_array): Likewise.
(simd_clone_adjust_argument_types): Likewise.
(simd_clone_init_simd_arrays): Likewise.
(ipa_simd_modify_function_body): Likewise.
(simd_clone_adjust): Likewise.
(expand_simd_clones): Likewise.
* poly-int-types.h (vector_unroll_factor): New macro.
* poly-int.h (constant_multiple_p): Add two-argument versions.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #8 from rsandifo at gcc dot gnu.org ---
(In reply to yangyang from comment #3)
> The work is mainly composed of three parts: the generating of SVE
> functions for "omp declare simd" in pass_omp_simd_clone, the supporting of
> SVE PCS of non-builtin types, and the generating of the call of SVE
> vectorized functions in pass_vect. I plan to finish this work in the
> following five steps, each step corresponds to a patch:
This plan looks really good to me, thanks. I agree with everything
I've snipped in this reply.
> f) In pass_expand, only when a “SVE type” attribute is added to the tree
> nodes of the types of arguments and return type, these types use the SVE
> PCS. For now, GCC only has a mechanism for adding attributes to SVE builtin
> type, so I plan to define a new hook to add attribute to the types of
> arguments and return type of simdclones generated if needed. The related
> processing functions are planned to be moved to aarch64.c from
> aarch64-sve-builtins.cc in addition.
It's a very minor detail, sorry, but I'd prefer to keep stuff in
aarch64-sve-builtins.cc if possible, and simply export the functions
that we need via aarch64-protos.h.
> Part 4) Add the generating of VLS SVE functions for "omp declare simd". The
> specification writes: “When using a simdlen(len) clause, the compiler
> expects a VLS vector version of the function that is tuned for a specific
> implementation of SVE. ”. Therefore I think only when the number of bits in
> a SVE vector register of the target is specified and coincides with the
> simdlen clause, GCC is supposed to generate the VLS SVE functions for "omp
> declare simd",
I think in principle we should generate this unconditionally.
There are two possible approaches, in increasing order of quality
of implementation:
(1) Divide the problem into three cases:
    (a) -msve-vector-bits=scalable
        In this case, generate VLA code for the VLS routines.
        The point here is that the VLS interface guarantees that
        the SVE registers are a particular size, but the compiler
        is not required to take advantage of that information.
        Using VLA code is a valid implementation choice.
    (b) -msve-vector-bits=N, N matches the simdlen
        For this we'd generate VLS code in the way that you describe.
    (c) -msve-vector-bits=N, N does not match the simdlen
        We should silently accept this for declarations, but emit a
        warning or an error if the compiler needs to generate a
        definition.
(2) Allow -msve-vector-bits= to vary on a function-by-function basis,
    in the same way that the set of target features can already
    vary on a function-by-function basis. Then, as a follow-on change,
    use this feature to generate VLS code for whichever simdlen
    the code specifies.
(2) is likely to be tricky, so I'd recommend starting with (1)
and treating (2) as a potential future optimisation.
> Part 5) Generate the call of SVE vectorized functions in pass_vect,
> specifically:
>
> a) Define a new hook that return true if the target support variable vector
> length simdclones and set the aarch64 return value to true if TARGET_SVE. In
> vectorizable_simd_clone_call, continue analyzing instead of directly
> returning false.
It would be good to generalise existing hooks if possible, rather
than add one specifically for VLA vs. VLS.
> In addition, I have finished the first two patches and attached them on
> this PR. Is it necessary to send the patches to the GCC patches mailing list
> for reviewing?
Yeah, if you could send them to gcc-patches, that'd be great.
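Approach (1) in the comment above amounts to a three-way decision, which the following sketch spells out. It is an illustration of the proposal only, not GCC code: the enum, the function `choose_vls_clone_action` and its parameters are hypothetical, `sve_vector_bits == 0` stands for -msve-vector-bits=scalable, and `clone_bits` is assumed to be the vector width in bits implied by the simdlen clause.

```c
#include <assert.h>

/* Hypothetical outcome of handling a VLS "omp declare simd" clone.  */
enum vls_clone_action
{
  VLS_CLONE_EMIT_VLA_CODE,      /* case (a): scalable, VLA code is valid */
  VLS_CLONE_EMIT_VLS_CODE,      /* case (b): N matches the simdlen */
  VLS_CLONE_ACCEPT_DECLARATION, /* case (c), declaration only */
  VLS_CLONE_DIAGNOSE            /* case (c), definition needed */
};

/* Sketch of approach (1).  SVE_VECTOR_BITS is 0 for "scalable", otherwise
   the N from -msve-vector-bits=N.  CLONE_BITS is the vector width implied
   by the simdlen clause.  NEEDS_DEFINITION is nonzero if the compiler has
   to generate a body for the clone rather than just see a declaration.  */
enum vls_clone_action
choose_vls_clone_action (unsigned sve_vector_bits, unsigned clone_bits,
                         int needs_definition)
{
  assert (clone_bits != 0);
  if (sve_vector_bits == 0)
    return VLS_CLONE_EMIT_VLA_CODE;
  if (sve_vector_bits == clone_bits)
    return VLS_CLONE_EMIT_VLS_CODE;
  return needs_definition ? VLS_CLONE_DIAGNOSE
                          : VLS_CLONE_ACCEPT_DECLARATION;
}
```

Approach (2) would remove the mismatch case entirely by letting the vector width follow the clone, which is why it is described as a follow-on optimisation.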
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #7 from rsandifo at gcc dot gnu.org ---
Comment on attachment 49414
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49414
part2-patch
Nice :-)
For the constant_multiple_p calls that calculate a vector multiple,
it might be good to have a macro in poly-int-types.h with the same
definition as vector_element_size, but with a name that more closely
matches its use here. Maybe "vector_unroll_factor" or something
(suggestions for better names welcome -- I'm not too happy with
my attempt.)
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #6 from rsandifo at gcc dot gnu.org ---
Comment on attachment 49413
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49413
part1-patch
Thanks for the summary and patches, and sorry for the delayed reply.
Taking part1-patch first: this generally looks good to me.
Some minor things:
- It isn't necessary to use poly_int stuff in i386.c, since
  the new poly_uint64 will naturally decay to a uint64_t
  in i386 target files. It might still be necessary to update
  some of the printf formats though.
- It's more correct to use "unsigned HOST_WIDE_INT" instead
  of "long unsigned int" and %wd instead of %ld. One reason
  is that we support cross builds from 32-bit hosts, where
  long is 32 bits. Long is also 32 bits for ILP32.
- Some of the multiple_p calls can instead use exact_div,
  which is the preferred way of performing a division that
  is known to have no remainder.
I think it's clear that we need to make this change, and since
it's a natural "poly_intification", I don't think it needs to
wait for other SVE work to be completed. So would you be able
to submit the patch to the list independently of the other work?
Stage 1 closes in three weeks' time so it would be good to get
this in before then.
Thanks again for doing this.
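The point about "long unsigned int" and %ld in the review above is a general C portability issue, and can be shown outside the compiler. The sketch below uses standard C as an analogue, since HOST_WIDE_INT and %wd are GCC-internal: a fixed-width `uint64_t` with the `PRIu64` macro from `<inttypes.h>` prints correctly whether `long` is 32 or 64 bits on the host. `format_simdlen` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: write "simdlen=<value>" into BUF.  Using uint64_t
   with PRIu64 avoids assuming that long is 64 bits wide, which does not
   hold on 32-bit hosts or ILP32 targets.  Returns what snprintf returns,
   i.e. the length the full string would have.  */
int
format_simdlen (char *buf, size_t size, uint64_t simdlen)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "simdlen=%" PRIu64, simdlen);
}
```

A value such as 4294967296 (2^32) cannot be represented in a 32-bit long at all, so a "%ld" format with a `long` argument cannot print it on such a host, while the fixed-width version is unaffected.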
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #5 from yangyang ---
Created attachment 49414
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49414&action=edit
part2-patch
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #4 from yangyang ---
Created attachment 49413
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49413&action=edit
part1-patch
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #3 from yangyang ---
Hi,
Sorry for the slow reply.
After studying the specification of SVE "omp declare simd" and GCC's current
implementation of "omp declare simd", I have developed a rough plan to support
generating SVE functions for "omp declare simd" in GCC. However, there are
still some uncertainties in the plan which might need further discussion.
The work is mainly composed of three parts: the generating of SVE functions for
"omp declare simd" in pass_omp_simd_clone, the supporting of SVE PCS of
non-builtin types, and the generating of the call of SVE vectorized functions
in pass_vect. I plan to finish this work in the following five steps, each step
corresponds to a patch:
Part 1) Change the type of the field "simdlen" of struct cgraph_simd_clone from
unsigned int to poly_uint64 and make the related adaptations, since the length
might be variable for the SVE cases.
PR96342-part1-v1.patch
Part 2) During debugging, I found that all the calls to the interface
simd_clone_subparts need to be replaced with calls to TYPE_VECTOR_SUBPARTS
due to the introduction of SVE simdclones. So I plan to complete all the
replacements in a patch.
PR96342-part1-v2.patch
Part 3) Add the generating of VLA SVE (vector length agnostic, without
"simdlen") functions for "omp declare simd" and skip the VLS (vector length
specific) ones, specifically:
a) In aarch64_simd_clone_compute_vecsize_and_simdlen, add 1 to “count” when
TARGET_SVE is specified.
b) Add a bool type field “always_masked” in struct cgraph_simd_clone to mark
simdclones that are always masked and skip the generating of the noinbranch
version when always_masked is true. In
aarch64_simd_clone_compute_vecsize_and_simdlen, set it to true when processing
SVE simdclones.
c) In aarch64_simd_clone_compute_vecsize_and_simdlen, set the “vecsize_mangle”
to ‘s’, and the “vec_bits” to BITS_PER_SVE_VECTOR when processing VLA SVE
simdclones. Report an unsupported warning when processing VLS SVE simdclones.
d) Adjust simd_clone_mangle.
e) Support SVE masking: For SVE vector functions, masked signatures are
generated by adding a svbool_t mask (corresponds to a predicate register) as
the last parameter. Since aarch64 GCC currently doesn’t support multi-type
simdclones, the input predicate works for all the types, and GCC doesn’t need
to do special adjustment. For now, I plan to follow the current scheme,
transform the input predicate into a bool array with [16, 16] elements (since
the input predicate always has a mode of VNx16BImode), and use the active
elements to build the branch. The following gimple stmts are expected to be
generated:
MEM > [( *)&mask.34] = mask.37_17(D);
…
_9 = iter.38_6 * 4;
_8 = mask.34[_9];
if (_8 == 0)
…
The number 4 in _9 = iter.38_6 * 4; comes from arg_unit_size / mask_unit_size.
For how to do this, set “clonei->mask_mode” to VNx16BImode when processing SVE
simdclones in aarch64_simd_clone_compute_vecsize_and_simdlen. And when
processing cgraph_simd_clone->mask_mode in common code, add special treatment
if cgraph_simd_clone->mask_mode != VOIDmode and cgraph_simd_clone->mask_mode is
VECTOR_MODE, which corresponds to the SVE cases (It’s OK to do so since
cgraph_simd_clone->mask_mode != VOIDmode is established only when the mask is
passed in integer argument(s) in current GCC).
f) In pass_expand, only when a “SVE type” attribute is added to the tree nodes
of the types of arguments and return type, these types use the SVE PCS. For
now, GCC only has a mechanism for adding attributes to SVE builtin type, so I
plan to define a new hook to add attribute to the types of arguments and return
type of simdclones generated if needed. The related processing functions are
planned to be moved to aarch64.c from aarch64-sve-builtins.cc in addition.
Part 4) Add the generating of VLS SVE functions for "omp declare simd". The
specification writes: “When using a simdlen(len) clause, the compiler expects a
VLS vector version of the function that is tuned for a specific implementation
of SVE. ”. Therefore I think only when the number of bits in a SVE vector
register of the target is specified and coincides with the simdlen clause, GCC
is supposed to generate the VLS SVE functions for "omp declare simd",
specifically:
a) In aarch64_simd_clone_compute_vecsize_and_simdlen, when processing VLS SVE
simdclones, if the number of bits in an SVE vector register is specified and
coincides with the simdlen clause, set “clonei->vecsize_mangle”,
“clonei->mask_mode”, and “clonei->always_masked” and calculate the “vec_bits”,
otherwise report a warning and return NULL.
b) In this case, the field "simdlen" is a constant, so using build_vector_type
to build the vector type will get an advanced SIMD version instead of a SVE
version, which seems to be wrong. I plan to add a new hook. The hook does some
special treatment to build a SVE version vector type when processing VLS SVE
simdclones, while call build_vector_type directly in other cas
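The mask indexing described in step e) of Part 3 above can be sketched in plain C. This is an illustration of the arithmetic only, not the gimple GCC generates: `mask_index` and `lane_is_active` are hypothetical helpers, and the mask is modelled as an array with one entry per byte of vector data, matching the fact that an SVE predicate carries one bit per vector byte.

```c
#include <assert.h>

/* Hypothetical helper: index into the mask array for loop iteration ITER.
   With 4-byte arguments and 1-byte mask units this is iter * 4, the same
   factor as in "_9 = iter.38_6 * 4" in the comment above.  */
unsigned
mask_index (unsigned iter, unsigned arg_unit_size, unsigned mask_unit_size)
{
  assert (mask_unit_size != 0 && arg_unit_size % mask_unit_size == 0);
  return iter * (arg_unit_size / mask_unit_size);
}

/* Hypothetical helper: nonzero if the lane handled by iteration ITER is
   active, assuming a mask array with one entry per vector byte.  */
int
lane_is_active (const unsigned char *mask, unsigned iter,
                unsigned arg_unit_size)
{
  return mask[mask_index (iter, arg_unit_size, 1)] != 0;
}
```

For 32-bit elements only every fourth mask entry is consulted, so lane 1 reads entry 4, lane 2 reads entry 8, and so on.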
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #2 from rsandifo at gcc dot gnu.org ---
That's great, thanks for the heads-up.
[Bug target/96342] [SVE] Add support for "omp declare simd"
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
yangyang changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yangyang305 at huawei dot com
--- Comment #1 from yangyang ---
I'm looking into this. Will update when I have something to discuss.
