[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

rsandifo at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from rsandifo at gcc dot gnu.org ---
Fixed for arm and aarch64.
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #12 from CVS Commits ---
The trunk branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:7313381d2ce44b72b4c9f70bd5670e5d78d1f631

commit r13-1730-g7313381d2ce44b72b4c9f70bd5670e5d78d1f631
Author: Richard Sandiford
Date:   Mon Jul 18 12:57:10 2022 +0100

    arm: Replace arm_builtin_vectorized_function [PR106253]

    This patch extends the fix for PR106253 to AArch32.  As with AArch64,
    we were using ACLE intrinsics to vectorise scalar built-ins, even
    though the two sometimes have different ECF_* flags.  (That in turn
    is because the ACLE intrinsics should follow the instruction semantics
    as closely as possible, whereas the scalar built-ins follow language
    specs.)

    The patch also removes the copysignf built-in, which only existed
    for this purpose and wasn't a "real" arm_neon.h built-in.

    Doing this also has the side-effect of enabling vectorisation of
    rint and roundeven.  Logically that should be a separate patch,
    but making it one would have meant adding a new int iterator
    for the original set of instructions and then removing it again
    when including new functions.

    I've restricted the bswap tests to little-endian because we end
    up with excessive spilling on big-endian.  E.g.:

            sub     sp, sp, #8
            vstr    d1, [sp]
            vldr    d16, [sp]
            vrev16.8        d16, d16
            vstr    d16, [sp]
            vldr    d0, [sp]
            add     sp, sp, #8
            @ sp needed
            bx      lr

    Similarly, the copysign tests require little-endian because on
    big-endian we unnecessarily load the constant from the constant pool:

            vldr.32 s15, .L3
            vdup.32 d0, d7[1]
            vbsl    d0, d2, d1
            bx      lr
    .L3:
            .word   -2147483648

    gcc/
            PR target/106253
            * config/arm/arm-builtins.cc (arm_builtin_vectorized_function):
            Delete.
            * config/arm/arm-protos.h (arm_builtin_vectorized_function):
            Delete.
            * config/arm/arm.cc (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION):
            Delete.
            * config/arm/arm_neon_builtins.def (copysignf): Delete.
            * config/arm/iterators.md (nvrint_pattern): New attribute.
            * config/arm/neon.md (2): New pattern.
            (l2): Likewise.
            (neon_copysignf): Rename to...
            (copysign3): ...this.

    gcc/testsuite/
            PR target/106253
            * gcc.target/arm/vect_unary_1.c: New test.
            * gcc.target/arm/vect_binary_1.c: Likewise.
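As a rough illustration of the kind of scalar loop the commit message above is talking about, here is a minimal sketch in which every element goes through the scalar `__builtin_bswap32` built-in. This is not taken from the new test files; the names `bswap32_array` and `bswap32_reference` are hypothetical, and whether the loop is actually vectorised depends on the target, endianness and options in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: byte-swap each element using the GCC scalar
   built-in.  With the vectoriser enabled, a loop of this shape is a
   candidate for being turned into vector byte-swap instructions.  */
void
bswap32_array (uint32_t *restrict dst, const uint32_t *restrict src, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = __builtin_bswap32 (src[i]);
}

/* Portable reference implementation, only used to cross-check the
   result of the built-in.  */
uint32_t
bswap32_reference (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
	 | ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

Comparing the built-in against the shift-and-mask reference is a cheap way to confirm that a vectorised build still computes the same values as the scalar code.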
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #11 from Richard Biener ---
(In reply to Tamar Christina from comment #10)
> For completeness, I reduced the Armhf failure and that seems to happen on
> bswap.
>
> #include
> #include
>
> void
> __sha256_process_block (uint32_t *buffer, size_t len, uint32_t *W)
> {
>   for (unsigned int t = 0; t < 16; ++t)
>     {
>       W[t] = __bswap_32 (*buffer);
>       ++buffer;
>     }
> }
>
> will ICE at -O3

So similar issue for __builtin_neon_bswapv4si_uu then which is thought to
clobber memory (it hopefully doesn't).
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #10 from Tamar Christina ---
For completeness, I reduced the Armhf failure and that seems to happen on
bswap.

#include
#include

void
__sha256_process_block (uint32_t *buffer, size_t len, uint32_t *W)
{
  for (unsigned int t = 0; t < 16; ++t)
    {
      W[t] = __bswap_32 (*buffer);
      ++buffer;
    }
}

will ICE at -O3
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

rsandifo at gcc dot gnu.org changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org ---
Fixed for aarch64.  I'll do the same thing for arm.
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #8 from CVS Commits ---
The trunk branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:00eab0c654e09c8a0f1b1a3b1c7bff8764e64991

commit r13-1647-g00eab0c654e09c8a0f1b1a3b1c7bff8764e64991
Author: Richard Sandiford
Date:   Tue Jul 12 14:09:44 2022 +0100

    Add internal functions for iround etc. [PR106253]

    The PR is about the aarch64 port using an ACLE built-in function
    to vectorise a scalar function call, even though the ECF_* flags for
    the ACLE function didn't match the ECF_* flags for the scalar call.

    To some extent that kind of difference is inevitable, since the
    ACLE intrinsics are supposed to follow the behaviour of the
    underlying instruction as closely as possible.  Also, using
    target-specific builtins has the drawback of limiting further
    gimple optimisation, since the gimple optimisers won't know what
    the function does.

    We handle several other maths functions, including round, floor
    and ceil, by defining directly-mapped internal functions that
    are linked to the associated built-in functions.  This has two
    main advantages:

    - it means that, internally, we are not restricted to the set of
      scalar types that happen to have associated C/C++ functions

    - the functions (and thus the underlying optabs) extend naturally
      to vectors

    This patch takes the same approach for the remaining functions
    handled by aarch64_builtin_vectorized_function.

    gcc/
            PR target/106253
            * predict.h (insn_optimization_type): Declare.
            * predict.cc (insn_optimization_type): New function.
            * internal-fn.def (IFN_ICEIL, IFN_IFLOOR, IFN_IRINT, IFN_IROUND)
            (IFN_LCEIL, IFN_LFLOOR, IFN_LRINT, IFN_LROUND, IFN_LLCEIL)
            (IFN_LLFLOOR, IFN_LLRINT, IFN_LLROUND): New internal functions.
            * internal-fn.cc (unary_convert_direct): New macro.
            (expand_convert_optab_fn): New function.
            (expand_unary_convert_optab_fn): New macro.
            (direct_unary_convert_optab_supported_p): Likewise.
            * optabs.cc (expand_sfix_optab): Pass insn_optimization_type to
            convert_optab_handler.
            * config/aarch64/aarch64-protos.h
            (aarch64_builtin_vectorized_function): Delete.
            * config/aarch64/aarch64-builtins.cc
            (aarch64_builtin_vectorized_function): Delete.
            * config/aarch64/aarch64.cc
            (TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Delete.
            * config/i386/i386.cc (ix86_optab_supported_p): Handle lround_optab.
            * config/i386/i386.md (lround2): Remove optimize_insn_for_size_p test.

    gcc/testsuite/
            PR target/106253
            * gcc.target/aarch64/vect_unary_1.c: Add tests for iroundf,
            llround, iceilf, llceil, ifloorf, llfloor, irintf and llrint.
            * gfortran.dg/vect/pr106253.f: New test.
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #7 from Richard Biener ---
Btw, I can see

FAIL: gcc.dg/vect/vect-rounding-lceil.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gcc.dg/vect/vect-rounding-lfloor.c (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)
FAIL: gfortran.dg/g77/20010430.f -O2 (internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032)

(NINT user) when testing aarch64-linux.  Those are all this aarch64 builtin
issue (and probably also reproduce on the arm backend side).
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

Tamar Christina changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tnfchris at gcc dot gnu.org
             Target|aarch64-linux-gnu           |aarch64-linux-gnu,
                   |                            |arm-none-linux-gnueabihf

--- Comment #6 from Tamar Christina ---
Same problem happens with Armhf when building libc.

during GIMPLE pass: vect
In file included from sha256.c:213:
./sha256-block.c: In function ‘__sha256_process_block’:
./sha256-block.c:6:1: internal compiler error: in vect_transform_loops, at tree-vectorizer.cc:1032
    6 | __sha256_process_block (const void *buffer, size_t len, struct sha256_ctx *ctx)
      | ^~
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #5 from Andrew Pinski ---
(In reply to rsand...@gcc.gnu.org from comment #4)
> I think for those we have to honour the prevailing
> flush-to-zero mode, which makes the
> functions at best pure rather than const.

GCC does not handle changing of the rounding mode or change of the
denormalized flush-to-zero mode even in general floating point code.  So the
question becomes: should we wait for the infrastructure to be added to fix
that case, or add another builtin?
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #4 from rsandifo at gcc dot gnu.org ---
I guess we'll need different patterns in that case.  These builtins are also
used to expand ACLE intrinsics, and I think for those we have to honour the
prevailing flush-to-zero mode, which makes the functions at best pure rather
than const.
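To make the "pure rather than const" distinction concrete, here is a small sketch using the GCC function attributes of the same names. It is only an analogy: `flush_to_zero_mode`, `twice_const` and `twice_pure` are hypothetical, and the real flush-to-zero setting lives in a floating-point control register rather than in a C variable.

```c
/* Stand-in for a floating-point control setting such as flush-to-zero.
   Assumption: modelled as a plain global purely for illustration.  */
int flush_to_zero_mode = 0;

/* "const": the result depends only on the argument, so the compiler
   may freely merge or reorder calls with equal arguments.  */
__attribute__ ((const)) int
twice_const (int x)
{
  return 2 * x;
}

/* "pure": the function may read global state (here the mode flag) but
   must not modify it, so calls cannot be merged across a change to
   that state.  */
__attribute__ ((pure)) int
twice_pure (int x)
{
  return flush_to_zero_mode ? 0 : 2 * x;
}
```

A function whose result depends on a mode setting can be described as pure at best, which is why a built-in that must honour such a mode carries weaker guarantees than a const scalar built-in.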
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

--- Comment #3 from Richard Biener ---
aarch64_builtin_vectorized_function is what returns these, but the decls seem
to be generated elsewhere.
[Bug target/106253] [13 Regression] ICE in vect_transform_loops, at tree-vectorizer.cc:1032
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106253

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
          Component|tree-optimization           |target
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from Richard Biener ---
# .MEM = VDEF <.MEM>
vect__19.49_45 = __builtin_aarch64_lroundv4sfv4si (vect__18.48_44);

aarch64 lround vectorized builtin does not have the same guarantees as the
builtin.def one.  It's good that those cases are now uncovered (I've already
fixed a similar case in the x86 backend).

Can a target maintainer please take this?