[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #12 from Matthias Kretz ---
(In reply to Jakub Jelinek from comment #11)
> [...] though for 8x conversions we
> are e.g. on x86 already outside of the realm of natively supported vectors
> (we don't really want MMX and for 1024 bit and wider generic vectors we
> don't always emit best code).

Creatively thinking, consider constants stored as (u)char arrays (for bandwidth optimization), converted to double or (u)llong when used. I'd want to use a half-SSE load + subsequent conversion to an AVX-512 vector (e.g. vpmovsxbq + vcvtqq2pd) or even a full SSE load + one shift and two conversions to AVX-512. Similar motivation for the reverse direction. (Though a lot less likely to be used in practice, I believe. Hmm, maybe AI applications can prove that expectation wrong.)

But we should track optimizations in their own issues.
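To make the (u)char-table idea from comment #12 concrete, here is a minimal sketch in C using the generic vector extensions. The helper name widen_to_double and the pointer-based interface are assumptions for illustration and do not come from the thread. Whether the compiler actually emits a vpmovsxbq + vcvtqq2pd sequence depends on the target flags and on the optimizer; the code only expresses the conversion.

```c
#include <string.h>

/* 8 signed chars (8 bytes) widened to 8 doubles (64 bytes): an 8x widening. */
typedef signed char i8x8  __attribute__((vector_size(8)));
typedef double      f64x8 __attribute__((vector_size(64)));

/* Hypothetical helper: read 8 compactly stored constants and store them
   as doubles.  Pointers are used instead of vector arguments so that the
   64-byte vector never crosses a function-call boundary. */
void widen_to_double(const signed char table[8], double out[8])
{
    i8x8  packed;
    f64x8 wide;

    memcpy(&packed, table, sizeof packed);          /* unaligned 8-byte load */
    wide = __builtin_convertvector(packed, f64x8);  /* element-wise char -> double */
    memcpy(out, &wide, sizeof wide);                /* store the 8 doubles */
}
```

Calling widen_to_double(table, out) with an 8-element signed char table fills out with the same values as doubles.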
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Jakub Jelinek ---
Implemented on the trunk now. The 4x/8x narrowing/widening conversions will need further work to handle them efficiently, though for 8x conversions we are e.g. on x86 already outside of the realm of natively supported vectors (we don't really want MMX and for 1024 bit and wider generic vectors we don't always emit best code).
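For readers unfamiliar with the "2x" and "4x" terminology, the following sketch assumes it refers to the ratio between source and destination element widths (the thread does not spell this out). The function names and the array-based interface are hypothetical. Each element is converted as a C cast would convert it, so unsigned narrowing keeps only the low bits.

```c
#include <string.h>

typedef unsigned int   u32x4 __attribute__((vector_size(16)));
typedef unsigned short u16x4 __attribute__((vector_size(8)));
typedef unsigned char  u8x4  __attribute__((vector_size(4)));

/* 2x narrowing: 32-bit elements -> 16-bit elements (hypothetical helper). */
void narrow_2x(const unsigned int in[4], unsigned short out[4])
{
    u32x4 v;
    u16x4 r;

    memcpy(&v, in, sizeof v);
    r = __builtin_convertvector(v, u16x4);  /* keeps the low 16 bits of each element */
    memcpy(out, &r, sizeof r);
}

/* 4x narrowing: 32-bit elements -> 8-bit elements (hypothetical helper). */
void narrow_4x(const unsigned int in[4], unsigned char out[4])
{
    u32x4 v;
    u8x4  r;

    memcpy(&v, in, sizeof v);
    r = __builtin_convertvector(v, u8x4);   /* keeps the low 8 bits of each element */
    memcpy(out, &r, sizeof r);
}
```

The element count (4) stays fixed in both cases; only the total vector size shrinks, which is why the wider ratios quickly leave the set of natively supported vector sizes.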
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #10 from Jakub Jelinek ---
Author: jakub
Date: Mon Jan 7 08:49:08 2019
New Revision: 267632

URL: https://gcc.gnu.org/viewcvs?rev=267632&root=gcc&view=rev
Log:
        PR c++/85052
        * tree-vect-generic.c: Include insn-config.h and recog.h.
        (expand_vector_piecewise): Add defaulted ret_type argument,
        if non-NULL, use that in preference to type for the result type.
        (expand_vector_parallel): Formatting fix.
        (do_vec_conversion, do_vec_narrowing_conversion,
        expand_vector_conversion): New functions.
        (expand_vector_operations_1): Call expand_vector_conversion
        for VEC_CONVERT ifn calls.
        * internal-fn.def (VEC_CONVERT): New internal function.
        * internal-fn.c (expand_VEC_CONVERT): New function.
        * fold-const-call.c (fold_const_vec_convert): New function.
        (fold_const_call): Use it for CFN_VEC_CONVERT.
        * doc/extend.texi (__builtin_convertvector): Document.
c-family/
        * c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
        (c_build_vec_convert): Declare.
        * c-common.c (c_build_vec_convert): New function.
c/
        * c-parser.c (c_parser_postfix_expression): Parse
        __builtin_convertvector.
cp/
        * cp-tree.h (cp_build_vec_convert): Declare.
        * parser.c (cp_parser_postfix_expression): Parse
        __builtin_convertvector.
        * constexpr.c: Include fold-const-call.h.
        (cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
        (potential_constant_expression_1): Likewise.
        * semantics.c (cp_build_vec_convert): New function.
        * pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
        IFN_VEC_CONVERT.
testsuite/
        * c-c++-common/builtin-convertvector-1.c: New test.
        * c-c++-common/torture/builtin-convertvector-1.c: New test.
        * g++.dg/ext/builtin-convertvector-1.C: New test.
        * g++.dg/cpp0x/constexpr-builtin4.C: New test.

Added:
    trunk/gcc/testsuite/c-c++-common/builtin-convertvector-1.c
    trunk/gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c
    trunk/gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C
    trunk/gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/c-family/ChangeLog
    trunk/gcc/c-family/c-common.c
    trunk/gcc/c-family/c-common.h
    trunk/gcc/c/ChangeLog
    trunk/gcc/c/c-parser.c
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/constexpr.c
    trunk/gcc/cp/cp-tree.h
    trunk/gcc/cp/parser.c
    trunk/gcc/cp/pt.c
    trunk/gcc/cp/semantics.c
    trunk/gcc/doc/extend.texi
    trunk/gcc/fold-const-call.c
    trunk/gcc/internal-fn.c
    trunk/gcc/internal-fn.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-generic.c
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #9 from Matthias Kretz ---
(In reply to Devin Hussey from comment #7)
> Wait, silly me, this isn't about optimizations, this is about patterns.

Regarding optimizations, PR85048 is a first step (it lists all x86 single-instruction SIMD conversions). I also linked my library implementation in #5, which provides optimizations for all cases on x86.
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #8 from Jakub Jelinek ---
Note, I've posted in the meantime a newer version of the patch that should handle the 2x narrowing or 2x widening cases better, see
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00129.html
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #7 from Devin Hussey ---
Wait, silly me, this isn't about optimizations, this is about patterns.

It does the same thing it was doing for this code:

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return (u64x2) { (unsigned long long)in[0], (unsigned long long)in[1] };
}
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #6 from Devin Hussey ---
The patch seems to be working.

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return __builtin_convertvector(in, u64x2);
}

It doesn't generate the best code, but it isn't bad.

x86_64, SSE4.1:

cvt:
        movq    %xmm0, %rax
        movd    %eax, %xmm0
        shrq    $32, %rax
        pinsrq  $1, %rax, %xmm0
        ret

x86_64, SSE2:

cvt:
        movq    %xmm0, %rax
        movd    %eax, %xmm0
        shrq    $32, %rax
        movq    %rax, %xmm1
        punpcklqdq      %xmm1, %xmm0
        ret

ARMv7a NEON:

cvt:
        sub     sp, sp, #16
        mov     r3, #0
        str     r3, [sp, #4]
        str     r3, [sp, #12]
        add     r3, sp, #8
        vst1.32 {d0[0]}, [sp]
        vst1.32 {d0[1]}, [r3]
        vld1.64 {d0-d1}, [sp:64]
        add     sp, sp, #16
        bx      lr

I haven't built the others yet.

The correct code would be this ([signed|unsigned]):

cvt:
        vmovl.[s|u]32   q0, d0
        bx      lr

I am testing other targets now.

For reference, this is what clang generates for other targets:

aarch64:

cvt:
        [s|u]shll       v0.2d, v0.2s, #0
        ret

sse4.1/avx:

cvt:
        [v]pmov[s|z]xdq xmm0, xmm0
        ret

sse2:

signed_cvt:
        pxor    xmm1, xmm1
        pcmpgtd xmm1, xmm0
        punpckldq       xmm0, xmm1      # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
        ret

unsigned_cvt:
        xorps   xmm1, xmm1
        unpcklps        xmm0, xmm1      # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
        ret
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #5 from Matthias Kretz ---
Thank you Jakub!

Here's a tested x86 library implementation for all conversions and different ISA extension support for reference:
https://github.com/mattkretz/gcc/blob/mkretz/simd/libstdc%2B%2B-v3/include/experimental/bits/simd_x86_conversions.h

(I have not looked at the patch yet to see whether I understand enough of the implementation to optimize conversions myself.)
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek ---
Created attachment 45319
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45319&action=edit
gcc9-pr85052.patch

Untested implementation. Some further work is needed to improve code generation for the narrowing or widening conversions.
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

Jan Hubicka changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincenzo.innocente at cern dot ch

--- Comment #3 from Jan Hubicka ---
*** Bug 61731 has been marked as a duplicate of this bug. ***
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #2 from Marc Glisse ---
Dup of PR61731.
[Bug c++/85052] Implement support for clang's __builtin_convertvector
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-03-23
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Biener ---
Confirmed.

__builtin_convertvector is used to express generic vector type-conversion operations. The input vector and the output vector type must have the same number of elements.

Syntax:
__builtin_convertvector(src_vec, dst_vec_type)
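As a minimal sketch of the syntax above, the following C fragment converts between an int vector and a float vector with the same number of elements. The type and function names (i32x4, f32x4, to_float, to_int) are chosen here for illustration. It assumes a compiler that provides the builtin (clang, or GCC with this PR implemented) and that each element is converted as a C-style cast would convert it.

```c
/* Both vector types have 4 elements, as the builtin requires. */
typedef int   i32x4 __attribute__((vector_size(16)));
typedef float f32x4 __attribute__((vector_size(16)));

/* Element-wise int -> float conversion. */
f32x4 to_float(i32x4 v)
{
    return __builtin_convertvector(v, f32x4);
}

/* Element-wise float -> int conversion; like a C cast, the fractional
   part is discarded (truncation toward zero). */
i32x4 to_int(f32x4 v)
{
    return __builtin_convertvector(v, i32x4);
}
```

A plain vector cast such as (f32x4)v would instead reinterpret the bits, which is why a dedicated builtin is useful for value conversions.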