[Bug c++/85052] Implement support for clang's __builtin_convertvector

2019-01-07 Thread kretz at kde dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052

--- Comment #12 from Matthias Kretz  ---
(In reply to Jakub Jelinek from comment #11)
> [...] though for 8x conversions we
> are e.g. on x86 already outside of the realm of natively supported vectors
> (we don't really want MMX and for 1024 bit and wider generic vectors we
> don't always emit best code).

Thinking creatively, consider constants stored as (u)char arrays (to save
bandwidth) and converted to double or (u)llong when used. I'd want to use a
half-SSE load followed by a conversion to an AVX-512 vector (e.g. vpmovsxbq +
vcvtqq2pd), or even a full SSE load plus one shift and two conversions to
AVX-512.

Similar motivation for the reverse direction. (Though a lot less likely to be
used in practice, I believe. Hmm, maybe AI applications can prove that
expectation wrong.)
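
A minimal sketch of that use case, for illustration only: the type and
function names below are made up, and it assumes a compiler that provides
__builtin_convertvector (GCC 9 or later, or clang). Vectors are passed through
memory so the example does not depend on a particular vector ABI. Whether the
compiler emits the instruction sequences mentioned above depends on target and
version.

```c
#include <string.h>

/* Hypothetical names: 8 bytes of packed constants, 8 doubles when widened. */
typedef unsigned char u8x8 __attribute__((vector_size(8)));
typedef double f64x8 __attribute__((vector_size(64)));

/* 8x widening: each unsigned char element becomes a double. */
void widen_u8_to_f64(const unsigned char src[8], double dst[8])
{
    u8x8 packed;
    memcpy(&packed, src, sizeof packed);
    f64x8 wide = __builtin_convertvector(packed, f64x8);
    memcpy(dst, &wide, sizeof wide);
}

/* Reverse direction, 8x narrowing: each double becomes an unsigned char. */
void narrow_f64_to_u8(const double src[8], unsigned char dst[8])
{
    f64x8 wide;
    memcpy(&wide, src, sizeof wide);
    u8x8 packed = __builtin_convertvector(wide, u8x8);
    memcpy(dst, &packed, sizeof packed);
}
```

Values in the range 0..255 survive the round trip unchanged, since every
unsigned char is exactly representable as a double.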

But we should track optimizations in their own issues.

2019-01-07 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #11 from Jakub Jelinek  ---
Implemented on the trunk now.  The 4x/8x narrowing/widening conversions will
need further work to handle them efficiently, though for 8x conversions we are
e.g. on x86 already outside of the realm of natively supported vectors (we
don't really want MMX and for 1024 bit and wider generic vectors we don't
always emit best code).

2019-01-07 Thread jakub at gcc dot gnu.org

--- Comment #10 from Jakub Jelinek  ---
Author: jakub
Date: Mon Jan  7 08:49:08 2019
New Revision: 267632

URL: https://gcc.gnu.org/viewcvs?rev=267632&root=gcc&view=rev
Log:
PR c++/85052
* tree-vect-generic.c: Include insn-config.h and recog.h.
(expand_vector_piecewise): Add defaulted ret_type argument,
if non-NULL, use that in preference to type for the result type.
(expand_vector_parallel): Formatting fix.
(do_vec_conversion, do_vec_narrowing_conversion,
expand_vector_conversion): New functions.
(expand_vector_operations_1): Call expand_vector_conversion
for VEC_CONVERT ifn calls.
* internal-fn.def (VEC_CONVERT): New internal function.
* internal-fn.c (expand_VEC_CONVERT): New function.
* fold-const-call.c (fold_const_vec_convert): New function.
(fold_const_call): Use it for CFN_VEC_CONVERT.
* doc/extend.texi (__builtin_convertvector): Document.
c-family/
* c-common.h (enum rid): Add RID_BUILTIN_CONVERTVECTOR.
(c_build_vec_convert): Declare.
* c-common.c (c_build_vec_convert): New function.
c/
* c-parser.c (c_parser_postfix_expression): Parse
__builtin_convertvector.
cp/
* cp-tree.h (cp_build_vec_convert): Declare.
* parser.c (cp_parser_postfix_expression): Parse
__builtin_convertvector.
* constexpr.c: Include fold-const-call.h.
(cxx_eval_internal_function): Handle IFN_VEC_CONVERT.
(potential_constant_expression_1): Likewise.
* semantics.c (cp_build_vec_convert): New function.
* pt.c (tsubst_copy_and_build): Handle CALL_EXPR to
IFN_VEC_CONVERT.
testsuite/
* c-c++-common/builtin-convertvector-1.c: New test.
* c-c++-common/torture/builtin-convertvector-1.c: New test.
* g++.dg/ext/builtin-convertvector-1.C: New test.
* g++.dg/cpp0x/constexpr-builtin4.C: New test.

Added:
trunk/gcc/testsuite/c-c++-common/builtin-convertvector-1.c
trunk/gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c
trunk/gcc/testsuite/g++.dg/cpp0x/constexpr-builtin4.C
trunk/gcc/testsuite/g++.dg/ext/builtin-convertvector-1.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-common.c
trunk/gcc/c-family/c-common.h
trunk/gcc/c/ChangeLog
trunk/gcc/c/c-parser.c
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c
trunk/gcc/cp/cp-tree.h
trunk/gcc/cp/parser.c
trunk/gcc/cp/pt.c
trunk/gcc/cp/semantics.c
trunk/gcc/doc/extend.texi
trunk/gcc/fold-const-call.c
trunk/gcc/internal-fn.c
trunk/gcc/internal-fn.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-generic.c

2019-01-05 Thread kretz at kde dot org

--- Comment #9 from Matthias Kretz  ---
(In reply to Devin Hussey from comment #7)
> Wait, silly me, this isn't about optimizations, this is about patterns.

Regarding optimizations, PR85048 is a first step (it lists all x86
single-instruction SIMD conversions). I also linked my library implementation
in #5, which provides optimizations for all cases on x86.

2019-01-05 Thread jakub at gcc dot gnu.org

--- Comment #8 from Jakub Jelinek  ---
Note that in the meantime I've posted a newer version of the patch that should
handle the 2x narrowing and 2x widening cases better; see
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00129.html

2019-01-05 Thread husseydevin at gmail dot com

--- Comment #7 from Devin Hussey  ---
Wait, silly me, this isn't about optimizations, this is about patterns.

It does the same thing it was doing for this code:

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return (u64x2) { (unsigned long long)in[0], (unsigned long long)in[1] };
}

2019-01-05 Thread husseydevin at gmail dot com

--- Comment #6 from Devin Hussey  ---
The patch seems to be working.

typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

u64x2 cvt(u32x2 in)
{
    return __builtin_convertvector(in, u64x2);
}

It doesn't generate the best code, but it isn't bad.

x86_64, SSE4.1:

cvt:
movq    %xmm0, %rax
movd    %eax, %xmm0
shrq    $32, %rax
pinsrq  $1, %rax, %xmm0
ret

x86_64, SSE2:

cvt:
movq    %xmm0, %rax
movd    %eax, %xmm0
shrq    $32, %rax
movq    %rax, %xmm1
punpcklqdq  %xmm1, %xmm0
ret

ARMv7a NEON:

cvt:
sub sp, sp, #16
mov r3, #0
str r3, [sp, #4]
str r3, [sp, #12]
add r3, sp, #8
vst1.32 {d0[0]}, [sp]
vst1.32 {d0[1]}, [r3]
vld1.64 {d0-d1}, [sp:64]
add sp, sp, #16
bx  lr

I haven't built the others yet.

The correct code would be this ([signed|unsigned]):

cvt:
vmovl.[s|u]32 q0, d0
bx lr

I am testing other targets now.

For reference, this is what clang generates for other targets:

aarch64:

cvt:
[s|u]shll   v0.2d, v0.2s, #0
ret

sse4.1/avx:

cvt:
[v]pmov[s|z]xdq xmm0, xmm0
ret

sse2:

signed_cvt:
pxor    xmm1, xmm1
pcmpgtd xmm1, xmm0
punpckldq   xmm0, xmm1  # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
ret

unsigned_cvt:
xorps   xmm1, xmm1
unpcklps    xmm0, xmm1  # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
ret
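
To illustrate the [s|z] distinction in the listings above at the source level,
here is a small sketch (hypothetical names, assuming GCC 9+ or clang). The
builtin converts each element as a C cast would, so signed elements are
sign-extended and unsigned elements are zero-extended when widened.

```c
#include <string.h>

/* Hypothetical type names for the 2-element vectors used in this thread. */
typedef int s32x2 __attribute__((vector_size(8)));
typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef long long s64x2 __attribute__((vector_size(16)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));

/* Signed 2x widening: elements are sign-extended. */
void widen_signed(const int src[2], long long dst[2])
{
    s32x2 in;
    memcpy(&in, src, sizeof in);
    s64x2 out = __builtin_convertvector(in, s64x2);
    memcpy(dst, &out, sizeof out);
}

/* Unsigned 2x widening: elements are zero-extended. */
void widen_unsigned(const unsigned src[2], unsigned long long dst[2])
{
    u32x2 in;
    memcpy(&in, src, sizeof in);
    u64x2 out = __builtin_convertvector(in, u64x2);
    memcpy(dst, &out, sizeof out);
}
```

The same 32-bit pattern 0xFFFFFFFF therefore widens to -1 in the signed case
and to 4294967295 in the unsigned case.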

2019-01-02 Thread kretz at kde dot org

--- Comment #5 from Matthias Kretz  ---
Thank you, Jakub! For reference, here's a tested x86 library implementation of
all conversions, covering the different ISA extensions:

https://github.com/mattkretz/gcc/blob/mkretz/simd/libstdc%2B%2B-v3/include/experimental/bits/simd_x86_conversions.h

(I have not looked at the patch yet to see whether I understand enough of the
implementation to optimize conversions myself.)

2019-01-02 Thread jakub at gcc dot gnu.org

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
Created attachment 45319
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45319&action=edit
gcc9-pr85052.patch

Untested implementation.  Some further work is needed to improve code
generation for the narrowing or widening conversions.

2018-12-26 Thread hubicka at gcc dot gnu.org

Jan Hubicka  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincenzo.innocente at cern dot ch

--- Comment #3 from Jan Hubicka  ---
*** Bug 61731 has been marked as a duplicate of this bug. ***

2018-03-30 Thread glisse at gcc dot gnu.org

--- Comment #2 from Marc Glisse  ---
Dup of PR61731.

2018-03-23 Thread rguenth at gcc dot gnu.org

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-03-23
     Ever confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Biener  ---
Confirmed.

__builtin_convertvector is used to express generic vector type-conversion
operations. The input vector and the output vector type must have the same
number of elements.

Syntax:
  __builtin_convertvector(src_vec, dst_vec_type)
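
A short usage sketch of the syntax above (type and function names are
hypothetical; assumes GCC 9+ or clang). Source and destination both have four
elements, as required. Each element is converted as a C cast would convert it,
so float to int truncates toward zero.

```c
#include <string.h>

/* Hypothetical 4-element vector types with the same element count. */
typedef int i32x4 __attribute__((vector_size(16)));
typedef float f32x4 __attribute__((vector_size(16)));

/* int -> float, element by element. */
void int_to_float(const int src[4], float dst[4])
{
    i32x4 in;
    memcpy(&in, src, sizeof in);
    f32x4 out = __builtin_convertvector(in, f32x4);
    memcpy(dst, &out, sizeof out);
}

/* float -> int, element by element (truncating, like a C cast). */
void float_to_int(const float src[4], int dst[4])
{
    f32x4 in;
    memcpy(&in, src, sizeof in);
    i32x4 out = __builtin_convertvector(in, i32x4);
    memcpy(dst, &out, sizeof out);
}
```

Passing a destination type with a different element count, for example a
2-element vector here, is rejected at compile time.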