[Bug target/85324] New: missing constant propagation on SSE/AVX conversion intrinsics

kretz at kde dot org Tue, 10 Apr 2018 07:48:55 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324


            Bug ID: 85324
           Summary: missing constant propagation on SSE/AVX conversion
                    intrinsics
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

The following test case shows that GCC does not propagate constants through
the SSE/AVX conversion intrinsics:

#include <x86intrin.h>

template <class T> using V [[gnu::vector_size(16)]] = T;

// missed optimization:
auto a1() { return 1 + (V<  int>)_mm_cvttps_epi32(_mm_set1_ps(1.f)); }
auto b1() { return 1 + (V< long>)_mm_cvttps_epi64(_mm_set1_ps(1.f)); }
auto c1() { return 1 + (V<  int>)_mm_cvttpd_epi32(_mm_set1_pd(1.)); }
auto d1() { return 1 + (V< long>)_mm_cvttpd_epi64(_mm_set1_pd(1.)); }
auto e1() { return 1 + (V<short>)_mm_cvtepi32_epi16(_mm_set1_epi32(1)); }
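
For reference, here is a hand-folded sketch of the constant each test function evaluates to. The `*_folded` names and the typedefs are illustrative and do not come from the report. The lane values follow from the documented behaviour of the intrinsics: the narrowing conversions (`_mm_cvttpd_epi32`, `_mm_cvtepi32_epi16`) write their results to the low half of the register and zero the upper half, so those lanes end up as 0 + 1.

```cpp
// 16-byte GCC vector types equivalent to the report's V<int>, V<long> and
// V<short>; long long is used so the 64-bit lanes do not depend on LP64.
typedef int       v4si __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));
typedef short     v8hi __attribute__((vector_size(16)));

// a1: cvttps2dq maps {1.f, 1.f, 1.f, 1.f} to {1, 1, 1, 1}; +1 gives 2 per lane.
v4si a1_folded() { v4si r = {2, 2, 2, 2}; return r; }

// b1: cvttps2qq converts only the low two floats into two 64-bit lanes.
v2di b1_folded() { v2di r = {2, 2}; return r; }

// c1: cvttpd2dq writes two int32 results to the low lanes and zeroes the
// upper two, so after +1 the upper lanes hold 1, not 2.
v4si c1_folded() { v4si r = {2, 2, 1, 1}; return r; }

// d1: cvttpd2qq converts both doubles into two 64-bit lanes.
v2di d1_folded() { v2di r = {2, 2}; return r; }

// e1: vpmovdw narrows four int32 lanes to four int16 lanes in the low half
// and zeroes the upper half, so lanes 4..7 become 0 + 1 = 1.
v8hi e1_folded() { v8hi r = {2, 2, 2, 2, 1, 1, 1, 1}; return r; }
```

Each of these compiles to a single constant load, which is the code generation the report expects for a1() through e1().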

The resulting assembly (with `-O3 -march=skylake-avx512 -std=c++17`) is:
a1():
  vcvttps2dq .LC0(%rip), %xmm0
  vpaddd %xmm0, %xmm0, %xmm0
  ret
b1():
  vcvttps2qq .LC0(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
c1():
  vmovdqa64 .LC1(%rip), %xmm0
  vcvttpd2dqx .LC5(%rip), %xmm1
  vpaddd %xmm0, %xmm1, %xmm0
  ret
d1():
  vcvttpd2qq .LC5(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
e1():
  vmovdqa64 .LC7(%rip), %xmm1
  vmovdqa64 .LC1(%rip), %xmm0
  vpmovdw %xmm0, %xmm0
  vpaddw %xmm1, %xmm0, %xmm0
  ret

Each function should compile to a single load of a constant vector. A wrapper
using __builtin_constant_p can work around the problem; cf.
https://godbolt.org/g/8dta7B.
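
The reporter's wrapper is only linked, not reproduced. As an illustration of the general idea, here is a minimal sketch for `_mm_cvttps_epi32` alone, assuming GCC's `__builtin_constant_p` and vector extensions; the helper names `cvttps_epi32_cp`, `fits_i32` and `check_against_intrinsic` are hypothetical. When every lane is known at compile time and in range, it converts with ordinary C++ casts, which the optimizer is free to fold; otherwise it falls back to the intrinsic.

```cpp
#include <x86intrin.h>
#include <cassert>

typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

// True if truncating f toward zero yields a value representable as int32.
// NaN compares false against both bounds, so it is rejected as well.
static inline bool fits_i32(float f) {
  return f >= -2147483648.0f && f < 2147483648.0f;
}

// Hypothetical wrapper around _mm_cvttps_epi32: if all four inputs are
// compile-time constants and in range, convert them with plain casts that
// the compiler can fold; otherwise defer to the intrinsic itself.
static inline __m128i cvttps_epi32_cp(__m128 x) {
  v4sf f = (v4sf)x;
  if (__builtin_constant_p(f[0]) && __builtin_constant_p(f[1]) &&
      __builtin_constant_p(f[2]) && __builtin_constant_p(f[3]) &&
      fits_i32(f[0]) && fits_i32(f[1]) && fits_i32(f[2]) && fits_i32(f[3])) {
    v4si r = {(int)f[0], (int)f[1], (int)f[2], (int)f[3]};
    return (__m128i)r;
  }
  return _mm_cvttps_epi32(x);
}

// Usage check: the wrapper must agree lane by lane with the plain intrinsic.
static void check_against_intrinsic(__m128 x) {
  __m128i eq = _mm_cmpeq_epi32(cvttps_epi32_cp(x), _mm_cvttps_epi32(x));
  assert(_mm_movemask_epi8(eq) == 0xFFFF);
}
```

The range check matters: converting an out-of-range float to int is undefined behaviour in C++, whereas cvttps2dq returns 0x80000000 for such lanes, so those inputs are left to the intrinsic.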

