https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85324
Bug ID: 85324
Summary: missing constant propagation on SSE/AVX conversion intrinsics
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---

The following test case shows that constant propagation through conversion
intrinsics does not work:

#include <x86intrin.h>

template <class T> using V [[gnu::vector_size(16)]] = T;

// missed optimization:
auto a1() { return 1 + (V<  int>)_mm_cvttps_epi32(_mm_set1_ps(1.f)); }
auto b1() { return 1 + (V< long>)_mm_cvttps_epi64(_mm_set1_ps(1.f)); }
auto c1() { return 1 + (V<  int>)_mm_cvttpd_epi32(_mm_set1_pd(1.)); }
auto d1() { return 1 + (V< long>)_mm_cvttpd_epi64(_mm_set1_pd(1.)); }
auto e1() { return 1 + (V<short>)_mm_cvtepi32_epi16(_mm_set1_epi32(1)); }

The resulting asm is (`-O3 -march=skylake-avx512 -std=c++17`):

a1():
  vcvttps2dq .LC0(%rip), %xmm0
  vpaddd %xmm0, %xmm0, %xmm0
  ret
b1():
  vcvttps2qq .LC0(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
c1():
  vmovdqa64 .LC1(%rip), %xmm0
  vcvttpd2dqx .LC5(%rip), %xmm1
  vpaddd %xmm0, %xmm1, %xmm0
  ret
d1():
  vcvttpd2qq .LC5(%rip), %xmm0
  vpaddq %xmm0, %xmm0, %xmm0
  ret
e1():
  vmovdqa64 .LC7(%rip), %xmm1
  vmovdqa64 .LC1(%rip), %xmm0
  vpmovdw %xmm0, %xmm0
  vpaddw %xmm1, %xmm0, %xmm0
  ret

It should be a single load of a constant in each function.

(A wrapper using __builtin_constant_p can work around it; cf.
https://godbolt.org/g/8dta7B)
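
For illustration, here is a minimal sketch of what such a __builtin_constant_p wrapper might look like for the a1() case. It is not the code behind the godbolt link. The names cvtt_ps_epi32, a1_workaround, int32x4 and floatx4 are hypothetical, and only SSE2 is assumed. When every lane is known at compile time, the conversion is written as plain scalar casts, which the optimizer can fold. Otherwise the intrinsic is used as before.

```cpp
#include <emmintrin.h>  // SSE2: _mm_cvttps_epi32, _mm_set1_ps

// Hypothetical helper types (not from the report).
typedef int   int32x4 __attribute__((vector_size(16)));
typedef float floatx4 __attribute__((vector_size(16)));

// Hypothetical wrapper: truncating float -> int32 conversion.
inline int32x4 cvtt_ps_epi32(floatx4 x) {
    // After inlining at -O1 and above, GCC may prove all lanes constant.
    // Without optimization __builtin_constant_p yields 0 and the
    // intrinsic path below is taken.
    if (__builtin_constant_p(x[0]) && __builtin_constant_p(x[1]) &&
        __builtin_constant_p(x[2]) && __builtin_constant_p(x[3])) {
        // Assumption: all lanes are in the range of int. Out-of-range
        // values are undefined behavior for a scalar cast, whereas
        // cvttps2dq returns 0x80000000 for them.
        return int32x4{static_cast<int>(x[0]), static_cast<int>(x[1]),
                       static_cast<int>(x[2]), static_cast<int>(x[3])};
    }
    return (int32x4)_mm_cvttps_epi32((__m128)x);
}

// Counterpart of a1() from the test case, routed through the wrapper.
inline int32x4 a1_workaround() {
    return 1 + cvtt_ps_epi32((floatx4)_mm_set1_ps(1.f));
}
```

Both branches give the same lane values for in-range inputs, so the wrapper changes only what the optimizer can see. Whether the constant branch is actually selected depends on inlining and on the compiler version.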