https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85480

            Bug ID: 85480
           Summary: zero extension from xmm to zmm via _mm512_insert???x?
                    not optimized
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---

Test case (cf. https://godbolt.org/g/p4Kt8X):

#include <x86intrin.h>
__m512 zero_extend2(__m128 a) {
    return _mm512_insertf32x4(__m512(), a, 0);
}
__m512d zero_extend2(__m128d a) {
    return _mm512_insertf64x2(__m512d(), a, 0);
}
__m512i zero_extend2(__m128i a) {
    return _mm512_inserti32x4(__m512i(), a, 0);
}

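Each of these three functions should compile to a single register move:
`vmovaps %xmm0, %xmm0`, `vmovapd %xmm0, %xmm0`, and `vmovdqa %xmm0, %xmm0`,
respectively. A VEX-encoded write to an xmm register already zeroes the
remaining upper bits of the corresponding zmm register, so no explicit insert
is needed.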

GCC detects the optimization for the xmm->ymm and ymm->zmm cases already. It's
only missing for the xmm->zmm case.
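
For comparison, the two cases described as already optimized might look like
the sketch below. The function names are hypothetical, and the choice of
`_mm256_insertf128_pd` and `_mm512_insertf64x4` is an assumption about which
intrinsics are meant by "xmm->ymm" and "ymm->zmm". The `target` attributes are
likewise an addition so that the snippet builds without `-mavx512f`; they are
not part of the original test case. The helper at the end only checks the
semantics (upper lanes are zero) and says nothing about the generated code.

```cpp
#include <immintrin.h>

// xmm -> ymm: insert a into the low 128 bits of an all-zero 256-bit vector.
__attribute__((target("avx")))
__m256d zero_extend_xmm_to_ymm(__m128d a) {
    return _mm256_insertf128_pd(__m256d(), a, 0);
}

// ymm -> zmm: insert a into the low 256 bits of an all-zero 512-bit vector.
__attribute__((target("avx512f")))
__m512d zero_extend_ymm_to_zmm(__m256d a) {
    return _mm512_insertf64x4(__m512d(), a, 0);
}

// Hypothetical sanity check: chain both steps and verify that lanes 2..7
// are zero. Must only be called on a CPU that supports AVX-512F.
__attribute__((target("avx512f")))
bool upper_lanes_are_zero() {
    __m128d x = _mm_set_pd(2.0, 1.0);  // lane 0 = 1.0, lane 1 = 2.0
    __m512d z = zero_extend_ymm_to_zmm(zero_extend_xmm_to_ymm(x));
    double out[8];
    _mm512_storeu_pd(out, z);
    bool ok = (out[0] == 1.0) && (out[1] == 2.0);
    for (int i = 2; i < 8; ++i) {
        ok = ok && (out[i] == 0.0);
    }
    return ok;
}
```

Inspecting the assembly for the first two functions (for example with
`g++ -O2 -S`) is the way to compare their output with that of the xmm->zmm
test case above.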
