https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85480
Bug ID: 85480
Summary: zero extension from xmm to zmm via _mm512_insert???x? not optimized
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---

Test case (cf. https://godbolt.org/g/p4Kt8X):

#include <x86intrin.h>

__m512 zero_extend2(__m128 a) {
  return _mm512_insertf32x4(__m512(), a, 0);
}

__m512d zero_extend2(__m128d a) {
  return _mm512_insertf64x2(__m512d(), a, 0);
}

__m512i zero_extend2(__m128i a) {
  return _mm512_inserti32x4(__m512i(), a, 0);
}

These 3 functions should compile to `vmovaps %xmm0, %xmm0`, `vmovapd %xmm0, %xmm0`, and `vmovdqa %xmm0, %xmm0`, respectively.

GCC detects the optimization for the xmm->ymm and ymm->zmm cases already. It's only missing for the xmm->zmm case.
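
For comparison, the two cases said to be handled already could be written along the following lines. This is only a sketch under assumptions: the report does not say which intrinsics were used for the xmm->ymm and ymm->zmm checks, so `_mm256_insertf128_ps` and `_mm512_insertf64x4` are a guess at representative ones, and the function names are made up for illustration. The `target` attributes are also an addition, there only so the snippet builds without `-mavx` / `-mavx512f` on the command line.

```cpp
#include <x86intrin.h>

// Hypothetical helper (not from the report): xmm -> ymm zero extension,
// inserting the 128-bit input into lane 0 of a zero-initialized 256-bit vector.
__attribute__((target("avx")))
__m256 zero_extend_xmm_to_ymm(__m128 a) {
  return _mm256_insertf128_ps(__m256(), a, 0);
}

// Hypothetical helper (not from the report): ymm -> zmm zero extension,
// inserting the 256-bit input into the low half of a zero-initialized
// 512-bit vector.
__attribute__((target("avx512f")))
__m512d zero_extend_ymm_to_zmm(__m256d a) {
  return _mm512_insertf64x4(__m512d(), a, 0);
}
```

By analogy with the expected output above, each of these would ideally reduce to a single register-to-register move of the narrower register, since a VEX/EVEX-encoded move zeroes the upper bits of the destination.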