https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432
            Bug ID: 107432
           Summary: __builtin_convertvector generates inefficient code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Example: conversion int64_t -> int32_t

avx512f + avx512vl:
HW conversions are available.

avx2:
There is a correctly working 32-bit permutation
(_mm256_permutevar8x32_epi32/vpermd) that can be used.

I have not (yet) evaluated whether other conversions (larger int -> smaller
int) are also affected.

PS: On x86 it's already hell to optimize all cases depending on the
instruction set.
PPS: What about -march=znver4?

https://godbolt.org/z/3s79bnh7v

thx
Gero
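
For readers without access to the godbolt link, here is a minimal sketch of the kind of conversion being discussed. It is not the reporter's test case. The typedefs and function names (v4i64, v4i32, v8i32, narrow_convertvector, narrow_even_lanes, check_narrowing) are made up for illustration, and the lane-selection variant assumes a little-endian target such as x86. It only shows that narrowing int64_t -> int32_t is the same as keeping the even 32-bit lanes, which is the selection a vpermd-style permutation could perform. It says nothing about which instructions a given compiler emits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// GCC/Clang vector extension types (hypothetical names, not from the report).
typedef std::int64_t v4i64 __attribute__((vector_size(32)));  // 4 x int64_t
typedef std::int32_t v4i32 __attribute__((vector_size(16)));  // 4 x int32_t
typedef std::int32_t v8i32 __attribute__((vector_size(32)));  // 8 x int32_t

// The construct named in the summary: element-wise int64_t -> int32_t.
inline v4i32 narrow_convertvector(const v4i64 &v) {
    return __builtin_convertvector(v, v4i32);
}

// Reference formulation: view the 4 x 64-bit input as 8 x 32-bit lanes and
// keep lanes 0, 2, 4, 6 (the low halves on a little-endian target).
inline v4i32 narrow_even_lanes(const v4i64 &v) {
    v8i32 lanes;
    std::memcpy(&lanes, &v, sizeof lanes);  // bit-for-bit reinterpretation
    v4i32 r = { lanes[0], lanes[2], lanes[4], lanes[6] };
    return r;
}

// Checks that both formulations agree lane by lane for one input.
inline void check_narrowing(const v4i64 &v) {
    v4i32 a = narrow_convertvector(v);
    v4i32 b = narrow_even_lanes(v);
    for (int i = 0; i < 4; ++i)
        assert(a[i] == b[i]);
}
```

Compiling narrow_convertvector alone with, for example, -O2 -mavx2 or -O2 -mavx512f -mavx512vl and inspecting the assembly is one way to compare the generated code against the instructions mentioned above.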