https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107432

            Bug ID: 107432
           Summary: __builtin_convertvector generates inefficient code
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: g.peterh...@t-online.de
  Target Milestone: ---

Example: conversion int64_t -> int32_t

avx512f + avx512vl
Hardware truncating conversions are available (vpmovqd, _mm256_cvtepi64_epi32).

avx2
There is no truncating conversion instruction, but the full cross-lane
32-bit permutation (_mm256_permutevar8x32_epi32/vpermd) can be used to
pick the low 32 bits of each 64-bit element.

I have not (yet) evaluated whether other conversions (larger int -> smaller
int) are also affected.
PS: On x86 it is already hard to optimize all cases, because the best
instruction sequence depends on the available instruction set.
PPS: What about -march=znver4 (Zen 4, which supports AVX-512)?

https://godbolt.org/z/3s79bnh7v

thx
Gero
