https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89189

            Bug ID: 89189
           Summary: missed optimization for 16/8-bit vector shuffle
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kretz at kde dot org
  Target Milestone: ---
            Target: x86_64-*-*, i?86-*-*

Testcase (compile with `-O2 -msse2`); there are further missed optimizations with SSSE3 / SSE4.1 (cf.
https://godbolt.org/z/Yx6aLo):

using vshort [[gnu::vector_size(16)]] = short;
vshort f(vshort x) {
    return vshort{x[3], x[7]};
}

using vchar [[gnu::vector_size(16)]] = char;
vchar g(vchar x) {
    return vchar{x[7], x[15]};
}

f is compiled to 2x pextrw, movd, pinsrw + unpacks for zeroing high bits. The
latter unpacks are unnecessary since movd already zeros the high bits [127:32].

With SSE4.1, g is compiled to a similar pattern using pextrb/pinsrb. In this
case movd is used, but note that pextrb already zeros bits [31:8] of the GPR,
so the unpacks for zeroing are unnecessary here as well.

Using SSSE3, both functions can also be compiled to a single pshufb instruction
using a suitable constant shuffle vector (6,7,14,15,-1,-1,... and
7,15,-1,-1,...).
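
To check the two shuffle constants, here is a minimal scalar sketch of the
pshufb semantics (a control byte with bit 7 set produces zero, otherwise its
low four bits select a source byte). It is only a model and not what GCC emits.
The names Bytes16, pshufb_model, f_model and g_model are hypothetical, and a
little-endian byte layout is assumed, as on x86:

```cpp
#include <cstdint>

// Scalar stand-in for one 128-bit register, viewed as 16 bytes.
// (Hypothetical helper type, used only for this illustration.)
struct Bytes16 { std::uint8_t b[16]; };

// Models pshufb: if bit 7 of the control byte is set the result byte is 0,
// otherwise the low 4 bits of the control byte index into the source.
constexpr Bytes16 pshufb_model(const Bytes16& src, const Bytes16& ctrl) {
    Bytes16 out{};
    for (int i = 0; i < 16; ++i) {
        out.b[i] = (ctrl.b[i] & 0x80) ? 0 : src.b[ctrl.b[i] & 0x0F];
    }
    return out;
}

// -1 as a byte; bit 7 is set, so the corresponding result byte is zeroed.
constexpr std::uint8_t Z = 0xFF;

// f: shorts 3 and 7 occupy bytes 6,7 and 14,15 (little-endian layout).
constexpr Bytes16 ctrl_f{{6, 7, 14, 15, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z}};
// g: chars 7 and 15.
constexpr Bytes16 ctrl_g{{7, 15, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z}};

constexpr Bytes16 f_model(const Bytes16& x) { return pshufb_model(x, ctrl_f); }
constexpr Bytes16 g_model(const Bytes16& x) { return pshufb_model(x, ctrl_g); }
```

With real SSSE3 intrinsics the same control vectors would be passed as the
second operand of _mm_shuffle_epi8.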
