https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89189
Bug ID: 89189
Summary: missed optimization for 16/8-bit vector shuffle
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: kretz at kde dot org
Target Milestone: ---
Target: x86_64-*-*, i?86-*-*

Testcase `-O2 -msse2`, further missed optimization with SSSE3 / SSE4.1 (cf. https://godbolt.org/z/Yx6aLo):

using vshort [[gnu::vector_size(16)]] = short;
vshort f(vshort x) {
  return vshort{x[3], x[7]};
}

using vchar [[gnu::vector_size(16)]] = char;
vchar g(vchar x) {
  return vchar{x[7], x[15]};
}

f is compiled to 2x pextrw, movd, pinsrw + unpacks for zeroing high bits. The latter unpacks are unnecessary since movd already zeros the high bits [127:32].

With SSE4.1 g is compiled to a similar pattern using pextrb/pinsrb. In this case movd is used, but note that pextrb zeros the bits [31:8] in the GPR, so that the unpacks for zeroing are also unnecessary.

Using SSSE3, both functions can also be compiled to a single pshufb instruction using a suitable constant shuffle vector (6,7,14,15,-1,-1,... and 7,15,-1,-1,...).
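To make the pshufb suggestion concrete, here is a minimal sketch that checks the two shuffle constants against the testcase functions. It does not execute pshufb itself: pshufb_model is a hypothetical scalar model of the instruction (a control byte with bit 7 set yields a zero byte, otherwise the low four bits select a source byte). The helper names as, same, f_shuffled, g_shuffled and shuffles_match are likewise made up for illustration, and the byte indices assume a little-endian target such as x86. It says nothing about what GCC actually emits.

```cpp
#include <cstring>

using vshort [[gnu::vector_size(16)]] = short;
using vchar [[gnu::vector_size(16)]] = char;

// Testcase functions from the report.
vshort f(vshort x) { return vshort{x[3], x[7]}; }
vchar g(vchar x) { return vchar{x[7], x[15]}; }

// Hypothetical helper: reinterpret the 16 bytes of one vector type as another.
template <class To, class From>
To as(From v) {
    static_assert(sizeof(To) == sizeof(From), "vector sizes must match");
    To r;
    std::memcpy(&r, &v, sizeof r);
    return r;
}

// Hypothetical helper: bytewise equality of two vectors of the same type.
template <class V>
bool same(V a, V b) { return std::memcmp(&a, &b, sizeof(V)) == 0; }

// Scalar model of pshufb (an illustration, not the instruction itself):
// bit 7 of the control byte set -> result byte is 0,
// otherwise the low 4 bits index into the source bytes.
vchar pshufb_model(vchar src, vchar ctrl) {
    vchar r = {};
    for (int i = 0; i < 16; ++i)
        r[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0f];
    return r;
}

// The -1 entries of the shuffle vectors; every bit (including bit 7) is set.
const char Z = static_cast<char>(-1);

// f as one byte shuffle: bytes 6,7 hold x[3] and bytes 14,15 hold x[7]
// (little-endian layout assumed), everything else is zeroed.
vshort f_shuffled(vshort x) {
    const vchar ctrl = {6, 7, 14, 15, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z};
    return as<vshort>(pshufb_model(as<vchar>(x), ctrl));
}

// g as one byte shuffle: bytes 7 and 15, everything else is zeroed.
vchar g_shuffled(vchar x) {
    const vchar ctrl = {7, 15, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z, Z};
    return pshufb_model(x, ctrl);
}

// Compares the shuffle-based versions with the originals on sample inputs.
bool shuffles_match() {
    const vshort x = {0x0102, 0x0304, 0x0506, -2, 0x090a, 0x0b0c, 0x0d0e, 0x0f10};
    const vchar y = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    return same(f(x), f_shuffled(x)) && same(g(y), g_shuffled(y));
}
```

Calling shuffles_match() from a small driver is enough to check both constants. On an SSSE3 target, the model could presumably be swapped for the real instruction via _mm_shuffle_epi8 from <tmmintrin.h>, which uses the same control-byte convention.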