https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79938
--- Comment #6 from postmaster at raasu dot org ---
I tried identical code using intrinsics with both clang and gcc:
clang:
movdqa xmm1,XMMWORD PTR [rip+0xd98]        # 402050 <_IO_stdin_used+0x50>
pand xmm1,xmm0
movdqa xmm2,xmm0
pshufb
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79938
--- Comment #5 from postmaster at raasu dot org ---
As far as I can tell, it's basically four shuffles and three vector additions.
It's part of a vectorized adler32 implementation, so there is a real-life use
for the optimization.
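
For context, a minimal scalar sketch of Adler-32 follows; a vectorized version
would be expected to produce the same checksums. This is not the code from the
report: the function name adler32_scalar is hypothetical, and the loop takes
the modulo on every byte for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference Adler-32. The name is an assumption made for this
 * sketch; it does not come from the bug report. */
uint32_t adler32_scalar(const unsigned char *data, size_t len)
{
    const uint32_t MOD = 65521u; /* largest prime below 2^16 */
    uint32_t a = 1;              /* running sum of bytes, starts at 1 */
    uint32_t b = 0;              /* running sum of the successive a values */

    assert(data != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % MOD;
        b = (b + a) % MOD;
    }

    /* b goes in the high 16 bits, a in the low 16 bits. */
    return (b << 16) | a;
}
```

Comparing the result of a SIMD variant against this function over a range of
buffer lengths is one simple way to check that the shuffle and add sequence is
correct.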
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79938
Andrew Pinski changed:
            What|Removed |Added
        Severity|normal  |enhancement
Last reconfirmed|