https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106038

            Bug ID: 106038
           Summary: x86_64 vectorization of ALU ops using xmm registers prematurely
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: goldstein.w.n at gmail dot com
  Target Milestone: ---

See: https://godbolt.org/z/YxWEn6Y65

Basically, in all cases where the total amount of memory touched is <= 8 bytes
(word size), the vectorization pass chooses to inefficiently use xmm registers
to vectorize the unrolled loops. Using GPRs (as GCC <= 9.5 did) is faster and
gives smaller code.

Related to: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106022
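
For readers without access to the godbolt link, here is a minimal sketch of the kind of code this report concerns. It is not the reporter's test case: the function names `xor8_bytes` and `xor8_gpr` and the choice of XOR are assumptions made for illustration. The first function is an element-wise ALU loop that touches exactly 8 bytes; the second is a hand-written form of the same operation that needs only 64-bit general-purpose registers, which is roughly the code shape the report would prefer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example, not taken from the report.
 * Element-wise XOR over exactly 8 bytes. With a fixed trip count of 8,
 * the loop is fully unrolled and becomes a candidate for vectorization. */
void xor8_bytes(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 8; i++)
        dst[i] ^= src[i];
}

/* Hypothetical hand-written equivalent using one 64-bit value per operand.
 * memcpy is used for the loads and the store so that unaligned pointers
 * and strict aliasing are handled correctly. XOR is bitwise, so the
 * result is byte-for-byte identical regardless of endianness. */
void xor8_gpr(uint8_t *dst, const uint8_t *src)
{
    uint64_t d, s;
    memcpy(&d, dst, sizeof d);   /* load 8 bytes of dst */
    memcpy(&s, src, sizeof s);   /* load 8 bytes of src */
    d ^= s;                      /* single 64-bit ALU op */
    memcpy(dst, &d, sizeof d);   /* store 8 bytes back */
}
```

Compiling both functions at -O2 or -O3 with the GCC versions named above and comparing the assembly should show whether `xor8_bytes` goes through xmm registers while `xor8_gpr` stays in GPRs. The exact output depends on compiler version and flags.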