https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435
            Bug ID: 95435
           Summary: bad builtin memcpy performance with znver1/znver2 and 32bit
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jan at jki dot io
  Target Milestone: ---

Created attachment 48641
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48641&action=edit
gcc -g -m32 -march=znver2 -O1 -s testmem_modified.c -o tm32

Hi,

I have found a regression with znver2 and 32bit compiles regarding builtin
memcpy.

Test program is this one:
https://github.com/level1wendell/memcpy_sse

Results:

gcc -g -m32 -march=znver2 -O1 -s testmem_modified.c -o tm32
32 MB = 2.462717 ms

gcc -g -m32 -march=skylake -O1 -s testmem_modified.c -o tm32
32 MB = 1.135762 ms

gcc -fno-builtin-memcpy -g -m32 -march=znver2 -O1 -s testmem_modified.c -o tm32
32 MB = 1.138656 ms

I have attached the generated assembler code for the first 2 cases.
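
For anyone who wants to try the comparison without the linked test program, here is a minimal sketch of a memcpy timing loop. It is not the code from testmem_modified.c; the function name best_memcpy_ms, the guard macro MEMCPY_BENCH_MAIN and the repetition count are all made up for illustration, and it assumes a POSIX system with clock_gettime and a GCC-compatible compiler (for the inline asm barrier). Whether it shows the same slowdown depends on how the compiler decides to expand the memcpy call, so treat it as a starting point only.

```c
/* Hypothetical micro-benchmark sketch; NOT the test program from the report. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Current time in milliseconds from a monotonic clock. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e3 + (double)ts.tv_nsec / 1e6;
}

/* Copy n bytes from src to dst `reps` times and return the lowest
 * single-copy time in ms. Returns -1.0 if reps <= 0. */
double best_memcpy_ms(void *dst, const void *src, size_t n, int reps)
{
    double best = -1.0;
    assert(dst != NULL && src != NULL);
    for (int i = 0; i < reps; i++) {
        double t0 = now_ms();
        memcpy(dst, src, n);
        /* Compiler barrier (GCC/Clang extension) so the copy is not
         * optimized away or moved across the timer reads. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        double t1 = now_ms();
        if (best < 0.0 || t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

#ifdef MEMCPY_BENCH_MAIN   /* hypothetical guard: build with -DMEMCPY_BENCH_MAIN */
int main(void)
{
    size_t n = 32u * 1024u * 1024u;   /* 32 MB, the size quoted in the report */
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (!src || !dst) {
        fprintf(stderr, "allocation failed\n");
        free(src);
        free(dst);
        return 1;
    }
    memset(src, 0xA5, n);   /* touch the pages before timing */
    memset(dst, 0, n);
    printf("32 MB = %f ms\n", best_memcpy_ms(dst, src, n, 10));
    free(src);
    free(dst);
    return 0;
}
#endif
```

Saved as, say, bench.c (name is arbitrary), it can be built with the same option sets as above plus the guard macro, for example "gcc -m32 -march=znver2 -O1 -DMEMCPY_BENCH_MAIN bench.c -o bench", and again with -march=skylake or with -fno-builtin-memcpy added. Taking the minimum over several repetitions is just one way to reduce noise; the linked test program may measure differently.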