https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95435

            Bug ID: 95435
           Summary: bad builtin memcpy performance with znver1/znver2 and
                    32bit
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jan at jki dot io
  Target Milestone: ---

Created attachment 48641
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48641&action=edit
gcc -g -m32  -march=znver2 -O1 -s testmem_modified.c -o tm32

Hi,
I have found a performance regression in the builtin memcpy for 32-bit
builds with -march=znver2.
The test program is this one:
https://github.com/level1wendell/memcpy_sse

results:
gcc -g -m32  -march=znver2 -O1 -s testmem_modified.c -o tm32
32 MB = 2.462717 ms

gcc -g -m32  -march=skylake -O1 -s testmem_modified.c -o tm32
32 MB = 1.135762 ms

gcc -fno-builtin-memcpy -g -m32  -march=znver2 -O1 -s testmem_modified.c -o tm32
32 MB = 1.138656 ms

I have attached the generated assembler code for the first 2 cases.
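
For readers without the attachment, the kind of measurement reported above can be approximated with a sketch like the following. It is not the attached testmem_modified.c: the names copy_time_ms and now_ms, the fill pattern, and the best-of-N timing are assumptions made purely for illustration, and absolute numbers will differ between machines.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the attachment): monotonic time in ms. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/*
 * Hypothetical benchmark routine: copies `bytes` bytes `runs` times and
 * returns the fastest single copy in milliseconds.
 * Returns -1.0 for bytes == 0, runs <= 0, or allocation failure.
 */
double copy_time_ms(size_t bytes, int runs)
{
    if (bytes == 0 || runs <= 0)
        return -1.0;

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    double best = -1.0;

    if (src && dst) {
        /* Touch both buffers first so page faults are not timed. */
        memset(src, 0xA5, bytes);
        memset(dst, 0, bytes);

        for (int i = 0; i < runs; i++) {
            double t0 = now_ms();
            memcpy(dst, src, bytes);   /* the call GCC may expand inline */
            /* Read the result to discourage the compiler from dropping the copy. */
            volatile unsigned char sink = dst[bytes - 1];
            (void)sink;
            double t1 = now_ms();
            if (best < 0.0 || t1 - t0 < best)
                best = t1 - t0;
        }
        /* Sanity check outside the timed region. */
        assert(memcmp(dst, src, bytes) == 0);
    }

    free(src);
    free(dst);
    return best;
}
```

Calling copy_time_ms(32u << 20, 10) from a small main and printing the result gives a figure in the same form as the "32 MB = ... ms" lines above. Building that program once with the flags from the first command and once with -fno-builtin-memcpy added should show whether the inline expansion or the libc memcpy is the faster path on a given machine.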
