https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100320
Bug ID: 100320
Summary: regression: 32-bit x86 memcpy is suboptimal
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: vda.linux at googlemail dot com
Target Milestone: ---

Bug 21329 has returned. 32-bit x86 memory block moves are using "movl $LEN,%ecx; rep movsl" insns. However, for fixed short blocks it is more efficient to just repeat a few "movsl" insns - this allows dropping the "mov $LEN,%ecx" insn. It's shorter, and more importantly, "rep movsl" is a slow-start microcoded insn (it is faster than moves using general-purpose registers only on blocks larger than 100-200 bytes) - OTOH, bare "movsl" is not microcoded and takes ~4 cycles to execute.

Bug 21329 was closed with this fix:

    CVSROOT:    /cvs/gcc
    Module name:    gcc
    Branch:     gcc-4_0-rhl-branch
    Changes by: ja...@gcc.gnu.org  2005-05-18 19:08:44

    Modified files:
        gcc            : ChangeLog
        gcc/config/i386: i386.c

    Log message:
    2005-05-06  Denis Vlasenko  <v...@port.imtp.ilyichevsk.odessa.ua>
                Jakub Jelinek  <ja...@redhat.com>

        PR target/21329
        * config/i386/i386.c (ix86_expand_movmem): Don't use rep; movsb
        for -Os if (movsl;)*(movsw;)?(movsb;)? sequence is shorter.
        Don't use rep; movs{l,q} if the repetition count is really small,
        instead use a sequence of movs{l,q} instructions.

(the above is commit 95935e2db5c45bef5631f51538d1e10d8b5b7524 in gcc.gnu.org/git/gcc.git; it seems that code was largely replaced by:

    commit 8c996513856f2769aee1730cb211050fef055fb5
    Author: Jan Hubicka <j...@suse.cz>
    Date:   Mon Nov 27 17:00:26 2006 +010

        expr.c (emit_block_move_via_libcall): Export.
)

With gcc version 11.0.0 20210210 (Red Hat 11.0.0-0) (GCC) I see "rep movsl"s again:

    void *f(void *d, const void *s)
    {
        return memcpy(d, s, 16);
    }

    $ gcc -Os -m32 -fomit-frame-pointer -c -o z.o z.c && objdump -drw z.o

    z.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <f>:
       0:   57                      push   %edi
       1:   b9 04 00 00 00          mov    $0x4,%ecx
       6:   56                      push   %esi
       7:   8b 44 24 0c             mov    0xc(%esp),%eax
       b:   8b 74 24 10             mov    0x10(%esp),%esi
       f:   89 c7                   mov    %eax,%edi
      11:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
      13:   5e                      pop    %esi
      14:   5f                      pop    %edi
      15:   c3                      ret

The expected code would not have "mov $0x4,%ecx" and would have "rep movsl" replaced by "movsl; movsl; movsl; movsl".

The testcase from 21329 with implicit block moves via struct copies, from here
https://gcc.gnu.org/bugzilla/attachment.cgi?id=8790
also demonstrates it:

    $ gcc -Os -m32 -fomit-frame-pointer -c -o z1.o z1.c && objdump -drw z1.o

    z1.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <f10>:
       0:   a1 00 00 00 00          mov    0x0,%eax
                            1: R_386_32     w10
       5:   a3 00 00 00 00          mov    %eax,0x0
                            6: R_386_32     t10
       a:   c3                      ret

    0000000b <f20>:
       b:   a1 00 00 00 00          mov    0x0,%eax
                            c: R_386_32     w20
      10:   8b 15 04 00 00 00       mov    0x4,%edx
                            12: R_386_32    w20
      16:   a3 00 00 00 00          mov    %eax,0x0
                            17: R_386_32    t20
      1b:   89 15 04 00 00 00       mov    %edx,0x4
                            1d: R_386_32    t20
      21:   c3                      ret

    00000022 <f21>:
      22:   57                      push   %edi
      23:   b9 09 00 00 00          mov    $0x9,%ecx
      28:   bf 00 00 00 00          mov    $0x0,%edi
                            29: R_386_32    t21
      2d:   56                      push   %esi
      2e:   be 00 00 00 00          mov    $0x0,%esi
                            2f: R_386_32    w21
      33:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
      35:   5e                      pop    %esi
      36:   5f                      pop    %edi
      37:   c3                      ret

    00000038 <f22>:
      38:   57                      push   %edi
      39:   b9 0a 00 00 00          mov    $0xa,%ecx
      3e:   bf 00 00 00 00          mov    $0x0,%edi
                            3f: R_386_32    t22
      43:   56                      push   %esi
      44:   be 00 00 00 00          mov    $0x0,%esi
                            45: R_386_32    w22
      49:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
      4b:   5e                      pop    %esi
      4c:   5f                      pop    %edi
      4d:   c3                      ret

    0000004e <f23>:
      4e:   57                      push   %edi
      4f:   b9 0b 00 00 00          mov    $0xb,%ecx
      54:   bf 00 00 00 00          mov    $0x0,%edi
                            55: R_386_32    t23
      59:   56                      push   %esi
      5a:   be 00 00 00 00          mov    $0x0,%esi
                            5b: R_386_32    w23
      5f:   f3 a4                   rep movsb %ds:(%esi),%es:(%edi)
      61:   5e                      pop    %esi
      62:   5f                      pop    %edi
      63:   c3                      ret

    00000064 <f30>:
      64:   57                      push   %edi
      65:   b9 03 00 00 00          mov    $0x3,%ecx
      6a:   bf 00 00 00 00          mov    $0x0,%edi
                            6b: R_386_32    t30
      6f:   56                      push   %esi
      70:   be 00 00 00 00          mov    $0x0,%esi
                            71: R_386_32    w30
      75:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
      77:   5e                      pop    %esi
      78:   5f                      pop    %edi
      79:   c3                      ret

    0000007a <f40>:
      7a:   57                      push   %edi
      7b:   b9 04 00 00 00          mov    $0x4,%ecx
      80:   bf 00 00 00 00          mov    $0x0,%edi
                            81: R_386_32    t40
      85:   56                      push   %esi
      86:   be 00 00 00 00          mov    $0x0,%esi
                            87: R_386_32    w40
      8b:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
      8d:   5e                      pop    %esi
      8e:   5f                      pop    %edi
      8f:   c3                      ret

    00000090 <f50>:
      90:   57                      push   %edi
      91:   b9 05 00 00 00          mov    $0x5,%ecx
      96:   bf 00 00 00 00          mov    $0x0,%edi
                            97: R_386_32    t50
      9b:   56                      push   %esi
      9c:   be 00 00 00 00          mov    $0x0,%esi
                            9d: R_386_32    w50
      a1:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
      a3:   5e                      pop    %esi
      a4:   5f                      pop    %edi
      a5:   c3                      ret

    000000a6 <f60>:
      a6:   57                      push   %edi
      a7:   b9 06 00 00 00          mov    $0x6,%ecx
      ac:   bf 00 00 00 00          mov    $0x0,%edi
                            ad: R_386_32    t60
      b1:   56                      push   %esi
      b2:   be 00 00 00 00          mov    $0x0,%esi
                            b3: R_386_32    w60
      b7:   f3 a5                   rep movsl %ds:(%esi),%es:(%edi)
      b9:   5e                      pop    %esi
      ba:   5f                      pop    %edi
      bb:   c3                      ret

    000000bc <f>:
    ...