https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884
            Bug ID: 98884
           Summary: Implement empty struct optimisations on ARM
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david at westcontrol dot com
  Target Milestone: ---

Empty "tag" structs (or classes) are useful for strong typing, function
options, and so on. The rules of C++ require these to have a non-zero size
(so that addresses of different instances are valid and distinct), but they
contain no significant data. Ideally, therefore, the compiler will not
generate code that sets values or copies values when passing around such
types. Unfortunately, that is not quite the case.

Consider these two examples, with foo1 creating a tag type, and foo2 passing
it on:

struct Tag {
    friend Tag make_tag();
private:
    Tag() {}
};

Tag make_tag() {
    return Tag{};
};

void needs_tag(Tag);

void foo1(void) {
    Tag t = make_tag();
    needs_tag(t);
}


struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5);

void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5) {
    needs_tags(12345, t1, t2, t3, t4, t5);
}

(Here is a godbolt link for convenience: <https://godbolt.org/z/o5K78h>)

On x86, since gcc 8, this has been quite efficient (this is all with -O2):

make_tag():
        xor     eax, eax
        ret
foo1():
        jmp     needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        mov     edi, 12345
        jmp     needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

The contents of the tag instances are basically ignored. The exception is on
"make_tag", where the return is given the value 0 unnecessarily.

But on ARM it is a different matter.
This is for the Cortex-M4:

make_tag():
        mov     r0, #0
        bx      lr
foo1():
        mov     r0, #0
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        push    {lr}
        sub     sp, sp, #12
        mov     r2, #0
        mov     r3, r2
        strb    r2, [sp, #4]
        strb    r2, [sp]
        mov     r1, r2
        movw    r0, #12345
        bl      needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
        add     sp, sp, #12
        ldr     pc, [sp], #4

The needless register and stack allocations, initialisations and copying mean
that this technique has a significant overhead for something that should
really "disappear in the compilation".

The x86 port manages this well. Is it possible to get such optimisations into
the ARM port too?

Oh, and for comparison, clang with the same options
(-std=c++17 -Wall -Wextra -O2 -mcpu=cortex-m4) gives:

make_tag():
        bx      lr
foo1():
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        movw    r0, #12345
        b       needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
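
For anyone wanting to check the language-level assumptions behind this
report (empty classes have a non-zero size, distinct instances have distinct
addresses, and the tag carries no data), here is a minimal sketch. It is not
part of the original test case: Tag1 and Tag2 mirror the structs used in the
foo2 example, while pick_overload, distinct_addresses and check_tags are
hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Empty tag types, mirroring Tag1 and Tag2 from the report.
struct Tag1 {};
struct Tag2 {};

// An empty class still has a non-zero size in C++...
static_assert(sizeof(Tag1) >= 1, "empty classes have non-zero size");
// ...but it has no non-static data members, so there is nothing to copy.
static_assert(std::is_empty<Tag1>::value, "Tag1 carries no data");
static_assert(std::is_trivially_copyable<Tag1>::value,
              "copying Tag1 needs no user-visible work");

// Hypothetical helpers: the tag argument only selects the overload;
// its value is never read.
int pick_overload(Tag1) { return 1; }
int pick_overload(Tag2) { return 2; }

// Two distinct objects of the same type must have distinct addresses,
// which is why sizeof cannot be zero.
bool distinct_addresses() {
    Tag1 a, b;
    return &a != &b;
}

// Hypothetical driver bundling the run-time checks.
void check_tags() {
    assert(pick_overload(Tag1{}) == 1);
    assert(pick_overload(Tag2{}) == 2);
    assert(distinct_addresses());
}
```

Since the tag value is never read, only the overload choice matters, which is
why one would hope the generated code does not need to materialise the tag
arguments at all.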