https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98884

            Bug ID: 98884
           Summary: Implement empty struct optimisations on ARM
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: david at westcontrol dot com
  Target Milestone: ---

Empty "tag" structs (or classes) are useful for strong typing, function
options, and so on.  The rules of C++ require these to have a non-zero size (so
that addresses of different instances are valid and distinct), but they contain
no significant data.  Ideally, therefore, the compiler will not generate code
that sets values or copies values when passing around such types. 
Unfortunately, that is not quite the case.
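
As a minimal sketch of the size rule mentioned above (the Tag type and the
helper functions here are illustrative only and are not part of the test case
that follows; C++17 is assumed for std::is_empty_v):

```cpp
#include <cassert>
#include <type_traits>

struct Tag {};  // hypothetical empty tag type, in the spirit of Tag1..Tag5

// No non-static data members, so the trait reports the class as empty.
static_assert(std::is_empty_v<Tag>, "Tag carries no data");

// A complete object of the type still occupies at least one byte.
static_assert(sizeof(Tag) >= 1, "empty class objects have non-zero size");

// Two distinct objects therefore have distinct addresses.
bool distinct_addresses() {
    Tag a, b;
    return &a != &b;
}

// Runtime check of the same property.
void demo() {
    assert(distinct_addresses());
}
```

So the object has a size and an address, but there is no value in it that
needs to be set or copied.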

Consider these two examples, with foo1 creating a tag instance, and foo2
passing tags on:

struct Tag {
    friend Tag make_tag();
private:
    Tag() {}
};

Tag make_tag() {
    return Tag{};
}

void needs_tag(Tag);

void foo1(void) {
    Tag t = make_tag();
    needs_tag(t);
}


struct Tag1 {};
struct Tag2 {};
struct Tag3 {};
struct Tag4 {};
struct Tag5 {};

void needs_tags(int x, Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5);

void foo2(Tag1 t1, Tag2 t2, Tag3 t3, Tag4 t4, Tag5 t5)
{
    needs_tags(12345, t1, t2, t3, t4, t5);
}


(Here is a godbolt link for convenience: <https://godbolt.org/z/o5K78h>)

On x86, since gcc 8, this has been quite efficient (this is all with -O2):

make_tag():
        xor     eax, eax
        ret
foo1():
        jmp     needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        mov     edi, 12345
        jmp     needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)

The contents of the tag instances are basically ignored.  The exception is on
"make_tag", where the return is given the value 0 unnecessarily.

But on ARM it is a different matter.  This is for the Cortex-M4:


make_tag():
        mov     r0, #0
        bx      lr
foo1():
        mov     r0, #0
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        push    {lr}
        sub     sp, sp, #12
        mov     r2, #0
        mov     r3, r2
        strb    r2, [sp, #4]
        strb    r2, [sp]
        mov     r1, r2
        movw    r0, #12345
        bl      needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
        add     sp, sp, #12
        ldr     pc, [sp], #4

The needless register and stack allocations, initialisations and copying mean
that this technique has a significant overhead for something that should really
"disappear in the compilation".

The x86 port manages this well.  Is it possible to get such optimisations into
the ARM port too?


Oh, and for comparison, clang with the same options (-std=c++17 -Wall -Wextra
-O2 -mcpu=cortex-m4) gives:

make_tag():
        bx      lr
foo1():
        b       needs_tag(Tag)
foo2(Tag1, Tag2, Tag3, Tag4, Tag5):
        movw    r0, #12345
        b       needs_tags(int, Tag1, Tag2, Tag3, Tag4, Tag5)
