https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96532

Eero Tamminen <eerott at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |eerott at gmail dot com

--- Comment #7 from Eero Tamminen <eerott at gmail dot com> ---
Timing and profiling the whole EmuTOS (m68k ROM) bootup showed that these added
memcpy() calls add 8% to the boot time [1] with GCC 13.1.

For that particular case, all those extra (20000) memcpy() calls, and the
associated 8% bootup overhead, came from this loop:
-----------------------------------
uint32_t pair_planes[4];
...
for (i = 0; i < v_planes / 2; i++) {
    *(uint32_t*)addr = pair_planes[i];
    addr += sizeof(uint32_t);
} 
-----------------------------------
The overhead went away when the GCC -ffreestanding option was used.

Without that memcpy() overhead, GCC 13.1 performance was very close to GCC 4.6
performance in that particular case (this did not help in other cases where the
newer GCC was slower).

Further testing (with Compiler Explorer) showed that when the compiler was given
a better hint that the loop it replaced with memcpy() actually iterates at most
4 times, those memcpy() instances also went away:
-----------------------------------
if (v_planes > 2*ARRAY_SIZE(pair_planes)) return;
-----------------------------------
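
For reference, the loop and the bound check can be combined into a small
self-contained test case. This is only a sketch: the function name
store_planes(), the fill values, and the uint32_t destination pointer are
assumptions made for illustration, not the actual EmuTOS code. Whether the
memcpy() call appears or not will depend on the GCC version, target and
options used.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical reduction of the loop above. Only pair_planes, v_planes
 * and ARRAY_SIZE come from the original; everything else is made up. */
static uint32_t pair_planes[4] = {
    0x11111111u, 0x22222222u, 0x33333333u, 0x44444444u
};

/* Stores v_planes/2 words to dst and returns the number of words written. */
size_t store_planes(uint32_t *dst, unsigned v_planes)
{
    size_t i;

    /* Explicit bound: the loop below can now run at most 4 times. */
    if (v_planes > 2 * ARRAY_SIZE(pair_planes))
        return 0;

    for (i = 0; i < v_planes / 2; i++)
        dst[i] = pair_planes[i];

    return i;
}
```

Comparing the generated assembly with and without the early return (and with
and without -ffreestanding) should show whether the loop is turned into a
memcpy() call.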

How did GCC deduce that the above loop was large enough for replacing it with
memcpy(), and its call overhead, to make sense?  From the maximum valid index
for "pair_planes", it should already have been clear that any larger index
leads to "undefined behavior".

[1] 1/3 of the boot time went to a timeout waiting for user interaction, and
1/3 went to waiting for slow disk responses, so relative to the remaining third
the overhead was really 3 x 8% = 24%.
