https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90582
Andrew Pinski changed:
           What|Removed     |Added
----------------------------------------
       Severity|normal      |enhancement
--- Comment #2 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #1)
> > I assume EOR / CBNZ is at least as efficient as SUBS / BNE on
> > all/most AArch64 microarchitectures, but someone should check.
>
> It is similar to x86 in that respect on some cores (mostly Marvell's
> cores).
> That is, ThunderX, ThunderX2, OcteonTX and OcteonTX2 all have the ability
> to macro-fuse the two instructions into one micro-op.
Even on most non-Marvell cores now, subs/bne is better than eor/cbnz.
Anyway, starting with GCC 10.3/9.4 we get:
        ldr     x2, [x0]
        subs    x1, x1, x2
        mov     x2, 0
        bne     .L5
We can't fuse that anyway, since the mov sits between the subs and the bne.
I wonder if we should clobber x1 too.
As for the -fomit-frame-pointer issue: it is not really an issue, as only
-momit-leaf-frame-pointer is turned on by default, and the function is no
longer a leaf function due to the call to __stack_chk_fail.
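
For anyone who wants to look at this locally, here is a minimal sketch of a
function that should get a stack protector check. It is not the testcase
from this bug; the name sum_bytes and the buffer size are made up for
illustration. Building it with something like
"gcc -O2 -fstack-protector-strong -S" for an AArch64 target should show the
canary load/compare and the call to __stack_chk_fail, which is what makes
the function non-leaf. The exact instructions will depend on the GCC
version.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example, not the reporter's testcase.
   The local char array is what makes -fstack-protector-strong
   instrument this function with a canary check. */
__attribute__((noinline))
int sum_bytes(const char *src)
{
    char buf[64];

    /* Copy at most 63 bytes and always terminate the buffer. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    int total = 0;
    for (size_t i = 0; buf[i] != '\0'; i++)
        total += (unsigned char)buf[i];

    /* With the stack protector enabled, the epilogue compares the
       canary and branches to a call to __stack_chk_fail on mismatch. */
    return total;
}
```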
> mov x1, 0   # and destroy the reg
> mov w1, 3   # right before it's already destroyed
This is by design: GCC does not go back and figure out whether the zeroing
could be removed, because deleting it by accident might introduce a
"security hole". Always emitting it ensures that does NOT happen.
As for the other issue, dealing with the address formation: it is a small
missed optimization, and fixing it might not help in general, or at all.