[Bug target/90582] AArch64 stack-protector wastes an instruction on address-generation

2024-01-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90582

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #1)
> > I assume EOR / CBNZ is at least as efficient as SUBS / BNE on
> > all/most AArch64 microarchitectures, but someone should check.
> 
> In that respect it is similar to x86 on some cores (mostly Marvell's).
> That is, ThunderX, ThunderX2, OcteonTX and OcteonTX2 can all macro-combine
> the two instructions into one micro-op.

Even on most non-Marvell cores now, subs/bne is better than eor/cbnz.
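For readers comparing the two sequences, here is a minimal C sketch of what each one tests. The function names are hypothetical and only model the branch condition, not the actual code GCC emits. Both forms answer the same question: does the saved canary differ from the guard value?

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of EOR + CBNZ: the branch is taken when the
   XOR of the two values is non-zero. */
bool mismatch_eor(uint64_t saved, uint64_t guard)
{
    return (saved ^ guard) != 0;
}

/* Hypothetical model of SUBS + B.NE: the branch is taken when the
   difference is non-zero, i.e. the Z flag is clear. Unsigned
   subtraction wraps, so this is well defined in C. */
bool mismatch_subs(uint64_t saved, uint64_t guard)
{
    return (saved - guard) != 0;
}
```

Since the two predicates agree for every input, the choice between them is purely about which pair a given core executes more cheaply.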


Anyway, starting with GCC 10.3/9.4 we get:
        ldr     x2, [x0]
        subs    x1, x1, x2
        mov     x2, 0
        bne     .L5

Which we can't fuse anyway (the mov sits between the subs and the bne).  I
wonder if we should clobber x1 too.


As for the -fomit-frame-pointer issue: it is not really an issue, as only
-momit-leaf-frame-pointer is turned on by default, and the function is no
longer a leaf function due to the call to __stack_chk_fail.
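To see the leaf/non-leaf point in practice, a small test case along these lines can be used. This is a sketch, not the reporter's original test case; the function name copy_len and the buffer size are made up for illustration. The function calls only library routines in its source, but the local array should make GCC instrument it under -fstack-protector-strong, and the failure path of that check is a call to __stack_chk_fail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the local char array is what should trigger
   stack-protector instrumentation. The canary check added in the
   epilogue branches to a call to __stack_chk_fail on mismatch. */
size_t copy_len(const char *s)
{
    char buf[64];

    /* Bounded copy, then force NUL termination. */
    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    return strlen(buf);
}
```

Compiling with something like `gcc -O2 -fstack-protector-strong -S` and looking at the epilogue should show the canary load, the comparison, and the branch to the __stack_chk_fail call.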

>        mov     x1, 0        # and destroy the reg
>        mov     w1, 3        # right before it's already destroyed

This is by design.  GCC does not go back and figure out whether the zeroing
could be removed, because deleting it by accident might introduce a
"security hole".  Always emitting it ensures that cannot happen.


As for the other issue, the address formation, it is a small missed
optimization, and fixing it might not help in general, or at all.

[Bug target/90582] AArch64 stack-protector wastes an instruction on address-generation

2019-05-22 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90582

--- Comment #1 from Andrew Pinski  ---
> I assume EOR / CBNZ is at least as efficient as SUBS / BNE on
> all/most AArch64 microarchitectures, but someone should check.

In that respect it is similar to x86 on some cores (mostly Marvell's).
That is, ThunderX, ThunderX2, OcteonTX and OcteonTX2 can all macro-combine
the two instructions into one micro-op.