[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-24 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jakub Jelinek  ---
Should be fixed now for GCC 10+.

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-24 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #8 from Jakub Jelinek  ---
Author: jakub
Date: Fri May 24 08:58:50 2019
New Revision: 271596

URL: https://gcc.gnu.org/viewcvs?rev=271596&root=gcc&view=rev
Log:
PR target/90568
* config/i386/x86-tune-sched.c (ix86_macro_funsion_pair_p): Call
gen_attr_type just once instead of 4-7 times.  Formatting fixes.
Handle stack_protect_test_ codegen similarly to corresponding
sub instruction.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/x86-tune-sched.c

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-23 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #7 from Jakub Jelinek  ---
Author: jakub
Date: Thu May 23 11:18:41 2019
New Revision: 271552

URL: https://gcc.gnu.org/viewcvs?rev=271552&root=gcc&view=rev
Log:
PR target/90568
* config/i386/i386.md (stack_protect_test_): Use sub instead
of xor.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-23 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #6 from Jakub Jelinek  ---
Created attachment 46401
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46401&action=edit
gcc10-pr90568-2.patch

Incremental untested patch for the macro-fusion, on top of
https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01541.html

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-22 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #5 from Peter Cordes  ---
And BTW, this only helps if the SUB and JNE are consecutive, which GCC
(correctly) doesn't currently optimize for with XOR.

If this sub/jne is different from a normal sub/branch and won't already get
optimized for macro-fusion, we may get even more benefit from this change by
teaching gcc to keep them adjacent.

GCC currently sometimes splits up the instructions like this:

xorq%fs:40, %rdx
movl%ebx, %eax
jne .L7

from gcc8.3 (but not 9.1 or trunk in this case) on https://godbolt.org/z/nNjQ8u


#include 
unsigned int get_random_seed() {
std::random_device rd;
return rd();
}

Even with -O3 -march=skylake.
That's not wrong because XOR can't macro-fuse, but the point of switching to
SUB is that it *can* macro-fuse into a single sub-and-branch uop on
Sandybridge-family.  So we might need to teach gcc about that.

So when you change this, please make it aware of optimizing for macro-fusion by
keeping the sub and jne back to back.  Preferably with tune=generic (because
Sandybridge-family is fairly widespread and it doesn't hurt on other CPUs), but
definitely with -mtune=intel or -mtune=sandybridge or later.

Nehalem and earlier can only macro-fuse test/cmp

The potential downside of putting it adjacent instead of 1 or 2 insns earlier
for uarches that can't macro-fuse SUB/JNE should be about zero on average. 
These branches should predict very well, and there are no in-order x86 CPUs
still being sold.  So it's mostly just going to be variations in fetch/decode
that help sometimes, hurt sometimes, like any code alignment change.

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2019-05-22
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from Jakub Jelinek  ---
Ok, will change it then.  THanks for the report.

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-22 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #3 from Peter Cordes  ---
(In reply to Jakub Jelinek from comment #2)
> The xor there is intentional, for security reasons we do not want the stack
> canary to stay in the register afterwards, because then it could be later
> spilled or accessible to some exploit in another way.

Ok, so we can't use CMP, therefore we should use SUB, which as I showed does
help on Sandybridge-family vs. XOR.

x - x = 0   just like 
x ^ x = 0

Otherwise SUB wouldn't set ZF.

SUB is not worse than XOR on any other CPUs; there are no CPUs with better XOR
throughput than ADD/SUB.

In the canary mismatch case, leaving  attacker_value - key  in a register seems
no worse than leaving attacker_value ^ key in a register.  Either value
trivially reveals the canary value to an attacker that knows what they
overwrote the stack with, if it does somehow leak.  We jump to __stack_chk_fail
in that case, not relying on the return value on the stack, so a ROP attack
wouldn't be sufficient to leak that value anywhere.

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-22 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
The xor there is intentional, for security reasons we do not want the stack
canary to stay in the register afterwards, because then it could be later
spilled or accessible to some exploit in another way.

[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86

2019-05-21 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568

--- Comment #1 from Peter Cordes  ---
https://godbolt.org/z/hHCVTc

Forgot to mention, stack-protector also disables use of the red-zone for no
apparent reason, so that's another missed optimization.  (Perhaps rarely
relevant; probably most functions that get stack protection are big enough that
they need more stack, or non-leaf.  I sidestepped that with volatile.)