[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #9 from Jakub Jelinek --- Should be fixed now for GCC 10+.
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #8 from Jakub Jelinek --- Author: jakub Date: Fri May 24 08:58:50 2019 New Revision: 271596 URL: https://gcc.gnu.org/viewcvs?rev=271596&root=gcc&view=rev Log: PR target/90568 * config/i386/x86-tune-sched.c (ix86_macro_funsion_pair_p): Call gen_attr_type just once instead of 4-7 times. Formatting fixes. Handle stack_protect_test_ codegen similarly to corresponding sub instruction. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/x86-tune-sched.c
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #7 from Jakub Jelinek --- Author: jakub Date: Thu May 23 11:18:41 2019 New Revision: 271552 URL: https://gcc.gnu.org/viewcvs?rev=271552&root=gcc&view=rev Log: PR target/90568 * config/i386/i386.md (stack_protect_test_): Use sub instead of xor. Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386.md
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #6 from Jakub Jelinek --- Created attachment 46401 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46401&action=edit gcc10-pr90568-2.patch Incremental untested patch for the macro-fusion, on top of https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01541.html
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #5 from Peter Cordes --- And BTW, this only helps if the SUB and JNE are consecutive, which GCC (correctly) doesn't currently optimize for with XOR. If this sub/jne is different from a normal sub/branch and won't already get optimized for macro-fusion, we may get even more benefit from this change by teaching gcc to keep them adjacent. GCC currently sometimes splits up the instructions like this: xorq%fs:40, %rdx movl%ebx, %eax jne .L7 from gcc8.3 (but not 9.1 or trunk in this case) on https://godbolt.org/z/nNjQ8u #include unsigned int get_random_seed() { std::random_device rd; return rd(); } Even with -O3 -march=skylake. That's not wrong because XOR can't macro-fuse, but the point of switching to SUB is that it *can* macro-fuse into a single sub-and-branch uop on Sandybridge-family. So we might need to teach gcc about that. So when you change this, please make it aware of optimizing for macro-fusion by keeping the sub and jne back to back. Preferably with tune=generic (because Sandybridge-family is fairly widespread and it doesn't hurt on other CPUs), but definitely with -mtune=intel or -mtune=sandybridge or later. Nehalem and earlier can only macro-fuse test/cmp The potential downside of putting it adjacent instead of 1 or 2 insns earlier for uarches that can't macro-fuse SUB/JNE should be about zero on average. These branches should predict very well, and there are no in-order x86 CPUs still being sold. So it's mostly just going to be variations in fetch/decode that help sometimes, hurt sometimes, like any code alignment change.
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2019-05-22 Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #4 from Jakub Jelinek --- Ok, will change it then. THanks for the report.
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #3 from Peter Cordes --- (In reply to Jakub Jelinek from comment #2) > The xor there is intentional, for security reasons we do not want the stack > canary to stay in the register afterwards, because then it could be later > spilled or accessible to some exploit in another way. Ok, so we can't use CMP, therefore we should use SUB, which as I showed does help on Sandybridge-family vs. XOR. x - x = 0 just like x ^ x = 0 Otherwise SUB wouldn't set ZF. SUB is not worse than XOR on any other CPUs; there are no CPUs with better XOR throughput than ADD/SUB. In the canary mismatch case, leaving attacker_value - key in a register seems no worse than leaving attacker_value ^ key in a register. Either value trivially reveals the canary value to an attacker that knows what they overwrote the stack with, if it does somehow leak. We jump to __stack_chk_fail in that case, not relying on the return value on the stack, so a ROP attack wouldn't be sufficient to leak that value anywhere.
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- The xor there is intentional, for security reasons we do not want the stack canary to stay in the register afterwards, because then it could be later spilled or accessible to some exploit in another way.
[Bug target/90568] stack protector should use cmp or sub, not xor, to allow macro-fusion on x86
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90568 --- Comment #1 from Peter Cordes --- https://godbolt.org/z/hHCVTc Forgot to mention, stack-protector also disables use of the red-zone for no apparent reason, so that's another missed optimization. (Perhaps rarely relevant; probably most functions that get stack protection are big enough that they need more stack, or non-leaf. I sidestepped that with volatile.)