Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
On Fri, Dec 20, 2019 at 12:26 AM Jakub Jelinek wrote: > > On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote: > > On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote: > > > Outputting the move as RIP relative movq would work. > > > LC12 is string "s" and has nothing to do with stack protecting. > > > > This should fix it by doing more carefully what *movdi_internal > > does. Will bootstrap/regtest it tonight. > > Passed bootstrap/regtest on x86_64-linux and i686-linux. Ok for trunk? > > > > 2019-12-19 Jakub Jelinek > > > > PR target/92841 > > * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand > > always use lea{q}, no matter what value which_alternative has. > > > > * gcc.target/i386/pr92841-2.c: New test. LGTM. Thanks, Uros. > Jakub >
Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote: > On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote: > > Outputting the move as RIP relative movq would work. > > LC12 is string "s" and has nothing to do with stack protecting. > > This should fix it by doing more carefully what *movdi_internal > does. Will bootstrap/regtest it tonight. Passed bootstrap/regtest on x86_64-linux and i686-linux. Ok for trunk? > > 2019-12-19 Jakub Jelinek > > PR target/92841 > * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand > always use lea{q}, no matter what value which_alternative has. > > * gcc.target/i386/pr92841-2.c: New test. Jakub
Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote: > Outputting the move as RIP relative movq would work. > LC12 is string "s" and has nothing to do with stack protecting. This should fix it by doing more carefully what *movdi_internal does. Will bootstrap/regtest it tonight. 2019-12-19 Jakub Jelinek PR target/92841 * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand always use lea{q}, no matter what value which_alternative has. * gcc.target/i386/pr92841-2.c: New test. --- gcc/config/i386/i386.md.jj 2019-12-19 13:31:52.424108730 +0100 +++ gcc/config/i386/i386.md 2019-12-19 18:12:54.064747056 +0100 @@ -19863,12 +19863,13 @@ (define_insn "*stack_protect_set_3" { output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands); output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands); - if (which_alternative == 0) + if (pic_32bit_operand (operands[2], DImode)) +return "lea{q}\t{%E2, %1|%1, %E2}"; + else if (which_alternative == 0) return "mov{l}\t{%k2, %k1|%k1, %k2}"; else if (which_alternative == 2) return "movabs{q}\t{%2, %1|%1, %2}"; - else if (pic_32bit_operand (operands[2], DImode) - || ix86_use_lea_for_mov (insn, operands + 1)) + else if (ix86_use_lea_for_mov (insn, operands + 1)) return "lea{q}\t{%E2, %1|%1, %E2}"; else return "mov{q}\t{%2, %1|%1, %2}"; --- gcc/testsuite/gcc.target/i386/pr92841-2.c.jj2019-12-19 18:18:31.295587557 +0100 +++ gcc/testsuite/gcc.target/i386/pr92841-2.c 2019-12-19 18:19:37.800570175 +0100 @@ -0,0 +1,18 @@ +/* PR target/92841 */ +/* { dg-do compile { target { { { *-*-linux* } && lp64 } && fstack_protector } } } */ +/* { dg-options "-O2 -fpic -fstack-protector-strong -masm=att" } */ +/* { dg-final { scan-assembler "leaq\tbuf2\\\(%rip\\\)," } } */ + +static char buf2[64]; +void bar (char *, char *); + +void +foo () +{ + char buf[64]; + char *p = buf2; + asm ("" : "+a" (p)); + char *q = buf; + asm ("" : "+r" (q)); + bar (p, q); +} Jakub
Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
> On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote: > > I now get build failure of Firefox with LTO due to: > > > > movabsq $.LC12, %rax > > > > which is output by: > > > > (insn:TI 468 3 849 2 (parallel [ > > > > (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) > > > > (const_int -56 [0xffc8])) [1 D.22910+0 > > S8 A64]) > > (unspec:DI [ > > > > (mem/v/f:DI (const_int 40 [0x28]) [0 > > MEM[( long unsigned int *)40B]+0 S8 A64 AS1]) > > ] UNSPEC_SP_SET)) > > > > (set (reg:DI 0 ax [157]) > > > > (symbol_ref/f:DI ("*.LC12") [flags 0x2] > 0x756ac870 *.LC12>)) > > (clobber (reg:CC 17 flags)) > > > > ]) 1042 {*stack_protect_set_3} > > > > (expr_list:REG_UNUSED (reg:CC 17 flags) > > > > (nil))) > > > > > > I suppose we need to be more careful about the symbolic operands for > > movabs? > > Is that -fPIC or -fPIE or -fno-pic? > What were the two insns before peephole2? It is -fPIC transforming (insn 9 7 12 2 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -56 [0xffc8])) [1 D.22910+0 S8 A64]) (unspec:DI [ (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[( long unsigned int *)40B]+0 S8 A64 AS1]) ] UNSPEC_SP_SET)) (set (reg:DI 0 ax [157]) (const_int 0 [0])) (clobber (reg:CC 17 flags)) ]) "/aux/hubicka/firefox2019-trunktest/dist/include/testing/gtest/include/gtest/gtest.h":1538:17 1039 {stack_protect_set_1_di} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_UNUSED (reg:DI 0 ax [157]) (nil (note 12 9 13 2 NOTE_INSN_DELETED) (insn 13 12 426 2 (set (reg:DI 0 ax [159]) (symbol_ref/f:DI ("*.LC12") [flags 0x2] )) 66 {*movdi_internal} (nil)) to (insn 468 7 426 2 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -56 [0xffc8])) [1 D.22910+0 S8 A64]) (unspec:DI [ (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[( long unsigned int *)40B]+0 S8 A64 AS1]) ] UNSPEC_SP_SET)) (set (reg:DI 0 ax [157]) (symbol_ref/f:DI ("*.LC12") [flags 0x2] )) (clobber (reg:CC 17 flags)) ]) -1 (nil)) Outputting the move as RIP relative movq would work. LC12 is string "s" and has nothing to do with stack protecting.
Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote: > I now get build failure of Firefox with LTO due to: > > movabsq $.LC12, %rax > > which is output by: > > (insn:TI 468 3 849 2 (parallel [ > > (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) > > (const_int -56 [0xffc8])) [1 D.22910+0 S8 > A64]) > (unspec:DI [ > > (mem/v/f:DI (const_int 40 [0x28]) [0 > MEM[( long unsigned int *)40B]+0 S8 A64 AS1]) > ] UNSPEC_SP_SET)) > > (set (reg:DI 0 ax [157]) > > (symbol_ref/f:DI ("*.LC12") [flags 0x2] 0x756ac870 *.LC12>)) > (clobber (reg:CC 17 flags)) > > ]) 1042 {*stack_protect_set_3} > > (expr_list:REG_UNUSED (reg:CC 17 flags) > > (nil))) > > > I suppose we need to be more careful about the symbolic operands for > movabs? Is that -fPIC or -fPIE or -fno-pic? What were the two insns before peephole2? Jakub
Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
> Hi! > > I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0]) > testsuite fix Andreas pointed out). > > Thanks! > > On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote: > > 2019-12-10 Jakub Jelinek > > > > PR target/92841 > > * config/i386/i386.md (*stack_protect_set_2_, > > *stack_protect_set_3): New define_insns and corresponding > > define_peephole2s. > > > > * gcc.target/i386/pr92841.c: New test. I now get build failure of Firefox with LTO due to: movabsq $.LC12, %rax which is output by: (insn:TI 468 3 849 2 (parallel [ (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) (const_int -56 [0xffc8])) [1 D.22910+0 S8 A64]) (unspec:DI [ (mem/v/f:DI (const_int 40 [0x28]) [0 MEM[( long unsigned int *)40B]+0 S8 A64 AS1]) ] UNSPEC_SP_SET)) (set (reg:DI 0 ax [157]) (symbol_ref/f:DI ("*.LC12") [flags 0x2] )) (clobber (reg:CC 17 flags)) ]) 1042 {*stack_protect_set_3} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) I suppose we need to be more careful about the symbolic operands for movabs? Honza
Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)
On Tue, Dec 10, 2019 at 10:57 AM Jakub Jelinek wrote: > > Hi! > > The stack_protect_set_1_ pattern intentionally clears the register it > used as a temporary to read the canary from the register and push it back > on the stack for security reasons, to make sure the stack canary isn't > spilled somewhere else etc. On the following testcase, we end up with a > weird: > movq%fs:40, %rax > movq%rax, 24(%rsp) > xorl%eax, %eax > movl$30, %eax > sequence though, where the reporter rightfully complains it is a waste > to clear the register and immediately set it to something else. > > We really don't want to split this into two patterns, because then the > scheduler and whatever other post-RA passes could stick some further > code in between and increase the lifetime of the security sensitive > data in the register. > > So, the following patch uses peephole2 to merge the > stack_protect_set_1_ insn with following setter of the same register. > Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode > (unless actually emitted as SImode) moves don't overwrite the whole > register, and for simplicity only the most common cases (no XMM/MM etc. > sources). > > Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested > (just check-gcc check-c++-all check-target-libstdc++-v3 with > --target_board=unix/-fstack-protector-strong). The -fstack-protector-strong > testing was done with additional logging whenever the peephole2's kick in. > During the -m64 testing, it kicked in 17682 times, during -m32 testing > 6344 times. Ok for trunk? > > 2019-12-10 Jakub Jelinek > > PR target/92841 > * config/i386/i386.md (*stack_protect_set_2_, > *stack_protect_set_3): New define_insns and corresponding > define_peephole2s. > > * gcc.target/i386/pr92841.c: New test. OK with a couple of changes below. Thanks, Uros. > --- gcc/config/i386/i386.md.jj 2019-12-05 10:04:05.35723 +0100 > +++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100 > @@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_"mov{}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, > %2}\;xor{l}\t%k2, %k2" >[(set_attr "type" "multi")]) > > +;; Patterns and peephole2s to optimize stack_protect_set_1_ > +;; immediately followed by *mov{s,d}i_internal to the same register, > +;; where we can avoid the xor{l} above. We don't split this, so that > +;; scheduling or anything else doesn't separate the *stack_protect_set* > +;; pattern from the set of the register that overwrites the register > +;; with a new value. > +(define_insn "*stack_protect_set_2_" > + [(set (match_operand:PTR 0 "memory_operand" "=m") > + (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")] > + UNSPEC_SP_SET)) > + (set (match_operand:SI 1 "register_operand" "=") > + (match_operand:SI 2 "general_operand" "g")) > + (clobber (reg:CC FLAGS_REG))] > + "reload_completed > + && !reg_overlap_mentioned_p (operands[1], operands[2])" > +{ > + output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands); > + output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands); This looks much better than the current asm template in stack_protect_set_1_ and stack_protect_test_1_. Can you maybe also change templates there? > + if (pic_32bit_operand (operands[2], SImode) > + || ix86_use_lea_for_mov (insn, operands + 1)) > +return "lea{l}\t{%E2, %1|%1, %E2}"; > + else > +return "mov{l}\t{%2, %1|%1, %2}"; > +} > + [(set_attr "type" "multi") > + (set_attr "length" "24")]) > + > +(define_peephole2 > + [(parallel [(set (match_operand:PTR 0 "memory_operand") > + (unspec:PTR [(match_operand:PTR 1 "memory_operand")] > + UNSPEC_SP_SET)) > +(set (match_operand:PTR 2 "general_reg_operand") (const_int 0)) > +(clobber (reg:CC FLAGS_REG))]) > + (set (match_operand:SI 3 "general_reg_operand") > + (match_operand:SI 4 "general_operand"))] > + "reload_completed > + && REGNO (operands[2]) == REGNO (operands[3]) > + && !reg_overlap_mentioned_p (operands[3], operands[4]) > + && (general_reg_operand (operands[4], SImode) > + || !register_operand (operands[4], SImode))" > + [(parallel [(set (match_dup 0) > + (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET)) > +(set (match_dup 3) (match_dup 4)) > +(clobber (reg:CC FLAGS_REG))])]) No need for "reload_completed" in peephole2 patterns. Also (IIRC), operand predicate is not needed for operand 4 when it is mentioned in the insn condition. > +(define_insn "*stack_protect_set_3" > + [(set (match_operand:DI 0 "memory_operand" "=m,m,m") > + (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")] > + UNSPEC_SP_SET)) > + (set (match_operand:DI 1 "register_operand" "=,r,r") > + (match_operand:DI 2 "general_operand" "Z,rem,i")) > + (clobber (reg:CC FLAGS_REG))] > +
Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))
Hi! I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0]) testsuite fix Andreas pointed out). Thanks! On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote: > 2019-12-10 Jakub Jelinek > > PR target/92841 > * config/i386/i386.md (*stack_protect_set_2_, > *stack_protect_set_3): New define_insns and corresponding > define_peephole2s. > > * gcc.target/i386/pr92841.c: New test. Jakub
Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)
On Tue, Dec 10, 2019 at 11:02:39AM +0100, Andreas Schwab wrote: > On Dez 10 2019, Jakub Jelinek wrote: > > > --- gcc/testsuite/gcc.target/i386/pr92841.c.jj 2019-12-09 > > 19:38:29.572759215 +0100 > > +++ gcc/testsuite/gcc.target/i386/pr92841.c 2019-12-09 19:40:59.642492417 > > +0100 > > @@ -0,0 +1,17 @@ > > +/* PR target/92841 */ > > +/* { dg-do compile { target fstack_protector } } */ > > +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */ > > +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), > > %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */ > > + > > +const struct S { int b; } c[] = {30, 12, 20, 0, 11}; > > +void bar (int *); > > + > > +void > > +foo (void) > > +{ > > + int e[4]; > > + const struct S *a; > > + for (a = c; a < c + sizeof (c); a++) > > +if (a->b) > > This accesses c beyond bounds. Does that invalidate the test? Thanks for noticing, changed into for (a = c; a < c + sizeof (c) / sizeof (c[0]); a++) in my copy, the testcase still FAILs before the patch and PASSes with it. Jakub
Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)
On Dez 10 2019, Jakub Jelinek wrote: > --- gcc/testsuite/gcc.target/i386/pr92841.c.jj2019-12-09 > 19:38:29.572759215 +0100 > +++ gcc/testsuite/gcc.target/i386/pr92841.c 2019-12-09 19:40:59.642492417 > +0100 > @@ -0,0 +1,17 @@ > +/* PR target/92841 */ > +/* { dg-do compile { target fstack_protector } } */ > +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */ > +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), > %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */ > + > +const struct S { int b; } c[] = {30, 12, 20, 0, 11}; > +void bar (int *); > + > +void > +foo (void) > +{ > + int e[4]; > + const struct S *a; > + for (a = c; a < c + sizeof (c); a++) > +if (a->b) This accesses c beyond bounds. Does that invalidate the test? Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
[PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)
Hi! The stack_protect_set_1_ pattern intentionally clears the register it used as a temporary to read the canary from the register and push it back on the stack for security reasons, to make sure the stack canary isn't spilled somewhere else etc. On the following testcase, we end up with a weird: movq%fs:40, %rax movq%rax, 24(%rsp) xorl%eax, %eax movl$30, %eax sequence though, where the reporter rightfully complains it is a waste to clear the register and immediately set it to something else. We really don't want to split this into two patterns, because then the scheduler and whatever other post-RA passes could stick some further code in between and increase the lifetime of the security sensitive data in the register. So, the following patch uses peephole2 to merge the stack_protect_set_1_ insn with following setter of the same register. Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode (unless actually emitted as SImode) moves don't overwrite the whole register, and for simplicity only the most common cases (no XMM/MM etc. sources). Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested (just check-gcc check-c++-all check-target-libstdc++-v3 with --target_board=unix/-fstack-protector-strong). The -fstack-protector-strong testing was done with additional logging whenever the peephole2's kick in. During the -m64 testing, it kicked in 17682 times, during -m32 testing 6344 times. Ok for trunk? 2019-12-10 Jakub Jelinek PR target/92841 * config/i386/i386.md (*stack_protect_set_2_, *stack_protect_set_3): New define_insns and corresponding define_peephole2s. * gcc.target/i386/pr92841.c: New test. --- gcc/config/i386/i386.md.jj 2019-12-05 10:04:05.35723 +0100 +++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100 @@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, %2}\;xor{l}\t%k2, %k2" [(set_attr "type" "multi")]) +;; Patterns and peephole2s to optimize stack_protect_set_1_ +;; immediately followed by *mov{s,d}i_internal to the same register, +;; where we can avoid the xor{l} above. We don't split this, so that +;; scheduling or anything else doesn't separate the *stack_protect_set* +;; pattern from the set of the register that overwrites the register +;; with a new value. +(define_insn "*stack_protect_set_2_" + [(set (match_operand:PTR 0 "memory_operand" "=m") + (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")] + UNSPEC_SP_SET)) + (set (match_operand:SI 1 "register_operand" "=") + (match_operand:SI 2 "general_operand" "g")) + (clobber (reg:CC FLAGS_REG))] + "reload_completed + && !reg_overlap_mentioned_p (operands[1], operands[2])" +{ + output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands); + output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands); + if (pic_32bit_operand (operands[2], SImode) + || ix86_use_lea_for_mov (insn, operands + 1)) +return "lea{l}\t{%E2, %1|%1, %E2}"; + else +return "mov{l}\t{%2, %1|%1, %2}"; +} + [(set_attr "type" "multi") + (set_attr "length" "24")]) + +(define_peephole2 + [(parallel [(set (match_operand:PTR 0 "memory_operand") + (unspec:PTR [(match_operand:PTR 1 "memory_operand")] + UNSPEC_SP_SET)) +(set (match_operand:PTR 2 "general_reg_operand") (const_int 0)) +(clobber (reg:CC FLAGS_REG))]) + (set (match_operand:SI 3 "general_reg_operand") + (match_operand:SI 4 "general_operand"))] + "reload_completed + && REGNO (operands[2]) == REGNO (operands[3]) + && !reg_overlap_mentioned_p (operands[3], operands[4]) + && (general_reg_operand (operands[4], SImode) + || !register_operand (operands[4], SImode))" + [(parallel [(set (match_dup 0) + (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET)) +(set (match_dup 3) (match_dup 4)) +(clobber (reg:CC FLAGS_REG))])]) + +(define_insn "*stack_protect_set_3" + [(set (match_operand:DI 0 "memory_operand" "=m,m,m") + (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")] + UNSPEC_SP_SET)) + (set (match_operand:DI 1 "register_operand" "=,r,r") + (match_operand:DI 2 "general_operand" "Z,rem,i")) + (clobber (reg:CC FLAGS_REG))] + "TARGET_64BIT + && reload_completed + && !reg_overlap_mentioned_p (operands[1], operands[2])" +{ + output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands); + output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands); + if (which_alternative == 0) +return "mov{l}\t{%k2, %k1|%k1, %k2}"; + else if (which_alternative == 2) +return "movabs{q}\t{%2, %1|%1, %2}"; + else if (pic_32bit_operand (operands[2], DImode) + || ix86_use_lea_for_mov (insn, operands + 1)) +return "lea{q}\t{%E2, %1|%1, %E2}"; + else +return "mov{q}\t{%2, %1|%1, %2}"; +} + [(set_attr "type" "multi") +