Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Uros Bizjak
On Fri, Dec 20, 2019 at 12:26 AM Jakub Jelinek  wrote:
>
> On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote:
> > On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> > > Outputting the move as RIP relative movq would work.
> > > LC12 is string "s" and has nothing to do with stack protecting.
> >
> > This should fix it by doing more carefully what *movdi_internal
> > does.  Will bootstrap/regtest it tonight.
>
> Passed bootstrap/regtest on x86_64-linux and i686-linux.  Ok for trunk?
> >
> > 2019-12-19  Jakub Jelinek  
> >
> >   PR target/92841
> >   * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
> >   always use lea{q}, no matter what value which_alternative has.
> >
> >   * gcc.target/i386/pr92841-2.c: New test.

LGTM.

Thanks,
Uros.

> Jakub
>


Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Jakub Jelinek
On Thu, Dec 19, 2019 at 06:23:59PM +0100, Jakub Jelinek wrote:
> On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> > Outputting the move as RIP relative movq would work.
> > LC12 is string "s" and has nothing to do with stack protecting.
> 
> This should fix it by doing more carefully what *movdi_internal
> does.  Will bootstrap/regtest it tonight.

Passed bootstrap/regtest on x86_64-linux and i686-linux.  Ok for trunk?
> 
> 2019-12-19  Jakub Jelinek  
> 
>   PR target/92841
>   * config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
>   always use lea{q}, no matter what value which_alternative has.
> 
>   * gcc.target/i386/pr92841-2.c: New test.

Jakub



Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Jakub Jelinek
On Thu, Dec 19, 2019 at 04:50:40PM +0100, Jan Hubicka wrote:
> Outputting the move as RIP relative movq would work.
> LC12 is string "s" and has nothing to do with stack protecting.

This should fix it by doing more carefully what *movdi_internal
does.  Will bootstrap/regtest it tonight.

2019-12-19  Jakub Jelinek  

PR target/92841
* config/i386/i386.md (*stack_protect_set_3): For pic_32bit_operand
always use lea{q}, no matter what value which_alternative has.

* gcc.target/i386/pr92841-2.c: New test.

--- gcc/config/i386/i386.md.jj  2019-12-19 13:31:52.424108730 +0100
+++ gcc/config/i386/i386.md 2019-12-19 18:12:54.064747056 +0100
@@ -19863,12 +19863,13 @@ (define_insn "*stack_protect_set_3"
 {
   output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
   output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
-  if (which_alternative == 0)
+  if (pic_32bit_operand (operands[2], DImode))
+return "lea{q}\t{%E2, %1|%1, %E2}";
+  else if (which_alternative == 0)
 return "mov{l}\t{%k2, %k1|%k1, %k2}";
   else if (which_alternative == 2)
 return "movabs{q}\t{%2, %1|%1, %2}";
-  else if (pic_32bit_operand (operands[2], DImode)
-  || ix86_use_lea_for_mov (insn, operands + 1))
+  else if (ix86_use_lea_for_mov (insn, operands + 1))
 return "lea{q}\t{%E2, %1|%1, %E2}";
   else
 return "mov{q}\t{%2, %1|%1, %2}";
--- gcc/testsuite/gcc.target/i386/pr92841-2.c.jj2019-12-19 
18:18:31.295587557 +0100
+++ gcc/testsuite/gcc.target/i386/pr92841-2.c   2019-12-19 18:19:37.800570175 
+0100
@@ -0,0 +1,18 @@
+/* PR target/92841 */
+/* { dg-do compile { target { { { *-*-linux* } && lp64 } && fstack_protector } 
} } */
+/* { dg-options "-O2 -fpic -fstack-protector-strong -masm=att" } */
+/* { dg-final { scan-assembler "leaq\tbuf2\\\(%rip\\\)," } } */
+
+static char buf2[64];
+void bar (char *, char *);
+
+void
+foo ()
+{
+  char buf[64];
+  char *p = buf2;
+  asm ("" : "+a" (p));
+  char *q = buf;
+  asm ("" : "+r" (q));
+  bar (p, q);
+}


Jakub



Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Jan Hubicka
> On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote:
> > I now get build failure of Firefox with LTO due to:
> > 
> > movabsq $.LC12, %rax
> > 
> > which is output by:
> > 
> > (insn:TI 468 3 849 2 (parallel [
> > 
> > (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) 
> > 
> > (const_int -56 [0xffc8])) [1 D.22910+0 
> > S8 A64])
> > (unspec:DI [
> > 
> > (mem/v/f:DI (const_int 40 [0x28]) [0 
> > MEM[( long unsigned int *)40B]+0 S8 A64 AS1])
> > ] UNSPEC_SP_SET))   
> > 
> > (set (reg:DI 0 ax [157])
> > 
> > (symbol_ref/f:DI ("*.LC12") [flags 0x2]   > 0x756ac870 *.LC12>))
> > (clobber (reg:CC 17 flags)) 
> > 
> > ]) 1042 {*stack_protect_set_3}  
> > 
> >  (expr_list:REG_UNUSED (reg:CC 17 flags)
> > 
> > (nil))) 
> > 
> > 
> > I suppose we need to be more careful about the symbolic operands for
> > movabs?
> 
> Is that -fPIC or -fPIE or -fno-pic?
> What were the two insns before peephole2?
It is -fPIC transforming

(insn 9 7 12 2 (parallel [  
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) 
(const_int -56 [0xffc8])) [1 D.22910+0 S8 
A64])
(unspec:DI [
(mem/v/f:DI (const_int 40 [0x28]) [0 
MEM[( long unsigned int *)40B]+0 S8 A64 AS1])
] UNSPEC_SP_SET))   
(set (reg:DI 0 ax [157])
(const_int 0 [0]))  
(clobber (reg:CC 17 flags)) 
]) 
"/aux/hubicka/firefox2019-trunktest/dist/include/testing/gtest/include/gtest/gtest.h":1538:17
 1039 {stack_protect_set_1_di}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_UNUSED (reg:DI 0 ax [157])   
(nil
 
(note 12 9 13 2 NOTE_INSN_DELETED)  
(insn 13 12 426 2 (set (reg:DI 0 ax [159])  
(symbol_ref/f:DI ("*.LC12") [flags 0x2]  )) 66 {*movdi_internal}
 (nil)) 

to

(insn 468 7 426 2 (parallel [   
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) 
(const_int -56 [0xffc8])) [1 D.22910+0 S8 
A64])
(unspec:DI [
(mem/v/f:DI (const_int 40 [0x28]) [0 
MEM[( long unsigned int *)40B]+0 S8 A64 AS1])
] UNSPEC_SP_SET))   
(set (reg:DI 0 ax [157])
(symbol_ref/f:DI ("*.LC12") [flags 0x2]  ))
(clobber (reg:CC 17 flags)) 
]) -1   
 (nil)) 

Outputting the move as RIP relative movq would work.
LC12 is string "s" and has nothing to do with stack protecting.


Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Jakub Jelinek
On Thu, Dec 19, 2019 at 04:01:31PM +0100, Jan Hubicka wrote:
> I now get build failure of Firefox with LTO due to:
> 
> movabsq $.LC12, %rax
> 
> which is output by:
> 
> (insn:TI 468 3 849 2 (parallel [  
>   
> (set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp)   
>   
> (const_int -56 [0xffc8])) [1 D.22910+0 S8 
> A64])
> (unspec:DI [  
>   
> (mem/v/f:DI (const_int 40 [0x28]) [0 
> MEM[( long unsigned int *)40B]+0 S8 A64 AS1])
> ] UNSPEC_SP_SET)) 
>   
> (set (reg:DI 0 ax [157])  
>   
> (symbol_ref/f:DI ("*.LC12") [flags 0x2]   0x756ac870 *.LC12>))
> (clobber (reg:CC 17 flags))   
>   
> ]) 1042 {*stack_protect_set_3}
>   
>  (expr_list:REG_UNUSED (reg:CC 17 flags)  
>   
> (nil)))   
>   
> 
> I suppose we need to be more careful about the symbolic operands for
> movabs?

Is that -fPIC or -fPIE or -fno-pic?
What were the two insns before peephole2?

Jakub



Re: Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-19 Thread Jan Hubicka
> Hi!
> 
> I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0])
> testsuite fix Andreas pointed out).
> 
> Thanks!
> 
> On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote:
> > 2019-12-10  Jakub Jelinek  
> > 
> > PR target/92841
> > * config/i386/i386.md (*stack_protect_set_2_,
> > *stack_protect_set_3): New define_insns and corresponding
> > define_peephole2s.
> > 
> > * gcc.target/i386/pr92841.c: New test.

I now get build failure of Firefox with LTO due to:

movabsq $.LC12, %rax

which is output by:

(insn:TI 468 3 849 2 (parallel [
(set (mem/v/f/c:DI (plus:DI (reg/f:DI 6 bp) 
(const_int -56 [0xffc8])) [1 D.22910+0 S8 
A64])
(unspec:DI [
(mem/v/f:DI (const_int 40 [0x28]) [0 
MEM[( long unsigned int *)40B]+0 S8 A64 AS1])
] UNSPEC_SP_SET))   
(set (reg:DI 0 ax [157])
(symbol_ref/f:DI ("*.LC12") [flags 0x2]  ))
(clobber (reg:CC 17 flags)) 
]) 1042 {*stack_protect_set_3}  
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))) 

I suppose we need to be more careful about the symbolic operands for
movabs?

Honza


Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-17 Thread Uros Bizjak
On Tue, Dec 10, 2019 at 10:57 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The stack_protect_set_1_ pattern intentionally clears the register it
> used as a temporary to read the canary from the register and push it back
> on the stack for security reasons, to make sure the stack canary isn't
> spilled somewhere else etc.  On the following testcase, we end up with a
> weird:
> movq%fs:40, %rax
> movq%rax, 24(%rsp)
> xorl%eax, %eax
> movl$30, %eax
> sequence though, where the reporter rightfully complains it is a waste
> to clear the register and immediately set it to something else.
>
> We really don't want to split this into two patterns, because then the
> scheduler and whatever other post-RA passes could stick some further
> code in between and increase the lifetime of the security sensitive
> data in the register.
>
> So, the following patch uses peephole2 to merge the
> stack_protect_set_1_ insn with following setter of the same register.
> Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
> (unless actually emitted as SImode) moves don't overwrite the whole
> register, and for simplicity only the most common cases (no XMM/MM etc.
> sources).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
> (just check-gcc check-c++-all check-target-libstdc++-v3 with
> --target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
> testing was done with additional logging whenever the peephole2's kick in.
> During the -m64 testing, it kicked in 17682 times, during -m32 testing
> 6344 times.  Ok for trunk?
>
> 2019-12-10  Jakub Jelinek  
>
> PR target/92841
> * config/i386/i386.md (*stack_protect_set_2_,
> *stack_protect_set_3): New define_insns and corresponding
> define_peephole2s.
>
> * gcc.target/i386/pr92841.c: New test.

OK with a couple of changes below.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2019-12-05 10:04:05.35723 +0100
> +++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100
> @@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_"mov{}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, 
> %2}\;xor{l}\t%k2, %k2"
>[(set_attr "type" "multi")])
>
> +;; Patterns and peephole2s to optimize stack_protect_set_1_
> +;; immediately followed by *mov{s,d}i_internal to the same register,
> +;; where we can avoid the xor{l} above.  We don't split this, so that
> +;; scheduling or anything else doesn't separate the *stack_protect_set*
> +;; pattern from the set of the register that overwrites the register
> +;; with a new value.
> +(define_insn "*stack_protect_set_2_"
> +  [(set (match_operand:PTR 0 "memory_operand" "=m")
> +   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
> +   UNSPEC_SP_SET))
> +   (set (match_operand:SI 1 "register_operand" "=")
> +   (match_operand:SI 2 "general_operand" "g"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "reload_completed
> +   && !reg_overlap_mentioned_p (operands[1], operands[2])"
> +{
> +  output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands);
> +  output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands);

This looks much better than the current asm template in
stack_protect_set_1_ and stack_protect_test_1_. Can you
maybe also change templates there?

> +  if (pic_32bit_operand (operands[2], SImode)
> +  || ix86_use_lea_for_mov (insn, operands + 1))
> +return "lea{l}\t{%E2, %1|%1, %E2}";
> +  else
> +return "mov{l}\t{%2, %1|%1, %2}";
> +}
> +  [(set_attr "type" "multi")
> +   (set_attr "length" "24")])
> +
> +(define_peephole2
> + [(parallel [(set (match_operand:PTR 0 "memory_operand")
> + (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
> + UNSPEC_SP_SET))
> +(set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
> +(clobber (reg:CC FLAGS_REG))])
> +  (set (match_operand:SI 3 "general_reg_operand")
> +   (match_operand:SI 4 "general_operand"))]
> + "reload_completed
> +  && REGNO (operands[2]) == REGNO (operands[3])
> +  && !reg_overlap_mentioned_p (operands[3], operands[4])
> +  && (general_reg_operand (operands[4], SImode)
> +  || !register_operand (operands[4], SImode))"
> + [(parallel [(set (match_dup 0)
> + (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
> +(set (match_dup 3) (match_dup 4))
> +(clobber (reg:CC FLAGS_REG))])])

No need for "reload_completed" in peephole2 patterns. Also (IIRC),
operand predicate is not needed for operand 4 when it is mentioned in
the insn condition.

> +(define_insn "*stack_protect_set_3"
> +  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
> +   (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
> +  UNSPEC_SP_SET))
> +   (set (match_operand:DI 1 "register_operand" "=,r,r")
> +   (match_operand:DI 2 "general_operand" "Z,rem,i"))
> +   (clobber (reg:CC FLAGS_REG))]
> +  

Patch ping (was Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841))

2019-12-17 Thread Jakub Jelinek
Hi!

I'd like to ping this patch (with the sizeof (c) -> sizeof (c) / sizeof (c[0])
testsuite fix Andreas pointed out).

Thanks!

On Tue, Dec 10, 2019 at 10:57:35AM +0100, Jakub Jelinek wrote:
> 2019-12-10  Jakub Jelinek  
> 
>   PR target/92841
>   * config/i386/i386.md (*stack_protect_set_2_,
>   *stack_protect_set_3): New define_insns and corresponding
>   define_peephole2s.
> 
>   * gcc.target/i386/pr92841.c: New test.

Jakub



Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Jakub Jelinek
On Tue, Dec 10, 2019 at 11:02:39AM +0100, Andreas Schwab wrote:
> On Dez 10 2019, Jakub Jelinek wrote:
> 
> > --- gcc/testsuite/gcc.target/i386/pr92841.c.jj  2019-12-09 
> > 19:38:29.572759215 +0100
> > +++ gcc/testsuite/gcc.target/i386/pr92841.c 2019-12-09 19:40:59.642492417 
> > +0100
> > @@ -0,0 +1,17 @@
> > +/* PR target/92841 */
> > +/* { dg-do compile { target fstack_protector } } */
> > +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> > +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), 
> > %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> > +
> > +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> > +void bar (int *);
> > +
> > +void
> > +foo (void)
> > +{
> > +  int e[4];
> > +  const struct S *a;
> > +  for (a = c; a < c + sizeof (c); a++)
> > +if (a->b)
> 
> This accesses c beyond bounds.  Does that invalidate the test?

Thanks for noticing, changed into
  for (a = c; a < c + sizeof (c) / sizeof (c[0]); a++)
in my copy, the testcase still FAILs before the patch and PASSes with it.

Jakub



Re: [PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Andreas Schwab
On Dez 10 2019, Jakub Jelinek wrote:

> --- gcc/testsuite/gcc.target/i386/pr92841.c.jj2019-12-09 
> 19:38:29.572759215 +0100
> +++ gcc/testsuite/gcc.target/i386/pr92841.c   2019-12-09 19:40:59.642492417 
> +0100
> @@ -0,0 +1,17 @@
> +/* PR target/92841 */
> +/* { dg-do compile { target fstack_protector } } */
> +/* { dg-options "-O2 -fstack-protector-strong -masm=att" } */
> +/* { dg-final { scan-assembler-not "xor\[lq]\t%(\[re]\[a-z0-9]*), 
> %\\1\[\n\r]*\tmov\[lq]\t\[^\n\r]*, %\\1" } } */
> +
> +const struct S { int b; } c[] = {30, 12, 20, 0, 11};
> +void bar (int *);
> +
> +void
> +foo (void)
> +{
> +  int e[4];
> +  const struct S *a;
> +  for (a = c; a < c + sizeof (c); a++)
> +if (a->b)

This accesses c beyond bounds.  Does that invalidate the test?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH] Oprimize stack_protect_set_1_ followed by a move to the same register (PR target/92841)

2019-12-10 Thread Jakub Jelinek
Hi!

The stack_protect_set_1_ pattern intentionally clears the register it
used as a temporary to read the canary from the register and push it back
on the stack for security reasons, to make sure the stack canary isn't
spilled somewhere else etc.  On the following testcase, we end up with a
weird:
movq%fs:40, %rax
movq%rax, 24(%rsp)
xorl%eax, %eax
movl$30, %eax
sequence though, where the reporter rightfully complains it is a waste
to clear the register and immediately set it to something else.

We really don't want to split this into two patterns, because then the
scheduler and whatever other post-RA passes could stick some further
code in between and increase the lifetime of the security sensitive
data in the register.

So, the following patch uses peephole2 to merge the
stack_protect_set_1_ insn with following setter of the same register.
Only SImode and for TARGET_64BIT DImode moves are considered, as QI/HImode
(unless actually emitted as SImode) moves don't overwrite the whole
register, and for simplicity only the most common cases (no XMM/MM etc.
sources).

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally tested
(just check-gcc check-c++-all check-target-libstdc++-v3 with
--target_board=unix/-fstack-protector-strong).  The -fstack-protector-strong
testing was done with additional logging whenever the peephole2's kick in.
During the -m64 testing, it kicked in 17682 times, during -m32 testing
6344 times.  Ok for trunk?

2019-12-10  Jakub Jelinek  

PR target/92841
* config/i386/i386.md (*stack_protect_set_2_,
*stack_protect_set_3): New define_insns and corresponding
define_peephole2s.

* gcc.target/i386/pr92841.c: New test.

--- gcc/config/i386/i386.md.jj  2019-12-05 10:04:05.35723 +0100
+++ gcc/config/i386/i386.md 2019-12-09 19:29:31.578885594 +0100
@@ -19732,6 +19732,98 @@ (define_insn "@stack_protect_set_1_}\t{%1, %2|%2, %1}\;mov{}\t{%2, %0|%0, 
%2}\;xor{l}\t%k2, %k2"
   [(set_attr "type" "multi")])
 
+;; Patterns and peephole2s to optimize stack_protect_set_1_
+;; immediately followed by *mov{s,d}i_internal to the same register,
+;; where we can avoid the xor{l} above.  We don't split this, so that
+;; scheduling or anything else doesn't separate the *stack_protect_set*
+;; pattern from the set of the register that overwrites the register
+;; with a new value.
+(define_insn "*stack_protect_set_2_"
+  [(set (match_operand:PTR 0 "memory_operand" "=m")
+   (unspec:PTR [(match_operand:PTR 3 "memory_operand" "m")]
+   UNSPEC_SP_SET))
+   (set (match_operand:SI 1 "register_operand" "=")
+   (match_operand:SI 2 "general_operand" "g"))
+   (clobber (reg:CC FLAGS_REG))]
+  "reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{}\t{%1, %0|%0, %1}", operands);
+  if (pic_32bit_operand (operands[2], SImode)
+  || ix86_use_lea_for_mov (insn, operands + 1))
+return "lea{l}\t{%E2, %1|%1, %E2}";
+  else
+return "mov{l}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+   (set_attr "length" "24")])
+
+(define_peephole2
+ [(parallel [(set (match_operand:PTR 0 "memory_operand")
+ (unspec:PTR [(match_operand:PTR 1 "memory_operand")]
+ UNSPEC_SP_SET))
+(set (match_operand:PTR 2 "general_reg_operand") (const_int 0))
+(clobber (reg:CC FLAGS_REG))])
+  (set (match_operand:SI 3 "general_reg_operand")
+   (match_operand:SI 4 "general_operand"))]
+ "reload_completed
+  && REGNO (operands[2]) == REGNO (operands[3])
+  && !reg_overlap_mentioned_p (operands[3], operands[4])
+  && (general_reg_operand (operands[4], SImode)
+  || !register_operand (operands[4], SImode))"
+ [(parallel [(set (match_dup 0)
+ (unspec:PTR [(match_dup 1)] UNSPEC_SP_SET))
+(set (match_dup 3) (match_dup 4))
+(clobber (reg:CC FLAGS_REG))])])
+
+(define_insn "*stack_protect_set_3"
+  [(set (match_operand:DI 0 "memory_operand" "=m,m,m")
+   (unspec:DI [(match_operand:DI 3 "memory_operand" "m,m,m")]
+  UNSPEC_SP_SET))
+   (set (match_operand:DI 1 "register_operand" "=,r,r")
+   (match_operand:DI 2 "general_operand" "Z,rem,i"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_64BIT
+   && reload_completed
+   && !reg_overlap_mentioned_p (operands[1], operands[2])"
+{
+  output_asm_insn ("mov{q}\t{%3, %1|%1, %3}", operands);
+  output_asm_insn ("mov{q}\t{%1, %0|%0, %1}", operands);
+  if (which_alternative == 0)
+return "mov{l}\t{%k2, %k1|%k1, %k2}";
+  else if (which_alternative == 2)
+return "movabs{q}\t{%2, %1|%1, %2}";
+  else if (pic_32bit_operand (operands[2], DImode)
+  || ix86_use_lea_for_mov (insn, operands + 1))
+return "lea{q}\t{%E2, %1|%1, %E2}";
+  else
+return "mov{q}\t{%2, %1|%1, %2}";
+}
+  [(set_attr "type" "multi")
+