Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

2018-02-12 Thread Linus Torvalds
On Mon, Feb 12, 2018 at 5:29 AM, Denys Vlasenko  wrote:
>
> xorq's are slower than xorl's on Silvermont/Knights Landing.
> I propose using xorl instead.

Makes sense, and matches the other 'xorl'.

I suspect the only reason it uses 'xorq' for the high regs is that the
register names for the high regs are "odd" and don't match the legacy
register names. You need to use "%r15d" for the 32-bit version.

   Linus
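
[Editor's note: a quick sketch of the two equivalent zeroing forms for a numbered register, assuming GNU as (AT&T) syntax; a 32-bit write implicitly clears the upper half of the 64-bit register, so both clear all of %r15.]

```asm
	xorq	%r15,  %r15	/* 64-bit form */
	xorl	%r15d, %r15d	/* 32-bit form: same architectural effect,
				 * since 32-bit writes zero-extend to 64 bits */
```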


Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

2018-02-12 Thread Denys Vlasenko

On 02/12/2018 02:36 PM, David Laight wrote:

From: Denys Vlasenko

Sent: 12 February 2018 13:29

...


x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS.
This macro uses PUSH instead of MOV and should therefore be faster, at
least on newer CPUs.

...

Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
Signed-off-by: Ingo Molnar 
---
   arch/x86/entry/calling.h  | 36 ++++++++++++++++++++++++++++++++++++
   arch/x86/entry/entry_64.S |  6 ++----
   2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a05cbb8..57b1b87 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
	UNWIND_HINT_REGS offset=\offset
	.endm

+	.macro PUSH_AND_CLEAR_REGS
+	/*
+	 * Push registers and sanitize registers of values that a
+	 * speculation attack might otherwise want to exploit. The
+	 * lower registers are likely clobbered well before they
+	 * could be put to use in a speculative execution gadget.
+	 * Interleave XOR with PUSH for better uop scheduling:
+	 */
+	pushq	%rdi		/* pt_regs->di */
+	pushq	%rsi		/* pt_regs->si */
+	pushq	%rdx		/* pt_regs->dx */
+	pushq	%rcx		/* pt_regs->cx */
+	pushq	%rax		/* pt_regs->ax */
+	pushq	%r8		/* pt_regs->r8 */
+	xorq	%r8, %r8	/* nospec   r8 */


xorq's are slower than xorl's on Silvermont/Knights Landing.
I propose using xorl instead.


Does using movq to copy the first zero to the other registers make
the code any faster?

ISTR mov reg-reg is often implemented as a register rename rather than an
alu operation.


xorl of a register with itself is handled in register rename as well.
For some reason, xorq just did not get the same treatment on those CPUs.


RE: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

2018-02-12 Thread David Laight
From: Denys Vlasenko
> Sent: 12 February 2018 13:29
...
> >
> > x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro
> >
> > Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
> > SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS.
> > This macro uses PUSH instead of MOV and should therefore be faster, at
> > least on newer CPUs.
...
> > Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
> > Signed-off-by: Ingo Molnar 
> > ---
> >   arch/x86/entry/calling.h  | 36 ++++++++++++++++++++++++++++++++++++
> >   arch/x86/entry/entry_64.S |  6 ++----
> >   2 files changed, 38 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> > index a05cbb8..57b1b87 100644
> > --- a/arch/x86/entry/calling.h
> > +++ b/arch/x86/entry/calling.h
> > @@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
> > 	UNWIND_HINT_REGS offset=\offset
> > 	.endm
> >
> > +	.macro PUSH_AND_CLEAR_REGS
> > +	/*
> > +	 * Push registers and sanitize registers of values that a
> > +	 * speculation attack might otherwise want to exploit. The
> > +	 * lower registers are likely clobbered well before they
> > +	 * could be put to use in a speculative execution gadget.
> > +	 * Interleave XOR with PUSH for better uop scheduling:
> > +	 */
> > +	pushq	%rdi		/* pt_regs->di */
> > +	pushq	%rsi		/* pt_regs->si */
> > +	pushq	%rdx		/* pt_regs->dx */
> > +	pushq	%rcx		/* pt_regs->cx */
> > +	pushq	%rax		/* pt_regs->ax */
> > +	pushq	%r8		/* pt_regs->r8 */
> > +	xorq	%r8, %r8	/* nospec   r8 */
> 
> xorq's are slower than xorl's on Silvermont/Knights Landing.
> I propose using xorl instead.

Does using movq to copy the first zero to the other registers make
the code any faster?

ISTR mov reg-reg is often implemented as a register rename rather than an
alu operation.

David



Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

2018-02-12 Thread Denys Vlasenko

On 02/12/2018 11:17 AM, tip-bot for Dominik Brodowski wrote:

Commit-ID:  7b7b09f110f06f3c006e9b3f4590f7d9ba91345b
Gitweb: https://git.kernel.org/tip/7b7b09f110f06f3c006e9b3f4590f7d9ba91345b
Author: Dominik Brodowski 
AuthorDate: Sun, 11 Feb 2018 11:49:45 +0100
Committer:  Ingo Molnar 
CommitDate: Mon, 12 Feb 2018 08:06:36 +0100

x86/entry/64: Introduce the PUSH_AND_CLEAR_REGS macro

Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAR_REGS.
This macro uses PUSH instead of MOV and should therefore be faster, at
least on newer CPUs.

Suggested-by: Linus Torvalds 
Signed-off-by: Dominik Brodowski 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Brian Gerst 
Cc: Denys Vlasenko 
Cc: H. Peter Anvin 
Cc: Josh Poimboeuf 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dan.j.willi...@intel.com
Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
Signed-off-by: Ingo Molnar 
---
  arch/x86/entry/calling.h  | 36 ++++++++++++++++++++++++++++++++++++
  arch/x86/entry/entry_64.S |  6 ++----
  2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index a05cbb8..57b1b87 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
	UNWIND_HINT_REGS offset=\offset
	.endm

+	.macro PUSH_AND_CLEAR_REGS
+	/*
+	 * Push registers and sanitize registers of values that a
+	 * speculation attack might otherwise want to exploit. The
+	 * lower registers are likely clobbered well before they
+	 * could be put to use in a speculative execution gadget.
+	 * Interleave XOR with PUSH for better uop scheduling:
+	 */
+	pushq	%rdi		/* pt_regs->di */
+	pushq	%rsi		/* pt_regs->si */
+	pushq	%rdx		/* pt_regs->dx */
+	pushq	%rcx		/* pt_regs->cx */
+	pushq	%rax		/* pt_regs->ax */
+	pushq	%r8		/* pt_regs->r8 */
+	xorq	%r8, %r8	/* nospec   r8 */


xorq's are slower than xorl's on Silvermont/Knights Landing.
I propose using xorl instead.


+	pushq	%r9		/* pt_regs->r9 */
+	xorq	%r9, %r9	/* nospec   r9 */
+	pushq	%r10		/* pt_regs->r10 */
+	xorq	%r10, %r10	/* nospec   r10 */
+	pushq	%r11		/* pt_regs->r11 */
+	xorq	%r11, %r11	/* nospec   r11 */
+	pushq	%rbx		/* pt_regs->rbx */
+	xorl	%ebx, %ebx	/* nospec   rbx */
+	pushq	%rbp		/* pt_regs->rbp */
+	xorl	%ebp, %ebp	/* nospec   rbp */
+	pushq	%r12		/* pt_regs->r12 */
+	xorq	%r12, %r12	/* nospec   r12 */
+	pushq	%r13		/* pt_regs->r13 */
+	xorq	%r13, %r13	/* nospec   r13 */
+	pushq	%r14		/* pt_regs->r14 */
+	xorq	%r14, %r14	/* nospec   r14 */
+	pushq	%r15		/* pt_regs->r15 */
+	xorq	%r15, %r15	/* nospec   r15 */
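
[Editor's note: for illustration only, not the committed code. With Denys's suggestion applied, the clearing instructions for the numbered registers would use their 32-bit aliases, e.g.:]

```asm
	pushq	%r8		/* pt_regs->r8 */
	xorl	%r8d, %r8d	/* nospec r8 */
	pushq	%r9		/* pt_regs->r9 */
	xorl	%r9d, %r9d	/* nospec r9 */
	/* ...and likewise for %r10-%r15 via %r10d-%r15d */
```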

