Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
On Mon, Feb 12, 2018 at 5:29 AM, Denys Vlasenko wrote:
>
> xorq's are slower than xorl's on Silvermont/Knights Landing.
> I propose using xorl instead.

Makes sense, and matches the other 'xorl'.

I suspect the only reason it uses 'xorq' for the high regs is that the
register names for the high regs are "odd" and don't match the legacy
register names. You need to use "%r15d" for the 32-bit version.

                Linus
Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
On 02/12/2018 02:36 PM, David Laight wrote:
> From: Denys Vlasenko
>> Sent: 12 February 2018 13:29
> ...
>>> x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
>>>
>>> Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
>>> SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
>>> This macro uses PUSH instead of MOV and should therefore be faster, at
>>> least on newer CPUs.
> ...
>>> Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
>>> Signed-off-by: Ingo Molnar
>>> ---
>>>  arch/x86/entry/calling.h  | 36
>>>  arch/x86/entry/entry_64.S |  6 ++
>>>  2 files changed, 38 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>>> index a05cbb8..57b1b87 100644
>>> --- a/arch/x86/entry/calling.h
>>> +++ b/arch/x86/entry/calling.h
>>> @@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
>>>  	UNWIND_HINT_REGS offset=\offset
>>>  	.endm
>>>
>>> +	.macro PUSH_AND_CLEAR_REGS
>>> +	/*
>>> +	 * Push registers and sanitize registers of values that a
>>> +	 * speculation attack might otherwise want to exploit. The
>>> +	 * lower registers are likely clobbered well before they
>>> +	 * could be put to use in a speculative execution gadget.
>>> +	 * Interleave XOR with PUSH for better uop scheduling:
>>> +	 */
>>> +	pushq	%rdi		/* pt_regs->di */
>>> +	pushq	%rsi		/* pt_regs->si */
>>> +	pushq	%rdx		/* pt_regs->dx */
>>> +	pushq	%rcx		/* pt_regs->cx */
>>> +	pushq	%rax		/* pt_regs->ax */
>>> +	pushq	%r8		/* pt_regs->r8 */
>>> +	xorq	%r8, %r8	/* nospec r8 */
>>
>> xorq's are slower than xorl's on Silvermont/Knights Landing.
>> I propose using xorl instead.
>
> Does using movq to copy the first zero to the other registers
> make the code any faster?
> ISTR mov reg-reg is often implemented as a register rename
> rather than an alu operation.

xorl is implemented in register rename as well. Just, for some reason,
xorq did not get the same treatment on those CPUs.
RE: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
From: Denys Vlasenko
> Sent: 12 February 2018 13:29
...
>> x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
>>
>> Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
>> SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
>> This macro uses PUSH instead of MOV and should therefore be faster, at
>> least on newer CPUs.
...
>> Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
>> Signed-off-by: Ingo Molnar
>> ---
>>  arch/x86/entry/calling.h  | 36
>>  arch/x86/entry/entry_64.S |  6 ++
>>  2 files changed, 38 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
>> index a05cbb8..57b1b87 100644
>> --- a/arch/x86/entry/calling.h
>> +++ b/arch/x86/entry/calling.h
>> @@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
>>  	UNWIND_HINT_REGS offset=\offset
>>  	.endm
>>
>> +	.macro PUSH_AND_CLEAR_REGS
>> +	/*
>> +	 * Push registers and sanitize registers of values that a
>> +	 * speculation attack might otherwise want to exploit. The
>> +	 * lower registers are likely clobbered well before they
>> +	 * could be put to use in a speculative execution gadget.
>> +	 * Interleave XOR with PUSH for better uop scheduling:
>> +	 */
>> +	pushq	%rdi		/* pt_regs->di */
>> +	pushq	%rsi		/* pt_regs->si */
>> +	pushq	%rdx		/* pt_regs->dx */
>> +	pushq	%rcx		/* pt_regs->cx */
>> +	pushq	%rax		/* pt_regs->ax */
>> +	pushq	%r8		/* pt_regs->r8 */
>> +	xorq	%r8, %r8	/* nospec r8 */
>
> xorq's are slower than xorl's on Silvermont/Knights Landing.
> I propose using xorl instead.

Does using movq to copy the first zero to the other registers
make the code any faster?
ISTR mov reg-reg is often implemented as a register rename
rather than an alu operation.

	David
Re: [tip:x86/pti] x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
On 02/12/2018 11:17 AM, tip-bot for Dominik Brodowski wrote:
> Commit-ID:  7b7b09f110f06f3c006e9b3f4590f7d9ba91345b
> Gitweb:     https://git.kernel.org/tip/7b7b09f110f06f3c006e9b3f4590f7d9ba91345b
> Author:     Dominik Brodowski
> AuthorDate: Sun, 11 Feb 2018 11:49:45 +0100
> Committer:  Ingo Molnar
> CommitDate: Mon, 12 Feb 2018 08:06:36 +0100
>
> x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro
>
> Those instances where ALLOC_PT_GPREGS_ON_STACK is called just before
> SAVE_AND_CLEAR_REGS can trivially be replaced by PUSH_AND_CLEAN_REGS.
> This macro uses PUSH instead of MOV and should therefore be faster, at
> least on newer CPUs.
>
> Suggested-by: Linus Torvalds
> Signed-off-by: Dominik Brodowski
> Cc: Andy Lutomirski
> Cc: Borislav Petkov
> Cc: Brian Gerst
> Cc: Denys Vlasenko
> Cc: H. Peter Anvin
> Cc: Josh Poimboeuf
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: dan.j.willi...@intel.com
> Link: http://lkml.kernel.org/r/20180211104949.12992-5-li...@dominikbrodowski.net
> Signed-off-by: Ingo Molnar
> ---
>  arch/x86/entry/calling.h  | 36
>  arch/x86/entry/entry_64.S |  6 ++
>  2 files changed, 38 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index a05cbb8..57b1b87 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -137,6 +137,42 @@ For 32-bit we have the following conventions - kernel is built with
>  	UNWIND_HINT_REGS offset=\offset
>  	.endm
>
> +	.macro PUSH_AND_CLEAR_REGS
> +	/*
> +	 * Push registers and sanitize registers of values that a
> +	 * speculation attack might otherwise want to exploit. The
> +	 * lower registers are likely clobbered well before they
> +	 * could be put to use in a speculative execution gadget.
> +	 * Interleave XOR with PUSH for better uop scheduling:
> +	 */
> +	pushq	%rdi		/* pt_regs->di */
> +	pushq	%rsi		/* pt_regs->si */
> +	pushq	%rdx		/* pt_regs->dx */
> +	pushq	%rcx		/* pt_regs->cx */
> +	pushq	%rax		/* pt_regs->ax */
> +	pushq	%r8		/* pt_regs->r8 */
> +	xorq	%r8, %r8	/* nospec r8 */

xorq's are slower than xorl's on Silvermont/Knights Landing.
I propose using xorl instead.

> +	pushq	%r9		/* pt_regs->r9 */
> +	xorq	%r9, %r9	/* nospec r9 */
> +	pushq	%r10		/* pt_regs->r10 */
> +	xorq	%r10, %r10	/* nospec r10 */
> +	pushq	%r11		/* pt_regs->r11 */
> +	xorq	%r11, %r11	/* nospec r11 */
> +	pushq	%rbx		/* pt_regs->rbx */
> +	xorl	%ebx, %ebx	/* nospec rbx */
> +	pushq	%rbp		/* pt_regs->rbp */
> +	xorl	%ebp, %ebp	/* nospec rbp */
> +	pushq	%r12		/* pt_regs->r12 */
> +	xorq	%r12, %r12	/* nospec r12 */
> +	pushq	%r13		/* pt_regs->r13 */
> +	xorq	%r13, %r13	/* nospec r13 */
> +	pushq	%r14		/* pt_regs->r14 */
> +	xorq	%r14, %r14	/* nospec r14 */
> +	pushq	%r15		/* pt_regs->r15 */
> +	xorq	%r15, %r15	/* nospec r15 */