[PATCH v1 5/8] x86/entry/clearregs: Clear registers for 64bit exceptions/interrupts

2018-01-09 Thread Andi Kleen
From: Andi Kleen 

Clear all registers on entering the 64bit kernel for exceptions and
interrupts.

Since there are no syscall arguments to preserve here, this is fairly simple.

Signed-off-by: Andi Kleen 
---
 arch/x86/entry/entry_64.S | 5 +
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 632081fd7086..6ab4c2aaeabb 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -636,6 +636,7 @@ END(irq_entries_start)
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
+   CLEAR_ALL_REGS
ENCODE_FRAME_POINTER
 
testb   $3, CS(%rsp)
@@ -1192,6 +1193,7 @@ ENTRY(xen_failsafe_callback)
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
+   CLEAR_ALL_REGS
ENCODE_FRAME_POINTER
jmp error_exit
 END(xen_failsafe_callback)
@@ -1237,6 +1239,7 @@ ENTRY(paranoid_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+   CLEAR_ALL_REGS
ENCODE_FRAME_POINTER 8
movl$1, %ebx
movl$MSR_GS_BASE, %ecx
@@ -1289,6 +1292,7 @@ ENTRY(error_entry)
cld
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
+   CLEAR_ALL_REGS
ENCODE_FRAME_POINTER 8
xorl%ebx, %ebx
testb   $3, CS+8(%rsp)
@@ -1487,6 +1491,7 @@ ENTRY(nmi)
pushq   %r14/* pt_regs->r14 */
pushq   %r15/* pt_regs->r15 */
UNWIND_HINT_REGS
+   CLEAR_ALL_REGS
ENCODE_FRAME_POINTER
 
/*
-- 
2.14.3



[PATCH v1 6/8] x86/entry/clearregs: Add number of arguments to syscall tables

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

In order to sanitize the system call arguments properly
we need to know the number of syscall arguments for each
syscall. Add a new column to the 32bit and 64bit syscall
tables to list the number of arguments.

Also fix the generation script to not confuse the number
with a compat entry.

Generated with some scripting and quick review (but more eyeballs
would be appreciated).

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 726 -
 arch/x86/entry/syscalls/syscall_64.tbl | 708 
 arch/x86/entry/syscalls/syscalltbl.sh  |   7 +-
 3 files changed, 723 insertions(+), 718 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..c3a4480365dd 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -6,388 +6,388 @@
 #
 # The abi is always "i386" for this file.
 #
-0   i386  restart_syscall   sys_restart_syscall
-1   i386  exit              sys_exit
-2   i386  fork              sys_fork             sys_fork
-3   i386  read              sys_read
-4   i386  write             sys_write
-5   i386  open              sys_open             compat_sys_open
-6   i386  close             sys_close
-7   i386  waitpid           sys_waitpid          sys32_waitpid
-8   i386  creat             sys_creat
-9   i386  link              sys_link
-10  i386  unlink            sys_unlink
-11  i386  execve            sys_execve           compat_sys_execve
-12  i386  chdir             sys_chdir
-13  i386  time              sys_time             compat_sys_time
-14  i386  mknod             sys_mknod
-15  i386  chmod             sys_chmod
-16  i386  lchown            sys_lchown16
+0   i386  restart_syscall   sys_restart_syscall                       0
+1   i386  exit              sys_exit                                  1
+2   i386  fork              sys_fork             sys_fork             0
+3   i386  read              sys_read                                  3
+4   i386  write             sys_write                                 3
+5   i386  open              sys_open             compat_sys_open      3
+6   i386  close             sys_close                                 1
+7   i386  waitpid           sys_waitpid          sys32_waitpid        3
+8   i386  creat             sys_creat                                 2
+9   i386  link              sys_link                                  2
+10  i386  unlink            sys_unlink                                1
+11  i386  execve            sys_execve           compat_sys_execve    3
+12  i386  chdir             sys_chdir                                 1
+13  i386  time              sys_time             compat_sys_time      1
+14  i386  mknod             sys_mknod                                 3
+15  i386  chmod             sys_chmod                                 2
+16  i386  lchown            sys_lchown16                              3
 17  i386  break
-18  i386  oldstat           sys_stat
-19  i386  lseek             sys_lseek            compat_sys_lseek
-20  i386  getpid            sys_getpid
-21  i386  mount             sys_mount            compat_sys_mount
-22  i386  umount            sys_oldumount
-23  i386  setuid            sys_setuid16
-24  i386  getuid            sys_getuid16
-25  i386  stime             sys_stime            compat_sys_stime
-26  i386  ptrace            sys_ptrace           compat_sys_ptrace
-27  i386  alarm             sys_alarm
-28  i386  oldfstat          sys_fstat
-29  i386  pause             sys_pause
-30  i386  utime             sys_utime            compat_sys_utime
+18  i386  oldstat           sys_stat                                  2
+19  i386  lseek             sys_lseek            compat_sys_lseek     3
+20  i386  getpid            sys_getpid                                0
+21  i386  mount             sys_mount            compat_sys_mount     5
+22  i386  umount            sys_oldumount                             2
+23  i386  setuid            sys_setuid16                              1
+24  i386  getuid            sys_getuid16                              0
+25  i386  stime             sys_stime            compat_sys_stime     1
+26  i386  ptrace            sys_ptrace           compat_sys_ptrace    4
+27  i386  alarm             sys_alarm                                 1
+28  i386  oldfstat          sys_fstat                                 2
+29  i386  pause

[PATCH v1 4/8] x86/entry/retpoline: Clear extra registers for compat syscalls

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Clear all registers for compat calls on 64bit kernels. All arguments
are initially passed through the stack, so this is fairly simple
without additional stubs.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_64_compat.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358e4041..16fd2643a77f 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -95,6 +95,8 @@ ENTRY(entry_SYSENTER_compat)
pushq   $0  /* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
cld
+   /* Can clear all because arguments are passed through the stack */
+   CLEAR_ALL_REGS
 
/*
 * SYSENTER doesn't filter flags, so we need to clear NT and AC
@@ -223,6 +225,8 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
pushq   $0  /* pt_regs->r13 = 0 */
pushq   $0  /* pt_regs->r14 = 0 */
pushq   $0  /* pt_regs->r15 = 0 */
+   /* Can clear all because arguments are passed through the stack */
+   CLEAR_ALL_REGS
 
/*
 * User mode is traced as though IRQs are on, and SYSENTER
@@ -348,6 +352,8 @@ ENTRY(entry_INT80_compat)
pushq   %r14/* pt_regs->r14 */
pushq   %r15/* pt_regs->r15 */
cld
+   /* Can clear all because arguments are passed through the stack */
+   CLEAR_ALL_REGS
 
/*
 * User mode is traced as though IRQs are on, and the interrupt
-- 
2.14.3



[PATCH v1 1/8] x86/entry/clearregs: Remove partial stack frame in fast system call

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Remove the partial stack frame in the 64bit syscall fast path.
In the next patch we want to clear the extra registers, which requires
always saving all registers. So remove the partial stack frame
in the syscall fast path and always save everything.

This actually simplifies the code because the ptregs stubs
are not needed anymore.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_64.S   | 57 -
 arch/x86/entry/syscall_64.c |  2 +-
 2 files changed, 5 insertions(+), 54 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 58dbf7a12a05..bbdfbdd817d6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -234,7 +234,9 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
pushq   %r9 /* pt_regs->r9 */
pushq   %r10/* pt_regs->r10 */
pushq   %r11/* pt_regs->r11 */
-   sub $(6*8), %rsp/* pt_regs->bp, bx, r12-15 not 
saved */
+   sub $(6*8), %rsp
+   SAVE_EXTRA_REGS
+
UNWIND_HINT_REGS extra=0
 
/*
@@ -262,11 +264,6 @@ entry_SYSCALL_64_fastpath:
ja  1f  /* return -ENOSYS (already in 
pt_regs->ax) */
movq%r10, %rcx
 
-   /*
-* This call instruction is handled specially in stub_ptregs_64.
-* It might end up jumping to the slow path.  If it jumps, RAX
-* and all argument registers are clobbered.
-*/
 #ifdef CONFIG_RETPOLINE
movqsys_call_table(, %rax, 8), %rax
call__x86_indirect_thunk_rax
@@ -293,9 +290,7 @@ entry_SYSCALL_64_fastpath:
TRACE_IRQS_ON   /* user mode is traced as IRQs on */
movqRIP(%rsp), %rcx
movqEFLAGS(%rsp), %r11
-   addq$6*8, %rsp  /* skip extra regs -- they were preserved */
-   UNWIND_HINT_EMPTY
-   jmp .Lpop_c_regs_except_rcx_r11_and_sysret
+   jmp syscall_return_via_sysret
 
 1:
/*
@@ -305,14 +300,12 @@ entry_SYSCALL_64_fastpath:
 */
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_ANY)
-   SAVE_EXTRA_REGS
movq%rsp, %rdi
callsyscall_return_slowpath /* returns with IRQs disabled */
jmp return_from_SYSCALL_64
 
 entry_SYSCALL64_slow_path:
/* IRQs are off. */
-   SAVE_EXTRA_REGS
movq%rsp, %rdi
calldo_syscall_64   /* returns with IRQs disabled */
 
@@ -389,7 +382,6 @@ syscall_return_via_sysret:
/* rcx and r11 are already restored (see code above) */
UNWIND_HINT_EMPTY
POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
popq%rsi/* skip r11 */
popq%r10
popq%r9
@@ -420,47 +412,6 @@ syscall_return_via_sysret:
USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-   /*
-* Syscalls marked as needing ptregs land here.
-* If we are on the fast path, we need to save the extra regs,
-* which we achieve by trying again on the slow path.  If we are on
-* the slow path, the extra regs are already saved.
-*
-* RAX stores a pointer to the C function implementing the syscall.
-* IRQs are on.
-*/
-   cmpq$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-   jne 1f
-
-   /*
-* Called from fast path -- disable IRQs again, pop return address
-* and jump to slow path
-*/
-   DISABLE_INTERRUPTS(CLBR_ANY)
-   TRACE_IRQS_OFF
-   popq%rax
-   UNWIND_HINT_REGS extra=0
-   jmp entry_SYSCALL64_slow_path
-
-1:
-   JMP_NOSPEC %rax /* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-   UNWIND_HINT_FUNC
-   leaq\func(%rip), %rax
-   jmp stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include 
-
 /*
  * %rdi: prev task
  * %rsi: next task
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..ad1ae014f943 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,7 +8,7 @@
 #include 
 
 #define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
+#define __SYSCALL_64_QUAL_ptregs(sym) sym
 
 #define __SYSCALL_64(nr, sym, qual) extern asmlinkage long 
__SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, 
unsigned long, unsigned long, unsigned long);
 #include 
-- 
2.14.3



x86/clearregs: Register sanitizing at kernel entry for speculation hygiene

2018-01-09 Thread Andi Kleen
This patch kit implements clearing of all unused registers on kernel entries,
including system calls and all exceptions and interrupts.

This doesn't fix any known issue, but will make it harder in general
to exploit the kernel with speculation because it will be harder
to get user controlled values into kernel code.

The patchkit is a bit more complicated because it attempts to clear
unused argument registers, which on 64bit requires knowing how
many arguments each system call has. I used some scripting
to derive the argument count for each system call from the SYSCALL_DEFINE*s
and add it to the x86 system call tables.
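
For illustration (using the CLEAR_ARGS macro added later in this series),
a system call taking two arguments keeps %rdi/%rsi and gets the remaining
argument registers zeroed roughly like this before the handler runs:

        xorq    %r9, %r9        # arg6
        xorq    %r8, %r8        # arg5
        xorl    %ecx, %ecx      # arg4
        xorl    %edx, %edx      # arg3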

Everything else is relatively simple and straightforward,
and could be used independently.

I assume this mostly isn't 4.15 material, but should be considered for 4.16.
Possibly some of the simpler patches could be considered for 4.15.

Original patches were from Tim Chen, but changed significantly
by AK.

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git spec/clearregs-1

v1: Initial post


[PATCH v1 2/8] x86/entry/clearregs: Add infrastructure to clear registers

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Add 64bit assembler macros to clear registers on kernel entry.
Used in follow-on patches.
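
For reference, the intended usage (as the later patches in this series do)
is to run CLEAR_ALL_REGS right after the full register save on an entry
path, so the user supplied register values only survive in pt_regs:

        ALLOC_PT_GPREGS_ON_STACK
        SAVE_C_REGS
        SAVE_EXTRA_REGS
        CLEAR_ALL_REGS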

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/calling.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e00a6af..9444e7623185 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -172,6 +172,34 @@ For 32-bit we have the following conventions - kernel is 
built with
.byte 0xf1
.endm
 
+   .macro CLEAR_R11_TO_R15
+   xorq %r15, %r15
+   xorq %r14, %r14
+   xorq %r13, %r13
+   xorq %r12, %r12
+   xorq %r11, %r11
+   .endm
+
+   .macro CLEAR_R8_TO_R15
+   CLEAR_R11_TO_R15
+   xorq %r10, %r10
+   xorq %r9, %r9
+   xorq %r8, %r8
+   .endm
+
+   .macro CLEAR_ALL_REGS
+   CLEAR_R8_TO_R15
+   xorl %eax, %eax
+   xorl %ebx, %ebx
+   xorl %ecx, %ecx
+   xorl %edx, %edx
+   xorl %esi, %esi
+   xorl %edi, %edi
+#ifndef CONFIG_FRAME_POINTER
+   xorl %ebp, %ebp
+#endif
+   .endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
-- 
2.14.3



[PATCH v1 7/8] x86/entry/clearregs: Add 64bit stubs to clear unused arguments regs

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

The main system call code doesn't know how many arguments each
system call has. So generate stubs that do the clearing.

Set up macros to generate stubs to clear unused argument registers
for each system call in a 64bit kernel. This uses the syscall
argument count from the syscall tables added earlier.

Each system call will run through its stub, which clears
the registers not used for input arguments before jumping
to the real system call. It also clears RAX.
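
As an illustration, for a hypothetical two-argument syscall sys_foo the
generated stub expands to roughly:

ENTRY(san_args_sys_foo)                 # sys_foo is a made-up example name
        CLEAR_ARGS 2                    # zeroes arg3..arg6: %rdx, %rcx, %r8, %r9
        xor     %eax, %eax
        jmp     sys_foo
END(san_args_sys_foo)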

We have to move all the __SYSCALL_* users atomically.
This makes the patch larger, but it's difficult to keep it
git-bisect safe otherwise.

Longer term this setup will also allow getting rid
of the system call table, as it will be possible
to compute the entry point with a simple shift.
That is not done here yet.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/calling.h  | 24 
 arch/x86/entry/entry_64.S | 15 +++
 arch/x86/entry/syscall_32.c   |  4 ++--
 arch/x86/entry/syscall_64.c   |  5 +++--
 arch/x86/entry/syscalls/syscalltbl.sh | 15 ---
 arch/x86/kernel/asm-offsets_32.c  |  2 +-
 arch/x86/kernel/asm-offsets_64.c  |  4 ++--
 arch/x86/um/sys_call_table_32.c   |  4 ++--
 arch/x86/um/sys_call_table_64.c   |  4 ++--
 9 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9444e7623185..c89a8a8d195c 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -200,6 +200,30 @@ For 32-bit we have the following conventions - kernel is 
built with
 #endif
.endm
 
+   /* Clear unused argument registers */
+
+.macro CLEAR_ARGS num
+   /* we leave EAX around because it has been already checked */
+   .if \num < 6
+   xorq%r9, %r9# arg6
+   .endif
+   .if \num < 5
+   xorq%r8, %r8# arg5
+   .endif
+   .if \num < 4
+   xorl%ecx, %ecx  # arg4
+   .endif
+   .if \num < 3
+   xorl%edx, %edx  # arg3
+   .endif
+   .if \num < 2
+   xorl%esi, %esi  # arg2
+   .endif
+   .if \num < 1
+   xorl%edi, %edi  # arg1
+   .endif
+.endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 6ab4c2aaeabb..5b2456a30b17 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1755,6 +1755,21 @@ nmi_restore:
iretq
 END(nmi)
 
+/*
+ * Clear all argument registers not used by a system call.
+ */
+
+.macro gen_arg_stub sym, num
+ENTRY(san_args_\sym)
+   CLEAR_ARGS \num
+   xor   %eax, %eax
+   jmp\sym
+END(san_args_\sym)
+.endm
+
+#define __SYSCALL_64(nr, sym, qual, num) gen_arg_stub sym, num
+#include 
+
 ENTRY(ignore_sysret)
UNWIND_HINT_EMPTY
mov $-ENOSYS, %eax
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 95c294963612..b31e5c8b7ba7 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -7,11 +7,11 @@
 #include 
 #include 
 
-#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned 
long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned 
long) ;
+#define __SYSCALL_I386(nr, sym, qual, num) extern asmlinkage long sym(unsigned 
long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned 
long) ;
 #include 
 #undef __SYSCALL_I386
 
-#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
+#define __SYSCALL_I386(nr, sym, qual, num) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned 
long, unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index ad1ae014f943..963c9c14480f 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -10,11 +10,12 @@
 #define __SYSCALL_64_QUAL_(sym) sym
 #define __SYSCALL_64_QUAL_ptregs(sym) sym
 
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long 
__SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, 
unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual, num) \
+   extern asmlinkage long 
__SYSCALL_64_QUAL_##qual(san_args_##sym)(unsigned long, unsigned long, unsigned 
long, unsigned long, unsigned long, unsigned long);
 #include 
 #undef __SYSCALL_64
 
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual, num) [nr] = 
__SYSCALL_64_QUAL_##qual(san_args_##sym),
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, 
unsigned long, unsigned long, unsigned long);
 
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh 
b/arch/x86/entr

Re: [PATCH] x86/retpoline: Fix NOSPEC_JMP for tip

2018-01-09 Thread Andi Kleen
> Then just make sure X86_FEATURE_RETPOLINE_AMD disables X86_FEATURE_RETPOLINE.
> 
> That is both simpler and smaller, no?

Yes that works, and is clearly better/simpler.

Tested-by: Andi Kleen <a...@linux.intel.com>

Thomas, I assume you will fix it up, or let me know if I should
send another patch.

-Andi


[PATCH] x86/retpoline: Fix NOSPEC_JMP for tip

2018-01-09 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

With the latest tip x86/pti I get oopses when booting
a 64bit VM in qemu with RETPOLINE/gcc7 and PTI enabled.

The following patch fixes it for me. Something doesn't
seem to work with ALTERNATIVE_2. It adds only a few bytes
more code, so seems acceptable.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/include/asm/nospec-branch.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 8ddf8513550e..eea6340a0abb 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -46,9 +46,8 @@
  */
 .macro JMP_NOSPEC reg:req
 #ifdef CONFIG_RETPOLINE
-   ALTERNATIVE_2 __stringify(jmp *\reg),   \
-   __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \
-   __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD
+   ALTERNATIVE "", __stringify(lfence; jmp *\reg), 
X86_FEATURE_RETPOLINE_AMD
+   ALTERNATIVE __stringify(jmp *\reg), __stringify(RETPOLINE_JMP \reg), 
X86_FEATURE_RETPOLINE
 #else
jmp *\reg
 #endif
-- 
2.14.3



[tip:x86/pti] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-09 Thread tip-bot for Andi Kleen
Commit-ID:  450c505047981e97471f0170e0102f613bba4739
Gitweb: https://git.kernel.org/tip/450c505047981e97471f0170e0102f613bba4739
Author: Andi Kleen 
AuthorDate: Tue, 9 Jan 2018 14:43:17 +
Committer:  Thomas Gleixner 
CommitDate: Tue, 9 Jan 2018 16:17:55 +0100

x86/retpoline: Avoid return buffer underflows on context switch

CPUs have return buffers which store the return address for RET to predict
function returns. Some CPUs (Skylake, some Broadwells) can fall back to
indirect branch prediction on return buffer underflow.

retpoline is supposed to prevent uncontrolled indirect branch speculation,
which could be poisoned by ring 3, so it needs to prevent uncontrolled
return buffer underflows in the kernel as well.

This can happen when a context switch from a shallower to a deeper kernel
stack happens.  The deeper kernel stack would eventually underflow the
return buffer, which again would make the CPU fall back to the indirect
branch predictor.

To guard against this fill the return buffer with controlled content during
context switch. This prevents any underflows.

Always fill the buffer with 30 entries: 32 minus 2 for at least one call
from entry_{64,32}.S to C code and another into the function doing the
fill.

That's pessimistic because there are likely more controlled kernel calls
before this happens, but it depends on compiler optimizations and other
factors so avoid speculative optimization, error on the side of safety and
always fill 30 entries.

[dwmw2: Fix comments about nop between calls, Move #ifdef CONFIG_RETPOLINE
to call sites not macro. Use Google's original RSB stuffing.]

[tglx: Massaged changelog ]

Signed-off-by: Andi Kleen 
Signed-off-by: David Woodhouse 
Signed-off-by: Thomas Gleixner 
Cc: gno...@lxorguk.ukuu.org.uk
Cc: Rik van Riel 
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Jiri Kosina 
Cc: Andy Lutomirski 
Cc: Dave Hansen 
Cc: Kees Cook 
Cc: Tim Chen 
Cc: Greg Kroah-Hartman 
Cc: Paul Turner 
Link: 
https://lkml.kernel.org/r/1515508997-6154-12-git-send-email-d...@amazon.co.uk

---
 arch/x86/entry/entry_32.S| 17 ++
 arch/x86/entry/entry_64.S| 17 ++
 arch/x86/include/asm/nospec-branch.h | 44 
 3 files changed, 78 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a1f28a5..d2ef7f32 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -244,6 +244,23 @@ ENTRY(__switch_to_asm)
movl%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset
 #endif
 
+#ifdef CONFIG_RETPOLINE
+   /*
+* When we switch from a shallower to a deeper call stack
+* the call stack will underflow in the kernel in the next task.
+* This could cause the CPU to fall back to indirect branch
+* prediction, which may be poisoned.
+*
+* To guard against that always fill the return stack with
+* known values.
+*
+* We do this in assembler because it needs to be before
+* any calls on the new stack, and this can be difficult to
+* ensure in a complex C function like __switch_to.
+*/
+   ALTERNATIVE "", "FILL_RETURN_BUFFER %ebx", X86_FEATURE_RETPOLINE
+#endif
+
/* restore callee-saved registers */
popl%esi
popl%edi
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 59874bc..58dbf7a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -487,6 +487,23 @@ ENTRY(__switch_to_asm)
movq%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset
 #endif
 
+#ifdef CONFIG_RETPOLINE
+   /*
+* When we switch from a shallower to a deeper call stack
+* the call stack will underflow in the kernel in the next task.
+* This could cause the CPU to fall back to indirect branch
+* prediction, which may be poisoned.
+*
+* To guard against that always fill the return stack with
+* known values.
+*
+* We do this in assembler because it needs to be before
+* any calls on the new stack, and this can be difficult to
+* ensure in a complex C function like __switch_to.
+*/
+   ALTERNATIVE "", "FILL_RETURN_BUFFER %r12", X86_FEATURE_RETPOLINE
+#endif
+
/* restore callee-saved registers */
popq%r15
popq%r14
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index a86e845..8ddf851 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -64,6 +64,50 @@
 #endif
 .endm
 
+/*
+ * Use 32-N: 32 is the max return buffer size, but there should have been
+ * at a minimum two controlled calls already: one into the kernel from
+ * entry*.S and another into the function containing this macro. So N=2,
+ * thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL   30
+
+/*
+ * Fil

[tip:x86/pti] x86/retpoline/irq32: Convert assembler indirect jumps

2018-01-09 Thread tip-bot for Andi Kleen
Commit-ID:  3025d1ebb41bc8fc58fc050c6d4d6dd4d71ca5e8
Gitweb: https://git.kernel.org/tip/3025d1ebb41bc8fc58fc050c6d4d6dd4d71ca5e8
Author: Andi Kleen <a...@linux.intel.com>
AuthorDate: Tue, 9 Jan 2018 14:43:16 +
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 9 Jan 2018 16:17:54 +0100

x86/retpoline/irq32: Convert assembler indirect jumps

Convert all indirect jumps in 32bit irq inline asm code to use non
speculative sequences.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Acked-by: Arjan van de Ven <ar...@linux.intel.com>
Acked-by: Ingo Molnar <mi...@kernel.org>
Cc: gno...@lxorguk.ukuu.org.uk
Cc: Rik van Riel <r...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Jiri Kosina <ji...@kernel.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Kees Cook <keesc...@google.com>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Cc: Greg Kroah-Hartman <gre...@linux-foundation.org>
Cc: Paul Turner <p...@google.com>
Link: 
https://lkml.kernel.org/r/1515508997-6154-11-git-send-email-d...@amazon.co.uk

---
 arch/x86/kernel/irq_32.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a83b334..c1bdbd3 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
 #include 
 
 #include 
+#include 
 
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 
@@ -55,11 +56,11 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
 static void call_on_stack(void *func, void *stack)
 {
asm volatile("xchgl %%ebx,%%esp \n"
-"call  *%%edi  \n"
+CALL_NOSPEC
 "movl  %%ebx,%%esp \n"
 : "=b" (stack)
 : "0" (stack),
-  "D"(func)
+  [thunk_target] "D"(func)
 : "memory", "cc", "edx", "ecx", "eax");
 }
 
@@ -95,11 +96,11 @@ static inline int execute_on_irq_stack(int overflow, struct 
irq_desc *desc)
call_on_stack(print_stack_overflow, isp);
 
asm volatile("xchgl %%ebx,%%esp \n"
-"call  *%%edi  \n"
+CALL_NOSPEC
 "movl  %%ebx,%%esp \n"
 : "=a" (arg1), "=b" (isp)
 :  "0" (desc),   "1" (isp),
-   "D" (desc->handle_irq)
+   [thunk_target] "D" (desc->handle_irq)
 : "memory", "cc", "ecx");
return 1;
 }


[tip:x86/pti] x86/retpoline: Temporarily disable objtool when CONFIG_RETPOLINE=y

2018-01-09 Thread tip-bot for Andi Kleen
Commit-ID:  61888594f2ff61633c7fb29b58c128d6dc850e7c
Gitweb: https://git.kernel.org/tip/61888594f2ff61633c7fb29b58c128d6dc850e7c
Author: Andi Kleen <a...@linux.intel.com>
AuthorDate: Tue, 9 Jan 2018 14:43:08 +
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Tue, 9 Jan 2018 16:17:51 +0100

x86/retpoline: Temporarily disable objtool when CONFIG_RETPOLINE=y

objtool's assembler currently cannot deal with the code generated by the
retpoline compiler and throws hundreds of warnings, mostly because it sees
calls that don't have a symbolic target.

Exclude all the options that rely on objtool when RETPOLINE is active.

This mainly means that the kernel has to fall back to the frame pointer
unwinder and livepatch is not supported.

Josh is looking into resolving the issue.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
Signed-off-by: David Woodhouse <d...@amazon.co.uk>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Acked-by: Arjan van de Ven <ar...@linux.intel.com>
Acked-by: Ingo Molnar <mi...@kernel.org>
Cc: gno...@lxorguk.ukuu.org.uk
Cc: Rik van Riel <r...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Jiri Kosina <ji...@kernel.org>
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Dave Hansen <dave.han...@intel.com>
Cc: Kees Cook <keesc...@google.com>
Cc: Tim Chen <tim.c.c...@linux.intel.com>
Cc: Greg Kroah-Hartman <gre...@linux-foundation.org>
Cc: Paul Turner <p...@google.com>
Link: 
https://lkml.kernel.org/r/1515508997-6154-3-git-send-email-d...@amazon.co.uk

---
 arch/x86/Kconfig   | 4 ++--
 arch/x86/Kconfig.debug | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d181916..abeac4b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -172,8 +172,8 @@ config X86
select HAVE_PERF_USER_STACK_DUMP
select HAVE_RCU_TABLE_FREE
select HAVE_REGS_AND_STACK_ACCESS_API
-   select HAVE_RELIABLE_STACKTRACE if X86_64 && 
UNWINDER_FRAME_POINTER && STACK_VALIDATION
-   select HAVE_STACK_VALIDATIONif X86_64
+   select HAVE_RELIABLE_STACKTRACE if X86_64 && 
UNWINDER_FRAME_POINTER && STACK_VALIDATION && !RETPOLINE
+   select HAVE_STACK_VALIDATIONif X86_64 && !RETPOLINE
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
select HAVE_USER_RETURN_NOTIFIER
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index 6293a87..9f3928d 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -359,8 +359,8 @@ config PUNIT_ATOM_DEBUG
 
 choice
prompt "Choose kernel unwinder"
-   default UNWINDER_ORC if X86_64
-   default UNWINDER_FRAME_POINTER if X86_32
+   default UNWINDER_ORC if X86_64 && !RETPOLINE
+   default UNWINDER_FRAME_POINTER if X86_32 || RETPOLINE
---help---
  This determines which method will be used for unwinding kernel stack
  traces for panics, oopses, bugs, warnings, perf, /proc//stack,
@@ -368,7 +368,7 @@ choice
 
 config UNWINDER_ORC
bool "ORC unwinder"
-   depends on X86_64
+   depends on X86_64 && !RETPOLINE
select STACK_VALIDATION
---help---
  This option enables the ORC (Oops Rewind Capability) unwinder for


Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch II

2018-01-08 Thread Andi Kleen
> > On Skylake and Broadwell when the RSB underflows it will fall back to the 
> > indirect branch predictor, which can be poisoned and we try to avoid
> > using with retpoline. So we try to avoid underflows, and this filling
> > helps us with that.
> 
> That's no longer true for Broadwell with the latest microcode, right?

That's right.

-Andi



Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch II

2018-01-08 Thread Andi Kleen
On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote:
> > If we clear the registers, what the hell are you going to put in the
> > RSB that helps you?
> 
> RSB allows you to control chains of gadgets.

I admit the gadget thing is a bit obscure.

There's another case we were actually more worried about: 

On Skylake and Broadwell when the RSB underflows it will fall back to the 
indirect branch predictor, which can be poisoned and we try to avoid
using with retpoline. So we try to avoid underflows, and this filling
helps us with that.

Does that make more sense?

-Andi


Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
> If we clear the registers, what the hell are you going to put in the
> RSB that helps you?

RSB allows you to control chains of gadgets.

You can likely find some chain of gadgets that set up constants in registers
in a lot of useful ways. Perhaps not any way (so may be hard to scan through
all of memory), but it's likely you could find gadgets that result in a lot
of useful direct mapped addresses, which the next gadget can then reference.

Especially RAX is quite vulnerable to this because there will be a lot 
of code that does "modify RAX in interesting ways ; RET"
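
To make that concrete, a purely made-up chain could look roughly like the
fragments below (illustrative only, not taken from any real kernel binary,
and assuming %rbx happens to hold a usable base address). Each RET consumes
one attacker-controlled return buffer entry, steering execution to the next
fragment:

gadget1:
	mov	$0x12345678, %rax	/* load a constant the attacker wants */
	ret				/* next RSB entry steers to gadget2 */
gadget2:
	shl	$6, %rax		/* scale it into a direct-map offset */
	ret
gadget3:
	mov	(%rbx,%rax), %rcx	/* dependent load leaks through the cache */
	ret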

> So instead of saying "we have to flush the return stack", I'm saying
> that we should look at things that make flushing the return stack
> _unnecessary_, simply because even if the attacker were to control it
> entirely, they'd still be up shit creek without a paddle.

I agree that clearing registers is useful (was just hacking on that patch).

-Andi



Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
> So I was really hoping that in places like context switching etc, we'd
> be able to instead effectively kill off any exploits by clearing
> registers.
> 
> That should make it pretty damn hard to then find a matching "gadget"
> that actually does anything interesting/powerful.
> 
> Together with Spectre already being pretty hard to take advantage of,
> and the eBPF people making those user-provided gadgets inaccessible,
> it really should be a pretty powerful fix.
> 
> Hmm?

Essentially the RSB are hidden registers, and the only way to clear them
is the FILL_RETURN_BUFFER sequence.  I don't see how clearing anything else
would help?
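
For reference, the core of that sequence is just a stack of harmless CALLs
whose return addresses overwrite whatever was in the RSB. A simplified
sketch of the FILL_RETURN_BUFFER macro (the full macro, with the
CONFIG_RETPOLINE guards, is in the context switch patch further down in
this digest):

	.rept	30
	call	1f
	pause			/* trap speculation behind the call */
1:
	.endr
	add	$8*30, %rsp	/* drop the 30 return addresses again (64-bit) */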

-Andi


Re: [PATCH] x86/retpoline: Also fill return buffer after idle

2018-01-08 Thread Andi Kleen
> Probably doesn't matter right there but it's going to end up being used
> elsewhere with IBRS/IBPB, and the compiler is going to think it needs
> to save all the call-clobbered registers for that. Do we want to make
> it use inline asm instead?

You mean KVM?

All the other places have lots of other calls too, so the registers are likely 
already clobbered.

-Andi


Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
On Mon, Jan 08, 2018 at 03:56:30PM -0800, Linus Torvalds wrote:
> On Mon, Jan 8, 2018 at 3:44 PM, David Woodhouse  wrote:
> >
> > To guard against this fill the return buffer with controlled
> > content during context switch. This prevents any underflows.
> 
> Ugh. I really dislike this patch. Everything else in the retpoline
> patches makes me go "ok, that's reasonable". This one makes me go
> "Eww".
> 
> It's hacky, it's ugly, and it looks pretty expensive too.

Modern cores are quite fast at executing calls. 

> 
> Is there really nothing more clever we can do?

We could be cleverer in selecting how many dummy calls to do.
But that would likely be fragile, more complicated, and harder to
maintain, and I doubt it would buy that much.

Don't really have a better proposal, sorry.

-Andi


[PATCH] x86/retpoline: Also fill return buffer after idle

2018-01-08 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

This is an extension of the earlier patch to fill the return buffer
on context switch. It uses the assembler macros added earlier.

When we go into deeper idle states the return buffer could be cleared
in MWAIT, but then another thread which wakes up earlier might
be poisoning the indirect branch predictor. Then when the return
buffer underflows there might be an uncontrolled indirect branch.

To guard against this always fill the return buffer when exiting idle.

Needed on Skylake and some Broadwells.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_32.S|  8 
 arch/x86/entry/entry_64.S|  8 
 arch/x86/include/asm/mwait.h | 11 ++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7dee84a3cf83..2687cce8a02e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1092,3 +1092,11 @@ ENTRY(rewind_stack_do_exit)
calldo_exit
 1: jmp 1b
 END(rewind_stack_do_exit)
+
+ENTRY(fill_return_buffer)
+#ifdef CONFIG_RETPOLINE
+   ALTERNATIVE "ret", "", X86_FEATURE_RETPOLINE
+FILL_RETURN_BUFFER
+#endif
+ret
+END(fill_return_buffer)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a33033e2bfe0..92fbec1b0eb5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1831,3 +1831,11 @@ ENTRY(rewind_stack_do_exit)
 
calldo_exit
 END(rewind_stack_do_exit)
+
+ENTRY(fill_return_buffer)
+#ifdef CONFIG_RETPOLINE
+   ALTERNATIVE "ret", "", X86_FEATURE_RETPOLINE
+   FILL_RETURN_BUFFER
+#endif
+   ret
+END(fill_return_buffer)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb29378a..1d9f9269b5e7 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,8 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 :: "a" (eax), "c" (ecx));
 }
 
+extern __visible void fill_return_buffer(void);
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
@@ -107,8 +109,15 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
}
 
__monitor((void *)_thread_info()->flags, 0, 0);
-   if (!need_resched())
+   if (!need_resched()) {
__mwait(eax, ecx);
+   /*
+* idle could have cleared the return buffer,
+* so fill it to prevent uncontrolled
+* speculation.
+*/
+   fill_return_buffer();
+   }
}
current_clr_polling();
 }
-- 
2.14.3



Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
> 
> Personally I find the magic labels with strange ASCII characters 
> far less readable than a simple number.

Tried it and \@ is incompatible with .rept.

-Andi


Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
> We want this on vmexit too, right? 

Yes. KVM patches are done separately.

-Andi




Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.

I used the Intel recommended sequence, which recommends 32. 

Not sure if alignment makes a difference. I can check. 

> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.

Personally I find the magic labels with strange ASCII characters 
far less readable than a simple number.

But can change it if you insist.

> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.

My understanding is that it stops speculation. But could also
use LFENCE.
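
If LFENCE turns out to be needed, only the filler instruction between the
CALL and its label would change. A sketch of that variant (the posted patch
below uses PAUSE here):

	call	1221f
	lfence			/* serialize instead of pause */
1221: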

-Andi


[PATCH] x86/retpoline: Avoid return buffer underflows on context switch

2018-01-08 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

[This is on top of David's retpoline branch, as of 08-01 this morning]

This patch further hardens retpoline.

CPUs have return buffers which store the return address for
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.

With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.

This can happen when we're context switching from a shallower to a
deeper kernel stack.  The deeper kernel stack would eventually underflow
the return buffer, which again would fall back to the indirect branch predictor.

To guard against this fill the return buffer with controlled
content during context switch. This prevents any underflows.

We always fill the buffer with 30 entries: 32 minus 2 for at
least one call from entry_{64,32}.S to C code and another into
the function doing the filling.

That's pessimistic because we likely did more controlled kernel calls.
So in principle we could do less.  However it's hard to maintain such an
invariant, and it may be broken with more aggressive compilers.
So err on the side of safety and always fill 30.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_32.S| 15 +++
 arch/x86/entry/entry_64.S| 15 +++
 arch/x86/include/asm/nospec-branch.h | 29 +
 3 files changed, 59 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index cf9ef33d299b..5404a9b2197c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,21 @@ ENTRY(__switch_to_asm)
popl%ebx
popl%ebp
 
+   /*
+* When we switch from a shallower to a deeper call stack
+* the call stack will underflow in the kernel in the next task.
+* This could cause the CPU to fall back to indirect branch
+* prediction, which may be poisoned.
+*
+* To guard against that always fill the return stack with
+* known values.
+*
+* We do this in assembler because it needs to be before
+* any calls on the new stack, and this can be difficult to
+* ensure in a complex C function like __switch_to.
+*/
+   ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+   FILL_RETURN_BUFFER
jmp __switch_to
 END(__switch_to_asm)
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bce6ed03353..0f28d0ea57e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,21 @@ ENTRY(__switch_to_asm)
popq%rbx
popq%rbp
 
+   /*
+* When we switch from a shallower to a deeper call stack
+* the call stack will underflow in the kernel in the next task.
+* This could cause the CPU to fall back to indirect branch
+* prediction, which may be poisoned.
+*
+* To guard against that always fill the return stack with
+* known values.
+*
+* We do this in assembler because it needs to be before
+* any calls on the new stack, and this can be difficult to
+* ensure in a complex C function like __switch_to.
+*/
+   ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+   FILL_RETURN_BUFFER
jmp __switch_to
 END(__switch_to_asm)
 
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b8c8eeacb4be..e84e231248c2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -53,6 +53,35 @@
 #endif
 .endm
 
+/*
+ * We use 32-N: 32 is the max return buffer size,
+ * but there should have been at a minimum two
+ * controlled calls already: one into the kernel
+ * from entry*.S and another into the function
+ * containing this macro. So N=2, thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL   30
+
+/*
+ * Fill the CPU return branch buffer to prevent
+ * indirect branch prediction on underflow.
+ * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
+ */
+.macro FILL_RETURN_BUFFER
+#ifdef CONFIG_RETPOLINE
+   .rept   NUM_BRANCHES_TO_FILL
+   call1221f
+   pause   /* stop speculation */
+1221:
+   .endr
+#ifdef CONFIG_64BIT
+   addq$8*NUM_BRANCHES_TO_FILL, %rsp
+#else
+   addl$4*NUM_BRANCHES_TO_FILL, %esp
+#endif
+#endif
+.endm
+
 #else /* __ASSEMBLY__ */
 
 #if defined(CONFIG_X86_64) && defined(RETPOLINE)
-- 
2.14.3



Re: perf: documentation for cycles_no_execute event

2018-01-08 Thread Andi Kleen
> > Many of the x86 pipeline.json files have the brief description "Total
> > execution stalls" for both CYCLE_ACTIVITY.CYCLES_NO_EXECUTE and
> > CYCLE_ACTIVITY.STALLS_TOTAL.  Should the case for
> > CYCLE_ACTIVITY.CYCLES_NO_EXECUTE have a brief description that mentions
> > cycles?  Some of the files do have a public description that mentions
> > cycles, eg broadwellde/pipeline.json has "Counts number of cycles nothing
> > is executed on any execution port.", but others (eg ivytown/pipeline.json)
> > just have "Total execution stalls." twice.  Maybe "Cycles where nothing is
> > executed" would be ok for CYCLE_ACTIVITY.CYCLES_NO_EXECUTE?

Yes it's all in cycles

Will clarify in a future event list update.

-Andi


Re: [PATCH V3] perf script: add script to profile and resolve physical mem type

2018-01-05 Thread Andi Kleen
On Thu, Jan 04, 2018 at 12:59:55PM -0800, kan.li...@intel.com wrote:
> From: Kan Liang <kan.li...@intel.com>
> 
> There could be different types of memory in the system. E.g normal
> System Memory, Persistent Memory. To understand how the workload maps to
> those memories, it's important to know the I/O statistics of them.
> Perf can collect physical addresses, but those are raw data.
> It still needs extra work to resolve the physical addresses.
> Provide a script to facilitate the physical addresses resolving and
> I/O statistics.

Reviewed-by: Andi Kleen <a...@linux.intel.com>

-Andi


[PATCH] retpoline/modpost: Quieten MODVERSION retpoline build

2018-01-05 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

The internal retpoline thunks used by the compiler contain a dot.

They have to be exported, but modversions cannot handle them
because they don't have a prototype due to the C-incompatible
name (and it doesn't support asm("...")).

This leads to lots of warnings from modpost with a retpoline
build with MODVERSIONS enabled. The actual symbols load fine,
they just don't get versioned. That's not a problem here
because we don't expect them to change ever.

Quieten the respective warning messages in modpost for any
symbols containing a dot.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 scripts/mod/modpost.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 6510536c06df..7a06d2032ecf 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -693,7 +693,9 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 #endif
if (is_crc) {
const char *e = is_vmlinux(mod->name) ?"":".ko";
-   warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n", symname + strlen(CRC_PFX), mod->name, e);
+   const char *name = symname + strlen(CRC_PFX);
+   if (!strchr(name, '.'))
+   warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n", name, mod->name, e);
}
mod->unres = alloc_symbol(symname,
  ELF_ST_BIND(sym->st_info) == STB_WEAK,
@@ -2220,7 +2222,7 @@ static int add_versions(struct buffer *b, struct module *mod)
for (s = mod->unres; s; s = s->next) {
if (!s->module)
continue;
-   if (!s->crc_valid) {
+   if (!s->crc_valid && !strchr(s->name, '.')) {
warn("\"%s\" [%s.ko] has no CRC!\n",
s->name, mod->name);
continue;
-- 
2.14.3



Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support

2018-01-05 Thread Andi Kleen
> If the *compiler* uses the out-of-line version, that's a separate
> thing. But for our asm cases, let's just make it all be the inline
> case, ok?

Should be a simple change.

> 
> It also should simplify the whole target generation. None of this
> silly "__x86.indirect_thunk.\reg" crap with different targets for
> different register choices.
> 
> Hmm?

We need the different thunks anyways for the compiler generated code.
So it won't go away.
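
For reference, a per-register compiler thunk looks roughly like this
(sketch only; the ".rax" naming is assumed from the quoted
"__x86.indirect_thunk.\reg" macro and is not defined by these patches):

	.globl	__x86.indirect_thunk.rax
__x86.indirect_thunk.rax:
	call	.Ldo_call_rax
.Lspec_trap_rax:
	lfence				/* trap any speculation here */
	jmp	.Lspec_trap_rax
.Ldo_call_rax:
	mov	%rax, (%rsp)		/* replace return address with the real target */
	ret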

-Andi


Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support

2018-01-04 Thread Andi Kleen
> Clearly Paul's approach to retpoline without lfence is faster.
> I'm guessing it wasn't shared with amazon/intel until now and
> this set of patches going to adopt it, right?
> 
> Paul, could you share a link to a set of alternative gcc patches
> that do retpoline similar to llvm diff ?

I don't think it's a good idea to use any sequence not signed
off by CPU designers and extensively tested. 

While another one may work for most tests, it could always fail in
some corner case.

Right now we have the more heavy weight one and I would
suggest to stay with that one for now. Then worry
about more optimizations later.

Correctness first.

-Andi


Re: [PATCH v3 10/13] x86/retpoline/pvops: Convert assembler indirect jumps

2018-01-04 Thread Andi Kleen
On Thu, Jan 04, 2018 at 04:02:06PM +0100, Juergen Gross wrote:
> On 04/01/18 15:37, David Woodhouse wrote:
> > Convert pvops invocations to use non-speculative call sequences, when
> > CONFIG_RETPOLINE is enabled.
> > 
> > There is scope for future optimisation here — once the pvops methods are
> > actually set, we could just turn the damn things into *direct* jumps.
> > But this is perfectly sufficient for now, without that added complexity.
> 
> I don't see the need to modify the pvops calls.
> 
> All indirect calls are replaced by either direct calls or other code
> long before any user code is active.
> 
> For modules the replacements are in place before the module is being
> used.

Agreed. This shouldn't be needed.

-Andi


Re: [PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings

2018-01-04 Thread Andi Kleen
On Thu, Jan 04, 2018 at 10:06:01AM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 04, 2018 at 07:59:14AM -0800, Andi Kleen wrote:
> > > NAK.  We can't blindly disable objtool warnings, that will break
> > > livepatch and the ORC unwinder.  If you share a .o file (or the GCC
> > > code) I can look at adding retpoline support.
> > 
> > I don't think we can wait for that. We can disable livepatch and the
> > unwinder for now. They are not essential. Frame pointers should work
> > well enough for unwinding
> 
> If you want to make this feature conflict with livepatch and ORC,
> silencing objtool warnings is not the way to do it.

I don't see why it would conflict with the unwinder anyways?

It doesn't change the long term stack state, so it should be invisible to the 
unwinder (unless you crash in the thunk, which is very unlikely)

I actually got some unwinder backtraces during development and they seemed
to work.

> 
> > and afaik nobody can use livepatch in mainline anyways.
> 
> Why not?  The patch creation tooling is still out-of-tree, but livepatch
> itself is fully supported in mainline.

Ok.

Still doesn't seem critical at this point if it's some out of tree
thing.

-Andi


Re: [PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings

2018-01-04 Thread Andi Kleen
> NAK.  We can't blindly disable objtool warnings, that will break
> livepatch and the ORC unwinder.  If you share a .o file (or the GCC
> code) I can look at adding retpoline support.

I don't think we can wait for that. We can disable livepatch and the
unwinder for now. They are not essential. Frame pointers should work
well enough for unwinding and afaik nobody can use livepatch in mainline 
anyways.

-Andi


Re: Avoid speculative indirect calls in kernel

2018-01-04 Thread Andi Kleen
> +.macro JMP_THUNK reg:req
> +#ifdef RETPOLINE
> +   ALTERNATIVE __stringify(jmp __x86.indirect_thunk.\reg), 
> __stringify(jmp *%\reg), X86_FEATURE_IBRS_ATT
> +#else
> +   jmp *\reg
> +#endif
> +.endm

I removed that because what you're testing for doesn't exist in the tree yet.

Yes it can be added later.

Right now we just want a basic static version to work reliably.

-Andi


Re: [PATCH v2 06/12] x86/retpoline/crypto: Convert xen assembler indirect jumps

2018-01-03 Thread Andi Kleen
> > diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S 
> > b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > index 1743e6850e00..9cd8450a2050 100644
> > --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > @@ -12,6 +12,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> 
> I fail to connect this change to the patch title.
> 
> Maybe should be part of the crypto patch?

Right I moved the hunk into the wrong patch. Will fix.

-Andi


[PATCH v2 06/12] x86/retpoline/crypto: Convert xen assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in xen inline assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 1 +
 arch/x86/include/asm/xen/hypercall.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 1743e6850e00..9cd8450a2050 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -12,6 +12,7 @@
 
 #include 
 #include 
+#include 
 
 #define CAMELLIA_TABLE_BYTE_LEN 272
 
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 7cb282e9e587..91de35bcce5e 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -217,7 +218,7 @@ privcmd_call(unsigned call,
__HYPERCALL_5ARG(a1, a2, a3, a4, a5);
 
stac();
-   asm volatile("call *%[call]"
+   asm volatile(NOSPEC_CALL("%[call]")
 : __HYPERCALL_5PARAM
 : [call] "a" (_page[call])
 : __HYPERCALL_CLOBBER5);
-- 
2.14.3



[PATCH v2 05/12] x86/retpoline/hyperv: Convert assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in hyperv inline asm code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/include/asm/mshyperv.h | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 5400add2885b..2dfcdd401d4c 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent
@@ -186,7 +187,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
return U64_MAX;
 
__asm__ __volatile__("mov %4, %%r8\n"
-"call *%5"
+NOSPEC_CALL("%5")
 : "=a" (hv_status), ASM_CALL_CONSTRAINT,
   "+c" (control), "+d" (input_address)
 :  "r" (output_address), "m" (hv_hypercall_pg)
@@ -200,7 +201,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
if (!hv_hypercall_pg)
return U64_MAX;
 
-   __asm__ __volatile__("call *%7"
+   __asm__ __volatile__(NOSPEC_CALL("%7")
 : "=A" (hv_status),
   "+c" (input_address_lo), ASM_CALL_CONSTRAINT
 : "A" (control),
@@ -227,7 +228,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
 
 #ifdef CONFIG_X86_64
{
-   __asm__ __volatile__("call *%4"
+   __asm__ __volatile__(NOSPEC_CALL("%4")
 : "=a" (hv_status), ASM_CALL_CONSTRAINT,
   "+c" (control), "+d" (input1)
 : "m" (hv_hypercall_pg)
@@ -238,7 +239,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
u32 input1_hi = upper_32_bits(input1);
u32 input1_lo = lower_32_bits(input1);
 
-   __asm__ __volatile__ ("call *%5"
+   __asm__ __volatile__ (NOSPEC_CALL("%5")
  : "=A"(hv_status),
"+c"(input1_lo),
ASM_CALL_CONSTRAINT
-- 
2.14.3



[PATCH v2 01/12] x86/retpoline: Define retpoline indirect thunk and macros

2018-01-03 Thread Andi Kleen
From: Dave Hansen 

From: David Woodhouse 

retpoline is a special sequence on Intel CPUs to stop speculation for
indirect branches.

Provide assembler infrastructure to use retpoline by the compiler
and for assembler. We add the out-of-line trampoline used by the
compiler, and NOSPEC_JUMP / NOSPEC_CALL macros for assembler.

[Originally from David and Tim, heavily hacked by AK]

v2: Add CONFIG_RETPOLINE option
Signed-off-by: David Woodhouse 
Signed-off-by: Tim Chen 
Signed-off-by: Andi Kleen 
---
 arch/x86/Kconfig|  8 +
 arch/x86/include/asm/jump-asm.h | 70 +
 arch/x86/kernel/vmlinux.lds.S   |  1 +
 arch/x86/lib/Makefile   |  1 +
 arch/x86/lib/retpoline.S| 35 +
 5 files changed, 115 insertions(+)
 create mode 100644 arch/x86/include/asm/jump-asm.h
 create mode 100644 arch/x86/lib/retpoline.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d4fc98c50378..8b0facfa35be 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -429,6 +429,14 @@ config GOLDFISH
def_bool y
depends on X86_GOLDFISH
 
+config RETPOLINE
+   bool "Avoid speculative indirect branches in kernel"
+   default y
+   help
+ Compile kernel with the retpoline compiler options to guard against
+ kernel to user data leaks by avoiding speculative indirect
+ branches. Requires a new enough compiler. The kernel may run slower.
+
 config INTEL_RDT
bool "Intel Resource Director Technology support"
default n
diff --git a/arch/x86/include/asm/jump-asm.h b/arch/x86/include/asm/jump-asm.h
new file mode 100644
index ..936fa620f346
--- /dev/null
+++ b/arch/x86/include/asm/jump-asm.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef JUMP_ASM_H
+#define JUMP_ASM_H 1
+
+#ifdef __ASSEMBLY__
+
+#ifdef CONFIG_RETPOLINE
+
+/*
+ * Jump to an indirect pointer without speculation.
+ *
+ * The out of line __x86.indirect_thunk has special code sequences
+ * to stop speculation.
+ */
+
+.macro NOSPEC_JMP target
+   push\target
+   jmp __x86.indirect_thunk
+.endm
+
+
+/*
+ * Call an indirect pointer without speculation.
+ */
+
+.macro NOSPEC_CALL target
+   jmp 1221f
+1222:
+   push\target
+   jmp __x86.indirect_thunk
+1221:
+   call1222b
+.endm
+
+#else /* CONFIG_RETPOLINE */
+
+.macro NOSPEC_JMP target
+   jmp *\target
+.endm
+
+.macro NOSPEC_CALL target
+   call *\target
+.endm
+
+#endif /* !CONFIG_RETPOLINE */
+
+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_RETPOLINE
+
+#define NOSPEC_JMP(t) \
+   "push " t "; "  \
+   "jmp __x86.indirect_thunk; "
+
+#define NOSPEC_CALL(t) \
+   "   jmp 1221f; "\
+   "1222:  push " t ";"\
+   "   jmp __x86.indirect_thunk;"  \
+   "1221:  call 1222b;"
+
+#else /* CONFIG_RETPOLINE */
+
+#define NOSPEC_JMP(t) "jmp *" t "; "
+#define NOSPEC_CALL(t) "call *" t "; "
+
+#endif /* !CONFIG_RETPOLINE */
+
+#endif /* !__ASSEMBLY */
+
+#endif
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1e413a9326aa..2e64241a6664 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -103,6 +103,7 @@ SECTIONS
/* bootstrapping code */
HEAD_TEXT
. = ALIGN(8);
+   *(.text.__x86.indirect_thunk)
TEXT_TEXT
SCHED_TEXT
CPUIDLE_TEXT
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 7b181b61170e..f23934bbaf4e 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -26,6 +26,7 @@ lib-y += memcpy_$(BITS).o
 lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
 lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o
 lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+lib-$(CONFIG_RETPOLINE) += retpoline.o
 
 obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
 
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
new file mode 100644
index ..cb40781adbfe
--- /dev/null
+++ b/arch/x86/lib/retpoline.S
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * Out of line jump trampoline for calls that disable speculation.
+ *
+ * This is a special sequence that prevents the CPU speculating
+ * for indirect calls.
+ *
+ * This can be called by gcc generated code, or with the asm macros
+ * in asm/jump-asm.h
+ */
+
+#include 
+#include 
+#include 
+
+   .section.text.__x86.indirect_thunk,"ax"
+
+ENTRY(__x86.indirect_thunk)
+   CFI_STARTPROC
+   callretpoline_call_target
+2:
+   lfence  /* stop speculation */
+   jmp 2b
+retpoline_call_target:
+#ifdef CONFIG_64BIT
+   lea 8(%rsp), %rsp
+#else
+   lea 4(%esp), %esp
+#endif
+   ret
+   CFI_ENDPROC
+ENDPROC(__x86.indirect_thunk)
+
+   EXPORT_SYMBOL(__x86.indirect_thunk)
-- 
2.14.3
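
For reference (this note and the sketch below are not part of the patch): the
sequences above replace ordinary indirect branches, i.e. any call or jump
through a function pointer. The thunk pushes the real target and then executes
a ret whose only speculative prediction is the lfence loop, so the CPU cannot
be steered to attacker-chosen code while the real target is resolved. A trivial
C illustration of the kind of call site that is affected:

/*
 * Illustration only: a plain indirect call site.  Built with the
 * -mindirect-branch=thunk-extern option from the later Makefile patch
 * (or converted by hand with NOSPEC_CALL in assembler), the indirect
 * "call *%reg" that gcc would normally emit for op(x) becomes a call
 * through __x86.indirect_thunk instead.
 */
static int dispatch(int (*op)(int), int x)
{
        return op(x);           /* indirect call through a pointer */
}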



[PATCH v2 12/12] retpoline: Attempt to quieten objtool warning for unreachable code

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

The speculative jump trampoline has to contain unreachable code.
objtool keeps complaining

arch/x86/lib/retpoline.o: warning: objtool: __x86.indirect_thunk()+0x8: 
unreachable instruction

I tried to fix it here by adding an ASM_UNREACHABLE annotation (after
adding support for it in pure assembler), but objtool still complains.
Seems like an objtool bug?

So this doesn't actually fix the warning yet.
Of course it's just a warning, so the kernel will still work fine.

Perhaps Josh can figure it out.

Cc: jpoim...@redhat.com
Not-Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/lib/retpoline.S |  3 +++
 include/linux/compiler.h | 10 +-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index cb40781adbfe..f4ed556a90ee 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -11,6 +11,7 @@
  */
 
 #include 
+#include <linux/compiler.h>
 #include 
 #include 
 
@@ -21,7 +22,9 @@ ENTRY(__x86.indirect_thunk)
callretpoline_call_target
 2:
lfence  /* stop speculation */
+   ASM_UNREACHABLE
jmp 2b
+   ASM_UNREACHABLE
 retpoline_call_target:
 #ifdef CONFIG_64BIT
lea 8(%rsp), %rsp
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 52e611ab9a6c..cfba91acc79a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -269,7 +269,15 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
 
 #endif /* __KERNEL__ */
 
-#endif /* __ASSEMBLY__ */
+#else /* __ASSEMBLY__ */
+
+#define ASM_UNREACHABLE \
+   999:\
+   .pushsection .discard.unreachable;  \
+   .long 999b - .; \
+   .popsection
+
+#endif /* !__ASSEMBLY__ */
 
 /* Compile time object size, -1 for unknown */
 #ifndef __compiletime_object_size
-- 
2.14.3
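
For context (not part of the patch): the new ASM_UNREACHABLE macro emits the
same kind of annotation the existing C-side unreachable markers use, a local
label whose offset is recorded in the .discard.unreachable section that objtool
consults when deciding which instructions it may ignore. A rough C sketch of
the same idea, using a hypothetical macro name:

/* Hypothetical sketch, mirroring ASM_UNREACHABLE from C code: record
 * the current code address in .discard.unreachable so objtool treats
 * the instructions here as intentionally unreachable. */
#define UNREACHABLE_HINT()                                      \
        asm volatile("999:\n\t"                                 \
                     ".pushsection .discard.unreachable\n\t"    \
                     ".long 999b - .\n\t"                       \
                     ".popsection\n\t")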



[PATCH v2 03/12] x86/retpoline/entry: Convert entry assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in core 32/64bit entry assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_32.S |  5 +++--
 arch/x86/entry/entry_64.S | 12 +++-
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ace8f321a5a1..a4b88260d6f7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include <asm/jump-asm.h>
 
.section .entry.text, "ax"
 
@@ -290,7 +291,7 @@ ENTRY(ret_from_fork)
 
/* kernel thread */
 1: movl%edi, %eax
-   call*%ebx
+   NOSPEC_CALL %ebx
/*
 * A kernel thread is allowed to return here after successfully
 * calling do_execve().  Exit to userspace to complete the execve()
@@ -919,7 +920,7 @@ common_exception:
movl%ecx, %es
TRACE_IRQS_OFF
movl%esp, %eax  # pt_regs pointer
-   call*%edi
+   NOSPEC_CALL %edi
jmp ret_from_exception
 END(common_exception)
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f048e384ff54..486990fb3e4d 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include <asm/jump-asm.h>
 #include 
 
 #include "calling.h"
@@ -191,7 +192,7 @@ ENTRY(entry_SYSCALL_64_trampoline)
 */
pushq   %rdi
movq$entry_SYSCALL_64_stage2, %rdi
-   jmp *%rdi
+   NOSPEC_JMP  %rdi
 END(entry_SYSCALL_64_trampoline)
 
.popsection
@@ -269,8 +270,9 @@ entry_SYSCALL_64_fastpath:
 * This call instruction is handled specially in stub_ptregs_64.
 * It might end up jumping to the slow path.  If it jumps, RAX
 * and all argument registers are clobbered.
-*/
-   call*sys_call_table(, %rax, 8)
+   */
+   movqsys_call_table(, %rax, 8), %rax
+   NOSPEC_CALL %rax
 .Lentry_SYSCALL_64_after_fastpath_call:
 
movq%rax, RAX(%rsp)
@@ -442,7 +444,7 @@ ENTRY(stub_ptregs_64)
jmp entry_SYSCALL64_slow_path
 
 1:
-   jmp *%rax   /* Called from C */
+   NOSPEC_JMP  %rax/* Called from C */
 END(stub_ptregs_64)
 
 .macro ptregs_stub func
@@ -521,7 +523,7 @@ ENTRY(ret_from_fork)
 1:
/* kernel thread */
movq%r12, %rdi
-   call*%rbx
+   NOSPEC_CALL %rbx
/*
 * A kernel thread is allowed to return here after successfully
 * calling do_execve().  Exit to userspace to complete the execve()
-- 
2.14.3



[PATCH v2 02/12] x86/retpoline/crypto: Convert crypto assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in crypto assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/crypto/aesni-intel_asm.S| 5 +++--
 arch/x86/crypto/camellia-aesni-avx-asm_64.S  | 3 ++-
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 2 +-
 arch/x86/crypto/crc32c-pcl-intel-asm_64.S| 3 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_asm.S 
b/arch/x86/crypto/aesni-intel_asm.S
index 16627fec80b2..f6c964f30ede 100644
--- a/arch/x86/crypto/aesni-intel_asm.S
+++ b/arch/x86/crypto/aesni-intel_asm.S
@@ -32,6 +32,7 @@
 #include 
 #include 
 #include 
+#include <asm/jump-asm.h>
 
 /*
  * The following macros are used to move an (un)aligned 16 byte value to/from
@@ -2884,7 +2885,7 @@ ENTRY(aesni_xts_crypt8)
pxor INC, STATE4
movdqu IV, 0x30(OUTP)
 
-   call *%r11
+   NOSPEC_CALL %r11
 
movdqu 0x00(OUTP), INC
pxor INC, STATE1
@@ -2929,7 +2930,7 @@ ENTRY(aesni_xts_crypt8)
_aesni_gf128mul_x_ble()
movups IV, (IVP)
 
-   call *%r11
+   NOSPEC_CALL %r11
 
movdqu 0x40(OUTP), INC
pxor INC, STATE1
diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
index f7c495e2863c..9006543227b2 100644
--- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S
@@ -17,6 +17,7 @@
 
 #include 
 #include 
+#include <asm/jump-asm.h>
 
 #define CAMELLIA_TABLE_BYTE_LEN 272
 
@@ -1227,7 +1228,7 @@ camellia_xts_crypt_16way:
vpxor 14 * 16(%rax), %xmm15, %xmm14;
vpxor 15 * 16(%rax), %xmm15, %xmm15;
 
-   call *%r9;
+   NOSPEC_CALL %r9;
 
addq $(16 * 16), %rsp;
 
diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S 
b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index eee5b3982cfd..1743e6850e00 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -1343,7 +1343,7 @@ camellia_xts_crypt_32way:
vpxor 14 * 32(%rax), %ymm15, %ymm14;
vpxor 15 * 32(%rax), %ymm15, %ymm15;
 
-   call *%r9;
+   NOSPEC_CALL %r9;
 
addq $(16 * 32), %rsp;
 
diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S 
b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
index 7a7de27c6f41..969000c64456 100644
--- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
+++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S
@@ -45,6 +45,7 @@
 
 #include 
 #include 
+#include <asm/jump-asm.h>
 
 ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction
 
@@ -172,7 +173,7 @@ continue_block:
movzxw  (bufp, %rax, 2), len
lea crc_array(%rip), bufp
lea (bufp, len, 1), bufp
-   jmp *bufp
+   NOSPEC_JMP bufp
 

## 2a) PROCESS FULL BLOCKS:
-- 
2.14.3



[PATCH v2 07/12] x86/retpoline/checksum32: Convert assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in 32bit checksum assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/lib/checksum_32.S | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 4d34bb548b41..d19862183d4b 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include <asm/jump-asm.h>

 /*
  * computes a partial checksum, e.g. for TCP/UDP fragments
@@ -156,7 +157,7 @@ ENTRY(csum_partial)
negl %ebx
lea 45f(%ebx,%ebx,2), %ebx
testl %esi, %esi
-   jmp *%ebx
+   NOSPEC_JMP %ebx
 
# Handle 2-byte-aligned regions
 20:addw (%esi), %ax
@@ -439,7 +440,7 @@ ENTRY(csum_partial_copy_generic)
andl $-32,%edx
lea 3f(%ebx,%ebx), %ebx
testl %esi, %esi 
-   jmp *%ebx
+   NOSPEC_JMP %ebx
 1: addl $64,%esi
addl $64,%edi 
SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl)
-- 
2.14.3



Avoid speculative indirect calls in kernel

2018-01-03 Thread Andi Kleen
This is a fix for Variant 2 in 
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Any speculative indirect call in the kernel can be tricked
into executing arbitrary kernel code, which may allow side channel
attacks that leak arbitrary kernel data.

So we want to avoid speculative indirect calls in the kernel.

There's a special code sequence called a retpoline that can
do indirect calls without speculation. We use a new compiler
option -mindirect-branch=thunk-extern (gcc patch will be released
separately) to recompile the kernel with this new sequence.

We also patch all the assembler code in the kernel to use
the new sequence.

The patches were originally from David Woodhouse and Tim Chen,
but then reworked and enhanced by me.

No performance numbers at this point. 32bit is only boot tested.

Git tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc spec/retpoline-415-2

v1: Initial post.
v2:
Add CONFIG_RETPOLINE to build kernel without it.
Change warning messages.
Hide modpost warning message



[PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

With the indirect call thunks enabled in the compiler, two objtool
warnings are triggered very frequently and make the build
very noisy.

I don't see a good way to avoid them, so just disable them
for now.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 tools/objtool/check.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 9b341584eb1b..435c71f944dc 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -503,8 +503,13 @@ static int add_call_destinations(struct objtool_file *file)
insn->call_dest = find_symbol_by_offset(insn->sec,
dest_off);
if (!insn->call_dest) {
+#if 0
+   /* Compilers with 
-mindirect-branch=thunk-extern trigger
+* this everywhere on x86. Disable for now.
+*/
WARN_FUNC("can't find call dest symbol at 
offset 0x%lx",
  insn->sec, insn->offset, dest_off);
+#endif
return -1;
}
} else if (rela->sym->type == STT_SECTION) {
@@ -1716,8 +1721,14 @@ static int validate_branch(struct objtool_file *file, 
struct instruction *first,
return 1;
 
} else if (func && has_modified_stack_frame()) {
+#if 0
+   /* Compilers with 
-mindirect-branch=thunk-extern trigger
+* this everywhere on x86. Disable for now.
+*/
+
WARN_FUNC("sibling call from callable 
instruction with modified stack frame",
  sec, insn->offset);
+#endif
return 1;
}
 
-- 
2.14.3



[PATCH v2 09/12] x86/retpoline: Finally enable retpoline for C code

2018-01-03 Thread Andi Kleen
From: Dave Hansen 

From: David Woodhouse 

Add the retpoline compile option to the Makefile.

Update the Makefile with the retpoline compile options.  This requires a gcc
with the retpoline compiler patches enabled.

Print a warning when the compiler doesn't support retpoline.

[Originally from David and Tim, but hacked by AK]

v2: Use CONFIG option to enable
Signed-off-by: David Woodhouse 
Signed-off-by: Tim Chen 
---
 arch/x86/Makefile | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3e73bc255e4e..b02e35350244 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -230,6 +230,17 @@ KBUILD_CFLAGS += -Wno-sign-compare
 #
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 
+#
+ifdef CONFIG_RETPOLINE
+   RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern)
+   ifneq ($(RETPOLINE_CFLAGS),)
+   KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+   KBUILD_AFLAGS += -DRETPOLINE
+   else
+$(warning Retpoline not supported in compiler. System may be 
insecure.)
+   endif
+endif
+
 archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/x86/tools relocs
 
-- 
2.14.3



[PATCH v2 10/12] retpoline/taint: Taint kernel for missing retpoline in compiler

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

When the kernel or a module hasn't been compiled with a retpoline-aware
compiler, print a warning and set a taint flag.

For modules this is checked at build time; however, it cannot cover
assembler or other non-compiled objects used in the module link.

For lack of a better letter it uses taint flag 'Z'.

v2: Change warning message
Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 Documentation/admin-guide/tainted-kernels.rst |  3 +++
 arch/x86/kernel/setup.c   |  6 ++
 include/linux/kernel.h|  4 +++-
 kernel/module.c   | 11 ++-
 kernel/panic.c|  1 +
 scripts/mod/modpost.c |  9 +
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst 
b/Documentation/admin-guide/tainted-kernels.rst
index 1df03b5cb02f..800261b6bd6f 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -52,6 +52,9 @@ characters, each representing a particular tainted value.
 
  16) ``K`` if the kernel has been live patched.
 
+ 17) ``Z`` if the x86 kernel or a module hasn't been compiled with
+ a retpoline aware compiler and may be vulnerable to data leaks.
+
 The primary reason for the **'Tainted: '** string is to tell kernel
 debuggers if this is a clean kernel or if anything unusual has
 occurred.  Tainting is permanent: even if an offending module is
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8af2e8d0c0a1..cc880b46b756 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1296,6 +1296,12 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
unwind_init();
+
+#ifndef RETPOLINE
+   add_taint(TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+   pr_warn("No support for retpoline in kernel compiler\n");
+   pr_warn("System may be vulnerable to data leaks.\n");
+#endif
 }
 
 #ifdef CONFIG_X86_32
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index ce51455e2adf..fbb4d3baffcc 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -550,7 +550,9 @@ extern enum system_states {
 #define TAINT_SOFTLOCKUP   14
 #define TAINT_LIVEPATCH15
 #define TAINT_AUX  16
-#define TAINT_FLAGS_COUNT  17
+#define TAINT_NO_RETPOLINE 17
+
+#define TAINT_FLAGS_COUNT  18
 
 struct taint_flag {
char c_true;/* character printed when tainted */
diff --git a/kernel/module.c b/kernel/module.c
index dea01ac9cb74..92db3f59a29a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3028,7 +3028,16 @@ static int check_modinfo(struct module *mod, struct 
load_info *info, int flags)
mod->name);
add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
}
-
+#ifdef RETPOLINE
+   if (!get_modinfo(info, "retpoline")) {
+   if (!test_taint(TAINT_NO_RETPOLINE)) {
+   pr_warn("%s: loading module not compiled with retpoline 
compiler.\n",
+   mod->name);
+   pr_warn("Kernel may be vulnerable to data leaks.\n");
+   }
+   add_taint_module(mod, TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+   }
+#endif
if (get_modinfo(info, "staging")) {
add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
pr_warn("%s: module is from the staging directory, the quality "
diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..6686c67b6e4b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -325,6 +325,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
{ 'L', ' ', false },/* TAINT_SOFTLOCKUP */
{ 'K', ' ', true }, /* TAINT_LIVEPATCH */
{ 'X', ' ', true }, /* TAINT_AUX */
+   { 'Z', ' ', true }, /* TAINT_NO_RETPOLINE */
 };
 
 /**
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index f51cf977c65b..6510536c06df 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2165,6 +2165,14 @@ static void add_intree_flag(struct buffer *b, int 
is_intree)
buf_printf(b, "\nMODULE_INFO(intree, \"Y\");\n");
 }
 
+/* Cannot check for assembler */
+static void add_retpoline(struct buffer *b)
+{
+   buf_printf(b, "\n#ifdef RETPOLINE\n");
+   buf_printf(b, "MODULE_INFO(retpoline, \"Y\");\n");
+   buf_printf(b, "#endif\n");
+}
+
 static void add_staging_flag(struct buffer *b, const char *name)
 {
static const char *staging_dir = "drivers/staging";
@@ -2506,6 +2514,7 @@ int main(int argc, char **argv)
err |= check_modname_len(mod);
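
As a usage note (not part of the patch): once the new flag exists, other kernel
code can query it with the standard taint helpers from linux/kernel.h. A
minimal, hypothetical sketch:

#include <linux/kernel.h>

/* Hypothetical helper: report whether this kernel, or any module loaded
 * so far, was built without a retpoline-aware compiler, based on the
 * new TAINT_NO_RETPOLINE flag. */
static void report_retpoline_taint(void)
{
        if (test_taint(TAINT_NO_RETPOLINE))
                pr_warn("retpoline: built without compiler support\n");
        else
                pr_info("retpoline: built with retpoline compiler\n");
}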
   

[PATCH v2 04/12] x86/retpoline/ftrace: Convert ftrace assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in ftrace assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/kernel/ftrace_32.S | 3 ++-
 arch/x86/kernel/ftrace_64.S | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index b6c6468e10bc..c0c10ba53375 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include <asm/jump-asm.h>
 
 #ifdef CC_USING_FENTRY
 # define function_hook __fentry__
@@ -241,5 +242,5 @@ return_to_handler:
movl%eax, %ecx
popl%edx
popl%eax
-   jmp *%ecx
+   NOSPEC_JMP  %ecx
 #endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index c832291d948a..4814c4b30b8d 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-
+#include <asm/jump-asm.h>
 
.code64
.section .entry.text, "ax"
@@ -286,7 +286,7 @@ trace:
 * ip and parent ip are used and the list function is called when
 * function tracing is enabled.
 */
-   call   *ftrace_trace_function
+   NOSPEC_CALL ftrace_trace_function
 
restore_mcount_regs
 
@@ -329,5 +329,5 @@ GLOBAL(return_to_handler)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $24, %rsp
-   jmp *%rdi
+   NOSPEC_JMP %rdi
 #endif
-- 
2.14.3
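
One detail worth spelling out (explanation only, not part of the patch): unlike
most sites in this series, ftrace_trace_function is a memory operand rather
than a register. That still works because the NOSPEC_CALL sequence starts with
a push, and push accepts a memory operand, so the stored pointer is pushed and
the thunk then returns into it. The C shape of such a site, with hypothetical
names, is simply an indirect call through a writable global function pointer:

/* Hypothetical sketch of an ftrace-style hook site: a global function
 * pointer that is called indirectly when tracing is enabled. */
extern void (*trace_hook)(unsigned long ip, unsigned long parent_ip);

static void maybe_trace(unsigned long ip, unsigned long parent_ip)
{
        if (trace_hook)
                trace_hook(ip, parent_ip);      /* indirect call site */
}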



[PATCH v2 08/12] x86/retpoline/irq32: Convert assembler indirect jumps

2018-01-03 Thread Andi Kleen
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in 32bit irq inline asm code to use
non speculative sequences.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/kernel/irq_32.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a83b3346a0e1..2dce2fdd2c4e 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
 #include 
 
 #include 
+#include <asm/jump-asm.h>
 
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
 
@@ -55,7 +56,7 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
 static void call_on_stack(void *func, void *stack)
 {
asm volatile("xchgl %%ebx,%%esp \n"
-"call  *%%edi  \n"
+NOSPEC_CALL("%%edi")
 "movl  %%ebx,%%esp \n"
 : "=b" (stack)
 : "0" (stack),
@@ -95,7 +96,7 @@ static inline int execute_on_irq_stack(int overflow, struct 
irq_desc *desc)
call_on_stack(print_stack_overflow, isp);
 
asm volatile("xchgl %%ebx,%%esp \n"
-"call  *%%edi  \n"
+NOSPEC_CALL("%%edi")
 "movl  %%ebx,%%esp \n"
 : "=a" (arg1), "=b" (isp)
 :  "0" (desc),   "1" (isp),
-- 
2.14.3
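
The C-side NOSPEC_CALL() used here is a string macro, so it can be pasted
directly into an asm statement. A minimal stand-alone sketch of the same
pattern for 32-bit code follows; this is a hypothetical helper, not from the
patch, and it omits the stack switching that the real call_on_stack() does:

#include <asm/jump-asm.h>

/* Call a handler held in %ebx without a predicted indirect branch.
 * %ebx is callee-saved, so only the usual 32-bit caller-clobbered
 * registers and memory need to be listed as clobbers. */
static void run_handler_nospec(void (*handler)(void))
{
        asm volatile(NOSPEC_CALL("%%ebx")
                     : "+b" (handler)
                     :
                     : "eax", "ecx", "edx", "memory", "cc");
}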



Re: Avoid speculative indirect calls in kernel

2018-01-03 Thread Andi Kleen
> So you say, that we finally need a perl interpreter in the kernel to do
> alternative patching?

I don't think perl or objtool makes sense. That would be just incredibly
fragile because compilers can reorder and mix code. 

It could be done with a gcc change I suppose. That should be reliable.

But that would need to be developed first. We don't have it right now.

As a first step a compile-time approach should be sufficient.
We can add a CONFIG option so people can choose at compile time.

Then later we can investigate run time patching.

-Andi


Re: [PATCH] Fix read buffer overflow in delta-ipc

2018-01-03 Thread Andi Kleen
On Wed, Jan 03, 2018 at 09:40:04AM +, Hugues FRUCHET wrote:
> Hi Andi,
> Thanks for the patch but I would suggest to use strlcpy instead, this 
> will guard msg.name overwriting and add the NULL termination in case
> of truncation:
> - memcpy(msg.name, name, sizeof(msg.name));
> - msg.name[sizeof(msg.name) - 1] = 0;
> + strlcpy(msg.name, name, sizeof(msg.name));

I'm not an expert on your setup, but it seems strlcpy would leak some
uninitialized stack data over your IPC mechanism. strlcpy doesn't pad the
data. If the IPC is a security boundary, that would be a security bug.

So I think the original patch is better than strlcpy.

-Andi
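
To make the trade-off concrete (illustration only, this is not the delta-ipc
patch): a copy helper can avoid both the over-read that started this thread and
the padding leak described above by bounding the read with strnlen() and
explicitly zero-filling the tail. Roughly, assuming a fixed-size destination
field:

#include <linux/string.h>

/* Hypothetical helper: copy at most dst_len - 1 bytes of name, then
 * zero the remainder so no uninitialized bytes cross the IPC boundary. */
static void copy_name_padded(char *dst, size_t dst_len, const char *name)
{
        size_t n = strnlen(name, dst_len - 1);

        memcpy(dst, name, n);
        memset(dst + n, 0, dst_len - n);
}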


Re: Avoid speculative indirect calls in kernel

2018-01-03 Thread Andi Kleen
> It should be a CPU_BUG bit as we have for the other mess. And that can be
> used for patching.

It has to be done at compile time because it requires a compiler option.

Most of the indirect calls are in C code.

So it cannot just be patched in, only partially out.

-Andi


Re: Avoid speculative indirect calls in kernel

2018-01-03 Thread Andi Kleen
Hi Linus,

On Wed, Jan 03, 2018 at 03:51:35PM -0800, Linus Torvalds wrote:
> On Wed, Jan 3, 2018 at 3:09 PM, Andi Kleen <a...@firstfloor.org> wrote:
> > This is a fix for Variant 2 in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> >
> > Any speculative indirect calls in the kernel can be tricked
> > to execute any kernel code, which may allow side channel
> > attacks that can leak arbitrary kernel data.
> 
> Why is this all done without any configuration options?

I was thinking of a config option, but I was struggling with a name.

CONFIG_INSECURE_KERNEL, CONFIG_LEAK_MEMORY? 

And should it be positive or negative?

So I opted to be secure unconditionally.

It would be simple to add, however; all the hooks are either in the Makefile
or in asm/jump-asm.h.

>  - these workarounds should have a way to disable them.
> 

There will soon be patches adding other mitigations, and they include a way
to patch out most of the retpoline overhead at runtime
(basically replacing the trampoline with a plain ret).

We just wanted to get the retpoline code out first because
it's the most basic and most widely applicable fix.

-Andi



Avoid speculative indirect calls in kernel

2018-01-03 Thread Andi Kleen
This is a fix for Variant 2 in 
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Any speculative indirect call in the kernel can be tricked
into executing arbitrary kernel code, which may allow side channel
attacks that leak arbitrary kernel data.

So we want to avoid speculative indirect calls in the kernel.

There's a special code sequence called a retpoline that can
do indirect calls without speculation. We use a new compiler
option -mindirect-branch=thunk-extern (gcc patch will be released
separately) to recompile the kernel with this new sequence.

We also patch all the assembler code in the kernel to use
the new sequence.

The patches were originally from David Woodhouse and Tim Chen,
but then reworked and enhanced by me.

No performance numbers at this point. 32bit is only boot tested.

Git tree available in 
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc spec/retpoline-415-1

v1: Initial post.

