[PATCH v1 5/8] x86/entry/clearregs: Clear registers for 64bit exceptions/interrupts
From: Andi Kleen <a...@linux.intel.com>

Clear all registers on entering the 64bit kernel for exceptions and
interrupts. Since there are no arguments this is fairly simple.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_64.S | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 632081fd7086..6ab4c2aaeabb 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -636,6 +636,7 @@ END(irq_entries_start)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	CLEAR_ALL_REGS
 	ENCODE_FRAME_POINTER
 
 	testb	$3, CS(%rsp)
@@ -1192,6 +1193,7 @@ ENTRY(xen_failsafe_callback)
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
 	SAVE_EXTRA_REGS
+	CLEAR_ALL_REGS
 	ENCODE_FRAME_POINTER
 	jmp	error_exit
 END(xen_failsafe_callback)
@@ -1237,6 +1239,7 @@ ENTRY(paranoid_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	CLEAR_ALL_REGS
 	ENCODE_FRAME_POINTER 8
 	movl	$1, %ebx
 	movl	$MSR_GS_BASE, %ecx
@@ -1289,6 +1292,7 @@ ENTRY(error_entry)
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
+	CLEAR_ALL_REGS
 	ENCODE_FRAME_POINTER 8
 	xorl	%ebx, %ebx
 	testb	$3, CS+8(%rsp)
@@ -1487,6 +1491,7 @@ ENTRY(nmi)
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
 	UNWIND_HINT_REGS
+	CLEAR_ALL_REGS
 	ENCODE_FRAME_POINTER

 	/*
-- 
2.14.3
[PATCH v1 6/8] x86/entry/clearregs: Add number of arguments to syscall tables
From: Andi Kleen <a...@linux.intel.com>

In order to sanitize the system call arguments properly we need to know
the number of arguments for each syscall. Add a new column to the 32bit
and 64bit syscall tables to list the number of arguments. Also fix the
generation script to not confuse the number with a compat entry.

Generated with some scripting and quick review (but more eyeballs would
be appreciated).

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/syscalls/syscall_32.tbl | 726 ++++++++++++++++----------------
 arch/x86/entry/syscalls/syscall_64.tbl | 708 ++++++++++++++++----------------
 arch/x86/entry/syscalls/syscalltbl.sh  |   7 +-
 3 files changed, 723 insertions(+), 718 deletions(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 448ac2161112..c3a4480365dd 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -6,388 +6,388 @@
 #
 # The abi is always "i386" for this file.
 #
-0	i386	restart_syscall	sys_restart_syscall
-1	i386	exit	sys_exit
-2	i386	fork	sys_fork	sys_fork
-3	i386	read	sys_read
-4	i386	write	sys_write
-5	i386	open	sys_open	compat_sys_open
-6	i386	close	sys_close
-7	i386	waitpid	sys_waitpid	sys32_waitpid
-8	i386	creat	sys_creat
-9	i386	link	sys_link
-10	i386	unlink	sys_unlink
-11	i386	execve	sys_execve	compat_sys_execve
-12	i386	chdir	sys_chdir
-13	i386	time	sys_time	compat_sys_time
-14	i386	mknod	sys_mknod
-15	i386	chmod	sys_chmod
-16	i386	lchown	sys_lchown16
+0	i386	restart_syscall	sys_restart_syscall	0
+1	i386	exit	sys_exit	1
+2	i386	fork	sys_fork	sys_fork	0
+3	i386	read	sys_read	3
+4	i386	write	sys_write	3
+5	i386	open	sys_open	compat_sys_open	3
+6	i386	close	sys_close	1
+7	i386	waitpid	sys_waitpid	sys32_waitpid	3
+8	i386	creat	sys_creat	2
+9	i386	link	sys_link	2
+10	i386	unlink	sys_unlink	1
+11	i386	execve	sys_execve	compat_sys_execve	3
+12	i386	chdir	sys_chdir	1
+13	i386	time	sys_time	compat_sys_time	1
+14	i386	mknod	sys_mknod	3
+15	i386	chmod	sys_chmod	2
+16	i386	lchown	sys_lchown16	3
 17	i386	break
-18	i386	oldstat	sys_stat
-19	i386	lseek	sys_lseek	compat_sys_lseek
-20	i386	getpid	sys_getpid
-21	i386	mount	sys_mount	compat_sys_mount
-22	i386	umount	sys_oldumount
-23	i386	setuid	sys_setuid16
-24	i386	getuid	sys_getuid16
-25	i386	stime	sys_stime	compat_sys_stime
-26	i386	ptrace	sys_ptrace	compat_sys_ptrace
-27	i386	alarm	sys_alarm
-28	i386	oldfstat	sys_fstat
-29	i386	pause	sys_pause
-30	i386	utime	sys_utime	compat_sys_utime
+18	i386	oldstat	sys_stat	2
+19	i386	lseek	sys_lseek	compat_sys_lseek	3
+20	i386	getpid	sys_getpid	0
+21	i386	mount	sys_mount	compat_sys_mount	5
+22	i386	umount	sys_oldumount	2
+23	i386	setuid	sys_setuid16	1
+24	i386	getuid	sys_getuid16	0
+25	i386	stime	sys_stime	compat_sys_stime	1
+26	i386	ptrace	sys_ptrace	compat_sys_ptrace	4
+27	i386	alarm	sys_alarm	1
+28	i386	oldfstat	sys_fstat	2
+29	i386	pause	sys_pause	0
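The changelog above notes that the generation script had to be fixed "to not confuse the number with a compat entry": with the new fifth column, the last field of a table line is sometimes an argument count and sometimes a compat entry name. As an illustration only (the kernel's actual `syscalltbl.sh` is a shell script; the function and field names here are assumptions), a parser can disambiguate by checking whether the trailing field is numeric:

```python
def parse_tbl_line(line):
    """Parse one line of the extended syscall table.

    Columns: number, abi, name, [entry point], [compat entry], [nargs].
    The trailing field is the argument count only if it is purely numeric;
    otherwise it is a compat entry.
    """
    fields = line.split()
    nr, abi, name = fields[0], fields[1], fields[2]
    entry = fields[3] if len(fields) > 3 else None
    compat, nargs = None, None
    rest = fields[4:]
    if rest and rest[-1].isdigit():
        # Numeric trailing field: the new argument-count column.
        nargs = int(rest[-1])
        rest = rest[:-1]
    if rest:
        # Whatever remains between entry point and nargs is the compat entry.
        compat = rest[0]
    return nr, abi, name, entry, compat, nargs
```

For example, `5 i386 open sys_open compat_sys_open 3` yields a compat entry *and* an argument count, while `20 i386 getpid sys_getpid 0` has only the count.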
[PATCH v1 4/8] x86/entry/retpoline: Clear extra registers for compat syscalls
From: Andi Kleen <a...@linux.intel.com>

Clear all registers for compat calls on 64bit kernels. All arguments
are initially passed through the stack, so this is fairly simple
without additional stubs.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_64_compat.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 98d5358e4041..16fd2643a77f 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -95,6 +95,8 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	$0		/* pt_regs->r14 = 0 */
 	pushq	$0		/* pt_regs->r15 = 0 */
 	cld
+	/* Can clear all because arguments are passed through the stack */
+	CLEAR_ALL_REGS
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT and AC
@@ -223,6 +225,8 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe)
 	pushq	$0		/* pt_regs->r13 = 0 */
 	pushq	$0		/* pt_regs->r14 = 0 */
 	pushq	$0		/* pt_regs->r15 = 0 */
+	/* Can clear all because arguments are passed through the stack */
+	CLEAR_ALL_REGS
 
 	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
@@ -348,6 +352,8 @@ ENTRY(entry_INT80_compat)
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
 	cld
+	/* Can clear all because arguments are passed through the stack */
+	CLEAR_ALL_REGS
 
 	/*
 	 * User mode is traced as though IRQs are on, and the interrupt
-- 
2.14.3
[PATCH v1 1/8] x86/entry/clearregs: Remove partial stack frame in fast system call
From: Andi Kleen <a...@linux.intel.com>

Remove the partial stack frame in the 64bit syscall fast path.
In the next patch we want to clear the extra registers, which requires
to always save all registers. So remove the partial stack frame in the
syscall fast path and always save everything.

This actually simplifies the code because the ptregs stubs are not
needed anymore.

 arch/x86/entry/entry_64.S   | 57 +++-----------------------------------------
 arch/x86/entry/syscall_64.c |  2 +-

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_64.S   | 57 +++-----------------------------------------
 arch/x86/entry/syscall_64.c |  2 +-
 2 files changed, 5 insertions(+), 54 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 58dbf7a12a05..bbdfbdd817d6 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -234,7 +234,9 @@ GLOBAL(entry_SYSCALL_64_after_hwframe)
 	pushq	%r9			/* pt_regs->r9 */
 	pushq	%r10			/* pt_regs->r10 */
 	pushq	%r11			/* pt_regs->r11 */
-	sub	$(6*8), %rsp		/* pt_regs->bp, bx, r12-15 not saved */
+	sub	$(6*8), %rsp
+	SAVE_EXTRA_REGS
+	UNWIND_HINT_REGS extra=0
 
 	/*
@@ -262,11 +264,6 @@ entry_SYSCALL_64_fastpath:
 	ja	1f			/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
 
-	/*
-	 * This call instruction is handled specially in stub_ptregs_64.
-	 * It might end up jumping to the slow path.  If it jumps, RAX
-	 * and all argument registers are clobbered.
-	 */
 #ifdef CONFIG_RETPOLINE
 	movq	sys_call_table(, %rax, 8), %rax
 	call	__x86_indirect_thunk_rax
@@ -293,9 +290,7 @@ entry_SYSCALL_64_fastpath:
 	TRACE_IRQS_ON		/* user mode is traced as IRQs on */
 	movq	RIP(%rsp), %rcx
 	movq	EFLAGS(%rsp), %r11
-	addq	$6*8, %rsp	/* skip extra regs -- they were preserved */
-	UNWIND_HINT_EMPTY
-	jmp	.Lpop_c_regs_except_rcx_r11_and_sysret
+	jmp	syscall_return_via_sysret
 
 1:
 	/*
@@ -305,14 +300,12 @@ entry_SYSCALL_64_fastpath:
 	 */
 	TRACE_IRQS_ON
 	ENABLE_INTERRUPTS(CLBR_ANY)
-	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
 	jmp	return_from_SYSCALL_64
 
 entry_SYSCALL64_slow_path:
 	/* IRQs are off. */
-	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
 	call	do_syscall_64		/* returns with IRQs disabled */
 
@@ -389,7 +382,6 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	UNWIND_HINT_EMPTY
 	POP_EXTRA_REGS
-.Lpop_c_regs_except_rcx_r11_and_sysret:
 	popq	%rsi	/* skip r11 */
 	popq	%r10
 	popq	%r9
@@ -420,47 +412,6 @@ syscall_return_via_sysret:
 	USERGS_SYSRET64
 END(entry_SYSCALL_64)
 
-ENTRY(stub_ptregs_64)
-	/*
-	 * Syscalls marked as needing ptregs land here.
-	 * If we are on the fast path, we need to save the extra regs,
-	 * which we achieve by trying again on the slow path.  If we are on
-	 * the slow path, the extra regs are already saved.
-	 *
-	 * RAX stores a pointer to the C function implementing the syscall.
-	 * IRQs are on.
-	 */
-	cmpq	$.Lentry_SYSCALL_64_after_fastpath_call, (%rsp)
-	jne	1f
-
-	/*
-	 * Called from fast path -- disable IRQs again, pop return address
-	 * and jump to slow path
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	TRACE_IRQS_OFF
-	popq	%rax
-	UNWIND_HINT_REGS extra=0
-	jmp	entry_SYSCALL64_slow_path
-
-1:
-	JMP_NOSPEC %rax			/* Called from C */
-END(stub_ptregs_64)
-
-.macro ptregs_stub func
-ENTRY(ptregs_\func)
-	UNWIND_HINT_FUNC
-	leaq	\func(%rip), %rax
-	jmp	stub_ptregs_64
-END(ptregs_\func)
-.endm
-
-/* Instantiate ptregs_stub for each ptregs-using syscall */
-#define __SYSCALL_64_QUAL_(sym)
-#define __SYSCALL_64_QUAL_ptregs(sym)	ptregs_stub sym
-#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(sym)
-#include
-
 /*
  * %rdi: prev task
  * %rsi: next task
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index 9c09775e589d..ad1ae014f943 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -8,7 +8,7 @@
 #include
 
 #define __SYSCALL_64_QUAL_(sym) sym
-#define __SYSCALL_64_QUAL_ptregs(sym) ptregs_##sym
+#define __SYSCALL_64_QUAL_ptregs(sym) sym
 
 #define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include
-- 
2.14.3
x86/clearregs: Register sanitizing at kernel entry for speculation hygiene
This patch kit implements clearing of all unused registers on kernel
entry, including system calls and all exceptions and interrupts.

This doesn't fix any known issue, but will make it harder in general
to exploit the kernel with speculation, because it will be harder to
get user-controlled values into kernel code.

The patch kit is a bit more complicated because it attempts to clear
unused argument registers, which on 64bit requires knowing the number
of arguments for each system call. I used some scripting to derive the
argument counts from the SYSCALL_DEFINE*s and add them to the x86
system call tables. Everything else is relatively simple and
straightforward, and could be used independently.

I assume this mostly isn't 4.15 material, but should be considered for
4.16. Possibly some of the simpler patches could be considered for
4.15.

Original patches were from Tim Chen, but changed significantly by AK.

git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git spec/clearregs-1

v1: Initial post
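The scripting mentioned above is not part of the series itself. A hypothetical sketch of the idea (the regex and function name here are assumptions, not the actual script used): each syscall's argument count can be read directly off the `n` in its `SYSCALL_DEFINEn()` macro in the kernel sources.

```python
import re

# Match SYSCALL_DEFINE0(...) through SYSCALL_DEFINE6(...) and capture
# the argument count digit and the syscall name that follows it.
_DEFINE_RE = re.compile(r'\bSYSCALL_DEFINE([0-6])\(\s*(\w+)')

def syscall_arg_counts(source):
    """Map syscall name -> argument count for each SYSCALL_DEFINEn found."""
    return {m.group(2): int(m.group(1)) for m in _DEFINE_RE.finditer(source)}
```

Running something like this over the tree would yield, e.g., `read -> 3` and `getpid -> 0`, which is the extra column the series adds to the syscall tables.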
[PATCH v1 2/8] x86/entry/clearregs: Add infrastructure to clear registers
From: Andi Kleen <a...@linux.intel.com>

Add 64bit assembler macros to clear registers on kernel entry.
Used in follow-on patches.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/calling.h | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e00a6af..9444e7623185 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -172,6 +172,34 @@ For 32-bit we have the following conventions - kernel is built with
 	.byte 0xf1
 	.endm
 
+	.macro CLEAR_R11_TO_R15
+	xorq %r15, %r15
+	xorq %r14, %r14
+	xorq %r13, %r13
+	xorq %r12, %r12
+	xorq %r11, %r11
+	.endm
+
+	.macro CLEAR_R8_TO_R15
+	CLEAR_R11_TO_R15
+	xorq %r10, %r10
+	xorq %r9, %r9
+	xorq %r8, %r8
+	.endm
+
+	.macro CLEAR_ALL_REGS
+	CLEAR_R8_TO_R15
+	xorl %eax, %eax
+	xorl %ebx, %ebx
+	xorl %ecx, %ecx
+	xorl %edx, %edx
+	xorl %esi, %esi
+	xorl %edi, %edi
+#ifndef CONFIG_FRAME_POINTER
+	xorl %ebp, %ebp
+#endif
+	.endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
-- 
2.14.3
[PATCH v1 7/8] x86/entry/clearregs: Add 64bit stubs to clear unused arguments regs
From: Andi Kleen <a...@linux.intel.com>

The main system call code doesn't know how many arguments each system
call has. So generate stubs that do the clearing.

Set up macros to generate stubs to clear unused argument registers for
each system call in a 64bit kernel. This uses the syscall argument
count from the syscall tables added earlier.

Each system call will run through its stub, which clears the registers
not used for input arguments before jumping to the real system call.
It also clears RAX.

We have to move all the __SYSCALL_* users atomically. This is a larger
patch, but it's difficult to do it git-bisect safe otherwise.

Longer term this setup will also allow to get rid of the system call
table, as it will be possible to compute the entry point with a simple
shift. So far this is not done here.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/calling.h              | 24 ++++++++++++++++++++++++
 arch/x86/entry/entry_64.S             | 15 +++++++++++++++
 arch/x86/entry/syscall_32.c           |  4 ++--
 arch/x86/entry/syscall_64.c           |  5 +++--
 arch/x86/entry/syscalls/syscalltbl.sh | 15 ++++++++++---
 arch/x86/kernel/asm-offsets_32.c      |  2 +-
 arch/x86/kernel/asm-offsets_64.c      |  4 ++--
 arch/x86/um/sys_call_table_32.c       |  4 ++--
 arch/x86/um/sys_call_table_64.c       |  4 ++--
 9 files changed, 59 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 9444e7623185..c89a8a8d195c 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -200,6 +200,30 @@ For 32-bit we have the following conventions - kernel is built with
 #endif
 	.endm
 
+	/* Clear unused argument registers */
+
+.macro CLEAR_ARGS num
+	/* we leave EAX around because it has been already checked */
+	.if \num < 6
+	xorq	%r9, %r9	# arg6
+	.endif
+	.if \num < 5
+	xorq	%r8, %r8	# arg5
+	.endif
+	.if \num < 4
+	xorl	%ecx, %ecx	# arg4
+	.endif
+	.if \num < 3
+	xorl	%edx, %edx	# arg3
+	.endif
+	.if \num < 2
+	xorl	%esi, %esi	# arg2
+	.endif
+	.if \num < 1
+	xorl	%edi, %edi	# arg1
+	.endif
+.endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 6ab4c2aaeabb..5b2456a30b17 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1755,6 +1755,21 @@ nmi_restore:
 	iretq
 END(nmi)
 
+/*
+ * Clear all argument registers not used by a system call.
+ */
+
+.macro gen_arg_stub sym, num
+ENTRY(san_args_\sym)
+	CLEAR_ARGS \num
+	xor	%eax, %eax
+	jmp	\sym
+END(san_args_\sym)
+.endm
+
+#define __SYSCALL_64(nr, sym, qual, num) gen_arg_stub sym, num
+#include
+
 ENTRY(ignore_sysret)
 	UNWIND_HINT_EMPTY
 	mov	$-ENOSYS, %eax
diff --git a/arch/x86/entry/syscall_32.c b/arch/x86/entry/syscall_32.c
index 95c294963612..b31e5c8b7ba7 100644
--- a/arch/x86/entry/syscall_32.c
+++ b/arch/x86/entry/syscall_32.c
@@ -7,11 +7,11 @@
 #include
 #include
 
-#define __SYSCALL_I386(nr, sym, qual) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
+#define __SYSCALL_I386(nr, sym, qual, num) extern asmlinkage long sym(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long) ;
 #include
 
 #undef __SYSCALL_I386
-#define __SYSCALL_I386(nr, sym, qual) [nr] = sym,
+#define __SYSCALL_I386(nr, sym, qual, num) [nr] = sym,
 
 extern asmlinkage long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index ad1ae014f943..963c9c14480f 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -10,11 +10,12 @@
 #define __SYSCALL_64_QUAL_(sym) sym
 #define __SYSCALL_64_QUAL_ptregs(sym) sym
 
-#define __SYSCALL_64(nr, sym, qual) extern asmlinkage long __SYSCALL_64_QUAL_##qual(sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+#define __SYSCALL_64(nr, sym, qual, num) \
+	extern asmlinkage long __SYSCALL_64_QUAL_##qual(san_args_##sym)(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
 #include
 
 #undef __SYSCALL_64
-#define __SYSCALL_64(nr, sym, qual) [nr] = __SYSCALL_64_QUAL_##qual(sym),
+#define __SYSCALL_64(nr, sym, qual, num) [nr] = __SYSCALL_64_QUAL_##qual(san_args_##sym),
 
 extern long sys_ni_syscall(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
diff --git a/arch/x86/entry/syscalls/syscalltbl.sh b/arch/x86/entry/syscalls/syscalltbl.sh
index bb8a12f32610..79fff684d75e
Re: [PATCH] x86/retpoline: Fix NOSPEC_JMP for tip
> Then just make sure X86_FEATURE_RETPOLINE_AMD disables X86_FEATURE_RETPOLINE.
>
> That is both simpler and smaller, no?

Yes that works, and is clearly better/simpler.

Tested-by: Andi Kleen <a...@linux.intel.com>

Thomas, I assume you will fix it up, or let me know if I should send
another patch.

-Andi
[PATCH] x86/retpoline: Fix NOSPEC_JMP for tip
From: Andi Kleen <a...@linux.intel.com> With the latest tip x86/pti I get oopses when booting a 64bit VM in qemu with RETPOLINE/gcc7 and PTI enabled. The following patch fixes it for me. Something doesn't seem to work with ALTERNATIVE_2. It adds only a few bytes more code, so seems acceptable. Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/include/asm/nospec-branch.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 8ddf8513550e..eea6340a0abb 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -46,9 +46,8 @@ */ .macro JMP_NOSPEC reg:req #ifdef CONFIG_RETPOLINE - ALTERNATIVE_2 __stringify(jmp *\reg), \ - __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE, \ - __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD + ALTERNATIVE "", __stringify(lfence; jmp *\reg), X86_FEATURE_RETPOLINE_AMD + ALTERNATIVE __stringify(jmp *\reg), __stringify(RETPOLINE_JMP \reg), X86_FEATURE_RETPOLINE #else jmp *\reg #endif -- 2.14.3
[tip:x86/pti] x86/retpoline: Avoid return buffer underflows on context switch
Commit-ID: 450c505047981e97471f0170e0102f613bba4739 Gitweb: https://git.kernel.org/tip/450c505047981e97471f0170e0102f613bba4739 Author: Andi Kleen AuthorDate: Tue, 9 Jan 2018 14:43:17 + Committer: Thomas Gleixner CommitDate: Tue, 9 Jan 2018 16:17:55 +0100 x86/retpoline: Avoid return buffer underflows on context switch CPUs have return buffers which store the return address for RET to predict function returns. Some CPUs (Skylake, some Broadwells) can fall back to indirect branch prediction on return buffer underflow. retpoline is supposed to prevent uncontrolled indirect branch speculation, which could be poisoned by ring 3, so it needs to prevent uncontrolled return buffer underflows in the kernel as well. This can happen when a context switch from a shallower to a deeper kernel stack happens. The deeper kernel stack would eventually underflow the return buffer, which again would make the CPU fall back to the indirect branch predictor. To guard against this fill the return buffer with controlled content during context switch. This prevents any underflows. Always fill the buffer with 30 entries: 32 minus 2 for at least one call from entry_{64,32}.S to C code and another into the function doing the fill. That's pessimistic because there are likely more controlled kernel calls before this happens, but it depends on compiler optimizations and other factors so avoid speculative optimization, error on the side of safety and always fill 30 entries. [dwmw2: Fix comments about nop between calls, Move #ifdef CONFIG_RETPOLINE to call sites not macro. Use Google's original RSB stuffing.] 
[tglx: Massaged changelog ] Signed-off-by: Andi Kleen Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Cc: gno...@lxorguk.ukuu.org.uk Cc: Rik van Riel Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Jiri Kosina Cc: Andy Lutomirski Cc: Dave Hansen Cc: Kees Cook Cc: Tim Chen Cc: Greg Kroah-Hartman Cc: Paul Turner Link: https://lkml.kernel.org/r/1515508997-6154-12-git-send-email-d...@amazon.co.uk --- arch/x86/entry/entry_32.S| 17 ++ arch/x86/entry/entry_64.S| 17 ++ arch/x86/include/asm/nospec-branch.h | 44 3 files changed, 78 insertions(+) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index a1f28a5..d2ef7f32 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -244,6 +244,23 @@ ENTRY(__switch_to_asm) movl%ebx, PER_CPU_VAR(stack_canary)+stack_canary_offset #endif +#ifdef CONFIG_RETPOLINE + /* +* When we switch from a shallower to a deeper call stack +* the call stack will underflow in the kernel in the next task. +* This could cause the CPU to fall back to indirect branch +* prediction, which may be poisoned. +* +* To guard against that always fill the return stack with +* known values. +* +* We do this in assembler because it needs to be before +* any calls on the new stack, and this can be difficult to +* ensure in a complex C function like __switch_to. +*/ + ALTERNATIVE "", "FILL_RETURN_BUFFER %ebx", X86_FEATURE_RETPOLINE +#endif + /* restore callee-saved registers */ popl%esi popl%edi diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 59874bc..58dbf7a 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -487,6 +487,23 @@ ENTRY(__switch_to_asm) movq%rbx, PER_CPU_VAR(irq_stack_union)+stack_canary_offset #endif +#ifdef CONFIG_RETPOLINE + /* +* When we switch from a shallower to a deeper call stack +* the call stack will underflow in the kernel in the next task. +* This could cause the CPU to fall back to indirect branch +* prediction, which may be poisoned. 
+* +* To guard against that always fill the return stack with +* known values. +* +* We do this in assembler because it needs to be before +* any calls on the new stack, and this can be difficult to +* ensure in a complex C function like __switch_to. +*/ + ALTERNATIVE "", "FILL_RETURN_BUFFER %r12", X86_FEATURE_RETPOLINE +#endif + /* restore callee-saved registers */ popq%r15 popq%r14 diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index a86e845..8ddf851 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -64,6 +64,50 @@ #endif .endm +/* + * Use 32-N: 32 is the max return buffer size, but there should have been + * at a minimum two controlled calls already: one into the kernel from + * entry*.S and another into the function containing this macro. So N=2, + * thus 30. + */ +#define NUM_BRANCHES_TO_FILL 30 + +/* + * Fil
[tip:x86/pti] x86/retpoline/irq32: Convert assembler indirect jumps
Commit-ID: 3025d1ebb41bc8fc58fc050c6d4d6dd4d71ca5e8 Gitweb: https://git.kernel.org/tip/3025d1ebb41bc8fc58fc050c6d4d6dd4d71ca5e8 Author: Andi Kleen <a...@linux.intel.com> AuthorDate: Tue, 9 Jan 2018 14:43:16 + Committer: Thomas Gleixner <t...@linutronix.de> CommitDate: Tue, 9 Jan 2018 16:17:54 +0100 x86/retpoline/irq32: Convert assembler indirect jumps Convert all indirect jumps in 32bit irq inline asm code to use non speculative sequences. Signed-off-by: Andi Kleen <a...@linux.intel.com> Signed-off-by: Thomas Gleixner <t...@linutronix.de> Acked-by: Arjan van de Ven <ar...@linux.intel.com> Acked-by: Ingo Molnar <mi...@kernel.org> Cc: gno...@lxorguk.ukuu.org.uk Cc: Rik van Riel <r...@redhat.com> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Jiri Kosina <ji...@kernel.org> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Dave Hansen <dave.han...@intel.com> Cc: Kees Cook <keesc...@google.com> Cc: Tim Chen <tim.c.c...@linux.intel.com> Cc: Greg Kroah-Hartman <gre...@linux-foundation.org> Cc: Paul Turner <p...@google.com> Link: https://lkml.kernel.org/r/1515508997-6154-11-git-send-email-d...@amazon.co.uk --- arch/x86/kernel/irq_32.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c index a83b334..c1bdbd3 100644 --- a/arch/x86/kernel/irq_32.c +++ b/arch/x86/kernel/irq_32.c @@ -20,6 +20,7 @@ #include #include +#include #ifdef CONFIG_DEBUG_STACKOVERFLOW @@ -55,11 +56,11 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack); static void call_on_stack(void *func, void *stack) { asm volatile("xchgl %%ebx,%%esp \n" -"call *%%edi \n" +CALL_NOSPEC "movl %%ebx,%%esp \n" : "=b" (stack) : "0" (stack), - "D"(func) + [thunk_target] "D"(func) : "memory", "cc", "edx", "ecx", "eax"); } @@ -95,11 +96,11 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc) call_on_stack(print_stack_overflow, isp); asm volatile("xchgl %%ebx,%%esp \n" -"call *%%edi \n" 
+CALL_NOSPEC "movl %%ebx,%%esp \n" : "=a" (arg1), "=b" (isp) : "0" (desc), "1" (isp), - "D" (desc->handle_irq) + [thunk_target] "D" (desc->handle_irq) : "memory", "cc", "ecx"); return 1; }
[tip:x86/pti] x86/retpoline: Temporarily disable objtool when CONFIG_RETPOLINE=y
Commit-ID: 61888594f2ff61633c7fb29b58c128d6dc850e7c Gitweb: https://git.kernel.org/tip/61888594f2ff61633c7fb29b58c128d6dc850e7c Author: Andi Kleen <a...@linux.intel.com> AuthorDate: Tue, 9 Jan 2018 14:43:08 + Committer: Thomas Gleixner <t...@linutronix.de> CommitDate: Tue, 9 Jan 2018 16:17:51 +0100 x86/retpoline: Temporarily disable objtool when CONFIG_RETPOLINE=y objtool's assembler currently cannot deal with the code generated by the retpoline compiler and throws hundreds of warnings, mostly because it sees calls that don't have a symbolic target. Exclude all the options that rely on objtool when RETPOLINE is active. This mainly means that the kernel has to fallback to use the frame pointer unwinder and livepatch is not supported. Josh is looking into resolving the issue. Signed-off-by: Andi Kleen <a...@linux.intel.com> Signed-off-by: David Woodhouse <d...@amazon.co.uk> Signed-off-by: Thomas Gleixner <t...@linutronix.de> Acked-by: Arjan van de Ven <ar...@linux.intel.com> Acked-by: Ingo Molnar <mi...@kernel.org> Cc: gno...@lxorguk.ukuu.org.uk Cc: Rik van Riel <r...@redhat.com> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Jiri Kosina <ji...@kernel.org> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Dave Hansen <dave.han...@intel.com> Cc: Kees Cook <keesc...@google.com> Cc: Tim Chen <tim.c.c...@linux.intel.com> Cc: Greg Kroah-Hartman <gre...@linux-foundation.org> Cc: Paul Turner <p...@google.com> Link: https://lkml.kernel.org/r/1515508997-6154-3-git-send-email-d...@amazon.co.uk --- arch/x86/Kconfig | 4 ++-- arch/x86/Kconfig.debug | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d181916..abeac4b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -172,8 +172,8 @@ config X86 select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API - select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && 
STACK_VALIDATION - select HAVE_STACK_VALIDATIONif X86_64 + select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION && !RETPOLINE + select HAVE_STACK_VALIDATIONif X86_64 && !RETPOLINE select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK select HAVE_USER_RETURN_NOTIFIER diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 6293a87..9f3928d 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -359,8 +359,8 @@ config PUNIT_ATOM_DEBUG choice prompt "Choose kernel unwinder" - default UNWINDER_ORC if X86_64 - default UNWINDER_FRAME_POINTER if X86_32 + default UNWINDER_ORC if X86_64 && !RETPOLINE + default UNWINDER_FRAME_POINTER if X86_32 || RETPOLINE ---help--- This determines which method will be used for unwinding kernel stack traces for panics, oopses, bugs, warnings, perf, /proc//stack, @@ -368,7 +368,7 @@ choice config UNWINDER_ORC bool "ORC unwinder" - depends on X86_64 + depends on X86_64 && !RETPOLINE select STACK_VALIDATION ---help--- This option enables the ORC (Oops Rewind Capability) unwinder for
Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch II
> > On Skylake and Broadwell when the RSB underflows it will fall back to the > > indirect branch predictor, which can be poisoned and we try to avoid > > using with retpoline. So we try to avoid underflows, and this filling > > helps us with that. > > That's no longer true for Broadwell with the latest microcode, right? That's right. -Andi
Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch II
On Mon, Jan 08, 2018 at 05:16:02PM -0800, Andi Kleen wrote: > > If we clear the registers, what the hell are you going to put in the > > RSB that helps you? > > RSB allows you to control chains of gadgets. I admit the gadget thing is a bit obscure. There's another case we were actually more worried about: On Skylake and Broadwell when the RSB underflows it will fall back to the indirect branch predictor, which can be poisoned and which we try to avoid using with retpoline. So we try to avoid underflows, and this filling helps us with that. Does that make more sense? -Andi
Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch
> If we clear the registers, what the hell are you going to put in the > RSB that helps you? RSB allows you to control chains of gadgets. You can likely find some chain of gadgets that set up constants in registers in a lot of useful ways. Perhaps not in arbitrary ways (so it may be hard to scan through all of memory), but it's likely you could find gadgets that result in a lot of useful direct mapped addresses, which the next gadget can then reference. RAX especially is quite vulnerable to this because there will be a lot of code that does "modify RAX in interesting ways ; RET" > So instead of saying "we have to flush the return stack", I'm saying > that we should look at things that make flushing the return stack > _unnecessary_, simply because even if the attacker were to control it > entirely, they'd still be up shit creek without a paddle. I agree that clearing registers is useful (was just hacking on that patch). -Andi
Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch
> So I was really hoping that in places like context switching etc, we'd > be able to instead effectively kill off any exploits by clearing > registers. > > That should make it pretty damn hard to then find a matching "gadget" > that actually does anything interesting/powerful. > > Together with Spectre already being pretty hard to take advantage of, > and the eBPF people making those user-proivided gadgets inaccessible, > it really should be a pretty powerful fix. > > Hmm? Essentially the RSB are hidden registers, and the only way to clear them is the FILL_RETURN_BUFFER sequence. I don't see how clearing anything else would help? -Andi
Re: [PATCH] x86/retpoline: Also fill return buffer after idle
> Probably doesn't matter right there but it's going to end up being used > elsewhere with IBRS/IBPB, and the compiler is going to think it needs > to save all the call-clobbered registers for that. Do we want to make > it use inline asm instead? You mean KVM? All the other places have lots of other calls too, so the registers are likely already clobbered. -Andi
Re: [PATCH v6 11/10] x86/retpoline: Avoid return buffer underflows on context switch
On Mon, Jan 08, 2018 at 03:56:30PM -0800, Linus Torvalds wrote: > On Mon, Jan 8, 2018 at 3:44 PM, David Woodhouse wrote: > > > > To guard against this fill the return buffer with controlled > > content during context switch. This prevents any underflows. > > Ugh. I really dislike this patch. Everything else in the retpoline > patches makes me go "ok, that's reasonable". This one makes me go > "Eww". > > It's hacky, it's ugly, and it looks pretty expensive too. Modern cores are quite fast at executing calls. > > Is there really nothing more clever we can do? We could be cleverer in selecting how many dummy calls to do. But that would likely be fragile and hard to maintain and likely be more complicated, and I doubt it would buy that much. Don't really have a better proposal, sorry. -Andi
[PATCH] x86/retpoline: Also fill return buffer after idle
From: Andi Kleen <a...@linux.intel.com>

This is an extension of the earlier patch to fill the return buffer on
context switch. It uses the assembler macros added earlier.

When we go into deeper idle states the return buffer could be cleared
in MWAIT, but then another thread which wakes up earlier might be
poisoning the indirect branch predictor. Then when the return buffer
underflows there might be an uncontrolled indirect branch.

To guard against this always fill the return buffer when exiting idle.

Needed on Skylake and some Broadwells.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_32.S    |  8 ++++++++
 arch/x86/entry/entry_64.S    |  8 ++++++++
 arch/x86/include/asm/mwait.h | 11 ++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7dee84a3cf83..2687cce8a02e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1092,3 +1092,11 @@ ENTRY(rewind_stack_do_exit)
 	call	do_exit
 1:	jmp 1b
 END(rewind_stack_do_exit)
+
+ENTRY(fill_return_buffer)
+#ifdef CONFIG_RETPOLINE
+	ALTERNATIVE "ret", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
+#endif
+	ret
+END(fill_return_buffer)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a33033e2bfe0..92fbec1b0eb5 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1831,3 +1831,11 @@ ENTRY(rewind_stack_do_exit)
 	call	do_exit
 END(rewind_stack_do_exit)
+
+ENTRY(fill_return_buffer)
+#ifdef CONFIG_RETPOLINE
+	ALTERNATIVE "ret", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
+#endif
+	ret
+END(fill_return_buffer)
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index 39a2fb29378a..1d9f9269b5e7 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -87,6 +87,8 @@ static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 		     :: "a" (eax), "c" (ecx));
 }

+extern __visible void fill_return_buffer(void);
+
 /*
  * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
  * which can obviate IPI to trigger checking of need_resched.
@@ -107,8 +109,15 @@ static inline void mwait_idle_with_hints(unsigned long eax, unsigned long ecx)
 		}

 		__monitor((void *)&current_thread_info()->flags, 0, 0);
-		if (!need_resched())
+		if (!need_resched()) {
 			__mwait(eax, ecx);
+			/*
+			 * Idle could have cleared the return buffer,
+			 * so fill it to prevent uncontrolled
+			 * speculation.
+			 */
+			fill_return_buffer();
+		}
 	}
 	current_clr_polling();
 }
--
2.14.3
Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
> > Why is none of that done here? Also, can we pretty please stop using
> > those retarded number labels, they make this stuff unreadable.
>
> Personally I find the magic labels with strange ASCII characters
> far less readable than a simple number.

Tried it and \@ is incompatible with .rept.

-Andi
Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
> We want this on vmexit too, right?

Yes. KVM patches are done separately.

-Andi
Re: [PATCH] x86/retpoline: Avoid return buffer underflows on context switch
> So pjt did alignment, a single unroll and per discussion earlier today
> (CET) or late last night (PST), he only does 16.

I used the Intel recommended sequence, which recommends 32.

Not sure if alignment makes a difference. I can check.

> Why is none of that done here? Also, can we pretty please stop using
> those retarded number labels, they make this stuff unreadable.

Personally I find the magic labels with strange ASCII characters
far less readable than a simple number. But can change it if you insist.

> Also, pause is unlikely to stop speculation, that comment doesn't make
> sense. Looking at PJT's version there used to be a speculation trap in
> there, but I can't see that here.

My understanding is that it stops speculation. But could also use LFENCE.

-Andi
[PATCH] x86/retpoline: Avoid return buffer underflows on context switch
From: Andi Kleen <a...@linux.intel.com>

[This is on top of David's retpoline branch, as of 08-01 this morning]

This patch further hardens retpoline.

CPUs have return buffers which store the return address for
RET to predict function returns. Some CPUs (Skylake, some Broadwells)
can fall back to indirect branch prediction on return buffer underflow.

With retpoline we want to avoid uncontrolled indirect branches,
which could be poisoned by ring 3, so we need to avoid uncontrolled
return buffer underflows in the kernel.

This can happen when we're context switching from a shallower to a
deeper kernel stack. The deeper kernel stack would eventually
underflow the return buffer, which again would fall back to the
indirect branch predictor.

To guard against this fill the return buffer with controlled
content during context switch. This prevents any underflows.

We always fill the buffer with 30 entries: 32 minus 2 for
at least one call from entry_{64,32}.S to C code and another into
the function doing the filling.

That's pessimistic because we likely did more controlled kernel calls.
So in principle we could do less. However it's hard to maintain such an
invariant, and it may be broken with more aggressive compilers.
So err on the side of safety and always fill 30.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/entry/entry_32.S            | 15 +++++++++++++++
 arch/x86/entry/entry_64.S            | 15 +++++++++++++++
 arch/x86/include/asm/nospec-branch.h | 29 +++++++++++++++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index cf9ef33d299b..5404a9b2197c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -250,6 +250,21 @@ ENTRY(__switch_to_asm)
 	popl	%ebx
 	popl	%ebp

+	/*
+	 * When we switch from a shallower to a deeper call stack
+	 * the call stack will underflow in the kernel in the next task.
+	 * This could cause the CPU to fall back to indirect branch
+	 * prediction, which may be poisoned.
+	 *
+	 * To guard against that always fill the return stack with
+	 * known values.
+	 *
+	 * We do this in assembler because it needs to be before
+	 * any calls on the new stack, and this can be difficult to
+	 * ensure in a complex C function like __switch_to.
+	 */
+	ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
 	jmp	__switch_to
 END(__switch_to_asm)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9bce6ed03353..0f28d0ea57e8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -495,6 +495,21 @@ ENTRY(__switch_to_asm)
 	popq	%rbx
 	popq	%rbp

+	/*
+	 * When we switch from a shallower to a deeper call stack
+	 * the call stack will underflow in the kernel in the next task.
+	 * This could cause the CPU to fall back to indirect branch
+	 * prediction, which may be poisoned.
+	 *
+	 * To guard against that always fill the return stack with
+	 * known values.
+	 *
+	 * We do this in assembler because it needs to be before
+	 * any calls on the new stack, and this can be difficult to
+	 * ensure in a complex C function like __switch_to.
+	 */
+	ALTERNATIVE "jmp __switch_to", "", X86_FEATURE_RETPOLINE
+	FILL_RETURN_BUFFER
 	jmp	__switch_to
 END(__switch_to_asm)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index b8c8eeacb4be..e84e231248c2 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -53,6 +53,35 @@
 #endif
 .endm

+/*
+ * We use 32-N: 32 is the max return buffer size,
+ * but there should have been at a minimum two
+ * controlled calls already: one into the kernel
+ * from entry*.S and another into the function
+ * containing this macro. So N=2, thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL	30
+
+/*
+ * Fill the CPU return branch buffer to prevent
+ * indirect branch prediction on underflow.
+ *
+ * Caller should check for X86_FEATURE_SMEP and X86_FEATURE_RETPOLINE
+ */
+.macro FILL_RETURN_BUFFER
+#ifdef CONFIG_RETPOLINE
+	.rept	NUM_BRANCHES_TO_FILL
+	call	1221f
+	pause	/* stop speculation */
+1221:
+	.endr
+#ifdef CONFIG_64BIT
+	addq	$8*NUM_BRANCHES_TO_FILL, %rsp
+#else
+	addl	$4*NUM_BRANCHES_TO_FILL, %esp
+#endif
+#endif
+.endm
+
 #else /* __ASSEMBLY__ */

 #if defined(CONFIG_X86_64) && defined(RETPOLINE)
--
2.14.3
Re: perf: documentation for cycles_no_execute event
> > Many of the x86 pipeline.json files have the brief description "Total
> > execution stalls" for both CYCLE_ACTIVITY.CYCLES_NO_EXECUTE and
> > CYCLE_ACTIVITY.STALLS_TOTAL. Should the case for
> > CYCLE_ACTIVITY.CYCLES_NO_EXECUTE have a brief description that mentions
> > cycles? Some of the files do have a public description that mentions
> > cycles, eg broadwellde/pipeline.json has "Counts number of cycles nothing
> > is executed on any execution port.", but others (eg ivytown/pipeline.json)
> > just have "Total execution stalls." twice. Maybe "Cycles where nothing is
> > executed" would be ok for CYCLE_ACTIVITY.CYCLES_NO_EXECUTE?

Yes, it's all in cycles. Will clarify in a future event list update.

-Andi
Re: [PATCH V3] perf script: add script to profile and resolve physical mem type
On Thu, Jan 04, 2018 at 12:59:55PM -0800, kan.li...@intel.com wrote:
> From: Kan Liang <kan.li...@intel.com>
>
> There could be different types of memory in the system. E.g normal
> System Memory, Persistent Memory. To understand how the workload maps to
> those memories, it's important to know the I/O statistics of them.
> Perf can collect physical addresses, but those are raw data.
> It still needs extra work to resolve the physical addresses.
> Provide a script to facilitate the physical addresses resolving and
> I/O statistics.

Reviewed-by: Andi Kleen <a...@linux.intel.com>

-Andi
[PATCH] retpoline/modpost: Quieten MODVERSION retpoline build
From: Andi Kleen <a...@linux.intel.com>

The internal retpoline thunks used by the compiler contain a dot.
They have to be exported, but modversions cannot handle them
because they don't have a prototype due to the C incompatible name
(and it doesn't support asm("..."))

This leads to lots of warnings from modpost with a retpoline
build with MODVERSIONS enabled. The actual symbols load fine, they
just don't get versioned. That's not a problem here because we
don't expect them to change ever.

Quieten the respective warning messages in modpost for any
symbols containing a dot.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 scripts/mod/modpost.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 6510536c06df..7a06d2032ecf 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -693,7 +693,9 @@ static void handle_modversions(struct module *mod, struct elf_info *info,
 #endif
 	if (is_crc) {
 		const char *e = is_vmlinux(mod->name) ? "" : ".ko";
-		warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n", symname + strlen(CRC_PFX), mod->name, e);
+		const char *name = symname + strlen(CRC_PFX);
+		if (!strchr(name, '.'))
+			warn("EXPORT symbol \"%s\" [%s%s] version generation failed, symbol will not be versioned.\n", name, mod->name, e);
 	}
 	mod->unres = alloc_symbol(symname,
 				  ELF_ST_BIND(sym->st_info) == STB_WEAK,
@@ -2220,7 +2222,7 @@ static int add_versions(struct buffer *b, struct module *mod)
 	for (s = mod->unres; s; s = s->next) {
 		if (!s->module)
 			continue;
-		if (!s->crc_valid) {
+		if (!s->crc_valid && !strchr(s->name, '.')) {
 			warn("\"%s\" [%s.ko] has no CRC!\n",
 			     s->name, mod->name);
 			continue;
--
2.14.3
Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support
> If the *compiler* uses the out-of-line version, that's a separate
> thing. But for our asm cases, let's just make it all be the inline
> case, ok? Should be a simple change.
>
> It also should simplify the whole target generation. None of this
> silly "__x86.indirect_thunk.\reg" crap with different targets for
> different register choices.
>
> Hmm?

We need the different thunks anyways for the compiler generated code.
So it won't go away.

-Andi
Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support
> Clearly Paul's approach to retpoline without lfence is faster.
> I'm guessing it wasn't shared with amazon/intel until now and
> this set of patches going to adopt it, right?
>
> Paul, could you share a link to a set of alternative gcc patches
> that do retpoline similar to llvm diff ?

I don't think it's a good idea to use any sequence not signed off
by CPU designers and extensively tested. While another one may work
for most tests, it could always fail in some corner case.

Right now we have the more heavy weight one and I would suggest
to stay with that one for now. Then worry about more optimizations
later. Correctness first.

-Andi
Re: [PATCH v3 10/13] x86/retpoline/pvops: Convert assembler indirect jumps
On Thu, Jan 04, 2018 at 04:02:06PM +0100, Juergen Gross wrote:
> On 04/01/18 15:37, David Woodhouse wrote:
> > Convert pvops invocations to use non-speculative call sequences, when
> > CONFIG_RETPOLINE is enabled.
> >
> > There is scope for future optimisation here — once the pvops methods are
> > actually set, we could just turn the damn things into *direct* jumps.
> > But this is perfectly sufficient for now, without that added complexity.
>
> I don't see the need to modify the pvops calls.
>
> All indirect calls are replaced by either direct calls or other code
> long before any user code is active.
>
> For modules the replacements are in place before the module is being
> used.

Agreed. This shouldn't be needed.

-Andi
Re: [PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings
On Thu, Jan 04, 2018 at 10:06:01AM -0600, Josh Poimboeuf wrote:
> On Thu, Jan 04, 2018 at 07:59:14AM -0800, Andi Kleen wrote:
> > > NAK. We can't blindly disable objtool warnings, that will break
> > > livepatch and the ORC unwinder. If you share a .o file (or the GCC
> > > code) I can look at adding retpoline support.
> >
> > I don't think we can wait for that. We can disable livepatch and the
> > unwinder for now. They are not essential. Frame pointers should work
> > well enough for unwinding
>
> If you want to make this feature conflict with livepatch and ORC,
> silencing objtool warnings is not the way to do it.

I don't see why it would conflict with the unwinder anyways? It doesn't
change the long term stack state, so it should be invisible to the
unwinder (unless you crash in the thunk, which is very unlikely)

I actually got some unwinder backtraces during development and they
seemed to work.

> > and afaik nobody can use livepatch in mainline anyways.
>
> Why not? The patch creation tooling is still out-of-tree, but livepatch
> itself is fully supported in mainline.

Ok. Still doesn't seem critical at this point if it's some out of
tree thing.

-Andi
Re: [PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings
> NAK. We can't blindly disable objtool warnings, that will break
> livepatch and the ORC unwinder. If you share a .o file (or the GCC
> code) I can look at adding retpoline support.

I don't think we can wait for that. We can disable livepatch and the
unwinder for now. They are not essential. Frame pointers should work
well enough for unwinding and afaik nobody can use livepatch in
mainline anyways.

-Andi
Re: Avoid speculative indirect calls in kernel
> +.macro JMP_THUNK reg:req
> +#ifdef RETPOLINE
> +	ALTERNATIVE __stringify(jmp __x86.indirect_thunk.\reg),
> __stringify(jmp *%\reg), X86_FEATURE_IBRS_ATT
> +#else
> +	jmp *\reg
> +#endif
> +.endm

I removed that because what you're testing for doesn't exist in the
tree yet. Yes it can be added later. Right now we just want a basic
static version to work reliably.

-Andi
Re: [PATCH v2 06/12] x86/retpoline/crypto: Convert xen assembler indirect jumps
> > diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > index 1743e6850e00..9cd8450a2050 100644
> > --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
> > @@ -12,6 +12,7 @@
> >
> >  #include
> >  #include
> > +#include
>
> I fail to connect this change to the patch title.
>
> Maybe should be part of the crypto patch?

Right, I moved the hunk into the wrong patch. Will fix.

-Andi
[PATCH v2 06/12] x86/retpoline/crypto: Convert xen assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in xen inline assembler code to use
non speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 1 +
 arch/x86/include/asm/xen/hypercall.h         | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
index 1743e6850e00..9cd8450a2050 100644
--- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
+++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S
@@ -12,6 +12,7 @@

 #include
 #include
+#include

 #define CAMELLIA_TABLE_BYTE_LEN	272

diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index 7cb282e9e587..91de35bcce5e 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -44,6 +44,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -217,7 +218,7 @@ privcmd_call(unsigned call,
 	__HYPERCALL_5ARG(a1, a2, a3, a4, a5);

 	stac();
-	asm volatile("call *%[call]"
+	asm volatile(NOSPEC_CALL("%[call]")
		     : __HYPERCALL_5PARAM
		     : [call] "a" (&hypercall_page[call])
		     : __HYPERCALL_CLOBBER5);
--
2.14.3
[PATCH v2 05/12] x86/retpoline/hyperv: Convert assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com> Convert all indirect jumps in hyperv inline asm code to use non speculative sequences. Based on code from David Woodhouse and Tim Chen Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/include/asm/mshyperv.h | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index 5400add2885b..2dfcdd401d4c 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -7,6 +7,7 @@ #include #include #include +#include /* * The below CPUID leaves are present if VersionAndFeatures.HypervisorPresent @@ -186,7 +187,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output) return U64_MAX; __asm__ __volatile__("mov %4, %%r8\n" -"call *%5" +NOSPEC_CALL("%5") : "=a" (hv_status), ASM_CALL_CONSTRAINT, "+c" (control), "+d" (input_address) : "r" (output_address), "m" (hv_hypercall_pg) @@ -200,7 +201,7 @@ static inline u64 hv_do_hypercall(u64 control, void *input, void *output) if (!hv_hypercall_pg) return U64_MAX; - __asm__ __volatile__("call *%7" + __asm__ __volatile__(NOSPEC_CALL("%7") : "=A" (hv_status), "+c" (input_address_lo), ASM_CALL_CONSTRAINT : "A" (control), @@ -227,7 +228,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1) #ifdef CONFIG_X86_64 { - __asm__ __volatile__("call *%4" + __asm__ __volatile__(NOSPEC_CALL("%4") : "=a" (hv_status), ASM_CALL_CONSTRAINT, "+c" (control), "+d" (input1) : "m" (hv_hypercall_pg) @@ -238,7 +239,7 @@ static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1) u32 input1_hi = upper_32_bits(input1); u32 input1_lo = lower_32_bits(input1); - __asm__ __volatile__ ("call *%5" + __asm__ __volatile__ (NOSPEC_CALL("%5") : "=A"(hv_status), "+c"(input1_lo), ASM_CALL_CONSTRAINT -- 2.14.3
[PATCH v2 01/12] x86/retpoline: Define retpoline indirect thunk and macros
From: Dave Hansen <dave.han...@linux.intel.com> From: David Woodhouse <d...@amazon.co.uk> retpoline is a special sequence on Intel CPUs to stop speculation for indirect branches. Provide assembler infrastructure to use retpoline by the compiler and for assembler. We add the out of line trampoline used by the compiler, and NOSPEC_JUMP / NOSPEC_CALL macros for assembler [Originally from David and Tim, heavily hacked by AK] v2: Add CONFIG_RETPOLINE option Signed-off-by: David Woodhouse <d...@amazon.co.uk> Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com> Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/Kconfig| 8 + arch/x86/include/asm/jump-asm.h | 70 + arch/x86/kernel/vmlinux.lds.S | 1 + arch/x86/lib/Makefile | 1 + arch/x86/lib/retpoline.S| 35 + 5 files changed, 115 insertions(+) create mode 100644 arch/x86/include/asm/jump-asm.h create mode 100644 arch/x86/lib/retpoline.S diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index d4fc98c50378..8b0facfa35be 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -429,6 +429,14 @@ config GOLDFISH def_bool y depends on X86_GOLDFISH +config RETPOLINE + bool "Avoid speculative indirect branches in kernel" + default y + help + Compile kernel with the retpoline compiler options to guard against + kernel to user data leaks by avoiding speculative indirect + branches. Requires a new enough compiler. The kernel may run slower. + config INTEL_RDT bool "Intel Resource Director Technology support" default n diff --git a/arch/x86/include/asm/jump-asm.h b/arch/x86/include/asm/jump-asm.h new file mode 100644 index ..936fa620f346 --- /dev/null +++ b/arch/x86/include/asm/jump-asm.h @@ -0,0 +1,70 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef JUMP_ASM_H +#define JUMP_ASM_H 1 + +#ifdef __ASSEMBLY__ + +#ifdef CONFIG_RETPOLINE + +/* + * Jump to an indirect pointer without speculation. + * + * The out of line __x86.indirect_thunk has special code sequences + * to stop speculation. 
+ */ + +.macro NOSPEC_JMP target + push\target + jmp __x86.indirect_thunk +.endm + + +/* + * Call an indirect pointer without speculation. + */ + +.macro NOSPEC_CALL target + jmp 1221f +1222: + push\target + jmp __x86.indirect_thunk +1221: + call1222b +.endm + +#else /* CONFIG_RETPOLINE */ + +.macro NOSPEC_JMP target + jmp *\target +.endm + +.macro NOSPEC_CALL target + call *\target +.endm + +#endif /* !CONFIG_RETPOLINE */ + +#else /* __ASSEMBLY__ */ + +#ifdef CONFIG_RETPOLINE + +#define NOSPEC_JMP(t) \ + "push " t "; " \ + "jmp __x86.indirect_thunk; " + +#define NOSPEC_CALL(t) \ + " jmp 1221f; "\ + "1222: push " t ";"\ + " jmp __x86.indirect_thunk;" \ + "1221: call 1222b;" + +#else /* CONFIG_RETPOLINE */ + +#define NOSPEC_JMP(t) "jmp *" t "; " +#define NOSPEC_CALL(t) "call *" t "; " + +#endif /* !CONFIG_RETPOLINE */ + +#endif /* !__ASSEMBLY */ + +#endif diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 1e413a9326aa..2e64241a6664 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -103,6 +103,7 @@ SECTIONS /* bootstrapping code */ HEAD_TEXT . = ALIGN(8); + *(.text.__x86.indirect_thunk) TEXT_TEXT SCHED_TEXT CPUIDLE_TEXT diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile index 7b181b61170e..f23934bbaf4e 100644 --- a/arch/x86/lib/Makefile +++ b/arch/x86/lib/Makefile @@ -26,6 +26,7 @@ lib-y += memcpy_$(BITS).o lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o insn-eval.o lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o +lib-$(CONFIG_RETPOLINE) += retpoline.o obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S new file mode 100644 index ..cb40781adbfe --- /dev/null +++ b/arch/x86/lib/retpoline.S @@ -0,0 +1,35 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +/* + * Out of line jump trampoline for calls that disable speculation. 
+ * + * This is a special sequence that prevents the CPU speculating + * for indirect calls. + * + * This can be called by gcc generated code, or with the asm macros + * in asm/jump-asm.h + */ + +#include +#include +#include + + .section .text.__x86.indirect_thunk,"ax" + +ENTRY(__x86.indirect_thunk) + CFI_STARTPROC + call retpoline_call_target +2: + lfence /* stop speculation */ + jmp 2b +retpoline_call_target: +#ifdef CONFIG_64BIT + lea 8(%rsp), %rsp +#else + lea 4(%esp), %esp +#endif + ret + CFI_ENDPROC +ENDPROC(__x86.indirect_thunk) + + EXPORT_SYMBOL(__x86.indirect_thunk) -- 2.14.3
[PATCH v2 12/12] retpoline: Attempt to quieten objtool warning for unreachable code
From: Andi Kleen <a...@linux.intel.com> The speculative jump trampoline has to contain unreachable code. objtool keeps complaining arch/x86/lib/retpoline.o: warning: objtool: __x86.indirect_thunk()+0x8: unreachable instruction I tried to fix it here by adding an ASM_UNREACHABLE annotation (after adding support for them in pure assembler), but it still complains. Seems like an objtool bug? So it doesn't actually fix the warning yet. Of course it's just a warning, so the kernel will still work fine. Perhaps Josh can figure it out. Cc: jpoim...@redhat.com Not-Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/lib/retpoline.S | 3 +++ include/linux/compiler.h | 10 +- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S index cb40781adbfe..f4ed556a90ee 100644 --- a/arch/x86/lib/retpoline.S +++ b/arch/x86/lib/retpoline.S @@ -11,6 +11,7 @@ */ #include +#include #include #include @@ -21,7 +22,9 @@ ENTRY(__x86.indirect_thunk) callretpoline_call_target 2: lfence /* stop speculation */ + ASM_UNREACHABLE jmp 2b + ASM_UNREACHABLE retpoline_call_target: #ifdef CONFIG_64BIT lea 8(%rsp), %rsp diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 52e611ab9a6c..cfba91acc79a 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -269,7 +269,15 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s #endif /* __KERNEL__ */ -#endif /* __ASSEMBLY__ */ +#else /* __ASSEMBLY__ */ + +#define ASM_UNREACHABLE \ + 999:\ + .pushsection .discard.unreachable; \ + .long 999b - .; \ + .popsection + +#endif /* !__ASSEMBLY__ */ /* Compile time object size, -1 for unknown */ #ifndef __compiletime_object_size -- 2.14.3
[PATCH v2 03/12] x86/retpoline/entry: Convert entry assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com> Convert all indirect jumps in core 32/64bit entry assembler code to use non speculative sequences. Based on code from David Woodhouse and Tim Chen Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/entry/entry_32.S | 5 +++-- arch/x86/entry/entry_64.S | 12 +++- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ace8f321a5a1..a4b88260d6f7 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -44,6 +44,7 @@ #include #include #include +#include .section .entry.text, "ax" @@ -290,7 +291,7 @@ ENTRY(ret_from_fork) /* kernel thread */ 1: movl%edi, %eax - call*%ebx + NOSPEC_CALL %ebx /* * A kernel thread is allowed to return here after successfully * calling do_execve(). Exit to userspace to complete the execve() @@ -919,7 +920,7 @@ common_exception: movl%ecx, %es TRACE_IRQS_OFF movl%esp, %eax # pt_regs pointer - call*%edi + NOSPEC_CALL %edi jmp ret_from_exception END(common_exception) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index f048e384ff54..486990fb3e4d 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -37,6 +37,7 @@ #include #include #include +#include #include #include "calling.h" @@ -191,7 +192,7 @@ ENTRY(entry_SYSCALL_64_trampoline) */ pushq %rdi movq$entry_SYSCALL_64_stage2, %rdi - jmp *%rdi + NOSPEC_JMP %rdi END(entry_SYSCALL_64_trampoline) .popsection @@ -269,8 +270,9 @@ entry_SYSCALL_64_fastpath: * This call instruction is handled specially in stub_ptregs_64. * It might end up jumping to the slow path. If it jumps, RAX * and all argument registers are clobbered. 
-*/ - call*sys_call_table(, %rax, 8) + */ + movqsys_call_table(, %rax, 8), %rax + NOSPEC_CALL %rax .Lentry_SYSCALL_64_after_fastpath_call: movq%rax, RAX(%rsp) @@ -442,7 +444,7 @@ ENTRY(stub_ptregs_64) jmp entry_SYSCALL64_slow_path 1: - jmp *%rax /* Called from C */ + NOSPEC_JMP %rax/* Called from C */ END(stub_ptregs_64) .macro ptregs_stub func @@ -521,7 +523,7 @@ ENTRY(ret_from_fork) 1: /* kernel thread */ movq%r12, %rdi - call*%rbx + NOSPEC_CALL %rbx /* * A kernel thread is allowed to return here after successfully * calling do_execve(). Exit to userspace to complete the execve() -- 2.14.3
[PATCH v2 02/12] x86/retpoline/crypto: Convert crypto assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com> Convert all indirect jumps in crypto assembler code to use non speculative sequences. Based on code from David Woodhouse and Tim Chen Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/crypto/aesni-intel_asm.S| 5 +++-- arch/x86/crypto/camellia-aesni-avx-asm_64.S | 3 ++- arch/x86/crypto/camellia-aesni-avx2-asm_64.S | 2 +- arch/x86/crypto/crc32c-pcl-intel-asm_64.S| 3 ++- 4 files changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/crypto/aesni-intel_asm.S b/arch/x86/crypto/aesni-intel_asm.S index 16627fec80b2..f6c964f30ede 100644 --- a/arch/x86/crypto/aesni-intel_asm.S +++ b/arch/x86/crypto/aesni-intel_asm.S @@ -32,6 +32,7 @@ #include #include #include +#include /* * The following macros are used to move an (un)aligned 16 byte value to/from @@ -2884,7 +2885,7 @@ ENTRY(aesni_xts_crypt8) pxor INC, STATE4 movdqu IV, 0x30(OUTP) - call *%r11 + NOSPEC_CALL %r11 movdqu 0x00(OUTP), INC pxor INC, STATE1 @@ -2929,7 +2930,7 @@ ENTRY(aesni_xts_crypt8) _aesni_gf128mul_x_ble() movups IV, (IVP) - call *%r11 + NOSPEC_CALL %r11 movdqu 0x40(OUTP), INC pxor INC, STATE1 diff --git a/arch/x86/crypto/camellia-aesni-avx-asm_64.S b/arch/x86/crypto/camellia-aesni-avx-asm_64.S index f7c495e2863c..9006543227b2 100644 --- a/arch/x86/crypto/camellia-aesni-avx-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx-asm_64.S @@ -17,6 +17,7 @@ #include #include +#include #define CAMELLIA_TABLE_BYTE_LEN 272 @@ -1227,7 +1228,7 @@ camellia_xts_crypt_16way: vpxor 14 * 16(%rax), %xmm15, %xmm14; vpxor 15 * 16(%rax), %xmm15, %xmm15; - call *%r9; + NOSPEC_CALL %r9; addq $(16 * 16), %rsp; diff --git a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S index eee5b3982cfd..1743e6850e00 100644 --- a/arch/x86/crypto/camellia-aesni-avx2-asm_64.S +++ b/arch/x86/crypto/camellia-aesni-avx2-asm_64.S @@ -1343,7 +1343,7 @@ camellia_xts_crypt_32way: vpxor 14 * 32(%rax), %ymm15, %ymm14; vpxor 15 * 32(%rax), %ymm15, %ymm15; - 
call *%r9; + NOSPEC_CALL %r9; addq $(16 * 32), %rsp; diff --git a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S index 7a7de27c6f41..969000c64456 100644 --- a/arch/x86/crypto/crc32c-pcl-intel-asm_64.S +++ b/arch/x86/crypto/crc32c-pcl-intel-asm_64.S @@ -45,6 +45,7 @@ #include #include +#include ## ISCSI CRC 32 Implementation with crc32 and pclmulqdq Instruction @@ -172,7 +173,7 @@ continue_block: movzxw (bufp, %rax, 2), len lea crc_array(%rip), bufp lea (bufp, len, 1), bufp - jmp *bufp + NOSPEC_JMP bufp ## 2a) PROCESS FULL BLOCKS: -- 2.14.3
[PATCH v2 07/12] x86/retpoline/checksum32: Convert assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com> Convert all indirect jumps in 32bit checksum assembler code to use non speculative sequences. Based on code from David Woodhouse and Tim Chen Signed-off-by: Andi Kleen <a...@linux.intel.com> --- arch/x86/lib/checksum_32.S | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S index 4d34bb548b41..d19862183d4b 100644 --- a/arch/x86/lib/checksum_32.S +++ b/arch/x86/lib/checksum_32.S @@ -29,6 +29,7 @@ #include #include #include +#include /* * computes a partial checksum, e.g. for TCP/UDP fragments @@ -156,7 +157,7 @@ ENTRY(csum_partial) negl %ebx lea 45f(%ebx,%ebx,2), %ebx testl %esi, %esi - jmp *%ebx + NOSPEC_JMP %ebx # Handle 2-byte-aligned regions 20:addw (%esi), %ax @@ -439,7 +440,7 @@ ENTRY(csum_partial_copy_generic) andl $-32,%edx lea 3f(%ebx,%ebx), %ebx testl %esi, %esi - jmp *%ebx + NOSPEC_JMP %ebx 1: addl $64,%esi addl $64,%edi SRC(movb -32(%edx),%bl) ; SRC(movb (%edx),%bl) -- 2.14.3
Avoid speculative indirect calls in kernel
This is a fix for Variant 2 in https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html Any speculative indirect calls in the kernel can be tricked into executing any kernel code, which may allow side-channel attacks that can leak arbitrary kernel data. So we want to avoid speculative indirect calls in the kernel. There's a special code sequence called a retpoline that can do indirect calls without speculation. We use a new compiler option -mindirect-branch=thunk-extern (gcc patch will be released separately) to recompile the kernel with this new sequence. We also patch all the assembler code in the kernel to use the new sequence. The patches were originally from David Woodhouse and Tim Chen, but then reworked and enhanced by me. No performance numbers at this point. 32bit is only boot tested. Git tree available in git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc spec/retpoline-415-2 v1: Initial post. v2: Add CONFIG_RETPOLINE to build kernel without it. Change warning messages. Hide modpost warning message
[PATCH v2 11/12] retpoline/objtool: Disable some objtool warnings
From: Andi Kleen <a...@linux.intel.com> With the indirect call thunk enabled in the compiler, two objtool warnings are triggered very frequently and make the build very noisy. I don't see a good way to avoid them, so just disable them for now. Signed-off-by: Andi Kleen <a...@linux.intel.com> --- tools/objtool/check.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 9b341584eb1b..435c71f944dc 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -503,8 +503,13 @@ static int add_call_destinations(struct objtool_file *file) insn->call_dest = find_symbol_by_offset(insn->sec, dest_off); if (!insn->call_dest) { +#if 0 + /* Compilers with -mindirect-branch=thunk-extern trigger +* this everywhere on x86. Disable for now. +*/ WARN_FUNC("can't find call dest symbol at offset 0x%lx", insn->sec, insn->offset, dest_off); +#endif return -1; } } else if (rela->sym->type == STT_SECTION) { @@ -1716,8 +1721,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, return 1; } else if (func && has_modified_stack_frame()) { +#if 0 + /* Compilers with -mindirect-branch=thunk-extern trigger +* this everywhere on x86. Disable for now. +*/ + WARN_FUNC("sibling call from callable instruction with modified stack frame", sec, insn->offset); +#endif return 1; } -- 2.14.3
[PATCH v2 09/12] x86/retpoline: Finally enable retpoline for C code
From: David Woodhouse

Add retpoline compile option in Makefile.

Update Makefile with retpoline compile options. This requires a gcc
with the retpoline compiler patches enabled.

Print a warning when the compiler doesn't support retpoline.

[Originally from David and Tim, but hacked by AK]

v2: Use CONFIG option to enable

Signed-off-by: David Woodhouse
Signed-off-by: Tim Chen
---
 arch/x86/Makefile | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 3e73bc255e4e..b02e35350244 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -230,6 +230,17 @@ KBUILD_CFLAGS += -Wno-sign-compare
 #
 KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
 
+#
+ifdef CONFIG_RETPOLINE
+    RETPOLINE_CFLAGS += $(call cc-option,-mindirect-branch=thunk-extern)
+    ifneq ($(RETPOLINE_CFLAGS),)
+        KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE
+        KBUILD_AFLAGS += -DRETPOLINE
+    else
+        $(warning Retpoline not supported in compiler. System may be insecure.)
+    endif
+endif
+
 archscripts: scripts_basic
 	$(Q)$(MAKE) $(build)=arch/x86/tools relocs
-- 
2.14.3
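The Makefile hunk leans on Kbuild's cc-option helper, which yields an empty string when the compiler rejects a flag; the ifneq test then falls through to the warning. Outside Kbuild the same probe can be sketched in plain shell (the function name and test program are ours, not Kbuild's):

```shell
# Rough stand-in for Kbuild's $(call cc-option,...): echo the flag back
# only if the compiler accepts it when building a trivial test program.
cc_option() {
    if printf 'int main(void){return 0;}\n' | \
        ${CC:-cc} -Werror "$1" -c -x c - -o /dev/null 2>/dev/null; then
        printf '%s\n' "$1"
    fi
}

# A compiler without the retpoline patches yields an empty result,
# which is what the Makefile's ifneq test keys off.
RETPOLINE_CFLAGS=$(cc_option -mindirect-branch=thunk-extern)
if [ -n "$RETPOLINE_CFLAGS" ]; then
    echo "retpoline supported: $RETPOLINE_CFLAGS"
else
    echo "Retpoline not supported in compiler. System may be insecure."
fi
```

`-Werror` matters: some compilers only warn about unknown flags, and without it the probe would report false support.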
[PATCH v2 10/12] retpoline/taint: Taint kernel for missing retpoline in compiler
From: Andi Kleen <a...@linux.intel.com>

When the kernel or a module hasn't been compiled with a retpoline-aware
compiler, print a warning and set a taint flag.

For modules it is checked at compile time, however it cannot check
assembler or other non-compiled objects used in the module link.

For lack of a better letter it uses taint option 'Z'.

v2: Change warning message

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 Documentation/admin-guide/tainted-kernels.rst |  3 +++
 arch/x86/kernel/setup.c                       |  6 ++++++
 include/linux/kernel.h                        |  4 +++-
 kernel/module.c                               | 11 ++++++++++-
 kernel/panic.c                                |  1 +
 scripts/mod/modpost.c                         |  9 +++++++++
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index 1df03b5cb02f..800261b6bd6f 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -52,6 +52,9 @@ characters, each representing a particular tainted value.
 
  16) ``K`` if the kernel has been live patched.
 
+ 17) ``Z`` if the x86 kernel or a module hasn't been compiled with
+     a retpoline aware compiler and may be vulnerable to data leaks.
+
 The primary reason for the **'Tainted: '** string is to tell kernel
 debuggers if this is a clean kernel or if anything unusual has
 occurred.  Tainting is permanent: even if an offending module is

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 8af2e8d0c0a1..cc880b46b756 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1296,6 +1296,12 @@ void __init setup_arch(char **cmdline_p)
 #endif
 
 	unwind_init();
+
+#ifndef RETPOLINE
+	add_taint(TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+	pr_warn("No support for retpoline in kernel compiler\n");
+	pr_warn("System may be vulnerable to data leaks.\n");
+#endif
 }
 
 #ifdef CONFIG_X86_32

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index ce51455e2adf..fbb4d3baffcc 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -550,7 +550,9 @@ extern enum system_states {
 #define TAINT_SOFTLOCKUP		14
 #define TAINT_LIVEPATCH			15
 #define TAINT_AUX			16
-#define TAINT_FLAGS_COUNT		17
+#define TAINT_NO_RETPOLINE		17
+
+#define TAINT_FLAGS_COUNT		18
 
 struct taint_flag {
 	char c_true;	/* character printed when tainted */

diff --git a/kernel/module.c b/kernel/module.c
index dea01ac9cb74..92db3f59a29a 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3028,7 +3028,16 @@ static int check_modinfo(struct module *mod, struct load_info *info, int flags)
 			mod->name);
 		add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
 	}
-
+#ifdef RETPOLINE
+	if (!get_modinfo(info, "retpoline")) {
+		if (!test_taint(TAINT_NO_RETPOLINE)) {
+			pr_warn("%s: loading module not compiled with retpoline compiler.\n",
+				mod->name);
+			pr_warn("Kernel may be vulnerable to data leaks.\n");
+		}
+		add_taint_module(mod, TAINT_NO_RETPOLINE, LOCKDEP_STILL_OK);
+	}
+#endif
 	if (get_modinfo(info, "staging")) {
 		add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
 		pr_warn("%s: module is from the staging directory, the quality "

diff --git a/kernel/panic.c b/kernel/panic.c
index 2cfef408fec9..6686c67b6e4b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -325,6 +325,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	{ 'L', ' ', false },	/* TAINT_SOFTLOCKUP */
 	{ 'K', ' ', true },	/* TAINT_LIVEPATCH */
 	{ 'X', ' ', true },	/* TAINT_AUX */
+	{ 'Z', ' ', true },	/* TAINT_NO_RETPOLINE */
 };
 
 /**

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index f51cf977c65b..6510536c06df 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -2165,6 +2165,14 @@ static void add_intree_flag(struct buffer *b, int is_intree)
 		buf_printf(b, "\nMODULE_INFO(intree, \"Y\");\n");
 }
 
+/* Cannot check for assembler */
+static void add_retpoline(struct buffer *b)
+{
+	buf_printf(b, "\n#ifdef RETPOLINE\n");
+	buf_printf(b, "MODULE_INFO(retpoline, \"Y\");\n");
+	buf_printf(b, "#endif\n");
+}
+
 static void add_staging_flag(struct buffer *b, const char *name)
 {
 	static const char *staging_dir = "drivers/staging";
@@ -2506,6 +2514,7 @@ int main(int argc, char **argv)
 
 		err |= check_modname_len(mod);
[PATCH v2 04/12] x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in ftrace assembler code to use
non-speculative sequences.

Based on code from David Woodhouse and Tim Chen

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/kernel/ftrace_32.S | 3 ++-
 arch/x86/kernel/ftrace_64.S | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index b6c6468e10bc..c0c10ba53375 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CC_USING_FENTRY
 # define function_hook	__fentry__
@@ -241,5 +242,5 @@ return_to_handler:
 	movl	%eax, %ecx
 	popl	%edx
 	popl	%eax
-	jmp	*%ecx
+	NOSPEC_JMP %ecx
 #endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index c832291d948a..4814c4b30b8d 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -7,7 +7,7 @@
 #include
 #include
 #include
-
+#include
 .code64
 .section .entry.text, "ax"
@@ -286,7 +286,7 @@ trace:
 	 * ip and parent ip are used and the list function is called when
 	 * function tracing is enabled.
 	 */
-	call *ftrace_trace_function
+	NOSPEC_CALL ftrace_trace_function
 
 	restore_mcount_regs
@@ -329,5 +329,5 @@ GLOBAL(return_to_handler)
 	movq 8(%rsp), %rdx
 	movq (%rsp), %rax
 	addq $24, %rsp
-	jmp *%rdi
+	NOSPEC_JMP %rdi
 #endif
-- 
2.14.3
[PATCH v2 08/12] x86/retpoline/irq32: Convert assembler indirect jumps
From: Andi Kleen <a...@linux.intel.com>

Convert all indirect jumps in 32bit irq inline asm code to use
non-speculative sequences.

Signed-off-by: Andi Kleen <a...@linux.intel.com>
---
 arch/x86/kernel/irq_32.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/irq_32.c b/arch/x86/kernel/irq_32.c
index a83b3346a0e1..2dce2fdd2c4e 100644
--- a/arch/x86/kernel/irq_32.c
+++ b/arch/x86/kernel/irq_32.c
@@ -20,6 +20,7 @@
 #include
 #include
+#include
 
 #ifdef CONFIG_DEBUG_STACKOVERFLOW
@@ -55,7 +56,7 @@ DEFINE_PER_CPU(struct irq_stack *, softirq_stack);
 static void call_on_stack(void *func, void *stack)
 {
 	asm volatile("xchgl	%%ebx,%%esp	\n"
-		     "call	*%%edi		\n"
+		     NOSPEC_CALL("%%edi")
 		     "movl	%%ebx,%%esp	\n"
 		     : "=b" (stack)
 		     : "0" (stack),
@@ -95,7 +96,7 @@ static inline int execute_on_irq_stack(int overflow, struct irq_desc *desc)
 		call_on_stack(print_stack_overflow, isp);
 
 	asm volatile("xchgl	%%ebx,%%esp	\n"
-		     "call	*%%edi		\n"
+		     NOSPEC_CALL("%%edi")
 		     "movl	%%ebx,%%esp	\n"
 		     : "=a" (arg1), "=b" (isp)
 		     :  "0" (desc), "1" (isp),
-- 
2.14.3
Re: Avoid speculative indirect calls in kernel
> So you say, that we finally need a perl interpreter in the kernel to do
> alternative patching?

I don't think perl or objtool makes sense. That would be just
incredibly fragile because compilers can reorder and mix code.

It could be done with a gcc change, I suppose. That should be
reliable. But it would need to be developed first. We don't have it
right now.

As a first step a compile-time approach should be sufficient. We can
add a CONFIG option so people can choose at compile time. Then later
we can investigate run-time patching.

-Andi
Re: [PATCH] Fix read buffer overflow in delta-ipc
On Wed, Jan 03, 2018 at 09:40:04AM +, Hugues FRUCHET wrote:
> Hi Andi,
> Thanks for the patch but I would suggest to use strlcpy instead, this
> will guard msg.name overwriting and add the NULL termination in case
> of truncation:
> -	memcpy(msg.name, name, sizeof(msg.name));
> -	msg.name[sizeof(msg.name) - 1] = 0;
> +	strlcpy(msg.name, name, sizeof(msg.name));

I'm not an expert on your setup, but it seems strlcpy would leak some
uninitialized stack data over your IPC mechanism. strlcpy doesn't pad
the data.

If the IPC is a security boundary that would be a security bug.

So I think the original patch is better than strlcpy.

-Andi
Re: Avoid speculative indirect calls in kernel
> It should be a CPU_BUG bit as we have for the other mess. And that can be
> used for patching.

It has to be done at compile time because it requires a compiler
option. Most of the indirect calls are in C code.

So it cannot just be patched in, only partially out.

-Andi
Re: Avoid speculative indirect calls in kernel
Hi Linus,

On Wed, Jan 03, 2018 at 03:51:35PM -0800, Linus Torvalds wrote:
> On Wed, Jan 3, 2018 at 3:09 PM, Andi Kleen <a...@firstfloor.org> wrote:
> > This is a fix for Variant 2 in
> > https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html
> >
> > Any speculative indirect calls in the kernel can be tricked
> > to execute any kernel code, which may allow side channel
> > attacks that can leak arbitrary kernel data.
>
> Why is this all done without any configuration options?

I was thinking of a config option, but I was struggling with a name.
CONFIG_INSECURE_KERNEL, CONFIG_LEAK_MEMORY? And should it be positive
or negative? So I opted to be secure unconditionally.

It would be simple to add, however; all the hooks are either in the
Makefile or in asm/jump-asm.h.

> - these workarounds should have a way to disable them.

There will soon be patches to add other ways, and they have a way to
patch out most of the retpoline overhead at runtime (basically replace
the trampoline with a pure ret).

We just wanted to get the retpoline code out first because it's the
most basic and widest applicable fix.

-Andi
Avoid speculative indirect calls in kernel
This is a fix for Variant 2 in
https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

Any speculative indirect calls in the kernel can be tricked into
executing arbitrary kernel code, which may allow side-channel attacks
that leak arbitrary kernel data. So we want to avoid speculative
indirect calls in the kernel.

There's a special code sequence, called a retpoline, that can perform
indirect calls without speculation. We use a new compiler option,
-mindirect-branch=thunk-extern (the gcc patch will be released
separately), to recompile the kernel with this new sequence. We also
patch all the assembler code in the kernel to use the new sequence.

The patches were originally from David Woodhouse and Tim Chen, but were
then reworked and enhanced by me.

No performance numbers at this point. 32bit is only boot tested.

Git tree available in
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc spec/retpoline-415-1

v1: Initial post.