Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Fri, Mar 2, 2018 at 9:10 AM, Joerg Roedelwrote: > On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote: >> Note that debug traps can happen regardless of TF, Think kgdb etc. >> Arguably kgdb users get what they deserve, but still.. I think root >> can set kernel breakpoints too. > > But that seems to be broken right now at least wrt. to the espfix code > where there is no handling for in the #DB handler. Can userspace really > set arbitrary kernel breakpoints? > As far as I'm concerned, I don't try to support kernel debugger users setting arbitrary breakpoints in the kernel entry text.
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Fri, Mar 2, 2018 at 9:10 AM, Joerg Roedel wrote: > On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote: >> Note that debug traps can happen regardless of TF, Think kgdb etc. >> Arguably kgdb users get what they deserve, but still.. I think root >> can set kernel breakpoints too. > > But that seems to be broken right now at least wrt. to the espfix code > where there is no handling for in the #DB handler. Can userspace really > set arbitrary kernel breakpoints? > As far as I'm concerned, I don't try to support kernel debugger users setting arbitrary breakpoints in the kernel entry text.
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote: > Note that debug traps can happen regardless of TF, Think kgdb etc. > Arguably kgdb users get what they deserve, but still.. I think root > can set kernel breakpoints too. But that seems to be broken right now at least wrt. to the espfix code where there is no handling for in the #DB handler. Can userspace really set arbitrary kernel breakpoints? Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote: > Note that debug traps can happen regardless of TF, Think kgdb etc. > Arguably kgdb users get what they deserve, but still.. I think root > can set kernel breakpoints too. But that seems to be broken right now at least wrt. to the espfix code where there is no handling for in the #DB handler. Can userspace really set arbitrary kernel breakpoints? Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 01:24:39PM -0500, Brian Gerst wrote: > The IF flag only affects external maskable interrupts, not traps or > faults. You do need to check CR3 because SYSENTER does not clear TF > and will immediately cause a debug trap on kernel entry (with user > CR3) if set. That is why the code existed before to check for the > entry stack for debug/NMI. Yeah, okay, thanks for the clarification. This also means the #DB handler needs to leave with the same cr3 as it entered. I'll work that into my patches. Thanks, Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 01:24:39PM -0500, Brian Gerst wrote: > The IF flag only affects external maskable interrupts, not traps or > faults. You do need to check CR3 because SYSENTER does not clear TF > and will immediately cause a debug trap on kernel entry (with user > CR3) if set. That is why the code existed before to check for the > entry stack for debug/NMI. Yeah, okay, thanks for the clarification. This also means the #DB handler needs to leave with the same cr3 as it entered. I'll work that into my patches. Thanks, Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 1, 2018 at 10:24 AM, Brian Gerstwrote: > > The IF flag only affects external maskable interrupts, not traps or > faults. You do need to check CR3 because SYSENTER does not clear TF > and will immediately cause a debug trap on kernel entry (with user > CR3) if set. That is why the code existed before to check for the > entry stack for debug/NMI. Note that debug traps can happen regardless of TF, Think kgdb etc. Arguably kgdb users get what they deserve, but still.. I think root can set kernel breakpoints too. Linus
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 1, 2018 at 10:24 AM, Brian Gerst wrote: > > The IF flag only affects external maskable interrupts, not traps or > faults. You do need to check CR3 because SYSENTER does not clear TF > and will immediately cause a debug trap on kernel entry (with user > CR3) if set. That is why the code existed before to check for the > entry stack for debug/NMI. Note that debug traps can happen regardless of TF, Think kgdb etc. Arguably kgdb users get what they deserve, but still.. I think root can set kernel breakpoints too. Linus
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 03/01/2018 10:24 AM, Brian Gerst wrote: > One thing that I am not certain about is whether debug exception can > happen even if the IF flag is cleared. If it can, debug exception should > be handled like NMI as the state of the CR3 can be indeterminate if the > exception happens in the entry/exit code. It can happen with IF cleared. I ran into it during PTI development more than once. That's why the debug fault code uses paranoid_entry on 64-bit just like the NMI code.
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 03/01/2018 10:24 AM, Brian Gerst wrote: > One thing that I am not certain about is whether debug exception can > happen even if the IF flag is cleared. If it can, debug exception should > be handled like NMI as the state of the CR3 can be indeterminate if the > exception happens in the entry/exit code. It can happen with IF cleared. I ran into it during PTI development more than once. That's why the debug fault code uses paranoid_entry on 64-bit just like the NMI code.
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 1, 2018 at 11:50 AM, Joerg Roedelwrote: > On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote: >> On 03/01/2018 08:34 AM, Joerg Roedel wrote: >> I think that should fix the issue of debug exception from userspace. >> >> One thing that I am not certain about is whether debug exception can >> happen even if the IF flag is cleared. If it can, debug exception should >> be handled like NMI as the state of the CR3 can be indeterminate if the >> exception happens in the entry/exit code. > > I am actually not 100% sure where it can happen, from the code it can > happen from anywhere, except when we are running on an espfix stack. > > So I am not sure we need the same complex handling NMIs need wrt. to > switching the cr3s. The IF flag only affects external maskable interrupts, not traps or faults. You do need to check CR3 because SYSENTER does not clear TF and will immediately cause a debug trap on kernel entry (with user CR3) if set. That is why the code existed before to check for the entry stack for debug/NMI. -- Brian Gerst
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 1, 2018 at 11:50 AM, Joerg Roedel wrote: > On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote: >> On 03/01/2018 08:34 AM, Joerg Roedel wrote: >> I think that should fix the issue of debug exception from userspace. >> >> One thing that I am not certain about is whether debug exception can >> happen even if the IF flag is cleared. If it can, debug exception should >> be handled like NMI as the state of the CR3 can be indeterminate if the >> exception happens in the entry/exit code. > > I am actually not 100% sure where it can happen, from the code it can > happen from anywhere, except when we are running on an espfix stack. > > So I am not sure we need the same complex handling NMIs need wrt. to > switching the cr3s. The IF flag only affects external maskable interrupts, not traps or faults. You do need to check CR3 because SYSENTER does not clear TF and will immediately cause a debug trap on kernel entry (with user CR3) if set. That is why the code existed before to check for the entry stack for debug/NMI. -- Brian Gerst
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote: > On 03/01/2018 08:34 AM, Joerg Roedel wrote: > I think that should fix the issue of debug exception from userspace. > > One thing that I am not certain about is whether debug exception can > happen even if the IF flag is cleared. If it can, debug exception should > be handled like NMI as the state of the CR3 can be indeterminate if the > exception happens in the entry/exit code. I am actually not 100% sure where it can happen, from the code it can happen from anywhere, except when we are running on an espfix stack. So I am not sure we need the same complex handling NMIs need wrt. to switching the cr3s. Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote: > On 03/01/2018 08:34 AM, Joerg Roedel wrote: > I think that should fix the issue of debug exception from userspace. > > One thing that I am not certain about is whether debug exception can > happen even if the IF flag is cleared. If it can, debug exception should > be handled like NMI as the state of the CR3 can be indeterminate if the > exception happens in the entry/exit code. I am actually not 100% sure where it can happen, from the code it can happen from anywhere, except when we are running on an espfix stack. So I am not sure we need the same complex handling NMIs need wrt. to switching the cr3s. Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 03/01/2018 08:34 AM, Joerg Roedel wrote: > On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: >>> + /* Make sure we are running on kernel cr3 */ >>> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax >>> + >>> xorl%edx, %edx # error code 0 >>> movl%esp, %eax # pt_regs pointer >>> >> The debug exception calls ret_from_exception on exit. If coming from >> userspace, the C function prepare_exit_to_usermode() will be called. >> With the PTI-32 code, it means that function will be called with the >> entry stack instead of the task stack. This can be problematic as macro >> like current won't work anymore. > Okay, I had another look at the debug handler. As I said before, it > already handles the from-entry-stack case, but with these patches it > gets more likely that we actually hit that path. > > Also, with the special handling for from-kernel-with-entry-stack > situations we can simplify the debug handler and make it more robust > with the diff below. Thoughts? > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index 8c149f5..844aff1 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -1318,33 +1318,14 @@ ENTRY(debug) > ASM_CLAC > pushl $-1 # mark this as an int > > - SAVE_ALL > + SAVE_ALL switch_stacks=1 > ENCODE_FRAME_POINTER > > - /* Make sure we are running on kernel cr3 */ > - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > - > xorl%edx, %edx # error code 0 > movl%esp, %eax # pt_regs pointer > > - /* Are we currently on the SYSENTER stack? */ > - movlPER_CPU_VAR(cpu_entry_area), %ecx > - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx > - subl%eax, %ecx /* ecx = (end of entry_stack) - esp */ > - cmpl$SIZEOF_entry_stack, %ecx > - jb .Ldebug_from_sysenter_stack > - > - TRACE_IRQS_OFF > - calldo_debug > - jmp ret_from_exception > - > -.Ldebug_from_sysenter_stack: > - /* We're on the SYSENTER stack. Switch off. */ > - movl%esp, %ebx > - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp > TRACE_IRQS_OFF > calldo_debug > - movl%ebx, %esp > jmp ret_from_exception > END(debug) > > > I think that should fix the issue of debug exception from userspace. One thing that I am not certain about is whether debug exception can happen even if the IF flag is cleared. If it can, debug exception should be handled like NMI as the state of the CR3 can be indeterminate if the exception happens in the entry/exit code. Cheers, Longman
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 03/01/2018 08:34 AM, Joerg Roedel wrote: > On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: >>> + /* Make sure we are running on kernel cr3 */ >>> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax >>> + >>> xorl%edx, %edx # error code 0 >>> movl%esp, %eax # pt_regs pointer >>> >> The debug exception calls ret_from_exception on exit. If coming from >> userspace, the C function prepare_exit_to_usermode() will be called. >> With the PTI-32 code, it means that function will be called with the >> entry stack instead of the task stack. This can be problematic as macro >> like current won't work anymore. > Okay, I had another look at the debug handler. As I said before, it > already handles the from-entry-stack case, but with these patches it > gets more likely that we actually hit that path. > > Also, with the special handling for from-kernel-with-entry-stack > situations we can simplify the debug handler and make it more robust > with the diff below. Thoughts? > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index 8c149f5..844aff1 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -1318,33 +1318,14 @@ ENTRY(debug) > ASM_CLAC > pushl $-1 # mark this as an int > > - SAVE_ALL > + SAVE_ALL switch_stacks=1 > ENCODE_FRAME_POINTER > > - /* Make sure we are running on kernel cr3 */ > - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > - > xorl%edx, %edx # error code 0 > movl%esp, %eax # pt_regs pointer > > - /* Are we currently on the SYSENTER stack? */ > - movlPER_CPU_VAR(cpu_entry_area), %ecx > - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx > - subl%eax, %ecx /* ecx = (end of entry_stack) - esp */ > - cmpl$SIZEOF_entry_stack, %ecx > - jb .Ldebug_from_sysenter_stack > - > - TRACE_IRQS_OFF > - calldo_debug > - jmp ret_from_exception > - > -.Ldebug_from_sysenter_stack: > - /* We're on the SYSENTER stack. Switch off. */ > - movl%esp, %ebx > - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp > TRACE_IRQS_OFF > calldo_debug > - movl%ebx, %esp > jmp ret_from_exception > END(debug) > > > I think that should fix the issue of debug exception from userspace. One thing that I am not certain about is whether debug exception can happen even if the IF flag is cleared. If it can, debug exception should be handled like NMI as the state of the CR3 can be indeterminate if the exception happens in the entry/exit code. Cheers, Longman
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: > > + /* Make sure we are running on kernel cr3 */ > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > > + > > xorl%edx, %edx # error code 0 > > movl%esp, %eax # pt_regs pointer > > > > The debug exception calls ret_from_exception on exit. If coming from > userspace, the C function prepare_exit_to_usermode() will be called. > With the PTI-32 code, it means that function will be called with the > entry stack instead of the task stack. This can be problematic as macro > like current won't work anymore. Okay, I had another look at the debug handler. As I said before, it already handles the from-entry-stack case, but with these patches it gets more likely that we actually hit that path. Also, with the special handling for from-kernel-with-entry-stack situations we can simplify the debug handler and make it more robust with the diff below. Thoughts? diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 8c149f5..844aff1 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -1318,33 +1318,14 @@ ENTRY(debug) ASM_CLAC pushl $-1 # mark this as an int - SAVE_ALL + SAVE_ALL switch_stacks=1 ENCODE_FRAME_POINTER - /* Make sure we are running on kernel cr3 */ - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax - xorl%edx, %edx # error code 0 movl%esp, %eax # pt_regs pointer - /* Are we currently on the SYSENTER stack? */ - movlPER_CPU_VAR(cpu_entry_area), %ecx - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx - subl%eax, %ecx /* ecx = (end of entry_stack) - esp */ - cmpl$SIZEOF_entry_stack, %ecx - jb .Ldebug_from_sysenter_stack - - TRACE_IRQS_OFF - calldo_debug - jmp ret_from_exception - -.Ldebug_from_sysenter_stack: - /* We're on the SYSENTER stack. Switch off. */ - movl%esp, %ebx - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp TRACE_IRQS_OFF calldo_debug - movl%ebx, %esp jmp ret_from_exception END(debug)
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: > > + /* Make sure we are running on kernel cr3 */ > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > > + > > xorl%edx, %edx # error code 0 > > movl%esp, %eax # pt_regs pointer > > > > The debug exception calls ret_from_exception on exit. If coming from > userspace, the C function prepare_exit_to_usermode() will be called. > With the PTI-32 code, it means that function will be called with the > entry stack instead of the task stack. This can be problematic as macro > like current won't work anymore. Okay, I had another look at the debug handler. As I said before, it already handles the from-entry-stack case, but with these patches it gets more likely that we actually hit that path. Also, with the special handling for from-kernel-with-entry-stack situations we can simplify the debug handler and make it more robust with the diff below. Thoughts? diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 8c149f5..844aff1 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -1318,33 +1318,14 @@ ENTRY(debug) ASM_CLAC pushl $-1 # mark this as an int - SAVE_ALL + SAVE_ALL switch_stacks=1 ENCODE_FRAME_POINTER - /* Make sure we are running on kernel cr3 */ - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax - xorl%edx, %edx # error code 0 movl%esp, %eax # pt_regs pointer - /* Are we currently on the SYSENTER stack? */ - movlPER_CPU_VAR(cpu_entry_area), %ecx - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx - subl%eax, %ecx /* ecx = (end of entry_stack) - esp */ - cmpl$SIZEOF_entry_stack, %ecx - jb .Ldebug_from_sysenter_stack - - TRACE_IRQS_OFF - calldo_debug - jmp ret_from_exception - -.Ldebug_from_sysenter_stack: - /* We're on the SYSENTER stack. Switch off. */ - movl%esp, %ebx - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp TRACE_IRQS_OFF calldo_debug - movl%ebx, %esp jmp ret_from_exception END(debug)
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: > On 02/09/2018 04:25 AM, Joerg Roedel wrote: > > SAVE_ALL > > ENCODE_FRAME_POINTER > > + > > + /* Make sure we are running on kernel cr3 */ > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > > + > > xorl%edx, %edx # error code 0 > > movl%esp, %eax # pt_regs pointer > > > > The debug exception calls ret_from_exception on exit. If coming from > userspace, the C function prepare_exit_to_usermode() will be called. > With the PTI-32 code, it means that function will be called with the > entry stack instead of the task stack. This can be problematic as macro > like current won't work anymore. This is not different from before, no? The debug handler already can be entered on entry stack before this patch-set. Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote: > On 02/09/2018 04:25 AM, Joerg Roedel wrote: > > SAVE_ALL > > ENCODE_FRAME_POINTER > > + > > + /* Make sure we are running on kernel cr3 */ > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > > + > > xorl%edx, %edx # error code 0 > > movl%esp, %eax # pt_regs pointer > > > > The debug exception calls ret_from_exception on exit. If coming from > userspace, the C function prepare_exit_to_usermode() will be called. > With the PTI-32 code, it means that function will be called with the > entry stack instead of the task stack. This can be problematic as macro > like current won't work anymore. This is not different from before, no? The debug handler already can be entered on entry stack before this patch-set. Joerg
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 02/09/2018 04:25 AM, Joerg Roedel wrote: > From: Joerg Roedel> > Add unconditional cr3 switches between user and kernel cr3 > to all non-NMI entry and exit points. > > Signed-off-by: Joerg Roedel > --- > arch/x86/entry/entry_32.S | 59 > ++- > 1 file changed, 58 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index 9693485..b5ef003 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -328,6 +328,25 @@ > #endif /* CONFIG_X86_ESPFIX32 */ > .endm > > +/* Unconditionally switch to user cr3 */ > +.macro SWITCH_TO_USER_CR3 scratch_reg:req > + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI > + > + movl%cr3, \scratch_reg > + orl $PTI_SWITCH_MASK, \scratch_reg > + movl\scratch_reg, %cr3 > +.Lend_\@: > +.endm > + > +/* Unconditionally switch to kernel cr3 */ > +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req > + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI > + movl%cr3, \scratch_reg > + andl$(~PTI_SWITCH_MASK), \scratch_reg > + movl\scratch_reg, %cr3 > +.Lend_\@: > +.endm > + > > /* > * Called with pt_regs fully populated and kernel segments loaded, > @@ -343,6 +362,8 @@ > > ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + > /* Are we on the entry stack? Bail out if not! */ > movlPER_CPU_VAR(cpu_entry_area), %edi > addl$CPU_ENTRY_AREA_entry_stack, %edi > @@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target) > * 0(%ebp) arg6 > */ > ENTRY(entry_SYSENTER_32) > + /* > + * On entry-stack with all userspace-regs live - save and > + * restore eflags and %eax to use it as scratch-reg for the cr3 > + * switch. > + */ > + pushfl > + pushl %eax > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + popl%eax > + popfl > + > + /* Stack empty again, switch to task stack */ > movlTSS_entry_stack(%esp), %esp > > .Lsysenter_past_esp: > @@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32) > movlPT_OLDESP(%esp), %ecx /* pt_regs->sp */ > 1: mov PT_FS(%esp), %fs > PTGS_TO_GS > + > + /* Segments are restored - switch to user cr3 */ > + SWITCH_TO_USER_CR3 scratch_reg=%eax > + > popl%ebx/* pt_regs->bx */ > addl$2*4, %esp /* skip pt_regs->cx and pt_regs->dx */ > popl%esi/* pt_regs->si */ > @@ -778,7 +815,23 @@ restore_all: > .Lrestore_all_notrace: > CHECK_AND_APPLY_ESPFIX > .Lrestore_nocheck: > - RESTORE_REGS 4 # skip orig_eax/error_code > + /* > + * First restore user segments. This can cause exceptions, so we > + * run it with kernel cr3. > + */ > + RESTORE_SEGMENTS > + > + /* > + * Segments are restored - no more exceptions from here on except on > + * iret, but that handled safely. > + */ > + SWITCH_TO_USER_CR3 scratch_reg=%eax > + > + /* Restore rest */ > + RESTORE_INT_REGS > + > + /* Unwind stack to the iret frame */ > + RESTORE_SKIP_SEGMENTS 4 # skip orig_eax/error_code > .Lirq_return: > INTERRUPT_RETURN > > @@ -1139,6 +1192,10 @@ ENTRY(debug) > > SAVE_ALL > ENCODE_FRAME_POINTER > + > + /* Make sure we are running on kernel cr3 */ > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + > xorl%edx, %edx # error code 0 > movl%esp, %eax # pt_regs pointer > The debug exception calls ret_from_exception on exit. If coming from userspace, the C function prepare_exit_to_usermode() will be called. With the PTI-32 code, it means that function will be called with the entry stack instead of the task stack. This can be problematic as macro like current won't work anymore. -Longman
Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
On 02/09/2018 04:25 AM, Joerg Roedel wrote: > From: Joerg Roedel > > Add unconditional cr3 switches between user and kernel cr3 > to all non-NMI entry and exit points. > > Signed-off-by: Joerg Roedel > --- > arch/x86/entry/entry_32.S | 59 > ++- > 1 file changed, 58 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S > index 9693485..b5ef003 100644 > --- a/arch/x86/entry/entry_32.S > +++ b/arch/x86/entry/entry_32.S > @@ -328,6 +328,25 @@ > #endif /* CONFIG_X86_ESPFIX32 */ > .endm > > +/* Unconditionally switch to user cr3 */ > +.macro SWITCH_TO_USER_CR3 scratch_reg:req > + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI > + > + movl%cr3, \scratch_reg > + orl $PTI_SWITCH_MASK, \scratch_reg > + movl\scratch_reg, %cr3 > +.Lend_\@: > +.endm > + > +/* Unconditionally switch to kernel cr3 */ > +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req > + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI > + movl%cr3, \scratch_reg > + andl$(~PTI_SWITCH_MASK), \scratch_reg > + movl\scratch_reg, %cr3 > +.Lend_\@: > +.endm > + > > /* > * Called with pt_regs fully populated and kernel segments loaded, > @@ -343,6 +362,8 @@ > > ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV > > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + > /* Are we on the entry stack? Bail out if not! */ > movlPER_CPU_VAR(cpu_entry_area), %edi > addl$CPU_ENTRY_AREA_entry_stack, %edi > @@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target) > * 0(%ebp) arg6 > */ > ENTRY(entry_SYSENTER_32) > + /* > + * On entry-stack with all userspace-regs live - save and > + * restore eflags and %eax to use it as scratch-reg for the cr3 > + * switch. > + */ > + pushfl > + pushl %eax > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + popl%eax > + popfl > + > + /* Stack empty again, switch to task stack */ > movlTSS_entry_stack(%esp), %esp > > .Lsysenter_past_esp: > @@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32) > movlPT_OLDESP(%esp), %ecx /* pt_regs->sp */ > 1: mov PT_FS(%esp), %fs > PTGS_TO_GS > + > + /* Segments are restored - switch to user cr3 */ > + SWITCH_TO_USER_CR3 scratch_reg=%eax > + > popl%ebx/* pt_regs->bx */ > addl$2*4, %esp /* skip pt_regs->cx and pt_regs->dx */ > popl%esi/* pt_regs->si */ > @@ -778,7 +815,23 @@ restore_all: > .Lrestore_all_notrace: > CHECK_AND_APPLY_ESPFIX > .Lrestore_nocheck: > - RESTORE_REGS 4 # skip orig_eax/error_code > + /* > + * First restore user segments. This can cause exceptions, so we > + * run it with kernel cr3. > + */ > + RESTORE_SEGMENTS > + > + /* > + * Segments are restored - no more exceptions from here on except on > + * iret, but that handled safely. > + */ > + SWITCH_TO_USER_CR3 scratch_reg=%eax > + > + /* Restore rest */ > + RESTORE_INT_REGS > + > + /* Unwind stack to the iret frame */ > + RESTORE_SKIP_SEGMENTS 4 # skip orig_eax/error_code > .Lirq_return: > INTERRUPT_RETURN > > @@ -1139,6 +1192,10 @@ ENTRY(debug) > > SAVE_ALL > ENCODE_FRAME_POINTER > + > + /* Make sure we are running on kernel cr3 */ > + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax > + > xorl%edx, %edx # error code 0 > movl%esp, %eax # pt_regs pointer > The debug exception calls ret_from_exception on exit. If coming from userspace, the C function prepare_exit_to_usermode() will be called. With the PTI-32 code, it means that function will be called with the entry stack instead of the task stack. This can be problematic as macro like current won't work anymore. -Longman