Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-16 Thread Andy Lutomirski
On Fri, Mar 2, 2018 at 9:10 AM, Joerg Roedel  wrote:
> On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
>> Note that debug traps can happen regardless of TF, Think kgdb etc.
>> Arguably kgdb users get what they deserve, but still.. I think root
>> can set kernel breakpoints too.
>
> But that seems to be broken right now at least wrt. to the espfix code
> where there is no handling for in the #DB handler. Can userspace really
> set arbitrary kernel breakpoints?
>

As far as I'm concerned, I don't try to support kernel debugger users
setting arbitrary breakpoints in the kernel entry text.


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-16 Thread Andy Lutomirski
On Fri, Mar 2, 2018 at 9:10 AM, Joerg Roedel  wrote:
> On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
>> Note that debug traps can happen regardless of TF, Think kgdb etc.
>> Arguably kgdb users get what they deserve, but still.. I think root
>> can set kernel breakpoints too.
>
> But that seems to be broken right now at least wrt. to the espfix code
> where there is no handling for in the #DB handler. Can userspace really
> set arbitrary kernel breakpoints?
>

As far as I'm concerned, I don't try to support kernel debugger users
setting arbitrary breakpoints in the kernel entry text.


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-02 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
> Note that debug traps can happen regardless of TF, Think kgdb etc.
> Arguably kgdb users get what they deserve, but still.. I think root
> can set kernel breakpoints too.

But that seems to be broken right now at least wrt. to the espfix code
where there is no handling for in the #DB handler. Can userspace really
set arbitrary kernel breakpoints?


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-02 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
> Note that debug traps can happen regardless of TF, Think kgdb etc.
> Arguably kgdb users get what they deserve, but still.. I think root
> can set kernel breakpoints too.

But that seems to be broken right now at least wrt. to the espfix code
where there is no handling for in the #DB handler. Can userspace really
set arbitrary kernel breakpoints?


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-02 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 01:24:39PM -0500, Brian Gerst wrote:
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Yeah, okay, thanks for the clarification. This also means the #DB
handler needs to leave with the same cr3 as it entered. I'll work that
into my patches.

Thanks,

Joerg


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-02 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 01:24:39PM -0500, Brian Gerst wrote:
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Yeah, okay, thanks for the clarification. This also means the #DB
handler needs to leave with the same cr3 as it entered. I'll work that
into my patches.

Thanks,

Joerg


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Linus Torvalds
On Thu, Mar 1, 2018 at 10:24 AM, Brian Gerst  wrote:
>
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Note that debug traps can happen regardless of TF, Think kgdb etc.
Arguably kgdb users get what they deserve, but still.. I think root
can set kernel breakpoints too.

  Linus


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Linus Torvalds
On Thu, Mar 1, 2018 at 10:24 AM, Brian Gerst  wrote:
>
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Note that debug traps can happen regardless of TF, Think kgdb etc.
Arguably kgdb users get what they deserve, but still.. I think root
can set kernel breakpoints too.

  Linus


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Dave Hansen
On 03/01/2018 10:24 AM, Brian Gerst wrote:
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

It can happen with IF cleared.  I ran into it during PTI development
more than once.  That's why the debug fault code uses paranoid_entry on
64-bit just like the NMI code.


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Dave Hansen
On 03/01/2018 10:24 AM, Brian Gerst wrote:
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

It can happen with IF cleared.  I ran into it during PTI development
more than once.  That's why the debug fault code uses paranoid_entry on
64-bit just like the NMI code.


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Brian Gerst
On Thu, Mar 1, 2018 at 11:50 AM, Joerg Roedel  wrote:
> On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
>> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
>> I think that should fix the issue of debug exception from userspace.
>>
>> One thing that I am not certain about is whether debug exception can
>> happen even if the IF flag is cleared. If it can, debug exception should
>> be handled like NMI as the state of the CR3 can be indeterminate if the
>> exception happens in the entry/exit code.
>
> I am actually not 100% sure where it can happen, from the code it can
> happen from anywhere, except when we are running on an espfix stack.
>
> So I am not sure we need the same complex handling NMIs need wrt. to
> switching the cr3s.

The IF flag only affects external maskable interrupts, not traps or
faults.  You do need to check CR3 because SYSENTER does not clear TF
and will immediately cause a debug trap on kernel entry (with user
CR3) if set.  That is why the code existed before to check for the
entry stack for debug/NMI.

--
Brian Gerst


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Brian Gerst
On Thu, Mar 1, 2018 at 11:50 AM, Joerg Roedel  wrote:
> On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
>> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
>> I think that should fix the issue of debug exception from userspace.
>>
>> One thing that I am not certain about is whether debug exception can
>> happen even if the IF flag is cleared. If it can, debug exception should
>> be handled like NMI as the state of the CR3 can be indeterminate if the
>> exception happens in the entry/exit code.
>
> I am actually not 100% sure where it can happen, from the code it can
> happen from anywhere, except when we are running on an espfix stack.
>
> So I am not sure we need the same complex handling NMIs need wrt. to
> switching the cr3s.

The IF flag only affects external maskable interrupts, not traps or
faults.  You do need to check CR3 because SYSENTER does not clear TF
and will immediately cause a debug trap on kernel entry (with user
CR3) if set.  That is why the code existed before to check for the
entry stack for debug/NMI.

--
Brian Gerst


Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> I think that should fix the issue of debug exception from userspace.
> 
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

I am actually not 100% sure where it can happen, from the code it can
happen from anywhere, except when we are running on an espfix stack.

So I am not sure we need the same complex handling NMIs need wrt. to
switching the cr3s.


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> I think that should fix the issue of debug exception from userspace.
> 
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

I am actually not 100% sure where it can happen, from the code it can
happen from anywhere, except when we are running on an espfix stack.

So I am not sure we need the same complex handling NMIs need wrt. to
switching the cr3s.


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Waiman Long
On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
>>> +   /* Make sure we are running on kernel cr3 */
>>> +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
>>> +
>>> xorl%edx, %edx  # error code 0
>>> movl%esp, %eax  # pt_regs pointer
>>>  
>> The debug exception calls ret_from_exception on exit. If coming from
>> userspace, the C function prepare_exit_to_usermode() will be called.
>> With the PTI-32 code, it means that function will be called with the
>> entry stack instead of the task stack. This can be problematic as macro
>> like current won't work anymore.
> Okay, I had another look at the debug handler. As I said before, it
> already handles the from-entry-stack case, but with these patches it
> gets more likely that we actually hit that path.
>
> Also, with the special handling for from-kernel-with-entry-stack
> situations we can simplify the debug handler and make it more robust
> with the diff below. Thoughts?
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 8c149f5..844aff1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1318,33 +1318,14 @@ ENTRY(debug)
>   ASM_CLAC
>   pushl   $-1 # mark this as an int
>  
> - SAVE_ALL
> + SAVE_ALL switch_stacks=1
>   ENCODE_FRAME_POINTER
>  
> - /* Make sure we are running on kernel cr3 */
> - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> -
>   xorl%edx, %edx  # error code 0
>   movl%esp, %eax  # pt_regs pointer
>  
> - /* Are we currently on the SYSENTER stack? */
> - movlPER_CPU_VAR(cpu_entry_area), %ecx
> - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
> - subl%eax, %ecx  /* ecx = (end of entry_stack) - esp */
> - cmpl$SIZEOF_entry_stack, %ecx
> - jb  .Ldebug_from_sysenter_stack
> -
> - TRACE_IRQS_OFF
> - calldo_debug
> - jmp ret_from_exception
> -
> -.Ldebug_from_sysenter_stack:
> - /* We're on the SYSENTER stack.  Switch off. */
> - movl%esp, %ebx
> - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp
>   TRACE_IRQS_OFF
>   calldo_debug
> - movl%ebx, %esp
>   jmp ret_from_exception
>  END(debug)
>  
>
>
I think that should fix the issue of debug exception from userspace.

One thing that I am not certain about is whether debug exception can
happen even if the IF flag is cleared. If it can, debug exception should
be handled like NMI as the state of the CR3 can be indeterminate if the
exception happens in the entry/exit code.

Cheers,
Longman




Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Waiman Long
On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
>>> +   /* Make sure we are running on kernel cr3 */
>>> +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
>>> +
>>> xorl%edx, %edx  # error code 0
>>> movl%esp, %eax  # pt_regs pointer
>>>  
>> The debug exception calls ret_from_exception on exit. If coming from
>> userspace, the C function prepare_exit_to_usermode() will be called.
>> With the PTI-32 code, it means that function will be called with the
>> entry stack instead of the task stack. This can be problematic as macro
>> like current won't work anymore.
> Okay, I had another look at the debug handler. As I said before, it
> already handles the from-entry-stack case, but with these patches it
> gets more likely that we actually hit that path.
>
> Also, with the special handling for from-kernel-with-entry-stack
> situations we can simplify the debug handler and make it more robust
> with the diff below. Thoughts?
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 8c149f5..844aff1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1318,33 +1318,14 @@ ENTRY(debug)
>   ASM_CLAC
>   pushl   $-1 # mark this as an int
>  
> - SAVE_ALL
> + SAVE_ALL switch_stacks=1
>   ENCODE_FRAME_POINTER
>  
> - /* Make sure we are running on kernel cr3 */
> - SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> -
>   xorl%edx, %edx  # error code 0
>   movl%esp, %eax  # pt_regs pointer
>  
> - /* Are we currently on the SYSENTER stack? */
> - movlPER_CPU_VAR(cpu_entry_area), %ecx
> - addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
> - subl%eax, %ecx  /* ecx = (end of entry_stack) - esp */
> - cmpl$SIZEOF_entry_stack, %ecx
> - jb  .Ldebug_from_sysenter_stack
> -
> - TRACE_IRQS_OFF
> - calldo_debug
> - jmp ret_from_exception
> -
> -.Ldebug_from_sysenter_stack:
> - /* We're on the SYSENTER stack.  Switch off. */
> - movl%esp, %ebx
> - movlPER_CPU_VAR(cpu_current_top_of_stack), %esp
>   TRACE_IRQS_OFF
>   calldo_debug
> - movl%ebx, %esp
>   jmp ret_from_exception
>  END(debug)
>  
>
>
I think that should fix the issue of debug exception from userspace.

One thing that I am not certain about is whether debug exception can
happen even if the IF flag is cleared. If it can, debug exception should
be handled like NMI as the state of the CR3 can be indeterminate if the
exception happens in the entry/exit code.

Cheers,
Longman




Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> > +   /* Make sure we are running on kernel cr3 */
> > +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> > xorl%edx, %edx  # error code 0
> > movl%esp, %eax  # pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

Okay, I had another look at the debug handler. As I said before, it
already handles the from-entry-stack case, but with these patches it
gets more likely that we actually hit that path.

Also, with the special handling for from-kernel-with-entry-stack
situations we can simplify the debug handler and make it more robust
with the diff below. Thoughts?

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 8c149f5..844aff1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1318,33 +1318,14 @@ ENTRY(debug)
ASM_CLAC
pushl   $-1 # mark this as an int
 
-   SAVE_ALL
+   SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
 
-   /* Make sure we are running on kernel cr3 */
-   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
-
xorl%edx, %edx  # error code 0
movl%esp, %eax  # pt_regs pointer
 
-   /* Are we currently on the SYSENTER stack? */
-   movlPER_CPU_VAR(cpu_entry_area), %ecx
-   addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
-   subl%eax, %ecx  /* ecx = (end of entry_stack) - esp */
-   cmpl$SIZEOF_entry_stack, %ecx
-   jb  .Ldebug_from_sysenter_stack
-
-   TRACE_IRQS_OFF
-   calldo_debug
-   jmp ret_from_exception
-
-.Ldebug_from_sysenter_stack:
-   /* We're on the SYSENTER stack.  Switch off. */
-   movl%esp, %ebx
-   movlPER_CPU_VAR(cpu_current_top_of_stack), %esp
TRACE_IRQS_OFF
calldo_debug
-   movl%ebx, %esp
jmp ret_from_exception
 END(debug)
 




Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> > +   /* Make sure we are running on kernel cr3 */
> > +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> > xorl%edx, %edx  # error code 0
> > movl%esp, %eax  # pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

Okay, I had another look at the debug handler. As I said before, it
already handles the from-entry-stack case, but with these patches it
gets more likely that we actually hit that path.

Also, with the special handling for from-kernel-with-entry-stack
situations we can simplify the debug handler and make it more robust
with the diff below. Thoughts?

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 8c149f5..844aff1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1318,33 +1318,14 @@ ENTRY(debug)
ASM_CLAC
pushl   $-1 # mark this as an int
 
-   SAVE_ALL
+   SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
 
-   /* Make sure we are running on kernel cr3 */
-   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
-
xorl%edx, %edx  # error code 0
movl%esp, %eax  # pt_regs pointer
 
-   /* Are we currently on the SYSENTER stack? */
-   movlPER_CPU_VAR(cpu_entry_area), %ecx
-   addl$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
-   subl%eax, %ecx  /* ecx = (end of entry_stack) - esp */
-   cmpl$SIZEOF_entry_stack, %ecx
-   jb  .Ldebug_from_sysenter_stack
-
-   TRACE_IRQS_OFF
-   calldo_debug
-   jmp ret_from_exception
-
-.Ldebug_from_sysenter_stack:
-   /* We're on the SYSENTER stack.  Switch off. */
-   movl%esp, %ebx
-   movlPER_CPU_VAR(cpu_current_top_of_stack), %esp
TRACE_IRQS_OFF
calldo_debug
-   movl%ebx, %esp
jmp ret_from_exception
 END(debug)
 




Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> > SAVE_ALL
> > ENCODE_FRAME_POINTER
> > +
> > +   /* Make sure we are running on kernel cr3 */
> > +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> > xorl%edx, %edx  # error code 0
> > movl%esp, %eax  # pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

This is not different from before, no? The debug handler already can be
entered on entry stack before this patch-set.


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-03-01 Thread Joerg Roedel
On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> > SAVE_ALL
> > ENCODE_FRAME_POINTER
> > +
> > +   /* Make sure we are running on kernel cr3 */
> > +   SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> > xorl%edx, %edx  # error code 0
> > movl%esp, %eax  # pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

This is not different from before, no? The debug handler already can be
entered on entry stack before this patch-set.


Joerg



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-02-27 Thread Waiman Long
On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> From: Joerg Roedel 
>
> Add unconditional cr3 switches between user and kernel cr3
> to all non-NMI entry and exit points.
>
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/entry/entry_32.S | 59 
> ++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 9693485..b5ef003 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -328,6 +328,25 @@
>  #endif /* CONFIG_X86_ESPFIX32 */
>  .endm
>  
> +/* Unconditionally switch to user cr3 */
> +.macro SWITCH_TO_USER_CR3 scratch_reg:req
> + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +
> + movl%cr3, \scratch_reg
> + orl $PTI_SWITCH_MASK, \scratch_reg
> + movl\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
> +/* Unconditionally switch to kernel cr3 */
> +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
> + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> + movl%cr3, \scratch_reg
> + andl$(~PTI_SWITCH_MASK), \scratch_reg
> + movl\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
>  
>  /*
>   * Called with pt_regs fully populated and kernel segments loaded,
> @@ -343,6 +362,8 @@
>  
>   ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV
>  
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>   /* Are we on the entry stack? Bail out if not! */
>   movlPER_CPU_VAR(cpu_entry_area), %edi
>   addl$CPU_ENTRY_AREA_entry_stack, %edi
> @@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target)
>   * 0(%ebp) arg6
>   */
>  ENTRY(entry_SYSENTER_32)
> + /*
> +  * On entry-stack with all userspace-regs live - save and
> +  * restore eflags and %eax to use it as scratch-reg for the cr3
> +  * switch.
> +  */
> + pushfl
> + pushl   %eax
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> + popl%eax
> + popfl
> +
> + /* Stack empty again, switch to task stack */
>   movlTSS_entry_stack(%esp), %esp
>  
>  .Lsysenter_past_esp:
> @@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32)
>   movlPT_OLDESP(%esp), %ecx   /* pt_regs->sp */
>  1:   mov PT_FS(%esp), %fs
>   PTGS_TO_GS
> +
> + /* Segments are restored - switch to user cr3 */
> + SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
>   popl%ebx/* pt_regs->bx */
>   addl$2*4, %esp  /* skip pt_regs->cx and pt_regs->dx */
>   popl%esi/* pt_regs->si */
> @@ -778,7 +815,23 @@ restore_all:
>  .Lrestore_all_notrace:
>   CHECK_AND_APPLY_ESPFIX
>  .Lrestore_nocheck:
> - RESTORE_REGS 4  # skip orig_eax/error_code
> + /*
> +  * First restore user segments. This can cause exceptions, so we
> +  * run it with kernel cr3.
> +  */
> + RESTORE_SEGMENTS
> +
> + /*
> +  * Segments are restored - no more exceptions from here on except on
> +  * iret, but that handled safely.
> +  */
> + SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
> + /* Restore rest */
> + RESTORE_INT_REGS
> +
> + /* Unwind stack to the iret frame */
> + RESTORE_SKIP_SEGMENTS 4 # skip orig_eax/error_code
>  .Lirq_return:
>   INTERRUPT_RETURN
>  
> @@ -1139,6 +1192,10 @@ ENTRY(debug)
>  
>   SAVE_ALL
>   ENCODE_FRAME_POINTER
> +
> + /* Make sure we are running on kernel cr3 */
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>   xorl%edx, %edx  # error code 0
>   movl%esp, %eax  # pt_regs pointer
>  

The debug exception calls ret_from_exception on exit. If coming from
userspace, the C function prepare_exit_to_usermode() will be called.
With the PTI-32 code, it means that function will be called with the
entry stack instead of the task stack. This can be problematic as macro
like current won't work anymore.

-Longman



Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

2018-02-27 Thread Waiman Long
On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> From: Joerg Roedel 
>
> Add unconditional cr3 switches between user and kernel cr3
> to all non-NMI entry and exit points.
>
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/entry/entry_32.S | 59 
> ++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 9693485..b5ef003 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -328,6 +328,25 @@
>  #endif /* CONFIG_X86_ESPFIX32 */
>  .endm
>  
> +/* Unconditionally switch to user cr3 */
> +.macro SWITCH_TO_USER_CR3 scratch_reg:req
> + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +
> + movl%cr3, \scratch_reg
> + orl $PTI_SWITCH_MASK, \scratch_reg
> + movl\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
> +/* Unconditionally switch to kernel cr3 */
> +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
> + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> + movl%cr3, \scratch_reg
> + andl$(~PTI_SWITCH_MASK), \scratch_reg
> + movl\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
>  
>  /*
>   * Called with pt_regs fully populated and kernel segments loaded,
> @@ -343,6 +362,8 @@
>  
>   ALTERNATIVE "", "jmp .Lend_\@", X86_FEATURE_XENPV
>  
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>   /* Are we on the entry stack? Bail out if not! */
>   movlPER_CPU_VAR(cpu_entry_area), %edi
>   addl$CPU_ENTRY_AREA_entry_stack, %edi
> @@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target)
>   * 0(%ebp) arg6
>   */
>  ENTRY(entry_SYSENTER_32)
> + /*
> +  * On entry-stack with all userspace-regs live - save and
> +  * restore eflags and %eax to use it as scratch-reg for the cr3
> +  * switch.
> +  */
> + pushfl
> + pushl   %eax
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> + popl%eax
> + popfl
> +
> + /* Stack empty again, switch to task stack */
>   movlTSS_entry_stack(%esp), %esp
>  
>  .Lsysenter_past_esp:
> @@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32)
>   movlPT_OLDESP(%esp), %ecx   /* pt_regs->sp */
>  1:   mov PT_FS(%esp), %fs
>   PTGS_TO_GS
> +
> + /* Segments are restored - switch to user cr3 */
> + SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
>   popl%ebx/* pt_regs->bx */
>   addl$2*4, %esp  /* skip pt_regs->cx and pt_regs->dx */
>   popl%esi/* pt_regs->si */
> @@ -778,7 +815,23 @@ restore_all:
>  .Lrestore_all_notrace:
>   CHECK_AND_APPLY_ESPFIX
>  .Lrestore_nocheck:
> - RESTORE_REGS 4  # skip orig_eax/error_code
> + /*
> +  * First restore user segments. This can cause exceptions, so we
> +  * run it with kernel cr3.
> +  */
> + RESTORE_SEGMENTS
> +
> + /*
> +  * Segments are restored - no more exceptions from here on except on
> +  * iret, but that handled safely.
> +  */
> + SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
> + /* Restore rest */
> + RESTORE_INT_REGS
> +
> + /* Unwind stack to the iret frame */
> + RESTORE_SKIP_SEGMENTS 4 # skip orig_eax/error_code
>  .Lirq_return:
>   INTERRUPT_RETURN
>  
> @@ -1139,6 +1192,10 @@ ENTRY(debug)
>  
>   SAVE_ALL
>   ENCODE_FRAME_POINTER
> +
> + /* Make sure we are running on kernel cr3 */
> + SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>   xorl%edx, %edx  # error code 0
>   movl%esp, %eax  # pt_regs pointer
>  

The debug exception calls ret_from_exception on exit. If coming from
userspace, the C function prepare_exit_to_usermode() will be called.
With the PTI-32 code, it means that function will be called with the
entry stack instead of the task stack. This can be problematic as macro
like current won't work anymore.

-Longman