Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-19 Thread Joerg Roedel
On Wed, Jan 17, 2018 at 10:12:32AM -0800, Andy Lutomirski wrote:
> I would optimize for simplicity, not speed.  You're already planning
> to write to CR3, which is serializing, blows away the TLB, *and* takes
> the absurdly large amount of time that the microcode needs to blow
> away the TLB.

Okay, so I am going to do the stack switch before pt_regs is restored.
This is at least better than playing games with hiding the entry/exit
%esp somewhere in stack memory.
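
For illustration, a minimal sketch of such a pre-restore switch (an
illustration only, not the actual follow-up patch; it assumes a
PTREGS_SIZE constant for sizeof(struct pt_regs) is available via
asm-offsets, and it borrows the cpu_tss_rw/TSS_sp0 names from the patch
quoted later in the thread):

        /*
         * Sketch: copy the whole pt_regs frame (including the hardware
         * iret frame) from the task stack to the per-CPU entry stack
         * while the kernel %fs is still loaded, then switch %esp.  The
         * registers clobbered here are reloaded from the copy afterwards.
         */
        movl    $PTREGS_SIZE, %ecx                      /* bytes to copy        */
        movl    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi /* top of entry stack   */
        subl    %ecx, %edi                              /* room for the frame   */
        movl    %esp, %esi                              /* source: pt_regs      */
        movl    %edi, %ebx                              /* remember the new %esp */
        shrl    $2, %ecx                                /* dwords for movsl     */
        cld
        rep movsl                                       /* copy the frame over  */
        movl    %ebx, %esp                              /* now on the entry stack */
        /* ...restore registers and iret from here as usual... */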


Thanks,

Joerg


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Andy Lutomirski
On Wed, Jan 17, 2018 at 6:10 AM, Joerg Roedel  wrote:
> On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote:
>> On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel  wrote:
>
>> > I have no real idea on how to switch back to the entry stack without
>> > access to per_cpu variables. I also can't access the cpu_entry_area for
>> > the cpu yet, because for that we need to be on the entry stack already.
>>
>> Switch to the trampoline stack before loading user segments.
>
> That requires copying most of pt_regs from the task stack to the
> trampoline stack; I'm not sure whether that is faster than temporarily
> restoring the kernel %fs.
>

I would optimize for simplicity, not speed.  You're already planning
to write to CR3, which is serializing, blows away the TLB, *and* takes
the absurdly large amount of time that the microcode needs to blow
away the TLB.

(For whatever reason, Intel doesn't seem to have hardware that can
quickly wipe the TLB.  I suspect that the actual implementation does
it in a loop and wipes little pieces at a time.  Whatever it actually
does, the CR3 write itself is very slow.)


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Josh Poimboeuf
On Wed, Jan 17, 2018 at 03:14:18PM +0100, Joerg Roedel wrote:
> On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote:
> > On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst  wrote:
> > But then again, you could take a fault on the trampoline stack if you
> > get a bad segment.  Perhaps just pushing the new stack pointer onto
> > the process stack before user segment loads will be the right move.
> 
> User segment loads pop their values from the stack, so having anything
> on top of them doesn't work either.
> 
> Maybe I can leave some space at the bottom of the task-stack at entry
> time and store the pointer there on exit, if that doesn't confuse the
> stack unwinder too much.

If you put it at the end of the stack page, I _think_ all you'd have to
do is just adjust TOP_OF_KERNEL_STACK_PADDING.
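
A minimal sketch of that direction (hypothetical slot layout; the
padding would have to grow by one word, and that word would be filled
with the entry-stack %esp on kernel entry):

        /*
         * Sketch: recover the entry-stack pointer stashed in the
         * padding at the top of the task stack, without any per-cpu
         * access.  Kernel stacks are THREAD_SIZE-aligned, so the top
         * can be computed from the current %esp.
         */
        movl    %esp, %edi
        andl    $(~(THREAD_SIZE - 1)), %edi     /* base of the task stack   */
        movl    THREAD_SIZE-4(%edi), %edi       /* hypothetical saved slot  */
        /* %edi now holds the entry-stack pointer to switch to */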

-- 
Josh


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Joerg Roedel
On Wed, Jan 17, 2018 at 06:00:07AM -0800, Brian Gerst wrote:
> On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst  wrote:
> But then again, you could take a fault on the trampoline stack if you
> get a bad segment.  Perhaps just pushing the new stack pointer onto
> the process stack before user segment loads will be the right move.

User segment loads pop their values from the stack, so having anything
on top of them doesn't work either.

Maybe I can leave some space at the bottom of the task-stack at entry
time and store the pointer there on exit, if that doesn't confuse the
stack unwinder too much.


Joerg


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Joerg Roedel
On Wed, Jan 17, 2018 at 05:57:53AM -0800, Brian Gerst wrote:
> On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel  wrote:

> > I have no real idea on how to switch back to the entry stack without
> > access to per_cpu variables. I also can't access the cpu_entry_area for
> > the cpu yet, because for that we need to be on the entry stack already.
> 
> Switch to the trampoline stack before loading user segments.

That requires copying most of pt_regs from the task stack to the
trampoline stack; I'm not sure whether that is faster than temporarily
restoring the kernel %fs.


Joerg


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Brian Gerst
On Wed, Jan 17, 2018 at 5:57 AM, Brian Gerst  wrote:
> On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel  wrote:
>> On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote:
>>> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel  wrote:
>>> > +   /* Restore user %edi and user %fs */
>>> > +   movl (%edi), %edi
>>> > +   popl %fs
>>>
>>> Yikes!  We're not *supposed* to be able to observe an asynchronous
>>> descriptor table change, but if the LDT changes out from under you,
>>> this is going to blow up badly.  It would be really nice if you could
>>> pull this off without percpu access or without needing to do this
>>> dance where you load user FS, then kernel FS, then user FS.  If that's
>>> not doable, then you should at least add exception handling -- look at
>>> the other 'pop %fs' instructions in entry_32.S.
>>
>> You are right! This also means I need to do the 'popl %fs' before the
>> cr3-switch. I'll fix it in the next version.
>>
>> I have no real idea on how to switch back to the entry stack without
>> access to per_cpu variables. I also can't access the cpu_entry_area for
>> the cpu yet, because for that we need to be on the entry stack already.
>
> Switch to the trampoline stack before loading user segments.

But then again, you could take a fault on the trampoline stack if you
get a bad segment.  Perhaps just pushing the new stack pointer onto
the process stack before user segment loads will be the right move.

--
Brian Gerst


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Brian Gerst
On Wed, Jan 17, 2018 at 1:24 AM, Joerg Roedel  wrote:
> On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote:
>> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel  wrote:
>> > +   /* Restore user %edi and user %fs */
>> > +   movl (%edi), %edi
>> > +   popl %fs
>>
>> Yikes!  We're not *supposed* to be able to observe an asynchronous
>> descriptor table change, but if the LDT changes out from under you,
>> this is going to blow up badly.  It would be really nice if you could
>> pull this off without percpu access or without needing to do this
>> dance where you load user FS, then kernel FS, then user FS.  If that's
>> not doable, then you should at least add exception handling -- look at
>> the other 'pop %fs' instructions in entry_32.S.
>
> You are right! This also means I need to do the 'popl %fs' before the
> cr3-switch. I'll fix it in the next version.
>
> I have no real idea on how to switch back to the entry stack without
> access to per_cpu variables. I also can't access the cpu_entry_area for
> the cpu yet, because for that we need to be on the entry stack already.

Switch to the trampoline stack before loading user segments.

--
Brian Gerst


Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-17 Thread Joerg Roedel
On Tue, Jan 16, 2018 at 02:48:43PM -0800, Andy Lutomirski wrote:
> On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel  wrote:
> > +   /* Restore user %edi and user %fs */
> > +   movl (%edi), %edi
> > +   popl %fs
> 
> Yikes!  We're not *supposed* to be able to observe an asynchronous
> descriptor table change, but if the LDT changes out from under you,
> this is going to blow up badly.  It would be really nice if you could
> pull this off without percpu access or without needing to do this
> dance where you load user FS, then kernel FS, then user FS.  If that's
> not doable, then you should at least add exception handling -- look at
> the other 'pop %fs' instructions in entry_32.S.

You are right! This also means I need to do the 'popl %fs' before the
cr3-switch. I'll fix it in the next version.

I have no real idea on how to switch back to the entry stack without
access to per_cpu variables. I also can't access the cpu_entry_area for
the cpu yet, because for that we need to be on the entry stack already.


Joerg



Re: [PATCH 03/16] x86/entry/32: Leave the kernel via the trampoline stack

2018-01-16 Thread Andy Lutomirski
On Tue, Jan 16, 2018 at 8:36 AM, Joerg Roedel  wrote:
> From: Joerg Roedel 
>
> Switch back to the trampoline stack before returning to
> userspace.
>
> Signed-off-by: Joerg Roedel 
> ---
>  arch/x86/entry/entry_32.S        | 58 ++++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/asm-offsets_32.c |  1 +
>  2 files changed, 59 insertions(+)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 5a7bdb73be9f..14018eeb11c3 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -263,6 +263,61 @@
>  .endm
>
>  /*
> + * Switch back from the kernel stack to the entry stack.
> + *
> + * iret_frame > 0 adds code to copy over an iret frame from the old to
> + *                the new stack. It also adds a check which bails out if
> + *                we are not returning to user-space.
> + *
> + * This macro must not modify eflags when iret_frame == 0.
> + */
> +.macro SWITCH_TO_ENTRY_STACK iret_frame=0
> +   .if \iret_frame > 0
> +   /* Are we returning to userspace? */
> +   testb   $3, 4(%esp) /* return CS */
> +   jz .Lend_\@
> +   .endif
> +
> +   /*
> +* We run with user-%fs already loaded from pt_regs, so we don't
> +* have access to per_cpu data anymore, and there is no swapgs
> +* equivalent on x86_32.
> +* We work around this by loading the kernel-%fs again and
> +* reading the entry stack address from there. Then we restore
> +* the user-%fs and return.
> +*/
> +   pushl %fs
> +   pushl %edi
> +
> +   /* Re-load kernel-%fs, after that we can use PER_CPU_VAR */
> +   movl $(__KERNEL_PERCPU), %edi
> +   movl %edi, %fs
> +
> +   /* Save old stack pointer to copy the return frame over if needed */
> +   movl %esp, %edi
> +   movl PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %esp
> +
> +   /* Now we are on the entry stack */
> +
> +   .if \iret_frame > 0
> +   /* Stack frame: ss, esp, eflags, cs, eip, fs, edi */
> +   pushl 6*4(%edi) /* ss */
> +   pushl 5*4(%edi) /* esp */
> +   pushl 4*4(%edi) /* eflags */
> +   pushl 3*4(%edi) /* cs */
> +   pushl 2*4(%edi) /* eip */
> +   .endif
> +
> +   pushl 4(%edi)   /* fs */
> +
> +   /* Restore user %edi and user %fs */
> +   movl (%edi), %edi
> +   popl %fs

Yikes!  We're not *supposed* to be able to observe an asynchronous
descriptor table change, but if the LDT changes out from under you,
this is going to blow up badly.  It would be really nice if you could
pull this off without percpu access or without needing to do this
dance where you load user FS, then kernel FS, then user FS.  If that's
not doable, then you should at least add exception handling -- look at
the other 'pop %fs' instructions in entry_32.S.
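
For reference, the fixup pattern those other pops use, applied to the
pop above, looks roughly like this (a sketch; the numeric labels are
local to the snippet):

1:      popl    %fs
.pushsection .fixup, "ax"
2:      movl    $0, (%esp)      /* the pop faulted (e.g. stale LDT entry):  */
        jmp     1b              /* replace the selector with 0 and retry    */
.popsection
        _ASM_EXTABLE(1b, 2b)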

