Hi,

Jiri Slaby wrote:
> On 08/18/2015, 12:55 AM, Thomas D wrote:
>> From: Andy Lutomirski <[email protected]>
>>
>> commit 9b6e6a8334d56354853f9c255d1395c2ba570e0a upstream.
>>
>> Returning to userspace is tricky: IRET can fail, and ESPFIX can
>> rearrange the stack prior to IRET.
>>
>> The NMI nesting fixup relies on a precise stack layout and
>> atomic IRET.  Rather than trying to teach the NMI nesting fixup
>> to handle ESPFIX and failed IRET, punt: run NMIs that came from
>> user mode on the normal kernel stack.
>>
>> This will make some nested NMIs visible to C code, but the C
>> code is okay with that.
>>
>> As a side effect, this should speed up perf: it eliminates an
>> RDMSR when NMIs come from user mode.
>>
>> Signed-off-by: Andy Lutomirski <[email protected]>
>> Reviewed-by: Steven Rostedt <[email protected]>
>> Reviewed-by: Borislav Petkov <[email protected]>
>> Cc: Linus Torvalds <[email protected]>
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: Ingo Molnar <[email protected]>
>> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>> ---
>>  arch/x86/kernel/entry_64.S | 77 
>> +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
>> index 28b08345..bd7d8aa 100644
>> --- a/arch/x86/kernel/entry_64.S
>> +++ b/arch/x86/kernel/entry_64.S
>> @@ -1715,19 +1715,88 @@ ENTRY(nmi)
>>       * a nested NMI that updated the copy interrupt stack frame, a
>>       * jump will be made to the repeat_nmi code that will handle the second
>>       * NMI.
>> +     *
>> +     * However, espfix prevents us from directly returning to userspace
>> +     * with a single IRET instruction.  Similarly, IRET to user mode
>> +     * can fault.  We therefore handle NMIs from user space like
>> +     * other IST entries.
>>       */
>>  
>>      /* Use %rdx as out temp variable throughout */
>>      pushq_cfi %rdx
>>      CFI_REL_OFFSET rdx, 0
>>  
>> +    testb   $3, CS-RIP+8(%rsp)
>> +    jz      .Lnmi_from_kernel
>> +
>> +    /*
>> +     * NMI from user mode.  We need to run on the thread stack, but we
>> +     * can't go through the normal entry paths: NMIs are masked, and
>> +     * we don't want to enable interrupts, because then we'll end
>> +     * up in an awkward situation in which IRQs are on but NMIs
>> +     * are off.
>> +     */
>> +
>> +    SWAPGS
>> +    cld
>> +    movq    %rsp, %rdx
>> +    movq    PER_CPU_VAR(kernel_stack), %rsp
> 
> I think you are wasting stack space here. With kernel_stack, you should
> add 5*8 (KERNEL_STACK_OFFSET) to the pointer here. I.e. space for 5
> registers is pre-reserved at kernel_stack already. (Or use movq instead
> of the 5 pushq below.)
> 
> Why don't you re-use the 3.16's version anyway?
> 
>> +    pushq   5*8(%rdx)       /* pt_regs->ss */
>> +    pushq   4*8(%rdx)       /* pt_regs->rsp */
>> +    pushq   3*8(%rdx)       /* pt_regs->flags */
>> +    pushq   2*8(%rdx)       /* pt_regs->cs */
>> +    pushq   1*8(%rdx)       /* pt_regs->rip */
>> +    pushq   $-1             /* pt_regs->orig_ax */
>> +    pushq   %rdi            /* pt_regs->di */
>> +    pushq   %rsi            /* pt_regs->si */
>> +    pushq   (%rdx)          /* pt_regs->dx */
>> +    pushq   %rcx            /* pt_regs->cx */
>> +    pushq   %rax            /* pt_regs->ax */
>> +    pushq   %r8             /* pt_regs->r8 */
>> +    pushq   %r9             /* pt_regs->r9 */
>> +    pushq   %r10            /* pt_regs->r10 */
>> +    pushq   %r11            /* pt_regs->r11 */
>> +    pushq   %rbx            /* pt_regs->rbx */
>> +    pushq   %rbp            /* pt_regs->rbp */
>> +    pushq   %r12            /* pt_regs->r12 */
>> +    pushq   %r13            /* pt_regs->r13 */
>> +    pushq   %r14            /* pt_regs->r14 */
>> +    pushq   %r15            /* pt_regs->r15 */

Mh, so you mean

> +     addq    $KERNEL_STACK_OFFSET, %rsp

between

> +     movq    PER_CPU_VAR(kernel_stack), %rsp

and

> +     pushq   5*8(%rdx)       /* pt_regs->ss */

is missing?

That seems to be the only difference between this patch and Debian's
3.16.7-ckt11-1+deb8u2 [1] version.

 [1]
https://anonscm.debian.org/cgit/kernel/linux.git/tree/debian/patches/bugfix/x86/0006-x86-nmi-64-Switch-stacks-on-userspace-NMI-entry.patch?h=jessie#n69



-Thomas

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to