Re: [PATCH 3/3] x86/efi: Use efi_switch_mm() rather than manually twiddling with cr3

2017-08-23 Thread Sai Praneeth Prakhya
On Mon, 2017-08-21 at 08:23 -0700, Andy Lutomirski wrote:
> 
> > On Aug 21, 2017, at 7:08 AM, Peter Zijlstra  wrote:
> > 
> >> On Mon, Aug 21, 2017 at 06:56:01AM -0700, Andy Lutomirski wrote:
> >> 
> >> 
> >>> On Aug 21, 2017, at 3:33 AM, Peter Zijlstra  wrote:
> > 
>  
>  Using a kernel thread solves the problem for real.  Anything that
>  blindly accesses user memory in kernel thread context is terminally
>  broken no matter what.
> >>> 
> >>> So perf-callchain doesn't do it 'blindly', it wants either:
> >>> 
> >>> - user_mode(regs) true, or
> >>> - task_pt_regs() set.
> >>> 
> >>> However I'm thinking that if the kernel thread has ->mm == &efi_mm, the
> >>> EFI code running could very well have user_mode(regs) being true.
> >>> 
> >>> intel_pmu_pebs_fixup() OTOH 'blindly' assumes that the LBR addresses are
> >>> accessible. It bails on error though. So while its careful, it does
> >>> attempt to access the 'user' mapping directly. Which should also trigger
> >>> with the EFI code.
> >>> 
> >>> And I'm not seeing anything particularly broken with either. The PEBS
> >>> fixup relies on the CPU having just executed the code, and if it could
> >>> fetch and execute the code, why shouldn't it be able to fetch and read?
> >> 
> >> There are two ways this could be a problem.  One is that unprivileged
> >> user apps shouldn't be able to read from EFI memory.
> > 
> > Ah, but only root can create per-cpu events or attach events to kernel
> > threads (with sensible paranoia levels).
> 
> But this may not need to be percpu.  If a non root user can trigger, say, an 
> EFI variable read in their own thread context, boom.
> 
+ Tony

Hi Andi,

I am trying to reproduce the issue that we are discussing, so I
tried an experiment like this:
A user process continuously reads an EFI variable via
"cat /sys/firmware/efi/efivars/Boot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
for a specified time (e.g. 100 seconds), and simultaneously I run "perf top" as
root (which I suppose should trigger NMIs). I see that everything is fine: no
lockups, no kernel crash, no warnings/errors in dmesg.

I see that perf top reports up to 50% of time spent in an EFI function
(probably efi_get_variable()):

Overhead   Shared Object   Symbol
50%        [unknown]       [k] 0xfffeea967416

50% is the max; on average it's 35%.

I have tested this on two kernels, v4.12 and v3.19. My machine has 8
cores, and to stress test further, I offlined all CPUs except cpu0.

Could you please let me know a way to reproduce the issue that we are
discussing here?
I think the issue we are concerned about here is: when the kernel is in EFI
context and an NMI happens, and the NMI handler tries to access user
space, boom! we don't have user space mapped in EFI context. Am I right in
understanding the issue, or is it something else?

Regards,
Sai

--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC Part1 PATCH v3 14/17] x86/boot: Add early boot support when running with SEV active

2017-08-23 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:54PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> Early in the boot process, add checks to determine if the kernel is
> running with Secure Encrypted Virtualization (SEV) active.
> 
> Checking for SEV requires checking that the kernel is running under a
> hypervisor (CPUID 0x00000001, bit 31), that the SEV feature is available
> (CPUID 0x8000001f, bit 1) and then checking a non-interceptable SEV MSR
> (0xc0010131, bit 0).
> 
> This check is required so that during early compressed kernel booting the
> pagetables (both the boot pagetables and KASLR pagetables (if enabled) are
> updated to include the encryption mask so that when the kernel is
> decompressed into encrypted memory.

, it can boot properly.

:)

> After the kernel is decompressed and continues booting, the same logic is
> used to check if SEV is active and set a flag indicating so.  This allows
> us to distinguish between SME and SEV, each of which has unique
> differences in how certain things are handled: e.g. DMA (always bounce
> buffered with SEV) or EFI tables (always accessed decrypted with SME).
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/boot/compressed/Makefile  |   2 +
>  arch/x86/boot/compressed/head_64.S |  16 +
>  arch/x86/boot/compressed/mem_encrypt.S | 103 +
>  arch/x86/boot/compressed/misc.h|   2 +
>  arch/x86/boot/compressed/pagetable.c   |   8 ++-
>  arch/x86/include/asm/mem_encrypt.h |   3 +
>  arch/x86/include/asm/msr-index.h   |   3 +
>  arch/x86/include/uapi/asm/kvm_para.h   |   1 -
>  arch/x86/mm/mem_encrypt.c  |  42 +++---
>  9 files changed, 169 insertions(+), 11 deletions(-)
>  create mode 100644 arch/x86/boot/compressed/mem_encrypt.S
> 
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 2c860ad..d2fe901 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -72,6 +72,8 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/head_$(BITS).o $(obj)/misc.o \
>   $(obj)/string.o $(obj)/cmdline.o $(obj)/error.o \
>   $(obj)/piggy.o $(obj)/cpuflags.o
>  
> +vmlinux-objs-$(CONFIG_X86_64) += $(obj)/mem_encrypt.o

There's a

ifdef CONFIG_X86_64

a couple of lines below. Just put it there.
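For concreteness, the suggested placement might look like this. This is only a
sketch: the surrounding lines of the ifdef block are assumed, not copied from
the tree, and the variable name follows the convention in the quoted hunk:

```make
ifdef CONFIG_X86_64
	vmlinux-objs-y += $(obj)/mem_encrypt.o
endif
```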

...

> +++ b/arch/x86/boot/compressed/mem_encrypt.S
> @@ -0,0 +1,103 @@
> +/*
> + * AMD Memory Encryption Support
> + *
> + * Copyright (C) 2017 Advanced Micro Devices, Inc.
> + *
> + * Author: Tom Lendacky 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +	.text
> +	.code32
> +ENTRY(get_sev_encryption_bit)
> +	xor	%eax, %eax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	push	%ebx
> +	push	%ecx
> +	push	%edx
> +
> +	/* Check if running under a hypervisor */
> +	movl	$1, %eax
> +	cpuid
> +	bt	$31, %ecx		/* Check the hypervisor bit */
> +	jnc	.Lno_sev
> +
> +	movl	$0x80000000, %eax	/* CPUID to check the highest leaf */
> +	cpuid
> +	cmpl	$0x8000001f, %eax	/* See if 0x8000001f is available */
> +	jb	.Lno_sev
> +
> +	/*
> +	 * Check for the SEV feature:
> +	 *   CPUID Fn8000_001F[EAX] - Bit 1
> +	 *   CPUID Fn8000_001F[EBX] - Bits 5:0
> +	 *     Pagetable bit position used to indicate encryption
> +	 */
> +	movl	$0x8000001f, %eax
> +	cpuid
> +	bt	$1, %eax		/* Check if SEV is available */
> +	jnc	.Lno_sev
> +
> +	movl	$MSR_F17H_SEV, %ecx	/* Read the SEV MSR */
> +	rdmsr
> +	bt	$MSR_F17H_SEV_ENABLED_BIT, %eax	/* Check if SEV is active */
> +	jnc	.Lno_sev
> +
> + /*
> +  * Get memory encryption information:
> +  */

The side-comment is enough. This one above can go.

> +	movl	%ebx, %eax
> +	andl	$0x3f, %eax		/* Return the encryption bit location */
> +	jmp	.Lsev_exit
> +
> +.Lno_sev:
> +	xor	%eax, %eax
> +
> +.Lsev_exit:
> +	pop	%edx
> +	pop	%ecx
> +	pop	%ebx
> +
> +#endif   /* CONFIG_AMD_MEM_ENCRYPT */
> +
> + ret
> +ENDPROC(get_sev_encryption_bit)
> +
> +	.code64
> +ENTRY(get_sev_encryption_mask)
> +	xor	%rax, %rax
> +
> +#ifdef CONFIG_AMD_MEM_ENCRYPT
> +	push	%rbp
> +	push	%rdx
> +
> +	movq	%rsp, %rbp		/* Save current stack pointer */
> +
> +	call	get_sev_encryption_bit	/* Get the encryption bit position */

So we get to call get_sev_encryption_bit() here again and noodle through
the CPUID discovery and