Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Sami Tolvanen
On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  wrote:
>
> On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> >
> > On Thu, 04 Mar 2021 21:25:41 +,
> > Sami Tolvanen  wrote:
> > >
> > > On Thu, Mar 4, 2021 at 11:15 AM Marc Zyngier  wrote:
> > > >
> > > > On Thu, 04 Mar 2021 18:45:44 +,
> > > > Sami Tolvanen  wrote:
> > > > >
> > > > > allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
> > > > > linker errors:
> > > > >
> > > > >   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
> > > >
> > > > I assume this message is only an oddity, right? Because
> > > > __guest_enter() is as far as you can imagine from irqbypass.c...
> > >
> > > I'm not sure what's up with the filename in the error message. Fangrui
> > > or Nick probably have a better idea.
> > >
> > > > >   relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
> > > > >   [-1048576, 1048575]; references hyp_panic
> > > > >   >>> defined in vmlinux.o
> > > > >
> > > > >   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0):
> > > > >   relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in
> > > > >   [-1048576, 1048575]; references hyp_panic
> > > > >   >>> defined in vmlinux.o
> > > > >
> > > > > As LTO is not really necessary for the hypervisor code, disable it for
> > > > > the hyp directory to fix the build.
> > > >
> > > > Can you shed some light on what the problem is exactly?
> > >
> > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > when the kernel is large enough. Possibly something to do with LLVM
> > > always splitting functions into separate sections with LTO. I'm not
> > > sure why the linker cannot shuffle things around to make everyone
> > > happy in this case, but I confirmed that this patch also fixes the
> > > build issue for me:
> > >
> > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > > b/arch/arm64/kvm/hyp/vhe/switch.c
> > > index af8e940d0f03..128197b7c794 100644
> > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 
> > > par)
> > >  }
> > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > >
> > > -void __noreturn hyp_panic(void)
> > > +void __noreturn hyp_panic(void) __section(".text")
> > >  {
> > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > u64 elr = read_sysreg_el2(SYS_ELR);
> > >
> >
> > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > be in the .text section already?
>
> It's not quite black magic. LLVM essentially flips on
> -ffunction-sections with LTO and therefore, hyp_panic() will be in
> .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> Everything ends up in .text when we link vmlinux, of course.
>
> $ readelf --sections vmlinux.o | grep hyp_panic
>   [3936] .text.hyp_panic   PROGBITS   004b56e4

Note that disabling LTO here has essentially the same effect as using
__section(".text"). It stops the compiler from splitting these
functions into .text.* sections and makes it less likely that
hyp_panic() ends up too far away from __guest_enter().
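
As a rough standalone illustration (not kernel code; the file and function
names below are made up), the same effect can be reproduced with any compiler:
-ffunction-sections emits each function into its own .text.<name> section,
while an explicit section attribute pins a function back into plain .text,
which is what the __section(".text") workaround above relies on:

/* demo.c - illustrative only.
 *
 * Build and inspect with, e.g.:
 *   cc -c -ffunction-sections demo.c && readelf --sections demo.o
 *
 * split_me() lands in its own ".text.split_me" section, mirroring what
 * LTO does to hyp_panic(); pinned() stays in plain ".text", mirroring
 * the __section(".text") workaround.
 */
void split_me(void)
{
}

__attribute__((section(".text")))
void pinned(void)
{
}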

If neither of these workarounds sound appealing, I suppose we could
alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?

Sami


Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Sami Tolvanen
On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
>
> On Thu, 04 Mar 2021 21:25:41 +,
> Sami Tolvanen  wrote:
> >
> > On Thu, Mar 4, 2021 at 11:15 AM Marc Zyngier  wrote:
> > >
> > > On Thu, 04 Mar 2021 18:45:44 +,
> > > Sami Tolvanen  wrote:
> > > >
> > > > allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
> > > > linker errors:
> > > >
> > > >   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
> > >
> > > I assume this message is only an oddity, right? Because
> > > __guest_enter() is as far as you can imagine from irqbypass.c...
> >
> > I'm not sure what's up with the filename in the error message. Fangrui
> > or Nick probably have a better idea.
> >
> > > >   relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
> > > >   [-1048576, 1048575]; references hyp_panic
> > > >   >>> defined in vmlinux.o
> > > >
> > > >   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0):
> > > >   relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in
> > > >   [-1048576, 1048575]; references hyp_panic
> > > >   >>> defined in vmlinux.o
> > > >
> > > > As LTO is not really necessary for the hypervisor code, disable it for
> > > > the hyp directory to fix the build.
> > >
> > > Can you shed some light on what the problem is exactly?
> >
> > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > when the kernel is large enough. Possibly something to do with LLVM
> > always splitting functions into separate sections with LTO. I'm not
> > sure why the linker cannot shuffle things around to make everyone
> > happy in this case, but I confirmed that this patch also fixes the
> > build issue for me:
> >
> > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > b/arch/arm64/kvm/hyp/vhe/switch.c
> > index af8e940d0f03..128197b7c794 100644
> > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
> >  }
> >  NOKPROBE_SYMBOL(__hyp_call_panic);
> >
> > -void __noreturn hyp_panic(void)
> > +void __noreturn hyp_panic(void) __section(".text")
> >  {
> > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > u64 elr = read_sysreg_el2(SYS_ELR);
> >
>
> We're getting into black-magic territory here. Why wouldn't hyp_panic
> be in the .text section already?

It's not quite black magic. LLVM essentially flips on
-ffunction-sections with LTO and therefore, hyp_panic() will be in
.text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
Everything ends up in .text when we link vmlinux, of course.

$ readelf --sections vmlinux.o | grep hyp_panic
  [3936] .text.hyp_panic   PROGBITS   004b56e4

Sami


Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Sami Tolvanen
On Fri, Mar 5, 2021 at 6:22 AM Ard Biesheuvel  wrote:
>
> On Fri, 5 Mar 2021 at 12:38, Marc Zyngier  wrote:
> >
> > On Fri, 05 Mar 2021 02:38:17 +,
> > Sami Tolvanen  wrote:
> > >
> > > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  
> > > wrote:
> > > >
> > > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> > > > >
> > > > > On Thu, 04 Mar 2021 21:25:41 +,
> > > > > Sami Tolvanen  wrote:
> >
> > [...]
> >
> > > > > > I assume hyp_panic() ends up being placed too far from 
> > > > > > __guest_enter()
> > > > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > > > always splitting functions into separate sections with LTO. I'm not
> > > > > > sure why the linker cannot shuffle things around to make everyone
> > > > > > happy in this case, but I confirmed that this patch also fixes the
> > > > > > build issue for me:
> > > > > >
> > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > > > > > b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > index af8e940d0f03..128197b7c794 100644
> > > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, 
> > > > > > u64 par)
> > > > > >  }
> > > > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > > > >
> > > > > > -void __noreturn hyp_panic(void)
> > > > > > +void __noreturn hyp_panic(void) __section(".text")
> > > > > >  {
> > > > > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > > > > u64 elr = read_sysreg_el2(SYS_ELR);
> > > > > >
> > > > >
> > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > > > be in the .text section already?
> > > >
> > > > It's not quite black magic. LLVM essentially flips on
> > > > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > > > Everything ends up in .text when we link vmlinux, of course.
> > > >
> > > > $ readelf --sections vmlinux.o | grep hyp_panic
> > > >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> > >
> > > Note that disabling LTO here has essentially the same effect as using
> > > __section(".text"). It stops the compiler from splitting these
> > > functions into .text.* sections and makes it less likely that
> > > hyp_panic() ends up too far away from __guest_enter().
> > >
> > > If neither of these workarounds sound appealing, I suppose we could
> > > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?
> >
> > That would be an actual fix instead of a workaround, as it would
> > remove existing assumptions about the relative locations of the two
> > objects. I guess you need to fix both instances with something such
> > as:
> >
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index b0afad7a99c6..a43e1f7ee354 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
> >
> > // If the hyp context is loaded, go straight to hyp_panic
> > get_loaded_vcpu x0, x1
> > -   cbz x0, hyp_panic
> > -
> > +   cbnz    x0, 1f
> > +   adr_l   x0, hyp_panic
> > +   br  x0
> > +1:
>
> Agree with replacing the conditional branches that refer to external
> symbols: the compiler never emits those, for the reason we are seeing
> here, i.e., the range is simply insufficient.
>
> But let's just use 'b hyp_panic' instead, no?

Alright, this seems to work for me:

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..c62265951467 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)

// If the hyp context is loaded, go straight to hyp_panic
get_loaded_vcpu x0, x1
-   cbz x0, hyp_panic
+   cbnz    x0, 1f
+   b   hyp_panic

+1:
// The hyp context is saved so make sure it is restored to allow
// hyp_panic to run at hyp and, subsequently, panic to run in the host.
// This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
// current state is saved to the guest context but it will only be
// accurate if the guest had been completely restored.
adr_this_cpu x0, kvm_hyp_ctxt, x1
-   adr x1, hyp_panic
+   adr_l   x1, hyp_panic
str x1, [x0, #CPU_XREG_OFFSET(30)]

get_vcpu_ptr    x1, x0

But when I say work, I mean this fixes the allmodconfig build with
LTO, and my kernel boots at EL2. I don't actually have a way to
properly test KVM on arm64. If nobody sees obvious issues here, I can
send a proper patch a bit later.

> > // The hyp context is saved so make sure it is restored to allow
> > // hyp_panic to run at hyp and, subsequently, 

Re: [PATCH] KVM: arm64: Don't use cbz/adr with external symbols

2021-03-05 Thread Kees Cook
On Fri, Mar 05, 2021 at 12:21:24PM -0800, Sami Tolvanen wrote:
> allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following
> linker errors:
> 
>   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC):
>   relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in
>   [-1048576, 1048575]; references hyp_panic
>   >>> defined in vmlinux.o
> 
>   ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0):
>   relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in
>   [-1048576, 1048575]; references hyp_panic
>   >>> defined in vmlinux.o
> 
> This is because with LTO, the compiler ends up placing hyp_panic()
> more than 1MB away from __guest_enter(). Use an unconditional branch
> and adr_l instead to fix the issue.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1317
> Reported-by: Nathan Chancellor 
> Suggested-by: Marc Zyngier 
> Suggested-by: Ard Biesheuvel 
> Signed-off-by: Sami Tolvanen 

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2

2021-03-05 Thread Will Deacon
On Tue, Mar 02, 2021 at 02:59:59PM +, Quentin Perret wrote:
> When KVM runs in protected nVHE mode, make use of a stage 2 page-table
> to give the hypervisor some control over the host memory accesses. The
> host stage 2 is created lazily using large block mappings if possible,
> and will default to page mappings in absence of a better solution.
> 
> From this point on, memory accesses from the host to protected memory
> regions (e.g. marked PROT_NONE) are fatal and lead to hyp_panic().
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/kvm_asm.h  |   1 +
>  arch/arm64/include/asm/kvm_cpufeature.h   |   2 +
>  arch/arm64/kernel/image-vars.h|   3 +
>  arch/arm64/kvm/arm.c  |  10 +
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  34 +++
>  arch/arm64/kvm/hyp/nvhe/Makefile  |   2 +-
>  arch/arm64/kvm/hyp/nvhe/hyp-init.S|   1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c|  11 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 213 ++
>  arch/arm64/kvm/hyp/nvhe/setup.c   |   5 +
>  arch/arm64/kvm/hyp/nvhe/switch.c  |   7 +-
>  arch/arm64/kvm/hyp/nvhe/tlb.c |   4 +-
>  12 files changed, 286 insertions(+), 7 deletions(-)
>  create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
>  create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c

[...]

> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h 
> b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> new file mode 100644
> index ..d293cb328cc4
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret 
> + */
> +
> +#ifndef __KVM_NVHE_MEM_PROTECT__
> +#define __KVM_NVHE_MEM_PROTECT__
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct host_kvm {
> + struct kvm_arch arch;
> + struct kvm_pgtable pgt;
> + struct kvm_pgtable_mm_ops mm_ops;
> + hyp_spinlock_t lock;
> +};
> +extern struct host_kvm host_kvm;
> +
> +int __pkvm_prot_finalize(void);
> +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
> +void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> +
> +static __always_inline void __load_host_stage2(void)
> +{
> + if (static_branch_likely(&kvm_protected_mode_initialized))
> + __load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> + else
> + write_sysreg(0, vttbr_el2);

I realise you've just moved the code, but why is this needed? All we've
done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).

> +#endif /* __KVM_NVHE_MEM_PROTECT__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile 
> b/arch/arm64/kvm/hyp/nvhe/Makefile
> index e204ea77ab27..ce49795324a7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
>  
>  obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
>hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> -  cache.o cpufeature.o setup.o mm.o
> +  cache.o cpufeature.o setup.o mm.o mem_protect.o
>  obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
>../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
>  obj-y += $(lib-objs)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S 
> b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index f312672d895e..6fa01b04954f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -119,6 +119,7 @@ alternative_else_nop_endif
>  
>   /* Invalidate the stale TLBs from Bootloader */
>   tlbi    alle2
> + tlbi    vmalls12e1
>   dsb sy
>  
>   /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
> b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index ae6503c9be15..f47028d3fd0a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -13,6 +13,7 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  
> @@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct 
> kvm_cpu_context *host_ct
>   cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
>  }
>  
> +static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> +{
> + cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
> +}
>  typedef void (*hcall_t)(struct kvm_cpu_context *);
>  
>  #define HANDLE_FUNC(x)   [__KVM_HOST_SMCCC_FUNC_##x] = 
> (hcall_t)handle_##x
> @@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
>   HANDLE_FUNC(__pkvm_cpu_set_vector),
>   HANDLE_FUNC(__pkvm_create_mappings),
>   HANDLE_FUNC(__pkvm_create_private_mapping),
> + HANDLE_FUNC(__pkvm_prot_finalize),
>  };
>  
>  static void handle_host_hcall(struct 

Re: [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host

2021-03-05 Thread Will Deacon
On Tue, Mar 02, 2021 at 03:00:02PM +, Quentin Perret wrote:
> When KVM runs in nVHE protected mode, use the host stage 2 to unmap the
> hypervisor sections. The long-term goal is to ensure the EL2 code can
> remain robust regardless of the host's state, so this starts by making
> sure the host cannot e.g. write to the .hyp sections directly.
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/kvm_asm.h  |  1 +
>  arch/arm64/kvm/arm.c  | 46 +++
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c|  9 
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 +
>  5 files changed, 80 insertions(+)

[...]

>  static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 2252ad1a8945..ed480facdc88 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -196,6 +196,28 @@ static int host_stage2_idmap(u64 addr)
>   return ret;
>  }
>  
> +int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end)
> +{
> + struct kvm_mem_range r1, r2;
> + int ret;
> +
> + /*
> +  * host_stage2_unmap_dev_all() currently relies on MMIO mappings being
> +  * non-persistent, so don't allow PROT_NONE in MMIO range.
> +  */
> + if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
> + return -EINVAL;
> + if (r1.start != r2.start)
> + return -EINVAL;


Feels like this should be in a helper to determine whether or not a range is
solely covered by memory.
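
Something along these lines, perhaps (a rough sketch only, reusing
find_mem_range() from the hunk above; the helper name is just a suggestion):

/* Return true if [start, end] lies entirely within a single memory range. */
static bool range_is_memory(phys_addr_t start, phys_addr_t end)
{
	struct kvm_mem_range r1, r2;

	if (!find_mem_range(start, &r1) || !find_mem_range(end, &r2))
		return false;

	return r1.start == r2.start;
}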

Either way:

Acked-by: Will Deacon 

Will


Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM

2021-03-05 Thread Catalin Marinas
On Wed, Mar 03, 2021 at 04:45:05PM +, Marc Zyngier wrote:
> It recently became apparent that the ARMv8 architecture has interesting
> rules regarding attributes being used when fetching instructions
> if the MMU is off at Stage-1.
> 
> In this situation, the CPU is allowed to fetch from the PoC and
> allocate into the I-cache (unless the memory is mapped with
> the XN attribute at Stage-2).

Digging through the ARM ARM is hard. Do we have this behaviour with FWB
as well?

-- 
Catalin


Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2

2021-03-05 Thread Will Deacon
On Fri, Mar 05, 2021 at 09:52:12AM +, Quentin Perret wrote:
> On Thursday 04 Mar 2021 at 20:00:45 (+), Will Deacon wrote:
> > On Tue, Mar 02, 2021 at 02:59:56PM +, Quentin Perret wrote:
> > > Once we start unmapping portions of memory from the host stage 2 (such
> > > as e.g. the hypervisor memory sections, or pages that belong to
> > > protected guests), we will need a way to track page ownership. And
> > > given that all mappings in the host stage 2 will be identity-mapped, we
> > > can use the host stage 2 page-table itself as a simplistic rmap.
> > > 
> > > As a first step towards this, introduce a new protection attribute
> > > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
> > > allows to annotate portions of the IPA space as inaccessible. For
> > > simplicity, PROT_NONE mappings are created as invalid mappings with a
> > > software bit set.
> > 
> > Just as an observation, but given that they're invalid we can use any bit
> > from [63:2] to indicate that it's a PROT_NONE mapping, and that way we
> > can keep the real "software bits" for live mappings.
> > 
> > But we can of course change that later when we need the bit for something
> > else.
> 
> Right, so I used this approach for consistency with the kernel's
> PROT_NONE mappings:
> 
>   #define PTE_PROT_NONE   (_AT(pteval_t, 1) << 58) /* only when 
> !PTE_VALID */
> 
> And in fact now that I think about it, it might be worth re-using the
> same bit in stage 2.
> 
> But yes it would be pretty neat to use the other bits of invalid
> mappings to add metadata about the pages. I could even drop the
> PROT_NONE stuff straight away in favor of a more extensive mechanism for
> tracking page ownership...
> 
> Thinking about it, it should be relatively straightforward to construct
> the host stage 2 with the following invariants:
> 
>   1. everything is identity-mapped in the host stage 2;
> 
>   2. all valid mappings imply the underlying PA range belongs to the
>  host;
> 
>   3. bits [63:32] (say) of all invalid mappings are used to store a
>  unique identifier for the owner of the underlying PA range;
> 
>   4. the host identifier is 0, such that it owns all of memory by
>  default at boot as its pgd is zeroed;
> 
> And then I could replace my PROT_NONE permission stuff by an ownership
> change. E.g. the hypervisor would have its own identifier, and I could
> use it to mark the .hyp memory sections as owned by the hyp (which
> implies inaccessible by the host). And that should scale quite easily
> when we start running protected guests as we'll assign them their own
> identifiers. Sharing pages between guests (or even worse, between host
> and guests) is a bit trickier, but maybe that is for later.
> 
> Thoughts?

I think this sounds like a worthwhile generalisation to me, although virtio
brings an immediate need for shared pages and so we'll still need a software
bit for those so that we e.g. prevent the host from donating such a shared
page to the hypervisor.

Will
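
To make the ownership scheme sketched above a little more concrete, here is a
minimal, purely illustrative encoding of an owner ID in the upper bits of an
invalid stage-2 descriptor; none of the names or bit positions below come from
the actual kvm_pgtable code, they only mirror invariants 2-4 above:

#include <stdint.h>

typedef uint64_t kvm_pte_t;

#define KVM_PTE_VALID		(1ULL << 0)
#define OWNER_ID_SHIFT		32
#define OWNER_ID_HOST		0ULL	/* zeroed pgd => host owns everything */
#define OWNER_ID_HYP		1ULL	/* example identifier for the hypervisor */

/* Build an invalid descriptor recording who owns the backing PA range. */
static inline kvm_pte_t kvm_pte_set_owner(uint64_t owner_id)
{
	return owner_id << OWNER_ID_SHIFT;	/* valid bit stays clear */
}

/* Valid mappings imply host ownership (invariant 2 above). */
static inline uint64_t kvm_pte_get_owner(kvm_pte_t pte)
{
	if (pte & KVM_PTE_VALID)
		return OWNER_ID_HOST;
	return pte >> OWNER_ID_SHIFT;
}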


Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()

2021-03-05 Thread Quentin Perret
On Friday 05 Mar 2021 at 14:39:42 (+), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote:
> > +/**
> > + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate 
> > Physical
> > + *Address with a leaf entry at the highest
> > + *possible level.
> 
> Not sure it's worth mentioning "highest possible level" here, as
> realistically the caller still has to provide a memcache to deal with the
> worst case and the structure of the page-table shouldn't matter.

Right, we need to pass a range so I suppose that should be enough to
say 'this tries to cover large portions of memory'.

> > + * @pgt:   Page-table structure initialised by kvm_pgtable_*_init().
> > + * @addr:  Input address to identity-map.
> > + * @prot:  Permissions and attributes for the mapping.
> > + * @range: Boundaries of the maximum memory region to map.
> > + * @mc:Cache of pre-allocated memory from which to allocate 
> > page-table
> > + * pages.
> > + *
> > + * This function attempts to install high-level identity-mappings covering 
> > @addr
> 
> "high-level"? (again, I think I'd just drop this)
> 
> > + * without overriding existing mappings with incompatible permissions or
> > + * attributes. An existing table entry may be coalesced into a block 
> > mapping
> > + * if and only if it covers @addr and all its leafs are either invalid 
> > and/or
> 
> s/leafs/leaf entries/

Ack for both.

> > + * have permissions and attributes strictly matching @prot. The mapping is
> > + * guaranteed to be contained within the boundaries specified by @range at 
> > call
> > + * time. If only a subset of the memory specified by @range is mapped 
> > (because
> > + * of e.g. alignment issues or existing incompatible mappings), @range 
> > will be
> > + * updated accordingly.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> > + */
> > +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
> > +   enum kvm_pgtable_prot prot,
> > +   struct kvm_mem_range *range,
> > +   void *mc);
> >  #endif /* __ARM64_KVM_PGTABLE_H__ */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 8aa01a9e2603..6897d771e2b2 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable 
> > *pgt)
> > pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> > pgt->pgd = NULL;
> >  }
> > +
> > +struct stage2_reduce_range_data {
> > +   kvm_pte_t attr;
> > +   u64 target_addr;
> > +   u32 start_level;
> > +   struct kvm_mem_range *range;
> > +};
> > +
> > +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, 
> > u64 addr)
> > +{
> > +   u32 level = data->start_level;
> > +
> > +   for (; level < KVM_PGTABLE_MAX_LEVELS; level++) {
> > +   u64 granule = kvm_granule_size(level);
> > +   u64 start = ALIGN_DOWN(data->target_addr, granule);
> > +   u64 end = start + granule;
> > +
> > +   /*
> > +* The pinned address is in the current range, try one level
> > +* deeper.
> > +*/
> > +   if (start == ALIGN_DOWN(addr, granule))
> > +   continue;
> > +
> > +   /*
> > +* Make sure the current range is a reduction of the existing
> > +* range before updating it.
> > +*/
> > +   if (data->range->start <= start && end <= data->range->end) {
> > +   data->start_level = level;
> > +   data->range->start = start;
> > +   data->range->end = end;
> > +   return 0;
> > +   }
> > +   }
> > +
> > +   return -EINVAL;
> > +}
> > +
> > +#define KVM_PTE_LEAF_S2_COMPAT_MASK    (KVM_PTE_LEAF_ATTR_S2_PERMS | \
> > +KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
> > +KVM_PTE_LEAF_SW_BIT_PROT_NONE)
> > +
> > +static int stage2_reduce_range_walker(u64 addr, u64 end, u32 level,
> > + kvm_pte_t *ptep,
> > + enum kvm_pgtable_walk_flags flag,
> > + void * const arg)
> > +{
> > +   struct stage2_reduce_range_data *data = arg;
> > +   kvm_pte_t attr;
> > +   int ret;
> > +
> > +   if (addr < data->range->start || addr >= data->range->end)
> > +   return 0;
> > +
> > +   attr = *ptep & KVM_PTE_LEAF_S2_COMPAT_MASK;
> > +   if (!attr || attr == data->attr)
> > +   return 0;
> > +
> > +   /*
> > +* An existing mapping with incompatible protection attributes is
> > +* 'pinned', so reduce the range if we hit one.
> > +*/
> > +   ret = __stage2_reduce_range(data, addr);
> > +   if (ret)
> > +   return ret;
> > +
> > + 

Re: [PATCH v3 31/32] KVM: arm64: Disable PMU support in protected mode

2021-03-05 Thread Will Deacon
On Tue, Mar 02, 2021 at 03:00:01PM +, Quentin Perret wrote:
> The host currently writes directly in EL2 per-CPU data sections from
> the PMU code when running in nVHE. In preparation for unmapping the EL2
> sections from the host stage 2, disable PMU support in protected mode as
> we currently do not have a use-case for it.
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/kvm/perf.c | 3 ++-
>  arch/arm64/kvm/pmu.c  | 8 
>  2 files changed, 6 insertions(+), 5 deletions(-)

Acked-by: Will Deacon 

Will


[PATCH] KVM: arm64: Add big.LITTLE architecture support for vcpu

2021-03-05 Thread Yejune Deng
On a big.LITTLE architecture, the application layer calls KVM_ARM_VCPU_INIT
several times. Sometimes the thread runs on a big core and sometimes on a
little core. The big and little cores always have different CPU IDs, but the
init target is fixed, which leads to init->target != phys_target. So instead
of comparing the init target against the current core only, check it against
every online CPU.

Signed-off-by: Yejune Deng 
---
 arch/arm64/kvm/arm.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fc4c95dd2d26..f7fe09a5b23e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -983,20 +983,37 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
kvm_irq_level *irq_level,
return -EINVAL;
 }
 
+static void check_kvm_target_cpu(void *ret)
+{
+   *(int *)ret = kvm_target_cpu();
+}
+
+static int kvm_cpu_online_target(int init_target)
+{
+   int cpu, ret = -1;
+
+   for_each_online_cpu(cpu) {
+   smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
+   if (ret == init_target)
+   return 0;
+   }
+
+   return -EINVAL;
+}
+
 static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
   const struct kvm_vcpu_init *init)
 {
unsigned int i, ret;
-   int phys_target = kvm_target_cpu();
 
-   if (init->target != phys_target)
+   if (kvm_cpu_online_target(init->target))
return -EINVAL;
 
/*
 * Secondary and subsequent calls to KVM_ARM_VCPU_INIT must
 * use the same target.
 */
-   if (vcpu->arch.target != -1 && vcpu->arch.target != init->target)
+   if (vcpu->arch.target != -1 && kvm_cpu_online_target(init->target))
return -EINVAL;
 
/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
@@ -1018,7 +1035,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
set_bit(i, vcpu->arch.features);
}
 
-   vcpu->arch.target = phys_target;
+   vcpu->arch.target = init->target;
 
/* Now we know what it is, we can reset it. */
ret = kvm_reset_vcpu(vcpu);
@@ -1815,11 +1832,6 @@ static int init_hyp_mode(void)
return err;
 }
 
-static void check_kvm_target_cpu(void *ret)
-{
-   *(int *)ret = kvm_target_cpu();
-}
-
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 {
struct kvm_vcpu *vcpu;
-- 
2.29.0



[PATCH 8/8] KVM: arm64: Fix range alignment when walking page tables

2021-03-05 Thread Marc Zyngier
From: Jia He 

When walking the page tables at a given level, if the start address
for the range isn't aligned for that level, we propagate the
misalignment on each iteration at that level.

This results in the walker ignoring a number of entries (depending
on the original misalignment) on each subsequent iteration.

Properly aligning the address before the next iteration addresses
this issue.
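
As a concrete, purely illustrative walk-through (the granule size and the
addresses below are made up): with a 2MiB level-2 block, stepping a misaligned
address by the granule size keeps the misalignment, so the next visit starts
part-way into the following block instead of at its base; aligning down first
lands exactly on the next block boundary.

/* align-demo.c - standalone illustration only. */
#include <stdio.h>

#define GRANULE			0x200000UL	/* 2MiB level-2 block */
#define ALIGN_DOWN(x, a)	((x) & ~((unsigned long)(a) - 1))

int main(void)
{
	unsigned long addr = 0x40001000UL;	/* misaligned walk start */

	/* Buggy step: 0x40201000, i.e. 0x1000 into the next block */
	printf("buggy next address: %#lx\n", addr + GRANULE);

	/* Fixed step: 0x40200000, the base of the next block */
	printf("fixed next address: %#lx\n", ALIGN_DOWN(addr, GRANULE) + GRANULE);

	return 0;
}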

Cc: sta...@vger.kernel.org
Reported-by: Howard Zhang 
Acked-by: Will Deacon 
Signed-off-by: Jia He 
Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker 
infrastructure")
[maz: rewrite commit message]
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20210303024225.2591-1-justin...@arm.com
---
 arch/arm64/kvm/hyp/pgtable.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..926fc07074f5 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -223,6 +223,7 @@ static inline int __kvm_pgtable_visit(struct 
kvm_pgtable_walk_data *data,
goto out;
 
if (!table) {
+   data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level));
data->addr += kvm_granule_size(level);
goto out;
}
-- 
2.29.2



[PATCH 7/8] KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility

2021-03-05 Thread Marc Zyngier
It looks like we have broken firmware out there that wrongly advertises
a GICv2 compatibility interface, despite the CPUs not being able to deal
with it.

To work around this, check that the CPU initialising KVM is actually able
to switch to MMIO instead of system registers, and use that as a
precondition to enable GICv2 compatibility in KVM.

Note that the detection happens on a single CPU. If the firmware is
lying *and* the CPUs are asymmetric, all hope is lost anyway.

Reported-by: Shameerali Kolothum Thodi 
Tested-by: Shameer Kolothum 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/hyp/vgic-v3-sr.c | 35 +++--
 arch/arm64/kvm/vgic/vgic-v3.c   |  8 ++--
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 005daa0c9dd7..ee3682b9873c 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -408,11 +408,42 @@ void __vgic_v3_init_lrs(void)
 /*
  * Return the GIC CPU configuration:
  * - [31:0]  ICH_VTR_EL2
- * - [63:32] RES0
+ * - [62:32] RES0
+ * - [63]MMIO (GICv2) capable
  */
 u64 __vgic_v3_get_gic_config(void)
 {
-   return read_gicreg(ICH_VTR_EL2);
+   u64 val, sre = read_gicreg(ICC_SRE_EL1);
+   unsigned long flags = 0;
+
+   /*
+* To check whether we have a MMIO-based (GICv2 compatible)
+* CPU interface, we need to disable the system register
+* view. To do that safely, we have to prevent any interrupt
+* from firing (which would be deadly).
+*
+* Note that this only makes sense on VHE, as interrupts are
+* already masked for nVHE as part of the exception entry to
+* EL2.
+*/
+   if (has_vhe())
+   flags = local_daif_save();
+
+   write_gicreg(0, ICC_SRE_EL1);
+   isb();
+
+   val = read_gicreg(ICC_SRE_EL1);
+
+   write_gicreg(sre, ICC_SRE_EL1);
+   isb();
+
+   if (has_vhe())
+   local_daif_restore(flags);
+
+   val  = (val & ICC_SRE_EL1_SRE) ? 0 : (1ULL << 63);
+   val |= read_gicreg(ICH_VTR_EL2);
+
+   return val;
 }
 
 u64 __vgic_v3_read_vmcr(void)
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index c3e6c3fd333b..6f530925a231 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -575,8 +575,10 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
 int vgic_v3_probe(const struct gic_kvm_info *info)
 {
u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
+   bool has_v2;
int ret;
 
+   has_v2 = ich_vtr_el2 >> 63;
ich_vtr_el2 = (u32)ich_vtr_el2;
 
/*
@@ -596,13 +598,15 @@ int vgic_v3_probe(const struct gic_kvm_info *info)
 gicv4_enable ? "en" : "dis");
}
 
+   kvm_vgic_global_state.vcpu_base = 0;
+
if (!info->vcpu.start) {
kvm_info("GICv3: no GICV resource entry\n");
-   kvm_vgic_global_state.vcpu_base = 0;
+   } else if (!has_v2) {
+   pr_warn(FW_BUG "CPU interface incapable of MMIO access\n");
} else if (!PAGE_ALIGNED(info->vcpu.start)) {
pr_warn("GICV physical address 0x%llx not page aligned\n",
(unsigned long long)info->vcpu.start);
-   kvm_vgic_global_state.vcpu_base = 0;
} else {
kvm_vgic_global_state.vcpu_base = info->vcpu.start;
kvm_vgic_global_state.can_emulate_gicv2 = true;
-- 
2.29.2



[PATCH 6/8] KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()

2021-03-05 Thread Marc Zyngier
As we are about to report a bit more information to the rest of
the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more
explicit __vgic_v3_get_gic_config().

No functional change.

Tested-by: Shameer Kolothum 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/include/asm/kvm_asm.h   | 4 ++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++---
 arch/arm64/kvm/hyp/vgic-v3-sr.c| 7 ++-
 arch/arm64/kvm/vgic/vgic-v3.c  | 4 +++-
 4 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 22d933e9b59e..9c0e396dd03f 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -50,7 +50,7 @@
 #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_local_vmid   5
 #define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff  6
 #define __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs7
-#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_ich_vtr_el2    8
+#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config 8
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr  9
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr 10
 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs   11
@@ -192,7 +192,7 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 
-extern u64 __vgic_v3_get_ich_vtr_el2(void);
+extern u64 __vgic_v3_get_gic_config(void);
 extern u64 __vgic_v3_read_vmcr(void);
 extern void __vgic_v3_write_vmcr(u32 vmcr);
 extern void __vgic_v3_init_lrs(void);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c 
b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index f012f8665ecc..8f129968204e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -67,9 +67,9 @@ static void handle___kvm_enable_ssbs(struct kvm_cpu_context 
*host_ctxt)
write_sysreg_el2(tmp, SYS_SCTLR);
 }
 
-static void handle___vgic_v3_get_ich_vtr_el2(struct kvm_cpu_context *host_ctxt)
+static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt)
 {
-   cpu_reg(host_ctxt, 1) = __vgic_v3_get_ich_vtr_el2();
+   cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config();
 }
 
 static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt)
@@ -118,7 +118,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_tlb_flush_local_vmid),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__kvm_enable_ssbs),
-   HANDLE_FUNC(__vgic_v3_get_ich_vtr_el2),
+   HANDLE_FUNC(__vgic_v3_get_gic_config),
HANDLE_FUNC(__vgic_v3_read_vmcr),
HANDLE_FUNC(__vgic_v3_write_vmcr),
HANDLE_FUNC(__vgic_v3_init_lrs),
diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c
index 80406f463c28..005daa0c9dd7 100644
--- a/arch/arm64/kvm/hyp/vgic-v3-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c
@@ -405,7 +405,12 @@ void __vgic_v3_init_lrs(void)
__gic_v3_set_lr(0, i);
 }
 
-u64 __vgic_v3_get_ich_vtr_el2(void)
+/*
+ * Return the GIC CPU configuration:
+ * - [31:0]  ICH_VTR_EL2
+ * - [63:32] RES0
+ */
+u64 __vgic_v3_get_gic_config(void)
 {
return read_gicreg(ICH_VTR_EL2);
 }
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 52915b342351..c3e6c3fd333b 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -574,9 +574,11 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable);
  */
 int vgic_v3_probe(const struct gic_kvm_info *info)
 {
-   u32 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_ich_vtr_el2);
+   u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config);
int ret;
 
+   ich_vtr_el2 = (u32)ich_vtr_el2;
+
/*
 * The ListRegs field is 5 bits, but there is an architectural
 * maximum of 16 list registers. Just ignore bit 4...
-- 
2.29.2



[PATCH 5/8] KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available

2021-03-05 Thread Marc Zyngier
When running under a nesting hypervisor, it isn't guaranteed that
the virtual HW will include a PMU. In that case, let's not try
to access the PMU registers in the world switch, as that would be
deadly.

Reported-by: Andre Przywara 
Signed-off-by: Marc Zyngier 
Reviewed-by: Alexandru Elisei 
Link: https://lore.kernel.org/r/20210209114844.3278746-3-...@kernel.org
---
 arch/arm64/kernel/image-vars.h  | 3 +++
 arch/arm64/kvm/hyp/include/hyp/switch.h | 9 ++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 23f1a557bd9f..5aa9ed1e9ec6 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -101,6 +101,9 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table);
 /* Array containing bases of nVHE per-CPU memory regions. */
 KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
 
+/* PMU available static key */
+KVM_NVHE_ALIAS(kvm_arm_pmu_available);
+
 #endif /* CONFIG_KVM */
 
 #endif /* __ARM64_KERNEL_IMAGE_VARS_H */
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h 
b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 54f4860cd87c..6c1f51f25eb3 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -90,15 +90,18 @@ static inline void __activate_traps_common(struct kvm_vcpu 
*vcpu)
 * counter, which could make a PMXEVCNTR_EL0 access UNDEF at
 * EL1 instead of being trapped to EL2.
 */
-   write_sysreg(0, pmselr_el0);
-   write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+   if (kvm_arm_support_pmu_v3()) {
+   write_sysreg(0, pmselr_el0);
+   write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+   }
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
 static inline void __deactivate_traps_common(void)
 {
write_sysreg(0, hstr_el2);
-   write_sysreg(0, pmuserenr_el0);
+   if (kvm_arm_support_pmu_v3())
+   write_sysreg(0, pmuserenr_el0);
 }
 
 static inline void ___activate_traps(struct kvm_vcpu *vcpu)
-- 
2.29.2



[PATCH 4/8] KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key

2021-03-05 Thread Marc Zyngier
We currently find out about the presence of a HW PMU (or the handling
of that PMU by perf, which amounts to the same thing) in a fairly
roundabout way, by checking the number of counters available to perf.
That's good enough for now, but we will soon need to find out about
that on paths where perf is out of reach (in the world switch).

Instead, let's turn kvm_arm_support_pmu_v3() into a static key.

Signed-off-by: Marc Zyngier 
Reviewed-by: Alexandru Elisei 
Link: https://lore.kernel.org/r/20210209114844.3278746-2-...@kernel.org
---
 arch/arm64/kvm/perf.c | 10 ++
 arch/arm64/kvm/pmu-emul.c | 10 --
 include/kvm/arm_pmu.h |  9 +++--
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c
index d45b8b9a4415..739164324afe 100644
--- a/arch/arm64/kvm/perf.c
+++ b/arch/arm64/kvm/perf.c
@@ -11,6 +11,8 @@
 
 #include 
 
+DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
+
 static int kvm_is_in_guest(void)
 {
 return kvm_get_running_vcpu() != NULL;
@@ -48,6 +50,14 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = {
 
 int kvm_perf_init(void)
 {
+   /*
+* Check if HW_PERF_EVENTS are supported by checking the number of
+* hardware performance counters. This could ensure the presence of
+* a physical PMU and CONFIG_PERF_EVENT is selected.
+*/
+   if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0)
+   static_branch_enable(&kvm_arm_pmu_available);
+
return perf_register_guest_info_callbacks(&kvm_guest_cbs);
 }
 
diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index e9ec08b0b070..e32c6e139a09 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -823,16 +823,6 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1)
return val & mask;
 }
 
-bool kvm_arm_support_pmu_v3(void)
-{
-   /*
-* Check if HW_PERF_EVENTS are supported by checking the number of
-* hardware performance counters. This could ensure the presence of
-* a physical PMU and CONFIG_PERF_EVENT is selected.
-*/
-   return (perf_num_counters() > 0);
-}
-
 int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu)
 {
if (!kvm_vcpu_has_pmu(vcpu))
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 8dcb3e1477bc..6fd3cda608e4 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -13,6 +13,13 @@
 #define ARMV8_PMU_CYCLE_IDX    (ARMV8_PMU_MAX_COUNTERS - 1)
 #define ARMV8_PMU_MAX_COUNTER_PAIRS    ((ARMV8_PMU_MAX_COUNTERS + 1) >> 1)
 
+DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available);
+
+static __always_inline bool kvm_arm_support_pmu_v3(void)
+{
+   return static_branch_likely(&kvm_arm_pmu_available);
+}
+
 #ifdef CONFIG_HW_PERF_EVENTS
 
 struct kvm_pmc {
@@ -47,7 +54,6 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
val);
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
-bool kvm_arm_support_pmu_v3(void);
 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr);
 int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu,
@@ -87,7 +93,6 @@ static inline void kvm_pmu_software_increment(struct kvm_vcpu 
*vcpu, u64 val) {}
 static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {}
 static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu,
  u64 data, u64 select_idx) {}
-static inline bool kvm_arm_support_pmu_v3(void) { return false; }
 static inline int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu,
  struct kvm_device_attr *attr)
 {
-- 
2.29.2



[PATCH 3/8] KVM: arm64: Fix nVHE hyp panic host context restore

2021-03-05 Thread Marc Zyngier
From: Andrew Scull 

When panicking from the nVHE hyp and restoring the host context, x29 is
expected to hold a pointer to the host context. This wasn't being done,
so fix it to make sure a valid pointer to the host context is used.

Rather than passing a boolean indicating whether or not the host context
should be restored, instead pass the pointer to the host context. NULL
is passed to indicate that no context should be restored.

Fixes: a2e102e20fd6 ("KVM: arm64: nVHE: Handle hyp panics")
Cc: sta...@vger.kernel.org
Signed-off-by: Andrew Scull 
[maz: partial rewrite to fit 5.12-rc1]
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20210219122406.1337626-1-asc...@google.com
---
 arch/arm64/include/asm/kvm_hyp.h |  3 ++-
 arch/arm64/kvm/hyp/nvhe/host.S   | 15 ---
 arch/arm64/kvm/hyp/nvhe/switch.c |  3 +--
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 385bd7dd3d39..32ae676236b6 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -102,7 +102,8 @@ bool kvm_host_psci_handler(struct kvm_cpu_context 
*host_ctxt);
 
 void __noreturn hyp_panic(void);
 #ifdef __KVM_NVHE_HYPERVISOR__
-void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par);
+void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
+  u64 elr, u64 par);
 #endif
 
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 6585a7cbbc56..5d94584840cc 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -71,7 +71,8 @@ SYM_FUNC_START(__host_enter)
 SYM_FUNC_END(__host_enter)
 
 /*
- * void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 
par);
+ * void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr,
+ *   u64 elr, u64 par);
  */
 SYM_FUNC_START(__hyp_do_panic)
/* Prepare and exit to the host's panic funciton. */
@@ -82,9 +83,11 @@ SYM_FUNC_START(__hyp_do_panic)
hyp_kimg_va lr, x6
msr elr_el2, lr
 
-   /* Set the panic format string. Use the, now free, LR as scratch. */
-   ldr lr, =__hyp_panic_string
-   hyp_kimg_va lr, x6
+   mov x29, x0
+
+   /* Load the format string into x0 and arguments into x1-7 */
+   ldr x0, =__hyp_panic_string
+   hyp_kimg_va x0, x6
 
/* Load the format arguments into x1-7. */
mov x6, x3
@@ -94,9 +97,7 @@ SYM_FUNC_START(__hyp_do_panic)
mrs x5, hpfar_el2
 
/* Enter the host, conditionally restoring the host context. */
-   cmp x0, xzr
-   mov x0, lr
-   b.eq    __host_enter_without_restoring
+   cbz x29, __host_enter_without_restoring
b   __host_enter_for_panic
 SYM_FUNC_END(__hyp_do_panic)
 
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 59aa1045fdaf..68ab6b4d5141 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -266,7 +266,6 @@ void __noreturn hyp_panic(void)
u64 spsr = read_sysreg_el2(SYS_SPSR);
u64 elr = read_sysreg_el2(SYS_ELR);
u64 par = read_sysreg_par();
-   bool restore_host = true;
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
 
@@ -280,7 +279,7 @@ void __noreturn hyp_panic(void)
__sysreg_restore_state_nvhe(host_ctxt);
}
 
-   __hyp_do_panic(restore_host, spsr, elr, par);
+   __hyp_do_panic(host_ctxt, spsr, elr, par);
unreachable();
 }
 
-- 
2.29.2



[PATCH 2/8] KVM: arm64: Avoid corrupting vCPU context register in guest exit

2021-03-05 Thread Marc Zyngier
From: Will Deacon 

Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest
context") tracks the currently running vCPU, clearing the pointer to
NULL on exit from a guest.

Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the
kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS
code to go off into the weeds when it saves the DISR assuming that the
CPU context is embedded in a struct vCPU.

Leave x1 alone and use x3 as a temporary register instead when clearing
the vCPU on the guest exit path.

Cc: Marc Zyngier 
Cc: Andrew Scull 
Cc: 
Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context")
Suggested-by: Quentin Perret 
Signed-off-by: Will Deacon 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20210226181211.14542-1-w...@kernel.org
---
 arch/arm64/kvm/hyp/entry.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..0c66a1d408fd 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -146,7 +146,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL)
// Now restore the hyp regs
restore_callee_saved_regs x2
 
-   set_loaded_vcpu xzr, x1, x2
+   set_loaded_vcpu xzr, x2, x3
 
 alternative_if ARM64_HAS_RAS_EXTN
// If we have the RAS extensions we can consume a pending error
-- 
2.29.2



[PATCH 1/8] KVM: arm64: nvhe: Save the SPE context early

2021-03-05 Thread Marc Zyngier
From: Suzuki K Poulose 

The nVHE KVM hyp drains and disables the SPE buffer, before
entering the guest, as the EL1&0 translation regime
is going to be loaded with that of the guest.

But this operation is performed way too late, because:
  - The owning translation regime of the SPE buffer
is transferred to EL2. (MDCR_EL2_E2PB == 0)
  - The guest Stage1 is loaded.

Thus the flush could use the host EL1 virtual address, but with the
EL2 translations instead of the host EL1 ones, for writing out any
cached data.

Fix this by moving the SPE buffer handling early enough.
The restore path is doing the right thing.

Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow")
Cc: sta...@vger.kernel.org
Cc: Christoffer Dall 
Cc: Marc Zyngier 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Mark Rutland 
Cc: Alexandru Elisei 
Reviewed-by: Alexandru Elisei 
Signed-off-by: Suzuki K Poulose 
Signed-off-by: Marc Zyngier 
Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poul...@arm.com
---
 arch/arm64/include/asm/kvm_hyp.h   |  5 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 ++--
 arch/arm64/kvm/hyp/nvhe/switch.c   | 11 ++-
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index c0450828378b..385bd7dd3d39 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -83,6 +83,11 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context 
*ctxt);
 void __debug_switch_to_guest(struct kvm_vcpu *vcpu);
 void __debug_switch_to_host(struct kvm_vcpu *vcpu);
 
+#ifdef __KVM_NVHE_HYPERVISOR__
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu);
+#endif
+
 void __fpsimd_save_state(struct user_fpsimd_state *fp_regs);
 void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c 
b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 91a711aa8382..f401724f12ef 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -58,16 +58,24 @@ static void __debug_restore_spe(u64 pmscr_el1)
write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1);
 }
 
-void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
/* Disable and flush SPE data generation */
__debug_save_spe(&vcpu->arch.host_debug_state.pmscr_el1);
+}
+
+void __debug_switch_to_guest(struct kvm_vcpu *vcpu)
+{
__debug_switch_to_guest_common(vcpu);
 }
 
-void __debug_switch_to_host(struct kvm_vcpu *vcpu)
+void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
__debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1);
+}
+
+void __debug_switch_to_host(struct kvm_vcpu *vcpu)
+{
__debug_switch_to_host_common(vcpu);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index f3d0e9eca56c..59aa1045fdaf 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -192,6 +192,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
 
__sysreg_save_state_nvhe(host_ctxt);
+   /*
+* We must flush and disable the SPE buffer for nVHE, as
+* the translation regime(EL1&0) is going to be loaded with
+* that of the guest. And we must do this before we change the
+* translation regime to EL2 (via MDCR_EL2_E2PB == 0) and
+* before we load guest Stage1.
+*/
+   __debug_save_host_buffers_nvhe(vcpu);
 
__adjust_pc(vcpu);
 
@@ -234,11 +242,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED)
__fpsimd_save_fpexc32(vcpu);
 
+   __debug_switch_to_host(vcpu);
/*
 * This must come after restoring the host sysregs, since a non-VHE
 * system may enable SPE here and make use of the TTBRs.
 */
-   __debug_switch_to_host(vcpu);
+   __debug_restore_host_buffers_nvhe(vcpu);
 
if (pmu_switch_needed)
__pmu_switch_to_host(host_ctxt);
-- 
2.29.2



[PATCH 0/8] KVM/arm64 fixes for 5.12, take #1

2021-03-05 Thread Marc Zyngier
Hi Paolo,

Here's the first batch of fixes for 5.12. We have a handful of low
level world-switch regressions, a page table walker fix, more PMU
tidying up, and a workaround for systems with creative firmware.

This will need to go on top of the current state of mainline.

Please apply,

M.

Andrew Scull (1):
  KVM: arm64: Fix nVHE hyp panic host context restore

Jia He (1):
  KVM: arm64: Fix range alignment when walking page tables

Marc Zyngier (4):
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to 
__vgic_v3_get_gic_config()
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 
compatibility

Suzuki K Poulose (1):
  KVM: arm64: nvhe: Save the SPE context early

Will Deacon (1):
  KVM: arm64: Avoid corrupting vCPU context register in guest exit

 arch/arm64/include/asm/kvm_asm.h|  4 ++--
 arch/arm64/include/asm/kvm_hyp.h|  8 ++-
 arch/arm64/kernel/image-vars.h  |  3 +++
 arch/arm64/kvm/hyp/entry.S  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  9 +---
 arch/arm64/kvm/hyp/nvhe/debug-sr.c  | 12 --
 arch/arm64/kvm/hyp/nvhe/host.S  | 15 +++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c  |  6 ++---
 arch/arm64/kvm/hyp/nvhe/switch.c| 14 +---
 arch/arm64/kvm/hyp/pgtable.c|  1 +
 arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++--
 arch/arm64/kvm/perf.c   | 10 +
 arch/arm64/kvm/pmu-emul.c   | 10 -
 arch/arm64/kvm/vgic/vgic-v3.c   | 12 +++---
 include/kvm/arm_pmu.h   |  9 ++--
 15 files changed, 116 insertions(+), 39 deletions(-)


Re: [GIT PULL] KVM/arm64 fixes for 5.12, take #1

2021-03-05 Thread Marc Zyngier
Hi Paolo,

On Fri, 05 Mar 2021 17:27:36 +,
Paolo Bonzini  wrote:
> 
> On 05/03/21 17:49, Marc Zyngier wrote:
> > Hi Paolo,
> > 
> > Here's the first batch of fixes for 5.12. We have a handful of low
> > level world-switch regressions, a page table walker fix, more PMU
> > tidying up, and a workaround for systems with creative firmware.
> > 
> > Note that this is based on -rc1 despite the breakage, as I didn't feel
> > like holding these patches until -rc2.
> > 
> > Please pull,
> > 
> > M.
> > 
> > The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8:
> > 
> >Linux 5.12-rc1 (2021-02-28 16:05:19 -0800)
> > 
> > are available in the Git repository at:
> > 
> >git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
> > tags/kvmarm-fixes-5.12-1
> > 
> > for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9:
> > 
> >KVM: arm64: Fix range alignment when walking page tables (2021-03-04 
> > 09:54:12 +)
> 
> Hi Marc,
> 
> due to a severe data corruption bug in 5.12-rc1, Linus suggested not
> including 5.12-rc1 in trees to avoid it eating our filesystems
> unwittingly during future bisections.
> 
> Would it be a problem for you to rebase on top of your merge window
> pull request?  If there are conflicts, another possibility is for you
> to just send me the patch series.  I will handle all the topic branch
> juggling.
> 
> This will mean rewriting kvmarm.git's history, but it does seem to be
> the lesser (or the most future-proof) evil.

The problem is that this is not only kvmarm, but also the Android
tree, which directly pulls from the kvmarm stable branches. I guess
we'll have to live with it.

I'll reply to this email with the patch series.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [GIT PULL] KVM/arm64 fixes for 5.12, take #1

2021-03-05 Thread Paolo Bonzini

On 05/03/21 17:49, Marc Zyngier wrote:

Hi Paolo,

Here's the first batch of fixes for 5.12. We have a handful of low
level world-switch regressions, a page table walker fix, more PMU
tidying up, and a workaround for systems with creative firmware.

Note that this is based on -rc1 despite the breakage, as I didn't feel
like holding these patches until -rc2.

Please pull,

M.

The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8:

   Linux 5.12-rc1 (2021-02-28 16:05:19 -0800)

are available in the Git repository at:

   git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvmarm-fixes-5.12-1

for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9:

   KVM: arm64: Fix range alignment when walking page tables (2021-03-04 
09:54:12 +)


Hi Marc,

due to a severe data corruption bug in 5.12-rc1, Linus suggested not 
including 5.12-rc1 in trees to avoid it eating our filesystems 
unwittingly during future bisections.


Would it be a problem for you to rebase on top of your merge window pull 
request?  If there are conflicts, another possibility is for you to just 
send me the patch series.  I will handle all the topic branch juggling.


This will mean rewriting kvmarm.git's history, but it does seem to be 
the lesser (or the most future-proof) evil.


Thanks,

Paolo



KVM/arm64 fixes for 5.12, take #1

- Fix SPE context save/restore on nVHE
- Fix some subtle host context corruption on vcpu exit
- Fix panic handling on nVHE
- Prevent the hypervisor from accessing PMU registers when there is none
- Workaround broken firmwares advertising bogus GICv2 compatibility
- Fix Stage-2 unaligned range unmapping


Andrew Scull (1):
   KVM: arm64: Fix nVHE hyp panic host context restore

Jia He (1):
   KVM: arm64: Fix range alignment when walking page tables

Marc Zyngier (4):
   KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
   KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is 
available
   KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to 
__vgic_v3_get_gic_config()
   KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 
compatibility

Suzuki K Poulose (1):
   KVM: arm64: nvhe: Save the SPE context early

Will Deacon (1):
   KVM: arm64: Avoid corrupting vCPU context register in guest exit

  arch/arm64/include/asm/kvm_asm.h|  4 ++--
  arch/arm64/include/asm/kvm_hyp.h|  8 ++-
  arch/arm64/kernel/image-vars.h  |  3 +++
  arch/arm64/kvm/hyp/entry.S  |  2 +-
  arch/arm64/kvm/hyp/include/hyp/switch.h |  9 +---
  arch/arm64/kvm/hyp/nvhe/debug-sr.c  | 12 --
  arch/arm64/kvm/hyp/nvhe/host.S  | 15 +++--
  arch/arm64/kvm/hyp/nvhe/hyp-main.c  |  6 ++---
  arch/arm64/kvm/hyp/nvhe/switch.c| 14 +---
  arch/arm64/kvm/hyp/pgtable.c|  1 +
  arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++--
  arch/arm64/kvm/perf.c   | 10 +
  arch/arm64/kvm/pmu-emul.c   | 10 -
  arch/arm64/kvm/vgic/vgic-v3.c   | 12 +++---
  include/kvm/arm_pmu.h   |  9 ++--
  15 files changed, 116 insertions(+), 39 deletions(-)



___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Marc Zyngier

On 2021-03-05 16:55, Sami Tolvanen wrote:

On Fri, Mar 5, 2021 at 6:22 AM Ard Biesheuvel  wrote:


On Fri, 5 Mar 2021 at 12:38, Marc Zyngier  wrote:
>
> On Fri, 05 Mar 2021 02:38:17 +,
> Sami Tolvanen  wrote:
> >
> > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  
wrote:
> > >
> > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> > > >
> > > > On Thu, 04 Mar 2021 21:25:41 +,
> > > > Sami Tolvanen  wrote:
>
> [...]
>
> > > > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > > always splitting functions into separate sections with LTO. I'm not
> > > > > sure why the linker cannot shuffle things around to make everyone
> > > > > happy in this case, but I confirmed that this patch also fixes the
> > > > > build issue for me:
> > > > >
> > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > index af8e940d0f03..128197b7c794 100644
> > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, 
u64 par)
> > > > >  }
> > > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > > >
> > > > > -void __noreturn hyp_panic(void)
> > > > > +void __noreturn hyp_panic(void) __section(".text")
> > > > >  {
> > > > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > > > u64 elr = read_sysreg_el2(SYS_ELR);
> > > > >
> > > >
> > > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > > be in the .text section already?
> > >
> > > It's not quite black magic. LLVM essentially flips on
> > > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > > Everything ends up in .text when we link vmlinux, of course.
> > >
> > > $ readelf --sections vmlinux.o | grep hyp_panic
> > >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> >
> > Note that disabling LTO here has essentially the same effect as using
> > __section(".text"). It stops the compiler from splitting these
> > functions into .text.* sections and makes it less likely that
> > hyp_panic() ends up too far away from __guest_enter().
> >
> > If neither of these workarounds sound appealing, I suppose we could
> > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?
>
> That would be an actual fix instead of a workaround, as it would
> remove existing assumptions about the relative locations of the two
> objects. I guess you need to fix both instances with something such
> as:
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index b0afad7a99c6..a43e1f7ee354 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
>
> // If the hyp context is loaded, go straight to hyp_panic
> get_loaded_vcpu x0, x1
> -   cbz x0, hyp_panic
> -
> +   cbnz    x0, 1f
> +   adr_l   x0, hyp_panic
> +   br  x0
> +1:

Agree with replacing the conditional branches that refer to external
symbols: the compiler never emits those, for the reason we are seeing
here, i.e., the range is simply insufficient.

But let's just use 'b hyp_panic' instead, no?


Alright, this seems to work for me:

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..c62265951467 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)

// If the hyp context is loaded, go straight to hyp_panic
get_loaded_vcpu x0, x1
-   cbz x0, hyp_panic
+   cbnz    x0, 1f
+   b   hyp_panic

+1:
        // The hyp context is saved so make sure it is restored to allow
        // hyp_panic to run at hyp and, subsequently, panic to run in the host.
        // This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
        // current state is saved to the guest context but it will only be
        // accurate if the guest had been completely restored.
adr_this_cpu x0, kvm_hyp_ctxt, x1
-   adr x1, hyp_panic
+   adr_l   x1, hyp_panic
str x1, [x0, #CPU_XREG_OFFSET(30)]

        get_vcpu_ptr    x1, x0

But when I say work, I mean this fixes the allmodconfig build with
LTO, and my kernel boots at EL2. I don't actually have a way to
properly test KVM on arm64. If nobody sees obvious issues here, I can
send a proper patch a bit later.


Please do, it is "obviously correct"! ;-)

Thanks,

M.
--
Jazz is not dead. It just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()

2021-03-05 Thread Will Deacon
On Fri, Mar 05, 2021 at 03:03:36PM +, Quentin Perret wrote:
> On Friday 05 Mar 2021 at 14:39:42 (+), Will Deacon wrote:
> > On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote:
> > > + /* Reduce the kvm_mem_range to a granule size */
> > > + ret = __stage2_reduce_range(, range->end);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + /* Walk the range to check permissions and reduce further if needed */
> > > + do {
> > > + ret = kvm_pgtable_walk(pgt, range->start, range->end, );
> > 
> > (we spent some time debugging an issue here and you spotted that you're
> > passing range->end instead of the size ;)
> 
> Yep, I have the fix applied locally, and ready to fly in v4 :)
> 
> > > + } while (ret == -EAGAIN);
> > 
> > I'm a bit nervous about this loop -- what guarantees forward progress here?
> > Can we return to the host after a few tries instead?
> 
> -EAGAIN only happens when we've been able to successfully reduce the
> range to a potentially valid granule size. That can't happen infinitely.
> 
> We're guaranteed to fail when trying to reduce the range to a
> granularity smaller than PAGE_SIZE (the -EINVAL case of
> __stage2_reduce_range), which is indicative of a host memory abort in a
> page it should not access (because marked PROT_NONE for instance).

Can you compute an upper bound on the number of iterations based on the
number of page-table levels then? I'm just conscious that all of this is
effectively running with irqs disabled, and so being able to reason about
the amount of work we're going to do makes me much more comfortable.
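
For illustration, something like the sketch below is the kind of bound I
have in mind (the helper name is made up and this isn't code from the
series): each -EAGAIN corresponds to reducing the range by at least one
page-table level, so KVM_PGTABLE_MAX_LEVELS retries should always suffice.

/* Sketch: bound the -EAGAIN retries by the number of page-table levels. */
static int host_stage2_walk_bounded(struct kvm_pgtable *pgt,
				    struct kvm_mem_range *range,
				    struct kvm_pgtable_walker *walker)
{
	int ret, retries = KVM_PGTABLE_MAX_LEVELS;

	do {
		ret = kvm_pgtable_walk(pgt, range->start,
				       range->end - range->start, walker);
	} while (ret == -EAGAIN && retries-- > 0);

	return ret;
}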

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[GIT PULL] KVM/arm64 fixes for 5.12, take #1

2021-03-05 Thread Marc Zyngier
Hi Paolo,

Here's the first batch of fixes for 5.12. We have a handful of low
level world-switch regressions, a page table walker fix, more PMU
tidying up, and a workaround for systems with creative firmware.

Note that this is based on -rc1 despite the breakage, as I didn't feel
like holding these patches until -rc2.

Please pull,

M.

The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8:

  Linux 5.12-rc1 (2021-02-28 16:05:19 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git 
tags/kvmarm-fixes-5.12-1

for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9:

  KVM: arm64: Fix range alignment when walking page tables (2021-03-04 09:54:12 
+)


KVM/arm64 fixes for 5.12, take #1

- Fix SPE context save/restore on nVHE
- Fix some subtle host context corruption on vcpu exit
- Fix panic handling on nVHE
- Prevent the hypervisor from accessing PMU registers when there is none
- Workaround broken firmwares advertising bogus GICv2 compatibility
- Fix Stage-2 unaligned range unmapping


Andrew Scull (1):
  KVM: arm64: Fix nVHE hyp panic host context restore

Jia He (1):
  KVM: arm64: Fix range alignment when walking page tables

Marc Zyngier (4):
  KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
  KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
  KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to 
__vgic_v3_get_gic_config()
  KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 
compatibility

Suzuki K Poulose (1):
  KVM: arm64: nvhe: Save the SPE context early

Will Deacon (1):
  KVM: arm64: Avoid corrupting vCPU context register in guest exit

 arch/arm64/include/asm/kvm_asm.h|  4 ++--
 arch/arm64/include/asm/kvm_hyp.h|  8 ++-
 arch/arm64/kernel/image-vars.h  |  3 +++
 arch/arm64/kvm/hyp/entry.S  |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  9 +---
 arch/arm64/kvm/hyp/nvhe/debug-sr.c  | 12 --
 arch/arm64/kvm/hyp/nvhe/host.S  | 15 +++--
 arch/arm64/kvm/hyp/nvhe/hyp-main.c  |  6 ++---
 arch/arm64/kvm/hyp/nvhe/switch.c| 14 +---
 arch/arm64/kvm/hyp/pgtable.c|  1 +
 arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++--
 arch/arm64/kvm/perf.c   | 10 +
 arch/arm64/kvm/pmu-emul.c   | 10 -
 arch/arm64/kvm/vgic/vgic-v3.c   | 12 +++---
 include/kvm/arm_pmu.h   |  9 ++--
 15 files changed, 116 insertions(+), 39 deletions(-)
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-05 Thread Mark Rutland
On Fri, Mar 05, 2021 at 08:06:09PM +0530, Anshuman Khandual wrote:
> From: James Morse 
> 
> As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
> might contain a range of values to describe supported translation granules
> (4K and 16K page sizes in particular) instead of just enabled or disabled
> values. This changes the __enable_mmu() function to handle the complete acceptable
> range of values (depending on whether the field is signed or unsigned) now
> represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
> also fix similar situations in EFI stub and KVM as well.
> 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: Marc Zyngier 
> Cc: James Morse 
> Cc: Suzuki K Poulose 
> Cc: Ard Biesheuvel 
> Cc: Mark Rutland 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: linux-...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: James Morse 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/include/asm/sysreg.h   | 20 ++--
>  arch/arm64/kernel/head.S  |  6 --
>  arch/arm64/kvm/reset.c| 23 ---
>  drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
>  4 files changed, 31 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index dfd4edb..d4a5fca9 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -796,6 +796,11 @@
>  #define ID_AA64MMFR0_PARANGE_48  0x5
>  #define ID_AA64MMFR0_PARANGE_52  0x6
>  
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT   0x0
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE  0x1
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN   0x2
> +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX   0x7

The TGRAN2 fields don't quite follow the usual ID scheme rules, so how
do we determine the max value? Does the ARM ARM say anything in
particular about them, like we do for some of the PMU ID fields?

Otherwise, this patch looks correct to me.

Thanks,
Mark.

> +
>  #ifdef CONFIG_ARM64_PA_BITS_52
>  #define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52
>  #else
> @@ -961,14 +966,17 @@
>  #define ID_PFR1_PROGMOD_SHIFT0
>  
>  #if defined(CONFIG_ARM64_4K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN4_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
>  #elif defined(CONFIG_ARM64_16K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN16_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN16_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0xF
>  #elif defined(CONFIG_ARM64_64K_PAGES)
> -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
> -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN64_SUPPORTED
> +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7
>  #endif
>  
>  #define MVFR2_FPMISC_SHIFT   4
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 66b0e0b..8b469f1 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
>  SYM_FUNC_START(__enable_mmu)
>   mrs x2, ID_AA64MMFR0_EL1
>   ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
> - cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
> - b.ne__no_granule_support
> + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
> + b.lt__no_granule_support
> + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
> + b.gt__no_granule_support
>   update_early_cpu_boot_status 0, x2, x3
>   adrpx2, idmap_pg_dir
>   phys_to_ttbr x1, x1
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 47f3f03..fe72bfb 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void)
>  
>  int kvm_set_ipa_limit(void)
>  {
> - unsigned int parange, tgran_2;
> + unsigned int parange, tgran_2_shift, tgran_2;
>   u64 mmfr0;
>  
>   mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> @@ -300,27 +300,28 @@ int kvm_set_ipa_limit(void)
>   switch (PAGE_SIZE) {
>   default:
>   case SZ_4K:
> - tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
> + tgran_2_shift = ID_AA64MMFR0_TGRAN4_2_SHIFT;
>   break;
>   case SZ_16K:
> - tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
> + 

Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()

2021-03-05 Thread Will Deacon
On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote:
> Add a new map function to the KVM page-table library that allows block
> identity-mappings to be created greedily. This will be useful for lazily
> creating the host stage 2 page-table, as it will own most of memory and
> will always be identity mapped.
> 
> The new helper function creates the mapping in 2 steps: it first walks
> the page-table to compute the largest possible granule that can be used
> to idmap a given address without overriding existing incompatible
> mappings; and then creates a mapping accordingly.
> 
> Signed-off-by: Quentin Perret 
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  37 +
>  arch/arm64/kvm/hyp/pgtable.c | 119 +++
>  2 files changed, 156 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> b/arch/arm64/include/asm/kvm_pgtable.h
> index c9f6ed76e0ad..e51dcce69a5e 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -96,6 +96,16 @@ enum kvm_pgtable_prot {
>  #define PAGE_HYP_RO  (KVM_PGTABLE_PROT_R)
>  #define PAGE_HYP_DEVICE  (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
>  
> +/**
> + * struct kvm_mem_range - Range of Intermediate Physical Addresses
> + * @start:   Start of the range.
> + * @end: End of the range.
> + */
> +struct kvm_mem_range {
> + u64 start;
> + u64 end;
> +};
> +
>  /**
>   * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table 
> walk.
>   * @KVM_PGTABLE_WALK_LEAF:   Visit leaf entries, including invalid
> @@ -379,4 +389,31 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, 
> u64 addr, u64 size);
>  int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>struct kvm_pgtable_walker *walker);
>  
> +/**
> + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate Physical
> + *  Address with a leaf entry at the highest
> + *  possible level.

Not sure it's worth mentioning "highest possible level" here, as
realistically the caller still has to provide a memcache to deal with the
worst case and the structure of the page-table shouldn't matter.

> + * @pgt: Page-table structure initialised by kvm_pgtable_*_init().
> + * @addr:Input address to identity-map.
> + * @prot:Permissions and attributes for the mapping.
> + * @range:   Boundaries of the maximum memory region to map.
> + * @mc:  Cache of pre-allocated memory from which to allocate 
> page-table
> + *   pages.
> + *
> + * This function attempts to install high-level identity-mappings covering 
> @addr

"high-level"? (again, I think I'd just drop this)

> + * without overriding existing mappings with incompatible permissions or
> + * attributes. An existing table entry may be coalesced into a block mapping
> + * if and only if it covers @addr and all its leafs are either invalid and/or

s/leafs/leaf entries/

> + * have permissions and attributes strictly matching @prot. The mapping is
> + * guaranteed to be contained within the boundaries specified by @range at 
> call
> + * time. If only a subset of the memory specified by @range is mapped 
> (because
> + * of e.g. alignment issues or existing incompatible mappings), @range will 
> be
> + * updated accordingly.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr,
> + enum kvm_pgtable_prot prot,
> + struct kvm_mem_range *range,
> + void *mc);
>  #endif   /* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 8aa01a9e2603..6897d771e2b2 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
>   pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
>   pgt->pgd = NULL;
>  }
> +
> +struct stage2_reduce_range_data {
> + kvm_pte_t attr;
> + u64 target_addr;
> + u32 start_level;
> + struct kvm_mem_range *range;
> +};
> +
> +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, u64 
> addr)
> +{
> + u32 level = data->start_level;
> +
> + for (; level < KVM_PGTABLE_MAX_LEVELS; level++) {
> + u64 granule = kvm_granule_size(level);
> + u64 start = ALIGN_DOWN(data->target_addr, granule);
> + u64 end = start + granule;
> +
> + /*
> +  * The pinned address is in the current range, try one level
> +  * deeper.
> +  */
> + if (start == ALIGN_DOWN(addr, granule))
> + continue;
> +
> + /*
> +  * Make sure the current range is a reduction of the existing
> +  * range 

[PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values

2021-03-05 Thread Anshuman Khandual
From: James Morse 

As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K page sizes in particular) instead of just enabled or disabled
values. This changes the __enable_mmu() function to handle the complete acceptable
range of values (depending on whether the field is signed or unsigned) now
represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
also fix similar situations in EFI stub and KVM as well.
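
In C terms, for the unsigned fields the check effectively becomes a range
test; a minimal illustration of the idea (not part of the patch itself, the
helper name is invented, and the signed-field handling is glossed over):

static bool tgran_in_supported_range(u64 mmfr0)
{
	unsigned int tgran;

	tgran = cpuid_feature_extract_unsigned_field(mmfr0,
						     ID_AA64MMFR0_TGRAN_SHIFT);

	return tgran >= ID_AA64MMFR0_TGRAN_SUPPORTED_MIN &&
	       tgran <= ID_AA64MMFR0_TGRAN_SUPPORTED_MAX;
}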

Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: James Morse 
Cc: Suzuki K Poulose 
Cc: Ard Biesheuvel 
Cc: Mark Rutland 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-...@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: James Morse 
Signed-off-by: Anshuman Khandual 
---
 arch/arm64/include/asm/sysreg.h   | 20 ++--
 arch/arm64/kernel/head.S  |  6 --
 arch/arm64/kvm/reset.c| 23 ---
 drivers/firmware/efi/libstub/arm64-stub.c |  2 +-
 4 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index dfd4edb..d4a5fca9 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -796,6 +796,11 @@
 #define ID_AA64MMFR0_PARANGE_480x5
 #define ID_AA64MMFR0_PARANGE_520x6
 
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE0x1
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2
+#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7
+
 #ifdef CONFIG_ARM64_PA_BITS_52
 #define ID_AA64MMFR0_PARANGE_MAX   ID_AA64MMFR0_PARANGE_52
 #else
@@ -961,14 +966,17 @@
 #define ID_PFR1_PROGMOD_SHIFT  0
 
 #if defined(CONFIG_ARM64_4K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN4_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN4_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #elif defined(CONFIG_ARM64_16K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN16_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN16_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0xF
 #elif defined(CONFIG_ARM64_64K_PAGES)
-#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
-#define ID_AA64MMFR0_TGRAN_SUPPORTED   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SHIFT   ID_AA64MMFR0_TGRAN64_SHIFT
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN   ID_AA64MMFR0_TGRAN64_SUPPORTED
+#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX   0x7
 #endif
 
 #define MVFR2_FPMISC_SHIFT 4
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b..8b469f1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow)
 SYM_FUNC_START(__enable_mmu)
mrs x2, ID_AA64MMFR0_EL1
ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4
-   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED
-   b.ne__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN
+   b.lt__no_granule_support
+   cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX
+   b.gt__no_granule_support
update_early_cpu_boot_status 0, x2, x3
adrpx2, idmap_pg_dir
phys_to_ttbr x1, x1
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 47f3f03..fe72bfb 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void)
 
 int kvm_set_ipa_limit(void)
 {
-   unsigned int parange, tgran_2;
+   unsigned int parange, tgran_2_shift, tgran_2;
u64 mmfr0;
 
mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
@@ -300,27 +300,28 @@ int kvm_set_ipa_limit(void)
switch (PAGE_SIZE) {
default:
case SZ_4K:
-   tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN4_2_SHIFT;
break;
case SZ_16K:
-   tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN16_2_SHIFT;
break;
case SZ_64K:
-   tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT;
+   tgran_2_shift = ID_AA64MMFR0_TGRAN64_2_SHIFT;
break;
}
 
-   switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) {
-   default:
-   case 1:
+   tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift);
+   if (tgran_2 == 

Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Ard Biesheuvel
On Fri, 5 Mar 2021 at 12:38, Marc Zyngier  wrote:
>
> On Fri, 05 Mar 2021 02:38:17 +,
> Sami Tolvanen  wrote:
> >
> > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  
> > wrote:
> > >
> > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> > > >
> > > > On Thu, 04 Mar 2021 21:25:41 +,
> > > > Sami Tolvanen  wrote:
>
> [...]
>
> > > > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > > always splitting functions into separate sections with LTO. I'm not
> > > > > sure why the linker cannot shuffle things around to make everyone
> > > > > happy in this case, but I confirmed that this patch also fixes the
> > > > > build issue for me:
> > > > >
> > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > > > > b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > index af8e940d0f03..128197b7c794 100644
> > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, 
> > > > > u64 par)
> > > > >  }
> > > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > > >
> > > > > -void __noreturn hyp_panic(void)
> > > > > +void __noreturn hyp_panic(void) __section(".text")
> > > > >  {
> > > > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > > > u64 elr = read_sysreg_el2(SYS_ELR);
> > > > >
> > > >
> > > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > > be in the .text section already?
> > >
> > > It's not quite black magic. LLVM essentially flips on
> > > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > > Everything ends up in .text when we link vmlinux, of course.
> > >
> > > $ readelf --sections vmlinux.o | grep hyp_panic
> > >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> >
> > Note that disabling LTO here has essentially the same effect as using
> > __section(".text"). It stops the compiler from splitting these
> > functions into .text.* sections and makes it less likely that
> > hyp_panic() ends up too far away from __guest_enter().
> >
> > If neither of these workarounds sound appealing, I suppose we could
> > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?
>
> That would be an actual fix instead of a workaround, as it would
> remove existing assumptions about the relative locations of the two
> objects. I guess you need to fix both instances with something such
> as:
>
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index b0afad7a99c6..a43e1f7ee354 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
>
> // If the hyp context is loaded, go straight to hyp_panic
> get_loaded_vcpu x0, x1
> -   cbz x0, hyp_panic
> -
> +   cbnz    x0, 1f
> +   adr_l   x0, hyp_panic
> +   br  x0
> +1:

Agree with replacing the conditional branches that refer to external
symbols: the compiler never emits those, for the reason we are seeing
here, i.e., the range is simply insufficient.

But let's just use 'b hyp_panic' instead, no?



> // The hyp context is saved so make sure it is restored to allow
> // hyp_panic to run at hyp and, subsequently, panic to run in the host.
> // This makes use of __guest_exit to avoid duplication but sets the
> @@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
> // current state is saved to the guest context but it will only be
> // accurate if the guest had been completely restored.
> adr_this_cpu x0, kvm_hyp_ctxt, x1
> -   adr x1, hyp_panic
> +   adr_l   x1, hyp_panic
> str x1, [x0, #CPU_XREG_OFFSET(30)]
>
> get_vcpu_ptr    x1, x0
>
> which is completely untested. I wouldn't be surprised if there were
> more of these somewhere.
>

A quick grep gives me

$ objdump -r vmlinux.o |grep BR19
0005b6e0 R_AARCH64_CONDBR19  hyp_panic
00418e08 R_AARCH64_CONDBR19  __memcpy
00418e14 R_AARCH64_CONDBR19  __memcpy
3818 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3898 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3918 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3998 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3a18 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3a98 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3b18 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3b98 R_AARCH64_CONDBR19  __kvm_nvhe___guest_exit_panic
3c10 R_AARCH64_CONDBR19  __kvm_nvhe___host_exit
3c1c R_AARCH64_CONDBR19  __kvm_nvhe___host_exit
64f0 R_AARCH64_CONDBR19  __kvm_nvhe_hyp_panic
078c 

Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2

2021-03-05 Thread Quentin Perret
On Thursday 04 Mar 2021 at 20:00:45 (+), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:56PM +, Quentin Perret wrote:
> > Once we start unmapping portions of memory from the host stage 2 (such
> > as e.g. the hypervisor memory sections, or pages that belong to
> > protected guests), we will need a way to track page ownership. And
> > given that all mappings in the host stage 2 will be identity-mapped, we
> > can use the host stage 2 page-table itself as a simplistic rmap.
> > 
> > As a first step towards this, introduce a new protection attribute
> > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which
> > allows portions of the IPA space to be annotated as inaccessible. For
> > simplicity, PROT_NONE mappings are created as invalid mappings with a
> > software bit set.
> 
> Just as an observation, but given that they're invalid we can use any bit
> from [63:2] to indicate that it's a PROT_NONE mapping, and that way we
> can keep the real "software bits" for live mappings.
> 
> But we can of course change that later when we need the bit for something
> else.

Right, so I used this approach for consistency with the kernel's
PROT_NONE mappings:

#define PTE_PROT_NONE   (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */

And in fact now that I think about it, it might be worth re-using the
same bit in stage 2.

But yes it would be pretty neat to use the other bits of invalid
mappings to add metadata about the pages. I could even drop the
PROT_NONE stuff straight away in favor of a more extensive mechanism for
tracking page ownership...

Thinking about it, it should be relatively straightforward to construct
the host stage 2 with the following invariants:

  1. everything is identity-mapped in the host stage 2;

  2. all valid mappings imply the underlying PA range belongs to the
 host;

  3. bits [63:32] (say) of all invalid mappings are used to store a
 unique identifier for the owner of the underlying PA range;

  4. the host identifier is 0, such that it owns all of memory by
 default at boot as its pgd is zeroed;

And then I could replace my PROT_NONE permission stuff by an ownership
change. E.g. the hypervisor would have its own identifier, and I could
use it to mark the .hyp memory sections as owned by the hyp (which
implies inaccessible by the host). And that should scale quite easily
when we start running protected guests as we'll assign them their own
identifiers. Sharing pages between guests (or even worse, between host
and guests) is a bit trickier, but maybe that is for later.

Thoughts?
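
To make that concrete, a minimal sketch of the encoding I have in mind
(bit positions and helper names are made up at this point):

/*
 * Use the upper bits of an *invalid* PTE to record which entity owns
 * the underlying PA range (0 == host, so a zeroed pgd means the host
 * owns everything by default).
 */
#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 32)

static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
{
	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
}

static u8 kvm_invalid_pte_owner(kvm_pte_t pte)
{
	if (kvm_pte_valid(pte))
		return 0;	/* valid mappings belong to the host */

	return FIELD_GET(KVM_INVALID_PTE_OWNER_MASK, pte);
}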

> > 
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h |  2 ++
> >  arch/arm64/kvm/hyp/pgtable.c | 26 --
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h 
> > b/arch/arm64/include/asm/kvm_pgtable.h
> > index 9935dbae2cc1..c9f6ed76e0ad 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -80,6 +80,7 @@ struct kvm_pgtable {
> >   * @KVM_PGTABLE_PROT_W:Write permission.
> >   * @KVM_PGTABLE_PROT_R:Read permission.
> >   * @KVM_PGTABLE_PROT_DEVICE:   Device attributes.
> > + * @KVM_PGTABLE_PROT_NONE: No permission.
> >   */
> >  enum kvm_pgtable_prot {
> > KVM_PGTABLE_PROT_X  = BIT(0),
> > @@ -87,6 +88,7 @@ enum kvm_pgtable_prot {
> > KVM_PGTABLE_PROT_R  = BIT(2),
> >  
> > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > +   KVM_PGTABLE_PROT_NONE   = BIT(4),
> 
> Why do we need an extra entry here? Couldn't we just create PROT_NONE
> entries when none of R,W or X are set?

The kernel has an explicit PAGE_NONE permission, so I followed the same
idea, but that could work as well. Now, as per the above discussion that
may not be relevant if we implement the page ownership thing.

> >  };
> >  
> >  #define PAGE_HYP   (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index bdd6e3d4eeb6..8e7059fcfd40 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -48,6 +48,8 @@
> >  KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
> >  KVM_PTE_LEAF_ATTR_HI_S2_XN)
> >  
> > +#define KVM_PTE_LEAF_SW_BIT_PROT_NONE  BIT(55)
> > +
> >  struct kvm_pgtable_walk_data {
> > struct kvm_pgtable  *pgt;
> > struct kvm_pgtable_walker   *walker;
> > @@ -120,6 +122,16 @@ static bool kvm_pte_valid(kvm_pte_t pte)
> > return pte & KVM_PTE_VALID;
> >  }
> >  
> > +static bool kvm_pte_prot_none(kvm_pte_t pte)
> > +{
> > +   return pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE;
> > +}
> 
> I think it would be a good idea to check !kvm_pte_valid() in here too,
> since it doesn't make sense to report true for valid (or table) 

Re: [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr()

2021-03-05 Thread Quentin Perret
On Thursday 04 Mar 2021 at 20:03:36 (+), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:57PM +, Quentin Perret wrote:
> > In order to ease its re-use in other code paths, refactor
> > stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
> > No functional change intended.
> > 
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 19 ---
> >  1 file changed, 8 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 8e7059fcfd40..8aa01a9e2603 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> > return vtcr;
> >  }
> >  
> > -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> > -   struct stage2_map_data *data)
> > +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
> >  {
> > bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> > kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> > @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum 
> > kvm_pgtable_prot prot,
> >  
> > if (prot & KVM_PGTABLE_PROT_NONE) {
> > if (prot != KVM_PGTABLE_PROT_NONE)
> > -   return -EINVAL;
> > +   return 0;
> 
> Hmm, does the architecture actually say that having all these attributes
> as 0 is illegal?

Hmm, that's a good point, that might not be the case. I assumed we would
have no use for this, but there we can easily avoid the restriction
so...

> If not, I think it would be better to keep the int return
> code and replace the 'data' parameter with a pointer to a kvm_pte_t.
> 
> Does that work?

I think so yes, I'll fix it up.
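
For reference, a rough sketch of the shape I had in mind (an assumption,
not the final patch): keep the int error code and hand the attributes back
through a kvm_pte_t pointer.

static int stage2_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
{
	kvm_pte_t attr = 0;

	if (prot & KVM_PGTABLE_PROT_NONE) {
		/* PROT_NONE must not be combined with anything else */
		if (prot != KVM_PGTABLE_PROT_NONE)
			return -EINVAL;
	} else {
		/* build the memory-type and R/W/X attributes as before */
	}

	*ptep = attr;
	return 0;
}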

Cheers,
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2

2021-03-05 Thread Quentin Perret
On Thursday 04 Mar 2021 at 19:49:53 (+), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:54PM +, Quentin Perret wrote:
> > Extend the memory pool allocated for the hypervisor to include enough
> > pages to map all of memory at page granularity for the host stage 2.
> > While at it, also reserve some memory for device mappings.
> > 
> > Signed-off-by: Quentin Perret 
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/mm.h | 23 ++-
> >  arch/arm64/kvm/hyp/nvhe/setup.c  | 12 
> >  arch/arm64/kvm/hyp/reserved_mem.c|  2 ++
> >  3 files changed, 36 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h 
> > b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index ac0f7fcffd08..411a35db949c 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -53,7 +53,7 @@ static inline unsigned long 
> > __hyp_pgtable_max_pages(unsigned long nr_pages)
> > return total;
> >  }
> >  
> > -static inline unsigned long hyp_s1_pgtable_pages(void)
> > +static inline unsigned long __hyp_pgtable_total_pages(void)
> >  {
> > unsigned long res = 0, i;
> >  
> > @@ -63,9 +63,30 @@ static inline unsigned long hyp_s1_pgtable_pages(void)
> > res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT);
> > }
> >  
> > +   return res;
> > +}
> > +
> > +static inline unsigned long hyp_s1_pgtable_pages(void)
> > +{
> > +   unsigned long res;
> > +
> > +   res = __hyp_pgtable_total_pages();
> > +
> > /* Allow 1 GiB for private mappings */
> > res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT);
> >  
> > return res;
> >  }
> > +
> > +static inline unsigned long host_s2_mem_pgtable_pages(void)
> > +{
> > +   return __hyp_pgtable_total_pages() + 16;
> 
> Is this 16 due to the possibility of a concatenated pgd?

Yes it is, to be sure we have a safe upper-bound.

> If so, please add a comment to that effect.

Will do, thanks.
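
Something along these lines, presumably (exact wording to be decided):

static inline unsigned long host_s2_mem_pgtable_pages(void)
{
	/*
	 * Include an extra 16 pages to safely upper-bound the amount of
	 * concatenated pgds that could be needed for the host stage 2.
	 */
	return __hyp_pgtable_total_pages() + 16;
}
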
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2

2021-03-05 Thread Quentin Perret
On Thursday 04 Mar 2021 at 19:25:41 (+), Will Deacon wrote:
> > +static int do_pkvm_init(u32 hyp_va_bits)
> > +{
> > +   void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
> > +   int ret;
> > +
> > +   preempt_disable();
> > +   hyp_install_host_vector();
> 
> It's a shame we need this both here _and_ on the reinit path, but it looks
> like it's necessary.

Right, and I want this before the KVM vectors are installed on secondary
CPUs, to make sure they get the new pgtable from the start. Otherwise
I'd need to do the same dance on all of them to go and switch TTBR0_EL2
and such.

> > +   ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
> > +   num_possible_cpus(), kern_hyp_va(per_cpu_base),
> > +   hyp_va_bits);
> > +   preempt_enable();
> > +
> > +   return ret;
> > +}
> 
> [...]
> 
> >  /**
> >   * Inits Hyp-mode on all online CPUs
> >   */
> >  static int init_hyp_mode(void)
> >  {
> > +   u32 hyp_va_bits;
> > int cpu;
> > -   int err = 0;
> > +   int err = -ENOMEM;
> > +
> > +   /*
> > +* The protected Hyp-mode cannot be initialized if the memory pool
> > +* allocation has failed.
> > +*/
> > +   if (is_protected_kvm_enabled() && !hyp_mem_base)
> > +   return err;
> 
> This skips the error message you get on the out_err path.

Ack, I'll fix.

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 4d41d7838d53..9d331bf262d2 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -221,15 +221,39 @@ void free_hyp_pgds(void)
> > if (hyp_pgtable) {
> > kvm_pgtable_hyp_destroy(hyp_pgtable);
> > kfree(hyp_pgtable);
> > +   hyp_pgtable = NULL;
> > }
> > mutex_unlock(_hyp_pgd_mutex);
> >  }
> >  
> > +static bool kvm_host_owns_hyp_mappings(void)
> > +{
> > +   if (static_branch_likely(_protected_mode_initialized))
> > +   return false;
> > +
> > +   /*
> > +* This can happen at boot time when __create_hyp_mappings() is called
> > +* after the hyp protection has been enabled, but the static key has
> > +* not been flipped yet.
> > +*/
> > +   if (!hyp_pgtable && is_protected_kvm_enabled())
> > +   return false;
> > +
> > +   WARN_ON(!hyp_pgtable);
> > +
> > +   return true;
> 
>   return !(WARN_ON(!hyp_pgtable) && is_protected_kvm_enabled());

Wouldn't this WARN when I have !hyp_pgtable && is_protected_kvm_enabled()
but the static key is still off (which can happen, sadly, as per the
comment above)?
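
For what it's worth, a variant that keeps the WARN but only fires it once
the protected-mode special cases are out of the way could look like this
(just a sketch):

static bool kvm_host_owns_hyp_mappings(void)
{
	if (static_branch_likely(&kvm_protected_mode_initialized))
		return false;

	/* Covers the window where the static key hasn't been flipped yet */
	if (is_protected_kvm_enabled() && !hyp_pgtable)
		return false;

	return !WARN_ON(!hyp_pgtable);
}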

Thanks,
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Add big.LITTLE architecture support for vcpu

2021-03-05 Thread Marc Zyngier
On Fri, 05 Mar 2021 10:34:25 +,
Yejune Deng  wrote:
> 
> In a big.LITTLE architecture, the application layer calls KVM_ARM_VCPU_INIT
> several times. Sometimes the thread will run on a big core, sometimes it
> will run on a little core. The big core and the little core always have
> different cpuids, but the init target is fixed, and that leads to
> init->target != phys_target. So modify the phys target from the current core
> to all online CPUs.

This is done on purpose, and it is userspace's responsibility to pin
its vcpu threads to a certain type of CPUs if it cares at all.
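
For completeness, this is the sort of thing a VMM is expected to do before
issuing KVM_ARM_VCPU_INIT (illustrative userspace sketch only, not tied to
any particular VMM):

#define _GNU_SOURCE
#include <sched.h>

/* Pin the calling vCPU thread to a single CPU of the desired type. */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}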

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm64: Disable LTO in hyp

2021-03-05 Thread Marc Zyngier
On Fri, 05 Mar 2021 02:38:17 +,
Sami Tolvanen  wrote:
> 
> On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen  wrote:
> >
> > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier  wrote:
> > >
> > > On Thu, 04 Mar 2021 21:25:41 +,
> > > Sami Tolvanen  wrote:

[...]

> > > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > always splitting functions into separate sections with LTO. I'm not
> > > > sure why the linker cannot shuffle things around to make everyone
> > > > happy in this case, but I confirmed that this patch also fixes the
> > > > build issue for me:
> > > >
> > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c 
> > > > b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > index af8e940d0f03..128197b7c794 100644
> > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 
> > > > par)
> > > >  }
> > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > >
> > > > -void __noreturn hyp_panic(void)
> > > > +void __noreturn hyp_panic(void) __section(".text")
> > > >  {
> > > > u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > > u64 elr = read_sysreg_el2(SYS_ELR);
> > > >
> > >
> > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > be in the .text section already?
> >
> > It's not quite black magic. LLVM essentially flips on
> > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > Everything ends up in .text when we link vmlinux, of course.
> >
> > $ readelf --sections vmlinux.o | grep hyp_panic
> >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> 
> Note that disabling LTO here has essentially the same effect as using
> __section(".text"). It stops the compiler from splitting these
> functions into .text.* sections and makes it less likely that
> hyp_panic() ends up too far away from __guest_enter().
> 
> If neither of these workarounds sound appealing, I suppose we could
> alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?

That would be an actual fix instead of a workaround, as it would
remove existing assumptions about the relative locations of the two
objects. I guess you need to fix both instances with something such
as:

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..a43e1f7ee354 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
 
// If the hyp context is loaded, go straight to hyp_panic
get_loaded_vcpu x0, x1
-   cbz x0, hyp_panic
-
+   cbnz    x0, 1f
+   adr_l   x0, hyp_panic
+   br  x0
+1:
// The hyp context is saved so make sure it is restored to allow
// hyp_panic to run at hyp and, subsequently, panic to run in the host.
// This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
// current state is saved to the guest context but it will only be
// accurate if the guest had been completely restored.
adr_this_cpu x0, kvm_hyp_ctxt, x1
-   adr x1, hyp_panic
+   adr_l   x1, hyp_panic
str x1, [x0, #CPU_XREG_OFFSET(30)]
 
        get_vcpu_ptr    x1, x0

which is completely untested. I wouldn't be surprised if there were
more of these somewhere.

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v12 03/13] vfio: VFIO_IOMMU_SET_MSI_BINDING

2021-03-05 Thread Jean-Philippe Brucker
Hi,

On Tue, Feb 23, 2021 at 10:06:15PM +0100, Eric Auger wrote:
> This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl, which aims
> to (un)register the guest MSI binding to the host. The latter
> can then use those stage 1 bindings to build a nested stage
> binding targeting the physical MSIs.

Now that RMR is in the IORT spec, could it be used for the nested MSI
problem?  For virtio-iommu tables I was planning to do it like this:

MSI is mapped at stage-2 with an arbitrary IPA->doorbell PA. We report
this IPA to userspace through iommu_groups/X/reserved_regions. No change
there. Then to the guest we report a reserved identity mapping at IPA
(using RMR, an equivalent DT binding, or probed RESV_MEM for
virtio-iommu). The guest creates that mapping at stage-1, and that's it.
Unless I overlooked something we'd only reuse existing infrastructure and
avoid the SET_MSI_BINDING interface.

Thanks,
Jean
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


RE: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM

2021-03-05 Thread Shameerali Kolothum Thodi
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 04 March 2021 17:11
> To: Shameerali Kolothum Thodi 
> Cc: linux-arm-ker...@lists.infradead.org; io...@lists.linux-foundation.org;
> kvmarm@lists.cs.columbia.edu; m...@kernel.org;
> alex.william...@redhat.com; eric.au...@redhat.com;
> zhangfei@linaro.org; Jonathan Cameron
> ; Zengtao (B) ;
> linux...@openeuler.org
> Subject: Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for
> NESTED stage with BTM
> 
> Hi Shameer,
> 
> On Mon, Feb 22, 2021 at 03:53:37PM +, Shameer Kolothum wrote:
> > If the SMMU supports BTM and the device belongs to NESTED domain
> > with shared pasid table, we need to use the VMID allocated by the
> > KVM for the s2 configuration. Hence, request a pinned VMID from KVM.
> >
> > Signed-off-by: Shameer Kolothum 
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49
> -
> >  1 file changed, 47 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 26bf7da1bcd0..04f83f7c8319 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -28,6 +28,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >
> > @@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned
> long *map, int idx)
> > clear_bit(idx, map);
> >  }
> >
> > +static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain
> *smmu_domain)
> > +{
> > +   struct arm_smmu_master *master;
> > +
> > +   master = list_first_entry_or_null(_domain->devices,
> > + struct arm_smmu_master, domain_head);
> 
> This probably needs to hold devices_lock while using master.

Ok.

> 
> > +   if (!master)
> > +   return -EINVAL;
> > +
> > +   return kvm_pinned_vmid_get(master->dev);
> > +}
> > +
> > +static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain
> *smmu_domain)
> > +{
> > +   struct arm_smmu_master *master;
> > +
> > +   master = list_first_entry_or_null(_domain->devices,
> > + struct arm_smmu_master, domain_head);
> > +   if (!master)
> > +   return -EINVAL;
> > +
> > +   if (smmu_domain->s2_cfg.vmid)
> > +   return kvm_pinned_vmid_put(master->dev);
> > +
> > +   return 0;
> > +}
> > +
> >  static void arm_smmu_domain_free(struct iommu_domain *domain)
> >  {
> > struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > @@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct
> iommu_domain *domain)
> > mutex_unlock(_smmu_asid_lock);
> > }
> > if (s2_cfg->set) {
> > -   if (s2_cfg->vmid)
> > -   arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
> > +   if (s2_cfg->vmid) {
> > +   if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
> > +   smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +   arm_smmu_bitmap_free(smmu->vmid_map,
> s2_cfg->vmid);
> > +   }
> > }
> >
> > kfree(smmu_domain);
> > @@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct
> iommu_domain *domain,
> > !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> > goto out;
> >
> > +   if (smmu->features & ARM_SMMU_FEAT_BTM) {
> > +   ret = arm_smmu_pinned_vmid_get(smmu_domain);
> > +   if (ret < 0)
> > +   goto out;
> > +
> > +   if (smmu_domain->s2_cfg.vmid)
> > +   arm_smmu_bitmap_free(smmu->vmid_map,
> smmu_domain->s2_cfg.vmid);
> > +
> > +   smmu_domain->s2_cfg.vmid = (u16)ret;
> 
> That will require a TLB invalidation on the old VMID, once the STE is
> rewritten.

True. Will add that.
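
Something like the following, presumably (just a sketch, and old_vmid is a
placeholder for whatever VMID the STE was using before the pinned one is
installed):

struct arm_smmu_cmdq_ent cmd = {
	.opcode		= CMDQ_OP_TLBI_S12_VMALL,
	.tlbi.vmid	= old_vmid,
};

/* Flush stage-2 TLB entries tagged with the VMID we are dropping */
arm_smmu_cmdq_issue_cmd(smmu, &cmd);
arm_smmu_cmdq_issue_sync(smmu);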

> More generally I think this pinned VMID set conflicts with that of
> stage-2-only domains (which is the default state until a guest attaches a
> PASID table). Say you have one guest using DOMAIN_NESTED without PASID
> table, just DMA to IPA using VMID 0x8000. Now another guest attaches a
> PASID table and obtains the same VMID from KVM. The stage-2 translation
> might use TLB entries from the other guest, no?  They'll both create
> stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
> 
> It's tempting to allocate all VMIDs through KVM instead, but that will
> force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break
> existing users of that extension (though I'm not sure there are any).
> Instead we might need to restrict the SMMU VMID bitmap to match the
> private VMID set in KVM.

Right, that is indeed a problem. I will take a look at this suggestion.
 
> Besides we probably want to restrict this feature to systems supporting
> VMID16 on both SMMU and CPUs, or at least check that they are compatible.