Re: [PATCH] KVM: arm64: Disable LTO in hyp
On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen wrote: > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote: > > > > On Thu, 04 Mar 2021 21:25:41 +, > > Sami Tolvanen wrote: > > > > > > On Thu, Mar 4, 2021 at 11:15 AM Marc Zyngier wrote: > > > > > > > > On Thu, 04 Mar 2021 18:45:44 +, > > > > Sami Tolvanen wrote: > > > > > > > > > > allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following > > > > > linker errors: > > > > > > > > > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC): > > > > > > > > I assume this message is only an oddity, right? Because > > > > __guest_enter() is as far as you can imagine from irqbypass.c... > > > > > > I'm not sure what's up with the filename in the error message. Fangrui > > > or Nick probably have a better idea. > > > > > > > > relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in > > > > > [-1048576, 1048575]; references hyp_panic > > > > > >>> defined in vmlinux.o > > > > > > > > > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0): > > > > > relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in > > > > > [-1048576, 1048575]; references hyp_panic > > > > > >>> defined in vmlinux.o > > > > > > > > > > As LTO is not really necessary for the hypervisor code, disable it for > > > > > the hyp directory to fix the build. > > > > > > > > Can you shed some light on what the problem is exactly? > > > > > > I assume hyp_panic() ends up being placed too far from __guest_enter() > > > when the kernel is large enough. Possibly something to do with LLVM > > > always splitting functions into separate sections with LTO. 
I'm not > > > sure why the linker cannot shuffle things around to make everyone > > > happy in this case, but I confirmed that this patch also fixes the > > > build issue for me: > > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c > > > b/arch/arm64/kvm/hyp/vhe/switch.c > > > index af8e940d0f03..128197b7c794 100644 > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 > > > par) > > > } > > > NOKPROBE_SYMBOL(__hyp_call_panic); > > > > > > -void __noreturn hyp_panic(void) > > > +void __noreturn hyp_panic(void) __section(".text") > > > { > > > u64 spsr = read_sysreg_el2(SYS_SPSR); > > > u64 elr = read_sysreg_el2(SYS_ELR); > > > > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic > > be in the .text section already? > > It's not quite black magic. LLVM essentially flips on > -ffunction-sections with LTO and therefore, hyp_panic() will be in > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text. > Everything ends up in .text when we link vmlinux, of course. > > $ readelf --sections vmlinux.o | grep hyp_panic > [3936] .text.hyp_panic PROGBITS 004b56e4 Note that disabling LTO here has essentially the same effect as using __section(".text"). It stops the compiler from splitting these functions into .text.* sections and makes it less likely that hyp_panic() ends up too far away from __guest_enter(). If neither of these workarounds sound appealing, I suppose we could alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts? Sami ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Disable LTO in hyp
On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote: > > On Thu, 04 Mar 2021 21:25:41 +, > Sami Tolvanen wrote: > > > > On Thu, Mar 4, 2021 at 11:15 AM Marc Zyngier wrote: > > > > > > On Thu, 04 Mar 2021 18:45:44 +, > > > Sami Tolvanen wrote: > > > > > > > > allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following > > > > linker errors: > > > > > > > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC): > > > > > > I assume this message is only an oddity, right? Because > > > __guest_enter() is as far as you can imagine from irqbypass.c... > > > > I'm not sure what's up with the filename in the error message. Fangrui > > or Nick probably have a better idea. > > > > > > relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in > > > > [-1048576, 1048575]; references hyp_panic > > > > >>> defined in vmlinux.o > > > > > > > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0): > > > > relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in > > > > [-1048576, 1048575]; references hyp_panic > > > > >>> defined in vmlinux.o > > > > > > > > As LTO is not really necessary for the hypervisor code, disable it for > > > > the hyp directory to fix the build. > > > > > > Can you shed some light on what the problem is exactly? > > > > I assume hyp_panic() ends up being placed too far from __guest_enter() > > when the kernel is large enough. Possibly something to do with LLVM > > always splitting functions into separate sections with LTO. 
I'm not > > sure why the linker cannot shuffle things around to make everyone > > happy in this case, but I confirmed that this patch also fixes the > > build issue for me: > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c > > b/arch/arm64/kvm/hyp/vhe/switch.c > > index af8e940d0f03..128197b7c794 100644 > > --- a/arch/arm64/kvm/hyp/vhe/switch.c > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par) > > } > > NOKPROBE_SYMBOL(__hyp_call_panic); > > > > -void __noreturn hyp_panic(void) > > +void __noreturn hyp_panic(void) __section(".text") > > { > > u64 spsr = read_sysreg_el2(SYS_SPSR); > > u64 elr = read_sysreg_el2(SYS_ELR); > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic > be in the .text section already? It's not quite black magic. LLVM essentially flips on -ffunction-sections with LTO and therefore, hyp_panic() will be in .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text. Everything ends up in .text when we link vmlinux, of course. $ readelf --sections vmlinux.o | grep hyp_panic [3936] .text.hyp_panic PROGBITS 004b56e4 Sami ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Disable LTO in hyp
On Fri, Mar 5, 2021 at 6:22 AM Ard Biesheuvel wrote: > > On Fri, 5 Mar 2021 at 12:38, Marc Zyngier wrote: > > > > On Fri, 05 Mar 2021 02:38:17 +, > > Sami Tolvanen wrote: > > > > > > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen > > > wrote: > > > > > > > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote: > > > > > > > > > > On Thu, 04 Mar 2021 21:25:41 +, > > > > > Sami Tolvanen wrote: > > > > [...] > > > > > > > > I assume hyp_panic() ends up being placed too far from > > > > > > __guest_enter() > > > > > > when the kernel is large enough. Possibly something to do with LLVM > > > > > > always splitting functions into separate sections with LTO. I'm not > > > > > > sure why the linker cannot shuffle things around to make everyone > > > > > > happy in this case, but I confirmed that this patch also fixes the > > > > > > build issue for me: > > > > > > > > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c > > > > > > b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > > index af8e940d0f03..128197b7c794 100644 > > > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c > > > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, > > > > > > u64 par) > > > > > > } > > > > > > NOKPROBE_SYMBOL(__hyp_call_panic); > > > > > > > > > > > > -void __noreturn hyp_panic(void) > > > > > > +void __noreturn hyp_panic(void) __section(".text") > > > > > > { > > > > > > u64 spsr = read_sysreg_el2(SYS_SPSR); > > > > > > u64 elr = read_sysreg_el2(SYS_ELR); > > > > > > > > > > > > > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic > > > > > be in the .text section already? > > > > > > > > It's not quite black magic. LLVM essentially flips on > > > > -ffunction-sections with LTO and therefore, hyp_panic() will be in > > > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text. > > > > Everything ends up in .text when we link vmlinux, of course. 
> > > >
> > > > $ readelf --sections vmlinux.o | grep hyp_panic
> > > >   [3936] .text.hyp_panic   PROGBITS   004b56e4
> > >
> > > Note that disabling LTO here has essentially the same effect as using
> > > __section(".text"). It stops the compiler from splitting these
> > > functions into .text.* sections and makes it less likely that
> > > hyp_panic() ends up too far away from __guest_enter().
> > >
> > > If neither of these workarounds sound appealing, I suppose we could
> > > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?
> >
> > That would be an actual fix instead of a workaround, as it would
> > remove existing assumptions about the relative locations of the two
> > objects. I guess you need to fix both instances with something such
> > as:
> >
> > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > index b0afad7a99c6..a43e1f7ee354 100644
> > --- a/arch/arm64/kvm/hyp/entry.S
> > +++ b/arch/arm64/kvm/hyp/entry.S
> > @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
> >
> > 	// If the hyp context is loaded, go straight to hyp_panic
> > 	get_loaded_vcpu	x0, x1
> > -	cbz	x0, hyp_panic
> > -
> > +	cbnz	x0, 1f
> > +	adr_l	x0, hyp_panic
> > +	br	x0
> > +1:
>
> Agree with replacing the conditional branches that refer to external
> symbols: the compiler never emits those, for the reason we are seeing
> here, i.e., the range is simply insufficient.
>
> But let's just use 'b hyp_panic' instead, no?
Alright, this seems to work for me:

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..c62265951467 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)

 	// If the hyp context is loaded, go straight to hyp_panic
 	get_loaded_vcpu	x0, x1
-	cbz	x0, hyp_panic
+	cbnz	x0, 1f
+	b	hyp_panic
+1:
 	// The hyp context is saved so make sure it is restored to allow
 	// hyp_panic to run at hyp and, subsequently, panic to run in the host.
 	// This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
 	// current state is saved to the guest context but it will only be
 	// accurate if the guest had been completely restored.
 	adr_this_cpu x0, kvm_hyp_ctxt, x1
-	adr	x1, hyp_panic
+	adr_l	x1, hyp_panic
 	str	x1, [x0, #CPU_XREG_OFFSET(30)]
 	get_vcpu_ptr	x1, x0

But when I say work, I mean this fixes the allmodconfig build with LTO, and my kernel boots at EL2. I don't actually have a way to properly test KVM on arm64. If nobody sees obvious issues here, I can send a proper patch a bit later.

> > // The hyp context is saved so make sure it is restored to allow
> > // hyp_panic to run at hyp and, subsequently,
Re: [PATCH] KVM: arm64: Don't use cbz/adr with external symbols
On Fri, Mar 05, 2021 at 12:21:24PM -0800, Sami Tolvanen wrote: > allmodconfig + CONFIG_LTO_CLANG_THIN=y fails to build due to following > linker errors: > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21CC): > relocation R_AARCH64_CONDBR19 out of range: 2031220 is not in > [-1048576, 1048575]; references hyp_panic > >>> defined in vmlinux.o > > ld.lld: error: irqbypass.c:(function __guest_enter: .text+0x21E0): > relocation R_AARCH64_ADR_PREL_LO21 out of range: 2031200 is not in > [-1048576, 1048575]; references hyp_panic > >>> defined in vmlinux.o > > This is because with LTO, the compiler ends up placing hyp_panic() > more than 1MB away from __guest_enter(). Use an unconditional branch > and adr_l instead to fix the issue. > > Link: https://github.com/ClangBuiltLinux/linux/issues/1317 > Reported-by: Nathan Chancellor > Suggested-by: Marc Zyngier > Suggested-by: Ard Biesheuvel > Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook -- Kees Cook ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 29/32] KVM: arm64: Wrap the host with a stage 2
On Tue, Mar 02, 2021 at 02:59:59PM +, Quentin Perret wrote: > When KVM runs in protected nVHE mode, make use of a stage 2 page-table > to give the hypervisor some control over the host memory accesses. The > host stage 2 is created lazily using large block mappings if possible, > and will default to page mappings in absence of a better solution. > > From this point on, memory accesses from the host to protected memory > regions (e.g. marked PROT_NONE) are fatal and lead to hyp_panic(). > > Signed-off-by: Quentin Perret > --- > arch/arm64/include/asm/kvm_asm.h | 1 + > arch/arm64/include/asm/kvm_cpufeature.h | 2 + > arch/arm64/kernel/image-vars.h| 3 + > arch/arm64/kvm/arm.c | 10 + > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 34 +++ > arch/arm64/kvm/hyp/nvhe/Makefile | 2 +- > arch/arm64/kvm/hyp/nvhe/hyp-init.S| 1 + > arch/arm64/kvm/hyp/nvhe/hyp-main.c| 11 + > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 213 ++ > arch/arm64/kvm/hyp/nvhe/setup.c | 5 + > arch/arm64/kvm/hyp/nvhe/switch.c | 7 +- > arch/arm64/kvm/hyp/nvhe/tlb.c | 4 +- > 12 files changed, 286 insertions(+), 7 deletions(-) > create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h > create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c [...] 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> new file mode 100644
> index ..d293cb328cc4
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Quentin Perret
> + */
> +
> +#ifndef __KVM_NVHE_MEM_PROTECT__
> +#define __KVM_NVHE_MEM_PROTECT__
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +struct host_kvm {
> +	struct kvm_arch arch;
> +	struct kvm_pgtable pgt;
> +	struct kvm_pgtable_mm_ops mm_ops;
> +	hyp_spinlock_t lock;
> +};
> +extern struct host_kvm host_kvm;
> +
> +int __pkvm_prot_finalize(void);
> +int kvm_host_prepare_stage2(void *mem_pgt_pool, void *dev_pgt_pool);
> +void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> +
> +static __always_inline void __load_host_stage2(void)
> +{
> +	if (static_branch_likely(&kvm_protected_mode_initialized))
> +		__load_stage2(&host_kvm.arch.mmu, host_kvm.arch.vtcr);
> +	else
> +		write_sysreg(0, vttbr_el2);

I realise you've just moved the code, but why is this needed? All we've done is point the stage-2 ttb at 0x0 (i.e. we haven't disabled anything).
> +#endif /* __KVM_NVHE_MEM_PROTECT__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
> index e204ea77ab27..ce49795324a7 100644
> --- a/arch/arm64/kvm/hyp/nvhe/Makefile
> +++ b/arch/arm64/kvm/hyp/nvhe/Makefile
> @@ -14,7 +14,7 @@ lib-objs := $(addprefix ../../../lib/, $(lib-objs))
>
> obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
> 	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
> -	 cache.o cpufeature.o setup.o mm.o
> +	 cache.o cpufeature.o setup.o mm.o mem_protect.o
> obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
> 	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
> obj-y += $(lib-objs)
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-init.S b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> index f312672d895e..6fa01b04954f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-init.S
> @@ -119,6 +119,7 @@ alternative_else_nop_endif
>
> 	/* Invalidate the stale TLBs from Bootloader */
> 	tlbi	alle2
> +	tlbi	vmalls12e1
> 	dsb	sy
>
> 	/*
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index ae6503c9be15..f47028d3fd0a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -13,6 +13,7 @@
> #include
> #include
>
> +#include
> #include
> #include
>
> @@ -151,6 +152,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
> 	cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
> }
>
> +static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
> +{
> +	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
> +}
> typedef void (*hcall_t)(struct kvm_cpu_context *);
>
> #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
> @@ -174,6 +179,7 @@ static const hcall_t host_hcall[] = {
> 	HANDLE_FUNC(__pkvm_cpu_set_vector),
> 	HANDLE_FUNC(__pkvm_create_mappings),
HANDLE_FUNC(__pkvm_create_private_mapping), > + HANDLE_FUNC(__pkvm_prot_finalize), > }; > > static void handle_host_hcall(struct
Re: [PATCH v3 32/32] KVM: arm64: Protect the .hyp sections from the host
On Tue, Mar 02, 2021 at 03:00:02PM +, Quentin Perret wrote: > When KVM runs in nVHE protected mode, use the host stage 2 to unmap the > hypervisor sections. The long-term goal is to ensure the EL2 code can > remain robust regardless of the host's state, so this starts by making > sure the host cannot e.g. write to the .hyp sections directly. > > Signed-off-by: Quentin Perret > --- > arch/arm64/include/asm/kvm_asm.h | 1 + > arch/arm64/kvm/arm.c | 46 +++ > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 + > arch/arm64/kvm/hyp/nvhe/hyp-main.c| 9 > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 + > 5 files changed, 80 insertions(+) [...] > static void handle_host_hcall(struct kvm_cpu_context *host_ctxt) > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c > b/arch/arm64/kvm/hyp/nvhe/mem_protect.c > index 2252ad1a8945..ed480facdc88 100644 > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c > @@ -196,6 +196,28 @@ static int host_stage2_idmap(u64 addr) > return ret; > } > > +int __pkvm_host_unmap(phys_addr_t start, phys_addr_t end) > +{ > + struct kvm_mem_range r1, r2; > + int ret; > + > + /* > + * host_stage2_unmap_dev_all() currently relies on MMIO mappings being > + * non-persistent, so don't allow PROT_NONE in MMIO range. > + */ > + if (!find_mem_range(start, ) || !find_mem_range(end, )) > + return -EINVAL; > + if (r1.start != r2.start) > + return -EINVAL; Feels like this should be in a helper to determine whether or not a range is solely covered by memory. Either way: Acked-by: Will Deacon Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
On Wed, Mar 03, 2021 at 04:45:05PM +, Marc Zyngier wrote: > It recently became apparent that the ARMv8 architecture has interesting > rules regarding attributes being used when fetching instructions > if the MMU is off at Stage-1. > > In this situation, the CPU is allowed to fetch from the PoC and > allocate into the I-cache (unless the memory is mapped with > the XN attribute at Stage-2). Digging through the ARM ARM is hard. Do we have this behaviour with FWB as well? -- Catalin ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
On Fri, Mar 05, 2021 at 09:52:12AM +, Quentin Perret wrote: > On Thursday 04 Mar 2021 at 20:00:45 (+), Will Deacon wrote: > > On Tue, Mar 02, 2021 at 02:59:56PM +, Quentin Perret wrote: > > > Once we start unmapping portions of memory from the host stage 2 (such > > > as e.g. the hypervisor memory sections, or pages that belong to > > > protected guests), we will need a way to track page ownership. And > > > given that all mappings in the host stage 2 will be identity-mapped, we > > > can use the host stage 2 page-table itself as a simplistic rmap. > > > > > > As a first step towards this, introduce a new protection attribute > > > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which > > > allows to annotate portions of the IPA space as inaccessible. For > > > simplicity, PROT_NONE mappings are created as invalid mappings with a > > > software bit set. > > > > Just as an observation, but given that they're invalid we can use any bit > > from [63:2] to indicate that it's a PROT_NONE mapping, and that way we > > can keep the real "software bits" for live mappings. > > > > But we can of course change that later when we need the bit for something > > else. > > Right, so I used this approach for consistency with the kernel's > PROT_NONE mappings: > > #define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when > !PTE_VALID */ > > And in fact now that I think about it, it might be worth re-using the > same bit in stage 2. > > But yes it would be pretty neat to use the other bits of invalid > mappings to add metadata about the pages. I could even drop the > PROT_NONE stuff straight away in favor of a more extensive mechanism for > tracking page ownership... > > Thinking about it, it should be relatively straightforward to construct > the host stage 2 with the following invariants: > > 1. everything is identity-mapped in the host stage 2; > > 2. all valid mappings imply the underlying PA range belongs to the > host; > > 3. 
bits [63:32] (say) of all invalid mappings are used to store a > unique identifier for the owner of the underlying PA range; > > 4. the host identifier is 0, such that it owns all of memory by > default at boot as its pgd is zeroed; > > And then I could replace my PROT_NONE permission stuff by an ownership > change. E.g. the hypervisor would have its own identifier, and I could > use it to mark the .hyp memory sections as owned by the hyp (which > implies inaccessible by the host). And that should scale quite easily > when we start running protected guests as we'll assign them their own > identifiers. Sharing pages between guests (or even worse, between host > and guests) is a bit trickier, but maybe that is for later. > > Thoughts? I think this sounds like a worthwhile generalisation to me, although virtio brings an immediate need for shared pages and so we'll still need a software bit for those so that we e.g. prevent the host from donating such a shared page to the hypervisor. Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
On Friday 05 Mar 2021 at 14:39:42 (+), Will Deacon wrote: > On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote: > > +/** > > + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate > > Physical > > + *Address with a leaf entry at the highest > > + *possible level. > > Not sure it's worth mentioning "highest possible level" here, as > realistically the caller still has to provide a memcache to deal with the > worst case and the structure of the page-table shouldn't matter. Right, we need to pass a range so I suppose that should be enough to say 'this tries to cover large portions of memory'. > > + * @pgt: Page-table structure initialised by kvm_pgtable_*_init(). > > + * @addr: Input address to identity-map. > > + * @prot: Permissions and attributes for the mapping. > > + * @range: Boundaries of the maximum memory region to map. > > + * @mc:Cache of pre-allocated memory from which to allocate > > page-table > > + * pages. > > + * > > + * This function attempts to install high-level identity-mappings covering > > @addr > > "high-level"? (again, I think I'd just drop this) > > > + * without overriding existing mappings with incompatible permissions or > > + * attributes. An existing table entry may be coalesced into a block > > mapping > > + * if and only if it covers @addr and all its leafs are either invalid > > and/or > > s/leafs/leaf entries/ Ack for both. > > + * have permissions and attributes strictly matching @prot. The mapping is > > + * guaranteed to be contained within the boundaries specified by @range at > > call > > + * time. If only a subset of the memory specified by @range is mapped > > (because > > + * of e.g. alignment issues or existing incompatible mappings), @range > > will be > > + * updated accordingly. > > + * > > + * Return: 0 on success, negative error code on failure. 
> > + */ > > +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr, > > + enum kvm_pgtable_prot prot, > > + struct kvm_mem_range *range, > > + void *mc); > > #endif /* __ARM64_KVM_PGTABLE_H__ */ > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > index 8aa01a9e2603..6897d771e2b2 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable > > *pgt) > > pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz); > > pgt->pgd = NULL; > > } > > + > > +struct stage2_reduce_range_data { > > + kvm_pte_t attr; > > + u64 target_addr; > > + u32 start_level; > > + struct kvm_mem_range *range; > > +}; > > + > > +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, > > u64 addr) > > +{ > > + u32 level = data->start_level; > > + > > + for (; level < KVM_PGTABLE_MAX_LEVELS; level++) { > > + u64 granule = kvm_granule_size(level); > > + u64 start = ALIGN_DOWN(data->target_addr, granule); > > + u64 end = start + granule; > > + > > + /* > > +* The pinned address is in the current range, try one level > > +* deeper. > > +*/ > > + if (start == ALIGN_DOWN(addr, granule)) > > + continue; > > + > > + /* > > +* Make sure the current range is a reduction of the existing > > +* range before updating it. 
> > +*/ > > + if (data->range->start <= start && end <= data->range->end) { > > + data->start_level = level; > > + data->range->start = start; > > + data->range->end = end; > > + return 0; > > + } > > + } > > + > > + return -EINVAL; > > +} > > + > > +#define KVM_PTE_LEAF_S2_COMPAT_MASK(KVM_PTE_LEAF_ATTR_S2_PERMS | \ > > +KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \ > > +KVM_PTE_LEAF_SW_BIT_PROT_NONE) > > + > > +static int stage2_reduce_range_walker(u64 addr, u64 end, u32 level, > > + kvm_pte_t *ptep, > > + enum kvm_pgtable_walk_flags flag, > > + void * const arg) > > +{ > > + struct stage2_reduce_range_data *data = arg; > > + kvm_pte_t attr; > > + int ret; > > + > > + if (addr < data->range->start || addr >= data->range->end) > > + return 0; > > + > > + attr = *ptep & KVM_PTE_LEAF_S2_COMPAT_MASK; > > + if (!attr || attr == data->attr) > > + return 0; > > + > > + /* > > +* An existing mapping with incompatible protection attributes is > > +* 'pinned', so reduce the range if we hit one. > > +*/ > > + ret = __stage2_reduce_range(data, addr); > > + if (ret) > > + return ret; > > + > > +
Re: [PATCH v3 31/32] KVM: arm64: Disable PMU support in protected mode
On Tue, Mar 02, 2021 at 03:00:01PM +, Quentin Perret wrote: > The host currently writes directly in EL2 per-CPU data sections from > the PMU code when running in nVHE. In preparation for unmapping the EL2 > sections from the host stage 2, disable PMU support in protected mode as > we currently do not have a use-case for it. > > Signed-off-by: Quentin Perret > --- > arch/arm64/kvm/perf.c | 3 ++- > arch/arm64/kvm/pmu.c | 8 > 2 files changed, 6 insertions(+), 5 deletions(-) Acked-by: Will Deacon Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] KVM: arm64: Add big.LITTLE architecture support for vcpu
In a big.LITTLE architecture, the application layer calls KVM_ARM_VCPU_INIT several times. Sometimes the thread runs on a big core, sometimes on a little core. The big and little cores always have different CPU IDs, but the init target is fixed, and that leads to init->target != phys_target. So check the target against all online CPUs instead of only the current one.

Signed-off-by: Yejune Deng
---
 arch/arm64/kvm/arm.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fc4c95dd2d26..f7fe09a5b23e 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -983,20 +983,37 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
 	return -EINVAL;
 }

+static void check_kvm_target_cpu(void *ret)
+{
+	*(int *)ret = kvm_target_cpu();
+}
+
+static int kvm_cpu_online_target(int init_target)
+{
+	int cpu, ret = -1;
+
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1);
+		if (ret == init_target)
+			return 0;
+	}
+
+	return -EINVAL;
+}
+
 static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			       const struct kvm_vcpu_init *init)
 {
 	unsigned int i, ret;
-	int phys_target = kvm_target_cpu();

-	if (init->target != phys_target)
+	if (kvm_cpu_online_target(init->target))
 		return -EINVAL;

 	/*
 	 * Secondary and subsequent calls to KVM_ARM_VCPU_INIT must
 	 * use the same target.
 	 */
-	if (vcpu->arch.target != -1 && vcpu->arch.target != init->target)
+	if (vcpu->arch.target != -1 && kvm_cpu_online_target(init->target))
 		return -EINVAL;

 	/* -ENOENT for unknown features, -EINVAL for invalid combinations. */
@@ -1018,7 +1035,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 		set_bit(i, vcpu->arch.features);
 	}

-	vcpu->arch.target = phys_target;
+	vcpu->arch.target = init->target;

 	/* Now we know what it is, we can reset it.
 	 */
 	ret = kvm_reset_vcpu(vcpu);
@@ -1815,11 +1832,6 @@ static int init_hyp_mode(void)
 	return err;
 }

-static void check_kvm_target_cpu(void *ret)
-{
-	*(int *)ret = kvm_target_cpu();
-}
-
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 {
 	struct kvm_vcpu *vcpu;
--
2.29.0
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 8/8] KVM: arm64: Fix range alignment when walking page tables
From: Jia He When walking the page tables at a given level, and if the start address for the range isn't aligned for that level, we propagate the misalignment on each iteration at that level. This results in the walker ignoring a number of entries (depending on the original misalignment) on each subsequent iteration. Properly aligning the address before the next iteration addresses this issue. Cc: sta...@vger.kernel.org Reported-by: Howard Zhang Acked-by: Will Deacon Signed-off-by: Jia He Fixes: b1e57de62cfb ("KVM: arm64: Add stand-alone page-table walker infrastructure") [maz: rewrite commit message] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210303024225.2591-1-justin...@arm.com --- arch/arm64/kvm/hyp/pgtable.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 4d177ce1d536..926fc07074f5 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -223,6 +223,7 @@ static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data, goto out; if (!table) { + data->addr = ALIGN_DOWN(data->addr, kvm_granule_size(level)); data->addr += kvm_granule_size(level); goto out; } -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 7/8] KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
It looks like we have broken firmware out there that wrongly advertises a GICv2 compatibility interface, despite the CPUs not being able to deal with it. To work around this, check that the CPU initialising KVM is actually able to switch to MMIO instead of system registers, and use that as a precondition to enable GICv2 compatibility in KVM. Note that the detection happens on a single CPU. If the firmware is lying *and* the CPUs are asymmetric, all hope is lost anyway. Reported-by: Shameerali Kolothum Thodi Tested-by: Shameer Kolothum Signed-off-by: Marc Zyngier --- arch/arm64/kvm/hyp/vgic-v3-sr.c | 35 +++-- arch/arm64/kvm/vgic/vgic-v3.c | 8 ++-- 2 files changed, 39 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c index 005daa0c9dd7..ee3682b9873c 100644 --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c @@ -408,11 +408,42 @@ void __vgic_v3_init_lrs(void) /* * Return the GIC CPU configuration: * - [31:0] ICH_VTR_EL2 - * - [63:32] RES0 + * - [62:32] RES0 + * - [63] MMIO (GICv2) capable */ u64 __vgic_v3_get_gic_config(void) { - return read_gicreg(ICH_VTR_EL2); + u64 val, sre = read_gicreg(ICC_SRE_EL1); + unsigned long flags = 0; + + /* +* To check whether we have a MMIO-based (GICv2 compatible) +* CPU interface, we need to disable the system register +* view. To do that safely, we have to prevent any interrupt +* from firing (which would be deadly). +* +* Note that this only makes sense on VHE, as interrupts are +* already masked for nVHE as part of the exception entry to +* EL2. +*/ + if (has_vhe()) + flags = local_daif_save(); + + write_gicreg(0, ICC_SRE_EL1); + isb(); + + val = read_gicreg(ICC_SRE_EL1); + + write_gicreg(sre, ICC_SRE_EL1); + isb(); + + if (has_vhe()) + local_daif_restore(flags); + + val = (val & ICC_SRE_EL1_SRE) ?
0 : (1ULL << 63); + val |= read_gicreg(ICH_VTR_EL2); + + return val; } u64 __vgic_v3_read_vmcr(void) diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index c3e6c3fd333b..6f530925a231 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -575,8 +575,10 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable); int vgic_v3_probe(const struct gic_kvm_info *info) { u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config); + bool has_v2; int ret; + has_v2 = ich_vtr_el2 >> 63; ich_vtr_el2 = (u32)ich_vtr_el2; /* @@ -596,13 +598,15 @@ int vgic_v3_probe(const struct gic_kvm_info *info) gicv4_enable ? "en" : "dis"); } + kvm_vgic_global_state.vcpu_base = 0; + if (!info->vcpu.start) { kvm_info("GICv3: no GICV resource entry\n"); - kvm_vgic_global_state.vcpu_base = 0; + } else if (!has_v2) { + pr_warn(FW_BUG "CPU interface incapable of MMIO access\n"); } else if (!PAGE_ALIGNED(info->vcpu.start)) { pr_warn("GICV physical address 0x%llx not page aligned\n", (unsigned long long)info->vcpu.start); - kvm_vgic_global_state.vcpu_base = 0; } else { kvm_vgic_global_state.vcpu_base = info->vcpu.start; kvm_vgic_global_state.can_emulate_gicv2 = true; -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 6/8] KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
As we are about to report a bit more information to the rest of the kernel, rename __vgic_v3_get_ich_vtr_el2() to the more explicit __vgic_v3_get_gic_config(). No functional change. Tested-by: Shameer Kolothum Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_asm.h | 4 ++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 +++--- arch/arm64/kvm/hyp/vgic-v3-sr.c| 7 ++- arch/arm64/kvm/vgic/vgic-v3.c | 4 +++- 4 files changed, 14 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 22d933e9b59e..9c0e396dd03f 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h @@ -50,7 +50,7 @@ #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_local_vmid 5 #define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff 6 #define __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs7 -#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_ich_vtr_el28 +#define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config 8 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr 9 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr 10 #define __KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs 11 @@ -192,7 +192,7 @@ extern void __kvm_timer_set_cntvoff(u64 cntvoff); extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu); -extern u64 __vgic_v3_get_ich_vtr_el2(void); +extern u64 __vgic_v3_get_gic_config(void); extern u64 __vgic_v3_read_vmcr(void); extern void __vgic_v3_write_vmcr(u32 vmcr); extern void __vgic_v3_init_lrs(void); diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c index f012f8665ecc..8f129968204e 100644 --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c @@ -67,9 +67,9 @@ static void handle___kvm_enable_ssbs(struct kvm_cpu_context *host_ctxt) write_sysreg_el2(tmp, SYS_SCTLR); } -static void handle___vgic_v3_get_ich_vtr_el2(struct kvm_cpu_context *host_ctxt) +static void handle___vgic_v3_get_gic_config(struct kvm_cpu_context *host_ctxt) { - cpu_reg(host_ctxt, 1) = __vgic_v3_get_ich_vtr_el2(); + 
cpu_reg(host_ctxt, 1) = __vgic_v3_get_gic_config(); } static void handle___vgic_v3_read_vmcr(struct kvm_cpu_context *host_ctxt) @@ -118,7 +118,7 @@ static const hcall_t host_hcall[] = { HANDLE_FUNC(__kvm_tlb_flush_local_vmid), HANDLE_FUNC(__kvm_timer_set_cntvoff), HANDLE_FUNC(__kvm_enable_ssbs), - HANDLE_FUNC(__vgic_v3_get_ich_vtr_el2), + HANDLE_FUNC(__vgic_v3_get_gic_config), HANDLE_FUNC(__vgic_v3_read_vmcr), HANDLE_FUNC(__vgic_v3_write_vmcr), HANDLE_FUNC(__vgic_v3_init_lrs), diff --git a/arch/arm64/kvm/hyp/vgic-v3-sr.c b/arch/arm64/kvm/hyp/vgic-v3-sr.c index 80406f463c28..005daa0c9dd7 100644 --- a/arch/arm64/kvm/hyp/vgic-v3-sr.c +++ b/arch/arm64/kvm/hyp/vgic-v3-sr.c @@ -405,7 +405,12 @@ void __vgic_v3_init_lrs(void) __gic_v3_set_lr(0, i); } -u64 __vgic_v3_get_ich_vtr_el2(void) +/* + * Return the GIC CPU configuration: + * - [31:0] ICH_VTR_EL2 + * - [63:32] RES0 + */ +u64 __vgic_v3_get_gic_config(void) { return read_gicreg(ICH_VTR_EL2); } diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 52915b342351..c3e6c3fd333b 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -574,9 +574,11 @@ early_param("kvm-arm.vgic_v4_enable", early_gicv4_enable); */ int vgic_v3_probe(const struct gic_kvm_info *info) { - u32 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_ich_vtr_el2); + u64 ich_vtr_el2 = kvm_call_hyp_ret(__vgic_v3_get_gic_config); int ret; + ich_vtr_el2 = (u32)ich_vtr_el2; + /* * The ListRegs field is 5 bits, but there is an architectural * maximum of 16 list registers. Just ignore bit 4... -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 5/8] KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
When running under a nesting hypervisor, it isn't guaranteed that the virtual HW will include a PMU. In which case, let's not try to access the PMU registers in the world switch, as that'd be deadly. Reported-by: Andre Przywara Signed-off-by: Marc Zyngier Reviewed-by: Alexandru Elisei Link: https://lore.kernel.org/r/20210209114844.3278746-3-...@kernel.org --- arch/arm64/kernel/image-vars.h | 3 +++ arch/arm64/kvm/hyp/include/hyp/switch.h | 9 ++--- 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h index 23f1a557bd9f..5aa9ed1e9ec6 100644 --- a/arch/arm64/kernel/image-vars.h +++ b/arch/arm64/kernel/image-vars.h @@ -101,6 +101,9 @@ KVM_NVHE_ALIAS(__stop___kvm_ex_table); /* Array containing bases of nVHE per-CPU memory regions. */ KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base); +/* PMU available static key */ +KVM_NVHE_ALIAS(kvm_arm_pmu_available); + #endif /* CONFIG_KVM */ #endif /* __ARM64_KERNEL_IMAGE_VARS_H */ diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h index 54f4860cd87c..6c1f51f25eb3 100644 --- a/arch/arm64/kvm/hyp/include/hyp/switch.h +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h @@ -90,15 +90,18 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu) * counter, which could make a PMXEVCNTR_EL0 access UNDEF at * EL1 instead of being trapped to EL2. 
*/ - write_sysreg(0, pmselr_el0); - write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0); + if (kvm_arm_support_pmu_v3()) { + write_sysreg(0, pmselr_el0); + write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0); + } write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2); } static inline void __deactivate_traps_common(void) { write_sysreg(0, hstr_el2); - write_sysreg(0, pmuserenr_el0); + if (kvm_arm_support_pmu_v3()) + write_sysreg(0, pmuserenr_el0); } static inline void ___activate_traps(struct kvm_vcpu *vcpu) -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 4/8] KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
We currently find out about the presence of a HW PMU (or the handling of that PMU by perf, which amounts to the same thing) in a fairly roundabout way, by checking the number of counters available to perf. That's good enough for now, but we will soon need to find about about that on paths where perf is out of reach (in the world switch). Instead, let's turn kvm_arm_support_pmu_v3() into a static key. Signed-off-by: Marc Zyngier Reviewed-by: Alexandru Elisei Link: https://lore.kernel.org/r/20210209114844.3278746-2-...@kernel.org --- arch/arm64/kvm/perf.c | 10 ++ arch/arm64/kvm/pmu-emul.c | 10 -- include/kvm/arm_pmu.h | 9 +++-- 3 files changed, 17 insertions(+), 12 deletions(-) diff --git a/arch/arm64/kvm/perf.c b/arch/arm64/kvm/perf.c index d45b8b9a4415..739164324afe 100644 --- a/arch/arm64/kvm/perf.c +++ b/arch/arm64/kvm/perf.c @@ -11,6 +11,8 @@ #include +DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available); + static int kvm_is_in_guest(void) { return kvm_get_running_vcpu() != NULL; @@ -48,6 +50,14 @@ static struct perf_guest_info_callbacks kvm_guest_cbs = { int kvm_perf_init(void) { + /* +* Check if HW_PERF_EVENTS are supported by checking the number of +* hardware performance counters. This could ensure the presence of +* a physical PMU and CONFIG_PERF_EVENT is selected. +*/ + if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0) + static_branch_enable(_arm_pmu_available); + return perf_register_guest_info_callbacks(_guest_cbs); } diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index e9ec08b0b070..e32c6e139a09 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -823,16 +823,6 @@ u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) return val & mask; } -bool kvm_arm_support_pmu_v3(void) -{ - /* -* Check if HW_PERF_EVENTS are supported by checking the number of -* hardware performance counters. This could ensure the presence of -* a physical PMU and CONFIG_PERF_EVENT is selected. 
-*/ - return (perf_num_counters() > 0); -} - int kvm_arm_pmu_v3_enable(struct kvm_vcpu *vcpu) { if (!kvm_vcpu_has_pmu(vcpu)) diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index 8dcb3e1477bc..6fd3cda608e4 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -13,6 +13,13 @@ #define ARMV8_PMU_CYCLE_IDX(ARMV8_PMU_MAX_COUNTERS - 1) #define ARMV8_PMU_MAX_COUNTER_PAIRS((ARMV8_PMU_MAX_COUNTERS + 1) >> 1) +DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available); + +static __always_inline bool kvm_arm_support_pmu_v3(void) +{ + return static_branch_likely(_arm_pmu_available); +} + #ifdef CONFIG_HW_PERF_EVENTS struct kvm_pmc { @@ -47,7 +54,6 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val); void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val); void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, u64 select_idx); -bool kvm_arm_support_pmu_v3(void); int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr); int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, @@ -87,7 +93,6 @@ static inline void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {} static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {} static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, u64 select_idx) {} -static inline bool kvm_arm_support_pmu_v3(void) { return false; } static inline int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) { -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 3/8] KVM: arm64: Fix nVHE hyp panic host context restore
From: Andrew Scull When panicking from the nVHE hyp and restoring the host context, x29 is expected to hold a pointer to the host context. This wasn't being done so fix it to make sure there's a valid pointer the host context being used. Rather than passing a boolean indicating whether or not the host context should be restored, instead pass the pointer to the host context. NULL is passed to indicate that no context should be restored. Fixes: a2e102e20fd6 ("KVM: arm64: nVHE: Handle hyp panics") Cc: sta...@vger.kernel.org Signed-off-by: Andrew Scull [maz: partial rewrite to fit 5.12-rc1] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210219122406.1337626-1-asc...@google.com --- arch/arm64/include/asm/kvm_hyp.h | 3 ++- arch/arm64/kvm/hyp/nvhe/host.S | 15 --- arch/arm64/kvm/hyp/nvhe/switch.c | 3 +-- 3 files changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index 385bd7dd3d39..32ae676236b6 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -102,7 +102,8 @@ bool kvm_host_psci_handler(struct kvm_cpu_context *host_ctxt); void __noreturn hyp_panic(void); #ifdef __KVM_NVHE_HYPERVISOR__ -void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par); +void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr, + u64 elr, u64 par); #endif #endif /* __ARM64_KVM_HYP_H__ */ diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S index 6585a7cbbc56..5d94584840cc 100644 --- a/arch/arm64/kvm/hyp/nvhe/host.S +++ b/arch/arm64/kvm/hyp/nvhe/host.S @@ -71,7 +71,8 @@ SYM_FUNC_START(__host_enter) SYM_FUNC_END(__host_enter) /* - * void __noreturn __hyp_do_panic(bool restore_host, u64 spsr, u64 elr, u64 par); + * void __noreturn __hyp_do_panic(struct kvm_cpu_context *host_ctxt, u64 spsr, + * u64 elr, u64 par); */ SYM_FUNC_START(__hyp_do_panic) /* Prepare and exit to the host's panic funciton. 
*/ @@ -82,9 +83,11 @@ SYM_FUNC_START(__hyp_do_panic) hyp_kimg_va lr, x6 msr elr_el2, lr - /* Set the panic format string. Use the, now free, LR as scratch. */ - ldr lr, =__hyp_panic_string - hyp_kimg_va lr, x6 + mov x29, x0 + + /* Load the format string into x0 and arguments into x1-7 */ + ldr x0, =__hyp_panic_string + hyp_kimg_va x0, x6 /* Load the format arguments into x1-7. */ mov x6, x3 @@ -94,9 +97,7 @@ SYM_FUNC_START(__hyp_do_panic) mrs x5, hpfar_el2 /* Enter the host, conditionally restoring the host context. */ - cmp x0, xzr - mov x0, lr - b.eq__host_enter_without_restoring + cbz x29, __host_enter_without_restoring b __host_enter_for_panic SYM_FUNC_END(__hyp_do_panic) diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index 59aa1045fdaf..68ab6b4d5141 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -266,7 +266,6 @@ void __noreturn hyp_panic(void) u64 spsr = read_sysreg_el2(SYS_SPSR); u64 elr = read_sysreg_el2(SYS_ELR); u64 par = read_sysreg_par(); - bool restore_host = true; struct kvm_cpu_context *host_ctxt; struct kvm_vcpu *vcpu; @@ -280,7 +279,7 @@ void __noreturn hyp_panic(void) __sysreg_restore_state_nvhe(host_ctxt); } - __hyp_do_panic(restore_host, spsr, elr, par); + __hyp_do_panic(host_ctxt, spsr, elr, par); unreachable(); } -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 2/8] KVM: arm64: Avoid corrupting vCPU context register in guest exit
From: Will Deacon Commit 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") tracks the currently running vCPU, clearing the pointer to NULL on exit from a guest. Unfortunately, the use of 'set_loaded_vcpu' clobbers x1 to point at the kvm_hyp_ctxt instead of the vCPU context, causing the subsequent RAS code to go off into the weeds when it saves the DISR assuming that the CPU context is embedded in a struct vCPU. Leave x1 alone and use x3 as a temporary register instead when clearing the vCPU on the guest exit path. Cc: Marc Zyngier Cc: Andrew Scull Cc: Fixes: 7db21530479f ("KVM: arm64: Restore hyp when panicking in guest context") Suggested-by: Quentin Perret Signed-off-by: Will Deacon Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210226181211.14542-1-w...@kernel.org --- arch/arm64/kvm/hyp/entry.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S index b0afad7a99c6..0c66a1d408fd 100644 --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -146,7 +146,7 @@ SYM_INNER_LABEL(__guest_exit, SYM_L_GLOBAL) // Now restore the hyp regs restore_callee_saved_regs x2 - set_loaded_vcpu xzr, x1, x2 + set_loaded_vcpu xzr, x2, x3 alternative_if ARM64_HAS_RAS_EXTN // If we have the RAS extensions we can consume a pending error -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 1/8] KVM: arm64: nvhe: Save the SPE context early
From: Suzuki K Poulose The nVHE KVM hyp drains and disables the SPE buffer, before entering the guest, as the EL1&0 translation regime is going to be loaded with that of the guest. But this operation is performed way too late, because : - The owning translation regime of the SPE buffer is transferred to EL2. (MDCR_EL2_E2PB == 0) - The guest Stage1 is loaded. Thus the flush could use the host EL1 virtual address, but use the EL2 translations instead of host EL1, for writing out any cached data. Fix this by moving the SPE buffer handling early enough. The restore path is doing the right thing. Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow") Cc: sta...@vger.kernel.org Cc: Christoffer Dall Cc: Marc Zyngier Cc: Will Deacon Cc: Catalin Marinas Cc: Mark Rutland Cc: Alexandru Elisei Reviewed-by: Alexandru Elisei Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20210302120345.3102874-1-suzuki.poul...@arm.com --- arch/arm64/include/asm/kvm_hyp.h | 5 + arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 ++-- arch/arm64/kvm/hyp/nvhe/switch.c | 11 ++- 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index c0450828378b..385bd7dd3d39 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -83,6 +83,11 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt); void __debug_switch_to_guest(struct kvm_vcpu *vcpu); void __debug_switch_to_host(struct kvm_vcpu *vcpu); +#ifdef __KVM_NVHE_HYPERVISOR__ +void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu); +void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu); +#endif + void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c index 91a711aa8382..f401724f12ef 100644 --- 
a/arch/arm64/kvm/hyp/nvhe/debug-sr.c +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c @@ -58,16 +58,24 @@ static void __debug_restore_spe(u64 pmscr_el1) write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1); } -void __debug_switch_to_guest(struct kvm_vcpu *vcpu) +void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu) { /* Disable and flush SPE data generation */ __debug_save_spe(>arch.host_debug_state.pmscr_el1); +} + +void __debug_switch_to_guest(struct kvm_vcpu *vcpu) +{ __debug_switch_to_guest_common(vcpu); } -void __debug_switch_to_host(struct kvm_vcpu *vcpu) +void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu) { __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1); +} + +void __debug_switch_to_host(struct kvm_vcpu *vcpu) +{ __debug_switch_to_host_common(vcpu); } diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index f3d0e9eca56c..59aa1045fdaf 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -192,6 +192,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu) pmu_switch_needed = __pmu_switch_to_guest(host_ctxt); __sysreg_save_state_nvhe(host_ctxt); + /* +* We must flush and disable the SPE buffer for nVHE, as +* the translation regime(EL1&0) is going to be loaded with +* that of the guest. And we must do this before we change the +* translation regime to EL2 (via MDCR_EL2_E2PB == 0) and +* before we load guest Stage1. +*/ + __debug_save_host_buffers_nvhe(vcpu); __adjust_pc(vcpu); @@ -234,11 +242,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu) if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) __fpsimd_save_fpexc32(vcpu); + __debug_switch_to_host(vcpu); /* * This must come after restoring the host sysregs, since a non-VHE * system may enable SPE here and make use of the TTBRs. 
*/ - __debug_switch_to_host(vcpu); + __debug_restore_host_buffers_nvhe(vcpu); if (pmu_switch_needed) __pmu_switch_to_host(host_ctxt); -- 2.29.2 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 0/8] KVM/arm64 fixes for 5.12, take #1
Hi Paolo, Here's the first batch of fixes for 5.12. We have a handful of low level world-switch regressions, a page table walker fix, more PMU tidying up, and a workaround for systems with creative firmware. This will need to go on top of the current state of mainline. Please apply, M. Andrew Scull (1): KVM: arm64: Fix nVHE hyp panic host context restore Jia He (1): KVM: arm64: Fix range alignment when walking page tables Marc Zyngier (4): KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility Suzuki K Poulose (1): KVM: arm64: nvhe: Save the SPE context early Will Deacon (1): KVM: arm64: Avoid corrupting vCPU context register in guest exit arch/arm64/include/asm/kvm_asm.h| 4 ++-- arch/arm64/include/asm/kvm_hyp.h| 8 ++- arch/arm64/kernel/image-vars.h | 3 +++ arch/arm64/kvm/hyp/entry.S | 2 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 9 +--- arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 -- arch/arm64/kvm/hyp/nvhe/host.S | 15 +++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 ++--- arch/arm64/kvm/hyp/nvhe/switch.c| 14 +--- arch/arm64/kvm/hyp/pgtable.c| 1 + arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++-- arch/arm64/kvm/perf.c | 10 + arch/arm64/kvm/pmu-emul.c | 10 - arch/arm64/kvm/vgic/vgic-v3.c | 12 +++--- include/kvm/arm_pmu.h | 9 ++-- 15 files changed, 116 insertions(+), 39 deletions(-) ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [GIT PULL] KVM/arm64 fixes for 5.12, take #1
Hi Paolo, On Fri, 05 Mar 2021 17:27:36 +, Paolo Bonzini wrote: > > On 05/03/21 17:49, Marc Zyngier wrote: > > Hi Paolo, > > > > Here's the first batch of fixes for 5.12. We have a handful of low > > level world-switch regressions, a page table walker fix, more PMU > > tidying up, and a workaround for systems with creative firmware. > > > > Note that this is based on -rc1 despite the breakage, as I didn't feel > > like holding these patches until -rc2. > > > > Please pull, > > > > M. > > > > The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8: > > > >Linux 5.12-rc1 (2021-02-28 16:05:19 -0800) > > > > are available in the Git repository at: > > > >git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git > > tags/kvmarm-fixes-5.12-1 > > > > for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9: > > > >KVM: arm64: Fix range alignment when walking page tables (2021-03-04 > > 09:54:12 +) > > Hi Marc, > > due to a severe data corruption bug in 5.12-rc1, Linus suggested not > including 5.12-rc1 in trees to avoid it eating our filesystems > unwittingly during future bisections. > > Would it be a problem for you to rebase on top of your merge window > pull request? If there are conflicts, another possibility is for you > to just send me the patch series. I will handle all the topic branch > juggling. > > This will mean rewriting kvmarm.git's history, but it does seem to be > the lesser (or the most future-proof) evil. The problem is that this is not only kvmarm, but also the Android tree, which directly pulls from the kvmarm stable branches. I guess we'll have to live with it. I'll reply to this email with the patch series. Thanks, M. -- Without deviation from the norm, progress is not possible. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [GIT PULL] KVM/arm64 fixes for 5.12, take #1
On 05/03/21 17:49, Marc Zyngier wrote: Hi Paolo, Here's the first batch of fixes for 5.12. We have a handful of low level world-switch regressions, a page table walker fix, more PMU tidying up, and a workaround for systems with creative firmware. Note that this is based on -rc1 despite the breakage, as I didn't feel like holding these patches until -rc2. Please pull, M. The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8: Linux 5.12-rc1 (2021-02-28 16:05:19 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-5.12-1 for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9: KVM: arm64: Fix range alignment when walking page tables (2021-03-04 09:54:12 +) Hi Marc, due to a severe data corruption bug in 5.12-rc1, Linus suggested not including 5.12-rc1 in trees to avoid it eating our filesystems unwittingly during future bisections. Would it be a problem for you to rebase on top of your merge window pull request? If there are conflicts, another possibility is for you to just send me the patch series. I will handle all the topic branch juggling. This will mean rewriting kvmarm.git's history, but it does seem to be the lesser (or the most future-proof) evil. 
Thanks, Paolo KVM/arm64 fixes for 5.12, take #1 - Fix SPE context save/restore on nVHE - Fix some subtle host context corruption on vcpu exit - Fix panic handling on nVHE - Prevent the hypervisor from accessing PMU registers when there is none - Workaround broken firmwares advertising bogus GICv2 compatibility - Fix Stage-2 unaligned range unmapping Andrew Scull (1): KVM: arm64: Fix nVHE hyp panic host context restore Jia He (1): KVM: arm64: Fix range alignment when walking page tables Marc Zyngier (4): KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility Suzuki K Poulose (1): KVM: arm64: nvhe: Save the SPE context early Will Deacon (1): KVM: arm64: Avoid corrupting vCPU context register in guest exit arch/arm64/include/asm/kvm_asm.h| 4 ++-- arch/arm64/include/asm/kvm_hyp.h| 8 ++- arch/arm64/kernel/image-vars.h | 3 +++ arch/arm64/kvm/hyp/entry.S | 2 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 9 +--- arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 -- arch/arm64/kvm/hyp/nvhe/host.S | 15 +++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 ++--- arch/arm64/kvm/hyp/nvhe/switch.c| 14 +--- arch/arm64/kvm/hyp/pgtable.c| 1 + arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++-- arch/arm64/kvm/perf.c | 10 + arch/arm64/kvm/pmu-emul.c | 10 - arch/arm64/kvm/vgic/vgic-v3.c | 12 +++--- include/kvm/arm_pmu.h | 9 ++-- 15 files changed, 116 insertions(+), 39 deletions(-) ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Disable LTO in hyp
On 2021-03-05 16:55, Sami Tolvanen wrote: On Fri, Mar 5, 2021 at 6:22 AM Ard Biesheuvel wrote: On Fri, 5 Mar 2021 at 12:38, Marc Zyngier wrote: > > On Fri, 05 Mar 2021 02:38:17 +, > Sami Tolvanen wrote: > > > > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen wrote: > > > > > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote: > > > > > > > > On Thu, 04 Mar 2021 21:25:41 +, > > > > Sami Tolvanen wrote: > > [...] > > > > > > I assume hyp_panic() ends up being placed too far from __guest_enter() > > > > > when the kernel is large enough. Possibly something to do with LLVM > > > > > always splitting functions into separate sections with LTO. I'm not > > > > > sure why the linker cannot shuffle things around to make everyone > > > > > happy in this case, but I confirmed that this patch also fixes the > > > > > build issue for me: > > > > > > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > index af8e940d0f03..128197b7c794 100644 > > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c > > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par) > > > > > } > > > > > NOKPROBE_SYMBOL(__hyp_call_panic); > > > > > > > > > > -void __noreturn hyp_panic(void) > > > > > +void __noreturn hyp_panic(void) __section(".text") > > > > > { > > > > > u64 spsr = read_sysreg_el2(SYS_SPSR); > > > > > u64 elr = read_sysreg_el2(SYS_ELR); > > > > > > > > > > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic > > > > be in the .text section already? > > > > > > It's not quite black magic. LLVM essentially flips on > > > -ffunction-sections with LTO and therefore, hyp_panic() will be in > > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text. > > > Everything ends up in .text when we link vmlinux, of course. 
> > > > > > $ readelf --sections vmlinux.o | grep hyp_panic > > > [3936] .text.hyp_panic PROGBITS 004b56e4 > > > > Note that disabling LTO here has essentially the same effect as using > > __section(".text"). It stops the compiler from splitting these > > functions into .text.* sections and makes it less likely that > > hyp_panic() ends up too far away from __guest_enter(). > > > > If neither of these workarounds sound appealing, I suppose we could > > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts? > > That would be an actual fix instead of a workaround, as it would > remove existing assumptions about the relative locations of the two > objects. I guess you need to fix both instances with something such > as: > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S > index b0afad7a99c6..a43e1f7ee354 100644 > --- a/arch/arm64/kvm/hyp/entry.S > +++ b/arch/arm64/kvm/hyp/entry.S > @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) > > // If the hyp context is loaded, go straight to hyp_panic > get_loaded_vcpu x0, x1 > - cbz x0, hyp_panic > - > + cbnzx0, 1f > + adr_l x0, hyp_panic > + br x0 > +1: Agree with replacing the conditional branches that refer to external symbols: the compiler never emits those, for the reason we are seeing here, i.e., the range is simply insufficient. But let's just use 'b hyp_panic' instead, no? Alright, this seems to work for me: diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S index b0afad7a99c6..c62265951467 100644 --- a/arch/arm64/kvm/hyp/entry.S +++ b/arch/arm64/kvm/hyp/entry.S @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) // If the hyp context is loaded, go straight to hyp_panic get_loaded_vcpu x0, x1 - cbz x0, hyp_panic + cbnzx0, 1f + b hyp_panic +1: // The hyp context is saved so make sure it is restored to allow // hyp_panic to run at hyp and, subsequently, panic to run in the host. 
// This makes use of __guest_exit to avoid duplication but sets the @@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) // current state is saved to the guest context but it will only be // accurate if the guest had been completely restored. adr_this_cpu x0, kvm_hyp_ctxt, x1 - adr x1, hyp_panic + adr_l x1, hyp_panic str x1, [x0, #CPU_XREG_OFFSET(30)] get_vcpu_ptrx1, x0 But when I say work, I mean this fixes the allmodconfig build with LTO, and my kernel boots at EL2. I don't actually have a way to properly test KVM on arm64. If nobody sees obvious issues here, I can send a proper patch a bit later. Please do, it is "obviously correct"! ;-) Thanks, M. -- Jazz is not dead. It just smells funny... ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
On Fri, Mar 05, 2021 at 03:03:36PM +, Quentin Perret wrote: > On Friday 05 Mar 2021 at 14:39:42 (+), Will Deacon wrote: > > On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote: > > > + /* Reduce the kvm_mem_range to a granule size */ > > > + ret = __stage2_reduce_range(, range->end); > > > + if (ret) > > > + return ret; > > > + > > > + /* Walk the range to check permissions and reduce further if needed */ > > > + do { > > > + ret = kvm_pgtable_walk(pgt, range->start, range->end, ); > > > > (we spent some time debugging an issue here and you spotted that you're > > passing range->end instead of the size ;) > > Yep, I have the fix applied locally, and ready to fly in v4 :) > > > > + } while (ret == -EAGAIN); > > > > I'm a bit nervous about this loop -- what guarantees forward progress here? > > Can we return to the host after a few tries instead? > > -EAGAIN only happens when we've been able to successfully reduce the > range to a potentially valid granule size. That can't happen infinitely. > > We're guaranteed to fail when trying to reduce the range to a > granularity smaller than PAGE_SIZE (the -EINVAL case of > __stage2_reduce_range), which is indicative of a host memory abort in a > page it should not access (because marked PROT_NONE for instance). Can you compute an upper bound on the number of iterations based on the number of page-table levels then? I'm just conscious that all of this is effectively running with irqs disabled, and so being able to reason about the amount of work we're going to do makes me much more comfortable. Will ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[GIT PULL] KVM/arm64 fixes for 5.12, take #1
Hi Paolo, Here's the first batch of fixes for 5.12. We have a handful of low level world-switch regressions, a page table walker fix, more PMU tidying up, and a workaround for systems with creative firmware. Note that this is based on -rc1 despite the breakage, as I didn't feel like holding these patches until -rc2. Please pull, M. The following changes since commit fe07bfda2fb9cdef8a4d4008a409bb02f35f1bd8: Linux 5.12-rc1 (2021-02-28 16:05:19 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-5.12-1 for you to fetch changes up to e85583b3f1fe62c9b371a3100c1c91af94005ca9: KVM: arm64: Fix range alignment when walking page tables (2021-03-04 09:54:12 +) KVM/arm64 fixes for 5.12, take #1 - Fix SPE context save/restore on nVHE - Fix some subtle host context corruption on vcpu exit - Fix panic handling on nVHE - Prevent the hypervisor from accessing PMU registers when there is none - Workaround broken firmwares advertising bogus GICv2 compatibility - Fix Stage-2 unaligned range unmapping Andrew Scull (1): KVM: arm64: Fix nVHE hyp panic host context restore Jia He (1): KVM: arm64: Fix range alignment when walking page tables Marc Zyngier (4): KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config() KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility Suzuki K Poulose (1): KVM: arm64: nvhe: Save the SPE context early Will Deacon (1): KVM: arm64: Avoid corrupting vCPU context register in guest exit arch/arm64/include/asm/kvm_asm.h| 4 ++-- arch/arm64/include/asm/kvm_hyp.h| 8 ++- arch/arm64/kernel/image-vars.h | 3 +++ arch/arm64/kvm/hyp/entry.S | 2 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 9 +--- arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 -- arch/arm64/kvm/hyp/nvhe/host.S | 15 +++-- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 6 
++--- arch/arm64/kvm/hyp/nvhe/switch.c| 14 +--- arch/arm64/kvm/hyp/pgtable.c| 1 + arch/arm64/kvm/hyp/vgic-v3-sr.c | 40 +++-- arch/arm64/kvm/perf.c | 10 + arch/arm64/kvm/pmu-emul.c | 10 - arch/arm64/kvm/vgic/vgic-v3.c | 12 +++--- include/kvm/arm_pmu.h | 9 ++-- 15 files changed, 116 insertions(+), 39 deletions(-)
Re: [PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values
On Fri, Mar 05, 2021 at 08:06:09PM +0530, Anshuman Khandual wrote: > From: James Morse > > As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1 > might contain a range of values to describe supported translation granules > (4K and 16K pages sizes in particular) instead of just enabled or disabled > values. This changes __enable_mmu() function to handle complete acceptable > range of values (depending on whether the field is signed or unsigned) now > represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here, > also fix similar situations in EFI stub and KVM as well. > > Cc: Catalin Marinas > Cc: Will Deacon > Cc: Marc Zyngier > Cc: James Morse > Cc: Suzuki K Poulose > Cc: Ard Biesheuvel > Cc: Mark Rutland > Cc: linux-arm-ker...@lists.infradead.org > Cc: kvmarm@lists.cs.columbia.edu > Cc: linux-...@vger.kernel.org > Cc: linux-ker...@vger.kernel.org > Signed-off-by: James Morse > Signed-off-by: Anshuman Khandual > --- > arch/arm64/include/asm/sysreg.h | 20 ++-- > arch/arm64/kernel/head.S | 6 -- > arch/arm64/kvm/reset.c| 23 --- > drivers/firmware/efi/libstub/arm64-stub.c | 2 +- > 4 files changed, 31 insertions(+), 20 deletions(-) > > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index dfd4edb..d4a5fca9 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -796,6 +796,11 @@ > #define ID_AA64MMFR0_PARANGE_48 0x5 > #define ID_AA64MMFR0_PARANGE_52 0x6 > > +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0 > +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE 0x1 > +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2 > +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7 The TGRAN2 fields don't quite follow the usual ID scheme rules, so how do we determine the max value? Does the ARM ARM say anything in particular about them, like we do for some of the PMU ID fields? Otherwise, this patch looks correct to me. Thanks, Mark.
> + > #ifdef CONFIG_ARM64_PA_BITS_52 > #define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52 > #else > @@ -961,14 +966,17 @@ > #define ID_PFR1_PROGMOD_SHIFT0 > > #if defined(CONFIG_ARM64_4K_PAGES) > -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT > -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN4_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7 > #elif defined(CONFIG_ARM64_16K_PAGES) > -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT > -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN16_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN16_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0xF > #elif defined(CONFIG_ARM64_64K_PAGES) > -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT > -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN64_SUPPORTED > +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7 > #endif > > #define MVFR2_FPMISC_SHIFT 4 > diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S > index 66b0e0b..8b469f1 100644 > --- a/arch/arm64/kernel/head.S > +++ b/arch/arm64/kernel/head.S > @@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow) > SYM_FUNC_START(__enable_mmu) > mrs x2, ID_AA64MMFR0_EL1 > ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4 > - cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED > - b.ne__no_granule_support > + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN > + b.lt__no_granule_support > + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX > + b.gt__no_granule_support > update_early_cpu_boot_status 0, x2, x3 > adrpx2, idmap_pg_dir > phys_to_ttbr x1, x1 > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 47f3f03..fe72bfb 
100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void) > > int kvm_set_ipa_limit(void) > { > - unsigned int parange, tgran_2; > + unsigned int parange, tgran_2_shift, tgran_2; > u64 mmfr0; > > mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); > @@ -300,27 +300,28 @@ int kvm_set_ipa_limit(void) > switch (PAGE_SIZE) { > default: > case SZ_4K: > - tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT; > + tgran_2_shift = ID_AA64MMFR0_TGRAN4_2_SHIFT; > break; > case SZ_16K: > - tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT; > +
Re: [PATCH v3 28/32] KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
On Tue, Mar 02, 2021 at 02:59:58PM +, Quentin Perret wrote: > Add a new map function to the KVM page-table library that allows to > greedily create block identity-mappings. This will be useful to create > lazily the host stage 2 page-table as it will own most of memory and > will always be identity mapped. > > The new helper function creates the mapping in 2 steps: it first walks > the page-table to compute the largest possible granule that can be used > to idmap a given address without overriding existing incompatible > mappings; and then creates a mapping accordingly. > > Signed-off-by: Quentin Perret > --- > arch/arm64/include/asm/kvm_pgtable.h | 37 + > arch/arm64/kvm/hyp/pgtable.c | 119 +++ > 2 files changed, 156 insertions(+) > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h > b/arch/arm64/include/asm/kvm_pgtable.h > index c9f6ed76e0ad..e51dcce69a5e 100644 > --- a/arch/arm64/include/asm/kvm_pgtable.h > +++ b/arch/arm64/include/asm/kvm_pgtable.h > @@ -96,6 +96,16 @@ enum kvm_pgtable_prot { > #define PAGE_HYP_RO (KVM_PGTABLE_PROT_R) > #define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE) > > +/** > + * struct kvm_mem_range - Range of Intermediate Physical Addresses > + * @start: Start of the range. > + * @end: End of the range. > + */ > +struct kvm_mem_range { > + u64 start; > + u64 end; > +}; > + > /** > * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table > walk. > * @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid > @@ -379,4 +389,31 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, > u64 addr, u64 size); > int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size, >struct kvm_pgtable_walker *walker); > > +/** > + * kvm_pgtable_stage2_idmap_greedy() - Identity-map an Intermediate Physical > + * Address with a leaf entry at the highest > + * possible level. 
Not sure it's worth mentioning "highest possible level" here, as realistically the caller still has to provide a memcache to deal with the worst case and the structure of the page-table shouldn't matter. > + * @pgt: Page-table structure initialised by kvm_pgtable_*_init(). > + * @addr:Input address to identity-map. > + * @prot:Permissions and attributes for the mapping. > + * @range: Boundaries of the maximum memory region to map. > + * @mc: Cache of pre-allocated memory from which to allocate > page-table > + * pages. > + * > + * This function attempts to install high-level identity-mappings covering > @addr "high-level"? (again, I think I'd just drop this) > + * without overriding existing mappings with incompatible permissions or > + * attributes. An existing table entry may be coalesced into a block mapping > + * if and only if it covers @addr and all its leafs are either invalid and/or s/leafs/leaf entries/ > + * have permissions and attributes strictly matching @prot. The mapping is > + * guaranteed to be contained within the boundaries specified by @range at > call > + * time. If only a subset of the memory specified by @range is mapped > (because > + * of e.g. alignment issues or existing incompatible mappings), @range will > be > + * updated accordingly. > + * > + * Return: 0 on success, negative error code on failure. 
> + */ > +int kvm_pgtable_stage2_idmap_greedy(struct kvm_pgtable *pgt, u64 addr, > + enum kvm_pgtable_prot prot, > + struct kvm_mem_range *range, > + void *mc); > #endif /* __ARM64_KVM_PGTABLE_H__ */ > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index 8aa01a9e2603..6897d771e2b2 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -987,3 +987,122 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt) > pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz); > pgt->pgd = NULL; > } > + > +struct stage2_reduce_range_data { > + kvm_pte_t attr; > + u64 target_addr; > + u32 start_level; > + struct kvm_mem_range *range; > +}; > + > +static int __stage2_reduce_range(struct stage2_reduce_range_data *data, u64 > addr) > +{ > + u32 level = data->start_level; > + > + for (; level < KVM_PGTABLE_MAX_LEVELS; level++) { > + u64 granule = kvm_granule_size(level); > + u64 start = ALIGN_DOWN(data->target_addr, granule); > + u64 end = start + granule; > + > + /* > + * The pinned address is in the current range, try one level > + * deeper. > + */ > + if (start == ALIGN_DOWN(addr, granule)) > + continue; > + > + /* > + * Make sure the current range is a reduction of the existing > + * range
[PATCH] arm64/mm: Fix __enable_mmu() for new TGRAN range values
From: James Morse As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1 might contain a range of values to describe supported translation granules (4K and 16K pages sizes in particular) instead of just enabled or disabled values. This changes __enable_mmu() function to handle complete acceptable range of values (depending on whether the field is signed or unsigned) now represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here, also fix similar situations in EFI stub and KVM as well. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: James Morse Cc: Suzuki K Poulose Cc: Ard Biesheuvel Cc: Mark Rutland Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Signed-off-by: James Morse Signed-off-by: Anshuman Khandual --- arch/arm64/include/asm/sysreg.h | 20 ++-- arch/arm64/kernel/head.S | 6 -- arch/arm64/kvm/reset.c| 23 --- drivers/firmware/efi/libstub/arm64-stub.c | 2 +- 4 files changed, 31 insertions(+), 20 deletions(-) diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index dfd4edb..d4a5fca9 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -796,6 +796,11 @@ #define ID_AA64MMFR0_PARANGE_480x5 #define ID_AA64MMFR0_PARANGE_520x6 +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0 +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE0x1 +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2 +#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX 0x7 + #ifdef CONFIG_ARM64_PA_BITS_52 #define ID_AA64MMFR0_PARANGE_MAX ID_AA64MMFR0_PARANGE_52 #else @@ -961,14 +966,17 @@ #define ID_PFR1_PROGMOD_SHIFT 0 #if defined(CONFIG_ARM64_4K_PAGES) -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN4_SUPPORTED +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN4_SHIFT +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN4_SUPPORTED +#define 
ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7 #elif defined(CONFIG_ARM64_16K_PAGES) -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN16_SUPPORTED +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN16_SHIFT +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN16_SUPPORTED +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0xF #elif defined(CONFIG_ARM64_64K_PAGES) -#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT -#define ID_AA64MMFR0_TGRAN_SUPPORTED ID_AA64MMFR0_TGRAN64_SUPPORTED +#define ID_AA64MMFR0_TGRAN_SHIFT ID_AA64MMFR0_TGRAN64_SHIFT +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN ID_AA64MMFR0_TGRAN64_SUPPORTED +#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX 0x7 #endif #define MVFR2_FPMISC_SHIFT 4 diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S index 66b0e0b..8b469f1 100644 --- a/arch/arm64/kernel/head.S +++ b/arch/arm64/kernel/head.S @@ -655,8 +655,10 @@ SYM_FUNC_END(__secondary_too_slow) SYM_FUNC_START(__enable_mmu) mrs x2, ID_AA64MMFR0_EL1 ubfxx2, x2, #ID_AA64MMFR0_TGRAN_SHIFT, 4 - cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED - b.ne__no_granule_support + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MIN + b.lt__no_granule_support + cmp x2, #ID_AA64MMFR0_TGRAN_SUPPORTED_MAX + b.gt__no_granule_support update_early_cpu_boot_status 0, x2, x3 adrpx2, idmap_pg_dir phys_to_ttbr x1, x1 diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 47f3f03..fe72bfb 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -286,7 +286,7 @@ u32 get_kvm_ipa_limit(void) int kvm_set_ipa_limit(void) { - unsigned int parange, tgran_2; + unsigned int parange, tgran_2_shift, tgran_2; u64 mmfr0; mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); @@ -300,27 +300,28 @@ int kvm_set_ipa_limit(void) switch (PAGE_SIZE) { default: case SZ_4K: - tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT; + tgran_2_shift = ID_AA64MMFR0_TGRAN4_2_SHIFT; break; case SZ_16K: - tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT; + tgran_2_shift = 
ID_AA64MMFR0_TGRAN16_2_SHIFT; break; case SZ_64K: - tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT; + tgran_2_shift = ID_AA64MMFR0_TGRAN64_2_SHIFT; break; } - switch (cpuid_feature_extract_unsigned_field(mmfr0, tgran_2)) { - default: - case 1: + tgran_2 = cpuid_feature_extract_unsigned_field(mmfr0, tgran_2_shift); + if (tgran_2 ==
Re: [PATCH] KVM: arm64: Disable LTO in hyp
On Fri, 5 Mar 2021 at 12:38, Marc Zyngier wrote: > > On Fri, 05 Mar 2021 02:38:17 +, > Sami Tolvanen wrote: > > > > On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen > > wrote: > > > > > > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote: > > > > > > > > On Thu, 04 Mar 2021 21:25:41 +, > > > > Sami Tolvanen wrote: > > [...] > > > > > > I assume hyp_panic() ends up being placed too far from __guest_enter() > > > > > when the kernel is large enough. Possibly something to do with LLVM > > > > > always splitting functions into separate sections with LTO. I'm not > > > > > sure why the linker cannot shuffle things around to make everyone > > > > > happy in this case, but I confirmed that this patch also fixes the > > > > > build issue for me: > > > > > > > > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c > > > > > b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > index af8e940d0f03..128197b7c794 100644 > > > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c > > > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c > > > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, > > > > > u64 par) > > > > > } > > > > > NOKPROBE_SYMBOL(__hyp_call_panic); > > > > > > > > > > -void __noreturn hyp_panic(void) > > > > > +void __noreturn hyp_panic(void) __section(".text") > > > > > { > > > > > u64 spsr = read_sysreg_el2(SYS_SPSR); > > > > > u64 elr = read_sysreg_el2(SYS_ELR); > > > > > > > > > > > > > We're getting into black-magic territory here. Why wouldn't hyp_panic > > > > be in the .text section already? > > > > > > It's not quite black magic. LLVM essentially flips on > > > -ffunction-sections with LTO and therefore, hyp_panic() will be in > > > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text. > > > Everything ends up in .text when we link vmlinux, of course. 
> > > > > > $ readelf --sections vmlinux.o | grep hyp_panic > > > [3936] .text.hyp_panic PROGBITS 004b56e4 > > > > Note that disabling LTO here has essentially the same effect as using > > __section(".text"). It stops the compiler from splitting these > > functions into .text.* sections and makes it less likely that > > hyp_panic() ends up too far away from __guest_enter(). > > > > If neither of these workarounds sound appealing, I suppose we could > > alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts? > > That would be an actual fix instead of a workaround, as it would > remove existing assumptions about the relative locations of the two > objects. I guess you need to fix both instances with something such > as: > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S > index b0afad7a99c6..a43e1f7ee354 100644 > --- a/arch/arm64/kvm/hyp/entry.S > +++ b/arch/arm64/kvm/hyp/entry.S > @@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) > > // If the hyp context is loaded, go straight to hyp_panic > get_loaded_vcpu x0, x1 > - cbz x0, hyp_panic > - > + cbnzx0, 1f > + adr_l x0, hyp_panic > + br x0 > +1: Agree with replacing the conditional branches that refer to external symbols: the compiler never emits those, for the reason we are seeing here, i.e., the range is simply insufficient. But let's just use 'b hyp_panic' instead, no? > // The hyp context is saved so make sure it is restored to allow > // hyp_panic to run at hyp and, subsequently, panic to run in the > host. > // This makes use of __guest_exit to avoid duplication but sets the > @@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL) > // current state is saved to the guest context but it will only be > // accurate if the guest had been completely restored. > adr_this_cpu x0, kvm_hyp_ctxt, x1 > - adr x1, hyp_panic > + adr_l x1, hyp_panic > str x1, [x0, #CPU_XREG_OFFSET(30)] > > get_vcpu_ptrx1, x0 > > which is completely untested. 
I wouldn't be surprised if there were > more of these somewhere. > A quick grep gives me $ objdump -r vmlinux.o |grep BR19 0005b6e0 R_AARCH64_CONDBR19 hyp_panic 00418e08 R_AARCH64_CONDBR19 __memcpy 00418e14 R_AARCH64_CONDBR19 __memcpy 3818 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3898 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3918 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3998 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3a18 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3a98 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3b18 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3b98 R_AARCH64_CONDBR19 __kvm_nvhe___guest_exit_panic 3c10 R_AARCH64_CONDBR19 __kvm_nvhe___host_exit 3c1c R_AARCH64_CONDBR19 __kvm_nvhe___host_exit 64f0 R_AARCH64_CONDBR19 __kvm_nvhe_hyp_panic 078c
Re: [PATCH v3 26/32] KVM: arm64: Introduce PROT_NONE mappings for stage 2
On Thursday 04 Mar 2021 at 20:00:45 (+), Will Deacon wrote: > On Tue, Mar 02, 2021 at 02:59:56PM +, Quentin Perret wrote: > > Once we start unmapping portions of memory from the host stage 2 (such > > as e.g. the hypervisor memory sections, or pages that belong to > > protected guests), we will need a way to track page ownership. And > > given that all mappings in the host stage 2 will be identity-mapped, we > > can use the host stage 2 page-table itself as a simplistic rmap. > > > > As a first step towards this, introduce a new protection attribute > > in the stage 2 page table code, called KVM_PGTABLE_PROT_NONE, which > > allows to annotate portions of the IPA space as inaccessible. For > > simplicity, PROT_NONE mappings are created as invalid mappings with a > > software bit set. > > Just as an observation, but given that they're invalid we can use any bit > from [63:2] to indicate that it's a PROT_NONE mapping, and that way we > can keep the real "software bits" for live mappings. > > But we can of course change that later when we need the bit for something > else. Right, so I used this approach for consistency with the kernel's PROT_NONE mappings: #define PTE_PROT_NONE (_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */ And in fact now that I think about it, it might be worth re-using the same bit in stage 2. But yes it would be pretty neat to use the other bits of invalid mappings to add metadata about the pages. I could even drop the PROT_NONE stuff straight away in favor of a more extensive mechanism for tracking page ownership... Thinking about it, it should be relatively straightforward to construct the host stage 2 with the following invariants: 1. everything is identity-mapped in the host stage 2; 2. all valid mappings imply the underlying PA range belongs to the host; 3. bits [63:32] (say) of all invalid mappings are used to store a unique identifier for the owner of the underlying PA range; 4. 
the host identifier is 0, such that it owns all of memory by default at boot as its pgd is zeroed; And then I could replace my PROT_NONE permission stuff by an ownership change. E.g. the hypervisor would have its own identifier, and I could use it to mark the .hyp memory sections as owned by the hyp (which implies inaccessible by the host). And that should scale quite easily when we start running protected guests as we'll assign them their own identifiers. Sharing pages between guests (or even worse, between host and guests) is a bit trickier, but maybe that is for later. Thoughts? > > > > Signed-off-by: Quentin Perret > > --- > > arch/arm64/include/asm/kvm_pgtable.h | 2 ++ > > arch/arm64/kvm/hyp/pgtable.c | 26 -- > > 2 files changed, 26 insertions(+), 2 deletions(-) > > > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h > > b/arch/arm64/include/asm/kvm_pgtable.h > > index 9935dbae2cc1..c9f6ed76e0ad 100644 > > --- a/arch/arm64/include/asm/kvm_pgtable.h > > +++ b/arch/arm64/include/asm/kvm_pgtable.h > > @@ -80,6 +80,7 @@ struct kvm_pgtable { > > * @KVM_PGTABLE_PROT_W:Write permission. > > * @KVM_PGTABLE_PROT_R:Read permission. > > * @KVM_PGTABLE_PROT_DEVICE: Device attributes. > > + * @KVM_PGTABLE_PROT_NONE: No permission. > > */ > > enum kvm_pgtable_prot { > > KVM_PGTABLE_PROT_X = BIT(0), > > @@ -87,6 +88,7 @@ enum kvm_pgtable_prot { > > KVM_PGTABLE_PROT_R = BIT(2), > > > > KVM_PGTABLE_PROT_DEVICE = BIT(3), > > + KVM_PGTABLE_PROT_NONE = BIT(4), > > Why do we need an extra entry here? Couldn't we just create PROT_NONE > entries when none of R,W or X are set? The kernel has an explicit PAGE_NONE permission, so I followed the same idea, but that could work as well. Now, as per the above discussion that may not be relevant if we implement the page ownership thing. 
> > }; > > > > #define PAGE_HYP (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W) > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > index bdd6e3d4eeb6..8e7059fcfd40 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -48,6 +48,8 @@ > > KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \ > > KVM_PTE_LEAF_ATTR_HI_S2_XN) > > > > +#define KVM_PTE_LEAF_SW_BIT_PROT_NONE BIT(55) > > + > > struct kvm_pgtable_walk_data { > > struct kvm_pgtable *pgt; > > struct kvm_pgtable_walker *walker; > > @@ -120,6 +122,16 @@ static bool kvm_pte_valid(kvm_pte_t pte) > > return pte & KVM_PTE_VALID; > > } > > > > +static bool kvm_pte_prot_none(kvm_pte_t pte) > > +{ > > + return pte & KVM_PTE_LEAF_SW_BIT_PROT_NONE; > > +} > > I think it would be a good idea to check !kvm_pte_valid() in here too, > since it doesn't make sense to report true for valid (or table)
Re: [PATCH v3 27/32] KVM: arm64: Refactor stage2_map_set_prot_attr()
On Thursday 04 Mar 2021 at 20:03:36 (+), Will Deacon wrote: > On Tue, Mar 02, 2021 at 02:59:57PM +, Quentin Perret wrote: > > In order to ease its re-use in other code paths, refactor > > stage2_map_set_prot_attr() to not depend on a stage2_map_data struct. > > No functional change intended. > > > > Signed-off-by: Quentin Perret > > --- > > arch/arm64/kvm/hyp/pgtable.c | 19 --- > > 1 file changed, 8 insertions(+), 11 deletions(-) > > > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > > index 8e7059fcfd40..8aa01a9e2603 100644 > > --- a/arch/arm64/kvm/hyp/pgtable.c > > +++ b/arch/arm64/kvm/hyp/pgtable.c > > @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift) > > return vtcr; > > } > > > > -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot, > > - struct stage2_map_data *data) > > +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot) > > { > > bool device = prot & KVM_PGTABLE_PROT_DEVICE; > > kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) : > > @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum > > kvm_pgtable_prot prot, > > > > if (prot & KVM_PGTABLE_PROT_NONE) { > > if (prot != KVM_PGTABLE_PROT_NONE) > > - return -EINVAL; > > + return 0; > > Hmm, does the architecture actually say that having all these attributes > as 0 is illegal? Hmm, that's a good point, that might not be the case. I assumed we would have no use for this, but there we can easily avoid the restriction so... > If not, I think it would be better to keep the int return > code and replace the 'data' parameter with a pointer to a kvm_pte_t. > > Does that work? I think so yes, I'll fix it up. Cheers, Quentin
Re: [PATCH v3 24/32] KVM: arm64: Reserve memory for host stage 2
On Thursday 04 Mar 2021 at 19:49:53 (+), Will Deacon wrote: > On Tue, Mar 02, 2021 at 02:59:54PM +, Quentin Perret wrote: > > Extend the memory pool allocated for the hypervisor to include enough > > pages to map all of memory at page granularity for the host stage 2. > > While at it, also reserve some memory for device mappings. > > > > Signed-off-by: Quentin Perret > > --- > > arch/arm64/kvm/hyp/include/nvhe/mm.h | 23 ++- > > arch/arm64/kvm/hyp/nvhe/setup.c | 12 > > arch/arm64/kvm/hyp/reserved_mem.c| 2 ++ > > 3 files changed, 36 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h > > b/arch/arm64/kvm/hyp/include/nvhe/mm.h > > index ac0f7fcffd08..411a35db949c 100644 > > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h > > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h > > @@ -53,7 +53,7 @@ static inline unsigned long > > __hyp_pgtable_max_pages(unsigned long nr_pages) > > return total; > > } > > > > -static inline unsigned long hyp_s1_pgtable_pages(void) > > +static inline unsigned long __hyp_pgtable_total_pages(void) > > { > > unsigned long res = 0, i; > > > > @@ -63,9 +63,30 @@ static inline unsigned long hyp_s1_pgtable_pages(void) > > res += __hyp_pgtable_max_pages(reg->size >> PAGE_SHIFT); > > } > > > > + return res; > > +} > > + > > +static inline unsigned long hyp_s1_pgtable_pages(void) > > +{ > > + unsigned long res; > > + > > + res = __hyp_pgtable_total_pages(); > > + > > /* Allow 1 GiB for private mappings */ > > res += __hyp_pgtable_max_pages(SZ_1G >> PAGE_SHIFT); > > > > return res; > > } > > + > > +static inline unsigned long host_s2_mem_pgtable_pages(void) > > +{ > > + return __hyp_pgtable_total_pages() + 16; > > Is this 16 due to the possibility of a concatenated pgd? Yes it is, to be sure we have a safe upper-bound. > If so, please add a comment to that effect. Will do, thanks. Quentin
Re: [PATCH v3 16/32] KVM: arm64: Elevate hypervisor mappings creation at EL2
On Thursday 04 Mar 2021 at 19:25:41 (+), Will Deacon wrote: > > +static int do_pkvm_init(u32 hyp_va_bits) > > +{ > > + void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base); > > + int ret; > > + > > + preempt_disable(); > > + hyp_install_host_vector(); > > It's a shame we need this both here _and_ on the reinit path, but it looks > like it's necessary. Right and I want this before the KVM vectors are installed on secondary CPUs, to make sure they get the new pgtable from the start. Otherwise I'd need to do the same dance on all of them to go a switch TTBR0_EL2 and such. > > + ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size, > > + num_possible_cpus(), kern_hyp_va(per_cpu_base), > > + hyp_va_bits); > > + preempt_enable(); > > + > > + return ret; > > +} > > [...] > > > /** > > * Inits Hyp-mode on all online CPUs > > */ > > static int init_hyp_mode(void) > > { > > + u32 hyp_va_bits; > > int cpu; > > - int err = 0; > > + int err = -ENOMEM; > > + > > + /* > > +* The protected Hyp-mode cannot be initialized if the memory pool > > +* allocation has failed. > > +*/ > > + if (is_protected_kvm_enabled() && !hyp_mem_base) > > + return err; > > This skips the error message you get on the out_err path. Ack, I'll fix. > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index 4d41d7838d53..9d331bf262d2 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -221,15 +221,39 @@ void free_hyp_pgds(void) > > if (hyp_pgtable) { > > kvm_pgtable_hyp_destroy(hyp_pgtable); > > kfree(hyp_pgtable); > > + hyp_pgtable = NULL; > > } > > mutex_unlock(_hyp_pgd_mutex); > > } > > > > +static bool kvm_host_owns_hyp_mappings(void) > > +{ > > + if (static_branch_likely(_protected_mode_initialized)) > > + return false; > > + > > + /* > > +* This can happen at boot time when __create_hyp_mappings() is called > > +* after the hyp protection has been enabled, but the static key has > > +* not been flipped yet. 
> > +*/ > + if (!hyp_pgtable && is_protected_kvm_enabled()) > > + return false; > > + > > + WARN_ON(!hyp_pgtable); > > + > > + return true; > > return !(WARN_ON(!hyp_pgtable) && is_protected_kvm_enabled()); Wouldn't this WARN when I have !hyp_pgtable && is_protected_kvm_enabled() but the static key is still off (which can happen, sadly, as per the comment above)? Thanks, Quentin
Re: [PATCH] KVM: arm64: Add big.LITTLE architecture support for vcpu
On Fri, 05 Mar 2021 10:34:25 +, Yejune Deng wrote: > > In big.LITTLE architecture,the application layer calls KVM_ARM_VCPU_INIT > several times.Sometimes the thread will run on the big core, sometimes > will run on the little core.The big core and the little core has always > different cpuid, but the init target is fixed. and that leads to > init->target != phys_target. So modify phys target from the current core > to all cpu online. This is done on purpose, and it is userspace's responsibility to pin its vcpu threads to a certain type of CPUs if it cares at all. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: arm64: Disable LTO in hyp
On Fri, 05 Mar 2021 02:38:17 +,
Sami Tolvanen wrote:
>
> On Thu, Mar 4, 2021 at 2:34 PM Sami Tolvanen wrote:
> >
> > On Thu, Mar 4, 2021 at 2:17 PM Marc Zyngier wrote:
> > >
> > > On Thu, 04 Mar 2021 21:25:41 +,
> > > Sami Tolvanen wrote:

[...]

> > > > I assume hyp_panic() ends up being placed too far from __guest_enter()
> > > > when the kernel is large enough. Possibly something to do with LLVM
> > > > always splitting functions into separate sections with LTO. I'm not
> > > > sure why the linker cannot shuffle things around to make everyone
> > > > happy in this case, but I confirmed that this patch also fixes the
> > > > build issue for me:
> > > >
> > > > diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > index af8e940d0f03..128197b7c794 100644
> > > > --- a/arch/arm64/kvm/hyp/vhe/switch.c
> > > > +++ b/arch/arm64/kvm/hyp/vhe/switch.c
> > > > @@ -214,7 +214,7 @@ static void __hyp_call_panic(u64 spsr, u64 elr, u64 par)
> > > >  }
> > > >  NOKPROBE_SYMBOL(__hyp_call_panic);
> > > >
> > > > -void __noreturn hyp_panic(void)
> > > > +void __noreturn hyp_panic(void) __section(".text")
> > > >  {
> > > >  	u64 spsr = read_sysreg_el2(SYS_SPSR);
> > > >  	u64 elr = read_sysreg_el2(SYS_ELR);
> > >
> > > We're getting into black-magic territory here. Why wouldn't hyp_panic
> > > be in the .text section already?
> >
> > It's not quite black magic. LLVM essentially flips on
> > -ffunction-sections with LTO and therefore, hyp_panic() will be in
> > .text.hyp_panic in vmlinux.o, while __guest_enter() will be in .text.
> > Everything ends up in .text when we link vmlinux, of course.
> >
> > $ readelf --sections vmlinux.o | grep hyp_panic
> >   [3936] .text.hyp_panic PROGBITS 004b56e4
>
> Note that disabling LTO here has essentially the same effect as using
> __section(".text").
> It stops the compiler from splitting these
> functions into .text.* sections and makes it less likely that
> hyp_panic() ends up too far away from __guest_enter().
>
> If neither of these workarounds sounds appealing, I suppose we could
> alternatively change hyp/entry.S to use adr_l for hyp_panic. Thoughts?

That would be an actual fix instead of a workaround, as it would remove
existing assumptions about the relative locations of the two objects.

I guess you need to fix both instances with something such as:

diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
index b0afad7a99c6..a43e1f7ee354 100644
--- a/arch/arm64/kvm/hyp/entry.S
+++ b/arch/arm64/kvm/hyp/entry.S
@@ -85,8 +85,10 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
 	// If the hyp context is loaded, go straight to hyp_panic
 	get_loaded_vcpu x0, x1
-	cbz	x0, hyp_panic
-
+	cbnz	x0, 1f
+	adr_l	x0, hyp_panic
+	br	x0
+1:
 	// The hyp context is saved so make sure it is restored to allow
 	// hyp_panic to run at hyp and, subsequently, panic to run in the host.
 	// This makes use of __guest_exit to avoid duplication but sets the
@@ -94,7 +96,7 @@ SYM_INNER_LABEL(__guest_exit_panic, SYM_L_GLOBAL)
 	// current state is saved to the guest context but it will only be
 	// accurate if the guest had been completely restored.
 	adr_this_cpu x0, kvm_hyp_ctxt, x1
-	adr	x1, hyp_panic
+	adr_l	x1, hyp_panic
 	str	x1, [x0, #CPU_XREG_OFFSET(30)]
 	get_vcpu_ptr	x1, x0

which is completely untested. I wouldn't be surprised if there were more
of these somewhere.

Thanks,

	M.

--
Without deviation from the norm, progress is not possible.
Re: [PATCH v12 03/13] vfio: VFIO_IOMMU_SET_MSI_BINDING
Hi,

On Tue, Feb 23, 2021 at 10:06:15PM +0100, Eric Auger wrote:
> This patch adds the VFIO_IOMMU_SET_MSI_BINDING ioctl, which aims
> to (un)register the guest MSI binding to the host. The latter
> can then use those stage 1 bindings to build a nested stage
> binding targeting the physical MSIs.

Now that RMR is in the IORT spec, could it be used for the nested MSI
problem? For virtio-iommu tables I was planning to do it like this:

MSI is mapped at stage-2 with an arbitrary IPA->doorbell PA. We report
this IPA to userspace through iommu_groups/X/reserved_regions. No change
there. Then to the guest we report a reserved identity mapping at IPA
(using RMR, an equivalent DT binding, or probed RESV_MEM for
virtio-iommu). The guest creates that mapping at stage-1, and that's it.
Unless I overlooked something we'd only reuse existing infrastructure
and avoid the SET_MSI_BINDING interface.

Thanks,
Jean
RE: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
Hi Jean,

> -Original Message-
> From: Jean-Philippe Brucker [mailto:jean-phili...@linaro.org]
> Sent: 04 March 2021 17:11
> To: Shameerali Kolothum Thodi
> Cc: linux-arm-ker...@lists.infradead.org; io...@lists.linux-foundation.org;
> kvmarm@lists.cs.columbia.edu; m...@kernel.org;
> alex.william...@redhat.com; eric.au...@redhat.com;
> zhangfei@linaro.org; Jonathan Cameron
> ; Zengtao (B) ;
> linux...@openeuler.org
> Subject: Re: [RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for
> NESTED stage with BTM
>
> Hi Shameer,
>
> On Mon, Feb 22, 2021 at 03:53:37PM +, Shameer Kolothum wrote:
> > If the SMMU supports BTM and the device belongs to NESTED domain
> > with shared pasid table, we need to use the VMID allocated by the
> > KVM for the s2 configuration. Hence, request a pinned VMID from KVM.
> >
> > Signed-off-by: Shameer Kolothum
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 -
> >  1 file changed, 47 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 26bf7da1bcd0..04f83f7c8319 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -28,6 +28,7 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include
> >
> > @@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned long *map, int idx)
> >  	clear_bit(idx, map);
> >  }
> >
> > +static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_master *master;
> > +
> > +	master = list_first_entry_or_null(&smmu_domain->devices,
> > +					  struct arm_smmu_master, domain_head);
>
> This probably needs to hold devices_lock while using master.

Ok.
> > +	if (!master)
> > +		return -EINVAL;
> > +
> > +	return kvm_pinned_vmid_get(master->dev);
> > +}
> > +
> > +static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_master *master;
> > +
> > +	master = list_first_entry_or_null(&smmu_domain->devices,
> > +					  struct arm_smmu_master, domain_head);
> > +	if (!master)
> > +		return -EINVAL;
> > +
> > +	if (smmu_domain->s2_cfg.vmid)
> > +		return kvm_pinned_vmid_put(master->dev);
> > +
> > +	return 0;
> > +}
> > +
> >  static void arm_smmu_domain_free(struct iommu_domain *domain)
> >  {
> >  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> > @@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
> >  		mutex_unlock(&arm_smmu_asid_lock);
> >  	}
> >  	if (s2_cfg->set) {
> > -		if (s2_cfg->vmid)
> > -			arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
> > +		if (s2_cfg->vmid) {
> > +			if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
> > +			    smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +				arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
> > +		}
> >  	}
> >
> >  	kfree(smmu_domain);
> > @@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct iommu_domain *domain,
> >  	    !(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> >  		goto out;
> >
> > +	if (smmu->features & ARM_SMMU_FEAT_BTM) {
> > +		ret = arm_smmu_pinned_vmid_get(smmu_domain);
> > +		if (ret < 0)
> > +			goto out;
> > +
> > +		if (smmu_domain->s2_cfg.vmid)
> > +			arm_smmu_bitmap_free(smmu->vmid_map, smmu_domain->s2_cfg.vmid);
> > +
> > +		smmu_domain->s2_cfg.vmid = (u16)ret;
>
> That will require a TLB invalidation on the old VMID, once the STE is
> rewritten.

True. Will add that.

> More generally I think this pinned VMID set conflicts with that of
> stage-2-only domains (which is the default state until a guest attaches a
> PASID table). Say you have one guest using DOMAIN_NESTED without PASID
> table, just DMA to IPA using VMID 0x8000.
> Now another guest attaches a
> PASID table and obtains the same VMID from KVM. The stage-2 translation
> might use TLB entries from the other guest, no? They'll both create
> stage-2 TLB entries with {StreamWorld=NS-EL1, VMID=0x8000}
>
> It's tempting to allocate all VMIDs through KVM instead, but that will
> force a dependency on KVM to use VFIO_TYPE1_NESTING_IOMMU and might break
> existing users of that extension (though I'm not sure there are any).
> Instead we might need to restrict the SMMU VMID bitmap to match the
> private VMID set in KVM.

Right, that is indeed a problem. I will take a look at this suggestion.

> Besides we probably want to restrict this feature to systems supporting
> VMID16 on both SMMU and CPUs, or at least check that they are compatible.