Re: [PATCH v4 kvmtool 00/12] arm64: Allow the user to set RAM base address
On 16/06/2022 14:48, Alexandru Elisei wrote: The series can be found at [1]. It is loosely based on the patches that allow the user to define the VM memory layout (RAM + MMIO) [2]. I've cherry-picked a handful of patches from that series, the rest I wrote from scratch since there have been several changes to the way guest memory is handled. I've chosen to focus on specifying the RAM layout with only one RAM bank and leave the rest for a later series because this was relatively easy to accomplish, while still being very useful. What this series does: for arm64, the user can now specify the base address for RAM: $ ./lkvm run -m1G@2G .. # Equivalent to ./lkvm run -m1024 The memory units are B (bytes), K (kilobytes), M (megabytes), G (gigabytes), T (terrabytes), P (petabytes). Lowercase is also valid. Want to put RAM at the top of the physical address range? Easy: $ ./lkvm run -m2G@1022G .. # Assumes the maximum is 40 bits of IPA There one limitation on the RAM base address: it must not overlap with the MMIO range that kvmtool uses for arm/arm64, which lives below 2GB. Why this is useful, in my opinion: 1. Testing how a payload handles different memory layouts without the need to hack kvmtool or find the hardware that implements the desired layout. 2. It can serve as a development tool for adding support for larger PA ranges for Linux and KVM (currently capped at 48 bits for 4k/16k pages), or other payloads. Summary of the series == * The series starts with refactoring how kvm->cfg.ram_size is validated and used, followed by several cleanups in the arm and arm64 code. * Then patch #8 ("builtin_run: Allow standard size specifiers for memory") introduced the ability to specify the measurement unit for memory. I believe that typing the equivalent of 2TB in megabytes isn't appealing for anyone. * More cleanups in the arm/arm64 code follow, which are needed for patch #12 ("arm64: Allow the user to specify the RAM base address"). This is where the ability to specify the RAM base address is introduced. Testing === Same testing as before: - Build tested each patch for all architectures. - Ran an x86 kernel with and without setting the amount of RAM using the memory specifiers; tested that setting the RAM address results in an error. - Ran an arm64 kernel without setting the size, with setting the size and with setting the size and address; tried different addresses (2G, 3G, 256G); also tested that going below 2G or above the maximum IPA correctly results in an error. - Ran all arm64 kvm-unit-test tests with similar combinations of memory size and address (instead of 256G I used 128G, as that's where I/O lives for qemu and kvm-unit-tests maps that unconditionally as I/O). - Ran all 32bit arm tests on an arm64 host with various combinations of memory size and address (base address at 2G and 2.5G only due to a limitation in the way the tests are set up). I have tested this series on arm64 Fast model, with memory placed from 32bit to 48bit IPA and it works well. For the series: Reviewed-and-Tested-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v11 03/40] arm64: cpufeature: Always specify and use a field width for capabilities
On 07/02/2022 15:20, Mark Brown wrote: Since all the fields in the main ID registers are 4 bits wide we have up until now not bothered specifying the width in the code. Since we now wish to use this mechanism to enumerate features from the floating point feature registers which do not follow this pattern add a width to the table. This means updating all the existing table entries but makes it less likely that we run into issues in future due to implicitly assuming a 4 bit width. Signed-off-by: Mark Brown Cc: Suzuki K Poulose Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 01/38] arm64: cpufeature: Always specify and use a field width for capabilities
On 25/01/2022 00:10, Mark Brown wrote: Since all the fields in the main ID registers are 4 bits wide we have up until now not bothered specifying the width in the code. Since we now wish to use this mechanism to enumerate features from the floating point feature registers which do not follow this pattern add a width to the table. This means updating all the existing table entries but makes it less likely that we run into issues in future due to implicitly assuming a 4 bit width. Signed-off-by: Mark Brown Cc: Suzuki K Poulose --- arch/arm64/include/asm/cpufeature.h | 1 + arch/arm64/kernel/cpufeature.c | 167 +--- 2 files changed, 102 insertions(+), 66 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index ef6be92b1921..2728abd9cae4 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -356,6 +356,7 @@ struct arm64_cpu_capabilities { struct {/* Feature register checking */ u32 sys_reg; u8 field_pos; + u8 field_width; u8 min_field_value; u8 hwcap_type; bool sign; diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index a46ab3b1c4d5..d9f09e40aaf6 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1307,7 +1307,9 @@ u64 __read_sysreg_by_encoding(u32 sys_id) static bool feature_matches(u64 reg, const struct arm64_cpu_capabilities *entry) { - int val = cpuid_feature_extract_field(reg, entry->field_pos, entry->sign); + int val = cpuid_feature_extract_field_width(reg, entry->field_pos, + entry->field_width, + entry->sign); Could we do something like : + int val = cpuid_feature_extract_field_width(reg, entry->field_pos, entry->field_width ? : 4, .. ); and leave the existing structures as they are ? return val >= entry->min_field_value; } @@ -1952,6 +1954,7 @@ static const struct arm64_cpu_capabilities arm64_features[] = { There is arm64_errata[] array in cpu_errata.c. So, if we use the above proposal we could leave things unchanged for all existing uses. .matches = has_cpuid_feature, .sys_reg = SYS_ID_AA64MMFR0_EL1, .field_pos = ID_AA64MMFR0_ECV_SHIFT, + .field_width = 4, .sign = FTR_UNSIGNED, .min_field_value = 1, }, ... -#define HWCAP_CPUID_MATCH(reg, field, s, min_value) \ +#define HWCAP_CPUID_MATCH(reg, field, width, s, min_value) \ .matches = has_cpuid_feature, \ .sys_reg = reg, \ .field_pos = field, \ + .field_width = width, \ .sign = s, \ .min_field_value = min_value, And that could avoid these changes too. We could add : #define HWCAP_CPUID_MATCH_WIDTH(...) when/if we need one, which sets the width. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v4 02/39] KVM: arm64: Add lock/unlock memslot user API
On 25/08/2021 17:17, Alexandru Elisei wrote: Stage 2 faults triggered by the profiling buffer attempting to write to memory are reported by the SPE hardware by asserting a buffer management event interrupt. Interrupts are by their nature asynchronous, which means that the guest might have changed its stage 1 translation tables since the attempted write. SPE reports the guest virtual address that caused the data abort, not the IPA, which means that KVM would have to walk the guest's stage 1 tables to find the IPA. Using the AT instruction to walk the guest's tables in hardware is not an option because it doesn't report the IPA in the case of a stage 2 fault on a stage 1 table walk. Avoid both issues by pre-mapping the guest memory at stage 2. This is being done by adding a capability that allows the user to pin the memory backing a memslot. The same capability can be used to unlock a memslot, which unpins the pages associated with the memslot, but doesn't unmap the IPA range from stage 2; in this case, the addresses will be unmapped from stage 2 via the MMU notifiers when the process' address space changes. For now, the capability doesn't actually do anything other than checking that the usage is correct; the memory operations will be added in future patches. Signed-off-by: Alexandru Elisei --- Documentation/virt/kvm/api.rst | 56 +++ arch/arm64/include/asm/kvm_mmu.h | 3 ++ arch/arm64/kvm/arm.c | 42 -- arch/arm64/kvm/mmu.c | 76 include/uapi/linux/kvm.h | 8 5 files changed, 181 insertions(+), 4 deletions(-) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index dae68e68ca23..741327ef06b0 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6682,6 +6682,62 @@ MAP_SHARED mmap will result in an -EINVAL return. When enabled the VMM may make use of the ``KVM_ARM_MTE_COPY_TAGS`` ioctl to perform a bulk copy of tags to/from the guest. +7.29 KVM_CAP_ARM_LOCK_USER_MEMORY_REGION + + +:Architectures: arm64 +:Target: VM +:Parameters: flags is one of KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK or + KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK + args[0] is the slot number + args[1] specifies the permisions when the memslot is locked or if + all memslots should be unlocked + +The presence of this capability indicates that KVM supports locking the memory +associated with the memslot, and unlocking a previously locked memslot. + +The 'flags' parameter is defined as follows: + +7.29.1 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_LOCK +- + +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION +:Architectures: arm64 +:Target: VM +:Parameters: args[0] contains the memory slot number + args[1] contains the permissions for the locked memory: + KVM_ARM_LOCK_MEMORY_READ (mandatory) to map it with + read permissions and KVM_ARM_LOCK_MEMORY_WRITE + (optional) with write permissions +:Returns: 0 on success; negative error code on failure + +Enabling this capability causes the memory described by the memslot to be +pinned in the process address space and the corresponding stage 2 IPA range +mapped at stage 2. The permissions specified in args[1] apply to both +mappings. The memory pinned with this capability counts towards the max +locked memory limit for the current process. + +The capability must be enabled before any VCPUs have run. The virtual memory +range described by the memslot must be mapped in the userspace process without +any gaps. It is considered an error if write permissions are specified for a +memslot which logs dirty pages. + +7.29.2 KVM_ARM_LOCK_USER_MEMORY_REGION_FLAGS_UNLOCK +--- + +:Capability: 'flags' parameter to KVM_CAP_ARM_LOCK_USER_MEMORY_REGION +:Architectures: arm64 +:Target: VM +:Parameters: args[0] contains the memory slot number + args[1] optionally contains the flag KVM_ARM_UNLOCK_MEM_ALL, + which unlocks all previously locked memslots. +:Returns: 0 on success; negative error code on failure + +Enabling this capability causes the memory pinned when locking the memslot +specified in args[0] to be unpinned, or, optionally, the memory associated +with all locked memslots, to be unpinned. The IPA range is not unmapped +from stage 2. + 8. Other capabilities. == diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b52c5c4b9a3d..ef079b5eb475 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -216,6 +216,9 @@ static inline void __invalidate_icache_guest_page(void *va, size_t size) void kvm_set_way_flush(struct
Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line
On 30/09/2021 11:29, Marc Zyngier wrote: On Wed, 29 Sep 2021 11:35:46 +0100, Suzuki K Poulose wrote: On 03/09/2021 10:16, Marc Zyngier wrote: Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kernel/idreg-override.c | 1 + arch/arm64/kvm/arm.c| 14 +- 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 91ba391f9b32..cc5f68846434 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2365,6 +2365,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. +none: Forcefully disable KVM and run in nVHE mode, + preventing KVM from ever initialising. + nvhe: Standard nVHE-based mode, without support for protected guests. diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f8be56d5342b..019490c67976 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -58,6 +58,7 @@ enum kvm_mode { KVM_MODE_DEFAULT, KVM_MODE_PROTECTED, + KVM_MODE_NONE, }; enum kvm_mode kvm_get_mode(void); diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index d8e606fe3c21..57013c1b6552 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -95,6 +95,7 @@ static const struct { charalias[FTR_ALIAS_NAME_LEN]; charfeature[FTR_ALIAS_OPTION_LEN]; } aliases[] __initconst = { + { "kvm-arm.mode=none","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=nvhe","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nobti", "id_aa64pfr1.bt=0" }, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index fe102cd2e518..cdc70e238316 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2064,6 +2064,11 @@ int kvm_arch_init(void *opaque) return -ENODEV; } +if (kvm_get_mode() == KVM_MODE_NONE) { + kvm_info("KVM disabled from command line\n"); + return -ENODEV; + } + in_hyp_mode = is_kernel_in_hyp_mode(); if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) || @@ -2137,8 +2142,15 @@ static int __init early_kvm_mode_cfg(char *arg) return 0; } -if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) + if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { + kvm_mode = KVM_MODE_DEFAULT; return 0; + } + + if (strcmp(arg, "none") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { nit: Does this really need to WARN here ? Unlike the "nvhe" case, if the user wants to keep the KVM out of the picture for, say debugging something, it is perfectly Ok to allow the kernel to be running at EL2 without having to change the Firmware to alter the landing EL for the kernel ? Well, the doc says "run in nVHE mode" and the option forces id_aa64mmfr1.vh=0. The WARN_ON() will only fires on broken^Wfruity HW that is VHE only. Note that this doesn't rely on any firmware change (we drop from EL2 to EL1 and stay there). Ah, ok. So the "none" is in fact "nvhe + no-kvm". Thats the bit I missed. TBH, that name to me sounds like "no KVM" at all, which is what we want. The question is, do we really need "none" to force vh == 0 ? I understand this is only a problem on a rare set of HWs. But the generic option looks deceiving. That said, I am happy to leave this as is and the doc says so. We could add another option (none-vhe?) that stays at EL2 and still disables KVM if there is an appetite for it. Na. Don't think that is necessary. Otherwise, Acked-by: Suzuki K Poulose Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Allow KVM to be disabled from the command line
On 03/09/2021 10:16, Marc Zyngier wrote: Although KVM can be compiled out of the kernel, it cannot be disabled at runtime. Allow this possibility by introducing a new mode that will prevent KVM from initialising. This is useful in the (limited) circumstances where you don't want KVM to be available (what is wrong with you?), or when you want to install another hypervisor instead (good luck with that). Signed-off-by: Marc Zyngier --- Documentation/admin-guide/kernel-parameters.txt | 3 +++ arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kernel/idreg-override.c | 1 + arch/arm64/kvm/arm.c| 14 +- 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 91ba391f9b32..cc5f68846434 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2365,6 +2365,9 @@ kvm-arm.mode= [KVM,ARM] Select one of KVM/arm64's modes of operation. + none: Forcefully disable KVM and run in nVHE mode, + preventing KVM from ever initialising. + nvhe: Standard nVHE-based mode, without support for protected guests. diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f8be56d5342b..019490c67976 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -58,6 +58,7 @@ enum kvm_mode { KVM_MODE_DEFAULT, KVM_MODE_PROTECTED, + KVM_MODE_NONE, }; enum kvm_mode kvm_get_mode(void); diff --git a/arch/arm64/kernel/idreg-override.c b/arch/arm64/kernel/idreg-override.c index d8e606fe3c21..57013c1b6552 100644 --- a/arch/arm64/kernel/idreg-override.c +++ b/arch/arm64/kernel/idreg-override.c @@ -95,6 +95,7 @@ static const struct { charalias[FTR_ALIAS_NAME_LEN]; charfeature[FTR_ALIAS_OPTION_LEN]; } aliases[] __initconst = { + { "kvm-arm.mode=none","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=nvhe","id_aa64mmfr1.vh=0" }, { "kvm-arm.mode=protected", "id_aa64mmfr1.vh=0" }, { "arm64.nobti", "id_aa64pfr1.bt=0" }, diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index fe102cd2e518..cdc70e238316 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -2064,6 +2064,11 @@ int kvm_arch_init(void *opaque) return -ENODEV; } + if (kvm_get_mode() == KVM_MODE_NONE) { + kvm_info("KVM disabled from command line\n"); + return -ENODEV; + } + in_hyp_mode = is_kernel_in_hyp_mode(); if (cpus_have_final_cap(ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE) || @@ -2137,8 +2142,15 @@ static int __init early_kvm_mode_cfg(char *arg) return 0; } - if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) + if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { + kvm_mode = KVM_MODE_DEFAULT; return 0; + } + + if (strcmp(arg, "none") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) { nit: Does this really need to WARN here ? Unlike the "nvhe" case, if the user wants to keep the KVM out of the picture for, say debugging something, it is perfectly Ok to allow the kernel to be running at EL2 without having to change the Firmware to alter the landing EL for the kernel ? Otherwise, Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v4 01/39] KVM: arm64: Make lock_all_vcpus() available to the rest of KVM
On 25/08/2021 17:17, Alexandru Elisei wrote: The VGIC code uses the lock_all_vcpus() function to make sure no VCPUs are run while it fiddles with the global VGIC state. Move the declaration of lock_all_vcpus() and the corresponding unlock function into asm/kvm_host.h where it can be reused by other parts of KVM/arm64 and rename the functions to kvm_{lock,unlock}_all_vcpus() to make them more generic. Because the scope of the code potentially using the functions has increased, add a lockdep check that the kvm->lock is held by the caller. Holding the lock is necessary because otherwise userspace would be able to create new VCPUs and run them while the existing VCPUs are locked. No functional change intended. Signed-off-by: Alexandru Elisei LGTM, Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v4 00/39] KVM: arm64: Add Statistical Profiling Extension (SPE) support
On 25/08/2021 17:17, Alexandru Elisei wrote: This is v4 of the SPE series posted at [1]. v2 can be found at [2], and the original series at [3]. Statistical Profiling Extension (SPE) is an optional feature added in ARMv8.2. It allows sampling at regular intervals of the operations executed by the PE and storing a record of each operation in a memory buffer. A high level overview of the extension is presented in an article on arm.com [4]. This is another complete rewrite of the series, and nothing is set in stone. If you think of a better way to do things, please suggest it. Features added == The rewrite enabled me to add support for several features not present in the previous iteration: - Support for heterogeneous systems, where only some of the CPUs support SPE. This is accomplished via the KVM_ARM_VCPU_SUPPORTED_CPUS VCPU ioctl. - Support for VM migration with the KVM_ARM_VCPU_SPE_CTRL(KVM_ARM_VCPU_SPE_STOP) VCPU ioctl. - The requirement for userspace to mlock() the guest memory has been removed, and now userspace can make changes to memory contents after the memory is mapped at stage 2. - Better debugging of guest memory pinning by printing a warning when we get an unexpected read or write fault. This helped me catch several bugs during development, it has already proven very useful. Many thanks to James who suggested when reviewing v3. Missing features I've tried to keep the series as small as possible to make it easier to review, while implementing the core functionality needed for the SPE emulation. As such, I've chosen to not implement several features: - Host profiling a guest which has the SPE feature bit set (see open questions). - No errata workarounds have been implemented yet, and there are quite a few of them for Neoverse N1 and Neoverse V1. - Disabling CONFIG_NUMA_BALANCING is a hack to get KVM SPE to work and I am investigating other ways to get around automatic numa balancing, like requiring userspace to disable it via set_mempolicy(). I am also going to look at how VFIO gets around it. Suggestions welcome. - There's plenty of room for optimization. Off the top of my head, using block mappings at stage 2, batch pinning of pages (similar to what VFIO does), optimize the way KVM keeps track of pinned pages (using a linked list triples the memory usage), context-switch the SPE registers on vcpu_load/vcpu_put on VHE if the host is not profiling, locking optimizations, etc, etc. - ...and others. I'm sure I'm missing at least a few things which are important for someone. Known issues This is an RFC, so keep in mind that almost definitely there will be scary bugs. For example, below is a list of known issues which don't affect the correctness of the emulation, and which I'm planning to fix in a future iteration: - With CONFIG_PROVE_LOCKING=y, lockdep complains about lock contention when the VCPU executes the dcache clean pending ops. - With CONFIG_PROVE_LOCKING=y, KVM will hit a BUG at kvm_lock_all_vcpus()->mutex_trylock(>mutex) with more than 48 VCPUs. This BUG statement can also be triggered with mainline. To reproduce it, compile kvmtool from this branch [5] and follow the instruction in the kvmtool commit message. One workaround could be to stop trying to lock all VCPUs when locking a memslot and document the fact that it is required that no VCPUs are run before the ioctl completes, otherwise bad things might happen to the VM. Open questions == 1. Implementing support for host profiling a guest with the SPE feature means setting the profiling buffer owning regime to EL2. While that is in effect, PMBIDR_EL1.P will equal 1. This has two consequences: if the guest probes SPE during this time, the driver will fail; and the guest will be able to determine when it is profiled. I see two options here: This doesn't mean the EL2 is owning the SPE. It only tells you that a higher level EL is owning the SPE. It could as well be EL3. (e.g, MDCR_EL3.NSPB == 0 or 1). So I think this is architecturally correct, as long as we trap the guest access to other SPE registers and inject and UNDEF. Thanks Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] kvm: arm64: nvhe: Save the SPE context early
The nVHE KVM hyp drains and disables the SPE buffer, before entering the guest, as the EL1&0 translation regime is going to be loaded with that of the guest. But this operation is performed way too late, because : - The owning translation regime of the SPE buffer is transferred to EL2. (MDCR_EL2_E2PB == 0) - The guest Stage1 is loaded. Thus the flush could use the host EL1 virtual address, but use the EL2 translations instead of host EL1, for writing out any cached data. Fix this by moving the SPE buffer handling early enough. The restore path is doing the right thing. Fixes: 014c4c77aad7 ("KVM: arm64: Improve debug register save/restore flow") Cc: sta...@vger.kernel.org Cc: Christoffer Dall Cc: Marc Zyngier Cc: Will Deacon Cc: Catalin Marinas Cc: Mark Rutland Cc: Alexandru Elisei Reviewed-by: Alexandru Elisei Signed-off-by: Suzuki K Poulose --- arch/arm64/include/asm/kvm_hyp.h | 5 + arch/arm64/kvm/hyp/nvhe/debug-sr.c | 12 ++-- arch/arm64/kvm/hyp/nvhe/switch.c | 11 ++- 3 files changed, 25 insertions(+), 3 deletions(-) diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index c0450828378b..385bd7dd3d39 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -83,6 +83,11 @@ void sysreg_restore_guest_state_vhe(struct kvm_cpu_context *ctxt); void __debug_switch_to_guest(struct kvm_vcpu *vcpu); void __debug_switch_to_host(struct kvm_vcpu *vcpu); +#ifdef __KVM_NVHE_HYPERVISOR__ +void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu); +void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu); +#endif + void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c index 91a711aa8382..f401724f12ef 100644 --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c @@ -58,16 +58,24 @@ static void __debug_restore_spe(u64 pmscr_el1) write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1); } -void __debug_switch_to_guest(struct kvm_vcpu *vcpu) +void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu) { /* Disable and flush SPE data generation */ __debug_save_spe(>arch.host_debug_state.pmscr_el1); +} + +void __debug_switch_to_guest(struct kvm_vcpu *vcpu) +{ __debug_switch_to_guest_common(vcpu); } -void __debug_switch_to_host(struct kvm_vcpu *vcpu) +void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu) { __debug_restore_spe(vcpu->arch.host_debug_state.pmscr_el1); +} + +void __debug_switch_to_host(struct kvm_vcpu *vcpu) +{ __debug_switch_to_host_common(vcpu); } diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c index f3d0e9eca56c..59aa1045fdaf 100644 --- a/arch/arm64/kvm/hyp/nvhe/switch.c +++ b/arch/arm64/kvm/hyp/nvhe/switch.c @@ -192,6 +192,14 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu) pmu_switch_needed = __pmu_switch_to_guest(host_ctxt); __sysreg_save_state_nvhe(host_ctxt); + /* +* We must flush and disable the SPE buffer for nVHE, as +* the translation regime(EL1&0) is going to be loaded with +* that of the guest. And we must do this before we change the +* translation regime to EL2 (via MDCR_EL2_E2PB == 0) and +* before we load guest Stage1. +*/ + __debug_save_host_buffers_nvhe(vcpu); __adjust_pc(vcpu); @@ -234,11 +242,12 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu) if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) __fpsimd_save_fpexc32(vcpu); + __debug_switch_to_host(vcpu); /* * This must come after restoring the host sysregs, since a non-VHE * system may enable SPE here and make use of the TTBRs. */ - __debug_switch_to_host(vcpu); + __debug_restore_host_buffers_nvhe(vcpu); if (pmu_switch_needed) __pmu_switch_to_host(host_ctxt); -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v5 13/21] arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
On 1/25/21 10:50 AM, Marc Zyngier wrote: As we want to be able to disable VHE at runtime, let's match "id_aa64mmfr1.vh=" from the command line as an override. This doesn't have much effect yet as our boot code doesn't look at the cpufeature, but only at the HW registers. Signed-off-by: Marc Zyngier Acked-by: David Brazdil --- arch/arm64/include/asm/cpufeature.h | 2 ++ arch/arm64/kernel/cpufeature.c | 5 - arch/arm64/kernel/idreg-override.c | 11 +++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 4179cfc8ed57..b0ed37da4067 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -818,6 +818,8 @@ static inline unsigned int get_vmid_bits(u64 mmfr1) return 8; } +extern struct arm64_ftr_override id_aa64mmfr1_override; + u32 get_kvm_ipa_limit(void); void dump_cpu_features(void); diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 4b84a1e1dc51..c1d6712c4249 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -557,6 +557,8 @@ static const struct arm64_ftr_bits ftr_raz[] = { #define ARM64_FTR_REG(id, table) ARM64_FTR_REG_OVERRIDE(id, table, _override) +struct arm64_ftr_override id_aa64mmfr1_override; Does this need to be ro_after_init ? Otherwise, looks good to me: Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 10/21] arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
On 1/22/21 6:53 PM, Catalin Marinas wrote: On Mon, Jan 18, 2021 at 09:45:22AM +, Marc Zyngier wrote: __read_sysreg_by_encoding() is used by a bunch of cpufeature helpers, which should take the feature override into account. Let's do that. For a good measure (and because we are likely to need to further down the line), make this helper available to the rest of the non-modular kernel. Code that needs to know the *real* features of a CPU can still use read_sysreg_s(), and find the bare, ugly truth. Signed-off-by: Marc Zyngier diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index aaa075c6f029..48a011935d8c 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -1149,14 +1149,17 @@ u64 read_sanitised_ftr_reg(u32 id) EXPORT_SYMBOL_GPL(read_sanitised_ftr_reg); #define read_sysreg_case(r) \ - case r: return read_sysreg_s(r) + case r: val = read_sysreg_s(r); break; /* * __read_sysreg_by_encoding() - Used by a STARTING cpu before cpuinfo is populated. * Read the system register on the current CPU */ -static u64 __read_sysreg_by_encoding(u32 sys_id) +u64 __read_sysreg_by_encoding(u32 sys_id) { + struct arm64_ftr_reg *regp; + u64 val; + switch (sys_id) { read_sysreg_case(SYS_ID_PFR0_EL1); read_sysreg_case(SYS_ID_PFR1_EL1); @@ -1199,6 +1202,14 @@ static u64 __read_sysreg_by_encoding(u32 sys_id) BUG(); return 0; } + + regp = get_arm64_ftr_reg(sys_id); + if (regp && regp->override_mask && regp->override_val) { + val &= ~*regp->override_mask; + val |= (*regp->override_val & *regp->override_mask); + } + + return val; Ah, now the previous patch makes more sense. I don't particularly like this but I can't tell how to work around it. I was hoping that the overriding feature behaves more like a secondary CPU that limits all the overridden features. However, this approach would fail for FTR_EXACT cases (like PAC, though I wonder whether it fails already with your previous patch since the boot CPU value won't match the override, hence dropping to the safe one). Correct !For FTR_EXACT, we dont want to override a value that is not safe, e.g PAC. This is handled correctly in the previous patch and thus we are covered. Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 09/21] arm64: cpufeature: Add global feature override facility
On 1/18/21 9:45 AM, Marc Zyngier wrote: Add a facility to globally override a feature, no matter what the HW says. Yes, this sounds dangerous, but we do respect the "safe" value for a given feature. This doesn't mean the user doesn't need to know what they are doing. Nothing uses this yet, so we are pretty safe. For now. Signed-off-by: Marc Zyngier Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 1/12/21 11:51 AM, Marc Zyngier wrote: On 2021-01-12 11:50, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. Actually, we can only *disable* the PMU altogether. Same question though... It depends on two things : 1) What is the safe value for an EXACT typed feature ? Usually, that means either disabled, or the lowest possible value. 2) How is this value consumed ? a) i.e, Do we use the per-CPU value Then none of these changes have any effect b) System wide value ? Then we get the safe value as "influenced" by the infrastructure. The safe value we use for EXACT features is exclusively for making sure that the system uses the safe assumption and thus should be the best option. To answer your question, for PMU, it is 0, implies, v8.0. Or we could update the safe value to -1 (0xf) as the safe value, which is a bit more safer, kind of implying that the PMU is not a standard one. Cheers Suzuki M. Is it something we care about? Thanks, M. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
On 1/12/21 11:50 AM, Marc Zyngier wrote: Hi Suzuki, On 2021-01-12 09:17, Suzuki K Poulose wrote: Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: [...] diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. I like the idea of using the helper, as it cleanups up the code a bit. However, not being to set a feature to a certain value could be restrictive, as in general, it means that we can only disable a feature and not adjust its level of support. Take PMUVER for example: with the helper, I can't override it from v8.4 to v8.1. I can only go to v8.0. My point is, we set this only for the "init" of cpu features. So, even if we init to a custom , non-(default-safe) value, the secondary CPUs could scream, and the system wide safe value could fall back to the "safe" value for EXACT features, no matter what you did to init it. Also, it will be dangerous for things like PAC, as the lower level value may not be compatible with the "actual" cpu feature supported. So simply setting to a lower value for EXACT features is generally not safe, though I understand some are exceptions. Suzuki Is it something we care about? Thanks, M. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v3 09/21] arm64: cpufeature: Add global feature override facility
Hi Marc, On 1/11/21 7:48 PM, Marc Zyngier wrote: Hi Catalin, On 2021-01-11 18:41, Catalin Marinas wrote: Hi Marc, On Mon, Jan 11, 2021 at 01:27:59PM +, Marc Zyngier wrote: Add a facility to globally override a feature, no matter what the HW says. Yes, this is dangerous. Yeah, it's dangerous. We can make it less so if we only allow safe values (e.g. lower if FTR_UNSIGNED). My plan was also to allow non-safe values in order to trigger features that are not advertised by the HW. But I can understand if you are reluctant to allow such thing! :D diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 9a555809b89c..465d2cb63bfc 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -75,6 +75,8 @@ struct arm64_ftr_reg { u64 sys_val; u64 user_val; const struct arm64_ftr_bits *ftr_bits; + u64 *override_val; + u64 *override_mask; }; At the arm64_ftr_reg level, we don't have any information about the safe values for a feature. Could we instead move this to arm64_ftr_bits? We probably only need a single field. When populating the feature values, we can make sure it doesn't go above the hardware one. I attempted a feature modification for MTE here, though I dropped the entire series in the meantime as we clarified the ARM ARM: https://lore.kernel.org/linux-arm-kernel/20200515171612.1020-24-catalin.mari...@arm.com/ Srinivas copied it in his patch (but forgot to give credit ;)): https://lore.kernel.org/linux-arm-msm/1610152163-16554-3-git-send-email-sram...@codeaurora.org/ The above adds a filter function but, instead, just use your mechanism in this series for idreg.feature setting via cmdline. The arm64_ftr_value() function extracts the hardware value and lowers it if a cmdline argument was passed. One thing is that it is not always possible to sanitise the value passed if it is required very early on, as I do with VHE. But in that case I actually check that we are VHE capable before starting to poke at VHE-specific state. I came up with the following patch on top, which preserves the current global approach (no per arm64_ftr_bits state), but checks (and alters) the override as it iterates through the various fields. For example, if I pass "arm64.nopauth kvm-arm.mode=nvhe id_aa64pfr1.bt=5" to the FVP, I get the following output: [ 0.00] CPU features: SYS_ID_AA64ISAR1_EL1[31:28]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64ISAR1_EL1[11:8]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64MMFR1_EL1[11:8]: forced from 1 to 0 [ 0.00] CPU features: SYS_ID_AA64PFR1_EL1[3:0]: not forcing 1 to 5 [ 0.00] CPU features: detected: GIC system register CPU interface [ 0.00] CPU features: detected: Hardware dirty bit management [ 0.00] CPU features: detected: Spectre-v4 [ 0.00] CPU features: detected: Branch Target Identification showing that the PAC features have been downgraded, together with VHE, but that BTI is still detected as value 5 was obviously bogus. Thoughts? M. diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 894af60b9669..00d99e593b65 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -774,6 +774,7 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) u64 strict_mask = ~0x0ULL; u64 user_mask = 0; u64 valid_mask = 0; + u64 override_val = 0, override_mask = 0; const struct arm64_ftr_bits *ftrp; struct arm64_ftr_reg *reg = get_arm64_ftr_reg(sys_reg); @@ -781,9 +782,35 @@ static void __init init_cpu_ftr_reg(u32 sys_reg, u64 new) if (!reg) return; + if (reg->override_mask && reg->override_val) { + override_mask = *reg->override_mask; + override_val = *reg->override_val; + } + for (ftrp = reg->ftr_bits; ftrp->width; ftrp++) { u64 ftr_mask = arm64_ftr_mask(ftrp); s64 ftr_new = arm64_ftr_value(ftrp, new); + s64 ftr_ovr = arm64_ftr_value(ftrp, override_val); + + if ((ftr_mask & override_mask) == ftr_mask) { + if (ftr_ovr < ftr_new) { Here we assume that all the features are FTR_LOWER_SAFE. We could probably use arm64_ftr_safe_value(ftrp, ftr_new, ftr_ovr) here ? That would cover us for both HIGHER_SAFE and LOWER_SAFE features. However that may be restrictive for FTR_EXACT, as we the safe value would be set to "ftr->safe_val". I guess that may be better than forcing to use an unsafe value for the boot CPU, which could anyway conflict with the other CPUs and eventually trigger the ftr alue to be safe_val. i.e, ftr_val = arm64_ftr_safe_value(ftrp, ftr_ovr, ftr_new); Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 03/11] KVM: arm64: Make kvm_skip_instr() and co private to HYP
On 10/26/20 1:34 PM, Marc Zyngier wrote: In an effort to remove the vcpu PC manipulations from EL1 on nVHE systems, move kvm_skip_instr() to be HYP-specific. EL1's intent to increment PC post emulation is now signalled via a flag in the vcpu structure. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_emulate.h | 27 +-- arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/handle_exit.c | 6 +-- arch/arm64/kvm/hyp/include/hyp/adjust_pc.h | 56 ++ arch/arm64/kvm/hyp/include/hyp/switch.h| 2 + arch/arm64/kvm/hyp/nvhe/switch.c | 3 ++ arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c | 2 + arch/arm64/kvm/hyp/vgic-v3-sr.c| 2 + arch/arm64/kvm/hyp/vhe/switch.c| 3 ++ arch/arm64/kvm/mmio.c | 2 +- arch/arm64/kvm/mmu.c | 2 +- arch/arm64/kvm/sys_regs.c | 2 +- 12 files changed, 77 insertions(+), 31 deletions(-) create mode 100644 arch/arm64/kvm/hyp/include/hyp/adjust_pc.h diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h index 0864f425547d..6d2b5d1aa7b3 100644 --- a/arch/arm64/include/asm/kvm_emulate.h +++ b/arch/arm64/include/asm/kvm_emulate.h @@ -472,32 +472,9 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu, return data;/* Leave LE untouched */ } -static __always_inline void kvm_skip_instr(struct kvm_vcpu *vcpu) +static __always_inline void kvm_incr_pc(struct kvm_vcpu *vcpu) { - if (vcpu_mode_is_32bit(vcpu)) { - kvm_skip_instr32(vcpu); - } else { - *vcpu_pc(vcpu) += 4; - *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK; - } - - /* advance the singlestep state machine */ - *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS; -} - -/* - * Skip an instruction which has been emulated at hyp while most guest sysregs - * are live. - */ -static __always_inline void __kvm_skip_instr(struct kvm_vcpu *vcpu) -{ - *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR); - vcpu_gp_regs(vcpu)->pstate = read_sysreg_el2(SYS_SPSR); - - kvm_skip_instr(vcpu); - - write_sysreg_el2(vcpu_gp_regs(vcpu)->pstate, SYS_SPSR); - write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR); + vcpu->arch.flags |= KVM_ARM64_INCREMENT_PC; } #endif /* __ARM64_KVM_EMULATE_H__ */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 0aecbab6a7fb..9a75de3ad8da 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -406,6 +406,7 @@ struct kvm_vcpu_arch { #define KVM_ARM64_GUEST_HAS_SVE (1 << 5) /* SVE exposed to guest */ #define KVM_ARM64_VCPU_SVE_FINALIZED (1 << 6) /* SVE config completed */ #define KVM_ARM64_GUEST_HAS_PTRAUTH (1 << 7) /* PTRAUTH exposed to guest */ +#define KVM_ARM64_INCREMENT_PC (1 << 8) /* Increment PC */ #define vcpu_has_sve(vcpu) (system_supports_sve() && \ ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE)) diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c index 30bf8e22df54..d4e00a864ee6 100644 --- a/arch/arm64/kvm/handle_exit.c +++ b/arch/arm64/kvm/handle_exit.c @@ -61,7 +61,7 @@ static int handle_smc(struct kvm_vcpu *vcpu) * otherwise return to the same address... */ vcpu_set_reg(vcpu, 0, ~0UL); - kvm_skip_instr(vcpu); + kvm_incr_pc(vcpu); return 1; } @@ -100,7 +100,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu) kvm_clear_request(KVM_REQ_UNHALT, vcpu); } - kvm_skip_instr(vcpu); + kvm_incr_pc(vcpu); return 1; } @@ -221,7 +221,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu) * that fail their condition code check" */ if (!kvm_condition_valid(vcpu)) { - kvm_skip_instr(vcpu); + kvm_incr_pc(vcpu); handled = 1; } else { exit_handle_fn exit_handler; diff --git a/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h new file mode 100644 index ..4ecaf5cb2633 --- /dev/null +++ b/arch/arm64/kvm/hyp/include/hyp/adjust_pc.h @@ -0,0 +1,56 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Guest PC manipulation helpers + * + * Copyright (C) 2012,2013 - ARM Ltd + * Copyright (C) 2020 - Google LLC + * Author: Marc Zyngier + */ + +#ifndef __ARM64_KVM_HYP_ADJUST_PC_H__ +#define __ARM64_KVM_HYP_ADJUST_PC_H__ + +#include +#include + +static inline void kvm_skip_instr(struct kvm_vcpu *vcpu) +{ + if (vcpu_mode_is_32bit(vcpu)) { + kvm_skip_instr32(vcpu); + } else { + *vcpu_pc(vcpu) += 4; + *vcpu_cpsr(vcpu) &= ~PSR_BTYPE_MASK; + } + + /* advance the singlestep state machine */ + *vcpu_cpsr(vcpu) &= ~DBG_SPSR_SS; +} + +/* + * Skip
Re: [PATCH 2/2] KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
On 08/11/2020 11:27 AM, Will Deacon wrote: When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG(): | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper | INFO: lockdep is turned off. | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0x0/0x284 | show_stack+0x1c/0x28 | dump_stack+0xf0/0x1a4 | ___might_sleep+0x2bc/0x2cc | unmap_stage2_range+0x160/0x1ac | kvm_unmap_hva_range+0x1a0/0x1c8 | kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8 | __mmu_notifier_invalidate_range_start+0x218/0x31c | mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0 | __oom_reap_task_mm+0x128/0x268 | oom_reap_task+0xac/0x298 | oom_reaper+0x178/0x17c | kthread+0x1e4/0x1fc | ret_from_fork+0x10/0x30 Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags. Cc: Fixes: 8b3405e345b5 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd") Cc: Marc Zyngier Cc: Suzuki K Poulose Cc: James Morse Signed-off-by: Will Deacon --- Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvmtool: arm64: Report missing support for 32bit guests
Hi Marc On 07/01/2020 04:42 PM, Marc Zyngier wrote: On 2020-07-01 15:20, Suzuki K Poulose wrote: When the host doesn't support 32bit guests, the kvmtool fails without a proper message on what is wrong. i.e, $ lkvm run -c 1 Image --aarch32 # lkvm run -k Image -m 256 -c 1 --name guest-105618 Fatal: Unable to initialise vcpu Given that there is no other easy way to check if the host supports 32bit guests, it is always good to report this by checking the capability, rather than leaving the users to hunt this down by looking at the code! After this patch: $ lkvm run -c 1 Image --aarch32 # lkvm run -k Image -m 256 -c 1 --name guest-105695 Fatal: 32bit guests are not supported Fancy! Cc: Will Deacon Reported-by: Sami Mujawar Signed-off-by: Suzuki K Poulose --- arm/kvm-cpu.c | 4 1 file changed, 4 insertions(+) diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c index 554414f..2acecae 100644 --- a/arm/kvm-cpu.c +++ b/arm/kvm-cpu.c @@ -46,6 +46,10 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id) .features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id) }; + if (kvm->cfg.arch.aarch32_guest && + !kvm__supports_extension(kvm, KVM_CAP_ARM_EL1_32BIT)) Can you please check that this still compiles for 32bit host? Yes, it does. I have built this on an arm32 rootfs with make ARCH=arm. The kvm->cfg.arch is common across arm/arm64 and is defined here : arm/include/arm-common/kvm-config-arch.h And the aarch32 command line option is only available on aarch64 host. So this is safe on an arm32 host. + die("32bit guests are not supported\n"); + vcpu = calloc(1, sizeof(struct kvm_cpu)); if (!vcpu) return NULL; With the above detail checked, Acked-by: Marc Zyngier Thanks Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] kvmtool: arm64: Report missing support for 32bit guests
When the host doesn't support 32bit guests, the kvmtool fails without a proper message on what is wrong. i.e, $ lkvm run -c 1 Image --aarch32 # lkvm run -k Image -m 256 -c 1 --name guest-105618 Fatal: Unable to initialise vcpu Given that there is no other easy way to check if the host supports 32bit guests, it is always good to report this by checking the capability, rather than leaving the users to hunt this down by looking at the code! After this patch: $ lkvm run -c 1 Image --aarch32 # lkvm run -k Image -m 256 -c 1 --name guest-105695 Fatal: 32bit guests are not supported Cc: Will Deacon Reported-by: Sami Mujawar Signed-off-by: Suzuki K Poulose --- arm/kvm-cpu.c | 4 1 file changed, 4 insertions(+) diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c index 554414f..2acecae 100644 --- a/arm/kvm-cpu.c +++ b/arm/kvm-cpu.c @@ -46,6 +46,10 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id) .features = ARM_VCPU_FEATURE_FLAGS(kvm, cpu_id) }; + if (kvm->cfg.arch.aarch32_guest && + !kvm__supports_extension(kvm, KVM_CAP_ARM_EL1_32BIT)) + die("32bit guests are not supported\n"); + vcpu = calloc(1, sizeof(struct kvm_cpu)); if (!vcpu) return NULL; -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH V4 06/17] arm64/cpufeature: Introduce ID_MMFR5 CPU register
On 05/19/2020 10:40 AM, Anshuman Khandual wrote: This adds basic building blocks required for ID_MMFR5 CPU register which provides information about the implemented memory model and memory management support in AArch32 state. This is added per ARM DDI 0487F.a specification. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: Mark Rutland Cc: James Morse Cc: Suzuki K Poulose Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-ker...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH V4 05/17] arm64/cpufeature: Introduce ID_DFR1 CPU register
On 05/19/2020 10:40 AM, Anshuman Khandual wrote: This adds basic building blocks required for ID_DFR1 CPU register which provides top level information about the debug system in AArch32 state. This is added per ARM DDI 0487F.a specification. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: Mark Rutland Cc: James Morse Cc: Suzuki K Poulose Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-ker...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 600ce237c487..faf644a66e89 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -457,6 +457,11 @@ static const struct arm64_ftr_bits ftr_id_dfr0[] = { ARM64_FTR_END, }; +static const struct arm64_ftr_bits ftr_id_dfr1[] = { + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR1_MTPMU_SHIFT, 4, 0), + ARM64_FTR_END, +}; + diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c index cb79b083f97f..50a281703d9d 100644 --- a/arch/arm64/kernel/cpuinfo.c +++ b/arch/arm64/kernel/cpuinfo.c @@ -362,6 +362,7 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) /* Update the 32bit ID registers only if AArch32 is implemented */ if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { info->reg_id_dfr0 = read_cpuid(ID_DFR0_EL1); + info->reg_id_dfr1 = read_cpuid(ID_DFR1_EL1); info->reg_id_isar0 = read_cpuid(ID_ISAR0_EL1); info->reg_id_isar1 = read_cpuid(ID_ISAR1_EL1); info->reg_id_isar2 = read_cpuid(ID_ISAR2_EL1); diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index b784b156edb3..0723cfbff7e9 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1457,7 +1457,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { ID_SANITISED(MVFR2_EL1), ID_UNALLOCATED(3,3), ID_SANITISED(ID_PFR2_EL1), - ID_UNALLOCATED(3,5), + ID_HIDDEN(ID_DFR1_EL1), It might be a good idea to mention why this is HIDDEN in the description. With that : Reviewed-by : Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm/arm64: release kvm->mmu_lock in loop to prevent starvation
On 04/15/2020 09:42 AM, Jiang Yi wrote: Do cond_resched_lock() in stage2_flush_memslot() like what is done in unmap_stage2_range() and other places holding mmu_lock while processing a possibly large range of memory. Signed-off-by: Jiang Yi --- virt/kvm/arm/mmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e3b9ee268823..7315af2c52f8 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -417,16 +417,19 @@ static void stage2_flush_memslot(struct kvm *kvm, phys_addr_t next; pgd_t *pgd; pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr); do { next = stage2_pgd_addr_end(kvm, addr, end); if (!stage2_pgd_none(kvm, *pgd)) stage2_flush_puds(kvm, pgd, addr, next); + + if (next != end) + cond_resched_lock(>mmu_lock); } while (pgd++, addr = next, addr != end); } Given that this is called under the srcu_lock this looks good to me: Reviewed-by: Suzuki K Poulose /** * stage2_flush_vm - Invalidate cache for pages mapped in stage 2 * @kvm: The struct kvm pointer * * Go through the stage 2 page tables and invalidate any cache lines ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH V3 05/16] arm64/cpufeature: Introduce ID_DFR1 CPU register
On 05/02/2020 02:33 PM, Anshuman Khandual wrote: This adds basic building blocks required for ID_DFR1 CPU register which provides top level information about the debug system in AArch32 state. This is added per ARM DDI 0487F.a specification. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: Mark Rutland Cc: James Morse Cc: Suzuki K Poulose Cc: kvmarm@lists.cs.columbia.edu Cc: linux-arm-ker...@lists.infradead.org Cc: linux-ker...@vger.kernel.org Suggested-by: Will Deacon Signed-off-by: Anshuman Khandual --- arch/arm64/include/asm/cpu.h| 1 + arch/arm64/include/asm/sysreg.h | 3 +++ arch/arm64/kernel/cpufeature.c | 10 ++ arch/arm64/kernel/cpuinfo.c | 1 + arch/arm64/kvm/sys_regs.c | 2 +- 5 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index 464e828a994d..d9a78bdec409 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -33,6 +33,7 @@ struct cpuinfo_arm64 { u64 reg_id_aa64zfr0; u32 reg_id_dfr0; + u32 reg_id_dfr1; u32 reg_id_isar0; u32 reg_id_isar1; u32 reg_id_isar2; diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index c977449e02db..2e1e922e1409 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -154,6 +154,7 @@ #define SYS_MVFR1_EL1 sys_reg(3, 0, 0, 3, 1) #define SYS_MVFR2_EL1 sys_reg(3, 0, 0, 3, 2) #define SYS_ID_PFR2_EL1 sys_reg(3, 0, 0, 3, 4) +#define SYS_ID_DFR1_EL1sys_reg(3, 0, 0, 3, 5) #define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0) #define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1) @@ -763,6 +764,8 @@ #define ID_ISAR4_WITHSHIFTS_SHIFT 4 #define ID_ISAR4_UNPRIV_SHIFT 0 +#define ID_DFR1_MTPMU_SHIFT 0 + #define ID_ISAR0_DIVIDE_SHIFT 24 #define ID_ISAR0_DEBUG_SHIFT 20 #define ID_ISAR0_COPROC_SHIFT 16 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index a8247bf92959..2ce952d9668d 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -451,6 +451,11 @@ static const struct arm64_ftr_bits ftr_id_dfr0[] = { ARM64_FTR_END, }; +static const struct arm64_ftr_bits ftr_id_dfr1[] = { + S_ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_DFR1_MTPMU_SHIFT, 4, 0), - ID_UNALLOCATED(3,5), + ID_SANITISED(ID_DFR1_EL1), ID_UNALLOCATED(3,6), ID_UNALLOCATED(3,7), IIUC, we should not expose the MTPMU to the KVM guests. Either we could drop this entire patch, or we should emulate the MTPMU to 0 in KVM. Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 3/4] KVM: arm64: Change CONFIG_KVM to a menuconfig entry
On 04/21/2020 02:17 PM, Fuad Tabba wrote: From: Will Deacon Changing CONFIG_KVM to be a 'menuconfig' entry in Kconfig mean that we can straightforwardly enumerate optional features, such as the virtual PMU device as dependent options. Signed-off-by: Will Deacon Signed-off-by: Fuad Tabba Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 1/4] KVM: arm64: Kill off CONFIG_KVM_ARM_HOST
On 04/21/2020 02:17 PM, Fuad Tabba wrote: From: Will Deacon CONFIG_KVM_ARM_HOST is just a proxy for CONFIG_KVM, so remove it in favour of the latter. Signed-off-by: Will Deacon Signed-off-by: Fuad Tabba Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 2/4] KVM: arm64: Update help text
On 04/21/2020 02:17 PM, Fuad Tabba wrote: From: Will Deacon arm64 KVM supports 16k pages since 02e0b7600f83 ("arm64: kvm: Add support for 16K pages"), so update the Kconfig help text accordingly. Signed-off-by: Will Deacon Signed-off-by: Fuad Tabba --- arch/arm64/kvm/Kconfig | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index ce724e526689..d2cf4f099454 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig @@ -44,8 +44,6 @@ config KVM select TASK_DELAY_ACCT ---help--- Support hosting virtualized guest machines. - We don't support KVM with 16K page tables yet, due to the multiple - levels of fake page tables. If unsure, say N. Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 04/26] arm64: Detect the ARMv8.4 TTL feature
On 04/22/2020 01:00 PM, Marc Zyngier wrote: In order to reduce the cost of TLB invalidation, the ARMv8.4 TTL feature allows TLBs to be issued with a level allowing for quicker invalidation. Let's detect the feature for now. Further patches will implement its actual usage. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/cpucaps.h | 3 ++- arch/arm64/include/asm/sysreg.h | 1 + arch/arm64/kernel/cpufeature.c | 11 +++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index 8eb5a088ae658..cabb0c49a1d11 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -61,7 +61,8 @@ #define ARM64_HAS_AMU_EXTN51 #define ARM64_HAS_ADDRESS_AUTH52 #define ARM64_HAS_GENERIC_AUTH53 +#define ARM64_HAS_ARMv8_4_TTL 54 -#define ARM64_NCAPS54 +#define ARM64_NCAPS55 #endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 5d10c9148e844..79cf186b7e471 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -726,6 +726,7 @@ /* id_aa64mmfr2 */ #define ID_AA64MMFR2_E0PD_SHIFT 60 +#define ID_AA64MMFR2_TTL_SHIFT 48 #define ID_AA64MMFR2_FWB_SHIFT40 #define ID_AA64MMFR2_AT_SHIFT 32 #define ID_AA64MMFR2_LVA_SHIFT16 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9892a845d06c9..d8ab4f1e93bee 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -252,6 +252,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr1[] = { static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_E0PD_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_TTL_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0), @@ -1630,6 +1631,16 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_cpuid_feature, .cpu_enable = cpu_has_fwb, }, + { + .desc = "ARMv8.4 Translation Table Level", + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .capability = ARM64_HAS_ARMv8_4_TTL, + .sys_reg = SYS_ID_AA64MMFR2_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64MMFR2_TTL_SHIFT, + .min_field_value = 1, + .matches = has_cpuid_feature, + }, Reviewed-by : Suzuki K Polose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
On 04/22/2020 03:07 PM, Marc Zyngier wrote: Hi Suzuki, On 2020-04-22 14:40, Suzuki K Poulose wrote: Hi Marc, On 04/22/2020 01:00 PM, Marc Zyngier wrote: With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can advertise the supported Stage-2 page sizes. Let's check this at boot time. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/include/asm/sysreg.h | 3 +++ arch/arm64/kernel/cpufeature.c | 8 +++ arch/arm64/kvm/reset.c | 40 --- virt/kvm/arm/arm.c | 4 +--- 5 files changed, 50 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 32c8a675e5a4a..7dd8fefa6aecd 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void) void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu); void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu); -void kvm_set_ipa_limit(void); +int kvm_set_ipa_limit(void); #define __KVM_HAVE_ARCH_VM_ALLOC struct kvm *kvm_arch_alloc_vm(void); diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index ebc6224328318..5d10c9148e844 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -686,6 +686,9 @@ #define ID_AA64ZFR0_SVEVER_SVE2 0x1 /* id_aa64mmfr0 */ +#define ID_AA64MMFR0_TGRAN4_2_SHIFT 40 +#define ID_AA64MMFR0_TGRAN64_2_SHIFT 36 +#define ID_AA64MMFR0_TGRAN16_2_SHIFT 32 #define ID_AA64MMFR0_TGRAN4_SHIFT 28 #define ID_AA64MMFR0_TGRAN64_SHIFT 24 #define ID_AA64MMFR0_TGRAN16_SHIFT 20 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9fac745aa7bb2..9892a845d06c9 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = { }; static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { + /* + * Page size not being supported at Stage-2 are not fatal. You + * just give up KVM if PAGE_SIZE isn't supported there. Go fix + * your favourite nesting hypervisor. + */ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1), One minor issue with this is, if we get a system with cpus having values 0 and 2 (both of which indicates the stage-2 support), we might reset the value to 1 for the feature indicating, we don't support and block KVM. But, we can blame the nesting hypervisor for not emulating this to (2). Do we need a comment to make this explicit here ? Sure. How about something like: "There is a small corner case where the hypervisor could explicitly advertise a given granule size at Stage-2 (value 2) on some vCPUs, and use the fallback to Stage-1 (value 0) for other vCPUs. Although this is not forbidden by the architecture, it indicates that the hypervisor is being silly (or buggy). We make no effort to cope with this and pretend that if these fields are inconsistent across vCPUs, then it isn't worth trying to bring KVM up." Looks fine to me. Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 02/26] KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h
Hi Marc, On 04/22/2020 01:00 PM, Marc Zyngier wrote: Having __load_guest_stage2 in kvm_hyp.h is quickly going to trigger a circular include problem. In order to avoid this, let's move it to kvm_mmu.h, where it will be a better fit anyway. In the process, drop the __hyp_text annotation, which doesn't help as the function is marked as __always_inline. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_hyp.h | 18 -- arch/arm64/include/asm/kvm_mmu.h | 17 + 2 files changed, 17 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h index fe57f60f06a89..dcb63bf941057 100644 --- a/arch/arm64/include/asm/kvm_hyp.h +++ b/arch/arm64/include/asm/kvm_hyp.h @@ -10,7 +10,6 @@ #include #include #include -#include #include #define __hyp_text __section(.hyp.text) notrace @@ -88,22 +87,5 @@ void deactivate_traps_vhe_put(void); u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt); void __noreturn __hyp_do_panic(unsigned long, ...); -/* - * Must be called from hyp code running at EL2 with an updated VTTBR - * and interrupts disabled. - */ -static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm) -{ - write_sysreg(kvm->arch.vtcr, vtcr_el2); - write_sysreg(kvm_get_vttbr(kvm), vttbr_el2); - - /* -* ARM errata 1165522 and 1530923 require the actual execution of the -* above before we can switch to the EL1/EL0 translation regime used by -* the guest. -*/ - asm(ALTERNATIVE("nop", "isb", ARM64_WORKAROUND_SPECULATIVE_AT_VHE)); -} - #endif /* __ARM64_KVM_HYP_H__ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 30b0e8d6b8953..5ba1310639ec6 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -604,5 +604,22 @@ static __always_inline u64 kvm_get_vttbr(struct kvm *kvm) return kvm_phys_to_vttbr(baddr) | vmid_field | cnp; } +/* + * Must be called from hyp code running at EL2 with an updated VTTBR + * and interrupts disabled. + */ +static __always_inline void __load_guest_stage2(struct kvm *kvm) +{ + write_sysreg(kvm->arch.vtcr, vtcr_el2); + write_sysreg(kvm_get_vttbr(kvm), vttbr_el2); + + /* +* ARM erratum 1165522 requires the actual execution of the above Is it intentional to drop the reference to errata 1530923 ? Otherwise : Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 01/26] KVM: arm64: Check advertised Stage-2 page size capability
Hi Marc, On 04/22/2020 01:00 PM, Marc Zyngier wrote: With ARMv8.5-GTG, the hardware (or more likely a hypervisor) can advertise the supported Stage-2 page sizes. Let's check this at boot time. Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/include/asm/sysreg.h | 3 +++ arch/arm64/kernel/cpufeature.c| 8 +++ arch/arm64/kvm/reset.c| 40 --- virt/kvm/arm/arm.c| 4 +--- 5 files changed, 50 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 32c8a675e5a4a..7dd8fefa6aecd 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -670,7 +670,7 @@ static inline int kvm_arm_have_ssbd(void) void kvm_vcpu_load_sysregs(struct kvm_vcpu *vcpu); void kvm_vcpu_put_sysregs(struct kvm_vcpu *vcpu); -void kvm_set_ipa_limit(void); +int kvm_set_ipa_limit(void); #define __KVM_HAVE_ARCH_VM_ALLOC struct kvm *kvm_arch_alloc_vm(void); diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index ebc6224328318..5d10c9148e844 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -686,6 +686,9 @@ #define ID_AA64ZFR0_SVEVER_SVE2 0x1 /* id_aa64mmfr0 */ +#define ID_AA64MMFR0_TGRAN4_2_SHIFT40 +#define ID_AA64MMFR0_TGRAN64_2_SHIFT 36 +#define ID_AA64MMFR0_TGRAN16_2_SHIFT 32 #define ID_AA64MMFR0_TGRAN4_SHIFT 28 #define ID_AA64MMFR0_TGRAN64_SHIFT24 #define ID_AA64MMFR0_TGRAN16_SHIFT20 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9fac745aa7bb2..9892a845d06c9 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -208,6 +208,14 @@ static const struct arm64_ftr_bits ftr_id_aa64zfr0[] = { }; static const struct arm64_ftr_bits ftr_id_aa64mmfr0[] = { + /* +* Page size not being supported at Stage-2 are not fatal. You +* just give up KVM if PAGE_SIZE isn't supported there. Go fix +* your favourite nesting hypervisor. +*/ + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN4_2_SHIFT, 4, 1), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN64_2_SHIFT, 4, 1), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_EXACT, ID_AA64MMFR0_TGRAN16_2_SHIFT, 4, 1), One minor issue with this is, if we get a system with cpus having values 0 and 2 (both of which indicates the stage-2 support), we might reset the value to 1 for the feature indicating, we don't support and block KVM. But, we can blame the nesting hypervisor for not emulating this to (2). Do we need a comment to make this explicit here ? Otherwise this looks fine to me Suzuki /* * We already refuse to boot CPUs that don't support our configured * page size, so we can only detect mismatches for a page size other diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 30b7ea680f66c..241db35a7ef4f 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -9,6 +9,7 @@ */ #include +#include #include #include #include @@ -340,11 +341,42 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) return ret; } -void kvm_set_ipa_limit(void) +int kvm_set_ipa_limit(void) { - unsigned int ipa_max, pa_max, va_max, parange; + unsigned int ipa_max, pa_max, va_max, parange, tgran_2; + u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1); - parange = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1) & 0x7; + /* +* Check with ARMv8.5-GTG that our PAGE_SIZE is supported at +* Stage-2. If not, things will stop very quickly. +*/ + switch (PAGE_SIZE) { + default: + case SZ_4K: + tgran_2 = ID_AA64MMFR0_TGRAN4_2_SHIFT; + break; + case SZ_16K: + tgran_2 = ID_AA64MMFR0_TGRAN16_2_SHIFT; + break; + case SZ_64K: + tgran_2 = ID_AA64MMFR0_TGRAN64_2_SHIFT; + break; + } + + switch (FIELD_GET(0xFUL << tgran_2, mmfr0)) { + default: + case 1: + kvm_err("PAGE_SIZE not supported at Stage-2, giving up\n"); + return -EINVAL; + case 0: + kvm_debug("PAGE_SIZE supported at Stage-2 (default)\n"); + break; + case 2: + kvm_debug("PAGE_SIZE supported at Stage-2 (advertised)\n"); + break; + } + + parange = mmfr0 & 0x7; pa_max = id_aa64mmfr0_parange_to_phys_shift(parange); /* Clamp the IPA limit to the PA size supported by the kernel */ @@ -378,6 +410,8 @@ void kvm_set_ipa_limit(void) "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max); kvm_ipa_limit = ipa_max; kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit); + +
Re: [PATCH v2 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
On 04/21/2020 03:29 PM, Will Deacon wrote: Now that Suzuki isn't within throwing distance, I thought I'd better add a rough overview comment to cpufeature.c so that it doesn't take me days to remember how it works next time. Signed-off-by: Will Deacon --- Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On 04/15/2020 02:22 PM, Marc Zyngier wrote: On Wed, 15 Apr 2020 14:15:51 +0100 Suzuki K Poulose wrote: On 04/15/2020 11:14 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon One of the other important missing sanity check for KVM is the VMID width check. I will code something up. Cheers! Do we handle things like the IPA size already? Good point. No, we don't. I will include this too. There is also the question of the ARMv8.5-GTG extension. I have a patch that treats it as non-strict, but that approach would fail with KVM if we online a late CPU without support for the right page size at S2. Good point. Again this can be added to the list of checks performed on the hot-plugged CPUs along with IPA, VMID width. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On 04/15/2020 01:29 PM, Will Deacon wrote: On Wed, Apr 15, 2020 at 12:37:31PM +0100, Suzuki K Poulose wrote: On 04/15/2020 11:58 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: We don't need to be quite as strict about mismatched AArch32 support, which is good because the friendly hardware folks have been busy mismatching this to their hearts' content. * We don't care about EL2 or EL3 (there are silly comments concerning the latter, so remove those) * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled gracefully when a mismatch occurs * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled s/EL1/EL0 gracefully when a mismatch occurs Relax the AArch32 checks to FTR_NONSTRICT. Agreed. We should do something similar for the features exposed by the ELF_HWCAP, of course in a separate series. Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from the sanitised feature register values. What am I missing? sorry, that was cryptic. I was suggesting to relax the ftr fields to NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps). Ah, gotcha. Given that the HWCAPs usually describe EL0 features, I say we can punt this down the road until people give us hardware with mismatched AArch32 at EL0. Btw, this is not just mismatched AArch32, but mismatched AArch64 HWCAPs too, which I believe exists. Anyways as you said, we can delay this until we get the reports :-) Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 8/8] arm64: cpufeature: Add an overview comment for the cpufeature framework
Hi Will, On 04/14/2020 10:31 PM, Will Deacon wrote: Now that Suzuki isn't within throwing distance, I thought I'd better add a rough overview comment to cpufeature.c so that it doesn't take me days to remember how it works next time. Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 43 ++ 1 file changed, 43 insertions(+) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 680a453ca8c4..421ca99dc8fc 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -3,6 +3,49 @@ * Contains CPU feature definitions * * Copyright (C) 2015 ARM Ltd. + * + * A note for the weary kernel hacker: the code here is confusing and hard to + * follow! That's partly because it's solving a nasty problem, but also because + * there's a little bit of over-abstraction that tends to obscure what's going + * on behind a maze of helper functions and macros. Thanks for writing this up ! + * + * The basic problem is that hardware folks have started gluing together CPUs + * with distinct architectural features; in some cases even creating SoCs where + * user-visible instructions are available only on a subset of the available + * cores. We try to address this by snapshotting the feature registers of the + * boot CPU and comparing these with the feature registers of each secondary + * CPU when bringing them up. If there is a mismatch, then we update the + * snapshot state to indicate the lowest-common denominator of the feature, + * known as the "safe" value. This snapshot state can be queried to view the I am not sure if the following is implied above. 1) Against the "snapshot" state, where mismatches triggers updating the "snapshot" state to reflect the "safe" value. 2) Compared against the CPU feature registers of *the boot CPU* for "FTR_STRICT" fields and any mismatch triggers TAINT_CPU_OUT_OF_SPEC. This makes sure that warning is generated for each OUT_OF_SPEC secondary CPU. + * "sanitised" value of a feature register. + * + * The sanitised register values are used to decide which capabilities we + * have in the system. These may be in the form of traditional "hwcaps" + * advertised to userspace or internal "cpucaps" which are used to configure + * things like alternative patching and static keys. While a feature mismatch + * may result in a TAINT_CPU_OUT_OF_SPEC kernel taint, a capability mismatch + * may prevent a CPU from being onlined at all. + * + * Some implementation details worth remembering: + * + * - Mismatched features are *always* sanitised to a "safe" value, which + * usually indicates that the feature is not supported. + * + * - A mismatched feature marked with FTR_STRICT will cause a "SANITY CHECK" + * warning when onlining an offending CPU and the kernel will be tainted + * with TAINT_CPU_OUT_OF_SPEC. As mentioned above, this check is against that of the "boot CPU" register state, which may not be implicit from the statement. + * + * - Features marked as FTR_VISIBLE have their sanitised value visible to + * userspace. FTR_VISIBLE features in registers that are only visible + * to EL0 by trapping *must* have a corresponding HWCAP so that late + * onlining of CPUs cannot lead to features disappearing at runtime. + * As you mentioned in the other response we could add information about the guest view, something like : - KVM exposes the sanitised value of the feature registers to the guests and is not affected by the FTR_VISIBLE. However, depending on the individual feature support in the hypervisor, some of the fields may be capped/limited. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On 04/15/2020 11:14 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:13:54AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon One of the other important missing sanity check for KVM is the VMID width check. I will code something up. Cheers! Do we handle things like the IPA size already? Good point. No, we don't. I will include this too. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On 04/15/2020 11:58 AM, Will Deacon wrote: On Wed, Apr 15, 2020 at 11:50:58AM +0100, Suzuki K Poulose wrote: On 04/14/2020 10:31 PM, Will Deacon wrote: We don't need to be quite as strict about mismatched AArch32 support, which is good because the friendly hardware folks have been busy mismatching this to their hearts' content. * We don't care about EL2 or EL3 (there are silly comments concerning the latter, so remove those) * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled gracefully when a mismatch occurs * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled s/EL1/EL0 gracefully when a mismatch occurs Relax the AArch32 checks to FTR_NONSTRICT. Agreed. We should do something similar for the features exposed by the ELF_HWCAP, of course in a separate series. Hmm, I didn't think we needed to touch the HWCAPs, as they're derived from the sanitised feature register values. What am I missing? sorry, that was cryptic. I was suggesting to relax the ftr fields to NONSTRICT for the fields covered by ELF HWCAPs (and other CPU hwcaps). Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 7/8] arm64: cpufeature: Relax checks for AArch32 support at EL[0-2]
On 04/14/2020 10:31 PM, Will Deacon wrote: We don't need to be quite as strict about mismatched AArch32 support, which is good because the friendly hardware folks have been busy mismatching this to their hearts' content. * We don't care about EL2 or EL3 (there are silly comments concerning the latter, so remove those) * EL1 support is gated by the ARM64_HAS_32BIT_EL1 capability and handled gracefully when a mismatch occurs * EL1 support is gated by the ARM64_HAS_32BIT_EL0 capability and handled s/EL1/EL0 gracefully when a mismatch occurs Relax the AArch32 checks to FTR_NONSTRICT. Agreed. We should do something similar for the features exposed by the ELF_HWCAP, of course in a separate series. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 6/8] arm64: cpufeature: Relax AArch32 system checks if EL1 is 64-bit only
On 04/14/2020 10:31 PM, Will Deacon wrote: If AArch32 is not supported at EL1, the AArch32 feature register fields no longer advertise support for some system features: * ISAR4.SMC * PFR1.{Virt_frac, Sec_frac, Virtualization, Security, ProgMod} In which case, we don't need to emit "SANITY CHECK" failures for all of them. Add logic to relax the strictness of individual feature register fields at runtime and use this for the fields above if 32-bit EL1 is not supported. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 5/8] arm64: cpufeature: Factor out checking of AArch32 features
On 04/14/2020 10:31 PM, Will Deacon wrote: update_cpu_features() is pretty large, so split out the checking of the AArch32 features into a separate function and call it after checking the AArch64 features. Signed-off-by: Will Deacon --- arch/arm64/kernel/cpufeature.c | 108 +++-- 1 file changed, 61 insertions(+), 47 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 7dfcdd9e75c1..32828a77acc3 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -715,6 +715,65 @@ static int check_update_ftr_reg(u32 sys_id, int cpu, u64 val, u64 boot) return 1; } +static int update_32bit_cpu_features(int cpu, struct cpuinfo_arm64 *info, +struct cpuinfo_arm64 *boot) +{ ... - if (id_aa64pfr0_sve(info->reg_id_aa64pfr0)) { taint |= check_update_ftr_reg(SYS_ZCR_EL1, cpu, info->reg_zcr, boot->reg_zcr); @@ -845,6 +857,8 @@ void update_cpu_features(int cpu, sve_update_vq_map(); } + taint |= update_32bit_cpu_features(cpu, info, boot); + This relies on the assumption that the id_aa64pfr0 has been sanitised. It may be worth adding a comment to make sure people (hacking the kernel) don't move this around and break that dependency. Either ways: Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 4/8] arm64: cpufeature: Remove redundant call to id_aa64pfr0_32bit_el0()
On 04/14/2020 10:31 PM, Will Deacon wrote: There's no need to call id_aa64pfr0_32bit_el0() twice because the sanitised value of ID_AA64PFR0_EL1 has already been updated for the CPU being onlined. Remove the redundant function call. Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 838fe5cc8d7e..7dfcdd9e75c1 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -792,9 +792,7 @@ void update_cpu_features(int cpu, * If we have AArch32, we care about 32-bit features for compat. * If the system doesn't support AArch32, don't update them. */ - if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1)) && - id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) { - + if (id_aa64pfr0_32bit_el0(read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1))) { taint |= check_update_ftr_reg(SYS_ID_DFR0_EL1, cpu, info->reg_id_dfr0, boot->reg_id_dfr0); taint |= check_update_ftr_reg(SYS_ID_ISAR0_EL1, cpu, ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 3/8] arm64: cpufeature: Add CPU capability for AArch32 EL1 support
On 04/14/2020 10:31 PM, Will Deacon wrote: Although we emit a "SANITY CHECK" warning and taint the kernel if we detect a CPU mismatch for AArch32 support at EL1, we still online the CPU with disastrous consequences for any running 32-bit VMs. Introduce a capability for AArch32 support at EL1 so that late onlining of incompatible CPUs is forbidden. Signed-off-by: Will Deacon One of the other important missing sanity check for KVM is the VMID width check. I will code something up. Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/8] arm64: cpufeature: Spell out register fields for ID_ISAR4 and ID_PFR1
On 04/14/2020 10:31 PM, Will Deacon wrote: In preparation for runtime updates to the strictness of some AArch32 features, spell out the register fields for ID_ISAR4 and ID_PFR1 to make things clearer to read. Note that this isn't functionally necessary, as the feature arrays themselves are not modified dynamically and remain 'const'. Signed-off-by: Will Deacon --- arch/arm64/include/asm/sysreg.h | 17 + arch/arm64/kernel/cpufeature.c | 28 ++-- 2 files changed, 43 insertions(+), 2 deletions(-) Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/8] arm64: cpufeature: Relax check for IESB support
Hi Will On 04/14/2020 10:31 PM, Will Deacon wrote: From: Sai Prakash Ranjan We don't care if IESB is supported or not as we always set SCTLR_ELx.IESB and, if it works, that's really great. Relax the ID_AA64MMFR2.IESB cpufeature check so that we don't warn and taint if it's mismatched. Signed-off-by: Sai Prakash Ranjan [will: rewrote commit message] Signed-off-by: Will Deacon Reviewed-by: Suzuki K Poulose --- arch/arm64/kernel/cpufeature.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 9fac745aa7bb..63df28e6a425 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -247,7 +247,7 @@ static const struct arm64_ftr_bits ftr_id_aa64mmfr2[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_FWB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_AT_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LVA_SHIFT, 4, 0), - ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_IESB_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_LSM_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_UAO_SHIFT, 4, 0), ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64MMFR2_CNP_SHIFT, 4, 0), ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 0/6] Introduce ID_PFR2 and other CPU feature changes
Hi Anshuman On 02/14/2020 04:23 AM, Anshuman Khandual wrote: On 01/28/2020 06:09 PM, Anshuman Khandual wrote: This series is primarily motivated from an adhoc list from Mark Rutland during our ID_ISAR6 discussion [1]. Besides, it also includes a patch which does macro replacement for various open bits shift encodings in various CPU ID registers. This series is based on linux-next 20200124. [1] https://patchwork.kernel.org/patch/11287805/ Is there anything else apart from these changes which can be accommodated in this series, please do let me know. Thank you. Just a gentle ping. Any updates, does this series looks okay ? Is there anything else related to CPU ID register feature bits, which can be added up here. FWIW, the series still applies on v5.6-rc1. Sorry for the delay ! The series looks good to me, except for some minor comments. Please see the individual patches. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/6] arm64/cpufeature: Introduce ID_PFR2 CPU register
On 01/28/2020 12:39 PM, Anshuman Khandual wrote: This adds basic building blocks required for ID_PFR2 CPU register which provides information about the AArch32 programmers model which must be interpreted along with ID_PFR0 and ID_PFR1 CPU registers. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: James Morse Cc: Suzuki K Poulose Cc: Mark Rutland Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual Sorry for the delay ! Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 1/2] arm64: Combine workarounds for speculative AT errata
On 13/11/2019 11:41, Steven Price wrote: Cortex-A57/A72 have a similar errata to Cortex-A76 regarding speculation of the AT instruction. Since the workaround for A57/A72 doesn't require VHE, the restriction enforcing VHE for A76 can be removed by combining the workaround flag for both errata. So combine WORKAROUND_1165522 and WORKAROUND_1319367 into WORKAROUND_SPECULATIVE_AT. The majority of code is contained within VHE or NVHE specific functions, for the cases where the code is shared extra checks against has_vhe(). This also paves the way for adding a similar erratum for Cortex-A55. Signed-off-by: Steven Price diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 4f8187a4fc46..b801f8e832aa 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -744,6 +744,16 @@ static const struct midr_range erratum_1418040_list[] = { }; #endif +#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_AT +static const struct midr_range erratum_speculative_at_list[] = { +#ifdef CONFIG_ARM64_ERRATUM_1165522 + /* Cortex A76 r0p0 to r2p0 */ + MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 2, 0), +#endif + {}, +}; +#endif + const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { @@ -868,12 +878,11 @@ const struct arm64_cpu_capabilities arm64_errata[] = { ERRATA_MIDR_RANGE_LIST(erratum_1418040_list), }, #endif -#ifdef CONFIG_ARM64_ERRATUM_1165522 +#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_AT { - /* Cortex-A76 r0p0 to r2p0 */ .desc = "ARM erratum 1165522", - .capability = ARM64_WORKAROUND_1165522, - ERRATA_MIDR_RANGE(MIDR_CORTEX_A76, 0, 0, 2, 0), + .capability = ARM64_WORKAROUND_SPECULATIVE_AT, + ERRATA_MIDR_RANGE_LIST(erratum_speculative_at_list), }, #endif #ifdef CONFIG_ARM64_ERRATUM_1463225 @@ -910,7 +919,7 @@ const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_ERRATUM_1319367 { .desc = "ARM erratum 1319367", - .capability = ARM64_WORKAROUND_1319367, + .capability = ARM64_WORKAROUND_SPECULATIVE_AT, ERRATA_MIDR_RANGE_LIST(ca57_a72), }, #endif Have you tested this patch with both the errata CONFIGs turned on ? Having multiple entries for the same capability should trigger a WARNING at boot with init_cpu_hwcaps_indirect_list_from_array(). You could simply add the MIDRs to the midr_list and update the description to include all the Errata numbers. Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent.
On 26/09/2019 12:42, Jianyong Wu wrote: Currently, ptp_kvm modules implementation is only for x86 which includs large part of arch-specific code. This patch move all of those code into new arch related file in the same directory. Signed-off-by: Jianyong Wu ... +int kvm_arch_ptp_get_clock_fn(unsigned long *cycle, struct timespec64 *tspec, + struct clocksource **cs) diff --git a/include/asm-generic/ptp_kvm.h b/include/asm-generic/ptp_kvm.h new file mode 100644 index ..208e842bfa64 --- /dev/null +++ b/include/asm-generic/ptp_kvm.h +int kvm_arch_ptp_get_clock_fn(long *cycle, + struct timespec64 *tspec, void *cs); Conflicting types for kvm_arch_ptp_get_clock_fn() ? Suzuki
Re: [RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent.
On 27/09/2019 11:10, Jianyong Wu (Arm Technology China) wrote: Hi Suzuki, -Original Message- From: Suzuki K Poulose Sent: Thursday, September 26, 2019 9:06 PM To: Jianyong Wu (Arm Technology China) ; net...@vger.kernel.org; yangbo...@nxp.com; john.stu...@linaro.org; t...@linutronix.de; pbonz...@redhat.com; sean.j.christopher...@intel.com; m...@kernel.org; richardcoch...@gmail.com; Mark Rutland ; Will Deacon Cc: linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu; k...@vger.kernel.org; Steve Capper ; Kaly Xin (Arm Technology China) ; Justin He (Arm Technology China) ; nd Subject: Re: [RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent. Hi Jianyong, On 26/09/2019 12:42, Jianyong Wu wrote: Currently, ptp_kvm modules implementation is only for x86 which includs large part of arch-specific code. This patch move all of those code into new arch related file in the same directory. Signed-off-by: Jianyong Wu --- drivers/ptp/Makefile | 1 + drivers/ptp/{ptp_kvm.c => kvm_ptp.c} | 77 ++-- drivers/ptp/ptp_kvm_x86.c| 87 include/asm-generic/ptp_kvm.h| 12 4 files changed, 118 insertions(+), 59 deletions(-) rename drivers/ptp/{ptp_kvm.c => kvm_ptp.c} (63%) minor nit: Could we not skip renaming the file ? Given you are following the ptp_kvm_* for the arch specific files and the header files, wouldn't it be good to keep ptp_kvm.c ? If the module name ptp_kvm.ko is the same with its dependent object file ptp_kvm.o, warning will be given by compiler, So I change the ptp_kvm.c to kvm_ptp.c to avoid that conflict. Hmm, ok. How about ptp_kvm_common.c instead of kvm_ptp.c ? Suzuki
Re: [PATCH 1/5] arm64: Add ARM64_WORKAROUND_1319367 for all A57 and A72 versions
On 25/09/2019 12:19, Marc Zyngier wrote: Rework the EL2 vector hardening that is only selected for A57 and A72 so that the table can also be used for ARM64_WORKAROUND_1319367. Signed-off-by: Marc Zyngier Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH v4 2/5] ptp: Reorganize ptp_kvm modules to make it arch-independent.
Hi Jianyong, On 26/09/2019 12:42, Jianyong Wu wrote: Currently, ptp_kvm modules implementation is only for x86 which includs large part of arch-specific code. This patch move all of those code into new arch related file in the same directory. Signed-off-by: Jianyong Wu --- drivers/ptp/Makefile | 1 + drivers/ptp/{ptp_kvm.c => kvm_ptp.c} | 77 ++-- drivers/ptp/ptp_kvm_x86.c| 87 include/asm-generic/ptp_kvm.h| 12 4 files changed, 118 insertions(+), 59 deletions(-) rename drivers/ptp/{ptp_kvm.c => kvm_ptp.c} (63%) minor nit: Could we not skip renaming the file ? Given you are following the ptp_kvm_* for the arch specific files and the header files, wouldn't it be good to keep ptp_kvm.c ? Rest looks fine. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] MAINTAINERS: Fix spelling mistake in my name
Fix a typo in my name against in KVM-ARM reviewers. Signed-off-by: Suzuki K Poulose --- Will, Please could you pick this one too ? There might be conflict with the other updates. --- MAINTAINERS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index c144bd6a432e..ce5e40d91041 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8795,7 +8795,7 @@ KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64) M: Marc Zyngier R: James Morse R: Julien Thierry -R: Suzuki K Pouloze +R: Suzuki K Poulose L: linux-arm-ker...@lists.infradead.org (moderated for non-subscribers) L: kvmarm@lists.cs.columbia.edu T: git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git -- 2.21.0
Re: [PATCH 07/59] KVM: arm64: nv: Add EL2 system registers to vcpu context
Hi Marc, On 21/06/2019 10:37, Marc Zyngier wrote: From: Jintack Lim ARM v8.3 introduces a new bit in the HCR_EL2, which is the NV bit. When this bit is set, accessing EL2 registers in EL1 traps to EL2. In addition, executing the following instructions in EL1 will trap to EL2: tlbi, at, eret, and msr/mrs instructions to access SP_EL1. Most of the instructions that trap to EL2 with the NV bit were undef at EL1 prior to ARM v8.3. The only instruction that was not undef is eret. This patch sets up a handler for EL2 registers and SP_EL1 register accesses at EL1. The host hypervisor keeps those register values in memory, and will emulate their behavior. This patch doesn't set the NV bit yet. It will be set in a later patch once nested virtualization support is completed. Signed-off-by: Jintack Lim Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h | 37 +++- arch/arm64/include/asm/sysreg.h | 50 - arch/arm64/kvm/sys_regs.c | 74 --- 3 files changed, 154 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 4bcd9c1291d5..2d4290d2513a 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -173,12 +173,47 @@ enum vcpu_sysreg { APGAKEYLO_EL1, APGAKEYHI_EL1, - /* 32bit specific registers. Keep them at the end of the range */ + /* 32bit specific registers. */ DACR32_EL2, /* Domain Access Control Register */ IFSR32_EL2, /* Instruction Fault Status Register */ FPEXC32_EL2,/* Floating-Point Exception Control Register */ DBGVCR32_EL2, /* Debug Vector Catch Register */ + /* EL2 registers sorted ascending by Op0, Op1, CRn, CRm, Op2 */ + FIRST_EL2_SYSREG, + VPIDR_EL2 = FIRST_EL2_SYSREG, + /* Virtualization Processor ID Register */ + VMPIDR_EL2, /* Virtualization Multiprocessor ID Register */ + SCTLR_EL2, /* System Control Register (EL2) */ + ACTLR_EL2, /* Auxiliary Control Register (EL2) */ + HCR_EL2,/* Hypervisor Configuration Register */ + MDCR_EL2, /* Monitor Debug Configuration Register (EL2) */ + CPTR_EL2, /* Architectural Feature Trap Register (EL2) */ + HSTR_EL2, /* Hypervisor System Trap Register */ + HACR_EL2, /* Hypervisor Auxiliary Control Register */ + TTBR0_EL2, /* Translation Table Base Register 0 (EL2) */ + TTBR1_EL2, /* Translation Table Base Register 1 (EL2) */ + TCR_EL2,/* Translation Control Register (EL2) */ + VTTBR_EL2, /* Virtualization Translation Table Base Register */ + VTCR_EL2, /* Virtualization Translation Control Register */ + SPSR_EL2, /* EL2 saved program status register */ + ELR_EL2,/* EL2 exception link register */ + AFSR0_EL2, /* Auxiliary Fault Status Register 0 (EL2) */ + AFSR1_EL2, /* Auxiliary Fault Status Register 1 (EL2) */ + ESR_EL2,/* Exception Syndrome Register (EL2) */ + FAR_EL2,/* Hypervisor IPA Fault Address Register */ + HPFAR_EL2, /* Hypervisor IPA Fault Address Register */ + MAIR_EL2, /* Memory Attribute Indirection Register (EL2) */ + AMAIR_EL2, /* Auxiliary Memory Attribute Indirection Register (EL2) */ + VBAR_EL2, /* Vector Base Address Register (EL2) */ + RVBAR_EL2, /* Reset Vector Base Address Register */ + RMR_EL2,/* Reset Management Register */ + CONTEXTIDR_EL2, /* Context ID Register (EL2) */ + TPIDR_EL2, /* EL2 Software Thread ID Register */ + CNTVOFF_EL2,/* Counter-timer Virtual Offset register */ + CNTHCTL_EL2,/* Counter-timer Hypervisor Control register */ + SP_EL2, /* EL2 Stack Pointer */ Does it need to include the following registers Counter-timer Hyp Physical timer registers ? i.e, CNTHP_{CTL,CVAL,TVAL}_EL2. Or do we have some other magic with NV for the virtual EL2 ? Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 05/59] KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set
On 21/06/2019 10:37, Marc Zyngier wrote: From: Christoffer Dall Reset the VCPU with PSTATE.M = EL2h when the nested virtualization feature is enabled on the VCPU. Signed-off-by: Christoffer Dall Signed-off-by: Marc Zyngier --- arch/arm64/kvm/reset.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index 1140b4485575..675ca07dbb05 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -52,6 +52,11 @@ static const struct kvm_regs default_regs_reset = { PSR_F_BIT | PSR_D_BIT), }; +static const struct kvm_regs default_regs_reset_el2 = { + .regs.pstate = (PSR_MODE_EL2h | PSR_A_BIT | PSR_I_BIT | + PSR_F_BIT | PSR_D_BIT), +}; + static const struct kvm_regs default_regs_reset32 = { .regs.pstate = (PSR_AA32_MODE_SVC | PSR_AA32_A_BIT | PSR_AA32_I_BIT | PSR_AA32_F_BIT), @@ -302,6 +307,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu) if (!cpu_has_32bit_el1()) goto out; cpu_reset = _regs_reset32; + } else if (test_bit(KVM_ARM_VCPU_NESTED_VIRT, vcpu->arch.features)) { minor nit: "else if nested_virt_in_use(vcpu)" ? Either ways: Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v9 4/5] KVM: arm/arm64: remove pmc->bitmask
On 17/06/2019 16:43, Andrew Murray wrote: On Thu, Jun 13, 2019 at 05:50:43PM +0100, Suzuki K Poulose wrote: On 13/06/2019 10:39, Andrew Murray wrote: On Thu, Jun 13, 2019 at 08:30:51AM +0100, Julien Thierry wrote: index ae1e886d4a1a..88ce24ae0b45 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -47,7 +47,10 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) counter += perf_event_read_value(pmc->perf_event, , ); - return counter & pmc->bitmask; + if (select_idx != ARMV8_PMU_CYCLE_IDX) + counter = lower_32_bits(counter); Shouldn't this depend on PMCR.LC as well? If PMCR.LC is clear we only want the lower 32bits of the cycle counter. Yes that's correct. The hunk should look like this: - return counter & pmc->bitmask; + if (!(select_idx == ARMV8_PMU_CYCLE_IDX && + __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC)) + counter = lower_32_bits(counter); + + return counter; May be you could add a macro : #define vcpu_pmu_counter_is_64bit(vcpu, idx) ? Yes I think a helper would be useful - though I'd prefer the name 'kvm_pmu_idx_is_long_cycle_counter'. This seems a little clearer as you could otherwise argue that a chained counter is also 64 bits. When you get to add 64bit PMU counter (v8.5), this would be handy. So having it de-coupled from the cycle counter may be a good idea. Anyways, I leave that to you. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v9 5/5] KVM: arm/arm64: support chained PMU counters
On 12/06/2019 20:04, Andrew Murray wrote: ARMv8 provides support for chained PMU counters, where an event type of 0x001E is set for odd-numbered counters, the event counter will increment by one for each overflow of the preceding even-numbered counter. Let's emulate this in KVM by creating a 64 bit perf counter when a user chains two emulated counters together. For chained events we only support generating an overflow interrupt on the high counter. We use the attributes of the low counter to determine the attributes of the perf event. Suggested-by: Marc Zyngier Signed-off-by: Andrew Murray Reviewed-by: Julien Thierry Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v9 4/5] KVM: arm/arm64: remove pmc->bitmask
On 13/06/2019 10:39, Andrew Murray wrote: On Thu, Jun 13, 2019 at 08:30:51AM +0100, Julien Thierry wrote: Hi Andrew, On 12/06/2019 20:04, Andrew Murray wrote: We currently use pmc->bitmask to determine the width of the pmc - however it's superfluous as the pmc index already describes if the pmc is a cycle counter or event counter. The architecture clearly describes the widths of these counters. Let's remove the bitmask to simplify the code. Signed-off-by: Andrew Murray --- include/kvm/arm_pmu.h | 1 - virt/kvm/arm/pmu.c| 19 +-- 2 files changed, 9 insertions(+), 11 deletions(-) diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index b73f31baca52..2f0e28dc5a9e 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -28,7 +28,6 @@ struct kvm_pmc { u8 idx; /* index into the pmu->pmc array */ struct perf_event *perf_event; - u64 bitmask; }; struct kvm_pmu { diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index ae1e886d4a1a..88ce24ae0b45 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -47,7 +47,10 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) counter += perf_event_read_value(pmc->perf_event, , ); - return counter & pmc->bitmask; + if (select_idx != ARMV8_PMU_CYCLE_IDX) + counter = lower_32_bits(counter); Shouldn't this depend on PMCR.LC as well? If PMCR.LC is clear we only want the lower 32bits of the cycle counter. Yes that's correct. The hunk should look like this: - return counter & pmc->bitmask; + if (!(select_idx == ARMV8_PMU_CYCLE_IDX && + __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC)) + counter = lower_32_bits(counter); + + return counter; May be you could add a macro : #define vcpu_pmu_counter_is_64bit(vcpu, idx) ? Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 6/6] KVM: arm/arm64: support chained PMU counters
Hi Andrew, @@ -398,27 +531,43 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) /* Software increment event does't need to be backed by a perf event */ if (eventsel == ARMV8_PMUV3_PERFCTR_SW_INCR && - select_idx != ARMV8_PMU_CYCLE_IDX) + pmc->idx != ARMV8_PMU_CYCLE_IDX) return; memset(, 0, sizeof(struct perf_event_attr)); attr.type = PERF_TYPE_RAW; attr.size = sizeof(attr); attr.pinned = 1; - attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, select_idx); + attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx); attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0; attr.exclude_kernel = data & ARMV8_PMU_EXCLUDE_EL1 ? 1 : 0; attr.exclude_hv = 1; /* Don't count EL2 events */ attr.exclude_host = 1; /* Don't count host events */ - attr.config = (select_idx == ARMV8_PMU_CYCLE_IDX) ? + attr.config = (pmc->idx == ARMV8_PMU_CYCLE_IDX) ? ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel; - counter = kvm_pmu_get_counter_value(vcpu, select_idx); - /* The initial sample period (overflow count) of an event. */ - attr.sample_period = (-counter) & GENMASK(31, 0); + counter = kvm_pmu_get_pair_counter_value(vcpu, pmc); + + if (kvm_pmu_idx_has_chain_evtype(vcpu, pmc->idx)) { + /** +* The initial sample period (overflow count) of an event. For +* chained counters we only support overflow interrupts on the +* high counter. +*/ + attr.sample_period = (-counter) & GENMASK(63, 0); + event = perf_event_create_kernel_counter(, -1, current, +kvm_pmu_perf_overflow, +pmc + 1); - event = perf_event_create_kernel_counter(, -1, current, + if (kvm_pmu_counter_is_enabled(vcpu, pmc->idx + 1)) + attr.config1 |= PERF_ATTR_CFG1_KVM_PMU_CHAINED; + } else { + /* The initial sample period (overflow count) of an event. */ + attr.sample_period = (-counter) & GENMASK(31, 0); + event = perf_event_create_kernel_counter(, -1, current, kvm_pmu_perf_overflow, pmc); + } + If this was the Cycle counter and t he PMCR_LC was set, shouldn't we be using 64bit mask here ? We fall back to using the Cycle counter in 64bit mode for "normal" (read guest) kernel. So shouldn't we reflect that here ? Rest looks fine to me. Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 5/6] KVM: arm/arm64: remove pmc->bitmask
On 22/05/2019 17:26, Andrew Murray wrote: On Wed, May 22, 2019 at 05:07:31PM +0100, Marc Zyngier wrote: On 22/05/2019 16:30, Andrew Murray wrote: We currently use pmc->bitmask to determine the width of the pmc - however it's superfluous as the pmc index already describes if the pmc is a cycle counter or event counter. The architecture clearly describes the widths of these counters. Let's remove the bitmask to simplify the code. Signed-off-by: Andrew Murray --- include/kvm/arm_pmu.h | 1 - virt/kvm/arm/pmu.c| 15 +-- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index b73f31baca52..2f0e28dc5a9e 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -28,7 +28,6 @@ struct kvm_pmc { u8 idx; /* index into the pmu->pmc array */ struct perf_event *perf_event; - u64 bitmask; }; - - if (val & ARMV8_PMU_PMCR_LC) { - pmc = >pmc[ARMV8_PMU_CYCLE_IDX]; - pmc->bitmask = 0xUL; - } } ... static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) @@ -420,7 +415,7 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) counter = kvm_pmu_get_counter_value(vcpu, select_idx); /* The initial sample period (overflow count) of an event. */ - attr.sample_period = (-counter) & pmc->bitmask; + attr.sample_period = (-counter) & GENMASK(31, 0); Isn't this the one case where the bitmask actually matters? If we're dealing with the cycle counter, it shouldn't be truncated, right? Ah yes, that should be conditional on idx as well. The mask for Cycle counter also depends on the PMCR.LC field set by the guest, isn't it ? So unless we correlate that with the idx, we could be passing in wrong results ? Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] mm, compaction: Make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could endup having cached_migrate_pfns which may not necessarily be valid, under heavy memory pressure with swapping enabled ( via __reset_isolation_suitable(), triggered by kswapd). Later if we fail to find a page via fast_isolate_freepages(), we may end up using the migrate_pfn we started the search with, as valid page. This could lead to accessing NULL pointer derefernces like below, due to an invalid mem_section pointer. Unable to handle kernel NULL pointer dereference at virtual address 0008 [47/1825] Mem abort info: ESR = 0x9604 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x0004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9 [0008] pgd= Internal error: Oops: 9604 [#1] SMP ... CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018 pstate: 6005 (nZCv daif -PAN -UAO) pc : set_pfnblock_flags_mask+0x58/0xe8 lr : compaction_alloc+0x300/0x950 [...] Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5) Call trace: set_pfnblock_flags_mask+0x58/0xe8 compaction_alloc+0x300/0x950 migrate_pages+0x1a4/0xbb0 compact_zone+0x750/0xde8 compact_zone_order+0xd8/0x118 try_to_compact_pages+0xb4/0x290 __alloc_pages_direct_compact+0x84/0x1e0 __alloc_pages_nodemask+0x5e0/0xe18 alloc_pages_vma+0x1cc/0x210 do_huge_pmd_anonymous_page+0x108/0x7c8 __handle_mm_fault+0xdd4/0x1190 handle_mm_fault+0x114/0x1c0 __get_user_pages+0x198/0x3c0 get_user_pages_unlocked+0xb4/0x1d8 __gfn_to_pfn_memslot+0x12c/0x3b8 gfn_to_pfn_prot+0x4c/0x60 kvm_handle_guest_abort+0x4b0/0xcd8 handle_exit+0x140/0x1b8 kvm_arch_vcpu_ioctl_run+0x260/0x768 kvm_vcpu_ioctl+0x490/0x898 do_vfs_ioctl+0xc4/0x898 ksys_ioctl+0x8c/0xa0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common+0x74/0x118 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc Code: f8607840 f11f 8b011401 9a801020 (f9400400) ---[ end trace af6a35219325a9b6 ]--- The issue was reported on an arm64 server with 128GB with holes in the zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while running 100 KVM guest instances. This patch fixes the issue by ensuring that the page belongs to a valid PFN when we fallback to using the lower limit of the scan range upon failure in fast_isolate_freepages(). Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target") Reported-by: Marc Zyngier Signed-off-by: Suzuki K Poulose --- mm/compaction.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9febc8c..9e1b9ac 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc) page = pfn_to_page(highest); cc->free_pfn = highest; } else { - if (cc->direct_compaction) { + if (cc->direct_compaction && pfn_valid(min_pfn)) { page = pfn_to_page(min_pfn); cc->free_pfn = min_pfn; } -- 2.7.4 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: mm/compaction: BUG: NULL pointer dereference
Hi Mel, Thanks for your quick response. On 24/05/2019 11:39, Mel Gorman wrote: On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote: Hi, We are hitting NULL pointer dereferences while running stress tests with KVM. See splat [0]. The test is to spawn 100 VMs all doing standard debian installation (Thanks to Marc's automated scripts, available here [1] ). The problem has been reproduced with a better rate of success from 5.1-rc6 onwards. The issue is only reproducible with swapping enabled and the entire memory is used up, when swapping heavily. Also this issue is only reproducible on only one server with 128GB, which has the following memory layout: [32GB@4GB, hole , 96GB@544GB] Here is my non-expert analysis of the issue so far. Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable() to figure out the cached values for migrate/free pfn for a zone, by scanning through the entire zone. On our server it does so in the range of [ 0x10_, 0xa00_ ], with the following area of holes : [ 0x20_, 0x880_ ]. In the failing case, we end up setting the cached migrate pfn as : 0x508_, which is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) / 2, with reset_migrate = 0x88_4e00, reset_free = 0x10_. Now these cached values are used by the fast_isolate_freepages() to find a pfn. However, since we cant find anything during the search we fall back to using the page belonging to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid PFN or not. This is then passed on to fast_isolate_around() which tries to do : set_pageblock_skip(page) on the page which blows up due to an NULL mem_section pointer. The following patch seems to fix the issue for me, but I am not quite convinced that it is the right fix. Thoughts ? I think the patch is valid and the alternatives would be unnecessarily complicated. During a normal scan for free pages to isolate, there is a check for pageblock_pfn_to_page() which uses a pfn_valid check for non-contiguous zones in __pageblock_pfn_to_page. Now, while the I had the initial version with the pageblock_pfn_to_page(), but as you said, it is a complicated way of perform the same check as pfn_valid(). non-contiguous check could be made in the area you highlight, it would be a relatively small optimisation that would be unmeasurable overall. However, it is definitely the case that if the PFN you highlight is invalid that badness happens. If you want to express this as a signed-off patch with an adjusted changelog then I'd be happy to add Sure, will send it right away. Reviewed-by: Mel Gorman Thanks. Suzuki
mm/compaction: BUG: NULL pointer dereference
Hi, We are hitting NULL pointer dereferences while running stress tests with KVM. See splat [0]. The test is to spawn 100 VMs all doing standard debian installation (Thanks to Marc's automated scripts, available here [1] ). The problem has been reproduced with a better rate of success from 5.1-rc6 onwards. The issue is only reproducible with swapping enabled and the entire memory is used up, when swapping heavily. Also this issue is only reproducible on only one server with 128GB, which has the following memory layout: [32GB@4GB, hole , 96GB@544GB] Here is my non-expert analysis of the issue so far. Under extreme memory pressure, the kswapd could trigger reset_isolation_suitable() to figure out the cached values for migrate/free pfn for a zone, by scanning through the entire zone. On our server it does so in the range of [ 0x10_, 0xa00_ ], with the following area of holes : [ 0x20_, 0x880_ ]. In the failing case, we end up setting the cached migrate pfn as : 0x508_, which is right in the center of the zone pfn range. i.e ( 0x10_ + 0xa00_ ) / 2, with reset_migrate = 0x88_4e00, reset_free = 0x10_. Now these cached values are used by the fast_isolate_freepages() to find a pfn. However, since we cant find anything during the search we fall back to using the page belonging to the min_pfn (which is the migrate_pfn), without proper checks to see if that is valid PFN or not. This is then passed on to fast_isolate_around() which tries to do : set_pageblock_skip(page) on the page which blows up due to an NULL mem_section pointer. The following patch seems to fix the issue for me, but I am not quite convinced that it is the right fix. Thoughts ? diff --git a/mm/compaction.c b/mm/compaction.c index 9febc8c..9e1b9ac 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc) page = pfn_to_page(highest); cc->free_pfn = highest; } else { - if (cc->direct_compaction) { + if (cc->direct_compaction && pfn_valid(min_pfn)) { page = pfn_to_page(min_pfn); cc->free_pfn = min_pfn; } Suzuki [ 0 ] Kernel splat Unable to handle kernel NULL pointer dereference at virtual address 0008 [47/1825] Mem abort info: ESR = 0x9604 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x0004 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9 [0008] pgd= Internal error: Oops: 9604 [#1] SMP ... CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6 Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018 pstate: 6005 (nZCv daif -PAN -UAO) pc : set_pfnblock_flags_mask+0x58/0xe8 lr : compaction_alloc+0x300/0x950 sp : 1fc03010 x29: 1fc03010 x28: x27: x26: 10bf7000 x25: 06445000 x24: 06444e00 x23: 7e018f138000 x22: 0003 x21: 0001 x20: 06444e00 x19: 0001 x18: x17: x16: 809f7fe97268 x15: 000191138000 x14: x13: 0070 x12: x11: 1fc03108 x10: x9 : 09222400 x8 : 0187 x7 : 063c4e00 x6 : 06444e00 x5 : 0008 x4 : 0001 x3 : 0003 x2 : 809f7fe92840 x1 : 0220 x0 : Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5) Call trace: set_pfnblock_flags_mask+0x58/0xe8 compaction_alloc+0x300/0x950 migrate_pages+0x1a4/0xbb0 compact_zone+0x750/0xde8 compact_zone_order+0xd8/0x118 try_to_compact_pages+0xb4/0x290 __alloc_pages_direct_compact+0x84/0x1e0 __alloc_pages_nodemask+0x5e0/0xe18 alloc_pages_vma+0x1cc/0x210 do_huge_pmd_anonymous_page+0x108/0x7c8 __handle_mm_fault+0xdd4/0x1190 handle_mm_fault+0x114/0x1c0 __get_user_pages+0x198/0x3c0 get_user_pages_unlocked+0xb4/0x1d8 __gfn_to_pfn_memslot+0x12c/0x3b8 gfn_to_pfn_prot+0x4c/0x60 kvm_handle_guest_abort+0x4b0/0xcd8 handle_exit+0x140/0x1b8 kvm_arch_vcpu_ioctl_run+0x260/0x768 kvm_vcpu_ioctl+0x490/0x898 do_vfs_ioctl+0xc4/0x898 ksys_ioctl+0x8c/0xa0 __arm64_sys_ioctl+0x28/0x38 el0_svc_common+0x74/0x118 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc Code: f8607840 f11f 8b011401 9a801020 (f9400400) ---[ end trace af6a35219325a9b6 ]--- [1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/vminstall.git/ ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu
Re: [PATCH v7 4/5] arm64: perf: extract chain helper into header
On 21/05/2019 16:52, Andrew Murray wrote: The ARMv8 Performance Monitors Extension includes an architectural event type named CHAIN which allows for chaining counters together. Let's extract the test for this event into a header file such that other users, such as KVM (for PMU emulation) can make use of. Signed-off-by: Andrew Murray --- arch/arm64/include/asm/perf_event.h | 5 + arch/arm64/kernel/perf_event.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/perf_event.h b/arch/arm64/include/asm/perf_event.h index c593761ba61c..cd13f3fd1055 100644 --- a/arch/arm64/include/asm/perf_event.h +++ b/arch/arm64/include/asm/perf_event.h @@ -219,6 +219,11 @@ #define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */ #define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */ +static inline bool armv8pmu_evtype_is_chain(u64 evtype) +{ + return (evtype == ARMV8_PMUV3_PERFCTR_CHAIN); +} + #ifdef CONFIG_PERF_EVENTS struct pt_regs; extern unsigned long perf_instruction_pointer(struct pt_regs *regs); diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 314b1adedf06..265bd835a724 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -879,7 +879,7 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, static int armv8pmu_filter_match(struct perf_event *event) { unsigned long evtype = event->hw.config_base & ARMV8_PMU_EVTYPE_EVENT; - return evtype != ARMV8_PMUV3_PERFCTR_CHAIN; + return !armv8pmu_evtype_is_chain(evtype); } static void armv8pmu_reset(void *info) Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH AUTOSEL 5.0 29/98] KVM: arm/arm64: Enforce PTE mappings at stage2 when needed
Hi Sasha, On 04/22/2019 08:40 PM, Sasha Levin wrote: From: Suzuki K Poulose [ Upstream commit a80868f398554842b14d07060012c06efb57c456 ] commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings, stricter. However it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE. Also, the checks do not cover the PUD huge pages, that was under review during the same period. This patch fixes both the issues. Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") Reported-by: Zenghui Yu Cc: Zenghui Yu Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose Signed-off-by: Marc Zyngier Signed-off-by: Sasha Levin (Microsoft) Please be aware that we need a follow up fix for this patch to fix the problem for THP backed memory. http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/645324.html It should appear upstream soon. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v13 8/8] arm64: docs: document perf event attributes
On 09/04/2019 20:22, Andrew Murray wrote: The interaction between the exclude_{host,guest} flags, exclude_{user,kernel,hv} flags and presence of VHE can result in different exception levels being filtered by the ARMv8 PMU. As this can be confusing let's document how they work on arm64. Signed-off-by: Andrew Murray --- Thats really helpful ! Thanks for writing one ! Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v13 7/8] arm64: KVM: avoid isb's by using direct pmxevtyper sysreg
On 09/04/2019 20:22, Andrew Murray wrote: Upon entering or exiting a guest we may modify multiple PMU counters to enable of disable EL0 filtering. We presently do this via the indirect PMXEVTYPER_EL0 system register (where the counter we modify is selected by PMSELR). With this approach it is necessary to order the writes via isb instructions such that we select the correct counter before modifying it. Let's avoid potentially expensive instruction barriers by using the direct PMEVTYPER_EL0 registers instead. As the change to counter type relates only to EL0 filtering we can rely on the implicit instruction barrier which occurs when we transition from EL2 to EL1 on entering the guest. On returning to userspace we can, at the latest, rely on the implicit barrier between EL2 and EL0. We can also depend on the explicit isb in armv8pmu_select_counter to order our write against any other kernel changes by the PMU driver to the type register as a result of preemption. Signed-off-by: Andrew Murray --- arch/arm64/kvm/pmu.c | 84 ++-- 1 file changed, 74 insertions(+), 10 deletions(-) diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 3407a529e116..4d55193ccc71 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -91,6 +91,74 @@ void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt) write_sysreg(pmu->events_host, pmcntenset_el0); } +#define PMEVTYPER_READ_CASE(idx)\ + case idx: \ + return read_sysreg(pmevtyper##idx##_el0) + +#define PMEVTYPER_WRITE_CASE(idx) \ + case idx: \ + write_sysreg(val, pmevtyper##idx##_el0);\ + break + +#define PMEVTYPER_CASES(readwrite) \ + PMEVTYPER_##readwrite##_CASE(0);\ + PMEVTYPER_##readwrite##_CASE(1);\ + PMEVTYPER_##readwrite##_CASE(2);\ + PMEVTYPER_##readwrite##_CASE(3);\ + PMEVTYPER_##readwrite##_CASE(4);\ + PMEVTYPER_##readwrite##_CASE(5);\ + PMEVTYPER_##readwrite##_CASE(6);\ + PMEVTYPER_##readwrite##_CASE(7);\ + PMEVTYPER_##readwrite##_CASE(8);\ + PMEVTYPER_##readwrite##_CASE(9);\ + PMEVTYPER_##readwrite##_CASE(10); \ + PMEVTYPER_##readwrite##_CASE(11); \ + PMEVTYPER_##readwrite##_CASE(12); \ + PMEVTYPER_##readwrite##_CASE(13); \ + PMEVTYPER_##readwrite##_CASE(14); \ + PMEVTYPER_##readwrite##_CASE(15); \ + PMEVTYPER_##readwrite##_CASE(16); \ + PMEVTYPER_##readwrite##_CASE(17); \ + PMEVTYPER_##readwrite##_CASE(18); \ + PMEVTYPER_##readwrite##_CASE(19); \ + PMEVTYPER_##readwrite##_CASE(20); \ + PMEVTYPER_##readwrite##_CASE(21); \ + PMEVTYPER_##readwrite##_CASE(22); \ + PMEVTYPER_##readwrite##_CASE(23); \ + PMEVTYPER_##readwrite##_CASE(24); \ + PMEVTYPER_##readwrite##_CASE(25); \ + PMEVTYPER_##readwrite##_CASE(26); \ + PMEVTYPER_##readwrite##_CASE(27); \ + PMEVTYPER_##readwrite##_CASE(28); \ + PMEVTYPER_##readwrite##_CASE(29); \ + PMEVTYPER_##readwrite##_CASE(30) + Don't we need case 31 and deal with the PMCCFILTR, instead of WARN_ON(1) below ? Otherwise, Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v13 6/8] arm64: KVM: Enable VHE support for :G/:H perf event modifiers
On 09/04/2019 20:22, Andrew Murray wrote: With VHE different exception levels are used between the host (EL2) and guest (EL1) with a shared exception level for userpace (EL0). We can take advantage of this and use the PMU's exception level filtering to avoid enabling/disabling counters in the world-switch code. Instead we just s/Instead// ? modify the counter type to include or exclude EL0 at vcpu_{load,put} time. We also ensure that trapped PMU system register writes do not re-enable EL0 when reconfiguring the backing perf events. This approach completely avoids blackout windows seen with !VHE. Suggested-by: Christoffer Dall Signed-off-by: Andrew Murray Acked-by: Will Deacon +/* + * On VHE ensure that only guest events have EL0 counting enabled + */ +void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) +{ + struct kvm_cpu_context *host_ctxt; + struct kvm_host_data *host; + u32 events_guest, events_host; + + if (!has_vhe()) + return; + + host_ctxt = vcpu->arch.host_cpu_context; + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + events_guest = host->pmu_events.events_guest; + events_host = host->pmu_events.events_host; + + kvm_vcpu_pmu_enable_el0(events_guest); + kvm_vcpu_pmu_disable_el0(events_host); +} + +/* + * On VHE ensure that only guest host have EL0 counting enabled nit: s/guest/host/host events/ + */ +void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) +{ + struct kvm_cpu_context *host_ctxt; + struct kvm_host_data *host; + u32 events_guest, events_host; + + if (!has_vhe()) + return; + + host_ctxt = vcpu->arch.host_cpu_context; + host = container_of(host_ctxt, struct kvm_host_data, host_ctxt); + events_guest = host->pmu_events.events_guest; + events_host = host->pmu_events.events_host; + + kvm_vcpu_pmu_enable_el0(events_host); + kvm_vcpu_pmu_disable_el0(events_guest); +} diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 539feecda5b8..c7fa47ad2387 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -695,6 +695,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, val |= p->regval & ARMV8_PMU_PMCR_MASK; __vcpu_sys_reg(vcpu, PMCR_EL0) = val; kvm_pmu_handle_pmcr(vcpu, val); + kvm_vcpu_pmu_restore_guest(vcpu); nit: I am not sure if we need to do this for PMCR accesses ? Unless we have modified some changes to the events (e.g, like the two instances below). Or am I missing something here ? Otherwise: Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v13 5/8] arm64: KVM: Enable !VHE support for :G/:H perf event modifiers
On 09/04/2019 20:22, Andrew Murray wrote: Enable/disable event counters as appropriate when entering and exiting the guest to enable support for guest or host only event counting. For both VHE and non-VHE we switch the counters between host/guest at EL2. The PMU may be on when we change which counters are enabled however we avoid adding an isb as we instead rely on existing context synchronisation events: the eret to enter the guest (__guest_enter) and eret in kvm_call_hyp for __kvm_vcpu_run_nvhe on returning. Signed-off-by: Andrew Murray Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v13 4/8] arm64: arm_pmu: Add !VHE support for exclude_host/exclude_guest attributes
On 04/09/2019 08:22 PM, Andrew Murray wrote: Add support for the :G and :H attributes in perf by handling the exclude_host/exclude_guest event attributes. We notify KVM of counters that we wish to be enabled or disabled on guest entry/exit and thus defer from starting or stopping events based on their event attributes. With !VHE we switch the counters between host/guest at EL2. We are able to eliminate counters counting host events on the boundaries of guest entry/exit when using :G by filtering out EL2 for exclude_host. When using !exclude_hv there is a small blackout window at the guest entry/exit where host events are not captured. Signed-off-by: Andrew Murray Acked-by: Will Deacon --- arch/arm64/kernel/perf_event.c | 43 -- 1 file changed, 36 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index cccf4fc86d67..6bb28aaf5aea 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -26,6 +26,7 @@ #include #include +#include #include #include #include @@ -528,11 +529,21 @@ static inline int armv8pmu_enable_counter(int idx) static inline void armv8pmu_enable_event_counter(struct perf_event *event) { + struct perf_event_attr *attr = >attr; int idx = event->hw.idx; + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx)); - armv8pmu_enable_counter(idx); if (armv8pmu_event_is_chained(event)) - armv8pmu_enable_counter(idx - 1); + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1)); + + kvm_set_pmu_events(counter_bits, attr); + + /* We rely on the hypervisor switch code to enable guest counters */ + if (!kvm_pmu_counter_deferred(attr)) { + armv8pmu_enable_counter(idx); + if (armv8pmu_event_is_chained(event)) + armv8pmu_enable_counter(idx - 1); + } } static inline int armv8pmu_disable_counter(int idx) @@ -545,11 +556,21 @@ static inline int armv8pmu_disable_counter(int idx) static inline void armv8pmu_disable_event_counter(struct perf_event *event) { struct hw_perf_event *hwc = >hw; + struct perf_event_attr *attr = >attr; int idx = hwc->idx; + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx)); if (armv8pmu_event_is_chained(event)) - armv8pmu_disable_counter(idx - 1); - armv8pmu_disable_counter(idx); + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1)); + + kvm_clr_pmu_events(counter_bits); + + /* We rely on the hypervisor switch code to disable guest counters */ + if (!kvm_pmu_counter_deferred(attr)) { + if (armv8pmu_event_is_chained(event)) + armv8pmu_disable_counter(idx - 1); + armv8pmu_disable_counter(idx); + } } static inline int armv8pmu_enable_intens(int idx) @@ -829,11 +850,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, if (!attr->exclude_kernel) config_base |= ARMV8_PMU_INCLUDE_EL2; } else { - if (attr->exclude_kernel) - config_base |= ARMV8_PMU_EXCLUDE_EL1; - if (!attr->exclude_hv) + if (!attr->exclude_hv && !attr->exclude_host) config_base |= ARMV8_PMU_INCLUDE_EL2; } + + /* +* Filter out !VHE kernels and guest kernels +*/ + if (attr->exclude_kernel) + config_base |= ARMV8_PMU_EXCLUDE_EL1; + if (attr->exclude_user) config_base |= ARMV8_PMU_EXCLUDE_EL0; @@ -863,6 +889,9 @@ static void armv8pmu_reset(void *info) armv8pmu_disable_intens(idx); } + /* Clear the counters we flip at guest entry/exit */ + kvm_clr_pmu_events(U32_MAX); + /* * Initialize & Reset PMNC. Request overflow interrupt for * 64 bit cycle counter but cheat in armv8pmu_write_counter(). Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 2/2] kvm: arm: Unify handling THP backed host memory
Hi Zhengui, On 11/04/2019 02:59, Zenghui Yu wrote: Hi Suzuki, On 2019/4/10 23:23, Suzuki K Poulose wrote: We support mapping host memory backed by PMD transparent hugepages at stage2 as huge pages. However the checks are now spread across two different places. Let us unify the handling of the THPs to keep the code cleaner (and future proof for PUD THP support). This patch moves transparent_hugepage_adjust() closer to the caller to avoid a forward declaration for fault_supports_stage2_huge_mappings(). Also, since we already handle the case where the host VA and the guest PA may not be aligned, the explicit VM_BUG_ON() is not required. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Zneghui Yu Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 123 +++-- 1 file changed, 62 insertions(+), 61 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 6d73322..714eec2 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1380,53 +1380,6 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, return ret; } -static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) -{ - kvm_pfn_t pfn = *pfnp; - gfn_t gfn = *ipap >> PAGE_SHIFT; - struct page *page = pfn_to_page(pfn); - - /* -* PageTransCompoundMap() returns true for THP and -* hugetlbfs. Make sure the adjustment is done only for THP -* pages. -*/ - if (!PageHuge(page) && PageTransCompoundMap(page)) { - unsigned long mask; - /* -* The address we faulted on is backed by a transparent huge -* page. However, because we map the compound huge page and -* not the individual tail page, we need to transfer the -* refcount to the head page. We have to be careful that the -* THP doesn't start to split while we are adjusting the -* refcounts. -* -* We are sure this doesn't happen, because mmu_notifier_retry -* was successful and we are holding the mmu_lock, so if this -* THP is trying to split, it will be blocked in the mmu -* notifier before touching any of the pages, specifically -* before being able to call __split_huge_page_refcount(). -* -* We can therefore safely transfer the refcount from PG_tail -* to PG_head and switch the pfn from a tail page to the head -* page accordingly. -*/ - mask = PTRS_PER_PMD - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); - if (pfn & mask) { - *ipap &= PMD_MASK; - kvm_release_pfn_clean(pfn); - pfn &= ~mask; - kvm_get_pfn(pfn); - *pfnp = pfn; - } - - return true; - } - - return false; -} - /** * stage2_wp_ptes - write protect PMD range * @pmd: pointer to pmd entry @@ -1677,6 +1630,61 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, (hva & ~(map_size - 1)) + map_size <= uaddr_end; } +/* + * Check if the given hva is backed by a transparent huge page (THP) + * and whether it can be mapped using block mapping in stage2. If so, adjust + * the stage2 PFN and IPA accordingly. Only PMD_SIZE THPs are currently + * supported. This will need to be updated to support other THP sizes. + * + * Returns the size of the mapping. + */ +static unsigned long +transparent_hugepage_adjust(struct kvm_memory_slot *memslot, + unsigned long hva, kvm_pfn_t *pfnp, + phys_addr_t *ipap) +{ + kvm_pfn_t pfn = *pfnp; + struct page *page = pfn_to_page(pfn); + + /* +* PageTransCompoundMap() returns true for THP and +* hugetlbfs. Make sure the adjustment is done only for THP +* pages. Also make sure that the HVA and IPA are sufficiently +* aligned and that the block map is contained within the memslot. +*/ + if (!PageHuge(page) && PageTransCompoundMap(page) && We managed to get here, ensure that we only play with normal size pages and no hugetlbfs pages will be involved. "!PageHuge(page)" will always return true and we can let it go. I think that is a bit tricky. If someone ever modifies the user_mem_abort() and we end up in getting called with a HugeTLB backed page things could go wrong. I could do remove the check, but would like to add a WARN_ON_ONCE() to make sure our assumption is held. i.e, WARN_ON_ONCE(PageHuge(page)); if (PageTransCompoundMap(page) &&>> + fa
Re: [PATCH 1/2] kvm: arm: Clean up the checking for huge mapping
On 04/11/2019 02:48 AM, Zenghui Yu wrote: On 2019/4/10 23:23, Suzuki K Poulose wrote: If we are checking whether the stage2 can map PAGE_SIZE, we don't have to do the boundary checks as both the host VMA and the guest memslots are page aligned. Bail the case easily. Cc: Christoffer Dall Cc: Marc Zyngier Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 4 1 file changed, 4 insertions(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a39dcfd..6d73322 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1624,6 +1624,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, hva_t uaddr_start, uaddr_end; size_t size; + /* The memslot and the VMA are guaranteed to be aligned to PAGE_SIZE */ + if (map_size == PAGE_SIZE) + return true; + size = memslot->npages * PAGE_SIZE; gpa_start = memslot->base_gfn << PAGE_SHIFT; We can do a comment clean up as well in this patch. s/<< PAGE_SIZE/<< PAGE_SHIFT/ Sure, I missed that. Will fix it in the next version. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 2/2] kvm: arm: Unify handling THP backed host memory
We support mapping host memory backed by PMD transparent hugepages at stage2 as huge pages. However the checks are now spread across two different places. Let us unify the handling of the THPs to keep the code cleaner (and future proof for PUD THP support). This patch moves transparent_hugepage_adjust() closer to the caller to avoid a forward declaration for fault_supports_stage2_huge_mappings(). Also, since we already handle the case where the host VA and the guest PA may not be aligned, the explicit VM_BUG_ON() is not required. Cc: Marc Zyngier Cc: Christoffer Dall Cc: Zneghui Yu Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 123 +++-- 1 file changed, 62 insertions(+), 61 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 6d73322..714eec2 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1380,53 +1380,6 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, return ret; } -static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) -{ - kvm_pfn_t pfn = *pfnp; - gfn_t gfn = *ipap >> PAGE_SHIFT; - struct page *page = pfn_to_page(pfn); - - /* -* PageTransCompoundMap() returns true for THP and -* hugetlbfs. Make sure the adjustment is done only for THP -* pages. -*/ - if (!PageHuge(page) && PageTransCompoundMap(page)) { - unsigned long mask; - /* -* The address we faulted on is backed by a transparent huge -* page. However, because we map the compound huge page and -* not the individual tail page, we need to transfer the -* refcount to the head page. We have to be careful that the -* THP doesn't start to split while we are adjusting the -* refcounts. -* -* We are sure this doesn't happen, because mmu_notifier_retry -* was successful and we are holding the mmu_lock, so if this -* THP is trying to split, it will be blocked in the mmu -* notifier before touching any of the pages, specifically -* before being able to call __split_huge_page_refcount(). -* -* We can therefore safely transfer the refcount from PG_tail -* to PG_head and switch the pfn from a tail page to the head -* page accordingly. -*/ - mask = PTRS_PER_PMD - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); - if (pfn & mask) { - *ipap &= PMD_MASK; - kvm_release_pfn_clean(pfn); - pfn &= ~mask; - kvm_get_pfn(pfn); - *pfnp = pfn; - } - - return true; - } - - return false; -} - /** * stage2_wp_ptes - write protect PMD range * @pmd: pointer to pmd entry @@ -1677,6 +1630,61 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, (hva & ~(map_size - 1)) + map_size <= uaddr_end; } +/* + * Check if the given hva is backed by a transparent huge page (THP) + * and whether it can be mapped using block mapping in stage2. If so, adjust + * the stage2 PFN and IPA accordingly. Only PMD_SIZE THPs are currently + * supported. This will need to be updated to support other THP sizes. + * + * Returns the size of the mapping. + */ +static unsigned long +transparent_hugepage_adjust(struct kvm_memory_slot *memslot, + unsigned long hva, kvm_pfn_t *pfnp, + phys_addr_t *ipap) +{ + kvm_pfn_t pfn = *pfnp; + struct page *page = pfn_to_page(pfn); + + /* +* PageTransCompoundMap() returns true for THP and +* hugetlbfs. Make sure the adjustment is done only for THP +* pages. Also make sure that the HVA and IPA are sufficiently +* aligned and that the block map is contained within the memslot. +*/ + if (!PageHuge(page) && PageTransCompoundMap(page) && + fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) { + /* +* The address we faulted on is backed by a transparent huge +* page. However, because we map the compound huge page and +* not the individual tail page, we need to transfer the +* refcount to the head page. We have to be careful that the +* THP doesn't start to split while we are adjusting the +* refcounts. +* +* We are sure this doesn't happen, because mmu_notifier_retry +* was successful and we are holding the mmu_lock, so if this +* THP is trying to split, it will be block
[PATCH 1/2] kvm: arm: Clean up the checking for huge mapping
If we are checking whether the stage2 can map PAGE_SIZE, we don't have to do the boundary checks as both the host VMA and the guest memslots are page aligned. Bail the case easily. Cc: Christoffer Dall Cc: Marc Zyngier Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 4 1 file changed, 4 insertions(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a39dcfd..6d73322 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1624,6 +1624,10 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, hva_t uaddr_start, uaddr_end; size_t size; + /* The memslot and the VMA are guaranteed to be aligned to PAGE_SIZE */ + if (map_size == PAGE_SIZE) + return true; + size = memslot->npages * PAGE_SIZE; gpa_start = memslot->base_gfn << PAGE_SHIFT; -- 2.7.4 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 0/2] kvm: arm: Unify stage2 mapping for THP backed memory
This series cleans up the handling of the stage2 huge mapping for THP. Applies on top of [1] [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2019-April/645324.html Suzuki K Poulose (2): kvm: arm: Clean up the checking for huge mapping kvm: arm: Unify handling THP backed host memory virt/kvm/arm/mmu.c | 127 - 1 file changed, 66 insertions(+), 61 deletions(-) -- 2.7.4 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v2] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
With commit a80868f398554842b14, we no longer ensure that the THP page is properly aligned in the guest IPA. Skip the stage2 huge mapping for unaligned IPA backed by transparent hugepages. Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed") Reported-by: Eric Auger Cc: Marc Zyngier Cc: Chirstoffer Dall Cc: Zenghui Yu Cc: Zheng Xiang Cc: Andrew Murray Cc: Eric Auger Signed-off-by: Suzuki K Poulose --- Changes since V1: - Zenghui Yu reported that the first version misses the boundary check for the end address. Lets reuse the existing helper to check it. --- virt/kvm/arm/mmu.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..a39dcfd 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1781,8 +1781,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * Only PMD_SIZE transparent hugepages(THP) are * currently supported. This code will need to be * updated to support other THP sizes. +* +* Make sure the host VA and the guest IPA are sufficiently +* aligned and that the block is contained within the memslot. */ - if (transparent_hugepage_adjust(, _ipa)) + if (fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE) && + transparent_hugepage_adjust(, _ipa)) vma_pagesize = PMD_SIZE; } -- 2.7.4 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
On 10/04/2019 03:20, Zenghui Yu wrote: On 2019/4/9 22:59, Suzuki K Poulose wrote: Hi Zenghui On 04/09/2019 09:05 AM, Zenghui Yu wrote: On 2019/4/9 2:40, Suzuki K Poulose wrote: Hi Zenhui, On 04/08/2019 04:11 PM, Zenghui Yu wrote: Hi Suzuki, Thanks for the reply. ... Hi Suzuki, Why not making use of fault_supports_stage2_huge_mapping()? Let it do some checks for us. fault_supports_stage2_huge_mapping() was intended to do a *two-step* check to tell us that can we create stage2 huge block mappings, and this check is both for hugetlbfs and THP. With commit a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size pages (which turned out to be almost meaningless), and unfortunately the THP check no longer works. Thats correct. So we want to rework *THP* check process. Your patch fixes the first checking-step, but the second is still missed, am I wrong? It fixes the step explicitly for the THP by making sure that the GPA and the HVA are aligned to the map size. Yes, I understand how your patch had fixed the issue. But what I'm really concerned about here is the *second* checking-step in fault_supports_stage2_huge_mapping(). We have to check if we are mapping a non-block aligned or non-block sized memslot, if so, we can not create block mappings for the beginning and end of this memslot. This is what the second part of fault_supports_stage2_huge_mapping() had done. I haven't seen this checking-step in your patch, did I miss something? I see. I don't think this calls for a VM_BUG_ON(). It is simply a case where the GPA is not aligned to HVA, but for normal VMA that could be made THP. We had this VM_BUG_ON(), which would have never hit because we would have set force_pte if they were not aligned. Yes, I agree. + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); ---8>--- Rework fault_supports_stage2_huge_mapping(), let it check THP again. Signed-off-by: Zenghui Yu --- virt/kvm/arm/mmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..5e1b258 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, uaddr_end = uaddr_start + size; /* + * If the memslot is _not_ backed by hugetlbfs, then check if it + * can be backed by transparent hugepages. + * + * Currently only PMD_SIZE THPs are supported, revisit it later. + */ + if (map_size == PAGE_SIZE) + map_size = PMD_SIZE; + This looks hackish. What is we support PUD_SIZE huge page in the future ? Yes, this might make the code a little difficult to understand. But by doing so, we follow the same logic before commit a80868f398554842b14, that said, we do the two-step checking for normal size pages in fault_supports_stage2_huge_mapping(), to decide if we can create THP mappings for these pages. As for PUD_SIZE THPs, to be honest, I have no idea now :( How about the following diff ? diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 97b5417..98e5cec 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * currently supported. This code will need to be * updated to support other THP sizes. */ - if (transparent_hugepage_adjust(, _ipa)) + if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) && + transparent_hugepage_adjust(, _ipa)) vma_pagesize = PMD_SIZE; } I think this is good enough for the issue. (One minor concern: With this change, it seems that we no longer need "force_pte" and can just use "logging_active" instead. But this is not much related to what we're fixing.) I would still leave the force_pte there to avoid checking for a THP case in a situation where we forced to PTE level mapping on a hugepage backed VMA. It would serve to avoid another check. Hi Suzuki, Yes, I agree, thanks. Cool, I have a patch to fix this properly and two other patches to clean up and unify the way we handle the THP backed hugepages. Will send them out after a bit of testing, later today. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
Hi Zenghui On 04/09/2019 09:05 AM, Zenghui Yu wrote: On 2019/4/9 2:40, Suzuki K Poulose wrote: Hi Zenhui, On 04/08/2019 04:11 PM, Zenghui Yu wrote: Hi Suzuki, Thanks for the reply. ... Hi Suzuki, Why not making use of fault_supports_stage2_huge_mapping()? Let it do some checks for us. fault_supports_stage2_huge_mapping() was intended to do a *two-step* check to tell us that can we create stage2 huge block mappings, and this check is both for hugetlbfs and THP. With commit a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size pages (which turned out to be almost meaningless), and unfortunately the THP check no longer works. Thats correct. So we want to rework *THP* check process. Your patch fixes the first checking-step, but the second is still missed, am I wrong? It fixes the step explicitly for the THP by making sure that the GPA and the HVA are aligned to the map size. Yes, I understand how your patch had fixed the issue. But what I'm really concerned about here is the *second* checking-step in fault_supports_stage2_huge_mapping(). We have to check if we are mapping a non-block aligned or non-block sized memslot, if so, we can not create block mappings for the beginning and end of this memslot. This is what the second part of fault_supports_stage2_huge_mapping() had done. I haven't seen this checking-step in your patch, did I miss something? I see. I don't think this calls for a VM_BUG_ON(). It is simply a case where the GPA is not aligned to HVA, but for normal VMA that could be made THP. We had this VM_BUG_ON(), which would have never hit because we would have set force_pte if they were not aligned. Yes, I agree. + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); ---8>--- Rework fault_supports_stage2_huge_mapping(), let it check THP again. Signed-off-by: Zenghui Yu --- virt/kvm/arm/mmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..5e1b258 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, uaddr_end = uaddr_start + size; /* + * If the memslot is _not_ backed by hugetlbfs, then check if it + * can be backed by transparent hugepages. + * + * Currently only PMD_SIZE THPs are supported, revisit it later. + */ + if (map_size == PAGE_SIZE) + map_size = PMD_SIZE; + This looks hackish. What is we support PUD_SIZE huge page in the future ? Yes, this might make the code a little difficult to understand. But by doing so, we follow the same logic before commit a80868f398554842b14, that said, we do the two-step checking for normal size pages in fault_supports_stage2_huge_mapping(), to decide if we can create THP mappings for these pages. As for PUD_SIZE THPs, to be honest, I have no idea now :( How about the following diff ? diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 97b5417..98e5cec 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * currently supported. This code will need to be * updated to support other THP sizes. */ - if (transparent_hugepage_adjust(, _ipa)) + if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) && + transparent_hugepage_adjust(, _ipa)) vma_pagesize = PMD_SIZE; } I think this is good enough for the issue. (One minor concern: With this change, it seems that we no longer need "force_pte" and can just use "logging_active" instead. But this is not much related to what we're fixing.) I would still leave the force_pte there to avoid checking for a THP case in a situation where we forced to PTE level mapping on a hugepage backed VMA. It would serve to avoid another check. Cheers Suzuki thanks. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
Hi Zenhui, On 04/08/2019 04:11 PM, Zenghui Yu wrote: Hi Suzuki, Thanks for the reply. ... Hi Suzuki, Why not making use of fault_supports_stage2_huge_mapping()? Let it do some checks for us. fault_supports_stage2_huge_mapping() was intended to do a *two-step* check to tell us that can we create stage2 huge block mappings, and this check is both for hugetlbfs and THP. With commit a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size pages (which turned out to be almost meaningless), and unfortunately the THP check no longer works. Thats correct. So we want to rework *THP* check process. Your patch fixes the first checking-step, but the second is still missed, am I wrong? It fixes the step explicitly for the THP by making sure that the GPA and the HVA are aligned to the map size. Yes, I understand how your patch had fixed the issue. But what I'm really concerned about here is the *second* checking-step in fault_supports_stage2_huge_mapping(). We have to check if we are mapping a non-block aligned or non-block sized memslot, if so, we can not create block mappings for the beginning and end of this memslot. This is what the second part of fault_supports_stage2_huge_mapping() had done. I haven't seen this checking-step in your patch, did I miss something? I see. I don't think this calls for a VM_BUG_ON(). It is simply a case where the GPA is not aligned to HVA, but for normal VMA that could be made THP. We had this VM_BUG_ON(), which would have never hit because we would have set force_pte if they were not aligned. Yes, I agree. + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); ---8>--- Rework fault_supports_stage2_huge_mapping(), let it check THP again. Signed-off-by: Zenghui Yu --- virt/kvm/arm/mmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..5e1b258 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, uaddr_end = uaddr_start + size; /* + * If the memslot is _not_ backed by hugetlbfs, then check if it + * can be backed by transparent hugepages. + * + * Currently only PMD_SIZE THPs are supported, revisit it later. + */ + if (map_size == PAGE_SIZE) + map_size = PMD_SIZE; + This looks hackish. What is we support PUD_SIZE huge page in the future ? Yes, this might make the code a little difficult to understand. But by doing so, we follow the same logic before commit a80868f398554842b14, that said, we do the two-step checking for normal size pages in fault_supports_stage2_huge_mapping(), to decide if we can create THP mappings for these pages. As for PUD_SIZE THPs, to be honest, I have no idea now :( How about the following diff ? diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 97b5417..98e5cec 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1791,7 +1791,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * currently supported. This code will need to be * updated to support other THP sizes. */ - if (transparent_hugepage_adjust(, _ipa)) + if (fault_supports_stage2_huge_mappings(memslot, hva, PMD_SIZE) && + transparent_hugepage_adjust(, _ipa)) vma_pagesize = PMD_SIZE; } -- 2.7.4 Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
On 04/08/2019 04:50 AM, Zenghui Yu wrote: On 2019/4/2 19:06, Suzuki K Poulose wrote: With commit a80868f398554842b14, we no longer ensure that the THP page is properly aligned in the guest IPA. Skip the stage2 huge mapping for unaligned IPA backed by transparent hugepages. Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed") Reported-by: Eric Auger Cc: Marc Zyngier Cc: Chirstoffer Dall Cc: Zenghui Yu Cc: Zheng Xiang Tested-by: Eric Auger Signed-off-by: Suzuki K Poulose Hi Suzuki, Why not making use of fault_supports_stage2_huge_mapping()? Let it do some checks for us. fault_supports_stage2_huge_mapping() was intended to do a *two-step* check to tell us that can we create stage2 huge block mappings, and this check is both for hugetlbfs and THP. With commit a80868f398554842b14, we pass PAGE_SIZE as "map_size" for normal size pages (which turned out to be almost meaningless), and unfortunately the THP check no longer works. Thats correct. So we want to rework *THP* check process. Your patch fixes the first checking-step, but the second is still missed, am I wrong? It fixes the step explicitly for the THP by making sure that the GPA and the HVA are aligned to the map size. Can you please give a look at the below diff? thanks, zenghui --- virt/kvm/arm/mmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..4a22f5b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) * page accordingly. */ mask = PTRS_PER_PMD - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); Somehow, I'd prefer keeping the VM_BUG_ON() here, let it report some potential issues in the future (of course I hope none:) ) I don't think this calls for a VM_BUG_ON(). It is simply a case where the GPA is not aligned to HVA, but for normal VMA that could be made THP. We had this VM_BUG_ON(), which would have never hit because we would have set force_pte if they were not aligned. + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); ---8>--- Rework fault_supports_stage2_huge_mapping(), let it check THP again. Signed-off-by: Zenghui Yu --- virt/kvm/arm/mmu.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..5e1b258 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1632,6 +1632,15 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, uaddr_end = uaddr_start + size; /* + * If the memslot is _not_ backed by hugetlbfs, then check if it + * can be backed by transparent hugepages. + * + * Currently only PMD_SIZE THPs are supported, revisit it later. + */ + if (map_size == PAGE_SIZE) + map_size = PMD_SIZE; + This looks hackish. What is we support PUD_SIZE huge page in the future ? + /* * Pages belonging to memslots that don't have the same alignment * within a PMD/PUD for userspace and IPA cannot be mapped with stage-2 * PMD/PUD entries, because we'll end up mapping the wrong pages. @@ -1643,7 +1652,7 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, * |abcde|fgh Stage-1 block | Stage-1 block tv|xyz| * +-+++---+ * - * memslot->base_gfn << PAGE_SIZE: + * memslot->base_gfn << PAGE_SHIFT: That could be fixed. * +---+++-+ * |abc|def Stage-2 block | Stage-2 block |tvxyz| * +---+++-+ But personally I don't think this is the right way to fix it. And as mentioned, the VM_BUG_ON() is unnecessary and could easily be triggered by the user by using unaligned GPA/HVA. All we need to do is, use page mapping in such cases, which we do with my patch. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] kvm: arm: Skip stage2 huge mappings for unaligned ipa backed by THP
With commit a80868f398554842b14, we no longer ensure that the THP page is properly aligned in the guest IPA. Skip the stage2 huge mapping for unaligned IPA backed by transparent hugepages. Fixes: a80868f398554842b14 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed") Reported-by: Eric Auger Cc: Marc Zyngier Cc: Chirstoffer Dall Cc: Zenghui Yu Cc: Zheng Xiang Tested-by: Eric Auger Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..4a22f5b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) * page accordingly. */ mask = PTRS_PER_PMD - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); -- 2.7.4 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 5/8] KVM: arm/arm64: Enforce PTE mappings at stage2 when needed
On Mon, Apr 01, 2019 at 07:10:37PM +0200, Auger Eric wrote: > Hi Suzuki, > > On 3/28/19 2:36 PM, Marc Zyngier wrote: > > From: Suzuki K Poulose > > > > commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD > > mappings") > > made the checks to skip huge mappings, stricter. However it introduced > > a bug where we still use huge mappings, ignoring the flag to > > use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE. > > > > Also, the checks do not cover the PUD huge pages, that was > > under review during the same period. This patch fixes both > > the issues. > > I face a regression with this patch. My guest gets stuck. I am running > on AMD Seattle. Reverting the patch makes things work again for me. I > run with qemu. In this scenario I don't use hugepages. I use 64kB page > size for both the host and guest. Hi Eric, Thanks for the testing. Does the following patch fix the issue for you ? ---8>--- kvm: arm: Skip transparent huge pages in unaligned memslots We silently create stage2 huge mappings for a memslot with unaligned IPA and user address. Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 27c9583..4a22f5b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1412,7 +1412,9 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) * page accordingly. */ mask = PTRS_PER_PMD - 1; - VM_BUG_ON((gfn & mask) != (pfn & mask)); + /* Skip memslots with unaligned IPA and user address */ + if ((gfn & mask) != (pfn & mask)) + return false; if (pfn & mask) { *ipap &= PMD_MASK; kvm_release_pfn_clean(pfn); -- 2.7.4 Kind regards Suzuki > > Thanks > > Eric > > > > Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD > > mappings") > > Reported-by: Zenghui Yu > > Cc: Zenghui Yu > > Cc: Christoffer Dall > > Signed-off-by: Suzuki K Poulose > > Signed-off-by: Marc Zyngier > > --- > > virt/kvm/arm/mmu.c | 43 +-- > > 1 file changed, 21 insertions(+), 22 deletions(-) > > > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > > index ffd7acdceac7..bcdf978c0d1d 100644 > > --- a/virt/kvm/arm/mmu.c > > +++ b/virt/kvm/arm/mmu.c > > @@ -1594,8 +1594,9 @@ static void kvm_send_hwpoison_signal(unsigned long > > address, > > send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current); > > } > > > > -static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot > > *memslot, > > - unsigned long hva) > > +static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot > > *memslot, > > + unsigned long hva, > > + unsigned long map_size) > > { > > gpa_t gpa_start; > > hva_t uaddr_start, uaddr_end; > > @@ -1610,34 +1611,34 @@ static bool > > fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, > > > > /* > > * Pages belonging to memslots that don't have the same alignment > > -* within a PMD for userspace and IPA cannot be mapped with stage-2 > > -* PMD entries, because we'll end up mapping the wrong pages. > > +* within a PMD/PUD for userspace and IPA cannot be mapped with stage-2 > > +* PMD/PUD entries, because we'll end up mapping the wrong pages. > > * > > * Consider a layout like the following: > > * > > *memslot->userspace_addr: > > *+-+++---+ > > -*|abcde|fgh Stage-1 PMD|Stage-1 PMD tv|xyz| > > +*|abcde|fgh Stage-1 block |Stage-1 block tv|xyz| > > *+-+++---+ > > * > > *memslot->base_gfn << PAGE_SIZE: > > * +---+++-+ > > -* |abc|def Stage-2 PMD|Stage-2 PMD |tvxyz| > > +* |abc|def Stage-2 block |Stage-2 block |tvxyz| > > * +---+++-+ > > * > > -* If we create those stage-2 PMDs, we'll end up with this incorrect > > +* If we create those stage-2 blocks, we'll end up wit
Re: [PATCH v12 2/8] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data
On 03/28/2019 10:37 AM, Andrew Murray wrote: The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is a typedef to kvm_cpu_context and is used to store host cpu context. The kvm_cpu_context structure is also used elsewhere to hold vcpu context. In order to use the percpu to hold additional future host information we encapsulate kvm_cpu_context in a new structure and rename the typedef and percpu to match. Signed-off-by: Andrew Murray --- arch/arm/include/asm/kvm_host.h | 8 ++-- arch/arm64/include/asm/kvm_asm.h | 3 ++- arch/arm64/include/asm/kvm_host.h | 16 ++-- arch/arm64/kernel/asm-offsets.c | 1 + virt/kvm/arm/arm.c| 14 -- 5 files changed, 27 insertions(+), 15 deletions(-) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 770d73257ad9..427c28be6452 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -150,7 +150,11 @@ struct kvm_cpu_context { u32 cp15[NR_CP15_REGS]; }; -typedef struct kvm_cpu_context kvm_cpu_context_t; +struct kvm_host_data { + struct kvm_cpu_context host_ctxt; +}; + +typedef struct kvm_host_data kvm_host_data_t; static inline void kvm_init_host_cpu_context(kvm_cpu_context_t *cpu_ctxt, int cpu) We need to fix this function prototype to accept struct kvm_cpu_context, instead of the now removed kvm_cpu_context_t, to prevent a build break on arm32 ? With that : Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm/arm64: Comments cleanup in mmu.c
On 25/03/2019 08:02, Zenghui Yu wrote: Some comments in virt/kvm/arm/mmu.c are outdated and incorrect. Get rid of these comments, to keep the code correct and current as a whole. Only a cleanup here, no functional change. Signed-off-by: Zenghui Yu --- virt/kvm/arm/mmu.c | 23 +-- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ffd7acd..0a6efe7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -102,8 +102,7 @@ static bool kvm_is_device_pfn(unsigned long pfn) * @addr: IPA * @pmd: pmd pointer for IPA * - * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. Marks all - * pages in the range dirty. + * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs. */ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) { @@ -121,8 +120,7 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) * @addr: IPA * @pud: pud pointer for IPA * - * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks all - * pages in the range dirty. + * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. */ static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp) { @@ -899,9 +897,8 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size, * kvm_alloc_stage2_pgd - allocate level-1 table for stage-2 translation. * @kvm: The KVM struct pointer for the VM. * - * Allocates only the stage-2 HW PGD level table(s) (can support either full - * 40-bit input addresses or limited to 32-bit input addresses). Clears the - * allocated pages. + * Allocates only the stage-2 HW PGD level table(s) of size defined by + * stage2_pgd_size(kvm). * * Note we don't need locking here as this is only called when the VM is * created, which can only be done once. @@ -1451,13 +1448,11 @@ static void stage2_wp_pmds(struct kvm *kvm, pud_t *pud, } /** - * stage2_wp_puds - write protect PGD range - * @pgd: pointer to pgd entry - * @addr: range start address - * @end: range end address - * - * Process PUD entries, for a huge PUD we cause a panic. - */ + * stage2_wp_puds - write protect PGD range + * @pgd: pointer to pgd entry + * @addr: range start address + * @end: range end address + */ Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH RFC 01/14] arm64/mm: Introduce asid_info structure and move asid_generation/asid_map to it
Hi Julien, On 21/03/2019 16:36, Julien Grall wrote: In an attempt to make the ASID allocator generic, create a new structure asid_info to store all the information necessary for the allocator. For now, move the variables asid_generation and asid_map to the new structure asid_info. Follow-up patches will move more variables. Note to avoid more renaming aftwards, a local variable 'info' has been created and is a pointer to the ASID allocator structure. Signed-off-by: Julien Grall --- arch/arm64/mm/context.c | 46 ++ 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c index 1f0ea2facf24..34db54f1a39a 100644 --- a/arch/arm64/mm/context.c +++ b/arch/arm64/mm/context.c @@ -30,8 +30,11 @@ static u32 asid_bits; static DEFINE_RAW_SPINLOCK(cpu_asid_lock); -static atomic64_t asid_generation; -static unsigned long *asid_map; +struct asid_info +{ + atomic64_t generation; + unsigned long *map; +} asid_info; Shouldn't this be static ? Rest looks fine. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v2] kvm: arm: Fix handling of stage2 huge mappings
We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zhenghui Yu Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- Changes since v1: - Fix build break on arm32 - Add missing definitions for S2_PUD_* - Use kvm_pud_pfn() instead of pud_pfn() - Make PUD handling similar to PMD, by dropping the else case arch/arm/include/asm/stage2_pgtable.h | 2 ++ virt/kvm/arm/mmu.c| 59 +-- 2 files changed, 45 insertions(+), 16 deletions(-) diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de20895..9e11dce 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -75,6 +75,8 @@ static inline bool kvm_stage2_has_pud(struct kvm *kvm) #define S2_PMD_MASKPMD_MASK #define S2_PMD_SIZEPMD_SIZE +#define S2_PUD_MASKPUD_MASK +#define S2_PUD_SIZEPUD_SIZE static inline bool kvm_stage2_has_pmd(struct kvm *kvm) { diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index fce0983..97b5417 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache { pmd_t *pmd, old_pmd; +retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); old_pmd = *pmd; + /* +* Multiple vcpus faulting on the same PMD entry, can +* lead to them sequentially updating the PMD with the +* same value. Following the break-before-make +* (pmd_clear() followed by tlb_flush()) process can +* hinder forward progress due to refaults generated +* on missing translations. +* +* Skip updating the page table if the entry is +* unchanged. +*/ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + if (pmd_present(old_pmd)) { /* -* Multiple vcpus faulting on the same PMD entry, can -* lead to them sequentially updating the PMD with the -* same value. Following the break-before-make -* (pmd_clear() followed by tlb_flush()) process can -* hinder forward progress due to refaults generated -* on missing translations. +* If we already have PTE level mapping for this block, +* we must unmap it to avoid inconsistent TLB state and +* leaking the table page. We could end up in this situation +* if the memory slot was marked for dirty logging and was +* reverted, leaving PTE level mappings for the pages accessed +* during the period. So, unmap the PTE level mapping for this +* block and retry, as we could have released the upper level +* table in the process. * -* Skip updating the page table if the entry is -* unchanged. +* Normal THP split/merge follows mmu_notifier callbacks and do +* get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + } /* * Mapping in huge pages should only happen through a * fault. If a page is merged into a transparent huge @@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * should become splitting first, unmapped, merged, * and mapped back in on-demand. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); - + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -
Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings
Marc, On 20/03/2019 10:35, Marc Zyngier wrote: On Wed, 20 Mar 2019 10:23:39 + Suzuki K Poulose wrote: Hi Suzuki, Marc, On 20/03/2019 10:11, Marc Zyngier wrote: On Wed, 20 Mar 2019 09:44:38 + Suzuki K Poulose wrote: Hi Marc, On 20/03/2019 08:15, Marc Zyngier wrote: Hi Suzuki, On Tue, 19 Mar 2019 14:11:08 +, Suzuki K Poulose wrote: ... + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); We should really get rid of the S2_P{U/M}D_* definitions, as they are always the same as the host. The only thing that changes is the PGD size which varies according to the IPA and the concatenation. Also what do you think about using P{M,U}D_* instead of S2_P{M,U}D_* above ? I could make that change with the respin. Given that this is a fix, I'd like it to be as small as obvious as possible, making it easier to backport. I'm happy to take another patch for 5.2 that will drop the whole S2_P* if we still think that this should be the case (though what I'd really like is to have architectural levels instead of these arbitrary definitions). I only meant the two new instances added above in the patch. Of course, I could send something to fix the existing ones. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings
Marc, On 20/03/2019 10:11, Marc Zyngier wrote: On Wed, 20 Mar 2019 09:44:38 + Suzuki K Poulose wrote: Hi Marc, On 20/03/2019 08:15, Marc Zyngier wrote: Hi Suzuki, On Tue, 19 Mar 2019 14:11:08 +, Suzuki K Poulose wrote: We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zhengui Yu Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose ... + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); We should really get rid of the S2_P{U/M}D_* definitions, as they are always the same as the host. The only thing that changes is the PGD size which varies according to the IPA and the concatenation. Also what do you think about using P{M,U}D_* instead of S2_P{M,U}D_* above ? I could make that change with the respin. Sure, feel free to send a fixed version. I'll drop the currently queued patch. Thanks. Sorry for the trouble. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] kvm: arm: Fix handling of stage2 huge mappings
Hi Marc, On 20/03/2019 08:15, Marc Zyngier wrote: Hi Suzuki, On Tue, 19 Mar 2019 14:11:08 +, Suzuki K Poulose wrote: We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zhengui Yu Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose ... +retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); ... if (pmd_present(old_pmd)) { /* -* Multiple vcpus faulting on the same PMD entry, can -* lead to them sequentially updating the PMD with the -* same value. Following the break-before-make -* (pmd_clear() followed by tlb_flush()) process can -* hinder forward progress due to refaults generated -* on missing translations. +* If we already have PTE level mapping for this block, +* we must unmap it to avoid inconsistent TLB state and +* leaking the table page. We could end up in this situation +* if the memory slot was marked for dirty logging and was +* reverted, leaving PTE level mappings for the pages accessed +* during the period. So, unmap the PTE level mapping for this +* block and retry, as we could have released the upper level +* table in the process. * -* Skip updating the page table if the entry is -* unchanged. +* Normal THP split/merge follows mmu_notifier callbacks and do +* get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; This looks slightly dodgy. Doing this retry results in another call to stage2_get_pmd(), which may or may not result in allocating a PUD. I think this is safe as if we managed to get here, it means the whole hierarchy was already present and nothing was allocated in the first round. Somehow, I would feel more comfortable with just not even trying. Unmap, don't fix the fault, let the vcpu come again for additional punishment. But this is probably more invasive, as none of the stage2_set_p*() return value is ever evaluated. Oh well. Yes. The other option was to unmap_stage2_ptes() and get the page refcount on the new pmd. But that kind of makes it a bit difficult to follow the code. if (stage2_pud_present(kvm, old_pud)) { - stage2_pud_clear(kvm, pudp); - kvm_tlb_flush_vmid_ipa(kvm, addr); + /* +* If we already have table level mapping for this block, unmap +* the range for this block and retry. +*/ + if (!stage2_pud_huge(kvm, old_pud)) { + unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE); This broke 32bit. I've added the following hunk to fix it: Grrr! Sorry about that. diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index de2089501b8b..b8f21088a744 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -68,6 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) #define stage2_pud_table_empty(kvm, pudp) false +#define S2_PUD_MASKPGDIR_MASK +#define S2_PUD_SIZEPGDIR_SIZE + We should really get rid of the S2_P{U/M}D_* definitions, as they are always the same as the host. The only thing that changes is the PGD size which varies according to the IPA and the concatenation. static inline bool kvm_stage2_has_pud(struct kvm *kvm) { return false; + goto retry; + } else { + WARN_ON_ONCE(pud_pfn(old_pud) != pud_pfn(*new_pudp)); + stage2_pud_clear(kvm, pudp); + kvm_
[PATCH] kvm: arm: Fix handling of stage2 huge mappings
We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang Cc: Zheng Xiang Cc: Zhengui Yu Cc: Marc Zyngier Cc: Christoffer Dall Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 63 ++ 1 file changed, 45 insertions(+), 18 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index fce0983..6ad6f19d 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1060,25 +1060,43 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache { pmd_t *pmd, old_pmd; +retry: pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); old_pmd = *pmd; + /* +* Multiple vcpus faulting on the same PMD entry, can +* lead to them sequentially updating the PMD with the +* same value. Following the break-before-make +* (pmd_clear() followed by tlb_flush()) process can +* hinder forward progress due to refaults generated +* on missing translations. +* +* Skip updating the page table if the entry is +* unchanged. +*/ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + if (pmd_present(old_pmd)) { /* -* Multiple vcpus faulting on the same PMD entry, can -* lead to them sequentially updating the PMD with the -* same value. Following the break-before-make -* (pmd_clear() followed by tlb_flush()) process can -* hinder forward progress due to refaults generated -* on missing translations. +* If we already have PTE level mapping for this block, +* we must unmap it to avoid inconsistent TLB state and +* leaking the table page. We could end up in this situation +* if the memory slot was marked for dirty logging and was +* reverted, leaving PTE level mappings for the pages accessed +* during the period. So, unmap the PTE level mapping for this +* block and retry, as we could have released the upper level +* table in the process. * -* Skip updating the page table if the entry is -* unchanged. +* Normal THP split/merge follows mmu_notifier callbacks and do +* get handled accordingly. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) - return 0; - + if (!pmd_thp_or_huge(old_pmd)) { + unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE); + goto retry; + } /* * Mapping in huge pages should only happen through a * fault. If a page is merged into a transparent huge @@ -1090,8 +1108,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * should become splitting first, unmapped, merged, * and mapped back in on-demand. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); - + WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { @@ -1107,6 +1124,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac { pud_t *pudp, old_pud; +retry: pudp = stage2_get_pud(kvm, cache, addr); VM_BUG_ON(!pudp); @@ -1114,16 +1132,25 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac /* * A large number of vcpus faulting on the same stage 2 entry, -* can lead to a refault due to the -* stage2_pud_clear()/tlb_flush(). Skip updating the page -* tables if there is no change. +* can lead to a refault due to the stage2_pud_clear()/tlb_flush(). +* Skip updating the page tables if there is no change. */ if (pud_val(old_pud) == pud_val(*new_pudp)) return 0; if (stage2_pud_presen
Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Hi ! On Sun, Mar 17, 2019 at 09:34:11PM +0800, Zenghui Yu wrote: > Hi Suzuki, > > On 2019/3/15 22:56, Suzuki K Poulose wrote: > >Hi Zhengui, > > s/Zhengui/Zheng/ > > (I think you must wanted to say "Hi" to Zheng :-) ) > Sorry about that. > > I have looked into your patch and the kernel log, and I believe that > your patch had already addressed this issue. But I think we can do it > a little better - two more points need to be handled with caution. > > Take PMD hugepage (PMD_SIZE == 2M) for example: > ... > >Thats bad, we seem to be making upto 4 unbalanced put_page(). > > > >>>>--- > >>>> virt/kvm/arm/mmu.c | 51 > >>>>+++ > >>>> 1 file changed, 35 insertions(+), 16 deletions(-) > >>>> > >>>>diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > >>>>index 66e0fbb5..04b0f9b 100644 > >>>>--- a/virt/kvm/arm/mmu.c > >>>>+++ b/virt/kvm/arm/mmu.c > >>>>@@ -1076,24 +1076,38 @@ static int stage2_set_pmd_huge(struct kvm > >>>>*kvm, struct kvm_mmu_memory_cache > >>>> * Skip updating the page table if the entry is > >>>> * unchanged. > >>>> */ > >>>>- if (pmd_val(old_pmd) == pmd_val(*new_pmd)) > >>>>+ if (pmd_val(old_pmd) == pmd_val(*new_pmd)) { > >>>> return 0; > >>>>- > >>>>+ } else if (WARN_ON_ONCE(!pmd_thp_or_huge(old_pmd))) { > >>>> /* > >>>>- * Mapping in huge pages should only happen through a > >>>>- * fault. If a page is merged into a transparent huge > >>>>- * page, the individual subpages of that huge page > >>>>- * should be unmapped through MMU notifiers before we > >>>>- * get here. > >>>>- * > >>>>- * Merging of CompoundPages is not supported; they > >>>>- * should become splitting first, unmapped, merged, > >>>>- * and mapped back in on-demand. > >>>>+ * If we have PTE level mapping for this block, > >>>>+ * we must unmap it to avoid inconsistent TLB > >>>>+ * state. We could end up in this situation if > >>>>+ * the memory slot was marked for dirty logging > >>>>+ * and was reverted, leaving PTE level mappings > >>>>+ * for the pages accessed during the period. > >>>>+ * Normal THP split/merge follows mmu_notifier > >>>>+ * callbacks and do get handled accordingly. > >>>> */ > >>>>- VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); > >>>>+ unmap_stage2_range(kvm, (addr & S2_PMD_MASK), > >>>>S2_PMD_SIZE); > > First, using unmap_stage2_range() here is not quite appropriate. Suppose > we've only accessed one 2M page in HPA [x, x+1]Gib range, with other > pages unaccessed. What will happen if unmap_stage2_range(this_2M_page)? > We'll unexpectedly reach clear_stage2_pud_entry(), and things are going > to get really bad. So we'd better use unmap_stage2_ptes() here since we > only want to unmap a 2M range. Yes, you're right. If this PMD entry is the only entry in the parent PUD table, then the PUD table may get free'd and we may install the table in a place which is not plugged into the table. > > > Second, consider below function stack: > > unmap_stage2_ptes() > clear_stage2_pmd_entry() > put_page(virt_to_page(pmd)) > > It seems that we have one "redundant" put_page() here, (thus comes the > bad kernel log ... ,) but actually we do not. By stage2_set_pmd_huge(), > the PMD table entry will then point to a 2M block (originally pointed > to a PTE table), the _refcount of this PMD-level table page should _not_ > change after unmap_stage2_ptes(). So what we really should do is adding > a get_page() after unmapping to keep the _refcount a balance! Yes we need an additional refcount on the new huge pmd table, if we are tearing down the PTE level table. > > > thoughts ? A simple patch below (based on yours) for details. > > > thanks, > > zenghui > > > >> > >>It seems that kvm decreases the _refcount of the page twice in > >>transparent_hugepage_adjust() > >>and unmap_stage2_range(). > > > >But I thought we should be doing that on the head_page already, as
Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Hi Zhengui, On 15/03/2019 08:21, Zheng Xiang wrote: Hi Suzuki, I have tested this patch, VM doesn't hang and we get expected WARNING log: Thanks for the quick testing ! However, we also get the following unexpected log: [ 908.329900] BUG: Bad page state in process qemu-kvm pfn:a2fb41cf [ 908.339415] page:7e28bed073c0 count:-4 mapcount:0 mapping: index:0x0 [ 908.339416] flags: 0x4e00() [ 908.339418] raw: 4e00 dead0100 dead0200 [ 908.339419] raw: fffc [ 908.339420] page dumped because: nonzero _refcount [ 908.339437] CPU: 32 PID: 72599 Comm: qemu-kvm Kdump: loaded Tainted: GB W5.0.0+ #1 [ 908.339438] Call trace: [ 908.339439] dump_backtrace+0x0/0x188 [ 908.339441] show_stack+0x24/0x30 [ 908.339442] dump_stack+0xa8/0xcc [ 908.339443] bad_page+0xf0/0x150 [ 908.339445] free_pages_check_bad+0x84/0xa0 [ 908.339446] free_pcppages_bulk+0x4b8/0x750 [ 908.339448] free_unref_page_commit+0x13c/0x198 [ 908.339449] free_unref_page+0x84/0xa0 [ 908.339451] __free_pages+0x58/0x68 [ 908.339452] zap_huge_pmd+0x290/0x2d8 [ 908.339454] unmap_page_range+0x2b4/0x470 [ 908.339455] unmap_single_vma+0x94/0xe8 [ 908.339457] unmap_vmas+0x8c/0x108 [ 908.339458] exit_mmap+0xd4/0x178 [ 908.339459] mmput+0x74/0x180 [ 908.339460] do_exit+0x2b4/0x5b0 [ 908.339462] do_group_exit+0x3c/0xe0 [ 908.339463] __arm64_sys_exit_group+0x24/0x28 [ 908.339465] el0_svc_common+0xa0/0x180 [ 908.339466] el0_svc_handler+0x38/0x78 [ 908.339467] el0_svc+0x8/0xc Thats bad, we seem to be making upto 4 unbalanced put_page(). --- virt/kvm/arm/mmu.c | 51 +++ 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 66e0fbb5..04b0f9b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1076,24 +1076,38 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache * Skip updating the page table if the entry is * unchanged. */ - if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) { return 0; - + } else if (WARN_ON_ONCE(!pmd_thp_or_huge(old_pmd))) { /* - * Mapping in huge pages should only happen through a - * fault. If a page is merged into a transparent huge - * page, the individual subpages of that huge page - * should be unmapped through MMU notifiers before we - * get here. - * - * Merging of CompoundPages is not supported; they - * should become splitting first, unmapped, merged, - * and mapped back in on-demand. + * If we have PTE level mapping for this block, + * we must unmap it to avoid inconsistent TLB + * state. We could end up in this situation if + * the memory slot was marked for dirty logging + * and was reverted, leaving PTE level mappings + * for the pages accessed during the period. + * Normal THP split/merge follows mmu_notifier + * callbacks and do get handled accordingly. */ - VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); + unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE); It seems that kvm decreases the _refcount of the page twice in transparent_hugepage_adjust() and unmap_stage2_range(). But I thought we should be doing that on the head_page already, as this is THP. I will take a look and get back to you on this. Btw, is it possible for you to turn on CONFIG_DEBUG_VM and re-run with the above patch ? Kind regards Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC] Question about TLB flush while set Stage-2 huge pages
y level-3 permission fault caused by a write > >> on a same IPA. After > >> analyzing the source code, I find KVM always return from the bellow *if* > >> statement in > >> stage2_set_pmd_huge() even if we only have a single VCPU: > >> > >> /* > >> * Multiple vcpus faulting on the same PMD entry, can > >> * lead to them sequentially updating the PMD with the > >> * same value. Following the break-before-make > >> * (pmd_clear() followed by tlb_flush()) process can > >> * hinder forward progress due to refaults generated > >> * on missing translations. > >> * > >> * Skip updating the page table if the entry is > >> * unchanged. > >> */ > >> if (pmd_val(old_pmd) == pmd_val(*new_pmd)) > >> return 0; > >> > >> The PMD has already set the PMD_S2_RDWR bit. I doubt > >> kvm_tlb_flush_vmid_ipa() does not invalidate > >> Stage-2 for the subpages of the PMD(except the first PTE of this PMD). > >> Finally I add some debug > >> code to flush tlb for all subpages of the PMD, as shown bellow: > >> > >> /* > >> * Mapping in huge pages should only happen through a > >> * fault. If a page is merged into a transparent huge > >> * page, the individual subpages of that huge page > >> * should be unmapped through MMU notifiers before we > >> * get here. > >> * > >> * Merging of CompoundPages is not supported; they > >> * should become splitting first, unmapped, merged, > >> * and mapped back in on-demand. > >> */ > >> VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); > >> > >> pmd_clear(pmd); > >> for (cnt = 0; cnt < 512; cnt++) > >> kvm_tlb_flush_vmid_ipa(kvm, addr + cnt*PAGE_SIZE); > >> > >> Then the problem no longer reproduce. > > > > This makes very little sense. We shouldn't be able to enter this path > > for anything else but a permission update, otherwise the VM_BUG_ON > > should fire. > > Hmm, I think I didn't describe it very clearly. > Look at the following sequence: > > 1) Set a PMD READONLY and logging_active. > > 2) KVM handles permission fault caused by writing a subpage(assumpt *b*) > within this huge PMD. > > 3) KVM dissolves PMD and invalidates TLB for this PMD. Then set a writable > PTE. > > 4) Read another 511 PTEs and setup Stage-2 PTE table. > > 5) Now remove logging_active and keep another 511 PTEs READONLY. > > 6) VM continues to write a subpage(assumpt *c*) and cause permission fault. > > 7) KVM handles this new fault and makes a new writable PMD after > transparent_hugepage_adjust(). > > 8) KVM invalidates TLB for the first page(*a*) of the PMD. >Here another 511 RO PTEs entries still stay in TLB, especially *c* which > will be wrote later. > > 9) KVM then set this new writable PMD. >Step 8-9 is what stage2_set_pmd_huge() does. > > 10) VM continues to write *c*, but this time it hits the RO PTE entry in TLB > and causes permission fault again. >Sometimes it can also cause TLB conflict aborts. > > 11) KVM repeats step 6 and goes to the following statement and return 0: > > * Skip updating the page table if the entry is > * unchanged. > */ > if (pmd_val(old_pmd) == pmd_val(*new_pmd)) > return 0; > > 12) Then it will repeat step 10-11 until the PTE entry is invalidated. > > I think there is something abnormal in step 8. > Should I blame my hardware? Or is it a kernel bug? Marc and I had a discussion about this and it looks like we may have an issue here. So with the cancellation of logging, we do not trigger the mmu_notifiers (as the userspace memory mapping hasn't changed) and thus have memory leaks while trying to install a huge mapping. Would it be possible for you to try the patch below ? It will trigger a WARNING to confirm our theory, but should not cause the hang. As we unmap the PMD/PUD range of PTE mappings before reinstalling a block map. ---8>--- test: kvm: arm: Fix handling of stage2 huge mappings We rely on the mmu_notifier call backs to handle the split/merging of huge pages and thus we are guaranteed that while creating a block mapping, the entire block is unmapped at stage2. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty loggi
[PATCH] kvm: arm: Enforce PTE mappings at stage2 when needed
commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings, stricter. However it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE. Also, the checks do not cover the PUD huge pages, that was under review during the same period. This patch fixes both the issues. Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") Reported-by: Zenghui Yu Cc: Zenghui Yu Cc: Christoffer Dall Cc: Marc Zyngier Signed-off-by: Suzuki K Poulose --- virt/kvm/arm/mmu.c | 43 +-- 1 file changed, 21 insertions(+), 22 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 30251e2..66e0fbb5 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1595,8 +1595,9 @@ static void kvm_send_hwpoison_signal(unsigned long address, send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current); } -static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, - unsigned long hva) +static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot, + unsigned long hva, + unsigned long map_size) { gpa_t gpa_start, gpa_end; hva_t uaddr_start, uaddr_end; @@ -1612,34 +1613,34 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, /* * Pages belonging to memslots that don't have the same alignment -* within a PMD for userspace and IPA cannot be mapped with stage-2 -* PMD entries, because we'll end up mapping the wrong pages. +* within a PMD/PUD for userspace and IPA cannot be mapped with stage-2 +* PMD/PUD entries, because we'll end up mapping the wrong pages. * * Consider a layout like the following: * *memslot->userspace_addr: *+-+++---+ -*|abcde|fgh Stage-1 PMD|Stage-1 PMD tv|xyz| +*|abcde|fgh Stage-1 block |Stage-1 block tv|xyz| *+-+++---+ * *memslot->base_gfn << PAGE_SIZE: * +---+++-+ -* |abc|def Stage-2 PMD|Stage-2 PMD |tvxyz| +* |abc|def Stage-2 block |Stage-2 block |tvxyz| * +---+++-+ * -* If we create those stage-2 PMDs, we'll end up with this incorrect +* If we create those stage-2 blocks, we'll end up with this incorrect * mapping: * d -> f * e -> g * f -> h */ - if ((gpa_start & ~S2_PMD_MASK) != (uaddr_start & ~S2_PMD_MASK)) + if ((gpa_start & (map_size - 1)) != (uaddr_start & (map_size - 1))) return false; /* * Next, let's make sure we're not trying to map anything not covered -* by the memslot. This means we have to prohibit PMD size mappings -* for the beginning and end of a non-PMD aligned and non-PMD sized +* by the memslot. This means we have to prohibit block size mappings +* for the beginning and end of a non-block aligned and non-block sized * memory slot (illustrated by the head and tail parts of the * userspace view above containing pages 'abcde' and 'xyz', * respectively). @@ -1648,8 +1649,8 @@ static bool fault_supports_stage2_pmd_mappings(struct kvm_memory_slot *memslot, * userspace_addr or the base_gfn, as both are equally aligned (per * the check above) and equally sized. */ - return (hva & S2_PMD_MASK) >= uaddr_start && - (hva & S2_PMD_MASK) + S2_PMD_SIZE <= uaddr_end; + return (hva & ~(map_size - 1)) >= uaddr_start && + (hva & ~(map_size - 1)) + map_size <= uaddr_end; } static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, @@ -1678,12 +1679,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (!fault_supports_stage2_pmd_mappings(memslot, hva)) - force_pte = true; - - if (logging_active) - force_pte = true; - /* Let's check if we will get back a huge page backed by hugetlbfs */ down_read(>mm->mmap_sem); vma = find_vma_intersection(current->mm, hva, hva + 1); @@ -1694,6 +1689,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, } vma_pagesize = vma_kernel_pagesize(
Re: [PATCH v11 3/8] arm64: KVM: add accessors to track guest/host only counters
Hi Andrew, On 08/03/2019 12:07, Andrew Murray wrote: In order to effeciently switch events_{guest,host} perf counters at guest entry/exit we add bitfields to kvm_cpu_context for guest and host events as well as accessors for updating them. A function is also provided which allows the PMU driver to determine if a counter should start counting when it is enabled. With exclude_host, events on !VHE we may only start counting when entering the guest. Some minor comments below. Signed-off-by: Andrew Murray --- arch/arm64/include/asm/kvm_host.h | 17 +++ arch/arm64/kvm/Makefile | 2 +- arch/arm64/kvm/pmu.c | 49 +++ 3 files changed, 67 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/kvm/pmu.c diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 1d36619d6650..4b7219128f2d 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -207,8 +207,14 @@ struct kvm_cpu_context { struct kvm_vcpu *__hyp_running_vcpu; }; +struct kvm_pmu_events { + u32 events_host; + u32 events_guest; +}; + struct kvm_host_data { struct kvm_cpu_context host_ctxt; + struct kvm_pmu_events pmu_events; }; typedef struct kvm_host_data kvm_host_data_t; @@ -479,11 +485,22 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu); +static inline bool kvm_pmu_counter_defered(struct perf_event_attr *attr) nit: s/defered/deferred/ +{ + return attr->exclude_host; +} + Does it need a definition for !CONFIG_KVM case, to make sure that the events are always enabled for that case ? i.e, does this introduce a change in behavior for !CONFIG_KVM case ? #ifdef CONFIG_KVM /* Avoid conflicts with core headers if CONFIG_KVM=n */ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu) { return kvm_arch_vcpu_run_map_fp(vcpu); } + +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u32 clr); +#else +static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u32 clr) {} #endif static inline void kvm_arm_vhe_guest_enter(void) diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 0f2a135ba15b..f34cb49b66ae 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -19,7 +19,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o -kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o +kvm-$(CONFIG_KVM_ARM_HOST) += vgic-sys-reg-v3.o fpsimd.o pmu.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c new file mode 100644 index ..43965a3cc0f4 --- /dev/null +++ b/arch/arm64/kvm/pmu.c @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * arch/arm64/kvm/pmu.c: Switching between guest and host counters minor nit: You don't need the file name, it is superfluous. + * + * Copyright 2019 Arm Limited + * Author: Andrew Murray + */ +#include +#include + +DECLARE_PER_CPU(kvm_host_data_t, kvm_host_data); nit: Do we really need this ? This is already in asm/kvm_host.h. + +/* + * Given the exclude_{host,guest} attributes, determine if we are going + * to need to switch counters at guest entry/exit. + */ +static bool kvm_pmu_switch_needed(struct perf_event_attr *attr) +{ + /* Only switch if attributes are different */ + return (attr->exclude_host ^ attr->exclude_guest); I back Julien's suggestion to keep this simple. +} + +/* + * Add events to track that we may want to switch at guest entry/exit + * time. + */ +void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(_host_data); + + if (!kvm_pmu_switch_needed(attr)) + return; + + if (!attr->exclude_host) + ctx->pmu_events.events_host |= set; + if (!attr->exclude_guest) + ctx->pmu_events.events_guest |= set; +} + +/* + * Stop tracking events + */ +void kvm_clr_pmu_events(u32 clr) +{ + struct kvm_host_data *ctx = this_cpu_ptr(_host_data); + + ctx->pmu_events.events_host &= ~clr; + ctx->pmu_events.events_guest &= ~clr; +} Rest looks fine. Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH] KVM: arm64: Force a PTE mapping when logging is enabled
Hi Zenghui, On 05/03/2019 11:09, Zenghui Yu wrote: Hi Marc, Suzuki, On 2019/3/5 1:34, Marc Zyngier wrote: Hi Zenghui, Suzuki, On 04/03/2019 17:13, Suzuki K Poulose wrote: Hi Zenghui, On Sun, Mar 03, 2019 at 11:14:38PM +0800, Zenghui Yu wrote: I think there're still some problems in this patch... Details below. On Sat, Mar 2, 2019 at 11:39 AM Zenghui Yu wrote: The idea behind this is: we don't want to keep tracking of huge pages when logging_active is true, which will result in performance degradation. We still need to set vma_pagesize to PAGE_SIZE, so that we can make use of it to force a PTE mapping. Yes, you're right. We are indeed ignoring the force_pte flag. Cc: Suzuki K Poulose Cc: Punit Agrawal Signed-off-by: Zenghui Yu --- Atfer looking into https://patchwork.codeaurora.org/patch/647985/ , the "vma_pagesize = PAGE_SIZE" logic was not intended to be deleted. As far as I can tell, we used to have "hugetlb" to force the PTE mapping, but we have "vma_pagesize" currently instead. We should set it properly for performance reasons (e.g, in VM migration). Did I miss something important? --- virt/kvm/arm/mmu.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 30251e2..7d41b16 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1705,6 +1705,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && !force_pte) { gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; + } else { + /* +* Fallback to PTE if it's not one of the stage2 +* supported hugepage sizes or the corresponding level +* doesn't exist, or logging is enabled. First, Instead of "logging is enabled", it should be "force_pte is true", since "force_pte" will be true when: 1) fault_supports_stage2_pmd_mappings() return false; or 2) "logging is enabled" (e.g, in VM migration). Second, fallback some unsupported hugepage sizes (e.g, 64K hugepage with 4K pages) to PTE is somewhat strange. And it will then _unexpectedly_ reach transparent_hugepage_adjust(), though no real adjustment will happen since commit fd2ef358282c ("KVM: arm/arm64: Ensure only THP is candidate for adjustment"). Keeping "vma_pagesize" there as it is will be better, right? So I'd just simplify the logic like: We could fix this right in the beginning. See patch below: } else if (force_pte) { vma_pagesize = PAGE_SIZE; } Will send a V2 later and waiting for your comments :) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 30251e2..529331e 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1693,7 +1693,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - vma_pagesize = vma_kernel_pagesize(vma); + /* If we are forced to map at page granularity, force the pagesize here */ + vma_pagesize = force_pte ? PAGE_SIZE : vma_kernel_pagesize(vma); + /* * The stage2 has a minimum of 2 level table (For arm64 see * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can @@ -1701,11 +1703,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * As for PUD huge maps, we must make sure that we have at least * 3 levels, i.e, PMD is not folded. */ - if ((vma_pagesize == PMD_SIZE || -(vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && - !force_pte) { + if (vma_pagesize == PMD_SIZE || + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - } + up_read(>mm->mmap_sem); /* We need minimum second+third level pages */ A nicer implementation and easier to understand, thanks! That's pretty interesting, because this is almost what we already have in the NV code: https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/virt/kvm/arm/mmu.c?h=kvm-arm64/nv-wip-v5.0-rc7#n1752 (note that force_pte is gone in that branch). haha :-) sorry about that. I haven't looked into the NV code yet, so ... But I'm still wondering: should we fix this wrong mapping size problem before NV is introduced? Since this problem has not much to do with NV, and 5.0 has already been released with this problem (and 5.1 will without fix ...). Yes, we must fix it. I will soon send out a patch copying on it. Its just that I find some more issues around forcing the PTE mappings with PUD huge pages. I will send something out soon. Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [RFC PATCH] KVM: arm64: Force a PTE mapping when logging is enabled
Hi Zenghui, On Sun, Mar 03, 2019 at 11:14:38PM +0800, Zenghui Yu wrote: > I think there're still some problems in this patch... Details below. > > On Sat, Mar 2, 2019 at 11:39 AM Zenghui Yu wrote: > > > > The idea behind this is: we don't want to keep tracking of huge pages when > > logging_active is true, which will result in performance degradation. We > > still need to set vma_pagesize to PAGE_SIZE, so that we can make use of it > > to force a PTE mapping. Yes, you're right. We are indeed ignoring the force_pte flag. > > > > Cc: Suzuki K Poulose > > Cc: Punit Agrawal > > Signed-off-by: Zenghui Yu > > > > --- > > Atfer looking into https://patchwork.codeaurora.org/patch/647985/ , the > > "vma_pagesize = PAGE_SIZE" logic was not intended to be deleted. As far > > as I can tell, we used to have "hugetlb" to force the PTE mapping, but > > we have "vma_pagesize" currently instead. We should set it properly for > > performance reasons (e.g, in VM migration). Did I miss something important? > > > > --- > > virt/kvm/arm/mmu.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > > index 30251e2..7d41b16 100644 > > --- a/virt/kvm/arm/mmu.c > > +++ b/virt/kvm/arm/mmu.c > > @@ -1705,6 +1705,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > > phys_addr_t fault_ipa, > > (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && > > !force_pte) { > > gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> > > PAGE_SHIFT; > > + } else { > > + /* > > +* Fallback to PTE if it's not one of the stage2 > > +* supported hugepage sizes or the corresponding level > > +* doesn't exist, or logging is enabled. > > First, Instead of "logging is enabled", it should be "force_pte is true", > since "force_pte" will be true when: > > 1) fault_supports_stage2_pmd_mappings() return false; or > 2) "logging is enabled" (e.g, in VM migration). > > Second, fallback some unsupported hugepage sizes (e.g, 64K hugepage with > 4K pages) to PTE is somewhat strange. And it will then _unexpectedly_ > reach transparent_hugepage_adjust(), though no real adjustment will happen > since commit fd2ef358282c ("KVM: arm/arm64: Ensure only THP is candidate > for adjustment"). Keeping "vma_pagesize" there as it is will be better, > right? > > So I'd just simplify the logic like: We could fix this right in the beginning. See patch below: > > } else if (force_pte) { > vma_pagesize = PAGE_SIZE; > } > > > Will send a V2 later and waiting for your comments :) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 30251e2..529331e 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1693,7 +1693,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - vma_pagesize = vma_kernel_pagesize(vma); + /* If we are forced to map at page granularity, force the pagesize here */ + vma_pagesize = force_pte ? PAGE_SIZE : vma_kernel_pagesize(vma); + /* * The stage2 has a minimum of 2 level table (For arm64 see * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can @@ -1701,11 +1703,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, * As for PUD huge maps, we must make sure that we have at least * 3 levels, i.e, PMD is not folded. */ - if ((vma_pagesize == PMD_SIZE || -(vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) && - !force_pte) { + if (vma_pagesize == PMD_SIZE || + (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm))) gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT; - } + up_read(>mm->mmap_sem); /* We need minimum second+third level pages */ --- Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 5/5] KVM: arm/arm64: support chained PMU counters
On 05/02/2019 14:33, Julien Thierry wrote: Hi Andrew, On 04/02/2019 16:53, Andrew Murray wrote: Emulate chained PMU counters by creating a single 64 bit event counter for a pair of chained KVM counters. Signed-off-by: Andrew Murray --- include/kvm/arm_pmu.h | 1 + virt/kvm/arm/pmu.c| 321 +- 2 files changed, 269 insertions(+), 53 deletions(-) diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index b73f31b..8e691ee 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -29,6 +29,7 @@ struct kvm_pmc { u8 idx; /* index into the pmu->pmc array */ struct perf_event *perf_event; u64 bitmask; + u64 overflow_count; }; struct kvm_pmu { diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index a64aeb2..9318130 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -24,9 +24,25 @@ #include #include +#define ARMV8_PMUV3_PERFCTR_CHAIN 0x1E I find it a bit awkward to have this redefined here. Maybe we could define a helper in kvm_host.h: bool kvm_pmu_typer_is_chain(u64 typer); That would always return false for arm32? We don't support ARMv7 host, so that doesn't matter. But it is a good idea to wrap it in a function here. kvm_vcpu_kick(vcpu); @@ -440,6 +632,10 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) select_idx != ARMV8_PMU_CYCLE_IDX) return; + /* Handled by even event */ + if (eventsel == ARMV8_PMUV3_PERFCTR_CHAIN) + return; + memset(, 0, sizeof(struct perf_event_attr)); attr.type = PERF_TYPE_RAW; attr.size = sizeof(attr); @@ -452,6 +648,9 @@ static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) attr.config = (select_idx == ARMV8_PMU_CYCLE_IDX) ? ARMV8_PMUV3_PERFCTR_CPU_CYCLES : eventsel; + if (kvm_pmu_event_is_chained(vcpu, select_idx)) + attr.config1 |= 0x1; I'm not very familiar with the usage of perf attributes configs, but is there any chance we could name this flag? Even if only for the local file? Something like PERF_ATTR_CFG1_KVM_PMU_CHAINED (unless there is an existing naming convention for event attributes). There is ARPMU_EVT_64BIT in "linux/perf/arm_pmu.h". But in general we don't specify where the Bit goes in (i.e, CFG1 etc). Cheers Suzuki ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 4/5] KVM: arm/arm64: lazily create perf events on enable
Hi Andrew, On 04/02/2019 16:53, Andrew Murray wrote: To prevent re-creating perf events everytime the counter registers are changed, let's instead lazily create the event when the event is first enabled and destroy it when it changes. Signed-off-by: Andrew Murray --- virt/kvm/arm/pmu.c | 96 ++ 1 file changed, 68 insertions(+), 28 deletions(-) diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index 95d74ec..a64aeb2 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -24,7 +24,10 @@ #include #include +static void kvm_pmu_sync_counter_enable(struct kvm_vcpu *vcpu, u64 select_idx); static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx); +static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc); + /** * kvm_pmu_get_counter_value - get PMU counter value * @vcpu: The vcpu pointer @@ -59,13 +62,15 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) { u64 reg; + struct kvm_pmu *pmu = >arch.pmu; + struct kvm_pmc *pmc = >pmc[select_idx]; reg = (select_idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); - /* Recreate the perf event to reflect the updated sample_period */ - kvm_pmu_create_perf_event(vcpu, select_idx); + kvm_pmu_stop_counter(vcpu, pmc); + kvm_pmu_sync_counter_enable(vcpu, select_idx); } /** @@ -83,6 +88,7 @@ static void kvm_pmu_release_perf_event(struct kvm_pmc *pmc) /** * kvm_pmu_stop_counter - stop PMU counter + * @vcpu: The vcpu pointer * @pmc: The PMU counter pointer * * If this counter has been configured to monitor some event, release it here. @@ -143,6 +149,24 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu) } /** + * kvm_pmu_enable_counter - create/enable a counter + * @vcpu: The vcpu pointer + * @select_idx: The counter index + */ +static void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 select_idx) +{ + struct kvm_pmu *pmu = >arch.pmu; + struct kvm_pmc *pmc = >pmc[select_idx]; + + if (!pmc->perf_event) + kvm_pmu_create_perf_event(vcpu, select_idx); + + perf_event_enable(pmc->perf_event); + if (pmc->perf_event->state != PERF_EVENT_STATE_ACTIVE) + kvm_debug("failed to enable perf event\n"); +} + +/** * kvm_pmu_enable_counter_mask - enable selected PMU counters * @vcpu: The vcpu pointer * @val: the value guest writes to PMCNTENSET register @@ -152,8 +176,6 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu) void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val) { int i; - struct kvm_pmu *pmu = >arch.pmu; - struct kvm_pmc *pmc; if (!(__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) || !val) return; @@ -162,16 +184,39 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val) if (!(val & BIT(i))) continue; - pmc = >pmc[i]; - if (pmc->perf_event) { - perf_event_enable(pmc->perf_event); - if (pmc->perf_event->state != PERF_EVENT_STATE_ACTIVE) - kvm_debug("fail to enable perf event\n"); - } + kvm_pmu_enable_counter(vcpu, i); } } /** + * kvm_pmu_sync_counter_enable - reenable a counter if it should be enabled + * @vcpu: The vcpu pointer + * @select_idx: The counter index + */ +static void kvm_pmu_sync_counter_enable(struct kvm_vcpu *vcpu, + u64 select_idx) +{ + u64 set = __vcpu_sys_reg(vcpu, PMCNTENSET_EL0); + + if (set & BIT(select_idx)) + kvm_pmu_enable_counter_mask(vcpu, BIT(select_idx)); I think there is a problem here. We could be creating an event for a counter beyond what is supported by the CPU. i.e, we don't seem to validate that the mask we are creating is within the kvm_pmu_valid_counter_mask(). The other callers seem to verify this. I guess it may be better to add a check for this in the kvm_pmu_enable_counter_mask(). minor nit: Feel free to ignore. If we move the check for PMCNTENSET_EL0 to pmu_enable_counter_mask() we could get rid of the above function. Anyways, we should only be enabling the counters set in PMCNTENSET_EL0. +} + +/** + * kvm_pmu_disable_counter - disable selected PMU counter + * @vcpu: The vcpu pointer + * @pmc: The counter to disable + */ +static void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 select_idx) +{ + struct kvm_pmu *pmu = >arch.pmu; + struct kvm_pmc *pmc = >pmc[select_idx]; + + if (pmc->perf_event) + perf_event_disable(pmc->perf_event); +} + +/** * kvm_pmu_disable_counter_mask -
Re: [PATCH] kvm: arm64: Fix comment for KVM_PHYS_SHIFT
Hi On 14/02/2019 01:45, Zenghui Yu wrote: Since Suzuki K Poulose's work on Dynamic IPA support, KVM_PHYS_SHIFT will be used only when machine_type's bits[7:0] equal to 0 (by default). Thus the outdated comment should be fixed. Cc: Suzuki K Poulose Signed-off-by: Zenghui Yu --- arch/arm64/include/asm/kvm_mmu.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 8af4b1b..a12bf48 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -138,7 +138,8 @@ static inline unsigned long __kern_hyp_va(unsigned long v) }) /* - * We currently only support a 40bit IPA. + * We currently support using a VM-specified IPA size. For backward + * compatibility, the default IPA size is fixed to 40bits. */ #define KVM_PHYS_SHIFT(40) Thanks for the fix. Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 3/5] KVM: arm/arm64: re-create event when setting counter value
Hi Andrew, On 04/02/2019 16:53, Andrew Murray wrote: The perf event sample_period is currently set based upon the current counter value, when PMXEVTYPER is written to and the perf event is created. However the user may choose to write the type before the counter value in which case sample_period will be set incorrectly. Let's instead decouple event creation from PMXEVTYPER and (re)create the event in either suitation. Signed-off-by: Andrew Murray --- virt/kvm/arm/pmu.c | 40 +++- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c index 6e7c179..95d74ec 100644 --- a/virt/kvm/arm/pmu.c +++ b/virt/kvm/arm/pmu.c @@ -24,6 +24,7 @@ #include #include +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx); /** * kvm_pmu_get_counter_value - get PMU counter value * @vcpu: The vcpu pointer @@ -62,6 +63,9 @@ void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) reg = (select_idx == ARMV8_PMU_CYCLE_IDX) ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; __vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); + + /* Recreate the perf event to reflect the updated sample_period */ + kvm_pmu_create_perf_event(vcpu, select_idx); } /** @@ -378,23 +382,22 @@ static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx) } /** - * kvm_pmu_set_counter_event_type - set selected counter to monitor some event + * kvm_pmu_create_perf_event - create a perf event for a counter * @vcpu: The vcpu pointer - * @data: The data guest writes to PMXEVTYPER_EL0 + * @data: Type of event as per PMXEVTYPER_EL0 format * @select_idx: The number of selected counter nit: data no longer exists. - * - * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an - * event with given hardware event number. Here we call perf_event API to - * emulate this action and create a kernel perf event for it. */ -void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, - u64 select_idx) +static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) { With the comment from Julien addressed, Reviewed-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm/arm64: Update MAINTAINERS entries
Hi Marc, On 12/02/2019 11:06, Marc Zyngier wrote: For historical reasons, KVM/arm and KVM/arm64 have had different entries in the MAINTAINER file. This makes little sense, as they are maintained together. On top of that, we have a bunch of talented people helping with the reviewing, and they deserve to be mentionned in the consolidated entry. Thanks for the recognition ! Acked-by: Christoffer Dall Signed-off-by: Marc Zyngier --- MAINTAINERS | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index 51029a425dbe..91ec0682443b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8297,29 +8297,25 @@ S: Maintained F:arch/x86/include/asm/svm.h F:arch/x86/kvm/svm.c -KERNEL VIRTUAL MACHINE FOR ARM (KVM/arm) +KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64) M:Christoffer Dall M:Marc Zyngier +R: James Morse +R: Julien Thierry +R: Suzuki Pouloze Please note that it is: "Suzuki K Poulose" With that: Acked-by: Suzuki K Poulose ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm