Re: [PATCH 1/3] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer
On 24/05/19 16:47, Marc Zyngier wrote:
> From: Christoffer Dall
>
> I no longer have time to actively review patches and manage the tree and
> it's time to make that official.

Thanks Christoffer for your work, I hope to meet you anyway at KVM Forum!

Paolo

> Huge thanks to the incredible Linux community and all the contributors
> who have put up with me over the past years.
>
> I also take this opportunity to remove the website link to the Columbia
> web page, as that information is no longer up to date and I don't know
> who manages that anymore.
>
> Signed-off-by: Christoffer Dall
> Signed-off-by: Marc Zyngier
> ---
>  MAINTAINERS | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5cfbea4ce575..4ba271a8e0ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8611,14 +8611,12 @@ F:	arch/x86/include/asm/svm.h
>  F:	arch/x86/kvm/svm.c
>
>  KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
> -M:	Christoffer Dall
>  M:	Marc Zyngier
>  R:	James Morse
>  R:	Julien Thierry
>  R:	Suzuki K Pouloze
>  L:	linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  L:	kvmarm@lists.cs.columbia.edu
> -W:	http://systems.cs.columbia.edu/projects/kvm-arm
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
>  S:	Maintained
>  F:	arch/arm/include/uapi/asm/kvm*

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] mm, compaction: Make sure we isolate a valid PFN
On Fri, May 24, 2019 at 04:31:48PM +0100, Suzuki K Poulose wrote:
> When we have holes in a normal memory zone, we could end up having
> cached_migrate_pfns which may not necessarily be valid, under heavy memory
> pressure with swapping enabled (via __reset_isolation_suitable(), triggered
> by kswapd).
>
> Later, if we fail to find a page via fast_isolate_freepages(), we may
> end up using the migrate_pfn we started the search with as a valid
> page. This could lead to NULL pointer dereferences like the one below,
> due to an invalid mem_section pointer.
>
> Unable to handle kernel NULL pointer dereference at virtual address 0008 [47/1825]
> Mem abort info:
>   ESR = 0x9604
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x0004
>   CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
> [0008] pgd=
> Internal error: Oops: 9604 [#1] SMP
> ...
> CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
> Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
> pstate: 6005 (nZCv daif -PAN -UAO)
> pc : set_pfnblock_flags_mask+0x58/0xe8
> lr : compaction_alloc+0x300/0x950
> [...]
> Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
> Call trace:
>  set_pfnblock_flags_mask+0x58/0xe8
>  compaction_alloc+0x300/0x950
>  migrate_pages+0x1a4/0xbb0
>  compact_zone+0x750/0xde8
>  compact_zone_order+0xd8/0x118
>  try_to_compact_pages+0xb4/0x290
>  __alloc_pages_direct_compact+0x84/0x1e0
>  __alloc_pages_nodemask+0x5e0/0xe18
>  alloc_pages_vma+0x1cc/0x210
>  do_huge_pmd_anonymous_page+0x108/0x7c8
>  __handle_mm_fault+0xdd4/0x1190
>  handle_mm_fault+0x114/0x1c0
>  __get_user_pages+0x198/0x3c0
>  get_user_pages_unlocked+0xb4/0x1d8
>  __gfn_to_pfn_memslot+0x12c/0x3b8
>  gfn_to_pfn_prot+0x4c/0x60
>  kvm_handle_guest_abort+0x4b0/0xcd8
>  handle_exit+0x140/0x1b8
>  kvm_arch_vcpu_ioctl_run+0x260/0x768
>  kvm_vcpu_ioctl+0x490/0x898
>  do_vfs_ioctl+0xc4/0x898
>  ksys_ioctl+0x8c/0xa0
>  __arm64_sys_ioctl+0x28/0x38
>  el0_svc_common+0x74/0x118
>  el0_svc_handler+0x38/0x78
>  el0_svc+0x8/0xc
> Code: f8607840 f11f 8b011401 9a801020 (f9400400)
> ---[ end trace af6a35219325a9b6 ]---
>
> The issue was reported on an arm64 server with 128GB of memory and holes in
> the zone (e.g. [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
> running 100 KVM guest instances.
>
> This patch fixes the issue by ensuring that the page belongs to a valid PFN
> when we fall back to using the lower limit of the scan range upon failure in
> fast_isolate_freepages().
>
> Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
> Reported-by: Marc Zyngier
> Signed-off-by: Suzuki K Poulose

Reviewed-by: Mel Gorman

--
Mel Gorman
SUSE Labs
[PATCH] mm, compaction: Make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could end up having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled (via __reset_isolation_suitable(), triggered
by kswapd).

Later, if we fail to find a page via fast_isolate_freepages(), we may end up
using the migrate_pfn we started the search with as a valid page. This could
lead to NULL pointer dereferences like the one below, due to an invalid
mem_section pointer.

Unable to handle kernel NULL pointer dereference at virtual address 0008 [47/1825]
Mem abort info:
  ESR = 0x9604
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x0004
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
[0008] pgd=
Internal error: Oops: 9604 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate: 6005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
[...]
Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
Call trace:
 set_pfnblock_flags_mask+0x58/0xe8
 compaction_alloc+0x300/0x950
 migrate_pages+0x1a4/0xbb0
 compact_zone+0x750/0xde8
 compact_zone_order+0xd8/0x118
 try_to_compact_pages+0xb4/0x290
 __alloc_pages_direct_compact+0x84/0x1e0
 __alloc_pages_nodemask+0x5e0/0xe18
 alloc_pages_vma+0x1cc/0x210
 do_huge_pmd_anonymous_page+0x108/0x7c8
 __handle_mm_fault+0xdd4/0x1190
 handle_mm_fault+0x114/0x1c0
 __get_user_pages+0x198/0x3c0
 get_user_pages_unlocked+0xb4/0x1d8
 __gfn_to_pfn_memslot+0x12c/0x3b8
 gfn_to_pfn_prot+0x4c/0x60
 kvm_handle_guest_abort+0x4b0/0xcd8
 handle_exit+0x140/0x1b8
 kvm_arch_vcpu_ioctl_run+0x260/0x768
 kvm_vcpu_ioctl+0x490/0x898
 do_vfs_ioctl+0xc4/0x898
 ksys_ioctl+0x8c/0xa0
 __arm64_sys_ioctl+0x28/0x38
 el0_svc_common+0x74/0x118
 el0_svc_handler+0x38/0x78
 el0_svc+0x8/0xc
Code: f8607840 f11f 8b011401 9a801020 (f9400400)
---[ end trace af6a35219325a9b6 ]---

The issue was reported on an arm64 server with 128GB of memory and holes in
the zone (e.g. [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.

This patch fixes the issue by ensuring that the page belongs to a valid PFN
when we fall back to using the lower limit of the scan range upon failure in
fast_isolate_freepages().
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Reported-by: Marc Zyngier
Signed-off-by: Suzuki K Poulose
---
 mm/compaction.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
 			page = pfn_to_page(highest);
 			cc->free_pfn = highest;
 		} else {
-			if (cc->direct_compaction) {
+			if (cc->direct_compaction && pfn_valid(min_pfn)) {
 				page = pfn_to_page(min_pfn);
 				cc->free_pfn = min_pfn;
 			}
--
2.7.4
[GIT PULL] KVM/arm updates for 5.2-rc2
Paolo, Radim,

This is the first batch of KVM/arm fixes for 5.2. The biggest item on the
menu is Christoffer removing himself from the MAINTAINERS file. He'll be
missed. The rest is a set of fixes moving some code around to prevent KASAN
and co from crashing the kernel on non-VHE systems.

Please pull,

	M.

The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:

  Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-fixes-for-5.2

for you to fetch changes up to 623e1528d4090bd1abaf93ec46f047dee9a6fb32:

  KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation (2019-05-24 14:53:20 +0100)

KVM/arm updates for 5.2-rc2

- Correctly annotate HYP-callable code to be non-traceable
- Remove Christoffer from the MAINTAINERS file at his request

Christoffer Dall (1):
      MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer

James Morse (2):
      KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation
      KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation

 MAINTAINERS                       |   2 -
 arch/arm/kvm/hyp/Makefile         |   1 +
 arch/arm64/include/asm/kvm_host.h |   3 -
 arch/arm64/kvm/hyp/Makefile       |   1 +
 arch/arm64/kvm/hyp/switch.c       |  39 +++
 arch/arm64/kvm/pmu.c              |  38 ---
 virt/kvm/arm/aarch32.c            | 121 -
 virt/kvm/arm/hyp/aarch32.c        | 136 ++
 8 files changed, 177 insertions(+), 164 deletions(-)
 create mode 100644 virt/kvm/arm/hyp/aarch32.c
[PATCH 1/3] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer
From: Christoffer Dall

I no longer have time to actively review patches and manage the tree and
it's time to make that official.

Huge thanks to the incredible Linux community and all the contributors
who have put up with me over the past years.

I also take this opportunity to remove the website link to the Columbia
web page, as that information is no longer up to date and I don't know
who manages that anymore.

Signed-off-by: Christoffer Dall
Signed-off-by: Marc Zyngier
---
 MAINTAINERS | 2 --
 1 file changed, 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cfbea4ce575..4ba271a8e0ef 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8611,14 +8611,12 @@ F:	arch/x86/include/asm/svm.h
 F:	arch/x86/kvm/svm.c

 KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
-M:	Christoffer Dall
 M:	Marc Zyngier
 R:	James Morse
 R:	Julien Thierry
 R:	Suzuki K Pouloze
 L:	linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
 L:	kvmarm@lists.cs.columbia.edu
-W:	http://systems.cs.columbia.edu/projects/kvm-arm
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
 S:	Maintained
 F:	arch/arm/include/uapi/asm/kvm*
--
2.20.1
[PATCH 3/3] KVM: arm/arm64: Move cc/it checks under hyp's Makefile to avoid instrumentation
From: James Morse

KVM has helpers to handle the condition codes of trapped aarch32
instructions. These are marked __hyp_text and used from HYP, but they
aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN
and KCOV instrumentation.

Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting
an aarch32 guest on a host built with the ASAN/KCOV debug options.

Fixes: 021234ef3752f ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2")
Fixes: 8cebe750c4d9a ("arm64: KVM: Make kvm_skip_instr32 available to HYP")
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier
---
 arch/arm/kvm/hyp/Makefile   |   1 +
 arch/arm64/kvm/hyp/Makefile |   1 +
 virt/kvm/arm/aarch32.c      | 121 -
 virt/kvm/arm/hyp/aarch32.c  | 136 ++
 4 files changed, 138 insertions(+), 121 deletions(-)
 create mode 100644 virt/kvm/arm/hyp/aarch32.c

diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index d2b5ec9c4b92..ba88b1eca93c 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -11,6 +11,7 @@ CFLAGS_ARMV7VE := $(call cc-option, -march=armv7ve)

 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
+obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/aarch32.o

 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
 obj-$(CONFIG_KVM_ARM_HOST) += cp15-sr.o
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 82d1904328ad..ea710f674cb6 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,6 +10,7 @@ KVM=../../../../virt/kvm

 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o
+obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/aarch32.o

 obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-cpuif-proxy.o
 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 5abbe9b3c652..6880236974b8 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -25,127 +25,6 @@
 #include
 #include

-/*
- * stolen from arch/arm/kernel/opcodes.c
- *
- * condition code lookup table
- * index into the table is test code: EQ, NE, ... LT, GT, AL, NV
- *
- * bit position in short is condition code: NZCV
- */
-static const unsigned short cc_map[16] = {
-	0xF0F0,	/* EQ == Z set            */
-	0x0F0F,	/* NE                     */
-	0xCCCC,	/* CS == C set            */
-	0x3333,	/* CC                     */
-	0xFF00,	/* MI == N set            */
-	0x00FF,	/* PL                     */
-	0xAAAA,	/* VS == V set            */
-	0x5555,	/* VC                     */
-	0x0C0C,	/* HI == C set && Z clear */
-	0xF3F3,	/* LS == C clear || Z set */
-	0xAA55,	/* GE == (N==V)           */
-	0x55AA,	/* LT == (N!=V)           */
-	0x0A05,	/* GT == (!Z && (N==V))   */
-	0xF5FA,	/* LE == (Z || (N!=V))    */
-	0xFFFF,	/* AL always              */
-	0	/* NV                     */
-};
-
-/*
- * Check if a trapped instruction should have been executed or not.
- */
-bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
-{
-	unsigned long cpsr;
-	u32 cpsr_cond;
-	int cond;
-
-	/* Top two bits non-zero? Unconditional. */
-	if (kvm_vcpu_get_hsr(vcpu) >> 30)
-		return true;
-
-	/* Is condition field valid? */
-	cond = kvm_vcpu_get_condition(vcpu);
-	if (cond == 0xE)
-		return true;
-
-	cpsr = *vcpu_cpsr(vcpu);
-
-	if (cond < 0) {
-		/* This can happen in Thumb mode: examine IT state. */
-		unsigned long it;
-
-		it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3);
-
-		/* it == 0 => unconditional. */
-		if (it == 0)
-			return true;
-
-		/* The cond for this insn works out as the top 4 bits. */
-		cond = (it >> 4);
-	}
-
-	cpsr_cond = cpsr >> 28;
-
-	if (!((cc_map[cond] >> cpsr_cond) & 1))
-		return false;
-
-	return true;
-}
-
-/**
- * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block
- * @vcpu: The VCPU pointer
- *
- * When exceptions occur while instructions are executed in Thumb IF-THEN
- * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have
- * to do this little bit of work manually. The fields map like this:
- *
- * IT[7:0] -> CPSR[26:25],CPSR[15:10]
- */
-static void __hyp_text kvm_adjust_itstate(struct kvm_vcpu *vcpu)
-{
-	unsigned long itbits, cond;
-	unsigned long
[PATCH 2/3] KVM: arm64: Move pmu hyp code under hyp's Makefile to avoid instrumentation
From: James Morse

KVM's pmu.c contains the __hyp_text needed to switch the pmu registers
between host and guest. Because this isn't covered by the 'hyp' Makefile,
it can be built with kasan and friends when these are enabled in Kconfig.

When starting a guest, this results in:

| Kernel panic - not syncing: HYP panic:
| PS:a3c9 PC:8328ada0 ESR:8607
| FAR:8328ada0 HPFAR:29df5300 PAR:
| VCPU:4e10b7d6
| CPU: 0 PID: 3088 Comm: qemu-system-aar Not tainted 5.2.0-rc1 #11026
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Plat
| Call trace:
|  dump_backtrace+0x0/0x200
|  show_stack+0x20/0x30
|  dump_stack+0xec/0x158
|  panic+0x1ec/0x420
|  panic+0x0/0x420
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x002,25006082
| Memory Limit: none
| ---[ end Kernel panic - not syncing: HYP panic:

This is caused by functions in pmu.c calling the instrumented code, which
isn't mapped to hyp. From objdump -r:

| RELOCATION RECORDS FOR [.hyp.text]:
| OFFSET TYPE VALUE
| 0010 R_AARCH64_CALL26 __sanitizer_cov_trace_pc
| 0018 R_AARCH64_CALL26 __asan_load4_noabort
| 0024 R_AARCH64_CALL26 __asan_load4_noabort

Move the affected code to a new file under 'hyp's Makefile.
Fixes: 3d91befbb3a0 ("arm64: KVM: Enable !VHE support for :G/:H perf event modifiers")
Cc: Andrew Murray
Signed-off-by: James Morse
Signed-off-by: Marc Zyngier
---
 arch/arm64/include/asm/kvm_host.h |  3 ---
 arch/arm64/kvm/hyp/switch.c       | 39 +++
 arch/arm64/kvm/pmu.c              | 38 --
 3 files changed, 39 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a8d3f8ca22c..4bcd9c1291d5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -592,9 +592,6 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);

-void __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt);
-bool __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt);
-
 void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu);
 void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu);
 #else
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 22b4c335e0b2..8799e0c267d4 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -16,6 +16,7 @@
  */

 #include
+#include
 #include
 #include
 #include
@@ -505,6 +506,44 @@ static void __hyp_text __set_host_arch_workaround_state(struct kvm_vcpu *vcpu)
 #endif
 }

+/**
+ * Disable host events, enable guest events
+ */
+static bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_host_data *host;
+	struct kvm_pmu_events *pmu;
+
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	pmu = &host->pmu_events;
+
+	if (pmu->events_host)
+		write_sysreg(pmu->events_host, pmcntenclr_el0);
+
+	if (pmu->events_guest)
+		write_sysreg(pmu->events_guest, pmcntenset_el0);
+
+	return (pmu->events_host || pmu->events_guest);
+}
+
+/**
+ * Disable guest events, enable host events
+ */
+static void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_host_data *host;
+	struct kvm_pmu_events *pmu;
+
+	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
+	pmu = &host->pmu_events;
+
+	if (pmu->events_guest)
+		write_sysreg(pmu->events_guest, pmcntenclr_el0);
+
+	if (pmu->events_host)
+		write_sysreg(pmu->events_host, pmcntenset_el0);
+}
+
 /* Switch to the guest for VHE systems running in EL2 */
 int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 3da94a5bb6b7..e71d00bb5271 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -53,44 +53,6 @@ void kvm_clr_pmu_events(u32 clr)
 	ctx->pmu_events.events_guest &= ~clr;
 }

-/**
- * Disable host events, enable guest events
- */
-bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
-{
-	struct kvm_host_data *host;
-	struct kvm_pmu_events *pmu;
-
-	host = container_of(host_ctxt, struct kvm_host_data, host_ctxt);
-	pmu = &host->pmu_events;
-
-	if (pmu->events_host)
-		write_sysreg(pmu->events_host, pmcntenclr_el0);
-
-	if (pmu->events_guest)
-		write_sysreg(pmu->events_guest, pmcntenset_el0);
-
-	return (pmu->events_host || pmu->events_guest);
-}
-
-/**
- * Disable guest events, enable host events
- */
-void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
-{
-	struct kvm_host_data *host;
-	struct kvm_pmu_events *pmu;
-
-	host =
Re: [PATCH v2 05/15] arm64: KVM: add access handler for SPE system registers
On Fri, May 24, 2019 at 12:36:24PM +0100, Julien Thierry wrote:
> Hi Sudeep,
>
> On 23/05/2019 11:34, Sudeep Holla wrote:
> > SPE Profiling Buffer owning EL is configurable and when MDCR_EL2.E2PB
> > is configured to provide buffer ownership to EL1, the control registers
> > are trapped.
> >
> > Add access handlers for the Statistical Profiling Extension (SPE)
> > Profiling Buffer control registers. This is needed to support profiling
> > using SPE in the guests.
> >
> > Signed-off-by: Sudeep Holla
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 13 
> >  arch/arm64/kvm/sys_regs.c | 35 +++
> >  include/kvm/arm_spe.h | 15 
> >  3 files changed, 63 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 611a4884fb6c..559aa6931291 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -147,6 +147,19 @@ enum vcpu_sysreg {
> >  	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
> >  	DISR_EL1,	/* Deferred Interrupt Status Register */
> >
> > +	/* Statistical Profiling Extension Registers */
> > +
> > +	PMSCR_EL1,
> > +	PMSICR_EL1,
> > +	PMSIRR_EL1,
> > +	PMSFCR_EL1,
> > +	PMSEVFR_EL1,
> > +	PMSLATFR_EL1,
> > +	PMSIDR_EL1,
> > +	PMBLIMITR_EL1,
> > +	PMBPTR_EL1,
> > +	PMBSR_EL1,
> > +
> >  	/* Performance Monitors Registers */
> >  	PMCR_EL0,	/* Control Register */
> >  	PMSELR_EL0,	/* Event Counter Selection Register */
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 857b226bcdde..dbf5056828d3 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -646,6 +646,30 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> >  	__vcpu_sys_reg(vcpu, PMCR_EL0) = val;
> >  }
> >
> > +static bool access_pmsb_val(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
> > +			    const struct sys_reg_desc *r)
> > +{
> > +	if (p->is_write)
> > +		vcpu_write_sys_reg(vcpu, p->regval, r->reg);
> > +	else
> > +		p->regval = vcpu_read_sys_reg(vcpu, r->reg);
> > +
> > +	return true;
> > +}
> > +
> > +static void reset_pmsb_val(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > +{
> > +	if (!kvm_arm_support_spe_v1()) {
> > +		__vcpu_sys_reg(vcpu, r->reg) = 0;
> > +		return;
> > +	}
> > +
> > +	if (r->reg == PMSIDR_EL1)
>
> If only PMSIDR_EL1 has a non-zero reset value, it feels a bit weird to
> share the reset function for all these registers.
>

Ah, right. Initially I did have a couple of other registers which were not
needed. So I removed them without observing that I could have just used
reset_val(0) for all except PMSIDR_EL1.

> I would suggest only having a reset_pmsidr() function, and just use
> reset_val() with sys_reg_desc->val set to 0 for all the others.
>

Thanks for pointing this out.

--
Regards,
Sudeep
Re: [PATCH] MAINTAINERS: KVM: arm/arm64: Remove myself as maintainer
Hi Christoffer,

On 21/05/2019 14:25, Christoffer Dall wrote:
> I no longer have time to actively review patches and manage the tree and
> it's time to make that official.
>
> Huge thanks to the incredible Linux community and all the contributors
> who have put up with me over the past years.
>
> I also take this opportunity to remove the website link to the Columbia
> web page, as that information is no longer up to date and I don't know
> who manages that anymore.
>
> Signed-off-by: Christoffer Dall
> ---
>  MAINTAINERS | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5cfbea4ce575..4ba271a8e0ef 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8611,14 +8611,12 @@ F:	arch/x86/include/asm/svm.h
>  F:	arch/x86/kvm/svm.c
>
>  KERNEL VIRTUAL MACHINE FOR ARM/ARM64 (KVM/arm, KVM/arm64)
> -M:	Christoffer Dall
>  M:	Marc Zyngier
>  R:	James Morse
>  R:	Julien Thierry
>  R:	Suzuki K Pouloze
>  L:	linux-arm-ker...@lists.infradead.org (moderated for non-subscribers)
>  L:	kvmarm@lists.cs.columbia.edu
> -W:	http://systems.cs.columbia.edu/projects/kvm-arm
>  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git
>  S:	Maintained
>  F:	arch/arm/include/uapi/asm/kvm*

With regrets: applied as a fix for 5.2.

Thanks *a lot* for all the great work you've done over the years, you've
been an awesome co-maintainer. Do remember that we know where to find you,
though! ;-)

	M.
--
Jazz is not dead. It just smells funny...
Re: mm/compaction: BUG: NULL pointer dereference
On 05/24/2019 06:00 PM, Mel Gorman wrote:
> On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>>
>> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
>>> Hi,
>>>
>>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>>> See splat [0]. The test is to spawn 100 VMs all doing a standard debian
>>> installation (thanks to Marc's automated scripts, available here [1]).
>>> The problem has been reproduced with a better rate of success from 5.1-rc6
>>> onwards.
>>>
>>> The issue is only reproducible with swapping enabled and the entire
>>> memory used up, when swapping heavily. Also, this issue is only reproducible
>>> on one server with 128GB, which has the following memory layout:
>>>
>>> [32GB@4GB, hole, 96GB@544GB]
>>>
>>> Here is my non-expert analysis of the issue so far.
>>>
>>> Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
>>> to figure out the cached values for the migrate/free pfn for a zone, by
>>> scanning through the entire zone. On our server it does so in the range of
>>> [ 0x10_, 0xa00_ ], with the following area of holes: [ 0x20_, 0x880_ ].
>>> In the failing case, we end up setting the cached migrate pfn as 0x508_,
>>> which is right in the center of the zone pfn range, i.e.
>>> ( 0x10_ + 0xa00_ ) / 2, with reset_migrate = 0x88_4e00, reset_free = 0x10_.
>>>
>>> Now these cached values are used by fast_isolate_freepages() to find a pfn.
>>> However, since we can't find anything during the search, we fall back to
>>> using the page belonging to the min_pfn (which is the migrate_pfn), without
>>> proper checks to see if that is a valid PFN or not. This is then passed on
>>> to fast_isolate_around(), which tries to do set_pageblock_skip(page) on the
>>> page, which blows up due to a NULL mem_section pointer.
>>>
>>> The following patch seems to fix the issue for me, but I am not quite
>>> convinced that it is the right fix. Thoughts ?
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 9febc8c..9e1b9ac 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>>>  			page = pfn_to_page(highest);
>>>  			cc->free_pfn = highest;
>>>  		} else {
>>> -			if (cc->direct_compaction) {
>>> +			if (cc->direct_compaction && pfn_valid(min_pfn)) {
>>>  				page = pfn_to_page(min_pfn);
>>
>> pfn_to_online_page() here would be better as it does not add pfn_valid() cost
>> on architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless,
>> if the compaction is trying to scan pfns in zone holes, then it should be
>> avoided.
>
> CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
> punches holes within a section. As both do a section lookup, the cost is
> similar but pfn_valid in general is less subtle in this case. Normally
> pfn_valid_within is only ok when a pfn_valid check has been made on the
> max_order aligned range as well as a zone boundary check. In this case,
> it's much more straight-forward to leave it as pfn_valid.

Sure, makes sense.
Re: mm/compaction: BUG: NULL pointer dereference
On Fri, May 24, 2019 at 04:26:16PM +0530, Anshuman Khandual wrote:
>
> On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> > Hi,
> >
> > We are hitting NULL pointer dereferences while running stress tests with KVM.
> > See splat [0]. The test is to spawn 100 VMs all doing a standard debian
> > installation (thanks to Marc's automated scripts, available here [1]).
> > The problem has been reproduced with a better rate of success from 5.1-rc6
> > onwards.
> >
> > The issue is only reproducible with swapping enabled and the entire
> > memory used up, when swapping heavily. Also, this issue is only reproducible
> > on one server with 128GB, which has the following memory layout:
> >
> > [32GB@4GB, hole, 96GB@544GB]
> >
> > Here is my non-expert analysis of the issue so far.
> >
> > Under extreme memory pressure, kswapd could trigger reset_isolation_suitable()
> > to figure out the cached values for the migrate/free pfn for a zone, by
> > scanning through the entire zone. On our server it does so in the range of
> > [ 0x10_, 0xa00_ ], with the following area of holes: [ 0x20_, 0x880_ ].
> > In the failing case, we end up setting the cached migrate pfn as 0x508_,
> > which is right in the center of the zone pfn range, i.e.
> > ( 0x10_ + 0xa00_ ) / 2, with reset_migrate = 0x88_4e00, reset_free = 0x10_.
> >
> > Now these cached values are used by fast_isolate_freepages() to find a pfn.
> > However, since we can't find anything during the search, we fall back to
> > using the page belonging to the min_pfn (which is the migrate_pfn), without
> > proper checks to see if that is a valid PFN or not. This is then passed on
> > to fast_isolate_around(), which tries to do set_pageblock_skip(page) on the
> > page, which blows up due to a NULL mem_section pointer.
> >
> > The following patch seems to fix the issue for me, but I am not quite
> > convinced that it is the right fix. Thoughts ?
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 9febc8c..9e1b9ac 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
> >  			page = pfn_to_page(highest);
> >  			cc->free_pfn = highest;
> >  		} else {
> > -			if (cc->direct_compaction) {
> > +			if (cc->direct_compaction && pfn_valid(min_pfn)) {
> >  				page = pfn_to_page(min_pfn);
>
> pfn_to_online_page() here would be better as it does not add pfn_valid() cost
> on architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless,
> if the compaction is trying to scan pfns in zone holes, then it should be
> avoided.

CONFIG_HOLES_IN_ZONE typically applies in special cases where an arch
punches holes within a section. As both do a section lookup, the cost is
similar but pfn_valid in general is less subtle in this case. Normally
pfn_valid_within is only ok when a pfn_valid check has been made on the
max_order aligned range as well as a zone boundary check. In this case,
it's much more straight-forward to leave it as pfn_valid.

--
Mel Gorman
SUSE Labs
Re: [PATCH v2 12/15] KVM: arm64: add a new vcpu device control group for SPEv1
On 24/05/2019 12:21, Sudeep Holla wrote:
> On Fri, May 24, 2019 at 11:37:51AM +0100, Marc Zyngier wrote:
>> Hi Sudeep,
>>
>> On 23/05/2019 11:34, Sudeep Holla wrote:
>>> To configure the virtual SPEv1 overflow interrupt number, we use the
>>> vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ
>>> attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group.
>>>
>>> After configuring the SPEv1, call the vcpu ioctl with attribute
>>> KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1.
>>>
>>> Signed-off-by: Sudeep Holla
>>> ---
>>>  Documentation/virtual/kvm/devices/vcpu.txt | 28 
>>>  arch/arm64/include/asm/kvm_host.h | 2 +-
>>>  arch/arm64/include/uapi/asm/kvm.h | 4 +
>>>  arch/arm64/kvm/Makefile | 1 +
>>>  arch/arm64/kvm/guest.c | 9 ++
>>>  arch/arm64/kvm/reset.c | 3 +
>>>  include/kvm/arm_spe.h | 35 +
>>>  include/uapi/linux/kvm.h | 1 +
>>>  virt/kvm/arm/arm.c | 1 +
>>>  virt/kvm/arm/spe.c | 163 +
>>>  10 files changed, 246 insertions(+), 1 deletion(-)
>>>  create mode 100644 virt/kvm/arm/spe.c
>>>
>>> diff --git a/Documentation/virtual/kvm/devices/vcpu.txt b/Documentation/virtual/kvm/devices/vcpu.txt
>>> index 2b5dab16c4f2..d1ece488aeee 100644
>>> --- a/Documentation/virtual/kvm/devices/vcpu.txt
>>> +++ b/Documentation/virtual/kvm/devices/vcpu.txt
>>> @@ -60,3 +60,31 @@ time to use the number provided for a given timer, overwriting any previously
>>>  configured values on other VCPUs. Userspace should configure the interrupt
>>>  numbers on at least one VCPU after creating all VCPUs and before running any
>>>  VCPUs.
>>> +
>>> +3. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL
>>> +Architectures: ARM64
>>> +
>>> +1.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ
>>> +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow interrupt
>>> +            is a pointer to an int
>>> +Returns: -EBUSY: The SPE overflow interrupt is already set
>>> +         -ENXIO: The overflow interrupt not set when attempting to get it
>>> +         -ENODEV: SPEv1 not supported
>>> +         -EINVAL: Invalid SPE overflow interrupt number supplied or
>>> +          trying to set the IRQ number without using an in-kernel
>>> +          irqchip.
>>> +
>>> +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow
>>> +interrupt number for this vcpu. This interrupt should be a PPI and the
>>> +interrupt type and number must be the same for each vcpu.
>>> +
>>> +1.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT
>>> +Parameters: no additional parameter in kvm_device_attr.addr
>>> +Returns: -ENODEV: SPEv1 not supported or GIC not initialized
>>> +         -ENXIO: SPEv1 not properly configured or in-kernel irqchip not
>>> +          configured as required prior to calling this attribute
>>> +         -EBUSY: SPEv1 already initialized
>>> +
>>> +Request the initialization of the SPEv1. If using the SPEv1 with an in-kernel
>>> +virtual GIC implementation, this must be done after initializing the in-kernel
>>> +irqchip.
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 6921fdfd477b..fc4ead0774b3 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -50,7 +50,7 @@
>>>
>>>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>>>
>>> -#define KVM_VCPU_MAX_FEATURES 7
>>> +#define KVM_VCPU_MAX_FEATURES 8
>>>
>>>  #define KVM_REQ_SLEEP \
>>>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
>>> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
>>> index 7b7ac0f6cec9..4c9e168de896 100644
>>> --- a/arch/arm64/include/uapi/asm/kvm.h
>>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>>> @@ -106,6 +106,7 @@ struct kvm_regs {
>>>  #define KVM_ARM_VCPU_SVE	4 /* enable SVE for this CPU */
>>>  #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
>>>  #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
>>> +#define KVM_ARM_VCPU_SPE_V1	7 /* Support guest SPEv1 */
>>>
>>>  struct kvm_vcpu_init {
>>>  	__u32 target;
>>> @@ -306,6 +307,9 @@ struct kvm_vcpu_events {
>>>  #define KVM_ARM_VCPU_TIMER_CTRL	1
>>>  #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER	0
>>>  #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER	1
>>> +#define KVM_ARM_VCPU_SPE_V1_CTRL	2
>>> +#define KVM_ARM_VCPU_SPE_V1_IRQ	0
>>> +#define KVM_ARM_VCPU_SPE_V1_INIT	1
>>>
>>>  /* KVM_IRQ_LINE irq field index values */
>>>  #define KVM_ARM_IRQ_TYPE_SHIFT	24
>>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>>> index 3ac1a64d2fb9..1ba6154dd8e1 100644
>>> --- a/arch/arm64/kvm/Makefile
>>> +++ b/arch/arm64/kvm/Makefile
>>> @@ -35,3 +35,4 @@
Re: mm/compaction: BUG: NULL pointer dereference
On 05/24/2019 02:50 PM, Suzuki K Poulose wrote:
> Hi,
>
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> [...]
> The following patch seems to fix the issue for me, but I am not quite
> convinced that it is the right fix. Thoughts ?
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9febc8c..9e1b9ac 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
>  			page = pfn_to_page(highest);
>  			cc->free_pfn = highest;
>  		} else {
> -			if (cc->direct_compaction) {
> +			if (cc->direct_compaction && pfn_valid(min_pfn)) {
>  				page = pfn_to_page(min_pfn);

pfn_to_online_page() here would be better, as it does not add the pfn_valid() cost on architectures which do not subscribe to CONFIG_HOLES_IN_ZONE. But regardless, if compaction is trying to scan pfns in zone holes, that should be avoided.
Re: mm/compaction: BUG: NULL pointer dereference
Hi Mel,

Thanks for your quick response.

On 24/05/2019 11:39, Mel Gorman wrote:
> On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
>> We are hitting NULL pointer dereferences while running stress tests with KVM.
>> [...]
>> The following patch seems to fix the issue for me, but I am not quite
>> convinced that it is the right fix. Thoughts ?
>
> I think the patch is valid and the alternatives would be unnecessarily
> complicated.
> During a normal scan for free pages to isolate, there is a check for
> pageblock_pfn_to_page() which uses a pfn_valid check for non-contiguous
> zones in __pageblock_pfn_to_page.

I had the initial version with the pageblock_pfn_to_page(), but as you said, it is a complicated way of performing the same check as pfn_valid().

> Now, while the non-contiguous check could be made in the area you
> highlight, it would be a relatively small optimisation that would be
> unmeasurable overall. However, it is definitely the case that if the PFN
> you highlight is invalid that badness happens. If you want to express
> this as a signed-off patch with an adjusted changelog then I'd be happy
> to add
>
> Reviewed-by: Mel Gorman

Sure, will send it right away. Thanks.

Suzuki
Re: mm/compaction: BUG: NULL pointer dereference
On Fri, May 24, 2019 at 10:20:19AM +0100, Suzuki K Poulose wrote:
> Hi,
>
> We are hitting NULL pointer dereferences while running stress tests with KVM.
> See splat [0]. The test is to spawn 100 VMs all doing standard debian
> installation (Thanks to Marc's automated scripts, available here [1] ).
> [...]
> The following patch seems to fix the issue for me, but I am not quite
> convinced that it is the right fix. Thoughts ?

I think the patch is valid and the alternatives would be unnecessarily complicated.
During a normal scan for free pages to isolate, there is a check for pageblock_pfn_to_page() which uses a pfn_valid check for non-contiguous zones in __pageblock_pfn_to_page. Now, while the non-contiguous check could be made in the area you highlight, it would be a relatively small optimisation that would be unmeasurable overall. However, it is definitely the case that if the PFN you highlight is invalid that badness happens. If you want to express this as a signed-off patch with an adjusted changelog then I'd be happy to add

Reviewed-by: Mel Gorman

If you are not comfortable with rewriting the changelog and formatting it as a patch then I can do it on your behalf and preserve your Signed-off-by. Just let me know.

Thanks for researching this, I think it also applies to other people but had not found the time to track it down.

--
Mel Gorman
SUSE Labs
Re: [PATCH v2 12/15] KVM: arm64: add a new vcpu device control group for SPEv1
Hi Sudeep, On 23/05/2019 11:34, Sudeep Holla wrote: > To configure the virtual SPEv1 overflow interrupt number, we use the > vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_SPE_V1_IRQ > attribute within the KVM_ARM_VCPU_SPE_V1_CTRL group. > > After configuring the SPEv1, call the vcpu ioctl with attribute > KVM_ARM_VCPU_SPE_V1_INIT to initialize the SPEv1. > > Signed-off-by: Sudeep Holla > --- > Documentation/virtual/kvm/devices/vcpu.txt | 28 > arch/arm64/include/asm/kvm_host.h | 2 +- > arch/arm64/include/uapi/asm/kvm.h | 4 + > arch/arm64/kvm/Makefile| 1 + > arch/arm64/kvm/guest.c | 9 ++ > arch/arm64/kvm/reset.c | 3 + > include/kvm/arm_spe.h | 35 + > include/uapi/linux/kvm.h | 1 + > virt/kvm/arm/arm.c | 1 + > virt/kvm/arm/spe.c | 163 + > 10 files changed, 246 insertions(+), 1 deletion(-) > create mode 100644 virt/kvm/arm/spe.c > > diff --git a/Documentation/virtual/kvm/devices/vcpu.txt > b/Documentation/virtual/kvm/devices/vcpu.txt > index 2b5dab16c4f2..d1ece488aeee 100644 > --- a/Documentation/virtual/kvm/devices/vcpu.txt > +++ b/Documentation/virtual/kvm/devices/vcpu.txt > @@ -60,3 +60,31 @@ time to use the number provided for a given timer, > overwriting any previously > configured values on other VCPUs. Userspace should configure the interrupt > numbers on at least one VCPU after creating all VCPUs and before running any > VCPUs. > + > +3. GROUP: KVM_ARM_VCPU_SPE_V1_CTRL > +Architectures: ARM64 > + > +1.1. ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_IRQ > +Parameters: in kvm_device_attr.addr the address for SPE buffer overflow > interrupt > + is a pointer to an int > +Returns: -EBUSY: The SPE overflow interrupt is already set > + -ENXIO: The overflow interrupt not set when attempting to get it > + -ENODEV: SPEv1 not supported > + -EINVAL: Invalid SPE overflow interrupt number supplied or > + trying to set the IRQ number without using an in-kernel > + irqchip. 
> + > +A value describing the SPEv1 (Statistical Profiling Extension v1) overflow > +interrupt number for this vcpu. This interrupt should be PPI and the > interrupt > +type and number must be same for each vcpu. > + > +1.2 ATTRIBUTE: KVM_ARM_VCPU_SPE_V1_INIT > +Parameters: no additional parameter in kvm_device_attr.addr > +Returns: -ENODEV: SPEv1 not supported or GIC not initialized > + -ENXIO: SPEv1 not properly configured or in-kernel irqchip not > + configured as required prior to calling this attribute > + -EBUSY: SPEv1 already initialized > + > +Request the initialization of the SPEv1. If using the SPEv1 with an > in-kernel > +virtual GIC implementation, this must be done after initializing the > in-kernel > +irqchip. > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index 6921fdfd477b..fc4ead0774b3 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -50,7 +50,7 @@ > > #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS > > -#define KVM_VCPU_MAX_FEATURES 7 > +#define KVM_VCPU_MAX_FEATURES 8 > > #define KVM_REQ_SLEEP \ > KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) > diff --git a/arch/arm64/include/uapi/asm/kvm.h > b/arch/arm64/include/uapi/asm/kvm.h > index 7b7ac0f6cec9..4c9e168de896 100644 > --- a/arch/arm64/include/uapi/asm/kvm.h > +++ b/arch/arm64/include/uapi/asm/kvm.h > @@ -106,6 +106,7 @@ struct kvm_regs { > #define KVM_ARM_VCPU_SVE 4 /* enable SVE for this CPU */ > #define KVM_ARM_VCPU_PTRAUTH_ADDRESS 5 /* VCPU uses address authentication */ > #define KVM_ARM_VCPU_PTRAUTH_GENERIC 6 /* VCPU uses generic authentication */ > +#define KVM_ARM_VCPU_SPE_V1 7 /* Support guest SPEv1 */ > > struct kvm_vcpu_init { > __u32 target; > @@ -306,6 +307,9 @@ struct kvm_vcpu_events { > #define KVM_ARM_VCPU_TIMER_CTRL 1 > #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0 > #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1 > +#define KVM_ARM_VCPU_SPE_V1_CTRL 2 > +#define KVM_ARM_VCPU_SPE_V1_IRQ0 > 
+#define KVM_ARM_VCPU_SPE_V1_INIT 1 > > /* KVM_IRQ_LINE irq field index values */ > #define KVM_ARM_IRQ_TYPE_SHIFT 24 > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > index 3ac1a64d2fb9..1ba6154dd8e1 100644 > --- a/arch/arm64/kvm/Makefile > +++ b/arch/arm64/kvm/Makefile > @@ -35,3 +35,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o > kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o > +kvm-$(CONFIG_KVM_ARM_SPE) += $(KVM)/arm/spe.o > diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c > index
mm/compaction: BUG: NULL pointer dereference
Hi,

We are hitting NULL pointer dereferences while running stress tests with KVM. See splat [0]. The test is to spawn 100 VMs all doing a standard debian installation (thanks to Marc's automated scripts, available here [1]). The problem has been reproduced with a better rate of success from 5.1-rc6 onwards.

The issue is only reproducible with swapping enabled, when the entire memory is used up and the system is swapping heavily. Also, this issue is only reproducible on one server with 128GB, which has the following memory layout:

[32GB@4GB, hole, 96GB@544GB]

Here is my non-expert analysis of the issue so far.

Under extreme memory pressure, kswapd could trigger reset_isolation_suitable() to figure out the cached values for the migrate/free pfn for a zone, by scanning through the entire zone. On our server it does so in the range of [ 0x10_, 0xa00_ ], with the following area of holes: [ 0x20_, 0x880_ ]. In the failing case, we end up setting the cached migrate pfn as 0x508_, which is right in the center of the zone pfn range, i.e. ( 0x10_ + 0xa00_ ) / 2, with reset_migrate = 0x88_4e00, reset_free = 0x10_.

Now these cached values are used by fast_isolate_freepages() to find a pfn. However, since we can't find anything during the search, we fall back to using the page belonging to the min_pfn (which is the migrate_pfn), without proper checks to see whether that is a valid pfn or not. This is then passed on to fast_isolate_around(), which tries to do set_pageblock_skip(page) on that page, which blows up due to a NULL mem_section pointer.

The following patch seems to fix the issue for me, but I am not quite convinced that it is the right fix. Thoughts?
diff --git a/mm/compaction.c b/mm/compaction.c
index 9febc8c..9e1b9ac 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_control *cc)
 			page = pfn_to_page(highest);
 			cc->free_pfn = highest;
 		} else {
-			if (cc->direct_compaction) {
+			if (cc->direct_compaction && pfn_valid(min_pfn)) {
 				page = pfn_to_page(min_pfn);
 				cc->free_pfn = min_pfn;
 			}

Suzuki

[ 0 ] Kernel splat

Unable to handle kernel NULL pointer dereference at virtual address 0008 [47/1825]
Mem abort info:
  ESR = 0x9604
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x0004
  CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 82f94ae9
[0008] pgd=
Internal error: Oops: 9604 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate: 6005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
sp : 1fc03010
x29: 1fc03010 x28: x27:
x26: 10bf7000 x25: 06445000 x24: 06444e00
x23: 7e018f138000 x22: 0003 x21: 0001
x20: 06444e00 x19: 0001 x18:
x17: x16: 809f7fe97268 x15: 000191138000
x14: x13: 0070 x12:
x11: 1fc03108 x10: x9 : 09222400
x8 : 0187 x7 : 063c4e00 x6 : 06444e00
x5 : 0008 x4 : 0001 x3 : 0003
x2 : 809f7fe92840 x1 : 0220 x0 :
Process qemu-system-aar (pid: 6080, stack limit = 0x95070da5)
Call trace:
 set_pfnblock_flags_mask+0x58/0xe8
 compaction_alloc+0x300/0x950
 migrate_pages+0x1a4/0xbb0
 compact_zone+0x750/0xde8
 compact_zone_order+0xd8/0x118
 try_to_compact_pages+0xb4/0x290
 __alloc_pages_direct_compact+0x84/0x1e0
 __alloc_pages_nodemask+0x5e0/0xe18
 alloc_pages_vma+0x1cc/0x210
 do_huge_pmd_anonymous_page+0x108/0x7c8
 __handle_mm_fault+0xdd4/0x1190
 handle_mm_fault+0x114/0x1c0
 __get_user_pages+0x198/0x3c0
 get_user_pages_unlocked+0xb4/0x1d8
 __gfn_to_pfn_memslot+0x12c/0x3b8
 gfn_to_pfn_prot+0x4c/0x60
 kvm_handle_guest_abort+0x4b0/0xcd8
 handle_exit+0x140/0x1b8
 kvm_arch_vcpu_ioctl_run+0x260/0x768
 kvm_vcpu_ioctl+0x490/0x898
 do_vfs_ioctl+0xc4/0x898
 ksys_ioctl+0x8c/0xa0
 __arm64_sys_ioctl+0x28/0x38
 el0_svc_common+0x74/0x118
 el0_svc_handler+0x38/0x78
 el0_svc+0x8/0xc
Code: f8607840 f11f 8b011401 9a801020 (f9400400)
---[ end trace af6a35219325a9b6 ]---

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/vminstall.git/