Re: [PATCH] arm64: KVM: reduce guest fpsimd trap
On Wed, May 16, 2018 at 20:46 GMT+8, Christoffer Dall wrote:
> On Wed, May 16, 2018 at 11:32:17AM +0100, Dave Martin wrote:
> > On Wed, May 16, 2018 at 10:25:40AM +0100, Marc Zyngier wrote:
> > > [+Dave]
> > >
> > > Hi Nianyao,
> > >
> > > On 16/05/18 10:08, Tangnianyao (ICT) wrote:
> > > > Add last_fpsimd_trap to notify if guest last exit reason is handling
> > > > fpsimd. If guest is using fpsimd frequently, save host's fpsimd state
> > > > and restore guest's fpsimd state and deactivate fpsimd trap before
> > > > returning to guest. It can avoid guest fpsimd trap soon to improve
> > > > performance.
> >
> > So, the purpose of this patch is to context switch the FPSIMD state on
> > initial entry to the guest, instead of enabling the trap and context
> > switching the FPSIMD state lazily?
>
> And you decide whether to do this or not, based on whether the guest
> triggered a lazy FPSIMD switch previously?

I try to detect whether the guest is using fpsimd, without adding tuning logic that would bring more complexity and may not suit common cases. Therefore I simply check whether the guest triggered a lazy fpsimd switch on its last exit.
> > > > Signed-off-by: Nianyao Tang
> > > > ---
> > > >  arch/arm64/kernel/asm-offsets.c |  1 +
> > > >  arch/arm64/kvm/hyp/entry.S      |  5 +
> > > >  arch/arm64/kvm/hyp/switch.c     | 38 --
> > > >  include/linux/kvm_host.h        |  1 +
> > > >  4 files changed, 43 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> > > > index 5bdda65..35a9c5c 100644
> > > > --- a/arch/arm64/kernel/asm-offsets.c
> > > > +++ b/arch/arm64/kernel/asm-offsets.c
> > > > @@ -136,6 +136,7 @@ int main(void)
> > > >  #ifdef CONFIG_KVM_ARM_HOST
> > > >    DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
> > > >    DEFINE(VCPU_FAULT_DISR, offsetof(struct kvm_vcpu, arch.fault.disr_el1));
> > > > +  DEFINE(VCPU_LAST_FPSIMD_TRAP, offsetof(struct kvm_vcpu,
> > > > +    last_fpsimd_trap));
> > > >    DEFINE(CPU_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs));
> > > >    DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
> > > >    DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs));
> > > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > > > index e41a161..956e042 100644
> > > > --- a/arch/arm64/kvm/hyp/entry.S
> > > > +++ b/arch/arm64/kvm/hyp/entry.S
> > > > @@ -197,6 +197,11 @@ alternative_endif
> > > >  	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > >  	bl	__fpsimd_restore_state
> > > >
> > > > +	// Mark guest using fpsimd now
> > > > +	ldr	x0, [x3, #VCPU_LAST_FPSIMD_TRAP]
> > > > +	add	x0, x0, #1
>
> Can this overflow?

It won't overflow. It only triggers one fpsimd trap, restores the guest's fpsimd state and deactivates the fpsimd trap. When the next non-fpsimd trap comes, it will be cleared unconditionally.
> > > > +	str	x0, [x3, #VCPU_LAST_FPSIMD_TRAP]
> > > > +
> > > >  	// Skip restoring fpexc32 for AArch64 guests
> > > >  	mrs	x1, hcr_el2
> > > >  	tbnz	x1, #HCR_RW_SHIFT, 1f
> > > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > > index d964523..86eea1b 100644
> > > > --- a/arch/arm64/kvm/hyp/switch.c
> > > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > > @@ -92,7 +92,13 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> > > >
> > > >  	val = read_sysreg(cpacr_el1);
> > > >  	val |= CPACR_EL1_TTA;
> > > > -	val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> > > > +	val &= ~CPACR_EL1_ZEN;
> > > > +
> > > > +	if (vcpu->last_fpsimd_trap)
> > > > +		val |= CPACR_EL1_FPEN;
> > > > +	else
> > > > +		val &= ~CPACR_EL1_FPEN;
> > > > +
> > > >  	write_sysreg(val, cpacr_el1);
> > > >
> > > >  	write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> > > > @@ -105,7 +111,13 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> > > >  	__activate_traps_common(vcpu);
> > > >
> > > >  	val = CPTR_EL2_DEFAULT;
> > > > -	val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
> > > > +	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> > > > +
> > > > +	if (vcpu->last_fpsimd_trap)
> > > > +		val &= ~CPTR_EL2_TFP;
> > > > +	else
> > > > +		val |= CPTR_EL2_TFP;
> > > > +
> > > >  	write_sysreg(val, cptr_el2);
> > > >  }
> > > >
> > > > @@ -406,6 +418,17 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > > >  	__activate_traps(vcpu);
> > > >  	__activate_vm(vcpu->kvm);
> > > >
> > > > +	/*
> > > > +	 * If guest last trap to host for handling fpsimd, last_fpsimd_trap
> > > > +	 * is set. Restore guest's fpsimd state and deactivate fpsimd trap
> > > > +	 * to
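The decision the patch encodes can be modeled in plain C. The sketch below is a user-space model of the nVHE path, not the hypervisor code: the `CPTR_EL2_*` values mirror the arm64 definitions as best I can tell, and the vcpu structure is reduced to the one field the patch adds, so treat the constants and names as illustrative assumptions.

```c
#include <assert.h>

/* Illustrative stand-ins for the arm64 CPTR_EL2 bit definitions. */
#define CPTR_EL2_DEFAULT 0x000033ffUL
#define CPTR_EL2_TTA (1UL << 20)   /* trap trace accesses */
#define CPTR_EL2_TFP (1UL << 10)   /* trap FP/SIMD accesses */
#define CPTR_EL2_TZ  (1UL << 8)    /* trap SVE accesses */

/* Reduced vcpu: only the field the patch introduces. */
struct vcpu_model {
	unsigned long last_fpsimd_trap;
};

/* Mirrors the patched __activate_traps_nvhe(): trap FPSIMD only when
 * the guest did not take an FPSIMD trap on its previous exit. */
unsigned long compute_cptr(const struct vcpu_model *vcpu)
{
	unsigned long val = CPTR_EL2_DEFAULT;

	val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
	if (vcpu->last_fpsimd_trap)
		val &= ~CPTR_EL2_TFP;  /* guest used FPSIMD: don't trap again */
	else
		val |= CPTR_EL2_TFP;   /* lazy switching: trap first FPSIMD use */
	return val;
}
```

This makes Christoffer's summary concrete: the heuristic is a one-exit memory, flipping a single trap bit based on what the previous exit was.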
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
On Fri, May 18, 2018 at 11:13 AM Marc Zyngier wrote:
> What I'd really like is to apply that patch knowing that:
> - you have checked that with a released version of the compiler, you
> don't observe any absolute address in any of the objects that are going
> to be executed at EL2 on a mainline kernel,

To verify, we should disassemble objects from arch/arm64/kvm/hyp/*.o and make sure we don't see absolute addresses? I can work with Sami to get a sense of what the before and after of this patch looks like in disassembly, then verify those changes are pervasive.

> - you have successfully run guests with a mainline kernel,

I believe Andrey has already done this. If he can verify (maybe during working hours next week), then maybe we can add his Tested-by to this patch's commit message.

> - it works for a reasonable set of common kernel configurations
> (defconfig and some of the most useful debug options),

It's easy for us to test our kernel configs for Android, ChromeOS, and defconfig. I'd be curious to know the shortlist of "most useful debug options" just to be a better kernel developer, personally.

> - I can reproduce your findings with the same released compiler.

Let's wait for Andrey to confirm his test setup. On the Android side, I think you should be able to get by with a released version, but I'd be curious to hear from Andrey.

> Is that the case? I don't think any of the above is completely outlandish.

These are all reasonable. Thanks for the feedback.

-- 
Thanks,
~Nick Desaulniers
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
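One way to mechanize the check Nick describes might look like the sketch below. It is an assumption on my part that grepping `objdump -r` output for `R_AARCH64_ABS*` relocations is a sufficient proxy for "absolute addresses in the object" (clang's jump tables emit such relocations for the table entries); the helper takes a pre-captured relocation dump so the function itself can run without a cross toolchain.

```shell
# has_abs_relocs FILE
# FILE holds the output of `objdump -r` for one object; succeed if any
# absolute AArch64 data relocation appears in it.
has_abs_relocs() {
	grep -Eq 'R_AARCH64_ABS(16|32|64)' "$1"
}

# Intended use (assumes an aarch64 objdump on PATH and a built tree):
#   for o in arch/arm64/kvm/hyp/*.o; do
#       aarch64-linux-gnu-objdump -r "$o" > /tmp/relocs.txt
#       has_abs_relocs /tmp/relocs.txt && echo "absolute relocs in $o"
#   done
```

This only catches relocated absolute addresses, not addresses materialized some other way, so it complements rather than replaces eyeballing the disassembly.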
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
On 18/05/18 18:56, Nick Desaulniers wrote:
> + Andrey
>
> On Fri, May 18, 2018 at 10:45 AM Marc Zyngier wrote:
>> On 18/05/18 18:40, Nick Desaulniers wrote:
>>> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier wrote:
>>>> I'm going to ask the question I've asked before when this patch cropped
>>>> up (must be the 4th time now):
>
> The previous threads for context:
> https://patchwork.kernel.org/patch/10060381/
> https://lkml.org/lkml/2018/3/16/434
>
>>>> Is it guaranteed that this is the only case where LLVM/clang is going
>>>> to generate absolute addresses instead of using relative addressing?
>>>
>>> It seems like if there's requirements that only relative addressing be
>>> used, then the compiler should be told explicitly about this restriction,
>>> no?
>
>> Certainly. What's the rune?
>
> It seems like -fno-jump-tables covers all known issues and unblocks people
> from doing further work. It sounds like you'd like some kind of stronger
> guarantee? Won't those cases still "crop up" as far as needing to annotate
> either the code, or build scripts?

What I'd really like is to apply that patch knowing that:
- you have checked that with a released version of the compiler, you
don't observe any absolute address in any of the objects that are going
to be executed at EL2 on a mainline kernel,
- you have successfully run guests with a mainline kernel,
- it works for a reasonable set of common kernel configurations
(defconfig and some of the most useful debug options),
- I can reproduce your findings with the same released compiler.

Is that the case? I don't think any of the above is completely outlandish.

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
+ Andrey

On Fri, May 18, 2018 at 10:45 AM Marc Zyngier wrote:
> On 18/05/18 18:40, Nick Desaulniers wrote:
> > On Fri, May 18, 2018 at 10:30 AM Marc Zyngier wrote:
> >> I'm going to ask the question I've asked before when this patch cropped
> >> up (must be the 4th time now):

The previous threads for context:
https://patchwork.kernel.org/patch/10060381/
https://lkml.org/lkml/2018/3/16/434

> >> Is it guaranteed that this is the only case where LLVM/clang is going to
> >> generate absolute addresses instead of using relative addressing?
> >
> > It seems like if there's requirements that only relative addressing be
> > used, then the compiler should be told explicitly about this restriction,
> > no?
>
> Certainly. What's the rune?

It seems like -fno-jump-tables covers all known issues and unblocks people from doing further work. It sounds like you'd like some kind of stronger guarantee? Won't those cases still "crop up" as far as needing to annotate either the code, or build scripts?

-- 
Thanks,
~Nick Desaulniers
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
On 18/05/18 18:40, Nick Desaulniers wrote:
> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier wrote:
>> I'm going to ask the question I've asked before when this patch cropped
>> up (must be the 4th time now):
>
>> Is it guaranteed that this is the only case where LLVM/clang is going to
>> generate absolute addresses instead of using relative addressing?
>
> It seems like if there's requirements that only relative addressing be
> used, then the compiler should be told explicitly about this restriction,
> no?

Certainly. What's the rune?

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
+ Andrey (who reported testing this patch in https://github.com/ClangBuiltLinux/linux/issues/11)

On Fri, May 18, 2018 at 10:40 AM Nick Desaulniers wrote:
> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier wrote:
> > I'm going to ask the question I've asked before when this patch cropped
> > up (must be the 4th time now):
> > Is it guaranteed that this is the only case where LLVM/clang is going to
> > generate absolute addresses instead of using relative addressing?
> It seems like if there's requirements that only relative addressing be
> used, then the compiler should be told explicitly about this restriction,
> no?
> --
> Thanks,
> ~Nick Desaulniers

-- 
Thanks,
~Nick Desaulniers
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
On Fri, May 18, 2018 at 10:30 AM Marc Zyngier wrote:
> I'm going to ask the question I've asked before when this patch cropped
> up (must be the 4th time now):
> Is it guaranteed that this is the only case where LLVM/clang is going to
> generate absolute addresses instead of using relative addressing?

It seems like if there's requirements that only relative addressing be used, then the compiler should be told explicitly about this restriction, no?

-- 
Thanks,
~Nick Desaulniers
[PATCH] arm64: kvm: use -fno-jump-tables with clang
Starting with LLVM r308050, clang generates a jump table with EL1
virtual addresses in __init_stage2_translation, which results in a
kernel panic when booting at EL2:

  Kernel panic - not syncing: HYP panic:
  PS:83c9 PC:089e6fd8 ESR:8604
  FAR:089e6fd8 HPFAR:09825000 PAR:
  VCPU:000804fc20001221

  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc7-dirty #3
  Hardware name: ARM Juno development board (r1) (DT)
  Call trace:
  [] dump_backtrace+0x0/0x34c
  [] show_stack+0x18/0x20
  [] dump_stack+0xc4/0xfc
  [] panic+0x138/0x2b4
  [] panic+0x0/0x2b4
  SMP: stopping secondary CPUs
  SMP: failed to stop secondary CPUs 0-3,5
  Kernel Offset: disabled
  CPU features: 0x002086
  Memory Limit: none
  ---[ end Kernel panic - not syncing: HYP panic:
  PS:83c9 PC:089e6fd8 ESR:8604
  FAR:089e6fd8 HPFAR:09825000 PAR:
  VCPU:000804fc20001221

This change adds -fno-jump-tables to arm64/hyp to work around the
bug.

Suggested-by: AKASHI Takahiro
Signed-off-by: Sami Tolvanen
---
 arch/arm64/kvm/hyp/Makefile | 4
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 4313f74753332..745b3255e7832 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -5,6 +5,10 @@

 ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING

+ifeq ($(cc-name),clang)
+ccflags-y += -fno-jump-tables
+endif
+
 KVM=../../../../virt/kvm

 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
--
2.17.0.441.gb46fe60e1d-goog
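For context on what the flag changes: a dense switch like the one below is the classic pattern a compiler may lower to a jump table, i.e. an array of code addresses indexed by the switch operand. This is an illustrative stand-in, not the actual __init_stage2_translation code; whether a table is emitted at all is a compiler decision, which is why the fix is a flag rather than a change to the C source.

```c
#include <assert.h>

/* A dense, contiguous switch is a prime jump-table candidate. If the
 * table entries are absolute EL1 virtual addresses, indirect-branching
 * through them from code running at EL2 faults -- the HYP panic above.
 * -fno-jump-tables forces compare-and-branch lowering instead; the C
 * semantics are identical either way. */
int stage_flags(int stage)
{
	switch (stage) {
	case 0: return 0x01;
	case 1: return 0x02;
	case 2: return 0x04;
	case 3: return 0x08;
	case 4: return 0x10;
	case 5: return 0x20;
	default: return 0;
	}
}
```

The behavioral equivalence is the point: disabling jump tables trades a little code density for position-independent control flow, which is what hyp code requires.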
Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang
On 18/05/18 18:02, Sami Tolvanen wrote:
> Starting with LLVM r308050, clang generates a jump table with EL1
> virtual addresses in __init_stage2_translation, which results in a
> kernel panic when booting at EL2:
>
> Kernel panic - not syncing: HYP panic:
> PS:83c9 PC:089e6fd8 ESR:8604
> FAR:089e6fd8 HPFAR:09825000 PAR:
> VCPU:000804fc20001221
>
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc7-dirty #3
> Hardware name: ARM Juno development board (r1) (DT)
> Call trace:
> [] dump_backtrace+0x0/0x34c
> [] show_stack+0x18/0x20
> [] dump_stack+0xc4/0xfc
> [] panic+0x138/0x2b4
> [] panic+0x0/0x2b4
> SMP: stopping secondary CPUs
> SMP: failed to stop secondary CPUs 0-3,5
> Kernel Offset: disabled
> CPU features: 0x002086
> Memory Limit: none
> ---[ end Kernel panic - not syncing: HYP panic:
> PS:83c9 PC:089e6fd8 ESR:8604
> FAR:089e6fd8 HPFAR:09825000 PAR:
> VCPU:000804fc20001221
>
> This change adds -fno-jump-tables to arm64/hyp to work around the
> bug.
>
> Suggested-by: AKASHI Takahiro
> Signed-off-by: Sami Tolvanen
> ---
>  arch/arm64/kvm/hyp/Makefile | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 4313f74753332..745b3255e7832 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,6 +5,10 @@
>
>  ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
>
> +ifeq ($(cc-name),clang)
> +ccflags-y += -fno-jump-tables
> +endif
> +
>  KVM=../../../../virt/kvm
>
>  obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o

I'm going to ask the question I've asked before when this patch cropped up (must be the 4th time now):

Is it guaranteed that this is the only case where LLVM/clang is going to generate absolute addresses instead of using relative addressing?

So far, nobody has answered that question. If you assure me that this is the case, I'll take that patch. Otherwise, we're just playing whack-a-mole, as with the profiling stuff.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned
On 5/18/2018 5:48 PM, Marc Zyngier wrote:
> On 18/05/18 10:27, Jia He wrote:
>> If it passes addr=0x20292,size=0xfe00 to unmap_stage2_range->
>> ...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
>> end=0x20292fe00. After first while loop addr=0x20293, end=0x20292fe00,
>> then addr!=end. Thus it will touch another pages by put_pages() in the
>> 2nd loop.
>>
>> This patch fixes it by hardening the break condition of while loop.
>>
>> Signed-off-by: jia...@hxt-semitech.com
>> ---
>> v2: newly added
>>
>>  virt/kvm/arm/mmu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 8dac311..45cd040 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>>
>>  		put_page(virt_to_page(pte));
>>  	}
>> -	} while (pte++, addr += PAGE_SIZE, addr != end);
>> +	} while (pte++, addr += PAGE_SIZE, addr < end);
>>
>>  	if (stage2_pte_table_empty(start_pte))
>>  		clear_stage2_pmd_entry(kvm, pmd, start_addr);
>
> I don't think this change is the right thing to do. You get that failure
> because you're being passed a size that is not a multiple of PAGE_SIZE.
> That's the mistake.
>
> You should ensure that this never happens, rather than changing the page
> table walkers (which are consistent with the way this kind of code is
> written in other places of the kernel). As you mentioned in your first
> patch, the real issue is that KSM is broken, and this is what should be
> fixed.

Got it, thanks. Should I resend patch 1/2 without any changes after dropping patch 2/2?

-- 
Cheers,
Jia
Re: [PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned
On 18/05/18 10:27, Jia He wrote:
> If it passes addr=0x20292,size=0xfe00 to unmap_stage2_range->
> ...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
> end=0x20292fe00. After first while loop addr=0x20293, end=0x20292fe00,
> then addr!=end. Thus it will touch another pages by put_pages() in the
> 2nd loop.
>
> This patch fixes it by hardening the break condition of while loop.
>
> Signed-off-by: jia...@hxt-semitech.com
> ---
> v2: newly added
>
>  virt/kvm/arm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 8dac311..45cd040 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>
>  		put_page(virt_to_page(pte));
>  	}
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> +	} while (pte++, addr += PAGE_SIZE, addr < end);
>
>  	if (stage2_pte_table_empty(start_pte))
>  		clear_stage2_pmd_entry(kvm, pmd, start_addr);

I don't think this change is the right thing to do. You get that failure because you're being passed a size that is not a multiple of PAGE_SIZE. That's the mistake.

You should ensure that this never happens, rather than changing the page table walkers (which are consistent with the way this kind of code is written in other places of the kernel). As you mentioned in your first patch, the real issue is that KSM is broken, and this is what should be fixed.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
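The failure mode Jia describes can be seen in a small user-space model. The loop below mimics the do/while shape of unmap_stage2_ptes with the page-table details stripped away (PAGE_SIZE and the iteration cap are this model's own assumptions): with an end that is not PAGE_SIZE aligned, the `addr != end` test is stepped over and never fires, which is the overrun, while `addr < end` terminates after the same number of pages.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000UL

/* Count the pages a do/while walker visits, with a selectable
 * termination test. A cap keeps the model finite when the `!=` test
 * is skipped over by an unaligned end. */
unsigned long pages_visited(uint64_t addr, uint64_t end, int use_less_than)
{
	unsigned long visited = 0, cap = 32;

	do {
		visited++;             /* stands in for the per-pte work */
		addr += PAGE_SIZE;
	} while (visited < cap &&
		 (use_less_than ? addr < end : addr != end));
	return visited;
}
```

With an aligned end both tests visit the same pages; with an unaligned end the `!=` variant runs to the cap, which in the real walker means put_page() on pages past the requested range. This also shows why Marc prefers fixing the caller: the `!=` idiom is fine as long as the aligned-size invariant holds.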
[PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned
If it passes addr=0x20292,size=0xfe00 to unmap_stage2_range->
...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
end=0x20292fe00. After the first while loop addr=0x20293, end=0x20292fe00,
then addr != end. Thus it will touch other pages via put_page() in the
2nd loop.

This patch fixes it by hardening the break condition of the while loop.

Signed-off-by: jia...@hxt-semitech.com
---
v2: newly added

 virt/kvm/arm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 8dac311..45cd040 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,

 		put_page(virt_to_page(pte));
 	}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+	} while (pte++, addr += PAGE_SIZE, addr < end);

 	if (stage2_pte_table_empty(start_pte))
 		clear_stage2_pmd_entry(kvm, pmd, start_addr);
--
1.8.3.1
[PATCH v2 1/2] KVM: arm/arm64: add WARN_ON if size is not PAGE_SIZE aligned in unmap_stage2_range
There is a panic in armv8a server(QDF2400) under memory pressure tests (start 20 guests and run memhog in the host). -begin [35380.800950] BUG: Bad page state in process qemu-kvm pfn:dd0b6 [35380.805825] page:7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0 [35380.815024] flags: 0x1fffc000() [35380.818845] raw: 1fffc000 ecf98147 [35380.826569] raw: dead0100 dead0200 8017c001c000 [35380.805825] page:7fe003742d80 count:-4871 mapcount:-2126053375 mapping: (null) index:0x0 [35380.815024] flags: 0x1fffc000() [35380.818845] raw: 1fffc000 ecf98147 [35380.826569] raw: dead0100 dead0200 8017c001c000 [35380.834294] page dumped because: nonzero _refcount [35380.839069] Modules linked in: vhost_net vhost tap ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter fcoe libfcoe libfc 8021q garp mrp stp llc scsi_transport_fc openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack vfat fat rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx5_ib ib_core crc32_ce ipmi_ssif tpm_tis tpm_tis_core sg nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs libcrc32c mlx5_core mlxfw devlink ahci_platform libahci_platform libahci qcom_emac sdhci_acpi sdhci hdma mmc_core hdma_mgmt i2c_qup dm_mirror dm_region_hash dm_log dm_mod [35380.908341] CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W 4.14.15-5.hxt.aarch64 #1 [35380.917107] Hardware name: [35380.930909] Call trace: [35380.933345] [] dump_backtrace+0x0/0x22c [35380.938723] [] show_stack+0x24/0x2c [35380.943759] [] dump_stack+0x8c/0xb0 [35380.948794] [] bad_page+0xf4/0x154 [35380.953740] [] free_pages_check_bad+0x90/0x9c [35380.959642] [] free_pcppages_bulk+0x464/0x518 [35380.965545] [] free_hot_cold_page+0x22c/0x300 [35380.971448] [] __put_page+0x54/0x60 [35380.976484] [] 
unmap_stage2_range+0x170/0x2b4 [35380.982385] [] kvm_unmap_hva_handler+0x30/0x40 [35380.988375] [] handle_hva_to_gpa+0xb0/0xec [35380.994016] [] kvm_unmap_hva_range+0x5c/0xd0 [35380.999833] [] kvm_mmu_notifier_invalidate_range_start+0x60/0xb0 [35381.007387] [] __mmu_notifier_invalidate_range_start+0x64/0x8c [35381.014765] [] try_to_unmap_one+0x78c/0x7a4 [35381.020493] [] rmap_walk_ksm+0x124/0x1a0 [35381.025961] [] rmap_walk+0x94/0x98 [35381.030909] [] try_to_unmap+0x100/0x124 [35381.036293] [] unmap_and_move+0x480/0x6fc [35381.041847] [] migrate_pages+0x10c/0x288 [35381.047318] [] compact_zone+0x238/0x954 [35381.052697] [] compact_zone_order+0xc4/0xe8 [35381.058427] [] try_to_compact_pages+0x160/0x294 [35381.064503] [] __alloc_pages_direct_compact+0x68/0x194 [35381.071187] [] __alloc_pages_nodemask+0xc20/0xf7c [35381.077437] [] alloc_pages_vma+0x1a4/0x1c0 [35381.083080] [] do_huge_pmd_anonymous_page+0x128/0x324 [35381.089677] [] __handle_mm_fault+0x71c/0x7e8 [35381.095492] [] handle_mm_fault+0xf8/0x194 [35381.101049] [] __get_user_pages+0x124/0x34c [35381.106777] [] populate_vma_page_range+0x90/0x9c [35381.112941] [] __mm_populate+0xc4/0x15c [35381.118322] [] SyS_mlockall+0x100/0x164 [35381.123705] Exception stack(0x800dce5f3ec0 to 0x800dce5f4000) [35381.130128] 3ec0: 0003 d6e6024cc9b87e00 be94f000 [35381.137940] 3ee0: 0002 cf6fc3c0 [35381.145753] 3f00: 00e6 cf6fc490 eeeab0f0 d6e6024cc9b87e00 [35381.153565] 3f20: be81b3c0 0020 9e53eff806b5 [35381.161379] 3f40: be94de48 a7c269b0 0011 eeeabf68 [35381.169190] 3f60: ceacfe60 be94f000 be9ba358 be7ffb80 [35381.177003] 3f80: be9ba000 be959f64 be94f000 [35381.184815] 3fa0: eeeabdb0 be5f3bf8 eeeabdb0 [35381.192628] 3fc0: a7c269b8 6000 0003 00e6 [35381.200440] 3fe0: [35381.208254] [] __sys_trace_return+0x0/0x4 [35381.213809] Disabling lock debugging due to kernel taint end-- The root cause might be what I fixed at [1]. But from arm kvm points of view, it would be better we caught the exception earlier and clearer. 
If the size is not PAGE_SIZE aligned, unmap_stage2_range might unmap the wrong (more or fewer) page range. Hence it caused the "BUG: Bad page state".

[1] https://lkml.org/lkml/2018/5/3/1042

Signed-off-by: jia...@hxt-semitech.com
Reviewed-by: Suzuki
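The guard this patch argues for can be sketched in plain C. The helper below is a user-space model with hypothetical names; the real patch presumably does something like `WARN_ON(size & ~PAGE_MASK)` at the top of unmap_stage2_range, catching the broken caller (here, KSM) at the boundary instead of loosening the walker.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical guard: a size handed to the stage-2 unmapper must be a
 * whole number of pages, otherwise the walkers' `addr != end` tests
 * are stepped over (see the "Bad page state" report above). */
int size_is_page_aligned(uint64_t size)
{
	return (size & ~PAGE_MASK) == 0;
}
```

Note that the offending call in the report passes size=0xfe00, which this check rejects, so a WARN_ON here would have pointed straight at the KSM path rather than at a corrupted page much later.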