Re: [PATCH] arm64: KVM: reduce guest fpsimd trap

2018-05-18 Thread Tangnianyao (ICT)
On Wed, May 16, 2018 at 20:46 GMT+8, Christoffer Dall wrote:
> On Wed, May 16, 2018 at 11:32:17AM +0100, Dave Martin wrote:
> > On Wed, May 16, 2018 at 10:25:40AM +0100, Marc Zyngier wrote:
> > > [+Dave]
> > > 
> > > Hi Nianyao,
> > > 
> > > On 16/05/18 10:08, Tangnianyao (ICT) wrote:
> > > > Add last_fpsimd_trap to record whether the guest's last exit reason
> > > > was handling fpsimd. If the guest is using fpsimd frequently, save the
> > > > host's fpsimd state, restore the guest's fpsimd state, and deactivate
> > > > the fpsimd trap before returning to the guest. This avoids the guest
> > > > trapping on fpsimd again soon, improving performance.
> > 
> > So, the purpose of this patch is to context switch the FPSIMD state on 
> > initial entry to the guest, instead of enabling the trap and context 
> > switching the FPSIMD state lazily?
> > 
> > And you decide whether to do this or not, based on whether the guest
> > triggered a lazy FPSIMD switch previously?

I try to detect the guest using fpsimd, but without over-training, which
would add more complexity and not suit common cases. Therefore I just
decide based on whether the guest triggered a lazy fpsimd switch last time.

> > 
> > > > 
> > > > Signed-off-by: Nianyao Tang 
> > > > ---
> > > >  arch/arm64/kernel/asm-offsets.c |  1 +
> > > >  arch/arm64/kvm/hyp/entry.S      |  5 +++++
> > > >  arch/arm64/kvm/hyp/switch.c     | 38 ++++++++++++++++++++++++++++++++++++--
> > > >  include/linux/kvm_host.h        |  1 +
> > > >  4 files changed, 43 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> > > > index 5bdda65..35a9c5c 100644
> > > > --- a/arch/arm64/kernel/asm-offsets.c
> > > > +++ b/arch/arm64/kernel/asm-offsets.c
> > > > @@ -136,6 +136,7 @@ int main(void)
> > > >  #ifdef CONFIG_KVM_ARM_HOST
> > > >    DEFINE(VCPU_CONTEXT, offsetof(struct kvm_vcpu, arch.ctxt));
> > > >    DEFINE(VCPU_FAULT_DISR,  offsetof(struct kvm_vcpu, arch.fault.disr_el1));
> > > > +  DEFINE(VCPU_LAST_FPSIMD_TRAP, offsetof(struct kvm_vcpu, last_fpsimd_trap));
> > > >    DEFINE(CPU_GP_REGS,  offsetof(struct kvm_cpu_context, gp_regs));
> > > >    DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs));
> > > >    DEFINE(CPU_FP_REGS,  offsetof(struct kvm_regs, fp_regs));
> > > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > > > index e41a161..956e042 100644
> > > > --- a/arch/arm64/kvm/hyp/entry.S
> > > > +++ b/arch/arm64/kvm/hyp/entry.S
> > > > @@ -197,6 +197,11 @@ alternative_endif
> > > > add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > > bl  __fpsimd_restore_state
> > > >  
> > > > +   // Mark guest using fpsimd now
> > > > +   ldr x0, [x3, #VCPU_LAST_FPSIMD_TRAP]
> > > > +   add x0, x0, #1
> > 
> > Can this overflow?

It won't overflow. It only triggers one fpsimd trap, restores the guest's
fpsimd state, and deactivates the fpsimd trap. When the next non-fpsimd
trap comes, the counter is cleared unconditionally.
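
Roughly, in C, that behaviour looks like this (a sketch with made-up names
and a stand-in struct, not the patch's actual code):

#include <stdbool.h>

/* Minimal stand-in for the vcpu field this patch adds. */
struct kvm_vcpu_sketch {
	unsigned int last_fpsimd_trap;
};

static void update_fpsimd_trap_state(struct kvm_vcpu_sketch *vcpu,
				     bool was_fpsimd_trap)
{
	if (was_fpsimd_trap)
		vcpu->last_fpsimd_trap++;	/* bumped at most once, then the trap is disabled */
	else
		vcpu->last_fpsimd_trap = 0;	/* cleared unconditionally on any other trap */
}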

> > 
> > > > +   str x0, [x3, #VCPU_LAST_FPSIMD_TRAP]
> > > > +
> > > > // Skip restoring fpexc32 for AArch64 guests
> > > > mrs x1, hcr_el2
> > > > tbnzx1, #HCR_RW_SHIFT, 1f
> > > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > > index d964523..86eea1b 100644
> > > > --- a/arch/arm64/kvm/hyp/switch.c
> > > > +++ b/arch/arm64/kvm/hyp/switch.c
> > > > @@ -92,7 +92,13 @@ static void activate_traps_vhe(struct kvm_vcpu *vcpu)
> > > >  
> > > > val = read_sysreg(cpacr_el1);
> > > > val |= CPACR_EL1_TTA;
> > > > -   val &= ~(CPACR_EL1_FPEN | CPACR_EL1_ZEN);
> > > > +   val &= ~CPACR_EL1_ZEN;
> > > > +
> > > > +   if (vcpu->last_fpsimd_trap)
> > > > +   val |= CPACR_EL1_FPEN;
> > > > +   else
> > > > +   val &= ~CPACR_EL1_FPEN;
> > > > +
> > > > write_sysreg(val, cpacr_el1);
> > > >  
> > > > write_sysreg(kvm_get_hyp_vector(), vbar_el1);
> > > > @@ -105,7 +111,13 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu)
> > > > __activate_traps_common(vcpu);
> > > >  
> > > > val = CPTR_EL2_DEFAULT;
> > > > -   val |= CPTR_EL2_TTA | CPTR_EL2_TFP | CPTR_EL2_TZ;
> > > > +   val |= CPTR_EL2_TTA | CPTR_EL2_TZ;
> > > > +
> > > > +   if (vcpu->last_fpsimd_trap)
> > > > +   val &= ~CPTR_EL2_TFP;
> > > > +   else
> > > > +   val |= CPTR_EL2_TFP;
> > > > +
> > > > write_sysreg(val, cptr_el2);
> > > >  }
> > > >  
> > > > @@ -406,6 +418,17 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
> > > > __activate_traps(vcpu);
> > > > __activate_vm(vcpu->kvm);
> > > >  
> > > > +   /*
> > > > +    * If the guest's last trap to the host was for handling fpsimd,
> > > > +    * last_fpsimd_trap is set. Restore the guest's fpsimd state and
> > > > +    * deactivate the fpsimd trap to 

Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Nick Desaulniers
On Fri, May 18, 2018 at 11:13 AM Marc Zyngier  wrote:
> What I'd really like is to apply that patch knowing that:

> - you have checked that with a released version of the compiler, you
> don't observe any absolute address in any of the objects that are going
> to be executed at EL2 on a mainline kernel,

To verify, we should disassemble objects from arch/arm64/kvm/hyp/*.o and
make sure we don't see absolute addresses?  I can work with Sami to get a
sense of what the before and after of this patch looks like in disassembly,
then verify those changes are pervasive.

> - you have successfully run guests with a mainline kernel,

I believe Andrey has already done this.  If he can verify (maybe during
working hours next week), then maybe we can add his Tested-by to this
patch's commit message?

> - it works for a reasonable set of common kernel configurations
> (defconfig and some of the most useful debug options),

It's easy for us to test our kernel configs for Android, ChromeOS, and
defconfig.  I'd be curious to know the shortlist of "most useful debug
options" just to be a better kernel developer, personally.

> - I can reproduce your findings with the same released compiler.

Let's wait for Andrey to confirm his test setup.  On the Android side, I
think you should be able to get by with a released version, but I'd be
curious to hear from Andrey.

> Is that the case? I don't think any of the above is completely outlandish.

These are all reasonable. Thanks for the feedback.
-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Marc Zyngier
On 18/05/18 18:56, Nick Desaulniers wrote:
> + Andrey
> 
> On Fri, May 18, 2018 at 10:45 AM Marc Zyngier  wrote:
>> On 18/05/18 18:40, Nick Desaulniers wrote:
>>> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier  wrote:
>>>> I'm going to ask the question I've asked before when this patch cropped
>>>> up (must be the 4th time now):
> 
> The previous threads for context:
> https://patchwork.kernel.org/patch/10060381/
> https://lkml.org/lkml/2018/3/16/434
> 
>>>> Is it guaranteed that this is the only case where LLVM/clang is going to
>>>> generate absolute addresses instead of using relative addressing?
>>>
>>> It seems like if there's a requirement that only relative addressing be
>>> used, then the compiler should be told explicitly about this restriction,
>>> no?
> 
>> Certainly. What's the rune?
> 
> It seems like -fno-jump-tables covers all known issues and unblocks people
> from doing further work.  It sounds like you'd like some kind of stronger
> guarantee? Won't those cases still "crop up", still needing annotation in
> either the code or the build scripts?

What I'd really like is to apply that patch knowing that:

- you have checked that with a released version of the compiler, you
don't observe any absolute address in any of the objects that are going
to be executed at EL2 on a mainline kernel,

- you have successfully run guests with a mainline kernel,

- it works for a reasonable set of common kernel configurations
(defconfig and some of the most useful debug options),

- I can reproduce your findings with the same released compiler.

Is that the case? I don't think any of the above is completely outlandish.

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Nick Desaulniers
+ Andrey

On Fri, May 18, 2018 at 10:45 AM Marc Zyngier  wrote:
> On 18/05/18 18:40, Nick Desaulniers wrote:
> > On Fri, May 18, 2018 at 10:30 AM Marc Zyngier  wrote:
> >> I'm going to ask the question I've asked before when this patch cropped
> >> up (must be the 4th time now):

The previous threads for context:
https://patchwork.kernel.org/patch/10060381/
https://lkml.org/lkml/2018/3/16/434

> >> Is it guaranteed that this is the only case where LLVM/clang is going to
> >> generate absolute addresses instead of using relative addressing?
> >
> > It seems like if there's a requirement that only relative addressing be
> > used, then the compiler should be told explicitly about this restriction,
> > no?

> Certainly. What's the rune?

It seems like -fno-jump-tables covers all known issues and unblocks people
from doing further work.  It sounds like you'd like some kind of stronger
guarantee? Won't those cases still "crop up", still needing annotation in
either the code or the build scripts?
-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Marc Zyngier
On 18/05/18 18:40, Nick Desaulniers wrote:
> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier  wrote:
>> I'm going to ask the question I've asked before when this patch cropped
>> up (must be the 4th time now):
> 
>> Is it guaranteed that this is the only case where LLVM/clang is going to
>> generate absolute addresses instead of using relative addressing?
> 
> It seems like if there's a requirement that only relative addressing be
> used, then the compiler should be told explicitly about this restriction,
> no?

Certainly. What's the rune?

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Nick Desaulniers
+ Andrey (who reported testing this patch in
https://github.com/ClangBuiltLinux/linux/issues/11)
On Fri, May 18, 2018 at 10:40 AM Nick Desaulniers  wrote:

> On Fri, May 18, 2018 at 10:30 AM Marc Zyngier  wrote:
> > I'm going to ask the question I've asked before when this patch cropped
> > up (must be the 4th time now):

> > Is it guaranteed that this is the only case where LLVM/clang is going to
> > generate absolute addresses instead of using relative addressing?

> It seems like if there's a requirement that only relative addressing be
> used, then the compiler should be told explicitly about this restriction,
> no?
> --
> Thanks,
> ~Nick Desaulniers



-- 
Thanks,
~Nick Desaulniers


Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Nick Desaulniers
On Fri, May 18, 2018 at 10:30 AM Marc Zyngier  wrote:
> I'm going to ask the question I've asked before when this patch cropped
> up (must be the 4th time now):

> Is it guaranteed that this is the only case where LLVM/clang is going to
> generate absolute addresses instead of using relative addressing?

It seems like if there's a requirement that only relative addressing be
used, then the compiler should be told explicitly about this restriction,
no?
-- 
Thanks,
~Nick Desaulniers


[PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Sami Tolvanen
Starting with LLVM r308050, clang generates a jump table with EL1
virtual addresses in __init_stage2_translation, which results in a
kernel panic when booting at EL2:

  Kernel panic - not syncing: HYP panic:
  PS:83c9 PC:089e6fd8 ESR:8604
  FAR:089e6fd8 HPFAR:09825000 PAR:
  VCPU:000804fc20001221

  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc7-dirty #3
  Hardware name: ARM Juno development board (r1) (DT)
  Call trace:
  [] dump_backtrace+0x0/0x34c
  [] show_stack+0x18/0x20
  [] dump_stack+0xc4/0xfc
  [] panic+0x138/0x2b4
  [] panic+0x0/0x2b4
  SMP: stopping secondary CPUs
  SMP: failed to stop secondary CPUs 0-3,5
  Kernel Offset: disabled
  CPU features: 0x002086
  Memory Limit: none
  ---[ end Kernel panic - not syncing: HYP panic:
  PS:83c9 PC:089e6fd8 ESR:8604
  FAR:089e6fd8 HPFAR:09825000 PAR:
  VCPU:000804fc20001221

This change adds -fno-jump-tables to arm64/hyp to work around the
bug.
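
For illustration, a hypothetical sketch of the kind of construct involved
(not the actual __init_stage2_translation code; the function and bit
values below are made up):

/*
 * A dense switch like this is a jump-table candidate: clang may emit an
 * indirect branch through a table of absolute (EL1 virtual) addresses,
 * which are unmapped when the code runs at EL2. With -fno-jump-tables,
 * the switch is lowered to PC-relative compare-and-branch sequences,
 * which are position-independent and therefore safe at EL2.
 */
static void stage2_setup_level(int level, unsigned long *cfg)
{
	switch (level) {		/* dense cases encourage a jump table */
	case 0: *cfg |= 1UL << 16; break;
	case 1: *cfg |= 1UL << 14; break;
	case 2: *cfg &= ~(1UL << 14); break;
	case 3: *cfg ^= 1UL << 10; break;
	case 4: *cfg = 0; break;
	}
}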

Suggested-by: AKASHI Takahiro 
Signed-off-by: Sami Tolvanen 
---
 arch/arm64/kvm/hyp/Makefile | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index 4313f74753332..745b3255e7832 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -5,6 +5,10 @@
 
 ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
 
+ifeq ($(cc-name),clang)
+ccflags-y += -fno-jump-tables
+endif
+
 KVM=../../../../virt/kvm
 
 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
-- 
2.17.0.441.gb46fe60e1d-goog



Re: [PATCH] arm64: kvm: use -fno-jump-tables with clang

2018-05-18 Thread Marc Zyngier
On 18/05/18 18:02, Sami Tolvanen wrote:
> Starting with LLVM r308050, clang generates a jump table with EL1
> virtual addresses in __init_stage2_translation, which results in a
> kernel panic when booting at EL2:
> 
>   Kernel panic - not syncing: HYP panic:
>   PS:83c9 PC:089e6fd8 ESR:8604
>   FAR:089e6fd8 HPFAR:09825000 PAR:
>   VCPU:000804fc20001221
> 
>   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc7-dirty #3
>   Hardware name: ARM Juno development board (r1) (DT)
>   Call trace:
>   [] dump_backtrace+0x0/0x34c
>   [] show_stack+0x18/0x20
>   [] dump_stack+0xc4/0xfc
>   [] panic+0x138/0x2b4
>   [] panic+0x0/0x2b4
>   SMP: stopping secondary CPUs
>   SMP: failed to stop secondary CPUs 0-3,5
>   Kernel Offset: disabled
>   CPU features: 0x002086
>   Memory Limit: none
>   ---[ end Kernel panic - not syncing: HYP panic:
>   PS:83c9 PC:089e6fd8 ESR:8604
>   FAR:089e6fd8 HPFAR:09825000 PAR:
>   VCPU:000804fc20001221
> 
> This change adds -fno-jump-tables to arm64/hyp to work around the
> bug.
> 
> Suggested-by: AKASHI Takahiro 
> Signed-off-by: Sami Tolvanen 
> ---
>  arch/arm64/kvm/hyp/Makefile | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index 4313f74753332..745b3255e7832 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -5,6 +5,10 @@
>  
>  ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
>  
> +ifeq ($(cc-name),clang)
> +ccflags-y += -fno-jump-tables
> +endif
> +
>  KVM=../../../../virt/kvm
>  
>  obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o
> 

I'm going to ask the question I've asked before when this patch cropped
up (must be the 4th time now):

Is it guaranteed that this is the only case where LLVM/clang is going to
generate absolute addresses instead of using relative addressing?

So far, nobody has answered that question. If you assure me that this is
the case, I'll take that patch. Otherwise, we're just playing
whack-a-mole, as with the profiling stuff.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned

2018-05-18 Thread Jia He


On 5/18/2018 5:48 PM, Marc Zyngier wrote:
> On 18/05/18 10:27, Jia He wrote:
>> If addr=0x20292,size=0xfe00 is passed to unmap_stage2_range->
>> ...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
>> end=0x20292fe00. After the first loop iteration addr=0x20293, which has
>> already stepped past end, so the addr != end check never terminates the
>> loop and later iterations put_page() pages beyond the requested range.
>>
>> This patch fixes it by hardening the break condition of the while loop.
>>
>> Signed-off-by: jia...@hxt-semitech.com
>> ---
>> v2: newly added
>>
>>  virt/kvm/arm/mmu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 8dac311..45cd040 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>>  
>>  put_page(virt_to_page(pte));
>>  }
>> -} while (pte++, addr += PAGE_SIZE, addr != end);
>> +} while (pte++, addr += PAGE_SIZE, addr < end);
>>  
>>  if (stage2_pte_table_empty(start_pte))
>>  clear_stage2_pmd_entry(kvm, pmd, start_addr);
>>
> 
> I don't think this change is the right thing to do. You get that failure
> because you're being passed a size that is not a multiple of PAGE_SIZE.
> That's the mistake.
> 
> You should ensure that this never happens, rather than changing the page
> table walkers (which are consistent with the way this kind of code is
> written in other places of the kernel). As you mentioned in your first
> patch, the real issue is that KSM is broken, and this is what should be
> fixed.
> 
Got it, thanks.
Should I resend patch 1/2 without any changes after dropping patch 2/2?

-- 
Cheers,
Jia


Re: [PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned

2018-05-18 Thread Marc Zyngier
On 18/05/18 10:27, Jia He wrote:
> If addr=0x20292,size=0xfe00 is passed to unmap_stage2_range->
> ...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
> end=0x20292fe00. After the first loop iteration addr=0x20293, which has
> already stepped past end, so the addr != end check never terminates the
> loop and later iterations put_page() pages beyond the requested range.
> 
> This patch fixes it by hardening the break condition of the while loop.
> 
> Signed-off-by: jia...@hxt-semitech.com
> ---
> v2: newly added
> 
>  virt/kvm/arm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 8dac311..45cd040 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
>  
>   put_page(virt_to_page(pte));
>   }
> - } while (pte++, addr += PAGE_SIZE, addr != end);
> + } while (pte++, addr += PAGE_SIZE, addr < end);
>  
>   if (stage2_pte_table_empty(start_pte))
>   clear_stage2_pmd_entry(kvm, pmd, start_addr);
> 

I don't think this change is the right thing to do. You get that failure
because you're being passed a size that is not a multiple of PAGE_SIZE.
That's the mistake.

You should ensure that this never happens, rather than changing the page
table walkers (which are consistent with the way this kind of code is
written in other places of the kernel). As you mentioned in your first
patch, the real issue is that KSM is broken, and this is what should be
fixed.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...


[PATCH v2 2/2] KVM: arm/arm64: harden unmap_stage2_ptes in case end is not PAGE_SIZE aligned

2018-05-18 Thread Jia He
If addr=0x20292,size=0xfe00 is passed to unmap_stage2_range->
...->unmap_stage2_ptes, unmap_stage2_ptes will get addr=0x20292,
end=0x20292fe00. After the first loop iteration addr=0x20293, which has
already stepped past end, so the addr != end check never terminates the
loop and later iterations put_page() pages beyond the requested range.

This patch fixes it by hardening the break condition of the while loop.
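
A standalone illustration of the overshoot (plain C with hypothetical
page-aligned/unaligned values, not kernel code):

#include <stdio.h>

#define PAGE_SIZE 0x1000UL

int main(void)
{
	unsigned long addr = 0x20292000UL;	/* hypothetical, page aligned */
	unsigned long end  = 0x20292e00UL;	/* hypothetical, NOT aligned */
	int steps = 0;

	/* With the old condition 'addr != end', the first step takes addr
	 * to 0x20293000, past end, so equality never holds and the loop
	 * keeps walking pages it was never asked to unmap (capped here so
	 * the demo terminates). 'addr < end' stops after the first step. */
	do {
		addr += PAGE_SIZE;
		steps++;
	} while (addr != end && steps < 4);

	printf("steps=%d addr=%#lx end=%#lx overshot=%s\n",
	       steps, addr, end, addr > end ? "yes" : "no");
	return 0;
}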

Signed-off-by: jia...@hxt-semitech.com
---
v2: newly added

 virt/kvm/arm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 8dac311..45cd040 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -217,7 +217,7 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 
put_page(virt_to_page(pte));
}
-   } while (pte++, addr += PAGE_SIZE, addr != end);
+   } while (pte++, addr += PAGE_SIZE, addr < end);
 
if (stage2_pte_table_empty(start_pte))
clear_stage2_pmd_entry(kvm, pmd, start_addr);
-- 
1.8.3.1



[PATCH v2 1/2] KVM: arm/arm64: add WARN_ON if size is not PAGE_SIZE aligned in unmap_stage2_range

2018-05-18 Thread Jia He
There is a panic on an ARMv8-A server (QDF2400) under memory pressure tests
(starting 20 guests and running memhog on the host).

-begin
[35380.800950] BUG: Bad page state in process qemu-kvm  pfn:dd0b6
[35380.805825] page:7fe003742d80 count:-4871 mapcount:-2126053375
mapping:  (null) index:0x0
[35380.815024] flags: 0x1fffc000()
[35380.818845] raw: 1fffc000  
ecf98147
[35380.826569] raw: dead0100 dead0200 8017c001c000

[35380.834294] page dumped because: nonzero _refcount
[35380.839069] Modules linked in: vhost_net vhost tap ebtable_filter
ebtables ip6table_filter ip6_tables iptable_filter fcoe libfcoe libfc
8021q garp mrp stp llc scsi_transport_fc openvswitch nf_conntrack_ipv6
nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6
nf_nat nf_conntrack vfat fat rpcrdma ib_isert iscsi_target_mod ib_iser
libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm mlx5_ib ib_core crc32_ce ipmi_ssif tpm_tis tpm_tis_core sg
nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_multipath ip_tables xfs
libcrc32c mlx5_core mlxfw devlink ahci_platform libahci_platform libahci
qcom_emac sdhci_acpi sdhci hdma mmc_core hdma_mgmt i2c_qup dm_mirror
dm_region_hash dm_log dm_mod
[35380.908341] CPU: 29 PID: 18323 Comm: qemu-kvm Tainted: G W
4.14.15-5.hxt.aarch64 #1
[35380.917107] Hardware name: 
[35380.930909] Call trace:
[35380.933345] [] dump_backtrace+0x0/0x22c
[35380.938723] [] show_stack+0x24/0x2c
[35380.943759] [] dump_stack+0x8c/0xb0
[35380.948794] [] bad_page+0xf4/0x154
[35380.953740] [] free_pages_check_bad+0x90/0x9c
[35380.959642] [] free_pcppages_bulk+0x464/0x518
[35380.965545] [] free_hot_cold_page+0x22c/0x300
[35380.971448] [] __put_page+0x54/0x60
[35380.976484] [] unmap_stage2_range+0x170/0x2b4
[35380.982385] [] kvm_unmap_hva_handler+0x30/0x40
[35380.988375] [] handle_hva_to_gpa+0xb0/0xec
[35380.994016] [] kvm_unmap_hva_range+0x5c/0xd0
[35380.999833] []
kvm_mmu_notifier_invalidate_range_start+0x60/0xb0
[35381.007387] []
__mmu_notifier_invalidate_range_start+0x64/0x8c
[35381.014765] [] try_to_unmap_one+0x78c/0x7a4
[35381.020493] [] rmap_walk_ksm+0x124/0x1a0
[35381.025961] [] rmap_walk+0x94/0x98
[35381.030909] [] try_to_unmap+0x100/0x124
[35381.036293] [] unmap_and_move+0x480/0x6fc
[35381.041847] [] migrate_pages+0x10c/0x288
[35381.047318] [] compact_zone+0x238/0x954
[35381.052697] [] compact_zone_order+0xc4/0xe8
[35381.058427] [] try_to_compact_pages+0x160/0x294
[35381.064503] []
__alloc_pages_direct_compact+0x68/0x194
[35381.071187] [] __alloc_pages_nodemask+0xc20/0xf7c
[35381.077437] [] alloc_pages_vma+0x1a4/0x1c0
[35381.083080] []
do_huge_pmd_anonymous_page+0x128/0x324
[35381.089677] [] __handle_mm_fault+0x71c/0x7e8
[35381.095492] [] handle_mm_fault+0xf8/0x194
[35381.101049] [] __get_user_pages+0x124/0x34c
[35381.106777] [] populate_vma_page_range+0x90/0x9c
[35381.112941] [] __mm_populate+0xc4/0x15c
[35381.118322] [] SyS_mlockall+0x100/0x164
[35381.123705] Exception stack(0x800dce5f3ec0 to 0x800dce5f4000)
[35381.130128] 3ec0: 0003 d6e6024cc9b87e00 be94f000

[35381.137940] 3ee0: 0002  
cf6fc3c0
[35381.145753] 3f00: 00e6 cf6fc490 eeeab0f0
d6e6024cc9b87e00
[35381.153565] 3f20:  be81b3c0 0020
9e53eff806b5
[35381.161379] 3f40: be94de48 a7c269b0 0011
eeeabf68
[35381.169190] 3f60: ceacfe60 be94f000 be9ba358
be7ffb80
[35381.177003] 3f80: be9ba000 be959f64 
be94f000
[35381.184815] 3fa0:  eeeabdb0 be5f3bf8
eeeabdb0
[35381.192628] 3fc0: a7c269b8 6000 0003
00e6
[35381.200440] 3fe0:   

[35381.208254] [] __sys_trace_return+0x0/0x4
[35381.213809] Disabling lock debugging due to kernel taint
end--

The root cause might be what I fixed in [1]. But from the ARM KVM point of
view, it would be better to catch the exception earlier and report it more
clearly.

If the size is not PAGE_SIZE aligned, unmap_stage2_range might unmap the
wrong page range (more or less than intended). Hence the "BUG: Bad page
state".

[1] https://lkml.org/lkml/2018/5/3/1042
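
The diff itself is truncated in this archive; a minimal sketch of the guard
the subject line describes (placement and exact form are assumptions):

/*
 * Sketch only, assumed placement at the top of unmap_stage2_range, not
 * the actual (truncated) diff. A size that is not a multiple of
 * PAGE_SIZE would hand the PTE walker an unaligned 'end', so warn and
 * bail out early instead.
 */
static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
{
	if (WARN_ON(size & ~PAGE_MASK))
		return;

	/* ... existing walk over [start, start + size) ... */
}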

Signed-off-by: jia...@hxt-semitech.com
Reviewed-by: Suzuki