Re: [PATCH v2 0/4] KVM: Paravirt remote TLB flush

2017-11-09 Thread Wanpeng Li
2017-11-10 15:04 GMT+08:00 Wanpeng Li :
> The remote TLB flush APIs busy-wait, which is fine in the bare-metal
> scenario. But within a guest, the target vCPUs might have been
> preempted or blocked, and in that case the initiator vCPU ends up
> busy-waiting for a long time.
>
> This patch set implements paravirtual TLB flushing that does not wait
> for vCPUs which are sleeping; instead, those vCPUs flush the TLB on
> their next guest entry. The idea was discussed here:
> https://lkml.org/lkml/2012/2/20/157
>
> The best results are achieved when the host is overcommitted, i.e. when
> multiple vCPUs run on each pCPU. In that case PV TLB flush avoids
> touching vCPUs which are not scheduled and avoids the wait on the
> initiating vCPU.
>
> In addition, thanks to commit 9e52fc2b50d ("x86/mm: Enable RCU based
> page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)"), page-table freeing
> no longer relies on the synchronous flush IPI, which is what makes
> deferring the flush for preempted vCPUs safe.
>
> Tested on a Haswell i7 desktop with 4 cores (2 threads each), i.e. 8
> pCPUs, running ebizzy in one Linux guest.
>
> ebizzy -M
>             vanilla    optimized    boost
>  8 vCPUs      10152        10083   -0.68%
> 16 vCPUs       1224         4866   297.5%
> 24 vCPUs       1109         3871     249%
> 32 vCPUs       1025         3375   229.3%

v1 -> v2 (a rough sketch of how these changes fit together follows below):
 * a new CPUID feature bit
 * fix the cmpxchg check
 * use kvm_vcpu_flush_tlb() to get the statistics right
 * just OR in KVM_VCPU_PREEMPTED in kvm_steal_time_set_preempted()
 * add a new bool argument to kvm_x86_ops->tlb_flush
 * use __cpumask_clear_cpu() instead of cpumask_clear_cpu()
 * do not put a cpumask_t on the stack
 * rebase the patchset against "locking/qspinlock/x86: Avoid
   test-and-set when PV_DEDICATED is set" v3
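
For reference, a rough host-side sketch of how the pieces above fit
together (illustrative only: hooking record_steal_time(), and the names
KVM_VCPU_RUNNING / KVM_VCPU_SHOULD_FLUSH, are assumptions for
illustration, not necessarily what the patches do):

/*
 * Illustrative sketch only -- flag names and the exact steal_time
 * layout are placeholders.
 *
 * Scheduling out: OR the preempted flag in rather than assigning it,
 * so a pending flush request set by another vCPU is not overwritten.
 */
static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
{
	vcpu->arch.st.steal.preempted |= KVM_VCPU_PREEMPTED;
	/* ... write the updated record back to guest memory ... */
}

/*
 * Guest (re)entry: atomically fetch-and-reset the state; if a remote
 * vCPU requested a flush while this vCPU was out, flush through
 * kvm_vcpu_flush_tlb() so the flush statistics stay correct.  The
 * second argument is the new invalidate_gpa bool that is passed down
 * to kvm_x86_ops->tlb_flush().
 */
static void record_steal_time(struct kvm_vcpu *vcpu)
{
	struct kvm_steal_time *st = &vcpu->arch.st.steal;

	if (xchg(&st->preempted, KVM_VCPU_RUNNING) & KVM_VCPU_SHOULD_FLUSH)
		kvm_vcpu_flush_tlb(vcpu, false);
	/* ... existing steal-time accounting continues here ... */
}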

>
> Wanpeng Li (4):
>   KVM: Add vCPU running/preempted state
>   KVM: Add paravirt remote TLB flush
>   KVM: X86: introduce invalidate_gpa argument to tlb flush
>   KVM: Add flush_on_enter before guest enter
>
>  Documentation/virtual/kvm/cpuid.txt  | 10 ++
>  arch/x86/include/asm/kvm_host.h      |  2 +-
>  arch/x86/include/uapi/asm/kvm_para.h |  6 ++
>  arch/x86/kernel/kvm.c                | 35 ++-
>  arch/x86/kvm/cpuid.c                 |  3 ++-
>  arch/x86/kvm/svm.c                   | 14 +++---
>  arch/x86/kvm/vmx.c                   | 21 +++--
>  arch/x86/kvm/x86.c                   | 24 +++-
>  8 files changed, 86 insertions(+), 29 deletions(-)
>
> --
> 2.7.4
>


[PATCH v2 0/4] KVM: Paravirt remote TLB flush

2017-11-09 Thread Wanpeng Li
The remote TLB flush APIs busy-wait, which is fine in the bare-metal
scenario. But within a guest, the target vCPUs might have been
preempted or blocked, and in that case the initiator vCPU ends up
busy-waiting for a long time.

This patch set implements paravirtual TLB flushing that does not wait
for vCPUs which are sleeping; instead, those vCPUs flush the TLB on
their next guest entry. The idea was discussed here:
https://lkml.org/lkml/2012/2/20/157
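
Roughly, the guest-side flow looks like the sketch below (illustrative
only, not the actual patch: names such as kvm_flush_tlb_others(),
KVM_VCPU_SHOULD_FLUSH and the per-cpu pv_tlb_flushmask are placeholders,
and the real steal_time layout may differ):

/* Per-cpu mask, allocated at init: the cpumask is deliberately kept
 * off the stack. */
static DEFINE_PER_CPU(cpumask_var_t, pv_tlb_flushmask);

static void kvm_flush_tlb_others(const struct cpumask *cpumask,
				 const struct flush_tlb_info *info)
{
	int cpu;
	struct kvm_steal_time *st;
	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(pv_tlb_flushmask);

	cpumask_copy(flushmask, cpumask);

	for_each_cpu(cpu, flushmask) {
		/* Per-cpu steal_time area shared with the host. */
		st = &per_cpu(steal_time, cpu);
		/*
		 * cmpxchg: tag the vCPU to flush on its next guest entry only
		 * if it is still plainly preempted; a vCPU that woke up, or
		 * that already carries a pending request, simply stays in the
		 * mask and is flushed via the normal IPI path.
		 */
		if (cmpxchg(&st->preempted, KVM_VCPU_PREEMPTED,
			    KVM_VCPU_PREEMPTED | KVM_VCPU_SHOULD_FLUSH)
				== KVM_VCPU_PREEMPTED)
			__cpumask_clear_cpu(cpu, flushmask);
	}

	/* Only vCPUs that are actually running get the IPI and the wait. */
	native_flush_tlb_others(flushmask, info);
}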

The best results are achieved when the host is overcommitted, i.e. when
multiple vCPUs run on each pCPU. In that case PV TLB flush avoids
touching vCPUs which are not scheduled and avoids the wait on the
initiating vCPU.

In addition, thanks to commit 9e52fc2b50d ("x86/mm: Enable RCU based
page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)"), page-table freeing
no longer relies on the synchronous flush IPI, which is what makes
deferring the flush for preempted vCPUs safe.

Tested on a Haswell i7 desktop with 4 cores (2 threads each), i.e. 8
pCPUs, running ebizzy in one Linux guest.

ebizzy -M 
            vanilla    optimized    boost
 8 vCPUs      10152        10083   -0.68%
16 vCPUs       1224         4866   297.5%
24 vCPUs       1109         3871     249%
32 vCPUs       1025         3375   229.3%

Wanpeng Li (4):
  KVM: Add vCPU running/preempted state
  KVM: Add paravirt remote TLB flush
  KVM: X86: introduce invalidate_gpa argument to tlb flush
  KVM: Add flush_on_enter before guest enter

 Documentation/virtual/kvm/cpuid.txt  | 10 ++
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/include/uapi/asm/kvm_para.h |  6 ++
 arch/x86/kernel/kvm.c                | 35 ++-
 arch/x86/kvm/cpuid.c                 |  3 ++-
 arch/x86/kvm/svm.c                   | 14 +++---
 arch/x86/kvm/vmx.c                   | 21 +++--
 arch/x86/kvm/x86.c                   | 24 +++-
 8 files changed, 86 insertions(+), 29 deletions(-)

-- 
2.7.4


