Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
2017-11-15 17:54 GMT+08:00 Peter Zijlstra:
> On Wed, Nov 15, 2017 at 04:43:32PM +0800, Wanpeng Li wrote:
>> Hi Peterz,
>>
>> I found big performance difference as I discuss with you several days ago.
>>
>> ebizzy -M
>>               vanilla    static/local cpumask    per-cpu cpumask
>>  8 vCPUs       10152           10083                 10117
>> 16 vCPUs        1224            4866                 10008
>> 24 vCPUs        1109            3871                  9928
>> 32 vCPUs        1025            3375                  9811
>>
>> In addition, I can observe ~50% perf top time is occupied by
>> smp_call_function_many(), ~30% perf top time is occupied by
>> call_function_interrupt() in the guest when running ebizzy for
>> static/local cpumask variable. However, I almost can't observe these
>> IPI stuffs after changing to per-cpu variable. Any opinions?
>
> That doesn't really make sense.. :/
>
> So a single static variable is broken (multiple CPUs can call
> flush_tlb_others() concurrently and overwrite each others masks). But I
> don't see why a per-cpu variable would be much slower than an on-stack
> variable.

For the ebizzy score, bigger is better, so the per-cpu variable is 2~3
times better than the on-stack one. Actually, I found out what happens
here. :)

+        for_each_possible_cpu(cpu) {
+                zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
+                        GFP_KERNEL, cpu_to_node(cpu));
+        }

This zalloc_cpumask_var_node() returns NULL and fails to allocate the
per-cpu memory. There is a check in my kvm_flush_tlb_others():

+        if (unlikely(!flushmask))
+                return;

So kvm_flush_tlb_others() skips all the TLB shootdowns. I think that's
the reason why the overcommit score is as high as the non-overcommit
score. It also explains why I can't observe the IPI-related functions in
perf top.

Regards,
Wanpeng Li
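To illustrate the finding above, here is a minimal userspace sketch (not the kernel code) of why an early return on a missing scratch mask turns the flush into a silent no-op, and of one possible alternative that falls back to flushing every target. All names (mask_t, send_flush, flush_skip_on_null, flush_fallback_on_null) are hypothetical, and a 64-bit word stands in for a cpumask; whether v3 should fall back or fail at init time is not decided in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: one 64-bit word models a cpumask (bit n = vCPU n). */
typedef uint64_t mask_t;

/* Counts how many remote flushes the simulation has sent so far. */
static int flushes_sent;

/* Simulated remote flush: one flush per set bit in targets. */
static void send_flush(mask_t targets)
{
    for (; targets; targets &= targets - 1)
        flushes_sent++;
}

/* Shape of the bug: without a scratch mask, no vCPU is flushed at all. */
static void flush_skip_on_null(const mask_t *scratch, mask_t targets)
{
    if (!scratch)
        return;
    send_flush(targets);
}

/*
 * One possible alternative (an assumption, not the posted patch): without a
 * scratch mask, flush every target instead of skipping the flush.
 */
static void flush_fallback_on_null(mask_t *scratch, mask_t targets,
                                   mask_t preempted)
{
    if (!scratch) {
        send_flush(targets);
        return;
    }
    *scratch = targets & ~preempted;   /* skip vCPUs marked preempted */
    send_flush(*scratch);
}
```

With a NULL scratch mask the first variant sends no flushes, which would match both the inflated score and the missing IPI functions in perf top; the second variant still flushes all requested targets.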
Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
On Wed, Nov 15, 2017 at 04:43:32PM +0800, Wanpeng Li wrote:
> Hi Peterz,
>
> I found big performance difference as I discuss with you several days ago.
>
> ebizzy -M
>               vanilla    static/local cpumask    per-cpu cpumask
>  8 vCPUs       10152           10083                 10117
> 16 vCPUs        1224            4866                 10008
> 24 vCPUs        1109            3871                  9928
> 32 vCPUs        1025            3375                  9811
>
> In addition, I can observe ~50% perf top time is occupied by
> smp_call_function_many(), ~30% perf top time is occupied by
> call_function_interrupt() in the guest when running ebizzy for
> static/local cpumask variable. However, I almost can't observe these
> IPI stuffs after changing to per-cpu variable. Any opinions?

That doesn't really make sense.. :/

So a single static variable is broken (multiple CPUs can call
flush_tlb_others() concurrently and overwrite each others masks). But I
don't see why a per-cpu variable would be much slower than an on-stack
variable.
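The overwrite described above can be sketched in userspace by replaying one bad interleaving by hand. This is only an illustration under stated assumptions: struct flusher, step_copy and step_flush are hypothetical names, a 64-bit word stands in for a cpumask, and the two steps model the copy and the flush that a real caller would perform back to back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: one 64-bit word models a cpumask (bit n = vCPU n). */
typedef uint64_t mask_t;

/* One caller of the flush path and the scratch mask it works on. */
struct flusher {
    mask_t *scratch;   /* shared static mask, or a private per-CPU mask */
    mask_t flushed;    /* the targets this caller ended up flushing */
};

/* Step 1 of a flush: copy the requested targets into the scratch mask. */
static void step_copy(struct flusher *f, mask_t targets)
{
    *f->scratch = targets;
}

/* Step 2 of a flush: flush whatever the scratch mask holds right now. */
static void step_flush(struct flusher *f)
{
    f->flushed = *f->scratch;
}
```

If two callers share one scratch mask and the second caller's copy lands between the first caller's copy and flush, the first caller flushes the second caller's targets. Giving each caller its own mask, per-CPU or on-stack, removes that window.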
Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
2017-11-10 16:24 GMT+08:00 Paolo Bonzini:
> On 10/11/2017 08:04, Wanpeng Li wrote:
>> From: Wanpeng Li
>>
>> Remote flushing api's does a busy wait which is fine in bare-metal
>> scenario. But with-in the guest, the vcpus might have been pre-empted
>> or blocked. In this scenario, the initator vcpu would end up
>> busy-waiting for a long amount of time.
>>
>> This patch set implements para-virt flush tlbs making sure that it
>> does not wait for vcpus that are sleeping. And all the sleeping vcpus
>> flush the tlb on guest enter.
>>
>> The best result is achieved when we're overcommiting the host by running
>> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
>> vCPUs which are not scheduled and avoid the wait on the main CPU.
>>
>> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
>> one linux guest.
>>
>> ebizzy -M
>>               vanilla    optimized     boost
>>  8 vCPUs       10152       10083       -0.68%
>> 16 vCPUs        1224        4866       297.5%
>> 24 vCPUs        1109        3871       249%
>> 32 vCPUs        1025        3375       229.3%
>>
>> Cc: Paolo Bonzini
>> Cc: Radim Krčmář
>> Signed-off-by: Wanpeng Li
>> ---
>>  Documentation/virtual/kvm/cpuid.txt  |  4
>>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>>  arch/x86/kernel/kvm.c                | 31 +++
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
>> index 117066a..9693fcc 100644
>> --- a/Documentation/virtual/kvm/cpuid.txt
>> +++ b/Documentation/virtual/kvm/cpuid.txt
>> @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
>>                                     ||       || mizations such as usage of
>>                                     ||       || qspinlocks.
>>  ------------------------------------------------------------------------------
>> +KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
>> +                                   ||       || before enabling paravirtualized
>> +                                   ||       || tlb flush.
>> +------------------------------------------------------------------------------
>>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>>                                     ||       || per-cpu warps are expected in
>>                                     ||       || kvmclock.
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index 9ead1ed..a028479 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -25,6 +25,7 @@
>>  #define KVM_FEATURE_PV_EOI              6
>>  #define KVM_FEATURE_PV_UNHALT           7
>>  #define KVM_FEATURE_PV_DEDICATED        8
>> +#define KVM_FEATURE_PV_TLB_FLUSH        9
>>
>>  /* The last 8 bits are used to indicate how to interpret the flags field
>>   * in pvclock structure. If no bits are set, all flags are ignored.
>> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>>
>>  #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
>>  #define KVM_VCPU_PREEMPTED      (1 << 0)
>> +#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
>>
>>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>>  struct kvm_clock_pairing {
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 66ed3bc..50f4b6a 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>>          update_intr_gate(X86_TRAP_PF, async_page_fault);
>>  }
>>
>> +static cpumask_t flushmask;
>
> Hi Wanpeng,
>
> are you going to send v3 with a percpu variable?

Hi Peterz,

I found big performance difference as I discuss with you several days ago.

ebizzy -M
              vanilla    static/local cpumask    per-cpu cpumask
 8 vCPUs       10152           10083                 10117
16 vCPUs        1224            4866                 10008
24 vCPUs        1109            3871                  9928
32 vCPUs        1025            3375                  9811

In addition, I can observe ~50% perf top time is occupied by
smp_call_function_many(), ~30% perf top time is occupied by
call_function_interrupt() in the guest when running ebizzy for
static/local cpumask variable. However, I almost can't observe these
IPI stuffs after changing to per-cpu variable. Any opinions?

Regards,
Wanpeng Li

>
> Paolo
>
>> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>> +                        const struct flush_tlb_info *info)
>> +{
>> +        u8 state;
>> +        int cpu;
>> +        struct kvm_steal_time *src;
>> +
>> +        cpumask_copy(&flushmask, cpumask);
>> +        /*
>> +         * We have to call flush only on online vCPUs. And
>> +         * queue
Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
2017-11-10 16:33 GMT+08:00 Wanpeng Li:
> 2017-11-10 16:24 GMT+08:00 Paolo Bonzini :
>> On 10/11/2017 08:04, Wanpeng Li wrote:
>>> From: Wanpeng Li
>>>
>>> Remote flushing api's does a busy wait which is fine in bare-metal
>>> scenario. But with-in the guest, the vcpus might have been pre-empted
>>> or blocked. In this scenario, the initator vcpu would end up
>>> busy-waiting for a long amount of time.
>>>
>>> This patch set implements para-virt flush tlbs making sure that it
>>> does not wait for vcpus that are sleeping. And all the sleeping vcpus
>>> flush the tlb on guest enter.
>>>
>>> The best result is achieved when we're overcommiting the host by running
>>> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
>>> vCPUs which are not scheduled and avoid the wait on the main CPU.
>>>
>>> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
>>> one linux guest.
>>>
>>> ebizzy -M
>>>               vanilla    optimized     boost
>>>  8 vCPUs       10152       10083       -0.68%
>>> 16 vCPUs        1224        4866       297.5%
>>> 24 vCPUs        1109        3871       249%
>>> 32 vCPUs        1025        3375       229.3%
>>>
>>> Cc: Paolo Bonzini
>>> Cc: Radim Krčmář
>>> Signed-off-by: Wanpeng Li
>>> ---
>>>  Documentation/virtual/kvm/cpuid.txt  |  4
>>>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>>>  arch/x86/kernel/kvm.c                | 31 +++
>>>  3 files changed, 37 insertions(+)
>>>
>>> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
>>> index 117066a..9693fcc 100644
>>> --- a/Documentation/virtual/kvm/cpuid.txt
>>> +++ b/Documentation/virtual/kvm/cpuid.txt
>>> @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
>>>                                     ||       || mizations such as usage of
>>>                                     ||       || qspinlocks.
>>>  ------------------------------------------------------------------------------
>>> +KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
>>> +                                   ||       || before enabling paravirtualized
>>> +                                   ||       || tlb flush.
>>> +------------------------------------------------------------------------------
>>>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>>>                                     ||       || per-cpu warps are expected in
>>>                                     ||       || kvmclock.
>>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>>> index 9ead1ed..a028479 100644
>>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>>> @@ -25,6 +25,7 @@
>>>  #define KVM_FEATURE_PV_EOI              6
>>>  #define KVM_FEATURE_PV_UNHALT           7
>>>  #define KVM_FEATURE_PV_DEDICATED        8
>>> +#define KVM_FEATURE_PV_TLB_FLUSH        9
>>>
>>>  /* The last 8 bits are used to indicate how to interpret the flags field
>>>   * in pvclock structure. If no bits are set, all flags are ignored.
>>> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>>>
>>>  #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
>>>  #define KVM_VCPU_PREEMPTED      (1 << 0)
>>> +#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
>>>
>>>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>>>  struct kvm_clock_pairing {
>>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>>> index 66ed3bc..50f4b6a 100644
>>> --- a/arch/x86/kernel/kvm.c
>>> +++ b/arch/x86/kernel/kvm.c
>>> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>>>          update_intr_gate(X86_TRAP_PF, async_page_fault);
>>>  }
>>>
>>> +static cpumask_t flushmask;
>>
>> Hi Wanpeng,
>>
>> are you going to send v3 with a percpu variable?
>
> Yeah, I just complete v3 according to Peterz's comments in another
> guy's thread, I will send out them after completing the testing.

This is how it looks: https://pastebin.com/raw/L2vqu4cZ

Regards,
Wanpeng Li

>
> Regards,
> Wanpeng Li
>
>>
>> Paolo
>>
>>> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>>> +                        const struct flush_tlb_info *info)
>>> +{
>>> +        u8 state;
>>> +        int cpu;
>>> +        struct kvm_steal_time *src;
>>> +
>>> +        cpumask_copy(&flushmask, cpumask);
>>> +        /*
>>> +         * We have to call flush only on online vCPUs. And
>>> +         * queue flush_on_enter for pre-empted vCPUs
>>> +         */
>>> +        for_each_cpu(cpu, cpumask) {
>>> +                src = &per_cpu(steal_time, cpu);
>>> +                state = src->preempted;
>>> +                if ((state & KVM_VCPU_PREEMPTED)) {
>>> +                        if (cmpxchg(&src->preempted, state, state |
>>> +                                KVM_VCPU_SHOULD_FLUSH) == state)
>>> +
Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
2017-11-10 16:24 GMT+08:00 Paolo Bonzini:
> On 10/11/2017 08:04, Wanpeng Li wrote:
>> From: Wanpeng Li
>>
>> Remote flushing api's does a busy wait which is fine in bare-metal
>> scenario. But with-in the guest, the vcpus might have been pre-empted
>> or blocked. In this scenario, the initator vcpu would end up
>> busy-waiting for a long amount of time.
>>
>> This patch set implements para-virt flush tlbs making sure that it
>> does not wait for vcpus that are sleeping. And all the sleeping vcpus
>> flush the tlb on guest enter.
>>
>> The best result is achieved when we're overcommiting the host by running
>> multiple vCPUs on each pCPU. In this case PV tlb flush avoids touching
>> vCPUs which are not scheduled and avoid the wait on the main CPU.
>>
>> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy in
>> one linux guest.
>>
>> ebizzy -M
>>               vanilla    optimized     boost
>>  8 vCPUs       10152       10083       -0.68%
>> 16 vCPUs        1224        4866       297.5%
>> 24 vCPUs        1109        3871       249%
>> 32 vCPUs        1025        3375       229.3%
>>
>> Cc: Paolo Bonzini
>> Cc: Radim Krčmář
>> Signed-off-by: Wanpeng Li
>> ---
>>  Documentation/virtual/kvm/cpuid.txt  |  4
>>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>>  arch/x86/kernel/kvm.c                | 31 +++
>>  3 files changed, 37 insertions(+)
>>
>> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
>> index 117066a..9693fcc 100644
>> --- a/Documentation/virtual/kvm/cpuid.txt
>> +++ b/Documentation/virtual/kvm/cpuid.txt
>> @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
>>                                     ||       || mizations such as usage of
>>                                     ||       || qspinlocks.
>>  ------------------------------------------------------------------------------
>> +KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
>> +                                   ||       || before enabling paravirtualized
>> +                                   ||       || tlb flush.
>> +------------------------------------------------------------------------------
>>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>>                                     ||       || per-cpu warps are expected in
>>                                     ||       || kvmclock.
>> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
>> index 9ead1ed..a028479 100644
>> --- a/arch/x86/include/uapi/asm/kvm_para.h
>> +++ b/arch/x86/include/uapi/asm/kvm_para.h
>> @@ -25,6 +25,7 @@
>>  #define KVM_FEATURE_PV_EOI              6
>>  #define KVM_FEATURE_PV_UNHALT           7
>>  #define KVM_FEATURE_PV_DEDICATED        8
>> +#define KVM_FEATURE_PV_TLB_FLUSH        9
>>
>>  /* The last 8 bits are used to indicate how to interpret the flags field
>>   * in pvclock structure. If no bits are set, all flags are ignored.
>> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>>
>>  #define KVM_VCPU_NOT_PREEMPTED  (0 << 0)
>>  #define KVM_VCPU_PREEMPTED      (1 << 0)
>> +#define KVM_VCPU_SHOULD_FLUSH   (1 << 1)
>>
>>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>>  struct kvm_clock_pairing {
>> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
>> index 66ed3bc..50f4b6a 100644
>> --- a/arch/x86/kernel/kvm.c
>> +++ b/arch/x86/kernel/kvm.c
>> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>>          update_intr_gate(X86_TRAP_PF, async_page_fault);
>>  }
>>
>> +static cpumask_t flushmask;
>
> Hi Wanpeng,
>
> are you going to send v3 with a percpu variable?

Yeah, I just complete v3 according to Peterz's comments in another
guy's thread, I will send out them after completing the testing.

Regards,
Wanpeng Li

>
> Paolo
>
>> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
>> +                        const struct flush_tlb_info *info)
>> +{
>> +        u8 state;
>> +        int cpu;
>> +        struct kvm_steal_time *src;
>> +
>> +        cpumask_copy(&flushmask, cpumask);
>> +        /*
>> +         * We have to call flush only on online vCPUs. And
>> +         * queue flush_on_enter for pre-empted vCPUs
>> +         */
>> +        for_each_cpu(cpu, cpumask) {
>> +                src = &per_cpu(steal_time, cpu);
>> +                state = src->preempted;
>> +                if ((state & KVM_VCPU_PREEMPTED)) {
>> +                        if (cmpxchg(&src->preempted, state, state |
>> +                                KVM_VCPU_SHOULD_FLUSH) == state)
>> +                                __cpumask_clear_cpu(cpu, &flushmask);
>> +                }
>> +        }
>> +
>> +        native_flush_tlb_others(&flushmask, info);
>> +}
>> +
>>  void __init kvm_guest_init(void)
>>  {
>>          int i;
>> @@ -484,6 +511,10 @@ void __init kvm_guest_init(void)
>>
Re: [PATCH v2 2/4] KVM: Add paravirt remote TLB flush
On 10/11/2017 08:04, Wanpeng Li wrote:
> From: Wanpeng Li
>
> The remote flushing APIs busy-wait, which is fine on bare metal.
> Within a guest, however, the vCPUs might have been preempted or
> blocked, in which case the initiating vCPU would end up busy-waiting
> for a long time.
>
> This patch implements a paravirtualized TLB flush that does not wait
> for vCPUs that are sleeping. Each sleeping vCPU instead flushes its
> TLB on guest entry.
>
> The best result is achieved when we're overcommitting the host by
> running multiple vCPUs on each pCPU. In this case PV TLB flush avoids
> touching vCPUs which are not scheduled and avoids the wait on the
> main CPU.
>
> Tested on a Haswell i7 desktop with 4 cores (2 HT), so 8 pCPUs,
> running ebizzy in one Linux guest.
>
> ebizzy -M
>               vanilla    optimized     boost
>  8 vCPUs       10152        10083     -0.68%
> 16 vCPUs        1224         4866     297.5%
> 24 vCPUs        1109         3871       249%
> 32 vCPUs        1025         3375     229.3%
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Signed-off-by: Wanpeng Li
> ---
>  Documentation/virtual/kvm/cpuid.txt  |  4
>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>  arch/x86/kernel/kvm.c                | 31 +++
>  3 files changed, 37 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 117066a..9693fcc 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
>                                     ||       || mizations such as usage of
>                                     ||       || qspinlocks.
>  ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
> +                                   ||       || before enabling paravirtualized
> +                                   ||       || tlb flush.
> +------------------------------------------------------------------------------
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                     ||       || per-cpu warps are expected in
>                                     ||       || kvmclock.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 9ead1ed..a028479 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -25,6 +25,7 @@
>  #define KVM_FEATURE_PV_EOI		6
>  #define KVM_FEATURE_PV_UNHALT		7
>  #define KVM_FEATURE_PV_DEDICATED	8
> +#define KVM_FEATURE_PV_TLB_FLUSH	9
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -53,6 +54,7 @@ struct kvm_steal_time {
>
>  #define KVM_VCPU_NOT_PREEMPTED	(0 << 0)
>  #define KVM_VCPU_PREEMPTED	(1 << 0)
> +#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)
>
>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>  struct kvm_clock_pairing {
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 66ed3bc..50f4b6a 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
>  	update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>
> +static cpumask_t flushmask;

Hi Wanpeng, are you going to send v3 with a percpu variable?

Paolo

> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> +			const struct flush_tlb_info *info)
> +{
> +	u8 state;
> +	int cpu;
> +	struct kvm_steal_time *src;
> +
> +	cpumask_copy(&flushmask, cpumask);
> +	/*
> +	 * We have to call flush only on online vCPUs. And
> +	 * queue flush_on_enter for pre-empted vCPUs
> +	 */
> +	for_each_cpu(cpu, cpumask) {
> +		src = &per_cpu(steal_time, cpu);
> +		state = src->preempted;
> +		if ((state & KVM_VCPU_PREEMPTED)) {
> +			if (cmpxchg(&src->preempted, state, state |
> +				KVM_VCPU_SHOULD_FLUSH) == state)
> +				__cpumask_clear_cpu(cpu, &flushmask);
> +		}
> +	}
> +
> +	native_flush_tlb_others(&flushmask, info);
> +}
> +
>  void __init kvm_guest_init(void)
>  {
>  	int i;
> @@ -484,6 +511,10 @@ void __init kvm_guest_init(void)
>  		pv_time_ops.steal_clock = kvm_steal_clock;
>  	}
>
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
> +	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
> +		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
> +
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>
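For readers following the percpu question: elsewhere in this thread it is pointed out that a single static mask can be overwritten when several CPUs call flush_tlb_others() concurrently. The snippet below is only a userspace model of the alternative, assuming one scratch mask per calling vCPU. The names scratch_mask, flush_begin and flush_finish are hypothetical and exist only for this illustration; it is not the v3 code. In the kernel the storage would presumably be a per-cpu cpumask (the thread mentions __pv_tlb_mask), not a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4	/* assumption: tiny fixed vCPU count for the model */

/* One scratch mask per calling vCPU, standing in for a per-cpu cpumask. */
static uint32_t scratch_mask[NR_VCPUS];

/* Step 1 of a flush issued by vCPU `self`: copy the targets into its own slot. */
static void flush_begin(unsigned int self, uint32_t targets)
{
	assert(self < NR_VCPUS);
	scratch_mask[self] = targets;
}

/*
 * Step 2: drop the vCPUs that will flush lazily on guest entry and return
 * the set that still needs an IPI. Only the caller's own slot is touched,
 * so another vCPU running flush_begin() in between cannot clobber it.
 */
static uint32_t flush_finish(unsigned int self, uint32_t lazy)
{
	assert(self < NR_VCPUS);
	scratch_mask[self] &= ~lazy;
	return scratch_mask[self];
}
```

With a single shared mask, a second flush_begin() between the two steps would replace the first caller's targets. With one slot per caller, the two flushes stay independent.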
[PATCH v2 2/4] KVM: Add paravirt remote TLB flush
From: Wanpeng Li

The remote flushing APIs busy-wait, which is fine on bare metal.
Within a guest, however, the vCPUs might have been preempted or
blocked, in which case the initiating vCPU would end up busy-waiting
for a long time.

This patch implements a paravirtualized TLB flush that does not wait
for vCPUs that are sleeping. Each sleeping vCPU instead flushes its
TLB on guest entry.

The best result is achieved when we're overcommitting the host by
running multiple vCPUs on each pCPU. In this case PV TLB flush avoids
touching vCPUs which are not scheduled and avoids the wait on the
main CPU.

Tested on a Haswell i7 desktop with 4 cores (2 HT), so 8 pCPUs,
running ebizzy in one Linux guest.

ebizzy -M
              vanilla    optimized     boost
 8 vCPUs       10152        10083     -0.68%
16 vCPUs        1224         4866     297.5%
24 vCPUs        1109         3871       249%
32 vCPUs        1025         3375     229.3%

Cc: Paolo Bonzini
Cc: Radim Krčmář
Signed-off-by: Wanpeng Li
---
 Documentation/virtual/kvm/cpuid.txt  |  4
 arch/x86/include/uapi/asm/kvm_para.h |  2 ++
 arch/x86/kernel/kvm.c                | 31 +++
 3 files changed, 37 insertions(+)

diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
index 117066a..9693fcc 100644
--- a/Documentation/virtual/kvm/cpuid.txt
+++ b/Documentation/virtual/kvm/cpuid.txt
@@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED || 8 || guest checks this feature bit
                                    ||       || mizations such as usage of
                                    ||       || qspinlocks.
 ------------------------------------------------------------------------------
+KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
+                                   ||       || before enabling paravirtualized
+                                   ||       || tlb flush.
+------------------------------------------------------------------------------
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
                                    ||       || per-cpu warps are expected in
                                    ||       || kvmclock.
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 9ead1ed..a028479 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -25,6 +25,7 @@
 #define KVM_FEATURE_PV_EOI		6
 #define KVM_FEATURE_PV_UNHALT		7
 #define KVM_FEATURE_PV_DEDICATED	8
+#define KVM_FEATURE_PV_TLB_FLUSH	9

 /* The last 8 bits are used to indicate how to interpret the flags field
  * in pvclock structure. If no bits are set, all flags are ignored.
@@ -53,6 +54,7 @@ struct kvm_steal_time {

 #define KVM_VCPU_NOT_PREEMPTED	(0 << 0)
 #define KVM_VCPU_PREEMPTED	(1 << 0)
+#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)

 #define KVM_CLOCK_PAIRING_WALLCLOCK 0
 struct kvm_clock_pairing {
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 66ed3bc..50f4b6a 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,6 +465,33 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }

+static cpumask_t flushmask;
+
+static void kvm_flush_tlb_others(const struct cpumask *cpumask,
+			const struct flush_tlb_info *info)
+{
+	u8 state;
+	int cpu;
+	struct kvm_steal_time *src;
+
+	cpumask_copy(&flushmask, cpumask);
+	/*
+	 * We have to call flush only on online vCPUs. And
+	 * queue flush_on_enter for pre-empted vCPUs
+	 */
+	for_each_cpu(cpu, cpumask) {
+		src = &per_cpu(steal_time, cpu);
+		state = src->preempted;
+		if ((state & KVM_VCPU_PREEMPTED)) {
+			if (cmpxchg(&src->preempted, state, state |
+				KVM_VCPU_SHOULD_FLUSH) == state)
+				__cpumask_clear_cpu(cpu, &flushmask);
+		}
+	}
+
+	native_flush_tlb_others(&flushmask, info);
+}
+
 void __init kvm_guest_init(void)
 {
 	int i;
@@ -484,6 +511,10 @@ void __init kvm_guest_init(void)
 		pv_time_ops.steal_clock = kvm_steal_clock;
 	}

+	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
+	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
+		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
+
 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);

--
2.7.4
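The filtering loop in kvm_flush_tlb_others() can be tried outside the kernel. What follows is a rough userspace model, not the patch itself: the preempted[] array stands in for the per-vCPU steal_time.preempted byte, a 32-bit word stands in for the cpumask, and C11 atomic_compare_exchange_strong() is swapped in for the kernel's cmpxchg(). The names filter_flush_targets, NR_VCPUS, VCPU_PREEMPTED and VCPU_SHOULD_FLUSH are made up for the illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_VCPUS 8			/* assumption: small fixed vCPU count */
#define VCPU_PREEMPTED    (1u << 0)	/* mirrors KVM_VCPU_PREEMPTED */
#define VCPU_SHOULD_FLUSH (1u << 1)	/* mirrors KVM_VCPU_SHOULD_FLUSH */

/* Stand-in for the per-vCPU steal_time.preempted byte shared with the host. */
static _Atomic uint8_t preempted[NR_VCPUS];

/*
 * Take the set of vCPUs that need a TLB flush (one bit per vCPU) and return
 * the subset that must still be flushed synchronously. A preempted vCPU is
 * tagged with VCPU_SHOULD_FLUSH and removed from the set. If the
 * compare-and-swap fails because the state changed underneath us (for
 * example the vCPU was scheduled again), the vCPU stays in the set.
 */
static uint32_t filter_flush_targets(uint32_t targets)
{
	uint32_t need_ipi = targets;

	assert((targets >> NR_VCPUS) == 0);

	for (unsigned int cpu = 0; cpu < NR_VCPUS; cpu++) {
		uint8_t state;

		if (!(targets & (1u << cpu)))
			continue;

		state = atomic_load(&preempted[cpu]);
		if (!(state & VCPU_PREEMPTED))
			continue;

		if (atomic_compare_exchange_strong(&preempted[cpu], &state,
				(uint8_t)(state | VCPU_SHOULD_FLUSH)))
			need_ipi &= ~(1u << cpu);
	}

	return need_ipi;
}
```

The compare-and-swap is what keeps the deferred flush safe in this model: the flag is only set if the vCPU is still marked preempted at the moment of the update, otherwise the caller falls back to the ordinary synchronous flush for that vCPU.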