[PATCH RESEND v2 2/3] KVM: X86: Implement PV sched yield hypercall

2019-05-28 Thread Wanpeng Li
From: Wanpeng Li The target vCPUs are in runnable state after vcpu_kick and suitable as a yield target. This patch implements the sched yield hypercall. 17% performance increase of ebizzy benchmark can be observed in an over-subscribe environment. (w/ kvm-pv-tlb disabled, testing TLB flush

Re: [PATCH RESEND v2] KVM: X86: Implement PV sched yield hypercall

2019-05-28 Thread Wanpeng Li
e, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Wanpeng-Li/KVM-X86-Implement-PV-sched-yield-hypercall/20190528-132021 > base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next > config: x86_64-allyesconf

[PATCH RESEND v2] KVM: X86: Implement PV sched yield hypercall

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li The target vCPUs are in runnable state after vcpu_kick and suitable as a yield target. This patch implements the sched yield hypercall. 17% performance increase of ebizzy benchmark can be observed in an over-subscribe environment. (w/ kvm-pv-tlb disabled, testing TLB flush

Re: [PATCH 2/3] KVM: X86: Implement PV sched yield hypercall

2019-05-27 Thread Wanpeng Li
On Mon, 27 May 2019 at 19:54, Paolo Bonzini wrote: > > On 27/05/19 12:34, Wanpeng Li wrote: > > + rcu_read_lock(); > > + map = rcu_dereference(kvm->arch.apic_map); > > + target = map->phys_map[dest_id]->vcpu; > > + rcu_read_unlock(); &

[PATCH v2 2/3] KVM: X86: Implement PV sched yield hypercall

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li The target vCPUs are in runnable state after vcpu_kick and suitable as a yield target. This patch implements the sched yield hypercall. 17% performance increase of ebizzy benchmark can be observed in an over-subscribe environment. (w/ kvm-pv-tlb disabled, testing TLB flush

[PATCH v2 3/3] KVM: X86: Expose PV_SCHED_YIELD CPUID feature bit to guest

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li Expose PV_SCHED_YIELD feature bit to guest, the guest can check this feature bit before using paravirtualized sched yield. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- Documentation/virtual/kvm/cpuid.txt | 4 arch/x86/kvm/cpuid.c| 3

[PATCH v2 1/3] KVM: X86: Implement PV sched yield in linux guest

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li When sending a call-function IPI-many to vCPUs, yield if any of the IPI target vCPUs was preempted. We simply select the first preempted target vCPU found, since the state of target vCPUs can change underneath us, and to avoid race conditions. Cc: Paolo Bonzini Cc: Radim
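The selection rule described in this snippet (yield to the first preempted IPI target) can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct vcpu_state`, `first_preempted_target()` and the bitmask representation are hypothetical names chosen for this sketch and are not taken from the patch, which operates on kernel cpumasks and per-CPU steal-time data.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vCPU state the guest can observe. */
struct vcpu_state {
    bool preempted;
};

/*
 * Return the index of the first IPI target that is currently preempted,
 * or -1 if no target is preempted. Bit i of target_mask set means vCPU i
 * is an IPI target. Only the first match is reported: the state of the
 * other targets may change at any time, so scanning further buys nothing.
 */
int first_preempted_target(const struct vcpu_state *vcpus, size_t nr_vcpus,
                           unsigned long target_mask)
{
    size_t max_bits = sizeof(target_mask) * CHAR_BIT;

    for (size_t cpu = 0; cpu < nr_vcpus && cpu < max_bits; cpu++) {
        if ((target_mask & (1UL << cpu)) && vcpus[cpu].preempted)
            return (int)cpu;
    }
    return -1;
}
```

A caller would issue the yield hypercall for the returned index and skip it when the result is -1.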

[PATCH v2 0/3] KVM: Yield to IPI target if necessary

2019-05-27 Thread Wanpeng Li
-function is not easy to be trigged by userspace workload). v1 -> v2: * check map is not NULL * check map->phys_map[dest_id] is not NULL Wanpeng Li (3): KVM: X86: Implement PV sched yield in linux guest KVM: X86: Implement PV sched yield hypercall KVM: X86: Expose PV_SCHED_YIELD
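The two v1 -> v2 checks listed in this cover letter (map is not NULL, map->phys_map[dest_id] is not NULL) can be sketched as a guarded lookup. The following is a userspace approximation, not the patch itself: the struct layouts, `MAX_APIC_ID` and `lookup_yield_target()` are hypothetical, the bounds check is an addition of this sketch, and the RCU read-side locking used by the kernel code is deliberately left out.

```c
#include <stddef.h>

#define MAX_APIC_ID 8 /* hypothetical table size, for illustration only */

struct vcpu { int id; };
struct lapic { struct vcpu *vcpu; };
struct apic_map { struct lapic *phys_map[MAX_APIC_ID]; };

/*
 * Return the vCPU registered for dest_id, or NULL when the map has not
 * been set up yet or no APIC is registered under that id.
 */
struct vcpu *lookup_yield_target(const struct apic_map *map,
                                 unsigned int dest_id)
{
    if (map == NULL)                    /* v2 check: map is not NULL */
        return NULL;
    if (dest_id >= MAX_APIC_ID)         /* bounds check added by this sketch */
        return NULL;
    if (map->phys_map[dest_id] == NULL) /* v2 check: entry is not NULL */
        return NULL;
    return map->phys_map[dest_id]->vcpu;
}
```

Returning NULL lets the caller treat "no valid target" as a no-op instead of dereferencing an unset pointer.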

[PATCH 1/3] KVM: X86: Implement PV sched yield in linux guest

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li When sending a call-function IPI-many to vCPUs, yield if any of the IPI target vCPUs was preempted, we just select the first preempted target vCPU which we found since the state of target vCPUs can change underneath and to avoid race conditions. Cc: Paolo Bonzini Cc: Radim

[PATCH 3/3] KVM: X86: Expose PV_SCHED_YIELD CPUID feature bit to guest

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li Expose PV_SCHED_YIELD feature bit to guest, the guest can check this feature bit before using paravirtualized sched yield. Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- Documentation/virtual/kvm/cpuid.txt | 4 arch/x86/kvm/cpuid.c| 3

[PATCH 0/3] KVM: Yield to IPI target if necessary

2019-05-27 Thread Wanpeng Li
-function is not easy to be trigged by userspace workload). Wanpeng Li (3): KVM: X86: Implement PV sched yield in linux guest KVM: X86: Implement PV sched yield hypercall KVM: X86: Expose PV_SCHED_YIELD CPUID feature bit to guest Documentation/virtual/kvm/cpuid.txt | 4

[PATCH 2/3] KVM: X86: Implement PV sched yield hypercall

2019-05-27 Thread Wanpeng Li
From: Wanpeng Li The target vCPUs are in runnable state after vcpu_kick and suitable as a yield target. This patch implements the sched yield hypercall. 17% performance increase of ebizzy benchmark can be observed in an over-subscribe environment. (w/ kvm-pv-tlb disabled, testing TLB flush

[PATCH 2/2] KVM: LAPIC: remove the trailing newline used in the fmt parameter of TP_printk

2019-05-22 Thread Wanpeng Li
From: Wanpeng Li The trailing newlines will lead to extra newlines in the trace file, which looks like the following output, so remove them. qemu-system-x86-15695 [002] ...1 15774.839240: kvm_hv_timer_state: vcpu_id 0 hv_timer 1 qemu-system-x86-15695 [002] ...1 15774.839309: kvm_hv_timer_state
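Why a trailing newline in the format string produces an extra blank line can be shown with a small userspace sketch. It assumes that the trace output path already terminates every record with its own newline; `write_record()` and `count_newlines()` are hypothetical helpers written for this illustration, not kernel functions.

```c
#include <stdio.h>

/* Hypothetical trace writer: ends every record with its own '\n'. */
int write_record(char *out, size_t len, const char *formatted)
{
    return snprintf(out, len, "%s\n", formatted);
}

/* Count newline characters in a NUL-terminated string. */
int count_newlines(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (*s == '\n')
            n++;
    }
    return n;
}
```

Under that assumption, a message that already ends in a newline comes out with two newlines per record, which shows up as an empty line between entries in the trace file.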

[PATCH 1/2] KVM: LAPIC: Optimize timer latency consider world switch time

2019-05-22 Thread Wanpeng Li
From: Wanpeng Li Advance lapic timer tries to hide the hypervisor overhead between the time the host-emulated timer fires and the time the guest becomes aware that it has fired. However, even after sustained optimizations, kvm-unit-tests/tscdeadline_latency still observes ~1000 cycles of latency since we

Re: [PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further

2019-05-22 Thread Wanpeng Li
On Mon, 20 May 2019 at 16:18, Wanpeng Li wrote: > > Advance lapic timer tries to hidden the hypervisor overhead between the > host emulated timer fires and the guest awares the timer is fired. However, > it just hidden the time between apic_timer_fn/handle_preemption_timer -> >

[PATCH 1/3] KVM: Documentation: Add disable pause exits to KVM_CAP_X86_DISABLE_EXITS

2019-05-21 Thread Wanpeng Li
From: Wanpeng Li Commit b31c114b (KVM: X86: Provide a capability to disable PAUSE intercepts) forgot to add the KVM_X86_DISABLE_EXITS_PAUSE into api doc. This patch adds it. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li

[PATCH v2 2/3] KVM: X86: Provide a capability to disable cstate msr read intercepts

2019-05-21 Thread Wanpeng Li
From: Wanpeng Li Allow the guest to read CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to prevent a guest from getting whole-package information in a multi-tenant scenario. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran

[PATCH v2 3/3] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

2019-05-21 Thread Wanpeng Li
From: Wanpeng Li MSR IA32_MISC_ENABLE bit 18, according to SDM: | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0). | This indicates that MONITOR/MWAIT are not supported. | | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0
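The relationship quoted from the SDM (IA32_MISC_ENABLE bit 18 cleared means CPUID.01H:ECX bit 3 reads as 0) can be sketched as a pure function on the two register values. This is a minimal model, not the emulation code from the patch: the macro and function names are hypothetical, and leaving ECX unchanged when bit 18 is set is an assumption of this sketch.

```c
#include <stdint.h>

#define MISC_ENABLE_MWAIT_BIT     18 /* IA32_MISC_ENABLE bit 18 (SDM quote) */
#define CPUID_01_ECX_MONITOR_BIT   3 /* CPUID.01H:ECX bit 3 (SDM quote) */

/*
 * Return the CPUID.01H:ECX value the guest should observe for a given
 * IA32_MISC_ENABLE value. When bit 18 is clear, the MONITOR feature flag
 * is masked out; otherwise ECX is passed through unchanged (assumption).
 */
uint32_t guest_cpuid_01_ecx(uint64_t misc_enable, uint32_t ecx)
{
    if (misc_enable & (1ULL << MISC_ENABLE_MWAIT_BIT))
        return ecx;
    return ecx & ~(1U << CPUID_01_ECX_MONITOR_BIT);
}
```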

Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-20 Thread Wanpeng Li
On Mon, 20 May 2019 at 19:41, Paolo Bonzini wrote: > > On 20/05/19 13:36, Wanpeng Li wrote: > >> Hmm, yeah, that makes sense. The location of the tracepoint is a bit > >> weird, but I guess we can add a comment in the code. > > Do you need me to post a new patchset?

Re: [PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

2019-05-20 Thread Wanpeng Li
On Mon, 20 May 2019 at 18:34, Paolo Bonzini wrote: > > On 17/05/19 10:49, Wanpeng Li wrote: > > MSR IA32_MSIC_ENABLE bit 18, according to SDM: > > > > | When this bit is set to 0, the MONITOR feature flag is not set > > (CPUID.01H:ECX[bit 3] = 0). > >

Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-20 Thread Wanpeng Li
On Mon, 20 May 2019 at 19:33, Paolo Bonzini wrote: > > On 20/05/19 13:22, Wanpeng Li wrote: > >> > >> We would like to move wait_lapic_expire() just before vmentry, which would > >> place wait_lapic_expire() again inside the extended quiescent state. Drop >

Re: [PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read

2019-05-20 Thread Wanpeng Li
On Mon, 20 May 2019 at 18:30, Paolo Bonzini wrote: > > On 17/05/19 10:49, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Allow guest reads CORE cstate when exposing host CPU power management > > capabilities > > to the guest. PKG cstate is restrict

Re: [PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-20 Thread Wanpeng Li
On Mon, 20 May 2019 at 19:14, Paolo Bonzini wrote: > > On 20/05/19 10:18, Wanpeng Li wrote: > > From: Wanpeng Li > > > > wait_lapic_expire() call was moved above guest_enter_irqoff() because of > > its tracepoint, which violated the RCU extended quiescent state inv

Re: [PATCH v3 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace

2019-05-20 Thread Wanpeng Li
On Sat, 18 May 2019 at 04:05, Sean Christopherson wrote: > > On Thu, May 16, 2019 at 11:06:18AM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Expose per-vCPU timer_advance_ns to userspace, so it is able to > > query the auto-adjusted value. > > > &g

Re: [PATCH v3 5/5] KVM: LAPIC: Optimize timer latency further

2019-05-20 Thread Wanpeng Li
On Sat, 18 May 2019 at 03:50, Sean Christopherson wrote: > > On Thu, May 16, 2019 at 11:06:20AM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Advance lapic timer tries to hidden the hypervisor overhead between the > > host emulated timer fires and the g

[PATCH v4 5/5] KVM: LAPIC: Optimize timer latency further

2019-05-20 Thread Wanpeng Li
From: Wanpeng Li Advance lapic timer tries to hide the hypervisor overhead between the time the host-emulated timer fires and the time the guest becomes aware that it has fired. However, it only hides the time between apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of the real posit

[PATCH v4 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic

2019-05-20 Thread Wanpeng Li
From: Wanpeng Li Extract adaptive tune timer advancement logic to a single function. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 57 ++-- 1 file changed, 33

[PATCH v4 0/5] KVM: LAPIC: Optimize timer latency further

2019-05-20 Thread Wanpeng Li
est_exit_irqoff() * move wait_lapic_expire() before flushing the L1 v1 -> v2: * fix indent in patch 1/4 * remove the wait_lapic_expire() tracepoint and expose by debugfs * move the call to wait_lapic_expire() into vmx.c and svm.c Wanpeng Li (5): KVM: LAPIC: Extract adaptive tune timer advance

[PATCH v4 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow

2019-05-20 Thread Wanpeng Li
From: Wanpeng Li After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement), '-1' enables adaptive tuning starting from default advancement of 1000ns. However, we should expose an int instead of an overflow uint module parameter. Before patch: /sys/module

[PATCH v4 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-20 Thread Wanpeng Li
From: Wanpeng Li wait_lapic_expire() call was moved above guest_enter_irqoff() because of its tracepoint, which violated the RCU extended quiescent state invoked by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot

[PATCH v4 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace

2019-05-20 Thread Wanpeng Li
From: Wanpeng Li Expose per-vCPU timer_advance_ns to userspace, so it is able to query the auto-adjusted value. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/debugfs.c | 18 ++ 1 file changed, 18

Re: [PATCH v3 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-20 Thread Wanpeng Li
On Sat, 18 May 2019 at 03:44, Sean Christopherson wrote: > > On Thu, May 16, 2019 at 11:06:19AM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > wait_lapic_expire() call was moved above guest_enter_irqoff() because of > > its tracepoint, which violated th

[PATCH 3/4] KVM: Fix spinlock taken warning during host resume

2019-05-17 Thread Wanpeng Li
From: Wanpeng Li WARNING: CPU: 0 PID: 13554 at kvm/arch/x86/kvm//../../../virt/kvm/kvm_main.c:4183 kvm_resume+0x3c/0x40 [kvm] CPU: 0 PID: 13554 Comm: step_after_susp Tainted: G OE 5.1.0-rc4+ #1 RIP: 0010:kvm_resume+0x3c/0x40 [kvm] Call Trace: syscore_resume+0x63/0x2d0

[PATCH RESEND 2/4] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

2019-05-17 Thread Wanpeng Li
From: Wanpeng Li MSR IA32_MISC_ENABLE bit 18, according to SDM: | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0). | This indicates that MONITOR/MWAIT are not supported. | | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0

[PATCH 4/4] KVM: nVMX: Fix using __this_cpu_read() in preemptible context

2019-05-17 Thread Wanpeng Li
From: Wanpeng Li BUG: using __this_cpu_read() in preemptible [] code: qemu-system-x86/4590 caller is nested_vmx_enter_non_root_mode+0xebd/0x1790 [kvm_intel] CPU: 4 PID: 4590 Comm: qemu-system-x86 Tainted: G OE 5.1.0-rc4+ #1 Call Trace: dump_stack+0x67/0x95

[PATCH 1/4] KVM: x86: Disable intercept for CORE cstate read

2019-05-17 Thread Wanpeng Li
From: Wanpeng Li Allow the guest to read CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to prevent a guest from getting whole-package information in a multi-tenant scenario. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran

Re: [PATCH v2 3/4] KVM: LAPIC: Expose per-vCPU timer adavance information to userspace

2019-05-15 Thread Wanpeng Li
On Thu, 16 May 2019 at 01:21, Sean Christopherson wrote: > > On Wed, May 15, 2019 at 12:11:53PM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Expose the per-vCPU advancement information to the user via per-vCPU debugfs > > entry. wait_lapic

Re: [PATCH v2 4/4] KVM: LAPIC: Optimize timer latency further

2019-05-15 Thread Wanpeng Li
On Thu, 16 May 2019 at 01:42, Sean Christopherson wrote: > > On Wed, May 15, 2019 at 12:11:54PM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Advance lapic timer tries to hidden the hypervisor overhead between the > > host emulated timer fires and the g

[PATCH v3 1/5] KVM: LAPIC: Extract adaptive tune timer advancement logic

2019-05-15 Thread Wanpeng Li
From: Wanpeng Li Extract adaptive tune timer advancement logic to a single function. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 57 ++-- 1 file changed, 33

[PATCH v3 5/5] KVM: LAPIC: Optimize timer latency further

2019-05-15 Thread Wanpeng Li
From: Wanpeng Li Advance lapic timer tries to hide the hypervisor overhead between the time the host-emulated timer fires and the time the guest becomes aware that it has fired. However, it only hides the time between apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of the real posit

[PATCH v3 4/5] KVM: LAPIC: Delay trace advance expire delta

2019-05-15 Thread Wanpeng Li
From: Wanpeng Li wait_lapic_expire() call was moved above guest_enter_irqoff() because of its tracepoint, which violated the RCU extended quiescent state invoked by guest_enter_irqoff()[1][2]. This patch simply moves the tracepoint below guest_exit_irqoff() in vcpu_enter_guest(). Snapshot

[PATCH v3 2/5] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow

2019-05-15 Thread Wanpeng Li
From: Wanpeng Li After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement), '-1' enables adaptive tuning starting from default advancement of 1000ns. However, we should expose an int instead of an overflow uint module parameter. Before patch: /sys/module

[PATCH v3 3/5] KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace

2019-05-15 Thread Wanpeng Li
From: Wanpeng Li Expose per-vCPU timer_advance_ns to userspace, so it is able to query the auto-adjusted value. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/debugfs.c | 16 1 file changed, 16

[PATCH v3 0/5] KVM: LAPIC: Optimize timer latency further

2019-05-15 Thread Wanpeng Li
ose by debugfs * move the call to wait_lapic_expire() into vmx.c and svm.c Wanpeng Li (5): KVM: LAPIC: Extract adaptive tune timer advancement logic KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow KVM: LAPIC: Expose per-vCPU timer_advance_ns to userspace KVM: LAPIC: Delay trace adva

Re: [PATCH] sched: introduce configurable delay before entering idle

2019-05-15 Thread Wanpeng Li
On Thu, 16 May 2019 at 02:42, Ankur Arora wrote: > > On 5/14/19 6:50 AM, Marcelo Tosatti wrote: > > On Mon, May 13, 2019 at 05:20:37PM +0800, Wanpeng Li wrote: > >> On Wed, 8 May 2019 at 02:57, Marcelo Tosatti wrote: > >>> > >>> > >>> Cert

[PATCH v2 0/4] KVM: LAPIC: Optimize timer latency further

2019-05-14 Thread Wanpeng Li
o ~1000+ cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits. v1 -> v2: * fix indent in patch 1/4 * remove the wait_lapic_expire() tracepoint and expose by debugfs * move the call to wait_lapic_expire() into vmx.c and svm.c Wanpeng Li (4):

[PATCH v2 1/4] KVM: LAPIC: Extract adaptive tune timer advancement logic

2019-05-14 Thread Wanpeng Li
From: Wanpeng Li Extract adaptive tune timer advancement logic to a single function. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 57 ++-- 1 file changed, 33

[PATCH v2 2/4] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow

2019-05-14 Thread Wanpeng Li
From: Wanpeng Li After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement), '-1' enables adaptive tuning starting from default advancement of 1000ns. However, we should expose an int instead of an overflow uint module parameter. Before patch: /sys/module

[PATCH v2 3/4] KVM: LAPIC: Expose per-vCPU timer adavance information to userspace

2019-05-14 Thread Wanpeng Li
From: Wanpeng Li Expose the per-vCPU advancement information to the user via per-vCPU debugfs entry. wait_lapic_expire() call was moved above guest_enter_irqoff() because of its tracepoint, which violated the RCU extended quiescent state invoked by guest_enter_irqoff()[1][2]. This patch

[PATCH v2 4/4] KVM: LAPIC: Optimize timer latency further

2019-05-14 Thread Wanpeng Li
From: Wanpeng Li Advance lapic timer tries to hide the hypervisor overhead between the time the host-emulated timer fires and the time the guest becomes aware that it has fired. However, it only hides the time between apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of the real posit

Re: [PATCH] sched: introduce configurable delay before entering idle

2019-05-14 Thread Wanpeng Li
On Wed, 15 May 2019 at 02:20, Marcelo Tosatti wrote: > > On Tue, May 14, 2019 at 11:20:15AM -0400, Konrad Rzeszutek Wilk wrote: > > On Tue, May 14, 2019 at 10:50:23AM -0300, Marcelo Tosatti wrote: > > > On Mon, May 13, 2019 at 05:20:37PM +0800, Wanpeng Li wrote: > > &

Re: [PATCH] sched: introduce configurable delay before entering idle

2019-05-14 Thread Wanpeng Li
On Mon, 13 May 2019 at 19:52, Raslan, KarimAllah wrote: > > On Mon, 2019-05-13 at 07:31 -0400, Konrad Rzeszutek Wilk wrote: > > On May 13, 2019 5:20:37 AM EDT, Wanpeng Li wrote: > > > > > > On Wed, 8 May 2019 at 02:57, Marcelo Tosatti > > > wrote:

Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further

2019-05-14 Thread Wanpeng Li
On Tue, 14 May 2019 at 09:45, Wanpeng Li wrote: > > On Tue, 14 May 2019 at 03:54, Sean Christopherson > wrote: > > > > On Thu, May 09, 2019 at 07:29:21PM +0800, Wanpeng Li wrote: > > > From: Wanpeng Li > > > > > > Advance lapic timer trie

Re: [PATCH] KVM: X86: Enable IA32_MSIC_ENABLE MONITOR bit when exposing mwait/monitor

2019-05-14 Thread Wanpeng Li
On Mon, 13 May 2019 at 21:35, Radim Krčmář wrote: > > 2019-05-13 17:46+0800, Wanpeng Li: > > From: Wanpeng Li > > > > MSR IA32_MSIC_ENABLE bit 18, according to SDM: > > > > | When this bit is set to 0, the MONITOR feature flag is not set > > (CPUID.

[PATCH v2] KVM: X86: Emulate MSR_IA32_MISC_ENABLE MWAIT bit

2019-05-13 Thread Wanpeng Li
From: Wanpeng Li MSR IA32_MISC_ENABLE bit 18, according to SDM: | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0). | This indicates that MONITOR/MWAIT are not supported. | | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit is 0

Re: [PATCH 3/3] KVM: LAPIC: Optimize timer latency further

2019-05-13 Thread Wanpeng Li
On Tue, 14 May 2019 at 03:54, Sean Christopherson wrote: > > On Thu, May 09, 2019 at 07:29:21PM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Advance lapic timer tries to hidden the hypervisor overhead between host > > timer fires and the guest awares the ti

Re: [PATCH 1/3] KVM: LAPIC: Extract adaptive tune timer advancement logic

2019-05-13 Thread Wanpeng Li
On Tue, 14 May 2019 at 03:39, Sean Christopherson wrote: > > On Thu, May 09, 2019 at 07:29:19PM +0800, Wanpeng Li wrote: > > From: Wanpeng Li > > > > Extract adaptive tune timer advancement logic to a single function. > > Why? Just because the function wait_lap

[PATCH] KVM: X86: Enable IA32_MSIC_ENABLE MONITOR bit when exposing mwait/monitor

2019-05-13 Thread Wanpeng Li
From: Wanpeng Li MSR IA32_MISC_ENABLE bit 18, according to SDM: | When this bit is set to 0, the MONITOR feature flag is not set (CPUID.01H:ECX[bit 3] = 0). | This indicates that MONITOR/MWAIT are not supported. | | Software attempts to execute MONITOR/MWAIT will cause #UD when this bit

Re: [PATCH] sched: introduce configurable delay before entering idle

2019-05-13 Thread Wanpeng Li
if it can solve this? Regards, Wanpeng Li > > This patch introduces a configurable busy-wait delay before entering the > architecture delay routine, allowing wakeup IPIs to be skipped > (if the IPI happens in that window). > > The real-life workload which this patch improves p

[PATCH 1/3] KVM: LAPIC: Extract adaptive tune timer advancement logic

2019-05-09 Thread Wanpeng Li
From: Wanpeng Li Extract adaptive tune timer advancement logic to a single function. Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Sean Christopherson Cc: Liran Alon Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 57 ++-- 1 file changed, 33

[PATCH 3/3] KVM: LAPIC: Optimize timer latency further

2019-05-09 Thread Wanpeng Li
From: Wanpeng Li Advance lapic timer tries to hide the hypervisor overhead between the time the host timer fires and the time the guest becomes aware that it has fired. However, it only hides the time between apic_timer_fn/handle_preemption_timer -> wait_lapic_expire, instead of the real position of vmentry wh

[PATCH 2/3] KVM: LAPIC: Fix lapic_timer_advance_ns parameter overflow

2019-05-09 Thread Wanpeng Li
From: Wanpeng Li After commit c3941d9e0 (KVM: lapic: Allow user to disable adaptive tuning of timer advancement), '-1' enables adaptive tuning starting from default advancement of 1000ns. However, we should expose an int instead of an overflow uint module parameter. Before patch: /sys/module

[PATCH 0/3] KVM: LAPIC: Optimize timer latency further

2019-05-09 Thread Wanpeng Li
tively tuning the timer advancement. The patch can reduce 50% latency (~1600+ cycles to ~800+ cycles on a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing busy waits. Wanpeng Li (3): KVM: LAPIC: Extract adaptive tune timer advancement logic KVM: LAPIC: Fix lapic_timer_advance

Re: [PATCH] kernel/sched: run nohz idle load balancer on HK_FLAG_MISC CPUs

2019-04-28 Thread Wanpeng Li
nr_cpu_ids && idle_cpu(ilb)) > - return ilb; > + for_each_cpu_and(ilb, nohz.idle_cpus_mask, > + housekeeping_cpumask(HK_FLAG_MISC)) { > + if (idle_cpu(ilb)) > + return ilb; > + } What will happen if cpu1 is still id

Re: [PATCH] x86/kvm: move kvm_load/put_guest_xcr0 into atomic context

2019-04-27 Thread Wanpeng Li
lock_page Maybe this should not be counted to guest time in guest_exit_irqoff()? Regards, Wanpeng Li > > > > In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule > > out, host cpu has guest xcr0 loaded (0xff). > > > > In __switch_to { >

[PATCH RESEND] KVM: MMU: Introduce single thread to zap collapsible sptes

2019-01-28 Thread Wanpeng Li
From: Wanpeng Li Last year guys from huawei reported that the call of memory_global_dirty_log_start/stop() takes 13s for 4T memory and cause guest freeze too long which increases the unacceptable migration downtime. [1] [2] Guangrong pointed out: | collapsible_sptes zaps 4k mappings

[PATCH RESEND] KVM: MMU: Introduce single thread to zap collapsible sptes

2019-01-08 Thread Wanpeng Li
From: Wanpeng Li Last year guys from huawei reported that the call of memory_global_dirty_log_start/stop() takes 13s for 4T memory and cause guest freeze too long which increases the unacceptable migration downtime. [1] [2] Guangrong pointed out: | collapsible_sptes zaps 4k mappings

Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes

2018-12-20 Thread Wanpeng Li
On Thu, 20 Dec 2018 at 22:43, Radim Krčmář wrote: > > 2018-12-06 15:58+0800, Wanpeng Li: > > From: Wanpeng Li > > > > Last year guys from huawei reported that the call of > > memory_global_dirty_log_start/stop() > > takes 13s for 4T memory and cause g

Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes

2018-12-19 Thread Wanpeng Li
kindly ping, On Fri, 14 Dec 2018 at 15:24, Wanpeng Li wrote: > > ping, > On Thu, 6 Dec 2018 at 15:58, Wanpeng Li wrote: > > > > From: Wanpeng Li > > > > Last year guys from huawei reported that the call of > > memory_global_dirty_log_start/stop() >

[PATCH] KVM: X86: Fix NULL deref in vcpu_scan_ioapic

2018-12-16 Thread Wanpeng Li
From: Wanpeng Li Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010

Re: [PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes

2018-12-13 Thread Wanpeng Li
ping, On Thu, 6 Dec 2018 at 15:58, Wanpeng Li wrote: > > From: Wanpeng Li > > Last year guys from huawei reported that the call of > memory_global_dirty_log_start/stop() > takes 13s for 4T memory and cause guest freeze too long which increases the > unacceptable > mi

[PATCH] KVM: MMU: Introduce single thread to zap collapsible sptes

2018-12-05 Thread Wanpeng Li
From: Wanpeng Li Last year guys from huawei reported that the call of memory_global_dirty_log_start/stop() takes 13s for 4T memory and cause guest freeze too long which increases the unacceptable migration downtime. [1] [2] Guangrong pointed out: | collapsible_sptes zaps 4k mappings

Re: KASAN: use-after-free Read in kvm_write_guest_offset_cached

2018-11-26 Thread Wanpeng Li
On Tue, 27 Nov 2018 at 12:51, syzbot wrote: > > Hello, Is there beauty C codes? Regards, Wanpeng Li > > syzbot found the following crash on: > > HEAD commit:442b8cea2477 Add linux-next specific files for 20181109 > git tree: linux-next > console output: https

[PATCH] KVM: X86: Fix scan ioapic use-before-initialization

2018-11-20 Thread Wanpeng Li
From: Wanpeng Li Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 01c8 PGD 8003ec4da067 P4D 8003ec4da067 PUD 3f7bfa067 PMD 0 Oops: [#1] PREEMPT SMP PTI CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16 RIP: 0010

[PATCH] KVM: LAPIC: Fix pv ipis use-before-initialization

2018-11-19 Thread Wanpeng Li
xes it by checking whether or not apic map is NULL and bailing out immediately if that is the case. Fixes: 4180bf1b65 (KVM: X86: Implement "send IPI" hypercall) Reported-by: Wei Wu Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Wei Wu Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 5

Re: [PATCH v4] x86: load FPU registers on return to userland

2018-11-11 Thread Wanpeng Li
g the registers if the task stays in kernel and does > not return to userland > - make kernel_fpu_begin() cheaper: it only saves the registers on the > first invocation. The second invocation does not need save them again. > Do you have any performance data? Regards, Wanpeng Li &g

Re: [PATCH] KVM: VMX: enable nested virtualization by default

2018-10-16 Thread Wanpeng Li
mostly nested = 0; > +static bool __read_mostly nested = 1; Really cool, a milestone for nested. :) Regards, Wanpeng Li

Re: [PATCH] KVM: LAPIC: Tune lapic_timer_advance_ns automatically

2018-10-08 Thread Wanpeng Li
On Mon, 8 Oct 2018 at 20:04, Liran Alon wrote: > > > > > On 8 Oct 2018, at 13:59, Wanpeng Li wrote: > > > > On Mon, 8 Oct 2018 at 05:02, Liran Alon wrote: > >> > >> > >> > >>> On 28 Sep 2018, at 9:12, Wanpeng Li wro

[PATCH v2] KVM: LAPIC: Tune lapic_timer_advance_ns automatically

2018-10-08 Thread Wanpeng Li
From: Wanpeng Li In a cloud environment, lapic_timer_advance_ns needs to be tuned for every CPU generation and every host kernel version (the kvm-unit-tests/tscdeadline_latency.flat is 5700 cycles for the upstream kernel and 9600 cycles for our 3.10 product kernel, both preemption_timer=N

Re: [PATCH] KVM: LAPIC: Tune lapic_timer_advance_ns automatically

2018-10-08 Thread Wanpeng Li
On Mon, 8 Oct 2018 at 05:02, Liran Alon wrote: > > > > > On 28 Sep 2018, at 9:12, Wanpeng Li wrote: > > > > From: Wanpeng Li > > > > In cloud environment, lapic_timer_advance_ns is needed to be tuned for > > every CPU > > generations, and

Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-09-28 Thread Wanpeng Li
On Sat, 29 Sep 2018 at 01:36, Dietmar Eggemann wrote: > > On 09/28/2018 06:10 PM, Steve Muckle wrote: > > On 09/27/2018 05:43 PM, Wanpeng Li wrote: > >>>> On your CPU4: > >>>> scheduler_ipi() > >>>>-> sched_ttwu_pending() >

[PATCH] KVM: LAPIC: Tune lapic_timer_advance_ns automatically

2018-09-28 Thread Wanpeng Li
From: Wanpeng Li In cloud environments, lapic_timer_advance_ns needs to be tuned for every CPU generation and every host kernel version (the kvm-unit-tests/tscdeadline_latency.flat is 5700 cycles for upstream kernel and 9600 cycles for our 3.10 product kernel, both preemption_timer=N

Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-09-27 Thread Wanpeng Li
On Thu, 27 Sep 2018 at 21:23, Dietmar Eggemann wrote: > > On 09/27/2018 03:19 AM, Wanpeng Li wrote: > > On Thu, 27 Sep 2018 at 06:38, Dietmar Eggemann > > wrote: > >> > >> Hi, > >> > >> On 09/26/2018 11:50 AM, Wanpeng Li wrote: >

Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-09-26 Thread Wanpeng Li
On Thu, 27 Sep 2018 at 06:38, Dietmar Eggemann wrote: > > Hi, > > On 09/26/2018 11:50 AM, Wanpeng Li wrote: > > Hi Dietmar, > > On Tue, 28 Aug 2018 at 22:55, Dietmar Eggemann > > wrote: > >> > >> On 08/27/2018 12:14 PM, Peter Zijlstra wrote: >

Re: [PATCH] sched/fair: vruntime should normalize when switching from fair

2018-09-26 Thread Wanpeng Li
the fair rq's min_vruntime is added to > >>>>>> the task's vruntime, even though it wasn't subtracted earlier. Could you point out when the fair rq's min_vruntime is added to the task's vruntime in your *later* scenario? attach_task_cfs_rq will not do that the same re
