Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire
On 15/12/2014 23:06, Marcelo Tosatti wrote:
> Add tracepoint to wait_lapic_expire.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -1121,6 +1121,7 @@ void wait_lapic_expire(struct kvm_vcpu *
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	u64 guest_tsc, tsc_deadline;
> +	unsigned int total_delay = 0;
>  
>  	if (!kvm_vcpu_has_lapic(vcpu))
>  		return;
> @@ -1138,9 +1139,13 @@ void wait_lapic_expire(struct kvm_vcpu *
>  	while (guest_tsc < tsc_deadline) {
>  		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
> +		total_delay += delay;
> +
>  		__delay(delay);
>  		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
>  	}
> +
> +	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, total_delay);

Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
simplify the tuning of the parameter.

Paolo

>  }
>  
>  static void start_apic_timer(struct kvm_lapic *apic)
>
> Index: kvm/arch/x86/kvm/trace.h
> ===================================================================
> --- kvm.orig/arch/x86/kvm/trace.h
> +++ kvm/arch/x86/kvm/trace.h
> @@ -914,6 +914,25 @@ TRACE_EVENT(kvm_pvclock_update,
>  		  __entry->flags)
>  );
>  
> +TRACE_EVENT(kvm_wait_lapic_expire,
> +	TP_PROTO(unsigned int vcpu_id, unsigned int total_delay),
> +	TP_ARGS(vcpu_id, total_delay),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned int, vcpu_id)
> +		__field(unsigned int, total_delay)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->vcpu_id = vcpu_id;
> +		__entry->total_delay = total_delay;
> +	),
> +
> +	TP_printk("vcpu %u: total_delay %u",
> +		  __entry->vcpu_id,
> +		  __entry->total_delay)
> +);
> +
>  #endif /* _TRACE_KVM_H */
>  
>  #undef TRACE_INCLUDE_PATH

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [v3 00/26] Add VT-d Posted-Interrupts support
Hi Paolo,

Could you please have a look at this series? Thanks a lot!

Thanks,
Feng

-----Original Message-----
From: Wu, Feng
Sent: Friday, December 12, 2014 11:15 PM
To: t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; io...@lists.linux-foundation.org; kvm@vger.kernel.org; Wu, Feng
Subject: [v3 00/26] Add VT-d Posted-Interrupts support

VT-d Posted-Interrupts is an enhancement to the CPU-side Posted-Interrupt
feature. With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts spec at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v1->v2:
* Use the VFIO framework to enable this feature; the VFIO part of this
  series is based on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward
  control".
* Rebase this patchset on
  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git, then revise
  some irq logic based on the new hierarchy irqdomain patches provided
  by Jiang Liu jiang@linux.intel.com.

v2->v3:
* Adjust the Posted-Interrupts Descriptor updating logic when the vCPU
  is preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add a KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which can
  be used to change back to remapping mode.
* Fix typos.

This patch series is made of the following groups:
1-6: Some preparation changes in the iommu and irq components, based on
  the new hierarchy irqdomain logic.
7-9, 26: IOMMU changes for VT-d Posted-Interrupts, such as feature
  detection and a command line parameter.
10-17, 22-25: Changes related to KVM itself.
18-20: Changes in the VFIO component; this part was previously sent out
  as "[RFC PATCH v2 0/2] kvm-vfio: implement the vfio skeleton for VT-d
  Posted-Interrupts".
21: x86 irq related changes.

Feng Wu (26):
  genirq: Introduce irq_set_vcpu_affinity() to target an interrupt to a VCPU
  iommu: Add new member capability to struct irq_remap_ops
  iommu, x86: Define new irte structure for VT-d Posted-Interrupts
  iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
  x86, irq: Implement irq_set_vcpu_affinity for pci_msi_ir_controller
  iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
  iommu, x86: Add cap_pi_support() to detect VT-d PI capability
  iommu, x86: Add intel_irq_remapping_capability() for Intel
  iommu, x86: define irq_remapping_cap()
  KVM: change struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Initialize VT-d Posted-Interrupts Descriptor
  KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
  KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
  KVM: add interfaces to control PI outside vmx
  KVM: Make struct kvm_irq_routing_table accessible
  KVM: make kvm_set_msi_irq() public
  KVM: kvm-vfio: User API for VT-d Posted-Interrupts
  KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
  KVM: x86: kvm-vfio: VT-d posted-interrupts setup
  x86, irq: Define a global vector for VT-d Posted-Interrupts
  KVM: Define a wakeup worker thread for vCPU
  KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
  KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
  KVM: Suppress posted-interrupt when 'SN' is set
  iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

 Documentation/kernel-parameters.txt        |   1 +
 Documentation/virtual/kvm/devices/vfio.txt |   9 ++
 arch/x86/include/asm/entry_arch.h          |   2 +
 arch/x86/include/asm/hardirq.h             |   1 +
 arch/x86/include/asm/hw_irq.h              |   2 +
 arch/x86/include/asm/irq_remapping.h       |  11 ++
 arch/x86/include/asm/irq_vectors.h         |   1 +
 arch/x86/include/asm/kvm_host.h            |  12 ++
 arch/x86/kernel/apic/msi.c                 |   1 +
 arch/x86/kernel/entry_64.S                 |   2 +
 arch/x86/kernel/irq.c                      |  27
 arch/x86/kernel/irqinit.c                  |   2 +
 arch/x86/kvm/Makefile                      |   2 +-
 arch/x86/kvm/kvm_vfio_x86.c                |  77 +
 arch/x86/kvm/vmx.c                         | 244 -
 arch/x86/kvm/x86.c                         |  22 ++-
 drivers/iommu/intel_irq_remapping.c        |  68 +++-
 drivers/iommu/irq_remapping.c              |  24 ++-
 drivers/iommu/irq_remapping.h              |   8 +
 include/linux/dmar.h                       |  32
 include/linux/intel-iommu.h
Re: [PATCH] KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit
On 15/12/2014 21:56, Eugene Korenevsky wrote:
> +	u32 inequality, bit;
> +
> +	bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) ? 1u : 0;
> +	inequality =
> +		(error_code & vmcs12->page_fault_error_code_mask) !=
> +		 vmcs12->page_fault_error_code_match ? 1u : 0;

You should either remove "? 1u : 0" (which is redundant), or flip the
bit in the exception bitmap, like

	inequality = ... ? (1u << PF_VECTOR) : 0;
	return ((vmcs12->exception_bitmap ^ inequality) & (1u << PF_VECTOR)) != 0;

If you choose the former, please use "!= 0" in the assignment of bit
instead of the ternary operator, and make the two variables bool.  Then
you can remove the "!= 0" in the return below.

Paolo

> +	return (inequality ^ bit) != 0;
> +}
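The suggested bool-based variant of the check can be exercised in isolation. Below is a userspace sketch: the struct is a simplified stand-in for the kernel's vmcs12 (only the three fields the check reads), and the function name mirrors the nested-VMX helper being reviewed, so treat both as illustrative rather than the exact kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14

/* Simplified stand-in for the nested-VMX control fields used by the
 * check; the real struct vmcs12 lives in arch/x86/kvm/vmx.c. */
struct vmcs12 {
	uint32_t exception_bitmap;
	uint32_t page_fault_error_code_mask;
	uint32_t page_fault_error_code_match;
};

/* With bit 14 of the exception bitmap set, a #PF causes a VM-exit iff
 * the masked error code equals the MATCH value; with the bit clear,
 * the sense of the comparison is inverted.  XOR of the two booleans
 * expresses exactly that. */
static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
					    uint32_t error_code)
{
	bool inequality, bit;

	bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) != 0;
	inequality = (error_code & vmcs12->page_fault_error_code_mask) !=
		      vmcs12->page_fault_error_code_match;
	return inequality ^ bit;
}
```

With bit 14 set, a matching error code yields a VM-exit and a mismatching one does not; clearing the bit flips both outcomes.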
Re: Why do additional cores reduce performance?
On 16/12/2014 00:40, Oleg Ovechko wrote:
> A. Host Windows, 6 cores (no HT, turbo boost off): 6:23 (+-10 secs)
> B. Host Windows, 1 CPU core (others are turned off in BIOS): 7:13 (+-10 secs)
> C. Host 1 core, Guest Windows 1 core: 7:15 - same as B, no degradation
> D. Host 6 cores, Guest Windows 1 core: 7:57
> E. Host 6 cores, Guest Windows 4 cores: 8:17

What is your benchmark?

Windows sometimes has scalability problems due to the way it does
timing.  Try replacing "-cpu host" with "-no-hpet -cpu
host,hv_time,hv_vapic".

> 3. Also I am unsure about HT. When I specify cores=2,

I suppose you mean threads=2.

> is there any guarantee that a whole core with both HT parts is passed
> to the VM? Or can it be a mix of two real cores with separate caches?

It will be a mix.  Do not specify HT in the guest, unless you have HT in
the host _and_ you are pinning the two threads of each guest core to the
two threads of a host core.

Paolo
[PATCH/RFC] s390/kernel: use stnsm instead of stosm
At least on z196 stnsm is faster than stosm.

Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
---
 arch/s390/include/asm/irqflags.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/irqflags.h b/arch/s390/include/asm/irqflags.h
index 37b9091..16aa0c7 100644
--- a/arch/s390/include/asm/irqflags.h
+++ b/arch/s390/include/asm/irqflags.h
@@ -36,7 +36,7 @@ static inline notrace void __arch_local_irq_ssm(unsigned long flags)
 
 static inline notrace unsigned long arch_local_save_flags(void)
 {
-	return __arch_local_irq_stosm(0x00);
+	return __arch_local_irq_stnsm(0xff);
 }
 
 static inline notrace unsigned long arch_local_irq_save(void)
-- 
1.9.3
Re: [PATCH/RFC] s390/kernel: use stnsm instead of stosm
Paolo, sorry, this should have only gone to Martin and Heiko. Nothing to
worry about from your side. :-)

On 16.12.2014 10:30, Christian Borntraeger wrote:
> At least on z196 stnsm is faster than stosm.
>
> Signed-off-by: Christian Borntraeger <borntrae...@de.ibm.com>
> ---
>  arch/s390/include/asm/irqflags.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/s390/include/asm/irqflags.h b/arch/s390/include/asm/irqflags.h
> index 37b9091..16aa0c7 100644
> --- a/arch/s390/include/asm/irqflags.h
> +++ b/arch/s390/include/asm/irqflags.h
> @@ -36,7 +36,7 @@ static inline notrace void __arch_local_irq_ssm(unsigned long flags)
>  
>  static inline notrace unsigned long arch_local_save_flags(void)
>  {
> -	return __arch_local_irq_stosm(0x00);
> +	return __arch_local_irq_stnsm(0xff);
>  }
>  
>  static inline notrace unsigned long arch_local_irq_save(void)
Re: [PATCH/RFC] s390/kernel: use stnsm instead of stosm
On 16/12/2014 10:31, Christian Borntraeger wrote:
> Paolo, sorry, this should have only gone to Martin and Heiko. Nothing
> to worry about from your side. :-)

No problem, it's always fun to learn new s390 instructions. :)

Paolo
Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire
On Tue, Dec 16, 2014 at 10:03:39AM +0100, Paolo Bonzini wrote:
> On 15/12/2014 23:06, Marcelo Tosatti wrote:
>> Add tracepoint to wait_lapic_expire.
>>
>> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>>
>> Index: kvm/arch/x86/kvm/lapic.c
>> ===================================================================
>> --- kvm.orig/arch/x86/kvm/lapic.c
>> +++ kvm/arch/x86/kvm/lapic.c
>> @@ -1121,6 +1121,7 @@ void wait_lapic_expire(struct kvm_vcpu *
>>  {
>>  	struct kvm_lapic *apic = vcpu->arch.apic;
>>  	u64 guest_tsc, tsc_deadline;
>> +	unsigned int total_delay = 0;
>>  
>>  	if (!kvm_vcpu_has_lapic(vcpu))
>>  		return;
>> @@ -1138,9 +1139,13 @@ void wait_lapic_expire(struct kvm_vcpu *
>>  	while (guest_tsc < tsc_deadline) {
>>  		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
>> +		total_delay += delay;
>> +
>>  		__delay(delay);
>>  		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
>>  	}
>> +
>> +	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, total_delay);
>
> Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
> simplify the tuning of the parameter.
>
> Paolo

total_delay is very close to that, except the summands are

	1000 + 1000 + ... + remainder

Yes?

BTW it's very easy to tune the parameter with the kvm-unit-test test
(the optimal value is clear). I'll write a document.
Re: [patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire
On 16/12/2014 13:15, Marcelo Tosatti wrote:
>> Let's add guest_tsc - tsc_deadline to the tracepoint.  This should
>> simplify the tuning of the parameter.
>
> total_delay is very close to that, except the summands are
>
> 	1000 + 1000 + ... + remainder
>
> Yes?

Almost: guest_tsc - tsc_deadline will be negative if the vmentry
overshot the original tsc_deadline.  In that case, total_delay will be
zero.

> BTW it's very easy to tune the parameter with the kvm-unit-test test
> (the optimal value is clear). I'll write a document.

Nice benefit of the test. :)

Paolo
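The difference being discussed can be modeled in userspace. The sketch below uses a fake TSC variable in place of kvm_x86_ops->read_l1_tsc() and a hypothetical helper name; it shows that the summed busy-wait chunks (1000 + 1000 + ... + remainder) collapse to zero when the vmentry lands past the deadline, whereas the signed difference guest_tsc - tsc_deadline would still record by how much.

```c
#include <assert.h>
#include <stdint.h>

/* Fake TSC standing in for kvm_x86_ops->read_l1_tsc(); advanced by
 * fake_delay() the way the real busy wait burns cycles. */
static uint64_t fake_tsc;

static void fake_delay(uint64_t cycles)
{
	fake_tsc += cycles;
}

/* Model of the wait loop in wait_lapic_expire(): returns the summed
 * busy-wait chunks.  If fake_tsc is already at or past tsc_deadline on
 * entry, the loop never runs and the sum is 0 -- that is the case the
 * per-chunk total cannot distinguish, and why tracing the raw
 * guest_tsc - tsc_deadline difference carries more information. */
static uint64_t total_busy_wait(uint64_t tsc_deadline)
{
	uint64_t total_delay = 0;
	uint64_t guest_tsc = fake_tsc;

	while (guest_tsc < tsc_deadline) {
		uint64_t delay = tsc_deadline - guest_tsc;

		if (delay > 1000)
			delay = 1000;
		total_delay += delay;
		fake_delay(delay);
		guest_tsc = fake_tsc;
	}
	return total_delay;
}
```

Starting 2500 cycles before the deadline sums 1000 + 1000 + 500; starting 500 cycles after it sums nothing.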
[patch 0/3] KVM: add option to advance tscdeadline hrtimer expiration (v5)
See patches for details.

v2:
- fix email address.
v3:
- use module parameter for configuration of value (Paolo/Radim)
v4:
- fix check for tscdeadline mode while waiting for expiration (Paolo)
- use proper delay function (Radim)
- fix LVTT tscdeadline mode check in hrtimer interrupt handler (Radim)
- add comment regarding PPR and APICv (Paolo)
v5:
- use tscdeadline expiration and guest tsc difference in tracepoint (Paolo)
[patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration
For the hrtimer which emulates the tscdeadline timer in the guest, add
an option to advance expiration, and busy spin on VM-entry waiting for
the actual expiration time to elapse.

This allows achieving low latencies in cyclictest (or any scenario
which requires strict timing regarding timer expiration).

Reduces average cyclictest latency from 12us to 8us on Core i5 desktop.

Note: this option requires tuning to find the appropriate value for a
particular hardware/guest combination. One method is to measure the
average delay between apic_timer_fn and VM-entry. Another method is to
start with 1000ns, and increase the value in say 500ns increments until
avg cyclictest numbers stop decreasing.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -33,6 +33,7 @@
 #include <asm/page.h>
 #include <asm/current.h>
 #include <asm/apicdef.h>
+#include <asm/delay.h>
 #include <linux/atomic.h>
 #include <linux/jump_label.h>
 #include "kvm_cache_regs.h"
@@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
 {
 	struct kvm_vcpu *vcpu = apic->vcpu;
 	wait_queue_head_t *q = &vcpu->wq;
+	struct kvm_timer *ktimer = &apic->lapic_timer;
 
 	/*
 	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
@@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
 	if (waitqueue_active(q))
 		wake_up_interruptible(q);
+
+	if (apic_lvtt_tscdeadline(apic))
+		ktimer->expired_tscdeadline = ktimer->tscdeadline;
+}
+
+/*
+ * On APICv, this test will cause a busy wait
+ * during a higher-priority task.
+ */
+static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
+
+	if (kvm_apic_hw_enabled(apic)) {
+		int vec = reg & APIC_VECTOR_MASK;
+
+		if (kvm_x86_ops->test_posted_interrupt)
+			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
+		else {
+			if (apic_test_vector(vec, apic->regs + APIC_ISR))
+				return true;
+		}
+	}
+	return false;
+}
+
+void wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 guest_tsc, tsc_deadline;
+
+	if (!kvm_vcpu_has_lapic(vcpu))
+		return;
+
+	if (apic->lapic_timer.expired_tscdeadline == 0)
+		return;
+
+	if (!lapic_timer_int_injected(vcpu))
+		return;
+
+	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
+	apic->lapic_timer.expired_tscdeadline = 0;
+	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+
+	while (guest_tsc < tsc_deadline) {
+		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
+
+		__delay(delay);
+		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+	}
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)
 {
 	ktime_t now;
+
 	atomic_set(&apic->lapic_timer.pending, 0);
 
 	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
@@ -1137,6 +1192,7 @@ static void start_apic_timer(struct kvm_
 		/* lapic timer in tsc deadline mode */
 		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
 		u64 ns = 0;
+		ktime_t expire;
 		struct kvm_vcpu *vcpu = apic->vcpu;
 		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
 		unsigned long flags;
@@ -1151,8 +1207,10 @@ static void start_apic_timer(struct kvm_
 		if (likely(tscdeadline > guest_tsc)) {
 			ns = (tscdeadline - guest_tsc) * 1000000ULL;
 			do_div(ns, this_tsc_khz);
+			expire = ktime_add_ns(now, ns);
+			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
 			hrtimer_start(&apic->lapic_timer.timer,
-				      ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
+				      expire, HRTIMER_MODE_ABS);
 		} else
 			apic_timer_expired(apic);

Index: kvm/arch/x86/kvm/lapic.h
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.h
+++ kvm/arch/x86/kvm/lapic.h
@@ -14,6 +14,7 @@ struct kvm_timer {
 	u32 timer_mode;
 	u32 timer_mode_mask;
 	u64 tscdeadline;
+	u64 expired_tscdeadline;
 	atomic_t pending;	/* accumulated triggered timers */
 };
 
@@ -170,4 +171,6 @@ static inline bool kvm_apic_has_events(s
 
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
+void wait_lapic_expire(struct kvm_vcpu *vcpu);
+
 #endif

Index: kvm/arch/x86/kvm/x86.c
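The deadline arithmetic in start_apic_timer() can be sketched as plain integer math. In the sketch below, the conversion mirrors the patch's "ns = (tscdeadline - guest_tsc) * 1000000ULL; do_div(ns, this_tsc_khz)"; the clamp at now_ns is an assumption of this sketch (the patch simply subtracts), and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* virtual_tsc_khz is TSC ticks per millisecond, so
 * ticks * 1,000,000 / khz yields nanoseconds. */
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_khz)
{
	return ticks * 1000000ULL / tsc_khz;
}

/* Absolute hrtimer expiry: fire advance_ns early so that the remaining
 * time until the real deadline is burned in wait_lapic_expire()'s busy
 * loop.  Clamping at now_ns when the advance exceeds the whole interval
 * is this sketch's assumption, not behavior taken from the patch. */
static uint64_t hrtimer_expiry(uint64_t now_ns, uint64_t guest_tsc,
			       uint64_t tscdeadline, uint64_t tsc_khz,
			       uint64_t advance_ns)
{
	uint64_t ns = tsc_ticks_to_ns(tscdeadline - guest_tsc, tsc_khz);
	uint64_t expire = now_ns + ns;

	return ns > advance_ns ? expire - advance_ns : now_ns;
}
```

For a 2 GHz guest TSC (tsc_khz = 2,000,000), a deadline 4000 ticks away is 2000 ns away, so with a 500 ns advance the hrtimer is armed 1500 ns from now.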
[patch 1/3] KVM: x86: add method to test PIR bitmap vector
kvm_x86_ops->test_posted_interrupt() returns true/false depending on
whether 'vector' is set.

The next patch makes use of this interface.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: kvm/arch/x86/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/x86/include/asm/kvm_host.h
+++ kvm/arch/x86/include/asm/kvm_host.h
@@ -743,6 +743,7 @@ struct kvm_x86_ops {
 	void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
 	void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
 	void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
+	bool (*test_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
 	void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
 	int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
 	int (*get_tdp_level)(void);

Index: kvm/arch/x86/kvm/vmx.c
===================================================================
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -435,6 +435,11 @@ static int pi_test_and_set_pir(int vecto
 	return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
 }
 
+static int pi_test_pir(int vector, struct pi_desc *pi_desc)
+{
+	return test_bit(vector, (unsigned long *)pi_desc->pir);
+}
+
 struct vcpu_vmx {
 	struct kvm_vcpu vcpu;
 	unsigned long host_rsp;
@@ -5939,6 +5944,7 @@ static __init int hardware_setup(void)
 	else {
 		kvm_x86_ops->hwapic_irr_update = NULL;
 		kvm_x86_ops->deliver_posted_interrupt = NULL;
+		kvm_x86_ops->test_posted_interrupt = NULL;
 		kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
 	}
 
@@ -6960,6 +6966,13 @@ static int handle_invvpid(struct kvm_vcp
 	return 1;
 }
 
+static bool vmx_test_pir(struct kvm_vcpu *vcpu, int vector)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	return pi_test_pir(vector, &vmx->pi_desc);
+}
+
 /*
  * The exit handlers return 1 if the exit was handled fully and guest execution
  * may resume.  Otherwise they set the kvm_run parameter to indicate what needs
@@ -9374,6 +9387,7 @@ static struct kvm_x86_ops vmx_x86_ops =
 	.hwapic_isr_update = vmx_hwapic_isr_update,
 	.sync_pir_to_irr = vmx_sync_pir_to_irr,
 	.deliver_posted_interrupt = vmx_deliver_posted_interrupt,
+	.test_posted_interrupt = vmx_test_pir,
 
 	.set_tss_addr = vmx_set_tss_addr,
 	.get_tdp_level = get_ept_level,
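The PIR tested here is the posted-interrupt descriptor's 256-bit request bitmap, one bit per interrupt vector. A userspace sketch of the bit arithmetic behind test_bit/set_bit on it (struct and function names are illustrative stand-ins, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 256
#define PIR_WORDS  (NR_VECTORS / 64)

/* Minimal stand-in for the posted-interrupt descriptor's PIR field:
 * 256 bits stored as four 64-bit words. */
struct pi_desc_sketch {
	uint64_t pir[PIR_WORDS];
};

/* Mark 'vector' pending, like set_bit() on the real PIR (minus the
 * atomicity the kernel needs, since hardware writes PIR concurrently). */
static void pi_set_pir(int vector, struct pi_desc_sketch *pi)
{
	pi->pir[vector / 64] |= 1ULL << (vector % 64);
}

/* Equivalent of the patch's pi_test_pir(): non-zero iff 'vector' is
 * pending in the descriptor. */
static int pi_test_pir(int vector, const struct pi_desc_sketch *pi)
{
	return (pi->pir[vector / 64] >> (vector % 64)) & 1;
}
```

The real pi_test_pir() relies on test_bit(), which performs the same word/bit split; the atomic variants matter in the kernel because the CPU sets PIR bits while software reads them.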
[patch 3/3] KVM: x86: add tracepoint to wait_lapic_expire
Add tracepoint to wait_lapic_expire.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: kvm/arch/x86/kvm/lapic.c
===================================================================
--- kvm.orig/arch/x86/kvm/lapic.c
+++ kvm/arch/x86/kvm/lapic.c
@@ -1120,7 +1120,7 @@ static bool lapic_timer_int_injected(str
 void wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u64 guest_tsc, tsc_deadline;
+	u64 orig_guest_tsc, guest_tsc, tsc_deadline;
 
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return;
@@ -1133,7 +1133,7 @@ void wait_lapic_expire(struct kvm_vcpu *
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;
-	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
+	orig_guest_tsc = guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 
 	while (guest_tsc < tsc_deadline) {
 		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
@@ -1141,6 +1141,8 @@ void wait_lapic_expire(struct kvm_vcpu *
 		__delay(delay);
 		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
 	}
+
+	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, orig_guest_tsc - tsc_deadline);
 }
 
 static void start_apic_timer(struct kvm_lapic *apic)

Index: kvm/arch/x86/kvm/trace.h
===================================================================
--- kvm.orig/arch/x86/kvm/trace.h
+++ kvm/arch/x86/kvm/trace.h
@@ -914,6 +914,25 @@ TRACE_EVENT(kvm_pvclock_update,
 		  __entry->flags)
 );
 
+TRACE_EVENT(kvm_wait_lapic_expire,
+	TP_PROTO(unsigned int vcpu_id, s64 delta),
+	TP_ARGS(vcpu_id, delta),
+
+	TP_STRUCT__entry(
+		__field(unsigned int, vcpu_id)
+		__field(s64, delta)
+	),
+
+	TP_fast_assign(
+		__entry->vcpu_id = vcpu_id;
+		__entry->delta = delta;
+	),
+
+	TP_printk("vcpu %u: delta %lld",
+		  __entry->vcpu_id,
+		  __entry->delta)
+);
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH
Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration
On 15/12/2014 23:06, Marcelo Tosatti wrote:
> For the hrtimer which emulates the tscdeadline timer in the guest, add
> an option to advance expiration, and busy spin on VM-entry waiting for
> the actual expiration time to elapse.
>
> This allows achieving low latencies in cyclictest (or any scenario
> which requires strict timing regarding timer expiration).
>
> Reduces average cyclictest latency from 12us to 8us on Core i5 desktop.
>
> Note: this option requires tuning to find the appropriate value for a
> particular hardware/guest combination. One method is to measure the
> average delay between apic_timer_fn and VM-entry. Another method is to
> start with 1000ns, and increase the value in say 500ns increments
> until avg cyclictest numbers stop decreasing.
>
> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>
> Index: kvm/arch/x86/kvm/lapic.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/lapic.c
> +++ kvm/arch/x86/kvm/lapic.c
> @@ -33,6 +33,7 @@
>  #include <asm/page.h>
>  #include <asm/current.h>
>  #include <asm/apicdef.h>
> +#include <asm/delay.h>
>  #include <linux/atomic.h>
>  #include <linux/jump_label.h>
>  #include "kvm_cache_regs.h"
> @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
>  {
>  	struct kvm_vcpu *vcpu = apic->vcpu;
>  	wait_queue_head_t *q = &vcpu->wq;
> +	struct kvm_timer *ktimer = &apic->lapic_timer;
>  
>  	/*
>  	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
> @@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
>  	if (waitqueue_active(q))
>  		wake_up_interruptible(q);
> +
> +	if (apic_lvtt_tscdeadline(apic))
> +		ktimer->expired_tscdeadline = ktimer->tscdeadline;
> +}
> +
> +/*
> + * On APICv, this test will cause a busy wait
> + * during a higher-priority task.
> + */
> +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
> +
> +	if (kvm_apic_hw_enabled(apic)) {
> +		int vec = reg & APIC_VECTOR_MASK;
> +
> +		if (kvm_x86_ops->test_posted_interrupt)
> +			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
> +		else {
> +			if (apic_test_vector(vec, apic->regs + APIC_ISR))
> +				return true;
> +		}
> +	}
> +	return false;
> +}
> +
> +void wait_lapic_expire(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u64 guest_tsc, tsc_deadline;
> +
> +	if (!kvm_vcpu_has_lapic(vcpu))
> +		return;
> +
> +	if (apic->lapic_timer.expired_tscdeadline == 0)
> +		return;
> +
> +	if (!lapic_timer_int_injected(vcpu))
> +		return;

By the time we get here, I think, if expired_tscdeadline != 0 we're sure
that the interrupt has been injected.  It may be in IRR rather than ISR,
but at least on APICv the last test should be redundant.

So perhaps you can get rid of patch 1 and check
kvm_apic_vid_enabled(vcpu->kvm):

	if (k_a_v_e(vcpu->kvm))
		return true;
	if (apic_test_vector(vec, apic->regs + APIC_ISR))
		return true;

Does this sound correct?

Paolo

> +	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> +	apic->lapic_timer.expired_tscdeadline = 0;
> +	guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> +
> +	while (guest_tsc < tsc_deadline) {
> +		int delay = min(tsc_deadline - guest_tsc, 1000ULL);
> +
> +		__delay(delay);
> +		guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, native_read_tsc());
> +	}
>  }
>  
>  static void start_apic_timer(struct kvm_lapic *apic)
>  {
>  	ktime_t now;
> +
>  	atomic_set(&apic->lapic_timer.pending, 0);
>  
>  	if (apic_lvtt_period(apic) || apic_lvtt_oneshot(apic)) {
> @@ -1137,6 +1192,7 @@ static void start_apic_timer(struct kvm_
>  		/* lapic timer in tsc deadline mode */
>  		u64 guest_tsc, tscdeadline = apic->lapic_timer.tscdeadline;
>  		u64 ns = 0;
> +		ktime_t expire;
>  		struct kvm_vcpu *vcpu = apic->vcpu;
>  		unsigned long this_tsc_khz = vcpu->arch.virtual_tsc_khz;
>  		unsigned long flags;
> @@ -1151,8 +1207,10 @@ static void start_apic_timer(struct kvm_
>  		if (likely(tscdeadline > guest_tsc)) {
>  			ns = (tscdeadline - guest_tsc) * 1000000ULL;
>  			do_div(ns, this_tsc_khz);
> +			expire = ktime_add_ns(now, ns);
> +			expire = ktime_sub_ns(expire, lapic_timer_advance_ns);
>  			hrtimer_start(&apic->lapic_timer.timer,
> -				      ktime_add_ns(now, ns), HRTIMER_MODE_ABS);
> +				      expire, HRTIMER_MODE_ABS);
>  		} else
>  			apic_timer_expired(apic);
>
> Index: kvm/arch/x86/kvm/lapic.h
> ===================================================================
> ---
Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration
On Tue, Dec 16, 2014 at 03:34:22PM +0100, Paolo Bonzini wrote:
> On 15/12/2014 23:06, Marcelo Tosatti wrote:
>> For the hrtimer which emulates the tscdeadline timer in the guest,
>> add an option to advance expiration, and busy spin on VM-entry
>> waiting for the actual expiration time to elapse.
>>
>> This allows achieving low latencies in cyclictest (or any scenario
>> which requires strict timing regarding timer expiration).
>>
>> Reduces average cyclictest latency from 12us to 8us on Core i5
>> desktop.
>>
>> Note: this option requires tuning to find the appropriate value for a
>> particular hardware/guest combination. One method is to measure the
>> average delay between apic_timer_fn and VM-entry. Another method is
>> to start with 1000ns, and increase the value in say 500ns increments
>> until avg cyclictest numbers stop decreasing.
>>
>> Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>
>>
>> Index: kvm/arch/x86/kvm/lapic.c
>> ===================================================================
>> --- kvm.orig/arch/x86/kvm/lapic.c
>> +++ kvm/arch/x86/kvm/lapic.c
>> @@ -33,6 +33,7 @@
>>  #include <asm/page.h>
>>  #include <asm/current.h>
>>  #include <asm/apicdef.h>
>> +#include <asm/delay.h>
>>  #include <linux/atomic.h>
>>  #include <linux/jump_label.h>
>>  #include "kvm_cache_regs.h"
>> @@ -1073,6 +1074,7 @@ static void apic_timer_expired(struct kv
>>  {
>>  	struct kvm_vcpu *vcpu = apic->vcpu;
>>  	wait_queue_head_t *q = &vcpu->wq;
>> +	struct kvm_timer *ktimer = &apic->lapic_timer;
>>  
>>  	/*
>>  	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
>> @@ -1087,11 +1089,64 @@ static void apic_timer_expired(struct kv
>>  	if (waitqueue_active(q))
>>  		wake_up_interruptible(q);
>> +
>> +	if (apic_lvtt_tscdeadline(apic))
>> +		ktimer->expired_tscdeadline = ktimer->tscdeadline;
>> +}
>> +
>> +/*
>> + * On APICv, this test will cause a busy wait
>> + * during a higher-priority task.
>> + */
>> +static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +	u32 reg = kvm_apic_get_reg(apic, APIC_LVTT);
>> +
>> +	if (kvm_apic_hw_enabled(apic)) {
>> +		int vec = reg & APIC_VECTOR_MASK;
>> +
>> +		if (kvm_x86_ops->test_posted_interrupt)
>> +			return kvm_x86_ops->test_posted_interrupt(vcpu, vec);
>> +		else {
>> +			if (apic_test_vector(vec, apic->regs + APIC_ISR))
>> +				return true;
>> +		}
>> +	}
>> +	return false;
>> +}
>> +
>> +void wait_lapic_expire(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +	u64 guest_tsc, tsc_deadline;
>> +
>> +	if (!kvm_vcpu_has_lapic(vcpu))
>> +		return;
>> +
>> +	if (apic->lapic_timer.expired_tscdeadline == 0)
>> +		return;
>> +
>> +	if (!lapic_timer_int_injected(vcpu))
>> +		return;
>
> By the time we get here, I think, if expired_tscdeadline != 0 we're
> sure that the interrupt has been injected.  It may be in IRR rather
> than ISR, but at least on APICv the last test should be redundant.
>
> So perhaps you can get rid of patch 1 and check
> kvm_apic_vid_enabled(vcpu->kvm):
>
> 	if (k_a_v_e(vcpu->kvm))
> 		return true;
> 	if (apic_test_vector(vec, apic->regs + APIC_ISR))
> 		return true;
>
> Does this sound correct?

* expired_tscdeadline != 0.
* APIC timer interrupt delivery masked at LVTT register.

Implies expired_tscdeadline != 0 and interrupt not injected.
Re: [patch 2/3] KVM: x86: add option to advance tscdeadline hrtimer expiration
On 16/12/2014 16:13, Marcelo Tosatti wrote:
>> So perhaps you can get rid of patch 1 and check
>> kvm_apic_vid_enabled(vcpu->kvm):
>>
>> 	if (k_a_v_e(vcpu->kvm))
>> 		return true;
>> 	if (apic_test_vector(vec, apic->regs + APIC_ISR))
>> 		return true;
>>
>> Does this sound correct?
>
> * expired_tscdeadline != 0.
> * APIC timer interrupt delivery masked at LVTT register.
>
> Implies expired_tscdeadline != 0 and interrupt not injected.

Good point.

Paolo
[PATCH] kvm: iommu: Add cond_resched to legacy device assignment code
From: Joerg Roedel <jroe...@suse.de>

When assigning devices to large memory guests (>=128GB guest memory in
the failure case) the functions to create the IOMMU page-tables for the
whole guest might run for a very long time. On non-preemptible kernels
this might cause soft-lockup warnings. Fix these by adding a
cond_resched() to the mapping and unmapping loops.

Signed-off-by: Joerg Roedel <jroe...@suse.de>
---
 virt/kvm/iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index c1e6ae9..ac427e8 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
 
 		gfn += page_size >> PAGE_SHIFT;
-
+		cond_resched();
 	}
 
 	return 0;
@@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
 		kvm_unpin_pages(kvm, pfn, unmap_pages);
 
 		gfn += unmap_pages;
+
+		cond_resched();
 	}
 }
-- 
1.9.1
Re: Why do additional cores reduce performance?
> What is your benchmark?

I've tried different ways (CrystalDiskMark 3.0.3 x64, ATTO Disk
Benchmark v2.47); all give the same result. The numbers I've provided in
the first mail are for a 100G file copied over. I simply subtract stop
and start times. 50 seconds is such a huge difference (the three sigma
rule gives 10 secs for 10 tries) that I could even use wall clocks. When
everything is enabled in BIOS it is 6:23 on real Windows versus 9:03 on
virtualized...

Phil Ehrens has sent me this link:
https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
If I don't misunderstand it, it means kvm/qemu simply is not designed
for multi-threading. I guess I need to try a different hypervisor. 50%
performance is too high a price, especially when VT-x and VT-d are meant
to make it 0%.

> Windows sometimes has scalability problems due to the way it does
> timing.  Try replacing "-cpu host" with "-no-hpet -cpu
> host,hv_time,hv_vapic".

Does not change the results.

> It will be a mix.  Do not specify HT in the guest, unless you have HT
> in the host _and_ you are pinning the two threads of each guest core
> to the two threads of a host core.

Do you mean -smp 4,sockets=1,cores=2,threads=2 for 2 cores with HT
enabled? That gives an even worse result - 9:17.
Re: Why do additional cores reduce performance?
On 16/12/2014 17:22, Oleg Ovechko wrote:
>> What is your benchmark?
> I've tried different ways (CrystalDiskMark 3.0.3 x64, ATTO Disk
> Benchmark v2.47); all give the same result.

All are run on the AHCI passthrough disk(s), right?

> When everything is enabled in the BIOS it is 6:23 on real Windows versus
> 9:03 on virtualized...
>
> Phil Ehrens has sent me this link:
> https://lists.gnu.org/archive/html/qemu-discuss/2014-10/msg00036.html
> If I don't misunderstand, it means kvm/qemu simply is not designed for
> multi-threading.

No, it means TCG does not support multithreading. KVM does, and you are using it.

> I guess I need to try a different hypervisor. A 50% performance penalty
> is too high a price, especially when VT-x and VT-d are meant to make it 0%.

It is surprising to me too.

Paolo
[PATCH] [kvmtool]: Use the arch default transport method for network
From: Suzuki K. Poulose suzuki.poul...@arm.com

lkvm by default sets up a virtio-pci transport for network, if none is
specified. This can be a problem on archs (e.g. ARM64), where virtio-pci
is not supported yet, and causes the following warning at exit:

  # KVM compatibility warning.
    virtio-net device was not detected.
    While you have requested a virtio-net device, the guest kernel did
    not initialize it.
    Please make sure that the guest kernel was compiled with
    CONFIG_VIRTIO_NET=y enabled in .config.

This patch changes it to make use of the default transport method for the
architecture when none is specified. This will ensure that on every arch
we get the network up by default in the VM.

Applies on top of the kvm/arm branch in Will's kvmtool tree.

Signed-off-by: Suzuki K. Poulose suzuki.poul...@arm.com
Acked-by: Will Deacon will.dea...@arm.com
---
 tools/kvm/include/kvm/virtio.h |    1 +
 tools/kvm/virtio/core.c        |    9 +
 tools/kvm/virtio/net.c         |   21 +++++++++++++++-----
 3 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index 8a9eab5..768ee96 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -160,6 +160,7 @@ int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		struct virtio_ops *ops, enum virtio_trans trans,
 		int device_id, int subsys_id, int class);
 int virtio_compat_add_message(const char *device, const char *config);
+const char *virtio_trans_name(enum virtio_trans trans);

 static inline void *virtio_get_vq(struct kvm *kvm, u32 pfn, u32 page_size)
 {
diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c
index 9ae7887..3b6e4d7 100644
--- a/tools/kvm/virtio/core.c
+++ b/tools/kvm/virtio/core.c
@@ -12,6 +12,15 @@

 #include "kvm/kvm.h"

+const char *virtio_trans_name(enum virtio_trans trans)
+{
+	if (trans == VIRTIO_PCI)
+		return "pci";
+	else if (trans == VIRTIO_MMIO)
+		return "mmio";
+	return "unknown";
+}
+
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len)
 {
 	struct vring_used_elem *used_elem;
diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index c8af385..e9daea4 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -758,6 +758,7 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 	int i, err;
 	struct net_dev *ndev;
 	struct virtio_ops *ops;
+	enum virtio_trans trans = VIRTIO_DEFAULT_TRANS(params->kvm);

 	ndev = calloc(1, sizeof(struct net_dev));
 	if (ndev == NULL)
@@ -799,12 +800,20 @@ static int virtio_net__init_one(struct virtio_net_params *params)
 	}

 	*ops = net_dev_virtio_ops;
-	if (params->trans && strcmp(params->trans, "mmio") == 0)
-		virtio_init(params->kvm, ndev, &ndev->vdev, ops, VIRTIO_MMIO,
-			    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
-	else
-		virtio_init(params->kvm, ndev, &ndev->vdev, ops, VIRTIO_PCI,
-			    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);
+
+	if (params->trans) {
+		if (strcmp(params->trans, "mmio") == 0)
+			trans = VIRTIO_MMIO;
+		else if (strcmp(params->trans, "pci") == 0)
+			trans = VIRTIO_PCI;
+		else
+			pr_warning("virtio-net: Unknown transport method : %s, "
+				   "falling back to %s.", params->trans,
+				   virtio_trans_name(trans));
+	}
+
+	virtio_init(params->kvm, ndev, &ndev->vdev, ops, trans,
+		    PCI_DEVICE_ID_VIRTIO_NET, VIRTIO_ID_NET, PCI_CLASS_NET);

 	if (params->vhost)
 		virtio_net__vhost_init(params->kvm, ndev);
--
1.7.9.5
[v2 PATCH] KVM: nVMX: consult PFEC_MASK and PFEC_MATCH when generating #PF VM-exit
When generating a #PF VM-exit, check the equality:

  (PFEC & PFEC_MASK) == PFEC_MATCH

If there is equality, bit 14 (#PF) of the exception bitmap is used to decide
whether to generate a #PF VM-exit. If there is inequality, the inverted
bit 14 is used.

Signed-off-by: Eugene Korenevsky ekorenev...@gmail.com
---
 arch/x86/kvm/vmx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 09ccf6c..a8ef8265 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8206,6 +8206,18 @@ static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
 	vcpu->arch.walk_mmu = &vcpu->arch.mmu;
 }

+static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
+					    u16 error_code)
+{
+	bool inequality, bit;
+
+	bit = (vmcs12->exception_bitmap & (1u << PF_VECTOR)) != 0;
+	inequality =
+		(error_code & vmcs12->page_fault_error_code_mask) !=
+		 vmcs12->page_fault_error_code_match;
+	return inequality ^ bit;
+}
+
 static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
 		struct x86_exception *fault)
 {
@@ -8213,8 +8225,7 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,

 	WARN_ON(!is_guest_mode(vcpu));

-	/* TODO: also check PFEC_MATCH/MASK, not just EB.PF. */
-	if (vmcs12->exception_bitmap & (1u << PF_VECTOR))
+	if (nested_vmx_is_page_fault_vmexit(vmcs12, fault->error_code))
 		nested_vmx_vmexit(vcpu, to_vmx(vcpu)->exit_reason,
 				  vmcs_read32(VM_EXIT_INTR_INFO),
 				  vmcs_readl(EXIT_QUALIFICATION));
--
2.0.4
Re: [PATCH] kvm: iommu: Add cond_resched to legacy device assignment code
On 2014/12/16 23:47, Joerg Roedel wrote:
> From: Joerg Roedel jroe...@suse.de
>
> When assigning devices to large memory guests (>=128GB guest memory in
> the failure case) the functions to create the IOMMU page-tables for the
> whole guest might run for a very long time. On non-preemptible kernels
> this might cause Soft-Lockup warnings.
>
> Fix these by adding a cond_resched() to the mapping and unmapping loops.
>
> Signed-off-by: Joerg Roedel jroe...@suse.de
> ---
>  virt/kvm/iommu.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index c1e6ae9..ac427e8 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c

This file is already gone after the latest commit, c274e03af705 ("kvm: x86: move assigned-dev.c and iommu.c to arch/x86/"), so you need to rebase your tree first :)

Tiejun

> @@ -137,7 +137,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
>
>  		gfn += page_size >> PAGE_SHIFT;
> -
> +		cond_resched();
>  	}
>
>  	return 0;
> @@ -311,6 +311,8 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
>
>  		kvm_unpin_pages(kvm, pfn, unmap_pages);
>  		gfn += unmap_pages;
> +
> +		cond_resched();
>  	}
>  }
Clarification for patch 7
Hi Christoffer, Marc,

- In stage2_dissolve_pmd(), CONFIG_SMP is unnecessary.
- From the time a huge page is write-protected until it faults and is cleared, any page in the range may be dirty, not just the gpa whose access caused the fault.
- The comment about "another CPU" is wrong; I confused myself while testing. It should not be possible for another CPU to write to the same PMD range while another is handling its PMD fault.

I ran a test with only an initrd, which exposed an issue in my test scenario; QEMU appears fine. It also depends on user space: if you first turn on logging and then do pre-copy, marking just the faulting page is enough. It's hard to interpret the API in this case — it just says "dirty pages since the last call".

That patch could be resent without upsetting the rest.

- Mario
[question] Why newer QEMU may lose irq when doing migration?
Hi, all:

The patchset (https://lkml.org/lkml/2014/3/18/309) fixed migration of Windows guests, but commit 0bc830b05c667218d703f2026ec866c49df974fc ("KVM: ioapic: clear IRR for edge-triggered interrupts at delivery") introduced a bug (see https://www.mail-archive.com/kvm@vger.kernel.org/msg109813.html).

From the description: "Unlike the old qemu-kvm, which really never did that, with new QEMU it is for some reason somewhat likely to migrate a VM with a nonzero IRR in the ioapic."

Why could new QEMU do that? I cannot find any code explaining the "some reason". As we know, once an irq is set in kvm's ioapic, the ioapic will send that irq to the lapic; this is an atomic operation. Then kvm will inject it in inject_pending_event (or set RVI in the apic-v case). QEMU will also save the pending irq when doing migration. I cannot find a point where the guest could lose an irq, but this scenario really exists.

Any ideas?

Thanks,
Wincy