[PATCH] Search the LAPIC's for one that will accept a PIC interrupt.
Older versions of 32-bit linux have a Checking 'hlt' instruction test where they repeatedly call the 'hlt' instruction, and then expect a timer interrupt to kick the CPU out of halt. This happens before any LAPIC or IOAPIC setup happens, which means that all of the APIC's are in virtual wire mode at this point. Unfortunately, the current implementation of virtual wire mode is hardcoded to only kick the BSP, so if a crash+kexec occurs on a different vcpu, it will never get kicked. This patch makes pic_unlock() do the equivalent of kvm_irq_delivery_to_apic() for the IOAPIC code. That is, it runs through all of the vcpus looking for one that is in virtual wire mode. In the normal case where LAPICs and IOAPICs are configured, this won't be used at all. In the bootstrap phase of a modern OS, before the LAPICs and IOAPICs are configured, this will have exactly the same behavior as today; VCPU0 is always looked at first, so it will always get out of the loop after the first iteration. This will only go through the loop more than once during a kexec/kdump, in which case it will only do it a few times until the kexec'ed kernel programs the LAPIC and IOAPIC. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8259.c | 17 + 1 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 2c73f44..85ecabc 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -44,16 +44,25 @@ static void pic_unlock(struct kvm_pic *s) __releases(s-lock) { bool wakeup = s-wakeup_needed; - struct kvm_vcpu *vcpu; + struct kvm_vcpu *vcpu, *found = NULL; + int i; s-wakeup_needed = false; raw_spin_unlock(s-lock); if (wakeup) { - vcpu = s-kvm-bsp_vcpu; - if (vcpu) - kvm_vcpu_kick(vcpu); + kvm_for_each_vcpu(i, vcpu, s-kvm) { + if (kvm_apic_accept_pic_intr(vcpu)) { + found = vcpu; + break; + } + } + + if (!found) + found = s-kvm-bsp_vcpu; + + kvm_vcpu_kick(found); } } -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/3]: Fixes to IRQ routing
As we've discussed previously, here is a series of patches to fix some of the IRQ routing issues we have in KVM. With this series in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6 32- and 64-bit guest on CPU's other than the BSP. RHEL-5 32-bit kdump still does not work; it gets stuck on Checking 'hlt' instruction. However, it does that both before and after this series, so there is something else going on there that I still have to debug. I also need to change the kvm_migrate_pit_timer function to migrate the timer over to the last CPU that handled the timer interrupt, on the theory that that particlar CPU is likely to handle the timer interrupt again in the near future. However, this is an optimization and shouldn't delay the inclusion of the rest of the series for correctness. Changes since RFC: - Changed ps-inject_lock from raw_spinlock_t to spinlock_t - Fixed up some formatting issues - Changed to have one PIT workqueue per-guest - Remember to cancel_work_sync when destroying the PIT Changes since v1: - Call cancel_work_sync everywhere we call hrtimer_cancel - Bring back the reinjection logic - Fix up formatting issues from checkpatch Changes since v2: - Fix up the reinjection logic thanks to review from Gleb and Marcelo Tested with -no-kvm-pit-reinjection on a RHEL-3 guest -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.
We really want to kvm_set_irq during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8254.c | 141 ++ arch/x86/kvm/i8254.h |4 +- arch/x86/kvm/irq.c |1 - 3 files changed, 88 insertions(+), 58 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 188d827..467cc47 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -34,6 +34,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/workqueue.h #include irq.h #include i8254.h @@ -244,11 +245,22 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian) { struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state, irq_ack_notifier); - raw_spin_lock(ps-inject_lock); - if (atomic_dec_return(ps-pit_timer.pending) 0) + int value; + + spin_lock(ps-inject_lock); + value = atomic_dec_return(ps-pit_timer.pending); + if (value 0) + /* spurious acks can be generated if, for example, the +* PIC is being reset. Handle it gracefully here +*/ atomic_inc(ps-pit_timer.pending); + else if (value 0) + /* in this case, we had multiple outstanding pit interrupts +* that we needed to inject. Reinject +*/ + queue_work(ps-pit-wq, ps-pit-expired); ps-irq_ack = 1; - raw_spin_unlock(ps-inject_lock); + spin_unlock(ps-inject_lock); } void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) @@ -264,10 +276,10 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) hrtimer_start_expires(timer, HRTIMER_MODE_ABS); } -static void destroy_pit_timer(struct kvm_timer *pt) +static void destroy_pit_timer(struct kvm_pit *pit) { - pr_debug(execute del timer!\n); - hrtimer_cancel(pt-timer); + hrtimer_cancel(pit-pit_state.pit_timer.timer); + cancel_work_sync(pit-expired); } static bool kpit_is_periodic(struct kvm_timer *ktimer) @@ -281,6 +293,60 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static void pit_do_work(struct work_struct *work) +{ + struct kvm_pit *pit = container_of(work, struct kvm_pit, expired); + struct kvm *kvm = pit-kvm; + struct kvm_vcpu *vcpu; + int i; + struct kvm_kpit_state *ps = pit-pit_state; + int inject = 0; + + /* Try to inject pending interrupts when +* last one has been acked. +*/ + spin_lock(ps-inject_lock); + if (ps-irq_ack) { + ps-irq_ack = 0; + inject = 1; + } + spin_unlock(ps-inject_lock); + if (inject) { + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0); + + /* +* Provides NMI watchdog support via Virtual Wire mode. +* The route is: PIT - PIC - LVT0 in NMI mode. +* +* Note: Our Virtual Wire implementation is simplified, only +* propagating PIT interrupts to all VCPUs when they have set +* LVT0 to NMI delivery. Other PIC interrupts are just sent to +* VCPU0, and only if its LVT0 is in EXTINT mode. +*/ + if (kvm-arch.vapics_in_nmi_mode 0) + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_apic_nmi_wd_deliver(vcpu); + } +} + +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm_pit *pt = ktimer-kvm-arch.vpit; + + if (ktimer-reinject || !atomic_read(ktimer-pending)) { + atomic_inc(ktimer-pending); + queue_work(pt-wq, pt-expired); + } + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -292,13 +358,13 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) /* TODO The new value only affected after the retriggered */ hrtimer_cancel(pt-timer); + cancel_work_sync(ps-pit-expired); pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu
[PATCH v3 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
Otherwise we might try to deliver a timer interrupt to a cpu that can't possibly handle it. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 52f412f..06cf61e 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); - } else { + } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; else if (kvm_apic_compare_prio(vcpu, lowest) 0) -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/3] Allow any LAPIC to accept PIC interrupts.
If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/lapic.c | 12 +--- 1 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index d8258a0..ee0f76c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (kvm_vcpu_is_bsp(vcpu)) { - if (!apic_hw_enabled(vcpu-arch.apic)) - r = 1; - if ((lvt0 APIC_LVT_MASKED) == 0 - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) - r = 1; - } + if (!apic_hw_enabled(vcpu-arch.apic)) + r = 1; + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + r = 1; return r; } -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/3]: Fixes to IRQ routing
As we've discussed previously, here is a series of patches to fix some of the IRQ routing issues we have in KVM. With this series in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6 32- and 64-bit guest on CPU's other than the BSP. RHEL-5 32-bit kdump still does not work; it gets stuck on Checking 'hlt' instruction. However, it does that both before and after this series, so there is something else going on there that I still have to debug. I also need to change the kvm_migrate_pit_timer function to migrate the timer over to the last CPU that handled the timer interrupt, on the theory that that particlar CPU is likely to handle the timer interrupt again in the near future. However, this is an optimization and shouldn't delay the inclusion of the rest of the series for correctness. Changes since RFC: - Changed ps-inject_lock from raw_spinlock_t to spinlock_t - Fixed up some formatting issues - Changed to have one PIT workqueue per-guest - Remember to cancel_work_sync when destroying the PIT Changes since v1: - Call cancel_work_sync everywhere we call hrtimer_cancel - Bring back the reinjection logic - Fix up formatting issues from checkpatch -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 1/3] Introduce a workqueue to deliver PIT timer interrupts.
We really want to kvm_set_irq during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8254.c | 125 - arch/x86/kvm/i8254.h |4 +- arch/x86/kvm/irq.c |1 - 3 files changed, 74 insertions(+), 56 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 188d827..3bed8ac 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -34,6 +34,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/workqueue.h #include irq.h #include i8254.h @@ -244,11 +245,11 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian) { struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state, irq_ack_notifier); - raw_spin_lock(ps-inject_lock); + spin_lock(ps-inject_lock); if (atomic_dec_return(ps-pit_timer.pending) 0) atomic_inc(ps-pit_timer.pending); ps-irq_ack = 1; - raw_spin_unlock(ps-inject_lock); + spin_unlock(ps-inject_lock); } void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) @@ -267,7 +268,8 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) static void destroy_pit_timer(struct kvm_timer *pt) { pr_debug(execute del timer!\n); - hrtimer_cancel(pt-timer); + if (hrtimer_cancel(pt-timer)) + cancel_work_sync(pt-kvm-arch.vpit-expired); } static bool kpit_is_periodic(struct kvm_timer *ktimer) @@ -281,6 +283,58 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static void pit_do_work(struct work_struct *work) +{ + struct kvm_pit *pit = container_of(work, struct kvm_pit, expired); + struct kvm *kvm = pit-kvm; + struct kvm_vcpu *vcpu; + int i; + struct kvm_kpit_state *ps = pit-pit_state; + int inject = 0; + + /* Try to inject pending interrupts when +* last one has been acked. +*/ + spin_lock(ps-inject_lock); + if (ps-irq_ack) { + ps-irq_ack = 0; + inject = 1; + } + spin_unlock(ps-inject_lock); + if (inject) { + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0); + + /* +* Provides NMI watchdog support via Virtual Wire mode. +* The route is: PIT - PIC - LVT0 in NMI mode. +* +* Note: Our Virtual Wire implementation is simplified, only +* propagating PIT interrupts to all VCPUs when they have set +* LVT0 to NMI delivery. Other PIC interrupts are just sent to +* VCPU0, and only if its LVT0 is in EXTINT mode. +*/ + if (kvm-arch.vapics_in_nmi_mode 0) + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_apic_nmi_wd_deliver(vcpu); + } +} + +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm_pit *pt = ktimer-kvm-arch.vpit; + + if (ktimer-reinject) + queue_work(pt-wq, pt-expired); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -291,14 +345,14 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pr_debug(create pit timer, interval is %llu nsec\n, interval); /* TODO The new value only affected after the retriggered */ - hrtimer_cancel(pt-timer); + if (hrtimer_cancel(pt-timer)) + cancel_work_sync(pt-kvm-arch.vpit-expired); pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; @@ -626,7 +680,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags) mutex_init(pit-pit_state.lock); mutex_lock(pit-pit_state.lock); - raw_spin_lock_init(pit-pit_state.inject_lock); + spin_lock_init(pit-pit_state.inject_lock); + + pit-wq = create_singlethread_workqueue(kvm-pit-wq); + if (!pit-wq) { + kfree(pit); + return NULL; + } + INIT_WORK(pit-expired, pit_do_work); kvm-arch.vpit
[PATCH v2 2/3] Allow any LAPIC to accept PIC interrupts.
If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/lapic.c | 12 +--- 1 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index d8258a0..ee0f76c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (kvm_vcpu_is_bsp(vcpu)) { - if (!apic_hw_enabled(vcpu-arch.apic)) - r = 1; - if ((lvt0 APIC_LVT_MASKED) == 0 - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) - r = 1; - } + if (!apic_hw_enabled(vcpu-arch.apic)) + r = 1; + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + r = 1; return r; } -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
Otherwise we might try to deliver a timer interrupt to a cpu that can't possibly handle it. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 52f412f..06cf61e 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); - } else { + } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; else if (kvm_apic_compare_prio(vcpu, lowest) 0) -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3]: Fixes to IRQ routing
As we've discussed previously, here is a series of patches to fix some of the IRQ routing issues we have in KVM. With this series in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6 32- and 64-bit guest on CPU's other than the BSP. RHEL-5 32-bit kdump still does not work; it gets stuck on Checking 'hlt' instruction. However, it does that both before and after this series, so there is something else going on there that I still have to debug. I also need to change the kvm_migrate_pit_timer function to migrate the timer over to the last CPU that handled the timer interrupt, on the theory that that particlar CPU is likely to handle the timer interrupt again in the near future. Changes since RFC: - Changed ps-inject_lock from raw_spinlock_t to spinlock_t - Fixed up some formatting issues - Changed to have one PIT workqueue per-guest - Remember to cancel_work_sync when destroying the PIT Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] Introduce a workqueue to deliver PIT timer interrupts.
We really want to kvm_set_irq during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8254.c | 117 -- arch/x86/kvm/i8254.h |4 +- arch/x86/kvm/irq.c |1 - 3 files changed, 69 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 188d827..99c7472 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -34,6 +34,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/workqueue.h #include irq.h #include i8254.h @@ -244,11 +245,11 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian) { struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state, irq_ack_notifier); - raw_spin_lock(ps-inject_lock); + spin_lock(ps-inject_lock); if (atomic_dec_return(ps-pit_timer.pending) 0) atomic_inc(ps-pit_timer.pending); ps-irq_ack = 1; - raw_spin_unlock(ps-inject_lock); + spin_unlock(ps-inject_lock); } void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) @@ -281,6 +282,58 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static void pit_do_work(struct work_struct *work) +{ + struct kvm_pit *pit = container_of(work, struct kvm_pit, expired); + struct kvm *kvm = pit-kvm; + struct kvm_vcpu *vcpu; + int i; + struct kvm_kpit_state *ps = pit-pit_state; + int inject = 0; + + /* Try to inject pending interrupts when +* last one has been acked. +*/ + spin_lock(ps-inject_lock); + if (ps-irq_ack) { + ps-irq_ack = 0; + inject = 1; + } + spin_unlock(ps-inject_lock); + if (inject) { + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0); + + /* +* Provides NMI watchdog support via Virtual Wire mode. +* The route is: PIT - PIC - LVT0 in NMI mode. +* +* Note: Our Virtual Wire implementation is simplified, only +* propagating PIT interrupts to all VCPUs when they have set +* LVT0 to NMI delivery. Other PIC interrupts are just sent to +* VCPU0, and only if its LVT0 is in EXTINT mode. +*/ + if (kvm-arch.vapics_in_nmi_mode 0) + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_apic_nmi_wd_deliver(vcpu); + } +} + +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm_pit *pt = ktimer-kvm-arch.vpit; + + queue_work(pt-wq, pt-expired); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } + else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -295,10 +348,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; @@ -626,7 +678,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags) mutex_init(pit-pit_state.lock); mutex_lock(pit-pit_state.lock); - raw_spin_lock_init(pit-pit_state.inject_lock); + spin_lock_init(pit-pit_state.inject_lock); + + pit-wq = create_singlethread_workqueue(kvm-pit-wq); + if (!pit-wq) { + kfree(pit); + return NULL; + } + INIT_WORK(pit-expired, pit_do_work); kvm-arch.vpit = pit; pit-kvm = kvm; @@ -687,52 +746,8 @@ void kvm_free_pit(struct kvm *kvm) hrtimer_cancel(timer); kvm_free_irq_source_id(kvm, kvm-arch.vpit-irq_source_id); mutex_unlock(kvm-arch.vpit-pit_state.lock); + cancel_work_sync(kvm-arch.vpit-expired); + destroy_workqueue(kvm-arch.vpit-wq); kfree(kvm-arch.vpit); } } - -static void __inject_pit_timer_intr(struct kvm *kvm) -{ - struct kvm_vcpu *vcpu; - int i; - - kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); - kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0
[PATCH 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
Otherwise we might try to deliver a timer interrupt to a cpu that can't possibly handle it. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 52f412f..06cf61e 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); - } else { + } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; else if (kvm_apic_compare_prio(vcpu, lowest) 0) -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] Allow any LAPIC to accept PIC interrupts.
If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/lapic.c | 12 +--- 1 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index d8258a0..ee0f76c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (kvm_vcpu_is_bsp(vcpu)) { - if (!apic_hw_enabled(vcpu-arch.apic)) - r = 1; - if ((lvt0 APIC_LVT_MASKED) == 0 - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) - r = 1; - } + if (!apic_hw_enabled(vcpu-arch.apic)) + r = 1; + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + r = 1; return r; } -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 1/3] Introduce a workqueue to deliver PIT timer interrupts.
We really want to kvm_set_irq during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8254.c | 106 ++--- arch/x86/kvm/i8254.h |4 ++ arch/x86/kvm/irq.c |1 - arch/x86/kvm/x86.c |4 ++ 4 files changed, 73 insertions(+), 42 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 188d827..dd655ef 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -34,6 +34,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/workqueue.h #include irq.h #include i8254.h @@ -49,6 +50,8 @@ #define RW_STATE_WORD0 3 #define RW_STATE_WORD1 4 +static struct workqueue_struct *pit_wq; + /* Compute with 96 bit intermediate result: (a*b)/c */ static u64 muldiv64(u64 a, u32 b, u32 c) { @@ -281,6 +284,58 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static void pit_do_work(struct work_struct *work) +{ + struct kvm_pit *pit = container_of(work, struct kvm_pit, expired); + struct kvm *kvm = pit-kvm; + struct kvm_vcpu *vcpu; + int i; + struct kvm_kpit_state *ps = pit-pit_state; + int inject = 0; + + /* Try to inject pending interrupts when +* last one has been acked. +*/ + raw_spin_lock(ps-inject_lock); + if (ps-irq_ack) { + ps-irq_ack = 0; + inject = 1; + } + raw_spin_unlock(ps-inject_lock); + if (inject) { + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); + kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0); + + /* +* Provides NMI watchdog support via Virtual Wire mode. +* The route is: PIT - PIC - LVT0 in NMI mode. +* +* Note: Our Virtual Wire implementation is simplified, only +* propagating PIT interrupts to all VCPUs when they have set +* LVT0 to NMI delivery. Other PIC interrupts are just sent to +* VCPU0, and only if its LVT0 is in EXTINT mode. +*/ + if (kvm-arch.vapics_in_nmi_mode 0) + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_apic_nmi_wd_deliver(vcpu); + } +} + +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm_pit *pt = ktimer-kvm-arch.vpit; + + queue_work(pit_wq, pt-expired); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } + else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -295,10 +350,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; @@ -628,6 +682,8 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags) mutex_lock(pit-pit_state.lock); raw_spin_lock_init(pit-pit_state.inject_lock); + INIT_WORK(pit-expired, pit_do_work); + kvm-arch.vpit = pit; pit-kvm = kvm; @@ -691,48 +747,16 @@ void kvm_free_pit(struct kvm *kvm) } } -static void __inject_pit_timer_intr(struct kvm *kvm) +int kvm_pit_create(void) { - struct kvm_vcpu *vcpu; - int i; + pit_wq = create_singlethread_workqueue(kvm-pit-wq); + if (!pit_wq) + return -ENOMEM; - kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1); - kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0); - - /* -* Provides NMI watchdog support via Virtual Wire mode. -* The route is: PIT - PIC - LVT0 in NMI mode. -* -* Note: Our Virtual Wire implementation is simplified, only -* propagating PIT interrupts to all VCPUs when they have set -* LVT0 to NMI delivery. Other PIC interrupts are just sent to -* VCPU0, and only if its LVT0 is in EXTINT mode. -*/ - if (kvm-arch.vapics_in_nmi_mode 0) - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_apic_nmi_wd_deliver(vcpu); + return 0; } -void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu) +void kvm_pit_cleanup(void) { - struct kvm_pit *pit = vcpu-kvm-arch.vpit; - struct kvm *kvm = vcpu
[RFC][PATCH 2/3] Allow any LAPIC to accept PIC interrupts.
If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/lapic.c | 12 +--- 1 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index d8258a0..ee0f76c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (kvm_vcpu_is_bsp(vcpu)) { - if (!apic_hw_enabled(vcpu-arch.apic)) - r = 1; - if ((lvt0 APIC_LVT_MASKED) == 0 - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) - r = 1; - } + if (!apic_hw_enabled(vcpu-arch.apic)) + r = 1; + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + r = 1; return r; } -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's
Otherwise we might try to deliver a timer interrupt to a cpu that can't possibly handle it. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 52f412f..06cf61e 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); - } else { + } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; else if (kvm_apic_compare_prio(vcpu, lowest) 0) -- 1.6.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Print a user-friendly message on failed vmentry
On 05/19/2010 05:16 PM, Mohammed Gamal wrote: This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077. Failed vmentries were handled with handle_unhandled() which prints a rather unfriendly message to the user. This patch separates handling vmentry failures from unknown exit reasons and prints a friendly message to the user. Signed-off-by: Mohammed Gamal m.gamal...@gmail.com --- qemu-kvm.c | 16 +++- 1 files changed, 15 insertions(+), 1 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 35a4c8a..deb4df8 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason) return -EINVAL; } +static int handle_failed_vmentry(uint64_t reason) +{ +fprintf(stderr, kvm: vm entry failed with error 0x% PRIx64 \n\n, reason); +fprintf(stderr, If you're runnning a guest on an Intel machine, it can be\n); +fprintf(stderr, most-likely due to the guest going into an invalid state\n); +fprintf(stderr, for Intel VT. For example, the guest maybe running in big\n); +fprintf(stderr, real mode which is not supported by Intel VT.\n\n); +fprintf(stderr, You may want to try enabling real mode emulation in KVM.\n); +fprintf(stderr, To Enable it, you may run the following commands as root:\n); +fprintf(stderr, # rmmod kvm_intel\n); +fprintf(stderr, # rmmod kvm\n); +fprintf(stderr, # modprobe kvm_intel emulate_invalid_guest_state=1\n); +return -EINVAL; +} The thing is, there are other valid reasons for vmentry failure. A while ago I tracked down a bug in the Linux kernel that was causing us to vmenter with invalid segments; this message would have been very misleading in that case. I think you'd have to do more complete analysis of the vmentry failure code to be more certain about the reason for failure. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [UNTESTED] KVM: do not call kvm_set_irq from irq disabled section
On 04/23/2010 07:05 AM, Avi Kivity wrote: On 04/22/2010 10:55 PM, Gleb Natapov wrote: What about converting PIC/IOAPIC mutexes into spinlocks? Works for me, but on large guests the spinning will be noticeable. I believe. For interrupts going through IOPIC, but we know this is not scalable anyway. Yes. We also wanted to convert the ioapic/pic to spinlocks so we could queue the interrupt from the PIT directly instead of using KVM_REQ_PENDING_TIMER which keeps confusing me. Chris Lalancette posted a patchset for this a while back but it was never completed. Yeah, I'm sorry I never completed it. It turns out that with the HPET changes that went in around the time I was looking at it, that set of patches wasn't really required to fix the problem I was seeing with kdump. That being said, if it's useful to somebody, I can repost the patches (though they are woefully out-of-date now). Let me know if you want to see them again. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Windows Vista/7 repeatedly prompt to Set Network Location
On 01/26/2010 11:22 PM, Jamin W. Collins wrote: Every time I start a Windows Vista or Windows 7 virtual machine it request a that a location be set for the network, regardless of the fact that the network location has already been set the same way every time the system is started. Near as I can tell the VM's NIC MAC, IP address, DNS servers, default gateway, and all other network related items are the same every single time. While this does not stop the VM from working it is annoying. I'm starting the VM the same way through libvirt every single time. The resultant kvm command line is: /usr/bin/kvm -S -M pc-0.11 -cpu qemu32 -enable-kvm -m 1536 -smp 1 -name win7 -uuid 6671f42a-b974-6bb9-bd7e-fd1da95cabe5 -monitor unix:/var/lib/libvirt/qemu/win7.monitor,server,nowait -localtime -boot c -drive file=/media/devel/testing/win7/testing.img,if=ide,index=0,boot=on -drive file=/media/devel/testing/isos/en_windows_7_professional_x86_dvd_x15-65804.iso,if=ide,media=cdrom,index=2 -net nic,macaddr=54:52:00:37:4a:ce,vlan=0,model=rtl8139,name=rtl8139.0 -net tap,fd=38,vlan=0,name=tap.0 -serial pty -parallel none -usb -usbdevice tablet -vga cirrus While the uuid isn't the same from one execution to the next, to my knowledge it's not something the VM ever sees and is only an identifier within KVM. Has anyone else seen anything like this? Actually, that's not true (if I remember correctly). The uuid is exposed to the guest via the SMBIOS tables, so that might be causing the problem. Try setting the UUID in the libvirt XML, and that may solve it. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/12] Kick appropriate CPUs when signalling interrupts.
On 12/02/2009 04:44 PM, Gleb Natapov wrote: On Tue, Dec 01, 2009 at 03:36:41PM +0100, Chris Lalancette wrote: Make sure that we kick the appropriate vcpu when delivering an interrupt. This makes sure that we wake any idle cpus to cause a vcpu_run and an interrupt injection to occur. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 9b07734..96df854 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -98,6 +98,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); +kvm_vcpu_kick(vcpu); } else { if (!lowest) lowest = vcpu; @@ -106,8 +107,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } -if (lowest) +if (lowest) { r = kvm_apic_set_irq(lowest, irq); +kvm_vcpu_kick(lowest); +} return r; } kvm_vcpu_kick() is done inside kvm_apic_set_irq(), so why is this needed? Yeah, you are absolutely right. I mis-read it before. I'll remove this bit. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 03/12] Remove KVM_REQ_PENDING_TIMER.
On 12/01/2009 08:04 PM, Jan Kiszka wrote: Chris Lalancette wrote: KVM_REQ_PENDING_TIMER is set and cleared in a couple of places, but it never seems to be actually checked. Remove it. I would suggest to study the introducing commit 06e05645661211b9eaadaf6344c335d2e80f0ba2. My strong feeling is that this removal is wrong. Ah, thanks. Now I see what you mean. In newer KVM, if you are in __vcpu_run() and are in the while (r 0) loop, after you've done kvm_inject_pending_timer_irqs(), but before you do the next __vcpu_run() (and hence disabled local IRQs), you could receive an hrtimer callback. In that case I guess you would hit the if (vcpu-requests || need_resched() || signal_pending()) test, don't inject the interrupt, and possibly spend a long time in the guest before an unrelated event causes the exit and hence causes the timer injection. Well, that certainly is a bit subtle :). I think I might submit a patch to at least put a comment in vcpu_enter_guest() about this. Unfortunately, this does sort of put a damper on what's going on later in the series. If the i8254 doesn't belong to the BSP (which it really shouldn't), then I don't know which vcpu to raise the bit on. Maybe I can do it in the kvm_irq_delivery_to_apic() function, or kvm_apic_set_irq(). I'll take a look at that. Thanks again for the review, -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/12]: Fix kdump under KVM
Another version of the patch series to make kdump work inside KVM guests. The current problem with using kdump is that KVM only delivers PIT interrupts to the BSP. While this is technically allowed by the MPS spec, most motherboards deliver timer interrupts to *any* LAPIC in virtual wire mode. Since a crash can occur on any CPU, timer interrupts must be able to reach any CPU in order for kdump to work properly. Therefore, this series allows timer interrupts to be routed to any virtual CPU that is in the correct mode. I've broken up the series into much smaller chunks now; some of them can be applied on their own as bug fixes. There are a few changes to this series from the last time I posted. They are: 1) I've taken Marcelo's advice to only redirect PIT interrupts in the IOAPIC to the BSP if the destination is multiple vcpus. 2) I've changed it so that we use a spin_lock() in the IOAPIC instead of a mutex lock, based on the discussion with Avi. This means that we can get rid of all of the delayed interrupt action in the PIT and just directly set_irq in the hrtimer callbacks. The conversion of the IOAPIC to a spin_lock instead of a mutex_lock caused some concern with Avi when we last discussed it. In particular he was concerned that any notifiers that kvm_set_irq kicked off were safe from an IRQ context. From what I can tell, there are two sets of notifiers: the IRQ mask notifiers and the IRQ ACK notifiers. There is one, and only one mask notifier, in the PIT; it merely sets a flag, so it should be interrupt safe. For the IRQ ACK notifiers, there is one notifier for the PIT and one for assigned-dev.c. The one in the PIT is definitely interrupt safe, it just sets a flag. The one in assigned-dev is more complicated. It calls out to kvm_set_irq() (which is IRQ-safe now), and also calls out to enable_irq(), which should be interrupt-safe as long as bus_lock is not set (which it never is, from what I can tell). Therefore, I *think* this should be safe, although I don't have VT-d hardware available at the moment to test for sure. I did basic smoke testing on a few different SMP guests: Fedora 11 i386, Fedora 12 x86_64, Red Hat Linux 5.2, Windows XP i386, RHEL-4.8, Ubuntu 8.10, Windows 2003 i386, and RHEL-5.3. I also did targetted testing of the kdump feature in the RHEL-5.3 guest, and kdump works there. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 01/12] Fix up some comments around the source tree.
Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/svm.c |2 +- virt/kvm/ioapic.c |4 ++-- virt/kvm/kvm_main.c |2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 3de0b37..68fe7f2 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2628,7 +2628,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm) break; case SVM_EXITINTINFO_TYPE_EXEPT: /* In case of software exception do not reinject an exception - vector, but re-execute and instruction instead */ + vector, but re-execute an instruction instead */ if (is_nested(svm)) break; if (kvm_exception_is_soft(vector)) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 38a2d20..53f3166 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -164,10 +164,10 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.shorthand = 0; #ifdef CONFIG_X86 - /* Always delivery PIT interrupt to vcpu 0 */ + /* Always deliver PIT interrupt to vcpu 0 */ if (irq == 0) { irqe.dest_mode = 0; /* Physical mode. */ - /* need to read apic_id from apic regiest since + /* need to read apic_id from apic register since * it can be rewritten */ irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f92ba13..748900e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu) } /* - * Creates some virtual cpus. Good luck creating more than one. + * Creates a virtual cpu. */ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) { -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 02/12] Make kvm_pic_reset static.
Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8259.c |2 +- arch/x86/kvm/irq.h |2 -- 2 files changed, 1 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index d057c0c..59a8a68 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -233,7 +233,7 @@ int kvm_pic_read_irq(struct kvm *kvm) return intno; } -void kvm_pic_reset(struct kvm_kpic_state *s) +static void kvm_pic_reset(struct kvm_kpic_state *s) { int irq; struct kvm *kvm = s-pics_state-irq_request_opaque; diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index be399e2..a5344f1 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -93,8 +93,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm) return ret; } -void kvm_pic_reset(struct kvm_kpic_state *s); - void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu); void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu); void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu); -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 03/12] Remove KVM_REQ_PENDING_TIMER.
KVM_REQ_PENDING_TIMER is set and cleared in a couple of places, but it never seems to be actually checked. Remove it. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/timer.c |5 + arch/x86/kvm/x86.c |1 - include/linux/kvm_host.h |1 - 3 files changed, 1 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c index eea4043..72b5144 100644 --- a/arch/x86/kvm/timer.c +++ b/arch/x86/kvm/timer.c @@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer) * not care about potentially loosing timer events in the !reinject * case anyway. */ - if (ktimer-reinject || !atomic_read(ktimer-pending)) { + if (ktimer-reinject || !atomic_read(ktimer-pending)) atomic_inc(ktimer-pending); - /* FIXME: this code should not know anything about vcpus */ - set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); - } if (waitqueue_active(q)) wake_up_interruptible(q); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 153a526..2fb4251 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4034,7 +4034,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) if (r = 0) break; - clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); if (kvm_cpu_has_pending_timer(vcpu)) kvm_inject_pending_timer_irqs(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..053e49f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -33,7 +33,6 @@ #define KVM_REQ_REPORT_TPR_ACCESS 2 #define KVM_REQ_MMU_RELOAD 3 #define KVM_REQ_TRIPLE_FAULT 4 -#define KVM_REQ_PENDING_TIMER 5 #define KVM_REQ_UNHALT 6 #define KVM_REQ_MMU_SYNC 7 #define KVM_REQ_KVMCLOCK_UPDATE8 -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 06/12] Make the PIC use interrupt safe locking.
Since we want to be able to call kvm_pic_set_irq() in interrupt context, convert the uses of spin_lock() to spin_lock_irqsave() as appropriate in i8259.c Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8259.c | 34 ++ 1 files changed, 22 insertions(+), 12 deletions(-) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 59a8a68..8285f0d 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -52,10 +52,12 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq) void kvm_pic_clear_isr_ack(struct kvm *kvm) { struct kvm_pic *s = pic_irqchip(kvm); - spin_lock(s-lock); + unsigned long flags; + + spin_lock_irqsave(s-lock, flags); s-pics[0].isr_ack = 0xff; s-pics[1].isr_ack = 0xff; - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); } /* @@ -156,24 +158,27 @@ static void pic_update_irq(struct kvm_pic *s) void kvm_pic_update_irq(struct kvm_pic *s) { - spin_lock(s-lock); + unsigned long flags; + + spin_lock_irqsave(s-lock, flags); pic_update_irq(s); - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); } int kvm_pic_set_irq(void *opaque, int irq, int level) { struct kvm_pic *s = opaque; int ret = -1; + unsigned long flags; - spin_lock(s-lock); + spin_lock_irqsave(s-lock, flags); if (irq = 0 irq PIC_NUM_PINS) { ret = pic_set_irq1(s-pics[irq 3], irq 7, level); pic_update_irq(s); trace_kvm_pic_set_irq(irq 3, irq 7, s-pics[irq 3].elcr, s-pics[irq 3].imr, ret == 0); } - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); return ret; } @@ -202,8 +207,9 @@ int kvm_pic_read_irq(struct kvm *kvm) { int irq, irq2, intno; struct kvm_pic *s = pic_irqchip(kvm); + unsigned long flags; - spin_lock(s-lock); + spin_lock_irqsave(s-lock, flags); irq = pic_get_irq(s-pics[0]); if (irq = 0) { pic_intack(s-pics[0], irq); @@ -228,7 +234,7 @@ int kvm_pic_read_irq(struct kvm *kvm) intno = s-pics[0].irq_base + irq; } pic_update_irq(s); - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); return intno; } @@ -434,6 +440,8 @@ static int picdev_write(struct kvm_io_device *this, { struct kvm_pic *s = to_pic(this); unsigned char data = *(unsigned char *)val; + unsigned long flags; + if (!picdev_in_range(addr)) return -EOPNOTSUPP; @@ -442,7 +450,7 @@ static int picdev_write(struct kvm_io_device *this, printk(KERN_ERR PIC: non byte write\n); return 0; } - spin_lock(s-lock); + spin_lock_irqsave(s-lock, flags); switch (addr) { case 0x20: case 0x21: @@ -455,7 +463,7 @@ static int picdev_write(struct kvm_io_device *this, elcr_ioport_write(s-pics[addr 1], addr, data); break; } - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); return 0; } @@ -464,6 +472,8 @@ static int picdev_read(struct kvm_io_device *this, { struct kvm_pic *s = to_pic(this); unsigned char data = 0; + unsigned long flags; + if (!picdev_in_range(addr)) return -EOPNOTSUPP; @@ -472,7 +482,7 @@ static int picdev_read(struct kvm_io_device *this, printk(KERN_ERR PIC: non byte read\n); return 0; } - spin_lock(s-lock); + spin_lock_irqsave(s-lock, flags); switch (addr) { case 0x20: case 0x21: @@ -486,7 +496,7 @@ static int picdev_read(struct kvm_io_device *this, break; } *(unsigned char *)val = data; - spin_unlock(s-lock); + spin_unlock_irqrestore(s-lock, flags); return 0; } -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/12] Make the IOAPIC lock a spinlock.
In order to be able to call kvm_set_irq from an interrupt context, the IOAPIC lock can't be a (possibly sleeping) mutex. Convert it to a spinlock. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/ioapic.c | 40 +--- virt/kvm/ioapic.h |2 +- 2 files changed, 26 insertions(+), 16 deletions(-) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index fd52949..3d68826 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -192,8 +192,9 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) u32 mask = 1 irq; union kvm_ioapic_redirect_entry entry; int ret = 1; + unsigned long flags; - mutex_lock(ioapic-lock); + spin_lock_irqsave(ioapic-lock, flags); if (irq = 0 irq IOAPIC_NUM_PINS) { entry = ioapic-redirtbl[irq]; level ^= entry.fields.polarity; @@ -210,7 +211,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) } trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0); } - mutex_unlock(ioapic-lock); + spin_unlock_irqrestore(ioapic-lock, flags); return ret; } @@ -234,9 +235,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, * is dropped it will be put into irr and will be delivered * after ack notifier returns. */ - mutex_unlock(ioapic-lock); + spin_unlock(ioapic-lock); kvm_notify_acked_irq(ioapic-kvm, KVM_IRQCHIP_IOAPIC, i); - mutex_lock(ioapic-lock); + spin_lock(ioapic-lock); if (trigger_mode != IOAPIC_LEVEL_TRIG) continue; @@ -251,10 +252,11 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int vector, void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode) { struct kvm_ioapic *ioapic = kvm-arch.vioapic; + unsigned long flags; - mutex_lock(ioapic-lock); + spin_lock_irqsave(ioapic-lock, flags); __kvm_ioapic_update_eoi(ioapic, vector, trigger_mode); - mutex_unlock(ioapic-lock); + spin_unlock_irqrestore(ioapic-lock, flags); } static inline struct kvm_ioapic *to_ioapic(struct kvm_io_device *dev) @@ -273,6 +275,8 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len, { struct kvm_ioapic *ioapic = to_ioapic(this); u32 result; + unsigned long flags; + if (!ioapic_in_range(ioapic, addr)) return -EOPNOTSUPP; @@ -280,7 +284,7 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len, ASSERT(!(addr 0xf)); /* check alignment */ addr = 0xff; - mutex_lock(ioapic-lock); + spin_lock_irqsave(ioapic-lock, flags); switch (addr) { case IOAPIC_REG_SELECT: result = ioapic-ioregsel; @@ -294,7 +298,7 @@ static int ioapic_mmio_read(struct kvm_io_device *this, gpa_t addr, int len, result = 0; break; } - mutex_unlock(ioapic-lock); + spin_unlock_irqrestore(ioapic-lock, flags); switch (len) { case 8: @@ -316,6 +320,8 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len, { struct kvm_ioapic *ioapic = to_ioapic(this); u32 data; + unsigned long flags; + if (!ioapic_in_range(ioapic, addr)) return -EOPNOTSUPP; @@ -331,7 +337,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len, } addr = 0xff; - mutex_lock(ioapic-lock); + spin_lock_irqsave(ioapic-lock, flags); switch (addr) { case IOAPIC_REG_SELECT: ioapic-ioregsel = data; @@ -349,7 +355,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, gpa_t addr, int len, default: break; } - mutex_unlock(ioapic-lock); + spin_unlock_irqrestore(ioapic-lock, flags); return 0; } @@ -378,7 +384,7 @@ int kvm_ioapic_init(struct kvm *kvm) ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL); if (!ioapic) return -ENOMEM; - mutex_init(ioapic-lock); + spin_lock_init(ioapic-lock); kvm-arch.vioapic = ioapic; kvm_ioapic_reset(ioapic); kvm_iodevice_init(ioapic-dev, ioapic_mmio_ops); @@ -393,23 +399,27 @@ int kvm_ioapic_init(struct kvm *kvm) int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state) { struct kvm_ioapic *ioapic = ioapic_irqchip(kvm); + unsigned long flags; + if (!ioapic) return -EINVAL; - mutex_lock(ioapic-lock); + spin_lock_irqsave(ioapic-lock, flags); memcpy(state, ioapic, sizeof(struct kvm_ioapic_state)); - mutex_unlock(ioapic-lock); + spin_unlock_irqrestore(ioapic-lock
[PATCH 04/12] IOAPIC timer interrupt redirect.
Only redirect IRQ 0 (i.e. timer interrupt) to the BSP if the APIC destination is multiple vcpus. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/ioapic.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 53f3166..fd52949 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -144,6 +144,15 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val) } } +static int ioapic_dest_multiple_cpus(struct kvm_lapic_irq *irqe) +{ + /* physical mode is always directed to 1 cpu */ + if (irqe-dest_mode == 0 || irqe-dest_id == 0) + return 0; + + return hweight_long(irqe-dest_id) != 1; +} + static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) { union kvm_ioapic_redirect_entry *entry = ioapic-redirtbl[irq]; @@ -164,8 +173,10 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.shorthand = 0; #ifdef CONFIG_X86 - /* Always deliver PIT interrupt to vcpu 0 */ - if (irq == 0) { + /* if this is the PIT interrupt and the dest_id is multiple +* vcpus, re-write to go to vcpu 0 +*/ + if (irq == 0 ioapic_dest_multiple_cpus(irqe)) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic register since * it can be rewritten */ -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 08/12] Remove timer.c
The code in timer.c isn't really similar enough between the i8254 and the lapic to share. Split these into separate functions, and remove timer.c Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/Makefile|3 +- arch/x86/kvm/i8254.c | 27 - arch/x86/kvm/kvm_timer.h |5 ++- arch/x86/kvm/lapic.c | 33 +++- arch/x86/kvm/timer.c | 47 -- 5 files changed, 61 insertions(+), 54 deletions(-) delete mode 100644 arch/x86/kvm/timer.c diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 31a7035..8d9adf6 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ assigned-dev.o) kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o) -kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o timer.o +kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index fab7440..dc6eff4 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -277,6 +277,30 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm *kvm = ktimer-kvm; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(kvm-bsp_vcpu-wq)) + wake_up_interruptible(kvm-bsp_vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } + else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -291,10 +315,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h index 55c7524..fbd4acb 100644 --- a/arch/x86/kvm/kvm_timer.h +++ b/arch/x86/kvm/kvm_timer.h @@ -1,3 +1,5 @@ +#ifndef __KVM_TIMER_H +#define __KVM_TIMER_H struct kvm_timer { struct hrtimer timer; @@ -13,6 +15,5 @@ struct kvm_timer_ops { bool (*is_periodic)(struct kvm_timer *); }; - -enum hrtimer_restart kvm_timer_fn(struct hrtimer *data); +#endif diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 44acf7c..0005409 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1040,6 +1040,37 @@ static const struct kvm_io_device_ops apic_mmio_ops = { .write= apic_mmio_write, }; +static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + struct kvm_vcpu *vcpu; + struct kvm_lapic *lapic; + + vcpu = ktimer-vcpu; + if (!vcpu) + return HRTIMER_NORESTART; + + lapic = vcpu-arch.apic; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + return HRTIMER_RESTART; + } + else + return HRTIMER_NORESTART; +} + int kvm_create_lapic(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic; @@ -1065,7 +1096,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); - apic-lapic_timer.timer.function = kvm_timer_fn; + apic-lapic_timer.timer.function = lapic_timer_fn; apic-lapic_timer.t_ops = lapic_timer_ops; apic-lapic_timer.kvm = vcpu-kvm; apic-lapic_timer.vcpu = vcpu; diff --git a/arch
[PATCH 07/12] Rename kvm_apic_accept_pic_intr
Call it kvm_apic_in_virtual_wire_mode, which is more correct. Also change it to not only operate properly on the boot CPU, but on any CPU. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8259.c |2 +- arch/x86/kvm/irq.c |4 ++-- arch/x86/kvm/lapic.c | 17 - arch/x86/kvm/lapic.h |2 +- 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 8285f0d..80d4705 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -263,7 +263,7 @@ static void kvm_pic_reset(struct kvm_kpic_state *s) s-init4 = 0; for (irq = 0; irq PIC_NUM_PINS/2; irq++) { - if (vcpu0 kvm_apic_accept_pic_intr(vcpu0)) + if (vcpu0 kvm_apic_in_virtual_wire_mode(vcpu0)) if (irr (1 irq) || isr (1 irq)) { pic_clear_isr(s, irq); } diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index 96dfbb6..05721fc 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -53,7 +53,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v) return v-arch.interrupt.pending; if (kvm_apic_has_interrupt(v) == -1) { /* LAPIC */ - if (kvm_apic_accept_pic_intr(v)) { + if (kvm_apic_in_virtual_wire_mode(v)) { s = pic_irqchip(v-kvm);/* PIC */ return s-output; } else @@ -76,7 +76,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v) vector = kvm_get_apic_interrupt(v); /* APIC */ if (vector == -1) { - if (kvm_apic_accept_pic_intr(v)) { + if (kvm_apic_in_virtual_wire_mode(v)) { s = pic_irqchip(v-kvm); s-output = 0; /* PIC */ vector = kvm_pic_read_irq(v-kvm); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index cd60c0b..44acf7c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -435,7 +435,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, /* * Should only be called by kvm_apic_local_deliver() with LVT0, * before NMI watchdog was enabled. Already handled by -* kvm_apic_accept_pic_intr(). +* kvm_apic_in_virtual_wire_mode(). */ break; @@ -1099,18 +1099,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu) return highest_irr; } -int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu) +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu) { u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); int r = 0; - if (kvm_vcpu_is_bsp(vcpu)) { - if (!apic_hw_enabled(vcpu-arch.apic)) - r = 1; - if ((lvt0 APIC_LVT_MASKED) == 0 - GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) - r = 1; - } + if (!apic_hw_enabled(vcpu-arch.apic)) + r = 1; + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + r = 1; + return r; } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 40010b0..ce4cd2d 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -22,7 +22,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu); void kvm_free_lapic(struct kvm_vcpu *vcpu); int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu); -int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu); +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu); int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu); void kvm_lapic_reset(struct kvm_vcpu *vcpu); u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu); -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 11/12] Allow the PIC to signal CPUs other than BSP.
While generally the PIT is used to signal only the BSP, it's doesn't actually have to do that architecturally. Allow the PIC to signal any VCPU that is in Virtual Wire mode. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8259.c | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c index 80d4705..ff20415 100644 --- a/arch/x86/kvm/i8259.c +++ b/arch/x86/kvm/i8259.c @@ -242,8 +242,6 @@ int kvm_pic_read_irq(struct kvm *kvm) static void kvm_pic_reset(struct kvm_kpic_state *s) { int irq; - struct kvm *kvm = s-pics_state-irq_request_opaque; - struct kvm_vcpu *vcpu0 = kvm-bsp_vcpu; u8 irr = s-irr, isr = s-imr; s-last_irr = 0; @@ -263,10 +261,8 @@ static void kvm_pic_reset(struct kvm_kpic_state *s) s-init4 = 0; for (irq = 0; irq PIC_NUM_PINS/2; irq++) { - if (vcpu0 kvm_apic_in_virtual_wire_mode(vcpu0)) - if (irr (1 irq) || isr (1 irq)) { - pic_clear_isr(s, irq); - } + if (irr (1 irq) || isr (1 irq)) + pic_clear_isr(s, irq); } } @@ -506,14 +502,19 @@ static int picdev_read(struct kvm_io_device *this, static void pic_irq_request(void *opaque, int level) { struct kvm *kvm = opaque; - struct kvm_vcpu *vcpu = kvm-bsp_vcpu; struct kvm_pic *s = pic_irqchip(kvm); int irq = pic_get_irq(s-pics[0]); + struct kvm_vcpu *vcpu; + int i; s-output = level; - if (vcpu level (s-pics[0].isr_ack (1 irq))) { + if (level (s-pics[0].isr_ack (1 irq))) { s-pics[0].isr_ack = ~(1 irq); - kvm_vcpu_kick(vcpu); + kvm_for_each_vcpu(i, vcpu, kvm) { + if (kvm_apic_in_virtual_wire_mode(vcpu) + waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + } } } -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 12/12] Kick appropriate CPUs when signalling interrupts.
Make sure that we kick the appropriate vcpu when delivering an interrupt. This makes sure that we wake any idle cpus to cause a vcpu_run and an interrupt injection to occur. Signed-off-by: Chris Lalancette clala...@redhat.com --- virt/kvm/irq_comm.c |5 - 1 files changed, 4 insertions(+), 1 deletions(-) diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 9b07734..96df854 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -98,6 +98,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_kick(vcpu); } else { if (!lowest) lowest = vcpu; @@ -106,8 +107,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r = kvm_apic_set_irq(lowest, irq); + kvm_vcpu_kick(lowest); + } return r; } -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 09/12] Fix missing spin_lock in PIT timer.
Changes to the irq_ack variable in the pit_state must be protected by the inject_lock spinlock; otherwise, we can erroneously inject a timer interrupt into a guest. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/i8254.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index dc6eff4..ece7e12 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -606,10 +606,13 @@ void kvm_pit_reset(struct kvm_pit *pit) static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask) { struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier); + unsigned long flags; if (!mask) { + spin_lock_irqsave(pit-pit_state.inject_lock, flags); atomic_set(pit-pit_state.pit_timer.pending, 0); pit-pit_state.irq_ack = 1; + spin_unlock_irqrestore(pit-pit_state.inject_lock, flags); } } -- 1.6.5.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] x86: Make sure get_user_desc() doesn't sign extend.
The current implementation of get_user_desc() sign extends the return value because of integer promotion rules. For the most part, this doesn't matter, because the top bit of base2 is usually 0. If, however, that bit is 1, then the entire value will be 0x... which is probably not what the caller intended. This patch casts the entire thing to unsigned before returning, which generates almost the same assembly as the current code but replaces the final cltq (sign extend) with a mov %eax %eax (zero-extend). This fixes booting certain guests under KVM. (3rd retry, no response to the previous 2) Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/include/asm/desc.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index e8de2f6..617bd56 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc) static inline unsigned long get_desc_base(const struct desc_struct *desc) { - return desc-base0 | ((desc-base1) 16) | ((desc-base2) 24); + return (unsigned)(desc-base0 | ((desc-base1) 16) | ((desc-base2) 24)); } static inline void set_desc_base(struct desc_struct *desc, unsigned long base) -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Fix kdump under KVM.
Avi Kivity wrote: On 10/30/2009 02:23 PM, Chris Lalancette wrote: In the meantime, I've gotten the set_irq from IRQ context that Avi suggested working, and the fixing up of this IOAPIC check is the last bit to actually get kdump working. There are two problems with that: - kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but possibly more); need to make sure those are safe from irq context - kvm_set_irq() will loop on all vcpus, usually incurring a cache miss or two; with large vcpu counts this can make the irq-off time quite large, hurting our worst-case latency we can defer the second problem since it's only a performance issue, but how did you deal with the first? Actually, there is a problem here, but I don't think it's as you describe. kvm_set_irq() doesn't directly call the mask notifiers; all it really does is call the i8259/ioapic callback, which causes an interrupt to be raised on the LAPIC in question, which causes the interrupt to be injected into the VCPU on the next run. That doesn't call the irq_mask_notifiers, and seems to be IRQ-safe (with the conversion from mutex's to spin_locks). The irq mask notifiers are called later on during an EOI. In particular, ioapic_mmio_write() - ioapic_write_indirect(), which is protected by the ioapic_lock. I don't see a problem here (although please point it out if I missed it). The problem I do see has to do with the irq_ack notifiers. In particular, if you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI address, we call __kvm_ioapic_update_eoi(). In that case, we drop the spin_lock and call kvm_notify_acked_irq(), which can re-enter the ioapic code. This could be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the spin-lock, but in an irq-unsafe fashion. Therefore we could take the spin-lock, get interrupted for the timer interrupt, and try to re-acquire the lock, causing a deadlock. I'll have to think a bit more on how to solve that one (there's a similar situation going on in the i8259, which is not ia64 specific). So, either I'm missing exactly what you are talking about, or there's no deadlock with my patch (except with ia64). Can you explain further? -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm problems on new hardware
Ryan Harper wrote: * Danny ter Haar d...@dth.net [2009-10-29 13:38]: Update: I compiled/installed 2.6.32-rc5-git3 on this machine I manually start kvm: vhost1:~# kvm -m 512 -cdrom /vz/template/iso/debian-503-amd64-netinst.iso The bootscreen comes up, i hit enter to install and i get these messages (copied from dmesg) handle_exception: unexpected, vectoring info 0x8010 intr info 0x8b0d handle_exception: unexpected, vectoring info 0x800d intr info 0x8b0d [this line is repeated many times: dmesg |grep 0x8b0d | wc -l 570 ] and finally before ending the kvm session i get: vmx_handle_exit: unexpected, valid vectoring info (0x800d) and exit reason is 0x8021 I've seen that in a couple places. I don't think we have root cause, but in at least one situation (running win2k3 with 4G of ram) the work around was to use: Actually, I'm not 100% sure, but I might have root-caused this. At least, the exit reason is the same exit reason I fixed. That exit reason essentially means vmenter failed because of invalid guest fields. In the case I tracked down, it was because we were wrongly sign-extending the segment fields (instead of zero-extending them). Can either you or Danny: 1) Get the output from stderr of qemu when this happens? I believe qemu dumps the state of all of the guest fields when it's going to crash like this, and that can tell us if the GUEST_STATE is wrong. 2) Try the patch posted here: http://lkml.org/lkml/2009/10/28/201, and see if it helps? -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Fix kdump under KVM.
Marcelo Tosatti wrote: On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote: Marcelo Tosatti wrote: On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. The reason the PIT interrupt was fixed to BSP is: http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the destination overwrite in case its programmed by the guest to a single CPU would fix it? Ug, nasty. I definitely don't want to re-introduce that bug. What exactly do you mean by fix it? Do you mean fix the original RHEL-5.1 PAE issue, or fix the kdump issue? The kdump issue. Something like: #ifdef CONFIG_X86 /* Always delivery PIT interrupt to vcpu 0 */ if (irq == 0 dest_multiple_vcpus(entry)) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic regiest since * it can be rewritten */ irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id; } #endif The rest of the code should be ready to deal with it. Do you have any more information about how to reproduce that original issue? I tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to install and boot it just fine with or without the always deliver irq 0 to vcpu 0 check above. Either there's something in particular I need to do to trigger the issue (hardware or software), or it's being solved another way. In the meantime, I've gotten the set_irq from IRQ context that Avi suggested working, and the fixing up of this IOAPIC check is the last bit to actually get kdump working. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Fix kdump under KVM.
Marcelo Tosatti wrote: On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote: Marcelo Tosatti wrote: On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote: Marcelo Tosatti wrote: On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. The reason the PIT interrupt was fixed to BSP is: http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the destination overwrite in case its programmed by the guest to a single CPU would fix it? Ug, nasty. I definitely don't want to re-introduce that bug. What exactly do you mean by fix it? Do you mean fix the original RHEL-5.1 PAE issue, or fix the kdump issue? The kdump issue. Something like: #ifdef CONFIG_X86 /* Always delivery PIT interrupt to vcpu 0 */ if (irq == 0 dest_multiple_vcpus(entry)) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic regiest since * it can be rewritten */ irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id; } #endif The rest of the code should be ready to deal with it. Do you have any more information about how to reproduce that original issue? I tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to install and boot it just fine with or without the always deliver irq 0 to vcpu 0 check above. Either there's something in particular I need to do to trigger the issue (hardware or software), or it's being solved another way. For VMX, the guests TSC's are now usually synchronized (which was not the case before, when this patch was written). For AMD, they start out of sync. Migration also causes them to be out of sync. So i'd prefer to postpone the removal of this limitation until there is a guarantee the TSCs are synchronized across vcpus. OK. I'll try to get on an unsynchronized TSC machine and see if I can reproduce there. In the meantime, I've gotten the set_irq from IRQ context that Avi suggested working, and the fixing up of this IOAPIC check is the last bit to actually get kdump working. The kdump fix does not require that, I believe (although it is useful/required for other things, please send the patch separately). The hrtimer will fire on vcpu0, then: kvm_inject_pit_interrupts - ioapic_set_irq - apic_deliver - kvm_vcpu_kick(vcpuN) As long as the dest-id = vcpu_id overwrite change is in place. Just to follow up here, I am taking your suggestion about testing for dest_id to multiple vcpus in the IOAPIC, and it seems to be working so far. However, that is not enough to fix kdump. We also need to remove the BSP assumptions in the i8254 code, otherwise we will never even attempt to deliver the interrupt to !vcpu0. That means that we need code for Avi's suggestion about kvm_set_irq for this to all work. Attached is the patch that I'm currently testing if anyone else wants to look at it. I obviously need to break it up into a nice series, but that will have to wait until next week. My ioapic_dest_multiple_cpus() function could do with a popcnt, but I couldn't find a macro after a quick look around the tree. What's the recommended way to count bits in Linux? -- Chris Lalancette current.patch Description: application/mbox
Re: [PATCH 0/5]: Fix kdump under KVM
Avi Kivity wrote: On 10/28/2009 12:13 PM, Chris Lalancette wrote: The kick from i8254 code is pretty bad, as you mention. I forget why it is needed at all - shouldn't kvm_set_irq() end up kicking the correct As I understand it, that's not quite how it works. From what I can see, what happens is that the i8254 is programmed as an hrtimer. When the hrtimer expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code). That code is running in interrupt context, however, so you can't directly call set_irq at that point. Instead, we update the pending variable and defer work until later on. That later on is when we are doing a vcpu_run, at which point we check the pending variable, and if set, inject the interrupt. The problem is that if the vcpu(s) are in idle when the hrtimer expires, and we don't kick them, no vcpu will wake up, and hence none of them will ever run set_irq to get it injected into the guest. Oh yes, I remember now. If you have other ideas on how we might accomplish this, I'd definitely be interested in hearing them. We could schedule work to run in thread context. Previous to the user return notifier work, this would cause a reloading of a bunch of msrs and would thus be quite expensive, but now switching to kernel threads and back is pretty cheap. So we could try schedule_work(). My only worry is that it uses wake_up() instead of wake_up_sync(), so it could introduce delay. I'd much prefer a variant the schedules immediately. An alternative is to make irq injection work from irq context. It's not difficult to do technically - convert mutexes to spin_lock_irqsave() - I'm just worried about long irqoff times with large guests (ioapic needs to iterate over all vcpus sometimes). This is useful for other cases - device assignment interrupt injection, irqfd for vhost/vbus. So maybe we should go this path, and worry about reducing irqoff times later when we bump KVM_MAX_VCPUS. I'm starting to take a look at this. While this may be a generically useful cleanup, it occurs to me that even if we call set_irq from the hrtimer callback, we still have to do the kick_vcpus type thing. Otherwise, we still have the problem where if all vcpus are idle, none of them will be doing vcpu_run anytime soon to actually inject the interrupt into the guest. Or did I mis-understand what you are proposing? -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Fix kdump under KVM.
Marcelo Tosatti wrote: On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. The reason the PIT interrupt was fixed to BSP is: http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the destination overwrite in case its programmed by the guest to a single CPU would fix it? Ug, nasty. I definitely don't want to re-introduce that bug. What exactly do you mean by fix it? Do you mean fix the original RHEL-5.1 PAE issue, or fix the kdump issue? -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] x86: Make sure get_user_desc() doesn't sign extend.
The current implementation of get_user_desc() sign extends the return value because of integer promotion rules. For the most part, this doesn't matter, because the top bit of base2 is usually 0. If, however, that bit is 1, then the entire value will be 0x... which is probably not what the caller intended. This patch casts the entire thing to unsigned before returning, which generates almost the same assembly as the current code but replaces the final cltq (sign extend) with a mov %eax %eax (zero-extend). This fixes booting certain guests under KVM. (2nd resend, since no response to the last two submissions) Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/include/asm/desc.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index e8de2f6..617bd56 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc) static inline unsigned long get_desc_base(const struct desc_struct *desc) { - return desc-base0 | ((desc-base1) 16) | ((desc-base2) 24); + return (unsigned)(desc-base0 | ((desc-base1) 16) | ((desc-base2) 24)); } static inline void set_desc_base(struct desc_struct *desc, unsigned long base) -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5]: Fix kdump under KVM
This patch series aims to get kdump working inside a KVM guest. The current problem with using kdump is that KVM always delivers PIT interrupts to the BSP, and the BSP only. While this is technically allowed by the MPS spec, most motherboards actually deliver timer interrupts to *any* LAPIC in virtual wire mode. Since a crash can occur on any CPU, timer interrupts must be able to reach any CPU in order for kdump to work properly. Therefore, this patch series kicks all of the relevant vCPUs when delivering a timer interrupt. With these patches in place, kdump in a RHEL-5 guest works properly. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] Fix up some comments around the source tree.
Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 34b700f... ba61f27... M arch/x86/kvm/svm.c :100644 100644 38a2d20... cd6f92b... M virt/kvm/ioapic.c :100644 100644 bd44fb4... c22bc17... M virt/kvm/kvm_main.c arch/x86/kvm/svm.c |2 +- virt/kvm/ioapic.c |2 +- virt/kvm/kvm_main.c |2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 34b700f..ba61f27 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2608,7 +2608,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm) break; case SVM_EXITINTINFO_TYPE_EXEPT: /* In case of software exception do not reinject an exception - vector, but re-execute and instruction instead */ + vector, but re-execute an instruction instead */ if (is_nested(svm)) break; if (kvm_exception_is_soft(vector)) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 38a2d20..cd6f92b 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -164,7 +164,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.shorthand = 0; #ifdef CONFIG_X86 - /* Always delivery PIT interrupt to vcpu 0 */ + /* Always deliver PIT interrupt to vcpu 0 */ if (irq == 0) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic regiest since diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index bd44fb4..c22bc17 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu) } /* - * Creates some virtual cpus. Good luck creating more than one. + * Creates a virtual cpu. */ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) { -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] Remove references to VCPU in i8254
Conceptually, the i8254 is hooked to a PIC or IOAPIC. Therefore, this patch removes most references to vcpu in i8254.c. There are two exceptions to this: 1) In pit_timer_fn, we still have to kick the BSP to wake it out of idle. This will be changed in a later patch. 2) In __kvm_migrate_pit_timer, we have to migrate the PIT around with the BSP, since hrtimers work on a per-CPU basis. I've added a comment here to clarify why this is needed. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 fab7440... d5c08fa... M arch/x86/kvm/i8254.c :100644 100644 d4c1c7f... d7bc40b... M arch/x86/kvm/i8254.h :100644 100644 96dfbb6... ab3a56e... M arch/x86/kvm/irq.c :100644 100644 c025a23... e16b968... M arch/x86/kvm/irq.h :100644 100644 55c7524... ba39e25... M arch/x86/kvm/kvm_timer.h arch/x86/kvm/i8254.c | 50 ++--- arch/x86/kvm/i8254.h |4 ++- arch/x86/kvm/irq.c |4 +- arch/x86/kvm/irq.h |2 - arch/x86/kvm/kvm_timer.h |3 ++ 5 files changed, 50 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index fab7440..d5c08fa 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -33,6 +33,7 @@ #include irq.h #include i8254.h +#include kvm_timer.h #ifndef CONFIG_X86_64 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y)) @@ -227,12 +228,13 @@ static void pit_latch_status(struct kvm *kvm, int channel) } } -int pit_has_pending_timer(struct kvm_vcpu *vcpu) +int pit_has_pending_timer(struct kvm *kvm) { - struct kvm_pit *pit = vcpu-kvm-arch.vpit; + struct kvm_pit *pit = kvm-arch.vpit; - if (pit kvm_vcpu_is_bsp(vcpu) pit-pit_state.irq_ack) + if (pit pit-pit_state.irq_ack) return atomic_read(pit-pit_state.pit_timer.pending); + return 0; } @@ -252,6 +254,13 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) struct kvm_pit *pit = vcpu-kvm-arch.vpit; struct hrtimer *timer; + /* +* technically, the PIT isn't hooked to a particular VCPU; +* the logical structure is PIT - [IOA]PIC - CPU[s]. However, +* hrtimers expire on a per-cpu basis, and since we initially +* created the hrtimer during BSP creation, we move it around +* with the BSP. +*/ if (!kvm_vcpu_is_bsp(vcpu) || !pit) return; @@ -277,6 +286,33 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + int restart_timer = 0; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq)) + wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + restart_timer = 1; + } + + if (restart_timer) + return HRTIMER_RESTART; + else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -291,10 +327,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; @@ -705,10 +740,9 @@ static void __inject_pit_timer_intr(struct kvm *kvm) kvm_apic_nmi_wd_deliver(vcpu); } -void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu) +void kvm_inject_pit_timer_irqs(struct kvm *kvm) { - struct kvm_pit *pit = vcpu-kvm-arch.vpit; - struct kvm *kvm = vcpu-kvm; + struct kvm_pit *pit = kvm-arch.vpit; struct kvm_kpit_state *ps; if (pit) { diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h index d4c1c7f..d7bc40b 100644 --- a/arch/x86/kvm/i8254.h +++ b/arch/x86/kvm/i8254.h @@ -49,10 +49,12 @@ struct kvm_pit { #define KVM_MAX_PIT_INTR_INTERVAL HZ / 100 #define KVM_PIT_CHANNEL_MASK 0x3 -void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu); +int pit_has_pending_timer(struct kvm *kvm); +void kvm_inject_pit_timer_irqs(struct kvm *kvm); void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start); struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags); void kvm_free_pit(struct kvm *kvm
[PATCH 5/5] Fix kdump under KVM.
This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 d5c08fa... fe08823... M arch/x86/kvm/i8254.c :100644 100644 40f... 5b699c1... M arch/x86/kvm/lapic.c :100644 100644 40010b0... 9c4e52b... M arch/x86/kvm/lapic.h :100644 100644 053e49f... 975b0d6... M include/linux/kvm_host.h :100644 100644 cd6f92b... 717d265... M virt/kvm/ioapic.c :100644 100644 0d454d3... d24ac91... M virt/kvm/irq_comm.c arch/x86/kvm/i8254.c |3 +-- arch/x86/kvm/lapic.c | 12 arch/x86/kvm/lapic.h |1 + include/linux/kvm_host.h |2 ++ virt/kvm/ioapic.c|9 - virt/kvm/irq_comm.c | 12 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index d5c08fa..fe08823 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) if (ktimer-reinject || !atomic_read(ktimer-pending)) atomic_inc(ktimer-pending); - if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq)) - wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq); + kvm_irq_kick_vcpus(ktimer-kvm); if (ktimer-t_ops-is_periodic(ktimer)) { hrtimer_add_expires_ns(ktimer-timer, ktimer-period); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 40f..5b699c1 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu) kvm_apic_local_deliver(apic, APIC_LVT0); } +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu) +{ + if (kvm_lapic_enabled(vcpu)) { + u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + return 1; + } + + return 0; +} + static struct kvm_timer_ops lapic_timer_ops = { .is_periodic = lapic_is_periodic, }; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 40010b0..9c4e52b 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value); u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu); void kvm_apic_set_version(struct kvm_vcpu *vcpu); +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 053e49f..975b0d6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm, unsigned flags); void kvm_free_irq_routing(struct kvm *kvm); +void kvm_irq_kick_vcpus(struct kvm *kvm); + #else static inline void kvm_free_irq_routing(struct kvm *kvm) {} diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index cd6f92b..717d265 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.level = 1; irqe.shorthand = 0; -#ifdef CONFIG_X86 - /* Always deliver PIT interrupt to vcpu 0 */ - if (irq == 0) { - irqe.dest_mode = 0; /* Physical mode. */ - /* need to read apic_id from apic regiest since -* it can be rewritten */ - irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id; - } -#endif return kvm_irq_delivery_to_apic(ioapic-kvm, NULL, irqe); } diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 0d454d3..d24ac91 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm) kfree(kvm-irq_routing); } +void kvm_irq_kick_vcpus(struct kvm *kvm) +{ + int i; + struct kvm_vcpu *vcpu; + + kvm_for_each_vcpu(i, vcpu, kvm) { + if (kvm_apic_in_virtual_wire_mode(vcpu)) + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + } +} + static int setup_routing_entry(struct kvm_irq_routing_table *rt, struct kvm_kernel_irq_routing_entry *e
[PATCH 4/5] Remove timer.c
The code in arch/x86/kvm/timer.c is not similar enough between the various implementations to really share it. Move the implementation into the LAPIC code, and then remove timer.c Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 31a7035... 8d9adf6... M arch/x86/kvm/Makefile :100644 100644 ba39e25... 55b2dab... M arch/x86/kvm/kvm_timer.h :100644 100644 cd60c0b... 40f... M arch/x86/kvm/lapic.c :100644 00 72b5144... 000... D arch/x86/kvm/timer.c arch/x86/kvm/Makefile|3 +- arch/x86/kvm/kvm_timer.h |3 -- arch/x86/kvm/lapic.c | 34 - arch/x86/kvm/timer.c | 47 -- 4 files changed, 34 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 31a7035..8d9adf6 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ assigned-dev.o) kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o) -kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o timer.o +kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h index ba39e25..55b2dab 100644 --- a/arch/x86/kvm/kvm_timer.h +++ b/arch/x86/kvm/kvm_timer.h @@ -15,7 +15,4 @@ struct kvm_timer_ops { bool (*is_periodic)(struct kvm_timer *); }; - -enum hrtimer_restart kvm_timer_fn(struct hrtimer *data); - #endif diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index cd60c0b..40f 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1040,6 +1040,38 @@ static const struct kvm_io_device_ops apic_mmio_ops = { .write= apic_mmio_write, }; +static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data) +{ + struct kvm_vcpu *vcpu; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + int restart_timer = 0; + + vcpu = ktimer-vcpu; + if (!vcpu) + return HRTIMER_NORESTART; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + restart_timer = 1; + } + + if (restart_timer) + return HRTIMER_RESTART; + else + return HRTIMER_NORESTART; +} + int kvm_create_lapic(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic; @@ -1065,7 +1097,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); - apic-lapic_timer.timer.function = kvm_timer_fn; + apic-lapic_timer.timer.function = lapic_timer_fn; apic-lapic_timer.t_ops = lapic_timer_ops; apic-lapic_timer.kvm = vcpu-kvm; apic-lapic_timer.vcpu = vcpu; diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c deleted file mode 100644 index 72b5144..000 --- a/arch/x86/kvm/timer.c +++ /dev/null @@ -1,47 +0,0 @@ -#include linux/kvm_host.h -#include linux/kvm.h -#include linux/hrtimer.h -#include asm/atomic.h -#include kvm_timer.h - -static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer) -{ - int restart_timer = 0; - wait_queue_head_t *q = vcpu-wq; - - /* -* There is a race window between reading and incrementing, but we do -* not care about potentially loosing timer events in the !reinject -* case anyway. -*/ - if (ktimer-reinject || !atomic_read(ktimer-pending)) - atomic_inc(ktimer-pending); - - if (waitqueue_active(q)) - wake_up_interruptible(q); - - if (ktimer-t_ops-is_periodic(ktimer)) { - hrtimer_add_expires_ns(ktimer-timer, ktimer-period); - restart_timer = 1; - } - - return restart_timer; -} - -enum hrtimer_restart kvm_timer_fn(struct hrtimer *data) -{ - int restart_timer; - struct kvm_vcpu *vcpu; - struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); - - vcpu = ktimer-vcpu; - if (!vcpu) - return HRTIMER_NORESTART; - - restart_timer = __kvm_timer_fn(vcpu, ktimer); - if (restart_timer) - return HRTIMER_RESTART; - else - return HRTIMER_NORESTART; -} - -- 1.6.0.6 -- To unsubscribe from
[PATCH 2/5] Remove KVM_REQ_PENDING_TIMER.
KVM_REQ_PENDING_TIMER is set and cleared in a couple of places, but it never seems to be actually checked. Remove it. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 eea4043... 72b5144... M arch/x86/kvm/timer.c :100644 100644 2ef39062.. 93a65b4... M arch/x86/kvm/x86.c :100644 100644 bd5a616... 053e49f... M include/linux/kvm_host.h arch/x86/kvm/timer.c |5 + arch/x86/kvm/x86.c |1 - include/linux/kvm_host.h |1 - 3 files changed, 1 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c index eea4043..72b5144 100644 --- a/arch/x86/kvm/timer.c +++ b/arch/x86/kvm/timer.c @@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer) * not care about potentially loosing timer events in the !reinject * case anyway. */ - if (ktimer-reinject || !atomic_read(ktimer-pending)) { + if (ktimer-reinject || !atomic_read(ktimer-pending)) atomic_inc(ktimer-pending); - /* FIXME: this code should not know anything about vcpus */ - set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); - } if (waitqueue_active(q)) wake_up_interruptible(q); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2ef3906..93a65b4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3906,7 +3906,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) if (r = 0) break; - clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); if (kvm_cpu_has_pending_timer(vcpu)) kvm_inject_pending_timer_irqs(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..053e49f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -33,7 +33,6 @@ #define KVM_REQ_REPORT_TPR_ACCESS 2 #define KVM_REQ_MMU_RELOAD 3 #define KVM_REQ_TRIPLE_FAULT 4 -#define KVM_REQ_PENDING_TIMER 5 #define KVM_REQ_UNHALT 6 #define KVM_REQ_MMU_SYNC 7 #define KVM_REQ_KVMCLOCK_UPDATE8 -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Make sure get_user_desc() doesn't sign extend.
The current implementation of get_user_desc() sign extends the return value because of integer promotion rules. For the most part, this doesn't matter, because the top bit of base2 is usually 0. If, however, that bit is 1, then the entire value will be 0x... which is probably not what the caller intended. This patch casts the entire thing to unsigned before returning, which generates almost the same assembly as the current code but replaces the final cltq (sign extend) with a mov %eax %eax (zero-extend). This fixes booting certain guests under KVM. (resend; I got no response last time so I guess I sent it to the wrong places) Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/include/asm/desc.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index e8de2f6..617bd56 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc) static inline unsigned long get_desc_base(const struct desc_struct *desc) { - return desc-base0 | ((desc-base1) 16) | ((desc-base2) 24); + return (unsigned)(desc-base0 | ((desc-base1) 16) | ((desc-base2) 24)); } static inline void set_desc_base(struct desc_struct *desc, unsigned long base) -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Make sure get_user_desc() doesn't sign extend.
The current implementation of get_user_desc() sign extends the return value because of integer promotion rules. For the most part, this doesn't matter, because the top bit of base2 is usually 0. If, however, that bit is 1, then the entire value will be 0x... which is probably not what the caller intended. This patch casts the entire thing to unsigned before returning, which generates almost the same assembly as the current code but replaces the final cltq (sign extend) with a mov %eax %eax (zero-extend). This fixes booting certain guests under KVM. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/include/asm/desc.h |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index e8de2f6..617bd56 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc) static inline unsigned long get_desc_base(const struct desc_struct *desc) { - return desc-base0 | ((desc-base1) 16) | ((desc-base2) 24); + return (unsigned)(desc-base0 | ((desc-base1) 16) | ((desc-base2) 24)); } static inline void set_desc_base(struct desc_struct *desc, unsigned long base) -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix up vmx_set_segment for booting older guests.
If a guest happens to be unlucky enough to use an address such as 0xc000 in the CS base address field, the next attempt to VM enter will fail. This is because the vmcs_writel() that writes the base address into the VMCS will sign-extend the field to 64-bits, and the Intel manual states that bits 63:32 of this field *must* be 0. Use vmcs_write32() where appropriate. This fixes booting of an absolutely ancient Red Hat Linux 5.2 (not Enterprise Linux!) guest. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/vmx.c | 17 - 1 files changed, 16 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 364263a..311afd4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1846,7 +1846,22 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu, vmx-rmode.tr.ar = vmx_segment_access_rights(var); return; } - vmcs_writel(sf-base, var-base); + + /* Intel 64 and IA-32 Architecture Software Developer's Manual Vol. 3b, +* section 22.3.1.2 states that VMENTRY will fail if bits 63:32 of the +* base address for CS, SS, DS, ES are not 0 and the register is usable. +* +* If var-base happens to have bit 31 set, then it will get sign +* extended on the vmcs_writel(), causing this check to fail. Make +* sure to use the 32-bit version where appropriate. +*/ + if (sf-base == GUEST_CS_BASE || + ((~sf-ar_bytes 0x0001) (sf-base == GUEST_SS_BASE || + sf-base == GUEST_DS_BASE || + sf-base == GUEST_ES_BASE))) + vmcs_write32(sf-base, var-base); + else + vmcs_writel(sf-base, var-base); vmcs_write32(sf-limit, var-limit); vmcs_write16(sf-selector, var-selector); if (vmx-rmode.vm86_active var-s) { -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Print Guest VMCS state on vmexit failure
If we fail to handle a VMEXIT for some reason, print out a lot more debugging information about the state of the GUEST VMCS area. This does not fix a bug, but helps a lot when trying to track down the cause of a VMEXIT/VMENTRY failure. Signed-off-by: Chris Lalancette clala...@redhat.com --- arch/x86/kvm/vmx.c | 38 ++ 1 files changed, 38 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 311afd4..37b1682 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3452,6 +3452,14 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { static const int kvm_vmx_max_exit_handlers = ARRAY_SIZE(kvm_vmx_exit_handlers); +#define PRINT_GUEST_SEGMENT(seg) do { \ + printk(KERN_DEBUG #seg : SELECTOR 0x%lx, BASE 0x%lx, LIMIT 0x%lx, AR 0x%lx\n, \ + vmcs_readl(GUEST_##seg##_SELECTOR), \ + vmcs_readl(GUEST_##seg##_BASE), \ + vmcs_readl(GUEST_##seg##_LIMIT), \ + vmcs_readl(GUEST_##seg##_AR_BYTES)); \ + while(0) + /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -3512,6 +3520,36 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) else { vcpu-run-exit_reason = KVM_EXIT_UNKNOWN; vcpu-run-hw.hardware_exit_reason = exit_reason; + + printk(KERN_DEBUG GUEST STATE:\n); + printk(KERN_DEBUG CR0: 0x%lx\n, vmcs_readl(GUEST_CR0)); + printk(KERN_DEBUG CR3: 0x%lx\n, vmcs_readl(GUEST_CR3)); + printk(KERN_DEBUG CR4: 0x%lx\n, vmcs_readl(GUEST_CR4)); + printk(KERN_DEBUG VMENTRY CONTROL: 0x%lx\n, + vmcs_readl(VM_ENTRY_CONTROLS)); + printk(KERN_DEBUG DR7: 0x%lx\n, vmcs_readl(GUEST_DR7)); + printk(KERN_DEBUG SYSENTER ESP: 0x%lx\n, + vmcs_readl(GUEST_SYSENTER_ESP)); + printk(KERN_DEBUG SYSENTER EIP: 0x%lx\n, + vmcs_readl(GUEST_SYSENTER_EIP)); + + PRINT_GUEST_SEGMENT(CS); + PRINT_GUEST_SEGMENT(SS); + PRINT_GUEST_SEGMENT(DS); + PRINT_GUEST_SEGMENT(ES); + PRINT_GUEST_SEGMENT(FS); + PRINT_GUEST_SEGMENT(GS); + PRINT_GUEST_SEGMENT(TR); + PRINT_GUEST_SEGMENT(LDTR); + + printk(KERN_DEBUG GDTR: BASE 0x%lx, LIMIT 0x%lx, + vmcs_readl(GUEST_GDTR_BASE), + vmcs_readl(GUEST_GDTR_LIMIT)); + printk(KERN_DEBUG IDTR: BASE 0x%lx, LIMIT 0x%lx, + vmcs_readl(GUEST_IDTR_BASE), + vmcs_readl(GUEST_IDTR_LIMIT)); + printk(KERN_DEBUG RIP: 0x%lx\n,vmcs_readl(GUEST_RIP)); + printk(KERN_DEBUG RFLAGS: 0x%lx\n,vmcs_readl(GUEST_RFLAGS)); } return 0; } -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Print Guest VMCS state on vmexit failure
Avi Kivity wrote: On 10/20/2009 04:50 PM, Chris Lalancette wrote: If we fail to handle a VMEXIT for some reason, print out a lot more debugging information about the state of the GUEST VMCS area. This does not fix a bug, but helps a lot when trying to track down the cause of a VMEXIT/VMENTRY failure. register state can just as easily be examined in the qemu monitor. Ah, true. OK, forget this patch. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Print Guest VMCS state on vmexit failure
Nikola Ciprich wrote: Hi, maybe it's stupid question, but is this available also when qemu/kvm is started using libvirt stuff? I think it uses monitor so it's inaccessible for user no? Yes and no. The monitor is inaccessible when using libvirt, but I totally forgot that qemu dumps the register state to stderr before abort()'ing on an unknown vm exit. Libvirt takes the output from stderr and stores it in /var/log/libvirt/qemu/guestname. So you would still be able to see this output when using libvirt. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix up vmx_set_segment for booting older guests.
Avi Kivity wrote: +else +vmcs_writel(sf-base, var-base); vmcs_write32(sf-limit, var-limit); I think the correct fix is to zero extend in vmcs_writel() rather than here. But as far as I can tell, it already does. Where does the sign extension occur? Perhaps in userspace? Very good question Avi. I should have dug a bit deeper before posting. I traced this further back, and here's what it looks like is going on: arch/x86/kvm/x86.c:kvm_load_segment_descriptor() is responsible for loading the CPU segment descriptor into the VMCS area. It does this by calling load_segment_descriptor_to_kvm_desct(), doing a few minor transformations of the data, then calling kvm_set_segment() to load it into the VMCS. The problem arises in load_segment_descriptor_to_kvm_desct() - seg_desct_to_kvm_desct(). seg_desct_to_kvm_desct() takes the struct desc_struct (in this case, base0 == 0x0, base1 == 0x0, and base2 == 0xc0), then calls get_desc_base() and stores the result in the struct kvm_segment. The return value from get_desc_base is It's here that the sign-extension occurs, which eventually causes that VM entry failure. get_desc_base() sign-extends because of some complicated u8 to unsigned rules that I'm not completely sure of. The below patch fixes my original issue, but I'm not at all sure that this is the right thing to do. I could also change get_desc_base() itself to do the casting, which should do the right thing for all callers, but I'm not sure if that's what all callers want. Anybody else have an opinion? diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a93ba29..b58bda2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3997,7 +3997,7 @@ static void kvm_set_segment(struct kvm_vcpu *vcpu, static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector, struct kvm_segment *kvm_desct) { - kvm_desct-base = get_desc_base(seg_desc); + kvm_desct-base = (unsigned)get_desc_base(seg_desc); kvm_desct-limit = get_desc_limit(seg_desc); if (seg_desc-g) { kvm_desct-limit = 12; -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is AMD rev F the same thing as socket F?
Neil Aggarwal wrote: I forgot to mention that I have an AMD Opteron 2210 processor. When I get cat /proc/cpuinfo, I get this output for the first core (The second core is the same): vendor_id : AuthenticAMD cpu family : 15 ^^ means that you have a Rev F (cpu_family 15 == 0xf in hex). -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] add sysctl for kvm wallclock sync
Glauber Costa wrote: diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index fc409e9..3a4e1bd 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -27,7 +27,7 @@ #define KVM_SCALE 22 static int kvmclock = 1; -static unsigned int kvm_wall_update_interval = 5; +unsigned int kvm_wall_update_interval = 5; I think the overall idea is very interesting, but I also think that it should be disabled by default. Because of the problems with time in virtualization, people are already conditioned to run ntpd inside their guests, and this kvmclock change will fight with ntpd. Also, the command # date 09091323 (or whatever) ceases to work like it does on bare-metal, so I think it has to be an opt-in feature. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Always use KVM_VERSION to build version number
Avi, Trying to use libvirt with development snapshots of qemu-kvm is a bit problematic. The trouble is that for all development snapshots, the value that gets placed into this string: QEMU PC emulator version 0.10.0 (kvm-devel), Copyright (c) 2003-2008 Is always kvm-devel. That means we can't tell if this is a kvm development snapshot built yesterday, or 6 months ago, which means that in turn we can't tell what features are available. While we can always tell people building their own qemu to force it by echoing a value into KVM_VERSION, it would be much better if this were done by default. Something like kvm-88-devel, which would signify that this the development happening after kvm-88, leading towards kvm-89. Would you accept something like the patch below, which would require you to edit the KVM_VERSION file twice during a release (once right before the release, to change it to kvm-89, and once right after the release to change it back to kvm-89-devel)? Signed-off-by: Chris Lalancette clala...@redhat.com diff --git a/KVM_VERSION b/KVM_VERSION new file mode 100644 index 000..efd3e0e --- /dev/null +++ b/KVM_VERSION @@ -0,0 +1 @@ +kvm-88-devel
Problem with X on 32 bit guest on 64-bit host
All, I've been trying to track down this problem with starting X on a 32-bit guest on a 64-bit host, and I've hit a bit of a wall. Let me describe the setup: Host: AMD Barcelona machine, 16GB memory, 8 cores, running 2.6.29-rc2, kvm-userspace 3f7cba35281a5b2dba008179a4979d737105574d Guest: RHEL-5 32-bit guest, single VCPU. The problem is that inside the 32-bit guest, X refuses to start. Now, on an Intel platform I have hanging around here, this works just fine; I copy the guest over, start it up, and X starts right up. Also, on the Barcelona, with a 64-bit RHEL-5 guest, X starts fine. I've done quite a bit of tracing inside the guest, and from the guest's perspective, something just isn't right. When X is trying to start, one thing it does is copy a BIOS region from /dev/mem into a shared memory region mapped at 0 inside the X process. The page fault for the access to the memory region at 0 works just fine, but the very next page fault that is injected is completely bogus; it's either TASK_SIZE (which is 0xc000), or has bogus VMA flags set, etc. Going further, what actually happens is that X uses glibc's optimized memcpy routine, which, in assembly, looks like this: (gdb) disass memcpy Dump of assembler code for function memcpy: 0x00387090 memcpy+0: mov0xc(%esp),%ecx 0x00387094 memcpy+4: mov%edi,%eax 0x00387096 memcpy+6: mov0x4(%esp),%edi 0x0038709a memcpy+10: mov%esi,%edx 0x0038709c memcpy+12: mov0x8(%esp),%esi 0x003870a0 memcpy+16: cld 0x003870a1 memcpy+17: shr%ecx 0x003870a3 memcpy+19: jae0x3870a6 memcpy+22 0x003870a5 memcpy+21: movsb %ds:(%esi),%es:(%edi) 0x003870a6 memcpy+22: shr%ecx 0x003870a8 memcpy+24: jae0x3870ac memcpy+28 0x003870aa memcpy+26: movsw %ds:(%esi),%es:(%edi) 0x003870ac memcpy+28: rep movsl %ds:(%esi),%es:(%edi) 0x003870ae memcpy+30: mov%eax,%edi 0x003870b0 memcpy+32: mov%edx,%esi 0x003870b2 memcpy+34: mov0x4(%esp),%eax 0x003870b6 memcpy+38: ret If I replace that optimized memcpy routine with my own, stupid memcpy (basically just dst[i] = src[i] in a loop), everything works fine, and doesn't get the bogus page fault. In turn, that leads me to suspect that the rep command is actually not being emulated properly on the host side, but I'm not quite sure of that, nor am I sure where to go from here. Does anybody have any ideas of what I can do to further track this down? -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Trivial qemu/configure fix
If kvm is *not* detected, then the qemu/configure script gets upset. Add some quotes to make it happier. Signed-off-by: Chris Lalancette clala...@redhat.com diff --git a/qemu/configure b/qemu/configure index ff4a462..107699a 100755 --- a/qemu/configure +++ b/qemu/configure @@ -1642,7 +1642,7 @@ disable_cpu_emulation() { } configure_kvm() { - if test $kvm = yes -a $target_softmmu = yes -a \ + if test $kvm = yes -a $target_softmmu = yes -a \ \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then echo #define USE_KVM 1 $config_h echo USE_KVM=1 $config_mak
Re: [PATCH 0/2] kvm: disable virtualization on kdump
Eric W. Biederman wrote: svm can writeback into memory at odd times if we don't do this, and the cost is small - clear a bit in EFER. There's no reason to be lazy. Especially if we can clear that bit unconditionally (when EFER is present) I'm all for it. That is the case. Cool. I forget if we have to test for EFER on 32bit, or if we can just wrmsr and be done with it. Regardless that sounds easy to do on the kexec path. I'm pretty sure you have to test for it first; pre-64 bit x86 hardware doesn't have the EFER register, so you'll fault on access. On the other hand pre-64 bit x86 hardware doesn't usually (ever?) have VT extensions either. -- Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 1/10] Refactor QEMUFile for live migration
Anthony Liguori wrote: snip void qemu_fflush(QEMUFile *f) { -if (!f-is_writable) +if (!f-put_buffer) return; + if (f-buf_index 0) { -if (f-is_file) { -fseek(f-outfile, f-buf_offset, SEEK_SET); -fwrite(f-buf, 1, f-buf_index, f-outfile); -} else { -bdrv_pwrite(f-bs, f-base_offset + f-buf_offset, -f-buf, f-buf_index); -} + f-put_buffer(f-opaque, f-buf, f-buf_offset, f-buf_index); Nit...whitespace damage. ... Overall, seems to be a good refactoring, and seems to keep the original semantics of qemu_fopen_bdrv() and qemu_fopen(). Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 3/10] Add bdrv_flush_all()
Anthony Liguori wrote: This patch adds a bdrv_flush_all() function. It's necessary to ensure that all IO operations have been flushed to disk before completely a live migration. N.B. we don't actually use this now. We really should flush the block drivers using an live savevm callback to avoid unnecessary guest down time. Simple enough, and follows the pattern in the KVM migration. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vmport: unknown command 13
Elmar Haneke wrote: After movong from KVM-72 to KVM-73 I do get the Notice vmport: unknown command 13 The Message appears on starting emulation. In an Netboot environment it does appear before booting from Network is asked. What might go wrong here? It's actually harmless; it's the Bochs BIOS trying to access the vmport function 13, which means give me your UUID. However, upstream Qemu (and hence, KVM) doesn't support this function, so that's where the error message comes from. If I understand it correctly, upstream Qemu now has suppressed this warning message, so the next time KVM syncs up, the message will disappear. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Migrating Vm's from one machine to another...and back
Jean-Pierre Dion wrote: Hi, I made some VM migrations between several machines (Intel-Intel and Intel-AMD) and when one VM has been migrated onto a second machine, I cannot put it back on the first one. I talked with Laurent and it seems that there is no way (or no obvious one) to do that. This could be useful when doing some tasks (load balancing...). Is it an item that is interesting enough to be done ? And does anyone have some idea on how to do that ? Hm, this seems to work for me, in combination with my recently posted Implement tcp nowait option patch. Basically you start the guest on machine A like normal. Then you start the container on machine B with the -incoming flags. Now you migrate the guest over to machine B (I assume you've gotten this far). Now, you kill the container on machine A (since the running guest is on B now), and restart it with -incoming. Now you migrate back from B to A, and it all works out. You can actually probably get rid of one of these steps by starting the machine on machine A in the first place with -S -incoming tcp://0:,nowait, and then running cont in the monitor. In this case, the machine is already running, and *also* listening for incoming migrate requests. Now you do exactly the same on machine B, and you should be able to migrate back and forth between them at will. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: Implement tcp nowait option for migration
Chris Lalancette wrote: Sometimes you want to be able to start up the receiving side of a live migration and actually be able to run monitor commands before you do the migration. Libvirt, in particular, wants to do this for setting up the migration. This patch implements a nowait option to the receiving side so that you start up the receiving side similar to: qemu-kvm -M pc -S blah blah -incoming tcp://0:,nowait Then you are able to interact with the monitor before the live migration takes place. Anthony, Any comments on this patch? I'd like to try to get this in the KVM tree once Avi gets back, so I would prefer to do any iterations/cleanups now. Thanks, Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH]: Add a migrate_incoming monitor option
Daniel P. Berrange wrote: An accept trick only handles the TCP case though. I know this was Chris' example that we're currently using, but we intend to switch to passing an open file descriptor instead, and proxying the data via a secure channel instead of the plain tcp, or builtin SSH tunnelling. With an open FD we'll need another way - I guess just registering a event on POLLIN might do the trick. It seems to me that just having an explicit migrate incoming monitor command would be simpler than having to code different triggers to implement 'nowait' for each transport we have. I just finished a patch to do the nowait for tcp as suggested by Anthony, and it was actually far less code then I feared. It just needed a little re-factoring of the migrate_incoming_tcp() function. I'll be posting that in a couple of minutes. Given how this code worked out, doing a nowait for the fd style is really no big deal. Both ways (explicit monitor command and nowait) seem to have their benefits; the monitor command has the benefit of being more flexible, while the nowait has the benefit of being similar to already existing Qemu commands. My preference is for the monitor command since it seems a little more natural for a management tool, but the nowait is clearly an option as well. I'll also post a cleanup patch with Dan's suggestion for the monitor patch, so both implementations will be available. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Implement tcp nowait option for migration
Sometimes you want to be able to start up the receiving side of a live migration and actually be able to run monitor commands before you do the migration. Libvirt, in particular, wants to do this for setting up the migration. This patch implements a nowait option to the receiving side so that you start up the receiving side similar to: qemu-kvm -M pc -S blah blah -incoming tcp://0:,nowait Then you are able to interact with the monitor before the live migration takes place. Signed-off-by: Chris Lalancette [EMAIL PROTECTED] diff --git a/qemu/migration.c b/qemu/migration.c index a64a287..d16e289 100644 --- a/qemu/migration.c +++ b/qemu/migration.c @@ -886,13 +886,10 @@ static int migrate_incoming_fd(int fd) return ret; } -static int migrate_incoming_tcp(const char *host) +static int migrate_listen_tcp(const char *host, int *outfd) { struct sockaddr_in addr; -socklen_t addrlen = sizeof(addr); -int fd, sfd; -ssize_t len; -uint8_t status = 0; +int fd = -1; int reuse = 1; int rc; @@ -928,19 +925,43 @@ static int migrate_incoming_tcp(const char *host) goto error_socket; } +*outfd = fd; + +return 0; + +error_socket: +close(fd); +error: +return rc; +} + +struct migrate_tcp_data { +int listen_fd; +int rc; +}; + +static void migrate_incoming_tcp(void *opaque) +{ +struct sockaddr_in addr; +socklen_t addrlen = sizeof(addr); +struct migrate_tcp_data *data = (struct migrate_tcp_data *)opaque; +int sfd; +ssize_t len; +uint8_t status = 0; + again: -sfd = accept(fd, (struct sockaddr *)addr, addrlen); +sfd = accept(data-listen_fd, (struct sockaddr *)addr, addrlen); if (sfd == -1) { if (errno == EINTR) goto again; perror(accept() failed); -rc = MIG_STAT_DST_ACCEPT_FAILED; +data-rc = MIG_STAT_DST_ACCEPT_FAILED; goto error_socket; } -rc = migrate_incoming_fd(sfd); -if (rc != 0) { -fprintf(stderr, migrate_incoming_fd failed (rc=%d)\n, rc); +data-rc = migrate_incoming_fd(sfd); +if (data-rc != 0) { +fprintf(stderr, migrate_incoming_fd failed (rc=%d)\n, data-rc); goto error_accept; } @@ -951,13 +972,13 @@ send_ack: if (len != 1) { fprintf(stderr, migration: send_ack: write error len=%zu (%s)\n, len, strerror(errno)); -rc = MIG_STAT_DST_WRITE_FAILED; +data-rc = MIG_STAT_DST_WRITE_FAILED; goto error_accept; } -rc = wait_for_message(WAIT FOR GO, sfd, wait_for_message_timeout); -if (rc) { -rc += 200; +data-rc = wait_for_message(WAIT FOR GO, sfd, wait_for_message_timeout); +if (data-rc) { +data-rc += 200; goto error_accept; } @@ -966,7 +987,7 @@ wait_for_go: if (len == -1 errno == EAGAIN) goto wait_for_go; if (len != 1) { -rc = MIG_STAT_DST_READ_FAILED; +data-rc = MIG_STAT_DST_READ_FAILED; fprintf(stderr, migration: wait_for_go: read error len=%zu (%s)\n, len, strerror(errno)); } @@ -974,9 +995,10 @@ wait_for_go: error_accept: close(sfd); error_socket: -close(fd); -error: -return rc; +qemu_set_fd_handler(data-listen_fd, NULL, NULL, NULL); +close(data-listen_fd); + +qemu_free(data); } int migrate_incoming(const char *device) @@ -996,16 +1018,57 @@ int migrate_incoming(const char *device) } } else if (strstart(device, tcp://, ptr)) { char *host, *end; + struct migrate_tcp_data *data; + int is_waitconnect = 1; + host = strdup(ptr); + if (!host) + goto fail; end = strchr(host, '/'); if (end) *end = 0; - ret = migrate_incoming_tcp(host); + + data = qemu_mallocz(sizeof(struct migrate_tcp_data)); + if (!data) { + qemu_free(host); + goto fail; + } + + ptr = host; + while((ptr = strchr(ptr,','))) { + ptr++; + if (!strncmp(ptr,nowait,6)) { + is_waitconnect = 0; + } else { + printf(Unknown option: %s\n, ptr); + qemu_free(host); + goto fail; + } + } + + ret = migrate_listen_tcp(host, (data-listen_fd)); qemu_free(host); + if (ret != 0) + goto fail; + + /* + * if we made it here, then migrate_incoming_tcp is responsible for + * freeing the data structure + */ + if (!is_waitconnect) { + socket_set_nonblock(data-listen_fd); + qemu_set_fd_handler(data-listen_fd, migrate_incoming_tcp, NULL, data); + } + else { + migrate_incoming_tcp(data); + ret = data-rc; + } + } else { errno = EINVAL; ret = MIG_STAT_DST_INVALID_PARAMS; } + fail: return ret; }
Re: [PATCH]: Add a migrate_incoming monitor option
Daniel P. Berrange wrote: @@ -9673,11 +9675,16 @@ int main(int argc, char **argv) if (incoming) { int rc; -rc = migrate_incoming(incoming); -if (rc != 0) { -fprintf(stderr, Migration failed rc=%d\n, rc); -exit(rc); -} +if (strncmp(incoming, monitor, 7) == 0) { +incoming_monitor = 1; +} +else { +rc = migrate_incoming(incoming); +if (rc != 0) { +fprintf(stderr, Migration failed rc=%d\n, rc); +exit(rc); +} +} Rather than putting the strncmp(monitor) into vl.c, I'd just leave this part as is. Put the logic into the 'migrate_incoming()' method so that it just sets the 'incoming_monitor' flag and then returns immediately. That would allwo the 'incoming_Monitor' flag to be declared static to the migrate.c file, instead of polluting vl.c Actually, that won't quite work. We still need to share the incoming_monitor flag between migration.c and monitor.c. However, your suggestion is better in that this is a migration-specific flag, so I'll move it over like you suggest. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Add a migrate_incoming monitor option
We've been trying to plumb libvirt to do KVM migration. One of the stumbling blocks we are running into, however, is that libvirt expects to be able to use the Qemu monitor both before and after migration has taken place, on both the source and destination nodes. After migration has taken place is no problem; we return to the main qemu select() loop, and we can run monitor commands. However, before migration, on the destination side, when we start qemu with a command-line like: qemu-kvm -M pc -S blah blah -incoming tcp://0: we can't run any monitor commands since the migration code is synchronously waiting for an incoming tcp connection. To get around this, the following patch adds a new monitor command called migrate_incoming; it takes all of the same parameters as the command-line option, but just starts it later. To make sure it is safe, you actually have to start with -incoming monitor; if you run it without that, it will just spit an error at you. So with this in place, libvirt can do the equivalent of: qemu-kvm -M pc -S blah blah -incoming monitor (qemu) info cpus ...other commands (qemu) migrate_incoming tcp://0: ...wait for migration to start, and then complete (qemu) info block ...etc. Signed-off-by: Chris Lalancette [EMAIL PROTECTED] diff --git a/qemu/monitor.c b/qemu/monitor.c index 20dcca6..c11b82c 100644 --- a/qemu/monitor.c +++ b/qemu/monitor.c @@ -504,6 +504,25 @@ static void do_cont(void) vm_start(); } +static void do_migrate_incoming(const char *incom) +{ +extern int incoming_monitor; + +if (!incoming_monitor) { +term_printf(FAIL: Can only use the migrate-incoming command with -incoming monitor\n); +} +else { +int rc; + +rc = migrate_incoming(incom); + if (rc != 0) { +fprintf(stderr, Migration failed rc=%d\n, rc); +exit(rc); + } +vm_start(); +} +} + #ifdef CONFIG_GDBSTUB static void do_gdbserver(const char *port) { @@ -1486,6 +1505,7 @@ static term_cmd_t term_cmds[] = { , cancel the current VM migration }, { migrate_set_speed, s, do_migrate_set_speed, value, set maximum speed (in bytes) for migrations }, +{ migrate_incoming, s, do_migrate_incoming, incom, incoming string }, { cpu_set, is, do_cpu_set_nr, cpu [online|offline], change cpu state }, #if defined(TARGET_I386) || defined(TARGET_X86_64) { drive_add, iss, drive_hot_add, pcibus pcidevfn [file=file][,if=type][,bus=n]\n diff --git a/qemu/vl.c b/qemu/vl.c index e1762ee..9b5f113 100644 --- a/qemu/vl.c +++ b/qemu/vl.c @@ -229,6 +229,7 @@ int cursor_hide = 1; int graphic_rotate = 0; int daemonize = 0; const char *incoming; +int incoming_monitor = 0; const char *option_rom[MAX_OPTION_ROMS]; int nb_option_roms; int semihosting_enabled = 0; @@ -9673,11 +9675,16 @@ int main(int argc, char **argv) if (incoming) { int rc; -rc = migrate_incoming(incoming); -if (rc != 0) { -fprintf(stderr, Migration failed rc=%d\n, rc); -exit(rc); - } +if (strncmp(incoming, monitor, 7) == 0) { +incoming_monitor = 1; +} +else { +rc = migrate_incoming(incoming); +if (rc != 0) { +fprintf(stderr, Migration failed rc=%d\n, rc); +exit(rc); +} +} } {
Re: Live Migration fails
Daniel Hasler wrote: After a while, on host B (the target), I see the following error: migration: wait_for_go: read error len=0 (Interrupted system call) Migration failed rc=210 while on host A, I read following error message: WAIT FOR ACK: timeout reached Migration failed! ret=0 error=13 No, this is a bug I just fixed yesterday in Qemu proper. The next time that Avi rebases KVM to Qemu (which he said he would do as soon as the fix went into Qemu), then this should be fixed. FYI, the problem is basically that the i2c restore on the destination side was expecting 4 bytes, while the i2c save on the sending side was only sending 1 byte, so they eventually timed out while waiting for each other. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fresh install of Windows XP hangs early in boot?
Roland Dreier wrote: BTW I tried using if=ide to install Windows XP and got a blue screen during the installer. What are people doing to run XP in a kvm guest? Hm, your comment later on makes me think you tried this on AMD. If so, I have also run into a similar problem with Windows guests under AMD. After installing WinDbg, it told me that it was a Paging Request in Non-Paged memory related to the Video memory area. Does yours look similar to that? I have not had time to track it further than that, though. Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Fix i2c_bus_save, which fixes KVM live migration
Attached is a simple patch to make i2c_bus_save() put a 32-bit quantity in the save file, which matches what i2c_bus_load() expects to pull out of the save file later. Without this fix in place, KVM live migration fails since the sender is only sending 1 byte while the receiver is waiting to receive 4 bytes. Avi, I don't know when you plan to next rebase KVM to upstream QEMU, but it's probably a good idea to carry this patch so that live migration works at all. Signed-off-by: Chris Lalancette [EMAIL PROTECTED] diff --git a/qemu/hw/i2c.c b/qemu/hw/i2c.c index 5d283fb..f711db7 100644 --- a/qemu/hw/i2c.c +++ b/qemu/hw/i2c.c @@ -21,7 +21,7 @@ static void i2c_bus_save(QEMUFile *f, void *opaque) { i2c_bus *bus = (i2c_bus *)opaque; -qemu_put_byte(f, bus-current_dev ? bus-current_dev-address : -1); +qemu_put_be32(f, bus-current_dev ? bus-current_dev-address : -1); } static int i2c_bus_load(QEMUFile *f, void *opaque, int version_id)
[PATCH]: Fake emulate Intel perfctr MSRs
Attached is a patch for fake emulating Intel perfctr MSRs, similar to the recent patch to fake emulate the AMD perfctr MSRs. This is needed for a reason similar for the previous patch; older linux guests (in this case, 2.6.9) can attempt to access the MSR's without a fixup section, and injecting a GPF kills the guest. Tested by me on RHEL-4 i386 and x86_64 guests, as well as F-9 guests. Signed-off-by: Chris Lalancette [EMAIL PROTECTED] diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index aaa99ed..f28789e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -29,6 +29,7 @@ #include asm/io.h #include asm/desc.h +#include asm/intel_arch_perfmon.h #define __ex(x) __kvm_handle_fault_on_reboot(x) @@ -916,6 +917,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) case MSR_IA32_TIME_STAMP_COUNTER: guest_write_tsc(data); break; + case MSR_ARCH_PERFMON_EVENTSEL0: + case MSR_ARCH_PERFMON_EVENTSEL1: + case MSR_ARCH_PERFMON_PERFCTR0: + case MSR_ARCH_PERFMON_PERFCTR1: + /* + * Just discard all writes to the performance counters; this + * should keep both older linux and windows 64-bit guests + * happy + */ + pr_unimpl(vcpu, unimplemented perfctr wrmsr: 0x%x data 0x%llx\n, msr_index, data); + + break; default: msr = find_msr_entry(vmx, msr_index); if (msr) {
Re: [Fwd: [PATCH]: Fix crash in virtio_blk during modprobe ; rmmod ; modprobe]
Jens Axboe wrote: On Thu, May 22 2008, Rusty Russell wrote: On Wednesday 21 May 2008 23:12:39 Chris Lalancette wrote: diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 4962e62..c678ac5 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -294,6 +294,7 @@ static void virtblk_remove(struct virtio_device *vdev) vdev-config-reset(vdev); blk_cleanup_queue(vblk-disk-queue); + del_gendisk(vblk-disk); put_disk(vblk-disk); unregister_blkdev(major, virtblk); mempool_destroy(vblk-pool); Thanks Chris, it seems reasonable and I'm sure it works (kinda hard to test here, it's my root block dev). Other drivers seem to do blk_cleanup_queue() *after* del_gendisk: does it matter? Jens CC'd: he's gentle with my dumb questions... Rusty. del_gendisk() can generate IO, so it would seem safer to do that _before_ putting the queue reference :-) Ah, good to know. Out of curiousity, why would del_gendisk() generate additional I/O? Chris Lalancette -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH]: Fix silly output for virtio devices in /proc/interrupts
register_virtio_device() is doing something silly by overwriting what the caller put into .bus_id. This causes the interrupt line for all virtio devices to show up as 0, 1, etc. in /proc/interrupts. The attached patch fixes it. Signed-off-by: Chris Lalancette [EMAIL PROTECTED] commit cb97605728fc1d7a15ddc6ab689e8bcd23871133 Author: Chris Lalancette [EMAIL PROTECTED] Date: Thu May 15 09:04:55 2008 -0400 register_virtio_device was doing something silly, in that it was overwriting what the calling driver stuck into .bus_id for the name. This caused problems in the output of /proc/interrupts, since when you request_irq(), it doesn't actually copy the devname you pass in but just stores a pointer to the data. The fix is to just not have register_virtio_device do anything with the bus_id, and assume the higher level driver set it up properly. diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 138a7f0..1556ac2 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -128,7 +128,6 @@ int register_virtio_device(struct virtio_device *dev) int err; dev-dev.bus = virtio_bus; - sprintf(dev-dev.bus_id, %u, dev-index); /* We always start by resetting the device, in case a previous * driver messed it up. This also tests that code path a little. */