[PATCH] Search the LAPIC's for one that will accept a PIC interrupt.

2010-06-21 Thread Chris Lalancette
Older versions of 32-bit linux have a Checking 'hlt' instruction
test where they repeatedly call the 'hlt' instruction, and then
expect a timer interrupt to kick the CPU out of halt.  This happens
before any LAPIC or IOAPIC setup happens, which means that all of
the APIC's are in virtual wire mode at this point.  Unfortunately,
the current implementation of virtual wire mode is hardcoded to
only kick the BSP, so if a crash+kexec occurs on a different
vcpu, it will never get kicked.

This patch makes pic_unlock() do the equivalent of
kvm_irq_delivery_to_apic() for the IOAPIC code.  That is, it runs
through all of the vcpus looking for one that is in virtual wire
mode.  In the normal case where LAPICs and IOAPICs are configured,
this won't be used at all.  In the bootstrap phase of a modern
OS, before the LAPICs and IOAPICs are configured, this will have
exactly the same behavior as today; VCPU0 is always looked at
first, so it will always get out of the loop after the first
iteration.  This will only go through the loop more than once
during a kexec/kdump, in which case it will only do it a few times
until the kexec'ed kernel programs the LAPIC and IOAPIC.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8259.c |   17 +
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 2c73f44..85ecabc 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -44,16 +44,25 @@ static void pic_unlock(struct kvm_pic *s)
__releases(s-lock)
 {
bool wakeup = s-wakeup_needed;
-   struct kvm_vcpu *vcpu;
+   struct kvm_vcpu *vcpu, *found = NULL;
+   int i;
 
s-wakeup_needed = false;
 
raw_spin_unlock(s-lock);
 
if (wakeup) {
-   vcpu = s-kvm-bsp_vcpu;
-   if (vcpu)
-   kvm_vcpu_kick(vcpu);
+   kvm_for_each_vcpu(i, vcpu, s-kvm) {
+   if (kvm_apic_accept_pic_intr(vcpu)) {
+   found = vcpu;
+   break;
+   }
+   }
+
+   if (!found)
+   found = s-kvm-bsp_vcpu;
+
+   kvm_vcpu_kick(found);
}
 }
 
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/3]: Fixes to IRQ routing

2010-06-16 Thread Chris Lalancette
As we've discussed previously, here is a series of patches to
fix some of the IRQ routing issues we have in KVM.  With this series
in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6
32- and 64-bit guest on CPU's other than the BSP.  RHEL-5 32-bit kdump still
does not work; it gets stuck on Checking 'hlt' instruction.  However,
it does that both before and after this series, so there is something
else going on there that I still have to debug.

I also need to change the kvm_migrate_pit_timer function to migrate the
timer over to the last CPU that handled the timer interrupt, on the
theory that that particlar CPU is likely to handle the timer interrupt again
in the near future.  However, this is an optimization and shouldn't delay the
inclusion of the rest of the series for correctness.

Changes since RFC:
 - Changed ps-inject_lock from raw_spinlock_t to spinlock_t
 - Fixed up some formatting issues
 - Changed to have one PIT workqueue per-guest
 - Remember to cancel_work_sync when destroying the PIT

Changes since v1:
 - Call cancel_work_sync everywhere we call hrtimer_cancel
 - Bring back the reinjection logic
 - Fix up formatting issues from checkpatch

Changes since v2:
 - Fix up the reinjection logic thanks to review from Gleb and Marcelo
   Tested with -no-kvm-pit-reinjection on a RHEL-3 guest

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/3] Introduce a workqueue to deliver PIT timer interrupts.

2010-06-16 Thread Chris Lalancette
We really want to kvm_set_irq during the hrtimer callback,
but that is risky because that is during interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8254.c |  141 ++
 arch/x86/kvm/i8254.h |4 +-
 arch/x86/kvm/irq.c   |1 -
 3 files changed, 88 insertions(+), 58 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 188d827..467cc47 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/workqueue.h
 
 #include irq.h
 #include i8254.h
@@ -244,11 +245,22 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier 
*kian)
 {
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 irq_ack_notifier);
-   raw_spin_lock(ps-inject_lock);
-   if (atomic_dec_return(ps-pit_timer.pending)  0)
+   int value;
+
+   spin_lock(ps-inject_lock);
+   value = atomic_dec_return(ps-pit_timer.pending);
+   if (value  0)
+   /* spurious acks can be generated if, for example, the
+* PIC is being reset.  Handle it gracefully here
+*/
atomic_inc(ps-pit_timer.pending);
+   else if (value  0)
+   /* in this case, we had multiple outstanding pit interrupts
+* that we needed to inject.  Reinject
+*/
+   queue_work(ps-pit-wq, ps-pit-expired);
ps-irq_ack = 1;
-   raw_spin_unlock(ps-inject_lock);
+   spin_unlock(ps-inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -264,10 +276,10 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
-static void destroy_pit_timer(struct kvm_timer *pt)
+static void destroy_pit_timer(struct kvm_pit *pit)
 {
-   pr_debug(execute del timer!\n);
-   hrtimer_cancel(pt-timer);
+   hrtimer_cancel(pit-pit_state.pit_timer.timer);
+   cancel_work_sync(pit-expired);
 }
 
 static bool kpit_is_periodic(struct kvm_timer *ktimer)
@@ -281,6 +293,60 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static void pit_do_work(struct work_struct *work)
+{
+   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
+   struct kvm *kvm = pit-kvm;
+   struct kvm_vcpu *vcpu;
+   int i;
+   struct kvm_kpit_state *ps = pit-pit_state;
+   int inject = 0;
+
+   /* Try to inject pending interrupts when
+* last one has been acked.
+*/
+   spin_lock(ps-inject_lock);
+   if (ps-irq_ack) {
+   ps-irq_ack = 0;
+   inject = 1;
+   }
+   spin_unlock(ps-inject_lock);
+   if (inject) {
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0);
+
+   /*
+* Provides NMI watchdog support via Virtual Wire mode.
+* The route is: PIT - PIC - LVT0 in NMI mode.
+*
+* Note: Our Virtual Wire implementation is simplified, only
+* propagating PIT interrupts to all VCPUs when they have set
+* LVT0 to NMI delivery. Other PIC interrupts are just sent to
+* VCPU0, and only if its LVT0 is in EXTINT mode.
+*/
+   if (kvm-arch.vapics_in_nmi_mode  0)
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_apic_nmi_wd_deliver(vcpu);
+   }
+}
+
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm_pit *pt = ktimer-kvm-arch.vpit;
+
+   if (ktimer-reinject || !atomic_read(ktimer-pending)) {
+   atomic_inc(ktimer-pending);
+   queue_work(pt-wq, pt-expired);
+   }
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   return HRTIMER_RESTART;
+   } else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -292,13 +358,13 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
 
/* TODO The new value only affected after the retriggered */
hrtimer_cancel(pt-timer);
+   cancel_work_sync(ps-pit-expired);
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu

[PATCH v3 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

2010-06-16 Thread Chris Lalancette
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/irq_comm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 52f412f..06cf61e 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
-   } else {
+   } else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
else if (kvm_apic_compare_prio(vcpu, lowest)  0)
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/3] Allow any LAPIC to accept PIC interrupts.

2010-06-16 Thread Chris Lalancette
If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/lapic.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8258a0..ee0f76c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
int r = 0;
 
-   if (kvm_vcpu_is_bsp(vcpu)) {
-   if (!apic_hw_enabled(vcpu-arch.apic))
-   r = 1;
-   if ((lvt0  APIC_LVT_MASKED) == 0 
-   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-   r = 1;
-   }
+   if (!apic_hw_enabled(vcpu-arch.apic))
+   r = 1;
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   r = 1;
return r;
 }
 
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3]: Fixes to IRQ routing

2010-06-14 Thread Chris Lalancette
As we've discussed previously, here is a series of patches to
fix some of the IRQ routing issues we have in KVM.  With this series
in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6
32- and 64-bit guest on CPU's other than the BSP.  RHEL-5 32-bit kdump still
does not work; it gets stuck on Checking 'hlt' instruction.  However,
it does that both before and after this series, so there is something
else going on there that I still have to debug.

I also need to change the kvm_migrate_pit_timer function to migrate the
timer over to the last CPU that handled the timer interrupt, on the
theory that that particlar CPU is likely to handle the timer interrupt again
in the near future.  However, this is an optimization and shouldn't delay the
inclusion of the rest of the series for correctness.

Changes since RFC:
 - Changed ps-inject_lock from raw_spinlock_t to spinlock_t
 - Fixed up some formatting issues
 - Changed to have one PIT workqueue per-guest
 - Remember to cancel_work_sync when destroying the PIT

Changes since v1:
 - Call cancel_work_sync everywhere we call hrtimer_cancel
 - Bring back the reinjection logic
 - Fix up formatting issues from checkpatch

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] Introduce a workqueue to deliver PIT timer interrupts.

2010-06-14 Thread Chris Lalancette
We really want to kvm_set_irq during the hrtimer callback,
but that is risky because that is during interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8254.c |  125 -
 arch/x86/kvm/i8254.h |4 +-
 arch/x86/kvm/irq.c   |1 -
 3 files changed, 74 insertions(+), 56 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 188d827..3bed8ac 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/workqueue.h
 
 #include irq.h
 #include i8254.h
@@ -244,11 +245,11 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier 
*kian)
 {
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 irq_ack_notifier);
-   raw_spin_lock(ps-inject_lock);
+   spin_lock(ps-inject_lock);
if (atomic_dec_return(ps-pit_timer.pending)  0)
atomic_inc(ps-pit_timer.pending);
ps-irq_ack = 1;
-   raw_spin_unlock(ps-inject_lock);
+   spin_unlock(ps-inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -267,7 +268,8 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
 static void destroy_pit_timer(struct kvm_timer *pt)
 {
pr_debug(execute del timer!\n);
-   hrtimer_cancel(pt-timer);
+   if (hrtimer_cancel(pt-timer))
+   cancel_work_sync(pt-kvm-arch.vpit-expired);
 }
 
 static bool kpit_is_periodic(struct kvm_timer *ktimer)
@@ -281,6 +283,58 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static void pit_do_work(struct work_struct *work)
+{
+   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
+   struct kvm *kvm = pit-kvm;
+   struct kvm_vcpu *vcpu;
+   int i;
+   struct kvm_kpit_state *ps = pit-pit_state;
+   int inject = 0;
+
+   /* Try to inject pending interrupts when
+* last one has been acked.
+*/
+   spin_lock(ps-inject_lock);
+   if (ps-irq_ack) {
+   ps-irq_ack = 0;
+   inject = 1;
+   }
+   spin_unlock(ps-inject_lock);
+   if (inject) {
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0);
+
+   /*
+* Provides NMI watchdog support via Virtual Wire mode.
+* The route is: PIT - PIC - LVT0 in NMI mode.
+*
+* Note: Our Virtual Wire implementation is simplified, only
+* propagating PIT interrupts to all VCPUs when they have set
+* LVT0 to NMI delivery. Other PIC interrupts are just sent to
+* VCPU0, and only if its LVT0 is in EXTINT mode.
+*/
+   if (kvm-arch.vapics_in_nmi_mode  0)
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_apic_nmi_wd_deliver(vcpu);
+   }
+}
+
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm_pit *pt = ktimer-kvm-arch.vpit;
+
+   if (ktimer-reinject)
+   queue_work(pt-wq, pt-expired);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   return HRTIMER_RESTART;
+   } else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -291,14 +345,14 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pr_debug(create pit timer, interval is %llu nsec\n, interval);
 
/* TODO The new value only affected after the retriggered */
-   hrtimer_cancel(pt-timer);
+   if (hrtimer_cancel(pt-timer))
+   cancel_work_sync(pt-kvm-arch.vpit-expired);
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
@@ -626,7 +680,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 
mutex_init(pit-pit_state.lock);
mutex_lock(pit-pit_state.lock);
-   raw_spin_lock_init(pit-pit_state.inject_lock);
+   spin_lock_init(pit-pit_state.inject_lock);
+
+   pit-wq = create_singlethread_workqueue(kvm-pit-wq);
+   if (!pit-wq) {
+   kfree(pit);
+   return NULL;
+   }
+   INIT_WORK(pit-expired, pit_do_work);
 
kvm-arch.vpit

[PATCH v2 2/3] Allow any LAPIC to accept PIC interrupts.

2010-06-14 Thread Chris Lalancette
If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/lapic.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8258a0..ee0f76c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
int r = 0;
 
-   if (kvm_vcpu_is_bsp(vcpu)) {
-   if (!apic_hw_enabled(vcpu-arch.apic))
-   r = 1;
-   if ((lvt0  APIC_LVT_MASKED) == 0 
-   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-   r = 1;
-   }
+   if (!apic_hw_enabled(vcpu-arch.apic))
+   r = 1;
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   r = 1;
return r;
 }
 
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

2010-06-14 Thread Chris Lalancette
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/irq_comm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 52f412f..06cf61e 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
-   } else {
+   } else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
else if (kvm_apic_compare_prio(vcpu, lowest)  0)
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3]: Fixes to IRQ routing

2010-06-10 Thread Chris Lalancette
As we've discussed previously, here is a series of patches to
fix some of the IRQ routing issues we have in KVM.  With this series
in place I was able to successfully kdump a RHEL-5 64-bit, and RHEL-6
32- and 64-bit guest on CPU's other than the BSP.  RHEL-5 32-bit kdump still
does not work; it gets stuck on Checking 'hlt' instruction.  However,
it does that both before and after this series, so there is something
else going on there that I still have to debug.

I also need to change the kvm_migrate_pit_timer function to migrate the
timer over to the last CPU that handled the timer interrupt, on the
theory that that particlar CPU is likely to handle the timer interrupt again
in the near future.

Changes since RFC:
 - Changed ps-inject_lock from raw_spinlock_t to spinlock_t
 - Fixed up some formatting issues
 - Changed to have one PIT workqueue per-guest
 - Remember to cancel_work_sync when destroying the PIT

Chris Lalancette

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] Introduce a workqueue to deliver PIT timer interrupts.

2010-06-10 Thread Chris Lalancette
We really want to kvm_set_irq during the hrtimer callback,
but that is risky because that is during interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8254.c |  117 --
 arch/x86/kvm/i8254.h |4 +-
 arch/x86/kvm/irq.c   |1 -
 3 files changed, 69 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 188d827..99c7472 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/workqueue.h
 
 #include irq.h
 #include i8254.h
@@ -244,11 +245,11 @@ static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier 
*kian)
 {
struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 irq_ack_notifier);
-   raw_spin_lock(ps-inject_lock);
+   spin_lock(ps-inject_lock);
if (atomic_dec_return(ps-pit_timer.pending)  0)
atomic_inc(ps-pit_timer.pending);
ps-irq_ack = 1;
-   raw_spin_unlock(ps-inject_lock);
+   spin_unlock(ps-inject_lock);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
@@ -281,6 +282,58 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static void pit_do_work(struct work_struct *work)
+{
+   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
+   struct kvm *kvm = pit-kvm;
+   struct kvm_vcpu *vcpu;
+   int i;
+   struct kvm_kpit_state *ps = pit-pit_state;
+   int inject = 0;
+
+   /* Try to inject pending interrupts when
+* last one has been acked.
+*/
+   spin_lock(ps-inject_lock);
+   if (ps-irq_ack) {
+   ps-irq_ack = 0;
+   inject = 1;
+   }
+   spin_unlock(ps-inject_lock);
+   if (inject) {
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0);
+
+   /*
+* Provides NMI watchdog support via Virtual Wire mode.
+* The route is: PIT - PIC - LVT0 in NMI mode.
+*
+* Note: Our Virtual Wire implementation is simplified, only
+* propagating PIT interrupts to all VCPUs when they have set
+* LVT0 to NMI delivery. Other PIC interrupts are just sent to
+* VCPU0, and only if its LVT0 is in EXTINT mode.
+*/
+   if (kvm-arch.vapics_in_nmi_mode  0)
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_apic_nmi_wd_deliver(vcpu);
+   }
+}
+
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm_pit *pt = ktimer-kvm-arch.vpit;
+
+   queue_work(pt-wq, pt-expired);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   return HRTIMER_RESTART;
+   }
+   else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -295,10 +348,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
@@ -626,7 +678,14 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 
mutex_init(pit-pit_state.lock);
mutex_lock(pit-pit_state.lock);
-   raw_spin_lock_init(pit-pit_state.inject_lock);
+   spin_lock_init(pit-pit_state.inject_lock);
+
+   pit-wq = create_singlethread_workqueue(kvm-pit-wq);
+   if (!pit-wq) {
+   kfree(pit);
+   return NULL;
+   }
+   INIT_WORK(pit-expired, pit_do_work);
 
kvm-arch.vpit = pit;
pit-kvm = kvm;
@@ -687,52 +746,8 @@ void kvm_free_pit(struct kvm *kvm)
hrtimer_cancel(timer);
kvm_free_irq_source_id(kvm, kvm-arch.vpit-irq_source_id);
mutex_unlock(kvm-arch.vpit-pit_state.lock);
+   cancel_work_sync(kvm-arch.vpit-expired);
+   destroy_workqueue(kvm-arch.vpit-wq);
kfree(kvm-arch.vpit);
}
 }
-
-static void __inject_pit_timer_intr(struct kvm *kvm)
-{
-   struct kvm_vcpu *vcpu;
-   int i;
-
-   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
-   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0

[PATCH 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

2010-06-10 Thread Chris Lalancette
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/irq_comm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 52f412f..06cf61e 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
-   } else {
+   } else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
else if (kvm_apic_compare_prio(vcpu, lowest)  0)
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] Allow any LAPIC to accept PIC interrupts.

2010-06-10 Thread Chris Lalancette
If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/lapic.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8258a0..ee0f76c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
int r = 0;
 
-   if (kvm_vcpu_is_bsp(vcpu)) {
-   if (!apic_hw_enabled(vcpu-arch.apic))
-   r = 1;
-   if ((lvt0  APIC_LVT_MASKED) == 0 
-   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-   r = 1;
-   }
+   if (!apic_hw_enabled(vcpu-arch.apic))
+   r = 1;
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   r = 1;
return r;
 }
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH 1/3] Introduce a workqueue to deliver PIT timer interrupts.

2010-06-08 Thread Chris Lalancette
We really want to kvm_set_irq during the hrtimer callback,
but that is risky because that is during interrupt context.
Instead, offload the work to a workqueue, which is a bit safer
and should provide most of the same functionality.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8254.c |  106 ++---
 arch/x86/kvm/i8254.h |4 ++
 arch/x86/kvm/irq.c   |1 -
 arch/x86/kvm/x86.c   |4 ++
 4 files changed, 73 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 188d827..dd655ef 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -34,6 +34,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/workqueue.h
 
 #include irq.h
 #include i8254.h
@@ -49,6 +50,8 @@
 #define RW_STATE_WORD0 3
 #define RW_STATE_WORD1 4
 
+static struct workqueue_struct *pit_wq;
+
 /* Compute with 96 bit intermediate result: (a*b)/c */
 static u64 muldiv64(u64 a, u32 b, u32 c)
 {
@@ -281,6 +284,58 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static void pit_do_work(struct work_struct *work)
+{
+   struct kvm_pit *pit = container_of(work, struct kvm_pit, expired);
+   struct kvm *kvm = pit-kvm;
+   struct kvm_vcpu *vcpu;
+   int i;
+   struct kvm_kpit_state *ps = pit-pit_state;
+   int inject = 0;
+
+   /* Try to inject pending interrupts when
+* last one has been acked.
+*/
+   raw_spin_lock(ps-inject_lock);
+   if (ps-irq_ack) {
+   ps-irq_ack = 0;
+   inject = 1;
+   }
+   raw_spin_unlock(ps-inject_lock);
+   if (inject) {
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
+   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0);
+
+   /*
+* Provides NMI watchdog support via Virtual Wire mode.
+* The route is: PIT - PIC - LVT0 in NMI mode.
+*
+* Note: Our Virtual Wire implementation is simplified, only
+* propagating PIT interrupts to all VCPUs when they have set
+* LVT0 to NMI delivery. Other PIC interrupts are just sent to
+* VCPU0, and only if its LVT0 is in EXTINT mode.
+*/
+   if (kvm-arch.vapics_in_nmi_mode  0)
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_apic_nmi_wd_deliver(vcpu);
+   }
+}
+
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm_pit *pt = ktimer-kvm-arch.vpit;
+
+   queue_work(pit_wq, pt-expired);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+  return HRTIMER_RESTART;
+   }
+   else
+  return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -295,10 +350,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
@@ -628,6 +682,8 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
mutex_lock(pit-pit_state.lock);
raw_spin_lock_init(pit-pit_state.inject_lock);
 
+   INIT_WORK(pit-expired, pit_do_work);
+
kvm-arch.vpit = pit;
pit-kvm = kvm;
 
@@ -691,48 +747,16 @@ void kvm_free_pit(struct kvm *kvm)
}
 }
 
-static void __inject_pit_timer_intr(struct kvm *kvm)
+int kvm_pit_create(void)
 {
-   struct kvm_vcpu *vcpu;
-   int i;
+   pit_wq = create_singlethread_workqueue(kvm-pit-wq);
+   if (!pit_wq)
+   return -ENOMEM;
 
-   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 1);
-   kvm_set_irq(kvm, kvm-arch.vpit-irq_source_id, 0, 0);
-
-   /*
-* Provides NMI watchdog support via Virtual Wire mode.
-* The route is: PIT - PIC - LVT0 in NMI mode.
-*
-* Note: Our Virtual Wire implementation is simplified, only
-* propagating PIT interrupts to all VCPUs when they have set
-* LVT0 to NMI delivery. Other PIC interrupts are just sent to
-* VCPU0, and only if its LVT0 is in EXTINT mode.
-*/
-   if (kvm-arch.vapics_in_nmi_mode  0)
-   kvm_for_each_vcpu(i, vcpu, kvm)
-   kvm_apic_nmi_wd_deliver(vcpu);
+   return 0;
 }
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
+void kvm_pit_cleanup(void)
 {
-   struct kvm_pit *pit = vcpu-kvm-arch.vpit;
-   struct kvm *kvm = vcpu

[RFC][PATCH 2/3] Allow any LAPIC to accept PIC interrupts.

2010-06-08 Thread Chris Lalancette
If the guest wants to accept timer interrupts on a CPU other
than the BSP, we need to remove this gate.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/lapic.c |   12 +---
 1 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d8258a0..ee0f76c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1107,13 +1107,11 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
int r = 0;
 
-   if (kvm_vcpu_is_bsp(vcpu)) {
-   if (!apic_hw_enabled(vcpu-arch.apic))
-   r = 1;
-   if ((lvt0  APIC_LVT_MASKED) == 0 
-   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-   r = 1;
-   }
+   if (!apic_hw_enabled(vcpu-arch.apic))
+   r = 1;
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   r = 1;
return r;
 }
 
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC][PATCH 3/3] In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's

2010-06-08 Thread Chris Lalancette
Otherwise we might try to deliver a timer interrupt to a cpu that
can't possibly handle it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/irq_comm.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 52f412f..06cf61e 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -100,7 +100,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
-   } else {
+   } else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
else if (kvm_apic_compare_prio(vcpu, lowest)  0)
-- 
1.6.6.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Print a user-friendly message on failed vmentry

2010-05-20 Thread Chris Lalancette
On 05/19/2010 05:16 PM, Mohammed Gamal wrote:
 This patch address bug report in https://bugs.launchpad.net/qemu/+bug/530077.
 
 Failed vmentries were handled with handle_unhandled() which prints a rather
 unfriendly message to the user. This patch separates handling vmentry failures
 from unknown exit reasons and prints a friendly message to the user.
 
 Signed-off-by: Mohammed Gamal m.gamal...@gmail.com
 ---
  qemu-kvm.c |   16 +++-
  1 files changed, 15 insertions(+), 1 deletions(-)
 
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 35a4c8a..deb4df8 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -106,6 +106,20 @@ static int handle_unhandled(uint64_t reason)
  return -EINVAL;
  }
  
 +static int handle_failed_vmentry(uint64_t reason)
 +{
 +fprintf(stderr, kvm: vm entry failed with error 0x% PRIx64 \n\n, 
 reason);
 +fprintf(stderr, If you're runnning a guest on an Intel machine, it can 
 be\n);
 +fprintf(stderr, most-likely due to the guest going into an invalid 
 state\n);
 +fprintf(stderr, for Intel VT. For example, the guest maybe running in 
 big\n);
 +fprintf(stderr, real mode which is not supported by Intel VT.\n\n);
 +fprintf(stderr, You may want to try enabling real mode emulation in 
 KVM.\n);
 +fprintf(stderr, To Enable it, you may run the following commands as 
 root:\n);
 +fprintf(stderr, # rmmod kvm_intel\n);
 +fprintf(stderr, # rmmod kvm\n);
 +fprintf(stderr, # modprobe kvm_intel emulate_invalid_guest_state=1\n);
 +return -EINVAL;
 +}

The thing is, there are other valid reasons for vmentry failure.  A while ago I 
tracked
down a bug in the Linux kernel that was causing us to vmenter with invalid 
segments;
this message would have been very misleading in that case.  I think you'd have 
to do
more complete analysis of the vmentry failure code to be more certain about the 
reason
for failure.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [UNTESTED] KVM: do not call kvm_set_irq from irq disabled section

2010-04-23 Thread Chris Lalancette
On 04/23/2010 07:05 AM, Avi Kivity wrote:
 On 04/22/2010 10:55 PM, Gleb Natapov wrote:


 What about converting PIC/IOAPIC mutexes into spinlocks?

 Works for me, but on large guests the spinning will be noticeable.
 I believe.
  
 For interrupts going through IOPIC, but we know this is not scalable
 anyway.

 
 Yes.  We also wanted to convert the ioapic/pic to spinlocks so we could
 queue the interrupt from the PIT directly instead of using
 KVM_REQ_PENDING_TIMER which keeps confusing me.  Chris Lalancette posted
 a patchset for this a while back but it was never completed.

Yeah, I'm sorry I never completed it.  It turns out that with the HPET
changes that went in around the time I was looking at it, that set of
patches wasn't really required to fix the problem I was seeing with kdump.

That being said, if it's useful to somebody, I can repost the patches
(though they are woefully out-of-date now).  Let me know if you want
to see them again.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Windows Vista/7 repeatedly prompt to Set Network Location

2010-01-27 Thread Chris Lalancette
On 01/26/2010 11:22 PM, Jamin W. Collins wrote:
 Every time I start a Windows Vista or Windows 7 virtual machine it
 request a that a location be set for the network, regardless of the fact
 that the network location has already been set the same way every time
 the system is started.  Near as I can tell the VM's NIC MAC, IP address,
 DNS servers, default gateway, and all other network related items are
 the same every single time.  While this does not stop the VM from
 working it is annoying.
 
 I'm starting the VM the same way through libvirt every single time.  The
 resultant kvm command line is:
 
 /usr/bin/kvm -S -M pc-0.11 -cpu qemu32 -enable-kvm -m 1536 -smp 1 -name
 win7 -uuid 6671f42a-b974-6bb9-bd7e-fd1da95cabe5 -monitor
 unix:/var/lib/libvirt/qemu/win7.monitor,server,nowait -localtime -boot c
 -drive file=/media/devel/testing/win7/testing.img,if=ide,index=0,boot=on
 -drive
 file=/media/devel/testing/isos/en_windows_7_professional_x86_dvd_x15-65804.iso,if=ide,media=cdrom,index=2
 -net nic,macaddr=54:52:00:37:4a:ce,vlan=0,model=rtl8139,name=rtl8139.0
 -net tap,fd=38,vlan=0,name=tap.0 -serial pty -parallel none -usb
 -usbdevice tablet -vga cirrus
 
 While the uuid isn't the same from one execution to the next, to my
 knowledge it's not something the VM ever sees and is only an identifier
 within KVM.  Has anyone else seen anything like this?

Actually, that's not true (if I remember correctly).  The uuid is exposed to 
the guest
via the SMBIOS tables, so that might be causing the problem.  Try setting the
UUID in the libvirt XML, and that may solve it.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/12] Kick appropriate CPUs when signalling interrupts.

2009-12-08 Thread Chris Lalancette
On 12/02/2009 04:44 PM, Gleb Natapov wrote:
 On Tue, Dec 01, 2009 at 03:36:41PM +0100, Chris Lalancette wrote:
 Make sure that we kick the appropriate vcpu when delivering
 an interrupt.  This makes sure that we wake any idle cpus
 to cause a vcpu_run and an interrupt injection to occur.

 Signed-off-by: Chris Lalancette clala...@redhat.com
 ---
  virt/kvm/irq_comm.c |5 -
  1 files changed, 4 insertions(+), 1 deletions(-)

 diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
 index 9b07734..96df854 100644
 --- a/virt/kvm/irq_comm.c
 +++ b/virt/kvm/irq_comm.c
 @@ -98,6 +98,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
 kvm_lapic *src,
  if (r  0)
  r = 0;
  r += kvm_apic_set_irq(vcpu, irq);
 +kvm_vcpu_kick(vcpu);
  } else {
  if (!lowest)
  lowest = vcpu;
 @@ -106,8 +107,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
 kvm_lapic *src,
  }
  }
  
 -if (lowest)
 +if (lowest) {
  r = kvm_apic_set_irq(lowest, irq);
 +kvm_vcpu_kick(lowest);
 +}
  
  return r;
  }
 kvm_vcpu_kick() is done inside kvm_apic_set_irq(), so why is this
 needed?

Yeah, you are absolutely right.  I mis-read it before.  I'll remove this bit.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/12] Remove KVM_REQ_PENDING_TIMER.

2009-12-02 Thread Chris Lalancette
On 12/01/2009 08:04 PM, Jan Kiszka wrote:
 Chris Lalancette wrote:
 KVM_REQ_PENDING_TIMER is set and cleared in a couple of places,
 but it never seems to be actually checked.  Remove it.

 
 I would suggest to study the introducing commit
 06e05645661211b9eaadaf6344c335d2e80f0ba2. My strong feeling is that this
 removal is wrong.

Ah, thanks.  Now I see what you mean.  In newer KVM, if you are in
__vcpu_run() and are in the while (r  0) loop, after you've
done kvm_inject_pending_timer_irqs(), but before you do the next
__vcpu_run() (and hence disabled local IRQs), you could receive an
hrtimer callback.  In that case I guess you would hit the 
if (vcpu-requests || need_resched() || signal_pending()) test, don't
inject the interrupt, and possibly spend a long time in the guest
before an unrelated event causes the exit and hence causes the timer
injection.

Well, that certainly is a bit subtle :).  I think I might submit a
patch to at least put a comment in vcpu_enter_guest() about this.

Unfortunately, this does sort of put a damper on what's going on later
in the series.  If the i8254 doesn't belong to the BSP (which it really
shouldn't), then I don't know which vcpu to raise the bit on.  Maybe I
can do it in the kvm_irq_delivery_to_apic() function, or
kvm_apic_set_irq().  I'll take a look at that.

Thanks again for the review,
-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/12]: Fix kdump under KVM

2009-12-01 Thread Chris Lalancette

Another version of the patch series to make kdump work inside KVM
guests.  The current problem with using kdump is that KVM only
delivers PIT interrupts to the BSP.  While this is technically
allowed by the MPS spec, most motherboards deliver timer interrupts
to *any* LAPIC in virtual wire mode.  Since a crash can occur on
any CPU, timer interrupts must be able to reach any CPU in order
for kdump to work properly.  Therefore, this series allows timer
interrupts to be routed to any virtual CPU that is in the correct mode.

I've broken up the series into much smaller chunks now; some of them
can be applied on their own as bug fixes.  There are a few changes
to this series from the last time I posted.  They are:

1)  I've taken Marcelo's advice to only redirect PIT interrupts in
the IOAPIC to the BSP if the destination is multiple vcpus.
2)  I've changed it so that we use a spin_lock() in the IOAPIC instead
of a mutex lock, based on the discussion with Avi.  This means that
we can get rid of all of the delayed interrupt action in the PIT
and just directly set_irq in the hrtimer callbacks.

The conversion of the IOAPIC to a spin_lock instead of a mutex_lock
caused some concern with Avi when we last discussed it.  In particular
he was concerned that any notifiers that kvm_set_irq kicked off were
safe from an IRQ context.  From what I can tell, there are two sets
of notifiers: the IRQ mask notifiers and the IRQ ACK notifiers.  There
is one, and only one mask notifier, in the PIT; it merely sets a flag,
so it should be interrupt safe.  For the IRQ ACK notifiers, there is
one notifier for the PIT and one for assigned-dev.c.  The one in the
PIT is definitely interrupt safe, it just sets a flag.  The one
in assigned-dev is more complicated.  It calls out to kvm_set_irq()
(which is IRQ-safe now), and also calls out to enable_irq(), which
should be interrupt-safe as long as bus_lock is not set
(which it never is, from what I can tell).  Therefore, I *think* this
should be safe, although I don't have VT-d hardware available at the
moment to test for sure.

I did basic smoke testing on a few different SMP guests: Fedora 11 i386,
Fedora 12 x86_64, Red Hat Linux 5.2, Windows XP i386, RHEL-4.8, 
Ubuntu 8.10, Windows 2003 i386, and RHEL-5.3.  I also did targetted
testing of the kdump feature in the RHEL-5.3 guest, and kdump works there.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 01/12] Fix up some comments around the source tree.

2009-12-01 Thread Chris Lalancette
Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/svm.c  |2 +-
 virt/kvm/ioapic.c   |4 ++--
 virt/kvm/kvm_main.c |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3de0b37..68fe7f2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2628,7 +2628,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
break;
case SVM_EXITINTINFO_TYPE_EXEPT:
/* In case of software exception do not reinject an exception
-  vector, but re-execute and instruction instead */
+  vector, but re-execute an instruction instead */
if (is_nested(svm))
break;
if (kvm_exception_is_soft(vector))
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..53f3166 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -164,10 +164,10 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.shorthand = 0;
 
 #ifdef CONFIG_X86
-   /* Always delivery PIT interrupt to vcpu 0 */
+   /* Always deliver PIT interrupt to vcpu 0 */
if (irq == 0) {
irqe.dest_mode = 0; /* Physical mode. */
-   /* need to read apic_id from apic regiest since
+   /* need to read apic_id from apic register since
 * it can be rewritten */
irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id;
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f92ba13..748900e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Creates some virtual cpus.  Good luck creating more than one.
+ * Creates a virtual cpu.
  */
 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 02/12] Make kvm_pic_reset static.

2009-12-01 Thread Chris Lalancette
Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8259.c |2 +-
 arch/x86/kvm/irq.h   |2 --
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index d057c0c..59a8a68 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -233,7 +233,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
return intno;
 }
 
-void kvm_pic_reset(struct kvm_kpic_state *s)
+static void kvm_pic_reset(struct kvm_kpic_state *s)
 {
int irq;
struct kvm *kvm = s-pics_state-irq_request_opaque;
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index be399e2..a5344f1 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -93,8 +93,6 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
return ret;
 }
 
-void kvm_pic_reset(struct kvm_kpic_state *s);
-
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
 void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/12] Remove KVM_REQ_PENDING_TIMER.

2009-12-01 Thread Chris Lalancette
KVM_REQ_PENDING_TIMER is set and cleared in a couple of places,
but it never seems to be actually checked.  Remove it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/timer.c |5 +
 arch/x86/kvm/x86.c   |1 -
 include/linux/kvm_host.h |1 -
 3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index eea4043..72b5144 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct 
kvm_timer *ktimer)
 * not care about potentially loosing timer events in the !reinject
 * case anyway.
 */
-   if (ktimer-reinject || !atomic_read(ktimer-pending)) {
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
atomic_inc(ktimer-pending);
-   /* FIXME: this code should not know anything about vcpus */
-   set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
-   }
 
if (waitqueue_active(q))
wake_up_interruptible(q);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 153a526..2fb4251 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4034,7 +4034,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (r = 0)
break;
 
-   clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..053e49f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,7 +33,6 @@
 #define KVM_REQ_REPORT_TPR_ACCESS  2
 #define KVM_REQ_MMU_RELOAD 3
 #define KVM_REQ_TRIPLE_FAULT   4
-#define KVM_REQ_PENDING_TIMER  5
 #define KVM_REQ_UNHALT 6
 #define KVM_REQ_MMU_SYNC   7
 #define KVM_REQ_KVMCLOCK_UPDATE8
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/12] Make the PIC use interrupt safe locking.

2009-12-01 Thread Chris Lalancette
Since we want to be able to call kvm_pic_set_irq() in interrupt
context, convert the uses of spin_lock() to spin_lock_irqsave()
as appropriate in i8259.c

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8259.c |   34 ++
 1 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 59a8a68..8285f0d 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -52,10 +52,12 @@ static void pic_clear_isr(struct kvm_kpic_state *s, int irq)
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
struct kvm_pic *s = pic_irqchip(kvm);
-   spin_lock(s-lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(s-lock, flags);
s-pics[0].isr_ack = 0xff;
s-pics[1].isr_ack = 0xff;
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
 }
 
 /*
@@ -156,24 +158,27 @@ static void pic_update_irq(struct kvm_pic *s)
 
 void kvm_pic_update_irq(struct kvm_pic *s)
 {
-   spin_lock(s-lock);
+   unsigned long flags;
+
+   spin_lock_irqsave(s-lock, flags);
pic_update_irq(s);
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
 }
 
 int kvm_pic_set_irq(void *opaque, int irq, int level)
 {
struct kvm_pic *s = opaque;
int ret = -1;
+   unsigned long flags;
 
-   spin_lock(s-lock);
+   spin_lock_irqsave(s-lock, flags);
if (irq = 0  irq  PIC_NUM_PINS) {
ret = pic_set_irq1(s-pics[irq  3], irq  7, level);
pic_update_irq(s);
trace_kvm_pic_set_irq(irq  3, irq  7, s-pics[irq  3].elcr,
  s-pics[irq  3].imr, ret == 0);
}
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
 
return ret;
 }
@@ -202,8 +207,9 @@ int kvm_pic_read_irq(struct kvm *kvm)
 {
int irq, irq2, intno;
struct kvm_pic *s = pic_irqchip(kvm);
+   unsigned long flags;
 
-   spin_lock(s-lock);
+   spin_lock_irqsave(s-lock, flags);
irq = pic_get_irq(s-pics[0]);
if (irq = 0) {
pic_intack(s-pics[0], irq);
@@ -228,7 +234,7 @@ int kvm_pic_read_irq(struct kvm *kvm)
intno = s-pics[0].irq_base + irq;
}
pic_update_irq(s);
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
 
return intno;
 }
@@ -434,6 +440,8 @@ static int picdev_write(struct kvm_io_device *this,
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
+   unsigned long flags;
+
if (!picdev_in_range(addr))
return -EOPNOTSUPP;
 
@@ -442,7 +450,7 @@ static int picdev_write(struct kvm_io_device *this,
printk(KERN_ERR PIC: non byte write\n);
return 0;
}
-   spin_lock(s-lock);
+   spin_lock_irqsave(s-lock, flags);
switch (addr) {
case 0x20:
case 0x21:
@@ -455,7 +463,7 @@ static int picdev_write(struct kvm_io_device *this,
elcr_ioport_write(s-pics[addr  1], addr, data);
break;
}
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
return 0;
 }
 
@@ -464,6 +472,8 @@ static int picdev_read(struct kvm_io_device *this,
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = 0;
+   unsigned long flags;
+
if (!picdev_in_range(addr))
return -EOPNOTSUPP;
 
@@ -472,7 +482,7 @@ static int picdev_read(struct kvm_io_device *this,
printk(KERN_ERR PIC: non byte read\n);
return 0;
}
-   spin_lock(s-lock);
+   spin_lock_irqsave(s-lock, flags);
switch (addr) {
case 0x20:
case 0x21:
@@ -486,7 +496,7 @@ static int picdev_read(struct kvm_io_device *this,
break;
}
*(unsigned char *)val = data;
-   spin_unlock(s-lock);
+   spin_unlock_irqrestore(s-lock, flags);
return 0;
 }
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 05/12] Make the IOAPIC lock a spinlock.

2009-12-01 Thread Chris Lalancette
In order to be able to call kvm_set_irq from an interrupt
context, the IOAPIC lock can't be a (possibly sleeping) mutex.
Convert it to a spinlock.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/ioapic.c |   40 +---
 virt/kvm/ioapic.h |2 +-
 2 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index fd52949..3d68826 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -192,8 +192,9 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, 
int level)
u32 mask = 1  irq;
union kvm_ioapic_redirect_entry entry;
int ret = 1;
+   unsigned long flags;
 
-   mutex_lock(ioapic-lock);
+   spin_lock_irqsave(ioapic-lock, flags);
if (irq = 0  irq  IOAPIC_NUM_PINS) {
entry = ioapic-redirtbl[irq];
level ^= entry.fields.polarity;
@@ -210,7 +211,7 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, 
int level)
}
trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
}
-   mutex_unlock(ioapic-lock);
+   spin_unlock_irqrestore(ioapic-lock, flags);
 
return ret;
 }
@@ -234,9 +235,9 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic 
*ioapic, int vector,
 * is dropped it will be put into irr and will be delivered
 * after ack notifier returns.
 */
-   mutex_unlock(ioapic-lock);
+   spin_unlock(ioapic-lock);
kvm_notify_acked_irq(ioapic-kvm, KVM_IRQCHIP_IOAPIC, i);
-   mutex_lock(ioapic-lock);
+   spin_lock(ioapic-lock);
 
if (trigger_mode != IOAPIC_LEVEL_TRIG)
continue;
@@ -251,10 +252,11 @@ static void __kvm_ioapic_update_eoi(struct kvm_ioapic 
*ioapic, int vector,
 void kvm_ioapic_update_eoi(struct kvm *kvm, int vector, int trigger_mode)
 {
struct kvm_ioapic *ioapic = kvm-arch.vioapic;
+   unsigned long flags;
 
-   mutex_lock(ioapic-lock);
+   spin_lock_irqsave(ioapic-lock, flags);
__kvm_ioapic_update_eoi(ioapic, vector, trigger_mode);
-   mutex_unlock(ioapic-lock);
+   spin_unlock_irqrestore(ioapic-lock, flags);
 }
 
 static inline struct kvm_ioapic *to_ioapic(struct kvm_io_device *dev)
@@ -273,6 +275,8 @@ static int ioapic_mmio_read(struct kvm_io_device *this, 
gpa_t addr, int len,
 {
struct kvm_ioapic *ioapic = to_ioapic(this);
u32 result;
+   unsigned long flags;
+
if (!ioapic_in_range(ioapic, addr))
return -EOPNOTSUPP;
 
@@ -280,7 +284,7 @@ static int ioapic_mmio_read(struct kvm_io_device *this, 
gpa_t addr, int len,
ASSERT(!(addr  0xf));  /* check alignment */
 
addr = 0xff;
-   mutex_lock(ioapic-lock);
+   spin_lock_irqsave(ioapic-lock, flags);
switch (addr) {
case IOAPIC_REG_SELECT:
result = ioapic-ioregsel;
@@ -294,7 +298,7 @@ static int ioapic_mmio_read(struct kvm_io_device *this, 
gpa_t addr, int len,
result = 0;
break;
}
-   mutex_unlock(ioapic-lock);
+   spin_unlock_irqrestore(ioapic-lock, flags);
 
switch (len) {
case 8:
@@ -316,6 +320,8 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
 {
struct kvm_ioapic *ioapic = to_ioapic(this);
u32 data;
+   unsigned long flags;
+
if (!ioapic_in_range(ioapic, addr))
return -EOPNOTSUPP;
 
@@ -331,7 +337,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
}
 
addr = 0xff;
-   mutex_lock(ioapic-lock);
+   spin_lock_irqsave(ioapic-lock, flags);
switch (addr) {
case IOAPIC_REG_SELECT:
ioapic-ioregsel = data;
@@ -349,7 +355,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
default:
break;
}
-   mutex_unlock(ioapic-lock);
+   spin_unlock_irqrestore(ioapic-lock, flags);
return 0;
 }
 
@@ -378,7 +384,7 @@ int kvm_ioapic_init(struct kvm *kvm)
ioapic = kzalloc(sizeof(struct kvm_ioapic), GFP_KERNEL);
if (!ioapic)
return -ENOMEM;
-   mutex_init(ioapic-lock);
+   spin_lock_init(ioapic-lock);
kvm-arch.vioapic = ioapic;
kvm_ioapic_reset(ioapic);
kvm_iodevice_init(ioapic-dev, ioapic_mmio_ops);
@@ -393,23 +399,27 @@ int kvm_ioapic_init(struct kvm *kvm)
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state)
 {
struct kvm_ioapic *ioapic = ioapic_irqchip(kvm);
+   unsigned long flags;
+
if (!ioapic)
return -EINVAL;
 
-   mutex_lock(ioapic-lock);
+   spin_lock_irqsave(ioapic-lock, flags);
memcpy(state, ioapic, sizeof(struct kvm_ioapic_state));
-   mutex_unlock(ioapic-lock);
+   spin_unlock_irqrestore(ioapic-lock

[PATCH 04/12] IOAPIC timer interrupt redirect.

2009-12-01 Thread Chris Lalancette
Only redirect IRQ 0 (i.e. timer interrupt) to the BSP if
the APIC destination is multiple vcpus.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/ioapic.c |   15 +--
 1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 53f3166..fd52949 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -144,6 +144,15 @@ static void ioapic_write_indirect(struct kvm_ioapic 
*ioapic, u32 val)
}
 }
 
+static int ioapic_dest_multiple_cpus(struct kvm_lapic_irq *irqe)
+{
+   /* physical mode is always directed to 1 cpu */
+   if (irqe-dest_mode == 0 || irqe-dest_id == 0)
+   return 0;
+
+   return hweight_long(irqe-dest_id) != 1;
+}
+
 static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
 {
union kvm_ioapic_redirect_entry *entry = ioapic-redirtbl[irq];
@@ -164,8 +173,10 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.shorthand = 0;
 
 #ifdef CONFIG_X86
-   /* Always deliver PIT interrupt to vcpu 0 */
-   if (irq == 0) {
+   /* if this is the PIT interrupt and the dest_id is multiple
+* vcpus, re-write to go to vcpu 0
+*/
+   if (irq == 0  ioapic_dest_multiple_cpus(irqe)) {
irqe.dest_mode = 0; /* Physical mode. */
/* need to read apic_id from apic register since
 * it can be rewritten */
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 08/12] Remove timer.c

2009-12-01 Thread Chris Lalancette
The code in timer.c isn't really similar enough between
the i8254 and the lapic to share.  Split these into
separate functions, and remove timer.c

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/Makefile|3 +-
 arch/x86/kvm/i8254.c |   27 -
 arch/x86/kvm/kvm_timer.h |5 ++-
 arch/x86/kvm/lapic.c |   33 +++-
 arch/x86/kvm/timer.c |   47 --
 5 files changed, 61 insertions(+), 54 deletions(-)
 delete mode 100644 arch/x86/kvm/timer.c

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..8d9adf6 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 
-kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o timer.o
+kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fab7440..dc6eff4 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -277,6 +277,30 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm *kvm = ktimer-kvm;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(kvm-bsp_vcpu-wq))
+   wake_up_interruptible(kvm-bsp_vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   return HRTIMER_RESTART;
+   }
+   else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -291,10 +315,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 55c7524..fbd4acb 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -1,3 +1,5 @@
+#ifndef __KVM_TIMER_H
+#define __KVM_TIMER_H
 
 struct kvm_timer {
struct hrtimer timer;
@@ -13,6 +15,5 @@ struct kvm_timer_ops {
 bool (*is_periodic)(struct kvm_timer *);
 };
 
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);
+#endif
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 44acf7c..0005409 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1040,6 +1040,37 @@ static const struct kvm_io_device_ops apic_mmio_ops = {
.write= apic_mmio_write,
 };
 
+static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   struct kvm_vcpu *vcpu;
+   struct kvm_lapic *lapic;
+
+   vcpu = ktimer-vcpu;
+   if (!vcpu)
+   return HRTIMER_NORESTART;
+
+   lapic = vcpu-arch.apic;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   return HRTIMER_RESTART;
+   }
+   else
+   return HRTIMER_NORESTART;
+}
+
 int kvm_create_lapic(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic;
@@ -1065,7 +1096,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 
hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_ABS);
-   apic-lapic_timer.timer.function = kvm_timer_fn;
+   apic-lapic_timer.timer.function = lapic_timer_fn;
apic-lapic_timer.t_ops = lapic_timer_ops;
apic-lapic_timer.kvm = vcpu-kvm;
apic-lapic_timer.vcpu = vcpu;
diff --git a/arch

[PATCH 07/12] Rename kvm_apic_accept_pic_intr

2009-12-01 Thread Chris Lalancette
Call it kvm_apic_in_virtual_wire_mode, which is more
correct.  Also change it to not only operate properly
on the boot CPU, but on any CPU.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8259.c |2 +-
 arch/x86/kvm/irq.c   |4 ++--
 arch/x86/kvm/lapic.c |   17 -
 arch/x86/kvm/lapic.h |2 +-
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 8285f0d..80d4705 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -263,7 +263,7 @@ static void kvm_pic_reset(struct kvm_kpic_state *s)
s-init4 = 0;
 
for (irq = 0; irq  PIC_NUM_PINS/2; irq++) {
-   if (vcpu0  kvm_apic_accept_pic_intr(vcpu0))
+   if (vcpu0  kvm_apic_in_virtual_wire_mode(vcpu0))
if (irr  (1  irq) || isr  (1  irq)) {
pic_clear_isr(s, irq);
}
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 96dfbb6..05721fc 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -53,7 +53,7 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
return v-arch.interrupt.pending;
 
if (kvm_apic_has_interrupt(v) == -1) {  /* LAPIC */
-   if (kvm_apic_accept_pic_intr(v)) {
+   if (kvm_apic_in_virtual_wire_mode(v)) {
s = pic_irqchip(v-kvm);/* PIC */
return s-output;
} else
@@ -76,7 +76,7 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 
vector = kvm_get_apic_interrupt(v); /* APIC */
if (vector == -1) {
-   if (kvm_apic_accept_pic_intr(v)) {
+   if (kvm_apic_in_virtual_wire_mode(v)) {
s = pic_irqchip(v-kvm);
s-output = 0;  /* PIC */
vector = kvm_pic_read_irq(v-kvm);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cd60c0b..44acf7c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -435,7 +435,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
/*
 * Should only be called by kvm_apic_local_deliver() with LVT0,
 * before NMI watchdog was enabled. Already handled by
-* kvm_apic_accept_pic_intr().
+* kvm_apic_in_virtual_wire_mode().
 */
break;
 
@@ -1099,18 +1099,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
return highest_irr;
 }
 
-int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu)
 {
u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
int r = 0;
 
-   if (kvm_vcpu_is_bsp(vcpu)) {
-   if (!apic_hw_enabled(vcpu-arch.apic))
-   r = 1;
-   if ((lvt0  APIC_LVT_MASKED) == 0 
-   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
-   r = 1;
-   }
+   if (!apic_hw_enabled(vcpu-arch.apic))
+   r = 1;
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   r = 1;
+
return r;
 }
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..ce4cd2d 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -22,7 +22,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu);
 void kvm_free_lapic(struct kvm_vcpu *vcpu);
 
 int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
-int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/12] Allow the PIC to signal CPUs other than BSP.

2009-12-01 Thread Chris Lalancette
While generally the PIT is used to signal only the BSP,
it's doesn't actually have to do that architecturally.
Allow the PIC to signal any VCPU that is in
Virtual Wire mode.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8259.c |   19 ++-
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 80d4705..ff20415 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -242,8 +242,6 @@ int kvm_pic_read_irq(struct kvm *kvm)
 static void kvm_pic_reset(struct kvm_kpic_state *s)
 {
int irq;
-   struct kvm *kvm = s-pics_state-irq_request_opaque;
-   struct kvm_vcpu *vcpu0 = kvm-bsp_vcpu;
u8 irr = s-irr, isr = s-imr;
 
s-last_irr = 0;
@@ -263,10 +261,8 @@ static void kvm_pic_reset(struct kvm_kpic_state *s)
s-init4 = 0;
 
for (irq = 0; irq  PIC_NUM_PINS/2; irq++) {
-   if (vcpu0  kvm_apic_in_virtual_wire_mode(vcpu0))
-   if (irr  (1  irq) || isr  (1  irq)) {
-   pic_clear_isr(s, irq);
-   }
+   if (irr  (1  irq) || isr  (1  irq))
+   pic_clear_isr(s, irq);
}
 }
 
@@ -506,14 +502,19 @@ static int picdev_read(struct kvm_io_device *this,
 static void pic_irq_request(void *opaque, int level)
 {
struct kvm *kvm = opaque;
-   struct kvm_vcpu *vcpu = kvm-bsp_vcpu;
struct kvm_pic *s = pic_irqchip(kvm);
int irq = pic_get_irq(s-pics[0]);
+   struct kvm_vcpu *vcpu;
+   int i;
 
s-output = level;
-   if (vcpu  level  (s-pics[0].isr_ack  (1  irq))) {
+   if (level  (s-pics[0].isr_ack  (1  irq))) {
s-pics[0].isr_ack = ~(1  irq);
-   kvm_vcpu_kick(vcpu);
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (kvm_apic_in_virtual_wire_mode(vcpu) 
+   waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+   }
}
 }
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 12/12] Kick appropriate CPUs when signalling interrupts.

2009-12-01 Thread Chris Lalancette
Make sure that we kick the appropriate vcpu when delivering
an interrupt.  This makes sure that we wake any idle cpus
to cause a vcpu_run and an interrupt injection to occur.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 virt/kvm/irq_comm.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 9b07734..96df854 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -98,6 +98,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
+   kvm_vcpu_kick(vcpu);
} else {
if (!lowest)
lowest = vcpu;
@@ -106,8 +107,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
}
}
 
-   if (lowest)
+   if (lowest) {
r = kvm_apic_set_irq(lowest, irq);
+   kvm_vcpu_kick(lowest);
+   }
 
return r;
 }
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 09/12] Fix missing spin_lock in PIT timer.

2009-12-01 Thread Chris Lalancette
Changes to the irq_ack variable in the pit_state must be
protected by the inject_lock spinlock; otherwise, we can
erroneously inject a timer interrupt into a guest.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/i8254.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index dc6eff4..ece7e12 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -606,10 +606,13 @@ void kvm_pit_reset(struct kvm_pit *pit)
 static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
 {
struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
+   unsigned long flags;
 
if (!mask) {
+   spin_lock_irqsave(pit-pit_state.inject_lock, flags);
atomic_set(pit-pit_state.pit_timer.pending, 0);
pit-pit_state.irq_ack = 1;
+   spin_unlock_irqrestore(pit-pit_state.inject_lock, flags);
}
 }
 
-- 
1.6.5.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] x86: Make sure get_user_desc() doesn't sign extend.

2009-11-04 Thread Chris Lalancette
The current implementation of get_user_desc() sign extends
the return value because of integer promotion rules.  For
the most part, this doesn't matter, because the top bit of
base2 is usually 0.  If, however, that bit is 1, then the
entire value will be 0x... which is probably not what
the caller intended.  This patch casts the entire thing
to unsigned before returning, which generates almost the
same assembly as the current code but replaces the final
cltq (sign extend) with a mov %eax %eax (zero-extend).
This fixes booting certain guests under KVM.

(3rd retry, no response to the previous 2)

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/include/asm/desc.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index e8de2f6..617bd56 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc)
 
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
-   return desc-base0 | ((desc-base1)  16) | ((desc-base2)  24);
+   return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
 }
 
 static inline void set_desc_base(struct desc_struct *desc, unsigned long base)
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-11-02 Thread Chris Lalancette
Avi Kivity wrote:
 On 10/30/2009 02:23 PM, Chris Lalancette wrote:
 In the meantime, I've gotten the set_irq from IRQ context that Avi 
 suggested
 working, and the fixing up of this IOAPIC check is the last bit to actually 
 get
 kdump working.

 
 There are two problems with that:
 
 - kvm_set_irq() calls some notifiers (irq_mask_notifiers, at least, but 
 possibly more); need to make sure those are safe from irq context
 - kvm_set_irq() will loop on all vcpus, usually incurring a cache miss 
 or two; with large vcpu counts this can make the irq-off time quite 
 large, hurting our worst-case latency
 
 we can defer the second problem since it's only a performance issue, but 
 how did you deal with the first?
 

Actually, there is a problem here, but I don't think it's as you describe.
kvm_set_irq() doesn't directly call the mask notifiers; all it really does is
call the i8259/ioapic callback, which causes an interrupt to be raised on the
LAPIC in question, which causes the interrupt to be injected into the VCPU on
the next run.  That doesn't call the irq_mask_notifiers, and seems to be
IRQ-safe (with the conversion from mutex's to spin_locks).

The irq mask notifiers are called later on during an EOI.  In particular,
ioapic_mmio_write() - ioapic_write_indirect(), which is protected by the
ioapic_lock.  I don't see a problem here (although please point it out if I
missed it).

The problem I do see has to do with the irq_ack notifiers.  In particular, if
you are on ia64 and ioapic_mmio_write() gets called for the IOAPIC_REG_EOI
address, we call __kvm_ioapic_update_eoi().  In that case, we drop the spin_lock
and call kvm_notify_acked_irq(), which can re-enter the ioapic code.  This could
be IRQ unsafe, after we return from kvm_notify_acked_irq() we re-acquire the
spin-lock, but in an irq-unsafe fashion.  Therefore we could take the spin-lock,
get interrupted for the timer interrupt, and try to re-acquire the lock, causing
a deadlock.  I'll have to think a bit more on how to solve that one (there's a
similar situation going on in the i8259, which is not ia64 specific).

So, either I'm missing exactly what you are talking about, or there's no
deadlock with my patch (except with ia64).  Can you explain further?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kvm problems on new hardware

2009-10-30 Thread Chris Lalancette
Ryan Harper wrote:
 * Danny ter Haar d...@dth.net [2009-10-29 13:38]:
 Update:
 I compiled/installed 2.6.32-rc5-git3 on this machine

 I manually start kvm:

 vhost1:~# kvm -m 512 -cdrom  /vz/template/iso/debian-503-amd64-netinst.iso

 The bootscreen comes up, i hit enter to install and i get these messages
 (copied from dmesg)

 handle_exception: unexpected, vectoring info 0x8010 intr info 0x8b0d
 handle_exception: unexpected, vectoring info 0x800d intr info 0x8b0d
 [this line is repeated many times: 
 dmesg |grep 0x8b0d | wc -l 
 570 ]
 and finally before ending the kvm session i get:
 vmx_handle_exit: unexpected, valid vectoring info (0x800d) and exit 
 reason is 0x8021
 
 I've seen that in a couple places.  I don't think we have root cause,
 but in at least one situation (running win2k3 with  4G of ram) the
 work around was to use:

Actually, I'm not 100% sure, but I might have root-caused this.  At least, the
exit reason is the same exit reason I fixed.  That exit reason essentially means
vmenter failed because of invalid guest fields.  In the case I tracked down,
it was because we were wrongly sign-extending the segment fields (instead of
zero-extending them).  Can either you or Danny:

1)  Get the output from stderr of qemu when this happens?  I believe qemu dumps
the state of all of the guest fields when it's going to crash like this, and
that can tell us if the GUEST_STATE is wrong.

2)  Try the patch posted here: http://lkml.org/lkml/2009/10/28/201, and see if
it helps?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Chris Lalancette
Marcelo Tosatti wrote:
 On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
 Marcelo Tosatti wrote:
 On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
 This patch is the main point of the series.  In order for
 kdump to properly work inside a KVM guest, we need to make
 sure that all VCPUs in virtual wire APIC mode get kicked
 to try and pick up the timer interrupts.  To do this,
 we iterate over the CPUs and deliver interrupts to the
 proper VCPUs.

 I don't love the concept of doing kvm_irq_kick_vcpus() from
 within pit_timer_fn().  A PIT is not connected to a CPU at all,
 only to a PIC or APIC.  However, if a CPU enters idle, this is
 the only way to wake it up to check for the interrupt.
 The reason the PIT interrupt was fixed to BSP is:

 http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html

 Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
 destination overwrite in case its programmed by the guest to 
 a single CPU would fix it?
 Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly 
 do
 you mean by fix it?  Do you mean fix the original RHEL-5.1 PAE issue, or 
 fix
 the kdump issue?
 
 The kdump issue. Something like:
 
 #ifdef CONFIG_X86
 /* Always delivery PIT interrupt to vcpu 0 */
 if (irq == 0  dest_multiple_vcpus(entry)) {
 irqe.dest_mode = 0; /* Physical mode. */
 /* need to read apic_id from apic regiest since
  * it can be rewritten */
 irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id;
 }
 #endif
 
 The rest of the code should be ready to deal with it.

Do you have any more information about how to reproduce that original issue?  I
tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
install and boot it just fine with or without the always deliver irq 0 to vcpu
0 check above.  Either there's something in particular I need to do to trigger
the issue (hardware or software), or it's being solved another way.

In the meantime, I've gotten the set_irq from IRQ context that Avi suggested
working, and the fixing up of this IOAPIC check is the last bit to actually get
kdump working.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-30 Thread Chris Lalancette
Marcelo Tosatti wrote:
 On Fri, Oct 30, 2009 at 01:23:57PM +0100, Chris Lalancette wrote:
 Marcelo Tosatti wrote:
 On Wed, Oct 28, 2009 at 11:21:42AM +0100, Chris Lalancette wrote:
 Marcelo Tosatti wrote:
 On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
 This patch is the main point of the series.  In order for
 kdump to properly work inside a KVM guest, we need to make
 sure that all VCPUs in virtual wire APIC mode get kicked
 to try and pick up the timer interrupts.  To do this,
 we iterate over the CPUs and deliver interrupts to the
 proper VCPUs.

 I don't love the concept of doing kvm_irq_kick_vcpus() from
 within pit_timer_fn().  A PIT is not connected to a CPU at all,
 only to a PIC or APIC.  However, if a CPU enters idle, this is
 the only way to wake it up to check for the interrupt.
 The reason the PIT interrupt was fixed to BSP is:

 http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html

 Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
 destination overwrite in case its programmed by the guest to 
 a single CPU would fix it?
 Ug, nasty.  I definitely don't want to re-introduce that bug.  What 
 exactly do
 you mean by fix it?  Do you mean fix the original RHEL-5.1 PAE issue, or 
 fix
 the kdump issue?
 The kdump issue. Something like:

 #ifdef CONFIG_X86
 /* Always delivery PIT interrupt to vcpu 0 */
 if (irq == 0  dest_multiple_vcpus(entry)) {
 irqe.dest_mode = 0; /* Physical mode. */
 /* need to read apic_id from apic regiest since
  * it can be rewritten */
 irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id;
 }
 #endif

 The rest of the code should be ready to deal with it.
 Do you have any more information about how to reproduce that original issue? 
  I
 tried installing a RHEL-5.1 i386 guest with a PAE kernel, and I was able to
 install and boot it just fine with or without the always deliver irq 0 to 
 vcpu
 0 check above.  Either there's something in particular I need to do to 
 trigger
 the issue (hardware or software), or it's being solved another way.
 
 For VMX, the guests TSC's are now usually synchronized (which was not
 the case before, when this patch was written). For AMD, they start out
 of sync.
 
 Migration also causes them to be out of sync. So i'd prefer to postpone
 the removal of this limitation until there is a guarantee the TSCs 
 are synchronized across vcpus.

OK.  I'll try to get on an unsynchronized TSC machine and see if I can reproduce
there.

 
 In the meantime, I've gotten the set_irq from IRQ context that Avi 
 suggested
 working, and the fixing up of this IOAPIC check is the last bit to actually 
 get
 kdump working.
 
 The kdump fix does not require that, I believe (although it is
 useful/required for other things, please send the patch separately).
 
 The hrtimer will fire on vcpu0, then:
 
 kvm_inject_pit_interrupts - ioapic_set_irq - apic_deliver -
 kvm_vcpu_kick(vcpuN)
 
 As long as the dest-id = vcpu_id overwrite change is in place.

Just to follow up here, I am taking your suggestion about testing for dest_id to
multiple vcpus in the IOAPIC, and it seems to be working so far.  However, that
is not enough to fix kdump.  We also need to remove the BSP assumptions in the
i8254 code, otherwise we will never even attempt to deliver the interrupt to
!vcpu0.  That means that we need code for Avi's suggestion about kvm_set_irq for
this to all work.

Attached is the patch that I'm currently testing if anyone else wants to look at
it.  I obviously need to break it up into a nice series, but that will have to
wait until next week.  My ioapic_dest_multiple_cpus() function could do with a
popcnt, but I couldn't find a macro after a quick look around the tree.
What's the recommended way to count bits in Linux?

-- 
Chris Lalancette


current.patch
Description: application/mbox


Re: [PATCH 0/5]: Fix kdump under KVM

2009-10-29 Thread Chris Lalancette
Avi Kivity wrote:
 On 10/28/2009 12:13 PM, Chris Lalancette wrote:
 The kick from i8254 code is pretty bad, as you mention.  I forget why it
 is needed at all - shouldn't kvm_set_irq() end up kicking the correct
  
 As I understand it, that's not quite how it works.  From what I can see, what
 happens is that the i8254 is programmed as an hrtimer.  When the hrtimer
 expires, we get a callback in kvm_timer_fn (or pit_timer_fn, in my new code).
 That code is running in interrupt context, however, so you can't directly 
 call
 set_irq at that point.  Instead, we update the pending variable and defer
 work until later on.  That later on is when we are doing a vcpu_run, at 
 which
 point we check the pending variable, and if set, inject the interrupt.

 The problem is that if the vcpu(s) are in idle when the hrtimer expires, and 
 we
 don't kick them, no vcpu will wake up, and hence none of them will ever run
 set_irq to get it injected into the guest.

 
 Oh yes, I remember now.
 
 If you have other ideas on how we might accomplish this, I'd definitely be
 interested in hearing them.

 
 We could schedule work to run in thread context.  Previous to the user 
 return notifier work, this would cause a reloading of a bunch of msrs 
 and would thus be quite expensive, but now switching to kernel threads 
 and back is pretty cheap.
 
 So we could try schedule_work().  My only worry is that it uses 
 wake_up() instead of wake_up_sync(), so it could introduce delay.  I'd 
 much prefer a variant the schedules immediately.
 
 An alternative is to make irq injection work from irq context.  It's not 
 difficult to do technically - convert mutexes to spin_lock_irqsave() - 
 I'm just worried about long irqoff times with large guests (ioapic needs 
 to iterate over all vcpus sometimes).  This is useful for other cases - 
 device assignment interrupt injection, irqfd for vhost/vbus.  So maybe 
 we should go this path, and worry about reducing irqoff times later when 
 we bump KVM_MAX_VCPUS.
 

I'm starting to take a look at this.  While this may be a generically useful
cleanup, it occurs to me that even if we call set_irq from the hrtimer
callback, we still have to do the kick_vcpus type thing.  Otherwise, we still
have the problem where if all vcpus are idle, none of them will be doing
vcpu_run anytime soon to actually inject the interrupt into the guest.

Or did I mis-understand what you are proposing?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-28 Thread Chris Lalancette
Marcelo Tosatti wrote:
 On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
 This patch is the main point of the series.  In order for
 kdump to properly work inside a KVM guest, we need to make
 sure that all VCPUs in virtual wire APIC mode get kicked
 to try and pick up the timer interrupts.  To do this,
 we iterate over the CPUs and deliver interrupts to the
 proper VCPUs.

 I don't love the concept of doing kvm_irq_kick_vcpus() from
 within pit_timer_fn().  A PIT is not connected to a CPU at all,
 only to a PIC or APIC.  However, if a CPU enters idle, this is
 the only way to wake it up to check for the interrupt.
 
 The reason the PIT interrupt was fixed to BSP is:
 
 http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html
 
 Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
 destination overwrite in case its programmed by the guest to 
 a single CPU would fix it?

Ug, nasty.  I definitely don't want to re-introduce that bug.  What exactly do
you mean by fix it?  Do you mean fix the original RHEL-5.1 PAE issue, or fix
the kdump issue?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] x86: Make sure get_user_desc() doesn't sign extend.

2009-10-28 Thread Chris Lalancette
The current implementation of get_user_desc() sign extends
the return value because of integer promotion rules.  For
the most part, this doesn't matter, because the top bit of
base2 is usually 0.  If, however, that bit is 1, then the
entire value will be 0x... which is probably not what
the caller intended.  This patch casts the entire thing
to unsigned before returning, which generates almost the
same assembly as the current code but replaces the final
cltq (sign extend) with a mov %eax %eax (zero-extend).
This fixes booting certain guests under KVM.

(2nd resend, since no response to the last two submissions)

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/include/asm/desc.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index e8de2f6..617bd56 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc)
 
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
-   return desc-base0 | ((desc-base1)  16) | ((desc-base2)  24);
+   return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
 }
 
 static inline void set_desc_base(struct desc_struct *desc, unsigned long base)
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/5]: Fix kdump under KVM

2009-10-27 Thread Chris Lalancette

This patch series aims to get kdump working inside a KVM guest.
The current problem with using kdump is that KVM always delivers
PIT interrupts to the BSP, and the BSP only.  While this is
technically allowed by the MPS spec, most motherboards actually
deliver timer interrupts to *any* LAPIC in virtual wire mode.
Since a crash can occur on any CPU, timer interrupts must
be able to reach any CPU in order for kdump to work properly.

Therefore, this patch series kicks all of the relevant vCPUs
when delivering a timer interrupt.  With these patches in
place, kdump in a RHEL-5 guest works properly.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] Fix up some comments around the source tree.

2009-10-27 Thread Chris Lalancette
Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 34b700f... ba61f27... M  arch/x86/kvm/svm.c
:100644 100644 38a2d20... cd6f92b... M  virt/kvm/ioapic.c
:100644 100644 bd44fb4... c22bc17... M  virt/kvm/kvm_main.c
 arch/x86/kvm/svm.c  |2 +-
 virt/kvm/ioapic.c   |2 +-
 virt/kvm/kvm_main.c |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 34b700f..ba61f27 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2608,7 +2608,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
break;
case SVM_EXITINTINFO_TYPE_EXEPT:
/* In case of software exception do not reinject an exception
-  vector, but re-execute and instruction instead */
+  vector, but re-execute an instruction instead */
if (is_nested(svm))
break;
if (kvm_exception_is_soft(vector))
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..cd6f92b 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -164,7 +164,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.shorthand = 0;
 
 #ifdef CONFIG_X86
-   /* Always delivery PIT interrupt to vcpu 0 */
+   /* Always deliver PIT interrupt to vcpu 0 */
if (irq == 0) {
irqe.dest_mode = 0; /* Physical mode. */
/* need to read apic_id from apic regiest since
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd44fb4..c22bc17 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Creates some virtual cpus.  Good luck creating more than one.
+ * Creates a virtual cpu.
  */
 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/5] Remove references to VCPU in i8254

2009-10-27 Thread Chris Lalancette
Conceptually, the i8254 is hooked to a PIC or IOAPIC.  Therefore,
this patch removes most references to vcpu in i8254.c.
There are two exceptions to this:

1)  In pit_timer_fn, we still have to kick the BSP to wake it out
of idle.  This will be changed in a later patch.

2)  In __kvm_migrate_pit_timer, we have to migrate the PIT
around with the BSP, since hrtimers work on a per-CPU basis.
I've added a comment here to clarify why this is needed.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 fab7440... d5c08fa... M  arch/x86/kvm/i8254.c
:100644 100644 d4c1c7f... d7bc40b... M  arch/x86/kvm/i8254.h
:100644 100644 96dfbb6... ab3a56e... M  arch/x86/kvm/irq.c
:100644 100644 c025a23... e16b968... M  arch/x86/kvm/irq.h
:100644 100644 55c7524... ba39e25... M  arch/x86/kvm/kvm_timer.h
 arch/x86/kvm/i8254.c |   50 ++---
 arch/x86/kvm/i8254.h |4 ++-
 arch/x86/kvm/irq.c   |4 +-
 arch/x86/kvm/irq.h   |2 -
 arch/x86/kvm/kvm_timer.h |3 ++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fab7440..d5c08fa 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -33,6 +33,7 @@
 
 #include irq.h
 #include i8254.h
+#include kvm_timer.h
 
 #ifndef CONFIG_X86_64
 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
@@ -227,12 +228,13 @@ static void pit_latch_status(struct kvm *kvm, int channel)
}
 }
 
-int pit_has_pending_timer(struct kvm_vcpu *vcpu)
+int pit_has_pending_timer(struct kvm *kvm)
 {
-   struct kvm_pit *pit = vcpu-kvm-arch.vpit;
+   struct kvm_pit *pit = kvm-arch.vpit;
 
-   if (pit  kvm_vcpu_is_bsp(vcpu)  pit-pit_state.irq_ack)
+   if (pit  pit-pit_state.irq_ack)
return atomic_read(pit-pit_state.pit_timer.pending);
+
return 0;
 }
 
@@ -252,6 +254,13 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
struct kvm_pit *pit = vcpu-kvm-arch.vpit;
struct hrtimer *timer;
 
+   /*
+* technically, the PIT isn't hooked to a particular VCPU;
+* the logical structure is PIT - [IOA]PIC - CPU[s].  However,
+* hrtimers expire on a per-cpu basis, and since we initially
+* created the hrtimer during BSP creation, we move it around
+* with the BSP.
+*/
if (!kvm_vcpu_is_bsp(vcpu) || !pit)
return;
 
@@ -277,6 +286,33 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   int restart_timer = 0;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq))
+   wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   restart_timer = 1;
+   }
+
+   if (restart_timer)
+   return HRTIMER_RESTART;
+   else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -291,10 +327,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
@@ -705,10 +740,9 @@ static void __inject_pit_timer_intr(struct kvm *kvm)
kvm_apic_nmi_wd_deliver(vcpu);
 }
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
+void kvm_inject_pit_timer_irqs(struct kvm *kvm)
 {
-   struct kvm_pit *pit = vcpu-kvm-arch.vpit;
-   struct kvm *kvm = vcpu-kvm;
+   struct kvm_pit *pit = kvm-arch.vpit;
struct kvm_kpit_state *ps;
 
if (pit) {
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index d4c1c7f..d7bc40b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -49,10 +49,12 @@ struct kvm_pit {
 #define KVM_MAX_PIT_INTR_INTERVAL   HZ / 100
 #define KVM_PIT_CHANNEL_MASK   0x3
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
+int pit_has_pending_timer(struct kvm *kvm);
+void kvm_inject_pit_timer_irqs(struct kvm *kvm);
 void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int 
hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm

[PATCH 5/5] Fix kdump under KVM.

2009-10-27 Thread Chris Lalancette
This patch is the main point of the series.  In order for
kdump to properly work inside a KVM guest, we need to make
sure that all VCPUs in virtual wire APIC mode get kicked
to try and pick up the timer interrupts.  To do this,
we iterate over the CPUs and deliver interrupts to the
proper VCPUs.

I don't love the concept of doing kvm_irq_kick_vcpus() from
within pit_timer_fn().  A PIT is not connected to a CPU at all,
only to a PIC or APIC.  However, if a CPU enters idle, this is
the only way to wake it up to check for the interrupt.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 d5c08fa... fe08823... M  arch/x86/kvm/i8254.c
:100644 100644 40f... 5b699c1... M  arch/x86/kvm/lapic.c
:100644 100644 40010b0... 9c4e52b... M  arch/x86/kvm/lapic.h
:100644 100644 053e49f... 975b0d6... M  include/linux/kvm_host.h
:100644 100644 cd6f92b... 717d265... M  virt/kvm/ioapic.c
:100644 100644 0d454d3... d24ac91... M  virt/kvm/irq_comm.c
 arch/x86/kvm/i8254.c |3 +--
 arch/x86/kvm/lapic.c |   12 
 arch/x86/kvm/lapic.h |1 +
 include/linux/kvm_host.h |2 ++
 virt/kvm/ioapic.c|9 -
 virt/kvm/irq_comm.c  |   12 
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d5c08fa..fe08823 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer 
*data)
if (ktimer-reinject || !atomic_read(ktimer-pending))
atomic_inc(ktimer-pending);
 
-   if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq))
-   wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq);
+   kvm_irq_kick_vcpus(ktimer-kvm);
 
if (ktimer-t_ops-is_periodic(ktimer)) {
hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 40f..5b699c1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu)
+{
+   if (kvm_lapic_enabled(vcpu)) {
+   u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   return 1;
+   }
+
+   return 0;
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..9c4e52b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long 
cr8);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 053e49f..975b0d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
unsigned flags);
 void kvm_free_irq_routing(struct kvm *kvm);
 
+void kvm_irq_kick_vcpus(struct kvm *kvm);
+
 #else
 
 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index cd6f92b..717d265 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.level = 1;
irqe.shorthand = 0;
 
-#ifdef CONFIG_X86
-   /* Always deliver PIT interrupt to vcpu 0 */
-   if (irq == 0) {
-   irqe.dest_mode = 0; /* Physical mode. */
-   /* need to read apic_id from apic regiest since
-* it can be rewritten */
-   irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id;
-   }
-#endif
return kvm_irq_delivery_to_apic(ioapic-kvm, NULL, irqe);
 }
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 0d454d3..d24ac91 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm)
kfree(kvm-irq_routing);
 }
 
+void kvm_irq_kick_vcpus(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (kvm_apic_in_virtual_wire_mode(vcpu))
+   if (waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+   }
+}
+
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
   struct kvm_kernel_irq_routing_entry *e

[PATCH 4/5] Remove timer.c

2009-10-27 Thread Chris Lalancette
The code in arch/x86/kvm/timer.c is not similar enough between
the various implementations to really share it.  Move the
implementation into the LAPIC code, and then remove timer.c

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 31a7035... 8d9adf6... M  arch/x86/kvm/Makefile
:100644 100644 ba39e25... 55b2dab... M  arch/x86/kvm/kvm_timer.h
:100644 100644 cd60c0b... 40f... M  arch/x86/kvm/lapic.c
:100644 00 72b5144... 000... D  arch/x86/kvm/timer.c
 arch/x86/kvm/Makefile|3 +-
 arch/x86/kvm/kvm_timer.h |3 --
 arch/x86/kvm/lapic.c |   34 -
 arch/x86/kvm/timer.c |   47 --
 4 files changed, 34 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..8d9adf6 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 
-kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o timer.o
+kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index ba39e25..55b2dab 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -15,7 +15,4 @@ struct kvm_timer_ops {
 bool (*is_periodic)(struct kvm_timer *);
 };
 
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);
-
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cd60c0b..40f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1040,6 +1040,38 @@ static const struct kvm_io_device_ops apic_mmio_ops = {
.write= apic_mmio_write,
 };
 
+static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   int restart_timer = 0;
+
+   vcpu = ktimer-vcpu;
+   if (!vcpu)
+   return HRTIMER_NORESTART;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   restart_timer = 1;
+   }
+
+   if (restart_timer)
+   return HRTIMER_RESTART;
+   else
+   return HRTIMER_NORESTART;
+}
+
 int kvm_create_lapic(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic;
@@ -1065,7 +1097,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 
hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_ABS);
-   apic-lapic_timer.timer.function = kvm_timer_fn;
+   apic-lapic_timer.timer.function = lapic_timer_fn;
apic-lapic_timer.t_ops = lapic_timer_ops;
apic-lapic_timer.kvm = vcpu-kvm;
apic-lapic_timer.vcpu = vcpu;
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
deleted file mode 100644
index 72b5144..000
--- a/arch/x86/kvm/timer.c
+++ /dev/null
@@ -1,47 +0,0 @@
-#include linux/kvm_host.h
-#include linux/kvm.h
-#include linux/hrtimer.h
-#include asm/atomic.h
-#include kvm_timer.h
-
-static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
-{
-   int restart_timer = 0;
-   wait_queue_head_t *q = vcpu-wq;
-
-   /*
-* There is a race window between reading and incrementing, but we do
-* not care about potentially loosing timer events in the !reinject
-* case anyway.
-*/
-   if (ktimer-reinject || !atomic_read(ktimer-pending))
-   atomic_inc(ktimer-pending);
-
-   if (waitqueue_active(q))
-   wake_up_interruptible(q);
-
-   if (ktimer-t_ops-is_periodic(ktimer)) {
-   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
-   restart_timer = 1;
-   }
-
-   return restart_timer;
-}
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
-{
-   int restart_timer;
-   struct kvm_vcpu *vcpu;
-   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
-
-   vcpu = ktimer-vcpu;
-   if (!vcpu)
-   return HRTIMER_NORESTART;
-
-   restart_timer = __kvm_timer_fn(vcpu, ktimer);
-   if (restart_timer)
-   return HRTIMER_RESTART;
-   else
-   return HRTIMER_NORESTART;
-}
-
-- 
1.6.0.6

--
To unsubscribe from

[PATCH 2/5] Remove KVM_REQ_PENDING_TIMER.

2009-10-27 Thread Chris Lalancette
KVM_REQ_PENDING_TIMER is set and cleared in a couple of places,
but it never seems to be actually checked.  Remove it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 eea4043... 72b5144... M  arch/x86/kvm/timer.c
:100644 100644 2ef39062.. 93a65b4... M  arch/x86/kvm/x86.c
:100644 100644 bd5a616... 053e49f... M  include/linux/kvm_host.h
 arch/x86/kvm/timer.c |5 +
 arch/x86/kvm/x86.c   |1 -
 include/linux/kvm_host.h |1 -
 3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index eea4043..72b5144 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct 
kvm_timer *ktimer)
 * not care about potentially loosing timer events in the !reinject
 * case anyway.
 */
-   if (ktimer-reinject || !atomic_read(ktimer-pending)) {
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
atomic_inc(ktimer-pending);
-   /* FIXME: this code should not know anything about vcpus */
-   set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
-   }
 
if (waitqueue_active(q))
wake_up_interruptible(q);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ef3906..93a65b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3906,7 +3906,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (r = 0)
break;
 
-   clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..053e49f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,7 +33,6 @@
 #define KVM_REQ_REPORT_TPR_ACCESS  2
 #define KVM_REQ_MMU_RELOAD 3
 #define KVM_REQ_TRIPLE_FAULT   4
-#define KVM_REQ_PENDING_TIMER  5
 #define KVM_REQ_UNHALT 6
 #define KVM_REQ_MMU_SYNC   7
 #define KVM_REQ_KVMCLOCK_UPDATE8
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Make sure get_user_desc() doesn't sign extend.

2009-10-23 Thread Chris Lalancette
The current implementation of get_user_desc() sign extends
the return value because of integer promotion rules.  For
the most part, this doesn't matter, because the top bit of
base2 is usually 0.  If, however, that bit is 1, then the
entire value will be 0x... which is probably not what
the caller intended.  This patch casts the entire thing
to unsigned before returning, which generates almost the
same assembly as the current code but replaces the final
cltq (sign extend) with a mov %eax %eax (zero-extend).
This fixes booting certain guests under KVM.

(resend; I got no response last time so I guess I sent it to
the wrong places)

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/include/asm/desc.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index e8de2f6..617bd56 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc)
 
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
-   return desc-base0 | ((desc-base1)  16) | ((desc-base2)  24);
+   return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
 }
 
 static inline void set_desc_base(struct desc_struct *desc, unsigned long base)
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Make sure get_user_desc() doesn't sign extend.

2009-10-21 Thread Chris Lalancette
The current implementation of get_user_desc() sign extends
the return value because of integer promotion rules.  For
the most part, this doesn't matter, because the top bit of
base2 is usually 0.  If, however, that bit is 1, then the
entire value will be 0x... which is probably not what
the caller intended.  This patch casts the entire thing
to unsigned before returning, which generates almost the
same assembly as the current code but replaces the final
cltq (sign extend) with a mov %eax %eax (zero-extend).
This fixes booting certain guests under KVM.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/include/asm/desc.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index e8de2f6..617bd56 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -288,7 +288,7 @@ static inline void load_LDT(mm_context_t *pc)
 
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
-   return desc-base0 | ((desc-base1)  16) | ((desc-base2)  24);
+   return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
 }
 
 static inline void set_desc_base(struct desc_struct *desc, unsigned long base)
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Fix up vmx_set_segment for booting older guests.

2009-10-20 Thread Chris Lalancette
If a guest happens to be unlucky enough to use an address
such as 0xc000 in the CS base address field, the next attempt
to VM enter will fail.  This is because the vmcs_writel() that
writes the base address into the VMCS will sign-extend the field
to 64-bits, and the Intel manual states that bits 63:32 of this
field *must* be 0.  Use vmcs_write32() where appropriate.
This fixes booting of an absolutely ancient Red Hat Linux 5.2
(not Enterprise Linux!) guest.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/vmx.c |   17 -
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 364263a..311afd4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1846,7 +1846,22 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
vmx-rmode.tr.ar = vmx_segment_access_rights(var);
return;
}
-   vmcs_writel(sf-base, var-base);
+
+   /* Intel 64 and IA-32 Architecture Software Developer's Manual Vol. 3b,
+* section 22.3.1.2 states that VMENTRY will fail if bits 63:32 of the
+* base address for CS, SS, DS, ES are not 0 and the register is usable.
+*
+* If var-base happens to have bit 31 set, then it will get sign
+* extended on the vmcs_writel(), causing this check to fail.  Make
+* sure to use the 32-bit version where appropriate.
+*/
+   if (sf-base == GUEST_CS_BASE ||
+   ((~sf-ar_bytes  0x0001)  (sf-base == GUEST_SS_BASE ||
+ sf-base == GUEST_DS_BASE ||
+ sf-base == GUEST_ES_BASE)))
+   vmcs_write32(sf-base, var-base);
+   else
+   vmcs_writel(sf-base, var-base);
vmcs_write32(sf-limit, var-limit);
vmcs_write16(sf-selector, var-selector);
if (vmx-rmode.vm86_active  var-s) {
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Print Guest VMCS state on vmexit failure

2009-10-20 Thread Chris Lalancette
If we fail to handle a VMEXIT for some reason, print out a lot
more debugging information about the state of the GUEST VMCS
area.  This does not fix a bug, but helps a lot when trying to
track down the cause of a VMEXIT/VMENTRY failure.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
 arch/x86/kvm/vmx.c |   38 ++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 311afd4..37b1682 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3452,6 +3452,14 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
*vcpu) = {
 static const int kvm_vmx_max_exit_handlers =
ARRAY_SIZE(kvm_vmx_exit_handlers);
 
+#define PRINT_GUEST_SEGMENT(seg) do {  \
+   printk(KERN_DEBUG #seg : SELECTOR 0x%lx, BASE 0x%lx, LIMIT 0x%lx, AR 
0x%lx\n, \
+  vmcs_readl(GUEST_##seg##_SELECTOR),  \
+  vmcs_readl(GUEST_##seg##_BASE),  \
+  vmcs_readl(GUEST_##seg##_LIMIT), \
+  vmcs_readl(GUEST_##seg##_AR_BYTES)); \
+   while(0)
+
 /*
  * The guest has exited.  See if we can fix it or if we need userspace
  * assistance.
@@ -3512,6 +3520,36 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
else {
vcpu-run-exit_reason = KVM_EXIT_UNKNOWN;
vcpu-run-hw.hardware_exit_reason = exit_reason;
+
+   printk(KERN_DEBUG GUEST STATE:\n);
+   printk(KERN_DEBUG CR0: 0x%lx\n, vmcs_readl(GUEST_CR0));
+   printk(KERN_DEBUG CR3: 0x%lx\n, vmcs_readl(GUEST_CR3));
+   printk(KERN_DEBUG CR4: 0x%lx\n, vmcs_readl(GUEST_CR4));
+   printk(KERN_DEBUG VMENTRY CONTROL: 0x%lx\n,
+  vmcs_readl(VM_ENTRY_CONTROLS));
+   printk(KERN_DEBUG DR7: 0x%lx\n, vmcs_readl(GUEST_DR7));
+   printk(KERN_DEBUG SYSENTER ESP: 0x%lx\n,
+  vmcs_readl(GUEST_SYSENTER_ESP));
+   printk(KERN_DEBUG SYSENTER EIP: 0x%lx\n,
+  vmcs_readl(GUEST_SYSENTER_EIP));
+
+   PRINT_GUEST_SEGMENT(CS);
+   PRINT_GUEST_SEGMENT(SS);
+   PRINT_GUEST_SEGMENT(DS);
+   PRINT_GUEST_SEGMENT(ES);
+   PRINT_GUEST_SEGMENT(FS);
+   PRINT_GUEST_SEGMENT(GS);
+   PRINT_GUEST_SEGMENT(TR);
+   PRINT_GUEST_SEGMENT(LDTR);
+
+   printk(KERN_DEBUG GDTR: BASE 0x%lx, LIMIT 0x%lx,
+  vmcs_readl(GUEST_GDTR_BASE),
+  vmcs_readl(GUEST_GDTR_LIMIT));
+   printk(KERN_DEBUG IDTR: BASE 0x%lx, LIMIT 0x%lx,
+  vmcs_readl(GUEST_IDTR_BASE),
+  vmcs_readl(GUEST_IDTR_LIMIT));
+   printk(KERN_DEBUG RIP: 0x%lx\n,vmcs_readl(GUEST_RIP));
+   printk(KERN_DEBUG RFLAGS: 0x%lx\n,vmcs_readl(GUEST_RFLAGS));
}
return 0;
 }
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Print Guest VMCS state on vmexit failure

2009-10-20 Thread Chris Lalancette
Avi Kivity wrote:
 On 10/20/2009 04:50 PM, Chris Lalancette wrote:
 If we fail to handle a VMEXIT for some reason, print out a lot
 more debugging information about the state of the GUEST VMCS
 area.  This does not fix a bug, but helps a lot when trying to
 track down the cause of a VMEXIT/VMENTRY failure.

 
 register state can just as easily be examined in the qemu monitor.
 

Ah, true.  OK, forget this patch.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Print Guest VMCS state on vmexit failure

2009-10-20 Thread Chris Lalancette
Nikola Ciprich wrote:
 Hi,
 maybe it's stupid question, but is this available also when qemu/kvm
 is started using libvirt  stuff? I think it uses monitor so it's 
 inaccessible for user no?

Yes and no.  The monitor is inaccessible when using libvirt, but I totally
forgot that qemu dumps the register state to stderr before abort()'ing on an
unknown vm exit.  Libvirt takes the output from stderr and stores it in
/var/log/libvirt/qemu/guestname.  So you would still be able to see this
output when using libvirt.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix up vmx_set_segment for booting older guests.

2009-10-20 Thread Chris Lalancette
Avi Kivity wrote:
 +else
 +vmcs_writel(sf-base, var-base);
  vmcs_write32(sf-limit, var-limit);

 
 I think the correct fix is to zero extend in vmcs_writel() rather than 
 here.  But as far as I can tell, it already does.  Where does the sign 
 extension occur?  Perhaps in userspace?

Very good question Avi.  I should have dug a bit deeper before posting.  I
traced this further back, and here's what it looks like is going on:

arch/x86/kvm/x86.c:kvm_load_segment_descriptor() is responsible for loading the
CPU segment descriptor into the VMCS area.  It does this by calling
load_segment_descriptor_to_kvm_desct(), doing a few minor transformations of the
data, then calling kvm_set_segment() to load it into the VMCS.

The problem arises in load_segment_descriptor_to_kvm_desct() -
seg_desct_to_kvm_desct().  seg_desct_to_kvm_desct() takes the struct desc_struct
(in this case, base0 == 0x0, base1 == 0x0, and base2 == 0xc0), then calls
get_desc_base() and stores the result in the struct kvm_segment.  The return
value from get_desc_base is It's here that the sign-extension occurs, which
eventually causes that VM entry failure.

get_desc_base() sign-extends because of some complicated u8 to unsigned rules
that I'm not completely sure of.  The below patch fixes my original issue, but
I'm not at all sure that this is the right thing to do.  I could also change
get_desc_base() itself to do the casting, which should do the right thing for
all callers, but I'm not sure if that's what all callers want.  Anybody else
have an opinion?


diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a93ba29..b58bda2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3997,7 +3997,7 @@ static void kvm_set_segment(struct kvm_vcpu *vcpu,
 static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
   struct kvm_segment *kvm_desct)
 {
-   kvm_desct-base = get_desc_base(seg_desc);
+   kvm_desct-base = (unsigned)get_desc_base(seg_desc);
kvm_desct-limit = get_desc_limit(seg_desc);
if (seg_desc-g) {
kvm_desct-limit = 12;


-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Is AMD rev F the same thing as socket F?

2009-10-19 Thread Chris Lalancette
Neil Aggarwal wrote:
 I forgot to mention that I have an AMD Opteron 2210
 processor.
 
 When I get cat /proc/cpuinfo, I get this output for the
 first core (The second core is the same):
 
 vendor_id   : AuthenticAMD
 cpu family  : 15

^^ means that you have a Rev F (cpu_family 15 == 0xf in hex).

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] add sysctl for kvm wallclock sync

2009-09-02 Thread Chris Lalancette
Glauber Costa wrote:
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index fc409e9..3a4e1bd 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -27,7 +27,7 @@
  #define KVM_SCALE 22
  
  static int kvmclock = 1;
 -static unsigned int kvm_wall_update_interval = 5;
 +unsigned int kvm_wall_update_interval = 5;

I think the overall idea is very interesting, but I also think that it should be
disabled by default.  Because of the problems with time in virtualization,
people are already conditioned to run ntpd inside their guests, and this
kvmclock change will fight with ntpd.  Also, the command # date 09091323 (or
whatever) ceases to work like it does on bare-metal, so I think it has to be an
opt-in feature.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Always use KVM_VERSION to build version number

2009-08-07 Thread Chris Lalancette
Avi,
 Trying to use libvirt with development snapshots of qemu-kvm is a bit
problematic.  The trouble is that for all development snapshots, the value that
gets placed into this string:

QEMU PC emulator version 0.10.0 (kvm-devel), Copyright (c) 2003-2008

Is always kvm-devel.  That means we can't tell if this is a kvm development
snapshot built yesterday, or 6 months ago, which means that in turn we can't
tell what features are available.  While we can always tell people building
their own qemu to force it by echoing a value into KVM_VERSION, it would be much
better if this were done by default.  Something like kvm-88-devel, which would
signify that this the development happening after kvm-88, leading towards
kvm-89.  Would you accept something like the patch below, which would require
you to edit the KVM_VERSION file twice during a release (once right before the
release, to change it to kvm-89, and once right after the release to change it
back to kvm-89-devel)?

Signed-off-by: Chris Lalancette clala...@redhat.com
diff --git a/KVM_VERSION b/KVM_VERSION
new file mode 100644
index 000..efd3e0e
--- /dev/null
+++ b/KVM_VERSION
@@ -0,0 +1 @@
+kvm-88-devel


Problem with X on 32 bit guest on 64-bit host

2009-02-05 Thread Chris Lalancette
All,
 I've been trying to track down this problem with starting X on a 32-bit
guest on a 64-bit host, and I've hit a bit of a wall.  Let me describe the 
setup:

Host: AMD Barcelona machine, 16GB memory, 8 cores, running 2.6.29-rc2,
kvm-userspace 3f7cba35281a5b2dba008179a4979d737105574d

Guest: RHEL-5 32-bit guest, single VCPU.

The problem is that inside the 32-bit guest, X refuses to start.  Now,
on an Intel platform I have hanging around here, this works just fine; I copy
the guest over, start it up, and X starts right up.  Also, on the Barcelona,
with a 64-bit RHEL-5 guest, X starts fine.

I've done quite a bit of tracing inside the guest, and from the guest's
perspective, something just isn't right.  When X is trying to start, one thing
it does is copy a BIOS region from /dev/mem into a shared memory region mapped
at 0 inside the X process.  The page fault for the access
to the memory region at 0 works just fine, but the very next page fault that is
injected is completely bogus; it's either  TASK_SIZE (which is 0xc000), or
has bogus VMA flags set, etc.

Going further, what actually happens is that X uses glibc's optimized memcpy
routine, which, in assembly, looks like this:

(gdb) disass memcpy
Dump of assembler code for function memcpy:
0x00387090 memcpy+0:  mov0xc(%esp),%ecx
0x00387094 memcpy+4:  mov%edi,%eax
0x00387096 memcpy+6:  mov0x4(%esp),%edi
0x0038709a memcpy+10: mov%esi,%edx
0x0038709c memcpy+12: mov0x8(%esp),%esi
0x003870a0 memcpy+16: cld
0x003870a1 memcpy+17: shr%ecx
0x003870a3 memcpy+19: jae0x3870a6 memcpy+22
0x003870a5 memcpy+21: movsb  %ds:(%esi),%es:(%edi)
0x003870a6 memcpy+22: shr%ecx
0x003870a8 memcpy+24: jae0x3870ac memcpy+28
0x003870aa memcpy+26: movsw  %ds:(%esi),%es:(%edi)
0x003870ac memcpy+28: rep movsl %ds:(%esi),%es:(%edi)
0x003870ae memcpy+30: mov%eax,%edi
0x003870b0 memcpy+32: mov%edx,%esi
0x003870b2 memcpy+34: mov0x4(%esp),%eax
0x003870b6 memcpy+38: ret

If I replace that optimized memcpy routine with my own, stupid memcpy (basically
just dst[i] = src[i] in a loop), everything works fine, and doesn't get the
bogus page fault.  In turn, that leads me to suspect that the rep command is
actually not being emulated properly on the host side, but I'm not quite sure of
that, nor am I sure where to go from here.  Does anybody have any ideas of what
I can do to further track this down?

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Trivial qemu/configure fix

2009-01-19 Thread Chris Lalancette
If kvm is *not* detected, then the qemu/configure script gets upset.  Add some
quotes to make it happier.

Signed-off-by: Chris Lalancette clala...@redhat.com
diff --git a/qemu/configure b/qemu/configure
index ff4a462..107699a 100755
--- a/qemu/configure
+++ b/qemu/configure
@@ -1642,7 +1642,7 @@ disable_cpu_emulation() {
 }
 
 configure_kvm() {
-  if test $kvm = yes -a $target_softmmu = yes -a \
+  if test $kvm = yes -a $target_softmmu = yes -a \
   \( $cpu = i386 -o $cpu = x86_64 -o $cpu = ia64 -o $cpu = powerpc \); then
 echo #define USE_KVM 1  $config_h
 echo USE_KVM=1  $config_mak


Re: [PATCH 0/2] kvm: disable virtualization on kdump

2008-10-30 Thread Chris Lalancette
Eric W. Biederman wrote:
 svm can writeback into memory at odd times if we don't do this, and the 
 cost
 is
 small - clear a bit in EFER.  There's no reason to be lazy.

 Especially if we can clear that bit unconditionally (when
 EFER is present) I'm all for it.

 That is the case.
 
 Cool.  I forget if we have to test for EFER on 32bit, or if we can just wrmsr
 and be done with it.  Regardless that sounds easy to do on the kexec path.

I'm pretty sure you have to test for it first; pre-64 bit x86 hardware doesn't
have the EFER register, so you'll fault on access.  On the other hand pre-64 bit
x86 hardware doesn't usually (ever?) have VT extensions either.

-- 
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 1/10] Refactor QEMUFile for live migration

2008-09-10 Thread Chris Lalancette
Anthony Liguori wrote:
snip
  void qemu_fflush(QEMUFile *f)
  {
 -if (!f-is_writable)
 +if (!f-put_buffer)
  return;
 +
  if (f-buf_index  0) {
 -if (f-is_file) {
 -fseek(f-outfile, f-buf_offset, SEEK_SET);
 -fwrite(f-buf, 1, f-buf_index, f-outfile);
 -} else {
 -bdrv_pwrite(f-bs, f-base_offset + f-buf_offset,
 -f-buf, f-buf_index);
 -}
 + f-put_buffer(f-opaque, f-buf, f-buf_offset, f-buf_index);

Nit...whitespace damage.

...

Overall, seems to be a good refactoring, and seems to keep the original
semantics of qemu_fopen_bdrv() and qemu_fopen().

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH 3/10] Add bdrv_flush_all()

2008-09-10 Thread Chris Lalancette
Anthony Liguori wrote:
 This patch adds a bdrv_flush_all() function.  It's necessary to ensure that 
 all
 IO operations have been flushed to disk before completely a live migration.
 
 N.B. we don't actually use this now.  We really should flush the block drivers
 using an live savevm callback to avoid unnecessary guest down time.

Simple enough, and follows the pattern in the KVM migration.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vmport: unknown command 13

2008-08-23 Thread Chris Lalancette
Elmar Haneke wrote:
 After movong from KVM-72 to KVM-73 I do get the Notice
 
 vmport: unknown command 13
 
 The Message appears on starting emulation. In an Netboot environment it
 does appear before booting from Network is asked.
 
 What might go wrong here?

It's actually harmless; it's the Bochs BIOS trying to access the vmport function
13, which means give me your UUID.  However, upstream Qemu (and hence, KVM)
doesn't support this function, so that's where the error message comes from.  If
I understand it correctly, upstream Qemu now has suppressed this warning
message, so the next time KVM syncs up, the message will disappear.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Migrating Vm's from one machine to another...and back

2008-08-11 Thread Chris Lalancette
Jean-Pierre Dion wrote:
 Hi,
 
 I made some VM migrations between several machines
 (Intel-Intel and Intel-AMD) and when one VM has been
 migrated onto a second machine, I cannot put it
 back on the first one. I talked with Laurent and it
 seems that there is no way (or no obvious one)
 to do that.
 
 This could be useful when doing some tasks (load
 balancing...).
 
 Is it an item that is interesting enough to be done ?
 And does anyone have some idea on how to do that ?

Hm, this seems to work for me, in combination with my recently posted Implement
tcp nowait option patch.  Basically you start the guest on machine A like
normal.  Then you start the container on machine B with the -incoming flags.
Now you migrate the guest over to machine B (I assume you've gotten this far).
Now, you kill the container on machine A (since the running guest is on B now),
and restart it with -incoming.  Now you migrate back from B to A, and it all
works out.

You can actually probably get rid of one of these steps by starting the machine
on machine A in the first place with -S -incoming tcp://0:,nowait, and then
running cont in the monitor.  In this case, the machine is already running,
and *also* listening for incoming migrate requests.  Now you do exactly the same
on machine B, and you should be able to migrate back and forth between them at 
will.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Implement tcp nowait option for migration

2008-08-06 Thread Chris Lalancette
Chris Lalancette wrote:
 Sometimes you want to be able to start up the receiving side of a live 
 migration
 and actually be able to run monitor commands before you do the migration.
 Libvirt, in particular, wants to do this for setting up the migration.  This
 patch implements a nowait option to the receiving side so that you start up
 the receiving side similar to:
 
 qemu-kvm -M pc -S blah blah -incoming tcp://0:,nowait
 
 Then you are able to interact with the monitor before the live migration takes
 place.

Anthony,
 Any comments on this patch?  I'd like to try to get this in the KVM tree
once Avi gets back, so I would prefer to do any iterations/cleanups now.

Thanks,
Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Add a migrate_incoming monitor option

2008-08-01 Thread Chris Lalancette
Daniel P. Berrange wrote:
 An accept trick only handles the TCP case though. I know this was Chris'
 example that we're currently using, but we intend to switch to passing 
 an open file descriptor instead, and proxying the data via a secure 
 channel instead of the plain tcp, or builtin SSH tunnelling. 
 
 With an open FD we'll need another way - I guess just registering a
 event on POLLIN might do the trick. It seems to me that just having
 an explicit migrate incoming monitor command would be simpler than having
 to code different triggers to implement 'nowait' for each transport
 we have.

I just finished a patch to do the nowait for tcp as suggested by Anthony, and
it was actually far less code then I feared.  It just needed a little
re-factoring of the migrate_incoming_tcp() function.  I'll be posting that in a
couple of minutes.  Given how this code worked out, doing a nowait for the fd
style is really no big deal.

Both ways (explicit monitor command and nowait) seem to have their benefits;
the monitor command has the benefit of being more flexible, while the nowait has
the benefit of being similar to already existing Qemu commands.  My preference
is for the monitor command since it seems a little more natural for a management
tool, but the nowait is clearly an option as well.

I'll also post a cleanup patch with Dan's suggestion for the monitor patch, so
both implementations will be available.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Implement tcp nowait option for migration

2008-08-01 Thread Chris Lalancette
Sometimes you want to be able to start up the receiving side of a live migration
and actually be able to run monitor commands before you do the migration.
Libvirt, in particular, wants to do this for setting up the migration.  This
patch implements a nowait option to the receiving side so that you start up
the receiving side similar to:

qemu-kvm -M pc -S blah blah -incoming tcp://0:,nowait

Then you are able to interact with the monitor before the live migration takes
place.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
diff --git a/qemu/migration.c b/qemu/migration.c
index a64a287..d16e289 100644
--- a/qemu/migration.c
+++ b/qemu/migration.c
@@ -886,13 +886,10 @@ static int migrate_incoming_fd(int fd)
 return ret;
 }
 
-static int migrate_incoming_tcp(const char *host)
+static int migrate_listen_tcp(const char *host, int *outfd)
 {
 struct sockaddr_in addr;
-socklen_t addrlen = sizeof(addr);
-int fd, sfd;
-ssize_t len;
-uint8_t status = 0;
+int fd = -1;
 int reuse = 1;
 int rc;
 
@@ -928,19 +925,43 @@ static int migrate_incoming_tcp(const char *host)
 	goto error_socket;
 }
 
+*outfd = fd;
+
+return 0;
+
+error_socket:
+close(fd);
+error:
+return rc;
+}
+
+struct migrate_tcp_data {
+int listen_fd;
+int rc;
+};
+
+static void migrate_incoming_tcp(void *opaque)
+{
+struct sockaddr_in addr;
+socklen_t addrlen = sizeof(addr);
+struct migrate_tcp_data *data = (struct migrate_tcp_data *)opaque;
+int sfd;
+ssize_t len;
+uint8_t status = 0;
+
 again:
-sfd = accept(fd, (struct sockaddr *)addr, addrlen);
+sfd = accept(data-listen_fd, (struct sockaddr *)addr, addrlen);
 if (sfd == -1) {
 	if (errno == EINTR)
 	goto again;
 perror(accept() failed);
-rc = MIG_STAT_DST_ACCEPT_FAILED;
+data-rc = MIG_STAT_DST_ACCEPT_FAILED;
 	goto error_socket;
 }
 
-rc = migrate_incoming_fd(sfd);
-if (rc != 0) {
-fprintf(stderr, migrate_incoming_fd failed (rc=%d)\n, rc);
+data-rc = migrate_incoming_fd(sfd);
+if (data-rc != 0) {
+fprintf(stderr, migrate_incoming_fd failed (rc=%d)\n, data-rc);
 	goto error_accept;
 }
 
@@ -951,13 +972,13 @@ send_ack:
 if (len != 1) {
 fprintf(stderr, migration: send_ack: write error len=%zu (%s)\n,
 len, strerror(errno));
-rc = MIG_STAT_DST_WRITE_FAILED;
+data-rc = MIG_STAT_DST_WRITE_FAILED;
 	goto error_accept;
 }
 
-rc = wait_for_message(WAIT FOR GO, sfd, wait_for_message_timeout);
-if (rc) {
-rc += 200;
+data-rc = wait_for_message(WAIT FOR GO, sfd, wait_for_message_timeout);
+if (data-rc) {
+data-rc += 200;
 goto error_accept;
 }
 
@@ -966,7 +987,7 @@ wait_for_go:
 if (len == -1  errno == EAGAIN)
 	goto wait_for_go;
 if (len != 1) {
-rc = MIG_STAT_DST_READ_FAILED;
+data-rc = MIG_STAT_DST_READ_FAILED;
 fprintf(stderr, migration: wait_for_go: read error len=%zu (%s)\n,
 len, strerror(errno));
 }
@@ -974,9 +995,10 @@ wait_for_go:
 error_accept:
 close(sfd);
 error_socket:
-close(fd);
-error:
-return rc;
+qemu_set_fd_handler(data-listen_fd, NULL, NULL, NULL);
+close(data-listen_fd);
+
+qemu_free(data);
 }
 
 int migrate_incoming(const char *device)
@@ -996,16 +1018,57 @@ int migrate_incoming(const char *device)
 	}
 } else if (strstart(device, tcp://, ptr)) {
 	char *host, *end;
+	struct migrate_tcp_data *data;
+	int is_waitconnect = 1;
+
 	host = strdup(ptr);
+	if (!host)
+	goto fail;
 	end = strchr(host, '/');
 	if (end) *end = 0;
-	ret = migrate_incoming_tcp(host);
+
+	data = qemu_mallocz(sizeof(struct migrate_tcp_data));
+	if (!data) {
+	qemu_free(host);
+	goto fail;
+	}
+
+	ptr = host;
+	while((ptr = strchr(ptr,','))) {
+	ptr++;
+	if (!strncmp(ptr,nowait,6)) {
+	is_waitconnect = 0;
+	} else {
+	printf(Unknown option: %s\n, ptr);
+		qemu_free(host);
+	goto fail;
+	}
+	}
+
+	ret = migrate_listen_tcp(host, (data-listen_fd));
 	qemu_free(host);
+	if (ret != 0)
+	goto fail;
+
+	/*
+	 * if we made it here, then migrate_incoming_tcp is responsible for
+	 * freeing the data structure
+	 */
+	if (!is_waitconnect) {
+	socket_set_nonblock(data-listen_fd);
+	qemu_set_fd_handler(data-listen_fd, migrate_incoming_tcp, NULL, data);
+	}
+	else {
+	migrate_incoming_tcp(data);
+	ret = data-rc;
+	}
+
 } else {
 	errno = EINVAL;
 	ret = MIG_STAT_DST_INVALID_PARAMS;
 }
 
+ fail:
 return ret;
 }
 


Re: [PATCH]: Add a migrate_incoming monitor option

2008-08-01 Thread Chris Lalancette
Daniel P. Berrange wrote:
 @@ -9673,11 +9675,16 @@ int main(int argc, char **argv)
  if (incoming) {
  int rc;
  
 -rc = migrate_incoming(incoming);
 -if (rc != 0) {
 -fprintf(stderr, Migration failed rc=%d\n, rc);
 -exit(rc);
 -}
 +if (strncmp(incoming, monitor, 7) == 0) {
 +incoming_monitor = 1;
 +}
 +else {
 +rc = migrate_incoming(incoming);
 +if (rc != 0) {
 +fprintf(stderr, Migration failed rc=%d\n, rc);
 +exit(rc);
 +}
 +}
 
 Rather than putting the strncmp(monitor)  into vl.c, I'd just leave
 this part as is.  Put the logic into the 'migrate_incoming()' method
 so that it just sets the 'incoming_monitor' flag and then returns
 immediately. That would allwo the 'incoming_Monitor' flag to be declared
 static to the migrate.c file, instead of polluting vl.c

Actually, that won't quite work.  We still need to share the incoming_monitor
flag between migration.c and monitor.c.  However, your suggestion is better in
that this is a migration-specific flag, so I'll move it over like you suggest.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Add a migrate_incoming monitor option

2008-07-31 Thread Chris Lalancette
We've been trying to plumb libvirt to do KVM migration.  One of the stumbling
blocks we are running into, however, is that libvirt expects to be able to use
the Qemu monitor both before and after migration has taken place, on both the
source and destination nodes.  After migration has taken place is no problem; we
return to the main qemu select() loop, and we can run monitor commands.
However, before migration, on the destination side, when we start qemu with a
command-line like:

qemu-kvm -M pc -S blah blah -incoming tcp://0:

we can't run any monitor commands since the migration code is synchronously
waiting for an incoming tcp connection.  To get around this, the following patch
adds a new monitor command called migrate_incoming; it takes all of the same
parameters as the command-line option, but just starts it later.  To make sure
it is safe, you actually have to start with -incoming monitor; if you run it
without that, it will just spit an error at you.  So with this in place, libvirt
can do the equivalent of:

qemu-kvm -M pc -S blah blah -incoming monitor
(qemu) info cpus
...other commands
(qemu) migrate_incoming tcp://0:
...wait for migration to start, and then complete
(qemu) info block
...etc.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
diff --git a/qemu/monitor.c b/qemu/monitor.c
index 20dcca6..c11b82c 100644
--- a/qemu/monitor.c
+++ b/qemu/monitor.c
@@ -504,6 +504,25 @@ static void do_cont(void)
 vm_start();
 }
 
+static void do_migrate_incoming(const char *incom)
+{
+extern int incoming_monitor;
+
+if (!incoming_monitor) {
+term_printf(FAIL: Can only use the migrate-incoming command with -incoming monitor\n);
+}
+else {
+int rc;
+
+rc = migrate_incoming(incom);
+	if (rc != 0) {
+fprintf(stderr, Migration failed rc=%d\n, rc);
+exit(rc);
+	}
+vm_start();
+}
+}
+
 #ifdef CONFIG_GDBSTUB
 static void do_gdbserver(const char *port)
 {
@@ -1486,6 +1505,7 @@ static term_cmd_t term_cmds[] = {
   , cancel the current VM migration },
 { migrate_set_speed, s, do_migrate_set_speed,
   value, set maximum speed (in bytes) for migrations },
+{ migrate_incoming, s, do_migrate_incoming, incom, incoming string },
 { cpu_set, is, do_cpu_set_nr, cpu [online|offline], change cpu state },
 #if defined(TARGET_I386) || defined(TARGET_X86_64)
 { drive_add, iss, drive_hot_add, pcibus pcidevfn [file=file][,if=type][,bus=n]\n
diff --git a/qemu/vl.c b/qemu/vl.c
index e1762ee..9b5f113 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -229,6 +229,7 @@ int cursor_hide = 1;
 int graphic_rotate = 0;
 int daemonize = 0;
 const char *incoming;
+int incoming_monitor = 0;
 const char *option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
 int semihosting_enabled = 0;
@@ -9673,11 +9675,16 @@ int main(int argc, char **argv)
 if (incoming) {
 int rc;
 
-rc = migrate_incoming(incoming);
-if (rc != 0) {
-fprintf(stderr, Migration failed rc=%d\n, rc);
-exit(rc);
-	}
+if (strncmp(incoming, monitor, 7) == 0) {
+incoming_monitor = 1;
+}
+else {
+rc = migrate_incoming(incoming);
+if (rc != 0) {
+fprintf(stderr, Migration failed rc=%d\n, rc);
+exit(rc);
+}
+}
 }
 
 {


Re: Live Migration fails

2008-07-30 Thread Chris Lalancette
Daniel Hasler wrote:
 After a while, on host B (the target), I see the following error:
  migration: wait_for_go: read error len=0 (Interrupted system call)
  Migration failed rc=210
 
 while on host A, I read following error message:
  WAIT FOR ACK: timeout reached
  Migration failed! ret=0 error=13

No, this is a bug I just fixed yesterday in Qemu proper.  The next time that Avi
rebases KVM to Qemu (which he said he would do as soon as the fix went into
Qemu), then this should be fixed.

FYI, the problem is basically that the i2c restore on the destination side was
expecting 4 bytes, while the i2c save on the sending side was only sending 1
byte, so they eventually timed out while waiting for each other.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fresh install of Windows XP hangs early in boot?

2008-07-30 Thread Chris Lalancette
Roland Dreier wrote:
 BTW I tried using if=ide to install Windows XP and got a blue screen
 during the installer.  What are people doing to run XP in a kvm guest?

Hm, your comment later on makes me think you tried this on AMD.  If so, I have
also run into a similar problem with Windows guests under AMD.  After installing
WinDbg, it told me that it was a Paging Request in Non-Paged memory related to
the Video memory area.  Does yours look similar to that?  I have not had time to
track it further than that, though.

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Fix i2c_bus_save, which fixes KVM live migration

2008-07-29 Thread Chris Lalancette
Attached is a simple patch to make i2c_bus_save() put a 32-bit quantity in the
save file, which matches what i2c_bus_load() expects to pull out of the save
file later.  Without this fix in place, KVM live migration fails since the
sender is only sending 1 byte while the receiver is waiting to receive 4 bytes.

Avi, I don't know when you plan to next rebase KVM to upstream QEMU, but it's
probably a good idea to carry this patch so that live migration works at all.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
diff --git a/qemu/hw/i2c.c b/qemu/hw/i2c.c
index 5d283fb..f711db7 100644
--- a/qemu/hw/i2c.c
+++ b/qemu/hw/i2c.c
@@ -21,7 +21,7 @@ static void i2c_bus_save(QEMUFile *f, void *opaque)
 {
 i2c_bus *bus = (i2c_bus *)opaque;
 
-qemu_put_byte(f, bus-current_dev ? bus-current_dev-address : -1);
+qemu_put_be32(f, bus-current_dev ? bus-current_dev-address : -1);
 }
 
 static int i2c_bus_load(QEMUFile *f, void *opaque, int version_id)


[PATCH]: Fake emulate Intel perfctr MSRs

2008-05-29 Thread Chris Lalancette
Attached is a patch for fake emulating Intel perfctr MSRs, similar to the recent
patch to fake emulate the AMD perfctr MSRs.  This is needed for a reason similar
for the previous patch; older linux guests (in this case, 2.6.9) can attempt to
access the MSR's without a fixup section, and injecting a GPF kills the guest.
Tested by me on RHEL-4 i386 and x86_64 guests, as well as F-9 guests.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index aaa99ed..f28789e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -29,6 +29,7 @@
 
 #include asm/io.h
 #include asm/desc.h
+#include asm/intel_arch_perfmon.h
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -916,6 +917,18 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
 	case MSR_IA32_TIME_STAMP_COUNTER:
 		guest_write_tsc(data);
 		break;
+	case MSR_ARCH_PERFMON_EVENTSEL0:
+	case MSR_ARCH_PERFMON_EVENTSEL1:
+	case MSR_ARCH_PERFMON_PERFCTR0:
+	case MSR_ARCH_PERFMON_PERFCTR1:
+		/*
+		 * Just discard all writes to the performance counters; this
+		 * should keep both older linux and windows 64-bit guests
+		 * happy
+		 */
+		pr_unimpl(vcpu, unimplemented perfctr wrmsr: 0x%x data 0x%llx\n, msr_index, data);
+
+		break;
 	default:
 		msr = find_msr_entry(vmx, msr_index);
 		if (msr) {


Re: [Fwd: [PATCH]: Fix crash in virtio_blk during modprobe ; rmmod ; modprobe]

2008-05-22 Thread Chris Lalancette
Jens Axboe wrote:
 On Thu, May 22 2008, Rusty Russell wrote:
 On Wednesday 21 May 2008 23:12:39 Chris Lalancette wrote:
 diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
 index 4962e62..c678ac5 100644
 --- a/drivers/block/virtio_blk.c
 +++ b/drivers/block/virtio_blk.c
 @@ -294,6 +294,7 @@ static void virtblk_remove(struct virtio_device *vdev)
 vdev-config-reset(vdev);
  
 blk_cleanup_queue(vblk-disk-queue);
 +   del_gendisk(vblk-disk);
 put_disk(vblk-disk);
 unregister_blkdev(major, virtblk);
 mempool_destroy(vblk-pool);
 Thanks Chris, it seems reasonable and I'm sure it works (kinda hard to
 test here, it's my root block dev).  Other drivers seem to do
 blk_cleanup_queue() *after* del_gendisk: does it matter?

 Jens CC'd: he's gentle with my dumb questions...  Rusty.
 
 del_gendisk() can generate IO, so it would seem safer to do that
 _before_ putting the queue reference :-)
 

Ah, good to know.  Out of curiousity, why would del_gendisk() generate
additional I/O?

Chris Lalancette
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Fix silly output for virtio devices in /proc/interrupts

2008-05-21 Thread Chris Lalancette
register_virtio_device() is doing something silly by overwriting what the caller
put into .bus_id.  This causes the interrupt line for all virtio devices to show
up as 0, 1, etc. in /proc/interrupts.  The attached patch fixes it.

Signed-off-by: Chris Lalancette [EMAIL PROTECTED]
commit cb97605728fc1d7a15ddc6ab689e8bcd23871133
Author: Chris Lalancette [EMAIL PROTECTED]
Date:   Thu May 15 09:04:55 2008 -0400

register_virtio_device was doing something silly, in that it was overwriting
what the calling driver stuck into .bus_id for the name.  This caused
problems in the output of /proc/interrupts, since when you request_irq(),
it doesn't actually copy the devname you pass in but just stores a pointer
to the data.  The fix is to just not have register_virtio_device do anything
with the bus_id, and assume the higher level driver set it up properly.

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 138a7f0..1556ac2 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -128,7 +128,6 @@ int register_virtio_device(struct virtio_device *dev)
 	int err;
 
 	dev-dev.bus = virtio_bus;
-	sprintf(dev-dev.bus_id, %u, dev-index);
 
 	/* We always start by resetting the device, in case a previous
 	 * driver messed it up.  This also tests that code path a little. */