[KVM-AUTOTEST][COMMIT] Merge branch 'master' of git://github.com/ehabkost/autotest

2009-06-11 Thread Uri Lublin
From: Uri Lublin u...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] Define CONFIG_KVM_APIC_ARCHITECTURE

2009-06-11 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/ia64/external-module-compat.h b/ia64/external-module-compat.h
index 8ccad90..60a83a1 100644
--- a/ia64/external-module-compat.h
+++ b/ia64/external-module-compat.h
@@ -24,6 +24,10 @@ typedef u64 phys_addr_t;
 #error KVM/IA-64 depends on preempt notifiers in kernel.
 #endif
 
+#ifndef CONFIG_KVM_APIC_ARCHITECTURE
+#define CONFIG_KVM_APIC_ARCHITECTURE
+#endif
+
 /* smp_call_function() lost an argument in 2.6.27. */
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,27)
 
diff --git a/x86/external-module-compat.h b/x86/external-module-compat.h
index 273bfee..f7aa151 100644
--- a/x86/external-module-compat.h
+++ b/x86/external-module-compat.h
@@ -22,6 +22,10 @@ typedef u64 phys_addr_t;
 #define CONFIG_HAVE_KVM_EVENTFD 1
 #endif
 
+#ifndef CONFIG_KVM_APIC_ARCHITECTURE
+#define CONFIG_KVM_APIC_ARCHITECTURE
+#endif
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,25)
 
 #ifdef CONFIG_X86_64


[COMMIT master] KVM: Use pointer to vcpu instead of vcpu_id in timer code.

2009-06-11 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 5d5cfd3..26c29cb 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -291,7 +291,7 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
    pt->timer.function = kvm_timer_fn;
    pt->t_ops = &kpit_ops;
    pt->kvm = ps->pit->kvm;
-   pt->vcpu_id = 0;
+   pt->vcpu = pt->kvm->bsp_vcpu;
 
    atomic_set(&pt->pending, 0);
    ps->irq_ack = 1;
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index 26bd6ba..55c7524 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -6,7 +6,7 @@ struct kvm_timer {
    bool reinject;
    struct kvm_timer_ops *t_ops;
    struct kvm *kvm;
-   int vcpu_id;
+   struct kvm_vcpu *vcpu;
 };
 
 struct kvm_timer_ops {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b066130..b1694dc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -950,7 +950,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
    apic->lapic_timer.timer.function = kvm_timer_fn;
    apic->lapic_timer.t_ops = &lapic_timer_ops;
    apic->lapic_timer.kvm = vcpu->kvm;
-   apic->lapic_timer.vcpu_id = vcpu->vcpu_id;
+   apic->lapic_timer.vcpu = vcpu;
 
    apic->base_address = APIC_DEFAULT_PHYS_BASE;
    vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE;
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index 86dbac0..85cc743 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -33,7 +33,7 @@ enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
    struct kvm_vcpu *vcpu;
    struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
 
-   vcpu = ktimer->kvm->vcpus[ktimer->vcpu_id];
+   vcpu = ktimer->vcpu;
    if (!vcpu)
        return HRTIMER_NORESTART;
 
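
For readers who want to see the pattern from the timer patch above in isolation, here is a minimal userspace sketch. It only illustrates the idea: the timer object carries a pointer to its vcpu, and the callback recovers the timer with container_of() and follows that pointer instead of indexing an array by vcpu_id. All names (toy_vcpu, toy_timer, toy_timer_fn, fake_hrtimer) are hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_hrtimer { int expired; };   /* hypothetical stand-in for struct hrtimer */
struct toy_vcpu { int vcpu_id; int kicked; };

struct toy_timer {
	struct fake_hrtimer timer;      /* embedded, like the timer in struct kvm_timer */
	struct toy_vcpu *vcpu;          /* pointer instead of an index into vcpus[] */
};

/* Returns 1 if a vcpu was kicked, 0 if the timer has no vcpu attached. */
static int toy_timer_fn(struct fake_hrtimer *data)
{
	struct toy_timer *ktimer;
	struct toy_vcpu *vcpu;

	assert(data != NULL);
	ktimer = container_of(data, struct toy_timer, timer);
	vcpu = ktimer->vcpu;            /* no lookup by id is needed any more */
	if (!vcpu)
		return 0;
	vcpu->kicked++;
	return 1;
}
```

Because the callback never consults vcpu_id, the id is free to take any value the architecture wants, which is presumably what the later patches in this series rely on.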


[COMMIT master] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

2009-06-11 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com


[COMMIT master] KVM: Introduce kvm_vcpu_is_bsp() function.

2009-06-11 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Use it instead of open-coding the assumption that vcpu_id zero is the BSP.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3199221..3924591 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -1216,7 +1216,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
    if (IS_ERR(vmm_vcpu))
        return PTR_ERR(vmm_vcpu);
 
-   if (vcpu->vcpu_id == 0) {
+   if (kvm_vcpu_is_bsp(vcpu)) {
        vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
        /*Set entry address for first run.*/
diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c
index a2c6c15..7e7391d 100644
--- a/arch/ia64/kvm/vcpu.c
+++ b/arch/ia64/kvm/vcpu.c
@@ -830,7 +830,7 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val)
 
    kvm = (struct kvm *)KVM_VM_BASE;
 
-   if (vcpu->vcpu_id == 0) {
+   if (kvm_vcpu_is_bsp(vcpu)) {
        for (i = 0; i < kvm->arch.online_vcpus; i++) {
            v = (struct kvm_vcpu *)((char *)vcpu +
                    sizeof(struct kvm_vcpu_data) * i);
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 9749ec3..5d5cfd3 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -228,7 +228,7 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
 {
    struct kvm_pit *pit = vcpu->kvm->arch.vpit;
 
-   if (pit && vcpu->vcpu_id == 0 && pit->pit_state.irq_ack)
+   if (pit && kvm_vcpu_is_bsp(vcpu) && pit->pit_state.irq_ack)
        return atomic_read(&pit->pit_state.pit_timer.pending);
    return 0;
 }
@@ -249,7 +249,7 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
    struct kvm_pit *pit = vcpu->kvm->arch.vpit;
    struct hrtimer *timer;
 
-   if (vcpu->vcpu_id != 0 || !pit)
+   if (!kvm_vcpu_is_bsp(vcpu) || !pit)
        return;
 
    timer = &pit->pit_state.pit_timer.timer;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index bf94a45..148c52a 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -57,7 +57,7 @@ static void pic_unlock(struct kvm_pic *s)
    }
 
    if (wakeup) {
-       vcpu = s->kvm->vcpus[0];
+       vcpu = s->kvm->bsp_vcpu;
        if (vcpu)
            kvm_vcpu_kick(vcpu);
    }
@@ -254,7 +254,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s)
 {
    int irq, irqbase, n;
    struct kvm *kvm = s->pics_state->irq_request_opaque;
-   struct kvm_vcpu *vcpu0 = kvm->vcpus[0];
+   struct kvm_vcpu *vcpu0 = kvm->bsp_vcpu;
 
    if (s == &s->pics_state->pics[0])
        irqbase = 0;
@@ -512,7 +512,7 @@ static void picdev_read(struct kvm_io_device *this,
 static void pic_irq_request(void *opaque, int level)
 {
    struct kvm *kvm = opaque;
-   struct kvm_vcpu *vcpu = kvm->vcpus[0];
+   struct kvm_vcpu *vcpu = kvm->bsp_vcpu;
    struct kvm_pic *s = pic_irqchip(kvm);
    int irq = pic_get_irq(&s->pics[0]);
 
 
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 44f20cd..b066130 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -793,7 +793,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
        vcpu->arch.apic_base = value;
        return;
    }
-   if (apic->vcpu->vcpu_id)
+
+   if (!kvm_vcpu_is_bsp(apic->vcpu))
        value &= ~MSR_IA32_APICBASE_BSP;
 
    vcpu->arch.apic_base = value;
@@ -844,7 +845,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
    }
    update_divide_count(apic);
    atomic_set(&apic->lapic_timer.pending, 0);
-   if (vcpu->vcpu_id == 0)
+   if (kvm_vcpu_is_bsp(vcpu))
        vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
    apic_update_ppr(apic);
 
@@ -985,7 +986,7 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
    u32 lvt0 = apic_get_reg(vcpu->arch.apic, APIC_LVT0);
    int r = 0;
 
-   if (vcpu->vcpu_id == 0) {
+   if (kvm_vcpu_is_bsp(vcpu)) {
        if (!apic_hw_enabled(vcpu->arch.apic))
            r = 1;
        if ((lvt0 & APIC_LVT_MASKED) == 0 &&
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 37397f6..13f6f7d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -645,7 +645,7 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 
    init_vmcb(svm);
 
-   if (vcpu->vcpu_id != 0) {
+   if (!kvm_vcpu_is_bsp(vcpu)) {
        kvm_rip_write(vcpu, 0);
        svm->vmcb->save.cs.base = svm->vcpu.arch.sipi_vector << 12;
        svm->vmcb->save.cs.selector = svm->vcpu.arch.sipi_vector << 8;
@@ -709,7 +709,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
    fx_init(&svm->vcpu);
    svm->vcpu.fpu_active = 1;
    svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
-   if (svm->vcpu.vcpu_id == 0)
+   if (kvm_vcpu_is_bsp(&svm->vcpu))
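
The body of kvm_vcpu_is_bsp() is not visible in the truncated diff above, so the following is only a sketch of what such a helper could look like, assuming the VM keeps a bsp_vcpu pointer (as the i8259.c hunks suggest). The toy_* types are hypothetical and much smaller than the real struct kvm and struct kvm_vcpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_kvm;

struct toy_vcpu {
	struct toy_kvm *kvm;         /* back-pointer to the owning VM */
	int vcpu_id;                 /* arch-defined id, not necessarily 0 for the BSP */
};

struct toy_kvm {
	struct toy_vcpu *bsp_vcpu;   /* assumed to be set when the boot vcpu is created */
};

/* True when this vcpu is the VM's bootstrap processor (assumed comparison). */
static inline bool toy_vcpu_is_bsp(const struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL && vcpu->kvm != NULL);
	return vcpu->kvm->bsp_vcpu == vcpu;
}
```

Callers then ask "is this the BSP?" rather than "is the id zero?", so the check keeps working if the boot vcpu is given a non-zero id.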

[COMMIT master] KVM: Break dependency between vcpu index in vcpus array and vcpu_id.

2009-06-11 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

Architectures are free to use vcpu_id as they see fit. On x86 it is used as
the vcpu's APIC ID. A new ioctl is added to configure the boot vcpu id, which
was assumed to be 0 until now.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 5f43697..9cf1c4b 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -465,7 +465,6 @@ struct kvm_arch {
unsigned long   metaphysical_rr4;
unsigned long   vmm_init_rr;
 
-   int online_vcpus;
int is_sn2;
 
struct kvm_ioapic *vioapic;
diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
index f922bbb..cbadd8a 100644
--- a/arch/ia64/kvm/Kconfig
+++ b/arch/ia64/kvm/Kconfig
@@ -25,6 +25,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
select HAVE_KVM_IRQCHIP
+   select KVM_APIC_ARCHITECTURE
---help---
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 3924591..cbda5db 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -338,7 +338,7 @@ static struct kvm_vcpu *lid_to_vcpu(struct kvm *kvm, unsigned long id,
    union ia64_lid lid;
    int i;
 
-   for (i = 0; i < kvm->arch.online_vcpus; i++) {
+   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) {
        if (kvm->vcpus[i]) {
            lid.val = VCPU_LID(kvm->vcpus[i]);
            if (lid.id == id && lid.eid == eid)
@@ -412,7 +412,7 @@ static int handle_global_purge(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 
    call_data.ptc_g_data = p->u.ptc_g_data;
 
-   for (i = 0; i < kvm->arch.online_vcpus; i++) {
+   for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) {
        if (!kvm->vcpus[i] || kvm->vcpus[i]->arch.mp_state ==
                KVM_MP_STATE_UNINITIALIZED ||
                vcpu == kvm->vcpus[i])
@@ -852,8 +852,6 @@ struct  kvm *kvm_arch_create_vm(void)
 
    kvm_init_vm(kvm);
 
-   kvm->arch.online_vcpus = 0;
-
    return kvm;
 
 }
@@ -1356,8 +1354,6 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
        goto fail;
    }
 
-   kvm->arch.online_vcpus++;
-
    return vcpu;
 fail:
    return ERR_PTR(r);
diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c
index 7e7391d..2334eac 100644
--- a/arch/ia64/kvm/vcpu.c
+++ b/arch/ia64/kvm/vcpu.c
@@ -831,7 +831,7 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val)
    kvm = (struct kvm *)KVM_VM_BASE;
 
    if (kvm_vcpu_is_bsp(vcpu)) {
-       for (i = 0; i < kvm->arch.online_vcpus; i++) {
+       for (i = 0; i < atomic_read(&kvm->online_vcpus); i++) {
            v = (struct kvm_vcpu *)((char *)vcpu +
                    sizeof(struct kvm_vcpu_data) * i);
            VMX(v, itc_offset) = itc_offset;
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8cd2a4e..7fbedfd 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -27,6 +27,7 @@ config KVM
select ANON_INODES
select HAVE_KVM_IRQCHIP
select HAVE_KVM_EVENTFD
+   select KVM_APIC_ARCHITECTURE
---help---
  Support hosting fully virtualized guest machines using hardware
  virtualization extensions.  You will need a fairly recent
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7ed9de1..c7611ef 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -430,6 +430,7 @@ struct kvm_trace_rec {
 #ifdef __KVM_HAVE_PIT
 #define KVM_CAP_PIT2 33
 #endif
+#define KVM_CAP_SET_BOOT_CPU_ID 34
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -535,6 +536,7 @@ struct kvm_irqfd {
 #define KVM_DEASSIGN_DEV_IRQ   _IOW(KVMIO, 0x75, struct kvm_assigned_irq)
 #define KVM_IRQFD  _IOW(KVMIO, 0x76, struct kvm_irqfd)
 #define KVM_CREATE_PIT2            _IOW(KVMIO, 0x77, struct kvm_pit_config)
+#define KVM_SET_BOOT_CPU_ID        _IO(KVMIO, 0x78)
 
 /*
  * ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b55d427..1478b8f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -129,8 +129,12 @@ struct kvm {
int nmemslots;
struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
KVM_PRIVATE_MEM_SLOTS];
+#ifdef CONFIG_KVM_APIC_ARCHITECTURE
+   u32 bsp_vcpu_id;
struct kvm_vcpu *bsp_vcpu;
+#endif
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
+   atomic_t online_vcpus;
struct list_head vm_list;
struct mutex lock;
struct kvm_io_bus mmio_bus;
@@ -550,8 +554,10 @@ static inline void kvm_irqfd_release(struct kvm *kvm) {}
 
 

[COMMIT master] Merge branch 'for-avi' of git://git.et.redhat.com/qemu-net

2009-06-11 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* 'for-avi' of git://git.et.redhat.com/qemu-net: (69 commits)
  Fix build breakage when using VDE introduced by 4f1c942
  Fix defined but not used warning
  monitor: Introduce get_command_name()
  monitor: Remove unused variable
  monitor: Remove uneeded 'return' statement
  monitor: Remove uneeded goto
  Use snprintf to avoid OpenBSD warning
  Fix Sparse warning
  Clean up generated qemu-img-cmds.h
  Fix Sparse warning
  microblaze-dis.c does not need to be executable
  Fix warning
  Remove unused and misnamed field and variable
  Update irqs on reset and device load
  Register reset functions for e1000 and rtl8139
  virtio-net: Increase filter and control limits
  virtio-net: Add new RX filter controls
  virtio-net: MAC filter optimization
  virtio-net: Fix MAC filter overflow handling
  virtio-net: reorganize receive_filter()
  ...


[COMMIT master] Pull qemu headers into libkvm

2009-06-11 Thread Avi Kivity
From: Glauber Costa glom...@redhat.com

Those headers define qemu specific things like ram_addr_t.
This will allow us to start using them in libkvm.

Signed-off-by: Glauber Costa glom...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/libkvm-all.c b/libkvm-all.c
index dd56498..45679fb 100644
--- a/libkvm-all.c
+++ b/libkvm-all.c
@@ -26,6 +26,7 @@
 #error libkvm: userspace and kernel version mismatch
 #endif
 
+#include "sysemu.h"
 #include <unistd.h>
 #include <fcntl.h>
 #include <stdio.h>
@@ -47,7 +48,6 @@
 #define DPRINTF(fmt, args...) do {} while (0)
 #endif
 
-#define MIN(x,y) ((x) < (y) ? (x) : (y))
 #define ALIGN(x, y) (((x)+(y)-1) & ~((y)-1))
 
 int kvm_abi = EXPECTED_KVM_API_VERSION;
diff --git a/libkvm-all.h b/libkvm-all.h
index 03b98df..d647ef1 100644
--- a/libkvm-all.h
+++ b/libkvm-all.h
@@ -82,6 +82,7 @@ struct kvm_vcpu_context
 typedef struct kvm_context *kvm_context_t;
 typedef struct kvm_vcpu_context *kvm_vcpu_context_t;
 
+#include "kvm.h"
 int kvm_alloc_kernel_memory(kvm_context_t kvm, unsigned long memory,
void **vm_mem);
 int kvm_alloc_userspace_memory(kvm_context_t kvm, unsigned long memory,
diff --git a/target-i386/libkvm.c b/target-i386/libkvm.c
index f88102e..0f4e009 100644
--- a/target-i386/libkvm.c
+++ b/target-i386/libkvm.c
@@ -1,3 +1,5 @@
+#include "sysemu.h"
+
 #include "libkvm-all.h"
 #include "libkvm.h"
 #include <errno.h>


Re: Network throughput limits for local VM <-> VM communication

2009-06-11 Thread Arnd Bergmann
On Wednesday 10 June 2009, Fischer, Anna wrote:
> > Have you tried eliminating VLAN to simplify the setup?
>
> No - but there is a related bug in the tun/tap interface (well, it is not
> really a bug but simply the way tun/tap works) that will cause packets to
> be replicated on all the tap interfaces (across all bridges attached to
> those) if I do not configure VLANs. This will result in a system that is
> even more overloaded. I had discovered this a while back when running
> UDP stress tests under 10G.

Not sure I understand. Do you mean you have all three guests connected
to the same bridge? If you want the router guest to be the only connection,
you should not connect the two bridges anywhere else, so I don't see
how packets can go from one bridge to the other one, except through the
router.

> > Does it change when the guests communicate over a -net socket interface
> > with your router instead of the -net tap + bridge in the host?
>
> I have not tried this - I need the bridge in the network data path for
> some testing, so using the -net socket interface would not solve my problem.

I did not mean this to solve your problem but to hunt down the bug.
If the problem only exists with the host bridge device, we should look
there, but if it persists, we can probably rule out the tap, bridge and vlan
code in the host as the problem source.
 
> However, I have just today managed to get around this bug by using the
> e1000 QEMU emulated NIC model and this seems to do the trick. Now the
> throughput is still very low, but that might simply be because my system
> is too weak. When using the e1000 model instead of rtl8139 or virtio,
> I do not have any network crashes any more.

That could either indicate a bug in rtl8139 and virtio, or that the
specific timing of the e1000 model hides this bug.

What happens if only one side uses e1000 while the other still uses
virtio? What about any of the other models?

Arnd 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Network throughput limits for local VM <-> VM communication

2009-06-11 Thread Fischer, Anna
> Subject: Re: Network throughput limits for local VM <-> VM
> communication
>
> On Wednesday 10 June 2009, Fischer, Anna wrote:
> > > Have you tried eliminating VLAN to simplify the setup?
> >
> > No - but there is a related bug in the tun/tap interface (well, it is
> > not really a bug but simply the way tun/tap works) that will cause
> > packets to be replicated on all the tap interfaces (across all bridges
> > attached to those) if I do not configure VLANs. This will result in a
> > system that is even more overloaded. I had discovered this a while back
> > when running UDP stress tests under 10G.
>
> Not sure I understand. Do you mean you have all three guests connected
> to the same bridge? If you want the router guest to be the only
> connection, you should not connect the two bridges anywhere else, so I
> don't see how packets can go from one bridge to the other one, except
> through the router.

I am using two bridges, and yes, in theory, the router should be the only
connection between the two guests. However, without VLANs, the tun interface
will pass packets to all tap interfaces. It has to, as it doesn't know which
one the packet has to go to. It does not look at packets, it simply copies
buffers from userspace to the tap interface in the kernel. The tap interface
then eventually drops the packet if the MAC address does not match its own. So
packets will not actually go across both bridges, because the tap interface
that should not receive the packet does drop it. However, it does receive the
packet and processes it to some extent, which causes some overhead. As I was
told by someone at KVM/RedHat, this does not happen when using VLANs, as then
there will be a direct mapping between any tun-tap device and so no packet
replication across multiple tap devices.
 

> > > Does it change when the guests communicate over a -net socket
> > > interface with your router instead of the -net tap + bridge in the
> > > host?
> >
> > I have not tried this - I need the bridge in the network data path for
> > some testing, so using the -net socket interface would not solve my
> > problem.
>
> I did not mean this to solve your problem but to hunt down the bug.
> If the problem only exists with the host bridge device, we should look
> there, but if it persists, we can probably rule out the tap, bridge and
> vlan code in the host as the problem source.

Yes, I understand you were trying to help and using the -net socket interface
would help to narrow down where the problem could be. I just have not yet
managed to set this up, but I might do so if I find the time in the next few
days. I was hoping that other people might have seen the same issues I see,
but unfortunately I did not get many replies or suggestions on this issue
from the list.
 

> > However, I have just today managed to get around this bug by using the
> > e1000 QEMU emulated NIC model and this seems to do the trick. Now the
> > throughput is still very low, but that might simply be because my
> > system is too weak. When using the e1000 model instead of rtl8139 or
> > virtio, I do not have any network crashes any more.
>
> That could either indicate a bug in rtl8139 and virtio, or that the
> specific timing of the e1000 model hides this bug.
>
> What happens if only one side uses e1000 while the other still uses
> virtio? What about any of the other models?

Good question. I will try this out and post the results.

Cheers,
Anna


[PATCHv2] [APIC] Optimize searching for highest IRR

2009-06-11 Thread Gleb Natapov
Most of the time IRR is empty, so instead of scanning the whole IRR on
each VM entry, keep a variable that tells us whether IRR is non-empty. IRR
will have to be scanned twice on each IRQ delivery, but this is much
rarer than VM entry.

v2:
 The only difference from v1 is the comment describing a possible race and
 how it is solved. The race is not created by the patch, BTW.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 44f20cd..38a7fa0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -165,29 +165,45 @@ static int find_highest_vector(void *bitmap)
 
 static inline int apic_test_and_set_irr(int vec, struct kvm_lapic *apic)
 {
+   apic->irr_pending = true;
    return apic_test_and_set_vector(vec, apic->regs + APIC_IRR);
 }
 
-static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+static inline int apic_search_irr(struct kvm_lapic *apic)
 {
-   apic_clear_vector(vec, apic->regs + APIC_IRR);
+   return find_highest_vector(apic->regs + APIC_IRR);
 }
 
 static inline int apic_find_highest_irr(struct kvm_lapic *apic)
 {
    int result;
 
-   result = find_highest_vector(apic->regs + APIC_IRR);
+   if (!apic->irr_pending)
+       return -1;
+
+   result = apic_search_irr(apic);
    ASSERT(result == -1 || result >= 16);
 
    return result;
 }
 
+static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
+{
+   apic->irr_pending = false;
+   apic_clear_vector(vec, apic->regs + APIC_IRR);
+   if (apic_search_irr(apic) != -1)
+       apic->irr_pending = true;
+}
+
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
    struct kvm_lapic *apic = vcpu->arch.apic;
    int highest_irr;
 
+   /* This may race with setting of irr in __apic_accept_irq() and
+      value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq
+      will cause vmexit immediately and the value will be recalculated
+      on the next vmentry. */
    if (!apic)
        return 0;
    highest_irr = apic_find_highest_irr(apic);
@@ -842,6 +858,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
        apic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
        apic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
    }
+   apic->irr_pending = false;
    update_divide_count(apic);
    atomic_set(&apic->lapic_timer.pending, 0);
    if (vcpu->vcpu_id == 0)
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index a587f83..3f3ecc6 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -12,6 +12,7 @@ struct kvm_lapic {
struct kvm_timer lapic_timer;
u32 divide_count;
struct kvm_vcpu *vcpu;
+   bool irr_pending;
struct page *regs_page;
void *regs;
gpa_t vapic_addr;
--
Gleb.
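
The idea in the IRR patch above can be tried outside the kernel. The sketch below is a single-threaded illustration only: a 256-bit pending bitmap plus a boolean hint that lets the common "nothing pending" case return without scanning. The toy_* names and the plain uint32_t bitmap layout are assumptions for this example; the real code works on the APIC register page and has to tolerate the race described in the patch comment, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 256

struct toy_apic {
	uint32_t irr[TOY_NR_VECTORS / 32];  /* one bit per interrupt vector */
	bool irr_pending;                   /* hint: false means IRR is empty */
};

/* Full scan: highest set vector, or -1 if no bit is set. */
static int toy_search_irr(const struct toy_apic *apic)
{
	for (int vec = TOY_NR_VECTORS - 1; vec >= 0; vec--)
		if (apic->irr[vec / 32] & (UINT32_C(1) << (vec % 32)))
			return vec;
	return -1;
}

static void toy_set_irr(struct toy_apic *apic, int vec)
{
	assert(vec >= 0 && vec < TOY_NR_VECTORS);
	apic->irr_pending = true;
	apic->irr[vec / 32] |= UINT32_C(1) << (vec % 32);
}

/* Clearing rescans once so the hint stays accurate. */
static void toy_clear_irr(struct toy_apic *apic, int vec)
{
	assert(vec >= 0 && vec < TOY_NR_VECTORS);
	apic->irr_pending = false;
	apic->irr[vec / 32] &= ~(UINT32_C(1) << (vec % 32));
	if (toy_search_irr(apic) != -1)
		apic->irr_pending = true;
}

/* Fast path: skip the scan entirely when nothing is pending. */
static int toy_find_highest_irr(const struct toy_apic *apic)
{
	if (!apic->irr_pending)
		return -1;
	return toy_search_irr(apic);
}
```

The cost moves from the frequent path (every lookup) to the rare one (clearing a vector), which matches the trade-off described in the patch text.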


[GIT PULL] KVM updates for 2.6.31

2009-06-11 Thread Avi Kivity
Linus, please pull the 2.6.31 KVM batch from

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git kvm-updates/2.6.31

Changes include MSI support, a rework of the interrupt code, improved smp
performance, and architecture code updates.

Amit Shah (1):
  KVM: x86: Ignore reads to EVNTSEL MSRs

Andi Kleen (1):
  KVM: Add VT-x machine check support

Andre Przywara (1):
  KVM: SVM: Fix cross vendor migration issue in segment segment descriptor

Avi Kivity (18):
  KVM: VMX: Don't use highmem pages for the msr and pio bitmaps
  KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE
  KVM: VMX: Make module parameters readable
  KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit()
  KVM: VMX: Simplify module parameter names
  KVM: VMX: Annotate module parameters as __read_mostly
  KVM: VMX: Zero the vpid module parameter if vpid is not supported
  KVM: VMX: Zero ept module parameter if ept is not present
  KVM: VMX: Fold vm_need_ept() into callers
  KVM: VMX: Make flexpriority module parameter reflect hardware capability
  KVM: MMU: Use different shadows when EFER.NXE changes
  KVM: Replace kvmclock open-coded get_cpu_var() with the real thing
  KVM: Fix cpuid feature misreporting
  KVM: Add AMD cpuid bit: cr8_legacy, abm, misaligned sse, sse4, 3dnow prefetch
  x86: Add cpu features MOVBE and POPCNT
  KVM: Update cpuid 1.ecx reporting
  KVM: Disable large pages on misaligned memory slots
  KVM: Prevent overflow in largepages calculation

Carsten Otte (4):
  KVM: s390: Fix memory slot versus run - v3
  KVM: s390: Unlink vcpu on destroy - v2
  KVM: s390: Sanity check on validity intercept
  KVM: s390: Verify memory in kvm run

Chris Wright (1):
  KVM: Trivial format fix in setup_routing_entry()

Christian Borntraeger (3):
  KVM: declare ioapic functions only on affected hardware
  KVM: s390: use hrtimer for clock wakeup from idle - v2
  KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock

Dong, Eddie (2):
  KVM: MMU: Emulate #PF error code of reserved bits violation
  KVM: Use rsvd_bits_mask in load_pdptrs()

Eddie Dong (1):
  KVM: MMU: Fix comment in page_fault()

Glauber Costa (3):
  KVM: fix apic_debug instances
  KVM: Replace ->drop_interrupt_shadow() by ->set_interrupt_shadow()
  KVM: Deal with interrupt shadow state for emulated instructions

Gleb Natapov (51):
  KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts
  KVM: ioapic/msi interrupt delivery consolidation
  KVM: consolidate ioapic/ipi interrupt delivery logic
  KVM: change the way how lowest priority vcpu is calculated
  KVM: APIC: get rid of deliver_bitmask
  KVM: MMU: do not free active mmu pages in free_mmu_pages()
  KVM: SVM: Remove duplicate code in svm_do_inject_vector()
  KVM: reuse (pop|push)_irq from svm.c in vmx.c
  KVM: Timer event should not unconditionally unhalt vcpu.
  KVM: Fix interrupt unhalting a vcpu when it shouldn't
  KVM: VMX: Fix handling of a fault during NMI unblocked due to IRET
  KVM: VMX: Rewrite vmx_complete_interrupt()'s twisted maze of if() statements
  KVM: VMX: Do not zero idt_vectoring_info in vmx_complete_interrupts().
  KVM: Fix task switch back link handling.
  KVM: Fix unneeded instruction skipping during task switching.
  KVM: x86 emulator: fix call near emulation
  KVM: x86 emulator: Add decoding of 16bit second immediate argument
  KVM: x86 emulator: Add lcall decoding
  KVM: x86 emulator: Complete ljmp decoding at decode stage
  KVM: x86 emulator: Complete short/near jcc decoding in decode stage
  KVM: x86 emulator: Complete decoding of call near in decode stage
  KVM: x86 emulator: Add unsigned byte immediate decode
  KVM: x86 emulator: Completely decode in/out at decoding stage
  KVM: x86 emulator: Decode soft interrupt instructions
  KVM: x86 emulator: Add new mode of instruction emulation: skip
  KVM: SVM: Skip instruction on a task switch only when appropriate
  KVM: Make kvm_cpu_(has|get)_interrupt() work for userspace irqchip too
  KVM: VMX: Consolidate userspace and kernel interrupt injection for VMX
  KVM: VMX: Cleanup vmx_intr_assist()
  KVM: Use kvm_arch_interrupt_allowed() instead of checking interrupt_window_open directly
  KVM: SVM: Coalesce userspace/kernel irqchip interrupt injection logic
  KVM: Remove exception_injected() callback.
  KVM: Remove inject_pending_vectors() callback
  KVM: Remove kvm_push_irq()
  KVM: sync_lapic_to_cr8() should always sync cr8 to V_TPR
  KVM: Do not report TPR write to userspace if new value bigger or equal to a previous one.
  KVM: Get rid of arch.interrupt_window_open & arch.nmi_window_open
  KVM: SVM: Add NMI injection support
  KVM: Fix userspace IRQ chip migration
  KVM: Get rid of get_irq() callback
  KVM: SVM: Don't reinject event that caused a task 

Re: Network throughput limits for local VM <-> VM communication

2009-06-11 Thread Avi Kivity

Fischer, Anna wrote:

> I am using two bridges, and yes, in theory, the router should be the only connection
> between the two guests. However, without VLANs, the tun interface will pass packets
> to all tap interfaces. It has to, as it doesn't know which one the packet has to
> go to. It does not look at packets, it simply copies buffers from userspace to the
> tap interface in the kernel. The tap interface then eventually drops the packet if
> the MAC address does not match its own. So packets will not actually go across both
> bridges, because the tap interface that should not receive the packet does drop it.
> However, it does receive the packet and processes it to some extent, which causes some
> overhead. As I was told by someone at KVM/RedHat, this does not happen when using
> VLANs, as then there will be a direct mapping between any tun-tap device and
> so no packet replication across multiple tap devices.
  


This only happens if the receiving tap never sends out packets.  If the 
tap interface does send out packets, the bridge will associate their MAC 
address with that interface, and future packets will only be forwarded 
there.


Is this your scenario?

--
error compiling committee.c: too many arguments to function

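
To make the MAC-learning behaviour described in the reply above concrete, here is a toy forwarding table. It is a sketch under simplifying assumptions (fixed-size table, no ageing, no VLANs) and is not the Linux bridge implementation; every toy_* name is hypothetical. It shows only the point at issue: a destination is flooded to all ports until some frame from that MAC has been seen, after which frames go to one port.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_FDB_SIZE 16
#define TOY_FLOOD    (-1)   /* destination unknown: send to every other port */

struct toy_fdb_entry { uint8_t mac[6]; int port; int used; };
struct toy_bridge { struct toy_fdb_entry fdb[TOY_FDB_SIZE]; };

static struct toy_fdb_entry *toy_fdb_find(struct toy_bridge *br,
					  const uint8_t mac[6])
{
	for (int i = 0; i < TOY_FDB_SIZE; i++)
		if (br->fdb[i].used && memcmp(br->fdb[i].mac, mac, 6) == 0)
			return &br->fdb[i];
	return NULL;
}

/* Learn: remember which port a source MAC was last seen on. */
static void toy_bridge_learn(struct toy_bridge *br, const uint8_t src[6],
			     int port)
{
	struct toy_fdb_entry *e = toy_fdb_find(br, src);

	assert(port >= 0);
	for (int i = 0; !e && i < TOY_FDB_SIZE; i++)
		if (!br->fdb[i].used)
			e = &br->fdb[i];
	if (!e)
		return;         /* table full: this MAC keeps being flooded */
	memcpy(e->mac, src, 6);
	e->port = port;
	e->used = 1;
}

/* Forward: a known destination maps to one port, an unknown one is flooded. */
static int toy_bridge_lookup(struct toy_bridge *br, const uint8_t dst[6])
{
	struct toy_fdb_entry *e = toy_fdb_find(br, dst);

	return e ? e->port : TOY_FLOOD;
}
```

In this model a tap that never transmits is never learned, so everything addressed to it keeps being flooded, which is the scenario the question above asks about.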


Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Ulrich Drepper
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Izik Eidus wrote:
> +   if (!kvm_x86_ops->dirty_bit_support()) {
> +       spin_lock(&kvm->mmu_lock);
> +       /*  remove_write_access() flush the tlb */
> +       kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +       spin_unlock(&kvm->mmu_lock);
> +   } else {
> +       kvm_flush_remote_tlbs(kvm);

It might not correspond to the common style, but I think a callback
function ->dirty_bit_support is overkill.  This is a function pointer
the compiler cannot see through.  Hence it's an indirect function call.
But the implementation is always a simple yes/no (it seems).  Indirect
calls are rather expensive (most of the time they cannot be predicted
right).

Why not instead have a read-only data constant and have an inline
function test that value?  It means no function call and only one data
access.


Also, you're inconsistent in the use of integers and true/false in the
implementations of this function.  Either use 0/1 or false/true.

- --
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkowv08ACgkQ2ijCOnn/RHR71ACdH3xr3XPnCLgsMMwdTawfehEN
vs4An2DlErhU6SeanSYVIyP3eLB4sjsz
=UZ32
-----END PGP SIGNATURE-----


[PATCH] Replace pending exception by PF if it happens serially.

2009-06-11 Thread Gleb Natapov
Replace the previous exception with a new one in the hope that instruction
re-execution will regenerate the lost exception.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 272e2e8..3150d06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -181,16 +181,21 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, 
unsigned long addr,
++vcpu->stat.pf_guest;
 
if (vcpu->arch.exception.pending) {
-   if (vcpu->arch.exception.nr == PF_VECTOR) {
-   printk(KERN_DEBUG "kvm: inject_page_fault:"
-" double fault 0x%lx\n", addr);
-   vcpu->arch.exception.nr = DF_VECTOR;
-   vcpu->arch.exception.error_code = 0;
-   } else if (vcpu->arch.exception.nr == DF_VECTOR) {
+   switch(vcpu->arch.exception.nr) {
+   case DF_VECTOR:
/* triple fault -> shutdown */
set_bit(KVM_REQ_TRIPLE_FAULT, &vcpu->requests);
+   case PF_VECTOR:
+   vcpu->arch.exception.nr = DF_VECTOR;
+   vcpu->arch.exception.error_code = 0;
+   return;
+   default:
+   /* replace previous exception with a new one in a hope
+  that instruction re-execution will regenerate lost
+  exception */
+   vcpu->arch.exception.pending = false;
+   break;
}
-   return;
}
vcpu->arch.cr2 = addr;
kvm_queue_exception_e(vcpu, PF_VECTOR, error_code);
--
Gleb.
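
The decision logic of the patched function can be modelled outside the kernel. The following is only a sketch in Python, reading the posted diff literally (including the fall-through from the DF_VECTOR case into the PF_VECTOR case); the function name and return convention are assumptions made for illustration, and only the vector numbers (8 for #DF, 14 for #PF) are architectural.

```python
PF_VECTOR = 14  # x86 page fault (#PF)
DF_VECTOR = 8   # x86 double fault (#DF)


def inject_page_fault(pending_nr):
    """Model which exception is left pending when a page fault is injected.

    pending_nr is the vector of an already pending exception, or None.
    Returns (vector_left_pending, triple_fault_requested).
    """
    if pending_nr is None:
        # Nothing pending: the page fault is simply queued.
        return PF_VECTOR, False
    if pending_nr == DF_VECTOR:
        # Page fault on top of a double fault: request a triple fault.
        # The posted switch then falls through and leaves DF pending.
        return DF_VECTOR, True
    if pending_nr == PF_VECTOR:
        # Page fault on top of a page fault becomes a double fault.
        return DF_VECTOR, False
    # Any other pending exception is dropped and replaced by the page fault.
    return PF_VECTOR, False
```

For example, `inject_page_fault(13)` (a pending general protection fault) yields a pending page fault, which is the new behaviour the patch introduces.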


Re: KVM won't compile on 2.6.29

2009-06-11 Thread Avi Kivity

Bike  Snow wrote:

Hello

I've compiled and installed KVM on kernel 2.6.28-11. It worked perfectly.
I'm using Ubuntu 9.04.

I'm now trying to compile on kernel 2.6.29-4.

It fails on compiling the kernel module with this error message:

/usr/src/kvm-kmod-devel-86/x86/iommu.c: In function ‘kvm_iommu_map_pages’:
/usr/src/kvm-kmod-devel-86/x86/iommu.c:90: error: ‘IOMMU_CACHE’
undeclared (first use in this function)
/usr/src/kvm-kmod-devel-86/x86/iommu.c:90: error: (Each undeclared
identifier is reported only once
/usr/src/kvm-kmod-devel-86/x86/iommu.c:90: error: for each function it
appears in.)
/usr/src/kvm-kmod-devel-86/x86/iommu.c: In function ‘kvm_assign_device’:
/usr/src/kvm-kmod-devel-86/x86/iommu.c:155: error: implicit
declaration of function ‘iommu_domain_has_cap’
/usr/src/kvm-kmod-devel-86/x86/iommu.c:156: error:
‘IOMMU_CAP_CACHE_COHERENCY’ undeclared (first use in this function)
make[3]: *** [/usr/src/kvm-kmod-devel-86/x86/iommu.o] Error 1
make[2]: *** [/usr/src/kvm-kmod-devel-86/x86] Error 2
make[1]: *** [_module_/usr/src/kvm-kmod-devel-86] Error 2
make[1]: Leaving directory `/usr/src/linux-headers-2.6.29-02062904-generic'
make: *** [all] Error 2


This happens if I compile with the kvm-86.tar.gz package or the
smaller module only kvm-kmod-devel-86.tar.gz package.

I have kernel headers installed. I've also installed the source for
2.6.29-4 (not necessary but tried it anyway).

Any ideas?

  


This is already fixed in git.

--
error compiling committee.c: too many arguments to function



Re: [Qemu-devel] [PATCH 05/11] qemu: MSI-X support functions

2009-06-11 Thread Michael S. Tsirkin
On Wed, Jun 10, 2009 at 08:04:13PM +0100, Paul Brook wrote:
> > > If we can't start a new qemu with the same hardware configuration then we
> > > should not be allowing migration or loading of snapshots.
> >
> > OK, so I'll add an option in virtio-net to disable msi-x, and such
> > an option will be added in any device with msi-x support.
> > Will that address your concern?
>
> Yes, as long as migration fails when you try to migrate to the wrong kind of
> device.
>
> Paul

I think the right way to do this is to make sure that standard
read-only registers in PCI config space are not modified in migration
(device-specific registers could have changed as a result of guest
actions, so we can't make assumptions).
-- 
MST


RE: Network throughput limits for local VM <-> VM communication

2009-06-11 Thread Fischer, Anna
> Subject: Re: Network throughput limits for local VM <-> VM
> communication
>
> Fischer, Anna wrote:
> > I am using two bridges, and yes, in theory, the router should be the
> > only connection between the two guests. However, without VLANs, the tun
> > interface will pass packets to all tap interfaces. It has to, as it
> > doesn't know to which one the packet has to go to. It does not look at
> > packets, it simply copies buffers from userspace to the tap interface
> > in the kernel. The tap interface then eventually drops the packet, if
> > the MAC address does not match its own. So packets will not actually go
> > across both bridges, because the tap interface that should not receive
> > the packet does drop it. However, it does receive the packet and
> > processes it to some extend which causes some overhead. As I was told
> > by someone at KVM/RedHat, this does not happen when using VLANs as then
> > there will be a direct mapping between any tun-tap device and so no
> > packet replication across multiple tap devices.
>
> This only happens if the receiving tap never sends out packets.  If the
> tap interface does send out packets, the bridge will associate their MAC
> address with that interface, and future packets will only be forwarded
> there.
>
> Is this your scenario?

Not sure I understand. As far as I can see the packets are replicated on the 
tun/tap interface before they actually enter the bridge. So this is not about 
the bridge learning MAC addresses and flooding frames to unknown destinations. 
So I think this is different.

Anna



Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-11 Thread Michael Goldish

- Yolkfull Chow yz...@redhat.com wrote:

 Michael, these are the backtrace messages:
 
 ...
 20090611-064959 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
 20090611-064959 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 DEBUG: run_once: Postprocessing on error...
 20090611-065000 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
 20090611-065000 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 DEBUG: postprocess_vm: VM object found in environment
 20090611-065000 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 DEBUG: send_monitor_cmd: Sending monitor command: screendump 
 /kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024/debug/post_vm1.ppm
 20090611-065000 
 no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:
 
 DEBUG: run_once: Contents of environment: {'vm__vm1': kvm_vm.VM 
 instance at 0x92999a28}
 post-test sysinfo error:
 Traceback (most recent call last):
File /kvm-autotest/client/common_lib/log.py, line 58, in
 decorated_func
  fn(*args, **dargs)
File /kvm-autotest/client/bin/base_sysinfo.py, line 213, in 
 log_after_each_test
  log.run(test_sysinfodir)
File /kvm-autotest/client/bin/base_sysinfo.py, line 112, in run
  shell=True, env=env)
File /usr/lib64/python2.4/subprocess.py, line 412, in call
  return Popen(*args, **kwargs).wait()
File /usr/lib64/python2.4/subprocess.py, line 542, in __init__
  errread, errwrite)
File /usr/lib64/python2.4/subprocess.py, line 902, in
 _execute_child
  self.pid = os.fork()
 OSError: [Errno 12] Cannot allocate memory
 2009-06-11 06:50:02,859 Configuring logger for client level
  FAIL
 kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

 kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

 timestamp=1244717402localtime=Jun 11 06:50:02Unhandled
 OSError: 
 [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File /kvm-autotest/client/common_lib/test.py, line 304,
 
 in _exec
self.execute(*p_args, **p_dargs)
  File /kvm-autotest/client/common_lib/test.py, line 187,
 
 in execute
self.run_once(*args, **dargs)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py, line 145,
 
 in run_once
routine_obj.routine(self, params, env)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py, line 3071, in
 
 run_boot_vms
curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login,
 
 240, 0, 2)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py, line 797, in
 
 wait_for
output = func()
  File
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py, 
 line 728, in ssh_login
session = kvm_utils.ssh(address, port, username, 
 password, prompt, timeout)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py, line 553, in
 ssh
return remote_login(command, password, prompt, \n,
 timeout)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py, line 431, in
 
 remote_login
sub = kvm_spawn(command, linesep)
  File 
 /kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py, line 114, in
 
 __init__
(pid, fd) = pty.fork()
  File /usr/lib64/python2.4/pty.py, line 108, in fork
pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
 Persistent state variable __group_level now set to 1
  END FAIL
 kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

 kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

 timestamp=1244717403localtime=Jun 11 06:50:03
 Dropping caches
 2009-06-11 06:50:03,409 running: sync
 JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
 Traceback (most recent call last):
File /kvm-autotest/client/bin/job.py, line 978, in step_engine
  execfile(self.control, global_control_vars, global_control_vars)
File /kvm-autotest/client/control, line 1030, in ?
  cfg_to_test(kvm_tests.cfg)
File /kvm-autotest/client/control, line 1013, in cfg_to_test
  current_status = job.run_test(kvm_runtest_2, params=dict, 
 tag=tagname)
File /kvm

Re: [PATCH 0/4] qemu-kvm cleanup

2009-06-11 Thread Avi Kivity

Glauber Costa wrote:

This series do some more cleanups in qemu-kvm.c
I decided it is better to clean it up in place a little bit
before merging it to kvm-all.c

it is dependant on my previous patch:
move libkvm-all.c code to qemu-kvm.c
  


I don't see that patch.  Where is it?

--
error compiling committee.c: too many arguments to function



Re: [PATCH 2/4] cleanup mmio coalescing functions

2009-06-11 Thread Avi Kivity

Glauber Costa wrote:

remove wrappers that existed only due to qemu/libkvm separation.
Use qemu types for function definitions.

 
-int kvm_register_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t size)

+int kvm_coalesce_mmio_region(target_phys_addr_t addr, ram_addr_t size)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
+kvm_context_t kvm = kvm_context;
  


While all this code is doomed, please maintain consistent indentation 
while it lives.



struct kvm_coalesced_mmio_zone zone;
int r;
 
@@ -1121,9 +1122,10 @@ int kvm_register_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t size)

return -ENOSYS;
 }
 
-int kvm_unregister_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t size)

+int kvm_uncoalesce_mmio_region(target_phys_addr_t addr, ram_addr_t size)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
+kvm_context_t kvm = kvm_context;
  


Here too.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] qemu-kvm cleanup

2009-06-11 Thread Avi Kivity

Avi Kivity wrote:

Glauber Costa wrote:

This series do some more cleanups in qemu-kvm.c
I decided it is better to clean it up in place a little bit
before merging it to kvm-all.c

it is dependant on my previous patch:
move libkvm-all.c code to qemu-kvm.c
  


I don't see that patch.  Where is it?



Apart from my little comment and the missing prerequisite, this series 
looks good.


--
error compiling committee.c: too many arguments to function



Re: [KVM-AUTOTEST PATCH] A test patch - Boot VMs until one of them becomes unresponsive

2009-06-11 Thread Yolkfull Chow

On 06/11/2009 04:53 PM, Michael Goldish wrote:

- Yolkfull Chowyz...@redhat.com  wrote:

   

Michael, these are the backtrace messages:

...
20090611-064959
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

ERROR: run_once: Test failed: [Errno 12] Cannot allocate memory
20090611-064959
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

DEBUG: run_once: Postprocessing on error...
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

DEBUG: postprocess_vm: Postprocessing VM 'vm1'...
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

DEBUG: postprocess_vm: VM object found in environment
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

DEBUG: send_monitor_cmd: Sending monitor command: screendump
/kvm-autotest/client/results/default/kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024/debug/post_vm1.ppm
20090611-065000
no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024:

DEBUG: run_once: Contents of environment: {'vm__vm1':kvm_vm.VM
instance at 0x92999a28}
post-test sysinfo error:
Traceback (most recent call last):
  File "/kvm-autotest/client/common_lib/log.py", line 58, in decorated_func
    fn(*args, **dargs)
  File "/kvm-autotest/client/bin/base_sysinfo.py", line 213, in log_after_each_test
    log.run(test_sysinfodir)
  File "/kvm-autotest/client/bin/base_sysinfo.py", line 112, in run
    shell=True, env=env)
  File "/usr/lib64/python2.4/subprocess.py", line 412, in call
    return Popen(*args, **kwargs).wait()
  File "/usr/lib64/python2.4/subprocess.py", line 542, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.4/subprocess.py", line 902, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
2009-06-11 06:50:02,859 Configuring logger for client level
  FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

timestamp=1244717402  localtime=Jun 11 06:50:02  Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
  File "/kvm-autotest/client/common_lib/test.py", line 304, in _exec
    self.execute(*p_args, **p_dargs)
  File "/kvm-autotest/client/common_lib/test.py", line 187, in execute
    self.run_once(*args, **dargs)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_runtest_2.py", line 145, in run_once
    routine_obj.routine(self, params, env)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_tests.py", line 3071, in run_boot_vms
    curr_vm_session = kvm_utils.wait_for(curr_vm.ssh_login, 240, 0, 2)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 797, in wait_for
    output = func()
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_vm.py", line 728, in ssh_login
    session = kvm_utils.ssh(address, port, username, password, prompt, timeout)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 553, in ssh
    return remote_login(command, password, prompt, "\n", timeout)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 431, in remote_login
    sub = kvm_spawn(command, linesep)
  File "/kvm-autotest/client/tests/kvm_runtest_2/kvm_utils.py", line 114, in __init__
    (pid, fd) = pty.fork()
  File "/usr/lib64/python2.4/pty.py", line 108, in fork
    pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
Persistent state variable __group_level now set to 1
  END FAIL
kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

kvm_runtest_2.[RHEL-Server-5.3-64][None][1024][1][qcow2]no_boundary.local_stg.RHEL.5.3-server-64.no_ksm.boot_vms.e1000.user.size_1024

timestamp=1244717403localtime=Jun 11 06:50:03
Dropping caches
2009-06-11 06:50:03,409 running: sync
JOB ERROR: Unhandled OSError: [Errno 12] Cannot allocate memory
Traceback (most recent call last):
File /kvm-autotest/client/bin/job.py, line 978, in step_engine
  execfile(self.control, global_control_vars, global_control_vars)
File /kvm-autotest/client/control, line 1030, in ?
  cfg_to_test(kvm_tests.cfg)
File /kvm-autotest/client/control, line 1013, in cfg_to_test
  current_status = job.run_test(kvm_runtest_2, params=dict,
tag=tagname)
File /kvm-autotest/client/bin/job.py, line 44, in wrapped
  utils.drop_caches

Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Izik Eidus

Avi Kivity wrote:

Izik Eidus wrote:
change the dirty page tracking to work with dirty bity instead of 
page fault.
right now the dirty page tracking work with the help of page faults, 
when we
want to track a page for being dirty, we write protect it and we mark 
it dirty
when we have write page fault, this code move into looking at the 
dirty bit

of the spte.

  


I'm concerned about performance during the later stages of live 
migration.  Even if only 1000 pages are dirty, you still have to look 
at 2,000,000 or more ptes (for an 8GB guest).  That's a lot of overhead.


I think we need to use the page table hierarchy, write protect the 
upper page table so we know which page tables we need to look at.





Great idea, so i add another bitmap for the page directory?
 
+static int vmx_dirty_bit_support(void)

+{
+return false;
+}
  


It's false only when ept is enabled.



Yea, that i found out already



Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Avi Kivity

Izik Eidus wrote:

Avi Kivity wrote:

Izik Eidus wrote:
change the dirty page tracking to work with dirty bity instead of 
page fault.
right now the dirty page tracking work with the help of page faults, 
when we
want to track a page for being dirty, we write protect it and we 
mark it dirty
when we have write page fault, this code move into looking at the 
dirty bit

of the spte.

  


I'm concerned about performance during the later stages of live 
migration.  Even if only 1000 pages are dirty, you still have to look 
at 2,000,000 or more ptes (for an 8GB guest).  That's a lot of overhead.


I think we need to use the page table hierarchy, write protect the 
upper page table so we know which page tables we need to look at.





Great idea, so i add another bitmap for the page directory?


No, why?

You need to drop write access to the shadow root ptes.  When you get a 
fault, restore write access to the root ptes, but drop access from the 
L3 ptes, and so on until you reach the L1 ptes.  There you clear the 
dirty bits, and add the page to a list of pages that need to be checked 
for dirty bits.  This way you only check ptes that have a chance to be 
dirty.


I'm not sure that will be faster, but there's a good chance.
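
To make the trade-off concrete, here is a toy model of the idea, assuming only two levels (one upper entry per leaf table of 512 ptes) instead of the full four-level hierarchy. All names are hypothetical and nothing here corresponds to actual KVM code; the sketch only shows that a harvest scans the leaf tables that took a write fault rather than all 2,097,152 ptes of an 8GB guest with 4KB pages.

```python
PAGE_SIZE = 4096
GUEST_MEMORY = 8 * 1024 ** 3
TOTAL_PTES = GUEST_MEMORY // PAGE_SIZE  # 2,097,152 ptes for an 8GB guest


class ToyDirtyTracker:
    """Two-level toy model: one write-protectable upper entry per leaf table."""

    def __init__(self, total_ptes, ptes_per_table=512):
        self.ptes_per_table = ptes_per_table
        tables = total_ptes // ptes_per_table
        # Start of a tracking round: every upper entry is write-protected.
        self.upper_writable = [False] * tables
        self.dirty = [[False] * ptes_per_table for _ in range(tables)]
        self.tables_to_scan = []  # leaf tables that may contain dirty ptes
        self.faults = 0

    def guest_write(self, page):
        table, index = divmod(page, self.ptes_per_table)
        if not self.upper_writable[table]:
            # Write fault: restore write access, clear the leaf dirty bits
            # and remember that this table has to be checked later.
            self.faults += 1
            self.upper_writable[table] = True
            self.dirty[table] = [False] * self.ptes_per_table
            self.tables_to_scan.append(table)
        # From now on the dirty bit is set without any further fault.
        self.dirty[table][index] = True

    def harvest(self):
        """Return (dirty pages, number of ptes scanned) and re-arm tracking."""
        dirty_pages = []
        scanned = 0
        for table in self.tables_to_scan:
            for index, is_dirty in enumerate(self.dirty[table]):
                scanned += 1
                if is_dirty:
                    dirty_pages.append(table * self.ptes_per_table + index)
            self.upper_writable[table] = False  # write-protect again
        self.tables_to_scan = []
        return dirty_pages, scanned
```

With writes to pages 0, 1, 5 and 100000, this model takes two faults and scans 1024 ptes. Whether the extra faults cost less than a full scan in practice is exactly the open question raised above.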



--
error compiling committee.c: too many arguments to function



pcidevice: failed to assign irq / hang on Intel nic boot message

2009-06-11 Thread Heiko Harders

Hello,

I'm trying to set up a virtual machine with my onboard nic passed 
through. Unfortunately I get the message:


Failed to assign irq for 2:00.0: Input/output error
Perhaps you are assigning a device that shares an IRQ with another device?

I'm using a clean Ubuntu 9.04 installation (64 bits version) which comes 
with kernel `2.6.28-11-generic'. The kvm version is 84 and I'm using an 
AMD cpu.


The relevant section from `dmesg' shows that IRQ 18 is used, but also that 
IRQ 2300 is assigned for MSI/MSI-X. IRQ 18 is also used by the onboard USB 
devices and the graphics card. I need these, so unfortunately I can't shut 
them down.


# relevant snippet from dmesg after booting the machine:
[4.123477] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[4.123498] r8169 :02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[4.123514] r8169 :02:00.0: setting latency timer to 64
[4.123593] r8169 :02:00.0: irq 2300 for MSI/MSI-X

The procedure I followed was:

# unbind the onboard nic
cd /sys/bus/pci/devices/:02:00.0/driver
echo -n :02:00.0 > unbind

# try to run kvm
kvm -m 512 -hda /dev/vg/vm1 -cdrom ubuntu-9.04-server-amd64.iso 
-pcidevice host=2:00.0 -boot d

Failed to assign irq for 2:00.0: Input/output error
Perhaps you are assigning a device that shares an IRQ with another device?

# The following appeared in dmesg:
[ 1220.744178] r8169 :02:00.0: PCI INT A disabled
[ 1242.066374] pci :02:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
[ 1242.02] pci :02:00.0: Invalid ROM contents
[ 1242.761820] kvm: 5930: cpu0 unhandled wrmsr: 0xc0010117 data 0
[ 1243.389097] pci :02:00.0: PCI INT A disabled

Does this mean that for my setup it isn't possible to use the onboard 
nic as a pcidevice for kvm? Anything I missed, or suggestions to try?


Another thing I tried was using another nic (an Intel PRO/100) as 
pcidevice. This seemed to work; I didn't get any complaints (that nic 
doesn't share any IRQs with other devices). The QEMU window appears and 
the virtual machine boots. But the Intel nic shows a message during booting:


Initializing Intel PRO/100 Boot Agent Version 2.0
Press Ctrl+S to enter the Setup Program..

While the progress dots (after the word `Program' in the last line) 
appeared, QEMU didn't advance beyond this point. Pressing Ctrl+S brought 
me into the setup program, but disabling the boot message didn't make any 
difference (and while the message was disabled it was still possible to 
enter the setup program using Ctrl+S). Is this a known problem? Is there 
any workaround for this?


Thanks for any help/insights,
Heiko



[GIT PULL] Merge latest qemu.git

2009-06-11 Thread Mark McLoughlin
Hi Avi,
The conflicts with the networking changes just pushed to qemu.git are
fairly involved to resolve, so I thought I'd try and save you the pain.

Below is a pull request which merges in the latest, and does it in the
way we recently discussed - each conflict is resolved in a separate
merge commit.

I've build tested the x86_64-softmmu target and done some light
networking testing.

HTH,
Mark.

The following changes since commit b6810dec0ea5c9e90e90404424458918972853d8:
  Avi Kivity (1):
Regenerate bios for MADT/RSDT fixes

are available in the git repository at:

  git://git.et.redhat.com/qemu-net.git for-avi

Alex Williamson (7):
  virtio-net: Add version_id 7 placeholder for vnet header support
  virtio-net: Use a byte to store RX mode flags
  virtio-net: reorganize receive_filter()
  virtio-net: Fix MAC filter overflow handling
  virtio-net: MAC filter optimization
  virtio-net: Add new RX filter controls
  virtio-net: Increase filter and control limits

Anthony Liguori (2):
  Merge branch 'net-queue'
  Fix build breakage when using VDE introduced by 4f1c942

Blue Swirl (11):
  Use hxtool to generate monitor documentation and C structures
  Fix generation of CONFIG_KVM
  Register reset functions for e1000 and rtl8139
  Update irqs on reset and device load
  Remove unused and misnamed field and variable
  Fix warning
  microblaze-dis.c does not need to be executable
  Fix Sparse warning
  Clean up generated qemu-img-cmds.h
  Fix Sparse warning
  Use snprintf to avoid OpenBSD warning

Edgar E. Iglesias (2):
  microblaze: Fix loading of petalogix s3adsp1800 dtb.
  CRIS: Remove duplicated flag defines.

Gerd Hoffmann (6):
  qdev: kill DeviceState->name
  qdev: add monitor command to dump the tree.
  xen: net backend doesn't need linux headers.
  xen nic: use qemu_malloc
  xen nic: use XC_PAGE_SIZE instead of PAGE_SIZE.
  qdev: c99 initilaizers for bus_type_names

Jan Kiszka (7):
  kvm: Improve upgrade notes when facing unsupported kernels
  net: Don't deliver to disabled interfaces in qemu_sendv_packet
  net: Fix and improved ordered packet delivery
  slirp: Avoid zombie processes after fork_exec
  net: Real fix for check_params users
  net: Improve parameter error reporting
  slirp: Reorder initialization

Kevin Wolf (2):
  qemu-img: Print available options with -o ?
  Document changes in qemu-img interface

Luiz Capitulino (5):
  monitor: Remove uneeded goto
  monitor: Remove uneeded 'return' statement
  monitor: Remove unused variable
  monitor: Introduce get_command_name()
  Fix defined but not used warning

Mark McLoughlin (22):
  Revert "Fix output of uninitialized strings"
  net: fix error reporting for some net parameter checks
  net: factor tap_read_packet() out of tap_send()
  net: move the tap buffer into TAPState
  net: vlan clients with no fd_can_read() can always receive
  net: only read from tapfd when we can send
  net: add fd_readv() handler to qemu_new_vlan_client() args
  net: re-name vc->fd_read() to vc->receive()
  net: pass VLANClientState* as first arg to receive handlers
  net: add return value to packet receive handler
  net: return status from qemu_deliver_packet()
  net: split out packet queueing and flushing into separate functions
  net: add qemu_send_packet_async()
  net: make use of async packet sending API in tap client
  virtio-net: implement rx packet queueing
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu
  Merge branch 'master' of git://git.sv.gnu.org/qemu

Nathan Froyd (1):
  fix gdbstub support for multiple threads in usermode, v3

Paul Brook (9):
  Use relative path for bios
  Implement multiple samplers on stellaris ADC
  Stellaris qdev conversion
  Add --enable-debug
  Remove ARM NVIC initialization hack
  Fix elf loader range checking
  Record device property types
  Fix typo
  Use correct type for SPARC cpu_cc_op

Stefan Weil (2):
  Fix spelling in comment.
  doc: Update information on supported network adapters.

Stuart Brady (1):
  Use hxtool for qemu-img command list

 .gitignore   |2 +
 Makefile |   22 +-
 Makefile.target  |8 +-
 block/cow.c  |   12 +-
 block/qcow.c |   18 +-
 block/qcow2.c|   30 ++-
 block/raw-posix.c|6 +-
 block/raw-win32.c|6 +-
 block/vmdk.c |   18 +-
 

[KVM-AUTOTEST][PATCH] Enable running test(s) multiple times (iterations)

2009-06-11 Thread Uri Lublin

The following patch did not make it in the merge.
I've been waiting for the merge to stabilize first.



[KVM-AUTOTEST][PATCH] Enable running test(s) multiple times (iterations)

2009-06-11 Thread Uri Lublin
From: Supriya Kannery supri...@in.ibm.com

Default is to run each test once.

Just add iterations = N in kvm_tests.cfg to the test(s) you
want to run multiple times.

Signed-off-by: Supriya Kannery supri...@in.ibm.com
Cc : Michael Goldish mgold...@redhat.com
Signed-off-by: Uri Lublin u...@redhat.com
---
 client/tests/kvm/control |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index b3543ee..c030a14 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -145,8 +145,10 @@ for dict in list:
 dependencies_satisfied = False
 break
 if dependencies_satisfied:
+test_iterations=int(dict.get("iterations", 1))
 current_status = job.run_test("kvm", params=dict,
-  tag=dict.get("shortname"))
+  tag=dict.get("shortname"),
+  iterations=test_iterations)
 else:
 current_status = False
 status_dict[dict.get("name")] = current_status
-- 
1.6.2.2
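
The effect of the change can be sketched in isolation. The snippet below is a hedged illustration, not the control file itself: `iterations_for` and `run_tests` are hypothetical helpers, and `run_test` stands in for `job.run_test`. It assumes config values arrive as strings, which is why the value goes through `int()`.

```python
def iterations_for(test_params):
    """Number of times to run a test; a missing key means run once."""
    return int(test_params.get("iterations", 1))


def run_tests(test_dicts, run_test):
    """Run every test dict through run_test (a stand-in for job.run_test)."""
    status = {}
    for params in test_dicts:
        status[params["name"]] = run_test(
            "kvm",
            params=params,
            tag=params.get("shortname"),
            iterations=iterations_for(params),
        )
    return status
```

A test dict without an `iterations` key keeps the old behaviour of a single run, so existing configurations are unaffected.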



Re: [GIT PULL] Merge latest qemu.git

2009-06-11 Thread Avi Kivity

Mark McLoughlin wrote:

Hi Avi,
The conflicts with the networking changes just pushed to qemu.git are
fairly involved to resolve, so I thought I'd try and save you the pain.

Below is a pull request which merges in the latest, and does it in the
way we recently discussed - each conflict is resolved in a separate
merge commit.

I've build tested the x86_64-softmmu target and done some light
networking testing.

  


Pulled, thanks.


HTH,
  


Very much.

--
error compiling committee.c: too many arguments to function



qemu-kvm broken after ./configure --disable-kvm

2009-06-11 Thread Beth Kon
Building latest git with ./configure --disable-kvm breaks with errors in 
pcspk.c



[KVM-AUTOTEST] New test module: iperf

2009-06-11 Thread Alexey Eromenko

Hello KVM-Autotest users & developers,

I want to present a new KVM-Autotest module here: kvm_iperf.

Basically it tests networking functionality, stability and performance of guest 
OSes.
This test is cross-platform -- i.e. it works on both Linux and Windows VMs.

It has been under development for some time, and now I feel it is mature 
enough for a release.
The test depends on Python and the KVM-Autotest framework.

Basically the module consists of kvm_iperf.py, and small modifications to
kvm_runtest_2.py and kvm_tests.py.

You will also need to create a new misc subdirectory inside your 
kvm_runtest_2/.
(# mkdir kvm_runtest_2/misc)

And put two files there:
1. iperf -- this must be a Linux i586 binary
2. iperf.exe -- this must be a Win32 binary

Optionally, a third file:
3. iperf64 -- this must be a Linux x64 binary

On Linux we could compile iperf on the fly, but I decided not to (for now),
because some Linux guests do not have a compiler.
Also, using an i586 binary ensures that it can be used on both i586 and x64 guests.
But you can use both binaries if you modify kvm_tests.py accordingly.

Optional parameters:
iperf_duration -- sets the test duration, allowing long runs (default = 5 sec)
iperf_parallel_threads -- number of network threads to test in parallel
(default = 1 thread)

In theory we could support other UNIXes in the future as well (BSD and Solaris).
For this you would only need to add the respective binaries to the misc/ folder, and
modify kvm_tests.py for the respective OSes.

iperf:
http://sourceforge.net/projects/iperf - source for latest.
iperf for Windows:
http://dast.nlanr.net/Projects/Iperf/iperf-1.7.0-win32.exe - stable binary.
http://noc.pregi.net/iperf.html

To commit this module we may need to commit the iPerf binaries, which I hate to do,
but I have no better idea for now.

Please review it & commit it.

-Alexey Eromenko

kvm_iperf.py
Description: Binary data


kvm_runtest_2.py.patch
Description: Binary data


kvm_tests.cfg.sample.2009-04-26
Description: Binary data


kvm_tests.cfg.sample.2009-04-26.iperf.patch
Description: Binary data


kvm_tests.cfg.sample.2009-04-26.iperf.patched
Description: Binary data


Re: [libvirt] Re: [CentOS-devel] Latest kvm packages for CentOS 5.3

2009-06-11 Thread Daniel P. Berrange
On Wed, Jun 10, 2009 at 04:50:25PM +0200, Dag Wieers wrote:
> On Wed, 10 Jun 2009, Federico Simoncelli wrote:
>
> > I've been working quite extensively with kvm on CentOS 5.3 lately.
> > If you are interested in the latest rpm of kvm-kmod-2.6.30-rc8,
> > qemu-kvm-0.10.5 and libvirt-0.6.4 you can temporary find them here:
> >
> > http://update.nethesis.it/kvm/
> >
> > I've had no problem so far using these packages.
> > Feedback is welcome.
>
> RHEL5.4 is expected to have KVM support, so it would be nice to know in
> advance which version is being included with RHEL 5.4. Then we can update
> our own CentOS kvm kmod for testing and reporting upstream the issue(s) we
> still find.

That version info will become available when RHEL-5.4 beta ships in the 
not too distant future...

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


Re: [KVM PATCH v10] kvm: add support for irqfd

2009-06-11 Thread Michael S. Tsirkin
Going over this code again, I seem to see a minor error handling issue here:

On Wed, May 20, 2009 at 10:30:49AM -0400, Gregory Haskins wrote:
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 new file mode 100644
 index 000..72a282e
 --- /dev/null
 +++ b/virt/kvm/eventfd.c
 @@ -0,0 +1,228 @@
 +/*
 + * kvm eventfd support - use eventfd objects to signal various KVM events
 + *
 + * Copyright 2009 Novell.  All Rights Reserved.
 + *
 + * Author:
 + *   Gregory Haskins ghask...@novell.com
 + *
 + * This file is free software; you can redistribute it and/or modify
 + * it under the terms of version 2 of the GNU General Public License
 + * as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.   See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software Foundation,
 + * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
 + */
 +
 +#include <linux/kvm_host.h>
 +#include <linux/workqueue.h>
 +#include <linux/syscalls.h>
 +#include <linux/wait.h>
 +#include <linux/poll.h>
 +#include <linux/file.h>
 +#include <linux/list.h>
 +
 +/*
 + *
 + * irqfd: Allows an fd to be used to inject an interrupt to the guest
 + *
 + * Credit goes to Avi Kivity for the original idea.
 + *
 + */
 +struct _irqfd {
 +	struct kvm               *kvm;
 +	int                       gsi;
 +	struct file              *file;
 +	struct list_head          list;
 +	poll_table                pt;
 +	wait_queue_head_t        *wqh;
 +	wait_queue_t              wait;
 +	struct work_struct        work;
 +};
 +
 +static void
 +irqfd_inject(struct work_struct *work)
 +{
 +	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
 +	struct kvm *kvm = irqfd->kvm;
 +
 +	mutex_lock(&kvm->lock);
 +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
 +	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
 +	mutex_unlock(&kvm->lock);
 +}
 +
 +static int
 +irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 +{
 +	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
 +
 +	/*
 +	 * The wake_up is called with interrupts disabled.  Therefore we need
 +	 * to defer the IRQ injection until later since we need to acquire the
 +	 * kvm->lock to do so.
 +	 */
 +	schedule_work(&irqfd->work);
 +
 +	return 0;
 +}
 +
 +static void
 +irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
 +			poll_table *pt)
 +{
 +	struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
 +
 +	irqfd->wqh = wqh;
 +	add_wait_queue(wqh, &irqfd->wait);
 +}
 +
 +static int
 +kvm_assign_irqfd(struct kvm *kvm, int fd, int gsi)
 +{
 +	struct _irqfd *irqfd;
 +	struct file *file = NULL;
 +	int ret;
 +
 +	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
 +	if (!irqfd)
 +		return -ENOMEM;
 +
 +	irqfd->kvm = kvm;
 +	irqfd->gsi = gsi;
 +	INIT_LIST_HEAD(&irqfd->list);
 +	INIT_WORK(&irqfd->work, irqfd_inject);
 +
 +	/*
 +	 * Embed the file* lifetime in the irqfd.
 +	 */
 +	file = fget(fd);
 +	if (IS_ERR(file)) {
 +		ret = PTR_ERR(file);
 +		goto fail;
 +	}
 +
 +	/*
 +	 * Install our own custom wake-up handling so we are notified via
 +	 * a callback whenever someone signals the underlying eventfd
 +	 */
 +	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
 +	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
 +
 +	ret = file->f_op->poll(file, &irqfd->pt);
 +	if (ret < 0)
 +		goto fail;
 +
 +	irqfd->file = file;
 +
 +	mutex_lock(&kvm->lock);
 +	list_add_tail(&irqfd->list, &kvm->irqfds);
 +	mutex_unlock(&kvm->lock);
 +
 +	return 0;
 +
 +fail:
 +	if (irqfd->wqh)
 +		remove_wait_queue(irqfd->wqh, &irqfd->wait);

Why are these 2 lines here? Either we might get a callback even though
poll failed - and then this test without lock is probably racy -
or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).

Which is it? I think the latter ...


 +
 +	if (file && !IS_ERR(file))
 +		fput(file);
 +
 +	kfree(irqfd);
 +	return ret;
 +}
 +
 +static void
 +irqfd_release(struct _irqfd *irqfd)
 +{
 +	/*
 +	 * The ordering is important.  We must remove ourselves from the wqh
 +	 * first to ensure no more event callbacks are issued, and then flush
 +	 * any previously scheduled work prior to freeing the memory
 +	 */
 +	remove_wait_queue(irqfd->wqh, &irqfd->wait);
 +
 + 

Re: [KVM PATCH v10] kvm: add support for irqfd

2009-06-11 Thread Michael S. Tsirkin
On Thu, Jun 11, 2009 at 04:16:47PM +0300, Michael S. Tsirkin wrote:
  +
  +	ret = file->f_op->poll(file, &irqfd->pt);
  +	if (ret < 0)
  +		goto fail;

Looking at it some more, we have:
struct file_operations {

unsigned int (*poll) (struct file *, struct poll_table_struct *);

So the comparison above does not seem to make sense:
it seems that the return value from poll cannot be negative.

Will the callback be executed if someone did a write to eventfd
before we attached it? If not, maybe we should call it here
if ret != 0.
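
To illustrate the point about the return type: a poll operation reports
pending events as an unsigned bit mask, so the useful check is on individual
bits rather than on the sign. The following is only a user-space sketch under
that assumption; fake_poll(), attach() and EV_READABLE are made-up names and
are not part of the patch or of the kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define EV_READABLE 0x0001u	/* hypothetical event bit, in the spirit of POLLIN */

/* Hypothetical poll op: reports pending events as a bit mask, never an errno. */
static unsigned int fake_poll(int already_signalled)
{
	return already_signalled ? EV_READABLE : 0u;
}

/*
 * Attach-time check: treat the result as a mask rather than an error code.
 * Returns 1 if an event was already pending and was handled immediately,
 * 0 otherwise.
 */
static int attach(int already_signalled, int *injections)
{
	unsigned int events = fake_poll(already_signalled);

	assert(injections != NULL);
	if (events & EV_READABLE) {
		(*injections)++;	/* stand-in for scheduling the injection */
		return 1;
	}
	return 0;
}
```

With this shape, an event that was signalled before the attach is not lost,
and there is no comparison of an unsigned mask against zero.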


  +
  +	irqfd->file = file;
  +
  +	mutex_lock(&kvm->lock);
  +	list_add_tail(&irqfd->list, &kvm->irqfds);
  +	mutex_unlock(&kvm->lock);
  +
  +	return 0;
  +
  +fail:
  +	if (irqfd->wqh)
  +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
 
 Why are these 2 lines here? Either we might get a callback even though
 poll failed - and then this test without lock is probably racy -
 or we can't, and then we can replace the above with BUG_ON(irqfd->wqh).
 
 Which is it? I think the latter ...
 
 
  +
  +	if (file && !IS_ERR(file))
  +		fput(file);
  +
  +	kfree(irqfd);
  +	return ret;
  +}
  +
  +


Re: qemu-kvm broken after ./configure --disable-kvm

2009-06-11 Thread Jan Kiszka
Beth Kon wrote:
 Building latest git with ./configure --disable-kvm breaks with errors in
 pcspk.c

With latest git, things break much earlier in case your host does not
provide linux/kvm.h because libkvm-all.h includes it unconditionally.

I would like to push this task to Glauber as he is already shuffling
around most of the involved code: Could you have a look at --disable-kvm
too while you are at it? My basic idea would be to get rid of direct
qemu-kvm.h includes so that you always obtain the required [proto]types
by including kvm.h, independent of CONFIG_KVM and already prepared for
upstream where there is no qemu-kvm.h.

Regarding the bugs I left behind in pcspk.c, I would suggest something
like

diff --git a/hw/pcspk.c b/hw/pcspk.c
index 9e1b59a..5b624d1 100644
--- a/hw/pcspk.c
+++ b/hw/pcspk.c
@@ -51,10 +51,9 @@ static const char *s_spk = "pcspk";
 static PCSpkState pcspk_state;
 
 #ifdef USE_KVM_PIT
-static void kvm_get_pit_ch2(PITState *pit,
-struct kvm_pit_state *inkernel_state)
+static void kvm_get_pit_ch2(PITState *pit, KVMPITState *inkernel_state)
 {
-struct kvm_pit_state pit_state;
+KVMPITState pit_state;
 
     if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
         kvm_get_pit(kvm_context, &pit_state);
@@ -68,8 +67,7 @@ static void kvm_get_pit_ch2(PITState *pit,
 }
 }
 
-static void kvm_set_pit_ch2(PITState *pit,
-struct kvm_pit_state *inkernel_state)
+static void kvm_set_pit_ch2(PITState *pit, KVMPITState *inkernel_state)
 {
     if (kvm_enabled() && qemu_kvm_pit_in_kernel()) {
         inkernel_state->channels[2].mode = pit->channels[2].mode;
@@ -82,9 +80,9 @@ static void kvm_set_pit_ch2(PITState *pit,
 }
 #else
 static inline void kvm_get_pit_ch2(PITState *pit,
-   kvm_pit_state *inkernel_state) { }
+   KVMPITState *inkernel_state) { }
 static inline void kvm_set_pit_ch2(PITState *pit,
-   kvm_pit_state *inkernel_state) { }
+   KVMPITState *inkernel_state) { }
 #endif
 
 static inline void generate_samples(PCSpkState *s)
@@ -168,7 +166,7 @@ static uint32_t pcspk_ioport_read(void *opaque, uint32_t addr)
 
 static void pcspk_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
-struct kvm_pit_state inkernel_state;
+KVMPITState inkernel_state;
 PCSpkState *s = opaque;
     const int gate = val & 1;
 

where KVMPITState is defined as

#ifdef KVM_CAP_PIT
typedef struct kvm_pit_state KVMPITState;
#else
typedef struct { } KVMPITState;
#endif
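
The typedef trick above can be tried outside of QEMU. This is a minimal
sketch, assuming a hypothetical feature macro HAVE_PIT_STATE and made-up names
PitState and pit_sync(); it is not the qemu-kvm code. One difference is
deliberate: the placeholder struct carries a dummy member, because an empty
struct is only accepted as a GNU C extension.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature macro; define it to select the "real" layout. */
#ifdef HAVE_PIT_STATE
struct pit_state { int mode[3]; };
typedef struct pit_state PitState;
#else
/* Placeholder so that prototypes still compile without the feature. */
typedef struct { char unused; } PitState;
#endif

/* One prototype serves both configurations. */
static int pit_sync(PitState *state)
{
	assert(state != NULL);
#ifdef HAVE_PIT_STATE
	state->mode[2] = 3;
	return 1;	/* state was updated */
#else
	(void)state;
	return 0;	/* nothing to do without the feature */
#endif
}
```

Callers can then declare a PitState and call pit_sync() without any #ifdef of
their own.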

Thanks,
Jan



signature.asc
Description: OpenPGP digital signature


[patch 5/5] KVM: VMX: conditionally disable 2M pages

2009-06-11 Thread Marcelo Tosatti
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
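
The decision described in the commit message can be sketched in isolation as
follows. The bit position (16) comes from the text above; the names
EPT_2MB_PAGE_BIT, has_ept_2m_page() and largepages_allowed() are invented for
this example, and the capability word is passed in as a plain parameter rather
than read from the MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of the capability word: 2MB EPT pages are supported. */
#define EPT_2MB_PAGE_BIT (1ULL << 16)

static bool has_ept_2m_page(uint64_t ept_cap)
{
	return (ept_cap & EPT_2MB_PAGE_BIT) != 0;
}

/* Large pages stay enabled unless EPT is on and lacks 2MB support. */
static bool largepages_allowed(bool ept_enabled, uint64_t ept_cap)
{
	return !(ept_enabled && !has_ept_2m_page(ept_cap));
}
```

Without EPT the capability bit is irrelevant, which is why the EPT flag is
tested first.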

Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1393,6 +1393,9 @@ static __init int hardware_setup(void)
 	if (!cpu_has_vmx_tpr_shadow())
 		kvm_x86_ops->update_cr8_intercept = NULL;
 
+	if (enable_ept && !cpu_has_vmx_ept_2m_page())
+		kvm_disable_largepages();
+
 	return alloc_kvm_area();
 }
 
Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -219,6 +219,7 @@ int kvm_arch_set_memory_region(struct kv
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc);
+void kvm_disable_largepages(void);
 void kvm_arch_flush_shadow(struct kvm *kvm);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -85,6 +85,8 @@ static long kvm_vcpu_ioctl(struct file *
 
 static bool kvm_rebooting;
 
+static bool largepages_disabled = false;
+
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
 						      int assigned_dev_id)
@@ -1171,9 +1173,11 @@ int __kvm_set_memory_region(struct kvm *
 		ugfn = new.userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
-		 * other, disable large page support for this slot
+		 * other, or if explicitly asked to, disable large page
+		 * support for this slot
 		 */
-		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1))
+		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1) ||
+		    largepages_disabled)
 			for (i = 0; i < largepages; ++i)
 				new.lpage_info[i].write_count = 1;
 	}
@@ -1286,6 +1290,12 @@ out:
return r;
 }
 
+void kvm_disable_largepages(void)
+{
+   largepages_disabled = true;
+}
+EXPORT_SYMBOL_GPL(kvm_disable_largepages);
+
 int is_error_page(struct page *page)
 {
return page == bad_page;

-- 



[patch 3/5] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper

2009-06-11 Thread Marcelo Tosatti
Required by EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3013,6 +3013,24 @@ out:
return r;
 }
 
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes[4])
+{
+	struct kvm_shadow_walk_iterator iterator;
+	int nr_sptes = 0;
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+	for_each_shadow_entry(vcpu, addr, iterator) {
+		sptes[iterator.level-1] = iterator.sptep;
+		nr_sptes++;
+		if (!is_shadow_present_pte(*iterator.sptep))
+			break;
+	}
+	spin_unlock(&vcpu->kvm->mmu_lock);
+
+	return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
+
 #ifdef AUDIT
 
 static const char *audit_msg;
Index: kvm/arch/x86/kvm/mmu.h
===
--- kvm.orig/arch/x86/kvm/mmu.h
+++ kvm/arch/x86/kvm/mmu.h
@@ -37,6 +37,8 @@
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
 
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes[4]);
+
 static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 {
 	if (unlikely(vcpu->kvm->arch.n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))

-- 



Re: [patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages

2009-06-11 Thread Marcelo Tosatti
On Wed, Jun 10, 2009 at 12:21:05PM +0300, Avi Kivity wrote:
 Avi Kivity wrote:
 Marcelo Tosatti wrote:
 This way there is no need to add explicit checks in every
 for_each_shadow_entry user.

 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

 Index: kvm/arch/x86/kvm/mmu.c
 ===
 --- kvm.orig/arch/x86/kvm/mmu.c
 +++ kvm/arch/x86/kvm/mmu.c
 @@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_
  {
  	if (iterator->level < PT_PAGE_TABLE_LEVEL)
  		return false;
 +
 +	if (iterator->level == PT_PAGE_TABLE_LEVEL)
 +		if (is_large_pte(*iterator->sptep))
 +			return false;

 s/==/>=/?


 Ah, it's actually fine.  But changing == to >= will make it 1GBpage-ready.

Humpf, better check level explicitly before interpreting bit 7, so let's
skip this for 1GB pages.
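
A small sketch of what "check level explicitly" could look like, written as
stand-alone C rather than kernel code. The level numbering (1 = last level,
2 = 2MB directory level) and the names PTE_PAGE_SIZE_BIT and is_2m_mapping()
are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PAGE_SIZE_BIT (1ULL << 7)	/* bit 7; its meaning depends on the level */
#define LEVEL_4K 1			/* hypothetical numbering: 1 = last level */
#define LEVEL_2M 2

/* Bit 7 is only treated as "large page" at the 2MB directory level. */
static bool is_2m_mapping(uint64_t pte, int level)
{
	assert(level >= LEVEL_4K);
	return level == LEVEL_2M && (pte & PTE_PAGE_SIZE_BIT) != 0;
}
```

An entry with bit 7 set at any other level is then never mistaken for a large
page, which is the reason for not widening the comparison yet.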



[patch 4/5] KVM: VMX: EPT misconfiguration handler

2009-06-11 Thread Marcelo Tosatti
Handler for EPT misconfiguration which checks for valid state 
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
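
For readers who want to experiment with the reserved-bit check outside the
kernel, here is a stand-alone sketch of the mask computation. The per-level
bit ranges follow the comments in the patch below; everything else is an
assumption: the function name rsvd_mask(), passing the physical address width
as a parameter instead of reading it from boot_cpu_data, and treating the
boundary bit itself as reserved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a mask of bits that must be zero in an entry at `level`.
 * phys_bits is the number of physical address bits the CPU supports.
 */
static uint64_t rsvd_mask(uint64_t entry, int level, int phys_bits)
{
	uint64_t mask = 0;
	int i;

	assert(phys_bits > 0 && phys_bits <= 52);

	/* Address bits from phys_bits up to bit 51 (boundary bit included,
	 * which is an assumption of this sketch). */
	for (i = 51; i >= phys_bits; i--)
		mask |= 1ULL << i;

	if (level > 2) {
		mask |= 0xf8;			/* bits 7:3 */
	} else if (level == 2) {
		if (entry & (1ULL << 7))
			mask |= 0x1ff000;	/* 2MB mapping: bits 20:12 */
		else
			mask |= 0x78;		/* bits 6:3 */
	}
	return mask;
}
```

An entry is suspicious when (entry & rsvd_mask(entry, level, phys_bits)) is
non-zero.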

Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -3233,6 +3233,90 @@ static int handle_ept_violation(struct k
 	return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
 }
 
+static u64 ept_rsvd_mask(u64 *sptep, int level)
+{
+	int i;
+	u64 mask = 0;
+
+	for (i = 51; i > boot_cpu_data.x86_phys_bits; i--)
+		mask |= (1ULL << i);
+
+	if (level > 2)
+		/* bits 7:3 reserved */
+		mask |= 0xf8;
+	else if (level == 2) {
+		if (*sptep & (1ULL << 7))
+			/* 2MB ref, bits 20:12 reserved */
+			mask |= 0x1ff000;
+		else
+			/* bits 6:3 reserved */
+			mask |= 0x78;
+	}
+
+	return mask;
+}
+
+static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 *sptep,
+				       int level)
+{
+	printk(KERN_ERR "%s: sptep %p spte 0x%llx level %d\n",
+	       __func__, sptep, *sptep, level);
+
+	/* 010b (write-only) */
+	WARN_ON((*sptep & 0x7) == 0x2);
+
+	/* 110b (write/execute) */
+	WARN_ON((*sptep & 0x7) == 0x6);
+
+	/* 100b (execute-only) and value not supported by logical processor */
+	if (!cpu_has_vmx_ept_execute_only())
+		WARN_ON((*sptep & 0x7) == 0x4);
+
+	/* not 000b */
+	if ((*sptep & 0x7)) {
+		u64 rsvd_bits = *sptep & ept_rsvd_mask(sptep, level);
+
+		if (rsvd_bits != 0) {
+			printk(KERN_ERR "%s: rsvd_bits = 0x%llx\n",
+			       __func__, rsvd_bits);
+			WARN_ON(1);
+		}
+
+		if (level == 1 || (level == 2 && (*sptep & (1ULL << 7)))) {
+			u64 ept_mem_type = (*sptep & 0x38) >> 3;
+
+			if (ept_mem_type == 2 || ept_mem_type == 3 ||
+			    ept_mem_type == 7) {
+				printk(KERN_ERR "%s: ept_mem_type=0x%llx\n",
+				       __func__, ept_mem_type);
+				WARN_ON(1);
+			}
+		}
+	}
+}
+
+static int handle_ept_misconfig(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+	u64 *sptes[4];
+	int nr_sptes, i;
+	gpa_t gpa;
+
+	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+
+	printk(KERN_ERR "EPT: Misconfiguration.\n");
+	printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);
+
+	nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
+
+	for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
+		ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
+
+	kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+	kvm_run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+
+	return 0;
+}
+
 static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
u32 cpu_based_vm_exec_control;
@@ -3303,8 +3387,9 @@ static int (*kvm_vmx_exit_handlers[])(st
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD]  = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
-   [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
[EXIT_REASON_MCE_DURING_VMENTRY]  = handle_machine_check,
+   [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
+   [EXIT_REASON_EPT_MISCONFIG]   = handle_ept_misconfig,
 };
 
 static const int kvm_vmx_max_exit_handlers =

-- 



[patch 1/5] KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits

2009-06-11 Thread Marcelo Tosatti
Required for EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/include/asm/vmx.h
===
--- kvm.orig/arch/x86/include/asm/vmx.h
+++ kvm/arch/x86/include/asm/vmx.h
@@ -352,9 +352,16 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
 #define VMX_EPT_EXTENT_CONTEXT 1
 #define VMX_EPT_EXTENT_GLOBAL  2
+
+#define VMX_EPT_EXECUTE_ONLY_BIT   (1ull)
+#define VMX_EPT_PAGE_WALK_4_BIT	(1ull << 6)
+#define VMX_EPTP_UC_BIT	(1ull << 8)
+#define VMX_EPTP_WB_BIT	(1ull << 14)
+#define VMX_EPT_2MB_PAGE_BIT	(1ull << 16)
 #define VMX_EPT_EXTENT_INDIVIDUAL_BIT	(1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT	(1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT	(1ull << 26)
+
 #define VMX_EPT_DEFAULT_GAW	3
 #define VMX_EPT_MAX_GAW	0x4
 #define VMX_EPT_MT_EPTE_SHIFT	3
Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -270,6 +270,26 @@ static inline bool cpu_has_vmx_flexprior
cpu_has_vmx_virtualize_apic_accesses();
 }
 
+static inline bool cpu_has_vmx_ept_execute_only(void)
+{
+	return !!(vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_uncacheable(void)
+{
+	return !!(vmx_capability.ept & VMX_EPTP_UC_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_writeback(void)
+{
+	return !!(vmx_capability.ept & VMX_EPTP_WB_BIT);
+}
+
+static inline bool cpu_has_vmx_ept_2m_page(void)
+{
+	return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
+}
+
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
 	return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);

-- 



[patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages

2009-06-11 Thread Marcelo Tosatti
This way there is no need to add explicit checks in every
for_each_shadow_entry user.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_
 {
 	if (iterator->level < PT_PAGE_TABLE_LEVEL)
 		return false;
+
+	if (iterator->level == PT_PAGE_TABLE_LEVEL)
+		if (is_large_pte(*iterator->sptep))
+			return false;
+
 	iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
 	iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
 	return true;

-- 



[patch 0/5] VMX EPT misconfiguration handler v2

2009-06-11 Thread Marcelo Tosatti
Addressing comments.

-- 



Re: [PATCH 2/2] kvm: change the dirty page tracking to work with dirty bity

2009-06-11 Thread Marcelo Tosatti
On Thu, Jun 11, 2009 at 02:27:46PM +0300, Izik Eidus wrote:
> Marcelo Tosatti wrote:
>>
>> What i'm saying is with shadow and NPT (i believe) you can mark a spte
>> writable but not dirty, which gives you the ability to know whether
>> certain pages have been dirtied.
>
> Isnt this what this patch is doing?

Yes, I was confused for some reason I don't remember.

So making the dirty bit available to the host is a good idea, but we would
have to check things like faults on out of sync pagetables (where
the guest dirty bit might be cleared in parallel, maybe it's ok but
not sure), verify that transfer of the dirty bit when zapping is consistent
everywhere, etc.

So it would be nicer to introduce an optimization to the way dirty bit
info is acquired, then you use that to optimize kvm's dirty log ioctl.

The link with KSM was that you can consult this dirty info, which is
fast, to know if the content of pages has changed. But it may be useless,
don't know.



[ kvm-Bugs-2353510 ] Fedora 10 and F11 failures

2009-06-11 Thread SourceForge.net
Bugs item #2353510, was opened at 2008-11-27 14:46
Message generated for change (Settings changed) made by technologov
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Fedora 10 and F11 failures

Initial Comment:

Description:
Fedora 10 fails to install on KVM. (KVM-79)

The DVD version gets stuck near the end of the setup stage, when trying to
install the GRUB bootloader onto the HDD.
It didn't proceed within one hour, which indicates a stuck VM.

Sometimes it may get stuck earlier - during init or during early setup.

Live CD (32-bit) started fine on both Intel and AMD. (except top menu minor 
rendering bug)

Guest(s): Fedora 10 64-bit
Guest(s): Fedora 10 32-bit
Host(s): Fedora 7 64-bit, Intel, KVM-79
Host(s): Fedora 7 64-bit, AMD, KVM-79

Command: (for DVD)
qemu-kvm -cdrom /isos/linux/Fedora-10-x86_64-DVD.iso -m 512 -hda 
/vm/f10-64.qcow2  -boot d

*and* (for LiveCD)
qemu-kvm -cdrom /isos/linux/F10-i686-Live.iso -m 512

-Alexey, 27.11.2008.

--

Comment By: Technologov (technologov)
Date: 2009-06-11 17:18

Message:
Not only Fedora 10, but also Fedora 11 fails in the same way. Raising bug
priority.

Guest(s): Fedora 10 64-bit DVD

Tested on KVM-86, Intel CPU.

--

Comment By: Technologov (technologov)
Date: 2008-12-02 12:39

Message:

I have opened similar bug against Fedora 10 bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=474116

-Alexey

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2353510&group_id=180599


Re: [patch 0/6] mmu audit update v4

2009-06-11 Thread Avi Kivity

Marcelo Tosatti wrote:

Addressing comments, introducing a new helper, handling largepages.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] move libkvm-all.c code to qemu-kvm.c

2009-06-11 Thread Avi Kivity

Glauber Costa wrote:

Ultimately, goal is to put it in kvm-all.c, so we
can start sharing things. This is put here first
to allow for preparation.

It is almost a cut and paste. Only needed adaptation
goes with kvm_has_sync_mmu(), which had a conflicting
definition.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [patch 3/5] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper

2009-06-11 Thread Avi Kivity

Marcelo Tosatti wrote:

Required by EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3013,6 +3013,24 @@ out:
return r;
 }
 
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes[4])

+{
+   struct kvm_shadow_walk_iterator iterator;
+   int nr_sptes = 0;
+
+   spin_lock(&vcpu->kvm->mmu_lock);
+   for_each_shadow_entry(vcpu, addr, iterator) {
+   sptes[iterator.level-1] = iterator.sptep;
  


Returning a pointer...


+   nr_sptes++;
+   if (!is_shadow_present_pte(*iterator.sptep))
+   break;
+   }
+   spin_unlock(&vcpu->kvm->mmu_lock);
  


... and unlocking the lock that protects it.

True, this is called in extreme cases, but I think you can dereference 
the pointer in the function just as easily.
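
A user-space sketch of that suggestion: copy the values out while the lock is
held, so that nothing protected by the lock is touched after it is dropped.
This uses a pthread mutex in place of the mmu spinlock, and struct table and
snapshot_entries() are hypothetical names, not a proposed kernel interface.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_LEVELS 4

struct table {			/* hypothetical lock-protected structure */
	pthread_mutex_t lock;
	uint64_t entries[MAX_LEVELS];
	int nr_valid;
};

/*
 * Snapshot the entries while holding the lock, so the caller never
 * dereferences shared memory after the lock has been released.
 * Returns the number of values copied into out[].
 */
static int snapshot_entries(struct table *t, uint64_t out[MAX_LEVELS])
{
	int i, n;

	pthread_mutex_lock(&t->lock);
	n = t->nr_valid;
	assert(n >= 0 && n <= MAX_LEVELS);
	for (i = 0; i < n; i++)
		out[i] = t->entries[i];	/* copy the value, not the pointer */
	pthread_mutex_unlock(&t->lock);

	return n;
}
```

The caller then works on its private copy and cannot race with later updates
to the table.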



--
error compiling committee.c: too many arguments to function



[PATCH 2/4] cleanup mmio coalescing functions

2009-06-11 Thread Glauber Costa
remove wrappers that existed only due to qemu/libkvm separation.
Use qemu types for function definitions.

Signed-off-by: Glauber Costa glom...@redhat.com
---
 qemu-kvm.c |   27 ---
 qemu-kvm.h |5 -
 2 files changed, 4 insertions(+), 28 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 2c2d46f..7b25d9e 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1099,9 +1099,10 @@ int kvm_init_coalesced_mmio(kvm_context_t kvm)
return r;
 }
 
-int kvm_register_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t 
size)
+int kvm_coalesce_mmio_region(target_phys_addr_t addr, ram_addr_t size)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
+   kvm_context_t kvm = kvm_context;
struct kvm_coalesced_mmio_zone zone;
int r;
 
@@ -1121,9 +1122,10 @@ int kvm_register_coalesced_mmio(kvm_context_t kvm, 
uint64_t addr, uint32_t size)
return -ENOSYS;
 }
 
-int kvm_unregister_coalesced_mmio(kvm_context_t kvm, uint64_t addr, uint32_t 
size)
+int kvm_uncoalesce_mmio_region(target_phys_addr_t addr, ram_addr_t size)
 {
 #ifdef KVM_CAP_COALESCED_MMIO
+   kvm_context_t kvm = kvm_context;
struct kvm_coalesced_mmio_zone zone;
int r;
 
@@ -2773,27 +2775,6 @@ void kvm_mutex_lock(void)
 cpu_single_env = NULL;
 }
 
-int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr, unsigned int 
size)
-{
-return kvm_register_coalesced_mmio(kvm_context, addr, size);
-}
-
-int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
-  unsigned int size)
-{
-return kvm_unregister_coalesced_mmio(kvm_context, addr, size);
-}
-
-int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
-{
-return kvm_register_coalesced_mmio(kvm_context, start, size);
-}
-
-int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size)
-{
-return kvm_unregister_coalesced_mmio(kvm_context, start, size);
-}
-
 #ifdef USE_KVM_DEVICE_ASSIGNMENT
 void kvm_add_ioperm_data(struct ioperm_data *data)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 0dfbcd1..4db1763 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -111,11 +111,6 @@ void kvm_tpr_access_report(CPUState *env, uint64_t rip, 
int is_write);
 void kvm_tpr_vcpu_start(CPUState *env);
 
 int qemu_kvm_get_dirty_pages(unsigned long phys_addr, void *buf);
-int qemu_kvm_register_coalesced_mmio(target_phys_addr_t addr,
-unsigned int size);
-int qemu_kvm_unregister_coalesced_mmio(target_phys_addr_t addr,
-  unsigned int size);
-
 int kvm_coalesce_mmio_region(target_phys_addr_t start, ram_addr_t size);
 int kvm_uncoalesce_mmio_region(target_phys_addr_t start, ram_addr_t size);
 
-- 
1.5.6.6



[PATCH 3/4] remove callbacks structure

2009-06-11 Thread Glauber Costa
The purpose of that was only to allow the user of libkvm
to register function pointers that corresponded to possible
actions. We don't need that anymore.

Signed-off-by: Glauber Costa glom...@redhat.com
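
The general shape of this change, reduced to a toy example: an indirect call
through a registered table of function pointers is replaced by a direct call
once library and user live in the same file. All names here (io_callbacks,
io_inb, handle_inb_indirect, handle_inb_direct, port_value) are invented for
illustration and do not match the qemu-kvm identifiers.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t port_value = 0x5a;	/* hypothetical device state */

static int io_inb(uint16_t addr, uint8_t *data)
{
	(void)addr;
	*data = port_value;
	return 0;
}

/* Before: the library reaches the handler through a table the user registered. */
struct io_callbacks {
	int (*inb)(uint16_t addr, uint8_t *data);
};

static int handle_inb_indirect(const struct io_callbacks *cb,
			       uint16_t addr, uint8_t *data)
{
	assert(cb && cb->inb);
	return cb->inb(addr, data);
}

/* After: a single code base, so the handler is simply called directly. */
static int handle_inb_direct(uint16_t addr, uint8_t *data)
{
	return io_inb(addr, data);
}
```

Both paths produce the same result; the direct one needs no registration step
and no opaque pointer.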
---
 libkvm-all.h |4 +-
 qemu-kvm.c   |  380 +++---
 2 files changed, 175 insertions(+), 209 deletions(-)

diff --git a/libkvm-all.h b/libkvm-all.h
index 4f7b9a3..be8c855 100644
--- a/libkvm-all.h
+++ b/libkvm-all.h
@@ -177,12 +177,10 @@ struct kvm_callbacks {
  * holds information about the KVM instance that gets created by this call.\n
  * This should always be your first call to KVM.
  *
- * \param callbacks Pointer to a valid kvm_callbacks structure
  * \param opaque Not used
  * \return NULL on failure
  */
-kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
-  void *opaque);
+kvm_context_t kvm_init(void *opaque);
 
 /*!
  * \brief Cleanup the KVM context
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 7b25d9e..7a0fb83 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -10,6 +10,7 @@
 
 #include <assert.h>
 #include <string.h>
+#include <signal.h>
 #include "hw/hw.h"
 #include "sysemu.h"
 #include "qemu-common.h"
@@ -192,6 +193,156 @@ int kvm_is_containing_region(kvm_context_t kvm, unsigned 
long phys_addr, unsigne
return 1;
 }
 
+#ifdef KVM_CAP_SET_GUEST_DEBUG
+static int kvm_debug(void *opaque, void *data,
+ struct kvm_debug_exit_arch *arch_info)
+{
+int handle = kvm_arch_debug(arch_info);
+CPUState *env = data;
+
+if (handle) {
+   kvm_debug_cpu_requested = env;
+   env->kvm_cpu_state.stopped = 1;
+}
+return handle;
+}
+#endif
+
+static int kvm_inb(void *opaque, uint16_t addr, uint8_t *data)
+{
+*data = cpu_inb(0, addr);
+return 0;
+}
+
+static int kvm_inw(void *opaque, uint16_t addr, uint16_t *data)
+{
+*data = cpu_inw(0, addr);
+return 0;
+}
+
+static int kvm_inl(void *opaque, uint16_t addr, uint32_t *data)
+{
+*data = cpu_inl(0, addr);
+return 0;
+}
+
+#define PM_IO_BASE 0xb000
+
+static int kvm_outb(void *opaque, uint16_t addr, uint8_t data)
+{
+if (addr == 0xb2) {
+   switch (data) {
+   case 0: {
+   cpu_outb(0, 0xb3, 0);
+   break;
+   }
+   case 0xf0: {
+   unsigned x;
+
+   /* enable acpi */
+   x = cpu_inw(0, PM_IO_BASE + 4);
+   x &= ~1;
+   cpu_outw(0, PM_IO_BASE + 4, x);
+   break;
+   }
+   case 0xf1: {
+   unsigned x;
+
+   /* enable acpi */
+   x = cpu_inw(0, PM_IO_BASE + 4);
+   x |= 1;
+   cpu_outw(0, PM_IO_BASE + 4, x);
+   break;
+   }
+   default:
+   break;
+   }
+   return 0;
+}
+cpu_outb(0, addr, data);
+return 0;
+}
+
+static int kvm_outw(void *opaque, uint16_t addr, uint16_t data)
+{
+cpu_outw(0, addr, data);
+return 0;
+}
+
+static int kvm_outl(void *opaque, uint16_t addr, uint32_t data)
+{
+cpu_outl(0, addr, data);
+return 0;
+}
+
+static int kvm_mmio_read(void *opaque, uint64_t addr, uint8_t *data, int len)
+{
+   cpu_physical_memory_rw(addr, data, len, 0);
+   return 0;
+}
+
+static int kvm_mmio_write(void *opaque, uint64_t addr, uint8_t *data, int len)
+{
+   cpu_physical_memory_rw(addr, data, len, 1);
+   return 0;
+}
+
+static int kvm_io_window(void *opaque)
+{
+return 1;
+}
+
+static int kvm_halt(void *opaque, kvm_vcpu_context_t vcpu)
+{
+return kvm_arch_halt(opaque, vcpu);
+}
+
+static int kvm_shutdown(void *opaque, void *data)
+{
+CPUState *env = (CPUState *)data;
+
+/* stop the current vcpu from going back to guest mode */
+env->kvm_cpu_state.stopped = 1;
+
+qemu_system_reset_request();
+return 1;
+}
+
+static int handle_unhandled(kvm_context_t kvm, kvm_vcpu_context_t vcpu,
+uint64_t reason)
+{
+fprintf(stderr, "kvm: unhandled exit %" PRIx64 "\n", reason);
+return -EINVAL;
+}
+
+
+static int kvm_try_push_interrupts(void *opaque)
+{
+return kvm_arch_try_push_interrupts(opaque);
+}
+
+static void kvm_post_run(void *opaque, void *data)
+{
+CPUState *env = (CPUState *)data;
+
+pthread_mutex_lock(&qemu_mutex);
+kvm_arch_post_kvm_run(opaque, env);
+}
+
+static int kvm_pre_run(void *opaque, void *data)
+{
+CPUState *env = (CPUState *)data;
+
+kvm_arch_pre_kvm_run(opaque, env);
+
+if (env->exit_request)
+   return 1;
+pthread_mutex_unlock(&qemu_mutex);
+return 0;
+}
+
+
+
 /* 
  * dirty pages logging control 
  */
@@ -314,8 +465,7 @@ int kvm_dirty_pages_log_reset(kvm_context_t kvm)
 }
 
 
-kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
-  void *opaque)
+kvm_context_t kvm_init(void *opaque)
 {
int fd;
kvm_context_t kvm;
@@ -351,7 +501,6 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
memset(kvm, 0, sizeof(*kvm));
    kvm->fd = fd;
    kvm->vm_fd = -1;
-   kvm-callbacks 

Re: [PATCH 0/4] qemu-kvm cleanup

2009-06-11 Thread Avi Kivity

Glauber Costa wrote:

Same series as before, but with avi's little comment addressed.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



[PATCH 1/5] BIOS changes for configuring irq0-inti2 override (v4)

2009-06-11 Thread Beth Kon
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).
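
A minimal sketch of the resulting ISA IRQ to I/O APIC input mapping, assuming
the behavior described above: with the override active, IRQ 0 goes to input 2
and source IRQ 2 is left unrouted; without it, the mapping is the identity.
The function name and the DEMO_NO_ROUTE marker are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NO_ROUTE (-1)  /* hypothetical marker: no table entry emitted */

/* Return the I/O APIC input (GSI) for an ISA IRQ, or DEMO_NO_ROUTE. */
static int demo_isa_irq_to_gsi(int irq, uint8_t irq0_override)
{
    assert(irq >= 0 && irq < 16);

    if (!irq0_override)
        return irq;            /* old kernels: plain 1:1 mapping */
    if (irq == 0)
        return 2;              /* irq0-inti2 override */
    if (irq == 2)
        return DEMO_NO_ROUTE;  /* destination 2 is already taken by irq 0 */
    return irq;
}
```

This mirrors what the MP table loop in the patch does: one entry per
destination, with the entry for source IRQ 2 skipped when the override is set.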

Changes from v3:

- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best
  approximates a multiplexer that disables PIT interrupts when HPET is 
  in legacy mode (as described by HPET spec). Any changes to the PIT that 
  may occur while HPET is operating in legacy mode are saved, so if 
  HPET leaves legacy mode, the PIT is just reenabled, with mode set 
  to whatever the last setting from guest was. Legacy mode is disabled
  at least during crash and shutdown (in Linux), so this needs to be 
  handled properly.


---
 kvm/bios/rombios32.c |   60 -
 1 files changed, 44 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..9d6910e 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -444,6 +444,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -485,6 +488,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -553,6 +557,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+if(qemu_cfg_port) {
+qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+qemu_cfg_read(&irq0_override, 1);
+return;
+}
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1195,6 +1210,13 @@ static void mptable_init(void)
 
 /* irqs */
 for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+/* One entry per ioapic interrupt destination. Destination 2 is covered
+ * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+ */
+if (irq0_override && i == 2)
+continue;
+#endif
 putb(q, 3); /* entry type = I/O interrupt */
 putb(q, 0); /* interrupt type = vectored interrupt */
 putb(q, 0); /* flags: po=0, el=0 */
@@ -1202,7 +1224,12 @@ static void mptable_init(void)
 putb(q, 0); /* source bus ID = ISA */
 putb(q, i); /* source bus IRQ */
 putb(q, ioapic_id); /* dest I/O APIC ID */
-putb(q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+if (irq0_override && i == 0)
+putb(q, 2); /* dest I/O APIC interrupt in */
+else
+#endif
+putb(q, i); /* dest I/O APIC interrupt in */
 }
 /* patch length */
 len = q - mp_config_table;
@@ -1758,23 +1785,21 @@ void acpi_bios_init(void)
 io_apic->io_apic_id = smp_cpus;
 io_apic->address = cpu_to_le32(0xfec00000);
 io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 io_apic++;
-
-int_override = (void *)io_apic;
-int_override->type = APIC_XRUPT_OVERRIDE;
-int_override->length = sizeof(*int_override);
-int_override->bus = cpu_to_le32(0);
-int_override->source = cpu_to_le32(0);
-int_override->gsi = cpu_to_le32(2);
-int_override->flags = cpu_to_le32(0);
-#endif
+int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+if (irq0_override) {
+memset(int_override, 0, sizeof(*int_override));
+int_override->type = APIC_XRUPT_OVERRIDE;
+int_override->length = sizeof(*int_override);
+int_override->source = 0;
+int_override->gsi = 2;
+int_override->flags = 0; /* conforms to bus specifications */
+int_override++;
+}
 #endif
-
-int_override = (struct madt_int_override*)(io_apic + 1);
-for ( i = 0; i < 16; i++ ) {
-if ( PCI_ISA_IRQ_MASK & (1U << i) ) {
+for (i = 0; i < 16; i++) {
+if (PCI_ISA_IRQ_MASK & (1U << i)) {
 memset(int_override, 0, sizeof(*int_override));
 int_override->type   = APIC_XRUPT_OVERRIDE;
 int_override->length = sizeof(*int_override);
@@ -2697,6 +2722,9 @@ void rombios32_init(uint32_t *s3_resume_vector, uint8_t 
*shutdown_flag)

[PATCH 3/5] BIOS changes for KVM HPET (v5)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com


---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 Device(HPET) {
 Name(_HID,  EISAID("PNP0103"))
 Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
 ACPI_TABLE_HEADER_DEF   /* ACPI common table header */
 uint32_t   timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
 addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
 addr = (addr + 7) & ~7;
 hpet_addr = addr;
 hpet = (void *)(addr);
 addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
 }
 
 /* HPET */
-#ifdef HPET_WORKS_IN_KVM
 memset(hpet, 0, sizeof(*hpet));
 /* Note timer_block_id value must be kept in sync with value advertised by
  * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
 hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
 acpi_build_table_header((struct  acpi_table_header *)hpet,
  "HPET", sizeof(*hpet), 1);
-#endif
 
 acpi_additional_tables(); /* resets cfg to required entry */
 for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
 /* kvm has no ssdt (processors are in dsdt) */
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-/* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
 if (nb_numa_nodes > 0)
 rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif


[PATCH 5/5] HPET interaction with in-kernel PIT

2009-06-11 Thread Beth Kon

Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |1 +
 arch/x86/kvm/i8254.c   |   24 +++-
 arch/x86/kvm/i8254.h   |3 ++-
 arch/x86/kvm/x86.c |5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 
 struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
+   u8 hpet_legacy_mode;
 };
 
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, 
u32 val)
}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+   u8 saved_mode;
    mutex_lock(&kvm->arch.vpit->pit_state.lock);
-   pit_load_count(kvm, channel, val);
+   if (hpet_legacy_start) {
+   /* save existing mode for later reenablement */
+   saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+   kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+   pit_load_count(kvm, channel, val);
+   kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+   } else {
+   if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+   pit_load_count(kvm, channel, val);
+   }
    mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
    switch (s->write_state) {
    default:
    case RW_STATE_LSB:
-   pit_load_count(kvm, addr, val);
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, val);
    break;
    case RW_STATE_MSB:
-   pit_load_count(kvm, addr, val << 8);
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, val << 8);
    break;
    case RW_STATE_WORD0:
    s->write_latch = val;
    s->write_state = RW_STATE_WORD1;
    break;
    case RW_STATE_WORD1:
-   pit_load_count(kvm, addr, s->write_latch | (val << 8));
+   if (!(addr == 0 && pit_state->hpet_legacy_mode))
+   pit_load_count(kvm, addr, s->write_latch | (val << 8));
    s->write_state = RW_STATE_WORD0;
    break;
}
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
struct kvm_kpit_channel_state *c;
 
    mutex_lock(&pit->pit_state.lock);
+   pit->pit_state.hpet_legacy_mode = 0;
    for (i = 0; i < 3; i++) {
    c = &pit->pit_state.channels[i];
    c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u8 hpet_legacy_mode;
struct kvm_timer pit_timer;
bool is_periodic;
u32speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct 
kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
int r = 0;
+   int hpet_legacy_start = 0;
 
+   if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+   hpet_legacy_start = 1;
    memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-   kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+   kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_legacy_start);
return r;
 }
 


[PATCH 2/5] Userspace changes for configuring irq0-inti2 override (v4)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
 IOAPICState *s = opaque;
 
-#if 0
 /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
  * to GSI 2.  GSI maps to ioapic 1-1.  This is not
  * the cleanest way of doing it but it should work. */
 
-if (vector == 0)
+if (vector == 0 && irq0override) {
 vector = 2;
-#endif
+}
 
 if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
 uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
 fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
 fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
  acpi_tables_len);
+fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
 smbios_table = smbios_get_table(&smbios_len);
 if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
 return r;
 }
 for (i = 0; i < 24; ++i) {
-r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+if (i == 0) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+} else if (i != 2) {
+r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+}
 if (r < 0)
 return r;
 }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
 
 module_call_init(MODULE_INIT_DEVICE);
 
-if (kvm_enabled())
-   kvm_init_ap();
+if (kvm_enabled()) {
+   kvm_init_ap();
+#ifdef USE_KVM
+if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+irq0override = 0;
+}
+#endif
+}
 
 machine->init(ram_size, boot_devices,
   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 4/5] Userspace changes for KVM HPET (v4)

2009-06-11 Thread Beth Kon
The big change here is the handling of enabling and disabling HPET legacy
mode. When the HPET enters legacy mode, the spec says that the PIT stops
generating interrupts. In practice, we want to stop the PIT periodic timer
from running because it is wasteful in a virtual environment.

We also have to worry about the HPET leaving legacy mode (which, at least in
Linux, happens only during a shutdown or crash). At this point, according to
the HPET spec, PIT interrupts need to be reenabled. For us, it means the PIT
timer needs to be restarted.

This patch handles this situation better than the previous version by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while the PIT is disabled, but does not allow a PIT
timer to start. Then if HPET legacy mode is disabled, whatever the PIT state
is at that point, the PIT timer is restarted accordingly.
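
The intended behavior can be modeled roughly as below. This is a toy sketch
that follows the description above rather than the diff itself; struct
demo_pit and both functions are hypothetical, and the "timer" is reduced to a
flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of PIT channel 0; not the QEMU PITState. */
struct demo_pit {
    uint32_t count;        /* last count programmed by the guest */
    bool timer_running;    /* stands in for the periodic timer */
    bool hpet_legacy_mode; /* set while the HPET owns the timer interrupt */
};

/* Guest programs channel 0: always record state, arm only if allowed. */
static void demo_pit_write_count(struct demo_pit *pit, uint32_t count)
{
    pit->count = count;
    if (!pit->hpet_legacy_mode)
        pit->timer_running = true;
}

/* HPET enters or leaves legacy mode. */
static void demo_hpet_set_legacy(struct demo_pit *pit, bool enable)
{
    pit->hpet_legacy_mode = enable;
    /* Entering: stop the timer. Leaving: restart it from the saved state. */
    pit->timer_running = !enable;
}
```

The point of the model is that a write made while legacy mode is active is
not lost: it takes effect when the PIT is reenabled.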

Signed-off-by: Beth Kon e...@us.ibm.com


---
 hw/hpet.c |   15 +++
 hw/i8254.c|   43 ++-
 hw/i8254.h|2 ++
 hw/pc.h   |4 ++--
 kvm/include/x86/asm/kvm.h |1 +
 qemu-kvm.c|   20 
 qemu-kvm.h|3 ++-
 vl.c  |7 ++-
 8 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..043b92b 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int 
version_id)
 qemu_get_timer(f, s->timer[i].qemu_timer);
 }
 }
+if (hpet_in_legacy_mode()) {
+hpet_disable_pit();
+}
 return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, 
target_phys_addr_t addr,
 }
 /* i8254 and RTC are disabled when HPET is in legacy mode */
 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_disable();
+hpet_disable_pit();
+dprintf("qemu: hpet disabled pit\n");
 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
 }
 break;
 case HPET_CFG + 4:
@@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) {
 /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
 s->capability = 0x8086a201ULL;
 s->capability |= ((HPET_CLK_PERIOD) << 32);
-if (count > 0)
+if (count > 0) {
 /* we don't enable pit when hpet_reset is first called (by hpet_init)
  * because hpet is taking over for pit here. On subsequent invocations,
  * hpet_reset is called due to system reset. At this point control must
  * be returned to pit until SW reenables hpet.
  */
-hpet_pit_enable();
+hpet_enable_pit();
+dprintf("qemu: hpet enabled pit\n");
+}
 count = 1;
 }
 
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..8c8076f 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 
 //#define DEBUG_PIT
@@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel)
 
 static inline void pit_load_count(PITChannelState *s, int val)
 {
+if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+return;
+}
 if (val == 0)
 val = 0x10000;
 s->count_load_time = qemu_get_clock(vm_clock);
@@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, 
int64_t current_time)
(double)(expire_time - current_time) / ticks_per_sec);
 #endif
 s->next_transition_time = expire_time;
-if (expire_time != -1)
+if (expire_time != -1) {
 qemu_mod_timer(s->irq_timer, expire_time);
-else
+} else {
 qemu_del_timer(s->irq_timer);
+}
 }
 
 static void pit_irq_timer(void *opaque)
@@ -451,6 +456,7 @@ void pit_reset(void *opaque)
 PITChannelState *s;
 int i;
 
+pit->hpet_legacy_mode = 0;
 for(i = 0;i < 3; i++) {
 s = &pit->channels[i];
 s->mode = 3;
@@ -460,32 +466,43 @@ void pit_reset(void *opaque)
 }
 
 /* When HPET is operating in legacy mode, i8254 timer0 is disabled */
-void hpet_pit_disable(void) {
-PITChannelState *s;
-s = &pit_state.channels[0];
-if (s->irq_timer)
-qemu_del_timer(s->irq_timer);
+
+void hpet_disable_pit(void)
+{
+PITChannelState *s = &pit_state.channels[0];
+if (qemu_kvm_pit_in_kernel()) {
+kvm_hpet_disable_kpit();
+} else {
+if (s->irq_timer) {
+qemu_del_timer(s->irq_timer);
+}
+}
 }
 
 /* When HPET is reset or leaving legacy mode, it must reenable i8254
  * timer 0
  */
 
-void hpet_pit_enable(void)
+void hpet_enable_pit(void)
 {
 

[patch 1/5] KVM: VMX: more MSR_IA32_VMX_EPT_VPID_CAP capability bits

2009-06-11 Thread Marcelo Tosatti
Required for EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/include/asm/vmx.h
===
--- kvm.orig/arch/x86/include/asm/vmx.h
+++ kvm/arch/x86/include/asm/vmx.h
@@ -352,9 +352,16 @@ enum vmcs_field {
 #define VMX_EPT_EXTENT_INDIVIDUAL_ADDR 0
 #define VMX_EPT_EXTENT_CONTEXT 1
 #define VMX_EPT_EXTENT_GLOBAL  2
+
+#define VMX_EPT_EXECUTE_ONLY_BIT   (1ull)
+#define VMX_EPT_PAGE_WALK_4_BIT    (1ull << 6)
+#define VMX_EPTP_UC_BIT            (1ull << 8)
+#define VMX_EPTP_WB_BIT            (1ull << 14)
+#define VMX_EPT_2MB_PAGE_BIT       (1ull << 16)
 #define VMX_EPT_EXTENT_INDIVIDUAL_BIT  (1ull << 24)
 #define VMX_EPT_EXTENT_CONTEXT_BIT (1ull << 25)
 #define VMX_EPT_EXTENT_GLOBAL_BIT  (1ull << 26)
+
 #define VMX_EPT_DEFAULT_GAW    3
 #define VMX_EPT_MAX_GAW        0x4
 #define VMX_EPT_MT_EPTE_SHIFT  3
Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -270,6 +270,26 @@ static inline bool cpu_has_vmx_flexprior
cpu_has_vmx_virtualize_apic_accesses();
 }
 
+static inline bool cpu_has_vmx_ept_execute_only(void)
+{
+   return !!(vmx_capability.ept & VMX_EPT_EXECUTE_ONLY_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_uncacheable(void)
+{
+   return !!(vmx_capability.ept & VMX_EPTP_UC_BIT);
+}
+
+static inline bool cpu_has_vmx_eptp_writeback(void)
+{
+   return !!(vmx_capability.ept & VMX_EPTP_WB_BIT);
+}
+
+static inline bool cpu_has_vmx_ept_2m_page(void)
+{
+   return !!(vmx_capability.ept & VMX_EPT_2MB_PAGE_BIT);
+}
+
 static inline int cpu_has_vmx_invept_individual_addr(void)
 {
    return !!(vmx_capability.ept & VMX_EPT_EXTENT_INDIVIDUAL_BIT);

-- 



[patch 4/5] KVM: VMX: EPT misconfiguration handler

2009-06-11 Thread Marcelo Tosatti
Handler for EPT misconfiguration which checks for valid state 
in the shadow pagetables, printing the spte on each level.

The separate WARN_ONs are useful for kerneloops.org.
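
For reference, the permission and memory-type checks the handler performs on
each entry can be sketched as standalone predicates. This is only an
illustration of the bit tests in the patch below; the demo_* names are
hypothetical and the reserved-bit mask is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 2:0 of an EPT entry are read (bit 0), write (bit 1), execute (bit 2). */
static bool demo_ept_perms_valid(uint64_t spte, bool execute_only_supported)
{
    unsigned int perms = (unsigned int)(spte & 0x7);

    if (perms == 0x2 || perms == 0x6)   /* write-only, write/execute */
        return false;
    if (perms == 0x4 && !execute_only_supported)
        return false;                   /* execute-only needs CPU support */
    return true;
}

/* Bits 5:3 of a leaf entry hold the memory type; 2, 3 and 7 are reserved. */
static bool demo_ept_memtype_valid(uint64_t spte)
{
    unsigned int type = (unsigned int)((spte & 0x38) >> 3);

    return !(type == 2 || type == 3 || type == 7);
}
```

The handler itself issues a separate WARN_ON for each failing condition
instead of returning a combined result, which is what makes the reports
distinguishable.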

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -3233,6 +3233,89 @@ static int handle_ept_violation(struct k
    return kvm_mmu_page_fault(vcpu, gpa & PAGE_MASK, 0);
 }
 
+static u64 ept_rsvd_mask(u64 spte, int level)
+{
+   int i;
+   u64 mask = 0;
+
+   for (i = 51; i > boot_cpu_data.x86_phys_bits; i--)
+   mask |= (1ULL << i);
+
+   if (level > 2)
+   /* bits 7:3 reserved */
+   mask |= 0xf8;
+   else if (level == 2) {
+   if (spte & (1ULL << 7))
+   /* 2MB ref, bits 20:12 reserved */
+   mask |= 0x1ff000;
+   else
+   /* bits 6:3 reserved */
+   mask |= 0x78;
+   }
+
+   return mask;
+}
+
+static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 spte,
+  int level)
+{
+   printk(KERN_ERR "%s: spte 0x%llx level %d\n", __func__, spte, level);
+
+   /* 010b (write-only) */
+   WARN_ON((spte & 0x7) == 0x2);
+
+   /* 110b (write/execute) */
+   WARN_ON((spte & 0x7) == 0x6);
+
+   /* 100b (execute-only) and value not supported by logical processor */
+   if (!cpu_has_vmx_ept_execute_only())
+   WARN_ON((spte & 0x7) == 0x4);
+
+   /* not 000b */
+   if ((spte & 0x7)) {
+   u64 rsvd_bits = spte & ept_rsvd_mask(spte, level);
+
+   if (rsvd_bits != 0) {
+   printk(KERN_ERR "%s: rsvd_bits = 0x%llx\n",
+__func__, rsvd_bits);
+   WARN_ON(1);
+   }
+
+   if (level == 1 || (level == 2 && (spte & (1ULL << 7)))) {
+   u64 ept_mem_type = (spte & 0x38) >> 3;
+
+   if (ept_mem_type == 2 || ept_mem_type == 3 ||
+   ept_mem_type == 7) {
+   printk(KERN_ERR "%s: ept_mem_type=0x%llx\n",
+   __func__, ept_mem_type);
+   WARN_ON(1);
+   }
+   }
+   }
+}
+
+static int handle_ept_misconfig(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+   u64 sptes[4];
+   int nr_sptes, i;
+   gpa_t gpa;
+
+   gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
+
+   printk(KERN_ERR "EPT: Misconfiguration.\n");
+   printk(KERN_ERR "EPT: GPA: 0x%llx\n", gpa);
+
+   nr_sptes = kvm_mmu_get_spte_hierarchy(vcpu, gpa, sptes);
+
+   for (i = PT64_ROOT_LEVEL; i > PT64_ROOT_LEVEL - nr_sptes; --i)
+   ept_misconfig_inspect_spte(vcpu, sptes[i-1], i);
+
+   kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
+   kvm_run->hw.hardware_exit_reason = EXIT_REASON_EPT_MISCONFIG;
+
+   return 0;
+}
+
 static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
u32 cpu_based_vm_exec_control;
@@ -3303,8 +3386,9 @@ static int (*kvm_vmx_exit_handlers[])(st
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD]  = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
-   [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
[EXIT_REASON_MCE_DURING_VMENTRY]  = handle_machine_check,
+   [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
+   [EXIT_REASON_EPT_MISCONFIG]   = handle_ept_misconfig,
 };
 
 static const int kvm_vmx_max_exit_handlers =

-- 



[patch 3/5] KVM: MMU: add kvm_mmu_get_spte_hierarchy helper

2009-06-11 Thread Marcelo Tosatti
Required by EPT misconfiguration handler.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -3013,6 +3013,24 @@ out:
return r;
 }
 
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
+{
+   struct kvm_shadow_walk_iterator iterator;
+   int nr_sptes = 0;
+
+   spin_lock(&vcpu->kvm->mmu_lock);
+   for_each_shadow_entry(vcpu, addr, iterator) {
+   sptes[iterator.level-1] = *iterator.sptep;
+   nr_sptes++;
+   if (!is_shadow_present_pte(*iterator.sptep))
+   break;
+   }
+   spin_unlock(&vcpu->kvm->mmu_lock);
+
+   return nr_sptes;
+}
+EXPORT_SYMBOL_GPL(kvm_mmu_get_spte_hierarchy);
+
 #ifdef AUDIT
 
 static const char *audit_msg;
Index: kvm/arch/x86/kvm/mmu.h
===
--- kvm.orig/arch/x86/kvm/mmu.h
+++ kvm/arch/x86/kvm/mmu.h
@@ -37,6 +37,8 @@
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3
 
+int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
+
 static inline void kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu)
 {
    if (unlikely(vcpu->kvm->arch.n_free_mmu_pages < KVM_MIN_FREE_MMU_PAGES))

-- 



[patch 2/5] KVM: MMU: make for_each_shadow_entry aware of largepages

2009-06-11 Thread Marcelo Tosatti
This way there is no need to add explicit checks in every
for_each_shadow_entry user.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1273,6 +1273,11 @@ static bool shadow_walk_okay(struct kvm_
 {
    if (iterator->level < PT_PAGE_TABLE_LEVEL)
    return false;
+
+   if (iterator->level == PT_PAGE_TABLE_LEVEL)
+   if (is_large_pte(*iterator->sptep))
+   return false;
+
    iterator->index = SHADOW_PT_INDEX(iterator->addr, iterator->level);
    iterator->sptep = ((u64 *)__va(iterator->shadow_addr)) + iterator->index;
return true;

-- 



[patch 5/5] KVM: VMX: conditionally disable 2M pages

2009-06-11 Thread Marcelo Tosatti
Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.
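
The decision reduces to a single bit test on the capability value. A small
sketch, assuming the capability word has already been read from the MSR; the
DEMO_ macro and function name are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EPT_2MB_PAGE_BIT (1ull << 16)  /* bit 16 of the capability MSR */

/* Large pages stay enabled unless EPT is on and lacks 2M page support. */
static bool demo_largepages_allowed(uint64_t ept_cap, bool ept_enabled)
{
    if (ept_enabled && !(ept_cap & DEMO_EPT_2MB_PAGE_BIT))
        return false;
    return true;
}
```

Note that with EPT disabled the capability bit is irrelevant, so large pages
remain allowed in that case.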

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/vmx.c
===
--- kvm.orig/arch/x86/kvm/vmx.c
+++ kvm/arch/x86/kvm/vmx.c
@@ -1393,6 +1393,9 @@ static __init int hardware_setup(void)
if (!cpu_has_vmx_tpr_shadow())
    kvm_x86_ops->update_cr8_intercept = NULL;
 
+   if (enable_ept && !cpu_has_vmx_ept_2m_page())
+   kvm_disable_largepages();
+
return alloc_kvm_area();
 }
 
Index: kvm/include/linux/kvm_host.h
===
--- kvm.orig/include/linux/kvm_host.h
+++ kvm/include/linux/kvm_host.h
@@ -219,6 +219,7 @@ int kvm_arch_set_memory_region(struct kv
struct kvm_userspace_memory_region *mem,
struct kvm_memory_slot old,
int user_alloc);
+void kvm_disable_largepages(void);
 void kvm_arch_flush_shadow(struct kvm *kvm);
 gfn_t unalias_gfn(struct kvm *kvm, gfn_t gfn);
 struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
Index: kvm/virt/kvm/kvm_main.c
===
--- kvm.orig/virt/kvm/kvm_main.c
+++ kvm/virt/kvm/kvm_main.c
@@ -85,6 +85,8 @@ static long kvm_vcpu_ioctl(struct file *
 
 static bool kvm_rebooting;
 
+static bool largepages_disabled = false;
+
 #ifdef KVM_CAP_DEVICE_ASSIGNMENT
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
 						      int assigned_dev_id)
@@ -1171,9 +1173,11 @@ int __kvm_set_memory_region(struct kvm *
 		ugfn = new.userspace_addr >> PAGE_SHIFT;
 		/*
 		 * If the gfn and userspace address are not aligned wrt each
-		 * other, disable large page support for this slot
+		 * other, or if explicitly asked to, disable large page
+		 * support for this slot
 		 */
-		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1))
+		if ((base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE - 1) ||
+		    largepages_disabled)
 			for (i = 0; i < largepages; ++i)
 				new.lpage_info[i].write_count = 1;
 	}
@@ -1286,6 +1290,12 @@ out:
return r;
 }
 
+void kvm_disable_largepages(void)
+{
+   largepages_disabled = true;
+}
+EXPORT_SYMBOL_GPL(kvm_disable_largepages);
+
 int is_error_page(struct page *page)
 {
return page == bad_page;
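
For illustration only (not part of the patch), the decision described in the
changelog reduces to a single bit test. The helper name
must_disable_largepages is hypothetical; the bit position (16) is the one
named in the changelog, and the capability value stands for whatever was
read from MSR_IA32_VMX_EPT_VPID_CAP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 16 of the EPT/VPID capability value, per the changelog. */
#define TOY_EPT_2MB_PAGE_BIT (1ULL << 16)

/* Hypothetical helper: large pages have to be turned off only when EPT
 * is in use and the capability value does not advertise 2M pages. */
bool must_disable_largepages(bool ept_enabled, uint64_t ept_vpid_cap)
{
    return ept_enabled && !(ept_vpid_cap & TOY_EPT_2MB_PAGE_BIT);
}
```

Without EPT the capability bit is irrelevant, which is why the test is
gated on enable_ept in the hunk for hardware_setup().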

-- 



Re: [PATCH] msi-x: let drivers retry when not enough vectors

2009-06-11 Thread Jesse Barnes
On Thu, 7 May 2009 11:28:41 +0300
Michael S. Tsirkin m...@redhat.com wrote:

> pci_enable_msix currently returns -EINVAL if you ask
> for more vectors than supported by the device, which would
> typically cause fallback to regular interrupts.
>
> It's better to return the table size, making the driver retry
> MSI-X with less vectors.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>
> Hi Jesse,
> This came up when I was adding MSI-X support to virtio pci driver,
> which does not know the exact table size upfront.
> Could you consider this patch for 2.6.31 please?

Applied this one to my linux-next branch; hopefully Rusty won't mind
too much. :)
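
A minimal sketch of the driver-side retry that the quoted change makes
possible, for illustration only. mock_enable_msix is a hypothetical stand-in
for the real call; it only imitates the return convention described above
(0 on success, a positive table size when too many vectors were requested,
a negative value on other errors).

```c
/* Hypothetical stand-in for a device whose MSI-X table holds
 * table_size entries. */
int mock_enable_msix(int table_size, int nvec)
{
    if (nvec <= 0 || table_size <= 0)
        return -22; /* -EINVAL */
    if (nvec > table_size)
        return table_size; /* tell the caller how many are available */
    return 0;
}

/* Ask for `wanted` vectors and retry with the reported table size
 * instead of falling back to regular interrupts.
 * Returns the number of vectors obtained, or a negative error. */
int enable_msix_with_retry(int table_size, int wanted)
{
    int nvec = wanted;

    for (;;) {
        int rc = mock_enable_msix(table_size, nvec);
        if (rc == 0)
            return nvec;
        if (rc < 0)
            return rc;
        nvec = rc; /* positive: retry with the smaller count */
    }
}
```

The loop terminates because a positive return value is always smaller than
the count just requested.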

-- 
Jesse Barnes, Intel Open Source Technology Center


Re: [PATCH 1/5] BIOS changes for configuring irq0-inti2 override (v4)

2009-06-11 Thread Beth Kon

Beth Kon wrote:
> Sebastian Herbszt wrote:
>> Beth Kon wrote:
>>> These patches resolve the irq0-inti2 override issue, and get the hpet working
>>> on kvm.
>>>
>>> Override and HPET changes are sent as a series because HPET depends on the
>>> override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
>>> an override exists in the BIOS. And the HPET spec states that in legacy mode,
>>> timer interrupt is on inti2.
>>>
>>> The irq0-inti2 override will always be used unless the kernel cannot do irq
>>> routing (i.e., compatibility with old kernels). So if the kernel is capable,
>>> userspace sets up irq0-inti2 via the irq routing interface, and adds the
>>> irq0-inti2 override to the MADT interrupt source override table,
>>> and the mp table (for the no-acpi case).
>>>
>>> Changes from v3:
>>>
>>> - changes based on comments from Avi and Gleb.
>>> - corrected legacy enable/disable for in-kernel PIT. The code now best
>>>   approximates a multiplexer that disables PIT interrupts when HPET is
>>>   in legacy mode (as described by HPET spec). Any changes to the PIT that
>>>   may occur while HPET is operating in legacy mode are saved, so if
>>>   HPET leaves legacy mode, the PIT is just reenabled, with mode set
>>>   to whatever the last setting from guest was. Legacy mode is disabled
>>>   at least during crash and shutdown (in Linux), so this needs to be
>>>   handled properly.
>>>
>>> ---
>>> kvm/bios/rombios32.c |   60
>>> 1 files changed, 44 insertions(+), 16 deletions(-)
>>
>> What about the mptable entry count?
>> Think it would need something like
>>
>> #ifdef BX_QEMU
>>     if (irq0_override)
>>         putle16(&q, smp_cpus + 17); /* entry count */
>>     else
>>         putle16(&q, smp_cpus + 18); /* entry count */
>> #else
>>     putle16(&q, smp_cpus + 18); /* entry count */
>> #endif
>>
>> Your patch "Fix non-ACPI Timer Interrupt Routing - v3" [1] included
>> such a change.
>>
>> [1] http://lists.gnu.org/archive/html/qemu-devel/2009-04/msg01396.html
>
> Yes, I lost that somehow! Thanks (again!).

Actually, it isn't that simple. That patch that you referred to was a
qemu patch. But I still don't see it in qemu-patched bochs bios.
Apparently, I did neglect to add it to the kvm bios patches that I had
waiting.

Anthony, do you know what happened to this patch?

>> - Sebastian



[PATCH 1/5] Userspace changes for configuring irq0-inti2 override (v6)

2009-06-11 Thread Beth Kon
These patches resolve the irq0-inti2 override issue, and get the hpet working
on kvm.

Override and HPET changes are sent as a series because HPET depends on the
override. Win2k8 expects the HPET interrupt on inti2, regardless of whether
an override exists in the BIOS. And the HPET spec states that in legacy mode,
timer interrupt is on inti2.

The irq0-inti2 override will always be used unless the kernel cannot do irq
routing (i.e., compatibility with old kernels). So if the kernel is capable,
userspace sets up irq0-inti2 via the irq routing interface, and adds the
irq0-inti2 override to the MADT interrupt source override table,
and the mp table (for the no-acpi case).
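
For illustration only, the routing rule described above can be written as a
small mapping from ISA IRQ number to I/O APIC input pin. The helper names
are hypothetical and the sketch is not code from this series; it only
restates the rule that IRQ0 is delivered on inti2 and that source IRQ 2
then gets no entry of its own.

```c
#include <stdbool.h>

/* Hypothetical helper: I/O APIC input pin for ISA IRQ `irq` (0..15).
 * Returns -1 when no table entry should be emitted for that IRQ. */
int isa_irq_to_ioapic_pin(int irq, bool irq0_override)
{
    if (irq0_override) {
        if (irq == 0)
            return 2;  /* irq0 is routed to inti2 */
        if (irq == 2)
            return -1; /* pin 2 is taken; source IRQ 2 is unused */
    }
    return irq;        /* everything else maps 1:1 */
}

/* Number of ISA interrupt entries a table built this way contains. */
int isa_irq_entry_count(bool irq0_override)
{
    int n = 0;

    for (int irq = 0; irq < 16; irq++)
        if (isa_irq_to_ioapic_pin(irq, irq0_override) >= 0)
            n++;
    return n;
}
```

With the override active one entry fewer is produced, which is why the mp
table entry count in the patch below drops by one in that case.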

Changes from v3:

- changes based on comments from Avi and Gleb.
- corrected legacy enable/disable for in-kernel PIT. The code now best
  approximates a multiplexer that disables PIT interrupts when HPET is 
  in legacy mode (as described by HPET spec). Any changes to the PIT that 
  may occur while HPET is operating in legacy mode are saved, so if 
  HPET leaves legacy mode, the PIT is just reenabled, with mode set 
  to whatever the last setting from guest was. Legacy mode is disabled
  at least during crash and shutdown (in Linux), so this needs to be 
  handled properly.

Changes from v4:

- Modify mp_table entry count depending on whether irq_override is enabled.


Signed-off-by: Beth Kon e...@us.ibm.com
---
 kvm/bios/rombios32.c |   67 ++
 1 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 7db91d8..d6886ee 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -446,6 +446,9 @@ uint32_t cpuid_features;
 uint32_t cpuid_ext_features;
 unsigned long ram_size;
 uint64_t ram_end;
+#ifdef BX_QEMU
+uint8_t irq0_override;
+#endif
 #ifdef BX_USE_EBDA_TABLES
 unsigned long ebda_cur_addr;
 #endif
@@ -487,6 +490,7 @@ void wrmsr_smp(uint32_t index, uint64_t val)
 #define QEMU_CFG_ARCH_LOCAL 0x8000
 #define QEMU_CFG_ACPI_TABLES  (QEMU_CFG_ARCH_LOCAL + 0)
 #define QEMU_CFG_SMBIOS_ENTRIES  (QEMU_CFG_ARCH_LOCAL + 1)
+#define QEMU_CFG_IRQ0_OVERRIDE   (QEMU_CFG_ARCH_LOCAL + 2)
 
 int qemu_cfg_port;
 
@@ -555,6 +559,17 @@ uint64_t qemu_cfg_get64 (void)
 }
 #endif
 
+#ifdef BX_QEMU
+void irq0_override_probe(void)
+{
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_IRQ0_OVERRIDE);
+        qemu_cfg_read(&irq0_override, 1);
+        return;
+    }
+}
+#endif
+
 void cpu_probe(void)
 {
 uint32_t eax, ebx, ecx, edx;
@@ -1153,7 +1168,14 @@ static void mptable_init(void)
     putstr(&q, "0.1         "); /* vendor id */
     putle32(&q, 0); /* OEM table ptr */
     putle16(&q, 0); /* OEM table size */
+#ifdef BX_QEMU
+    if (irq0_override)
+        putle16(&q, MAX_CPUS + 17); /* entry count */
+    else
+        putle16(&q, MAX_CPUS + 18); /* entry count */
+#else
     putle16(&q, MAX_CPUS + 18); /* entry count */
+#endif
     putle32(&q, 0xfee00000); /* local APIC addr */
     putle16(&q, 0); /* ext table length */
     putb(&q, 0); /* ext table checksum */
@@ -1197,6 +1219,13 @@ static void mptable_init(void)
 
 /* irqs */
     for(i = 0; i < 16; i++) {
+#ifdef BX_QEMU
+        /* One entry per ioapic interrupt destination. Destination 2 is covered
+         * by irq0-inti2 override (i == 0). Source IRQ 2 is unused
+         */
+        if (irq0_override && i == 2)
+            continue;
+#endif
         putb(&q, 3); /* entry type = I/O interrupt */
         putb(&q, 0); /* interrupt type = vectored interrupt */
         putb(&q, 0); /* flags: po=0, el=0 */
@@ -1204,7 +1233,12 @@ static void mptable_init(void)
         putb(&q, 0); /* source bus ID = ISA */
         putb(&q, i); /* source bus IRQ */
         putb(&q, ioapic_id); /* dest I/O APIC ID */
-        putb(&q, i); /* dest I/O APIC interrupt in */
+#ifdef BX_QEMU
+        if (irq0_override && i == 0)
+            putb(&q, 2); /* dest I/O APIC interrupt in */
+        else
+#endif
+        putb(&q, i); /* dest I/O APIC interrupt in */
     }
 /* patch length */
 len = q - mp_config_table;
@@ -1760,23 +1794,21 @@ void acpi_bios_init(void)
     io_apic->io_apic_id = smp_cpus;
     io_apic->address = cpu_to_le32(0xfec00000);
     io_apic->interrupt = cpu_to_le32(0);
-#ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     io_apic++;
-
-    int_override = (void *)io_apic;
-    int_override->type = APIC_XRUPT_OVERRIDE;
-    int_override->length = sizeof(*int_override);
-    int_override->bus = cpu_to_le32(0);
-    int_override->source = cpu_to_le32(0);
-    int_override->gsi = cpu_to_le32(2);
-    int_override->flags = cpu_to_le32(0);
-#endif
+    int_override = (struct madt_int_override*)(io_apic);
+#ifdef BX_QEMU
+    if (irq0_override) {
+        memset(int_override, 0, sizeof(*int_override));
+        int_override->type = APIC_XRUPT_OVERRIDE;
+        int_override->length = sizeof(*int_override);
+        int_override->source = 

[PATCH 2/5] Userspace changes for configuring irq0-inti2 override (v6)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 hw/ioapic.c|6 +++---
 hw/pc.c|2 ++
 qemu-kvm-x86.c |6 +-
 qemu-kvm.h |2 ++
 sysemu.h   |1 +
 vl.c   |   11 +--
 6 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/hw/ioapic.c b/hw/ioapic.c
index 6c178c7..a67b766 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "sysemu.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -95,14 +96,13 @@ void ioapic_set_irq(void *opaque, int vector, int level)
 {
     IOAPICState *s = opaque;
 
-#if 0
     /* ISA IRQs map to GSI 1-1 except for IRQ0 which maps
      * to GSI 2.  GSI maps to ioapic 1-1.  This is not
      * the cleanest way of doing it but it should work. */
 
-    if (vector == 0)
+    if (vector == 0 && irq0override) {
         vector = 2;
-#endif
+    }
 
     if (vector >= 0 && vector < IOAPIC_NUM_PINS) {
         uint32_t mask = 1 << vector;
diff --git a/hw/pc.c b/hw/pc.c
index 66f4635..1c068fb 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -55,6 +55,7 @@
 #define BIOS_CFG_IOPORT 0x510
 #define FW_CFG_ACPI_TABLES (FW_CFG_ARCH_LOCAL + 0)
 #define FW_CFG_SMBIOS_ENTRIES (FW_CFG_ARCH_LOCAL + 1)
+#define FW_CFG_IRQ0_OVERRIDE (FW_CFG_ARCH_LOCAL + 2)
 
 #define MAX_IDE_BUS 2
 
@@ -476,6 +477,7 @@ static void bochs_bios_init(void)
     fw_cfg_add_i64(fw_cfg, FW_CFG_RAM_SIZE, (uint64_t)ram_size);
     fw_cfg_add_bytes(fw_cfg, FW_CFG_ACPI_TABLES, (uint8_t *)acpi_tables,
                      acpi_tables_len);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_IRQ0_OVERRIDE, &irq0override, 1);
 
     smbios_table = smbios_get_table(&smbios_len);
     if (smbios_table)
diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 5526d8f..89337e9 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -909,7 +909,11 @@ int kvm_arch_init_irq_routing(void)
         return r;
     }
     for (i = 0; i < 24; ++i) {
-        r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        if (i == 0) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, 2);
+        } else if (i != 2) {
+            r = kvm_add_irq_route(kvm_context, i, KVM_IRQCHIP_IOAPIC, i);
+        }
         if (r < 0)
             return r;
     }
diff --git a/qemu-kvm.h b/qemu-kvm.h
index fa40542..6bbafbc 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -169,6 +169,7 @@ int handle_tpr_access(void *opaque, kvm_vcpu_context_t vcpu,
 #define kvm_enabled() (kvm_allowed)
 #define qemu_kvm_irqchip_in_kernel() kvm_irqchip_in_kernel(kvm_context)
 #define qemu_kvm_pit_in_kernel() kvm_pit_in_kernel(kvm_context)
+#define qemu_kvm_has_gsi_routing() kvm_has_gsi_routing(kvm_context)
 #define kvm_has_sync_mmu() qemu_kvm_has_sync_mmu()
 void kvm_init_vcpu(CPUState *env);
 void kvm_load_tsc(CPUState *env);
@@ -177,6 +178,7 @@ void kvm_load_tsc(CPUState *env);
 #define kvm_nested 0
 #define qemu_kvm_irqchip_in_kernel() (0)
 #define qemu_kvm_pit_in_kernel() (0)
+#define qemu_kvm_has_gsi_routing() (0)
 #define kvm_has_sync_mmu() (0)
 #define kvm_load_registers(env) do {} while(0)
 #define kvm_save_registers(env) do {} while(0)
diff --git a/sysemu.h b/sysemu.h
index 47d001e..f78e974 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -108,6 +108,7 @@ extern int xenfb_enabled;
 extern int graphic_width;
 extern int graphic_height;
 extern int graphic_depth;
+extern uint8_t irq0override;
 extern DisplayType display_type;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
diff --git a/vl.c b/vl.c
index 2fda17b..9b1d1ab 100644
--- a/vl.c
+++ b/vl.c
@@ -253,6 +253,7 @@ int no_reboot = 0;
 int no_shutdown = 0;
 int cursor_hide = 1;
 int graphic_rotate = 0;
+uint8_t irq0override = 1;
 #ifndef _WIN32
 int daemonize = 0;
 #endif
@@ -6054,8 +6055,14 @@ int main(int argc, char **argv, char **envp)
 
 module_call_init(MODULE_INIT_DEVICE);
 
-    if (kvm_enabled())
-       kvm_init_ap();
+    if (kvm_enabled()) {
+       kvm_init_ap();
+#ifdef USE_KVM
+        if (kvm_irqchip && !qemu_kvm_has_gsi_routing()) {
+            irq0override = 0;
+        }
+#endif
+    }
 
     machine->init(ram_size, boot_devices,
                   kernel_filename, kernel_cmdline, initrd_filename, cpu_model);


[PATCH 3/5] BIOS changes for KVM HPET (v6)

2009-06-11 Thread Beth Kon
Signed-off-by: Beth Kon e...@us.ibm.com

---
 kvm/bios/acpi-dsdt.dsl |2 --
 kvm/bios/rombios32.c   |   11 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/kvm/bios/acpi-dsdt.dsl b/kvm/bios/acpi-dsdt.dsl
index db57307..71d0a5e 100755
--- a/kvm/bios/acpi-dsdt.dsl
+++ b/kvm/bios/acpi-dsdt.dsl
@@ -296,7 +296,6 @@ DefinitionBlock (
 })
 }
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
         Device(HPET) {
             Name(_HID,  EISAID("PNP0103"))
             Name(_UID, 0)
@@ -316,7 +315,6 @@ DefinitionBlock (
 })
 }
 #endif
-#endif
 }
 
 Scope(\_SB.PCI0) {
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 9d6910e..1106f38 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1518,8 +1518,8 @@ struct acpi_20_generic_address {
 } __attribute__((__packed__));
 
 /*
- *  * HPET Description Table
- *   */
+ *  HPET Description Table
+ */
 struct acpi_20_hpet {
     ACPI_TABLE_HEADER_DEF                    /* ACPI common table header */
     uint32_t           timer_block_id;
@@ -1703,13 +1703,11 @@ void acpi_bios_init(void)
     addr += madt_size;
 
 #ifdef BX_QEMU
-#ifdef HPET_WORKS_IN_KVM
     addr = (addr + 7) & ~7;
     hpet_addr = addr;
     hpet = (void *)(addr);
     addr += sizeof(*hpet);
 #endif
-#endif
 
 /* RSDP */
 memset(rsdp, 0, sizeof(*rsdp));
@@ -1883,7 +1881,6 @@ void acpi_bios_init(void)
     }
 
     /* HPET */
-#ifdef HPET_WORKS_IN_KVM
     memset(hpet, 0, sizeof(*hpet));
     /* Note timer_block_id value must be kept in sync with value advertised by
      * emulated hpet
@@ -1892,7 +1889,6 @@ void acpi_bios_init(void)
     hpet->addr.address = cpu_to_le32(ACPI_HPET_ADDRESS);
     acpi_build_table_header((struct  acpi_table_header *)hpet,
                              "HPET", sizeof(*hpet), 1);
-#endif
 
     acpi_additional_tables(); /* resets cfg to required entry */
     for(i = 0; i < external_tables; i++) {
@@ -1912,8 +1908,7 @@ void acpi_bios_init(void)
     /* kvm has no ssdt (processors are in dsdt) */
 //  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(ssdt_addr);
 #ifdef BX_QEMU
-    /* No HPET (yet) */
-//  rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
+    rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(hpet_addr);
     if (nb_numa_nodes > 0)
         rsdt->table_offset_entry[nb_rsdt_entries++] = cpu_to_le32(srat_addr);
 #endif


[PATCH 4/5] Userspace changes for KVM HPET (v6)

2009-06-11 Thread Beth Kon
The big change here is handling of enabling/disabling of HPET legacy mode. When
HPET enters legacy mode, the spec says that the PIT stops generating interrupts.
In practice, we want to stop the PIT periodic timer from running because it is
wasteful in a virtual environment.

We also have to worry about the HPET leaving legacy mode (which, at least in
Linux, happens only during a shutdown or crash). At this point, according to the
HPET spec, PIT interrupts need to be reenabled. For us, it means the PIT timer
needs to be restarted.

This patch handles this situation better than the previous version by coming
closer to just disabling PIT interrupts. It allows the PIT state to change if
the OS modifies it, even while PIT is disabled, but does not allow a PIT timer
to start. Then if HPET legacy mode is disabled, whatever the PIT state is at
that point, the PIT timer is restarted accordingly.
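
For illustration only, the behaviour described above can be modeled as a
tiny state machine. This is a toy model of the description, not the QEMU
data structures or the code in the diff below; all toy_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of PIT channel 0 as seen by the HPET legacy-mode logic. */
struct toy_pit {
    bool hpet_legacy_mode; /* HPET currently owns the timer interrupt */
    uint32_t count;        /* last count programmed by the guest */
    bool timer_running;    /* whether the periodic timer is armed */
};

/* Guest programs the counter: the state is always recorded, but the
 * timer is only armed while HPET is not in legacy mode. */
void toy_pit_load_count(struct toy_pit *pit, uint32_t val)
{
    pit->count = val ? val : 0x10000; /* 0 means the full 16-bit range */
    if (!pit->hpet_legacy_mode)
        pit->timer_running = true;
}

/* HPET enters legacy mode: stop the PIT timer. */
void toy_hpet_enter_legacy(struct toy_pit *pit)
{
    pit->hpet_legacy_mode = true;
    pit->timer_running = false;
}

/* HPET leaves legacy mode: restart the timer from whatever state the
 * guest left behind. */
void toy_hpet_leave_legacy(struct toy_pit *pit)
{
    pit->hpet_legacy_mode = false;
    pit->timer_running = true;
}
```

The property worth noting is that a count written while in legacy mode is
kept, so leaving legacy mode restarts the timer with the guest's latest
setting rather than a fixed default.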

Signed-off-by: Beth Kon e...@us.ibm.com
---

diff --git a/hw/hpet.c b/hw/hpet.c
index 29db325..043b92b 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -206,6 +206,9 @@ static int hpet_load(QEMUFile *f, void *opaque, int version_id)
             qemu_get_timer(f, s->timer[i].qemu_timer);
         }
     }
+    if (hpet_in_legacy_mode()) {
+        hpet_disable_pit();
+    }
     return 0;
 }
 
@@ -475,9 +478,11 @@ static void hpet_ram_writel(void *opaque, target_phys_addr_t addr,
                 }
                 /* i8254 and RTC are disabled when HPET is in legacy mode */
                 if (activating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_disable();
+                    hpet_disable_pit();
+                    dprintf("qemu: hpet disabled pit\n");
                 } else if (deactivating_bit(old_val, new_val, HPET_CFG_LEGACY)) {
-                    hpet_pit_enable();
+                    hpet_enable_pit();
+                    dprintf("qemu: hpet enabled pit\n");
                 }
                 break;
             case HPET_CFG + 4:
@@ -554,13 +559,15 @@ static void hpet_reset(void *opaque) {
     /* 64-bit main counter; 3 timers supported; LegacyReplacementRoute. */
     s->capability = 0x8086a201ULL;
     s->capability |= ((HPET_CLK_PERIOD) << 32);
-    if (count > 0)
+    if (count > 0) {
         /* we don't enable pit when hpet_reset is first called (by hpet_init)
          * because hpet is taking over for pit here. On subsequent invocations,
          * hpet_reset is called due to system reset. At this point control must
          * be returned to pit until SW reenables hpet.
          */
-        hpet_pit_enable();
+        hpet_enable_pit();
+        dprintf("qemu: hpet enabled pit\n");
+    }
     count = 1;
 }
 
diff --git a/hw/i8254.c b/hw/i8254.c
index 2f229f9..8c8076f 100644
--- a/hw/i8254.c
+++ b/hw/i8254.c
@@ -25,6 +25,7 @@
 #include "pc.h"
 #include "isa.h"
 #include "qemu-timer.h"
+#include "qemu-kvm.h"
 #include "i8254.h"
 
 //#define DEBUG_PIT
@@ -198,6 +199,9 @@ int pit_get_mode(PITState *pit, int channel)
 
 static inline void pit_load_count(PITChannelState *s, int val)
 {
+    if (s->channel == 0 && pit_state.hpet_legacy_mode) {
+        return;
+    }
     if (val == 0)
         val = 0x10000;
     s->count_load_time = qemu_get_clock(vm_clock);
@@ -371,10 +375,11 @@ static void pit_irq_timer_update(PITChannelState *s, int64_t current_time)
            (double)(expire_time - current_time) / ticks_per_sec);
 #endif
     s->next_transition_time = expire_time;
-    if (expire_time != -1)
+    if (expire_time != -1) {
         qemu_mod_timer(s->irq_timer, expire_time);
-    else
+    } else {
         qemu_del_timer(s->irq_timer);
+    }
 }
 
 static void pit_irq_timer(void *opaque)
@@ -451,6 +456,7 @@ void pit_reset(void *opaque)
     PITChannelState *s;
     int i;
 
+    pit->hpet_legacy_mode = 0;
     for(i = 0;i < 3; i++) {
         s = &pit->channels[i];
         s->mode = 3;
@@ -460,32 +466,43 @@ void pit_reset(void *opaque)
 }
 
 /* When HPET is operating in legacy mode, i8254 timer0 is disabled */
-void hpet_pit_disable(void) {
-    PITChannelState *s;
-    s = &pit_state.channels[0];
-    if (s->irq_timer)
-        qemu_del_timer(s->irq_timer);
+
+void hpet_disable_pit(void)
+{
+    PITChannelState *s = &pit_state.channels[0];
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_hpet_disable_kpit();
+    } else {
+        if (s->irq_timer) {
+            qemu_del_timer(s->irq_timer);
+        }
+    }
 }
 
 /* When HPET is reset or leaving legacy mode, it must reenable i8254
  * timer 0
  */
 
-void hpet_pit_enable(void)
+void hpet_enable_pit(void)
 {
     PITState *pit = &pit_state;
-    PITChannelState *s;
-    s = &pit->channels[0];
-    s->mode = 3;
-    s->gate = 1;
-    pit_load_count(s, 0);
+    PITChannelState *s = &pit->channels[0];
+    if (qemu_kvm_pit_in_kernel()) {
+        kvm_hpet_enable_kpit();
+    } else {
+        pit_load_count(s, s->count);
+    }
 }
 
 PITState *pit_init(int base, qemu_irq irq)
 {
     PITState *pit = &pit_state;
     PITChannelState *s;
+    int i;
 

[PATCH 5/5] HPET interaction with in-kernel PIT (v6)

2009-06-11 Thread Beth Kon

Signed-off-by: Beth Kon e...@us.ibm.com

---
 arch/x86/include/asm/kvm.h |1 +
 arch/x86/kvm/i8254.c   |   24 +++-
 arch/x86/kvm/i8254.h   |3 ++-
 arch/x86/kvm/x86.c |5 -
 4 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm.h b/arch/x86/include/asm/kvm.h
index 708b9c3..3c44923 100644
--- a/arch/x86/include/asm/kvm.h
+++ b/arch/x86/include/asm/kvm.h
@@ -235,6 +235,7 @@ struct kvm_guest_debug_arch {
 
 struct kvm_pit_state {
struct kvm_pit_channel_state channels[3];
+   u8 hpet_legacy_mode;
 };
 
 struct kvm_reinject_control {
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 331705f..bb8382b 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -340,10 +340,20 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
 	}
 }
 
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val)
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start)
 {
+	u8 saved_mode;
 	mutex_lock(&kvm->arch.vpit->pit_state.lock);
-	pit_load_count(kvm, channel, val);
+	if (hpet_legacy_start) {
+		/* save existing mode for later reenablement */
+		saved_mode = kvm->arch.vpit->pit_state.channels[0].mode;
+		kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */
+		pit_load_count(kvm, channel, val);
+		kvm->arch.vpit->pit_state.channels[0].mode = saved_mode;
+	} else {
+		if (!(channel == 0 && kvm->arch.vpit->pit_state.hpet_legacy_mode))
+			pit_load_count(kvm, channel, val);
+	}
 	mutex_unlock(&kvm->arch.vpit->pit_state.lock);
 }
 
@@ -411,17 +421,20 @@ static void pit_ioport_write(struct kvm_io_device *this,
 	switch (s->write_state) {
 	default:
 	case RW_STATE_LSB:
-		pit_load_count(kvm, addr, val);
+		if (!(addr == 0 && pit_state->hpet_legacy_mode))
+			pit_load_count(kvm, addr, val);
 		break;
 	case RW_STATE_MSB:
-		pit_load_count(kvm, addr, val << 8);
+		if (!(addr == 0 && pit_state->hpet_legacy_mode))
+			pit_load_count(kvm, addr, val << 8);
 		break;
 	case RW_STATE_WORD0:
 		s->write_latch = val;
 		s->write_state = RW_STATE_WORD1;
 		break;
 	case RW_STATE_WORD1:
-		pit_load_count(kvm, addr, s->write_latch | (val << 8));
+		if (!(addr == 0 && pit_state->hpet_legacy_mode))
+			pit_load_count(kvm, addr, s->write_latch | (val << 8));
 		s->write_state = RW_STATE_WORD0;
 		break;
 	}
@@ -548,6 +561,7 @@ void kvm_pit_reset(struct kvm_pit *pit)
 	struct kvm_kpit_channel_state *c;
 
 	mutex_lock(&pit->pit_state.lock);
+	pit->pit_state.hpet_legacy_mode = 0;
 	for (i = 0; i < 3; i++) {
 		c = &pit->pit_state.channels[i];
 		c->mode = 0xff;
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index b267018..b5967ca 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -21,6 +21,7 @@ struct kvm_kpit_channel_state {
 
 struct kvm_kpit_state {
struct kvm_kpit_channel_state channels[3];
+   u8 hpet_legacy_mode;
struct kvm_timer pit_timer;
bool is_periodic;
u32speaker_data_on;
@@ -49,7 +50,7 @@ struct kvm_pit {
 #define KVM_PIT_CHANNEL_MASK   0x3
 
 void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
-void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val);
+void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);
 void kvm_pit_reset(struct kvm_pit *pit);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b91ea7..3c70545 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1948,9 +1948,12 @@ static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps)
 {
 	int r = 0;
+	int hpet_legacy_start = 0;
 
+	if (ps->hpet_legacy_mode && !kvm->arch.vpit->pit_state.hpet_legacy_mode)
+		hpet_legacy_start = 1;
 	memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state));
-	kvm_pit_load_count(kvm, 0, ps->channels[0].count);
+	kvm_pit_load_count(kvm, 0, ps->channels[0].count, hpet_legacy_start);
 	return r;
 }
 


Re: [KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-11 Thread Michael S. Tsirkin
On Thu, Jun 04, 2009 at 08:48:02AM -0400, Gregory Haskins wrote:
> (Applies to kvm.git/master:25deed73)
>
> Please see the header for 2/2 for a description.  This patch series has been
> fully tested and appears to be working correctly.
>
> [Review notes:
>   *) Paul has looked at the SRCU design and, to my knowledge, didn't find
>      any holes.
>   *) Michael, Avi, and myself agree that while the removal of the DEASSIGN
>      vector is not desirable, the fix on close() is more important in
>      the short-term.  We can always add DEASSIGN support again in the
>      future with a CAP bit.
> ]

So, I've been thinking about this, and this approach has another
problem: it depends on pollhup on close which is AFAIK an
eventfd-specific feature. This will prevent us from supporting polling
other useful file types, such as sockets and pipes, down the road, with
this interface.

And there's the DEASSIGN issue: DEASSIGN is needed for migration and MSI
vector remapping.

I didn't realise these implications when I suggested deassign on close.
To me, it now looks like we are better off reverting this patch.
We can later add 'deassign on close' support with CAP bit after all :)

Avi, Gregory, what's your take?

-- 
MST


Re: [KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-11 Thread Michael S. Tsirkin
[ Resending with correct address for Davide. Pls don't reply
  to the original one, you'll get bounces. ]

On Thu, Jun 04, 2009 at 08:48:02AM -0400, Gregory Haskins wrote:
> (Applies to kvm.git/master:25deed73)
>
> Please see the header for 2/2 for a description.  This patch series has been
> fully tested and appears to be working correctly.
>
> [Review notes:
>   *) Paul has looked at the SRCU design and, to my knowledge, didn't find
>      any holes.
>   *) Michael, Avi, and myself agree that while the removal of the DEASSIGN
>      vector is not desirable, the fix on close() is more important in
>      the short-term.  We can always add DEASSIGN support again in the
>      future with a CAP bit.
> ]

So, I've been thinking about this, and this approach has another
problem: it depends on pollhup on close which is AFAIK an
eventfd-specific feature. This will prevent us from supporting polling
other useful file types, such as sockets and pipes, down the road, with
this interface.

And there's the DEASSIGN issue: DEASSIGN is needed for migration and MSI
vector remapping.

I didn't realise these implications when I suggested deassign on close.
To me, it now looks like we are better off reverting this patch.
We can later add 'deassign on close' support with CAP bit after all :)

Avi, Gregory, what's your take?

-- 
MST