KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Gilles PIETRI

Hi,

I'm quite pissed off. I just upgraded to kvm-86 on a host that had 
worked nicely on kvm-78 for quite some time. But since I was worried about the 
qcow2 corruption issues, I wanted to upgrade to kvm-86. After testing the 
performance, I decided to switch. How stupid that was. That was really 
putting too much trust in KVM.


Now I can't have 64-bit CPUs on my guests.
My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
Until the upgrade, guests were running x86_64 fine.
Now it says long mode can't be used, or something like that, and I can 
only have 32-bit guests.


Looks really like the bug explained here: 
http://www.mail-archive.com/kvm@vger.kernel.org/msg09431.html


If I use -no-kvm, it works, but obviously, I want to be able to have kvm 
support enabled.


Now, I really am happy about this upgrade, and I'm gonna have to roll it 
back. I really would appreciate some help on this..


Gilles
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Jim Paris
Gilles PIETRI wrote:
 Hi,

 I'm quite pissed off. I just upgraded to kvm-86 on a host that has  
 worked nicely on kvm-78 for quite some time. But since I was fearing the  
 qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the  
 performance, I decided to switch. How stupid that was. That was really  
 putting too much trust in KVM.

 Now I can't have 64 bits CPUs on my guests.
 My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
 Until the upgrade, guests were running x86_64 fine.
 Now, it says long mode can't be used or something like that, and I can  
 only have 32 bits guests.

Please see
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15757.html
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15769.html

-jim


Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Alexey Eromenko

- Gilles PIETRI contact+...@gilouweb.com wrote:

 Hi,
 
 I'm quite pissed off. I just upgraded to kvm-86 on a host that has 
 worked nicely on kvm-78 for quite some time. But since I was fearing
 the 
 qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the
 
 performance, I decided to switch. How stupid that was. That was really
 
 putting too much trust in KVM.
 
 Now I can't have 64 bits CPUs on my guests.
 My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
 Until the upgrade, guests were running x86_64 fine.
 Now, it says long mode can't be used or something like that, and I can
 
 only have 32 bits guests.
 
 Looks really like the bug explained here: 
 http://www.mail-archive.com/kvm@vger.kernel.org/msg09431.html
 
 If I use -no-kvm, it works, but obviously, I want to be able to have
 kvm 
 support enabled.
 
 Now, I really am happy about this upgrade, and I'm gonna have to roll
 it 
 back. I really would appreciate some help on this..
 
 Gilles

Hi Gilles,

What you are saying is very strange, because KVM-Autotest has passed all tests 
for the KVM-86 release,
and I can say that 64-bit guests work here (both Intel & AMD, on RHEL 5.3/x64).

-Alexey


Re: KVM-86 not exposing 64 bits CPU anymore, NICE

2009-06-04 Thread Gilles PIETRI

On 04/06/2009 09:46, Jim Paris wrote:

Gilles PIETRI wrote:

Hi,

I'm quite pissed off. I just upgraded to kvm-86 on a host that has  
worked nicely on kvm-78 for quite some time. But since I was fearing the  
qcow2 corruption issues, I wanted to upgrade kvm-86. After testing the  
performance, I decided to switch. How stupid that was. That was really  
putting too much trust in KVM.


Now I can't have 64 bits CPUs on my guests.
My host is running a 2.6.27.7 kernel, and is x86_64 enabled.
Until the upgrade, guests were running x86_64 fine.
Now, it says long mode can't be used or something like that, and I can  
only have 32 bits guests.


Please see
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15757.html
  http://www.mail-archive.com/kvm@vger.kernel.org/msg15769.html

-jim


Gonna check that, thanks a lot, this didn't get on my radar..

Regards,

Gilles


Re: [PATCH v8] qemu-kvm: add irqfd support

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

irqfd lets you create an eventfd-based file descriptor to inject interrupts
into a kvm guest.  We associate one gsi per fd for fine-grained routing.

[note: this is meant to work in conjunction with the POLLHUP version of
 irqfd, which has not yet been accepted into kvm.git]
  


Applied with two changes: added a dependency on CONFIG_EVENTFD (with the 
kvm external module, you can have irqfd support without eventfd 
support), and adjusted for the new libkvm location (libkvm-all.[ch]).


--
error compiling committee.c: too many arguments to function



Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Michael S. Tsirkin wrote:

On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
  

And having close not clean up the state unless you do an ioctl first is
very messy IMO - I don't think you'll find any such examples in kernel.

  
  

I agree, and that is why I am advocating this POLLHUP solution.  It was
only this other way to begin with because the technology didn't exist
until Davide showed me the light.

Problem with your request is that I already looked into what is
essentially a bi-directional reference problem (for a different reason)
when I started the POLLHUP series.  It's messy to do this in a way that
doesn't negatively impact the fast path (introducing locking, etc) or
make my head explode making sure it doesn't race.  Afaict, we would need
to solve this problem to do what you are proposing (patches welcome).

If this hybrid decoupled-deassign + unified-close is indeed an important
feature set, I suggest that we still consider this POLLHUP series for
inclusion, and then someone can re-introduce DEASSIGN support in the
future as a CAP bit extension.  That way we at least get the desirable
close() properties that we both seem in favor of, and get this advanced
use case when we need it (and can figure out the locking design).




FWIW, I took a look and yes, it is non-trivial.
I concur, we can always add the deassign ioctl later.
  


I agree that deassign is needed for reasons of symmetry, and that it can 
be added later.


--
error compiling committee.c: too many arguments to function



[PATCH] [1/2] x86: MCE: Define MCE_VECTOR

2009-06-04 Thread Andi Kleen

[This patch is already in the mce3 branch of the tip tree, but I'm including
it here because it's needed for the next patch.]

Signed-off-by: Andi Kleen a...@linux.intel.com

---
 arch/x86/include/asm/irq_vectors.h |1 +
 1 file changed, 1 insertion(+)

Index: linux/arch/x86/include/asm/irq_vectors.h
===
--- linux.orig/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.000000000 +0200
+++ linux/arch/x86/include/asm/irq_vectors.h	2009-05-27 21:48:38.000000000 +0200
@@ -25,6 +25,7 @@
  */
 
 #define NMI_VECTOR 0x02
+#define MCE_VECTOR 0x12
 
 /*
  * IDT vectors usable for external interrupt sources start


[PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen

[Avi could you please still consider this patch for your 2.6.31 patchqueue?
It's fairly simple, but important to handle memory errors in guests]

VT-x needs an explicit MC vector intercept to handle machine checks in the 
hypervisor.

It also has a special option to catch machine checks that happen
during VT entry.

Do these interceptions and forward them to the Linux machine check
handler. Make it always look like user space is interrupted because
the machine check handler treats kernel/user space differently.

Thanks to Huang Ying and Jiang Yunhong for help and testing.

Cc: ying.hu...@intel.com
Signed-off-by: Andi Kleen a...@linux.intel.com

---
 arch/x86/include/asm/vmx.h |1 +
 arch/x86/kvm/vmx.c |   26 --
 2 files changed, 25 insertions(+), 2 deletions(-)

Index: linux/arch/x86/include/asm/vmx.h
===
--- linux.orig/arch/x86/include/asm/vmx.h	2009-05-28 10:47:53.000000000 +0200
+++ linux/arch/x86/include/asm/vmx.h	2009-06-04 11:58:49.000000000 +0200
@@ -247,6 +247,7 @@
 #define EXIT_REASON_MSR_READ31
 #define EXIT_REASON_MSR_WRITE   32
 #define EXIT_REASON_MWAIT_INSTRUCTION   36
+#define EXIT_REASON_MACHINE_CHECK  41
 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43
 #define EXIT_REASON_APIC_ACCESS 44
 #define EXIT_REASON_EPT_VIOLATION   48
Index: linux/arch/x86/kvm/vmx.c
===
--- linux.orig/arch/x86/kvm/vmx.c	2009-05-28 10:47:53.000000000 +0200
+++ linux/arch/x86/kvm/vmx.c	2009-06-04 12:05:44.000000000 +0200
@@ -32,6 +32,7 @@
 #include <asm/desc.h>
 #include <asm/vmx.h>
 #include <asm/virtext.h>
+#include <asm/mce.h>
 
 #define __ex(x) __kvm_handle_fault_on_reboot(x)
 
@@ -478,7 +479,7 @@
 {
u32 eb;
 
-	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR);
+	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR);
 	if (!vcpu->fpu_active)
 		eb |= 1u << NM_VECTOR;
 	if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
@@ -2585,6 +2586,23 @@
return 0;
 }
 
+/*
+ * Trigger machine check on the host. We assume all the MSRs are already set up
+ * by the CPU and that we still run on the same CPU as the MCE occurred on.
+ * We pass a fake environment to the machine check handler because we want
+ * the guest to be always treated like user space, no matter what context
+ * it used internally.
+ */
+static int handle_machine_check(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
+{
+   struct pt_regs regs = {
+   .cs = 3, /* Fake ring 3 no matter what the guest ran on */
+   .flags = X86_EFLAGS_IF,
+   };
+	do_machine_check(&regs, 0);
+   return 1;
+}
+
 static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
 {
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2596,6 +2614,10 @@
vect_info = vmx-idt_vectoring_info;
intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
 
+	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
+   if (ex_no == MCE_VECTOR)
+   return handle_machine_check(vcpu, kvm_run);
+
 	if ((vect_info & VECTORING_INFO_VALID_MASK) &&
 	    !is_page_fault(intr_info))
 		printk(KERN_ERR "%s: unexpected, vectoring info 0x%x "
@@ -2648,7 +2670,6 @@
return 1;
}
 
-	ex_no = intr_info & INTR_INFO_VECTOR_MASK;
switch (ex_no) {
case DB_VECTOR:
dr6 = vmcs_readl(EXIT_QUALIFICATION);
@@ -3150,6 +3171,7 @@
[EXIT_REASON_WBINVD]  = handle_wbinvd,
[EXIT_REASON_TASK_SWITCH] = handle_task_switch,
[EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
+   [EXIT_REASON_MACHINE_CHECK]   = handle_machine_check,
 };
 
 static const int kvm_vmx_max_exit_handlers =


Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
 Michael S. Tsirkin wrote:
 On Tue, Jun 02, 2009 at 01:41:05PM -0400, Gregory Haskins wrote:
  
 And having close not clean up the state unless you do an ioctl
 first is
 very messy IMO - I don't think you'll find any such examples in
 kernel.

 
 I agree, and that is why I am advocating this POLLHUP solution.  It was
 only this other way to begin with because the technology didn't exist
 until Davide showed me the light.

 Problem with your request is that I already looked into what is
 essentially a bi-directional reference problem (for a different reason)
 when I started the POLLHUP series.  Its messy to do this in a way that
 doesn't negatively impact the fast path (introducing locking, etc) or
 make my head explode making sure it doesn't race.  Afaict, we would
 need
 to solve this problem to do what you are proposing (patches welcome).

 If this hybrid decoupled-deassign + unified-close is indeed an
 important
 feature set, I suggest that we still consider this POLLHUP series for
 inclusion, and then someone can re-introduce DEASSIGN support in the
 future as a CAP bit extension.  That way we at least get the desirable
 close() properties that we both seem in favor of, and get this advanced
 use case when we need it (and can figure out the locking design).

 

 FWIW, I took a look and yes, it is non-trivial.
 I concur, we can always add the deassign ioctl later.
   

 I agree that deassign is needed for reasons of symmetry, and that it
 can be added later.

Cool.

FYI: Davide's patch has been accepted into -mm (Andrew CC'd).  I am not
sure of the protocol here, but I assume this means you can now safely
pull it from -mm into kvm.git so the prerequisite for 2/2 is properly met.

-Greg





Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 I agree that deassign is needed for reasons of symmetry, and that it
 can be added later.

 
 Cool.

 FYI: Davide's patch has been accepted into -mm (Andrew CC'd).  I am not
 sure of the protocol here, but I assume this means you can now safely
 pull it from -mm into kvm.git so the prerequisite for 2/2 is properly
 met.
   

 I'm not sure either.

 But I think I saw a "Thanks for catching that" for 2/2?

Ah, right!  I queued that fix up eons ago after Davide's feedback and
forgot that it was there waiting for me ;)

Since Paul ok'd (I think?) the srcu design, and the only other feedback
was the key-bitmap thing from Davide, I will go ahead and push a v2 with
just that one fix (unless there is any other feedback?)

-Greg





Re: [KVM-RFC PATCH 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

Since Paul ok'd (I think?) the srcu design, and the only other feedback
was the key-bitmap thing from Davide, I will go ahead and push a v2 with
just that one fix (unless there is any other feedback?)
  


I'll do a detailed review on your next posting.  When I see a long 
thread I go hide under the bed, where there is no Internet access.


--
error compiling committee.c: too many arguments to function



Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Bharata B Rao wrote:

2. Need for hard limiting CPU resource
--
- Pay-per-use: In enterprise systems that cater to multiple clients/customers,
  where a customer demands a certain share of CPU resources and pays only for
  that, CPU hard limits will be useful to restrict the customer's job
  to consuming only the specified amount of CPU resource.
- In container based virtualization environments running multiple containers,
  hard limits will be useful to ensure a container doesn't exceed its
  CPU entitlement.
- Hard limits can be used to provide guarantees.
  

How can hard limits provide guarantees?

Let's take an example where I have 1 group that I wish to guarantee a 
20% share of the cpu, and another 8 groups with no limits or guarantees.


One way to achieve the guarantee is to hard limit each of the 8 other 
groups to 10%; the sum total of the limits is 80%, leaving 20% for the 
guarantee group. The downside is the arbitrary limit imposed on the 
other groups.


Another way is to place the 8 groups in a container group, and limit 
that to 80%. But that doesn't work if I want to provide guarantees to 
several groups.


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v5 0/2] iosignalfd

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

Marcelo, Avi, and I have previously agreed that Marcelo's
mmio-locking cleanup should go in first.   When that happens, I will
need to rebase this series because it changes how you interface to the
io_bus code.  I should have mentioned that here, but forgot.  (Speaking
of which, is there an ETA for when that code will be merged, Avi?)
  


I had issues with the unbalanced locking the patchset introduced in 
coalesced_mmio; once those are resolved, the patchset will be merged.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 02:48:17PM +0300, Avi Kivity wrote:
 Andi Kleen wrote:
 [Avi could you please still consider this patch for your 2.6.31 patchqueue?
 It's fairly simple, but important to handle memory errors in guests]
   
 
 Oh yes, and it'll be needed for -stable.  IIUC, right now a machine 
 check is trapped by the guest, so the guest is killed instead of the host?

Yes, the guest will receive int 18.

But it will not kill itself, because the guest cannot access the machine check
MSRs, so it will not see any machine check. So it's kind of ignored,
which is pretty bad.

 
 +/*
 + * Trigger machine check on the host. We assume all the MSRs are already 
 set up
 + * by the CPU and that we still run on the same CPU as the MCE occurred 
 on.
 + * We pass a fake environment to the machine check handler because we want
 + * the guest to be always treated like user space, no matter what context
 + * it used internally.
 + */
   
 
 This assumption is incorrect.  This code is executed after preemption 
 has been enabled, and we may have even slept before reaching it.

The only thing that counts here is the context before the machine
check event. If there was a vmexit we know it was in guest context.

The only requirement we have is that we're running still on the same
CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?

  [EXIT_REASON_EPT_VIOLATION]   = handle_ept_violation,
 +[EXIT_REASON_MACHINE_CHECK]   = handle_machine_check,
  };
  
  static const int kvm_vmx_max_exit_handlers =
   
 
 We get both an explicit EXIT_REASON and an exception?

These are different cases. The exception is #MC in guest context,
the EXIT_REASON is when a #MC happens while the CPU is executing
the VM entry microcode.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


[PATCH] cleanup acpi table creation

2009-06-04 Thread Gleb Natapov
The current code is a mess, and the addition of ACPI tables is broken.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/kvm/bios/rombios32.c b/kvm/bios/rombios32.c
index 369cbef..fda4894 100755
--- a/kvm/bios/rombios32.c
+++ b/kvm/bios/rombios32.c
@@ -1293,15 +1293,13 @@ struct rsdp_descriptor /* Root System Descriptor Pointer */
 	uint8_t reserved[3];            /* Reserved field must be 0 */
 } __attribute__((__packed__));
 
-#define MAX_RSDT_ENTRIES 100
-
 /*
  * ACPI 1.0 Root System Description Table (RSDT)
  */
 struct rsdt_descriptor_rev1
 {
ACPI_TABLE_HEADER_DEF   /* ACPI common table 
header */
-	uint32_t table_offset_entry[MAX_RSDT_ENTRIES]; /* Array of pointers to other */
+	uint32_t table_offset_entry[0];                /* Array of pointers to other */
 /* ACPI tables */
 } __attribute__((__packed__));
 
@@ -1585,324 +1583,332 @@ static void acpi_build_srat_memory(struct srat_memory_affinity *numamem,
  return;
 }
 
-/* base_addr must be a multiple of 4KB */
-void acpi_bios_init(void)
+static void rsdp_build(struct rsdp_descriptor *rsdp, uint32_t rsdt)
 {
-struct rsdp_descriptor *rsdp;
-struct rsdt_descriptor_rev1 *rsdt;
-struct fadt_descriptor_rev1 *fadt;
-struct facs_descriptor_rev1 *facs;
-struct multiple_apic_table *madt;
-uint8_t *dsdt, *ssdt;
+	memset(rsdp, 0, sizeof(*rsdp));
+	memcpy(rsdp->signature, "RSD PTR ", 8);
 #ifdef BX_QEMU
-struct system_resource_affinity_table *srat;
-struct acpi_20_hpet *hpet;
-uint32_t hpet_addr;
-#endif
-    uint32_t base_addr, rsdt_addr, fadt_addr, addr, facs_addr, dsdt_addr, ssdt_addr;
-uint32_t acpi_tables_size, madt_addr, madt_size, rsdt_size;
-uint32_t srat_addr,srat_size;
-uint16_t i, external_tables;
-int nb_numa_nodes;
-int nb_rsdt_entries = 0;
-
-/* reserve memory space for tables */
-#ifdef BX_USE_EBDA_TABLES
-ebda_cur_addr = align(ebda_cur_addr, 16);
-rsdp = (void *)(ebda_cur_addr);
-ebda_cur_addr += sizeof(*rsdp);
+	memcpy(rsdp->oem_id, "QEMU  ", 6);
 #else
-bios_table_cur_addr = align(bios_table_cur_addr, 16);
-rsdp = (void *)(bios_table_cur_addr);
-bios_table_cur_addr += sizeof(*rsdp);
+	memcpy(rsdp->oem_id, "BOCHS ", 6);
 #endif
+	rsdp->rsdt_physical_address = rsdt;
+	rsdp->checksum = acpi_checksum((void *)rsdp, 20);
+}
 
-#ifdef BX_QEMU
-external_tables = acpi_additional_tables();
-#else
-external_tables = 0;
-#endif
+static uint32_t facs_build(uint32_t *addr)
+{
+ struct facs_descriptor_rev1 *facs;
 
-addr = base_addr = ram_size - ACPI_DATA_SIZE;
-rsdt_addr = addr;
-rsdt = (void *)(addr);
-rsdt_size = sizeof(*rsdt) + external_tables * 4;
-addr += rsdt_size;
+	*addr = (*addr + 63) & ~63; /* 64 byte alignment for FACS */
+ facs = (void*)(*addr);
+ *addr += sizeof(*facs);
 
-fadt_addr = addr;
-fadt = (void *)(addr);
-addr += sizeof(*fadt);
+ memset(facs, 0, sizeof(*facs));
+	memcpy(facs->signature, "FACS", 4);
+	facs->length = cpu_to_le32(sizeof(*facs));
+	BX_INFO("Firmware waking vector %p\n", facs->firmware_waking_vector);
 
-/* XXX: FACS should be in RAM */
-addr = (addr + 63)  ~63; /* 64 byte alignment for FACS */
-facs_addr = addr;
-facs = (void *)(addr);
-addr += sizeof(*facs);
+ return (uint32_t)facs;
+}
 
-dsdt_addr = addr;
-dsdt = (void *)(addr);
-addr += sizeof(AmlCode);
+static uint32_t dsdt_build(uint32_t *addr)
+{
+ uint8_t *dsdt = (void*)(*addr);
 
-#ifdef BX_QEMU
-qemu_cfg_select(QEMU_CFG_NUMA);
-nb_numa_nodes = qemu_cfg_get64();
+ *addr += sizeof(AmlCode);
+
+ memcpy(dsdt, AmlCode, sizeof(AmlCode));
+
+ return (uint32_t)dsdt;
+}
+
+static uint32_t fadt_build(uint32_t *addr, uint32_t facs, uint32_t dsdt)
+{
+ struct fadt_descriptor_rev1 *fadt = (void*)(*addr);
+
+ *addr += sizeof(*fadt);
+ memset(fadt, 0, sizeof(*fadt));
+	fadt->firmware_ctrl = facs;
+	fadt->dsdt = dsdt;
+	fadt->model = 1;
+	fadt->reserved1 = 0;
+	fadt->sci_int = cpu_to_le16(pm_sci_int);
+	fadt->smi_cmd = cpu_to_le32(SMI_CMD_IO_ADDR);
+	fadt->acpi_enable = 0xf1;
+	fadt->acpi_disable = 0xf0;
+	fadt->pm1a_evt_blk = cpu_to_le32(pm_io_base);
+	fadt->pm1a_cnt_blk = cpu_to_le32(pm_io_base + 0x04);
+	fadt->pm_tmr_blk = cpu_to_le32(pm_io_base + 0x08);
+	fadt->pm1_evt_len = 4;
+	fadt->pm1_cnt_len = 2;
+	fadt->pm_tmr_len = 4;
+	fadt->plvl2_lat = cpu_to_le16(0xfff); // C2 state not supported
+	fadt->plvl3_lat = cpu_to_le16(0xfff); // C3 state not supported
+	fadt->gpe0_blk = cpu_to_le32(0xafe0);
+	fadt->gpe0_blk_len = 4;
+	/* WBINVD + PROC_C1 + SLP_BUTTON + FIX_RTC */
+	fadt->flags = cpu_to_le32((1 << 0) | (1 << 2) | (1 << 5) | (1 << 6));
+	acpi_build_table_header((struct acpi_table_header *)fadt, "FACP",
+   

[KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Gregory Haskins
(Applies to kvm.git/master:25deed73)

Please see the header for 2/2 for a description.  This patch series has been
fully tested and appears to be working correctly.

[Review notes:
  *) Paul has looked at the SRCU design and, to my knowledge, didn't find
 any holes.
  *) Michael, Avi, and myself agree that while the removal of the DEASSIGN
 vector is not desirable, the fix on close() is more important in
 the short-term.  We can always add DEASSIGN support again in the
 future with a CAP bit.
]

[Changelog:

  v2:
 *) Pulled in Davide's official patch for 1/2 from his submission
accepted into -mmotm.
 *) Fixed patch 2/2 to use the key field as a bitmap in the wakeup
logic, per Davide's feedback.

  v1:
 *) Initial release
]
  

---

Davide Libenzi (1):
  Allow waiters to be notified about the eventfd file* going away, and give

Gregory Haskins (1):
  kvm: use POLLHUP to close an irqfd instead of an explicit ioctl


 fs/eventfd.c|   10 +++
 include/linux/kvm.h |2 -
 virt/kvm/eventfd.c  |  177 +++
 virt/kvm/kvm_main.c |3 +
 4 files changed, 90 insertions(+), 102 deletions(-)

-- 
Signature


[KVM PATCH v2 1/2] Allow waiters to be notified about the eventfd file* going away, and give

2009-06-04 Thread Gregory Haskins
From: Davide Libenzi davi...@xmailserver.org

them a chance to unregister from the wait queue.  This in turn allows
eventfd users to use the eventfd file* without holding a live reference to
it.

After the eventfd user callbacks returns, any usage of the eventfd file*
should be dropped.  The eventfd user callback can acquire sleepy locks
since it is invoked lockless.

This is a feature needed by KVM to avoid an awkward workaround when using
eventfd.

[gmh: pulled from -mmotm for inclusion in kvm.git]

Signed-off-by: Davide Libenzi davi...@xmailserver.org
Tested-by: Gregory Haskins ghask...@novell.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Gregory Haskins ghask...@novell.com
---

 fs/eventfd.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 3f0e197..72f5f8d 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -61,7 +61,15 @@ EXPORT_SYMBOL_GPL(eventfd_signal);
 
 static int eventfd_release(struct inode *inode, struct file *file)
 {
-	kfree(file->private_data);
+	struct eventfd_ctx *ctx = file->private_data;
+
+	/*
+	 * No need to hold the lock here, since we are on the file cleanup
+	 * path and the ones still attached to the wait queue will be
+	 * serialized by wake_up_locked_poll().
+	 */
+	wake_up_locked_poll(&ctx->wqh, POLLHUP);
+	kfree(ctx);
return 0;
 }
 



[KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-04 Thread Gregory Haskins
Assigning an irqfd object to a kvm object creates a relationship that we
currently manage by having the kvm object acquire/hold a file* reference to
the underlying eventfd.  The lifetime of these objects is properly maintained
by decoupling the two objects whenever the irqfd is closed or kvm is closed,
whichever comes first.

However, the irqfd close method is less than ideal since it requires two
system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the other for
close(eventfd)).  This dual-call approach was utilized because there was no
notification mechanism on the eventfd side at the time irqfd was implemented.

Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an
eventfd is about to close.  So we eliminate the IRQFD_DEASSIGN ioctl (*)
vector in favor of sensing the deassign automatically when the fd is closed.
The resulting code is slightly more complex as a result since we need to
allow either side to sever the relationship independently.  We utilize SRCU
to guarantee stable concurrent access to the KVM pointer without adding
additional atomic operations in the fast path.

At minimum, this design should be acked by both Davide and Paul (cc'd).

(*) The irqfd patch does not exist in any released tree, so the understanding
is that we can alter the irqfd specific ABI without taking the normal
precautions, such as CAP bits.

Signed-off-by: Gregory Haskins ghask...@novell.com
CC: Davide Libenzi davi...@xmailserver.org
CC: Michael S. Tsirkin m...@redhat.com
CC: Paul E. McKenney paul...@linux.vnet.ibm.com
---

 include/linux/kvm.h |2 -
 virt/kvm/eventfd.c  |  177 +++
 virt/kvm/kvm_main.c |3 +
 3 files changed, 81 insertions(+), 101 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 632a856..29b62cc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -482,8 +482,6 @@ struct kvm_x86_mce {
 };
 #endif
 
-#define KVM_IRQFD_FLAG_DEASSIGN (1  0)
-
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3f2ea1..004c660 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -37,39 +37,92 @@
  * 
  */
 struct _irqfd {
+   struct mutex  lock;
+	struct srcu_struct        srcu;
struct kvm   *kvm;
int   gsi;
-   struct file  *file;
struct list_head  list;
 	poll_table                pt;
 	wait_queue_head_t        *wqh;
wait_queue_t  wait;
-	struct work_struct        work;
+	struct work_struct        inject;
 };
 
 static void
 irqfd_inject(struct work_struct *work)
 {
-	struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
-	struct kvm *kvm = irqfd->kvm;
+	struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+	struct kvm *kvm;
+	int idx;
+
+	idx = srcu_read_lock(&irqfd->srcu);
+
+	kvm = rcu_dereference(irqfd->kvm);
+	if (kvm) {
+		mutex_lock(&kvm->lock);
+		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+		kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+		mutex_unlock(&kvm->lock);
+	}
+
+	srcu_read_unlock(&irqfd->srcu, idx);
+}
+
+static void
+irqfd_disconnect(struct _irqfd *irqfd)
+{
+   struct kvm *kvm;
+
+	mutex_lock(&irqfd->lock);
+
+	kvm = rcu_dereference(irqfd->kvm);
+	rcu_assign_pointer(irqfd->kvm, NULL);
+
+	mutex_unlock(&irqfd->lock);
+
+   if (!kvm)
+   return;
 
 	mutex_lock(&kvm->lock);
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+	list_del(&irqfd->list);
 	mutex_unlock(&kvm->lock);
+
+   /*
+* It is important to not drop the kvm reference until the next grace
+* period because there might be lockless references in flight up
+* until then
+*/
+	synchronize_srcu(&irqfd->srcu);
+   kvm_put_kvm(kvm);
 }
 
 static int
 irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+   unsigned long flags = (unsigned long)key;
 
-	/*
-	 * The wake_up is called with interrupts disabled.  Therefore we need
-	 * to defer the IRQ injection until later since we need to acquire the
-	 * kvm->lock to do so.
-	 */
-	schedule_work(&irqfd->work);
+	if (flags & POLLIN)
+		/*
+		 * The POLLIN wake_up is called with interrupts disabled.
+		 * Therefore we need to defer the IRQ injection until later
+		 * since we need to acquire the kvm->lock to do so.
+		 */
+		schedule_work(&irqfd->inject);
+
+   if 

Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
This assumption is incorrect.  This code is executed after preemption 
has been enabled, and we may have even slept before reaching it.



The only thing that counts here is the context before the machine
check event. If there was a vmexit we know it was in guest context.

The only requirement we have is that we're running still on the same
CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?
  


It's not true, we're in preemptible context and may have even slept.

vmcs access work because we have a preempt notifier called when we are 
scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
kvm_preempt_ops in virt/kvm_main.c.



We get both an explicit EXIT_REASON and an exception?



These are different cases. The exception is #MC in guest context,
the EXIT_REASON is when a #MC happens while the CPU is executing
the VM entry microcode.
  


I see, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 03:49:03PM +0300, Avi Kivity wrote:
 Andi Kleen wrote:
 This assumption is incorrect.  This code is executed after preemption 
 has been enabled, and we may have even slept before reaching it.
 
 
 The only thing that counts here is the context before the machine
 check event. If there was a vmexit we know it was in guest context.
 
 The only requirement we have is that we're running still on the same
 CPU. I assume that's true, otherwise the vmcb accesses wouldn't work?
   
 
 It's not true, we're in preemptible context and may have even slept.
 
 vmcs access work because we have a preempt notifier called when we are 
 scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
 kvm_preempt_ops in virt/kvm_main.c.

I see. So we need to move that check earlier. Do you have a preference
where it should be?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH RFC] Do not use cpu_index in interface between libkvm and qemu

2009-06-04 Thread Avi Kivity

Gleb Natapov wrote:

On vcpu creation cookie is returned which is used in future communication.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] revert part of 3db8b916e merge

2009-06-04 Thread Avi Kivity

Gleb Natapov wrote:

kvm_*_mpstate() cannot be called from kvm_arch_*_registers()
since kvm_arch_*_registers() sometimes called from io thread, but
kvm_*_mpstate() can be called only by cpu thread.
  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH] qemu-kvm: Flush icache after dma operations for ia64

2009-06-04 Thread Jes Sorensen

Zhang, Xiantao wrote:

Hi, Jes
Have you verified whether it works for you ?  You may run kernel build in 
the guest with 4 vcpus,  if it can be done successfully without any error, it 
should be Okay I think, otherwise, we may need to investigate it further. :)
Xiantao 


Hi Xiantao,

I was able to run a 16 vCPU guest and build the kernel using make -j 16.
How quickly would the problem show up for you, on every run, or should I
run more tests?

Cheers,
Jes


Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
vmcs access work because we have a preempt notifier called when we are 
scheduled in, and will execute vmclear/vmptrld as necessary.  Look at 
kvm_preempt_ops in virt/kvm_main.c.



I see. So we need to move that check earlier. Do you have a preference
where it should be?
  


There's no good place as it breaks the nice exit handler table.  You 
could put it in vmx_complete_interrupts() next to NMI handling.


--
error compiling committee.c: too many arguments to function



Re: [KVM PATCH v4 3/3] kvm: add iosignalfd support

2009-06-04 Thread Mark McLoughlin
Hi Greg,

On Wed, 2009-06-03 at 18:04 -0400, Gregory Haskins wrote:
 Hi Mark,
   So with the v5 release of iosignalfd, we now have the notion of a
 trigger, the API of which is as follows:
 
 ---
 /*!
  * \brief Assign an eventfd to an IO port (PIO or MMIO)
  *
  * Assigns an eventfd based file-descriptor to a specific PIO or MMIO
  * address range.  Any guest writes to the specified range will generate
  * an eventfd signal.
  *
  * A data-match pointer can be optionally provided in trigger and only
  * writes which match this value exactly will generate an event.  The length
  * of the trigger is established by the length of the overall IO range, and
  * therefore must be in a natural byte-width for the IO routines of your
  * particular architecture (e.g. 1, 2, 4, or 8 bytes on x86_64).

This looks like it'll work fine for virtio-pci.

Thanks,
Mark.
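For a concrete sense of the data-match semantics being agreed to here, a toy userspace model of the range check plus optional trigger follows (purely illustrative; the struct and function names are invented and do not reflect the actual KVM ioctl interface):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an iosignalfd trigger: a guest write into [addr, addr+len)
 * signals the eventfd, unless a data-match value is armed and the write
 * does not match it exactly at the registered width. */
struct toy_iosignalfd {
    uint64_t addr, len;     /* registered PIO/MMIO range */
    int has_trigger;        /* is a data-match value armed? */
    uint64_t trigger;       /* value that must match exactly */
    int signals;            /* stand-in for eventfd_signal() count */
};

static void toy_guest_write(struct toy_iosignalfd *io,
                            uint64_t addr, uint64_t val, uint64_t len)
{
    if (addr < io->addr || addr + len > io->addr + io->len)
        return;                               /* outside the range: not ours */
    if (io->has_trigger && (len != io->len || val != io->trigger))
        return;                               /* trigger armed but no match */
    io->signals++;                            /* would signal the eventfd */
}
```

This mirrors why the trigger length is tied to the length of the overall IO range: the comparison is only meaningful at one natural width.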



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Avi Kivity

Andi Kleen wrote:
There's no good place as it breaks the nice exit handler table.  You 
could put it in vmx_complete_interrupts() next to NMI handling.



I think I came up with an easy, cheesy, but not too bad solution now that should 
work. It simply remembers the CPU in the vcpu structure and schedules back to 
it. That's fine for this purpose. 
  


We might not be able to schedule back in a timely manner.  Why not hack 
vmx_complete_interrupts()?  You're still in the critical section so 
you're guaranteed no delays or surprises.


--
error compiling committee.c: too many arguments to function



Re: [PATCH] [2/2] KVM: Add VT-x machine check support

2009-06-04 Thread Andi Kleen
On Thu, Jun 04, 2009 at 04:49:50PM +0300, Avi Kivity wrote:
 Andi Kleen wrote:
 There's no good place as it breaks the nice exit handler table.  You 
 could put it in vmx_complete_interrupts() next to NMI handling.
 
 
 I think I came up with a easy cheesy but not too bad solution now that 
 should work. It simply remembers the CPU in the vcpu structure and 
 schedules back to it. That's fine for this purpose. 
   
 
 We might not be able to schedule back in a timely manner.  Why not hack 
 vmx_complete_interrupts()?  You're still in the critical section so 
 you're guaranteed no delays or surprises.

Yes, have to do that. My original scheme was too risky because
the Machine checks have synchronization mechanisms now and 
preemption has no time limit.

I'll hack on it later today, hope fully have a patch tomorrow.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [KVM PATCH v2 0/2] irqfd: use POLLHUP notification for close()

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

(Applies to kvm.git/master:25deed73)

Please see the header for 2/2 for a description.  This patch series has been
fully tested and appears to be working correctly.

  


Applied, thanks.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Gleb Natapov
On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
 Same thing,
 
 addressing comments from gleb.
 
 
Jan, can you run your test on this one? It differs from previous one in
halt handling.

--
Gleb.


[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast

2009-06-04 Thread SourceForge.net
Bugs item #2801212, was opened at 2009-06-04 08:17
Message generated for change (Tracker Item Submitted) made by jiajun
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: sles10sp2 guest timer run too fast 

Initial Comment:
With kvm.git Commit:7ff90748cebbfbafc8cfa6bdd633113cd9537789
qemu-kvm Commit:a1cd3c985c848dae73966f9601f15fbcade72f1, we found that 
sles10sp2 will run much faster than real, about 27s faster each after 60s real 
time.

Reproduce steps:

(1)qemu-system-x86_64  -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 
-net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img
(2)Run ntpdate in guest: ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip

Current result:

sles10sp2rc1-guest:~ #  ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip
31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset
-61.27418
31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset
-27.626469 sec
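The two offsets above imply a drift rate: the first ntpdate steps the clock, the guest then sleeps 60 s of real time, and the second query finds ~27.6 s of accumulated error. A quick sanity check of that arithmetic (plain C; numbers taken from the report):

```c
#include <assert.h>
#include <math.h>

/* Drift rate implied by an ntpdate offset accumulated over a known
 * real-time interval: seconds gained or lost per second of wall clock. */
static double drift_rate(double offset_after_step_s, double interval_s)
{
    return fabs(offset_after_step_s) / interval_s;
}
```

27.626469 s over a 60 s sleep gives roughly 0.46 s of error per second, i.e. the guest clock runs almost 50% fast, consistent with the reported ~27 s gain per minute.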

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2801212group_id=180599


NV-CUDA: a new way in virtualization is possible?

2009-06-04 Thread OneSoul

Hello all!

I'm a KVM/Qemu user for a long time and I'm very satisfied by its 
features of flexibility, power and portability - really a good project!


Recently, reading some technical articles over internet, I have 
discovered the big potential of the NV-CUDA framework for scientific and 
graphic computing, which takes strong advantage of the most recent GPUs. 
Someone has used it for password recovery, realtime rendering, etc, with 
great results.


Would it be possible to use this technology in the KVM/Qemu project to 
achieve better performance?
Could it be a significant step forward for virtualization 
technology?


Someone has experimentally (re)written the md-raid kernel modules 
using the CUDA framework to accelerate some features... and it seems 
that it works fine.
Why not for KVM/Qemu or related projects, including kernel/user-space 
extensions?


What do you think about this draft idea?

Any feedback is welcome...


Re: NV-CUDA: a new way in virtualization is possible?

2009-06-04 Thread Pantelis Koukousoulas
 It would be possible to use this technology in the KVM/Qemu project to
 achieve better performance?
 It could be a significative step for the develop in virtualization
 technology?

Nothing is impossible, but it is at least not obvious how to pull
off such a trick.
Qemu/KVM is not embarrassingly parallelizable, at least not in a
straightforward
way imho.

 Someone, in experimental way, has (re)wrote the md-raid kernel modules using
 the CUDA framework to accelerate some features... and it seems that works
 fine.
 Why not for KVM/Qemu or related projects, including kernel/user-space
 extension?

RAID is easy, as is FFT, graphics operations, cryptography etc. People
have been parallelizing these algorithms for several years before even nvidia
existed and CUDA is just a new backend to apply more or less the same
techniques.

KVM/Qemu on the other hand are not 100% CPU bound and are also not
trivial to massively parallelize, so you might find the task a bit hard.

HTH,
Pantelis


TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
As I'm new to qemu/kvm, to figure out how networking performance can be 
improved, I
went over the code and took some notes.  As I did this, I tried to record ideas
from recent discussions and ideas that came up on improving performance. Thus
this list.

This includes a partial overview of networking code in a virtual environment, 
with
focus on performance: I'm only interested in sending and receiving packets,
ignoring configuration etc.

I have likely missed a ton of clever ideas and older discussions, and probably
misunderstood some code. Please pipe up with corrections, additions, etc. And
please don't take offence if I didn't attribute the idea correctly - most of
them are marked mst but I don't claim they are original. Just let me know.

And there are a couple of trivial questions on the code - I'll
add answers here as they become available.

I put up a copy at http://www.linux-kvm.org/page/Networking_Performance as
well, and intend to dump updates there from time to time.

Thanks,
MST

---

There are many ways to set up networking in a virtual machine.
Here's one: linux guest -> virtio-net -> virtio-pci -> qemu+kvm -> tap -> 
bridge.
Let's take a look at this one.

Virtio is the guest side of things.

Guest kernel virtio-net:

TX:
- Guest kernel allocates a packet (skb) in guest kernel memory
  and fills it in with data, passes it to networking stack.
- The skb is passed on to guest network driver
  (hard_start_xmit)
- skbs in flight are kept in send queue linked list,
  so that we can flush them when device is removed
  [ mst: optimization idea: virtqueue already tracks
posted buffers. Add flush/purge operation and use that instead? ]
- skb is reformatted to scattergather format
  [ mst: idea to try: this does a copy for skb head,
which might be costly especially for small/linear packets.
Try to avoid this? Might need to tweak virtio interface.
  ]
- network driver adds the packet buffer on TX ring
- network driver does a kick which causes a VM exit
  [ mst: any way to mitigate # of VM exits here?
Possibly could be done on host side as well. ]
  [ markmc: All of our efforts there have been on the host side, I think
that's preferable than trying to do anything on the guest side. ]

- Full queue:
we keep a single extra skb around:
if we fail to transmit, we queue it
[ mst: idea to try: what does it do to
  performance if we queue more packets? ]
if we already have 1 outstanding packet,
we stop the queue and discard the new packet
[ mst: optimization idea: might be better to discard the old
  packet and queue the new one, e.g. with TCP old one
  might have timed out already ]
[ markmc: the queue might soon be going away:
   200905292346.04815.ru...@rustcorp.com.au
   
   http://archive.netbsd.se/?ml=linux-netdev&a=2009-05&m=10788575
]

- We get each buffer from host as it is completed and free it
- TX interrupts are only enabled when queue is stopped,
  and when it is originally created (we disable them on completion)
  [ mst: idea: second part is probably unintentional.
todo: we probably should disable interrupts when device is created. 
]
- We poll for buffer completions:
  1. Before each TX 2. On a timer tasklet (unless 3 is supported)
  3. When host sends us interrupt telling us that the queue is empty
  [ mst: idea to try: instead of empty, enable send interrupts on xmit 
when
buffer is almost full (e.g. at least half empty): we are running 
out of
buffers, it's important to free them ASAP. Can be done
from host or from guest. ]
  [ Rusty proposing that we don't need (2) or (3) if the skbs are 
orphaned
before start_xmit(). See subj net: skb_orphan on 
dev_hard_start_xmit.]
  [ rusty also seems to be suggesting that disabling 
VIRTIO_F_NOTIFY_ON_EMPTY
on the host should help the case where the host out-paces the guest
  ]
  4. when queue is stopped or when first packet was sent after device
 was created (interrupts are enabled then)
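On the exit-mitigation question above ([mst: any way to mitigate # of VM exits here?]), the usual virtio approach is that a kick is skipped while the host has suppressed notifications, so a whole batch of posted packets shares at most one VM exit. A toy model of that accounting (illustrative only; the field and function names are invented):

```c
#include <assert.h>

/* Toy model of TX kick mitigation: the guest posts descriptors and "kicks"
 * (one VM exit) only while the host has notifications enabled. */
struct toy_txring {
    int pending;           /* descriptors posted, not yet consumed by host */
    int notify_enabled;    /* host toggles this to throttle exits */
    int kicks;             /* stand-in for the number of VM exits */
};

static void toy_xmit_batch(struct toy_txring *r, int npkts)
{
    r->pending += npkts;          /* post the whole batch first */
    if (r->notify_enabled)
        r->kicks++;               /* at most one exit per batch */
}
```

When the host is actively draining the ring it can disable notifications, and subsequent batches cost no exits at all; that is the effect the host-side mitigation work aims for.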


RX:
- There are really 2 mostly separate code paths: with mergeable
  rx buffers support in host and without. I focus on mergeable
  buffers here since this is the default in recent qemu.
  [mst: optimization idea: mark mergeable_rx_bufs as likely() then?]
- Each skb has a 128 byte buffer at head and a single page for data.
  Only full pages are passed to virtio buffers.
  [ mst: for large packets, managing the 128 head buffers is wasted
   

Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 As I'm new to qemu/kvm, to figure out how networking performance can be 
 improved, I
 went over the code and took some notes.  As I did this, I tried to record 
 ideas
 from recent discussions and ideas that came up on improving performance. Thus
 this list.

 This includes a partial overview of networking code in a virtual environment, 
 with
 focus on performance: I'm only interested in sending and receiving packets,
 ignoring configuration etc.

 I have likely missed a ton of clever ideas and older discussions, and probably
 misunderstood some code. Please pipe up with corrections, additions, etc. And
 please don't take offence if I didn't attribute the idea correctly - most of
 them are marked mst by I don't claim they are original. Just let me know.

 And there are a couple of trivial questions on the code - I'll
 add answers here as they become available.

 I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as
 well, and intend to dump updates there from time to time.
   

Hi Michael,
  Not sure if you have seen this, but I've already started to work on
the code for in-kernel devices and have a (currently non-virtio based)
proof-of-concept network device which you can use for comparative data.  You
can find details here:

http://lkml.org/lkml/2009/4/21/408

snip

(Will look at your list later, to see if I can add anything)
 ---

 Short term plans: I plan to start out with trying out the following ideas:

 save a copy in qemu on RX side in case of a single nic in vlan
 implement virtio-host kernel module

 *detail on virtio-host-net kernel module project*

 virtio-host-net is a simple character device which gets memory layout 
 information
 from qemu, and uses this to convert between virtio descriptors to skbs.
 The skbs are then passed to/from raw socket (or we could bind virtio-host
 to physical device like raw socket does TBD).

 Interrupts will be reported to eventfd descriptors, and device will poll
 eventfd descriptors to get kicks from guest.

   

I currently have a virtio transport for vbus implemented, but it still
needs a virtio-net device-model backend written.  If you are interested,
we can work on this together to implement your idea.  Its on my todo
list for vbus anyway, but I am currently distracted with the
irqfd/iosignalfd projects which are prereqs for vbus to be considered
for merge.

Basically vbus is a framework for declaring in-kernel devices (not kvm
specific, per se) with a full security/containment model, a
hot-pluggable configuration engine, and a dynamically loadable 
device-model.  The framework takes care of the details of signal-path
and memory routing for you so that something like a virtio-net model can
be implemented once and work in a variety of environments such as kvm,
lguest, etc.

Interested?
-Greg





Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote:
 Michael S. Tsirkin wrote:
  As I'm new to qemu/kvm, to figure out how networking performance can be 
  improved, I
  went over the code and took some notes.  As I did this, I tried to record 
  ideas
  from recent discussions and ideas that came up on improving performance. 
  Thus
  this list.
 
  This includes a partial overview of networking code in a virtual 
  environment, with
  focus on performance: I'm only interested in sending and receiving packets,
  ignoring configuration etc.
 
  I have likely missed a ton of clever ideas and older discussions, and 
  probably
  misunderstood some code. Please pipe up with corrections, additions, etc. 
  And
  please don't take offence if I didn't attribute the idea correctly - most of
  them are marked mst by I don't claim they are original. Just let me know.
 
  And there are a couple of trivial questions on the code - I'll
  add answers here as they become available.
 
  I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as
  well, and intend to dump updates there from time to time.

 
 Hi Michael,
   Not sure if you have seen this, but I've already started to work on
 the code for in-kernel devices and have a (currently non-virtio based)
 proof-of-concept network device which you can for comparative data.  You
 can find details here:
 
 http://lkml.org/lkml/2009/4/21/408
 
 snip

Thanks

 (Will look at your list later, to see if I can add anything)
  ---
 
  Short term plans: I plan to start out with trying out the following ideas:
 
  save a copy in qemu on RX side in case of a single nic in vlan
  implement virtio-host kernel module
 
  *detail on virtio-host-net kernel module project*
 
  virtio-host-net is a simple character device which gets memory layout 
  information
  from qemu, and uses this to convert between virtio descriptors to skbs.
  The skbs are then passed to/from raw socket (or we could bind virtio-host
  to physical device like raw socket does TBD).
 
  Interrupts will be reported to eventfd descriptors, and device will poll
  eventfd descriptors to get kicks from guest.
 

 
 I currently have a virtio transport for vbus implemented, but it still
 needs a virtio-net device-model backend written.

You mean virtio-ring implementation?
I intended to basically start by reusing the code from
Documentation/lguest/lguest.c
Isn't this all there is to it?

  If you are interested,
 we can work on this together to implement your idea.  Its on my todo
 list for vbus anyway, but I am currently distracted with the
 irqfd/iosignalfd projects which are prereqs for vbus to be considered
 for merge.
 
 Basically vbus is a framework for declaring in-kernel devices (not kvm
 specific, per se) with a full security/containment model, a
 hot-pluggable configuration engine, and a dynamically loadable 
 device-model.  The framework takes care of the details of signal-path
 and memory routing for you so that something like a virtio-net model can
 be implemented once and work in a variety of environments such as kvm,
 lguest, etc.
 
 Interested?
 -Greg
 

It seems that a character device with a couple of ioctls would be simpler
for an initial prototype.

-- 
MST


Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Michael S. Tsirkin
On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote:
 +static unsigned long
 +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src,
 +                    unsigned long n)
 +{
 +      struct task_memctx *tm = to_task_memctx(ctx);
 +      struct task_struct *p = tm->task;
 +
 +      while (n) {
 +              unsigned long offset = ((unsigned long)dst)%PAGE_SIZE;
 +              unsigned long len = PAGE_SIZE - offset;
 +              int ret;
 +              struct page *pg;
 +              void *maddr;
 +
 +              if (len > n)
 +                      len = n;
 +
 +              down_read(&p->mm->mmap_sem);
 +              ret = get_user_pages(p, p->mm,
 +                                   (unsigned long)dst, 1, 1, 0, &pg, NULL);
 +
 +              if (ret != 1) {
 +                      up_read(&p->mm->mmap_sem);
 +                      break;
 +              }
 +
 +              maddr = kmap_atomic(pg, KM_USER0);
 +              memcpy(maddr + offset, src, len);
 +              kunmap_atomic(maddr, KM_USER0);
 +              set_page_dirty_lock(pg);
 +              put_page(pg);
 +              up_read(&p->mm->mmap_sem);
 +
 +              src += len;
 +              dst += len;
 +              n -= len;
 +      }
 +
 +      return n;
 +}

BTW, why did you decide to use get_user_pages?
Would switch_mm + copy_to_user work as well
avoiding page walk if all pages are present?

Also - if we just had vmexit because a process executed
io (or hypercall), can't we just do copy_to_user there?
Avi, I think at some point you said that we can?
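Setting the pinning question aside, the quoted loop is a page-granular copy: each pass advances at most to the next page boundary of dst. Its chunking logic in isolation, as a compilable userspace rendering without get_user_pages/kmap (names here are invented for the sketch):

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096UL

/* Page-at-a-time copy mirroring the loop in task_memctx_copy_to(): each
 * iteration copies up to the next TOY_PAGE_SIZE boundary of dst, then
 * advances.  Returns the number of bytes left uncopied (0 on success). */
static unsigned long toy_chunked_copy(void *dst, const void *src,
                                      unsigned long n)
{
    while (n) {
        unsigned long offset = (unsigned long)dst % TOY_PAGE_SIZE;
        unsigned long len = TOY_PAGE_SIZE - offset;

        if (len > n)
            len = n;
        memcpy(dst, src, len);   /* the kernel version kmaps the pinned page */
        src = (const char *)src + len;
        dst = (char *)dst + len;
        n -= len;
    }
    return n;
}
```

The page-sized steps exist because each destination page must be looked up (or faulted in) individually; a copy_to_user in the right mm would let the hardware walk do that instead, which is what the question above is driving at.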


-- 
MST


Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Michael S. Tsirkin
On Thu, Jun 04, 2009 at 01:50:20PM -0400, Gregory Haskins wrote:
 Suit yourself, but I suspect that by the time you build the prototype
 you will either end up re-solving all the same problems anyway, or have
 diminished functionality (or both).

/me goes to look at vbus patches.


-- 
MST


Re: TODO list for qemu+KVM networking performance v2

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 On Thu, Jun 04, 2009 at 01:16:05PM -0400, Gregory Haskins wrote:
   
 Michael S. Tsirkin wrote:
 
 As I'm new to qemu/kvm, to figure out how networking performance can be 
 improved, I
 went over the code and took some notes.  As I did this, I tried to record 
 ideas
 from recent discussions and ideas that came up on improving performance. 
 Thus
 this list.

 This includes a partial overview of networking code in a virtual 
 environment, with
 focus on performance: I'm only interested in sending and receiving packets,
 ignoring configuration etc.

 I have likely missed a ton of clever ideas and older discussions, and 
 probably
 misunderstood some code. Please pipe up with corrections, additions, etc. 
 And
 please don't take offence if I didn't attribute the idea correctly - most of
 them are marked mst by I don't claim they are original. Just let me know.

 And there are a couple of trivial questions on the code - I'll
 add answers here as they become available.

 I out up a copy at http://www.linux-kvm.org/page/Networking_Performance as
 well, and intend to dump updates there from time to time.
   
   
 Hi Michael,
   Not sure if you have seen this, but I've already started to work on
 the code for in-kernel devices and have a (currently non-virtio based)
 proof-of-concept network device which you can for comparative data.  You
 can find details here:

 http://lkml.org/lkml/2009/4/21/408

 snip
 

 Thanks

   
 (Will look at your list later, to see if I can add anything)
 
 ---

 Short term plans: I plan to start out with trying out the following ideas:

 save a copy in qemu on RX side in case of a single nic in vlan
 implement virtio-host kernel module

 *detail on virtio-host-net kernel module project*

 virtio-host-net is a simple character device which gets memory layout 
 information
 from qemu, and uses this to convert between virtio descriptors to skbs.
 The skbs are then passed to/from raw socket (or we could bind virtio-host
 to physical device like raw socket does TBD).

 Interrupts will be reported to eventfd descriptors, and device will poll
 eventfd descriptors to get kicks from guest.

   
   
 I currently have a virtio transport for vbus implemented, but it still
 needs a virtio-net device-model backend written.
 

 You mean virtio-ring implementation?
   

Right.

 I intended to basically start by reusing the code from
 Documentation/lguest/lguest.c
 Isn't this all there is to it?
   

Not sure.  I reused the ring code already in the kernel.

   
  If you are interested,
 we can work on this together to implement your idea.  Its on my todo
 list for vbus anyway, but I am currently distracted with the
 irqfd/iosignalfd projects which are prereqs for vbus to be considered
 for merge.

 Basically vbus is a framework for declaring in-kernel devices (not kvm
 specific, per se) with a full security/containment model, a
 hot-pluggable configuration engine, and a dynamically loadable 
 device-model.  The framework takes care of the details of signal-path
 and memory routing for you so that something like a virtio-net model can
 be implemented once and work in a variety of environments such as kvm,
 lguest, etc.

 Interested?
 -Greg

 

 It seems that a character device with a couple of ioctls would be simpler
 for an initial prototype.
   

Suit yourself, but I suspect that by the time you build the prototype
you will either end up re-solving all the same problems anyway, or have
diminished functionality (or both).  Its actually very simple to declare
a new virtio-vbus device, but the choice is yours.  I can crank out a
skeleton for you, if you like.

-Greg






[patch 1/4] KVM: x86: grab pic lock in kvm_pic_clear_isr_ack

2009-06-04 Thread Marcelo Tosatti
isr_ack is protected by kvm_pic-lock.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/i8259.c
===
--- kvm.orig/arch/x86/kvm/i8259.c
+++ kvm/arch/x86/kvm/i8259.c
@@ -72,8 +72,10 @@ static void pic_clear_isr(struct kvm_kpi
 void kvm_pic_clear_isr_ack(struct kvm *kvm)
 {
struct kvm_pic *s = pic_irqchip(kvm);
+   pic_lock(s);
s->pics[0].isr_ack = 0xff;
s->pics[1].isr_ack = 0xff;
+   pic_unlock(s);
 }
 
 /*

-- 



[patch 2/4] KVM: move coalesced_mmio locking to its own device

2009-06-04 Thread Marcelo Tosatti
Move coalesced_mmio locking to its own device, instead of relying on
kvm-lock.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/virt/kvm/coalesced_mmio.c
===
--- kvm.orig/virt/kvm/coalesced_mmio.c
+++ kvm/virt/kvm/coalesced_mmio.c
@@ -31,10 +31,6 @@ static int coalesced_mmio_in_range(struc
if (!is_write)
return 0;
 
-   /* kvm-lock is taken by the caller and must be not released before
- * dev.read/write
- */
-
/* Are we able to batch it ? */
 
/* last is the first free entry
@@ -70,7 +66,7 @@ static void coalesced_mmio_write(struct 
struct kvm_coalesced_mmio_dev *dev = to_mmio(this);
struct kvm_coalesced_mmio_ring *ring = dev-kvm-coalesced_mmio_ring;
 
-   /* kvm-lock must be taken by caller before call to in_range()*/
+   spin_lock(dev-lock);
 
/* copy data in first free entry of the ring */
 
@@ -79,6 +75,7 @@ static void coalesced_mmio_write(struct 
memcpy(ring-coalesced_mmio[ring-last].data, val, len);
smp_wmb();
ring-last = (ring-last + 1) % KVM_COALESCED_MMIO_MAX;
+   spin_unlock(dev-lock);
 }
 
 static void coalesced_mmio_destructor(struct kvm_io_device *this)
@@ -101,6 +98,7 @@ int kvm_coalesced_mmio_init(struct kvm *
dev = kzalloc(sizeof(struct kvm_coalesced_mmio_dev), GFP_KERNEL);
if (!dev)
return -ENOMEM;
+   spin_lock_init(&dev->lock);
	kvm_iodevice_init(&dev->dev, &coalesced_mmio_ops);
	dev->kvm = kvm;
	kvm->coalesced_mmio_dev = dev;
Index: kvm/virt/kvm/coalesced_mmio.h
===
--- kvm.orig/virt/kvm/coalesced_mmio.h
+++ kvm/virt/kvm/coalesced_mmio.h
@@ -12,6 +12,7 @@
 struct kvm_coalesced_mmio_dev {
struct kvm_io_device dev;
struct kvm *kvm;
+   spinlock_t lock;
int nb_zones;
struct kvm_coalesced_mmio_zone zone[KVM_COALESCED_MMIO_ZONE_MAX];
 };

-- 



[patch 4/4] KVM: switch irq injection/acking data structures to irq_lock

2009-06-04 Thread Marcelo Tosatti
Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                                CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()              -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()              mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

Reported-by: Alex Williamson alex.william...@hp.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/i8254.c
===
--- kvm.orig/arch/x86/kvm/i8254.c
+++ kvm/arch/x86/kvm/i8254.c
@@ -651,10 +651,10 @@ static void __inject_pit_timer_intr(stru
struct kvm_vcpu *vcpu;
int i;
 
-   mutex_lock(&kvm->lock);
+   mutex_lock(&kvm->irq_lock);
	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
	kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
-   mutex_unlock(&kvm->lock);
+   mutex_unlock(&kvm->irq_lock);
 
/*
 * Provides NMI watchdog support via Virtual Wire mode.
Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2099,10 +2099,10 @@ long kvm_arch_vm_ioctl(struct file *filp
goto out;
if (irqchip_in_kernel(kvm)) {
__s32 status;
-   mutex_lock(&kvm->lock);
+   mutex_lock(&kvm->irq_lock);
status = kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID,
irq_event.irq, irq_event.level);
-   mutex_unlock(&kvm->lock);
+   mutex_unlock(&kvm->irq_lock);
if (ioctl == KVM_IRQ_LINE_STATUS) {
irq_event.status = status;
if (copy_to_user(argp, &irq_event,
@@ -2348,12 +2348,11 @@ mmio:
 */
mutex_lock(&vcpu->kvm->lock);
mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 0);
+   mutex_unlock(&vcpu->kvm->lock);
if (mmio_dev) {
kvm_iodevice_read(mmio_dev, gpa, bytes, val);
-   mutex_unlock(&vcpu->kvm->lock);
return X86EMUL_CONTINUE;
}
-   mutex_unlock(&vcpu->kvm->lock);

vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2403,12 +2402,11 @@ mmio:
 */
mutex_lock(&vcpu->kvm->lock);
mmio_dev = vcpu_find_mmio_dev(vcpu, gpa, bytes, 1);
+   mutex_unlock(&vcpu->kvm->lock);
if (mmio_dev) {
kvm_iodevice_write(mmio_dev, gpa, bytes, val);
-   mutex_unlock(&vcpu->kvm->lock);
return X86EMUL_CONTINUE;
}
-   mutex_unlock(&vcpu->kvm->lock);

vcpu->mmio_needed = 1;
vcpu->mmio_phys_addr = gpa;
@@ -2731,7 +2729,6 @@ static void kernel_pio(struct kvm_io_dev
 {
/* TODO: String I/O for in kernel device */
 
-   mutex_lock(&vcpu->kvm->lock);
if (vcpu->arch.pio.in)
kvm_iodevice_read(pio_dev, vcpu->arch.pio.port,
  vcpu->arch.pio.size,
@@ -2740,7 +2737,6 @@ static void kernel_pio(struct kvm_io_dev
kvm_iodevice_write(pio_dev, vcpu->arch.pio.port,
   vcpu->arch.pio.size,
   pd);
-   mutex_unlock(&vcpu->kvm->lock);
 }
 
 static void pio_string_write(struct kvm_io_device *pio_dev,
@@ -2750,14 +2746,12 @@ static void pio_string_write(struct kvm_
void *pd = vcpu->arch.pio_data;
int i;

-   mutex_lock(&vcpu->kvm->lock);
for (i = 0; i < io->cur_count; i++) {
kvm_iodevice_write(pio_dev, io->port,
   io->size,
   pd);
pd += io->size;
}
-   mutex_unlock(&vcpu->kvm->lock);
 }
 
 static struct kvm_io_device *vcpu_find_pio_dev(struct kvm_vcpu *vcpu,
@@ -2794,7 +2788,9 @@ int kvm_emulate_pio(struct kvm_vcpu *vcp
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);

+   mutex_lock(&vcpu->kvm->lock);
pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
+   mutex_unlock(&vcpu->kvm->lock);
if (pio_dev) {
kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
complete_pio(vcpu);
@@ -2858,9 +2854,12 @@ int kvm_emulate_pio_string(struct kvm_vc

vcpu->arch.pio.guest_gva = address;

+   mutex_lock(&vcpu->kvm->lock);
pio_dev = vcpu_find_pio_dev(vcpu, port,
vcpu->arch.pio.cur_count,
!vcpu->arch.pio.in);
+   mutex_unlock(&vcpu->kvm->lock);
+
if (!vcpu->arch.pio.in) {
/* string PIO write */
ret = pio_copy_data(vcpu);
Index: kvm/virt/kvm/kvm_main.c

Re: [patch] VMX Unrestricted mode support

2009-06-04 Thread Jan Kiszka
Nitin A Kamble wrote:
 Hi Avi,
   I find that the qemu processor reset state is not per the IA32
 processor specifications. (Sections 8.1.1 of
 http://www.intel.com/Assets/PDF/manual/253668.pdf)
 
 In qemu-kvm.git in file target-i386/helper.c in function cpu_reset the
 segment registers are initialized as follows:
 
 cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0xffff0000, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
  DESC_R_MASK);
 cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0xffff,
DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 
 While the IA32 cpu reset state specification says that Segment Accessed
 bit is also 1 at the time of cpu reset. so the above code should look
 like this:
 
 cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0xffff0000, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
  DESC_R_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0xffff,
  DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
 
 This discrepancy creates the need for the following function in the
 unrestricted guest patch.

As Avi already indicated: Independent of the kvm workaround for older
qemu versions, please post (to qemu-devel) a patch against upstream's
git to fix the discrepancy.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux


Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Gregory Haskins
Michael S. Tsirkin wrote:
 On Thu, Apr 09, 2009 at 12:30:57PM -0400, Gregory Haskins wrote:
   
 +static unsigned long
 +task_memctx_copy_to(struct vbus_memctx *ctx, void *dst, const void *src,
 +unsigned long n)
 +{
 +struct task_memctx *tm = to_task_memctx(ctx);
 +struct task_struct *p = tm->task;
 +
 +while (n) {
 +unsigned long offset = ((unsigned long)dst)%PAGE_SIZE;
 +unsigned long len = PAGE_SIZE - offset;
 +int ret;
 +struct page *pg;
 +void *maddr;
 +
 +if (len > n)
 +len = n;
 +
 +down_read(&p->mm->mmap_sem);
 +ret = get_user_pages(p, p->mm,
 + (unsigned long)dst, 1, 1, 0, &pg, NULL);
 +
 +if (ret != 1) {
 +up_read(&p->mm->mmap_sem);
 +break;
 +}
 +
 +maddr = kmap_atomic(pg, KM_USER0);
 +memcpy(maddr + offset, src, len);
 +kunmap_atomic(maddr, KM_USER0);
 +set_page_dirty_lock(pg);
 +put_page(pg);
 +up_read(&p->mm->mmap_sem);
 +
 +src += len;
 +dst += len;
 +n -= len;
 +}
 +
 +return n;
 +}
 

 BTW, why did you decide to use get_user_pages?
 Would switch_mm + copy_to_user work as well
 avoiding page walk if all pages are present?
   

Well, basic c_t_u() won't work because it's likely not current if you
are updating the ring from some other task, but I think you have already
figured that out based on the switch_mm suggestion.  The simple truth is
I was not familiar with switch_mm at the time I wrote this (nor am I
now).  If this is a superior method that allows you to acquire
c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
will look into this, and thanks for the suggestion!

 Also - if we just had vmexit because a process executed
 io (or hypercall), can't we just do copy_to_user there?
 Avi, I think at some point you said that we can?
   

Right, and yes that will work I believe.  We could always do a if (p ==
current) check to test for this.  To date, I don't typically do
anything mem-ops related directly in vcpu context so this wasn't an
issue...but that doesn't mean someone won't try in the future.  
Therefore, I agree we should strive to optimize it if we can.

   

Thanks Michael,
-Greg





Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Michael S. Tsirkin wrote:

Also - if we just had vmexit because a process executed
io (or hypercall), can't we just do copy_to_user there?
Avi, I think at some point you said that we can?
  


You can do copy_to_user() wherever it is legal in Linux.  Almost all of 
kvm runs in process context, preemptible, and with interrupts enabled.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

   


BTW, why did you decide to use get_user_pages?
Would switch_mm + copy_to_user work as well
avoiding page walk if all pages are present?
  



Well, basic c_t_u() won't work because its likely not current if you
are updating the ring from some other task, but I think you have already
figured that out based on the switch_mm suggestion.  The simple truth is
I was not familiar with switch_mm at the time I wrote this (nor am I
now).  If this is a superior method that allows you to acquire
c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
will look into this, and thanks for the suggestion!
  


copy_to_user() is significantly faster than get_user_pages() + kmap() + 
memcpy() (or their variants).


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:

   
 BTW, why did you decide to use get_user_pages?
 Would switch_mm + copy_to_user work as well
 avoiding page walk if all pages are present?
   

 Well, basic c_t_u() won't work because its likely not current if you
 are updating the ring from some other task, but I think you have already
 figured that out based on the switch_mm suggestion.  The simple truth is
 I was not familiar with switch_mm at the time I wrote this (nor am I
 now).  If this is a superior method that allows you to acquire
 c_t_u(some_other_ctx) like behavior, I see no problem in changing.  I
 will look into this, and thanks for the suggestion!
   

 copy_to_user() is significantly faster than get_user_pages() + kmap()
 + memcmp() (or their variants).


Oh, I don't doubt that (in fact, I was pretty sure that was the case
based on some of the optimizations I could see in studying the c_t_u()
path).  I just didn't realize there were other ways to do it if it's a
non-current task. ;)

I guess the enigma for me right now is what cost does switch_mm have? 
(That's not a slam against the suggested approach...I really do not know
and am curious).

As an aside, note that we seem to be reviewing v2, where v3 is really
the last set I pushed.  I think this patch is more or less the same
across both iterations, but FYI that I would recommend looking at v3
instead.

-Greg





Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Gregory Haskins
Avi Kivity wrote:
 Gregory Haskins wrote:
 Avi,

 Gregory Haskins wrote:
  
 Todo:
 *) Develop some kind of hypercall registration mechanism for KVM so
 that
we can use that as an integration point instead of directly hooking
kvm hypercalls
   

 What would you like to see here?  I now remember why I removed the
 original patch I had for registration...it requires some kind of
 discovery mechanism on its own.  Note that this is hard, but I figured
 it would make the overall series simpler if I didn't go this route and
 instead just integrated with a statically allocated vector.  That being
 said, I have no problem adding this back in but figure we should discuss
 the approach so I don't go down a rat-hole ;)

   


 One idea is similar to signalfd() or eventfd().  Provide a kvm ioctl
 that takes a gsi and returns an fd.  Writes to the fd change the state
 of the line, possible triggering an interrupt.  Another ioctl takes a
 hypercall number or pio port as well as an existing fd.  Invocations
 of the hypercall or writes to the port write to the fd (using the same
 protocol as eventfd), so the other end can respond.

 The nice thing is that this can be used by both kernel and userspace
 components, and for kernel components, hypercalls can be either
 buffered or unbuffered.


And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born. ;)

(Michael FYI: so I will be pushing a vbus-v4 series at some point in the
near future that is expressed in terms of irqfd/iosignalfd, per the
conversation above.  The patches in v3 and earlier are more intrusive to
the KVM core than they will be in final form)

-Greg






Re: [RFC PATCH v2 03/19] vbus: add connection-client helper infrastructure

2009-06-04 Thread Avi Kivity

Gregory Haskins wrote:

Oh, I don't doubt that (in fact, I was pretty sure that was the case
based on some of the optimizations I could see in studying the c_t_u()
path).  I just didn't realize there were other ways to do it if its a
non current task. ;)

I guess the enigma for me right now is what cost does switch_mm have? 
(Thats not a slam against the suggested approach...I really do not know

and am curious).
  


switch_mm() is probably very cheap (reloads cr3), but it does dirty the 
current cpu's tlb.  When the kernel needs to flush a process' tlb, it 
will have to IPI that cpu in addition to all others.  This takes place, 
for example, after munmap() or after a page is swapped out (though 
significant batching is done there).


It's still plenty cheaper in my estimation.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
 On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
  This is a pretty mechanical change. To make code look
  closer to upstream qemu, I'm renaming kvm_context_t to
  KVMState. Mid term goal here is to start sharing code
  wherever possible.
  
  Avi, please apply, or I'll send you a video of myself
  dancing naked.
  
 You can start recording it since I doubt this patch will apply cleanly
 to today's master (other mechanical change was applied). Regardless, I
 think trying to use bits of qemu kvm is dangerous. It has similar functions
 with the same names, but with different assumptions about the conditions they
 can be executed in (look at commit a5ddb119). I actually prefer to be
 different enough to not call an upstream qemu function by mistake.

I did it against today's master. If new patches came in, it's just
a matter of regenerating this, since it is, as I said, mechanical.

Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
are not included in the final object), there is no such risk.
Of course, I am aiming towards it, but the first step will be to change
the name of conflicting functions until we can pick qemu's implementation,
in which case the former will just go away.

If we are serious about merging qemu-kvm into qemu, I don't see a way out
of it. We should start changing things this way to accommodate it. Different
enough won't do.



Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Jan Kiszka
Gleb Natapov wrote:
 On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
 Same thing,

 addressing comments from gleb.


 Jan, can you run your test on this one? It differs from previous one in
 halt handling.

Still works for me.

Jan





Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
 On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
  On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
   This is a pretty mechanical change. To make code look
   closer to upstream qemu, I'm renaming kvm_context_t to
   KVMState. Mid term goal here is to start sharing code
   whereas possible.
   
   Avi, please apply, or I'll send you a video of myself
   dancing naked.
   
  You can start recording it since I doubt this patch will apply cleanly
  to today's master (other mechanical change was applied). Regardless, I
  think trying to use bits of qemu kvm is dangerous. It has similar function
  with same names, but with different assumptions about conditional they
  can be executed in (look at commit a5ddb119). I actually prefer to be
  different enough to not call upstream qemu function by mistake.
 
 I did it against today's master. If new patches came in, is just
 a matter of regenerating this, since it is, as I said, mechanical.
 
 Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
 are not included in the final object), there is no such risk.
 Of course, I am aiming towards it, but the first step will be to change
 the name of conflicting functions until we can pick qemu's implementation,
 in which case the former will just go away.
That is the point. We can't just pick qemu's implementation most of the
time.

 
 If we are serious about merging qemu-kvm into qemu, I don't see a way out
 of it. We should start changing things this way to accomodate it. Different
 enough won't do.
I don't really like the idea to morph working implementation to look like
non-working one. I do agree that qemu-kvm should be cleaned substantially
before going upstream. Upstream qemu kvm should go away then. I don't
see much work done to enhance it anyway.

--
Gleb.


Re: [PATCH 0/4] apic/ioapic kvm free implementation

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 09:46:23PM +0200, Jan Kiszka wrote:
 Gleb Natapov wrote:
  On Wed, Jun 03, 2009 at 05:19:26PM -0400, Glauber Costa wrote:
  Same thing,
 
  addressing comments from gleb.
 
 
  Jan, can you run your test on this one? It differs from previous one in
  halt handling.
 
 Still works for me.
 
Cool, thanks.

--
Gleb.


Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
 On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
  On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
   On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
This is a pretty mechanical change. To make code look
closer to upstream qemu, I'm renaming kvm_context_t to
KVMState. Mid term goal here is to start sharing code
whereas possible.

Avi, please apply, or I'll send you a video of myself
dancing naked.

   You can start recording it since I doubt this patch will apply cleanly
   to today's master (other mechanical change was applied). Regardless, I
   think trying to use bits of qemu kvm is dangerous. It has similar function
   with same names, but with different assumptions about conditional they
   can be executed in (look at commit a5ddb119). I actually prefer to be
   different enough to not call upstream qemu function by mistake.
  
  I did it against today's master. If new patches came in, is just
  a matter of regenerating this, since it is, as I said, mechanical.
  
  Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
  are not included in the final object), there is no such risk.
  Of course, I am aiming towards it, but the first step will be to change
  the name of conflicting functions until we can pick qemu's implementation,
  in which case the former will just go away.
 That is the point. We can't just pick qemu's implementation most of the
 times.
until we can pick up qemu's implementation potentially involves replacing
that particular piece with upstream version first.

 
  
  If we are serious about merging qemu-kvm into qemu, I don't see a way out
  of it. We should start changing things this way to accomodate it. Different
  enough won't do.
 I don't really like the idea to morph working implementation to look like
 non-working one. I do agree that qemu-kvm should be cleaned substantially
 before going upstream. Upstream qemu kvm should go away than. I don't
 see much work done to enhance it anyway.
 

this first phase has nothing to do with functionality. To begin with,
KVMState is qemu style, kvm_context_t is not, like it or not (I don't).

I don't plan to introduce regressions, you can rest assured. But we _do_
have to make things look much more qemuer, and that's what this patch
aims at.




Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Glauber Costa
On Thu, Jun 04, 2009 at 11:09:52PM +0300, Gleb Natapov wrote:
 On Thu, Jun 04, 2009 at 05:10:51PM -0300, Glauber Costa wrote:
  On Thu, Jun 04, 2009 at 11:00:46PM +0300, Gleb Natapov wrote:
   On Thu, Jun 04, 2009 at 04:33:19PM -0300, Glauber Costa wrote:
On Thu, Jun 04, 2009 at 10:23:29PM +0300, Gleb Natapov wrote:
 On Thu, Jun 04, 2009 at 02:23:03PM -0400, Glauber Costa wrote:
  This is a pretty mechanical change. To make code look
  closer to upstream qemu, I'm renaming kvm_context_t to
  KVMState. Mid term goal here is to start sharing code
  whereas possible.
  
  Avi, please apply, or I'll send you a video of myself
  dancing naked.
  
 You can start recording it since I doubt this patch will apply cleanly
 to today's master (other mechanical change was applied). Regardless, I
 think trying to use bits of qemu kvm is dangerous. It has similar 
 function
 with same names, but with different assumptions about conditional they
 can be executed in (look at commit a5ddb119). I actually prefer to be
 different enough to not call upstream qemu function by mistake.

I did it against today's master. If new patches came in, is just
a matter of regenerating this, since it is, as I said, mechanical.

Also, as we don't compile in upstream functions yet (kvm-all.c and kvm.c
are not included in the final object), there is no such risk.
Of course, I am aiming towards it, but the first step will be to change
the name of conflicting functions until we can pick qemu's 
implementation,
in which case the former will just go away.
   That is the point. We can't just pick qemu's implementation most of the
   times.
  until we can pick up qemu's implementation potentially involves replacing
  that particular piece with upstream version first.
  
   

If we are serious about merging qemu-kvm into qemu, I don't see a way 
out
of it. We should start changing things this way to accomodate it. 
Different
enough won't do.
   I don't really like the idea to morph working implementation to look like
   non-working one. I do agree that qemu-kvm should be cleaned substantially
   before going upstream. Upstream qemu kvm should go away than. I don't
   see much work done to enhance it anyway.
   
  
  this first phase has nothing to do with functionality. To begin with,
  KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
  
 I am not against this mechanical change at all, don't get me wrong. I
 don't want to mix two kvm implementation together in strange ways.
 
too late for not wanting anything strange to happen ;-)

But I do believe this is the way to turn qemu-kvm.git into something
that feeds qemu.git. And that's what we all want.



Re: [PATCH] use KVMState, as upstream do

2009-06-04 Thread Gleb Natapov
On Thu, Jun 04, 2009 at 05:18:06PM -0300, Glauber Costa wrote:
   this first phase has nothing to do with functionality. To begin with,
   KVMState is qemu style, kvm_context_t is not, like it or not (I don't).
   
  I am not against this mechanical change at all, don't get me wrong. I
  don't want to mix two kvm implementation together in strange ways.
  
 too late for not wanting anything strange to happen ;-)
 
You are right, I should have said in stranger ways.

 But I do believe this is the way to turn qemu-kvm.git into something
 that feeds qemu.git. And that's what we all want.
Disagree with first part, agree with second :)

--
Gleb.


KVM on Debian

2009-06-04 Thread Aaron Clausen
I'm running a production Debian Lenny server using KVM to run a couple
of Windows and a couple of Linux guests.  All is working well, but I
want to give my Server 2003 guest access to a SCSI tape drive.
Unfortunately, Debian is pretty conservative, and the version of KVM
is too old to support this.  Is there a reasonably safe way of
upgrading to one of the newer versions of KVM on this server?

-- 
Aaron Clausen
mightymartia...@gmail.com


Re: KVM on Debian

2009-06-04 Thread Mark van Walraven
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
 I'm running a production Debian Lenny server using KVM to run a couple
 of Windows and a couple of Linux guests.  All is working well, but I
 want to give my Server 2003 guest access to a SCSI tape drive.
 Unfortunately, Debian is pretty conservative, and the version of KVM
 is too old to support this.  Is there a reasonably safe way of
 upgrading to one of the newer versions of KVM on this server?

I'm interested in this too, so far I have found that Lenny's libvirt fails
to parse the output of kvm --help, though this is fixed in the libvirt in
testing.  The kvm package from experimental seems to work well - after a
day of testing.

My next step is to try qemu-kvm, built from source.  The Debianised libvirt
expects the kvm binaries to be in /usr/bin/kvm, so you can symlink them
from /usr/local/bin if you prefer to install there.  I've also experimented
with a shell script wrapper in /usr/bin/kvm that condenses the output of
qemu-kvm --help so that libvirtd for Lenny works.

Regards,

Mark.


[PATCH] QEMU KVM: i386: Fix the cpu reset state

2009-06-04 Thread Nitin A Kamble
As per the IA32 processor manual, the accessed bit is set to 1 in the
processor state after reset. qemu pc cpu_reset code was missing this
accessed bit setting.

Signed-off-by: Nitin A Kamble nitin.a.kam...@intel.com
---
 target-i386/helper.c |   18 --
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/target-i386/helper.c b/target-i386/helper.c
index 7fc5366..573fb5b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -493,17 +493,23 @@ void cpu_reset(CPUX86State *env)
 env->tr.flags = DESC_P_MASK | (11 << DESC_TYPE_SHIFT);
 
 cpu_x86_load_seg_cache(env, R_CS, 0xf000, 0xffff0000, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK | 
DESC_R_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK |
+   DESC_R_MASK | DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_DS, 0, 0, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_ES, 0, 0, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_SS, 0, 0, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_FS, 0, 0, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 cpu_x86_load_seg_cache(env, R_GS, 0, 0, 0xffff,
-   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK);
+   DESC_P_MASK | DESC_S_MASK | DESC_W_MASK |
+   DESC_A_MASK);
 
 env->eip = 0xfff0;
 env->regs[R_EDX] = env->cpuid_version;
-- 
1.6.0.6



Re: [RFC] CPU hard limits

2009-06-04 Thread Mike Waychison

Avi Kivity wrote:

Bharata B Rao wrote:

2. Need for hard limiting CPU resource
--
- Pay-per-use: In enterprise systems that cater to multiple clients/customers
  where a customer demands a certain share of CPU resources and pays only
  that, CPU hard limits will be useful to hard limit the customer's job
  to consume only the specified amount of CPU resource.
- In container based virtualization environments running multiple containers,
  hard limits will be useful to ensure a container doesn't exceed its
  CPU entitlement.
- Hard limits can be used to provide guarantees.
  

How can hard limits provide guarantees?


Hard limits are useful and desirable in situations where we would like 
to maintain deterministic behavior.


Placing a hard cap on the cpu usage of a given task group (and 
configuring such that this cpu time is not overcommitted) on a system 
allows us to create a hard guarantee that throughput for that task group 
will not fluctuate as other workloads are added and removed on the system.


Cache use and bus bandwidth in a multi-workload environment can still 
cause a performance deviation, but these are second order compared to 
the cpu scheduling guarantees themselves.


Mike Waychison


Re: KVM on Debian

2009-06-04 Thread Matthew Palmer
On Thu, Jun 04, 2009 at 01:37:54PM -0700, Aaron Clausen wrote:
 I'm running a production Debian Lenny server using KVM to run a couple
 of Windows and a couple of Linux guests.  All is working well, but I
 want to give my Server 2003 guest access to a SCSI tape drive.
 Unfortunately, Debian is pretty conservative, and the version of KVM
 is too old to support this.  Is there a reasonably safe way of
 upgrading to one of the newer versions of KVM on this server?

Backporting kvm from experimental is straightforward, and has worked fine
for me.

- Matt


[PATCH KVM VMX 1/2] KVM: VMX: Rename rmode.active to rmode.vm86_active

2009-06-04 Thread Nitin A Kamble
That way the interpretation of rmode.active becomes clearer with the
unrestricted guest code.

Signed-off-by: Nitin A Kamble nitin.a.kam...@intel.com
---
 arch/x86/include/asm/kvm_host.h |2 +-
 arch/x86/kvm/vmx.c  |   28 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1951d39..1cc901e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -339,7 +339,7 @@ struct kvm_vcpu_arch {
} interrupt;
 
struct {
-   int active;
+   int vm86_active;
u8 save_iopl;
struct kvm_save_segment {
u16 selector;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fd05fd2..d1ec8a9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -497,7 +497,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
		if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
			eb |= 1u << BP_VECTOR;
}
-	if (vcpu->arch.rmode.active)
+	if (vcpu->arch.rmode.vm86_active)
eb = ~0;
if (enable_ept)
		eb &= ~(1u << PF_VECTOR); /* bypass_guest_pf = 0 */
@@ -733,7 +733,7 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu)
 
 static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
-	if (vcpu->arch.rmode.active)
+	if (vcpu->arch.rmode.vm86_active)
rflags |= X86_EFLAGS_IOPL | X86_EFLAGS_VM;
vmcs_writel(GUEST_RFLAGS, rflags);
 }
@@ -790,7 +790,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
intr_info |= INTR_INFO_DELIVER_CODE_MASK;
}
 
-	if (vcpu->arch.rmode.active) {
+	if (vcpu->arch.rmode.vm86_active) {
		vmx->rmode.irq.pending = true;
		vmx->rmode.irq.vector = nr;
		vmx->rmode.irq.rip = kvm_rip_read(vcpu);
@@ -1370,7 +1370,7 @@ static void enter_pmode(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
vmx-emulation_required = 1;
-	vcpu->arch.rmode.active = 0;
+	vcpu->arch.rmode.vm86_active = 0;

	vmcs_writel(GUEST_TR_BASE, vcpu->arch.rmode.tr.base);
	vmcs_write32(GUEST_TR_LIMIT, vcpu->arch.rmode.tr.limit);
@@ -1432,7 +1432,7 @@ static void enter_rmode(struct kvm_vcpu *vcpu)
struct vcpu_vmx *vmx = to_vmx(vcpu);
 
vmx-emulation_required = 1;
-	vcpu->arch.rmode.active = 1;
+	vcpu->arch.rmode.vm86_active = 1;

	vcpu->arch.rmode.tr.base = vmcs_readl(GUEST_TR_BASE);
	vmcs_writel(GUEST_TR_BASE, rmode_tss_base(vcpu->kvm));
@@ -1616,10 +1616,10 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
vmx_fpu_deactivate(vcpu);
 
-	if (vcpu->arch.rmode.active && (cr0 & X86_CR0_PE))
+	if (vcpu->arch.rmode.vm86_active && (cr0 & X86_CR0_PE))
		enter_pmode(vcpu);

-	if (!vcpu->arch.rmode.active && !(cr0 & X86_CR0_PE))
+	if (!vcpu->arch.rmode.vm86_active && !(cr0 & X86_CR0_PE))
enter_rmode(vcpu);
 
 #ifdef CONFIG_X86_64
@@ -1675,7 +1675,7 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 static void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.active ?
+	unsigned long hw_cr4 = cr4 | (vcpu->arch.rmode.vm86_active ?
		KVM_RMODE_VM_CR4_ALWAYS_ON : KVM_PMODE_VM_CR4_ALWAYS_ON);

	vcpu->arch.cr4 = cr4;
@@ -1758,7 +1758,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
	struct kvm_vmx_segment_field *sf = &kvm_vmx_segment_fields[seg];
u32 ar;
 
-	if (vcpu->arch.rmode.active && seg == VCPU_SREG_TR) {
+	if (vcpu->arch.rmode.vm86_active && seg == VCPU_SREG_TR) {
		vcpu->arch.rmode.tr.selector = var->selector;
		vcpu->arch.rmode.tr.base = var->base;
		vcpu->arch.rmode.tr.limit = var->limit;
@@ -1768,7 +1768,7 @@ static void vmx_set_segment(struct kvm_vcpu *vcpu,
	vmcs_writel(sf->base, var->base);
	vmcs_write32(sf->limit, var->limit);
	vmcs_write16(sf->selector, var->selector);
-	if (vcpu->arch.rmode.active && var->s) {
+	if (vcpu->arch.rmode.vm86_active && var->s) {
/*
 * Hack real-mode segments into vm86 compatibility.
 */
@@ -2337,7 +2337,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
goto out;
}
 
-	vmx->vcpu.arch.rmode.active = 0;
+	vmx->vcpu.arch.rmode.vm86_active = 0;
 
	vmx->soft_vnmi_blocked = 0;
 
@@ -2475,7 +2475,7 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu)
KVMTRACE_1D(INJ_VIRQ, vcpu, (u32)irq, handler);
 
	++vcpu->stat.irq_injections;
-	if (vcpu->arch.rmode.active) {
+	if (vcpu->arch.rmode.vm86_active) {
		vmx->rmode.irq.pending = true;
		vmx->rmode.irq.vector = 

[PATCH KVM VMX 0/2] Enable Unrestricted Guest

2009-06-04 Thread Nitin A Kamble
Hi Avi,
  I have modified the earlier patch as per your comments. I have prepared a 
separate patch for renaming rmode.active to rmode.vm86_active, and the 2nd patch 
enables the Unrestricted Guest feature in KVM.
  This patch will also work with unfixed (cpu reset state) qemu. 
Please apply.

Thanks & Regards,
Nitin

Nitin A Kamble (2):
  KVM: VMX: Rename rmode.active to rmode.vm86_active
  KVM: VMX: Support Unrestricted Guest feature

 arch/x86/include/asm/kvm_host.h |   14 ---
 arch/x86/include/asm/vmx.h  |1 +
 arch/x86/kvm/vmx.c  |   77 +--
 3 files changed, 66 insertions(+), 26 deletions(-)



[ kvm-Bugs-2801458 ] BUG at mmu.c:615 from localhost migration using ept+hugetlbf

2009-06-04 Thread SourceForge.net
Bugs item #2801458, was opened at 2009-06-04 19:36
Message generated for change (Tracker Item Submitted) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: kernel
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Marcelo Tosatti (mtosatti)
Summary: BUG at mmu.c:615 from localhost migration using ept+hugetlbf

Initial Comment:
http://www.mail-archive.com/kvm@vger.kernel.org/msg16136.html

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801458&group_id=180599


[ kvm-Bugs-2801459 ] i8042.c: No controller found...

2009-06-04 Thread SourceForge.net
Bugs item #2801459, was opened at 2009-06-04 19:39
Message generated for change (Tracker Item Submitted) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Marcelo Tosatti (mtosatti)
Summary: i8042.c: No controller found...

Initial Comment:
http://marc.info/?l=qemu-develm=124329227728366w=2

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801459&group_id=180599


[ kvm-Bugs-2801212 ] sles10sp2 guest timer run too fast

2009-06-04 Thread SourceForge.net
Bugs item #2801212, was opened at 2009-06-04 11:17
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Marcelo Tosatti (mtosatti)
Summary: sles10sp2 guest timer run too fast 

Initial Comment:
With kvm.git Commit:7ff90748cebbfbafc8cfa6bdd633113cd9537789 and
qemu-kvm Commit:a1cd3c985c848dae73966f9601f15fbcade72f1, we found that the
sles10sp2 guest clock runs much faster than real time, gaining about 27s for
every 60s of real time.

Reproduce steps:

(1)qemu-system-x86_64  -m 1024 -net nic,macaddr=00:16:3e:6f:f3:d1,model=rtl8139 
-net tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/sles10sp2.img
(2) Run ntpdate in guest: ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip

Current result:

sles10sp2rc1-guest:~ #  ntpdate sync_machine_ip && sleep 60 && ntpdate 
sync_machine_ip
31 May 23:16:59 ntpdate[3303]: step time server 192.168.198.248 offset
-61.27418
31 May 23:17:32 ntpdate[3305]: step time server 192.168.198.248 offset
-27.626469 sec

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2801212&group_id=180599


[ kvm-Bugs-2782199 ] linux_s3 ceased function

2009-06-04 Thread SourceForge.net
Bugs item #2782199, was opened at 2009-04-27 11:04
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: linux_s3 ceased function

Initial Comment:
Test linux_s3, that worked fine with KVM-85rc3 ceased to function in 
KVM-85rc5/rc6/final release.

S3 is a power sleep test. (suspend)

Now it only works with 2 guests: RHEL 5, Fedora 8. (32 and 64-bit)

Previously, with KVM-85rc3, linux_s3 test worked with the following guests: 
RHEL 4, RHEL 5, Fedora 8, Fedora 9, openSUSE 11.0, openSUSE 11.1, and Ubuntu 
8.10.

I see this as a regression.

-Alexey, 27.4.2009.

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:47

Message:
virtio_balloon thing

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599


[ kvm-Bugs-2782199 ] linux_s3 ceased function

2009-06-04 Thread SourceForge.net
Bugs item #2782199, was opened at 2009-04-27 11:04
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Technologov (technologov)
Assigned to: Nobody/Anonymous (nobody)
Summary: linux_s3 ceased function

Initial Comment:
Test linux_s3, that worked fine with KVM-85rc3 ceased to function in 
KVM-85rc5/rc6/final release.

S3 is a power sleep test. (suspend)

Now it only works with 2 guests: RHEL 5, Fedora 8. (32 and 64-bit)

Previously, with KVM-85rc3, linux_s3 test worked with the following guests: 
RHEL 4, RHEL 5, Fedora 8, Fedora 9, openSUSE 11.0, openSUSE 11.1, and Ubuntu 
8.10.

I see this as a regression.

-Alexey, 27.4.2009.

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:47

Message:
virtio_balloon thing

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2782199&group_id=180599


[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)

2009-06-04 Thread SourceForge.net
Bugs item #2287677, was opened at 2008-11-14 21:39
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Darkman (darkman82)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm79 compiling errors (with-patched-kernel)

Initial Comment:


config.mak :

ARCH=i386
PROCESSOR=i386
PREFIX=/usr
KERNELDIR=/usr/src/linux-2.6.27.6/
KERNELSOURCEDIR=
LIBKVM_KERNELDIR=/root/kvm-79/kernel
WANT_MODULE=
CROSS_COMPILE=
CC=gcc
LD=ld
OBJCOPY=objcopy
AR=ar


ERRORS:
/root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop':
/root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first 
use in this function)
/root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is 
reported only once
/root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.)
/root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler':
/root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long 
int', but argument 2 has type 'ssize_t'
make[2]: *** [qemu-kvm.o] Error 1
make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu'
make[1]: *** [subdir-x86_64-softmmu] Error 2
make[1]: Leaving directory `/root/kvm-79/qemu'
make: *** [qemu] Error 2

Same problem with 2.6.27.2 source.

kvm78 works fine.

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:51

Message:
Please try with kvm-85, and reopen in case it's still problematic.

--

Comment By: Darkman (darkman82)
Date: 2008-12-05 19:33

Message:
It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT.

In qemu-kvm.h:

qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT
qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data);
qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data);
qemu-kvm.h:98 #endif

but in qemu-kvm.c we have

qemu-kvm.c:457/* do ioperm for io ports of assigned devices */
qemu-kvm.c:458LIST_FOREACH(data, ioperm_head, entries)
qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data);

without #ifdef block.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599


[ kvm-Bugs-2287677 ] kvm79 compiling errors (with-patched-kernel)

2009-06-04 Thread SourceForge.net
Bugs item #2287677, was opened at 2008-11-14 21:39
Message generated for change (Settings changed) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: intel
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: Darkman (darkman82)
Assigned to: Nobody/Anonymous (nobody)
Summary: kvm79 compiling errors (with-patched-kernel)

Initial Comment:


config.mak :

ARCH=i386
PROCESSOR=i386
PREFIX=/usr
KERNELDIR=/usr/src/linux-2.6.27.6/
KERNELSOURCEDIR=
LIBKVM_KERNELDIR=/root/kvm-79/kernel
WANT_MODULE=
CROSS_COMPILE=
CC=gcc
LD=ld
OBJCOPY=objcopy
AR=ar


ERRORS:
/root/kvm-79/qemu/qemu-kvm.c: In function 'ap_main_loop':
/root/kvm-79/qemu/qemu-kvm.c:459: error: 'kvm_arch_do_ioperm' undeclared (first 
use in this function)
/root/kvm-79/qemu/qemu-kvm.c:459: error: (Each undeclared identifier is 
reported only once
/root/kvm-79/qemu/qemu-kvm.c:459: error: for each function it appears in.)
/root/kvm-79/qemu/qemu-kvm.c: In function 'sigfd_handler':
/root/kvm-79/qemu/qemu-kvm.c:544: warning: format '%ld' expects type 'long 
int', but argument 2 has type 'ssize_t'
make[2]: *** [qemu-kvm.o] Error 1
make[2]: Leaving directory `/root/kvm-79/qemu/x86_64-softmmu'
make[1]: *** [subdir-x86_64-softmmu] Error 2
make[1]: Leaving directory `/root/kvm-79/qemu'
make: *** [qemu] Error 2

Same problem with 2.6.27.2 source.

kvm78 works fine.

--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:51

Message:
Please try with kvm-85, and reopen in case it's still problematic.

--


Comment By: Darkman (darkman82)
Date: 2008-12-05 19:33

Message:
It seems due to undefined USE_KVM_DEVICE_ASSIGNMENT.

In qemu-kvm.h:

qemu-kvm.h:95 #ifdef USE_KVM_DEVICE_ASSIGNMENT
qemu-kvm.h:96 void kvm_ioperm(CPUState *env, void *data);
qemu-kvm.h:97 void kvm_arch_do_ioperm(void *_data);
qemu-kvm.h:98 #endif

but in qemu-kvm.c we have

qemu-kvm.c:457/* do ioperm for io ports of assigned devices */
qemu-kvm.c:458LIST_FOREACH(data, ioperm_head, entries)
qemu-kvm.c:459on_vcpu(env, kvm_arch_do_ioperm, data);

without #ifdef block.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2287677&group_id=180599


[ kvm-Bugs-2624842 ] kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148!

2009-06-04 Thread SourceForge.net
Bugs item #2624842, was opened at 2009-02-21 14:27
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: kernel
Group: None
Status: Closed
Resolution: None
Priority: 5
Private: No
Submitted By: jb17bsome (jb17bsome)
Assigned to: Nobody/Anonymous (nobody)
Summary: kernel BUG at /kvm-84/kernel/x86/kvm_main.c:2148!

Initial Comment:
cpu: AMD Phenom 9750 (4)
host distro: fedora 10 x86_64
host kernel: linus-2.6 git (v2.6.29-rc5-276-g2ec77fc)
guest: any.  I have tried fedora 10, windows nt 4, and windows 2008 images.
kvm version: 84

usage:
qemu-system-x86_64 -m 512

-no-kvm-pit and -no-kvm-irqchip cause the same bug.
-no-kvm runs fine.


--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 19:57

Message:
jb17bsome,

Try unloading the virtualbox driver.

--

Comment By: jb17bsome (jb17bsome)
Date: 2009-05-06 21:13

Message:
I upgraded my system to the latest F11 dev as of May 6 2009, but I get the
same result...
So the same kernel BUG at kvm_handle_fault_on_reboot. 
I attached another kbug dump (with a little context from dmesg).
Is there anything else I can try? 

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2624842&group_id=180599


[ kvm-Bugs-1971512 ] failure to migrate guests with more than 4GB of RAM

2009-06-04 Thread SourceForge.net
Bugs item #1971512, was opened at 2008-05-24 17:45
Message generated for change (Comment added) made by mtosatti
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=1971512&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Closed
Resolution: None
Priority: 3
Private: No
Submitted By: Marcelo Tosatti (mtosatti)
Assigned to: Anthony Liguori (aliguori)
Summary: failure to migrate guests with more than 4GB of RAM

Initial Comment:

The migration code assumes linear phys_ram_base:

[r...@localhost kvm-userspace.tip]# qemu/x86_64-softmmu/qemu-system-x86_64 -hda 
/root/images/marcelo5-io-test.img -m 4097 -net nic,model=rtl8139 -net 
tap,script=/root/iptables/ifup -incoming tcp://0:/
audit_log_user_command(): Connection refused
audit_log_user_command(): Connection refused
migration: memory size mismatch: recv 22032384 mine 4316999680
migrate_incoming_fd failed (rc=232)


--

Comment By: Marcelo Tosatti (mtosatti)
Date: 2009-06-04 20:00

Message:
This has been fixed by Glauber.

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-15 22:37

Message:
We did not run any workload; we do the migration just after the guest boots up
and becomes idle.

--

Comment By: Avi Kivity (avik)
Date: 2008-12-14 11:45

Message:
What workload is the guest running during the migration?

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-12-09 23:09

Message:
Open the bug again since Live Migration 4G guest still fail on my machine.
Guest will call trace after Live Migration.

--

Comment By: SourceForge Robot (sf-robot)
Date: 2008-12-07 22:22

Message:
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).

--

Comment By: Jiajun Xu (jiajun)
Date: 2008-11-25 01:52

Message:
I tried latest commit, userspace.git
6e63ba19476753595e508713eb9daf559dc50bf6 with a 64-bit RHEL5.1 Guest. My
host kernel is 2.6.26.2. And My host has 8GB memory and 4GB swap.
Guest can be live migrated, but after that, guest will call trace.

Maybe we can have a check with each other's environment.

My steps as following:
1. qemu-system-x86_64 -incoming tcp:localhost: -m 4096  -net
nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net
tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
2. qemu-system-x86_64  -m 4096 -net
nic,macaddr=00:16:3e:44:1a:a6,model=rtl8139 -net
tap,script=/etc/kvm/qemu-ifup -hda /share/xvs/var/rhel5u1.img
3. In qemu console, type migrate tcp:localhost:

The call trace messages in guest:
###
Kernel BUG at block/elevator.c:560
invalid opcode:  [1] SMP 
last sysfs file: /block/hda/removable
CPU 0 
Modules linked in: ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc
iscsi_tcp
ib_iser libiscsi scsi_transport_iscsi rdma_ucm ib_ucm ib_srp ib_sdp
rdma_cm
ib_cm iw_cm ib_addr ib_local_sa ib_ipoib ib_sa ib_uverbs ib_umad ib_mad
ib_core
dm_mirror dm_multipath dm_mod video sbs backlight i2c_ec i2c_core button
battery asus_acpi acpi_memhotplug ac lp floppy pcspkr serio_raw 8139cp
8139too
parport_pc parport mii ide_cd cdrom ata_piix libata sd_mod scsi_mod ext3
jbd
ehci_hcd ohci_hcd uhci_hcd
Pid: 0, comm: swapper Not tainted 2.6.18-53.el5 #1
RIP: 0010:[80134673]  [80134673]
elv_dequeue_request+0x8/0x3c
RSP: 0018:8040ddc0  EFLAGS: 00010046
RAX: 0001 RBX: 81011381b398 RCX: 
RDX: 81011381b398 RSI: 81011381b398 RDI: 81011fb912c0
RBP: 804abe18 R08: 80304108 R09: 0012
R10: 0022 R11:  R12: 
R13: 0001 R14: 0086 R15: 8040deb8
FS:  () GS:80396000()
knlGS:
CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
CR2: 2ad6f4d0 CR3: 0001126cc000 CR4: 06e0
Process swapper (pid: 0, threadinfo 803c6000, task
802dcae0)
Stack:  8000ae3c 804abe18 804abe50

 804abd00 0246 8003ba73 8003ba0c
 804abe18 81011fbe5800 8000d2a5 81011fb8c5c0
Call Trace:
 IRQ  [8000ae3c] ide_end_request+0xc6/0xfc
 [8003ba73] ide_dma_intr+0x67/0xab
 [8003ba0c] ide_dma_intr+0x0/0xab
 [8000d2a5] ide_intr+0x16f/0x1df
 [800107a0] 

RE: [PATCH] qemu-kvm: Flush icache after dma operations for ia64

2009-06-04 Thread Zhang, Xiantao
Jes Sorensen wrote:
 Zhang, Xiantao wrote:
 Hi, Jes
 Have you verified whether it works for you ?  You may run kernel
 build in the guest with 4 vcpus,  if it can be done successfully
 without any error, it should be Okay I think, otherwise, we may need
 to investigate it further. :) Xiantao  
 
 Hi Xiantao,
 
 I was able to run a 16 vCPU guest and build the kernel using make -j
 16. How quickly would the problem show up for you, on every run, or
 should I run more tests?

Hi Jes, 
 Good news! On my machine, without the patch, an SMP guest can't build a whole 
kernel at all. So if you can build it without errors and use it to boot up the 
guest, I think it should work well.  
Xiantao





Re: [RFC] CPU hard limits

2009-06-04 Thread Bharata B Rao
On Thu, Jun 04, 2009 at 03:19:22PM +0300, Avi Kivity wrote:
 Bharata B Rao wrote:
 2. Need for hard limiting CPU resource
 --
 - Pay-per-use: In enterprise systems that cater to multiple clients/customers
   where a customer demands a certain share of CPU resources and pays only
   that, CPU hard limits will be useful to hard limit the customer's job
   to consume only the specified amount of CPU resource.
 - In container based virtualization environments running multiple containers,
   hard limits will be useful to ensure a container doesn't exceed its
   CPU entitlement.
 - Hard limits can be used to provide guarantees.
   
 How can hard limits provide guarantees?

 Let's take an example where I have 1 group that I wish to guarantee a  
 20% share of the cpu, and another 8 groups with no limits or guarantees.

 One way to achieve the guarantee is to hard limit each of the 8 other  
 groups to 10%; the sum total of the limits is 80%, leaving 20% for the  
 guarantee group. The downside is the arbitrary limit imposed on the  
 other groups.
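The arithmetic behind this scheme is simple enough to restate; the helper below is a hypothetical sketch (not from any posted patch) of the example of one 20%-guaranteed group plus eight capped groups:

```python
def per_group_limit(guarantee, n_other_groups, total=100.0):
    # Cap each of the other groups so that their limits sum to
    # (total - guarantee), leaving `guarantee` for the guaranteed group.
    return (total - guarantee) / n_other_groups

# One group guaranteed 20% of the cpu, eight unguaranteed groups:
print(per_group_limit(20.0, 8))  # -> 10.0
```

The downside noted above is visible directly: every unguaranteed group is capped at 10% even when the machine is otherwise idle.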

This method sounds very similar to the openvz method:
http://wiki.openvz.org/Containers/Guarantees_for_resources


 Another way is to place the 8 groups in a container group, and limit  
 that to 80%. But that doesn't work if I want to provide guarantees to  
 several groups.

Hmm why not ? Reduce the guarantee of the container group and provide
the same to additional groups ?

Regards,
Bharata.


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2009-06-04 15:19:22]:

 Bharata B Rao wrote:
  2. Need for hard limiting CPU resource
  --
  - Pay-per-use: In enterprise systems that cater to multiple 
  clients/customers
where a customer demands a certain share of CPU resources and pays only
that, CPU hard limits will be useful to hard limit the customer's job
to consume only the specified amount of CPU resource.
  - In container based virtualization environments running multiple 
  containers,
hard limits will be useful to ensure a container doesn't exceed its
CPU entitlement.
  - Hard limits can be used to provide guarantees.

 How can hard limits provide guarantees?
 
 Let's take an example where I have 1 group that I wish to guarantee a 
  20% share of the cpu, and another 8 groups with no limits or guarantees.
 
 One way to achieve the guarantee is to hard limit each of the 8 other 
 groups to 10%; the sum total of the limits is 80%, leaving 20% for the 
 guarantee group. The downside is the arbitrary limit imposed on the 
 other groups.
 
 Another way is to place the 8 groups in a container group, and limit 
 that to 80%. But that doesn't work if I want to provide guarantees to 
 several groups.


Hi, Avi,

Take a look at
http://wiki.openvz.org/Containers/Guarantees_for_resources
and the associated program in the wiki page.

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Bharata B Rao wrote:
Another way is to place the 8 groups in a container group, and limit  
that to 80%. But that doesn't work if I want to provide guarantees to  
several groups.



Hmm why not ? Reduce the guarantee of the container group and provide
the same to additional groups ?
  


This method produces suboptimal results:

$ cgroup-limits 10 10 0
[50.0, 50.0, 40.0]

I want to provide two 10% guaranteed groups and one best-effort group.  
Using the limits method, no group can now use more than 50% of the 
resources.  However, having the first group use 90% of the resources 
does not violate any guarantees, but it is not allowed by the solution.


#!/usr/bin/python

def calculate_limits(g, R):
    N = len(g)
    if N == 1:
        return [R]

    s = sum([R - gi for gi in g])
    return [(s - (R - gi) - (N - 2) * (R - gi)) / (N - 1)
            for gi in g]

import sys
print calculate_limits([float(x) for x in sys.argv[1:]], 100)

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM on Debian

2009-06-04 Thread Mark van Walraven
Hi,

An update in the hope that this is useful to someone :-)

On Fri, Jun 05, 2009 at 09:03:03AM +1200, Mark van Walraven wrote:
 My next step is to try qemu-kvm, built from source.  The Debianised libvirt
 expects the kvm binaries to be in /usr/bin/kvm, so you can symlink them
 from /usr/local/bin if you prefer to install there.  I've also experimented
 with shell script wrapper in /usr/bin/kvm that condenses the output of
 qemu-kvm --help so that libvirtd for Lenny works.

Actually, the current Debian Lenny libvirt* (0.4.6-10) seem to work
fine with qemu-kvm-0.10.5 built from source.  All I needed to do was
symlink /usr/local/bin/qemu-system-x86_64 to /usr/bin/kvm and copy
extboot.bin into /usr/local/share/qemu/ (I used the one from the
kvm 85+dfsg-3 package in Experimental).

So far, so good.

Mark.


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity a...@redhat.com wrote:
 Bharata B Rao wrote:

 Another way is to place the 8 groups in a container group, and limit
  that to 80%. But that doesn't work if I want to provide guarantees to
  several groups.


 Hmm why not ? Reduce the guarantee of the container group and provide
 the same to additional groups ?


 This method produces suboptimal results:

 $ cgroup-limits 10 10 0
 [50.0, 50.0, 40.0]

 I want to provide two 10% guaranteed groups and one best-effort group.
  Using the limits method, no group can now use more than 50% of the
 resources.  However, having the first group use 90% of the resources does
not violate any guarantees, but it is not allowed by the solution.


How? It works out fine in my calculation:

50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
limited to 90%
50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
limited to 90%
50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
limited to 100%

Now if we really have zeros, I would recommend using

cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.

Adding zeros to the calculation is not recommended. Does that help?

Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:

On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity a...@redhat.com wrote:
  

Bharata B Rao wrote:


Another way is to place the 8 groups in a container group, and limit
 that to 80%. But that doesn't work if I want to provide guarantees to
 several groups.



Hmm why not ? Reduce the guarantee of the container group and provide
the same to additional groups ?

  

This method produces suboptimal results:

$ cgroup-limits 10 10 0
[50.0, 50.0, 40.0]

I want to provide two 10% guaranteed groups and one best-effort group.
 Using the limits method, no group can now use more than 50% of the
resources.  However, having the first group use 90% of the resources does
not violate any guarantees, but it is not allowed by the solution.




How? It works out fine in my calculation:

50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
limited to 90%
50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
limited to 90%
50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
limited to 100%
  


It's fine in that it satisfies the guarantees, but it is deeply 
suboptimal.  If I ran a cpu hog in the first group, while the other two 
were idle, it would be limited to 50% cpu.  On the other hand, if it 
consumed all 100% cpu it would still satisfy the guarantees (as the 
other groups are idle).


The result is that in such a situation, wall clock time would double 
even though cpu resources are available.

Now if we really have zeros, I would recommend using

cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.

Adding zeros to the calculation is not recommended. Does that help?


What do you mean, it is not recommended? I have two groups which need at 
least 10% and one which does not need any guarantee, how do I express it?


In any case, changing the zero to 1% does not materially change the results.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2009-06-05 07:44:27]:

 Balbir Singh wrote:
 On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity a...@redhat.com wrote:
   
 Bharata B Rao wrote:
 
 Another way is to place the 8 groups in a container group, and limit
  that to 80%. But that doesn't work if I want to provide guarantees to
  several groups.

 
 Hmm why not ? Reduce the guarantee of the container group and provide
 the same to additional groups ?

   
 This method produces suboptimal results:

 $ cgroup-limits 10 10 0
 [50.0, 50.0, 40.0]

 I want to provide two 10% guaranteed groups and one best-effort group.
  Using the limits method, no group can now use more than 50% of the
 resources.  However, having the first group use 90% of the resources does
 not violate any guarantees, but it is not allowed by the solution.

 

 How? It works out fine in my calculation:

 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
 limited to 90%
 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
 limited to 90%
 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
 limited to 100%
   

 It's fine in that it satisfies the guarantees, but it is deeply  
 suboptimal.  If I ran a cpu hog in the first group, while the other two  
 were idle, it would be limited to 50% cpu.  On the other hand, if it  
 consumed all 100% cpu it would still satisfy the guarantees (as the  
 other groups are idle).

 The result is that in such a situation, wall clock time would double  
 even though cpu resources are available.

But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?


 Now if we really have zeros, I would recommend using

 cgroup-limits 10 10 and you'll see that you'll get 90, 90 as output.

 Adding zeros to the calculation is not recommended. Does that help?

 What do you mean, it is not recommended? I have two groups which need at  
 least 10% and one which does not need any guarantee, how do I express it?

Ignore this part of my comment

 In any case, changing the zero to 1% does not materially change the results.

True.

-- 
Balbir


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Rusty Russell
On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
 Avi Kivity wrote:
  Gregory Haskins wrote:
  One idea is similar to signalfd() or eventfd()

 And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born.
 ;)

The lguest patch queue already has such an interface :)  And I have a
partially complete in-kernel virtio_pci patch with the same trick.

I switched from kernel created eventfd to userspace passes in eventfd
after a while though; it lets you connect multiple virtqueues to a single fd
if you want.

Combined with a minor change to allow any process with access to the lguest fd
to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
model which was a significant speedup as well as nice code reduction.

Here's the relevant kernel patch for reading.

Thanks!
Rusty.

lguest: use eventfds for device notification

Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.

A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Cc: Davide Libenzi davi...@xmailserver.org
---
 drivers/lguest/Kconfig  |2 -
 drivers/lguest/core.c   |8 ++--
 drivers/lguest/lg.h |9 
 drivers/lguest/lguest_user.c|   73 
 include/linux/lguest_launcher.h |1 
 5 files changed, 89 insertions(+), 4 deletions(-)

diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
--- a/drivers/lguest/Kconfig
+++ b/drivers/lguest/Kconfig
@@ -1,6 +1,6 @@
 config LGUEST
 	tristate "Linux hypervisor example code"
-	depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
+	depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
select HVC_DRIVER
---help---
  This is a very simple module which allows you to run
diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
--- a/drivers/lguest/core.c
+++ b/drivers/lguest/core.c
@@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
/* It's possible the Guest did a NOTIFY hypercall to the
 * Launcher, in which case we return from the read() now. */
 	if (cpu->pending_notify) {
-		if (put_user(cpu->pending_notify, user))
-			return -EFAULT;
-		return sizeof(cpu->pending_notify);
+		if (!send_notify_to_eventfd(cpu)) {
+			if (put_user(cpu->pending_notify, user))
+				return -EFAULT;
+			return sizeof(cpu->pending_notify);
+		}
 	}
 
/* Check for signals */
diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -82,6 +82,11 @@ struct lg_cpu {
struct lg_cpu_arch arch;
 };
 
+struct lg_eventfds {
+   unsigned long addr;
+   struct file *event;
+};
+
 /* The private info the thread maintains about the guest. */
 struct lguest
 {
@@ -102,6 +107,9 @@ struct lguest
unsigned int stack_pages;
u32 tsc_khz;
 
+   unsigned int num_eventfds;
+   struct lg_eventfds *eventfds;
+
/* Dead? */
const char *dead;
 };
@@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
 void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
const unsigned long *def);
 void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
+bool send_notify_to_eventfd(struct lg_cpu *cpu);
 void init_clockdev(struct lg_cpu *cpu);
 bool check_syscall_vector(struct lguest *lg);
 int init_interrupts(void);
diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -7,6 +7,8 @@
 #include <linux/miscdevice.h>
 #include <linux/fs.h>
 #include <linux/sched.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
 #include "lg.h"
 
 /*L:055 When something happens, the Waker process needs a way to stop the
@@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
}
 }
 
+bool send_notify_to_eventfd(struct lg_cpu *cpu)
+{
+   unsigned int i;
+
+	/* lg->eventfds is RCU-protected */
+	preempt_disable();
+	for (i = 0; i < cpu->lg->num_eventfds; i++) {
+		if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
+			eventfd_signal(cpu->lg->eventfds[i].event, 1);
+			cpu->pending_notify = 0;
+			break;
+		}
+	}
+	preempt_enable();
+	return cpu->pending_notify == 0;
+}
+
+static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
+{
+   struct lg_eventfds *new, *old;
+
+   if (!addr)
+   return -EINVAL;
+
+   /* Replace the old array with the 

Re: [RFC] CPU hard limits

2009-06-04 Thread Chris Friesen
Balbir Singh wrote:

 But then there is no other way to make a *guarantee*, guarantees come
 at a cost of idling resources, no? Can you show me any other
 combination that will provide the guarantee and without idling the
 system for the specified guarantees?

The example given was two 10% guaranteed groups and one best-effort
group.  Why would this require idling resources?

If I have a hog in each group, the requirements would be met if the
groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
second and third groups go idle, why not let the first group use 100% of
the cpu?

The only hard restriction is that the sum of the guarantees must be less
than 100%.

Chris


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Balbir Singh bal...@linux.vnet.ibm.com [2009-06-05 12:49:46]:

 * Avi Kivity a...@redhat.com [2009-06-05 07:44:27]:
 
  Balbir Singh wrote:
  On Fri, Jun 5, 2009 at 11:33 AM, Avi Kivity a...@redhat.com wrote:

  Bharata B Rao wrote:
  
  Another way is to place the 8 groups in a container group, and limit
   that to 80%. But that doesn't work if I want to provide guarantees to
   several groups.
 
  
  Hmm why not ? Reduce the guarantee of the container group and provide
  the same to additional groups ?
 

  This method produces suboptimal results:
 
  $ cgroup-limits 10 10 0
  [50.0, 50.0, 40.0]
 
  I want to provide two 10% guaranteed groups and one best-effort group.
   Using the limits method, no group can now use more than 50% of the
  resources.  However, having the first group use 90% of the resources does
  not violate any guarantees, but it is not allowed by the solution.
 
  
 
  How? It works out fine in my calculation:
 
  50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
  limited to 90%
  50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
  limited to 90%
  50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
  limited to 100%

 
  It's fine in that it satisfies the guarantees, but it is deeply  
  suboptimal.  If I ran a cpu hog in the first group, while the other two  
  were idle, it would be limited to 50% cpu.  On the other hand, if it  
  consumed all 100% cpu it would still satisfy the guarantees (as the  
  other groups are idle).
 
  The result is that in such a situation, wall clock time would double  
  even though cpu resources are available.
 
 But then there is no other way to make a *guarantee*, guarantees come
 at a cost of idling resources, no? Can you show me any other
 combination that will provide the guarantee and without idling the
 system for the specified guarantees?

OK, I see part of your concern, but I think we could do some
optimizations during design. For example, if all groups have reached
their hard limit and the system is idle, should we start a new
hard-limit interval and restart, so that the idleness can be removed?
Would that be an acceptable design point?

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Chris Friesen cfrie...@nortel.com [2009-06-04 23:09:22]:

 Balbir Singh wrote:
 
  But then there is no other way to make a *guarantee*, guarantees come
  at a cost of idling resources, no? Can you show me any other
  combination that will provide the guarantee and without idling the
  system for the specified guarantees?
 
 The example given was two 10% guaranteed groups and one best-effort
 group.  Why would this require idling resources?
 
 If I have a hog in each group, the requirements would be met if the
 groups got 33, 33, and 33.  (Or 10/10/80, for that matter.)  If the
 second and third groups go idle, why not let the first group use 100% of
 the cpu?
 
 The only hard restriction is that the sum of the guarantees must be less
 than 100%.


Chris,

I just responded to a variation of this; I think some of it could be
handled during design. I sent that email out a few minutes ago. Could
you look at it and respond?

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:

   


How? It works out fine in my calculation:

50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
limited to 90%
50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
limited to 90%
50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
limited to 100%
  
  
It's fine in that it satisfies the guarantees, but it is deeply  
suboptimal.  If I ran a cpu hog in the first group, while the other two  
were idle, it would be limited to 50% cpu.  On the other hand, if it  
consumed all 100% cpu it would still satisfy the guarantees (as the  
other groups are idle).


The result is that in such a situation, wall clock time would double  
even though cpu resources are available.



But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?
  


Suppose in my example cgroup 1 consumed 100% of the cpu resources and 
cgroup 2 and 3 were completely idle.  All of the guarantees are met (if 
cgroup 2 is idle, there's no need to give it the 10% cpu time it is 
guaranteed).


If your only tool to achieve the guarantees is a limit system, then 
yes, the equation yields the correct results.  But given that it yields 
such inferior results, I think we need to look for a more involved solution.


I think the limits method fits cases where it is difficult to evict a 
resource (say, disk quotas -- if you want to guarantee 10% of space to 
cgroups 1, you must limit all others to 90%).  But for processor usage, 
you can evict a cgroup instantly, so nothing prevents a cgroup from 
consuming all available resources as long as others do not contend for them.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2009-06-05 08:16:21]:

 Balbir Singh wrote:



 How? It works out fine in my calculation:

 50 + 40 for G2 and G3, make sure that G1 gets 10%, since others are
 limited to 90%
 50 + 40 for G1 and G3, make sure that G2 gets 10%, since others are
 limited to 90%
 50 + 50 for G1 and G2, make sure that G3 gets 0%, since others are
 limited to 100%
 
 It's fine in that it satisfies the guarantees, but it is deeply   
 suboptimal.  If I ran a cpu hog in the first group, while the other 
 two  were idle, it would be limited to 50% cpu.  On the other hand, 
 if it  consumed all 100% cpu it would still satisfy the guarantees 
 (as the  other groups are idle).

 The result is that in such a situation, wall clock time would double  
 even though cpu resources are available.
 

 But then there is no other way to make a *guarantee*, guarantees come
 at a cost of idling resources, no? Can you show me any other
 combination that will provide the guarantee and without idling the
 system for the specified guarantees?
   

 Suppose in my example cgroup 1 consumed 100% of the cpu resources and  
 cgroup 2 and 3 were completely idle.  All of the guarantees are met (if  
 cgroup 2 is idle, there's no need to give it the 10% cpu time it is  
 guaranteed).

 If  your only tool to achieve the guarantees is a limit system, then  
 yes, the equation yields the correct results.  But given that it yields  
 such inferior results, I think we need to look for a more involved 
 solution.

 I think the limits method fits cases where it is difficult to evict a  
 resource (say, disk quotas -- if you want to guarantee 10% of space to  
 cgroups 1, you must limit all others to 90%).  But for processor usage,  
 you can evict a cgroup instantly, so nothing prevents a cgroup from  
 consuming all available resources as long as others do not contend for 
 them.

Avi,

Could you look at my newer email and comment, where I've mentioned
that I see your concern and discussed a design point. We could
probably take this discussion forward from there?

-- 
Balbir


Re: [RFC] CPU hard limits

2009-06-04 Thread Avi Kivity

Balbir Singh wrote:

But then there is no other way to make a *guarantee*, guarantees come
at a cost of idling resources, no? Can you show me any other
combination that will provide the guarantee and without idling the
system for the specified guarantees?



OK, I see part of your concern, but I think we could do some
optimizations during design. For example if all groups have reached
their hard limit and the system is idle, should we start a new hard
limit interval and restart, so that idleness can be removed. Would
that be an acceptable design point?


I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a 
cpu hog running in each group, how would the algorithm divide resources?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [RFC] CPU hard limits

2009-06-04 Thread Balbir Singh
* Avi Kivity a...@redhat.com [2009-06-05 08:21:43]:

 Balbir Singh wrote:
 But then there is no other way to make a *guarantee*, guarantees come
 at a cost of idling resources, no? Can you show me any other
 combination that will provide the guarantee and without idling the
 system for the specified guarantees?
 

 OK, I see part of your concern, but I think we could do some
 optimizations during design. For example if all groups have reached
 their hard limit and the system is idle, should we start a new hard
 limit interval and restart, so that idleness can be removed. Would
 that be an acceptable design point?

 I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a  
 cpu hog running in each group, how would the algorithm divide resources?


As per the matrix calculation, but as soon as we reach an idle point,
we redistribute the b/w and start a new quantum so to speak, where all
groups are charged up to their hard limits.

For your question, if there is a CPU hog running, it would be as per
the matrix calculation, since the system has no idle point during the
bandwidth period.

-- 
Balbir


Re: [RFC PATCH v2 00/19] virtual-bus

2009-06-04 Thread Paul E. McKenney
On Fri, Jun 05, 2009 at 02:25:01PM +0930, Rusty Russell wrote:
 On Fri, 5 Jun 2009 04:19:17 am Gregory Haskins wrote:
  Avi Kivity wrote:
   Gregory Haskins wrote:
   One idea is similar to signalfd() or eventfd()
 
  And thus the kvm-eventfd (irqfd/iosignalfd) interface project was born.
  ;)
 
 The lguest patch queue already has such an interface :)  And I have a
 partially complete in-kernel virtio_pci patch with the same trick.
 
 I switched from kernel created eventfd to userspace passes in eventfd
 after a while though; it lets you connect multiple virtqueues to a single fd
 if you want.
 
 Combined with a minor change to allow any process with access to the lguest fd
 to queue interrupts, this allowed lguest to move to a thread-per-virtqueue
 model which was a significant speedup as well as nice code reduction.
 
 Here's the relevant kernel patch for reading.
 
 Thanks!
 Rusty.
 
 lguest: use eventfds for device notification
 
 Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
 an address: the main Launcher process returns with this address, and figures
 out what device to run.
 
 A far nicer model is to let processes bind an eventfd to an address: if we
 find one, we simply signal the eventfd.

A couple of (probably misguided) RCU questions/suggestions interspersed.

 Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 Cc: Davide Libenzi davi...@xmailserver.org
 ---
  drivers/lguest/Kconfig  |2 -
  drivers/lguest/core.c   |8 ++--
  drivers/lguest/lg.h |9 
  drivers/lguest/lguest_user.c|   73 
 
  include/linux/lguest_launcher.h |1 
  5 files changed, 89 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/lguest/Kconfig b/drivers/lguest/Kconfig
 --- a/drivers/lguest/Kconfig
 +++ b/drivers/lguest/Kconfig
 @@ -1,6 +1,6 @@
  config LGUEST
   tristate "Linux hypervisor example code"
 - depends on X86_32 && EXPERIMENTAL && !X86_PAE && FUTEX
 + depends on X86_32 && EXPERIMENTAL && !X86_PAE && EVENTFD
   select HVC_DRIVER
   ---help---
 This is a very simple module which allows you to run
 diff --git a/drivers/lguest/core.c b/drivers/lguest/core.c
 --- a/drivers/lguest/core.c
 +++ b/drivers/lguest/core.c
 @@ -198,9 +198,11 @@ int run_guest(struct lg_cpu *cpu, unsign
   /* It's possible the Guest did a NOTIFY hypercall to the
* Launcher, in which case we return from the read() now. */
   if (cpu->pending_notify) {
 - if (put_user(cpu->pending_notify, user))
 - return -EFAULT;
 - return sizeof(cpu->pending_notify);
 + if (!send_notify_to_eventfd(cpu)) {
 + if (put_user(cpu->pending_notify, user))
 + return -EFAULT;
 + return sizeof(cpu->pending_notify);
 + }
   }
 
   /* Check for signals */
 diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
 --- a/drivers/lguest/lg.h
 +++ b/drivers/lguest/lg.h
 @@ -82,6 +82,11 @@ struct lg_cpu {
   struct lg_cpu_arch arch;
  };
 
 +struct lg_eventfds {
 + unsigned long addr;
 + struct file *event;
 +};
 +
  /* The private info the thread maintains about the guest. */
  struct lguest
  {
 @@ -102,6 +107,9 @@ struct lguest
   unsigned int stack_pages;
   u32 tsc_khz;
 
 + unsigned int num_eventfds;
 + struct lg_eventfds *eventfds;
 +
   /* Dead? */
   const char *dead;
  };
 @@ -152,6 +160,7 @@ void setup_default_idt_entries(struct lg
  void copy_traps(const struct lg_cpu *cpu, struct desc_struct *idt,
   const unsigned long *def);
  void guest_set_clockevent(struct lg_cpu *cpu, unsigned long delta);
 +bool send_notify_to_eventfd(struct lg_cpu *cpu);
  void init_clockdev(struct lg_cpu *cpu);
  bool check_syscall_vector(struct lguest *lg);
  int init_interrupts(void);
 diff --git a/drivers/lguest/lguest_user.c b/drivers/lguest/lguest_user.c
 --- a/drivers/lguest/lguest_user.c
 +++ b/drivers/lguest/lguest_user.c
 @@ -7,6 +7,8 @@
  #include <linux/miscdevice.h>
  #include <linux/fs.h>
  #include <linux/sched.h>
 +#include <linux/eventfd.h>
 +#include <linux/file.h>
  #include "lg.h"
 
  /*L:055 When something happens, the Waker process needs a way to stop the
 @@ -35,6 +37,70 @@ static int break_guest_out(struct lg_cpu
   }
  }
 
 +bool send_notify_to_eventfd(struct lg_cpu *cpu)
 +{
 + unsigned int i;
 +
 + /* lg->eventfds is RCU-protected */
 + preempt_disable();

Suggest changing to rcu_read_lock() to match the synchronize_rcu().

 + for (i = 0; i < cpu->lg->num_eventfds; i++) {
 + if (cpu->lg->eventfds[i].addr == cpu->pending_notify) {
 + eventfd_signal(cpu->lg->eventfds[i].event, 1);

Shouldn't this be something like the following?

p = rcu_dereference(cpu->lg->eventfds);
if 

Re: [RFC] CPU hard limits

2009-06-04 Thread Bharata B Rao
On Fri, Jun 05, 2009 at 01:27:55PM +0800, Balbir Singh wrote:
 * Avi Kivity a...@redhat.com [2009-06-05 08:21:43]:
 
  Balbir Singh wrote:
  But then there is no other way to make a *guarantee*, guarantees come
  at a cost of idling resources, no? Can you show me any other
  combination that will provide the guarantee and without idling the
  system for the specified guarantees?
  
 
  OK, I see part of your concern, but I think we could do some
  optimizations during design. For example if all groups have reached
  their hard limit and the system is idle, should we start a new hard
  limit interval and restart, so that idleness can be removed. Would
  that be an acceptable design point?
 
  I think so.  Given guarantees G1..Gn (0 <= Gi <= 1; sum(Gi) <= 1), and a  
  cpu hog running in each group, how would the algorithm divide resources?
 
 
 As per the matrix calculation, but as soon as we reach an idle point,
 we redistribute the b/w and start a new quantum so to speak, where all
 groups are charged up to their hard limits.

But could there be client models where you are required to strictly
adhere to the limit within the bandwidth period and not provide more (by
advancing the period) in the presence of idle cycles?

Regards,
Bharata.


Re: KVM on Debian

2009-06-04 Thread Michael Tokarev

Aaron Clausen wrote:
[]

is too old to support this.  Is there a reasonably safe way of
upgrading to one of the newer versions of KVM on this server?


Can't say about safe, but you can grab my .debs, which I use here
on a bunch of machines, from http://www.corpit.ru/debian/tls/kvm/ -
both binaries and sources.  To make them safer for you, you
can download the .dsc and .diff.gz, examine the contents and build
them yourself.

/mjt



[COMMIT master] Merge branch 'master' of git://git.sv.gnu.org/qemu

2009-06-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

* 'master' of git://git.sv.gnu.org/qemu: (40 commits)
  Update maintainer list.
  Install keymaps from new location
  vvfat: one more missing BlockDriver C99 initializer conversion
  Move keymaps into pc-bios
  kvm: Mark full address range dirty on live migration start
  Add detection of pthread library name
  User networking: Show active connections
  User Networking: Enable removal of redirections
  Allow monitor interaction when using migrate -exec
  fully split aio_pool from BlockDriver
  qcow: add qcow_aio_setup helper
  raw-posix: fix hdev_create
  fix raw_pread_aligned return value
  VNC: Fix memory allocation (wrong structure size).
  Drop bdrv_create2
  qcow2: Update multiple refcounts at once
  qcow2: Refactor update_refcount
  qcow/qcow2: Drop synchronous qcow_write()
  e1000: Ignore reset command
  Fix output of uninitialized strings
  ...

Signed-off-by: Avi Kivity a...@redhat.com
--
To unsubscribe from this list: send the line "unsubscribe kvm-commits" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[COMMIT master] kvm: Add irqfd support

2009-06-04 Thread Avi Kivity
From: Gregory Haskins ghask...@novell.com

irqfd lets you create an eventfd-based file descriptor to inject interrupts
into a kvm guest.  We associate one gsi per fd for fine-grained routing.

Signed-off-by: Gregory Haskins ghask...@novell.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/libkvm-all.c b/libkvm-all.c
index 1668e32..6a684a5 100644
--- a/libkvm-all.c
+++ b/libkvm-all.c
@@ -1481,3 +1481,52 @@ int kvm_assign_set_msix_entry(kvm_context_t kvm,
 return ret;
 }
 #endif
+
+#if defined(KVM_CAP_IRQFD) && defined(CONFIG_eventfd)
+
+#include <sys/eventfd.h>
+
+static int _kvm_irqfd(kvm_context_t kvm, int fd, int gsi, int flags)
+{
+   int r;
+   struct kvm_irqfd data = {
+       .fd    = fd,
+       .gsi   = gsi,
+       .flags = flags,
+   };
+
+   r = ioctl(kvm->vm_fd, KVM_IRQFD, &data);
+   if (r == -1)
+   r = -errno;
+   return r;
+}
+
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags)
+{
+   int r;
+   int fd;
+
+   if (!kvm_check_extension(kvm, KVM_CAP_IRQFD))
+   return -ENOENT;
+
+   fd = eventfd(0, 0);
+   if (fd < 0)
+   return -errno;
+
+   r = _kvm_irqfd(kvm, fd, gsi, 0);
+   if (r < 0) {
+   close(fd);
+   return -errno;
+   }
+
+   return fd;
+}
+
+#else /* KVM_CAP_IRQFD */
+
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags)
+{
+   return -ENOSYS;
+}
+
+#endif /* KVM_CAP_IRQFD */
diff --git a/libkvm-all.h b/libkvm-all.h
index 4821a1e..aca8ed6 100644
--- a/libkvm-all.h
+++ b/libkvm-all.h
@@ -856,6 +856,20 @@ int kvm_commit_irq_routes(kvm_context_t kvm);
  */
 int kvm_get_irq_route_gsi(kvm_context_t kvm);
 
+/*!
+ * \brief Create a file descriptor for injecting interrupts
+ *
+ * Creates an eventfd based file-descriptor that maps to a specific GSI
+ * in the guest.  eventfd compliant signaling (write() from userspace, or
+ * eventfd_signal() from kernelspace) will cause the GSI to inject
+ * itself into the guest at the next available window.
+ *
+ * \param kvm Pointer to the current kvm_context
+ * \param gsi GSI to assign to this fd
+ * \param flags reserved, must be zero
+ */
+int kvm_irqfd(kvm_context_t kvm, int gsi, int flags);
+
 #ifdef KVM_CAP_DEVICE_MSIX
 int kvm_assign_set_msix_nr(kvm_context_t kvm,
   struct kvm_assigned_msix_nr *msix_nr);


[COMMIT master] Do not use cpu_index in interface between libkvm and qemu

2009-06-04 Thread Avi Kivity
From: Gleb Natapov g...@redhat.com

On vcpu creation, a cookie is returned which is used in future communication.

Signed-off-by: Gleb Natapov g...@redhat.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/cpu-defs.h b/cpu-defs.h
index 1e071e7..5f541e0 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -147,6 +147,7 @@ struct KVMCPUState {
 int stop;
 int stopped;
 int created;
+void *vcpu_ctx;
 struct qemu_work_item *queued_work_first, *queued_work_last;
 };
 
diff --git a/hw/apic.c b/hw/apic.c
index 86aa6b6..c5d97b2 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -833,7 +833,7 @@ static void kvm_kernel_lapic_save_to_user(APICState *s)
    struct kvm_lapic_state *kapic = &apic;
 int i, v;
 
-kvm_get_lapic(kvm_context, s->cpu_env->cpu_index, kapic);
+kvm_get_lapic(s->cpu_env->kvm_cpu_state.vcpu_ctx, kapic);
 
    s->id = kapic_reg(kapic, 0x2) >> 24;
    s->tpr = kapic_reg(kapic, 0x8);
@@ -886,7 +886,7 @@ static void kvm_kernel_lapic_load_from_user(APICState *s)
    kapic_set_reg(klapic, 0x38, s->initial_count);
    kapic_set_reg(klapic, 0x3e, s->divide_conf);
 
-kvm_set_lapic(kvm_context, s->cpu_env->cpu_index, klapic);
+kvm_set_lapic(s->cpu_env->kvm_cpu_state.vcpu_ctx, klapic);
 }
 
 #endif
diff --git a/kvm-tpr-opt.c b/kvm-tpr-opt.c
index 8f7e1e5..f7b6f3b 100644
--- a/kvm-tpr-opt.c
+++ b/kvm-tpr-opt.c
@@ -70,7 +70,7 @@ static uint8_t read_byte_virt(CPUState *env, target_ulong virt)
 {
 struct kvm_sregs sregs;
 
-kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+kvm_get_sregs(env->kvm_cpu_state.vcpu_ctx, &sregs);
    return ldub_phys(map_addr(&sregs, virt, NULL));
 }
 
@@ -78,7 +78,7 @@ static void write_byte_virt(CPUState *env, target_ulong virt, uint8_t b)
 {
 struct kvm_sregs sregs;
 
-kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+kvm_get_sregs(env->kvm_cpu_state.vcpu_ctx, &sregs);
    stb_phys(map_addr(&sregs, virt, NULL), b);
 }
 
@@ -86,7 +86,7 @@ static __u64 kvm_rsp_read(CPUState *env)
 {
 struct kvm_regs regs;
 
-kvm_get_regs(kvm_context, env->cpu_index, &regs);
+kvm_get_regs(env->kvm_cpu_state.vcpu_ctx, &regs);
 return regs.rsp;
 }
 
@@ -192,7 +192,7 @@ static int bios_is_mapped(CPUState *env, uint64_t rip)
 if (bios_enabled)
return 1;
 
-kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+kvm_get_sregs(env->kvm_cpu_state.vcpu_ctx, &sregs);
 
    probe = (rip & 0xf0000000u) + 0xe0000;
    phys = map_addr(&sregs, probe, &perms);
@@ -240,7 +240,7 @@ static int enable_vapic(CPUState *env)
    if (pcr_cpu < 0)
return 0;
 
-kvm_enable_vapic(kvm_context, env->cpu_index, vapic_phys + (pcr_cpu << 7));
+kvm_enable_vapic(env->kvm_cpu_state.vcpu_ctx, vapic_phys + (pcr_cpu << 7));
    cpu_physical_memory_rw(vapic_phys + (pcr_cpu << 7) + 4, &one, 1, 1);
 bios_enabled = 1;
 
@@ -313,7 +313,7 @@ void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write)
 
 void kvm_tpr_vcpu_start(CPUState *env)
 {
-kvm_enable_tpr_access_reporting(kvm_context, env->cpu_index);
+kvm_enable_tpr_access_reporting(env->kvm_cpu_state.vcpu_ctx);
 if (bios_enabled)
enable_vapic(env);
 }
@@ -363,7 +363,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 struct kvm_sregs sregs;
 uint32_t rip;
 
-kvm_get_regs(kvm_context, env->cpu_index, &regs);
+kvm_get_regs(env->kvm_cpu_state.vcpu_ctx, &regs);
 rip = regs.rip - 2;
 write_byte_virt(env, rip, 0x66);
 write_byte_virt(env, rip + 1, 0x90);
@@ -371,7 +371,7 @@ static void vtpr_ioport_write(void *opaque, uint32_t addr, uint32_t val)
return;
 if (!bios_is_mapped(env, rip))
	printf("bios not mapped?\n");
-kvm_get_sregs(kvm_context, env->cpu_index, &sregs);
+kvm_get_sregs(env->kvm_cpu_state.vcpu_ctx, &sregs);
    for (addr = 0xfffff000u; addr >= 0x80000000u; addr -= 4096)
	if (map_addr(&sregs, addr, NULL) == 0xfee00000u) {
real_tpr = addr + 0x80;
diff --git a/libkvm-all.c b/libkvm-all.c
index 6a684a5..c45b058 100644
--- a/libkvm-all.c
+++ b/libkvm-all.c
@@ -356,10 +356,12 @@ kvm_context_t kvm_init(struct kvm_callbacks *callbacks,
 
 void kvm_finalize(kvm_context_t kvm)
 {
+   /* FIXME
	if (kvm->vcpu_fd[0] != -1)
		close(kvm->vcpu_fd[0]);
	if (kvm->vm_fd != -1)
		close(kvm->vm_fd);
+   */
	close(kvm->fd);
free(kvm);
 }
@@ -374,32 +376,43 @@ void kvm_disable_pit_creation(kvm_context_t kvm)
	kvm->no_pit_creation = 1;
 }
 
-int kvm_create_vcpu(kvm_context_t kvm, int slot)
+kvm_vcpu_context_t kvm_create_vcpu(kvm_context_t kvm, int id)
 {
long mmap_size;
int r;
+   kvm_vcpu_context_t vcpu_ctx = malloc(sizeof(struct kvm_vcpu_context));
 
-   r = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, slot);
+   if (!vcpu_ctx) {
+   errno = ENOMEM;
+   return NULL;
+   }
+
+   vcpu_ctx->kvm = kvm;
+   vcpu_ctx->id = id;
+
+   r = ioctl(kvm->vm_fd, KVM_CREATE_VCPU, id);
if (r == 

[COMMIT master] Update source link

2009-06-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/linux-2.6 b/linux-2.6
index d789c98..28ddf0a 160000
--- a/linux-2.6
+++ b/linux-2.6
@@ -1 +1 @@
-Subproject commit d789c98af00b1bd62f91a241586546072a41175b
+Subproject commit 28ddf0aebbf546e56efd1951725d5457ce1ebf98


[COMMIT master] Backport srcu implementation

2009-06-04 Thread Avi Kivity
From: Avi Kivity a...@redhat.com

Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/external-module-compat-comm.h b/external-module-compat-comm.h
index 9801441..5b96c46 100644
--- a/external-module-compat-comm.h
+++ b/external-module-compat-comm.h
@@ -812,3 +812,16 @@ static inline struct file *eventfd_fget(int fd)
 }
 
 #endif
+
+/* srcu was born in 2.6.19 */
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,19)
+
+#define kvm_init_srcu_struct init_srcu_struct
+#define kvm_cleanup_srcu_struct cleanup_srcu_struct
+#define kvm_srcu_read_lock srcu_read_lock
+#define kvm_srcu_read_unlock srcu_read_unlock
+#define kvm_synchronize_srcu synchronize_srcu
+#define kvm_srcu_batches_completed srcu_batches_completed
+
+#endif
diff --git a/include-compat/linux/srcu.h b/include-compat/linux/srcu.h
new file mode 100644
index 0000000..0d476be
--- /dev/null
+++ b/include-compat/linux/srcu.h
@@ -0,0 +1,53 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney paul...@us.ibm.com
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU/ *.txt
+ *
+ */
+
+#ifndef _LINUX_SRCU_H
+#define _LINUX_SRCU_H
+
+struct srcu_struct_array {
+   int c[2];
+};
+
+struct srcu_struct {
+   int completed;
+   struct srcu_struct_array *per_cpu_ref;
+   struct mutex mutex;
+};
+
+#ifndef CONFIG_PREEMPT
+#define srcu_barrier() barrier()
+#else /* #ifndef CONFIG_PREEMPT */
+#define srcu_barrier()
+#endif /* #else #ifndef CONFIG_PREEMPT */
+
+int kvm_init_srcu_struct(struct srcu_struct *sp);
+void kvm_cleanup_srcu_struct(struct srcu_struct *sp);
+int kvm_srcu_read_lock(struct srcu_struct *sp) __acquires(sp);
+void kvm_srcu_read_unlock(struct srcu_struct *sp, int idx) __releases(sp);
+void kvm_synchronize_srcu(struct srcu_struct *sp);
+long kvm_srcu_batches_completed(struct srcu_struct *sp);
+
+#endif
diff --git a/linux-2.6 b/linux-2.6
index 6790e2f..d789c98 160000
--- a/linux-2.6
+++ b/linux-2.6
@@ -1 +1 @@
-Subproject commit 6790e2f843585d5232ff7f724e6baf9f32cd08cd
+Subproject commit d789c98af00b1bd62f91a241586546072a41175b
diff --git a/srcu.c b/srcu.c
new file mode 100644
index 0000000..3243adf
--- /dev/null
+++ b/srcu.c
@@ -0,0 +1,263 @@
+/*
+ * Sleepable Read-Copy Update mechanism for mutual exclusion.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney paul...@us.ibm.com
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU/ *.txt
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/srcu.h>
+
+#undef kvm_init_srcu_struct
+#undef kvm_cleanup_srcu_struct
+#undef kvm_srcu_read_lock
+#undef kvm_srcu_read_unlock
+#undef kvm_synchronize_srcu
+#undef kvm_srcu_batches_completed
+/**
+ * init_srcu_struct - initialize a sleep-RCU structure
+ * @sp: structure to initialize.
+ *
+ * Must invoke this on a given srcu_struct before passing that srcu_struct
+ * to any other function.  Each srcu_struct represents a separate domain
+ * of SRCU protection.
+ */
+int kvm_init_srcu_struct(struct srcu_struct *sp)
+{
+   sp->completed = 0;
+   mutex_init(&sp->mutex);
+   sp->per_cpu_ref = alloc_percpu(struct srcu_struct_array);
+   return (sp->per_cpu_ref ? 0 : -ENOMEM);
+}
+
+/*
+ * srcu_readers_active_idx -- returns approximate 

[COMMIT master] KVM: use POLLHUP to close an irqfd instead of an explicit ioctl

2009-06-04 Thread Avi Kivity
From: Gregory Haskins ghask...@novell.com

Assigning an irqfd object to a kvm object creates a relationship that we
currently manage by having the kvm object acquire/hold a file* reference to
the underlying eventfd.  The lifetime of these objects is properly maintained
by decoupling the two objects whenever the irqfd is closed or kvm is closed,
whichever comes first.

However, the irqfd close method is less than ideal since it requires two
system calls to complete (one for ioctl(kvmfd, IRQFD_DEASSIGN), the other for
close(eventfd)).  This dual-call approach was utilized because there was no
notification mechanism on the eventfd side at the time irqfd was implemented.

Recently, Davide proposed a patch to send a POLLHUP wakeup whenever an
eventfd is about to close.  So we eliminate the IRQFD_DEASSIGN ioctl (*)
vector in favor of sensing the deassign automatically when the fd is closed.
The resulting code is slightly more complex as a result since we need to
allow either side to sever the relationship independently.  We utilize SRCU
to guarantee stable concurrent access to the KVM pointer without adding
additional atomic operations in the fast path.

[avi: add missing #include]

Signed-off-by: Gregory Haskins ghask...@novell.com
CC: Davide Libenzi davi...@xmailserver.org
CC: Michael S. Tsirkin m...@redhat.com
CC: Paul E. McKenney paul...@linux.vnet.ibm.com
Signed-off-by: Avi Kivity a...@redhat.com

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 632a856..29b62cc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -482,8 +482,6 @@ struct kvm_x86_mce {
 };
 #endif
 
-#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
-
 struct kvm_irqfd {
__u32 fd;
__u32 gsi;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3f2ea1..cdd3fce 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -28,6 +28,7 @@
 #include <linux/file.h>
 #include <linux/list.h>
 #include <linux/eventfd.h>
+#include <linux/srcu.h>
 
 /*
  * 
@@ -37,39 +38,92 @@
  * 
  */
 struct _irqfd {
+   struct mutex  lock;
+   struct srcu_structsrcu;
struct kvm   *kvm;
int   gsi;
-   struct file  *file;
struct list_head  list;
poll_tablept;
wait_queue_head_t*wqh;
wait_queue_t  wait;
-   struct work_structwork;
+   struct work_structinject;
 };
 
 static void
 irqfd_inject(struct work_struct *work)
 {
-   struct _irqfd *irqfd = container_of(work, struct _irqfd, work);
-   struct kvm *kvm = irqfd->kvm;
+   struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+   struct kvm *kvm;
+   int idx;
+
+   idx = srcu_read_lock(&irqfd->srcu);
+
+   kvm = rcu_dereference(irqfd->kvm);
+   if (kvm) {
+   mutex_lock(&kvm->lock);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
+   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   mutex_unlock(&kvm->lock);
+   }
+
+   srcu_read_unlock(&irqfd->srcu, idx);
+}
+
+static void
+irqfd_disconnect(struct _irqfd *irqfd)
+{
+   struct kvm *kvm;
+
+   mutex_lock(&irqfd->lock);
+
+   kvm = rcu_dereference(irqfd->kvm);
+   rcu_assign_pointer(irqfd->kvm, NULL);
+
+   mutex_unlock(&irqfd->lock);
+
+   if (!kvm)
+   return;
 
	mutex_lock(&kvm->lock);
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
-   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
+   list_del(&irqfd->list);
	mutex_unlock(&kvm->lock);
+
+   /*
+* It is important to not drop the kvm reference until the next grace
+* period because there might be lockless references in flight up
+* until then
+*/
+   synchronize_srcu(&irqfd->srcu);
+   kvm_put_kvm(kvm);
 }
 
 static int
 irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
 {
struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+   unsigned long flags = (unsigned long)key;
 
-   /*
-* The wake_up is called with interrupts disabled.  Therefore we need
-* to defer the IRQ injection until later since we need to acquire the
-* kvm->lock to do so.
-*/
-   schedule_work(&irqfd->work);
+   if (flags & POLLIN)
+   /*
+* The POLLIN wake_up is called with interrupts disabled.
+* Therefore we need to defer the IRQ injection until later
+* since we need to acquire the kvm->lock to do so.
+*/
+   schedule_work(&irqfd->inject);
+
+   if (flags & POLLHUP) {
+   /*
+* The POLLHUP is called unlocked, so it theoretically should
+* be safe to