Re: [PATCH] expose kvmclock upper msr set.

2011-02-27 Thread Avi Kivity

On 02/25/2011 03:11 PM, Glauber Costa wrote:

On Thu, 2011-02-24 at 18:54 -0500, Steven Rostedt wrote:
  On Thu, 2011-02-24 at 20:48 -0300, Glauber Costa wrote:
On Thu, 2011-02-24 at 18:24 -0500, Steven Rostedt wrote:
  On Wed, Feb 23, 2011 at 12:44:14PM -0500, Glauber Costa wrote:
We've been supporting kvmclock MSRs in the 0x4b564d00-0x4b564dff 
range
for a while now, but we're not exposing it yet, meaning nobody is 
using it.
This simple patch takes care of that.

  Is nobody using it because it was never exposed? Or because nobody ever
  needed it, and we don't care (thus don't bother supporting it).

The former. Our guest kernels will only rely on features that are
exposed, meaning that if they are not, the guest kernel will never know
it is available.
  

  Might want to rephrase your change log, as to me it sounds like nobody
  is using it because it is not needed. Adding the "Our guest..." part of
  your response to the change log will clear that up.

  Thanks,
I can do that, sure.

OTOH, I know Avi has changed changelogs for clarity a couple of times
before, so if there is no code change needed, maybe he thinks it is
easier to rephrase it before picking it up.

Avi?



If that's the only change, no problem.



Re: [RFC PATCH 2/3] kvm: Allow memory slot array to grow on demand

2011-02-27 Thread Avi Kivity

On 02/24/2011 08:08 PM, Alex Williamson wrote:

@@ -207,7 +206,7 @@ struct kvm_mmu_page {
* One bit set per slot which has memory
* in this shadow page.
*/
-   DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
+   unsigned long *slot_bitmap;

  What about

   union {
 DECLARE_BITMAP(direct_slot_bitmap, BITS_PER_LONG);
 unsigned long *indirect_slot_bitmap;
   };

  to make the hackery below more explicit?

Yeah, it needs something to make the hackery go down easier.  I was
actually thinking about:

unsigned long *slot_bitmap;
DECLARE_BITMAP(direct_slot_bitmap, BITS_PER_LONG);

Where we'd then just set:

slot_bitmap = direct_slot_bitmap;

It wastes 8 bytes, and pushes the cache a little harder, but still helps
the locality and makes the usage more consistent.


   unsigned long *sp_slot_bitmap(struct kvm_mmu_page *sp) { ... }

gives you the best of both worlds.
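
For illustration, a minimal sketch of what that accessor could look like
(the discriminator field here is an assumption for the sketch, not part
of the proposed patch):

    union {
            DECLARE_BITMAP(direct_slot_bitmap, BITS_PER_LONG);
            unsigned long *indirect_slot_bitmap;
    };
    bool slot_bitmap_indirect;      /* hypothetical discriminator */

    static unsigned long *sp_slot_bitmap(struct kvm_mmu_page *sp)
    {
            /* Small slot counts live inline; larger ones are allocated. */
            return sp->slot_bitmap_indirect ? sp->indirect_slot_bitmap
                                            : sp->direct_slot_bitmap;
    }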



  We don't support failing kvm_mmu_get_page().  See
  mmu_memory_cache_alloc() and mmu_topup_memory_caches().

Hmm, apparently my search stopped at __direct_map() calling
kvm_mmu_get_page() and handling an error.


That's dead code (was there from the very first commit into mmu.c).


  
r = -ENOMEM;
-   slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+
+   if (mem->slot >= kvm->memslots->nmemslots) {
+   nmemslots = mem->slot + 1;
+   flush = true;

  Isn't flush here a little too aggressive?  Shouldn't we flush only if we
  cross the BITS_PER_LONG threshold?

Perhaps, but is that overly exploiting our knowledge about the bitmap
implementation?  I figured it's better to err on the aggressive side than
the lazy one, since this is a rare event already.
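
For readers following the diff above, the grow-on-demand step amounts to
roughly this sketch (allocation layout and error handling are
assumptions, not the exact patch):

    if (mem->slot >= kvm->memslots->nmemslots) {
            int nmemslots = mem->slot + 1;

            /* Allocate a larger slot array and copy the old entries
             * over; the old array is retired after the flush. */
            slots = kzalloc(sizeof(struct kvm_memslots) +
                            nmemslots * sizeof(struct kvm_memory_slot),
                            GFP_KERNEL);
            if (!slots)
                    goto out;
            memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots) +
                   kvm->memslots->nmemslots * sizeof(struct kvm_memory_slot));
            slots->nmemslots = nmemslots;
            flush = true;
    }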


I'm worried about the screen-clearing using the vga window at 
0xa[08]000.  If that works without too much flushing, then we're fine.


On second thoughts we're likely fine even if we do flush, since it's in 
a tight loop so it takes very little work to reestablish the dropped sptes.



@@ -1832,6 +1854,8 @@ static long kvm_vm_ioctl(struct file *filp,
sizeof kvm_userspace_mem))
goto out;
  
+   kvm_userspace_mem.slot += KVM_PRIVATE_MEM_SLOTS;
+

  Slightly uneasy about this, but no real objection.

If you have better ideas, let me know.  This reminds me to ask about
this chunk:

@@ -671,7 +674,7 @@ int __kvm_set_memory_region(struct kvm *kvm,

 /* Check for overlaps */
 r = -EEXIST;
-   for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
+   for (i = KVM_PRIVATE_MEM_SLOTS; i < kvm->memslots->nmemslots; ++i) {
 struct kvm_memory_slot *s = &kvm->memslots->memslots[i];

 if (s == memslot || !s->npages)

I kept the same behavior as previous, but it highlights that we're not
checking for overlaps between private slots and anything else.  Existing
bug?  Thanks,


Yes, possibly serious.  Who knows what happens if we create a page using 
one slot and remove it via another?


Let's go write some Visual Basic.



Re: [RFC PATCH 0/3] Weight-balanced binary tree + KVM growable memory slots using wbtree

2011-02-27 Thread Avi Kivity

On 02/24/2011 07:35 PM, Alex Williamson wrote:

On Thu, 2011-02-24 at 12:06 +0200, Avi Kivity wrote:
  On 02/23/2011 09:28 PM, Alex Williamson wrote:
I had forgotten about <1M mem, so actually the slot configuration was:

0: <1M
1: 1M - 3.5G
2: 4G+

I stacked the deck in favor of the static array (0: 4G+, 1: 1M-3.5G, 2:
<1M), and got these kernbench results:
  
        |  base (stdev)   | reorder (stdev)| wbtree (stdev) |
--------+-----------------+----------------+----------------+
Elapsed |  42.809 (0.19)  |  42.160 (0.22) |  42.305 (0.23) |
User| 115.709 (0.22)  | 114.358 (0.40) | 114.720 (0.31) |
System  |  41.605 (0.14)  |  40.741 (0.22) |  40.924 (0.20) |
%cpu|   366.9 (1.45)  |   367.4 (1.17) |   367.6 (1.51) |
context |  7272.3 (68.6)  |  7248.1 (89.7) |  7249.5 (97.8) |
sleeps  | 14826.2 (110.6) | 14780.7 (86.9) | 14798.5 (63.0) |
  
So, wbtree is only slightly behind reordering, and the standard
deviation suggests the runs are mostly within the noise of each other.
Thanks,

  Doesn't this indicate we should use reordering, instead of a new data
  structure?

The original problem that brought this on was scaling.  The re-ordered
array still has O(N) scaling while the tree should have ~O(logN) (note
that it currently doesn't because it needs a compaction algorithm added
after insert and remove).  So yes, it's hard to beat the results of a
test that hammers on the first couple entries of a sorted array, but I
think the tree gives better-than-current performance that remains
predictable as the slot count scales.


Scaling doesn't matter, only actual performance.  Even a guest with 512 
slots would still hammer only on the first few slots, since these will 
contain the bulk of memory.
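
Concretely, the reordering approach boils down to a lookup like this
(illustrative sketch, not the actual kvm code):

    /* Slots kept sorted by size, largest first: the big RAM slots
     * match within the first iteration or two, no matter how many
     * small mmio slots sit behind them. */
    static struct kvm_memory_slot *find_slot(struct kvm_memslots *slots,
                                             gfn_t gfn)
    {
            int i;

            for (i = 0; i < slots->nmemslots; ++i) {
                    struct kvm_memory_slot *s = &slots->memslots[i];

                    if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
                            return s;
            }
            return NULL;
    }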



If we knew when we were searching for which type of data, it would
perhaps be nice if we could use a sorted array for guest memory (since
it's nicely bounded into a small number of large chunks), and a tree for
mmio (where we expect the scaling to be a factor).  Thanks,


We have three types of memory:

- RAM - a few large slots
- mapped mmio (for device assignment) - possible many small slots
- non-mapped mmio (for emulated devices) - no slots

The first two are handled in exactly the same way - they're just memory 
slots.  We expect a lot more hits into the RAM slots, since they're much 
bigger.  But by far the majority of faults will be for the third 
category - mapped memory will be hit once per page, then handled by 
hardware until Linux memory management does something about the page, 
which should hopefully be rare (with device assignment, rare == never, 
since those pages are pinned).


Therefore our optimization priorities should be
- complete miss into the slot list
- hit into the RAM slots
- hit into the other slots (trailing far behind)

Of course worst-case performance matters.  For example, we might (not 
sure) be searching the list with the mmu spinlock held.


I think we still have a bit to go before we can justify the new data 
structure.




Re: EPT: Misconfiguration

2011-02-27 Thread Avi Kivity


Copying netdev: looks like memory corruption in the networking stack.

Archive link: http://www.spinics.net/lists/kvm/msg50651.html (for the 
attachment).


On 02/24/2011 11:15 PM, Ruben Kerkhof wrote:


  On Tue, Feb 15, 2011 at 18:16, Marcelo Tosatti <mtosa...@redhat.com> wrote:

  This and the others reported. So yes, it looks like something is corrupting
  memory. Ruben, you can try to boot with slub_debug=ZFPU kernel option.

Ok, there are now only 6 vms left on this host, and I've booted it
with the slub_debug=ZFPU option.
After a few hours, I got the following result:

2011-02-24T21:41:30.818496+01:00 phy005 kernel:
=
2011-02-24T21:41:30.818517+01:00 phy005 kernel: BUG kmalloc-2048 (Not
tainted): Object padding overwritten
2011-02-24T21:41:30.818523+01:00 phy005 kernel:
-
2011-02-24T21:41:30.818526+01:00 phy005 kernel:
2011-02-24T21:41:30.818530+01:00 phy005 kernel: INFO:
0x8806230752ca-0x8806230752cf. First byte 0x0 instead of 0x5a
2011-02-24T21:41:30.818534+01:00 phy005 kernel: INFO: Allocated in
__netdev_alloc_skb+0x34/0x51 age=2231 cpu=8 pid=0
2011-02-24T21:41:30.818537+01:00 phy005 kernel: INFO: Freed in
skb_release_data+0xc9/0xce age=2368 cpu=8 pid=2159
2011-02-24T21:41:30.818541+01:00 phy005 kernel: INFO: Slab
0xea00157a9880 objects=15 used=13 fp=0x8806230752d0
flags=0x404083
2011-02-24T21:41:30.818545+01:00 phy005 kernel: INFO: Object
0x880623074a88 @offset=19080 fp=0x8806230752d0

The rest of the output is attached since it's quite large.

Kind regards,

Ruben





Re: [PATCH] qemu-kvm: Fix non-PCI target build

2011-02-27 Thread Avi Kivity

On 02/23/2011 10:28 AM, Jan Kiszka wrote:

Replace obsolete qemu-kvm.h with kvm.h in pci.c and build that module
just like upstream does. This fixes non-x86 targets which have no PCI
support.


Thanks, applied to mainline and stable-0.14.



Re: [PATCH] expose kvmclock upper msr set.

2011-02-27 Thread Avi Kivity

On 02/23/2011 07:44 PM, Glauber Costa wrote:

We've been supporting kvmclock MSRs in the 0x4b564d00-0x4b564dff range
for a while now, but we're not exposing it yet, meaning nobody is using it.
This simple patch takes care of that.


We're exposing them via KVM_GET_SUPPORTED_CPUID, leaf KVM_CPUID_FEATURES.
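
For reference, userspace can check for the new MSR set along these lines
(hedged sketch; KVM_FEATURE_CLOCKSOURCE2 is the feature bit covering the
0x4b564d00 range in the kernel headers of this era, and error handling
is elided):

    #include <fcntl.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>
    #include <linux/kvm_para.h>

    static int has_new_kvmclock_msrs(int kvm_fd)  /* fd of /dev/kvm */
    {
            struct {
                    struct kvm_cpuid2 cpuid;
                    struct kvm_cpuid_entry2 entries[64];
            } data;
            unsigned i;

            memset(&data, 0, sizeof(data));
            data.cpuid.nent = 64;
            if (ioctl(kvm_fd, KVM_GET_SUPPORTED_CPUID, &data.cpuid) < 0)
                    return 0;
            for (i = 0; i < data.cpuid.nent; i++)
                    if (data.cpuid.entries[i].function == KVM_CPUID_FEATURES)
                            return !!(data.cpuid.entries[i].eax &
                                      (1 << KVM_FEATURE_CLOCKSOURCE2));
            return 0;
    }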



Re: [RESEND] [PATCH 1/1] qemu-kvm (device-assignment): detecting the pre-fetchable memory region.

2011-02-27 Thread Avi Kivity

On 02/26/2011 06:44 PM, Prasad Joshi wrote:

During device assignment the memory pre-fetchable flag was discarded
because IORESOURCE_PREFETCH was defined as 0x1000 when it should have
been 0x2000. The following small patch fixes the problem; please apply.



Applied, thanks.  Note this patch was somehow corrupted with 0xa0 
characters instead of good old spaces.  Please check your mail config.



Signed-off-by: Prasad Joshi <prasadjoshi...@gmail.com>
---
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index e5205cf..2cfa155 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -47,7 +47,7 @@
  #define IORESOURCE_MEM  0x0200
  #define IORESOURCE_IRQ  0x0400
  #define IORESOURCE_DMA  0x0800
-#define IORESOURCE_PREFETCH 0x1000  /* No side effects */
+#define IORESOURCE_PREFETCH 0x2000  /* No side effects */

  /* #define DEVICE_ASSIGNMENT_DEBUG 1 */
--


The last line from the diff was missing as well.




Re: [PATCH]: Fixes the qemu-kvm compilation failure when configured with disable-kvm

2011-02-27 Thread Avi Kivity

On 02/26/2011 06:38 PM, Prasad Joshi wrote:

I pulled the latest qemu-kvm code and configured it with --disable-kvm
and related options. Configuration finishes well, but the compilation
fails.



This is already fixed in tree.



Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 0000000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+    unsigned char value;
+    asm volatile("inb %w1, %0" : "=a" (value) : "Nd" (port));
+    return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based on 
the argument type.


Might as well put all of that into processor.h.



Re: [PATCH 2/3] kvm-unit-tests: do not set level sensitive irq when initializing the PIC

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

I'm not sure if this was intentional but the QEMU i8259 does not support this
flag.  I haven't observed any issues with this but I'll happily admit that
I'm not very aware of what I'm doing here.

Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

  static u32 xapic_read(unsigned reg)
  {
  return *(volatile u32 *)(g_apic + reg);
@@ -133,7 +129,7 @@ void ioapic_write_redir(unsigned line, ioapic_redir_entry_t e)
  void enable_apic(void)
  {
  printf("enabling apic\n");
-xapic_write(0xf0, 0x1ff); /* spurious vector register */
+xapic_write(0xf0, 0x1f7); /* spurious vector register */
  }


Not sure what you're doing here.  You're changing the APIC Spurious 
Vector from 0xff to 0xf7?  This has nothing to do with the i8259 or 
level triggeredness as far as I can tell - it just enables the APIC (bit 
8) and selects a vector for reporting spurious interrupts (0xff happens 
to be the reset value).
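
Decoded, for clarity (the macro names here are illustrative):

    #define APIC_SVR        0xf0            /* spurious vector register */
    #define APIC_SVR_ENABLE (1 << 8)        /* APIC software enable */

    /* 0x1ff = APIC_SVR_ENABLE | 0xff: APIC enabled, spurious vector 0xff
     *         (0xff is also the register's reset value).
     * 0x1f7 = APIC_SVR_ENABLE | 0xf7: APIC enabled, spurious vector 0xf7.
     * Neither value has anything to do with the i8259 or level
     * triggering. */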




Re: [PATCH 3/3] kvm-unit-tests: make I/O more friendly to existing QEMU hardware

2011-02-27 Thread Avi Kivity

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Use the serial port for printf() and use the Bochs bios exit port if the
testdev port isn't available.

This unconditionally switches to use the serial port but tries to use the
testdev exit port since that lets you pass an exit status.


The only issue I can see is that the serial port is much slower than the 
testdev port (since we can use rep outsb there).  That only matters for 
access.flat in full output mode (not default).  So please keep the old 
code under #if 0 so we can easily switch back to it.
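
Something along these lines (sketch; the helper names are placeholders
for whatever the final patch uses):

    static void print_char(char c)
    {
    #if 0
            /* old fast path: testdev I/O port, can use rep outsb */
            outb(c, TESTDEV_PORT);
    #else
            serial_putc(c);         /* slower, but works on stock QEMU hw */
    #endif
    }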




Re: [Qemu-devel] Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Anthony Liguori

On 02/27/2011 06:44 AM, Avi Kivity wrote:

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 0000000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+    unsigned char value;
+    asm volatile("inb %w1, %0" : "=a" (value) : "Nd" (port));
+    return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based on 
the argument type.


It's just a little extra type safety.



Might as well put all of that into processor.h.


Seems reasonable.

Regards,

Anthony Liguori




Re: [Qemu-devel] Re: [PATCH 1/3] kvm-unit-tests: add x86 port io accessors

2011-02-27 Thread Avi Kivity

On 02/27/2011 04:01 PM, Anthony Liguori wrote:

On 02/27/2011 06:44 AM, Avi Kivity wrote:

On 02/24/2011 11:48 PM, Anthony Liguori wrote:

Signed-off-by: Anthony Liguori <aligu...@us.ibm.com>

diff --git a/lib/x86/io.h b/lib/x86/io.h
new file mode 100644
index 0000000..bd6341c
--- /dev/null
+++ b/lib/x86/io.h
@@ -0,0 +1,40 @@
+#ifndef IO_H
+#define IO_H
+
+static inline unsigned char inb(unsigned short port)
+{
+    unsigned char value;
+    asm volatile("inb %w1, %0" : "=a" (value) : "Nd" (port));
+    return value;
+}


Are those %[wb] really needed?  gcc should do the right thing based 
on the argument type.


It's just a little extra type safety.


Yeah, but those constraints aren't really documented.  Linux does use 
them in a handful of places so they're unlikely to go away though.




Re: [PATCH 1/2] KVM: VMX: write new TR selector value into vmcs immediately if it changes during vm86 mode.

2011-02-27 Thread Avi Kivity

On 02/21/2011 12:07 PM, Gleb Natapov wrote:

When vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.
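
The fix boils down to something like the following in vmx_set_segment()
(illustrative sketch, not the verbatim patch):

    if (vmx->rmode.vm86_active && seg == VCPU_SREG_TR)
            /* vm86 deactivation does not rewrite the selector, so push
             * the new value into the vmcs right away. */
            vmcs_write16(GUEST_TR_SELECTOR, var->selector);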


Applied, thanks.



Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Avi Kivity

On 02/21/2011 12:07 PM, Gleb Natapov wrote:

Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.


Applied, thanks.

According to my reading of the code, if KVM_SET_TSS_ADDR is not invoked, 
the guest would fail both before and after the patch, yes?




Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Gleb Natapov
On Sun, Feb 27, 2011 at 05:43:07PM +0200, Avi Kivity wrote:
 On 02/21/2011 12:07 PM, Gleb Natapov wrote:
 Currently vm86 task is initialized on each real mode entry and vcpu
 reset. Initialization is done by zeroing TSS and updating relevant
 fields. But since all vcpus are using the same TSS there is a race where
 one vcpu may use TSS while other vcpu is initializing it, so the vcpu
 that uses TSS will see wrong TSS content and will behave incorrectly.
 Fix that by initializing TSS only once.
 
 Applied, thanks.
 
 According to my reading of the code, if KVM_SET_TSS_ADDR is not
 invoked, the guest would fail both before and after the patch, yes?
 
Hmmm. Actually no. Before the patch a guest that doesn't use KVM_SET_TSS_ADDR
will use the top of slot zero. Should I fix that (how?), or should we
drop support for those old guests? The problem with using top of slot
zero is that this memory is available for guest use and we do not even
put it into e820 map as far as I see. Also there are patches floating
around that re-arrange memslots or even put them in a tree. They will
break old guests too.



Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Avi Kivity

On 02/27/2011 05:52 PM, Gleb Natapov wrote:


  According to my reading of the code, if KVM_SET_TSS_ADDR is not
  invoked, the guest would fail both before and after the patch, yes?

Hmmm. Actually no. Before the patch guest that doesn't use KVM_SET_TSS_ADDR
will use the top of slot zero. Should I fix that (how?), or should we
drop support for those old guests?


I don't think we have a problem with older qemus, but perhaps we do with 
non-qemu users.  The API clearly requires the ioctl to be called, but I 
don't think we can blame anyone for forgetting to do so, especially if 
it worked silently.
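
For reference, the expected usage is a single vm ioctl issued before any
vcpu runs - a sketch (the address is an example in the 4G-16M area
mentioned later in this thread; any unused 3-page hole below 4G works):

    /* vm_fd comes from KVM_CREATE_VM; call this before the guest can
     * first enter real mode. */
    if (ioctl(vm_fd, KVM_SET_TSS_ADDR, 0xfeffd000) < 0)
            perror("KVM_SET_TSS_ADDR");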



The problem with using top of slot
zero is that this memory is available for guest use and we do not even
put it into e820 map as far as I see. Also there are patches floating
around that re-arrange memslots or even put them in a tree. They will
break old guests too.


Well, slot 0 still exists even if it is moved somewhere else.

Something we can do is put the tss slot just below the highest slot that 
is still below 4G, and hope there is no mmio there.  Once the user 
issues KVM_SET_TSS_ADDR, use that.  We'll have to keep juggling that 
slot as the user creates more slots, icky.




Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Avi Kivity

On 02/27/2011 05:58 PM, Avi Kivity wrote:



The problem with using top of slot
zero is that this memory is available for guest use and we do not even
put it into e820 map as far as I see. Also there are patches floating
around that re-arrange memslots or even put them in a tree. They will
break old guests too.


Well, slot 0 still exists even if it is moved somewhere else.

Something we can do is put the tss slot just below the highest slot 
that is still below 4G, and hope there is no mmio there.  Once the 
user issues KVM_SET_TSS_ADDR, use that.  We'll have to keep juggling 
that slot as the user creates more slots, icky.




Or we can keep the old behaviour.  If KVM_SET_TSS_ADDR hasn't been 
called by the time of the first entry into real mode (the first 
KVM_CREATE_VCPU?), use the top of the first slot.


We can avoid the SMP problem by initializing the memory in a single 
pass, writing each byte exactly once with its final value.  This way 
concurrent initialization doesn't corrupt an in-use TSS.
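
A sketch of that idea (sizes and layout here are placeholders, not the
actual init_rmode_tss() code; offset 0x66 is the 32-bit TSS I/O map base
field):

    /* Each byte is written exactly once with its final value, so a
     * vcpu racing through an in-use TSS sees either the old or the
     * final byte - never a transient one (no memset-then-patch). */
    static void init_tss_single_pass(u8 *tss, size_t size, u16 iomap_base)
    {
            size_t i;

            for (i = 0; i < size; i++) {
                    if (i == 0x66)
                            tss[i] = iomap_base & 0xff;
                    else if (i == 0x67)
                            tss[i] = iomap_base >> 8;
                    else
                            tss[i] = 0;
            }
    }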




Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Gleb Natapov
On Sun, Feb 27, 2011 at 05:58:54PM +0200, Avi Kivity wrote:
 On 02/27/2011 05:52 PM, Gleb Natapov wrote:
 
   According to my reading of the code, if KVM_SET_TSS_ADDR is not
   invoked, the guest would fail both before and after the patch, yes?
 
 Hmmm. Actually no. Before the patch guest that doesn't use KVM_SET_TSS_ADDR
 will use the top of slot zero. Should I fix that (how?), or should we
 drop support for those old guests?
 
 I don't think we have a problem with older qemus, but perhaps we do
 with non-qemu users.  The API clearly requires the ioctl to be
 called, but I don't think we can blame anyone for forgetting to do
 so, especially if it worked silently.
 
It may have worked as in no error returned from KVM_RUN, but if
userspace does not call to KVM_SET_TSS_ADDR kvm silently uses part of
a guest memory to store its data which may cause guest to fail long after
it was started. It is true that for that to happen guest needs to enter
protected mode during its lifetime and not many guests do that usually.
The only cases I can think of are during some guests installation and
S3 suspend/resume.  

 The problem with using top of slot
 zero is that this memory is available for guest use and we do not even
 put it into e820 map as far as I see. Also there are patches floating
 around that re-arrange memslots or even put them in a tree. They will
 break old guests too.
 
 Well, slot 0 still exists even if it is moved somewhere else.
 
 Something we can do is put the tss slot just below the highest slot
 that is still below 4G, and hope there is no mmio there.  Once the
 user issues KVM_SET_TSS_ADDR, use that.  We'll have to keep juggling
 that slot as the user creates more slots, icky.
 
I have a question about our current placement of tss addr. Qemu-kvm
places it at 4G-16M and comment says that this is just below BIOS ROM,
but BIOS ROM takes only upper 128K.



Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Avi Kivity

On 02/27/2011 06:12 PM, Gleb Natapov wrote:

On Sun, Feb 27, 2011 at 05:58:54PM +0200, Avi Kivity wrote:
  On 02/27/2011 05:52 PM, Gleb Natapov wrote:
  
 According to my reading of the code, if KVM_SET_TSS_ADDR is not
 invoked, the guest would fail both before and after the patch, yes?
  
  Hmmm. Actually no. Before the patch guest that doesn't use KVM_SET_TSS_ADDR
  will use the top of slot zero. Should I fix that (how?), or should we
  drop support for those old guests?

  I don't think we have a problem with older qemus, but perhaps we do
  with non-qemu users.  The API clearly requires the ioctl to be
  called, but I don't think we can blame anyone for forgetting to do
  so, especially if it worked silently.

It may have worked as in no error returned from KVM_RUN, but if
userspace does not call to KVM_SET_TSS_ADDR kvm silently uses part of
a guest memory to store its data which may cause guest to fail long after
it was started. It is true that for that to happen guest needs to enter
protected mode during its lifetime and not many guests do that usually.
The only cases I can think of are during some guests installation and
S3 suspend/resume.


Right.  I prefer to keep this partially working state if users didn't 
have a problem with it.



  The problem with using top of slot
  zero is that this memory is available for guest use and we do not even
  put it into e820 map as far as I see. Also there are patches floating
  around that re-arrange memslots or even put them in a tree. They will
  break old guests too.

  Well, slot 0 still exists even if it is moved somewhere else.

  Something we can do is put the tss slot just below the highest slot
  that is still below 4G, and hope there is no mmio there.  Once the
  user issues KVM_SET_TSS_ADDR, use that.  We'll have to keep juggling
  that slot as the user creates more slots, icky.

I have a question about our current placement of tss addr. Qemu-kvm
places it at 4G-16M and comment says that this is just below BIOS ROM,
but BIOS ROM takes only upper 128K.


Are you surprised that a comment is inaccurate?

Likely it was moved to make room for larger bioses.



Re: [PATCH] kvm,async_pf: add missing kvm_async_pf_hash_reset()

2011-02-27 Thread Avi Kivity

On 02/21/2011 05:21 AM, Lai Jiangshan wrote:

The hash array of async gfns may still contain some leftover gfns after
kvm_clear_async_pf_completion_queue() is called; we need to clear them.


Applied, thanks.



Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Gleb Natapov
On Sun, Feb 27, 2011 at 06:04:16PM +0200, Avi Kivity wrote:
 On 02/27/2011 05:58 PM, Avi Kivity wrote:
 
 The problem with using top of slot
 zero is that this memory is available for guest use and we do not even
 put it into e820 map as far as I see. Also there are patches floating
 around that re-arrange memslots or even put them in a tree. They will
 break old guests too.
 
 Well, slot 0 still exists even if it is moved somewhere else.
 
 Something we can do is put the tss slot just below the highest
 slot that is still below 4G, and hope there is no mmio there.
 Once the user issues KVM_SET_TSS_ADDR, use that.  We'll have to
 keep juggling that slot as the user creates more slots, icky.
 
 
 Or we can keep the old behaviour.  If KVM_SET_TSS_ADDR hasn't been
 called by the time of the first entry into real mode (the first
 KVM_CREATE_VCPU?), use the top of the first slot.
 
Do we require that KVM_SET_TSS_ADDR is called before first KVM_CREATE_VCPU?
We may still break some theoretical userspaces this way.
 
 We can avoid the SMP problem by initializing the memory in a single
 pass, writing each byte exactly once with its final value.  This way
 concurrent initialization doesn't corrupt an in-use TSS.
 
Sounds hackish, but may work. Doing so will make entering pmode much
slower.
 


Re: Q: status of kvm/ subdir in qemu-kvm tarball?

2011-02-27 Thread Avi Kivity

On 02/23/2011 04:57 PM, Jan Kiszka wrote:


  The only situation where you'd use kernel headers not provided
  by qemu itself is when you want to build qemu for an older
  kernel and omit some features and runtime tests.  But this
  is hardly a good goal to support.

I don't think the goal of the #ifdefs was ever some kind of size
optimization but just for the sake of avoiding build breakages.



Yes.


Well, rm -r kvm is a rather minor thing. But the header discussion is
important IMO as we continue to see breakages in KVM builds (mostly qemu
upstream) due to missing #ifdefs. That means:
  - we do not test all variants (because it's impractical)


We could teach buildbot about it.


  - people use older headers
  - too much effort is wasted on fixing distributed problems that can be
solved centrally


Yes.  But on the other hand carrying headers is the Wrong Thing, isn't 
it?  If everyone did that we'd be in a mess of duplication.  I'd like 
not to contribute to that.




Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Avi Kivity

On 02/27/2011 06:27 PM, Gleb Natapov wrote:


  Or we can keep the old behaviour.  If KVM_SET_TSS_ADDR hasn't been
  called by the time of the first entry into real mode (the first
  KVM_CREATE_VCPU?), use the top of the first slot.

Do we require that KVM_SET_TSS_ADDR is called before first KVM_CREATE_VCPU?
We may still break some theoretical userspaces this way.


The documentation doesn't say, but I think it's unlikely that userspace 
will start a guest and then configure it.




  We can avoid the SMP problem by initializing the memory in a single
  pass, writing each byte exactly once with its final value.  This way
  concurrent initialization doesn't corrupt an in-use TSS.

Sounds hackish, but may work. Doing so will make entering pmode much
more slow.


Why?  do it at most once per vcpu.



Re: [PATCH 2/2] KVM: VMX: Initialize vm86 TSS only once.

2011-02-27 Thread Gleb Natapov
On Sun, Feb 27, 2011 at 06:31:13PM +0200, Avi Kivity wrote:
 On 02/27/2011 06:27 PM, Gleb Natapov wrote:
 
   Or we can keep the old behaviour.  If KVM_SET_TSS_ADDR hasn't been
   called by the time of the first entry into real mode (the first
   KVM_CREATE_VCPU?), use the top of the first slot.
 
 Do we require that KVM_SET_TSS_ADDR is called before first KVM_CREATE_VCPU?
 We may still break some theoretical userspaces this way.
 
 The documentation doesn't say, but I think it's unlikely that
 userspace will start a guest and then configure it.
 
KVM_CREATE_VCPU is done before guest is started. Starting of a guest
happens when KVM_RUN is done for the first time.

 
   We can avoid the SMP problem by initializing the memory in a single
   pass, writing each byte exactly once with its final value.  This way
   concurrent initialization doesn't corrupt an in-use TSS.
 
 Sounds hackish, but may work. Doing so will make entering pmode much
 more slow.
 
 Why?  do it at most once per vcpu.
 
The reboot may fail since guest may have used the memory while running,
so memory should be reinited on reboot, but since KVM doesn't always
know about reboot it should be done at each protected mode entry. 



Re: Bug in kvm_set_irq

2011-02-27 Thread Michael S. Tsirkin
On Fri, Feb 25, 2011 at 10:07:22AM +0100, Jean-Philippe Menil wrote:
 Hi,
 
 Each time I try to use vhost_net, I'm facing a kernel bug.
 I do a modprobe vhost_net, and start the guest with vhost=on.
 
 Following is a trace with kernel 2.6.37, but I had the same
 problem with 2.6.36 (cf https://lkml.org/lkml/2010/11/30/29).

2.6.36 had a theoretical race that could explain this,
but it should be ok in 2.6.37.

 
 The bug only occurs with vhost_net loaded, so I don't know if this
 is a bug in the kvm module code or in the vhost_net code.

It could be a bug in eventfd which is the interface
used by both kvm and vhost_net.
Just for fun, you can try 2.6.38 - eventfd code has been changed
a lot in 2.6.38 and if it does not trigger there
it's a hint that irqfd is the reason.

 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243100] BUG: unable to handle kernel paging request at
 2458
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243250] IP: [a041aa8a] kvm_set_irq+0x2a/0x130 [kvm]


Could you run markup_oops / ksymoops on this please?
As far as I can see kvm_set_irq can only get a wrong
kvm pointer. Unless there's some general memory corruption,
I'd guess 

You can also try comparing the irqfd->kvm pointer in
kvm_irqfd_assign, irqfd_wakeup and kvm_set_irq in
virt/kvm/eventfd.c.
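
i.e. a debug aid as simple as this at each of the three spots:

    /* temporary: dump the kvm pointer so the traces can be compared */
    pr_info("irqfd dbg: %s kvm=%p\n", __func__, kvm);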

 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243378] PGD 45d363067 PUD 45e77a067 PMD 0
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243556] Oops:  [#1] SMP
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243692] last sysfs file:
 /sys/devices/pci:00/:00:0d.0/:05:00.0/:06:00.0/irq
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [  685.243777] CPU 0
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.243820] Modules linked in: vhost_net macvtap macvlan tun
 powernow_k8 mperf cpufreq_userspace cpufreq_stats cpufreq_powersave
 cpufreq_ondemand fre
 q_table cpufreq_conservative fuse xt_physdev ip6t_LOG
 ip6table_filter ip6_tables ipt_LOG xt_multiport xt_limit xt_tcpudp
 xt_state iptable_filter ip_tables x_tables nf_conntrack_tftp
 nf_conntrack_ftp nf_connt
 rack_ipv4 nf_defrag_ipv4 8021q bridge stp ext2 mbcache
 dm_round_robin dm_multipath nf_conntrack_ipv6 nf_conntrack
 nf_defrag_ipv6 kvm_amd kvm ipv6 snd_pcm snd_timer snd soundcore
 snd_page_alloc tpm_tis tpm ps
 mouse dcdbas tpm_bios processor i2c_nforce2 shpchp pcspkr ghes
 serio_raw joydev evdev pci_hotplug i2c_core hed button thermal_sys
 xfs exportfs dm_mod sg sr_mod cdrom usbhid hid usb_storage ses
 sd_mod enclosu
 re megaraid_sas ohci_hcd lpfc scsi_transport_fc scsi_tgt bnx2
 scsi_mod ehci_hcd [last unloaded: scsi_wait_scan]
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [  685.246123]
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] Pid: 10, comm: kworker/0:1 Not tainted
 2.6.37-dsiun-110105 #17 0K543T/PowerEdge M605
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] RIP: 0010:[a041aa8a]  [a041aa8a]
 kvm_set_irq+0x2a/0x130 [kvm]
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] RSP: 0018:88045fc89d30  EFLAGS: 00010246
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] RAX:  RBX: 001a RCX:
 0001
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] RDX:  RSI:  RDI:
 
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] RBP:  R08: 0001 R09:
 880856a91e48
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] R10:  R11:  R12:
 
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] R13: 0001 R14:  R15:
 
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] FS:  7f617986c710() GS:88007f80()
 knlGS:
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] CS:  0010 DS:  ES:  CR0: 8005003b
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] CR2: 2458 CR3: 00045d197000 CR4:
 06f0
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] DR0:  DR1:  DR2:
 
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] DR3:  DR6: 0ff0 DR7:
 0400
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123] Process kworker/0:1 (pid: 10, threadinfo
 88045fc88000, task 88085fc53c30)
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [  685.246123] Stack:
 Feb 23 13:56:19 ayrshire.u06.univ-nantes.prive kernel: [
 685.246123]  88045fc89fd8 000119c0 88045fc88010
 88085fc53ee8
 Feb 23 13:56:19 

Re: regression - 2.6.36 - 2.6.37 - kvm - 32bit SMP guests don't boot

2011-02-27 Thread Nikola Ciprich
 I was not aware of the thread.  Please cc me directly, or add a keyword  
 I track - timekeeping, TSC..
Hello Zachary, thanks for Your time looking at this!
 That change alone may not bisect well; without further fixes on top of  
 it, you may end up with a hang or stall, which is likely to manifest in  
 a vendor-specific way.
I'm not sure I really understand you here, but this change is exactly
what I got while bisecting. With later revisions, including this one,
32bit SMP guests don't boot; before it, they do..

 Basically there were a few differences in the platform code about how  
 TSC was dealt with on systems which did not have stable clocks, this  
 brought the logic into one location, but there was a slight change to  
 the logic here.

 Note very carefully, the logic on SVM is gated by a condition before  
 this change:

 if (unlikely(cpu != vcpu->cpu)) {
 -   u64 delta;
 -
 -   if (check_tsc_unstable()) {
 -   /*
 -    * Make sure that the guest sees a monotonically
 -    * increasing TSC.
 -    */
 -   delta = vcpu->arch.host_tsc - native_read_tsc();
 -   svm->vmcb->control.tsc_offset += delta;
 -   if (is_nested(svm))
 -   svm->nested.hsave->control.tsc_offset += delta;
 -   }
 -   vcpu->cpu = cpu;
 -   kvm_migrate_timers(vcpu);


 So this only happens with a system which reports TSC as unstable.  After  
 the change, KVM itself may report the TSC as unstable:

 +   if (unlikely(vcpu->cpu != cpu)) {
 +   /* Make sure TSC doesn't go backwards */
 +   s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
 +   native_read_tsc() - vcpu->arch.last_host_tsc;
 +   if (tsc_delta < 0)
 +   mark_tsc_unstable("KVM discovered backwards TSC");
 +   if (check_tsc_unstable())
 +   kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
 +   kvm_migrate_timers(vcpu);
 +   vcpu->cpu = cpu;
 +   }

 If the platform has very small TSC deltas across CPUs, but indicates the  
 TSC is stable, this could result in KVM marking the TSC unstable.  If  
 that is the case, this compensation logic will kick in to avoid  
 backwards TSCs.

 Note however, that the logic is not perfect; time which passes while not  
 running on any CPU will be erased, as the delta compensation removes not  
 just backwards, but any elapsed time from the TSC.  In extreme cases,  
 this could result in time appearing to stand still with guests  
 failing to boot.

 This was addressed with a later change, which catches up the missing time:

 commit c285545f813d7b0ce989fd34e42ad1fe785dc65d
Yes, but this change is already included in 2.6.37, so maybe some other fix
is needed?
If you have some idea what could be changed, I'll gladly test whatever you
recommend, but I'm afraid that's all I can do, since this is a bit of
rocket science for me, sorry :(
nik




 Author: Zachary Amsden <zams...@redhat.com>
 Date:   Sat Sep 18 14:38:15 2010 -1000

 KVM: x86: TSC catchup mode

 Negate the effects of AN TYM spell while kvm thread is preempted by  
 tracking
 conversion factor to the highest TSC rate and catching the TSC up  
 when it has
 fallen behind the kernel view of time.  Note that once triggered, we 
 don't
 turn off catchup mode.

 A slightly more clever version of this is possible, which only does  
 catchup
 when TSC rate drops, and which specifically targets only CPUs with  
 broken
 TSC, but since these all are considered unstable_tsc(), this patch  
 covers
 all necessary cases.

 Signed-off-by: Zachary Amsden <zams...@redhat.com>
 Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>



-- 
-
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.:   +420 596 603 142
fax:+420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: ser...@linuxbox.cz
-


Re: Q: status of kvm/ subdir in qemu-kvm tarball?

2011-02-27 Thread Jan Kiszka
On 2011-02-27 17:28, Avi Kivity wrote:
 On 02/23/2011 04:57 PM, Jan Kiszka wrote:
 
   The only situation where using kernel headers not provided
   by qemu itself is when you want to build qemu for older
   kernel and omit some features and runtime tests.  But this
   is hardly a good goal to support.

 I don't think the goal of the #ifdefs was ever some kind of size
 optimization but just for the sake of avoiding build breakages.

 
 Yes.
 
 Well, rm -r kvm is a rather minor thing. But the header discussion is
 important IMO as we continue to see breakages in KVM builds (mostly qemu
 upstream) due to missing #ifdefs. That means:
   - we do not test all variants (because it's impractical)
 
 We could teach buildbot about it.

We could, but the test matrix wouldn't be simple. Think of CAPs that
depend on other CAPs. I don't think it would be worth the effort. And if
we forget to update a test after adding a CAP, the situation won't be
different from today.

 
   - people use older headers
   - too much effort is wasted on fixing distributed problems that can be
 solved centrally
 
 Yes.  But on the other hand carrying headers is the Wrong Thing, isn't
 it?

Well, there are also different views on this, see e.g.
http://kernelnewbies.org/KernelHeaders

  If everyone did that we'd be in a mess of duplication.  I'd like
 not to contribute to that.

Not all distros keep their kernel headers sufficiently recent and/or
users forget to update them when updating the kernel (I just noticed
that vhost remained off in some builds here due to pre-historic distro
headers). As long as we continue adding or changing the kernel ABI, qemu
will face these problems.

We can't avoid a certain degree of mess, either in form of in-tree
header copies or via a continuously increasing number of #ifdefs in our
code. I'm really convinced now the former is both more user-friendly and
cleaner than the latter.

Jan





Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.

This is fixed by Gerd's attached patch (taken from rhel repository, don't know
why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as well 
(separate email).

  
  qemu-kvm 0.14
  
  startup line
  
  /usr/bin/kvm -name spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  
  host is running vanilla 2.6.37.1 on amd64.
  
  Here is the bt
  
  # gdb /usr/bin/qemu-system-x86_64
  GNU gdb (Gentoo 7.2 p1) 7.2
  Copyright (C) 2010 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later 
  http://gnu.org/licenses/gpl.html
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as x86_64-pc-linux-gnu.
  For bug reporting instructions, please see:
  http://bugs.gentoo.org/...
  Reading symbols from /usr/bin/qemu-system-x86_64...done.
  (gdb) set args -name spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  (gdb) run
  Starting program: /usr/bin/qemu-system-x86_64 -name
  spaceball,process=spaceball -m 1024 -kernel
  /boot/bzImage-2.6.37.2-guest -append root=/dev/vda ro -smp 1 -netdev
  type=tap,id=spaceball0,script=kvm-ifup-brloc,vhost=on -device
  virtio-net-pci,netdev=spaceball0,mac=00:16:3e:00:08:01 -drive
  file=/dev/volume01/G-spaceball,if=virtio -vga qxl -spice
  port=5957,disable-ticketing -monitor
  telnet:192.168.0.254:10007,server,nowait,nodelay -pidfile
  /var/run/kvm/spaceball.pid
  [Thread debugging using libthread_db enabled]
  do_spice_init: starting 0.6.0
  spice_server_add_interface: SPICE_INTERFACE_KEYBOARD
  spice_server_add_interface: SPICE_INTERFACE_MOUSE
  [New Thread 0x74802710 (LWP 30294)]
  spice_server_add_interface: SPICE_INTERFACE_QXL
  [New Thread 0x7fffaacae710 (LWP 30295)]
  red_worker_main: begin
  handle_dev_destroy_surfaces:
  handle_dev_destroy_surfaces:
  handle_dev_input: start
  [New Thread 0x7fffaa4ad710 (LWP 30298)]
  [New Thread 0x7fffa9cac710 (LWP 30299)]
  [New Thread 0x7fffa94ab710 (LWP 30300)]
  [New Thread 0x7fffa8caa710 (LWP 30301)]
  [New Thread 0x7fffa3fff710 (LWP 30302)]
  [New Thread 0x7fffa37fe710 (LWP 30303)]
  [New Thread 0x7fffa2ffd710 (LWP 30304)]
  [New Thread 0x7fffa27fc710 (LWP 30305)]
  [New Thread 0x7fffa1ffb710 (LWP 30306)]
  [New Thread 0x7fffa17fa710 (LWP 30307)]
  reds_handle_main_link:
  reds_show_new_channel: channel 1:0, connected successfully, over Non Secure 
  link
  reds_main_handle_message: net test: latency 5.636000 ms, bitrate
  11027768 bps (10.516899 Mbps)
  reds_show_new_channel: channel 2:0, connected successfully, over Non Secure 
  link
  red_dispatcher_set_peer:
  handle_dev_input: connect
  handle_new_display_channel: jpeg disabled
  handle_new_display_channel: zlib-over-glz disabled
  reds_show_new_channel: channel 4:0, connected successfully, over Non Secure 
  link
  red_dispatcher_set_cursor_peer:
  handle_dev_input: cursor connect
  reds_show_new_channel: channel 3:0, connected successfully, over Non Secure 
  link
  inputs_link:
  [New Thread 0x7fffa07f8710 (LWP 30312)]
  [New Thread 0x7fff9fff7710 (LWP 30313)]
  [New Thread 0x7fff9f7f6710 (LWP 30314)]
  [New Thread 0x7fff9eff5710 (LWP 30315)]
  [New Thread 0x7fff9e7f4710 (LWP 30316)]
  [New Thread 0x7fff9dff3710 (LWP 30317)]
  [New Thread 0x7fff9d7f2710 (LWP 30318)]
  qemu-system-x86_64:
  /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724:
  kvm_mutex_unlock: Assertion `!cpu_single_env' failed.
  
  Program received signal SIGABRT, Aborted.
  [Switching to Thread 0x74802710 (LWP 30294)]
  0x75daa165 in raise () from /lib/libc.so.6
  (gdb)
  (gdb)
  (gdb)
  (gdb)
  (gdb) bt
  #0  0x75daa165 in raise () from /lib/libc.so.6
  #1  0x75dab580 in abort () from /lib/libc.so.6
  #2  0x75da3201 in __assert_fail () from /lib/libc.so.6
  #3  0x00436f7e in kvm_mutex_unlock ()
  at 
  /var/tmp/portage/app-emulation/qemu-kvm-0.14.0/work/qemu-kvm-0.14.0/qemu-kvm.c:1724
  #4  qemu_mutex_unlock_iothread ()

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Jan Kiszka
On 2011-02-27 20:03, Alon Levy wrote:
 On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
 When trying to start X (and it loads qxl driver) the kvm process just 
 crashes.
 
 This is fixed by Gerd's attached patch (taken from rhel repository, don't know
 why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as well 
 (separate email).

Patch looks OK on first glance, but the changelog is misleading: This
was broken for _both_ trees, but upstream didn't detect the bug.

My concerns regarding other side effects of juggling with global mutex
in spice code remain.

Jan





[PATCH] spice/qxl: locking fix for qemu-kvm

2011-02-27 Thread Alon Levy
From: Gerd Hoffmann <kra...@redhat.com>

qxl needs to release the qemu lock before calling some libspice
functions (and re-acquire it later).  In upstream qemu qxl can just
use qemu_mutex_{unlock,lock}_iothread.  In qemu-kvm this doesn't
work; qxl additionally needs to save+restore the cpu_single_env
pointer on unlock+lock.

Signed-off-by: Gerd Hoffmann <kra...@redhat.com>
---
This is commit 15ba0114aa3ddedbeb2519cb0a8755a05f0a1d38
from rhel repo.
---
 hw/qxl.c   |   37 +
 ui/spice-display.c |   12 ++--
 ui/spice-display.h |6 ++
 3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index fe4212b..117f7c8 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -125,6 +125,27 @@ static void qxl_reset_memslots(PCIQXLDevice *d);
 static void qxl_reset_surfaces(PCIQXLDevice *d);
 static void qxl_ring_set_dirty(PCIQXLDevice *qxl);
 
+/* qemu-kvm locking ... */
+void qxl_unlock_iothread(SimpleSpiceDisplay *ssd)
+{
+    if (cpu_single_env) {
+        assert(ssd->env == NULL);
+        ssd->env = cpu_single_env;
+        cpu_single_env = NULL;
+    }
+    qemu_mutex_unlock_iothread();
+}
+
+void qxl_lock_iothread(SimpleSpiceDisplay *ssd)
+{
+    qemu_mutex_lock_iothread();
+    if (ssd->env) {
+        assert(cpu_single_env == NULL);
+        cpu_single_env = ssd->env;
+        ssd->env = NULL;
+    }
+}
+
 static inline uint32_t msb_mask(uint32_t val)
 {
     uint32_t mask;
@@ -662,10 +683,10 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
     dprint(d, 1, "%s: start%s\n", __FUNCTION__,
            loadvm ? " (loadvm)" : "");
 
-    qemu_mutex_unlock_iothread();
+    qxl_unlock_iothread(&d->ssd);
     d->ssd.worker->reset_cursor(d->ssd.worker);
     d->ssd.worker->reset_image_cache(d->ssd.worker);
-    qemu_mutex_lock_iothread();
+    qxl_lock_iothread(&d->ssd);
     qxl_reset_surfaces(d);
     qxl_reset_memslots(d);
 
@@ -795,9 +816,9 @@ static void qxl_reset_surfaces(PCIQXLDevice *d)
 {
     dprint(d, 1, "%s:\n", __FUNCTION__);
     d->mode = QXL_MODE_UNDEFINED;
-    qemu_mutex_unlock_iothread();
+    qxl_unlock_iothread(&d->ssd);
     d->ssd.worker->destroy_surfaces(d->ssd.worker);
-    qemu_mutex_lock_iothread();
+    qxl_lock_iothread(&d->ssd);
     memset(&d->guest_surfaces.cmds, 0, sizeof(d->guest_surfaces.cmds));
 }
 
@@ -866,9 +887,9 @@ static void qxl_destroy_primary(PCIQXLDevice *d)
     dprint(d, 1, "%s\n", __FUNCTION__);
 
     d->mode = QXL_MODE_UNDEFINED;
-    qemu_mutex_unlock_iothread();
+    qxl_unlock_iothread(&d->ssd);
     d->ssd.worker->destroy_primary_surface(d->ssd.worker, 0);
-    qemu_mutex_lock_iothread();
+    qxl_lock_iothread(&d->ssd);
 }
 
 static void qxl_set_mode(PCIQXLDevice *d, int modenr, int loadvm)
@@ -938,10 +959,10 @@ static void ioport_write(void *opaque, uint32_t addr, uint32_t val)
     case QXL_IO_UPDATE_AREA:
     {
         QXLRect update = d->ram->update_area;
-        qemu_mutex_unlock_iothread();
+        qxl_unlock_iothread(&d->ssd);
         d->ssd.worker->update_area(d->ssd.worker, d->ram->update_surface,
                                    &update, NULL, 0, 0);
-        qemu_mutex_lock_iothread();
+        qxl_lock_iothread(&d->ssd);
         break;
     }
     case QXL_IO_NOTIFY_CMD:
diff --git a/ui/spice-display.c b/ui/spice-display.c
index 020b423..defe652 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -186,18 +186,18 @@ void qemu_spice_create_host_primary(SimpleSpiceDisplay *ssd)
     surface.mem        = (intptr_t)ssd->buf;
     surface.group_id   = MEMSLOT_GROUP_HOST;
 
-    qemu_mutex_unlock_iothread();
+    qxl_unlock_iothread(ssd);
     ssd->worker->create_primary_surface(ssd->worker, 0, &surface);
-    qemu_mutex_lock_iothread();
+    qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_destroy_host_primary(SimpleSpiceDisplay *ssd)
 {
     dprint(1, "%s:\n", __FUNCTION__);
 
-    qemu_mutex_unlock_iothread();
+    qxl_unlock_iothread(ssd);
     ssd->worker->destroy_primary_surface(ssd->worker, 0);
-    qemu_mutex_lock_iothread();
+    qxl_lock_iothread(ssd);
 }
 
 void qemu_spice_vm_change_state_handler(void *opaque, int running, int reason)
@@ -207,9 +207,9 @@ void qemu_spice_vm_change_state_handler(void *opaque, int running, int reason)
     if (running) {
         ssd->worker->start(ssd->worker);
     } else {
-        qemu_mutex_unlock_iothread();
+        qxl_unlock_iothread(ssd);
         ssd->worker->stop(ssd->worker);
-        qemu_mutex_lock_iothread();
+        qxl_lock_iothread(ssd);
     }
     ssd->running = running;
 }
diff --git a/ui/spice-display.h b/ui/spice-display.h
index aef0464..df74828 100644
--- a/ui/spice-display.h
+++ b/ui/spice-display.h
@@ -43,6 +43,9 @@ typedef struct SimpleSpiceDisplay {
     QXLRect dirty;
     int notify;
     int running;
+
+    /* qemu-kvm locking ... */
+    void *env;
 } SimpleSpiceDisplay;
 
 typedef struct SimpleSpiceUpdate {
@@ -52,6 +55,9 @@ typedef struct SimpleSpiceUpdate {
     uint8_t *bitmap;
 } SimpleSpiceUpdate;
 
+void qxl_unlock_iothread(SimpleSpiceDisplay *ssd);
+void qxl_lock_iothread(SimpleSpiceDisplay *ssd);

Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
  
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
 Patch looks OK on first glance, but the changelog is misleading: This
 was broken for _both_ trees, but upstream didn't detect the bug.
 

The trees the patch commit message refers to are qemu and qemu-kvm. qemu 
doesn't even
have cpu_single_env. It didn't talk about two qemu-kvm trees.

 My concerns regarding other side effects of juggling with global mutex
 in spice code remain.

I know there used to be a mutex in spice code and during the upstreaming 
process it
got ditched in favor of the qemu global io mutex. I would have rather deferred 
this
to Gerd since he wrote this, but he is not available atm.

 
 Jan
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Jan Kiszka
On 2011-02-27 20:16, Alon Levy wrote:
 On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:03, Alon Levy wrote:
 On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
 On 2011-02-26 12:43, xming wrote:
 When trying to start X (and it loads qxl driver) the kvm process just 
 crashes.

 This is fixed by Gerd's attached patch (taken from rhel repository, don't 
 know
 why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
 well (separate email).

 Patch looks OK on first glance, but the changelog is misleading: This
 was broken for _both_ trees, but upstream didn't detect the bug.

 
 The trees the patch commit message refers to are qemu and qemu-kvm.

The same did I.

 qemu doesn't even have cpu_single_env.

Really? Check again. :)

 It didn't talk about two qemu-kvm trees.
 
 My concerns regarding other side effects of juggling with global mutex
 in spice code remain.
 
 I know there used to be a mutex in spice code and during the upstreaming 
 process it
 got ditched in favor of the qemu global io mutex. I would have rather 
 deferred this
 to Gerd since he wrote this, but he is not available atm.

It's not necessarily bad to drop the io mutex, but it is more tricky
than it may appear on first glance.

Jan





Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:27:01PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:16, Alon Levy wrote:
  On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
  On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
 
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
  Patch looks OK on first glance, but the changelog is misleading: This
  was broken for _both_ trees, but upstream didn't detect the bug.
 
  
  The trees the patch commit message refers to are qemu and qemu-kvm.
 
 The same did I.
 
  qemu doesn't even have cpu_single_env.
 
 Really? Check again. :)

Sorry, grepped the wrong repo. I'll send this to qemu-devel too then.

 
  It didn't talk about two qemu-kvm trees.
  
  My concerns regarding other side effects of juggling with global mutex
  in spice code remain.
  
  I know there used to be a mutex in spice code and during the upstreaming 
  process it
  got ditched in favor of the qemu global io mutex. I would have rather 
  deferred this
  to Gerd since he wrote this, but he is not available atm.
 
 It's not necessarily bad to drop the io mutex, but it is more tricky
 than it may appear on first glance.
 
 Jan
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] Re: kvm crashes with spice while loading qxl

2011-02-27 Thread Alon Levy
On Sun, Feb 27, 2011 at 08:27:01PM +0100, Jan Kiszka wrote:
 On 2011-02-27 20:16, Alon Levy wrote:
  On Sun, Feb 27, 2011 at 08:11:26PM +0100, Jan Kiszka wrote:
  On 2011-02-27 20:03, Alon Levy wrote:
  On Sat, Feb 26, 2011 at 01:29:01PM +0100, Jan Kiszka wrote:
  On 2011-02-26 12:43, xming wrote:
  When trying to start X (and it loads qxl driver) the kvm process just 
  crashes.
 
  This is fixed by Gerd's attached patch (taken from rhel repository, don't 
  know
  why it wasn't pushed to qemu-kvm upstream). I'll send it to kvm list as 
  well (separate email).
 
  Patch looks OK on first glance, but the changelog is misleading: This
  was broken for _both_ trees, but upstream didn't detect the bug.
 
  
  The trees the patch commit message refers to are qemu and qemu-kvm.
 
 The same did I.
 
  qemu doesn't even have cpu_single_env.
 
 Really? Check again. :)
 
  It didn't talk about two qemu-kvm trees.
  
  My concerns regarding other side effects of juggling with global mutex
  in spice code remain.
  
  I know there used to be a mutex in spice code and during the upstreaming 
  process it
  got ditched in favor of the qemu global io mutex. I would have rather 
  deferred this
  to Gerd since he wrote this, but he is not available atm.
 
 It's not necessarily bad to drop the io mutex, but it is more tricky
 than it may appear on first glance.

The problem with not dropping it is that we may be in vga mode and create
updates synthetically (i.e. qemu-created and not driver-created) that access
the framebuffer; those need the lock so the framebuffer isn't updated at the
same time. We drop the mutex only when we are about to call the dispatcher,
which basically waits on red_worker (a libspice-server thread) to do some
work. red_worker may in turn call back into qxl in qemu, which may try to
acquire the lock. (The many may's here just reflect the codepaths.)
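
To make that chain concrete, a sketch of the cycle being avoided (assumed
flow, distilled from the description above, not actual code):

/*
 * vcpu/iothread                      red_worker (libspice-server)
 * -------------                      ----------------------------
 * ioport_write(QXL_IO_UPDATE_AREA)
 *   holds the global qemu mutex
 *   worker->update_area()  --------> processes display commands
 *       [blocks waiting]             calls back into qxl (e.g. to fetch
 *                                    a command or touch the framebuffer)
 *                                    wants the global qemu mutex -> deadlock
 *
 * Dropping the mutex around the dispatcher call breaks the cycle.
 */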

 
 Jan
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 30042] New: page allocation failure when using jumbo frames with kvm guests

2011-02-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=30042

   Summary: page allocation failure when using jumbo frames with
kvm guests
   Product: Virtualization
   Version: unspecified
Kernel Version: 2.6.35
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: low
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: kernelbug_...@russenmafia.at
Regression: No


I've got an Intel e1000-based network card in my ubuntu 10.10 host, using jumbo
frames (mtu 9000).
In my kvm guest (ubuntu 10.10) I'm using the virtio network card - also
configured with jumbo frames (mtu 9000).
netio shows me about 190MB/s, which is almost double the performance I get from
the standard mtu size (1500).
The only problem I get is when rtorrent starts hash checking: it's awfully slow,
cpu at 100% idle, and kernel errors appearing.

Feb 27 21:09:48 vmbuntu kernel: [348422.309070] rtorrent: page allocation
failure. order:2, mode:0x4020
Feb 27 21:09:48 vmbuntu kernel: [348422.309075] Pid: 984, comm: rtorrent Not
tainted 2.6.35-25-virtual #44-Ubuntu
Feb 27 21:09:48 vmbuntu kernel: [348422.309078] Call Trace:
Feb 27 21:09:48 vmbuntu kernel: [348422.309087]  [81107956]
__alloc_pages_slowpath+0x4b6/0x5a0
Feb 27 21:09:48 vmbuntu kernel: [348422.309091]  [81107bac]
__alloc_pages_nodemask+0x16c/0x1d0
Feb 27 21:09:48 vmbuntu kernel: [348422.309096]  [8113f742]
kmalloc_large_node+0x62/0xb0
Feb 27 21:09:48 vmbuntu kernel: [348422.309100]  [8114322c]
__kmalloc_node_track_caller+0x11c/0x1d0
Feb 27 21:09:48 vmbuntu kernel: [348422.309105]  [814f1f81] ?
sk_stream_alloc_skb+0x41/0x110
Feb 27 21:09:48 vmbuntu kernel: [348422.309109]  [814a8033]
__alloc_skb+0x83/0x170
Feb 27 21:09:48 vmbuntu kernel: [348422.309112]  [814f1f81]
sk_stream_alloc_skb+0x41/0x110
Feb 27 21:09:48 vmbuntu kernel: [348422.309116]  [814f5868]
tcp_sendmsg+0x3c8/0xa60
Feb 27 21:09:48 vmbuntu kernel: [348422.309121]  [81071096] ?
mod_timer+0x176/0x2d0
Feb 27 21:09:48 vmbuntu kernel: [348422.309126]  [814a07d3]
sock_sendmsg+0xf3/0x120
Feb 27 21:09:48 vmbuntu kernel: [348422.309129]  [814f5c28] ?
tcp_sendmsg+0x788/0xa60
Feb 27 21:09:48 vmbuntu kernel: [348422.309134]  [812975f6] ?
blkcipher_walk_next+0x1b6/0x3e0
Feb 27 21:09:48 vmbuntu kernel: [348422.309137]  [814a07d3] ?
sock_sendmsg+0xf3/0x120
Feb 27 21:09:48 vmbuntu kernel: [348422.309141]  [814a0841]
kernel_sendmsg+0x41/0x60
Feb 27 21:09:48 vmbuntu kernel: [348422.309160]  [a0052bee]
xs_send_kvec+0x8e/0xa0 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309177]  [a01158be] ?
reserve_space+0xe/0x20 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309185]  [a0052db2]
xs_sendpages+0x1b2/0x210 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309194]  [a0052f4d]
xs_tcp_send_request+0x5d/0x170 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309202]  [a0050fb3]
xprt_transmit+0x83/0x2f0 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309210]  [a004e118]
call_transmit+0xa8/0x130 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309219]  [a00564a2]
__rpc_execute+0xa2/0x290 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309227]  [a0056713]
rpc_execute+0x83/0xa0 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309235]  [a004ec99]
rpc_run_task+0x29/0x40 [sunrpc]
Feb 27 21:09:48 vmbuntu kernel: [348422.309244]  [a010198f]
nfs_read_rpcsetup+0x18f/0x1f0 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309254]  [a01027b0] ?
readpage_async_filler+0x0/0x160 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309264]  [a0101dd9]
nfs_pagein_one+0xa9/0xf0 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309272]  [a00ff6c7]
nfs_pageio_doio+0x37/0x80 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309281]  [a00ff764]
nfs_pageio_add_request+0x54/0x100 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309296]  [a0102838]
readpage_async_filler+0x88/0x160 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309306]  [a01027b0] ?
readpage_async_filler+0x0/0x160 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309311]  [8110b49a]
read_cache_pages+0xba/0x130
Feb 27 21:09:48 vmbuntu kernel: [348422.309425]  [a0101676]
nfs_readpages+0x146/0x210 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309435]  [a0101d30] ?
nfs_pagein_one+0x0/0xf0 [nfs]
Feb 27 21:09:48 vmbuntu kernel: [348422.309439]  [8110b1db]
__do_page_cache_readahead+0x13b/0x210
Feb 27 21:09:48 vmbuntu kernel: [348422.309443]  [8110b351]
force_page_cache_readahead+0x71/0xa0
Feb 27 21:09:48 vmbuntu kernel: [348422.309448]  [8111b28b]
madvise_vma+0xcb/0x210
Feb 27 21:09:48 vmbuntu kernel: [348422.309452]  [811265f5] ?
do_mmap_pgoff+0x335/0x380
Feb 27 

Re: [PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-27 Thread Sheng Yang
On Friday 25 February 2011 16:12:30 Michael S. Tsirkin wrote:
 On Fri, Feb 25, 2011 at 11:23:30AM +0800, Sheng Yang wrote:
  On Thursday 24 February 2011 18:22:19 Michael S. Tsirkin wrote:
   On Thu, Feb 24, 2011 at 05:51:03PM +0800, Sheng Yang wrote:
Add a new parameter to IO writing handler, so that we can transfer
information from IO handler to caller.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---

 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |   12 ++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)

 }
 
 static int pit_ioport_write(struct kvm_io_device *this,

-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pit *pit = dev_to_pit(this);
struct kvm_kpit_state *pit_state = pit-pit_state;

@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device
*this,

 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,

-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pit *pit = speaker_to_pit(this);
struct kvm_kpit_state *pit_state = pit-pit_state;

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct
kvm_io_device *dev)

 }
 
 static int picdev_write(struct kvm_io_device *this,

-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic,
u32 reg, u32 val)

 }
 
 static int apic_mmio_write(struct kvm_io_device *this,

-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)

 {
 
struct kvm_lapic *apic = to_lapic(this);
unsigned int offset = address - apic-base_address;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)

 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int
 len,

-  const void *v)
+  const void *v, struct kvm_io_ext_data 
*ext_data)

 {
 
if (vcpu-arch.apic 

-   !kvm_iodevice_write(vcpu-arch.apic-dev, addr, len, v))
+   !kvm_iodevice_write(vcpu-arch.apic-dev, addr, len, v,
ext_data))

return 0;

-   return kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS, addr, len, v);
+   return kvm_io_bus_write(vcpu-kvm, KVM_MMIO_BUS,
+   addr, len, v, ext_data);

 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int
 len, void *v)

@@ -3807,6 +3808,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

   struct kvm_vcpu *vcpu)
 
 {
 
gpa_t gpa;

+   struct kvm_io_ext_data ext_data;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3825,7 +3827,7 @@ mmio:
/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, 

Re: [PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-27 Thread Sheng Yang
On Friday 25 February 2011 16:29:38 Michael S. Tsirkin wrote:
 On Fri, Feb 25, 2011 at 02:28:02PM +0800, Sheng Yang wrote:
  On Thursday 24 February 2011 18:45:08 Michael S. Tsirkin wrote:
   On Thu, Feb 24, 2011 at 05:51:04PM +0800, Sheng Yang wrote:
Then we can support mask bit operation of assigned devices now.

Signed-off-by: Sheng Yang sh...@linux.intel.com
   
   Doesn't look like all comments got addressed.
   E.g. gpa_t entry_base is still there and in reality
   you said it's a host virtual address so
   should be void __user *;
  
  Would update it.
  
   And ENOTSYNC meaning 'MSIX' is pretty hacky.
  
  I'd like to discuss it later. We may need some work on all MMIO handling
  side to make it more straightforward. But I don't want to bundle it with
  this one...
 
 It's not PCI related so I'll defer to Avi/Marcelo on this.
 Are you guys happy with the ENOTSYNC meaning 'MSIX'
 and userspace_exit_needed hacks in this code?
 
---

 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   34 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  296
 +++ virt/kvm/msix_mmio.h
 
 |   25 
 
 10 files changed, 497 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h
b/arch/x86/include/asm/kvm_host.h index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {

EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */

+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */

 };
 
 #define EMULTYPE_NO_DECODE (1  0)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.

 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o 
ioapic.o 
\
 
coalesced_mmio.o irq_comm.o eventfd.o \

-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)

 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, 
iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/,
 async_pf.o)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu,
gva_t cr2, u32 error_code,

case EMULATE_DO_MMIO:
++vcpu-stat.mmio_exits;
/* fall through */

+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */

case EMULATE_FAIL:
return 0;

default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)

case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:

case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;

case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 {
 
gpa_t gpa;
struct kvm_io_ext_data ext_data;

+   int r;

gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, exception);

@@ -3824,18 +3826,32 @@ static int
emulator_write_emulated_onepage(unsigned long addr,

 mmio:
trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);

+   r = vcpu_mmio_write(vcpu, gpa, bytes, val, ext_data);

/*

 * Is this MMIO handled locally?
 */

-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, ext_data))
+   if (!r)

return X86EMUL_CONTINUE;

-   vcpu-mmio_needed = 1;
-   vcpu-run-exit_reason = KVM_EXIT_MMIO;
-   vcpu-run-mmio.phys_addr = vcpu-mmio_phys_addr = gpa;
-   vcpu-run-mmio.len = vcpu-mmio_size = bytes;
-   

[PATCH 0/3] [RFC] Implement multiqueue (RX TX) virtio-net

2011-02-27 Thread Krishna Kumar
This patch series is a continuation of an earlier one that
implemented guest MQ TX functionality.  This new patchset
implements both RX and TX MQ.  Qemu changes are not being
included at this time solely to aid in easier review.
Compatibility testing with old/new combinations of qemu/guest
and vhost was done without any issues.

Some early TCP/UDP test results are at the bottom of this
post, I plan to submit more test results in the coming days.

Please review and provide feedback on what can be improved.

Thanks!

Signed-off-by: Krishna Kumar krkum...@in.ibm.com
---


Test configuration:
  Host:  8 Intel Xeon, 8 GB memory
  Guest: 4 cpus, 2 GB memory

Each test case runs for 60 secs, results below are average over
two runs.  Bandwidth numbers are in gbps.  I have used default
netperf, and no testing/system tuning other than taskset each
vhost to 0xf (cpus 0-3).  Comparison is testing original kernel
vs new kernel with #txqs=8 (# refers to number of netperf
sessions).

___________________________________________________________________
              TCP: Guest -> Local Host (TCP_STREAM)
#     BW1     BW2 (%)        SD1     SD2 (%)       RSD1    RSD2 (%)
___________________________________________________________________
1     7190    7170 (-.2)     0       0 (0)         3       4 (33.3)
2     8774    11235 (28.0)   3       3 (0)         16      14 (-12.5)
4     9753    15195 (55.7)   17      21 (23.5)     65      59 (-9.2)
8     10224   18265 (78.6)   71      115 (61.9)    251     240 (-4.3)
16    10749   18123 (68.6)   277     456 (64.6)    985     925 (-6.0)
32    11133   17351 (55.8)   1132    1947 (71.9)   3935    3831 (-2.6)
64    11223   17115 (52.4)   4682    7836 (67.3)   15949   15373 (-3.6)
128   11269   16986 (50.7)   19783   31505 (59.2)  66799   61759 (-7.5)
___________________________________________________________________
Summary:  BW: 37.6  SD: 61.2  RSD: -6.5


___________________________________________________________________
              TCP: Local Host -> Guest (TCP_MAERTS)
#     BW1     BW2 (%)        SD1     SD2 (%)        RSD1    RSD2 (%)
___________________________________________________________________
1     11490   10870 (-5.3)   0       0 (0)          2       2 (0)
2     10612   10554 (-.5)    2       3 (50.0)       12      12 (0)
4     10047   14320 (42.5)   13      16 (23.0)      53      53 (0)
8     9273    15182 (63.7)   56      84 (50.0)      228     233 (2.1)
16    9291    15853 (70.6)   235     390 (65.9)     934     965 (3.3)
32    9382    15741 (67.7)   969     1823 (88.1)    3868    4037 (4.3)
64    9270    14585 (57.3)   3966    8836 (122.7)   15415   17818 (15.5)
128   8997    14678 (63.1)   17024   36649 (115.2)  64933   72677 (11.9)
___________________________________________________________________
SUM:  BW: 24.8  SD: 114.6  RSD: 12.1

__________________________________________________________
              UDP: Local Host -> Guest (UDP_STREAM)
#     BW1      BW2 (%)         SD1     SD2 (%)
__________________________________________________________
1     17236    16585 (-3.7)    1       1 (0)
2     16795    22693 (35.1)    5       6 (20.0)
4     13390    21502 (60.5)    37      36 (-2.7)
8     13261    24361 (83.7)    163     175 (7.3)
16    12772    23796 (86.3)    692     826 (19.3)
32    12832    23880 (86.0)    2812    2871 (2.0)
64    12779    24293 (90.1)    11299   11237 (-.5)
128   13006    24857 (91.1)    44778   43884 (-1.9)
__________________________________________________________
Summary:  BW: 37.1  SD: -1.2
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] [RFC] Change virtqueue structure

2011-02-27 Thread Krishna Kumar
Move queue_index from virtio_pci_vq_info to virtqueue.  This
allows callback handlers to figure out the queue number for
the vq that needs attention.
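
As an illustration (a hypothetical handler, not part of this patch, and
assuming a per-queue array like the one introduced in patch 2/3), a callback
can now locate its queue directly:

static void skb_recv_done(struct virtqueue *rvq)
{
	struct virtnet_info *vi = rvq->vdev->priv;

	/* which of the N receive queues fired? */
	int qnum = rvq->queue_index;

	/* e.g. kick NAPI for the matching receive queue */
	napi_schedule(&vi->rq[qnum]->napi);
}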

Signed-off-by: Krishna Kumar krkum...@in.ibm.com
---
 drivers/virtio/virtio_pci.c |   10 +++---
 include/linux/virtio.h  |1 +
 2 files changed, 4 insertions(+), 7 deletions(-)

diff -ruNp org/include/linux/virtio.h new/include/linux/virtio.h
--- org/include/linux/virtio.h  2010-10-11 10:20:22.0 +0530
+++ new/include/linux/virtio.h  2011-02-23 16:26:18.0 +0530
@@ -22,6 +22,7 @@ struct virtqueue {
 	void (*callback)(struct virtqueue *vq);
 	const char *name;
 	struct virtio_device *vdev;
+	int queue_index;	/* the index of the queue */
 	void *priv;
 };
 
diff -ruNp org/drivers/virtio/virtio_pci.c new/drivers/virtio/virtio_pci.c
--- org/drivers/virtio/virtio_pci.c 2011-01-28 11:38:24.0 +0530
+++ new/drivers/virtio/virtio_pci.c 2011-02-25 10:11:22.0 +0530
@@ -75,9 +75,6 @@ struct virtio_pci_vq_info
 	/* the number of entries in the queue */
 	int num;
 
-	/* the index of the queue */
-	int queue_index;
-
 	/* the virtual address of the ring queue */
 	void *queue;
 
@@ -180,11 +177,10 @@ static void vp_reset(struct virtio_devic
 static void vp_notify(struct virtqueue *vq)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
-	struct virtio_pci_vq_info *info = vq->priv;
 
 	/* we write the queue's selector into the notification register to
 	 * signal the other end */
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+	iowrite16(vq->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
 /* Handle a configuration change: Tell driver if it wants to know. */
@@ -380,7 +376,6 @@ static struct virtqueue *setup_vq(struct
 	if (!info)
 		return ERR_PTR(-ENOMEM);
 
-	info->queue_index = index;
 	info->num = num;
 	info->msix_vector = msix_vec;
 
@@ -403,6 +398,7 @@ static struct virtqueue *setup_vq(struct
 		goto out_activate_queue;
 	}
 
+	vq->queue_index = index;
 	vq->priv = info;
 	info->vq = vq;
 
@@ -441,7 +437,7 @@ static void vp_del_vq(struct virtqueue *
 	list_del(&info->node);
 	spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
+	iowrite16(vq->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
 
 	if (vp_dev->msix_enabled) {
 		iowrite16(VIRTIO_MSI_NO_VECTOR,
iowrite16(VIRTIO_MSI_NO_VECTOR,
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] [RFC] Changes for MQ virtio-net

2011-02-27 Thread Krishna Kumar
Implement mq virtio-net driver. 

Though struct virtio_net_config changes, it works with the old
qemu since the last element is not accessed unless qemu sets
VIRTIO_NET_F_NUMTXQS.  Patch also adds a macro for the maximum
number of TX vq's (VIRTIO_MAX_TXQS) that the user can specify.
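
A sketch of the intended probe-side negotiation (illustrative only; assumes
the virtio config helpers of this era):

	u16 numtxqs = 1;	/* default when the host doesn't offer the feature */

	if (virtio_has_feature(vdev, VIRTIO_NET_F_NUMTXQS))
		vdev->config->get(vdev,
				  offsetof(struct virtio_net_config, numtxqs),
				  &numtxqs, sizeof(numtxqs));

Since an old qemu never acks VIRTIO_NET_F_NUMTXQS, the new numtxqs field at
the end of the config space is simply never read there.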

Signed-off-by: Krishna Kumar krkum...@in.ibm.com
---
 drivers/net/virtio_net.c   |  543 ---
 include/linux/virtio_net.h |6 
 2 files changed, 386 insertions(+), 163 deletions(-)

diff -ruNp org/include/linux/virtio_net.h new/include/linux/virtio_net.h
--- org/include/linux/virtio_net.h  2010-10-11 10:20:22.0 +0530
+++ new/include/linux/virtio_net.h  2011-02-25 16:24:15.0 +0530
@@ -7,6 +7,9 @@
 #include <linux/virtio_config.h>
 #include <linux/if_ether.h>
 
+/* Maximum number of individual RX/TX queues supported */
+#define VIRTIO_MAX_TXQS16
+
 /* The feature bitmap for virtio net */
 #define VIRTIO_NET_F_CSUM  0   /* Host handles pkts w/ partial csum */
 #define VIRTIO_NET_F_GUEST_CSUM1   /* Guest handles pkts w/ 
partial csum */
@@ -26,6 +29,7 @@
 #define VIRTIO_NET_F_CTRL_RX   18  /* Control channel RX mode support */
 #define VIRTIO_NET_F_CTRL_VLAN 19  /* Control channel VLAN filtering */
 #define VIRTIO_NET_F_CTRL_RX_EXTRA 20  /* Extra RX mode control support */
+#define VIRTIO_NET_F_NUMTXQS   21  /* Device supports multiple TX queue */
 
 #define VIRTIO_NET_S_LINK_UP   1   /* Link is up */
 
@@ -34,6 +38,8 @@ struct virtio_net_config {
__u8 mac[6];
/* See VIRTIO_NET_F_STATUS and VIRTIO_NET_S_* above */
__u16 status;
+   /* number of RX/TX queues */
+   __u16 numtxqs;
 } __attribute__((packed));
 
 /* This is the first element of the scatter-gather list.  If you don't
diff -ruNp org/drivers/net/virtio_net.c new/drivers/net/virtio_net.c
--- org/drivers/net/virtio_net.c2011-02-21 17:55:42.0 +0530
+++ new/drivers/net/virtio_net.c2011-02-25 16:23:41.0 +0530
@@ -40,31 +40,53 @@ module_param(gso, bool, 0444);
 
 #define VIRTNET_SEND_COMMAND_SG_MAX2
 
-struct virtnet_info {
-   struct virtio_device *vdev;
-   struct virtqueue *rvq, *svq, *cvq;
-   struct net_device *dev;
+/* Internal representation of a send virtqueue */
+struct send_queue {
+   struct virtqueue *svq;
+
+   /* TX: fragments + linear part + virtio header */
+   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+/* Internal representation of a receive virtqueue */
+struct receive_queue {
+   /* Virtqueue associated with this receive_queue */
+   struct virtqueue *rvq;
+
+   /* Back pointer to the virtnet_info */
+   struct virtnet_info *vi;
+
struct napi_struct napi;
-   unsigned int status;
 
/* Number of input buffers, and max we've ever had. */
unsigned int num, max;
 
-   /* I like... big packets and I cannot lie! */
-   bool big_packets;
-
-   /* Host will merge rx buffers for big packets (shake it! shake it!) */
-   bool mergeable_rx_bufs;
-
/* Work struct for refilling if we run low on memory. */
struct delayed_work refill;
 
/* Chain pages by the private ptr. */
struct page *pages;
 
-   /* fragments + linear part + virtio header */
+   /* RX: fragments + linear part + virtio header */
struct scatterlist rx_sg[MAX_SKB_FRAGS + 2];
-   struct scatterlist tx_sg[MAX_SKB_FRAGS + 2];
+};
+
+struct virtnet_info {
+   struct send_queue **sq;
+   struct receive_queue **rq;
+
+   /* read-mostly variables */
+	int numtxqs ____cacheline_aligned_in_smp;	/* # of rxqs/txqs */
+   struct virtio_device *vdev;
+   struct virtqueue *cvq;
+   struct net_device *dev;
+   unsigned int status;
+
+   /* I like... big packets and I cannot lie! */
+   bool big_packets;
+
+   /* Host will merge rx buffers for big packets (shake it! shake it!) */
+   bool mergeable_rx_bufs;
 };
 
 struct skb_vnet_hdr {
@@ -94,22 +116,22 @@ static inline struct skb_vnet_hdr *skb_v
  * private is used to chain pages for big packets, put the whole
  * most recent used list in the beginning for reuse
  */
-static void give_pages(struct virtnet_info *vi, struct page *page)
+static void give_pages(struct receive_queue *rq, struct page *page)
 {
 	struct page *end;
 
 	/* Find end of list, sew whole thing into vi->pages. */
 	for (end = page; end->private; end = (struct page *)end->private);
-	end->private = (unsigned long)vi->pages;
-	vi->pages = page;
+	end->private = (unsigned long)rq->pages;
+	rq->pages = page;
 }
 
-static struct page *get_a_page(struct virtnet_info *vi, gfp_t gfp_mask)
+static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 {
-	struct page *p = vi->pages;
+	struct page *p = rq->pages;
 
 	if (p) {
-		vi->pages = (struct page *)p->private;
+		rq->pages = (struct page *)p->private;

[PATCH 3/3] [RFC] Changes for MQ vhost

2011-02-27 Thread Krishna Kumar
Changes for mq vhost.

vhost_net_open is changed to allocate a vhost_net and return.
The remaining initializations are delayed till SET_OWNER.
SET_OWNER is changed so that the argument is used to determine
how many txqs to use.  Unmodified qemu's will pass NULL, so
this is recognized and handled as numtxqs=1.

The number of vhost threads is <= #txqs.  Threads handle more
than one txq when #txqs is more than MAX_VHOST_THREADS (4).
The same thread handles both RX and TX - tested with tap/bridge
so far (TBD: needs some changes in the macvtap driver to support
the same).
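
As a worked example of the helpers added below: with numtxqs=8 (16 vqs in
total) and MAX_VHOST_THREADS=4, get_nvhosts() starts min(16/2, 4) = 4
threads, and vhost_get_thread_index() maps rx[i]/tx[i] to thread
(i % 8) % 4, so queue pairs 0 and 4 share one thread, 1 and 5 share the
next, and so on.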

I had to convert space->tab in vhost_attach_cgroups* to avoid
checkpatch errors.
 
Signed-off-by: Krishna Kumar krkum...@in.ibm.com
---
 drivers/vhost/net.c   |  295 ++--
 drivers/vhost/vhost.c |  225 +++---
 drivers/vhost/vhost.h |   39 -
 3 files changed, 378 insertions(+), 181 deletions(-)

diff -ruNp org/drivers/vhost/vhost.h new/drivers/vhost/vhost.h
--- org/drivers/vhost/vhost.h   2011-02-08 09:05:09.0 +0530
+++ new/drivers/vhost/vhost.h   2011-02-28 11:48:06.0 +0530
@@ -35,11 +35,11 @@ struct vhost_poll {
wait_queue_t  wait;
struct vhost_work work;
unsigned long mask;
-   struct vhost_dev *dev;
+	struct vhost_virtqueue *vq;	/* points back to vq */
 };
 
 void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
-unsigned long mask, struct vhost_dev *dev);
+unsigned long mask, struct vhost_virtqueue *vq);
 void vhost_poll_start(struct vhost_poll *poll, struct file *file);
 void vhost_poll_stop(struct vhost_poll *poll);
 void vhost_poll_flush(struct vhost_poll *poll);
@@ -108,8 +108,14 @@ struct vhost_virtqueue {
/* Log write descriptors */
void __user *log_base;
struct vhost_log *log;
+   struct task_struct *worker; /* worker for this vq */
+	spinlock_t *work_lock;		/* points to a dev->work_lock[] entry */
+	struct list_head *work_list;	/* points to a dev->work_list[] entry */
+	int qnum;			/* 0 for RX, 1 - n-1 for TX */
 };
 
+#define MAX_VHOST_THREADS  4
+
 struct vhost_dev {
/* Readers use RCU to access memory table pointer
 * log base pointer and features.
@@ -122,12 +128,33 @@ struct vhost_dev {
int nvqs;
struct file *log_file;
struct eventfd_ctx *log_ctx;
-   spinlock_t work_lock;
-   struct list_head work_list;
-   struct task_struct *worker;
+   spinlock_t *work_lock[MAX_VHOST_THREADS];
+   struct list_head *work_list[MAX_VHOST_THREADS];
 };
 
-long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs);
+/*
+ * Return maximum number of vhost threads needed to handle RX & TX.
+ * Up to MAX_VHOST_THREADS are started, and threads can be shared
+ * among different vq's if numtxqs > MAX_VHOST_THREADS.
+ */
+static inline int get_nvhosts(int nvqs)
+{
+   return min_t(int, nvqs / 2, MAX_VHOST_THREADS);
+}
+
+/*
+ * Get index of an existing thread that will handle this txq/rxq.
+ * The same thread handles both rx[index] and tx[index].
+ */
+static inline int vhost_get_thread_index(int index, int numtxqs, int nvhosts)
+{
+   return (index % numtxqs) % nvhosts;
+}
+
+int vhost_setup_vqs(struct vhost_dev *dev, int numtxqs);
+void vhost_free_vqs(struct vhost_dev *dev);
+long vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue *vqs, int nvqs,
+   int nvhosts);
 long vhost_dev_check_owner(struct vhost_dev *);
 long vhost_dev_reset_owner(struct vhost_dev *);
 void vhost_dev_cleanup(struct vhost_dev *);
diff -ruNp org/drivers/vhost/net.c new/drivers/vhost/net.c
--- org/drivers/vhost/net.c 2011-02-08 09:05:09.0 +0530
+++ new/drivers/vhost/net.c 2011-02-28 11:48:53.0 +0530
@@ -32,12 +32,6 @@
  * Using this limit prevents one virtqueue from starving others. */
 #define VHOST_NET_WEIGHT 0x8
 
-enum {
-   VHOST_NET_VQ_RX = 0,
-   VHOST_NET_VQ_TX = 1,
-   VHOST_NET_VQ_MAX = 2,
-};
-
 enum vhost_net_poll_state {
VHOST_NET_POLL_DISABLED = 0,
VHOST_NET_POLL_STARTED = 1,
@@ -46,12 +40,13 @@ enum vhost_net_poll_state {
 
 struct vhost_net {
struct vhost_dev dev;
-   struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
-   struct vhost_poll poll[VHOST_NET_VQ_MAX];
+   struct vhost_virtqueue *vqs;
+   struct vhost_poll *poll;
+   struct socket **socks;
/* Tells us whether we are polling a socket for TX.
 * We only do this when socket buffer fills up.
 * Protected by tx vq lock. */
-   enum vhost_net_poll_state tx_poll_state;
+   enum vhost_net_poll_state *tx_poll_state;
 };
 
 /* Pop first len bytes from iovec. Return number of segments used. */
@@ -91,28 +86,28 @@ static void copy_iovec_hdr(const struct 
 }
 
 /* Caller must have TX VQ lock */
-static void tx_poll_stop(struct vhost_net *net)

[PATCH 2/4] KVM: Add kvm_io_ext_data to IO handler

2011-02-27 Thread Sheng Yang
Add a new parameter to IO writing handler, so that we can transfer information
from IO handler to caller.
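
The resulting pattern, roughly (a sketch distilled from patches 2-3, not
verbatim code; the entry size, locals and where dev_id/type come from are
placeholders):

static int msix_table_mmio_write(struct kvm_io_device *this, gpa_t addr,
				 int len, const void *val,
				 struct kvm_io_ext_data *ext_data)
{
	u16 idx = addr / 16;		/* 16 bytes per MSI-X table entry */
	bool entry_changed;

	/* ... update the in-kernel MSI-X table copy, noting whether the
	 * address/data content actually changed ... */
	entry_changed = true;		/* placeholder */

	if (entry_changed) {
		/* hand details to the caller, which bounces to userspace */
		ext_data->msix_routing.entry_idx = idx;
		ext_data->msix_routing.flags = 0;
		/* dev_id/type would come from the device's bookkeeping */
		return -ENOTSYNC;
	}
	return 0;			/* handled entirely in kernel */
}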

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/i8254.c  |6 --
 arch/x86/kvm/i8259.c  |3 ++-
 arch/x86/kvm/lapic.c  |3 ++-
 arch/x86/kvm/x86.c|   13 -
 include/linux/kvm_host.h  |9 +++--
 virt/kvm/coalesced_mmio.c |3 ++-
 virt/kvm/eventfd.c|2 +-
 virt/kvm/ioapic.c |2 +-
 virt/kvm/iodev.h  |6 --
 virt/kvm/kvm_main.c   |4 ++--
 10 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index efad723..bd8f0c5 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -439,7 +439,8 @@ static inline int pit_in_range(gpa_t addr)
 }
 
 static int pit_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
 	struct kvm_pit *pit = dev_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
@@ -585,7 +586,8 @@ static int pit_ioport_read(struct kvm_io_device *this,
 }
 
 static int speaker_ioport_write(struct kvm_io_device *this,
-   gpa_t addr, int len, const void *data)
+   gpa_t addr, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
 	struct kvm_pit *pit = speaker_to_pit(this);
 	struct kvm_kpit_state *pit_state = &pit->pit_state;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 3cece05..96b1070 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -480,7 +480,8 @@ static inline struct kvm_pic *to_pic(struct kvm_io_device 
*dev)
 }
 
 static int picdev_write(struct kvm_io_device *this,
-gpa_t addr, int len, const void *val)
+gpa_t addr, int len, const void *val,
+struct kvm_io_ext_data *ext_data)
 {
struct kvm_pic *s = to_pic(this);
unsigned char data = *(unsigned char *)val;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 93cf9d0..f413e9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -836,7 +836,8 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, 
u32 val)
 }
 
 static int apic_mmio_write(struct kvm_io_device *this,
-   gpa_t address, int len, const void *data)
+   gpa_t address, int len, const void *data,
+   struct kvm_io_ext_data *ext_data)
 {
 	struct kvm_lapic *apic = to_lapic(this);
 	unsigned int offset = address - apic->base_address;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fa708c9..21b84e2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3571,13 +3571,14 @@ static void kvm_init_msr_list(void)
 }
 
 static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
-			   const void *v)
+			   const void *v, struct kvm_io_ext_data *ext_data)
 {
 	if (vcpu->arch.apic &&
-	    !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v))
+	    !kvm_iodevice_write(&vcpu->arch.apic->dev, addr, len, v, ext_data))
 		return 0;
 
-	return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS, addr, len, v);
+	return kvm_io_bus_write(vcpu->kvm, KVM_MMIO_BUS,
+				addr, len, v, ext_data);
 }
 
 static int vcpu_mmio_read(struct kvm_vcpu *vcpu, gpa_t addr, int len, void *v)
@@ -3807,6 +3808,7 @@ static int emulator_write_emulated_onepage(unsigned long 
addr,
   struct kvm_vcpu *vcpu)
 {
gpa_t gpa;
+   struct kvm_io_ext_data ext_data;
 
 	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &exception);
 
@@ -3825,7 +3827,7 @@ mmio:
/*
 * Is this MMIO handled locally?
 */
-   if (!vcpu_mmio_write(vcpu, gpa, bytes, val))
+   if (!vcpu_mmio_write(vcpu, gpa, bytes, val, ext_data))
return X86EMUL_CONTINUE;
 
vcpu-mmio_needed = 1;
@@ -3940,6 +3942,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 {
 	/* TODO: String I/O for in kernel device */
 	int r;
+	struct kvm_io_ext_data ext_data;
 
 	if (vcpu->arch.pio.in)
 		r = kvm_io_bus_read(vcpu->kvm, KVM_PIO_BUS, vcpu->arch.pio.port,
@@ -3947,7 +3950,7 @@ static int kernel_pio(struct kvm_vcpu *vcpu, void *pd)
 	else
 		r = kvm_io_bus_write(vcpu->kvm, KVM_PIO_BUS,
 				     vcpu->arch.pio.port, vcpu->arch.pio.size,
-				     pd);
+				     pd, &ext_data);
 	return r;
 }
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7d313e0..a32c53e 100644
--- 

[PATCH 0/4 v11] MSI-X MMIO support for KVM

2011-02-27 Thread Sheng Yang
Change from v9:
Update according to the comments of Alex and Michael.

Notice this patchset is still based on 2.6.37, due to a bug blocking assigned
devices in the upstream right now.

Sheng Yang (4):
  KVM: Move struct kvm_io_device to kvm_host.h
  KVM: Add kvm_io_ext_data to IO handler
  KVM: Emulate MSI-X table in kernel
  KVM: Add documents for MSI-X MMIO API

 Documentation/kvm/api.txt   |   58 
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/i8254.c|6 +-
 arch/x86/kvm/i8259.c|3 +-
 arch/x86/kvm/lapic.c|3 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   51 +--
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   66 +-
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/coalesced_mmio.c   |3 +-
 virt/kvm/eventfd.c  |2 +-
 virt/kvm/ioapic.c   |2 +-
 virt/kvm/iodev.h|   31 +
 virt/kvm/kvm_main.c |   40 +-
 virt/kvm/msix_mmio.c|  301 +++
 virt/kvm/msix_mmio.h|   25 
 18 files changed, 616 insertions(+), 52 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] KVM: Emulate MSI-X table in kernel

2011-02-27 Thread Sheng Yang
With this, we can now support the mask bit operation of assigned devices.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/Makefile   |2 +-
 arch/x86/kvm/mmu.c  |2 +
 arch/x86/kvm/x86.c  |   40 -
 include/linux/kvm.h |   28 
 include/linux/kvm_host.h|   36 +
 virt/kvm/assigned-dev.c |   44 ++
 virt/kvm/kvm_main.c |   38 +-
 virt/kvm/msix_mmio.c|  301 +++
 virt/kvm/msix_mmio.h|   25 
 10 files changed, 504 insertions(+), 13 deletions(-)
 create mode 100644 virt/kvm/msix_mmio.c
 create mode 100644 virt/kvm/msix_mmio.h

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa75f21..4a390a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -635,6 +635,7 @@ enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
EMULATE_FAIL, /* can't emulate this instruction */
+   EMULATE_USERSPACE_EXIT, /* we need exit to userspace */
 };
 
 #define EMULTYPE_NO_DECODE (1  0)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f15501f..3a0d851 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -7,7 +7,7 @@ CFLAGS_vmx.o := -I.
 
 kvm-y  += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \
coalesced_mmio.o irq_comm.o eventfd.o \
-   assigned-dev.o)
+   assigned-dev.o msix_mmio.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(addprefix ../../../virt/kvm/, async_pf.o)
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9cafbb4..912dca4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3358,6 +3358,8 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, 
u32 error_code,
case EMULATE_DO_MMIO:
 		++vcpu->stat.mmio_exits;
/* fall through */
+   case EMULATE_USERSPACE_EXIT:
+   /* fall through */
case EMULATE_FAIL:
return 0;
default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21b84e2..87308eb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1966,6 +1966,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_MSIX_MMIO:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -3809,6 +3810,7 @@ static int emulator_write_emulated_onepage(unsigned long 
addr,
 {
gpa_t gpa;
struct kvm_io_ext_data ext_data;
+   int r;
 
 	gpa = kvm_mmu_gva_to_gpa_write(vcpu, addr, &exception);
 
@@ -3824,18 +3826,32 @@ static int emulator_write_emulated_onepage(unsigned 
long addr,
 
 mmio:
 	trace_kvm_mmio(KVM_TRACE_MMIO_WRITE, bytes, gpa, *(u64 *)val);
+	r = vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data);
 	/*
 	 * Is this MMIO handled locally?
 	 */
-	if (!vcpu_mmio_write(vcpu, gpa, bytes, val, &ext_data))
+	if (!r)
 		return X86EMUL_CONTINUE;
 
-	vcpu->mmio_needed = 1;
-	vcpu->run->exit_reason = KVM_EXIT_MMIO;
-	vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
-	vcpu->run->mmio.len = vcpu->mmio_size = bytes;
-	vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
-	memcpy(vcpu->run->mmio.data, val, bytes);
+	if (r == -ENOTSYNC) {
+		vcpu->userspace_exit_needed = 1;
+		vcpu->run->exit_reason = KVM_EXIT_MSIX_ROUTING_UPDATE;
+		vcpu->run->msix_routing.dev_id =
+			ext_data.msix_routing.dev_id;
+		vcpu->run->msix_routing.type =
+			ext_data.msix_routing.type;
+		vcpu->run->msix_routing.entry_idx =
+			ext_data.msix_routing.entry_idx;
+		vcpu->run->msix_routing.flags =
+			ext_data.msix_routing.flags;
+	} else {
+		vcpu->mmio_needed = 1;
+		vcpu->run->exit_reason = KVM_EXIT_MMIO;
+		vcpu->run->mmio.phys_addr = vcpu->mmio_phys_addr = gpa;
+		vcpu->run->mmio.len = vcpu->mmio_size = bytes;
+		vcpu->run->mmio.is_write = vcpu->mmio_is_write = 1;
+		memcpy(vcpu->run->mmio.data, val, bytes);
+	}
 
 	return X86EMUL_CONTINUE;
 }
@@ -4469,6 +4485,8 @@ done:
r = EMULATE_DO_MMIO;
} else if (r == EMULATION_RESTART)
goto restart;
+	else if (vcpu->userspace_exit_needed)
+   r = EMULATE_USERSPACE_EXIT;
else
r = EMULATE_DONE;
 
@@ -5397,12 +5415,18 @@ int 

[PATCH 4/4] KVM: Add documents for MSI-X MMIO API

2011-02-27 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 Documentation/kvm/api.txt |   58 +
 1 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index e1a9297..dd10c3b 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,53 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_REGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API indicates an MSI-X MMIO address of a guest device. All MMIO
+operations on it would then be handled by the kernel. When necessary (e.g.
+MSI data/address changed), KVM would exit to userspace using
+KVM_EXIT_MSIX_ROUTING_UPDATE to indicate the MMIO modification and require
+userspace to update the IRQ routing table.
+
+NOTICE: Writing the MSI-X MMIO page after it was registered with this API may
+be dangerous for the userspace program. Writing to it while the VM is running
+may result in synchronization issues, and the assigned device then can't work
+properly. Writing is allowed when the VM is not running and can be used as a
+save/restore mechanism.
+
+struct kvm_msix_mmio_user {
+   __u32 dev_id;
+   __u16 type; /* Device type and MMIO address type */
+   __u16 max_entries_nr;   /* Maximum entries supported */
+   __u64 base_addr;/* Guest physical address of MMIO */
+   __u64 base_va;  /* Host virtual address of MMIO mapping */
+   __u64 flags;/* Reserved for now */
+   __u64 reserved[4];
+};
+
+Current device type can be:
+#define KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV	(1 << 0)
+
+Current MMIO type can be:
+#define KVM_MSIX_MMIO_TYPE_BASE_TABLE	(1 << 8)
+
+4.55 KVM_UNREGISTER_MSIX_MMIO
+
+Capability: KVM_CAP_MSIX_MMIO
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_msix_mmio_user (in)
+Returns: 0 on success, -1 on error
+
+This API would unregister the specific MSI-X MMIO, indicated by dev_id and
+type fields of struct kvm_msix_mmio_user.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
@@ -1445,6 +1492,17 @@ Userspace can now handle the hypercall and when it's 
done modify the gprs as
 necessary. Upon guest entry all guest GPRs will then be replaced by the values
 in this struct.
 
+   /* KVM_EXIT_MSIX_ROUTING_UPDATE*/
+   struct {
+   __u32 dev_id;
+   __u16 type;
+   __u16 entry_idx;
+   __u64 flags;
+   } msix_routing;
+
+KVM_EXIT_MSIX_ROUTING_UPDATE indicates that one MSI-X entry has been
+modified, and userspace needs to update the correlated routing table.
+
/* Fix the size of the union. */
char padding[256];
};
-- 
1.7.0.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
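
For a concrete picture of the intended flow, a hedged userspace sketch
(vm_fd, assigned_dev_id, the mapping variables and update_msix_routing() are
placeholders; only the ioctls, structs and exit reason come from the patches):

	struct kvm_msix_mmio_user mmio = {
		.dev_id		= assigned_dev_id,
		.type		= KVM_MSIX_MMIO_TYPE_ASSIGNED_DEV |
				  KVM_MSIX_MMIO_TYPE_BASE_TABLE,
		.max_entries_nr	= nr_entries,
		.base_addr	= guest_table_gpa,
		.base_va	= (__u64)(uintptr_t)host_table_mapping,
	};

	if (ioctl(vm_fd, KVM_REGISTER_MSIX_MMIO, &mmio) < 0)
		perror("KVM_REGISTER_MSIX_MMIO");

	/* later, in the vcpu run loop: */
	case KVM_EXIT_MSIX_ROUTING_UPDATE:
		/* re-read entry run->msix_routing.entry_idx and refresh the
		 * kernel irq routing table for that vector */
		update_msix_routing(run->msix_routing.dev_id,
				    run->msix_routing.entry_idx);
		break;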


[PATCH 1/4] KVM: Move struct kvm_io_device to kvm_host.h

2011-02-27 Thread Sheng Yang
Then it can be used by other structs in kvm_host.h.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 include/linux/kvm_host.h |   23 +++
 virt/kvm/iodev.h |   25 +
 2 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b5021db..7d313e0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -98,6 +98,29 @@ int kvm_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, 
gfn_t gfn,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
+struct kvm_io_device;
+
+/**
+ * kvm_io_device_ops are called under kvm slots_lock.
+ * read and write handlers return 0 if the transaction has been handled,
+ * or non-zero to have it passed to the next device.
+ **/
+struct kvm_io_device_ops {
+   int (*read)(struct kvm_io_device *this,
+   gpa_t addr,
+   int len,
+   void *val);
+   int (*write)(struct kvm_io_device *this,
+gpa_t addr,
+int len,
+const void *val);
+   void (*destructor)(struct kvm_io_device *this);
+};
+
+struct kvm_io_device {
+   const struct kvm_io_device_ops *ops;
+};
+
 struct kvm_vcpu {
struct kvm *kvm;
 #ifdef CONFIG_PREEMPT_NOTIFIERS
diff --git a/virt/kvm/iodev.h b/virt/kvm/iodev.h
index 12fd3ca..d1f5651 100644
--- a/virt/kvm/iodev.h
+++ b/virt/kvm/iodev.h
@@ -17,32 +17,9 @@
 #define __KVM_IODEV_H__
 
 #include <linux/kvm_types.h>
+#include <linux/kvm_host.h>
 #include <asm/errno.h>
 
-struct kvm_io_device;
-
-/**
- * kvm_io_device_ops are called under kvm slots_lock.
- * read and write handlers return 0 if the transaction has been handled,
- * or non-zero to have it passed to the next device.
- **/
-struct kvm_io_device_ops {
-   int (*read)(struct kvm_io_device *this,
-   gpa_t addr,
-   int len,
-   void *val);
-   int (*write)(struct kvm_io_device *this,
-gpa_t addr,
-int len,
-const void *val);
-   void (*destructor)(struct kvm_io_device *this);
-};
-
-
-struct kvm_io_device {
-   const struct kvm_io_device_ops *ops;
-};
-
 static inline void kvm_iodevice_init(struct kvm_io_device *dev,
 const struct kvm_io_device_ops *ops)
 {
-- 
1.7.0.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] [RFC] Implement multiqueue (RX TX) virtio-net

2011-02-27 Thread Michael S. Tsirkin
On Mon, Feb 28, 2011 at 12:04:27PM +0530, Krishna Kumar wrote:
 This patch series is a continuation of an earlier one that
 implemented guest MQ TX functionality.  This new patchset
 implements both RX and TX MQ.  Qemu changes are not being
 included at this time solely to aid in easier review.
 Compatibility testing with old/new combinations of qemu/guest
 and vhost was done without any issues.
 
 Some early TCP/UDP test results are at the bottom of this
 post, I plan to submit more test results in the coming days.
 
 Please review and provide feedback on what can improve.
 
 Thanks!
 
 Signed-off-by: Krishna Kumar krkum...@in.ibm.com


To help testing, could you post the qemu changes separately please?

 ---
 
 
 Test configuration:
   Host:  8 Intel Xeon, 8 GB memory
   Guest: 4 cpus, 2 GB memory
 
 Each test case runs for 60 secs, results below are average over
 two runs.  Bandwidth numbers are in gbps.  I have used default
 netperf, and no testing/system tuning other than taskset each
 vhost to 0xf (cpus 0-3).  Comparison is testing original kernel
 vs new kernel with #txqs=8 (# refers to number of netperf
 sessions).
 
 ___________________________________________________________________
               TCP: Guest -> Local Host (TCP_STREAM)
 #     BW1     BW2 (%)        SD1     SD2 (%)       RSD1    RSD2 (%)
 ___________________________________________________________________
 1     7190    7170 (-.2)     0       0 (0)         3       4 (33.3)
 2     8774    11235 (28.0)   3       3 (0)         16      14 (-12.5)
 4     9753    15195 (55.7)   17      21 (23.5)     65      59 (-9.2)
 8     10224   18265 (78.6)   71      115 (61.9)    251     240 (-4.3)
 16    10749   18123 (68.6)   277     456 (64.6)    985     925 (-6.0)
 32    11133   17351 (55.8)   1132    1947 (71.9)   3935    3831 (-2.6)
 64    11223   17115 (52.4)   4682    7836 (67.3)   15949   15373 (-3.6)
 128   11269   16986 (50.7)   19783   31505 (59.2)  66799   61759 (-7.5)
 ___________________________________________________________________
 Summary:  BW: 37.6  SD: 61.2  RSD: -6.5
 
 
 ___________________________________________________________________
               TCP: Local Host -> Guest (TCP_MAERTS)
 #     BW1     BW2 (%)        SD1     SD2 (%)        RSD1    RSD2 (%)
 ___________________________________________________________________
 1     11490   10870 (-5.3)   0       0 (0)          2       2 (0)
 2     10612   10554 (-.5)    2       3 (50.0)       12      12 (0)
 4     10047   14320 (42.5)   13      16 (23.0)      53      53 (0)
 8     9273    15182 (63.7)   56      84 (50.0)      228     233 (2.1)
 16    9291    15853 (70.6)   235     390 (65.9)     934     965 (3.3)
 32    9382    15741 (67.7)   969     1823 (88.1)    3868    4037 (4.3)
 64    9270    14585 (57.3)   3966    8836 (122.7)   15415   17818 (15.5)
 128   8997    14678 (63.1)   17024   36649 (115.2)  64933   72677 (11.9)
 ___________________________________________________________________
 SUM:  BW: 24.8  SD: 114.6  RSD: 12.1
 
 __________________________________________________________
               UDP: Local Host -> Guest (UDP_STREAM)
 #     BW1      BW2 (%)         SD1     SD2 (%)
 __________________________________________________________
 1     17236    16585 (-3.7)    1       1 (0)
 2     16795    22693 (35.1)    5       6 (20.0)
 4     13390    21502 (60.5)    37      36 (-2.7)
 8     13261    24361 (83.7)    163     175 (7.3)
 16    12772    23796 (86.3)    692     826 (19.3)
 32    12832    23880 (86.0)    2812    2871 (2.0)
 64    12779    24293 (90.1)    11299   11237 (-.5)
 128   13006    24857 (91.1)    44778   43884 (-1.9)
 __________________________________________________________
 Summary:  BW: 37.1  SD: -1.2
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject

2011-02-27 Thread Markus Armbruster
Anthony Liguori anth...@codemonkey.ws writes:

 On 02/25/2011 03:54 AM, Markus Armbruster wrote:
 Anthony Liguori aligu...@linux.vnet.ibm.com writes:


 On 02/24/2011 10:20 AM, Markus Armbruster wrote:
  
  Anthony Liguori aligu...@linux.vnet.ibm.com writes:



 On 02/24/2011 02:33 AM, Markus Armbruster wrote:

  
  Anthony Liguori anth...@codemonkey.ws writes:

 [...]

 Please describe all expected errors.


  
 Quoting qmp-commands.hx:

3. Errors, in special, are not documented. Applications should 
 NOT check
   for specific errors classes or data (it's strongly recommended 
 to only
   check for the error key)

 Indeed, not a single error is documented there.  This is intentional.



 Yeah, but we're not 0.14 anymore and for 0.15, we need to document
 errors.  If you are suggesting I send a patch to remove that section,
 I'm more than happy to.

  
 Two separate issues here: 1. Are we ready to commit to the current
 design of errors, and 2. Is it fair to reject Lai's patch now because he
 doesn't document his errors.

 I'm not commenting on 1. here.

 Regarding 2.: rejecting a patch because it doesn't document an aspect
 that current master intentionally leaves undocumented is not how you
 treat contributors.  At least not if you want any other than certified
 masochists who enjoy pain, and professionals who get adequately
 compensated for it.

 Lead by example, not by fiat.


 http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json

 I am in the process of documenting the errors of every command.  It's
 a royal pain but I'm going to document everything we have right now.
 It's actually the last bit of work I need to finish before sending
 QAPI out.

 So for new commands being added, it would be hugely helpful for the
 authors to document the errors such that I don't have to reverse
 engineer all of the possible error conditions.
  
 The moment this lands in master, you can begin to demand error
 descriptions from contributors.  Until then, I'll NAK error descriptions
 in qmp-commands.txt.  We left them undocumented there for good reasons:


 No, it was always a bad reason.  Good documentation is necessary to
 build good commands.  Errors are a huge part of the semantics of a
 command.  We cannot properly assess a command unless it's behavior in
 error conditions is well defined.

The decision to declare QMP stable was made at the KVM Forum after quite
some deliberation.  We explicitly excluded errors.  We didn't do that
because errors are not worthy of specification (that would be silly).
We did it because there was too much unhappiness about the current state
of errors to close the debate on them at that time.  Luiz mindfully
documented that decision in qmp-commands.txt:

3. Errors, in special, are not documented. Applications should NOT check
   for specific errors classes or data (it's strongly recommended to only
   check for the error key)

Nobody denies that errors need to be specified, and this clause needs to
go.  But I must insist on proper review.  No sneaking in of error
specification through the commit backdoor, please.

 Once we have an error design in place that has a reasonable hope to
 stand the test of time, and have errors documented for at least some of
 the commands here, we can start to require proper error documentation
 for new commands.  But not now.

 I won't NAK non-normative error descriptions, say in commit messages, or
 in comments.  And I won't object to you asking for them.  But I feel you
 really shouldn't make it a condition for committing patches.  Especially
 not for simple patches that have been on list for months.


 I'm strongly committed to making QMP a first class interface in QEMU
 for 0.15.  I feel documentation is a crucial part to making that
 happen.

 I'm not asking for test cases even though that's something that we'll
 need for 0.15 because there's not enough infrastructure in master yet
 to do that in a reasonable way.  I realize I'm likely going to have to
 write that test case and I'm happy to do that.

 But there's no reason that we shouldn't require thorough documentation
 for all new QMP commands moving forward including error semantics.
 This is a critical part of having a first class API and no additional
 infrastructure is needed in master to do this.

Broken record time, again.

Nobody denies that errors need to be specified.  Fair goal for 0.15.
But I must insist on proper review.  No sneaking in of error
specification through the commit backdoor, please.

I won't NAK non-normative error descriptions, say in commit messages, or
in comments.  And I won't object to you asking for them.  But I feel you
really shouldn't make it a condition for committing patches.  Especially
not for simple patches that have been on list for months.
--
To unsubscribe from this list: send the line