Re: [kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-26 Thread Dor Laor
Rusty Russell wrote:
 On Tue, 2007-09-25 at 19:15 +0200, Dor Laor wrote:
   
 At the moment it's not good enough, there is a potential race were the
 guest optimistically turn off
 the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards
 finds there are more_used so
 it consume them in the poll function. 
 If in the middle, packets arrive the host will see the flag is off and
 send irq.
 In that case the above irq handler will report IRQ_NONE.
 

 Good point.  On the one hand, reporting IRQ_NONE occasionally is not
 fatal.  On the other, it'd be nice to get this right.

   
 It's also not trivial to cancel the optimistic approach in the restart
 function since then there might be another race
 that will result in missing irq reaching the guest.
 

 I did it optimistically because otherwise we need two barriers (and
 still have a window).

   
 I'll try to think of something better than a hypercall for switching
 irq on/off.
 

 Erk.  That would be v. bad...
   
Actually double barrier will satisfy non-shared irq lines and will 
report irq handling right.
The more complex scenario is shared irqs (pci...), since several devices 
are raising the irq line (level triggered)
in the same time, while the tap add receive packets it always leaves a 
potential race for irq handled return
value. So seems like we better not have shared irq and if only if shared 
irq is a must we should do the
naughty naughty things.

If returning IRQ_NONE once in a while is not too bad (including the 
matching windows implementation)
then there is no issue.
Dor.
 

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-25 Thread Dor Laor

Rusty Russell wrote:


On Mon, 2007-09-24 at 15:44 +0200, Dor Laor wrote:
 Rusty Russell wrote:
  +irqreturn_t vring_interrupt(int irq, void *_vq)
  +{
  +   struct vring_virtqueue *vq = to_vvq(_vq);
  +
  +   pr_debug(virtqueue interrupt for %p\n, vq);
  +
  +   if (unlikely(vq-broken))
  +   return IRQ_HANDLED;
  +
  +   if (more_used(vq)) {
  +   pr_debug(virtqueue callback for %p (%p)\n,
  +vq, vq-vq.callback);
  +   if (!vq-vq.callback)
  +   return IRQ_NONE;
  +   if (!vq-vq.callback(vq-vq))
  +   vq-vring.avail-flags |=
  VRING_AVAIL_F_NO_INTERRUPT;
  +   } else
  +   pr_debug(virtqueue %p no more used\n, vq);
  +
  +   return IRQ_HANDLED;
  +}
  +
 
 Seems like there is a problem with shared irq line, the interrupt
 handler always returns IRQ_HANDLED (except for the trivial case
 were the callback is null).

To reply for a second time, this time with thought.  Sorry.

Yes, this code is wrong.  It should be:

irqreturn_t vring_interrupt(int irq, void *_vq)
{
struct vring_virtqueue *vq = to_vvq(_vq);

if (!more_used(vq)) {
pr_debug(virtqueue interrupt with no work for %p\n, vq);
return IRQ_NONE;
}

if (unlikely(vq-broken))
return IRQ_HANDLED;

pr_debug(virtqueue callback for %p (%p)\n, vq, vq-vq.callback);
if (vq-vq.callback  !vq-vq.callback(vq-vq))
vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT;

return IRQ_HANDLED;
}

Cheers,
Rusty.

At the moment it's not good enough, there is a potential race were the 
guest optimistically turn off
the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards finds 
there are more_used so

it consume them in the poll function.
If in the middle, packets arrive the host will see the flag is off and 
send irq.

In that case the above irq handler will report IRQ_NONE.

It's also not trivial to cancel the optimistic approach in the restart 
function since then there might be another race

that will result in missing irq reaching the guest.
I'll try to think of something better than a hypercall for switching irq 
on/off.

Cheers mate,
Dor.
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-25 Thread Rusty Russell
On Tue, 2007-09-25 at 19:15 +0200, Dor Laor wrote:
 At the moment it's not good enough, there is a potential race were the
 guest optimistically turn off
 the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards
 finds there are more_used so
 it consume them in the poll function. 
 If in the middle, packets arrive the host will see the flag is off and
 send irq.
 In that case the above irq handler will report IRQ_NONE.

Good point.  On the one hand, reporting IRQ_NONE occasionally is not
fatal.  On the other, it'd be nice to get this right.

 It's also not trivial to cancel the optimistic approach in the restart
 function since then there might be another race
 that will result in missing irq reaching the guest.

I did it optimistically because otherwise we need two barriers (and
still have a window).

 I'll try to think of something better than a hypercall for switching
 irq on/off.

Erk.  That would be v. bad...

Thanks,
Rusty.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


[kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-24 Thread Rusty Russell
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio.  Unlike the previous lguest
implementation:

1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
   cacheline.
5) We do a modulo on a variable.  We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.

Users need only implement the new_vq and free_vq hooks (KVM wants the
guest to allocate the rings, lguest does it sanely).

Signed-off-by: Rusty Russell [EMAIL PROTECTED]
---
 arch/i386/lguest/Kconfig |1 
 drivers/virtio/Kconfig   |5 
 drivers/virtio/Makefile  |1 
 drivers/virtio/virtio_ring.c |  316 ++
 include/linux/virtio_ring.h  |  119 +++
 5 files changed, 442 insertions(+)

===
--- a/arch/i386/lguest/Kconfig
+++ b/arch/i386/lguest/Kconfig
@@ -2,6 +2,7 @@ config LGUEST_GUEST
bool Lguest guest support
select PARAVIRT
depends on !X86_PAE
+   select VIRTIO_RING
help
  Lguest is a tiny in-kernel hypervisor.  Selecting this will
  allow your kernel to boot under lguest.  This option will increase
===
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -1,3 +1,8 @@
 # Virtio always gets selected by whoever wants it.
 config VIRTIO
bool
+
+# Similarly the virtio ring implementation.
+config VIRTIO_RING
+   bool
+   depends on VIRTIO
===
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -1,1 +1,2 @@ obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO) += virtio.o
+obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
===
--- /dev/null
+++ b/drivers/virtio/virtio_ring.c
@@ -0,0 +1,316 @@
+/* Virtio ring implementation.
+ *
+ *  Copyright 2007 Rusty Russell IBM Corporation
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, write to the Free Software
+ *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+ */
+#include linux/virtio.h
+#include linux/virtio_ring.h
+#include linux/device.h
+
+#ifdef DEBUG
+/* For development, we want to crash whenever the ring is screwed. */
+#define BAD_RING(vq, fmt...)   \
+   do { dev_err(vq-vq.vdev-dev, fmt); BUG(); } while(0)
+#define START_USE(vq) \
+   do { if ((vq)-in_use) panic(in_use = %i\n, (vq)-in_use); 
(vq)-in_use = __LINE__; mb(); } while(0)
+#define END_USE(vq) \
+   do { BUG_ON(!(vq)-in_use); (vq)-in_use = 0; mb(); } while(0)
+#else
+#define BAD_RING(vq, fmt...)   \
+   do { dev_err(vq-vq.vdev-dev, fmt); (vq)-broken = true; } while(0)
+#define START_USE(vq)
+#define END_USE(vq)
+#endif
+
+struct vring_virtqueue
+{
+   struct virtqueue vq;
+
+   /* Actual memory layout for this queue */
+   struct vring vring;
+
+   /* Other side has made a mess, don't try any more. */
+   bool broken;
+
+   /* Number of free buffers */
+   unsigned int num_free;
+   /* Head of free buffer list. */
+   unsigned int free_head;
+   /* Number we've added since last sync. */
+   unsigned int num_added;
+
+   /* Last used index we've seen. */
+   unsigned int last_used_idx;
+
+   /* How to notify other side. FIXME: commonalize hcalls! */
+   void (*notify)(struct virtqueue *vq);
+
+#ifdef DEBUG
+   /* They're supposed to lock for us. */
+   unsigned int in_use;
+#endif
+
+   /* Tokens for callbacks. */
+   void *data[];
+};
+
+#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
+
+static int vring_add_buf(struct virtqueue *_vq,
+struct scatterlist sg[],
+unsigned int out,
+unsigned int in,
+void *data)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+   unsigned int i, avail, head, uninitialized_var(prev);
+
+   BUG_ON(data == NULL);
+   BUG_ON(out + in  vq-vring.num);
+   BUG_ON(out + in == 0);
+
+  

Re: [kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-24 Thread Dor Laor

Rusty Russell wrote:


These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio.  Unlike the previous lguest
implementation:

1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
   cacheline.
5) We do a modulo on a variable.  We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.

Users need only implement the new_vq and free_vq hooks (KVM wants the
guest to allocate the rings, lguest does it sanely).

Signed-off-by: Rusty Russell [EMAIL PROTECTED]


[snip]


+irqreturn_t vring_interrupt(int irq, void *_vq)
+{
+   struct vring_virtqueue *vq = to_vvq(_vq);
+
+   pr_debug(virtqueue interrupt for %p\n, vq);
+
+   if (unlikely(vq-broken))
+   return IRQ_HANDLED;
+
+   if (more_used(vq)) {
+   pr_debug(virtqueue callback for %p (%p)\n,
+vq, vq-vq.callback);
+   if (!vq-vq.callback)
+   return IRQ_NONE;
+   if (!vq-vq.callback(vq-vq))
+   vq-vring.avail-flags |= 
VRING_AVAIL_F_NO_INTERRUPT;

+   } else
+   pr_debug(virtqueue %p no more used\n, vq);
+
+   return IRQ_HANDLED;
+}
+

Seems like there is a problem with shared irq line, the interrupt 
handler always returns IRQ_HANDLED (except for the trivial case

were the callback is null).

It can be solved by having a host irq counter (in the shared ring) and a 
guest irq counter and return

mb(); return (host_counter!=guest_counter)? IRQ_HANDLED:IRQ_NONE;

Dor.
-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel


Re: [kvm-devel] [PATCH 2/3] virtio ring implementation

2007-09-24 Thread Rusty Russell
On Mon, 2007-09-24 at 15:44 +0200, Dor Laor wrote:
 Seems like there is a problem with shared irq line, the interrupt
 handler always returns IRQ_HANDLED (except for the trivial case
 were the callback is null).

 It can be solved by having a host irq counter (in the shared ring) and
 a guest irq counter and return
 mb(); return (host_counter!=guest_counter)? IRQ_HANDLED:IRQ_NONE;

Or we could make the callback return irqreturn_t and have an explicit
disable hook to disable interrupts.

Using the return value of the callback to disable the queue has always
made be a little uncomfortable, but it's slightly more efficient than a
separate hook.

Thanks,
Rusty.


-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
kvm-devel mailing list
kvm-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kvm-devel