Re: [kvm-devel] [PATCH 2/3] virtio ring implementation
Rusty Russell wrote: On Tue, 2007-09-25 at 19:15 +0200, Dor Laor wrote: At the moment it's not good enough, there is a potential race were the guest optimistically turn off the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards finds there are more_used so it consume them in the poll function. If in the middle, packets arrive the host will see the flag is off and send irq. In that case the above irq handler will report IRQ_NONE. Good point. On the one hand, reporting IRQ_NONE occasionally is not fatal. On the other, it'd be nice to get this right. It's also not trivial to cancel the optimistic approach in the restart function since then there might be another race that will result in missing irq reaching the guest. I did it optimistically because otherwise we need two barriers (and still have a window). I'll try to think of something better than a hypercall for switching irq on/off. Erk. That would be v. bad... Actually double barrier will satisfy non-shared irq lines and will report irq handling right. The more complex scenario is shared irqs (pci...), since several devices are raising the irq line (level triggered) in the same time, while the tap add receive packets it always leaves a potential race for irq handled return value. So seems like we better not have shared irq and if only if shared irq is a must we should do the naughty naughty things. If returning IRQ_NONE once in a while is not too bad (including the matching windows implementation) then there is no issue. Dor. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 2/3] virtio ring implementation
Rusty Russell wrote: On Mon, 2007-09-24 at 15:44 +0200, Dor Laor wrote: Rusty Russell wrote: +irqreturn_t vring_interrupt(int irq, void *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + pr_debug(virtqueue interrupt for %p\n, vq); + + if (unlikely(vq-broken)) + return IRQ_HANDLED; + + if (more_used(vq)) { + pr_debug(virtqueue callback for %p (%p)\n, +vq, vq-vq.callback); + if (!vq-vq.callback) + return IRQ_NONE; + if (!vq-vq.callback(vq-vq)) + vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT; + } else + pr_debug(virtqueue %p no more used\n, vq); + + return IRQ_HANDLED; +} + Seems like there is a problem with shared irq line, the interrupt handler always returns IRQ_HANDLED (except for the trivial case were the callback is null). To reply for a second time, this time with thought. Sorry. Yes, this code is wrong. It should be: irqreturn_t vring_interrupt(int irq, void *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); if (!more_used(vq)) { pr_debug(virtqueue interrupt with no work for %p\n, vq); return IRQ_NONE; } if (unlikely(vq-broken)) return IRQ_HANDLED; pr_debug(virtqueue callback for %p (%p)\n, vq, vq-vq.callback); if (vq-vq.callback !vq-vq.callback(vq-vq)) vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT; return IRQ_HANDLED; } Cheers, Rusty. At the moment it's not good enough, there is a potential race were the guest optimistically turn off the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards finds there are more_used so it consume them in the poll function. If in the middle, packets arrive the host will see the flag is off and send irq. In that case the above irq handler will report IRQ_NONE. It's also not trivial to cancel the optimistic approach in the restart function since then there might be another race that will result in missing irq reaching the guest. I'll try to think of something better than a hypercall for switching irq on/off. Cheers mate, Dor. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 2/3] virtio ring implementation
On Tue, 2007-09-25 at 19:15 +0200, Dor Laor wrote: At the moment it's not good enough, there is a potential race were the guest optimistically turn off the VRING_AVAIL_F_NO_INTERRUPT in the vring_restart and afterwards finds there are more_used so it consume them in the poll function. If in the middle, packets arrive the host will see the flag is off and send irq. In that case the above irq handler will report IRQ_NONE. Good point. On the one hand, reporting IRQ_NONE occasionally is not fatal. On the other, it'd be nice to get this right. It's also not trivial to cancel the optimistic approach in the restart function since then there might be another race that will result in missing irq reaching the guest. I did it optimistically because otherwise we need two barriers (and still have a window). I'll try to think of something better than a hypercall for switching irq on/off. Erk. That would be v. bad... Thanks, Rusty. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
[kvm-devel] [PATCH 2/3] virtio ring implementation
These helper routines supply most of the virtqueue_ops for hypervisors which want to use a ring for virtio. Unlike the previous lguest implementation: 1) The rings are variable sized (2^n-1 elements). 2) They have an unfortunate limit of 65535 bytes per sg element. 3) The page numbers are always 64 bit (PAE anyone?) 4) They no longer place used[] on a separate page, just a separate cacheline. 5) We do a modulo on a variable. We could be tricky if we cared. 6) Interrupts and notifies are suppressed using flags within the rings. Users need only implement the new_vq and free_vq hooks (KVM wants the guest to allocate the rings, lguest does it sanely). Signed-off-by: Rusty Russell [EMAIL PROTECTED] --- arch/i386/lguest/Kconfig |1 drivers/virtio/Kconfig |5 drivers/virtio/Makefile |1 drivers/virtio/virtio_ring.c | 316 ++ include/linux/virtio_ring.h | 119 +++ 5 files changed, 442 insertions(+) === --- a/arch/i386/lguest/Kconfig +++ b/arch/i386/lguest/Kconfig @@ -2,6 +2,7 @@ config LGUEST_GUEST bool Lguest guest support select PARAVIRT depends on !X86_PAE + select VIRTIO_RING help Lguest is a tiny in-kernel hypervisor. Selecting this will allow your kernel to boot under lguest. This option will increase === --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -1,3 +1,8 @@ # Virtio always gets selected by whoever wants it. config VIRTIO bool + +# Similarly the virtio ring implementation. +config VIRTIO_RING + bool + depends on VIRTIO === --- a/drivers/virtio/Makefile +++ b/drivers/virtio/Makefile @@ -1,1 +1,2 @@ obj-$(CONFIG_VIRTIO) += virtio.o obj-$(CONFIG_VIRTIO) += virtio.o +obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o === --- /dev/null +++ b/drivers/virtio/virtio_ring.c @@ -0,0 +1,316 @@ +/* Virtio ring implementation. + * + * Copyright 2007 Rusty Russell IBM Corporation + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + */ +#include linux/virtio.h +#include linux/virtio_ring.h +#include linux/device.h + +#ifdef DEBUG +/* For development, we want to crash whenever the ring is screwed. */ +#define BAD_RING(vq, fmt...) \ + do { dev_err(vq-vq.vdev-dev, fmt); BUG(); } while(0) +#define START_USE(vq) \ + do { if ((vq)-in_use) panic(in_use = %i\n, (vq)-in_use); (vq)-in_use = __LINE__; mb(); } while(0) +#define END_USE(vq) \ + do { BUG_ON(!(vq)-in_use); (vq)-in_use = 0; mb(); } while(0) +#else +#define BAD_RING(vq, fmt...) \ + do { dev_err(vq-vq.vdev-dev, fmt); (vq)-broken = true; } while(0) +#define START_USE(vq) +#define END_USE(vq) +#endif + +struct vring_virtqueue +{ + struct virtqueue vq; + + /* Actual memory layout for this queue */ + struct vring vring; + + /* Other side has made a mess, don't try any more. */ + bool broken; + + /* Number of free buffers */ + unsigned int num_free; + /* Head of free buffer list. */ + unsigned int free_head; + /* Number we've added since last sync. */ + unsigned int num_added; + + /* Last used index we've seen. */ + unsigned int last_used_idx; + + /* How to notify other side. FIXME: commonalize hcalls! */ + void (*notify)(struct virtqueue *vq); + +#ifdef DEBUG + /* They're supposed to lock for us. */ + unsigned int in_use; +#endif + + /* Tokens for callbacks. */ + void *data[]; +}; + +#define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq) + +static int vring_add_buf(struct virtqueue *_vq, +struct scatterlist sg[], +unsigned int out, +unsigned int in, +void *data) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + unsigned int i, avail, head, uninitialized_var(prev); + + BUG_ON(data == NULL); + BUG_ON(out + in vq-vring.num); + BUG_ON(out + in == 0); + +
Re: [kvm-devel] [PATCH 2/3] virtio ring implementation
Rusty Russell wrote: These helper routines supply most of the virtqueue_ops for hypervisors which want to use a ring for virtio. Unlike the previous lguest implementation: 1) The rings are variable sized (2^n-1 elements). 2) They have an unfortunate limit of 65535 bytes per sg element. 3) The page numbers are always 64 bit (PAE anyone?) 4) They no longer place used[] on a separate page, just a separate cacheline. 5) We do a modulo on a variable. We could be tricky if we cared. 6) Interrupts and notifies are suppressed using flags within the rings. Users need only implement the new_vq and free_vq hooks (KVM wants the guest to allocate the rings, lguest does it sanely). Signed-off-by: Rusty Russell [EMAIL PROTECTED] [snip] +irqreturn_t vring_interrupt(int irq, void *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + pr_debug(virtqueue interrupt for %p\n, vq); + + if (unlikely(vq-broken)) + return IRQ_HANDLED; + + if (more_used(vq)) { + pr_debug(virtqueue callback for %p (%p)\n, +vq, vq-vq.callback); + if (!vq-vq.callback) + return IRQ_NONE; + if (!vq-vq.callback(vq-vq)) + vq-vring.avail-flags |= VRING_AVAIL_F_NO_INTERRUPT; + } else + pr_debug(virtqueue %p no more used\n, vq); + + return IRQ_HANDLED; +} + Seems like there is a problem with shared irq line, the interrupt handler always returns IRQ_HANDLED (except for the trivial case were the callback is null). It can be solved by having a host irq counter (in the shared ring) and a guest irq counter and return mb(); return (host_counter!=guest_counter)? IRQ_HANDLED:IRQ_NONE; Dor. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel
Re: [kvm-devel] [PATCH 2/3] virtio ring implementation
On Mon, 2007-09-24 at 15:44 +0200, Dor Laor wrote: Seems like there is a problem with shared irq line, the interrupt handler always returns IRQ_HANDLED (except for the trivial case were the callback is null). It can be solved by having a host irq counter (in the shared ring) and a guest irq counter and return mb(); return (host_counter!=guest_counter)? IRQ_HANDLED:IRQ_NONE; Or we could make the callback return irqreturn_t and have an explicit disable hook to disable interrupts. Using the return value of the callback to disable the queue has always made be a little uncomfortable, but it's slightly more efficient than a separate hook. Thanks, Rusty. - This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/ ___ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel