Re: [1/2] tun: Only free a netdev when all tun descriptors are closed

2009-04-16 Thread Michael S. Tsirkin
On Thu, Apr 16, 2009 at 01:08:18AM -, Herbert Xu wrote:
> On Wed, Apr 15, 2009 at 10:38:34PM +0800, Herbert Xu wrote:
> > 
> > So how about this? We replace the dev destructor with our own that
> > doesn't immediately call free_netdev.  We only call free_netdev once
> > all tun fd's attached to the device have been closed.
> 
> Here's the patch.  I'd appreciate if everyone can review it
> and see if they can recreate the original race by
> 
> 1) Starting a process using tun and polls on it;
> 2) Doing ip tun del tunXXX while the process is polling.
> 
> tun: Only free a netdev when all tun descriptors are closed
> 
> The commit c70f182940f988448f3c12a209d18b1edc276e33 ("tun: Fix
> races between tun_net_close and free_netdev") fixed a race where
> an asynchronous deletion of a tun device can hose a poll(2) on
> a tun fd attached to that device.
> 
> However, this came at the cost of moving the tun wait queue into
> the tun file data structure.  The problem with this is that it
> imposes restrictions on when and where the tun device can access
> the wait queue since the tun file may change at any time due to
> detaching and reattaching.
> 
> In particular, now that we need to use the wait queue on the
> receive path it becomes difficult to properly synchronise this
> with the detachment of the tun device.
> 
> This patch solves the original race in a different way.  Since
> the race is only because the underlying memory gets freed, we
> can prevent it simply by ensuring that we don't do that until
> all tun descriptors ever attached to the device (even if they
> have since been detached, because they may still be sitting in poll)
> have been closed.
> 
> This is done by reference counting the attached tun file
> descriptors.  The refcount in tun->sk has been reappropriated
> for this purpose since it was already being used for that, albeit
> from the opposite angle.
> 
> Note that we no longer zero tfile->tun since tun_get will return
> NULL anyway after the refcount on tfile hits zero.  Instead it
> represents whether this file has ever been attached to a device.
> 
> Signed-off-by: Herbert Xu 
> 
> 
> Cheers,
> 
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> 
> .
>
> @@ -1275,20 +1278,18 @@ static int tun_chr_close(struct inode *inode, struct file *file)
>   struct tun_file *tfile = file->private_data;
>   struct tun_struct *tun = __tun_get(tfile);
>  
> -
>   if (tun) {
> - DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> -
> - rtnl_lock();
> - __tun_detach(tun);
> -
>   /* If desireable, unregister the netdevice. */
> - if (!(tun->flags & TUN_PERSIST)) {
> - sock_put(tun->sk);
> - unregister_netdevice(tun->dev);
> - }
> + if (!(tun->flags & TUN_PERSIST))
> + unregister_netdev(tun->dev);
> + else
> + tun_put(tun);
> + } else
> + tun = tfile->tun;
>  
> - rtnl_unlock();
> + if (tun) {
> + DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> + sock_put(tun->sk);
>   }
>  
>   put_net(tfile->net);

This last bit seems to make a simple test using a non-persistent tap device
deadlock for me: we don't drop the reference acquired with __tun_get, so
unregister_netdevice blocks, printing "unregister_netdevice: waiting for tap0
to become free. Usage count = 1".


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


Re: [1/2] tun: Only free a netdev when all tun descriptors are closed

2009-04-18 Thread Michael S. Tsirkin
On Fri, Apr 17, 2009 at 08:09:23AM +0800, Herbert Xu wrote:
> On Thu, Apr 16, 2009 at 10:57:45PM +0300, Michael S. Tsirkin wrote:
> >
> > This last bit seems to make a simple test using a non-persistent tap device
> > deadlock for me: we don't drop the reference acquired with __tun_get, so
> > unregister_netdevice blocks, printing "unregister_netdevice: waiting for
> > tap0 to become free. Usage count = 1".
> 
> Ah yes, I'd overlooked the fact that the original code didn't
> require the tfile refcount to hit zero.  Now we do.  Here's an
> updated version of the first patch.  The second patch should still
> work as is.
> 
> tun: Only free a netdev when all tun descriptors are closed

...

> @@ -1275,20 +1278,17 @@ static int tun_chr_close(struct inode *inode, struct file *file)
>   struct tun_file *tfile = file->private_data;
>   struct tun_struct *tun = __tun_get(tfile);
>  
> -
>   if (tun) {
> - DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> -
> - rtnl_lock();
> - __tun_detach(tun);
> -
>   /* If desireable, unregister the netdevice. */
> - if (!(tun->flags & TUN_PERSIST)) {
> - sock_put(tun->sk);
> - unregister_netdevice(tun->dev);
> - }
> + if (!(tun->flags & TUN_PERSIST))
> + unregister_netdev(tun->dev);
> + tun_put(tun);
> + } else
> + tun = tfile->tun;
>  
> - rtnl_unlock();
> + if (tun) {
> + DBG(KERN_INFO "%s: tun_chr_close\n", tun->dev->name);
> + sock_put(tun->sk);
>   }
>  
>   put_net(tfile->net);

Does this work with TUN_PERSIST off?
I haven't tested this, but won't unregister_netdev block forever
waiting for the device reference count to become 0? Maybe you want

+   tun_put(tun);
+   if (!(tun->flags & TUN_PERSIST))
+   unregister_netdev(tun->dev);

or is there a race here?

-- 
MST


[PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

I added 2 new virtio operations: request_vqs/free_vqs because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 

---

Here's a draft set of patches for MSI-X support in the guest.  It still
needs to be tested properly, and performance impact measured, but I
thought I'd share it here in the hope of getting some very early
feedback/flames.

Michael S. Tsirkin (8):
  virtio: add request_vqs/free_vqs virtio operations
  virtio_blk: add request_vqs/free_vqs calls
  virtio-rng: add request_vqs/free_vqs calls
  virtio_console: add request_vqs/free_vqs calls
  virtio_net: add request_vqs/free_vqs calls
  virtio_balloon: add request_vqs/free_vqs calls
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c  |9 ++-
 drivers/char/hw_random/virtio-rng.c |   10 ++-
 drivers/char/virtio_console.c   |8 ++-
 drivers/net/virtio_net.c|   16 +++-
 drivers/virtio/virtio_balloon.c |9 ++-
 drivers/virtio/virtio_pci.c |  192 ++-
 include/linux/virtio_config.h   |   35 +++
 7 files changed, 247 insertions(+), 32 deletions(-)


[PATCH 1/8] virtio: add request_vqs/free_vqs operations

2009-04-27 Thread Michael S. Tsirkin
This adds 2 new optional virtio operations: request_vqs/free_vqs. They will be
used for MSI support, because MSI needs to know the total number of vectors
upfront.

Signed-off-by: Michael S. Tsirkin 
---
 include/linux/virtio_config.h |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index bf8ec28..e935670 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -49,6 +49,15 @@
  * @set_status: write the status byte
  * vdev: the virtio_device
  * status: the new status byte
+ * @request_vqs: request the specified number of virtqueues
+ * vdev: the virtio_device
+ * max_vqs: the max number of virtqueues we want
+ *  If supplied, must call before any virtqueues are instantiated.
+ *  To modify the max number of virtqueues after request_vqs has been
+ *  called, call free_vqs and then request_vqs with a new value.
+ * @free_vqs: cleanup resources allocated by request_vqs
+ * vdev: the virtio_device
+ *  If supplied, must call after all virtqueues have been deleted.
  * @reset: reset the device
  * vdev: the virtio device
  * After this, status and feature negotiation must be done again
@@ -75,6 +84,8 @@ struct virtio_config_ops
u8 (*get_status)(struct virtio_device *vdev);
void (*set_status)(struct virtio_device *vdev, u8 status);
void (*reset)(struct virtio_device *vdev);
+   int (*request_vqs)(struct virtio_device *vdev, unsigned max_vqs);
+   void (*free_vqs)(struct virtio_device *vdev);
struct virtqueue *(*find_vq)(struct virtio_device *vdev,
 unsigned index,
 void (*callback)(struct virtqueue *));
@@ -126,5 +137,29 @@ static inline int virtio_config_buf(struct virtio_device *vdev,
vdev->config->get(vdev, offset, buf, len);
return 0;
 }
+
+/**
+ * virtio_request_vqs: request the specified number of virtqueues
+ * @vdev: the virtio_device
+ * @max_vqs: the max number of virtqueues we want
+ *
+ * For details, see documentation for request_vqs above. */
+static inline int virtio_request_vqs(struct virtio_device *vdev,
+unsigned max_vqs)
+{
+   return vdev->config->request_vqs ?
+   vdev->config->request_vqs(vdev, max_vqs) : 0;
+}
+
+/**
+ * free_vqs: cleanup resources allocated by virtio_request_vqs
+ * @vdev: the virtio_device
+ *
+ * For details, see documentation for free_vqs above. */
+static inline void virtio_free_vqs(struct virtio_device *vdev)
+{
+   if (vdev->config->free_vqs)
+   vdev->config->free_vqs(vdev);
+}
 #endif /* __KERNEL__ */
 #endif /* _LINUX_VIRTIO_CONFIG_H */
-- 
1.6.0.6



[PATCH 2/8] virtio_blk: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_blk.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..523eddc 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,10 +224,14 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
+   err = virtio_request_vqs(vdev, 1);
+   if (err)
+   goto out_free_vblk;
+
vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
if (IS_ERR(vblk->vq)) {
err = PTR_ERR(vblk->vq);
-   goto out_free_vblk;
+   goto out_free_vqs;
}
 
vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
@@ -324,6 +328,8 @@ out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
vdev->config->del_vq(vblk->vq);
+out_free_vqs:
+   virtio_free_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -345,6 +351,7 @@ static void virtblk_remove(struct virtio_device *vdev)
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
vdev->config->del_vq(vblk->vq);
+   virtio_free_vqs(vdev);
kfree(vblk);
 }
 
-- 
1.6.0.6



[PATCH 3/8] virtio-rng: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio-rng.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/char/hw_random/virtio-rng.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index d0e563e..d43a6cd 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,20 @@ static int virtrng_probe(struct virtio_device *vdev)
int err;
 
/* We expect a single virtqueue. */
+   err = virtio_request_vqs(vdev, 1);
+   if (err)
+   return err;
+
vq = vdev->config->find_vq(vdev, 0, random_recv_done);
-   if (IS_ERR(vq))
+   if (IS_ERR(vq)) {
+   virtio_free_vqs(vdev);
return PTR_ERR(vq);
+   }
 
err = hwrng_register(&virtio_hwrng);
if (err) {
vdev->config->del_vq(vq);
+   virtio_free_vqs(vdev);
return err;
}
 
@@ -113,6 +120,7 @@ static void virtrng_remove(struct virtio_device *vdev)
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
vdev->config->del_vq(vq);
+   virtio_free_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
-- 
1.6.0.6



[PATCH 4/8] virtio_console: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_console.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/char/virtio_console.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index ff6f5a4..78c503f 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -199,13 +199,17 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
goto fail;
}
 
+   err = virtio_request_vqs(vdev, 2);
+   if (err)
+   goto free;
+
/* Find the input queue. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input);
if (IS_ERR(in_vq)) {
err = PTR_ERR(in_vq);
-   goto free;
+   goto free_vqs;
}
 
out_vq = vdev->config->find_vq(vdev, 1, NULL);
@@ -244,6 +248,8 @@ free_out_vq:
vdev->config->del_vq(out_vq);
 free_in_vq:
vdev->config->del_vq(in_vq);
+free_vqs:
+   virtio_free_vqs(vdev);
 free:
kfree(inbuf);
 fail:
-- 
1.6.0.6



[PATCH 6/8] virtio_balloon: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_balloon.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_balloon.c |9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 5926826..77eac2b 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -217,10 +217,14 @@ static int virtballoon_probe(struct virtio_device *vdev)
vb->vdev = vdev;
 
/* We expect two virtqueues. */
+   err = virtio_request_vqs(vdev, 2);
+   if (err)
+   goto out_free_vb;
+
vb->inflate_vq = vdev->config->find_vq(vdev, 0, balloon_ack);
if (IS_ERR(vb->inflate_vq)) {
err = PTR_ERR(vb->inflate_vq);
-   goto out_free_vb;
+   goto out_free_vqs;
}
 
vb->deflate_vq = vdev->config->find_vq(vdev, 1, balloon_ack);
@@ -244,6 +248,8 @@ out_del_deflate_vq:
vdev->config->del_vq(vb->deflate_vq);
 out_del_inflate_vq:
vdev->config->del_vq(vb->inflate_vq);
+out_free_vqs:
+   virtio_free_vqs(vdev);
 out_free_vb:
kfree(vb);
 out:
@@ -265,6 +271,7 @@ static void virtballoon_remove(struct virtio_device *vdev)
 
vdev->config->del_vq(vb->deflate_vq);
vdev->config->del_vq(vb->inflate_vq);
+   virtio_free_vqs(vdev);
kfree(vb);
 }
 
-- 
1.6.0.6



[PATCH 7/8] virtio_pci: split up vp_interrupt

2009-04-27 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |   45 +-
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 330aacb..151538c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,14 +215,11 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   return vp_vring_interrupt(irq, opaque);
+}
 
spin_lock_irqsave(&vp_dev->lock, flags);
list_for_each_entry(info, &vp_dev->virtqueues, node) {
-- 
1.6.0.6



[PATCH 5/8] virtio_net: add request_vqs/free_vqs calls

2009-04-27 Thread Michael S. Tsirkin
Add request_vqs/free_vqs calls to virtio_net.
These will be required for MSI support.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/net/virtio_net.c |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9c82a39..fbe8a70 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -841,6 +841,7 @@ static int virtnet_probe(struct virtio_device *vdev)
int err;
struct net_device *dev;
struct virtnet_info *vi;
+   bool control_vq;
 
/* Allocate ourselves a network device with room for our info */
dev = alloc_etherdev(sizeof(struct virtnet_info));
@@ -901,11 +902,17 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
vi->mergeable_rx_bufs = true;
 
-   /* We expect two virtqueues, receive then send. */
+   /* We expect either two or three virtqueues: receive, send and
+* optionally control. */
+   control_vq = virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ);
+   err = virtio_request_vqs(vdev, control_vq ? 3 : 2);
+   if (err)
+   goto free;
+
vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
if (IS_ERR(vi->rvq)) {
err = PTR_ERR(vi->rvq);
-   goto free;
+   goto free_vqs;
}
 
vi->svq = vdev->config->find_vq(vdev, 1, skb_xmit_done);
@@ -914,7 +921,7 @@ static int virtnet_probe(struct virtio_device *vdev)
goto free_recv;
}
 
-   if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)) {
+   if (control_vq) {
vi->cvq = vdev->config->find_vq(vdev, 2, NULL);
if (IS_ERR(vi->cvq)) {
			err = PTR_ERR(vi->cvq);
@@ -965,6 +972,8 @@ free_send:
vdev->config->del_vq(vi->svq);
 free_recv:
vdev->config->del_vq(vi->rvq);
+free_vqs:
+   virtio_free_vqs(vi->vdev);
 free:
free_netdev(dev);
return err;
@@ -994,6 +1003,7 @@ static void virtnet_remove(struct virtio_device *vdev)
vdev->config->del_vq(vi->rvq);
if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
vdev->config->del_vq(vi->cvq);
+   virtio_free_vqs(vi->vdev);
unregister_netdev(vi->dev);
 
while (vi->pages)
-- 
1.6.0.6



[PATCH 8/8] virtio_pci: optional MSI-X support

2009-04-27 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors are
available.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |  147 ++-
 1 files changed, 132 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 151538c..20bdc8c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,8 +42,33 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of vectors configured at startup (excludes per-virtqueue
+* vectors if any) */
+   unsigned msix_preset_vectors;
+   /* Number of per-virtqueue vectors if any. */
+   unsigned msix_per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use the first vector for configuration changes, and the second and the rest
+ * for virtqueues. Thus, we need at least 2 vectors for MSI. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
+   VP_MSIX_MIN_VECTORS = 2
 };
 
+static inline int vq_vector(int index)
+{
+   return index + VP_MSIX_VQ_VECTOR;
+}
+
 struct virtio_pci_vq_info
 {
/* the actual virtqueue */
@@ -221,14 +246,92 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
+/* the config->free_vqs() implementation */
+static void vp_free_vqs(struct virtio_device *vdev) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   for (i = 0; i < vp_dev->msix_preset_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+   if (!vp_dev->msix_preset_vectors)
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+}
+
+/* the config->request_vqs() implementation */
+static int vp_request_vqs(struct virtio_device *vdev, unsigned max_vqs) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char *name = dev_name(&vp_dev->vdev.dev);
+   unsigned i, vectors;
+   int err = -ENOMEM;
+
+   /* We need at most one vector per queue and one for config changes */
+   vectors = vq_vector(max_vqs);
+   vp_dev->msix_entries = kmalloc(vectors * sizeof *vp_dev->msix_entries,
+  GFP_KERNEL);
+   if (!vp_dev->msix_entries)
+   goto error_entries;
+   vp_dev->msix_names = kmalloc(vectors * sizeof *vp_dev->msix_names,
+GFP_KERNEL);
+   if (!vp_dev->msix_names)
+   goto error_names;
+
+   snprintf(vp_dev->msix_names[VP_MSIX_CONFIG_VECTOR],
+sizeof *vp_dev->msix_names, "%s-config", name);
+   for (i = 0; i < max_vqs; ++i)
+   snprintf(vp_dev->msix_names[vq_vector(i)],
+sizeof *vp_dev->msix_names, "%s-vq-%d", name, i);
+
+   vp_dev->msix_preset_vectors = 1;
+   vp_dev->msix_per_vq_vectors = max_vqs;
+   for (;;) {
+   err = pci_enable_msix(vp_dev->pci_dev, vp_dev->msix_entries,
+ vectors);
+   /* Error out if not enough vectors */
+   if (err > 0 && err < VP_MSIX_MIN_VECTORS)
+   err = -EBUSY;
+   if (err <= 0)
+   break;
+   /* Not enough vectors for all queues. Retry, disabling
+* per-queue interrupts */
+   vectors = VP_MSIX_MIN_VECTORS;
+   vp_dev->msix_preset_vectors = VP_MSIX_MIN_VECTORS;
+   vp_dev->msix_per_vq_vectors = 0;
+   snprintf(vp_dev->msix_names[VP_MSIX_VQ_VECTOR],
+sizeof *vp_dev->msix_names, "%s-vq", name);
}
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
 
-   return ret;
+   if (err) {
+   /* Can't allocate enough MSI-X vectors, use regular interrupt */
+   vp_dev->msix_preset_vectors = 0;
+   vp_dev->msix_per_vq_vectors = 0;
+   /* Register a handler for the queue with the PCI device's
+* interrupt */
+   err = request_irq(vp_dev->pci_dev->

Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 06:06:48PM +0300, Avi Kivity wrote:
> Christian Borntraeger wrote:
>> On Monday 27 April 2009 14:31:36, Michael S. Tsirkin wrote:
>>   
>>> Add optional MSI-X support: use a vector per virtqueue with
>>> fallback to a common vector and finally to regular interrupt.
>>> Teach all drivers to use it.
>>>
>>> I added 2 new virtio operations: request_vqs/free_vqs because MSI
>>> needs to know the total number of vectors upfront.
>>>
>>> Signed-off-by: Michael S. Tsirkin 
>>> 
>>
>> I don't know if that is feasible for MSI, but the transport (virtio_pci)
>> should already know the number of virtqueues, which should match the
>> number of vectors, no?
>>
>> In fact, the transport has to have a way of getting the number of virtqueues
>> because find_vq returns ENOENT on invalid index numbers.
>>   
>
> One thing I thought was to decouple the virtqueue count from the MSI  
> entry count, and let the guest choose how many MSI entries it wants to  
> use.  This is useful if it wants several queues to share an interrupt,  
> perhaps it wants both rx and tx rings on one cpu to share a single 
> vector.
>
> So the device could expose a large, constant number of MSI entries, and  
> in addition expose a table mapping ring number to msi entry number.  The  
> guest could point all rings to one entry, or have each ring use a  
> private entry, or anything in between.

Sounds good ... and it might not be a lot of code I think.

> That saves us the new API (at the expense of a lot more code, but with  
> added flexibility).

So we'll probably need to rename request_vqs to request_vectors,
but we probably still need the driver to pass the number of
vectors it wants to the transport. Right?

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 06:59:52PM +0200, Christian Borntraeger wrote:
> On Monday 27 April 2009 17:39:36, Michael S. Tsirkin wrote:
> > So we'll probably need to rename request_vqs to request_vectors,
> > but we probably still need the driver to pass the number of
> > vectors it wants to the transport. Right?
> 
> This might be a stupid idea, but would something like the following
> be sufficient for you?

Yes.

> Index: linux-2.6/drivers/net/virtio_net.c
> ===
> --- linux-2.6.orig/drivers/net/virtio_net.c
> +++ linux-2.6/drivers/net/virtio_net.c
> @@ -1021,6 +1021,7 @@ static unsigned int features[] = {
>  static struct virtio_driver virtio_net = {
> .feature_table = features,
> .feature_table_size = ARRAY_SIZE(features),
> +   .num_vq_max = 3,
> .driver.name =  KBUILD_MODNAME,
> .driver.owner = THIS_MODULE,
> .id_table = id_table,
> Index: linux-2.6/include/linux/virtio.h
> ===
> --- linux-2.6.orig/include/linux/virtio.h
> +++ linux-2.6/include/linux/virtio.h
> @@ -110,6 +110,7 @@ struct virtio_driver {
> const struct virtio_device_id *id_table;
> const unsigned int *feature_table;
> unsigned int feature_table_size;
> +   unsigned int num_vq_max;
> int (*probe)(struct virtio_device *dev);
> void (*remove)(struct virtio_device *dev);
> void (*config_changed)(struct virtio_device *dev);

Good idea.

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 05:37:25PM +0200, Christian Borntraeger wrote:
> > As the number of virtqueues <= the number of vectors, we could
> > pre-allocate all the vectors the host supports, but this seems a bit
> > drastic, as an MSI-X device can support up to 2K vectors.
> > 
> > > In fact, the transport has to have a way of getting the number of
> > > virtqueues, because find_vq returns ENOENT on invalid index numbers.
> > > 
> > > Christian
> > 
> > So again, I think this is an upper bound supported by host. Right?
> 
> Not the upper bound, but the real number of available virtqueues. (With
> current qemu: 3 for virtio-net, 2 for virtio-console, etc.)

Here's what I mean by upper bound: guest and host number of
virtqueues might be different: the host might support
control virtqueue like in virtio-net, but guest might be an older
version and not use it.

> Since I use a different transport (drivers/s390/kvm/kvm_virtio.c), my
> motivation is to keep the virtio interface as generic as possible. I
> don't really like the new interface, but I cannot give you silver
> bullet technical reasons - it's more a gut feeling. The interface would
> work with lguest and s390.
> 
> Anyway, Avi's suggestion to decouple MSI count and virtqueue count looks
> like a promising approach.

So generally, we add request_vectors, which gives us the number of
available MSI vectors, and then find_vq gets a vector number?
Does this sound better?

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 04:00:30PM +0200, Christian Borntraeger wrote:
> On Monday 27 April 2009 14:31:36, Michael S. Tsirkin wrote:
> > Add optional MSI-X support: use a vector per virtqueue with
> > fallback to a common vector and finally to regular interrupt.
> > Teach all drivers to use it.
> > 
> > I added 2 new virtio operations: request_vqs/free_vqs because MSI
> > needs to know the total number of vectors upfront.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> I don't know if that is feasible for MSI, but the transport (virtio_pci)
> should already know the number of virtqueues, which should match the number
> of vectors, no?

I think no: the transport can find out the max number of vectors the
device supports (pci_msix_table_size), but not how many virtqueues are
needed by the driver.
As the number of virtqueues <= the number of vectors, we could
pre-allocate all the vectors the host supports, but this seems a bit
drastic, as an MSI-X device can support up to 2K vectors.

> In fact, the transport has to have a way of getting the number of virtqueues
> because find_vq returns ENOENT on invalid index numbers.
> 
> Christian

So again, I think this is an upper bound supported by host. Right?

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-27 Thread Michael S. Tsirkin
On Mon, Apr 27, 2009 at 09:39:06AM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> Add optional MSI-X support: use a vector per virtqueue with
>> fallback to a common vector and finally to regular interrupt.
>> Teach all drivers to use it.
>>
>> I added 2 new virtio operations: request_vqs/free_vqs because MSI
>> needs to know the total number of vectors upfront.
>>
>> Signed-off-by: Michael S. Tsirkin 
>>   
>
> Do you have userspace patches for testing?
>
> Regards,
>
> Anthony Liguori

Not ready yet, sorry.

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2009 at 09:47:14AM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>>> That saves us the new API (at the expense of a lot more code, but 
>>> with  added flexibility).
>>> 
>>
>> So we'll probably need to rename request_vqs to request_vectors,
>> but we probably still need the driver to pass the number of
>> vectors it wants to the transport. Right?
>>
>>   
>
> I don't think so - virtio will provide the number of interrupts it  
> supports, and virtio-net will tell it to bind ring X to interrupt Y.

This does not work for MSI-X: in Linux, you must map all MSI-X entries
to interrupt vectors upfront.

So what I see is transports providing something like:

struct virtio_interrupt_mapping {
int virtqueue;
int interrupt;
};

map_vqs_to_interrupt(dev, struct virtio_interrupt_mapping *, int nvirtqueues);
unmap_vqs(dev);

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2009 at 08:51:08PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> This does not work for MSIX - in linux, you must map all MSI-X entries
>> to interrupt vectors upfront.
>>   
>
> What?  that's very inflexible.
>
> Can you point me at the code?

See pci_enable_msix in include/linux/pci.h

>> So what I see is transports providing something like:
>>
>> struct virtio_interrupt_mapping {
>>  int virtqueue;
>>  int interrupt;
>> };
>>
>> map_vqs_to_interrupt(dev, struct virtio_interrupt_mapping *, int 
>> nvirtqueues);
>> unmap_vqs(dev);
>>   
>
> Isn't that the same thing?  Please explain the flow.

So to map vq 0 to vector 0, vq 1 to vector 1 and vq 2 to vector 2 the driver 
would do:

struct virtio_interrupt_mapping mapping[3] = { {0, 0}, {1, 1}, {2, 2} };
vec = map_vqs_to_interrupt(dev, mapping, 3);
if (vec) {
  error handling
}

and then find_vq as usual.

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-04-28 Thread Michael S. Tsirkin
On Tue, Apr 28, 2009 at 02:56:15PM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> So to map vq 0 to vector 0, vq 1 to vector 1 and vq 2 to vector 2 the driver 
>> would do:
>>
>> struct virtio_interrupt_mapping mapping[3] = { {0, 0}, {1, 1}, {2, 2} };
>> vec = map_vqs_to_interrupt(dev, mapping, 3);
>> if (vec) {
>>   error handling
>> }
>>
>> and then find_vq as usual.
>>   
>
> Is it possible to just delay the msix enablement until after the queues  
> have been finalized (IOW in virtio-pci.c:vp_finalize_features)?

Yes, but

1. vp_finalize_features returns void, while enabling the requested
   number of interrupts might fail.

2. we don't know in advance (without trying) whether we'll be able
   to allocate a specific number of vectors.
   So if we want to give drivers the flexibility to assign queues to
   vectors, the driver must request vectors, we'll try to allocate
   them, and the driver will decide what to do on failure.

After thinking about it, request_vqs was not a bad API for virtio, and good
enough for existing drivers.  And the PCI space can be made future-proof by
allowing virtqueues to be mapped to vectors in case we extend the API later.

Thoughts?


-- 
MST


Re: [PATCH 1/8] virtio: add request_vqs/free_vqs operations

2009-05-04 Thread Michael S. Tsirkin
On Mon, May 04, 2009 at 09:02:20PM +0930, Rusty Russell wrote:
> On Mon, 27 Apr 2009 10:01:53 pm Michael S. Tsirkin wrote:
> > This adds 2 new optional virtio operations: request_vqs/free_vqs. They will 
> > be
> > used for MSI support, because MSI needs to know the total number of vectors
> > upfront.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> 
> Hi Michael,

Hi Rusty,

>   Thanks for this work!  But this interface is horrible.  Either probe for the
> number of vqs in virtio_pci, or change find_vq to
> 
>   int (*find_vqs)(struct virtio_device *, unsigned max,
>struct virtqueue *vqs[]);

I'm happier with the latter option: it's easy for a host to expose
support for a very large number of vqs, and I don't want to waste
resources if the guest does not use them.

Thanks for the feedback!

> Thanks,
> Rusty.

-- 
MST


Re: [PATCH RFC 0/8] virtio: add guest MSI-X support

2009-05-04 Thread Michael S. Tsirkin
On Mon, May 04, 2009 at 12:21:28PM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>>>> So what I see is transports providing something like:
>>>>
>>>> struct virtio_interrupt_mapping {
>>>>int virtqueue;
>>>>int interrupt;
>>>> };
>>>>
>>>> map_vqs_to_interrupt(dev, struct virtio_interrupt_mapping *, int 
>>>> nvirtqueues);
>>>> unmap_vqs(dev);
>>>> 
>>> Isn't that the same thing?  Please explain the flow.
>>> 
>>
>> So to map vq 0 to vector 0, vq 1 to vector 1 and vq 2 to vector 2 the driver 
>> would do:
>>
>> struct virtio_interrupt_mapping mapping[3] = { {0, 0}, {1, 1}, {2, 2} };
>> vec = map_vqs_to_interrupt(dev, mapping, 3);
>> if (vec) {
>>   error handling
>> }
>>
>> and then find_vq as usual.
>>   
>
> Yes, that works.
>
> Given that pci_enable_msix() can fail, we can put the retry loop in  
> virtio-pci, and instead of a static mapping, supply a dynamic mapping:
>
>static void get_vq_interrupt(..., int nr_interrupts, int vq)
>{
>/* reserve interrupt 0 to config changes; round-robin vqs to  
> interrupts */
>return 1 + (vq % (nr_interrupts - 1));
>}
>
>driver_init()
>{
>map_vqs_to_interrupt(dev, get_vq_interrupt);
>}
>
> map_vqs_to_interrupts() would call get_vq_interrupt() for each vq,  
> assuming the maximum nr_interrupts, and retry with smaller nr_interrupts  
> on failure.

Since guest drivers are going to do round-robin most of the time, I
think the right thing to do is to make the API simple, along the lines
proposed by Rusty, and make the guest/host ABI rich enough to support
arbitrary mappings, along the lines proposed by you. We can always change
the API; the ABI is harder.

-- 
MST


[PATCH] msi-x: let drivers retry when not enough vectors

2009-05-07 Thread Michael S. Tsirkin
pci_enable_msix currently returns -EINVAL if you ask
for more vectors than the device supports, which would
typically cause a fallback to regular interrupts.

It's better to return the table size, letting the driver retry
MSI-X with fewer vectors.

Signed-off-by: Michael S. Tsirkin 
---

Hi Jesse,
This came up when I was adding MSI-X support to virtio pci driver,
which does not know the exact table size upfront.
Could you consider this patch for 2.6.31 please?


 drivers/pci/msi.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 6f2e629..f5bd1c9 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -687,8 +687,8 @@ int pci_msix_table_size(struct pci_dev *dev)
  * indicates the successful configuration of MSI-X capability structure
  * with new allocated MSI-X irqs. A return of < 0 indicates a failure.
  * Or a return of > 0 indicates that driver request is exceeding the number
- * of irqs available. Driver should use the returned value to re-send
- * its request.
+ * of irqs or MSI-X vectors available. Driver should use the returned value to
+ * re-send its request.
  **/
 int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int nvec)
 {
@@ -704,7 +704,7 @@ int pci_enable_msix(struct pci_dev* dev, struct msix_entry 
*entries, int nvec)
 
nr_entries = pci_msix_table_size(dev);
if (nvec > nr_entries)
-   return -EINVAL;
+   return nr_entries;
 
/* Check for any invalid entries */
for (i = 0; i < nvec; i++) {
-- 
1.6.3.rc3.1.g830204


Re: [PATCH] msi-x: let drivers retry when not enough vectors

2009-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2009 at 04:51:24PM +0800, Sheng Yang wrote:
> On Thursday 07 May 2009 16:28:41 Michael S. Tsirkin wrote:
> > pci_enable_msix currently returns -EINVAL if you ask
> > for more vectors than supported by the device, which would
> > typically cause fallback to regular interrupts.
> >
> > It's better to return the table size, making the driver retry
> > MSI-X with less vectors.
> 
> Hi Michael
> 
> I think driver should read from capability list to know how many vector 
> supported by this device before enable MSI-X for device, as 
> pci_msix_table_size() did...

Drivers can do this, but it's more code. Since pci_enable_msix
already calls pci_msix_table_size, let it do the work. Right?


> -- 
> regards
> Yang, Sheng
> 
> >
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >
> > Hi Jesse,
> > This came up when I was adding MSI-X support to virtio pci driver,
> > which does not know the exact table size upfront.
> > Could you consider this patch for 2.6.31 please?
> >
> >
> >  drivers/pci/msi.c |6 +++---
> >  1 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> > index 6f2e629..f5bd1c9 100644
> > --- a/drivers/pci/msi.c
> > +++ b/drivers/pci/msi.c
> > @@ -687,8 +687,8 @@ int pci_msix_table_size(struct pci_dev *dev)
> >   * indicates the successful configuration of MSI-X capability structure
> >   * with new allocated MSI-X irqs. A return of < 0 indicates a failure.
> >   * Or a return of > 0 indicates that driver request is exceeding the
> > number - * of irqs available. Driver should use the returned value to
> > re-send - * its request.
> > + * of irqs or MSI-X vectors available. Driver should use the returned
> > value to + * re-send its request.
> >   **/
> >  int pci_enable_msix(struct pci_dev* dev, struct msix_entry *entries, int
> > nvec) {
> > @@ -704,7 +704,7 @@ int pci_enable_msix(struct pci_dev* dev, struct
> > msix_entry *entries, int nvec)
> >
> > nr_entries = pci_msix_table_size(dev);
> > if (nvec > nr_entries)
> > -   return -EINVAL;
> > +   return nr_entries;
> >
> > /* Check for any invalid entries */
> > for (i = 0; i < nvec; i++) {
> 

-- 
MST


Re: [PATCH] msi-x: let drivers retry when not enough vectors

2009-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2009 at 05:10:46PM +0800, Sheng Yang wrote:
> > > I think driver should read from capability list to know how many vector
> > > supported by this device before enable MSI-X for device, as
> > > pci_msix_table_size() did...
> >
> > Drivers can do this, but it's more code. Since pci_enable_msix
> > calls pci_msix_table_size already, let it do the work. Right?
> 
> If you don't know the vectors number before you enable MSI-X, how can you 
> setup vectors? 

I don't know how many irqs I will be able to get anyway.
Vectors that can't be assigned are useless ...

-- 
MST


Re: [PATCH] msi-x: let drivers retry when not enough vectors

2009-05-07 Thread Michael S. Tsirkin
On Thu, May 07, 2009 at 06:19:53PM +0800, Sheng Yang wrote:
> On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > On Thu, May 07, 2009 at 05:40:15PM +0800, Sheng Yang wrote:
> > > It's indeed weird. Why the semantic of pci_enable_msix can be changed to
> > > "enable msix, or tell me how many vector do you have"? You can simply
> > > call pci_msix_table_size() to get what you want, also without any more
> > > work, no? I can't understand...
> >
> > Here's a good example.  Let's suppose you have a driver which supports
> > two different models of cards, one has 16 MSI-X interrupts, the other
> > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > card is model A, you get 16 interrupts.  If your card is model B, it says
> > "you can have 10".
> >
> > This is less work in the driver (since it must implement falling back to
> > a smaller number of interrupts *anyway*) than interrogating the card to
> > find out how many interrupts there are, then requesting the right number,
> > and still having the fallback path which is going to be less tested.
> 
> Yeah, partly understand now.
> 
> But the confusing of return value is not that pleasure compared to this 
> benefit. And even you have to fall back if return > 0 anyway, but in the 
> past, 
> you just need fall back once at most; but now you may fall back twice.

I don't think that's right: you might not be able to get the
number of interrupts that pci_enable_msix reported.

> This 
> make thing more complex - you need either two ifs or a simple loop. And just 
> one "if" can deal with it before. All that required is one call for 
> pci_msix_table_size(), and I believe most driver would like to know how much 
> vector it have before it fill the vectors, so mostly no extra cost. But for 
> this ambiguous return meaning, you have to add more code for fall back - yes, 
> the driver may can assert that the positive return value always would be irq 
> numbers if it call pci_msix_table_size() before, but is it safe in logic?

If you know how many vectors the card has, then the only failure mode
is when you are out of irqs. No change there.

-- 
MST


[PATCHv2 0/3] virtio: add guest MSI-X support

2009-05-07 Thread Michael S. Tsirkin
Add optional MSI-X support: use a vector per virtqueue with
fallback to a common vector and finally to regular interrupt.
Teach all drivers to use it.

Signed-off-by: Michael S. Tsirkin 

---

Here's a draft set of patches for MSI-X support in the guest.  It still
needs to be tested properly, and performance impact measured, but I
thought I'd share it here in the hope of getting some very early
feedback/flames.

Changelog since v1:
- Per Avi's suggestion, let guest configure virtqueue to vector mapping
- Per Rusty's suggestion, replace API with find_vqs/del_vqs.

Michael S. Tsirkin (3):
  virtio: find_vqs/del_vqs virtio operations
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c  |   11 +-
 drivers/char/hw_random/virtio-rng.c |   11 +-
 drivers/char/virtio_console.c   |   27 ++--
 drivers/lguest/lguest_device.c  |   49 ++-
 drivers/net/virtio_net.c|   47 +++
 drivers/s390/kvm/kvm_virtio.c   |   64 -
 drivers/virtio/virtio_balloon.c |   31 ++--
 drivers/virtio/virtio_pci.c |  283 ++-
 include/linux/virtio_config.h   |   29 +++-
 include/linux/virtio_pci.h  |8 +-
 10 files changed, 440 insertions(+), 120 deletions(-)


[PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-07 Thread Michael S. Tsirkin
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c  |   11 +++---
 drivers/char/hw_random/virtio-rng.c |   11 +++---
 drivers/char/virtio_console.c   |   27 ++
 drivers/lguest/lguest_device.c  |   49 +-
 drivers/net/virtio_net.c|   47 +++---
 drivers/s390/kvm/kvm_virtio.c   |   64 +-
 drivers/virtio/virtio_balloon.c |   31 -
 drivers/virtio/virtio_pci.c |   43 +++
 include/linux/virtio_config.h   |   29 +++-
 9 files changed, 222 insertions(+), 90 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..7b7435d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -197,6 +197,7 @@ static int virtblk_probe(struct virtio_device *vdev)
u64 cap;
u32 v;
u32 blk_size, sg_elems;
+   virtqueue_callback *callback[] = { blk_done };
 
if (index_to_minor(index) >= 1 << MINORBITS)
return -ENOSPC;
@@ -224,11 +225,9 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
-   if (IS_ERR(vblk->vq)) {
-   err = PTR_ERR(vblk->vq);
+   err = vdev->config->find_vqs(vdev, 1, &vblk->vq, callback);
+   if (err)
goto out_free_vblk;
-   }
 
vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
if (!vblk->pool) {
@@ -323,7 +322,7 @@ out_put_disk:
 out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -344,7 +343,7 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c 
b/drivers/char/hw_random/virtio-rng.c
index 86e83f8..18eabe4 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -91,16 +91,17 @@ static struct hwrng virtio_hwrng = {
 
 static int virtrng_probe(struct virtio_device *vdev)
 {
+   virtqueue_callback *callback[] = { random_recv_done };
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done);
-   if (IS_ERR(vq))
-   return PTR_ERR(vq);
+   err = vdev->config->find_vqs(vdev, 1, &vq, callback);
+   if (err)
+   return err;
 
err = hwrng_register(&virtio_hwrng);
if (err) {
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
return err;
}
 
@@ -112,7 +113,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index ff6f5a4..1fd5376 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,8 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+   struct virtqueue *vqs[2];
+   virtqueue_callback *callbacks[2];
int err;
 
vdev = dev;
@@ -199,20 +201,17 @@ static int __devinit virtcons_probe(struct virtio_device 
*dev)
goto fail;
}
 
-   /* Find the input queue. */
+   /* Find the queues. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input);
-   if (IS_ERR(in_vq)) {
-   err = PTR_ERR(in_vq);
+   callbacks[0] = hvc_handle_input;
+   callbacks[1] = NULL;
+   err = vdev->config->find_vqs(vdev, 2, vqs, callbacks);
+   if (err)
goto free;
-   }
 
-   out_vq = vdev->config->find_vq(vdev, 1, NULL);
-   if (IS_ERR(out_vq)) {
-   err = PTR_ERR(out_vq);
-   goto free_in_vq;
-   }
+   in_vq = vqs[0];
+   out

[PATCH 2/3] virtio_pci: split up vp_interrupt

2009-05-07 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |   45 +-
 1 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 3671c42..f7b79a2 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,14 +215,11 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   return vp_vring_interrupt(irq, opaque);
+}
 
spin_lock_irqsave(&vp_dev->lock, flags);
list_for_each_entry(info, &vp_dev->virtqueues, node) {
-- 
1.6.3.rc3.1.g830204



[PATCH 3/3] virtio_pci: optional MSI-X support

2009-05-07 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
are available.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |  209 +--
 include/linux/virtio_pci.h  |8 ++-
 2 files changed, 190 insertions(+), 27 deletions(-)

diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index cd0fd5d..4a0275b 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -47,9 +47,15 @@
 /* The bit of the ISR which indicates a device configuration change. */
 #define VIRTIO_PCI_ISR_CONFIG  0x2
 
+/* MSI-X registers: only enabled if MSI-X is enabled. */
+/* A 16-bit vector for configuration changes. */
+#define VIRTIO_MSI_CONFIG_VECTOR20
+/* A 16-bit vector for selected queue notifications. */
+#define VIRTIO_MSI_QUEUE_VECTOR 22
+
 /* The remaining space is defined by each driver as the per-driver
  * configuration space */
-#define VIRTIO_PCI_CONFIG  20
+#define VIRTIO_PCI_CONFIG(dev) ((dev)->msix_enabled ? 24 : 20)
 
 /* Virtio ABI version, this must match exactly */
 #define VIRTIO_PCI_ABI_VERSION 0
diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index f7b79a2..2b6333c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,6 +42,28 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   int msix_enabled;
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of vectors configured at startup (excludes per-virtqueue
+* vectors if any) */
+   unsigned msix_preset_vectors;
+   /* Number of per-virtqueue vectors if any. */
+   unsigned msix_per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues Thus, we need at least 2 vectors for MSI. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
+   VP_MSIX_MIN_VECTORS = 2,
+   VP_MSIX_NO_VECTOR = 0x,
 };
 
 struct virtio_pci_vq_info
@@ -60,6 +82,9 @@ struct virtio_pci_vq_info
 
/* the list node for the virtqueues list */
struct list_head node;
+
+   /* MSI-X vector (or none) */
+   unsigned vector;
 };
 
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
@@ -109,7 +134,8 @@ static void vp_get(struct virtio_device *vdev, unsigned 
offset,
   void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+   VIRTIO_PCI_CONFIG(vp_dev) + offset;
u8 *ptr = buf;
int i;
 
@@ -123,7 +149,8 @@ static void vp_set(struct virtio_device *vdev, unsigned 
offset,
   const void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+  VIRTIO_PCI_CONFIG(vp_dev) + offset;
const u8 *ptr = buf;
int i;
 
@@ -221,17 +248,110 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
+/* the config->free_vqs() implementation */
+static void vp_free_vectors(struct virtio_device *vdev) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   /* Disable the vector used for configuration */
+   iowrite16(VP_MSIX_NO_VECTOR,
+ vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+   for (i = 0; i < vp_dev->msix_preset_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+
+   if (!vp_dev->msix_preset_vectors)
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+
+   if (vp_dev->msix_enabled) {
+   vp_dev->msix_enabled = 0;
+   pci_disable_msix(vp_dev->pci_dev);
}
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
+}
 
-   return ret;
+static int vp_request_vectors(struct virtio_device *vdev, unsigned max_vqs)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char 

Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-08 Thread Michael S. Tsirkin
On Fri, May 08, 2009 at 04:37:06PM +0930, Rusty Russell wrote:
> On Thu, 7 May 2009 11:40:39 pm Michael S. Tsirkin wrote:
> > This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
> > and updates all drivers. This is needed for MSI support, because MSI
> > needs to know the total number of vectors upfront.
> 
> Hmm, I have a similar need for a dev to vq mapping (debugging stats).  How's
> this as a common basis?

This helps. Should I redo mine on top of this?

> Thanks,
> Rusty.
> 
> virtio: add names to virtqueue struct, mapping from devices to queues.
> 
> Add a linked list of all virtqueues for a virtio device: this helps for
> debugging and is also needed for upcoming interface change.
> 
> Also, add a "name" field for clearer debug messages.
> 
> Signed-off-by: Rusty Russell 

Yes, this will simplify supporting find_vqs.

> @@ -303,10 +313,12 @@ struct virtqueue *vring_new_virtqueue(un
>   vq->vq.callback = callback;
>   vq->vq.vdev = vdev;
>   vq->vq.vq_ops = &vring_vq_ops;
> + vq->vq.name = name;
>   vq->notify = notify;
>   vq->broken = false;
>   vq->last_used_idx = 0;
>   vq->num_added = 0;
> + list_add_tail(&vq->vq.list, &vdev->vqs);
>  #ifdef DEBUG
>   vq->in_use = false;
>  #endif
> @@ -327,6 +339,7 @@ EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>  
>  void vring_del_virtqueue(struct virtqueue *vq)
>  {
> + list_del(&vq->list);
>   kfree(to_vvq(vq));
>  }
>  EXPORT_SYMBOL_GPL(vring_del_virtqueue);

I note the lack of locking here. This is okay in practice, as
drivers don't really call find/del vq in parallel,
but making this explicit with find_vqs would be best, yes?

-- 
MST


Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-10 Thread Michael S. Tsirkin
On Sun, May 10, 2009 at 01:37:06PM +0930, Rusty Russell wrote:
> Yes, and in fact a rough look at your patch reveals that we don't actually 
> need del_vq: now we track them, we can just do that as part of vdev 
> destruction, right?

Let's assume a driver encounters an error in probe
after it calls find_vq. It would need a way to revert
find_vq, wouldn't it?

It seems to me that bus->remove does not get called
on probe failure. Isn't that right?

-- 
MST


[PATCH 1/2] qemu-kvm: add MSI-X support

2009-05-11 Thread Michael S. Tsirkin
This adds (incomplete) MSI-X support to the virtio net device.
Still missing are save/load support and a command-line flag to
control the feature.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile.target |2 +-
 hw/msix.c   |  362 +++
 hw/msix.h   |   33 +
 hw/pci.c|   35 --
 hw/pci.h|   53 +++-
 hw/virtio-balloon.c |2 +-
 hw/virtio-blk.c |3 +-
 hw/virtio-console.c |3 +-
 hw/virtio-net.c |3 +-
 hw/virtio.c |  167 +++-
 hw/virtio.h |4 +-
 11 files changed, 610 insertions(+), 57 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h

diff --git a/Makefile.target b/Makefile.target
index 5cb4c64..6a59a30 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -550,7 +550,7 @@ endif #CONFIG_BSD_USER
 # System emulator target
 ifndef CONFIG_USER_ONLY
 
-OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o
+OBJS=vl.o osdep.o monitor.o pci.o msix.o loader.o isa_mmio.o machine.o 
dma-helpers.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
diff --git a/hw/msix.c b/hw/msix.c
new file mode 100644
index 000..dcb7dbd
--- /dev/null
+++ b/hw/msix.c
@@ -0,0 +1,362 @@
+/*
+ * MSI-X device support
+ *
+ * This module includes support for MSI-X in pci devices.
+ *
+ * Author: Michael S. Tsirkin 
+ *
+ *  Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw.h"
+#include "msix.h"
+#include "pci.h"
+#include 
+
+/* Declaration from linux/pci_regs.h */
+#define  PCI_CAP_ID_MSIX 0x11 /* MSI-X */
+#define  PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */
+#define  PCI_MSIX_FLAGS_QSIZE  0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE (1 << 15)
+#define  PCI_MSIX_FLAGS_BIRMASK(7 << 0)
+
+/* MSI-X capability structure */
+#define MSIX_TABLE_OFFSET 4
+#define MSIX_PBA_OFFSET 8
+
+/* MSI-X table format */
+#define MSIX_MSG_ADDR 0
+#define MSIX_MSG_UPPER_ADDR 4
+#define MSIX_MSG_DATA 8
+#define MSIX_VECTOR_CTRL 12
+#define MSIX_ENTRY_SIZE 16
+#define MSIX_VECTOR_MASK 0x1
+
+/* How much space does an MSIX table need. */
+/* The spec requires giving the table structure
+ * a 4K aligned region all by itself. Align it to
+ * target pages so that drivers can do passthrough
+ * on the rest of the region. */
+#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000)
+
+#ifdef MSIX_DEBUG
+#define DEBUG(fmt, ...)   \
+do {  \
+  fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);\
+} while (0)
+#else
+#define DEBUG(fmt, ...) do { } while(0)
+#endif
+
+/* Add MSI-X capability to the config space for the device. */
+/* Given a bar and its size, add MSI-X table on top of it
+ * and fill MSI-X capability in the config space.
+ * Original bar size must be a power of 2 or 0.
+ * New bar size is returned. */
+static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
+   unsigned bar_nr, unsigned bar_size,
+   unsigned *new_size)
+{
+unsigned config_offset = pdev->cap.start + pdev->cap.length;
+uint8_t *config = pdev->config + config_offset;
+
+if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
+return -EINVAL;
+if (bar_size > 0x8000)
+return -ENOSPC;
+
+/* Add space for MSI-X structures */
+if (!bar_size)
+*new_size = MSIX_PAGE_SIZE;
+else if (bar_size < MSIX_PAGE_SIZE) {
+bar_size = MSIX_PAGE_SIZE;
+*new_size = MSIX_PAGE_SIZE * 2;
+} else
+*new_size = bar_size * 2;
+
+pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+/* Table on top of BAR */
+pci_set_long(config + MSIX_TABLE_OFFSET, bar_size | bar_nr);
+/* Pending bits on top of that */
+pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_SIZE / 2) |
+ bar_nr);
+pci_add_capability(pdev, PCI_CAP_ID_MSIX, PCI_CAPABILITY_CONFIG_MSIX_LENGTH);
+pdev->cap.msix = config_offset;
+return 0;
+}
+
+static void msix_free_irq_entries(PCIDevice *dev)
+{
+int i;
+
+/* TODO: handle errors */
+for (i = 0; i < dev->msix_irq_entries_nr; i++)
+msix_vector_unuse(dev, i);
+}
+
+static void msix_enable(PCIDevice *dev)
+{
+uint32_t ctrl, data;
+int i;
+
+if (!dev->msix_irq_entries_nr) {
+fprintf(stderr, "MSI-X entry number is zero!\n");
+return;
+}
+
+for (i = 0; i < dev->msix_irq_entries_nr; ++i) {
+uint8_t *table_entry = dev->msix_table_page + i * MSIX_EN

[PATCH 2/2] qemu-kvm: use common code for assigned msix

2009-05-11 Thread Michael S. Tsirkin

For assigned devices, use common code to enable msi-x.
Add "hack" option as assigned devices lack a standard way to get vector usage.

Signed-off-by: Michael S. Tsirkin 
---
 hw/device-assignment.c |  336 
 hw/device-assignment.h |8 +-
 hw/msix.c  |   11 ++-
 hw/pci.h   |4 +
 4 files changed, 100 insertions(+), 259 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 0a5f850..aabed69 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -33,6 +33,7 @@
 #include 
 #include "qemu-kvm.h"
 #include "hw.h"
+#include "msix.h"
 #include "pc.h"
 #include "sysemu.h"
 #include "console.h"
@@ -150,11 +151,10 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 {
 AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
 AssignedDevRegion *region = &r_dev->v_addrs[region_num];
-PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 uint32_t old_ephys = region->e_physbase;
 uint32_t old_esize = region->e_size;
 int first_map = (region->e_size == 0);
-int ret = 0;
+int ret;
 
 DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
   e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -166,27 +166,17 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
kvm_destroy_phys_mem(kvm_context, old_ephys,
  TARGET_PAGE_ALIGN(old_esize));
 
-if (e_size > 0) {
-/* deal with MSI-X MMIO page */
-if (real_region->base_addr <= r_dev->msix_table_addr &&
-real_region->base_addr + real_region->size >=
-r_dev->msix_table_addr) {
-int offset = r_dev->msix_table_addr - real_region->base_addr;
-ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
-if (ret == 0)
-DEBUG("munmap done, virt_base 0x%p\n",
-region->u.r_virtbase + offset);
-else {
-fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
-exit(1);
-}
-cpu_register_physical_memory(e_phys + offset,
-TARGET_PAGE_SIZE, r_dev->mmio_index);
-}
-   ret = kvm_register_phys_mem(kvm_context, e_phys,
-region->u.r_virtbase,
-TARGET_PAGE_ALIGN(e_size), 0);
-}
+if (e_size <= 0)
+return;
+
+/* deal with MSI-X MMIO page */
+msix_mmio_map(pci_dev, region_num, e_phys, e_size, type);
+/* Only register as much memory as required to cover the
+ * actual device region. */
+e_size = r_dev->real_device.regions[region_num].size;
+ret = kvm_register_phys_mem(kvm_context, e_phys,
+region->u.r_virtbase,
+TARGET_PAGE_ALIGN(e_size), 0);
 
 if (ret != 0) {
fprintf(stderr, "%s: Error: create new mapping failed\n", __func__);
@@ -378,11 +368,16 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 
 /* handle memory io regions */
 if (cur_region->type & IORESOURCE_MEM) {
+uint32_t size = i == msix_bar_nr(&pci_dev->dev)
+? pci_dev->msix_bar_size : cur_region->size;
+
 int t = cur_region->type & IORESOURCE_PREFETCH
 ? PCI_ADDRESS_SPACE_MEM_PREFETCH
 : PCI_ADDRESS_SPACE_MEM;
 
 /* map physical memory */
+/* MSI-X table is located outside cur_region->size
+ * and so won't be mapped */
 pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
 pci_dev->v_addrs[i].u.r_virtbase =
 mmap(NULL,
@@ -397,7 +392,7 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 (uint32_t) (cur_region->base_addr));
 return -1;
 }
-pci_dev->v_addrs[i].r_size = cur_region->size;
+pci_dev->v_addrs[i].r_size = size;
 pci_dev->v_addrs[i].e_size = 0;
 
 /* add offset */
@@ -405,8 +400,7 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 (cur_region->base_addr & 0xFFF);
 
 pci_register_io_region((PCIDevice *) pci_dev, i,
-   cur_region->size, t,
-   assigned_dev_iomem_map);
+   size, t, assigned_dev_iomem_map);
 continue;
 }
 /* handle port io regions */
@@ -542,11 +536,11 @@ static void free_dev_irq_entries(Assign

[PATCH RFC 0/2] qemu-kvm: MSI-X support

2009-05-11 Thread Michael S. Tsirkin
Here's a draft MSI-X support patch. Missing features include
save/load support and a command-line flag to control the
feature. This is on top of qemu-kvm; MSI-X is disabled
without kvm interrupt injection support for now.

Michael S. Tsirkin (2):
  qemu-kvm: add MSI-X support
  qemu-kvm: use common code for assigned msix

 Makefile.target|2 +-
 hw/device-assignment.c |  336 +++-
 hw/device-assignment.h |8 +-
 hw/msix.c  |  371 
 hw/msix.h  |   33 +
 hw/pci.c   |   35 --
 hw/pci.h   |   57 +++-
 hw/virtio-balloon.c|2 +-
 hw/virtio-blk.c|3 +-
 hw/virtio-console.c|3 +-
 hw/virtio-net.c|3 +-
 hw/virtio.c|  167 +-
 hw/virtio.h|4 +-
 13 files changed, 709 insertions(+), 315 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-11 Thread Michael S. Tsirkin
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c  |   11 +++---
 drivers/char/hw_random/virtio-rng.c |   11 +++---
 drivers/char/virtio_console.c   |   27 ++
 drivers/lguest/lguest_device.c  |   49 +-
 drivers/net/virtio_net.c|   48 +++---
 drivers/s390/kvm/kvm_virtio.c   |   64 +-
 drivers/virtio/virtio_balloon.c |   29 +++-
 drivers/virtio/virtio_pci.c |   43 +++
 include/linux/virtio_config.h   |   29 +++-
 net/9p/trans_virtio.c   |7 ++--
 10 files changed, 225 insertions(+), 93 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..7b7435d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -197,6 +197,7 @@ static int virtblk_probe(struct virtio_device *vdev)
u64 cap;
u32 v;
u32 blk_size, sg_elems;
+   virtqueue_callback *callback[] = { blk_done };
 
if (index_to_minor(index) >= 1 << MINORBITS)
return -ENOSPC;
@@ -224,11 +225,9 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
-   if (IS_ERR(vblk->vq)) {
-   err = PTR_ERR(vblk->vq);
+   err = vdev->config->find_vqs(vdev, 1, &vblk->vq, callback);
+   if (err)
goto out_free_vblk;
-   }
 
vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
if (!vblk->pool) {
@@ -323,7 +322,7 @@ out_put_disk:
 out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -344,7 +343,7 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 86e83f8..18eabe4 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -91,16 +91,17 @@ static struct hwrng virtio_hwrng = {
 
 static int virtrng_probe(struct virtio_device *vdev)
 {
+   virtqueue_callback *callback[] = { random_recv_done };
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done);
-   if (IS_ERR(vq))
-   return PTR_ERR(vq);
+   err = vdev->config->find_vqs(vdev, 1, &vq, callback);
+   if (err)
+   return err;
 
err = hwrng_register(&virtio_hwrng);
if (err) {
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
return err;
}
 
@@ -112,7 +113,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index ff6f5a4..1fd5376 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,8 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+   struct virtqueue *vqs[2];
+   virtqueue_callback *callbacks[2];
int err;
 
vdev = dev;
@@ -199,20 +201,17 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
goto fail;
}
 
-   /* Find the input queue. */
+   /* Find the queues. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input);
-   if (IS_ERR(in_vq)) {
-   err = PTR_ERR(in_vq);
+   callbacks[0] = hvc_handle_input;
+   callbacks[1] = NULL;
+   err = vdev->config->find_vqs(vdev, 2, vqs, callbacks);
+   if (err)
goto free;
-   }
 
-   out_vq = vdev->config->find_vq(vdev, 1, NULL);
-   if (IS_ERR(out_vq)) {
-   err = PTR_ERR(out_vq);
-   got

[PATCHv3 0/3] virtio: MSI-X support

2009-05-11 Thread Michael S. Tsirkin
Here's an updated draft of virtio patches which work with the qemu-kvm
code that I just posted. This still needs to be rebased on top of
Rusty's recent virtqueue list patch. I am posting it in case people
want to start trying MSI-X out.

Michael S. Tsirkin (3):
  virtio: find_vqs/del_vqs virtio operations
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c  |   11 +-
 drivers/char/hw_random/virtio-rng.c |   11 +-
 drivers/char/virtio_console.c   |   27 ++--
 drivers/lguest/lguest_device.c  |   49 ++-
 drivers/net/virtio_net.c|   48 +++
 drivers/s390/kvm/kvm_virtio.c   |   64 -
 drivers/virtio/virtio_balloon.c |   29 ++--
 drivers/virtio/virtio_pci.c |  291 ++-
 include/linux/virtio_config.h   |   29 +++-
 include/linux/virtio_pci.h  |8 +-
 net/9p/trans_virtio.c   |7 +-
 11 files changed, 450 insertions(+), 124 deletions(-)


[PATCH 3/3] virtio_pci: optional MSI-X support

2009-05-11 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
are available.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |  211 +++
 include/linux/virtio_pci.h  |8 ++-
 2 files changed, 199 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index a6bebe2..d21e2e6 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,6 +42,29 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   int msix_enabled;
+   int intx_enabled;
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of vectors configured at startup (excludes per-virtqueue
+* vectors if any) */
+   unsigned msix_preset_vectors;
+   /* Number of per-virtqueue vectors if any. */
+   unsigned msix_per_vq_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues. Thus, we need at least 2 vectors for MSI-X. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
+   VP_MSIX_MIN_VECTORS = 2,
+   VP_MSIX_NO_VECTOR = 0xffff,
 };
 
 struct virtio_pci_vq_info
@@ -60,6 +83,9 @@ struct virtio_pci_vq_info
 
/* the list node for the virtqueues list */
struct list_head node;
+
+   /* MSI-X vector (or none) */
+   unsigned vector;
 };
 
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
@@ -109,7 +135,8 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
   void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+   VIRTIO_PCI_CONFIG(vp_dev) + offset;
u8 *ptr = buf;
int i;
 
@@ -123,7 +150,8 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
   const void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+  VIRTIO_PCI_CONFIG(vp_dev) + offset;
const u8 *ptr = buf;
int i;
 
@@ -221,7 +249,116 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-/* the config->find_vq() implementation */
+static void vp_free_vectors(struct virtio_device *vdev)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   if (vp_dev->intx_enabled) {
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+   vp_dev->intx_enabled = 0;
+   }
+
+   for (i = 0; i < vp_dev->msix_preset_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+   vp_dev->msix_preset_vectors = 0;
+
+   if (vp_dev->msix_enabled) {
+   /* Disable the vector used for configuration */
+   iowrite16(VP_MSIX_NO_VECTOR,
+ vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+   vp_dev->msix_enabled = 0;
+   pci_disable_msix(vp_dev->pci_dev);
+   }
+}
+
+static int vp_request_vectors(struct virtio_device *vdev, unsigned max_vqs)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char *name = dev_name(&vp_dev->vdev.dev);
+   unsigned i, vectors;
+   int err = -ENOMEM;
+
+   /* We need at most one vector per queue and one for config changes */
+   vectors = VP_MSIX_VQ_VECTOR + max_vqs;
+   vp_dev->msix_entries = kmalloc(vectors * sizeof *vp_dev->msix_entries,
+  GFP_KERNEL);
+   if (!vp_dev->msix_entries)
+   goto error_entries;
+   vp_dev->msix_names = kmalloc(vectors * sizeof *vp_dev->msix_names,
+GFP_KERNEL);
+   if (!vp_dev->msix_names)
+   goto error_names;
+
+   snprintf(vp_dev->msix_names[VP_MSIX_CONFIG_VECTOR],
+sizeof *vp_dev->msix_names, "%s-config", name);
+   for (i = 0; i < max_vqs; ++i)
+   snprintf(vp_dev->msix_names[i + VP_MSIX_VQ_VECTOR],
+sizeof *vp_dev->msix_names, "%s-vq-%d", name, i);
+   for (i = 0; i < vectors; ++i)
+   vp_d

[PATCH 2/3] virtio_pci: split up vp_interrupt

2009-05-11 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |   53 +++---
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index d5e030f..a6bebe2 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,23 +215,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
-
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
-   }
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-   return ret;
+   return vp_vring_interrupt(irq, opaque);
 }
 
 /* the config->find_vq() implementation */
-- 
1.6.3.rc3.1.g830204



Re: [PATCH RFC 0/2] qemu-kvm: MSI-X support

2009-05-12 Thread Michael S. Tsirkin
On Mon, May 11, 2009 at 05:24:25PM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin wrote:
>> Here's a draft MSI-X support patch. Among missing features:
>> save/load support, and command-line flag to control the
>> feature. This is on top of qemu-kvm: msi-x is disabled
>> without kvm interrupt injection support for now.
>>   
>
> What's your impression of how much work would be to get this going on  
> top of upstream QEMU?
>
> I'm willing to borrow a few cycles to help out here.  I'd really like to  
> see this series go in via QEMU if possible.

It seems that if I just call apic_deliver_irq each time
I want to send MSI, things will work.

However, a large part of the msix code is managing IRQs against the kernel,
and I'm not sure it's a wise investment of effort to rip it all out. So
IMHO, what's missing is an API that abstracts managing irq routes in kvm,
specifically something that abstracts this stuff:
kvm_get_irq_route_gsi
kvm_add_routing_entry
kvm_del_routing_entry
kvm_commit_irq_routes
kvm_set_irq
How hard is that?

For now, this API could be a stub that just stores the routes somewhere,
and set_irq would call the local apic emulation, along the lines of:

uint8_t dest = (addr_lo & MSI_ADDR_DEST_ID_MASK)
>> MSI_ADDR_DEST_ID_SHIFT;
uint8_t vector = (data & MSI_DATA_VECTOR_MASK)
>> MSI_DATA_VECTOR_SHIFT;
uint8_t dest_mode = (addr_lo >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
uint8_t delivery_mode = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) &
0x7;
apic_deliver_irq(dest, dest_mode, delivery_mode, vector, 0,
 trigger_mode);

I would be happy to port my msix code to work on top of this API.
Willing to help?


-- 
MST


Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-12 Thread Michael S. Tsirkin
On Wed, May 13, 2009 at 12:00:02AM +0930, Rusty Russell wrote:
> On Tue, 12 May 2009 07:49:32 am Michael S. Tsirkin wrote:
> > This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
> > and updates all drivers. This is needed for MSI support, because MSI
> > needs to know the total number of vectors upfront.
> 
> Sorry, is this not on top of my virtio_device vq linked list patch?

Not yet, working on that.

> Two other things: prefer vq_callback_t as the type name,

ok

> and perhaps consider 
> varargs for the callbacks (or would that be too horrible at the 
> implementation 
> end?)
> 
> Thanks,
> Rusty.

Ugh ... I think it will be. And AFAIK gcc generates a lot of code
for varargs - not something we want to do in each interrupt handler.

-- 
MST


[PATCHv4] virtio: find_vqs/del_vqs virtio operations

2009-05-12 Thread Michael S. Tsirkin
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 
---

Untested - Rusty could you pls take a look before I
go ahead and rebase the rest on top of this?

 drivers/block/virtio_blk.c  |   12 
 drivers/char/hw_random/virtio-rng.c |   12 +
 drivers/char/virtio_console.c   |   26 ---
 drivers/lguest/lguest_device.c  |   36 ++-
 drivers/net/virtio_net.c|   45 ++-
 drivers/s390/kvm/kvm_virtio.c   |   37 +++-
 drivers/virtio/virtio_balloon.c |   27 
 drivers/virtio/virtio_pci.c |   37 +++-
 include/linux/virtio_config.h   |   33 +
 net/9p/trans_virtio.c   |8 +++---
 10 files changed, 178 insertions(+), 95 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8f7c956..ec7a47a 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -197,6 +197,8 @@ static int virtblk_probe(struct virtio_device *vdev)
u64 cap;
u32 v;
u32 blk_size, sg_elems;
+   vq_callback_t *callback[] = { blk_done };
+   const char *name[] = { "requests" };
 
if (index_to_minor(index) >= 1 << MINORBITS)
return -ENOSPC;
@@ -224,11 +226,9 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done, "requests");
-   if (IS_ERR(vblk->vq)) {
-   err = PTR_ERR(vblk->vq);
+   err = vdev->config->find_vqs(vdev, 1, &vblk->vq, callback, name);
+   if (err)
goto out_free_vblk;
-   }
 
vblk->pool = mempool_create_kmalloc_pool(1,sizeof(struct virtblk_req));
if (!vblk->pool) {
@@ -323,7 +323,7 @@ out_put_disk:
 out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -344,7 +344,7 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 2aeafce..65ee25a 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -91,16 +91,18 @@ static struct hwrng virtio_hwrng = {
 
 static int virtrng_probe(struct virtio_device *vdev)
 {
+   vq_callback_t *callback[] = { random_recv_done };
+   const char *name[] = { "input" };
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done, "input");
-   if (IS_ERR(vq))
-   return PTR_ERR(vq);
+   err = vdev->config->find_vqs(vdev, 1, &vq, callback, name);
+   if (err)
+   return err;
 
err = hwrng_register(&virtio_hwrng);
if (err) {
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
return err;
}
 
@@ -112,7 +114,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 58684e4..144d1b7 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,9 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+   vq_callback_t *callbacks[] = { hvc_handle_input, NULL};
+   const char *names[] = { "input", "output" };
+   struct virtqueue *vqs[2];
int err;
 
vdev = dev;
@@ -199,20 +202,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
goto fail;
}
 
-   /* Find the input queue. */
+   /* Find the queues. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input, "input");
-   if (IS_ERR(in

Re: [PATCH] msi-x: let drivers retry when not enough vectors

2009-05-12 Thread Michael S. Tsirkin
On Fri, May 08, 2009 at 09:25:00AM +0930, Rusty Russell wrote:
> On Thu, 7 May 2009 07:49:53 pm Sheng Yang wrote:
> > On Thursday 07 May 2009 17:53:02 Matthew Wilcox wrote:
> > > Here's a good example.  Let's suppose you have a driver which supports
> > > two different models of cards, one has 16 MSI-X interrupts, the other
> > > has 10.  You can call pci_enable_msix() asking for 16 vectors.  If your
> > > card is model A, you get 16 interrupts.  If your card is model B, it says
> > > "you can have 10".
> 
> Sheng is absolutely right, that's a horrid API.
> 
> If it actually enabled that number and returned it, it might make sense (cf. 
> write() returning less bytes than you give it).  But overloading the return 
> value to save an explicit call is just ugly; it's not worth saving a few 
> lines 
> of code at cost of making all the drivers subtle and tricksy.
> 
> Fail with -ENOSPC or something.
> 
> Rusty.

I do agree that returning a positive value from pci_enable_msix
is an ugly API (but note that this is the API that Linux currently has).

Here's a wrapper that I ended up with in my driver:

static int enable_msix(struct pci_dev *dev, struct msix_entry *entries,
   int *options, int noptions)
{
int i;
for (i = 0; i < noptions; ++i)
if (!pci_enable_msix(dev, entries, options[i]))
return options[i];
return -EBUSY;
}

This takes an array of options for the number of vectors and tries them
one after the other until an option that the system can support is found.
On success, we get the number of vectors actually enabled, and the
driver can then use them as it sees fit.

Is there interest in moving something like this to pci.h?

-- 
MST


Re: [PATCH 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-13 Thread Michael S. Tsirkin
On Wed, May 13, 2009 at 10:47:08AM +0930, Rusty Russell wrote:
> On Wed, 13 May 2009 01:03:30 am Michael S. Tsirkin wrote:
> > On Wed, May 13, 2009 at 12:00:02AM +0930, Rusty Russell wrote
> > > and perhaps consider
> > > varargs for the callbacks (or would that be too horrible at the
> > > implementation end?)
> > >
> > > Thanks,
> > > Rusty.
> >
> > Ugh ... I think it will be. And AFAIK gcc generates a lot of code
> > for varargs - not something we want to do in each interrupt handler.
> 
> Err, no I mean for find_vqs:  eg.
>   (block device)
>   err = vdev->config->find_vqs(vdev, 1, &vblk->vq, blk_done);
> 
>   (net device)
>   err = vdev->config->find_vqs(vdev, 3, vqs, skb_recv_done, 
> skb_xmit_done, NULL);
> 
> A bit neater for for the single-queue case.
> 
> Cheers,
> Rusty.

Oh. I see. But it becomes messy now that we also need to pass in the
names, and we lose type safety.
Let's just add a helper function for the single vq case?

static inline struct virtqueue *virtio_find_vq(struct virtio_device *vdev,
   vq_callback_t *c, const char *n)
{
vq_callback_t *callbacks[] = { c };
const char *names[] = { n };
struct virtqueue *vq;
int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names);
if (err < 0)
return ERR_PTR(err);
return vq;
}


-- 
MST


[PATCHv5 0/3] virtio: MSI-X support

2009-05-13 Thread Michael S. Tsirkin
Here's the latest draft of virtio patches.
This is on top of Rusty's recent virtqueue list + name patch.

Michael S. Tsirkin (3):
  virtio: find_vqs/del_vqs virtio operations
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

 drivers/block/virtio_blk.c  |6 +-
 drivers/char/hw_random/virtio-rng.c |6 +-
 drivers/char/virtio_console.c   |   26 ++--
 drivers/lguest/lguest_device.c  |   36 -
 drivers/net/virtio_net.c|   45 ++---
 drivers/s390/kvm/kvm_virtio.c   |   36 -
 drivers/virtio/virtio_balloon.c |   27 ++--
 drivers/virtio/virtio_pci.c |  301 ++-
 include/linux/virtio_config.h   |   46 --
 include/linux/virtio_pci.h  |   10 +-
 net/9p/trans_virtio.c   |2 +-
 11 files changed, 423 insertions(+), 118 deletions(-)


[PATCHv5 1/3] virtio: find_vqs/del_vqs virtio operations

2009-05-13 Thread Michael S. Tsirkin
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c  |6 ++--
 drivers/char/hw_random/virtio-rng.c |6 ++--
 drivers/char/virtio_console.c   |   26 ---
 drivers/lguest/lguest_device.c  |   36 +-
 drivers/net/virtio_net.c|   45 +
 drivers/s390/kvm/kvm_virtio.c   |   36 +-
 drivers/virtio/virtio_balloon.c |   27 
 drivers/virtio/virtio_pci.c |   37 ++-
 include/linux/virtio_config.h   |   46 ++
 net/9p/trans_virtio.c   |2 +-
 10 files changed, 180 insertions(+), 87 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8f7c956..c9f5627 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,7 +224,7 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done, "requests");
+   vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
if (IS_ERR(vblk->vq)) {
err = PTR_ERR(vblk->vq);
goto out_free_vblk;
@@ -323,7 +323,7 @@ out_put_disk:
 out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -344,7 +344,7 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 2aeafce..f2041fe 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,13 @@ static int virtrng_probe(struct virtio_device *vdev)
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done, "input");
+   vq = virtio_find_single_vq(vdev, random_recv_done, "input");
if (IS_ERR(vq))
return PTR_ERR(vq);
 
err = hwrng_register(&virtio_hwrng);
if (err) {
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
return err;
}
 
@@ -112,7 +112,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 58684e4..c74dacf 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,9 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+   vq_callback_t *callbacks[] = { hvc_handle_input, NULL};
+   const char *names[] = { "input", "output" };
+   struct virtqueue *vqs[2];
int err;
 
vdev = dev;
@@ -199,20 +202,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
goto fail;
}
 
-   /* Find the input queue. */
+   /* Find the queues. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input, "input");
-   if (IS_ERR(in_vq)) {
-   err = PTR_ERR(in_vq);
+   err = vdev->config->find_vqs(vdev, 2, vqs, callbacks, names);
+   if (err)
goto free;
-   }
 
-   out_vq = vdev->config->find_vq(vdev, 1, NULL, "output");
-   if (IS_ERR(out_vq)) {
-   err = PTR_ERR(out_vq);
-   goto free_in_vq;
-   }
+   in_vq = vqs[0];
+   out_vq = vqs[1];
 
/* Start using the new console output. */
virtio_cons.get_chars = get_chars;
@@ -233,17 +231,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
hvc = hvc_alloc(0, 0, &virtio_cons, PAGE_SIZE);
if (IS_ERR(hvc)) {
err = PTR_ERR(hvc);
-   goto free_out_vq;
+   goto free_vqs;
 

[PATCHv5 2/3] virtio_pci: split up vp_interrupt

2009-05-13 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |   53 +++---
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 027f13f..951e673 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,23 +215,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
-
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
-   }
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-   return ret;
+   return vp_vring_interrupt(irq, opaque);
 }
 
 /* the config->find_vq() implementation */
-- 
1.6.3.rc3.1.g830204



[PATCHv5 3/3] virtio_pci: optional MSI-X support

2009-05-13 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
are available.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |  227 +++
 include/linux/virtio_pci.h  |   10 ++-
 2 files changed, 217 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 951e673..65627a4 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,6 +42,26 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   int msix_enabled;
+   int intx_enabled;
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of available vectors */
+   unsigned msix_vectors;
+   /* Vectors allocated */
+   unsigned msix_used_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues. Thus, we need at least 2 vectors for MSI. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
 };
 
 struct virtio_pci_vq_info
@@ -60,6 +80,9 @@ struct virtio_pci_vq_info
 
/* the list node for the virtqueues list */
struct list_head node;
+
+   /* MSI-X vector (or none) */
+   unsigned vector;
 };
 
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
@@ -109,7 +132,8 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
   void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+   VIRTIO_PCI_CONFIG(vp_dev) + offset;
u8 *ptr = buf;
int i;
 
@@ -123,7 +147,8 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
   const void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+  VIRTIO_PCI_CONFIG(vp_dev) + offset;
const u8 *ptr = buf;
int i;
 
@@ -221,7 +246,121 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-/* the config->find_vq() implementation */
+static void vp_free_vectors(struct virtio_device *vdev) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   if (vp_dev->intx_enabled) {
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+   vp_dev->intx_enabled = 0;
+   }
+
+   for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+   vp_dev->msix_used_vectors = 0;
+
+   if (vp_dev->msix_enabled) {
+   /* Disable the vector used for configuration */
+   iowrite16(VIRTIO_MSI_NO_VECTOR,
+ vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+   /* Flush the write out to device */
+   ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+   vp_dev->msix_enabled = 0;
+   pci_disable_msix(vp_dev->pci_dev);
+   }
+}
+
+static int vp_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
+ int *options, int noptions)
+{
+   int i;
+   for (i = 0; i < noptions; ++i)
+   if (!pci_enable_msix(dev, entries, options[i]))
+   return options[i];
+   return -EBUSY;
+}
+
+static int vp_request_vectors(struct virtio_device *vdev, unsigned max_vqs)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char *name = dev_name(&vp_dev->vdev.dev);
+   unsigned i, v;
+   int err = -ENOMEM;
+   /* We want at most one vector per queue and one for config changes.
+* Fall back to separate vectors for config and a shared one for queues.
+* Finally fall back to regular interrupts. */
+   int options[] = { max_vqs + 1, 2 };
+   int nvectors = max(options[0], options[1]);
+
+   vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+  GFP_KERNEL);
+   if (!vp_dev->msix_entries)
+   goto error_entries;
+   vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+GFP_KERNEL);
+   if (!vp_d

[PATCHv6 0/4] virtio: MSI-X support

2009-05-14 Thread Michael S. Tsirkin
Here's the latest draft of virtio patches.
This is on top of Rusty's recent virtqueue list + name patch,
which is included in the series for completeness.

Changelog:
changes since v5: fix build on s390 (only patch 2/4 modified)

Michael S. Tsirkin (3):
  virtio: find_vqs/del_vqs virtio operations
  virtio_pci: split up vp_interrupt
  virtio_pci: optional MSI-X support

Rusty Russell (1):
  virtio: add names to virtqueue struct, mapping from devices to
queues.

 drivers/block/virtio_blk.c  |6 +-
 drivers/char/hw_random/virtio-rng.c |6 +-
 drivers/char/virtio_console.c   |   26 ++--
 drivers/lguest/lguest_device.c  |   41 +-
 drivers/net/virtio_net.c|   45 ++---
 drivers/s390/kvm/kvm_virtio.c   |   43 +-
 drivers/virtio/virtio.c |2 +
 drivers/virtio/virtio_balloon.c |   27 ++--
 drivers/virtio/virtio_pci.c |  306 ++-
 drivers/virtio/virtio_ring.c|   25 +++-
 include/linux/virtio.h  |   12 +-
 include/linux/virtio_config.h   |   45 -
 include/linux/virtio_pci.h  |   10 +-
 include/linux/virtio_ring.h |3 +-
 net/9p/trans_virtio.c   |2 +-
 15 files changed, 465 insertions(+), 134 deletions(-)


[PATCHv6 1/4] virtio: add names to virtqueue struct, mapping from devices to queues.

2009-05-14 Thread Michael S. Tsirkin
From: Rusty Russell 

Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.

Also, add a "name" field for clearer debug messages.

Signed-off-by: Rusty Russell 
---

Including Rusty's patch here for completeness.

 drivers/block/virtio_blk.c  |2 +-
 drivers/char/hw_random/virtio-rng.c |2 +-
 drivers/char/virtio_console.c   |4 ++--
 drivers/lguest/lguest_device.c  |5 +++--
 drivers/net/virtio_net.c|6 +++---
 drivers/s390/kvm/kvm_virtio.c   |7 ---
 drivers/virtio/virtio.c |2 ++
 drivers/virtio/virtio_balloon.c |4 ++--
 drivers/virtio/virtio_pci.c |5 +++--
 drivers/virtio/virtio_ring.c|   25 +++--
 include/linux/virtio.h  |   12 
 include/linux/virtio_config.h   |6 --
 include/linux/virtio_ring.h |3 ++-
 net/9p/trans_virtio.c   |2 +-
 14 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5d34764..8f7c956 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,7 +224,7 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done);
+   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done, "requests");
if (IS_ERR(vblk->vq)) {
err = PTR_ERR(vblk->vq);
goto out_free_vblk;
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 86e83f8..2aeafce 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,7 +94,7 @@ static int virtrng_probe(struct virtio_device *vdev)
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done);
+   vq = vdev->config->find_vq(vdev, 0, random_recv_done, "input");
if (IS_ERR(vq))
return PTR_ERR(vq);
 
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index ff6f5a4..58684e4 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -202,13 +202,13 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
/* Find the input queue. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input);
+   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input, "input");
if (IS_ERR(in_vq)) {
err = PTR_ERR(in_vq);
goto free;
}
 
-   out_vq = vdev->config->find_vq(vdev, 1, NULL);
+   out_vq = vdev->config->find_vq(vdev, 1, NULL, "output");
if (IS_ERR(out_vq)) {
err = PTR_ERR(out_vq);
goto free_in_vq;
diff --git a/drivers/lguest/lguest_device.c b/drivers/lguest/lguest_device.c
index df44d96..4babed8 100644
--- a/drivers/lguest/lguest_device.c
+++ b/drivers/lguest/lguest_device.c
@@ -228,7 +228,8 @@ extern void lguest_setup_irq(unsigned int irq);
  * function. */
 static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
unsigned index,
-   void (*callback)(struct virtqueue *vq))
+   void (*callback)(struct virtqueue *vq),
+   const char *name)
 {
struct lguest_device *ldev = to_lgdev(vdev);
struct lguest_vq_info *lvq;
@@ -263,7 +264,7 @@ static struct virtqueue *lg_find_vq(struct virtio_device *vdev,
/* OK, tell virtio_ring.c to set up a virtqueue now we know its size
 * and we've got a pointer to its pages. */
vq = vring_new_virtqueue(lvq->config.num, LGUEST_VRING_ALIGN,
-vdev, lvq->pages, lg_notify, callback);
+vdev, lvq->pages, lg_notify, callback, name);
if (!vq) {
err = -ENOMEM;
goto unmap;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4d1d479..be3b734 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -906,20 +906,20 @@ static int virtnet_probe(struct virtio_device *vdev)
vi->mergeable_rx_bufs = true;
 
/* We expect two virtqueues, receive then send. */
-   vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done);
+   vi->rvq = vdev->config->find_vq(vdev, 0, skb_recv_done, "input");
if (IS_ERR(vi->rvq)) {
err = PTR_ERR(vi->rvq);
goto free;
}
 
-   vi->svq = vdev->config->find_vq(vdev, 1, skb_xmit_done);
+   vi->svq = vdev->config->find_vq(vdev, 1, skb_xmit_done, "output");
if (IS_ERR(vi-

[PATCHv6 2/4] virtio: find_vqs/del_vqs virtio operations

2009-05-14 Thread Michael S. Tsirkin
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/block/virtio_blk.c  |6 ++--
 drivers/char/hw_random/virtio-rng.c |6 ++--
 drivers/char/virtio_console.c   |   26 ---
 drivers/lguest/lguest_device.c  |   36 +-
 drivers/net/virtio_net.c|   45 +
 drivers/s390/kvm/kvm_virtio.c   |   36 +-
 drivers/virtio/virtio_balloon.c |   27 
 drivers/virtio/virtio_pci.c |   37 ++-
 include/linux/virtio_config.h   |   47 ++
 net/9p/trans_virtio.c   |2 +-
 10 files changed, 181 insertions(+), 87 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 8f7c956..c9f5627 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -224,7 +224,7 @@ static int virtblk_probe(struct virtio_device *vdev)
sg_init_table(vblk->sg, vblk->sg_elems);
 
/* We expect one virtqueue, for output. */
-   vblk->vq = vdev->config->find_vq(vdev, 0, blk_done, "requests");
+   vblk->vq = virtio_find_single_vq(vdev, blk_done, "requests");
if (IS_ERR(vblk->vq)) {
err = PTR_ERR(vblk->vq);
goto out_free_vblk;
@@ -323,7 +323,7 @@ out_put_disk:
 out_mempool:
mempool_destroy(vblk->pool);
 out_free_vq:
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
 out_free_vblk:
kfree(vblk);
 out:
@@ -344,7 +344,7 @@ static void virtblk_remove(struct virtio_device *vdev)
blk_cleanup_queue(vblk->disk->queue);
put_disk(vblk->disk);
mempool_destroy(vblk->pool);
-   vdev->config->del_vq(vblk->vq);
+   vdev->config->del_vqs(vdev);
kfree(vblk);
 }
 
diff --git a/drivers/char/hw_random/virtio-rng.c b/drivers/char/hw_random/virtio-rng.c
index 2aeafce..f2041fe 100644
--- a/drivers/char/hw_random/virtio-rng.c
+++ b/drivers/char/hw_random/virtio-rng.c
@@ -94,13 +94,13 @@ static int virtrng_probe(struct virtio_device *vdev)
int err;
 
/* We expect a single virtqueue. */
-   vq = vdev->config->find_vq(vdev, 0, random_recv_done, "input");
+   vq = virtio_find_single_vq(vdev, random_recv_done, "input");
if (IS_ERR(vq))
return PTR_ERR(vq);
 
err = hwrng_register(&virtio_hwrng);
if (err) {
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
return err;
}
 
@@ -112,7 +112,7 @@ static void virtrng_remove(struct virtio_device *vdev)
 {
vdev->config->reset(vdev);
hwrng_unregister(&virtio_hwrng);
-   vdev->config->del_vq(vq);
+   vdev->config->del_vqs(vdev);
 }
 
 static struct virtio_device_id id_table[] = {
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 58684e4..c74dacf 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -188,6 +188,9 @@ static void hvc_handle_input(struct virtqueue *vq)
  * Finally we put our input buffer in the input queue, ready to receive. */
 static int __devinit virtcons_probe(struct virtio_device *dev)
 {
+   vq_callback_t *callbacks[] = { hvc_handle_input, NULL};
+   const char *names[] = { "input", "output" };
+   struct virtqueue *vqs[2];
int err;
 
vdev = dev;
@@ -199,20 +202,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
goto fail;
}
 
-   /* Find the input queue. */
+   /* Find the queues. */
/* FIXME: This is why we want to wean off hvc: we do nothing
 * when input comes in. */
-   in_vq = vdev->config->find_vq(vdev, 0, hvc_handle_input, "input");
-   if (IS_ERR(in_vq)) {
-   err = PTR_ERR(in_vq);
+   err = vdev->config->find_vqs(vdev, 2, vqs, callbacks, names);
+   if (err)
goto free;
-   }
 
-   out_vq = vdev->config->find_vq(vdev, 1, NULL, "output");
-   if (IS_ERR(out_vq)) {
-   err = PTR_ERR(out_vq);
-   goto free_in_vq;
-   }
+   in_vq = vqs[0];
+   out_vq = vqs[1];
 
/* Start using the new console output. */
virtio_cons.get_chars = get_chars;
@@ -233,17 +231,15 @@ static int __devinit virtcons_probe(struct virtio_device *dev)
hvc = hvc_alloc(0, 0, &virtio_cons, PAGE_SIZE);
if (IS_ERR(hvc)) {
err = PTR_ERR(hvc);
-   goto free_out_vq;
+   goto free_vqs;
 

[PATCHv6 3/4] virtio_pci: split up vp_interrupt

2009-05-14 Thread Michael S. Tsirkin
This reorganizes virtio-pci code in vp_interrupt slightly, so that
it's easier to add per-vq MSI support on top.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |   53 +++---
 1 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 027f13f..951e673 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -164,6 +164,37 @@ static void vp_notify(struct virtqueue *vq)
iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
 }
 
+/* Handle a configuration change: Tell driver if it wants to know. */
+static irqreturn_t vp_config_changed(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_driver *drv;
+   drv = container_of(vp_dev->vdev.dev.driver,
+  struct virtio_driver, driver);
+
+   if (drv && drv->config_changed)
+   drv->config_changed(&vp_dev->vdev);
+   return IRQ_HANDLED;
+}
+
+/* Notify all virtqueues on an interrupt. */
+static irqreturn_t vp_vring_interrupt(int irq, void *opaque)
+{
+   struct virtio_pci_device *vp_dev = opaque;
+   struct virtio_pci_vq_info *info;
+   irqreturn_t ret = IRQ_NONE;
+   unsigned long flags;
+
+   spin_lock_irqsave(&vp_dev->lock, flags);
+   list_for_each_entry(info, &vp_dev->virtqueues, node) {
+   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
+   ret = IRQ_HANDLED;
+   }
+   spin_unlock_irqrestore(&vp_dev->lock, flags);
+
+   return ret;
+}
+
 /* A small wrapper to also acknowledge the interrupt when it's handled.
  * I really need an EIO hook for the vring so I can ack the interrupt once we
  * know that we'll be handling the IRQ but before we invoke the callback since
@@ -173,9 +204,6 @@ static void vp_notify(struct virtqueue *vq)
 static irqreturn_t vp_interrupt(int irq, void *opaque)
 {
struct virtio_pci_device *vp_dev = opaque;
-   struct virtio_pci_vq_info *info;
-   irqreturn_t ret = IRQ_NONE;
-   unsigned long flags;
u8 isr;
 
/* reading the ISR has the effect of also clearing it so it's very
@@ -187,23 +215,10 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return IRQ_NONE;
 
/* Configuration change?  Tell driver if it wants to know. */
-   if (isr & VIRTIO_PCI_ISR_CONFIG) {
-   struct virtio_driver *drv;
-   drv = container_of(vp_dev->vdev.dev.driver,
-  struct virtio_driver, driver);
-
-   if (drv && drv->config_changed)
-   drv->config_changed(&vp_dev->vdev);
-   }
+   if (isr & VIRTIO_PCI_ISR_CONFIG)
+   vp_config_changed(irq, opaque);
 
-   spin_lock_irqsave(&vp_dev->lock, flags);
-   list_for_each_entry(info, &vp_dev->virtqueues, node) {
-   if (vring_interrupt(irq, info->vq) == IRQ_HANDLED)
-   ret = IRQ_HANDLED;
-   }
-   spin_unlock_irqrestore(&vp_dev->lock, flags);
-
-   return ret;
+   return vp_vring_interrupt(irq, opaque);
 }
 
 /* the config->find_vq() implementation */
-- 
1.6.3.rc3.1.g830204



[PATCHv6 4/4] virtio_pci: optional MSI-X support

2009-05-14 Thread Michael S. Tsirkin
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
are available.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_pci.c |  227 +++
 include/linux/virtio_pci.h  |   10 ++-
 2 files changed, 217 insertions(+), 20 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 951e673..65627a4 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -42,6 +42,26 @@ struct virtio_pci_device
/* a list of queues so we can dispatch IRQs */
spinlock_t lock;
struct list_head virtqueues;
+
+   /* MSI-X support */
+   int msix_enabled;
+   int intx_enabled;
+   struct msix_entry *msix_entries;
+   /* Name strings for interrupts. This size should be enough,
+* and I'm too lazy to allocate each name separately. */
+   char (*msix_names)[256];
+   /* Number of available vectors */
+   unsigned msix_vectors;
+   /* Vectors allocated */
+   unsigned msix_used_vectors;
+};
+
+/* Constants for MSI-X */
+/* Use first vector for configuration changes, second and the rest for
+ * virtqueues. Thus, we need at least 2 vectors for MSI. */
+enum {
+   VP_MSIX_CONFIG_VECTOR = 0,
+   VP_MSIX_VQ_VECTOR = 1,
 };
 
 struct virtio_pci_vq_info
@@ -60,6 +80,9 @@ struct virtio_pci_vq_info
 
/* the list node for the virtqueues list */
struct list_head node;
+
+   /* MSI-X vector (or none) */
+   unsigned vector;
 };
 
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
@@ -109,7 +132,8 @@ static void vp_get(struct virtio_device *vdev, unsigned offset,
   void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+   VIRTIO_PCI_CONFIG(vp_dev) + offset;
u8 *ptr = buf;
int i;
 
@@ -123,7 +147,8 @@ static void vp_set(struct virtio_device *vdev, unsigned offset,
   const void *buf, unsigned len)
 {
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
-   void __iomem *ioaddr = vp_dev->ioaddr + VIRTIO_PCI_CONFIG + offset;
+   void __iomem *ioaddr = vp_dev->ioaddr +
+  VIRTIO_PCI_CONFIG(vp_dev) + offset;
const u8 *ptr = buf;
int i;
 
@@ -221,7 +246,121 @@ static irqreturn_t vp_interrupt(int irq, void *opaque)
return vp_vring_interrupt(irq, opaque);
 }
 
-/* the config->find_vq() implementation */
+static void vp_free_vectors(struct virtio_device *vdev) {
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   int i;
+
+   if (vp_dev->intx_enabled) {
+   free_irq(vp_dev->pci_dev->irq, vp_dev);
+   vp_dev->intx_enabled = 0;
+   }
+
+   for (i = 0; i < vp_dev->msix_used_vectors; ++i)
+   free_irq(vp_dev->msix_entries[i].vector, vp_dev);
+   vp_dev->msix_used_vectors = 0;
+
+   if (vp_dev->msix_enabled) {
+   /* Disable the vector used for configuration */
+   iowrite16(VIRTIO_MSI_NO_VECTOR,
+ vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+   /* Flush the write out to device */
+   ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
+
+   vp_dev->msix_enabled = 0;
+   pci_disable_msix(vp_dev->pci_dev);
+   }
+}
+
+static int vp_enable_msix(struct pci_dev *dev, struct msix_entry *entries,
+ int *options, int noptions)
+{
+   int i;
+   for (i = 0; i < noptions; ++i)
+   if (!pci_enable_msix(dev, entries, options[i]))
+   return options[i];
+   return -EBUSY;
+}
+
+static int vp_request_vectors(struct virtio_device *vdev, unsigned max_vqs)
+{
+   struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+   const char *name = dev_name(&vp_dev->vdev.dev);
+   unsigned i, v;
+   int err = -ENOMEM;
+   /* We want at most one vector per queue and one for config changes.
+* Fall back to separate vectors for config and a shared one for queues.
+* Finally fall back to regular interrupts. */
+   int options[] = { max_vqs + 1, 2 };
+   int nvectors = max(options[0], options[1]);
+
+   vp_dev->msix_entries = kmalloc(nvectors * sizeof *vp_dev->msix_entries,
+  GFP_KERNEL);
+   if (!vp_dev->msix_entries)
+   goto error_entries;
+   vp_dev->msix_names = kmalloc(nvectors * sizeof *vp_dev->msix_names,
+GFP_KERNEL);
+   if (!vp_d

Re: [PATCHv6 4/4] virtio_pci: optional MSI-X support

2009-05-18 Thread Michael S. Tsirkin
On Mon, May 18, 2009 at 12:30:49AM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> This implements optional MSI-X support in virtio_pci.
>> MSI-X is used whenever the host supports at least 2 MSI-X
>> vectors: 1 for configuration changes and 1 for virtqueues.
>> Per-virtqueue vectors are allocated if enough vectors
>> available.
> 
>
> I'm not sure I understand how the vq -> msi mapping works.  Do we  
> actually support an arbitrary mapping, or just either linear or n:1?

Arbitrary mapping.

> I don't mind the driver being limited, but the device interface should  
> be flexible.  We'll want to deal with limited vector availability soon.

I agree.

The code in qemu lets you specify, for each queue, which MSIX vector you
want to use, or a special value if you don't want signalling. You also
specify which MSIX vector you want to use for config change
notifications, or a special value if you want to e.g. poll.

I think that's as flexible as it gets.

The API within guest is much simpler, but it does not need to be stable.

-- 
MST


[PATCHv2-RFC 0/2] qemu-kvm: MSI-X support

2009-05-20 Thread Michael S. Tsirkin
Here's a new version of MSI-X support patchset. I have completed
save/load support, and added a global option to disable MSI-X.

This is on top of qemu-kvm.git/queue

Todo: split patch up, support configurations without kernel irqchip,
apply to upstream qemu.git.

Anthony, please take a look at the capability bit support that
I added in pci save/load. Does it look sane?
A simple version number is not sufficient for MSI, because
the user has an option to turn off MSI even in new qemu.

Signed-off-by: Michael S. Tsirkin 


Michael S. Tsirkin (2):
  qemu-kvm: add MSI-X support
  qemu-kvm: use common code for assigned msix

 Makefile.target|2 +-
 hw/device-assignment.c |  329 +--
 hw/device-assignment.h |7 +-
 hw/msix.c  |  454 
 hw/msix.h  |   38 
 hw/pci.c   |  135 +++
 hw/pci.h   |   72 +++-
 hw/virtio-balloon.c|2 +-
 hw/virtio-blk.c|3 +-
 hw/virtio-console.c|3 +-
 hw/virtio-net.c|3 +-
 hw/virtio.c|  206 ++
 hw/virtio.h|6 +-
 qemu-options.hx|2 +
 vl.c   |3 +
 15 files changed, 929 insertions(+), 336 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h


[PATCHv2-RFC 1/2] qemu-kvm: add MSI-X support

2009-05-20 Thread Michael S. Tsirkin
This adds MSI-X support infrastructure and uses that to enable MSI-X
support in virtio net device. Also add a global option to disable MSI-X.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile.target|2 +-
 hw/device-assignment.c |2 +
 hw/msix.c  |  447 
 hw/msix.h  |   38 
 hw/pci.c   |  135 +++
 hw/pci.h   |   68 +++-
 hw/virtio-balloon.c|2 +-
 hw/virtio-blk.c|3 +-
 hw/virtio-console.c|3 +-
 hw/virtio-net.c|3 +-
 hw/virtio.c|  206 ++
 hw/virtio.h|6 +-
 qemu-options.hx|2 +
 vl.c   |3 +
 14 files changed, 838 insertions(+), 82 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h

diff --git a/Makefile.target b/Makefile.target
index 979c07f..e049550 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -542,7 +542,7 @@ endif #CONFIG_BSD_USER
 ifndef CONFIG_USER_ONLY
 
 OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o dma-helpers.o \
- gdbstub.o gdbstub-xml.o
+ gdbstub.o gdbstub-xml.o msix.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 OBJS+=virtio.o virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 624d15a..4806112 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1151,6 +1151,8 @@ struct PCIDevice *init_assigned_device(AssignedDevInfo *adev, PCIBus *bus)
 assigned_device_pci_cap_init) < 0)
 goto assigned_out;
 
+pci_dev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+
 /* assign device to guest */
 r = assign_device(adev);
 if (r < 0)
diff --git a/hw/msix.c b/hw/msix.c
new file mode 100644
index 000..323eabc
--- /dev/null
+++ b/hw/msix.c
@@ -0,0 +1,447 @@
+/*
+ * MSI-X device support
+ *
+ * This module includes support for MSI-X in pci devices.
+ *
+ * Author: Michael S. Tsirkin 
+ *
+ *  Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw.h"
+#include "msix.h"
+#include "pci.h"
+#include 
+
+/* Declaration from linux/pci_regs.h */
+#define  PCI_CAP_ID_MSIX 0x11 /* MSI-X */
+#define  PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */
+#define  PCI_MSIX_FLAGS_QSIZE  0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE (1 << 15)
+#define  PCI_MSIX_FLAGS_BIRMASK (7 << 0)
+
+/* MSI-X capability structure */
+#define MSIX_TABLE_OFFSET 4
+#define MSIX_PBA_OFFSET 8
+
+/* MSI-X table format */
+#define MSIX_MSG_ADDR 0
+#define MSIX_MSG_UPPER_ADDR 4
+#define MSIX_MSG_DATA 8
+#define MSIX_VECTOR_CTRL 12
+#define MSIX_ENTRY_SIZE 16
+#define MSIX_VECTOR_MASK 0x1
+
+/* How much space does an MSIX table need. */
+/* The spec requires giving the table structure
+ * a 4K aligned region all by itself. Align it to
+ * target pages so that drivers can do passthrough
+ * on the rest of the region. */
+#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000)
+
+#ifdef MSIX_DEBUG
+#define DEBUG(fmt, ...)   \
+do {  \
+  fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);\
+} while (0)
+#else
+#define DEBUG(fmt, ...) do { } while(0)
+#endif
+
+/* Flag to globally disable MSI-X support */
+int msix_disable;
+
+/* Add MSI-X capability to the config space for the device. */
+/* Given a bar and its size, add MSI-X table on top of it
+ * and fill MSI-X capability in the config space.
+ * Original bar size must be a power of 2 or 0.
+ * New bar size is returned. */
+static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
+   unsigned bar_nr, unsigned bar_size)
+{
+unsigned config_offset = pdev->cap.start + pdev->cap.length;
+uint8_t *config = pdev->config + config_offset;
+uint32_t new_size;
+
+if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
+return -EINVAL;
+if (bar_size > 0x8000)
+return -ENOSPC;
+
+/* Add space for MSI-X structures */
+if (!bar_size)
+new_size = MSIX_PAGE_SIZE;
+else if (bar_size < MSIX_PAGE_SIZE) {
+bar_size = MSIX_PAGE_SIZE;
+new_size = MSIX_PAGE_SIZE * 2;
+} else
+new_size = bar_size * 2;
+
+pdev->msix_bar_size = new_size;
+
+pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+/* Table on top of BAR */
+pci_set_long(config + MSIX_TABLE_OFFSET, bar_size | bar_nr);
+/* Pending bits on top of that */
+pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_SIZE / 2) | bar_nr);
+pdev->cap.msi

[PATCHv2-RFC 2/2] qemu-kvm: use common code for assigned msix

2009-05-20 Thread Michael S. Tsirkin
For assigned devices, use the common code to enable MSI-X. We need a special
"assigned" option, as assigned devices lack a standard way to get vector
usage.

Signed-off-by: Michael S. Tsirkin 
---
 hw/device-assignment.c |  329 
 hw/device-assignment.h |7 +-
 hw/msix.c  |9 ++-
 hw/pci.h   |4 +
 4 files changed, 93 insertions(+), 256 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4806112..91a7bd7 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -33,6 +33,7 @@
 #include 
 #include "qemu-kvm.h"
 #include "hw.h"
+#include "msix.h"
 #include "pc.h"
 #include "sysemu.h"
 #include "console.h"
@@ -150,11 +151,10 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
 {
 AssignedDevice *r_dev = container_of(pci_dev, AssignedDevice, dev);
 AssignedDevRegion *region = &r_dev->v_addrs[region_num];
-PCIRegion *real_region = &r_dev->real_device.regions[region_num];
 uint32_t old_ephys = region->e_physbase;
 uint32_t old_esize = region->e_size;
 int first_map = (region->e_size == 0);
-int ret = 0;
+int ret;
 
 DEBUG("e_phys=%08x r_virt=%p type=%d len=%08x region_num=%d \n",
   e_phys, region->u.r_virtbase, type, e_size, region_num);
@@ -166,27 +166,17 @@ static void assigned_dev_iomem_map(PCIDevice *pci_dev, int region_num,
kvm_destroy_phys_mem(kvm_context, old_ephys,
  TARGET_PAGE_ALIGN(old_esize));
 
-if (e_size > 0) {
-/* deal with MSI-X MMIO page */
-if (real_region->base_addr <= r_dev->msix_table_addr &&
-real_region->base_addr + real_region->size >=
-r_dev->msix_table_addr) {
-int offset = r_dev->msix_table_addr - real_region->base_addr;
-ret = munmap(region->u.r_virtbase + offset, TARGET_PAGE_SIZE);
-if (ret == 0)
-DEBUG("munmap done, virt_base 0x%p\n",
-region->u.r_virtbase + offset);
-else {
-fprintf(stderr, "%s: fail munmap msix table!\n", __func__);
-exit(1);
-}
-cpu_register_physical_memory(e_phys + offset,
-TARGET_PAGE_SIZE, r_dev->mmio_index);
-}
-   ret = kvm_register_phys_mem(kvm_context, e_phys,
-region->u.r_virtbase,
-TARGET_PAGE_ALIGN(e_size), 0);
-}
+if (e_size <= 0)
+return;
+
+/* deal with MSI-X MMIO page */
+msix_mmio_map(pci_dev, region_num, e_phys, e_size, type);
+/* Only register as much memory as required to cover the
+ * actual device region. */
+e_size = r_dev->real_device.regions[region_num].size;
+ret = kvm_register_phys_mem(kvm_context, e_phys,
+region->u.r_virtbase,
+TARGET_PAGE_ALIGN(e_size), 0);
 
 if (ret != 0) {
fprintf(stderr, "%s: Error: create new mapping failed\n", __func__);
@@ -378,11 +368,16 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 
 /* handle memory io regions */
 if (cur_region->type & IORESOURCE_MEM) {
+uint32_t size = i == msix_bar_nr(&pci_dev->dev)
+? msix_bar_size(&pci_dev->dev) : cur_region->size;
+
 int t = cur_region->type & IORESOURCE_PREFETCH
 ? PCI_ADDRESS_SPACE_MEM_PREFETCH
 : PCI_ADDRESS_SPACE_MEM;
 
 /* map physical memory */
+/* MSI-X table is located outside cur_region->size
+ * and so won't be mapped */
 pci_dev->v_addrs[i].e_physbase = cur_region->base_addr;
 pci_dev->v_addrs[i].u.r_virtbase =
 mmap(NULL,
@@ -397,7 +392,7 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 (uint32_t) (cur_region->base_addr));
 return -1;
 }
-pci_dev->v_addrs[i].r_size = cur_region->size;
+pci_dev->v_addrs[i].r_size = size;
 pci_dev->v_addrs[i].e_size = 0;
 
 /* add offset */
@@ -405,8 +400,7 @@ static int assigned_dev_register_regions(PCIRegion *io_regions,
 (cur_region->base_addr & 0xFFF);
 
 pci_register_io_region((PCIDevice *) pci_dev, i,
-   cur_region->size, t,
-   assigned_dev_iomem_map);
+   size, t, assigned_dev_iomem_map);
 continue;
 }
 /* handle port io regions */
@@ -542,11 +536,11 @@ static void 

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > define api for allocating/setting up msi-x irqs, and for updating them
> >  > >  with msi-x vector information, supply implementation in ioapic. Please
> >  > >  comment on this API: I intend to port my msi-x patch to work on top of
> >  > >  it.
> >  > >
> >  > >  Signed-off-by: Michael S. Tsirkin 
> >  >
> >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > I think we should aim for a more generic API that covers this case also.
> >
> >
> > Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  MSI, not "mondos". What code would benefit from this abstraction?
> 
> Sparc64 emulation, of course. I think also the API would be neater.
> 
> >  > For example, irq.c could support opaque packet payload of
> >  > unspecified/predefined size.  MSI packet structure should be defined
> >  > in ioapic.c.
> >
> >
> > Note that MSI does not have packets and MSI interrupts do not pass any 
> > payload.
> 
> I don't know too much about MSI, what's the 'data' field in msi_state then?

opaque stuff that apic uses to select irq and cpu.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > define api for allocating/setting up msi-x irqs, and for updating them
> >  > >  with msi-x vector information, supply implementation in ioapic. Please
> >  > >  comment on this API: I intend to port my msi-x patch to work on top of
> >  > >  it.
> >  > >
> >  > >  Signed-off-by: Michael S. Tsirkin 
> >  >
> >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > I think we should aim for a more generic API that covers this case also.
> >
> >
> > Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  MSI, not "mondos". What code would benefit from this abstraction?
> 
> Sparc64 emulation, of course. I think also the API would be neater.

Since "mondos" are not interrupts, why use irqs for them?

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > define api for allocating/setting up msi-x irqs, and for updating them
> >  with msi-x vector information, supply implementation in ioapic. Please
> >  comment on this API: I intend to port my msi-x patch to work on top of
> >  it.
> >
> >  Signed-off-by: Michael S. Tsirkin 
> 
> Sparc64 also uses packets ("mondos", not implemented yet) for
> interrupt vector data, there the packet size is 8 * 64 bits.
> I think we should aim for a more generic API that covers this case also.

Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
MSI, not "mondos". What code would benefit from this abstraction?

> For example, irq.c could support opaque packet payload of
> unspecified/predefined size.  MSI packet structure should be defined
> in ioapic.c.

Note that MSI does not have packets and MSI interrupts do not pass any payload.

> The pci_msi_ops structure could be 'const', or do you expect it to
> change during execution?

Right. I'll fix that.

-- 
MST


[PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
define api for allocating/setting up msi-x irqs, and for updating them
with msi-x vector information, supply implementation in ioapic. Please
comment on this API: I intend to port my msi-x patch to work on top of
it.

Signed-off-by: Michael S. Tsirkin 
---
 hw/apic.c |1 -
 hw/ioapic.c   |   65 +
 hw/irq.c  |   10 
 hw/irq.h  |5 
 hw/pc.c   |1 +
 hw/pc.h   |2 +
 hw/pci.c  |2 +
 hw/pci.h  |   10 
 qemu-common.h |1 +
 9 files changed, 96 insertions(+), 1 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index d63d74b..2d2de69 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -929,4 +929,3 @@ int apic_init(CPUState *env)
 local_apics[s->id] = s;
 return 0;
 }
-
diff --git a/hw/ioapic.c b/hw/ioapic.c
index 317c2c2..5a99c46 100644
--- a/hw/ioapic.c
+++ b/hw/ioapic.c
@@ -23,6 +23,7 @@
 
 #include "hw.h"
 #include "pc.h"
+#include "pci.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -43,6 +44,16 @@
 #define IOAPIC_DM_SIPI 0x5
 #define IOAPIC_DM_EXTINT   0x7
 
+/* Intel APIC constants: from include/asm/msidef.h */
+#define MSI_DATA_VECTOR_SHIFT  0
+#define MSI_DATA_VECTOR_MASK   0x00ff
+#define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_ADDR_DEST_MODE_SHIFT   2
+#define MSI_DATA_TRIGGER_SHIFT 15
+#define MSI_ADDR_DEST_ID_SHIFT 12
+#define MSI_ADDR_DEST_ID_MASK  0x000
+#define MSI_DATA_LEVEL_SHIFT   14
+
 struct IOAPICState {
 uint8_t id;
 uint8_t ioregsel;
@@ -51,6 +62,11 @@ struct IOAPICState {
 uint64_t ioredtbl[IOAPIC_NUM_PINS];
 };
 
+struct msi_state {
+uint64_t addr;
+uint32_t data;
+};
+
 static void ioapic_service(IOAPICState *s)
 {
 uint8_t i;
@@ -259,3 +275,52 @@ IOAPICState *ioapic_init(void)
 
 return s;
 }
+
+/* MSI/MSI-X support */
+static void ioapic_send_msi(void *opaque, int irq, int level)
+{
+struct msi_state *state = opaque;
+uint8_t dest = (state[irq].addr & MSI_ADDR_DEST_ID_MASK)
+>> MSI_ADDR_DEST_ID_SHIFT;
+uint8_t vector = (state[irq].data & MSI_DATA_VECTOR_MASK)
+>> MSI_DATA_VECTOR_SHIFT;
+uint8_t dest_mode = (state[irq].addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+uint8_t trigger_mode = (state[irq].data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+uint8_t delivery = (state[irq].data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+apic_deliver_irq(dest, dest_mode, delivery, vector, 0, trigger_mode);
+}
+
+static qemu_irq *ioapic_allocate_msi(int nentries)
+{
+struct msi_state *state = qemu_mallocz(nentries * sizeof *state);
+qemu_irq *irqs;
+if (!state)
+return NULL;
+irqs = qemu_allocate_irqs(ioapic_send_msi, state, nentries);
+if (!irqs)
+qemu_free(state);
+return irqs;
+}
+
+static void ioapic_free_msi(qemu_irq *irq)
+{
+qemu_free(qemu_irq_get_opaque(irq[0]));
+qemu_free_irqs(irq);
+}
+
+static int ioapic_update_msi(qemu_irq irq, uint64_t addr, uint32_t data,
+ int masked)
+{
+struct msi_state *state = qemu_irq_get_opaque(irq);
+int vector = qemu_irq_get_vector(irq);
+state[vector].addr = addr;
+state[vector].data = data;
+return 0;
+}
+
+struct pci_msi_ops ioapic_msi_ops = {
+.allocate = ioapic_allocate_msi,
+.update = ioapic_update_msi,
+.free = ioapic_free_msi,
+};
+
diff --git a/hw/irq.c b/hw/irq.c
index 7703f62..9180381 100644
--- a/hw/irq.c
+++ b/hw/irq.c
@@ -75,3 +75,13 @@ qemu_irq qemu_irq_invert(qemu_irq irq)
 qemu_irq_raise(irq);
 return qemu_allocate_irqs(qemu_notirq, irq, 1)[0];
 }
+
+void *qemu_irq_get_opaque(qemu_irq irq)
+{
+return irq->opaque;
+}
+
+int qemu_irq_get_vector(qemu_irq irq)
+{
+return irq->n;
+}
diff --git a/hw/irq.h b/hw/irq.h
index 5daae44..0e3144d 100644
--- a/hw/irq.h
+++ b/hw/irq.h
@@ -32,4 +32,9 @@ void qemu_free_irqs(qemu_irq *s);
 /* Returns a new IRQ with opposite polarity.  */
 qemu_irq qemu_irq_invert(qemu_irq irq);
 
+/* Get the pointer stored in the irq. */
+void *qemu_irq_get_opaque(qemu_irq irq);
+
+/* Get vector stored in the irq */
+int qemu_irq_get_vector(qemu_irq irq);
 #endif
diff --git a/hw/pc.c b/hw/pc.c
index 61f6e7b..1b287c3 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -962,6 +962,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
 if (pci_enabled) {
 pci_bus = i440fx_init(&i440fx_state, i8259);
 piix3_devfn = piix3_init(pci_bus, -1);
+pci_msi_ops = &ioapic_msi_ops;
 } else {
 pci_bus = NULL;
 }
diff --git a/hw/pc.h b/hw/pc.h
index 50e6c39..2013aa9 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -55,6 +55,8 @@ void ioapic_set_irq(void *opaque, int vector, int level);
 void apic_reset_irq_delivered(void);
 int apic_get_irq_delivered(void);
 
+extern struct pci_m

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > define api for allocating/setting up msi-x irqs, and for updating 
> > them
> >  > >  > >  with msi-x vector information, supply implementation in ioapic. 
> > Please
> >  > >  > >  comment on this API: I intend to port my msi-x patch to work on 
> > top of
> >  > >  > >  it.
> >  > >  > >
> >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> >  > >  >
> >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > >  > I think we should aim for a more generic API that covers this case 
> > also.
> >  > >
> >  > >
> >  > > Are you sure this is a good idea? MSI is tied to PCI, and PCI only has
> >  > >  MSI, not "mondos". What code would benefit from this abstraction?
> >  >
> >  > Sparc64 emulation, of course. I think also the API would be neater.
> >
> >
> > Since "mondos" are not interrupts, why use irqs for them?
> 
> I just said above that they are used for interrupt vector data. What
> makes you think they are not interrupts?

I'm sorry, I don't really know anything about sparc.
All I am saying is that in PCI, interrupts never pass data,
so qemu_set_irq as it is now, is a good API to send them.

For the sparc feature you describe, you probably want to add
a message data parameter to qemu_set_irq, but it's not
really useful for MSI.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> > On 5/20/09, Michael S. Tsirkin  wrote:
> > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> > >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> > >  > >  > > define api for allocating/setting up msi-x irqs, and for 
> > > updating them
> > >  > >  > >  with msi-x vector information, supply implementation in 
> > > ioapic. Please
> > >  > >  > >  comment on this API: I intend to port my msi-x patch to work 
> > > on top of
> > >  > >  > >  it.
> > >  > >  > >
> > >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> > >  > >  >
> > >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> > >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> > >  > >  > I think we should aim for a more generic API that covers this 
> > > case also.
> > >  > >
> > >  > >
> > >  > > Are you sure this is a good idea? MSI is tied to PCI, and PCI only 
> > > has
> > >  > >  MSI, not "mondos". What code would benefit from this abstraction?
> > >  >
> > >  > Sparc64 emulation, of course. I think also the API would be neater.
> > >
> > >
> > > Since "mondos" are not interrupts, why use irqs for them?
> > 
> > I just said above that they are used for interrupt vector data. What
> > makes you think they are not interrupts?
> 
> I'm sorry, I don't really know anything about sparc.
> All I am saying is that in PCI, interrupts never pass data,
> so qemu_set_irq as it is now, is a good API to send them.
> 
> For the sparc feature you describe, you probably want to add
> a message data parameter to qemu_set_irq, but it's not
> really useful for MSI.

Just to clarify, the main difference is that with MSI/MSI-X
both data and address fields are mostly static, modifying them
involves ioapic and device updates which might be an expensive
operation (e.g. with kvm, needs an extra system call).

So I don't think it makes sense to pass MSI-X data field
with each call to qemu_set_irq.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:18:56PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > >  > > define api for allocating/setting up msi-x irqs, and for 
> > updating them
> >  > >  > >  > >  with msi-x vector information, supply implementation in 
> > ioapic. Please
> >  > >  > >  > >  comment on this API: I intend to port my msi-x patch to 
> > work on top of
> >  > >  > >  > >  it.
> >  > >  > >  > >
> >  > >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> >  > >  > >  >
> >  > >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > >  > >  > I think we should aim for a more generic API that covers this 
> > case also.
> >  > >  > >
> >  > >  > >
> >  > >  > > Are you sure this is a good idea? MSI is tied to PCI, and PCI 
> > only has
> >  > >  > >  MSI, not "mondos". What code would benefit from this abstraction?
> >  > >  >
> >  > >  > Sparc64 emulation, of course. I think also the API would be neater.
> >  > >
> >  > >
> >  > > Since "mondos" are not interrupts, why use irqs for them?
> >  >
> >  > I just said above that they are used for interrupt vector data. What
> >  > makes you think they are not interrupts?
> >
> >
> > I'm sorry, I don't really know anything about sparc.
> >  All I am saying is that in PCI, interrupts never pass data,
> >  so qemu_set_irq as it is now, is a good API to send them.
> >
> >  For the sparc feature you describe, you probably want to add
> >  a message data parameter to qemu_set_irq, but it's not
> >  really useful for MSI.
> 
> But maybe this API would be useful for both MSI and mondo packets:
> /* Get vector data stored in the irq */
> void *qemu_irq_get_data(qemu_irq irq);

Hmm. Yes, it would work for MSI.
However I am thinking ahead to kvm support, which needs to
update the irqchip (with a system call) on each state update.
So:
- I really need a callback on data update
- data update might fail

Which if added to the generic API would make it too complex,
IMO.

> /* Set vector data stored in the irq */
> void qemu_irq_set_data(qemu_irq irq, void *data);
> You'd use a struct msi_state as the opaque (not using the array
> indexed by irq number), I'd use some kind of
> struct mondo {
>  uint64_t packet[8];
> };
> 
> Instead of ioapic_update_msi(), you'd call qemu_irq_set_data().

I guess it's a matter of taste, personally I prefer type-safe functions
to hiding everything in typeless void* buffers.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-20 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> >  > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> >  > > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > > >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > >  > >  > > define api for allocating/setting up msi-x irqs, and for 
> > updating them
> >  > > >  > >  > >  with msi-x vector information, supply implementation in 
> > ioapic. Please
> >  > > >  > >  > >  comment on this API: I intend to port my msi-x patch to 
> > work on top of
> >  > > >  > >  > >  it.
> >  > > >  > >  > >
> >  > > >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> >  > > >  > >  >
> >  > > >  > >  > Sparc64 also uses packets ("mondos", not implemented yet) for
> >  > > >  > >  > interrupt vector data, there the packet size is 8 * 64 bits.
> >  > > >  > >  > I think we should aim for a more generic API that covers 
> > this case also.
> >  > > >  > >
> >  > > >  > >
> >  > > >  > > Are you sure this is a good idea? MSI is tied to PCI, and PCI 
> > only has
> >  > > >  > >  MSI, not "mondos". What code would benefit from this 
> > abstraction?
> >  > > >  >
> >  > > >  > Sparc64 emulation, of course. I think also the API would be 
> > neater.
> >  > > >
> >  > > >
> >  > > > Since "mondos" are not interrupts, why use irqs for them?
> >  > >
> >  > > I just said above that they are used for interrupt vector data. What
> >  > > makes you think they are not interrupts?
> >  >
> >  > I'm sorry, I don't really know anything about sparc.
> >  > All I am saying is that in PCI, interrupts never pass data,
> >  > so qemu_set_irq as it is now, is a good API to send them.
> >  >
> >  > For the sparc feature you describe, you probably want to add
> >  > a message data parameter to qemu_set_irq, but it's not
> >  > really useful for MSI.
> >
> >
> > Just to clarify, the main difference is that with MSI/MSI-X
> >  both data and address fields are mostly static, modifying them
> >  involves ioapic and device updates which might be an expensive
> >  operation (e.g. with kvm, needs an extra system call).
> >
> >  So I don't think it makes sense to pass MSI-X data field
> >  with each call to qemu_set_irq.
> 
> No, but I think the Sparc situation is the same, the packet data is
> static for the interrupt source in question.

So, ok, we could add a data update callback and then MSI and sparc
would do their thing there. I'm not convinced I like all this
play with untyped buffers; do you think it's helpful?

If yes, maybe I'll try to code it up and see how it looks.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Wed, May 20, 2009 at 11:44:57PM +0300, Blue Swirl wrote:
> On 5/20/09, Michael S. Tsirkin  wrote:
> > On Wed, May 20, 2009 at 11:26:42PM +0300, Blue Swirl wrote:
> >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > > On Wed, May 20, 2009 at 11:02:24PM +0300, Michael S. Tsirkin wrote:
> >  > >  > On Wed, May 20, 2009 at 09:38:58PM +0300, Blue Swirl wrote:
> >  > >  > > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > > On Wed, May 20, 2009 at 08:44:31PM +0300, Blue Swirl wrote:
> >  > >  > > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > >  > > On Wed, May 20, 2009 at 08:21:01PM +0300, Blue Swirl wrote:
> >  > >  > > >  > >  > On 5/20/09, Michael S. Tsirkin  wrote:
> >  > >  > > >  > >  > > define api for allocating/setting up msi-x irqs, and 
> > for updating them
> >  > >  > > >  > >  > >  with msi-x vector information, supply implementation 
> > in ioapic. Please
> >  > >  > > >  > >  > >  comment on this API: I intend to port my msi-x patch 
> > to work on top of
> >  > >  > > >  > >  > >  it.
> >  > >  > > >  > >  > >
> >  > >  > > >  > >  > >  Signed-off-by: Michael S. Tsirkin 
> >  > >  > > >  > >  >
> >  > >  > > >  > >  > Sparc64 also uses packets ("mondos", not implemented 
> > yet) for
> >  > >  > > >  > >  > interrupt vector data, there the packet size is 8 * 64 
> > bits.
> >  > >  > > >  > >  > I think we should aim for a more generic API that 
> > covers this case also.
> >  > >  > > >  > >
> >  > >  > > >  > >
> >  > >  > > >  > > Are you sure this is a good idea? MSI is tied to PCI, and 
> > PCI only has
> >  > >  > > >  > >  MSI, not "mondos". What code would benefit from this 
> > abstraction?
> >  > >  > > >  >
> >  > >  > > >  > Sparc64 emulation, of course. I think also the API would be 
> > neater.
> >  > >  > > >
> >  > >  > > >
> >  > >  > > > Since "mondos" are not interrupts, why use irqs for them?
> >  > >  > >
> >  > >  > > I just said above that they are used for interrupt vector data. 
> > What
> >  > >  > > makes you think they are not interrupts?
> >  > >  >
> >  > >  > I'm sorry, I don't really know anything about sparc.
> >  > >  > All I am saying is that in PCI, interrupts never pass data,
> >  > >  > so qemu_set_irq as it is now, is a good API to send them.
> >  > >  >
> >  > >  > For the sparc feature you describe, you probably want to add
> >  > >  > a message data parameter to qemu_set_irq, but it's not
> >  > >  > really useful for MSI.
> >  > >
> >  > >
> >  > > Just to clarify, the main difference is that with MSI/MSI-X
> >  > >  both data and address fields are mostly static, modifying them
> >  > >  involves ioapic and device updates which might be an expensive
> >  > >  operation (e.g. with kvm, needs an extra system call).
> >  > >
> >  > >  So I don't think it makes sense to pass MSI-X data field
> >  > >  with each call to qemu_set_irq.
> >  >
> >  > No, but I think the Sparc situation is the same, the packet data is
> >  > static for the interrupt source in question.
> >
> >
> > So, ok, we could add a data update callback and then MSI and sparc
> >  would do their thing there. I'm not convinced I like all this
> >  play with untyped buffers, do you think it's helpful?
> >
> >  If yes, maybe I'll try to code it up and see how it looks.
> 
> Well, get/set_data could be limited to irq.c, ioapic.c could export
> something like get/set_msi_data. MSI callers should only use
> get/set_msi_data. Ditto for get/set_mondo_data.

I started coding it up and got lost in a maze of callbacks and void pointers :)

And I also remembered that besides data, there are mask and pending bits in MSI
(they're ignored in the patch I sent, but I'm fixing that), so there needs
to be an interface for those as well.

I am guessing that with the qemu_irq_get_opaque and qemu_irq_get_vector helpers
I added, you'll find it easy to implement mondos in very few lines of code.
We can always add another level of indirection later when sparc code
surfaces, if we find it helps.


-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 11:34:11AM +0100, Paul Brook wrote:
> > The PCI bus doesn't need any special support (I think) but something on
> > the other end needs to interpret those writes.
> 
> Sure. But there's definitely nothing PCI specific about it. I assumed this 
> would all be contained within the APIC.

Exactly. The APIC supplies callbacks to set up, free, mask and unmask MSI
interrupts. For kvm, we'll have another implementation that passes these
requests on to the kernel.

> > In any case we need some internal API for this, and qemu_irq looks like
> > a good choice.
> 
> What do you expect to be using this API?
> 
> Paul

Emulated PCI devices such as virtio.
I hope to send a patch shortly.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:08:18PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>>>> In any case we need some internal API for this, and qemu_irq looks like
>>>>> a good choice.
>>>>>
>>>> What do you expect to be using this API?
>>>>
>>> virtio, emulated devices capable of supporting MSI (e1000?), device
>>> assignment (not yet in qemu.git).
>>>
>> It probably makes sense to have common infrastructure in pci.c to
>> expose/implement device side MSI functionality. However I see no need
>> for a direct API between the device and the APIC. We already have an
>> API for memory accesses and MMIO regions. I'm pretty sure a system
>> could implement MSI by pointing the device at system ram, and having
>> the CPU periodically poll that.
>>
>
> Instead of writing directly, let's abstract it behind a qemu_set_irq().   
> This is easier for device authors.  The default implementation of the  
> irq callback could write to apic memory, while for kvm we can directly  
> trigger the interrupt via the kvm APIs.

Right.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 01:29:37PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Avi Kivity wrote:
> > Paul Brook wrote:
> >  In any case we need some internal API for this, and qemu_irq looks
> >  like a good choice.
> > >>>
> > >>> What do you expect to be using this API?
> > >>
> > >> virtio, emulated devices capable of supporting MSI (e1000?), device
> > >> assignment (not yet in qemu.git).
> > >
> > > It probably makes sense to have common infrastructure in pci.c to
> > > expose/implement device side MSI functionality. However I see no need for
> > > a direct API between the device and the APIC. We already have an API for
> > > memory accesses and MMIO regions. I'm pretty sure a system could
> > > implement MSI by pointing the device at system ram, and having the CPU
> > > periodically poll that.
> >
> > Instead of writing directly, let's abstract it behind a qemu_set_irq().
> > This is easier for device authors.  The default implementation of the
> > irq callback could write to apic memory, while for kvm we can directly
> > trigger the interrupt via the kvm APIs.
> 
> I'm still not convinced.
> 
> A tight coupling between PCI devices and the APIC is just going to cause us 
> problems later one. I'm going to come back to the fact that these are memory 
> writes so once we get IOMMU support they will presumably be subject to 
> remapping by that, just like any other memory access.

Actually, MSI messages are handled by the IOMMU as interrupts, not as
regular memory accesses. The IOMMU spec has comments such as:
• Interrupt addresses are never translated to memory addresses, but
other special address ranges may be reclaimed to be backed with memory.

> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a one-
> off event, not a level state.

Yes, I just chose to ignore the level value. It does not look like such
a big issue ... Do you advocate a new qemu_msi structure then?

> OTOH stl_phys is exactly the right interface.

Not really. These are writes, but not into physical memory.
For example, on 32-bit Intel, stl_phys gets a 32-bit address, while
MSI writes encode the interrupt vector in the upper 32 bits of
the address - way outside actual physical memory.
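As background for why these writes target a special range rather than RAM, the x86 MSI message address can be sketched as below (per the Intel SDM; the macro and function names are illustrative, not qemu's API). Bits 31-20 are fixed at 0xFEE and bits 19-12 carry the destination APIC ID, so the address is an interrupt destination, not a memory location.

```c
#include <stdint.h>

/* Illustrative x86 MSI message-address layout. */
#define MSI_ADDR_BASE       0xfee00000u
#define MSI_ADDR_DEST_SHIFT 12

/* Compose the message address that targets a given destination APIC ID. */
static inline uint32_t msi_address(uint8_t dest_apic_id)
{
    return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << MSI_ADDR_DEST_SHIFT);
}

/* True when a write falls inside the APIC's interrupt-address range. */
static inline int msi_address_in_range(uint64_t addr)
{
    return (addr & 0xfff00000ull) == MSI_ADDR_BASE;
}
```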

> The KVM interface should be contained within the APIC implementation.
> 
> Paul

Unfortunately, the kvm capabilities present in current kernels
do not map well to this interface. You need to perform
an expensive setup for each interrupt vector you are going to use,
be it MSI or a regular interrupt.

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>>> This is easier for device authors.  The default implementation of the
>>> irq callback could write to apic memory, while for kvm we can directly
>>> trigger the interrupt via the kvm APIs.
>>> 
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to 
>> cause us problems later one. I'm going to come back to the fact that 
>> these are memory writes so once we get IOMMU support they will 
>> presumably be subject to remapping by that, just like any other memory 
>> access.
>>   
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.   
> Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a 
>> one-
>> off event, not a level state. OTOH stl_phys is exactly the right interface.
>>   
>
> The qemu_irq callback should do an stl_phys().  The device is happy  
> since it's using the same API it uses for non-MSI.  The APIC is happy  
> since it isn't connected directly to the device.  stl_phys() is happy  
> since it sees more traffic and can serve more ads.  kvm is happy since  
> it can hijack the callback to throw the interrupt directly into the 
> kernel.

I like the monetization angle here.

>> The KVM interface should be contained within the APIC implementation.
>>   
>
> Tricky, but doable.

I'd rather keep it simple.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:38:56PM +0300, Avi Kivity wrote:
> Paul Brook wrote:
>>> Instead of writing directly, let's abstract it behind a qemu_set_irq().
>>> This is easier for device authors.  The default implementation of the
>>> irq callback could write to apic memory, while for kvm we can directly
>>> trigger the interrupt via the kvm APIs.
>>> 
>>
>> I'm still not convinced.
>>
>> A tight coupling between PCI devices and the APIC is just going to 
>> cause us problems later one. I'm going to come back to the fact that 
>> these are memory writes so once we get IOMMU support they will 
>> presumably be subject to remapping by that, just like any other memory 
>> access.
>>   
>
> I'm not suggesting the qemu_irq will extend all the way to the apic.   
> Think of it as connecting the device core with its interrupt unit.
>
>> Even ignoring that, qemu_irq isn't really the right interface. A MSI is a 
>> one-
>> off event, not a level state. OTOH stl_phys is exactly the right interface.
>>   
>
> The qemu_irq callback should do an stl_phys().

Actually, it seems we can't do it this way now, as stl_phys
only gets a 32-bit address. So I'll use apic_deliver for now,
but yes, it will be easy to rewrite the MSI implementation this way
later if that limitation is lifted.


> The device is happy  
> since it's using the same API it uses for non-MSI.  The APIC is happy  
> since it isn't connected directly to the device.  stl_phys() is happy  
> since it sees more traffic and can serve more ads.  kvm is happy since  
> it can hijack the callback to throw the interrupt directly into the 
> kernel.
>
>> The KVM interface should be contained within the APIC implementation.
>>   
>
> Tricky, but doable.
>
> -- 
> error compiling committee.c: too many arguments to function

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:09:32PM +0100, Paul Brook wrote:
> > > A tight coupling between PCI devices and the APIC is just going to cause
> > > us problems later one. I'm going to come back to the fact that these are
> > > memory writes so once we get IOMMU support they will presumably be
> > > subject to remapping by that, just like any other memory access.
> >
> > I'm not suggesting the qemu_irq will extend all the way to the apic.
> > Think of it as connecting the device core with its interrupt unit.
> >
> > > Even ignoring that, qemu_irq isn't really the right interface. A MSI is a
> > > one- off event, not a level state. OTOH stl_phys is exactly the right
> > > interface.
> >
> > The qemu_irq callback should do an stl_phys().  The device is happy
> > since it's using the same API it uses for non-MSI. 
> 
> MSI provides multiple edge triggered interrupts, whereas traditional mode 
> provides a single level triggered interrupt. My guess is most devices will 
> want to treat these differently anyway.

So, is qemu_send_msi better than qemu_set_irq?

> Either way, this is an implementation detail between pci.c and individual 
> devices. It has nothing to do with the APIC.
> 
> Paul

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
> 
> Neither. pci_send_msi,

Works for me.

> which is a trivial wrapper around stl_phys.

OK, but I'm adding another level of indirection in the middle,
to allow us to tie in a kvm backend.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:53:14PM +0100, Paul Brook wrote:
> > > which is a trivial wrapper around stl_phys.
> >
> > OK, but I'm adding another level of indirection in the middle,
> > to allow us to tie in a kvm backend.
> 
> kvm has no business messing with the PCI device code.

Yes it has :)

kvm needs data on MSI entries: that's the interface the
current kernel exposes for injecting these interrupts.

I think we also need to support in-kernel devices which
would inject MSI interrupts directly from the kernel.
For these, kvm would need to know when the mask bit changes
and give us info on the pending bit.

That's a fair amount of PCI specific code in kvm.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:23:20PM +0100, Paul Brook wrote:
> > > MSI provides multiple edge triggered interrupts, whereas traditional mode
> > > provides a single level triggered interrupt. My guess is most devices
> > > will want to treat these differently anyway.
> >
> > So, is qemu_send_msi better than qemu_set_irq.
> 
> Neither. pci_send_msi, which is a trivial wrapper around stl_phys.

I guess I'll start with that. This only works if target_phys_addr_t is
a 64-bit field (because MSI addresses are typically outside the 32-bit
memory space), so I guess the simplest solution is to disable MSI if it
is not.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 03:50:18PM +0100, Paul Brook wrote:
> > >>> kvm has no business messing with the PCI device code.
> > >>
> > >> kvm has a fast path for irq injection.  If qemu wants to support it we
> > >> need some abstraction here.
> > >
> > > Fast path from where to where? Having the PCI layer bypass/re-implement
> > > the APIC and inject the interrupt directly into the cpu core sounds a
> > > particularly bad idea.
> >
> > kvm implements the APIC in the host kernel (qemu upstream doesn't
> > support this yet).  The fast path is wired to the in-kernel APIC, not
> > the cpu core directly.
> >
> > The idea is to wire it to UIO for device assignment, to a virtio-device
> > implemented in the kernel, and to qemu.
> 
> I still don't see why you're trying to bypass straight from the pci layer to 
> the apic. Why can't you just pass the apic MMIO writes to the kernel? You've 
> presumably got to update the apic state anyway.
> 
> Paul

As far as I can tell, at least on Intel, MSI interrupts are not MMIO writes.
They are PCI memory writes to a hard-coded address range that
are passed to the APIC. I don't think MMIO writes can trigger MSI,
or at least this does not seem to be documented.

-- 
MST


Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> On Thursday 21 May 2009, Paul Brook wrote:
> > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > mode provides a single level triggered interrupt. My guess is most
> > > > devices will want to treat these differently anyway.
> > >
> > > So, is qemu_send_msi better than qemu_set_irq.
> >
> > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> 
> To clarify, you seem to be trying to fuse two largely separate features 
> together.
> 
> MSI is a standard PCI device capability[1] that involves the device 
> performing 
> a 32-bit memory write when something interesting occurs. These writes may or 
> may not be directed at a APIC.
> 
> The x86 APIC has a memory mapped interface that allows generation of CPU 
> interrupts in response response to memory writes. These may or may not come 
> from an MSI capable PCI device.
> 
> Paul
> 
> [1] Note a *device* capability, not a bus capability.

Paul, so I went over the specs, and what you say about the APIC here does
not seem to be what Intel actually implemented.  Specifically, Intel
implemented *MSI support in the APIC*. This lets PCI devices, but not the
CPU, signal interrupts by memory writes.

For example, after reset, when the CPU writes to address 0xfee00000 this
is an access to a reserved register in the APIC, but when a PCI device
writes to 0xfee00000, this triggers an interrupt to destination 0.

See section 9.12 in Intel® 64 and IA-32 Architectures Software
Developer’s Manual Volume 3A: System Programming Guide, Part 1
http://www.intel.com/Assets/PDF/manual/253668.pdf

So it seems that what we need to do in pci is:

if (!msi_ops || msi_ops->send_msi(address, data))
    stl_phys(address, data);

where send_msi is wired to apic_send_msi and
where apic_send_msi returns an error for an address
outside of the MSI range 0xfee00000 - 0xfeefffff

Makes sense?
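A hedged sketch of the range check this proposal needs on the APIC side: the function name and the 0xfee00000 base are assumptions taken from the Intel manual, not qemu's actual API, and real delivery logic is elided.

```c
#include <stdint.h>

/* Accept only writes inside the APIC's interrupt-address range
 * (0xfee00000 - 0xfeefffff); return non-zero otherwise so the
 * caller can fall back to an ordinary physical write. */
static int apic_send_msi(uint64_t addr, uint32_t data)
{
    if ((addr & 0xfff00000ull) != 0xfee00000ull)
        return -1;  /* not an interrupt address */
    /* ...decode destination/vector from addr/data and deliver
     * the interrupt here... */
    (void)data;
    return 0;
}
```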

-- 
MST

Re: [Qemu-devel] [PATCH] qemu: msi irq allocation api

2009-05-21 Thread Michael S. Tsirkin
On Thu, May 21, 2009 at 07:45:20PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 21, 2009 at 02:31:26PM +0100, Paul Brook wrote:
> > On Thursday 21 May 2009, Paul Brook wrote:
> > > > > MSI provides multiple edge triggered interrupts, whereas traditional
> > > > > mode provides a single level triggered interrupt. My guess is most
> > > > > devices will want to treat these differently anyway.
> > > >
> > > > So, is qemu_send_msi better than qemu_set_irq.
> > >
> > > Neither. pci_send_msi, which is a trivial wrapper around stl_phys.
> > 
> > To clarify, you seem to be trying to fuse two largely separate features 
> > together.
> > 
> > MSI is a standard PCI device capability[1] that involves the device 
> > performing 
> > a 32-bit memory write when something interesting occurs. These writes may 
> > or 
> > may not be directed at a APIC.
> > 
> > The x86 APIC has a memory mapped interface that allows generation of CPU 
> > interrupts in response response to memory writes. These may or may not come 
> > from an MSI capable PCI device.
> > 
> > Paul
> > 
> > [1] Note a *device* capability, not a bus capability.
> 
> Paul, so I went over specs, and what you say about APIC here does not
> seem to be what Intel actually implemented.  Specifically, Intel
> implemented *MSI support in APIC*. This lets PCI devices, but not the CPU,
> signal interrupts by memory writes.
> 
> For example, after reset, when CPU writes to address 0xfee0 this
> is an access to a reserved register in APIC, but when PCI device
> does write to 0xfee0, this triggers an interrupt to destination 0.
> 
> See section 9.12 in Intel® 64 and IA-32 Architectures Software
> Developer’s Manual Volume 3A: System Programming Guide, Part 1
> http://www.intel.com/Assets/PDF/manual/253668.pdf
> 
> So it seems that what we need to do in pci is:
> 
> if (!msi_ops || msi_ops->send_msi(address, data))
>   stl_phy(address, data);
> 
> where send_msi is wired to apic_send_msi and
> where apic_send_msi returns an error for an address
> outside of the MSI range 0xfee0 - 0xfeef
> 
> Makes sense?


So I ended up with these ops:
allocate
free
update
send
which the APIC will define and the MSI emulation
will use. Here, send will return an error for addresses
outside the 0xfeexxxxx range, and the device will then do a plain
stl_phys.
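The device-side half of that design could look roughly like this. All names are assumptions, and stl_phys is replaced here by a stand-in that just records the write; the real code would go through qemu's physical-memory API.

```c
#include <stdint.h>

/* Stand-in for stl_phys(): records the last ordinary memory write. */
static uint64_t ram_addr;
static uint32_t ram_val;

static void stl_phys_stub(uint64_t addr, uint32_t val)
{
    ram_addr = addr;
    ram_val = val;
}

/* Stand-in for the APIC's 'send' op: handles only the MSI range. */
static int apic_send(uint64_t addr, uint32_t data)
{
    (void)data;
    return (addr & 0xfff00000ull) == 0xfee00000ull ? 0 : -1;
}

static int (*msi_send)(uint64_t, uint32_t) = apic_send;

/* Device write path: try the registered backend first, and fall back
 * to a plain physical write when it rejects the address. */
static void msi_write(uint64_t addr, uint32_t data)
{
    if (!msi_send || msi_send(addr, data))
        stl_phys_stub(addr, data);
}
```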

-- 
MST

[PATCHv2] qemu: make default_write_config use mask table

2009-05-25 Thread Michael S. Tsirkin
Change much of hw/pci to use symbolic constants and a table-driven
design: add a mask table with writable bits set and readonly bits unset.
Detect change by comparing original and new registers.

This makes it easy to support capabilities where read-only/writeable
bit layout differs between devices, depending on capabilities present.

As a result, writing a single byte in BAR registers now works as
it should. Writing to upper limit registers in the bridge
also works as it should. Code is also shorter.
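The mask-table approach can be sketched in a self-contained form as below. This is a simplification of the actual patch that follows: one writable-bits mask byte per config byte, writes applied byte-by-byte in little-endian order, and a change detected by comparing the config space before and after.

```c
#include <stdint.h>
#include <string.h>

enum { CFG_SIZE = 256 };

/* Apply a config-space write through the mask table: only bits set in
 * mask[] may change.  Returns 1 if any register actually changed. */
static int cfg_write(uint8_t *config, const uint8_t *mask,
                     uint32_t addr, uint32_t val, int len)
{
    uint8_t orig[CFG_SIZE];
    int i;

    memcpy(orig, config, CFG_SIZE);
    for (i = 0; i < len; i++, addr++) {
        uint8_t wmask = mask[addr];
        config[addr] = (config[addr] & ~wmask) | (val & wmask);
        val >>= 8;  /* PCI config writes are little-endian */
    }
    return memcmp(orig, config, CFG_SIZE) != 0;
}
```

Because the mask is consulted per byte, a single-byte write into a BAR behaves correctly, which is exactly the fix the description claims.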

Signed-off-by: Michael S. Tsirkin 
---

This fixes a minor bug in the previous version of the patch:
the PCI config header size should be 0x40.
yamahata, you probably want to rebase your patch on top of this.

 hw/pci.c |  145 -
 hw/pci.h |   18 +++-
 2 files changed, 46 insertions(+), 117 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 0ab5b94..1e70e46 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -239,6 +239,17 @@ int pci_assign_devaddr(const char *addr, int *domp, int 
*busp, unsigned *slotp)
 return pci_parse_devaddr(devaddr, domp, busp, slotp);
 }
 
+static void pci_init_mask(PCIDevice *dev)
+{
+int i;
+dev->mask[PCI_CACHE_LINE_SIZE] = 0xff;
+dev->mask[PCI_INTERRUPT_LINE] = 0xff;
+dev->mask[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY
+ | PCI_COMMAND_MASTER;
+for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
+dev->mask[i] = 0xff;
+}
+
 /* -1 for devfn means auto assign */
 static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
  const char *name, int devfn,
@@ -261,6 +272,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 memset(pci_dev->irq_state, 0, sizeof(pci_dev->irq_state));
 pci_set_default_subsystem_id(pci_dev);
+pci_init_mask(pci_dev);
 
 if (!config_read)
 config_read = pci_default_read_config;
@@ -334,6 +346,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 {
 PCIIORegion *r;
 uint32_t addr;
+uint32_t mask;
 
 if ((unsigned int)region_num >= PCI_NUM_REGIONS)
 return;
@@ -349,12 +362,17 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 r->size = size;
 r->type = type;
 r->map_func = map_func;
+
+mask = ~(size - 1);
 if (region_num == PCI_ROM_SLOT) {
 addr = 0x30;
+/* ROM enable bit is writeable */
+mask |= 1;
 } else {
 addr = 0x10 + region_num * 4;
 }
 *(uint32_t *)(pci_dev->config + addr) = cpu_to_le32(type);
+*(uint32_t *)(pci_dev->mask + addr) = cpu_to_le32(mask);
 }
 
 static void pci_update_mappings(PCIDevice *d)
@@ -463,118 +481,21 @@ uint32_t pci_default_read_config(PCIDevice *d,
 return val;
 }
 
-void pci_default_write_config(PCIDevice *d,
-  uint32_t address, uint32_t val, int len)
+void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
-int can_write, i;
-uint32_t end, addr;
-
-if (len == 4 && ((address >= 0x10 && address < 0x10 + 4 * 6) ||
- (address >= 0x30 && address < 0x34))) {
-PCIIORegion *r;
-int reg;
+uint8_t orig[PCI_CONFIG_SPACE_SIZE];
+int i;
 
-if ( address >= 0x30 ) {
-reg = PCI_ROM_SLOT;
-}else{
-reg = (address - 0x10) >> 2;
-}
-r = &d->io_regions[reg];
-if (r->size == 0)
-goto default_config;
-/* compute the stored value */
-if (reg == PCI_ROM_SLOT) {
-/* keep ROM enable bit */
-val &= (~(r->size - 1)) | 1;
-} else {
-val &= ~(r->size - 1);
-val |= r->type;
-}
-*(uint32_t *)(d->config + address) = cpu_to_le32(val);
-pci_update_mappings(d);
-return;
-}
- default_config:
 /* not efficient, but simple */
-addr = address;
-for(i = 0; i < len; i++) {
-/* default read/write accesses */
-switch(d->config[0x0e]) {
-case 0x00:
-case 0x80:
-switch(addr) {
-case 0x00:
-case 0x01:
-case 0x02:
-case 0x03:
-case 0x06:
-case 0x07:
-case 0x08:
-case 0x09:
-case 0x0a:
-case 0x0b:
-case 0x0e:
-case 0x10 ... 0x27: /* base */
-case 0x2c ... 0x2f: /* read-only subsystem ID & vendor ID */
-case 0x30 ... 0x33: /* rom */
-case 0x3d:
-can_write = 0;
-break;
-default:
-can_write = 1;
-break;
-}
-break;
- 

[PATCH 00/11] qemu: MSI-X support

2009-05-25 Thread Michael S. Tsirkin
Here is the port of the MSI-X support patches to upstream qemu.
Anthony, you said you are willing to borrow a few cycles to help MSI-X
go in through QEMU - could you please comment or commit?

This patchset adds generic support for MSI-X, adds an implementation in
the APIC, and uses MSI-X in virtio-net. At Paul's suggestion, I use
stl_phys to decouple the APIC and MSI-X implementations.

This is on top of 'PATCHv2 make default_write_config use mask table'
that I posted previously. I have included that patch here
for completeness.

-- 
MST

Michael S. Tsirkin (11):
  qemu: make default_write_config use mask table
  qemu: capability bits in pci save/restore
  qemu: add routines to manage PCI capabilities
  qemu: helper routines for pci access.
  qemu: MSI-X support functions
  qemu: add flag to disable MSI-X by default
  qemu: minimal MSI/MSI-X implementation for PC
  qemu: add support for resizing regions
  qemu: virtio support for many interrupt vectors
  qemu: MSI-X support in virtio PCI
  qemu: request 3 vectors in virtio-net

 Makefile.target|2 +-
 hw/apic.c  |   50 ++-
 hw/msix.c  |  426 
 hw/msix.h  |   35 +
 hw/pci.c   |  295 +++-
 hw/pci.h   |   93 +++-
 hw/syborg_virtio.c |   13 ++-
 hw/virtio-net.c|1 +
 hw/virtio-pci.c|  168 -
 hw/virtio.c|   63 ++--
 hw/virtio.h|   10 +-
 qemu-options.hx|2 +
 vl.c   |3 +
 13 files changed, 952 insertions(+), 209 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h


[PATCH 01/11] qemu: make default_write_config use mask table

2009-05-25 Thread Michael S. Tsirkin
Change much of hw/pci to use symbolic constants and a table-driven
design: add a mask table with writable bits set and readonly bits unset.
Detect change by comparing original and new registers.

This makes it easy to support capabilities where read-only/writeable
bit layout differs between devices, depending on capabilities present.

As a result, writing a single byte in BAR registers now works as
it should. Writing to upper limit registers in the bridge
also works as it should. Code is also shorter.

Signed-off-by: Michael S. Tsirkin 
---

This is the same as v2 of the patch that I sent previously.
It is included here for completeness.

 hw/pci.c |  145 -
 hw/pci.h |   18 +++-
 2 files changed, 46 insertions(+), 117 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 0ab5b94..1e70e46 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -239,6 +239,17 @@ int pci_assign_devaddr(const char *addr, int *domp, int 
*busp, unsigned *slotp)
 return pci_parse_devaddr(devaddr, domp, busp, slotp);
 }
 
+static void pci_init_mask(PCIDevice *dev)
+{
+int i;
+dev->mask[PCI_CACHE_LINE_SIZE] = 0xff;
+dev->mask[PCI_INTERRUPT_LINE] = 0xff;
+dev->mask[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY
+ | PCI_COMMAND_MASTER;
+for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
+dev->mask[i] = 0xff;
+}
+
 /* -1 for devfn means auto assign */
 static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus,
  const char *name, int devfn,
@@ -261,6 +272,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 memset(pci_dev->irq_state, 0, sizeof(pci_dev->irq_state));
 pci_set_default_subsystem_id(pci_dev);
+pci_init_mask(pci_dev);
 
 if (!config_read)
 config_read = pci_default_read_config;
@@ -334,6 +346,7 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 {
 PCIIORegion *r;
 uint32_t addr;
+uint32_t mask;
 
 if ((unsigned int)region_num >= PCI_NUM_REGIONS)
 return;
@@ -349,12 +362,17 @@ void pci_register_io_region(PCIDevice *pci_dev, int 
region_num,
 r->size = size;
 r->type = type;
 r->map_func = map_func;
+
+mask = ~(size - 1);
 if (region_num == PCI_ROM_SLOT) {
 addr = 0x30;
+/* ROM enable bit is writeable */
+mask |= 1;
 } else {
 addr = 0x10 + region_num * 4;
 }
 *(uint32_t *)(pci_dev->config + addr) = cpu_to_le32(type);
+*(uint32_t *)(pci_dev->mask + addr) = cpu_to_le32(mask);
 }
 
 static void pci_update_mappings(PCIDevice *d)
@@ -463,118 +481,21 @@ uint32_t pci_default_read_config(PCIDevice *d,
 return val;
 }
 
-void pci_default_write_config(PCIDevice *d,
-  uint32_t address, uint32_t val, int len)
+void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int l)
 {
-int can_write, i;
-uint32_t end, addr;
-
-if (len == 4 && ((address >= 0x10 && address < 0x10 + 4 * 6) ||
- (address >= 0x30 && address < 0x34))) {
-PCIIORegion *r;
-int reg;
+uint8_t orig[PCI_CONFIG_SPACE_SIZE];
+int i;
 
-if ( address >= 0x30 ) {
-reg = PCI_ROM_SLOT;
-}else{
-reg = (address - 0x10) >> 2;
-}
-r = &d->io_regions[reg];
-if (r->size == 0)
-goto default_config;
-/* compute the stored value */
-if (reg == PCI_ROM_SLOT) {
-/* keep ROM enable bit */
-val &= (~(r->size - 1)) | 1;
-} else {
-val &= ~(r->size - 1);
-val |= r->type;
-}
-*(uint32_t *)(d->config + address) = cpu_to_le32(val);
-pci_update_mappings(d);
-return;
-}
- default_config:
 /* not efficient, but simple */
-addr = address;
-for(i = 0; i < len; i++) {
-/* default read/write accesses */
-switch(d->config[0x0e]) {
-case 0x00:
-case 0x80:
-switch(addr) {
-case 0x00:
-case 0x01:
-case 0x02:
-case 0x03:
-case 0x06:
-case 0x07:
-case 0x08:
-case 0x09:
-case 0x0a:
-case 0x0b:
-case 0x0e:
-case 0x10 ... 0x27: /* base */
-case 0x2c ... 0x2f: /* read-only subsystem ID & vendor ID */
-case 0x30 ... 0x33: /* rom */
-case 0x3d:
-can_write = 0;
-break;
-default:
-can_write = 1;
-break;
-}
-break;
-default:
-case 0x01:
-swi

[PATCH 03/11] qemu: add routines to manage PCI capabilities

2009-05-25 Thread Michael S. Tsirkin
Add routines to manage PCI capability list. First user will be MSI-X.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.c |   98 --
 hw/pci.h |   18 +++-
 2 files changed, 106 insertions(+), 10 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 5dcfb4e..6bc3819 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
 int version = s->cap_present ? 3 : 2;
 int i;
 
-qemu_put_be32(f, version); /* PCI device version */
+/* PCI device version and capabilities */
+qemu_put_be32(f, version);
+if (version >= 3)
+qemu_put_be32(f, s->cap_present);
 qemu_put_buffer(f, s->config, 256);
 for (i = 0; i < 4; i++)
 qemu_put_be32(f, s->irq_state[i]);
-if (version >= 3)
-qemu_put_be32(f, s->cap_present);
 }
 
 int pci_device_load(PCIDevice *s, QEMUFile *f)
@@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 version_id = qemu_get_be32(f);
 if (version_id > 3)
 return -EINVAL;
-qemu_get_buffer(f, s->config, 256);
-pci_update_mappings(s);
-
-if (version_id >= 2)
-for (i = 0; i < 4; i ++)
-s->irq_state[i] = qemu_get_be32(f);
 if (version_id >= 3)
 s->cap_present = qemu_get_be32(f);
 else
@@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 if (s->cap_present & ~s->cap_supported)
 return -EINVAL;
 
+qemu_get_buffer(f, s->config, 256);
+pci_update_mappings(s);
+
+if (version_id >= 2)
+for (i = 0; i < 4; i ++)
+s->irq_state[i] = qemu_get_be32(f);
+/* Clear mask and used bits for capabilities.
+   Must be restored separately, since capabilities can
+   be placed anywhere in config space. */
+memset(s->used, 0, PCI_CONFIG_SPACE_SIZE);
+for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
+s->mask[i] = 0xff;
 return 0;
 }
 
@@ -870,3 +877,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const 
char *name)
 
 return (PCIDevice *)dev;
 }
+
+static int pci_find_space(PCIDevice *pdev, uint8_t size)
+{
+int offset = PCI_CONFIG_HEADER_SIZE;
+int i;
+for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
+if (pdev->used[i])
+offset = i + 1;
+else if (i - offset + 1 == size)
+return offset;
+return 0;
+}
+
+static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
+uint8_t *prev_p)
+{
+uint8_t next, prev;
+
+if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST))
+return 0;
+
+for (prev = PCI_CAPABILITY_LIST; (next = pdev->config[prev]);
+ prev = next + PCI_CAP_LIST_NEXT)
+if (pdev->config[next + PCI_CAP_LIST_ID] != cap_id)
+break;
+
+*prev_p = prev;
+return next;
+}
+
+/* Reserve space and add capability to the linked list in pci config space */
+int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
+{
+uint8_t offset = pci_find_space(pdev, size);
+uint8_t *config = pdev->config + offset;
+if (!offset)
+return -ENOSPC;
+config[PCI_CAP_LIST_ID] = cap_id;
+config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
+pdev->config[PCI_CAPABILITY_LIST] = offset;
+pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
+memset(pdev->used + offset, 0xFF, size);
+/* Make capability read-only by default */
+memset(pdev->mask + offset, 0, size);
+return offset;
+}
+
+/* Unlink capability from the pci config space. */
+void pci_del_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
+{
+uint8_t prev, offset = pci_find_capability_list(pdev, cap_id, &prev);
+if (!offset)
+return;
+pdev->config[prev] = pdev->config[offset + PCI_CAP_LIST_NEXT];
+/* Make capability writeable again */
+memset(pdev->mask + offset, 0xff, size);
+memset(pdev->used + offset, 0, size);
+
+if (!pdev->config[PCI_CAPABILITY_LIST])
+pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
+}
+
+/* Reserve space for capability at a known offset (to call after load). */
+void pci_reserve_capability(PCIDevice *pdev, uint8_t offset, uint8_t size)
+{
+memset(pdev->used + offset, 0xff, size);
+}
+
+uint8_t pci_find_capability(PCIDevice *pdev, uint8_t cap_id)
+{
+uint8_t prev;
+return pci_find_capability_list(pdev, cap_id, &prev);
+}
diff --git a/hw/pci.h b/hw/pci.h
index 9139504..40137c6 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -123,6 +123,10 @@ typedef struct PCIIORegion {
 #define PCI_MIN_GNT0x3e/* 8 bits */
 #define PCI_MAX_LAT0x3f/* 8 bits */
 
+/* Capability lists */
+#define PCI_CAP_LIST_ID0   /* Capability ID */
+#d

[PATCH 02/11] qemu: capability bits in pci save/restore

2009-05-25 Thread Michael S. Tsirkin
Add support for capability bits in save/restore for pci.
These will be used for MSI, where the capability might
be present or not as requested by user, which does not
map well into a single version number.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.c |   14 --
 hw/pci.h |4 
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 1e70e46..5dcfb4e 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -127,12 +127,15 @@ int pci_bus_num(PCIBus *s)
 
 void pci_device_save(PCIDevice *s, QEMUFile *f)
 {
+int version = s->cap_present ? 3 : 2;
 int i;
 
-qemu_put_be32(f, 2); /* PCI device version */
+qemu_put_be32(f, version); /* PCI device version */
 qemu_put_buffer(f, s->config, 256);
 for (i = 0; i < 4; i++)
 qemu_put_be32(f, s->irq_state[i]);
+if (version >= 3)
+qemu_put_be32(f, s->cap_present);
 }
 
 int pci_device_load(PCIDevice *s, QEMUFile *f)
@@ -141,7 +144,7 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 int i;
 
 version_id = qemu_get_be32(f);
-if (version_id > 2)
+if (version_id > 3)
 return -EINVAL;
 qemu_get_buffer(f, s->config, 256);
 pci_update_mappings(s);
@@ -149,6 +152,13 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
 if (version_id >= 2)
 for (i = 0; i < 4; i ++)
 s->irq_state[i] = qemu_get_be32(f);
+if (version_id >= 3)
+s->cap_present = qemu_get_be32(f);
+else
+s->cap_present = 0;
+
+if (s->cap_present & ~s->cap_supported)
+return -EINVAL;
 
 return 0;
 }
diff --git a/hw/pci.h b/hw/pci.h
index c0c2380..9139504 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -178,6 +178,10 @@ struct PCIDevice {
 
 /* Current IRQ levels.  Used internally by the generic PCI code.  */
 int irq_state[4];
+
+/* Capability bits for save/load */
+uint32_t cap_supported;
+uint32_t cap_present;
 };
 
 PCIDevice *pci_register_device(PCIBus *bus, const char *name,
-- 
1.6.3.1.56.g79e1.dirty
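As a sanity check of the versioning scheme above (not part of the patch), the save/load logic can be sketched with a hypothetical byte stream standing in for QEMUFile; all names here are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-stream stand-in for QEMUFile. */
typedef struct { uint8_t buf[64]; size_t pos; } Stream;

void put_be32(Stream *s, uint32_t v)
{
    s->buf[s->pos++] = v >> 24;
    s->buf[s->pos++] = v >> 16;
    s->buf[s->pos++] = v >> 8;
    s->buf[s->pos++] = v;
}

uint32_t get_be32(Stream *s)
{
    uint32_t v = ((uint32_t)s->buf[s->pos] << 24) |
                 ((uint32_t)s->buf[s->pos + 1] << 16) |
                 ((uint32_t)s->buf[s->pos + 2] << 8) |
                 s->buf[s->pos + 3];
    s->pos += 4;
    return v;
}

/* Save side: bump the version only when the optional field is needed,
 * so images without capabilities stay loadable by older qemu. */
void cap_save(Stream *s, uint32_t cap_present)
{
    int version = cap_present ? 3 : 2;
    put_be32(s, (uint32_t)version);
    if (version >= 3)
        put_be32(s, cap_present);
}

/* Load side: older (version 2) images leave cap_present at 0; images
 * that claim capabilities the device cannot emulate are rejected. */
int cap_load(Stream *s, uint32_t cap_supported, uint32_t *cap_present)
{
    uint32_t version = get_be32(s);
    if (version > 3)
        return -1;
    *cap_present = version >= 3 ? get_be32(s) : 0;
    if (*cap_present & ~cap_supported)
        return -1;
    return 0;
}
```

Note how the unsupported-capability check rejects a migration early instead of silently dropping device state.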

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization


[PATCH 04/11] qemu: helper routines for pci access.

2009-05-25 Thread Michael S. Tsirkin
Add inline routines for convenient access to pci devices
with correct (little) endianness. Will be used by MSI-X support.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.h |   30 +++---
 1 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/pci.h b/hw/pci.h
index 40137c6..4e112a3 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -240,21 +240,45 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
 pci_map_irq_fn map_irq, const char *name);
 
 static inline void
+pci_set_word(uint8_t *config, uint16_t val)
+{
+cpu_to_le16wu((uint16_t *)config, val);
+}
+
+static inline uint16_t
+pci_get_word(uint8_t *config)
+{
+return le16_to_cpupu((uint16_t *)config);
+}
+
+static inline void
+pci_set_long(uint8_t *config, uint32_t val)
+{
+cpu_to_le32wu((uint32_t *)config, val);
+}
+
+static inline uint32_t
+pci_get_long(uint8_t *config)
+{
+return le32_to_cpupu((uint32_t *)config);
+}
+
+static inline void
 pci_config_set_vendor_id(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)&pci_config[PCI_VENDOR_ID], val);
+pci_set_word(&pci_config[PCI_VENDOR_ID], val);
 }
 
 static inline void
 pci_config_set_device_id(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)&pci_config[PCI_DEVICE_ID], val);
+pci_set_word(&pci_config[PCI_DEVICE_ID], val);
 }
 
 static inline void
 pci_config_set_class(uint8_t *pci_config, uint16_t val)
 {
-cpu_to_le16wu((uint16_t *)&pci_config[PCI_CLASS_DEVICE], val);
+pci_set_word(&pci_config[PCI_CLASS_DEVICE], val);
 }
 
 typedef void (*pci_qdev_initfn)(PCIDevice *dev);
-- 
1.6.3.1.56.g79e1.dirty
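PCI config space is always little-endian, so these helpers must swap on big-endian hosts. Below is a portable sketch of what the cpu_to_le*wu / le*_to_cpupu pairs accomplish, using byte-wise access (an equivalent formulation, not QEMU's implementation):

```c
#include <stdint.h>

/* Byte-wise little-endian accessors: equivalent to the helpers above on
 * any host endianness, and safe for unaligned config offsets. */
void pci_set_word(uint8_t *config, uint16_t val)
{
    config[0] = val & 0xff;
    config[1] = val >> 8;
}

uint16_t pci_get_word(const uint8_t *config)
{
    return (uint16_t)(config[0] | (config[1] << 8));
}

void pci_set_long(uint8_t *config, uint32_t val)
{
    config[0] = val & 0xff;
    config[1] = (val >> 8) & 0xff;
    config[2] = (val >> 16) & 0xff;
    config[3] = val >> 24;
}

uint32_t pci_get_long(const uint8_t *config)
{
    return (uint32_t)config[0] |
           ((uint32_t)config[1] << 8) |
           ((uint32_t)config[2] << 16) |
           ((uint32_t)config[3] << 24);
}
```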



[PATCH 05/11] qemu: MSI-X support functions

2009-05-25 Thread Michael S. Tsirkin
Add functions implementing MSI-X support. The first user will be virtio-pci.
Note that the platform must set a flag to declare MSI-X supported.
For PC this is set by the APIC.

Signed-off-by: Michael S. Tsirkin 
---
 Makefile.target |2 +-
 hw/msix.c   |  423 +++
 hw/msix.h   |   35 +
 hw/pci.h|   20 +++
 4 files changed, 479 insertions(+), 1 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h

diff --git a/Makefile.target b/Makefile.target
index 664a1e3..87b2859 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -486,7 +486,7 @@ endif #CONFIG_BSD_USER
 ifndef CONFIG_USER_ONLY
 
 OBJS=vl.o osdep.o monitor.o pci.o loader.o isa_mmio.o machine.o \
- gdbstub.o gdbstub-xml.o
+ gdbstub.o gdbstub-xml.o msix.o
 # virtio has to be here due to weird dependency between PCI and virtio-net.
 # need to fix this properly
 OBJS+=virtio-blk.o virtio-balloon.o virtio-net.o virtio-console.o
diff --git a/hw/msix.c b/hw/msix.c
new file mode 100644
index 000..4db2fc1
--- /dev/null
+++ b/hw/msix.c
@@ -0,0 +1,423 @@
+/*
+ * MSI-X device support
+ *
+ * This module includes support for MSI-X in pci devices.
+ *
+ * Author: Michael S. Tsirkin 
+ *
+ *  Copyright (c) 2009, Red Hat Inc, Michael S. Tsirkin (m...@redhat.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "hw.h"
+#include "msix.h"
+#include "pci.h"
+
+/* Declaration from linux/pci_regs.h */
+#define  PCI_CAP_ID_MSIX 0x11 /* MSI-X */
+#define  PCI_MSIX_FLAGS 2 /* Table at lower 11 bits */
+#define  PCI_MSIX_FLAGS_QSIZE  0x7FF
+#define  PCI_MSIX_FLAGS_ENABLE (1 << 15)
+#define  PCI_MSIX_FLAGS_BIRMASK(7 << 0)
+
+/* MSI-X capability structure */
+#define MSIX_TABLE_OFFSET 4
+#define MSIX_PBA_OFFSET 8
+#define MSIX_CAP_LENGTH 12
+
+/* MSI enable bit is in byte 1 in FLAGS register */
+#define MSIX_ENABLE_OFFSET (PCI_MSIX_FLAGS + 1)
+#define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
+
+/* MSI-X table format */
+#define MSIX_MSG_ADDR 0
+#define MSIX_MSG_UPPER_ADDR 4
+#define MSIX_MSG_DATA 8
+#define MSIX_VECTOR_CTRL 12
+#define MSIX_ENTRY_SIZE 16
+#define MSIX_VECTOR_MASK 0x1
+
+/* How much space does an MSIX table need. */
+/* The spec requires giving the table structure
+ * a 4K aligned region all by itself. Align it to
+ * target pages so that drivers can do passthrough
+ * on the rest of the region. */
+#define MSIX_PAGE_SIZE TARGET_PAGE_ALIGN(0x1000)
+/* Reserve second half of the page for pending bits */
+#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)
+#define MSIX_MAX_ENTRIES 32
+
+
+#ifdef MSIX_DEBUG
+#define DEBUG(fmt, ...)   \
+do {  \
+  fprintf(stderr, "%s: " fmt, __func__ , __VA_ARGS__);\
+} while (0)
+#else
+#define DEBUG(fmt, ...) do { } while(0)
+#endif
+
+/* Flag to globally disable MSI-X support */
+int msix_disable;
+
+/* Flag for interrupt controller to declare MSI-X support */
+int msix_supported;
+
+/* Add MSI-X capability to the config space for the device. */
+/* Given a bar and its size, add MSI-X table on top of it
+ * and fill MSI-X capability in the config space.
+ * Original bar size must be a power of 2 or 0.
+ * New bar size is returned. */
+static int msix_add_config(struct PCIDevice *pdev, unsigned short nentries,
+   unsigned bar_nr, unsigned bar_size)
+{
+int config_offset;
+uint8_t *config;
+uint32_t new_size;
+
+if (nentries < 1 || nentries > PCI_MSIX_FLAGS_QSIZE + 1)
+return -EINVAL;
+if (bar_size > 0x8000)
+return -ENOSPC;
+
+/* Add space for MSI-X structures */
+if (!bar_size)
+new_size = MSIX_PAGE_SIZE;
+else if (bar_size < MSIX_PAGE_SIZE) {
+bar_size = MSIX_PAGE_SIZE;
+new_size = MSIX_PAGE_SIZE * 2;
+} else
+new_size = bar_size * 2;
+
+pdev->msix_bar_size = new_size;
+config_offset = pci_add_capability(pdev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
+if (config_offset < 0)
+return config_offset;
+config = pdev->config + config_offset;
+
+pci_set_word(config + PCI_MSIX_FLAGS, nentries - 1);
+/* Table on top of BAR */
+pci_set_long(config + MSIX_TABLE_OFFSET, bar_size | bar_nr);
+/* Pending bits on top of that */
+pci_set_long(config + MSIX_PBA_OFFSET, (bar_size + MSIX_PAGE_PENDING) |
+ bar_nr);
+pdev->msix_cap = config_offset;
+/* Make flags bit writeable. */
+pdev->mask[config_offset + MSIX_ENABLE_OFFSET] |= MSIX_ENABLE_MASK;
+return 0;
+}
+
+static void msix_free_irq_entries(PCIDevice *dev)
+{
+int vector;
+
+for (vector = 0; vector < dev->msix_entries_nr; ++vector)
+dev->msix_entry_used[vector] = 0;
+}
+
+/* 
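The BAR-growing arithmetic in msix_add_config can be isolated as below (a sketch; MSIX_PAGE_SIZE is taken to be 4K, i.e. assuming 4K target pages, and the function name is illustrative):

```c
#include <stdint.h>

#define MSIX_PAGE_SIZE    0x1000              /* assumes 4K target pages */
#define MSIX_PAGE_PENDING (MSIX_PAGE_SIZE / 2)

/* Compute the grown BAR size and the offsets at which the MSI-X table
 * and pending-bit array land: the table gets the page above the
 * original contents, and the result stays a power of two so BAR
 * address decoding keeps working. */
uint32_t msix_grow_bar(uint32_t bar_size, uint32_t *table_off, uint32_t *pba_off)
{
    uint32_t new_size;

    if (!bar_size) {
        new_size = MSIX_PAGE_SIZE;            /* table occupies offset 0 */
    } else if (bar_size < MSIX_PAGE_SIZE) {
        bar_size = MSIX_PAGE_SIZE;            /* round original up to a page */
        new_size = MSIX_PAGE_SIZE * 2;
    } else {
        new_size = bar_size * 2;
    }
    *table_off = bar_size;                    /* table on top of BAR */
    *pba_off = bar_size + MSIX_PAGE_PENDING;  /* pending bits above it */
    return new_size;
}
```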

[PATCH 06/11] qemu: add flag to disable MSI-X by default

2009-05-25 Thread Michael S. Tsirkin
Add a global flag to disable MSI-X by default.  This is useful primarily
to keep images loadable by older qemu (without MSI-X support).  Even when
MSI-X is disabled by the flag, you can still load images that have MSI-X
enabled.

Signed-off-by: Michael S. Tsirkin 
---
 hw/msix.c   |3 +++
 qemu-options.hx |2 ++
 vl.c|3 +++
 3 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 4db2fc1..970a8ca 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -122,6 +122,9 @@ void msix_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
 {
 unsigned enable_pos = dev->msix_cap + MSIX_ENABLE_OFFSET;
+if (!(dev->cap_present & QEMU_PCI_CAP_MSIX))
+return;
+
 if (addr + len <= enable_pos || addr > enable_pos)
 return;
 
diff --git a/qemu-options.hx b/qemu-options.hx
index 87af798..fd041a4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1575,3 +1575,5 @@ DEF("semihosting", 0, QEMU_OPTION_semihosting,
 DEF("old-param", 0, QEMU_OPTION_old_param,
 "-old-param  old param mode\n")
 #endif
+DEF("disable-msix", 0, QEMU_OPTION_disable_msix,
+"-disable-msix disable msix support for PCI devices (enabled by default)\n")
diff --git a/vl.c b/vl.c
index 2c1f0e0..2757d4f 100644
--- a/vl.c
+++ b/vl.c
@@ -134,6 +134,7 @@ int main(int argc, char **argv)
 #include "hw/usb.h"
 #include "hw/pcmcia.h"
 #include "hw/pc.h"
+#include "hw/msix.h"
 #include "hw/audiodev.h"
 #include "hw/isa.h"
 #include "hw/baum.h"
@@ -5557,6 +5558,8 @@ int main(int argc, char **argv, char **envp)
 xen_mode = XEN_ATTACH;
 break;
 #endif
+case QEMU_OPTION_disable_msix:
+msix_disable = 1;
 }
 }
 }
-- 
1.6.3.1.56.g79e1.dirty



[PATCH 07/11] qemu: minimal MSI/MSI-X implementation for PC

2009-05-25 Thread Michael S. Tsirkin
Implement MSI support in the APIC. Note that the MSI and MMIO APIC
registers share a memory location but are actually not on the same bus:
MSI is on the PCI bus, while the APIC is connected directly to the CPU.
We map both onto the global bus at the same address, which happens to
work because the MSI registers are reserved in the APIC MMIO range and
vice versa.

Signed-off-by: Michael S. Tsirkin 
---
 hw/apic.c |   50 ++
 1 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/hw/apic.c b/hw/apic.c
index 8c8b2de..d831709 100644
--- a/hw/apic.c
+++ b/hw/apic.c
@@ -19,6 +19,8 @@
  */
 #include "hw.h"
 #include "pc.h"
+#include "pci.h"
+#include "msix.h"
 #include "qemu-timer.h"
 #include "host-utils.h"
 
@@ -63,6 +65,19 @@
 #define MAX_APICS 255
 #define MAX_APIC_WORDS 8
 
+/* Intel APIC constants: from include/asm/msidef.h */
+#define MSI_DATA_VECTOR_SHIFT  0
+#define MSI_DATA_VECTOR_MASK   0x00ff
+#define MSI_DATA_DELIVERY_MODE_SHIFT   8
+#define MSI_DATA_TRIGGER_SHIFT 15
+#define MSI_DATA_LEVEL_SHIFT   14
+#define MSI_ADDR_DEST_MODE_SHIFT   2
+#define MSI_ADDR_DEST_ID_SHIFT 12
+#define MSI_ADDR_DEST_ID_MASK  0x00ffff0
+
+#define MSI_ADDR_BASE   0xfee00000
+#define MSI_ADDR_SIZE   0x100000
+
 typedef struct APICState {
 CPUState *cpu_env;
 uint32_t apicbase;
@@ -86,6 +101,13 @@ typedef struct APICState {
 QEMUTimer *timer;
 } APICState;
 
+struct msi_state {
+uint64_t addr;
+uint32_t data;
+int mask;
+int pending;
+};
+
 static int apic_io_memory;
 static APICState *local_apics[MAX_APICS + 1];
 static int last_apic_id = 0;
@@ -712,11 +734,31 @@ static uint32_t apic_mem_readl(void *opaque, target_phys_addr_t addr)
 return val;
 }
 
+static void apic_send_msi(target_phys_addr_t addr, uint32_t data)
+{
+uint8_t dest = (addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+uint8_t vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+uint8_t dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
+uint8_t trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
+uint8_t delivery = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
+/* XXX: Ignore redirection hint. */
+apic_deliver_irq(dest, dest_mode, delivery, vector, 0, trigger_mode);
+}
+
static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 {
 CPUState *env;
 APICState *s;
-int index;
+int index = (addr >> 4) & 0xff;
+if (addr > 0xfff || !index) {
+/* MSI and MMIO APIC are at the same memory location,
+ * but actually not on the global bus: MSI is on PCI bus
+ * APIC is connected directly to the CPU.
+ * Mapping them on the global bus happens to work because
+ * MSI registers are reserved in APIC MMIO and vice versa. */
+apic_send_msi(addr, val);
+return;
+}
 
 env = cpu_single_env;
 if (!env)
@@ -727,7 +769,6 @@ static void apic_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
 printf("APIC write: %08x = %08x\n", (uint32_t)addr, val);
 #endif
 
-index = (addr >> 4) & 0xff;
 switch(index) {
 case 0x02:
 s->id = (val >> 24);
@@ -911,6 +952,7 @@ int apic_init(CPUState *env)
 s->cpu_env = env;
 
 apic_reset(s);
+msix_supported = 1;
 
 /* XXX: mapping more APICs at the same memory location */
 if (apic_io_memory == 0) {
@@ -918,7 +960,8 @@ int apic_init(CPUState *env)
on the global memory bus. */
 apic_io_memory = cpu_register_io_memory(0, apic_mem_read,
 apic_mem_write, NULL);
-cpu_register_physical_memory(s->apicbase & ~0xfff, 0x1000,
+/* XXX: what if the base changes? */
+cpu_register_physical_memory(MSI_ADDR_BASE, MSI_ADDR_SIZE,
  apic_io_memory);
 }
 s->timer = qemu_new_timer(vm_clock, apic_timer, s);
@@ -929,4 +972,3 @@ int apic_init(CPUState *env)
 local_apics[s->id] = s;
 return 0;
 }
-
-- 
1.6.3.1.56.g79e1.dirty
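The field extraction in apic_send_msi can be checked in isolation. A sketch follows; the DEST_ID mask is written out here as 0x000ff000 for an 8-bit destination ID at bit 12, and that exact mask, like the struct and function names, is an assumption of this example rather than part of the patch:

```c
#include <stdint.h>

#define MSI_DATA_VECTOR_SHIFT        0
#define MSI_DATA_VECTOR_MASK         0x00ff
#define MSI_DATA_DELIVERY_MODE_SHIFT 8
#define MSI_DATA_TRIGGER_SHIFT       15
#define MSI_ADDR_DEST_MODE_SHIFT     2
#define MSI_ADDR_DEST_ID_SHIFT       12
#define MSI_ADDR_DEST_ID_MASK        0x000ff000  /* assumed 8-bit dest ID */

typedef struct {
    uint8_t dest, vector, dest_mode, trigger_mode, delivery;
} MsiFields;

/* Decode an MSI write the way apic_send_msi() does: the destination
 * comes from the address, vector/delivery/trigger from the data word. */
MsiFields msi_decode(uint64_t addr, uint32_t data)
{
    MsiFields f;
    f.dest = (uint8_t)((addr & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT);
    f.vector = (uint8_t)((data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT);
    f.dest_mode = (addr >> MSI_ADDR_DEST_MODE_SHIFT) & 0x1;
    f.trigger_mode = (data >> MSI_DATA_TRIGGER_SHIFT) & 0x1;
    f.delivery = (data >> MSI_DATA_DELIVERY_MODE_SHIFT) & 0x7;
    return f;
}
```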



[PATCH 08/11] qemu: add support for resizing regions

2009-05-25 Thread Michael S. Tsirkin
Make it possible to resize PCI regions.  This will be used by virtio
with MSI-X, where the region size depends on whether MSI-X is enabled,
and can change across load/save.

Signed-off-by: Michael S. Tsirkin 
---
 hw/pci.c |   54 --
 hw/pci.h |3 +++
 2 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 6bc3819..19905b9 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -392,6 +392,41 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
 *(uint32_t *)(pci_dev->mask + addr) = cpu_to_le32(mask);
 }
 
+static void pci_unmap_region(PCIDevice *d, PCIIORegion *r)
+{
+if (r->addr == -1)
+return;
+if (r->type & PCI_ADDRESS_SPACE_IO) {
+int class;
+/* NOTE: specific hack for IDE in PC case:
+   only one byte must be mapped. */
+class = pci_get_word(d->config + PCI_CLASS_DEVICE);
+if (class == 0x0101 && r->size == 4) {
+isa_unassign_ioport(r->addr + 2, 1);
+} else {
+isa_unassign_ioport(r->addr, r->size);
+}
+} else {
+cpu_register_physical_memory(pci_to_cpu_addr(r->addr),
+ r->size,
+ IO_MEM_UNASSIGNED);
+qemu_unregister_coalesced_mmio(r->addr, r->size);
+}
+}
+
+void pci_resize_io_region(PCIDevice *pci_dev, int region_num,
+  uint32_t size)
+{
+
+PCIIORegion *r = &pci_dev->io_regions[region_num];
+if (r->size == size)
+return;
+r->size = size;
+pci_unmap_region(pci_dev, r);
+r->addr = -1;
+pci_update_mappings(pci_dev);
+}
+
 static void pci_update_mappings(PCIDevice *d)
 {
 PCIIORegion *r;
@@ -445,24 +480,7 @@ static void pci_update_mappings(PCIDevice *d)
 }
 /* now do the real mapping */
 if (new_addr != r->addr) {
-if (r->addr != -1) {
-if (r->type & PCI_ADDRESS_SPACE_IO) {
-int class;
-/* NOTE: specific hack for IDE in PC case:
-   only one byte must be mapped. */
-class = d->config[0x0a] | (d->config[0x0b] << 8);
-if (class == 0x0101 && r->size == 4) {
-isa_unassign_ioport(r->addr + 2, 1);
-} else {
-isa_unassign_ioport(r->addr, r->size);
-}
-} else {
-cpu_register_physical_memory(pci_to_cpu_addr(r->addr),
- r->size,
- IO_MEM_UNASSIGNED);
-qemu_unregister_coalesced_mmio(r->addr, r->size);
-}
-}
+pci_unmap_region(d, r);
 r->addr = new_addr;
 if (r->addr != -1) {
 r->map_func(d, i, r->addr, r->size, r->type);
diff --git a/hw/pci.h b/hw/pci.h
index 6da626b..4072f16 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -221,6 +221,9 @@ void pci_register_io_region(PCIDevice *pci_dev, int region_num,
 uint32_t size, int type,
 PCIMapIORegionFunc *map_func);
 
+void pci_resize_io_region(PCIDevice *pci_dev, int region_num,
+  uint32_t size);
+
 int pci_add_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
 
 void pci_del_capability(PCIDevice *pci_dev, uint8_t cap_id, uint8_t cap_size);
-- 
1.6.3.1.56.g79e1.dirty
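The resize path above reduces to: if the size changed, drop the mapping and let the normal update pass remap at the new size. A toy model follows (the Region type and map_calls counter are hypothetical stand-ins for PCIIORegion and its map_func callback):

```c
#include <stdint.h>

typedef struct {
    uint32_t addr;      /* (uint32_t)-1 means "not mapped" */
    uint32_t size;
    int map_calls;      /* stand-in for invoking the map_func callback */
} Region;

/* Simplified pci_update_mappings(): remap only when the decoded
 * address differs from the currently mapped one. */
void update_mappings(Region *r, uint32_t decoded_addr)
{
    if (decoded_addr != r->addr) {
        /* the real code unmaps the old range here */
        r->addr = decoded_addr;
        if (r->addr != (uint32_t)-1)
            r->map_calls++;
    }
}

/* Mirror of pci_resize_io_region(): a size change forces an unmap and
 * a remap at the same decoded address, now with the new size. */
void resize_region(Region *r, uint32_t size, uint32_t decoded_addr)
{
    if (r->size == size)
        return;
    r->size = size;
    r->addr = (uint32_t)-1;     /* drop the stale mapping */
    update_mappings(r, decoded_addr);
}
```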



[PATCH 11/11] qemu: request 3 vectors in virtio-net

2009-05-25 Thread Michael S. Tsirkin
Request up to 3 vectors in virtio-net. Actual bindings might supply
fewer.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-net.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 60aa6da..6118fe3 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -621,6 +621,7 @@ VirtIODevice *virtio_net_init(DeviceState *dev)
 n->mac_table.macs = qemu_mallocz(MAC_TABLE_ENTRIES * ETH_ALEN);
 
 n->vlans = qemu_mallocz(MAX_VLAN >> 3);
+n->vdev.nvectors = 3;
 
 register_savevm("virtio-net", virtio_net_id++, VIRTIO_NET_VM_VERSION,
 virtio_net_save, virtio_net_load, n);
-- 
1.6.3.1.56.g79e1.dirty


[PATCH 10/11] qemu: MSI-X support in virtio PCI

2009-05-25 Thread Michael S. Tsirkin
This enables actual support for MSI-X in virtio PCI.
First user will be virtio-net.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio-pci.c |  152 --
 1 files changed, 112 insertions(+), 40 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 7dfdd80..294f4c7 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -18,6 +18,7 @@
 #include "virtio.h"
 #include "pci.h"
 //#include "sysemu.h"
+#include "msix.h"
 
 /* from Linux's linux/virtio_pci.h */
 
@@ -47,7 +48,24 @@
  * a read-and-acknowledge. */
 #define VIRTIO_PCI_ISR  19
 
-#define VIRTIO_PCI_CONFIG   20
+/* MSI-X registers: only enabled if MSI-X is enabled. */
+/* A 16-bit vector for configuration changes. */
+#define VIRTIO_MSI_CONFIG_VECTOR20
+/* A 16-bit vector for selected queue notifications. */
+#define VIRTIO_MSI_QUEUE_VECTOR 22
+
+/* Config space size */
+#define VIRTIO_PCI_CONFIG_NOMSI 20
+#define VIRTIO_PCI_CONFIG_MSI   24
+#define VIRTIO_PCI_REGION_SIZE(dev) (msix_present(dev) ? \
+ VIRTIO_PCI_CONFIG_MSI : \
+ VIRTIO_PCI_CONFIG_NOMSI)
+
+/* The remaining space is defined by each driver as the per-driver
+ * configuration space */
+#define VIRTIO_PCI_CONFIG(dev)  (msix_enabled(dev) ? \
+ VIRTIO_PCI_CONFIG_MSI : \
+ VIRTIO_PCI_CONFIG_NOMSI)
 
 /* Virtio ABI version, if we increment this, we break the guest driver. */
 #define VIRTIO_PCI_ABI_VERSION  0
@@ -81,14 +99,17 @@ typedef struct {
 static void virtio_pci_notify(void *opaque, uint16_t vector)
 {
 VirtIOPCIProxy *proxy = opaque;
-
-qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
+if (msix_enabled(&proxy->pci_dev))
+msix_notify(&proxy->pci_dev, vector);
+else
+qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
 
 static void virtio_pci_reset(void *opaque)
 {
 VirtIOPCIProxy *proxy = opaque;
 virtio_reset(proxy->vdev);
+msix_reset(&proxy->pci_dev);
 }
 
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
@@ -97,8 +118,6 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 VirtIODevice *vdev = proxy->vdev;
 target_phys_addr_t pa;
 
-addr -= proxy->addr;
-
 switch (addr) {
 case VIRTIO_PCI_GUEST_FEATURES:
/* Guest does not negotiate properly?  We have to assume nothing. */
@@ -131,17 +150,33 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 if (vdev->status == 0)
 virtio_pci_reset(proxy);
 break;
+case VIRTIO_MSI_CONFIG_VECTOR:
+msix_vector_unuse(&proxy->pci_dev, vdev->config_vector);
+/* Make it possible for guest to discover an error took place. */
+if (msix_vector_use(&proxy->pci_dev, val) < 0)
+val = VIRTIO_NO_VECTOR;
+vdev->config_vector = val;
+break;
+case VIRTIO_MSI_QUEUE_VECTOR:
+msix_vector_unuse(&proxy->pci_dev,
+  virtio_queue_vector(vdev, vdev->queue_sel));
+/* Make it possible for guest to discover an error took place. */
+if (msix_vector_use(&proxy->pci_dev, val) < 0)
+val = VIRTIO_NO_VECTOR;
+virtio_queue_set_vector(vdev, vdev->queue_sel, val);
+break;
+default:
+fprintf(stderr, "%s: unexpected address 0x%x value 0x%x\n",
+__func__, addr, val);
+break;
 }
 }
 
-static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
+static uint32_t virtio_ioport_read(VirtIOPCIProxy *proxy, uint32_t addr)
 {
-VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = proxy->vdev;
uint32_t ret = 0xffffffff;
 
-addr -= proxy->addr;
-
 switch (addr) {
 case VIRTIO_PCI_HOST_FEATURES:
 ret = vdev->get_features(vdev);
@@ -169,6 +204,12 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
 vdev->isr = 0;
 qemu_set_irq(proxy->pci_dev.irq[0], 0);
 break;
+case VIRTIO_MSI_CONFIG_VECTOR:
+ret = vdev->config_vector;
+break;
+case VIRTIO_MSI_QUEUE_VECTOR:
+ret = virtio_queue_vector(vdev, vdev->queue_sel);
+break;
 default:
 break;
 }
@@ -179,42 +220,72 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
 static uint32_t virtio_pci_config_readb(void *opaque, uint32_t addr)
 {
 VirtIOPCIProxy *proxy = opaque;
-addr -= proxy->addr + VIRTIO_PCI_CONFIG;
+uint32_t config = VIRTIO_PCI_CONFIG(&proxy->pci_dev);
+addr -= proxy->addr;
+if (addr < config)
+return virtio_ioport
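The VIRTIO_MSI_*_VECTOR write handling in this patch follows an unuse-old/use-new discipline, falling back to VIRTIO_NO_VECTOR when the new vector cannot be claimed. A self-contained sketch (a deliberate simplification; the real msix_vector_use also manages guest-visible MSI-X state):

```c
#include <stdint.h>

#define MSIX_MAX_ENTRIES 32
#define VIRTIO_NO_VECTOR 0xffff

uint8_t entry_used[MSIX_MAX_ENTRIES];

int msix_vector_use(uint16_t vector)
{
    if (vector >= MSIX_MAX_ENTRIES)
        return -1;
    entry_used[vector]++;
    return 0;
}

void msix_vector_unuse(uint16_t vector)
{
    if (vector < MSIX_MAX_ENTRIES && entry_used[vector])
        entry_used[vector]--;
}

/* The pattern from virtio_ioport_write: drop the reference on the old
 * vector, try to take one on the new vector, and make any failure
 * visible to the guest by storing VIRTIO_NO_VECTOR. */
uint16_t set_vector(uint16_t *slot, uint16_t val)
{
    msix_vector_unuse(*slot);
    if (msix_vector_use(val) < 0)
        val = VIRTIO_NO_VECTOR;
    *slot = val;
    return val;
}
```

Storing VIRTIO_NO_VECTOR on failure is what lets the guest read the register back and discover that its requested vector was rejected.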

[PATCH 09/11] qemu: virtio support for many interrupt vectors

2009-05-25 Thread Michael S. Tsirkin
Extend virtio to support many interrupt vectors, and rearrange code in
preparation for multi-vector support (mostly by moving reset out to the
bindings, because the vectors will have to be reset in transport-specific
code). The actual PCI bindings, and their use in virtio-net, follow.
Load and save are not connected to the bindings yet, so they are left
stubbed out for now.

Signed-off-by: Michael S. Tsirkin 
---
 hw/syborg_virtio.c |   13 --
 hw/virtio-pci.c|   24 +++
 hw/virtio.c|   63 ++-
 hw/virtio.h|   10 ++-
 4 files changed, 84 insertions(+), 26 deletions(-)

diff --git a/hw/syborg_virtio.c b/hw/syborg_virtio.c
index 37c219c..d8c978a 100644
--- a/hw/syborg_virtio.c
+++ b/hw/syborg_virtio.c
@@ -134,7 +134,10 @@ static void syborg_virtio_writel(void *opaque, target_phys_addr_t offset,
 vdev->features = value;
 break;
 case SYBORG_VIRTIO_QUEUE_BASE:
-virtio_queue_set_addr(vdev, vdev->queue_sel, value);
+if (value == 0)
+virtio_reset(vdev);
+else
+virtio_queue_set_addr(vdev, vdev->queue_sel, value);
 break;
 case SYBORG_VIRTIO_QUEUE_SEL:
 if (value < VIRTIO_PCI_QUEUE_MAX)
@@ -228,7 +231,7 @@ static CPUWriteMemoryFunc *syborg_virtio_writefn[] = {
  syborg_virtio_writel
 };
 
-static void syborg_virtio_update_irq(void *opaque)
+static void syborg_virtio_update_irq(void *opaque, uint16_t vector)
 {
 SyborgVirtIOProxy *proxy = opaque;
 int level;
@@ -239,7 +242,7 @@ static void syborg_virtio_update_irq(void *opaque)
 }
 
 static VirtIOBindings syborg_virtio_bindings = {
-.update_irq = syborg_virtio_update_irq
+.notify = syborg_virtio_update_irq
 };
 
 static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev)
@@ -248,6 +251,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev)
 
 proxy->vdev = vdev;
 
+/* Don't support multiple vectors */
+proxy->vdev->nvectors = 0;
 sysbus_init_irq(&proxy->busdev, &proxy->irq);
 iomemtype = cpu_register_io_memory(0, syborg_virtio_readfn,
syborg_virtio_writefn, proxy);
@@ -255,6 +260,8 @@ static void syborg_virtio_init(SyborgVirtIOProxy *proxy, VirtIODevice *vdev)
 
 proxy->id = ((uint32_t)0x1af4 << 16) | vdev->device_id;
 
+qemu_register_reset(virtio_reset, 0, vdev);
+
 virtio_bind_device(vdev, &syborg_virtio_bindings, proxy);
 }
 
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index c072423..7dfdd80 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -78,13 +78,19 @@ typedef struct {
 
 /* virtio device */
 
-static void virtio_pci_update_irq(void *opaque)
+static void virtio_pci_notify(void *opaque, uint16_t vector)
 {
 VirtIOPCIProxy *proxy = opaque;
 
 qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
 
+static void virtio_pci_reset(void *opaque)
+{
+VirtIOPCIProxy *proxy = opaque;
+virtio_reset(proxy->vdev);
+}
+
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -108,7 +114,10 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 break;
 case VIRTIO_PCI_QUEUE_PFN:
 pa = (target_phys_addr_t)val << VIRTIO_PCI_QUEUE_ADDR_SHIFT;
-virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
+if (pa == 0)
+virtio_pci_reset(proxy);
+else
+virtio_queue_set_addr(vdev, vdev->queue_sel, pa);
 break;
 case VIRTIO_PCI_QUEUE_SEL:
 if (val < VIRTIO_PCI_QUEUE_MAX)
@@ -120,7 +129,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 case VIRTIO_PCI_STATUS:
 vdev->status = val & 0xFF;
 if (vdev->status == 0)
-virtio_reset(vdev);
+virtio_pci_reset(proxy);
 break;
 }
 }
@@ -158,7 +167,7 @@ static uint32_t virtio_ioport_read(void *opaque, uint32_t addr)
 /* reading from the ISR also clears it. */
 ret = vdev->isr;
 vdev->isr = 0;
-virtio_update_irq(vdev);
+qemu_set_irq(proxy->pci_dev.irq[0], 0);
 break;
 default:
 break;
@@ -243,7 +252,7 @@ static void virtio_map(PCIDevice *pci_dev, int region_num,
 }
 
 static const VirtIOBindings virtio_pci_bindings = {
-.update_irq = virtio_pci_update_irq
+.notify = virtio_pci_notify
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
@@ -255,6 +264,9 @@ static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
 
 proxy->vdev = vdev;
 
+/* No support for multiple vectors yet. */
+proxy->vdev->nvectors = 0;
+
 config = proxy->pci_dev.config;
 pci_config_set_vendor_id(config, vendor);
 pci_config_set_device_id(config, device);
@

[PATCH] qemu: virtio save/load bindings

2009-05-25 Thread Michael S. Tsirkin
Implement bindings for virtio save/load. Use them in virtio pci.

Signed-off-by: Michael S. Tsirkin 
---

Is anyone working to fill in load/save bindings so that saving virtio
devices works? Here's a trivial patch to do this (this one is on top of my
MSI-X patchset).
Comments?

 hw/virtio-pci.c |   49 -
 hw/virtio.c |   31 ++-
 hw/virtio.h |4 
 3 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 294f4c7..589fbb1 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -105,6 +105,48 @@ static void virtio_pci_notify(void *opaque, uint16_t vector)
 qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
 
+static void virtio_pci_save_config(void * opaque, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+pci_device_save(&proxy->pci_dev, f);
+msix_save(&proxy->pci_dev, f);
+if (msix_present(&proxy->pci_dev))
+qemu_put_be16(f, proxy->vdev->config_vector);
+}
+
+static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+if (msix_present(&proxy->pci_dev))
+qemu_put_be16(f, virtio_queue_vector(proxy->vdev, n));
+}
+
+static int virtio_pci_load_config(void * opaque, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+int ret;
+ret = pci_device_load(&proxy->pci_dev, f);
+if (ret)
+return ret;
+ret = msix_load(&proxy->pci_dev, f);
+if (ret)
+return ret;
+if (msix_present(&proxy->pci_dev))
+qemu_get_be16s(f, &proxy->vdev->config_vector);
+return 0;
+}
+
+static int virtio_pci_load_queue(void * opaque, int n, QEMUFile *f)
+{
+VirtIOPCIProxy *proxy = opaque;
+uint16_t vector;
+if (!msix_present(&proxy->pci_dev))
+return 0;
+qemu_get_be16s(f, &vector);
+virtio_queue_set_vector(proxy->vdev, n, vector);
+return 0;
+}
+
 static void virtio_pci_reset(void *opaque)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -317,7 +359,12 @@ static void virtio_write_config(PCIDevice *pci_dev, uint32_t address,
 }
 
 static const VirtIOBindings virtio_pci_bindings = {
-.notify = virtio_pci_notify
+.notify = virtio_pci_notify,
+.save_config = virtio_pci_save_config,
+.load_config = virtio_pci_load_config,
+.save_queue = virtio_pci_save_queue,
+.load_queue = virtio_pci_load_queue,
 };
 
 static void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
diff --git a/hw/virtio.c b/hw/virtio.c
index 63ffcff..b773dff 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -568,9 +568,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 {
 int i;
 
-/* FIXME: load/save binding.  */
-//pci_device_save(&vdev->pci_dev, f);
-//msix_save(&vdev->pci_dev, f);
+if (vdev->binding->save_config)
+vdev->binding->save_config(vdev->binding_opaque, f);
 
 qemu_put_8s(f, &vdev->status);
 qemu_put_8s(f, &vdev->isr);
@@ -596,19 +595,20 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 qemu_put_be32(f, vdev->vq[i].vring.num);
 qemu_put_be64(f, vdev->vq[i].pa);
 qemu_put_be16s(f, &vdev->vq[i].last_avail_idx);
-if (vdev->nvectors)
-qemu_put_be16s(f, &vdev->vq[i].vector);
+if (vdev->binding->save_queue)
+vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
 }
 
 int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 {
-int num, i;
+int num, i, ret;
 
-/* FIXME: load/save binding.  */
-//pci_device_load(&vdev->pci_dev, f);
-//r = msix_load(&vdev->pci_dev, f);
-//pci_resize_io_region(&vdev->pci_dev, 1, msix_bar_size(&vdev->pci_dev));
+if (vdev->binding->load_config) {
+ret = vdev->binding->load_config(vdev->binding_opaque, f);
+if (ret)
+return ret;
+}
 
 qemu_get_8s(f, &vdev->status);
 qemu_get_8s(f, &vdev->isr);
@@ -617,10 +617,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 vdev->config_len = qemu_get_be32(f);
 qemu_get_buffer(f, vdev->config, vdev->config_len);
 
-if (vdev->nvectors) {
-qemu_get_be16s(f, &vdev->config_vector);
-//msix_vector_use(&vdev->pci_dev, vdev->config_vector);
-}
 num = qemu_get_be32(f);
 
 for (i = 0; i < num; i++) {
@@ -631,9 +627,10 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f)
 if (vdev->vq[i].pa) {
 virtqueue_init(&vdev->vq[i]);
 }
-if (vdev->nvectors) {
-qemu_get_be16s(f, &vdev->vq[i].vector);
-//msix_vector_use(&vdev->pci_dev, vdev->config_vector);
+if (vdev-

Re: [Qemu-devel] [PATCH 04/11] qemu: helper routines for pci access.

2009-05-25 Thread Michael S. Tsirkin
On Tue, May 26, 2009 at 11:33:37AM +0900, Isaku Yamahata wrote:
> On Mon, May 25, 2009 at 03:25:33PM +0300, Michael S. Tsirkin wrote:
> > Add inline routines for convenient access to pci devices
> > with correct (little) endianness. Will be used by MSI-X support.
>
> Just a minor comment.
> How about to add pci_[sg]et_byte() for consistency?

I don't see that it makes sense - pci_set_long(config, value)
is shorter than *((uint32_t *)config) = cpu_to_le32(value),
but single bytes don't have endianness, and *config = value
is shorter.

-- 
MST


Re: [Qemu-devel] [PATCH 04/11] qemu: helper routines for pci access.

2009-05-26 Thread Michael S. Tsirkin
On Tue, May 26, 2009 at 11:07:33AM +0300, Avi Kivity wrote:
> Michael S. Tsirkin wrote:
>> On Tue, May 26, 2009 at 11:33:37AM +0900, Isaku Yamahata wrote:
>>   
>>> On Mon, May 25, 2009 at 03:25:33PM +0300, Michael S. Tsirkin wrote:
>>> 
>>>> Add inline routines for convenient access to pci devices
>>>> with correct (little) endianness. Will be used by MSI-X support.
>>>>   
>>> Just a minor comment.
>>> How about to add pci_[sg]et_byte() for consistency?
>>> 
>>
>> I don't see that it makes sense - pci_set_long(config, value)
>> is shorter than *((uint32_t *)config) = cpu_to_le32(value),
>> but single bytes don't have endianness, and *config = value
>> is shorter.
>>   
>
> It's nice to have consistent APIs though.

Well, if enough people feel so ...



qemu: add pci_get/set_byte

Add pci_get/set_byte to keep *_word and *_long access functions company.
They are unused for now.

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/hw/pci.h b/hw/pci.h
index 4072f16..e1e4fb4 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -263,6 +263,18 @@ PCIBus *pci_bridge_init(PCIBus *bus, int devfn, uint16_t vid, uint16_t did,
                         pci_map_irq_fn map_irq, const char *name);
 
 static inline void
+pci_set_byte(uint8_t *config, uint8_t val)
+{
+    *config = val;
+}
+
+static inline uint8_t
+pci_get_byte(uint8_t *config)
+{
+    return *config;
+}
+
+static inline void
 pci_set_word(uint8_t *config, uint16_t val)
 {
     cpu_to_le16wu((uint16_t *)config, val);
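For readers outside the qemu tree: the accessors above can be sketched standalone. The little-endian store is open-coded here (the real header uses cpu_to_le16wu), so this is an illustration of the helpers under discussion, not the qemu code itself:

```c
#include <stdint.h>

/* Sketch of the qemu-style config-space accessors. A byte store needs no
 * byte swapping; the word store must lay the value out little-endian
 * regardless of host endianness. */
static inline void pci_set_byte(uint8_t *config, uint8_t val)
{
    *config = val;                 /* single bytes have no endianness */
}

static inline uint8_t pci_get_byte(const uint8_t *config)
{
    return *config;
}

static inline void pci_set_word(uint8_t *config, uint16_t val)
{
    config[0] = val & 0xff;        /* low byte first: little-endian */
    config[1] = val >> 8;
}

static inline uint16_t pci_get_word(const uint8_t *config)
{
    return (uint16_t)(config[0] | (config[1] << 8));
}
```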

-- 
MST


Re: [Qemu-devel] [PATCH 03/11] qemu: add routines to manage PCI capabilities

2009-05-26 Thread Michael S. Tsirkin
On Tue, May 26, 2009 at 05:49:26PM +0900, Isaku Yamahata wrote:
> On Mon, May 25, 2009 at 03:25:20PM +0300, Michael S. Tsirkin wrote:
> > Add routines to manage PCI capability list. First user will be MSI-X.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >  hw/pci.c |   98 --
> >  hw/pci.h |   18 +++-
> >  2 files changed, 106 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 5dcfb4e..6bc3819 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -130,12 +130,13 @@ void pci_device_save(PCIDevice *s, QEMUFile *f)
> >  int version = s->cap_present ? 3 : 2;
> >  int i;
> >  
> > -qemu_put_be32(f, version); /* PCI device version */
> > +/* PCI device version and capabilities */
> > +qemu_put_be32(f, version);
> > +if (version >= 3)
> > +qemu_put_be32(f, s->cap_present);
> >  qemu_put_buffer(f, s->config, 256);
> >  for (i = 0; i < 4; i++)
> >  qemu_put_be32(f, s->irq_state[i]);
> > -if (version >= 3)
> > -qemu_put_be32(f, s->cap_present);
> >  }
> >  
> >  int pci_device_load(PCIDevice *s, QEMUFile *f)
> > @@ -146,12 +147,6 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
> >  version_id = qemu_get_be32(f);
> >  if (version_id > 3)
> >  return -EINVAL;
> > -qemu_get_buffer(f, s->config, 256);
> > -pci_update_mappings(s);
> > -
> > -if (version_id >= 2)
> > -for (i = 0; i < 4; i ++)
> > -s->irq_state[i] = qemu_get_be32(f);
> >  if (version_id >= 3)
> >  s->cap_present = qemu_get_be32(f);
> >  else
> > @@ -160,6 +155,18 @@ int pci_device_load(PCIDevice *s, QEMUFile *f)
> >  if (s->cap_present & ~s->cap_supported)
> >  return -EINVAL;
> >  
> > +qemu_get_buffer(f, s->config, 256);
> > +pci_update_mappings(s);
> > +
> > +if (version_id >= 2)
> > +for (i = 0; i < 4; i ++)
> > +s->irq_state[i] = qemu_get_be32(f);
> > +/* Clear mask and used bits for capabilities.
> > +   Must be restored separately, since capabilities can
> > +   be placed anywhere in config space. */
> > +memset(s->used, 0, PCI_CONFIG_SPACE_SIZE);
> > +for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
> > +s->mask[i] = 0xff;
> >  return 0;
> >  }
> >  
> > @@ -870,3 +877,76 @@ PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
> >  
> >  return (PCIDevice *)dev;
> >  }
> > +
> > +static int pci_find_space(PCIDevice *pdev, uint8_t size)
> > +{
> > +int offset = PCI_CONFIG_HEADER_SIZE;
> > +int i;
> > +for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i)
> > +if (pdev->used[i])
> > +offset = i + 1;
> > +else if (i - offset + 1 == size)
> > +return offset;
> > +return 0;
> > +}
> > +
> > +static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
> > +uint8_t *prev_p)
> > +{
> > +uint8_t next, prev;
> > +
> > +if (!(pdev->config[PCI_STATUS] & PCI_STATUS_CAP_LIST))
> > +return 0;
> > +
> > +for (prev = PCI_CAPABILITY_LIST; (next = pdev->config[prev]);
> > + prev = next + PCI_CAP_LIST_NEXT)
> > +if (pdev->config[next + PCI_CAP_LIST_ID] != cap_id)
> 
> typo? ==


Fixed. Thanks.

> > +break;
> > +
> > +*prev_p = prev;
> > +return next;
> > +}
> > +
> > +/* Reserve space and add capability to the linked list in pci config space */
> > +int pci_add_capability(PCIDevice *pdev, uint8_t cap_id, uint8_t size)
> > +{
> > +uint8_t offset = pci_find_space(pdev, size);
> > +uint8_t *config = pdev->config + offset;
> > +if (!offset)
> > +return -ENOSPC;
> > +config[PCI_CAP_LIST_ID] = cap_id;
> > +config[PCI_CAP_LIST_NEXT] = pdev->config[PCI_CAPABILITY_LIST];
> > +pdev->config[PCI_CAPABILITY_LIST] = offset;
> > +pdev->config[PCI_STATUS] |= PCI_STATUS_CAP_LIST;
> > +memset(pdev->used + offset, 0xFF, size);
> > +/* Make capability read-only by default */
> > +memset(pdev->mask + offset, 0, size);
> >
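An aside on the quoted patch: the first-fit scan in pci_find_space can be exercised standalone. A sketch with the config-space constants inlined (values match qemu's; the scan logic follows the patch above):

```c
#include <stdint.h>

#define PCI_CONFIG_HEADER_SIZE 0x40
#define PCI_CONFIG_SPACE_SIZE  0x100

/* First-fit scan over the 'used' map: find 'size' contiguous free bytes
 * after the standard 64-byte header, or return 0 when nothing fits.
 * 'offset' tracks where the current free run starts; any used byte
 * restarts the run just past itself. */
static int find_space(const uint8_t *used, uint8_t size)
{
    int offset = PCI_CONFIG_HEADER_SIZE;
    int i;
    for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
        if (used[i])
            offset = i + 1;          /* run broken: restart after this byte */
        else if (i - offset + 1 == size)
            return offset;           /* free run of 'size' bytes found */
    }
    return 0;
}
```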

[PATCH] qemu: fix pci_find_capability for multiple caps

2009-05-26 Thread Michael S. Tsirkin
pci_find_capability_list has a bug that makes it stop at the first
capability in the list, whatever its id. This only happens to work because
we currently support a single capability (MSI-X). Here's a fix.

Found-by: Isaku Yamahata 
Signed-off-by: Michael S. Tsirkin 

---

This is a fixup for my patch "qemu: add routines to manage PCI
capabilities".

 hw/pci.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index 19905b9..a63d988 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -918,7 +918,7 @@ static uint8_t pci_find_capability_list(PCIDevice *pdev, uint8_t cap_id,
 
     for (prev = PCI_CAPABILITY_LIST; (next = pdev->config[prev]);
          prev = next + PCI_CAP_LIST_NEXT)
-        if (pdev->config[next + PCI_CAP_LIST_ID] != cap_id)
+        if (pdev->config[next + PCI_CAP_LIST_ID] == cap_id)
             break;
 
     *prev_p = prev;
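Outside the diff, the corrected walk reads as follows; a self-contained sketch with the standard PCI offsets inlined (same values qemu's headers use), showing why '==' is the right test:

```c
#include <stdint.h>

/* Standard PCI config-space offsets and bits. */
#define PCI_STATUS           0x06
#define PCI_STATUS_CAP_LIST  0x10
#define PCI_CAPABILITY_LIST  0x34
#define PCI_CAP_LIST_ID      0
#define PCI_CAP_LIST_NEXT    1

/* Walk the capability linked list in a raw 256-byte config space and
 * return the offset of the first capability with the given id, or 0.
 * We stop when we FIND the id ('=='); the buggy '!=' version stopped at
 * the first capability whose id differed, i.e. almost immediately. */
static uint8_t find_capability(const uint8_t *config, uint8_t cap_id)
{
    uint8_t next, prev;

    if (!(config[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;                    /* device advertises no capabilities */

    for (prev = PCI_CAPABILITY_LIST; (next = config[prev]);
         prev = next + PCI_CAP_LIST_NEXT)
        if (config[next + PCI_CAP_LIST_ID] == cap_id)
            break;

    return next;                     /* 0 when the list ran out */
}
```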

-- 
MST


Re: find_vqs operation starting at arbitrary index

2009-06-01 Thread Michael S. Tsirkin
On Mon, Jun 01, 2009 at 01:33:48PM +0530, Amit Shah wrote:
> Hello,
> 
> The recent find_vqs operation doesn't allow for a vq to be found at an
> arbitrary location; it's meant to be called once at startup to find all
> possible queues and never called again.
> 
> This doesn't work for devices which can have queues hot-plugged at
> run-time. This can be made to work by passing the 'start_index' value as
> was done earlier for find_vq, but I doubt something like the following
> will work. The MSI vectors might need some changing as well.

How, specifically?

> If this indeed is the right way to do it, I can add a virtio_find_vqs
> helper along the same lines as virtio_find_single_vq and pass '0' as the
> start index to the ->find_vqs operation.
> 
> Or is there another way to do it?
> 
> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
> index 193c8f0..cb3f8df 100644
> --- a/drivers/virtio/virtio_pci.c
> +++ b/drivers/virtio/virtio_pci.c
> @@ -499,7 +499,7 @@ static void vp_del_vqs(struct virtio_device *vdev)
>  
>  /* the config->find_vqs() implementation */
>  static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
> -struct virtqueue *vqs[],
> +unsigned start_index, struct virtqueue *vqs[],
>  vq_callback_t *callbacks[],
>  const char *names[])
>  {
> @@ -516,7 +516,8 @@ static int vp_find_vqs(struct virtio_device *vdev, unsigned nvqs,
>   goto error_request;
>  
>   for (i = 0; i < nvqs; ++i) {
> - vqs[i] = vp_find_vq(vdev, i, callbacks[i], names[i]);
> + vqs[i] = vp_find_vq(vdev, start_index + i, callbacks[i],
> + names[i]);
>   if (IS_ERR(vqs[i]))
>   goto error_find;
>   }
> 
> 
> 
>   Amit
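The helper Amit describes is easy to mock up with toy stand-ins for the kernel types; everything below apart from the find_vqs signature from the quoted patch is hypothetical scaffolding for illustration:

```c
#include <stddef.h>

/* Toy types standing in for the kernel's, just enough to show the
 * proposed virtio_find_vqs wrapper passing start_index = 0. */
struct virtqueue { int index; };
typedef void vq_callback_t(struct virtqueue *);

struct virtio_device;
struct virtio_config_ops {
    int (*find_vqs)(struct virtio_device *vdev, unsigned nvqs,
                    unsigned start_index, struct virtqueue *vqs[],
                    vq_callback_t *callbacks[], const char *names[]);
};
struct virtio_device { const struct virtio_config_ops *config; };

/* Proposed helper: existing callers find all their queues from index 0,
 * so only hot-plug users ever pass a nonzero start_index directly. */
static int virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
                           struct virtqueue *vqs[],
                           vq_callback_t *callbacks[], const char *names[])
{
    return vdev->config->find_vqs(vdev, nvqs, 0, vqs, callbacks, names);
}

/* Fake transport backend recording the start_index it was handed. */
static unsigned last_start_index = 999;
static int fake_find_vqs(struct virtio_device *vdev, unsigned nvqs,
                         unsigned start_index, struct virtqueue *vqs[],
                         vq_callback_t *callbacks[], const char *names[])
{
    (void)vdev; (void)nvqs; (void)vqs; (void)callbacks; (void)names;
    last_start_index = start_index;
    return 0;
}
```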


Re: find_vqs operation starting at arbitrary index

2009-06-01 Thread Michael S. Tsirkin
On Mon, Jun 01, 2009 at 02:05:10PM +0530, Amit Shah wrote:
> On (Mon) Jun 01 2009 [11:11:06], Michael S. Tsirkin wrote:
> > On Mon, Jun 01, 2009 at 01:33:48PM +0530, Amit Shah wrote:
> > > Hello,
> > > 
> > > The recent find_vqs operation doesn't allow for a vq to be found at an
> > > arbitrary location; it's meant to be called once at startup to find all
> > > possible queues and never called again.
> > > 
> > > This doesn't work for devices which can have queues hot-plugged at
> > > run-time. This can be made to work by passing the 'start_index' value as
> > > was done earlier for find_vq, but I doubt something like the following
> > > will work. The MSI vectors might need some changing as well.
> > 
> > How, specifically?
> 
> I'm not sure; I wanted to know whether they will.

Yes, probably.

> I suspect this piece
> of code though:
> 
> in vp_find_vqs, just before calling vp_find_vq:
> 
>   /* How many vectors would we like? */
>   for (i = 0; i < nvqs; ++i)
>   if (callbacks[i])
>   ++vectors;
> 
>   err = vp_request_vectors(vdev, vectors);
>   if (err)
>   goto error_request;
> 
> Will any adjusting be needed for the 'vectors' argument (since it's
> considered to be the max value one can specify)?
> 
>   Amit

Right. And it can only be called once. And something'll have to be done
on cleanup as well.

-- 
MST


Re: find_vqs operation starting at arbitrary index

2009-06-01 Thread Michael S. Tsirkin
On Mon, Jun 01, 2009 at 02:21:28PM +0530, Amit Shah wrote:
> On (Mon) Jun 01 2009 [11:43:27], Michael S. Tsirkin wrote:
> > On Mon, Jun 01, 2009 at 02:05:10PM +0530, Amit Shah wrote:
> > > On (Mon) Jun 01 2009 [11:11:06], Michael S. Tsirkin wrote:
> > > > On Mon, Jun 01, 2009 at 01:33:48PM +0530, Amit Shah wrote:
> > > > > Hello,
> > > > > 
> > > > > The recent find_vqs operation doesn't allow for a vq to be found at an
> > > > > arbitrary location; it's meant to be called once at startup to find all
> > > > > possible queues and never called again.
> > > > > 
> > > > > This doesn't work for devices which can have queues hot-plugged at
> > > > > run-time. This can be made to work by passing the 'start_index' value as
> > > > > was done earlier for find_vq, but I doubt something like the following
> > > > > will work. The MSI vectors might need some changing as well.
> > > > 
> > > > How, specifically?
> > > 
> > > I'm not sure; I wanted to know whether they will.
> > 
> > Yes, probably.
> > 
> > > I suspect this piece
> > > of code though:
> > > 
> > > in vp_find_vqs, just before calling vp_find_vq:
> > > 
> > >   /* How many vectors would we like? */
> > >   for (i = 0; i < nvqs; ++i)
> > >   if (callbacks[i])
> > >   ++vectors;
> > > 
> > >   err = vp_request_vectors(vdev, vectors);
> > >   if (err)
> > >   goto error_request;
> > > 
> > > Will any adjusting be needed for the 'vectors' argument (since it's
> > > considered to be the max value one can specify)?
> > > 
> > >   Amit
> > 
> > Right. And it can only be called once. And something'll have to be done
> > on cleanup as well.
> 
> Yes; that's the assumption find_vqs currently makes.

The real reason is pci_enable_msix, which is the underlying limitation
here.

> I'll take a stab at reworking this code.
> 
>   Amit


Re: find_vqs operation starting at arbitrary index

2009-06-01 Thread Michael S. Tsirkin
On Mon, Jun 01, 2009 at 02:21:28PM +0530, Amit Shah wrote:
> On (Mon) Jun 01 2009 [11:43:27], Michael S. Tsirkin wrote:
> > On Mon, Jun 01, 2009 at 02:05:10PM +0530, Amit Shah wrote:
> > > On (Mon) Jun 01 2009 [11:11:06], Michael S. Tsirkin wrote:
> > > > On Mon, Jun 01, 2009 at 01:33:48PM +0530, Amit Shah wrote:
> > > > > Hello,
> > > > > 
> > > > > The recent find_vqs operation doesn't allow for a vq to be found at an
> > > > > arbitrary location; it's meant to be called once at startup to find all
> > > > > possible queues and never called again.
> > > > > 
> > > > > This doesn't work for devices which can have queues hot-plugged at
> > > > > run-time. This can be made to work by passing the 'start_index' value as
> > > > > was done earlier for find_vq, but I doubt something like the following
> > > > > will work. The MSI vectors might need some changing as well.

Another possible approach would be to add enable_vq/disable_vq operations
(or possibly resize_vq). You would create an empty queue (size 0) upfront
and associate it with a vector, then enable it (set its real size) when
you actually want to use it, and disable it (resize to 0) when you no
longer need it.


Actually, resizing queues might come in handy for virtio_net: I think the
tx queue length could be made tunable by the user, and maybe we could even
auto-tune the rx queue size to avoid packet loss.



[PATCHv2 00/13] qemu: MSI-X support

2009-06-02 Thread Michael S. Tsirkin
Resending. Incorporated a minor fix pointed out by Isaku Yamahata.

Here is the port of MSI-X support patches to upstream qemu.
Please comment or commit.

This patchset adds generic support for MSI-X, adds implementation in
APIC, and uses MSI-X in virtio-net. At Paul's suggestion, I use stl_phy
to decouple APIC and MSI-X implementation.

This uses the mask table patch that I posted previously, and which is
now included in the series.

-- 
MST

Michael S. Tsirkin (13):
  qemu: make default_write_config use mask table
  qemu: capability bits in pci save/restore
  qemu: add routines to manage PCI capabilities
  qemu: helper routines for pci access.
  qemu: MSI-X support functions
  qemu: add flag to disable MSI-X by default
  qemu: minimal MSI/MSI-X implementation for PC
  qemu: add support for resizing regions
  qemu: virtio support for many interrupt vectors
  qemu: MSI-X support in virtio PCI
  qemu: request 3 vectors in virtio-net
  qemu: virtio save/load bindings
  qemu: add pci_get/set_byte

 Makefile.target|2 +-
 hw/apic.c  |   50 ++-
 hw/msix.c  |  426 
 hw/msix.h  |   35 +
 hw/pci.c   |  295 +++-
 hw/pci.h   |  105 -
 hw/syborg_virtio.c |   13 ++-
 hw/virtio-net.c|1 +
 hw/virtio-pci.c|  215 +-
 hw/virtio.c|   70 ++---
 hw/virtio.h|   14 ++-
 qemu-options.hx|2 +
 vl.c   |3 +
 13 files changed, 1017 insertions(+), 214 deletions(-)
 create mode 100644 hw/msix.c
 create mode 100644 hw/msix.h

