[PATCH] virtio-mem: drop unnecessary initialization

2020-06-07 Thread Michael S. Tsirkin
rc is initialized to -EINVAL but that's never used. Drop it.

Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug")
Reported-by: kernel test robot 
Signed-off-by: Michael S. Tsirkin 
---
 drivers/virtio/virtio_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index f658fe9149be..2f357142ea5e 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -1768,7 +1768,7 @@ static void virtio_mem_delete_resource(struct virtio_mem 
*vm)
 static int virtio_mem_probe(struct virtio_device *vdev)
 {
struct virtio_mem *vm;
-   int rc = -EINVAL;
+   int rc;
 
BUILD_BUG_ON(sizeof(struct virtio_mem_req) != 24);
BUILD_BUG_ON(sizeof(struct virtio_mem_resp) != 10);
-- 
MST



Re: [PATCH] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

2020-06-07 Thread Jürgen Groß

On 17.04.20 01:45, Deep Shah wrote:

Thomas Hellstrom will be handing over VMware's maintainership of these
interfaces to Deep Shah.

Signed-off-by: Deep Shah 
Acked-by: Thomas Hellstrom 

Pushed to xen/tip.git for-linus-5.8


Juergen


Re: [PATCH RFC v5 13/13] vhost: drop head based APIs

2020-06-07 Thread Jason Wang


On 2020/6/7 10:11 PM, Michael S. Tsirkin wrote:

Everyone's using buf APIs, no need for head based ones anymore.

Signed-off-by: Michael S. Tsirkin
---
  drivers/vhost/vhost.c | 36 
  drivers/vhost/vhost.h | 12 
  2 files changed, 8 insertions(+), 40 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 72ee55c810c4..e6931b760b61 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2299,12 +2299,12 @@ static int fetch_buf(struct vhost_virtqueue *vq)
return 1;
  }
  
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */

+/* Revert the effect of fetch_buf. Useful for error handling. */
+static
  void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
  {
vq->last_avail_idx -= n;
  }
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);



The same question as previous version.

Do we need to rewind cached descriptor here?
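
A minimal sketch of what such a rewind could look like, assuming the
vq->ndescs/vq->first_desc cache fields from patch 03/13 (an illustration
of the question, not code from the series):

	static void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
	{
		vq->last_avail_idx -= n;	/* what the patch does today */
		/* hypothetical addition: drop descriptors already cached
		 * by fetch_buf() so they are re-fetched on the next pass */
		vq->ndescs = 0;
		vq->first_desc = 0;
	}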

Thanks


Re: [PATCH RFC 03/13] vhost: batching fetches

2020-06-07 Thread Jason Wang


On 2020/6/7 9:57 PM, Michael S. Tsirkin wrote:

On Fri, Jun 05, 2020 at 11:40:17AM +0800, Jason Wang wrote:

On 2020/6/4 4:59 PM, Michael S. Tsirkin wrote:

On Wed, Jun 03, 2020 at 03:27:39PM +0800, Jason Wang wrote:

On 2020/6/2 9:06 PM, Michael S. Tsirkin wrote:

With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array.  Etc etc.

Signed-off-by: Michael S. Tsirkin
Signed-off-by: Eugenio Pérez
Link:https://lore.kernel.org/r/20200401183118.8334-4-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin
---
drivers/vhost/test.c  |  2 +-
drivers/vhost/vhost.c | 47 ++-
drivers/vhost/vhost.h |  5 -
3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 9a3a09005e03..02806d6f84ef 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
 	n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+	vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
 		       VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, NULL);
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8f9a07282625..aca2a5b0d078 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -299,6 +299,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
{
vq->num = 1;
vq->ndescs = 0;
+   vq->first_desc = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -367,6 +368,11 @@ static int vhost_worker(void *data)
return 0;
}
+static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
+{
+   return vq->max_descs - UIO_MAXIOV;
+}

One descriptor does not mean one iov; e.g. userspace may pass several
1-byte memory regions for us to translate.


Yes but I don't see the relevance. This tells us how many descriptors to
batch, not how many IOVs.

Yes, but questions are:

- this introduce another obstacle to support more than 1K queue size
- if we support 1K queue size, does it mean we need to cache 1K descriptors,
which seems a large stress on the cache

Thanks



Still don't understand the relevance. We have always supported up to
1K descriptors per buffer just for the IOV. This adds 64 more
descriptors - is that a big deal?



If I understand correctly, for net, the code tries to batch
descriptors for at least one packet.


If we allow 1K queue size then we allow a packet that consists of 1K 
descriptors. Then we need to cache 1K descriptors.
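
For scale, a standalone back-of-the-envelope sketch; the struct layout
is an assumption reconstructed from the accessors in patch 01/13
(u64 addr + u32 len + u16 flags + u16 id = 16 bytes):

	#include <stdint.h>
	#include <stdio.h>

	/* Assumed mirror of struct vhost_desc from patch 01/13. */
	struct vhost_desc {
		uint64_t addr;
		uint32_t len;
		uint16_t flags;
		uint16_t id;
	};

	int main(void)
	{
		/* Caching a whole 1K-entry ring vs. the 64-descriptor batch. */
		printf("1K ring cached: %zu bytes\n", 1024 * sizeof(struct vhost_desc));
		printf("64-desc batch:  %zu bytes\n", 64 * sizeof(struct vhost_desc));
		return 0;
	}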


Thanks


Re: [PATCH 5/6] vdpa: introduce virtio pci driver

2020-06-07 Thread Jason Wang


On 2020/6/7 9:51 PM, Michael S. Tsirkin wrote:

On Fri, Jun 05, 2020 at 04:54:17PM +0800, Jason Wang wrote:

On 2020/6/2 3:08 PM, Jason Wang wrote:

+static const struct pci_device_id vp_vdpa_id_table[] = {
+    { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
+    { 0 }
+};

This looks like it'll create a mess with either virtio pci
or vdpa being loaded at random. Maybe just don't specify
any IDs for now. Down the road we could get a
distinct vendor ID or a range of device IDs for this.


Right, will do.

Thanks


Rethinking about this: if we don't specify any ID, the binding won't work.

We can bind manually. It's not really for production anyway, so
not a big deal imho.



I think you mean doing it via "new_id", right — i.e. writing a vendor/device
ID pair to the driver's sysfs new_id file at runtime so the driver binds
without a static ID table entry.





How about using a dedicated subsystem vendor id for this?

Thanks

If virtio vendor id is used then standard driver is expected
to bind, right? Maybe use a dedicated vendor id?



I meant something like:

static const struct pci_device_id vp_vdpa_id_table[] = {
	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID,
			 VP_TEST_VENDOR_ID, VP_TEST_DEVICE_ID) },
	{ 0 }
};

Thanks



[vhost:vhost 18/52] drivers/virtio/virtio_mem.c:1391:5: warning: Variable 'rc' is reassigned a value before the old one has been used.

2020-06-07 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   f3861bc96a7e130943e1975e571ae62c0319b064
commit: 5f1f79bbc9e26fa9412fa9522f957bb8f030c442 [18/52] virtio-mem: 
Paravirtualized memory hotplug
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


cppcheck warnings: (new ones prefixed by >>)

>> drivers/virtio/virtio_mem.c:1391:5: warning: Variable 'rc' is reassigned a value before the old one has been used. [redundantAssignment]
rc = virtio_mem_init_vq(vm);
   ^
   drivers/virtio/virtio_mem.c:1375:0: note: Variable 'rc' is reassigned a value before the old one has been used.
int rc = -EINVAL;
   ^
   drivers/virtio/virtio_mem.c:1391:5: note: Variable 'rc' is reassigned a value before the old one has been used.
rc = virtio_mem_init_vq(vm);
   ^
>> drivers/virtio/virtio_mem.c:801:22: warning: int result is assigned to long variable. If the variable is long to avoid loss of information, then you have loss of information. [truncLongCastAssignment]
const uint64_t size = count * vm->subblock_size;
^
   drivers/virtio/virtio_mem.c:822:22: warning: int result is assigned to long variable. If the variable is long to avoid loss of information, then you have loss of information. [truncLongCastAssignment]
const uint64_t size = count * vm->subblock_size;
^

vim +/rc +1391 drivers/virtio/virtio_mem.c

  1371  
  1372  static int virtio_mem_probe(struct virtio_device *vdev)
  1373  {
  1374  	struct virtio_mem *vm;
  1375  	int rc = -EINVAL;
  1376  
  1377  	vdev->priv = vm = kzalloc(sizeof(*vm), GFP_KERNEL);
  1378  	if (!vm)
  1379  		return -ENOMEM;
  1380  
  1381  	init_waitqueue_head(&vm->host_resp);
  1382  	vm->vdev = vdev;
  1383  	INIT_WORK(&vm->wq, virtio_mem_run_wq);
  1384  	mutex_init(&vm->hotplug_mutex);
  1385  	INIT_LIST_HEAD(&vm->next);
  1386  	spin_lock_init(&vm->removal_lock);
  1387  	hrtimer_init(&vm->retry_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
  1388  	vm->retry_timer.function = virtio_mem_timer_expired;
  1389  
  1390  	/* register the virtqueue */
> 1391  	rc = virtio_mem_init_vq(vm);
  1392  	if (rc)
  1393  		goto out_free_vm;
  1394  
  1395  	/* initialize the device by querying the config */
  1396  	rc = virtio_mem_init(vm);
  1397  	if (rc)
  1398  		goto out_del_vq;
  1399  
  1400  	/* register callbacks */
  1401  	vm->memory_notifier.notifier_call = virtio_mem_memory_notifier_cb;
  1402  	rc = register_memory_notifier(&vm->memory_notifier);
  1403  	if (rc)
  1404  		goto out_del_vq;
  1405  	rc = register_virtio_mem_device(vm);
  1406  	if (rc)
  1407  		goto out_unreg_mem;
  1408  
  1409  	virtio_device_ready(vdev);
  1410  
  1411  	/* trigger a config update to start processing the requested_size */
  1412  	atomic_set(&vm->config_changed, 1);
  1413  	queue_work(system_freezable_wq, &vm->wq);
  1414  
  1415  	return 0;
  1416  out_unreg_mem:
  1417  	unregister_memory_notifier(&vm->memory_notifier);
  1418  out_del_vq:
  1419  	vdev->config->del_vqs(vdev);
  1420  out_free_vm:
  1421  	kfree(vm);
  1422  	vdev->priv = NULL;
  1423  
  1424  	return rc;
  1425  }
  1426  
  1426  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


[PATCH RFC v5 13/13] vhost: drop head based APIs

2020-06-07 Thread Michael S. Tsirkin
Everyone's using buf APIs, no need for head based ones anymore.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 36 
 drivers/vhost/vhost.h | 12 
 2 files changed, 8 insertions(+), 40 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 72ee55c810c4..e6931b760b61 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2299,12 +2299,12 @@ static int fetch_buf(struct vhost_virtqueue *vq)
return 1;
 }
 
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
+/* Revert the effect of fetch_buf. Useful for error handling. */
+static
 void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 {
vq->last_avail_idx -= n;
 }
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
 /* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
  * A negative code is returned on error. */
@@ -2464,8 +2464,7 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
return 0;
 }
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
+static
 int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 unsigned count)
 {
@@ -2499,10 +2498,8 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
}
return r;
 }
-EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
+static
 int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 {
struct vring_used_elem heads = {
@@ -2512,14 +2509,17 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 
 	return vhost_add_used_n(vq, &heads, 1);
 }
-EXPORT_SYMBOL_GPL(vhost_add_used);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using vhost_signal. */
 int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
 {
return vhost_add_used(vq, buf->id, buf->in_len);
 }
 EXPORT_SYMBOL_GPL(vhost_put_used_buf);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using vhost_signal. */
 int vhost_put_used_n_bufs(struct vhost_virtqueue *vq,
  struct vhost_buf *bufs, unsigned count)
 {
@@ -2580,26 +2580,6 @@ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
-/* And here's the combo meal deal.  Supersize me! */
-void vhost_add_used_and_signal(struct vhost_dev *dev,
-  struct vhost_virtqueue *vq,
-  unsigned int head, int len)
-{
-   vhost_add_used(vq, head, len);
-   vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
-
-/* multi-buffer version of vhost_add_used_and_signal */
-void vhost_add_used_and_signal_n(struct vhost_dev *dev,
-struct vhost_virtqueue *vq,
-struct vring_used_elem *heads, unsigned count)
-{
-   vhost_add_used_n(vq, heads, count);
-   vhost_signal(dev, vq);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
-
 /* return true if we're sure that available ring is empty */
 bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 28eea0155efb..264a2a2fae97 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -197,11 +197,6 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
-int vhost_get_vq_desc(struct vhost_virtqueue *,
- struct iovec iov[], unsigned int iov_count,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 int vhost_get_avail_buf(struct vhost_virtqueue *, struct vhost_buf *buf,
struct iovec iov[], unsigned int iov_count,
unsigned int *out_num, unsigned int *in_num,
@@ -209,13 +204,6 @@ int vhost_get_avail_buf(struct vhost_virtqueue *, struct vhost_buf *buf,
 void vhost_discard_avail_bufs(struct vhost_virtqueue *,
  struct vhost_buf *, unsigned count);
 int vhost_vq_init_access(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
-unsigned count);
-void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
-  unsigned int id, int len);
-void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
-				 struct vring_used_elem *heads, unsigned count);

[PATCH RFC v5 06/13] vhost: reorder functions

2020-06-07 Thread Michael S. Tsirkin
Reorder functions in the file to not rely on forward
declarations, in preparation to making them static
down the road.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5075505cfe55..3ffcba4e27e9 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2299,6 +2299,13 @@ static int fetch_buf(struct vhost_virtqueue *vq)
return 1;
 }
 
+/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+{
+   vq->last_avail_idx -= n;
+}
+EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
+
 /* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
  * A negative code is returned on error. */
 static int fetch_descs(struct vhost_virtqueue *vq)
@@ -2413,26 +2420,6 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
-{
-   vq->last_avail_idx -= n;
-}
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
-
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
-{
-   struct vring_used_elem heads = {
-   cpu_to_vhost32(vq, head),
-   cpu_to_vhost32(vq, len)
-   };
-
-	return vhost_add_used_n(vq, &heads, 1);
-}
-EXPORT_SYMBOL_GPL(vhost_add_used);
-
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
unsigned count)
@@ -2502,6 +2489,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+{
+   struct vring_used_elem heads = {
+   cpu_to_vhost32(vq, head),
+   cpu_to_vhost32(vq, len)
+   };
+
+	return vhost_add_used_n(vq, &heads, 1);
+}
+EXPORT_SYMBOL_GPL(vhost_add_used);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
__u16 old, new;
-- 
MST



[PATCH RFC v5 10/13] vhost/test: convert to the buf API

2020-06-07 Thread Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 02806d6f84ef..251fd2bf74a3 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -44,9 +44,10 @@ static void handle_vq(struct vhost_test *n)
 {
 	struct vhost_virtqueue *vq = &n->vqs[VHOST_TEST_VQ];
unsigned out, in;
-   int head;
+   int ret;
size_t len, total_len = 0;
void *private;
+   struct vhost_buf buf;
 
 	mutex_lock(&vq->mutex);
 	private = vhost_vq_get_backend(vq);
@@ -58,15 +59,15 @@ static void handle_vq(struct vhost_test *n)
 	vhost_disable_notify(&n->dev, vq);
 
for (;;) {
-		head = vhost_get_vq_desc(vq, vq->iov,
-					 ARRAY_SIZE(vq->iov),
-					 &out, &in,
-					 NULL, NULL);
+		ret = vhost_get_avail_buf(vq, vq->iov, &buf,
+					  ARRAY_SIZE(vq->iov),
+					  &out, &in,
+					  NULL, NULL);
/* On error, stop handling until the next kick. */
-   if (unlikely(head < 0))
+   if (unlikely(ret < 0))
break;
/* Nothing new?  Wait for eventfd to tell us they refilled. */
-   if (head == vq->num) {
+   if (!ret) {
 			if (unlikely(vhost_enable_notify(&n->dev, vq))) {
 				vhost_disable_notify(&n->dev, vq);
continue;
@@ -78,13 +79,14 @@ static void handle_vq(struct vhost_test *n)
   "out %d, int %d\n", out, in);
break;
}
-   len = iov_length(vq->iov, out);
+   len = buf.out_len;
/* Sanity check */
if (!len) {
vq_err(vq, "Unexpected 0 len for TX\n");
break;
}
-		vhost_add_used_and_signal(&n->dev, vq, head, 0);
+		vhost_put_used_buf(vq, &buf);
+		vhost_signal(&n->dev, vq);
total_len += len;
if (unlikely(vhost_exceeds_weight(vq, 0, total_len)))
break;
-- 
MST



[PATCH RFC v5 11/13] vhost/scsi: switch to buf APIs

2020-06-07 Thread Michael S. Tsirkin
Switch to the buf APIs. Doing this exposes a spec violation in vhost scsi:
all used bufs are marked with length 0. The fix is left for another day.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/scsi.c | 73 ++--
 1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 0cbaa0b3893d..a5cdd4c01a3a 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -71,8 +71,8 @@ struct vhost_scsi_inflight {
 };
 
 struct vhost_scsi_cmd {
-   /* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-   int tvc_vq_desc;
+   /* Descriptor from vhost_get_avail_buf() for virt_queue segment */
+   struct vhost_buf tvc_vq_desc;
/* virtio-scsi initiator task attribute */
int tvc_task_attr;
/* virtio-scsi response incoming iovecs */
@@ -213,7 +213,7 @@ struct vhost_scsi {
  * Context for processing request and control queue operations.
  */
 struct vhost_scsi_ctx {
-   int head;
+   struct vhost_buf buf;
unsigned int out, in;
size_t req_size, rsp_size;
size_t out_size, in_size;
@@ -443,6 +443,20 @@ static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd)
 	return target_put_sess_cmd(se_cmd);
 }
 
+/* Signal to guest that request finished with no input buffer. */
+/* TODO calling this when writing into buffer and most likely a bug */
+static void vhost_scsi_signal_noinput(struct vhost_dev *vdev,
+ struct vhost_virtqueue *vq,
+ struct vhost_buf *bufp)
+{
+   struct vhost_buf buf = *bufp;
+
+   buf.in_len = 0;
+   vhost_put_used_buf(vq, );
+   vhost_signal(vdev, vq);
+}
+
+
 static void
 vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 {
@@ -450,7 +464,8 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
struct virtio_scsi_event *event = >event;
struct virtio_scsi_event __user *eventp;
unsigned out, in;
-   int head, ret;
+   struct vhost_buf buf;
+   int ret;
 
if (!vhost_vq_get_backend(vq)) {
vs->vs_events_missed = true;
@@ -459,14 +474,14 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 
 again:
 	vhost_disable_notify(&vs->dev, vq);
-	head = vhost_get_vq_desc(vq, vq->iov,
-				 ARRAY_SIZE(vq->iov), &out, &in,
-				 NULL, NULL);
-	if (head < 0) {
+	ret = vhost_get_avail_buf(vq, &buf,
+				  vq->iov, ARRAY_SIZE(vq->iov), &out, &in,
+				  NULL, NULL);
+	if (ret < 0) {
vs->vs_events_missed = true;
return;
}
-   if (head == vq->num) {
+   if (!ret) {
 		if (vhost_enable_notify(&vs->dev, vq))
goto again;
vs->vs_events_missed = true;
@@ -488,7 +503,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	eventp = vq->iov[out].iov_base;
 	ret = __copy_to_user(eventp, event, sizeof(*event));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_scsi_signal_noinput(&vs->dev, vq, &buf);
 	else
 		vq_err(vq, "Faulted on vhost_scsi_send_event\n");
 }
@@ -549,7 +564,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
 			struct vhost_scsi_virtqueue *q;
-			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
+			vhost_put_used_buf(cmd->tvc_vq, &cmd->tvc_vq_desc);
 			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
 			vq = q - vs->vqs;
 			__set_bit(vq, signal);
@@ -793,7 +808,7 @@ static void vhost_scsi_submission_work(struct work_struct 
*work)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
   struct vhost_virtqueue *vq,
-  int head, unsigned out)
+  struct vhost_buf *buf, unsigned out)
 {
struct virtio_scsi_cmd_resp __user *resp;
struct virtio_scsi_cmd_resp rsp;
@@ -804,7 +819,7 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 	resp = vq->iov[out].iov_base;
 	ret = __copy_to_user(resp, &rsp, sizeof(rsp));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_scsi_signal_noinput(&vs->dev, vq, buf);
 	else
 		pr_err("Faulted on virtio_scsi_cmd_resp\n");
 }
@@ -813,21 +828,21 @@ static int
 vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
struct vhost_scsi_ctx *vc)
 {
-   int ret = -ENXIO;
+   int r, ret = -ENXIO;
 
-   vc->head = 

[PATCH RFC v5 01/13] vhost: option to fetch descriptors through an independent struct

2020-06-07 Thread Michael S. Tsirkin
The idea is to support multiple ring formats by converting
to a format-independent array of descriptors.

This costs extra cycles, but we gain in ability
to fetch a batch of descriptors in one go, which
is good for code cache locality.

When used, this causes a minor performance degradation; it's been
kept as simple as possible for ease of review.
A follow-up patch gets us back the performance by adding batching.

To simplify benchmarking, I kept the old code around so one can switch
back and forth between old and new code. This will go away in the final
submission.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Link: https://lore.kernel.org/r/20200401183118.8334-2-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 302 +-
 drivers/vhost/vhost.h |  16 +++
 2 files changed, 317 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 172da092107e..e682ed551587 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -303,6 +303,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
   struct vhost_virtqueue *vq)
 {
vq->num = 1;
+   vq->ndescs = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -373,6 +374,9 @@ static int vhost_worker(void *data)
 
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
+   kfree(vq->descs);
+   vq->descs = NULL;
+   vq->max_descs = 0;
kfree(vq->indirect);
vq->indirect = NULL;
kfree(vq->log);
@@ -389,6 +393,10 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->max_descs = dev->iov_limit;
+   vq->descs = kmalloc_array(vq->max_descs,
+ sizeof(*vq->descs),
+ GFP_KERNEL);
vq->indirect = kmalloc_array(UIO_MAXIOV,
 sizeof(*vq->indirect),
 GFP_KERNEL);
@@ -396,7 +404,7 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
GFP_KERNEL);
vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
  GFP_KERNEL);
-   if (!vq->indirect || !vq->log || !vq->heads)
+   if (!vq->indirect || !vq->log || !vq->heads || !vq->descs)
goto err_nomem;
}
return 0;
@@ -488,6 +496,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
+   vq->descs = NULL;
+   vq->max_descs = 0;
vq->log = NULL;
vq->indirect = NULL;
vq->heads = NULL;
@@ -2315,6 +2325,296 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static struct vhost_desc *peek_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+	return &vq->descs[vq->ndescs - 1];
+}
+
+static void pop_split_desc(struct vhost_virtqueue *vq)
+{
+   BUG_ON(!vq->ndescs);
+   --vq->ndescs;
+}
+
+#define VHOST_DESC_FLAGS (VRING_DESC_F_INDIRECT | VRING_DESC_F_WRITE | \
+ VRING_DESC_F_NEXT)
+static int push_split_desc(struct vhost_virtqueue *vq, struct vring_desc *desc, u16 id)
+{
+   struct vhost_desc *h;
+
+   if (unlikely(vq->ndescs >= vq->max_descs))
+   return -EINVAL;
+	h = &vq->descs[vq->ndescs++];
+   h->addr = vhost64_to_cpu(vq, desc->addr);
+   h->len = vhost32_to_cpu(vq, desc->len);
+   h->flags = vhost16_to_cpu(vq, desc->flags) & VHOST_DESC_FLAGS;
+   h->id = id;
+
+   return 0;
+}
+
+static int fetch_indirect_descs(struct vhost_virtqueue *vq,
+   struct vhost_desc *indirect,
+   u16 head)
+{
+   struct vring_desc desc;
+   unsigned int i = 0, count, found = 0;
+   u32 len = indirect->len;
+   struct iov_iter from;
+   int ret;
+
+   /* Sanity check */
+   if (unlikely(len % sizeof desc)) {
+   vq_err(vq, "Invalid length in indirect descriptor: "
+  "len 0x%llx not multiple of 0x%zx\n",
+  (unsigned long long)len,
+  sizeof desc);
+   return -EINVAL;
+   }
+
+   ret = translate_desc(vq, indirect->addr, len, vq->indirect,
+UIO_MAXIOV, VHOST_ACCESS_RO);
+   if (unlikely(ret < 0)) {
+   if (ret != -EAGAIN)
+   vq_err(vq, "Translation failure %d in indirect.\n", 
ret);
+   return ret;
+   }
+	iov_iter_init(&from, READ, vq->indirect, ret, len);
+
+	/* We will use the result as an address to read from, so most
+	 * architectures only need a compiler barrier here. */

[PATCH RFC v5 03/13] vhost: batching fetches

2020-06-07 Thread Michael S. Tsirkin
With this patch applied, new and old code perform identically.

Lots of extra optimizations are now possible, e.g.
we can fetch multiple heads with copy_from/to_user now.
We can get rid of maintaining the log array.  Etc etc.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Link: https://lore.kernel.org/r/20200401183118.8334-4-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/test.c  |  2 +-
 drivers/vhost/vhost.c | 47 ++-
 drivers/vhost/vhost.h |  3 +++
 3 files changed, 46 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 9a3a09005e03..02806d6f84ef 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
 	dev = &n->dev;
 	vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
-   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
+   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
   VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, NULL);
 
f->private_data = n;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 33a72edb3ccd..3b0609801381 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -304,6 +304,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 {
vq->num = 1;
vq->ndescs = 0;
+   vq->first_desc = 0;
vq->desc = NULL;
vq->avail = NULL;
vq->used = NULL;
@@ -372,6 +373,11 @@ static int vhost_worker(void *data)
return 0;
 }
 
+static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
+{
+   return vq->max_descs - UIO_MAXIOV;
+}
+
 static void vhost_vq_free_iovecs(struct vhost_virtqueue *vq)
 {
kfree(vq->descs);
@@ -394,6 +400,9 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
for (i = 0; i < dev->nvqs; ++i) {
vq = dev->vqs[i];
vq->max_descs = dev->iov_limit;
+   if (vhost_vq_num_batch_descs(vq) < 0) {
+   return -EINVAL;
+   }
vq->descs = kmalloc_array(vq->max_descs,
  sizeof(*vq->descs),
  GFP_KERNEL);
@@ -1610,6 +1619,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
vq->last_avail_idx = s.num;
/* Forget the cached index value. */
vq->avail_idx = vq->last_avail_idx;
+   vq->ndescs = vq->first_desc = 0;
break;
case VHOST_GET_VRING_BASE:
s.index = idx;
@@ -2179,7 +2189,7 @@ static int fetch_indirect_descs(struct vhost_virtqueue *vq,
return 0;
 }
 
-static int fetch_descs(struct vhost_virtqueue *vq)
+static int fetch_buf(struct vhost_virtqueue *vq)
 {
unsigned int i, head, found = 0;
struct vhost_desc *last;
@@ -2192,7 +2202,11 @@ static int fetch_descs(struct vhost_virtqueue *vq)
/* Check it isn't doing very strange things with descriptor numbers. */
last_avail_idx = vq->last_avail_idx;
 
-   if (vq->avail_idx == vq->last_avail_idx) {
+   if (unlikely(vq->avail_idx == vq->last_avail_idx)) {
+   /* If we already have work to do, don't bother re-checking. */
+   if (likely(vq->ndescs))
+   return vq->num;
+
 		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
 			vq_err(vq, "Failed to access avail idx at %p\n",
 			       &vq->avail->idx);
@@ -2283,6 +2297,24 @@ static int fetch_descs(struct vhost_virtqueue *vq)
return 0;
 }
 
+static int fetch_descs(struct vhost_virtqueue *vq)
+{
+   int ret = 0;
+
+   if (unlikely(vq->first_desc >= vq->ndescs)) {
+   vq->first_desc = 0;
+   vq->ndescs = 0;
+   }
+
+   if (vq->ndescs)
+   return 0;
+
+   while (!ret && vq->ndescs <= vhost_vq_num_batch_descs(vq))
+   ret = fetch_buf(vq);
+
+   return vq->ndescs ? 0 : ret;
+}
+
 /* This looks in the virtqueue and for the first available buffer, and converts
  * it to an iovec for convenient access.  Since descriptors consist of some
  * number of output then some number of input descriptors, it's actually two
@@ -2308,7 +2340,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
if (unlikely(log))
*log_num = 0;
 
-   for (i = 0; i < vq->ndescs; ++i) {
+   for (i = vq->first_desc; i < vq->ndescs; ++i) {
unsigned iov_count = *in_num + *out_num;
 		struct vhost_desc *desc = &vq->descs[i];
int access;
@@ -2354,14 +2386,19 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
}
 
ret = desc->id;
+
+   if (!(desc->flags & VRING_DESC_F_NEXT))
+ 

[PATCH RFC v5 04/13] vhost: cleanup fetch_buf return code handling

2020-06-07 Thread Michael S. Tsirkin
Return code of fetch_buf is confusing, so callers resort to
tricks to get to sane values. Let's switch to something standard:
0 empty, >0 non-empty, <0 error.
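
Seen from a caller, the convention reads like this (an illustrative
fragment, not code from the patch):

	int ret = fetch_buf(vq);

	if (ret < 0)
		return ret;	/* error */
	if (!ret)
		return 0;	/* ring empty, nothing new fetched */
	/* ret > 0: a descriptor was fetched, keep batching */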

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3b0609801381..5075505cfe55 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2189,6 +2189,8 @@ static int fetch_indirect_descs(struct vhost_virtqueue *vq,
 	return 0;
 }
 
+/* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
 static int fetch_buf(struct vhost_virtqueue *vq)
 {
unsigned int i, head, found = 0;
@@ -2205,7 +2207,7 @@ static int fetch_buf(struct vhost_virtqueue *vq)
if (unlikely(vq->avail_idx == vq->last_avail_idx)) {
/* If we already have work to do, don't bother re-checking. */
if (likely(vq->ndescs))
-   return vq->num;
+   return 0;
 
 		if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
vq_err(vq, "Failed to access avail idx at %p\n",
@@ -2224,7 +2226,7 @@ static int fetch_buf(struct vhost_virtqueue *vq)
 * invalid.
 */
if (vq->avail_idx == last_avail_idx)
-   return vq->num;
+   return 0;
 
/* Only get avail ring entries after they have been
 * exposed by guest.
@@ -2294,12 +2296,14 @@ static int fetch_buf(struct vhost_virtqueue *vq)
/* On success, increment avail index. */
vq->last_avail_idx++;
 
-   return 0;
+   return 1;
 }
 
+/* This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
 static int fetch_descs(struct vhost_virtqueue *vq)
 {
-   int ret = 0;
+   int ret;
 
if (unlikely(vq->first_desc >= vq->ndescs)) {
vq->first_desc = 0;
@@ -2309,10 +2313,14 @@ static int fetch_descs(struct vhost_virtqueue *vq)
if (vq->ndescs)
return 0;
 
-   while (!ret && vq->ndescs <= vhost_vq_num_batch_descs(vq))
-   ret = fetch_buf(vq);
+   for (ret = 1;
+ret > 0 && vq->ndescs <= vhost_vq_num_batch_descs(vq);
+ret = fetch_buf(vq))
+   ;
 
-   return vq->ndescs ? 0 : ret;
+   /* On success we expect some descs */
+   BUG_ON(ret > 0 && !vq->ndescs);
+   return ret;
 }
 
 /* This looks in the virtqueue and for the first available buffer, and converts
@@ -2331,7 +2339,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
int ret = fetch_descs(vq);
int i;
 
-   if (ret)
+   if (ret <= 0)
return ret;
 
/* Now convert to IOV */
-- 
MST



[PATCH RFC v5 00/13] vhost: ring format independence

2020-06-07 Thread Michael S. Tsirkin


This adds infrastructure required for supporting
multiple ring formats.

The idea is as follows: we convert descriptors to an
independent format first, and process that converting to
iov later.

Used ring is similar: we fetch into an independent struct first,
convert that to IOV later.

The point is that we have a tight loop that fetches
descriptors, which is good for cache utilization.
This will also allow all kind of batching tricks -
e.g. it seems possible to keep SMAP disabled while
we are fetching multiple descriptors.

For used descriptors, this allows keeping track of the buffer length
without need to rescan IOV.
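
For orientation, the two format-independent records involved look like
this; vhost_buf is quoted from the vhost.h hunk in patch 07/13, while
the vhost_desc field types are an assumption reconstructed from its
accessors in patch 01/13:

	/* Filled per ring entry by the tight fetch loop (patch 01/13). */
	struct vhost_desc {
		u64 addr;
		u32 len;
		u16 flags;	/* subset of VRING_DESC_F_* kept by VHOST_DESC_FLAGS */
		u16 id;
	};

	/* Handed to drivers in place of a head index (patch 07/13). */
	struct vhost_buf {
		u32 out_len;
		u32 in_len;
		u16 descs;
		u16 id;
	};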

This seems to perform exactly the same as the original
code based on a microbenchmark.
Lightly tested.
More testing would be very much appreciated.


changes from v4:
- added used descriptor format independence
- addressed comments by Jason
- fixed a crash detected by the lkp robot.

changes from v3:
- fixed error handling in case of indirect descriptors
- add BUG_ON to detect buffer overflow in case of bugs
in response to comment by Jason Wang
- minor code tweaks

Changes from v2:
- fixed indirect descriptor batching
reported by Jason Wang

Changes from v1:
- typo fixes


Michael S. Tsirkin (13):
  vhost: option to fetch descriptors through an independent struct
  vhost: use batched version by default
  vhost: batching fetches
  vhost: cleanup fetch_buf return code handling
  vhost/net: pass net specific struct pointer
  vhost: reorder functions
  vhost: format-independent API for used buffers
  vhost/net: convert to new API: heads->bufs
  vhost/net: avoid iov length math
  vhost/test: convert to the buf API
  vhost/scsi: switch to buf APIs
  vhost/vsock: switch to the buf API
  vhost: drop head based APIs

 drivers/vhost/net.c   | 174 ++-
 drivers/vhost/scsi.c  |  73 
 drivers/vhost/test.c  |  22 +--
 drivers/vhost/vhost.c | 380 +++---
 drivers/vhost/vhost.h |  44 +++--
 drivers/vhost/vsock.c |  30 ++--
 6 files changed, 441 insertions(+), 282 deletions(-)

-- 
MST



[PATCH RFC v5 02/13] vhost: use batched version by default

2020-06-07 Thread Michael S. Tsirkin
As testing shows no performance change, switch to that now.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Link: https://lore.kernel.org/r/20200401183118.8334-3-epere...@redhat.com
Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 251 +-
 drivers/vhost/vhost.h |   4 -
 2 files changed, 2 insertions(+), 253 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e682ed551587..33a72edb3ccd 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2078,253 +2078,6 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-   struct iovec iov[], unsigned int iov_size,
-   unsigned int *out_num, unsigned int *in_num,
-   struct vhost_log *log, unsigned int *log_num,
-   struct vring_desc *indirect)
-{
-   struct vring_desc desc;
-   unsigned int i = 0, count, found = 0;
-   u32 len = vhost32_to_cpu(vq, indirect->len);
-   struct iov_iter from;
-   int ret, access;
-
-   /* Sanity check */
-   if (unlikely(len % sizeof desc)) {
-   vq_err(vq, "Invalid length in indirect descriptor: "
-  "len 0x%llx not multiple of 0x%zx\n",
-  (unsigned long long)len,
-  sizeof desc);
-   return -EINVAL;
-   }
-
-	ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr), len, vq->indirect,
-			     UIO_MAXIOV, VHOST_ACCESS_RO);
-   if (unlikely(ret < 0)) {
-   if (ret != -EAGAIN)
-   vq_err(vq, "Translation failure %d in indirect.\n", 
ret);
-   return ret;
-   }
-	iov_iter_init(&from, READ, vq->indirect, ret, len);
-
-   /* We will use the result as an address to read from, so most
-* architectures only need a compiler barrier here. */
-   read_barrier_depends();
-
-   count = len / sizeof desc;
-   /* Buffers are chained via a 16 bit next field, so
-* we can have at most 2^16 of these. */
-   if (unlikely(count > USHRT_MAX + 1)) {
-   vq_err(vq, "Indirect buffer length too big: %d\n",
-  indirect->len);
-   return -E2BIG;
-   }
-
-   do {
-   unsigned iov_count = *in_num + *out_num;
-   if (unlikely(++found > count)) {
-   vq_err(vq, "Loop detected: last one at %u "
-  "indirect size %u\n",
-  i, count);
-   return -EINVAL;
-   }
-		if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) {
-			vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
-			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc);
-			return -EINVAL;
-		}
-		if (unlikely(desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
-			vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
-			       i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc);
-			return -EINVAL;
-		}
-
-   if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
-   access = VHOST_ACCESS_WO;
-   else
-   access = VHOST_ACCESS_RO;
-
-		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
-				     vhost32_to_cpu(vq, desc.len), iov + iov_count,
-				     iov_size - iov_count, access);
-   if (unlikely(ret < 0)) {
-			if (ret != -EAGAIN)
-				vq_err(vq, "Translation failure %d indirect idx %d\n",
-				       ret, i);
-   return ret;
-   }
-   /* If this is an input descriptor, increment that count. */
-   if (access == VHOST_ACCESS_WO) {
-   *in_num += ret;
-			if (unlikely(log && ret)) {
-				log[*log_num].addr = vhost64_to_cpu(vq, desc.addr);
-				log[*log_num].len = vhost32_to_cpu(vq, desc.len);
-				++*log_num;
-			}
-   } else {
-   /* If it's an output descriptor, they're all supposed
-* to come before any input descriptors. */
-   if (unlikely(*in_num)) {
-   vq_err(vq, "Indirect descriptor "
-  "has out after in: idx %d\n", i);
-   return -EINVAL;
-   }
-   

[PATCH RFC v5 05/13] vhost/net: pass net specific struct pointer

2020-06-07 Thread Michael S. Tsirkin
In preparation for further cleanup, pass net specific pointer
to ubuf callbacks so we can move net specific fields
out to net structures.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index bf5e1d81ae25..ff594eec8ae3 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -94,7 +94,7 @@ struct vhost_net_ubuf_ref {
 */
atomic_t refcount;
wait_queue_head_t wait;
-   struct vhost_virtqueue *vq;
+   struct vhost_net_virtqueue *nvq;
 };
 
 #define VHOST_NET_BATCH 64
@@ -231,7 +231,7 @@ static void vhost_net_enable_zcopy(int vq)
 }
 
 static struct vhost_net_ubuf_ref *
-vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
+vhost_net_ubuf_alloc(struct vhost_net_virtqueue *nvq, bool zcopy)
 {
struct vhost_net_ubuf_ref *ubufs;
/* No zero copy backend? Nothing to count. */
@@ -242,7 +242,7 @@ vhost_net_ubuf_alloc(struct vhost_virtqueue *vq, bool zcopy)
 		return ERR_PTR(-ENOMEM);
 	atomic_set(&ubufs->refcount, 1);
 	init_waitqueue_head(&ubufs->wait);
-   ubufs->vq = vq;
+   ubufs->nvq = nvq;
return ubufs;
 }
 
@@ -384,13 +384,13 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 {
struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
-   struct vhost_virtqueue *vq = ubufs->vq;
+   struct vhost_net_virtqueue *nvq = ubufs->nvq;
int cnt;
 
rcu_read_lock_bh();
 
/* set len to mark this desc buffers done DMA */
-   vq->heads[ubuf->desc].len = success ?
+   nvq->vq.heads[ubuf->desc].in_len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
 
@@ -402,7 +402,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 * less than 10% of times).
 */
if (cnt <= 1 || !(cnt % 16))
-		vhost_poll_queue(&vq->poll);
+		vhost_poll_queue(&nvq->vq.poll);
 
rcu_read_unlock_bh();
 }
@@ -1525,7 +1525,7 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
/* start polling new socket */
oldsock = vhost_vq_get_backend(vq);
if (sock != oldsock) {
-   ubufs = vhost_net_ubuf_alloc(vq,
+   ubufs = vhost_net_ubuf_alloc(nvq,
 sock && vhost_sock_zcopy(sock));
if (IS_ERR(ubufs)) {
r = PTR_ERR(ubufs);
-- 
MST



[PATCH RFC v5 08/13] vhost/net: convert to new API: heads->bufs

2020-06-07 Thread Michael S. Tsirkin
Convert vhost net to use the new format-agnostic API.
In particular, don't poke at vq internals such as the
heads array.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 154 +++-
 1 file changed, 82 insertions(+), 72 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ff594eec8ae3..830fe84912a5 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -59,13 +59,13 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
  * status internally; used for zerocopy tx only.
  */
 /* Lower device DMA failed */
-#define VHOST_DMA_FAILED_LEN   ((__force __virtio32)3)
+#define VHOST_DMA_FAILED_LEN   (3)
 /* Lower device DMA done */
-#define VHOST_DMA_DONE_LEN ((__force __virtio32)2)
+#define VHOST_DMA_DONE_LEN (2)
 /* Lower device DMA in progress */
-#define VHOST_DMA_IN_PROGRESS  ((__force __virtio32)1)
+#define VHOST_DMA_IN_PROGRESS  (1)
 /* Buffer unused */
-#define VHOST_DMA_CLEAR_LEN((__force __virtio32)0)
+#define VHOST_DMA_CLEAR_LEN(0)
 
 #define VHOST_DMA_IS_DONE(len) ((__force u32)(len) >= (__force u32)VHOST_DMA_DONE_LEN)
 
@@ -112,9 +112,12 @@ struct vhost_net_virtqueue {
/* last used idx for outstanding DMA zerocopy buffers */
int upend_idx;
/* For TX, first used idx for DMA done zerocopy buffers
-* For RX, number of batched heads
+* For RX, number of batched bufs
 */
int done_idx;
+   /* Outstanding user bufs. UIO_MAXIOV in length. */
+   /* TODO: we can make this smaller for sure. */
+   struct vhost_buf *bufs;
/* Number of XDP frames batched */
int batched_xdp;
/* an array of userspace buffers info */
@@ -271,6 +274,8 @@ static void vhost_net_clear_ubuf_info(struct vhost_net *n)
int i;
 
for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+   kfree(n->vqs[i].bufs);
+   n->vqs[i].bufs = NULL;
kfree(n->vqs[i].ubuf_info);
n->vqs[i].ubuf_info = NULL;
}
@@ -282,6 +287,12 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n)
int i;
 
for (i = 0; i < VHOST_NET_VQ_MAX; ++i) {
+   n->vqs[i].bufs = kmalloc_array(UIO_MAXIOV,
+  sizeof(*n->vqs[i].bufs),
+  GFP_KERNEL);
+   if (!n->vqs[i].bufs)
+   goto err;
+
zcopy = vhost_net_zcopy_mask & (0x1 << i);
if (!zcopy)
continue;
@@ -364,18 +375,18 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
int j = 0;
 
for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-   if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+   if (nvq->bufs[i].in_len == VHOST_DMA_FAILED_LEN)
vhost_net_tx_err(net);
-   if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-   vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+   if (VHOST_DMA_IS_DONE(nvq->bufs[i].in_len)) {
+   nvq->bufs[i].in_len = VHOST_DMA_CLEAR_LEN;
++j;
} else
break;
}
while (j) {
add = min(UIO_MAXIOV - nvq->done_idx, j);
-		vhost_add_used_and_signal_n(vq->dev, vq,
-					    &vq->heads[nvq->done_idx], add);
+		vhost_put_used_n_bufs(vq, &nvq->bufs[nvq->done_idx], add);
+   vhost_signal(vq->dev, vq);
nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
j -= add;
}
@@ -390,7 +401,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
rcu_read_lock_bh();
 
/* set len to mark this desc buffers done DMA */
-   nvq->vq.heads[ubuf->desc].in_len = success ?
+   nvq->bufs[ubuf->desc].in_len = success ?
VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
cnt = vhost_net_ubuf_put(ubufs);
 
@@ -452,7 +463,8 @@ static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
if (!nvq->done_idx)
return;
 
-   vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
+   vhost_put_used_n_bufs(vq, nvq->bufs, nvq->done_idx);
+   vhost_signal(dev, vq);
nvq->done_idx = 0;
 }
 
@@ -558,6 +570,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
struct vhost_net_virtqueue *tnvq,
+   struct vhost_buf *buf,
unsigned int *out_num, unsigned int *in_num,
struct msghdr *msghdr, bool *busyloop_intr)
 {
@@ -565,10 +578,10 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 	struct vhost_virtqueue *rvq = &rnvq->vq;

[PATCH RFC v5 12/13] vhost/vsock: switch to the buf API

2020-06-07 Thread Michael S. Tsirkin
A straightforward conversion.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vsock.c | 30 ++
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index a483cec31d5c..61c6d3dd2ae3 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -103,7 +103,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
unsigned out, in;
size_t nbytes;
size_t iov_len, payload_len;
-   int head;
+   struct vhost_buf buf;
+   int ret;
 
 	spin_lock_bh(&vsock->send_pkt_list_lock);
 	if (list_empty(&vsock->send_pkt_list)) {
@@ -117,16 +118,17 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		list_del_init(&pkt->list);
 		spin_unlock_bh(&vsock->send_pkt_list_lock);
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0) {
+		ret = vhost_get_avail_buf(vq, &buf,
+					  vq->iov, ARRAY_SIZE(vq->iov),
+					  &out, &in, NULL, NULL);
+		if (ret < 0) {
 			spin_lock_bh(&vsock->send_pkt_list_lock);
 			list_add(&pkt->list, &vsock->send_pkt_list);
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
 			break;
 		}
 
-		if (head == vq->num) {
+		if (!ret) {
 			spin_lock_bh(&vsock->send_pkt_list_lock);
 			list_add(&pkt->list, &vsock->send_pkt_list);
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
@@ -186,7 +188,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 */
virtio_transport_deliver_tap_pkt(pkt);
 
-   vhost_add_used(vq, head, sizeof(pkt->hdr) + payload_len);
+   buf.in_len = sizeof(pkt->hdr) + payload_len;
+		vhost_put_used_buf(vq, &buf);
added = true;
 
pkt->off += payload_len;
@@ -440,7 +443,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
 dev);
struct virtio_vsock_pkt *pkt;
-   int head, pkts = 0, total_len = 0;
+   int ret, pkts = 0, total_len = 0;
+   struct vhost_buf buf;
unsigned int out, in;
bool added = false;
 
@@ -461,12 +465,13 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
goto no_more_replies;
}
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0)
+		ret = vhost_get_avail_buf(vq, &buf,
+					  vq->iov, ARRAY_SIZE(vq->iov),
+					  &out, &in, NULL, NULL);
+   if (ret < 0)
break;
 
-   if (head == vq->num) {
+   if (!ret) {
 			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
 				vhost_disable_notify(&vsock->dev, vq);
continue;
@@ -494,7 +499,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
virtio_transport_free_pkt(pkt);
 
len += sizeof(pkt->hdr);
-   vhost_add_used(vq, head, len);
+   buf.in_len = len;
+		vhost_put_used_buf(vq, &buf);
total_len += len;
added = true;
} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
-- 
MST



[PATCH RFC v5 07/13] vhost: format-independent API for used buffers

2020-06-07 Thread Michael S. Tsirkin
Add a new API that doesn't assume used ring, heads, etc.
For now, we keep the old APIs around to make it easier
to convert drivers.
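
Distilled from the conversions later in the series, the intended
driver-side flow looks roughly like this (a sketch, not a verbatim
driver; bytes_written stands in for driver-specific processing):

	struct vhost_buf buf;
	unsigned out, in;
	int ret;

	for (;;) {
		ret = vhost_get_avail_buf(vq, &buf, vq->iov, ARRAY_SIZE(vq->iov),
					  &out, &in, NULL, NULL);
		if (ret < 0)
			break;		/* error: stop until the next kick */
		if (!ret)
			break;		/* ring empty: re-enable notification */
		/* ... consume or fill vq->iov here ... */
		buf.in_len = bytes_written;	/* bytes written into guest buffers */
		vhost_put_used_buf(vq, &buf);
		vhost_signal(dev, vq);
	}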

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/vhost.c | 52 ++-
 drivers/vhost/vhost.h | 17 +-
 2 files changed, 58 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3ffcba4e27e9..72ee55c810c4 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2335,13 +2335,12 @@ static int fetch_descs(struct vhost_virtqueue *vq)
  * number of output then some number of input descriptors, it's actually two
  * iovecs, but we pack them into one and note how many of each there were.
  *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
- struct iovec iov[], unsigned int iov_size,
- unsigned int *out_num, unsigned int *in_num,
- struct vhost_log *log, unsigned int *log_num)
+ * This function returns a value > 0 if a descriptor was found, or 0 if none were found.
+ * A negative code is returned on error. */
+int vhost_get_avail_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf,
+   struct iovec iov[], unsigned int iov_size,
+   unsigned int *out_num, unsigned int *in_num,
+   struct vhost_log *log, unsigned int *log_num)
 {
int ret = fetch_descs(vq);
int i;
@@ -2354,6 +2353,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
*out_num = *in_num = 0;
if (unlikely(log))
*log_num = 0;
+   buf->in_len = buf->out_len = 0;
+   buf->descs = 0;
 
for (i = vq->first_desc; i < vq->ndescs; ++i) {
unsigned iov_count = *in_num + *out_num;
@@ -2383,6 +2384,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
/* If this is an input descriptor,
 * increment that count. */
*in_num += ret;
+   buf->in_len += desc->len;
if (unlikely(log && ret)) {
log[*log_num].addr = desc->addr;
log[*log_num].len = desc->len;
@@ -2398,9 +2400,11 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
goto err;
}
*out_num += ret;
+   buf->out_len += desc->len;
}
 
-   ret = desc->id;
+   buf->id = desc->id;
+   ++buf->descs;
 
if (!(desc->flags & VRING_DESC_F_NEXT))
break;
@@ -2408,7 +2412,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
vq->first_desc = i + 1;
 
-   return ret;
+   return 1;
 
 err:
for (i = vq->first_desc; i < vq->ndescs; ++i)
@@ -2418,7 +2422,15 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 
return ret;
 }
-EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
+EXPORT_SYMBOL_GPL(vhost_get_avail_buf);
+
+/* Reverse the effect of vhost_get_avail_buf. Useful for error handling. */
+void vhost_discard_avail_bufs(struct vhost_virtqueue *vq,
+ struct vhost_buf *buf, unsigned count)
+{
+   vhost_discard_vq_desc(vq, count);
+}
+EXPORT_SYMBOL_GPL(vhost_discard_avail_bufs);
 
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
struct vring_used_elem *heads,
@@ -2502,6 +2514,26 @@ int vhost_add_used(struct vhost_virtqueue *vq, unsigned 
int head, int len)
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
+int vhost_put_used_buf(struct vhost_virtqueue *vq, struct vhost_buf *buf)
+{
+   return vhost_add_used(vq, buf->id, buf->in_len);
+}
+EXPORT_SYMBOL_GPL(vhost_put_used_buf);
+
+int vhost_put_used_n_bufs(struct vhost_virtqueue *vq,
+ struct vhost_buf *bufs, unsigned count)
+{
+   unsigned i;
+
+   for (i = 0; i < count; ++i) {
+   vq->heads[i].id = cpu_to_vhost32(vq, bufs[i].id);
+   vq->heads[i].len = cpu_to_vhost32(vq, bufs[i].in_len);
+   }
+
+   return vhost_add_used_n(vq, vq->heads, count);
+}
+EXPORT_SYMBOL_GPL(vhost_put_used_n_bufs);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
__u16 old, new;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index fed36af5c444..28eea0155efb 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -67,6 +67,13 @@ struct vhost_desc {
u16 id;
 };
 
+struct vhost_buf {
+   u32 out_len;
+   u32 in_len;
+   u16 descs;
+   u16 id;
+};
+
 /* The virtqueue structure describes a queue attached to a device. */
 struct vhost_virtqueue {
struct vhost_dev *dev;
@@ -195,7 +202,12 @@ int 

[PATCH RFC v5 09/13] vhost/net: avoid iov length math

2020-06-07 Thread Michael S. Tsirkin
Now that the API exposes the buffer length, we no longer need to
scan IOVs to figure it out.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 830fe84912a5..0b509be8d7b1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -607,11 +607,9 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 }
 
 static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
-   size_t hdr_size, int out)
+   size_t len, size_t hdr_size, int out)
 {
/* Skip header. TODO: support TSO. */
-   size_t len = iov_length(vq->iov, out);
-
iov_iter_init(iter, WRITE, vq->iov, out, len);
iov_iter_advance(iter, hdr_size);
 
@@ -640,7 +638,7 @@ static int get_tx_bufs(struct vhost_net *net,
}
 
/* Sanity check */
-	*len = init_iov_iter(vq, &msg->msg_iter, nvq->vhost_hlen, *out);
+	*len = init_iov_iter(vq, &msg->msg_iter, buf->out_len, nvq->vhost_hlen,
+			     *out);
if (*len == 0) {
vq_err(vq, "Unexpected header len for TX: %zd expected %zd\n",
*len, nvq->vhost_hlen);
@@ -1080,7 +1078,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
nlogs += *log_num;
log += *log_num;
}
-   len = iov_length(vq->iov + seg, in);
+   len = bufs[bufcount].in_len;
datalen -= len;
++bufcount;
seg += in;
-- 
MST



Re: [PATCH RFC 01/13] vhost: option to fetch descriptors through an independent struct

2020-06-07 Thread Michael S. Tsirkin
On Wed, Jun 03, 2020 at 08:04:45PM +0800, Jason Wang wrote:
> 
> > On 2020/6/3 5:48 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 03, 2020 at 03:13:56PM +0800, Jason Wang wrote:
> > > On 2020/6/2 9:05 PM, Michael S. Tsirkin wrote:
> 
> 
> [...]
> 
> 
> > > > +
> > > > +static int fetch_indirect_descs(struct vhost_virtqueue *vq,
> > > > +   struct vhost_desc *indirect,
> > > > +   u16 head)
> > > > +{
> > > > +   struct vring_desc desc;
> > > > +   unsigned int i = 0, count, found = 0;
> > > > +   u32 len = indirect->len;
> > > > +   struct iov_iter from;
> > > > +   int ret;
> > > > +
> > > > +   /* Sanity check */
> > > > +   if (unlikely(len % sizeof desc)) {
> > > > +   vq_err(vq, "Invalid length in indirect descriptor: "
> > > > +  "len 0x%llx not multiple of 0x%zx\n",
> > > > +  (unsigned long long)len,
> > > > +  sizeof desc);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +
> > > > +   ret = translate_desc(vq, indirect->addr, len, vq->indirect,
> > > > +    UIO_MAXIOV, VHOST_ACCESS_RO);
> > > > +   if (unlikely(ret < 0)) {
> > > > +   if (ret != -EAGAIN)
> > > > +   vq_err(vq, "Translation failure %d in 
> > > > indirect.\n", ret);
> > > > +   return ret;
> > > > +   }
> > > > +   iov_iter_init(&from, READ, vq->indirect, ret, len);
> > > > +
> > > > +   /* We will use the result as an address to read from, so most
> > > > +* architectures only need a compiler barrier here. */
> > > > +   read_barrier_depends();
> > > > +
> > > > +   count = len / sizeof desc;
> > > > +   /* Buffers are chained via a 16 bit next field, so
> > > > +* we can have at most 2^16 of these. */
> > > > +   if (unlikely(count > USHRT_MAX + 1)) {
> > > > +   vq_err(vq, "Indirect buffer length too big: %d\n",
> > > > +  indirect->len);
> > > > +   return -E2BIG;
> > > > +   }
> > > > +   if (unlikely(vq->ndescs + count > vq->max_descs)) {
> > > > +   vq_err(vq, "Too many indirect + direct descs: %d + 
> > > > %d\n",
> > > > +  vq->ndescs, indirect->len);
> > > > +   return -E2BIG;
> > > > +   }
> > > > +
> > > > +   do {
> > > > +   if (unlikely(++found > count)) {
> > > > +   vq_err(vq, "Loop detected: last one at %u "
> > > > +  "indirect size %u\n",
> > > > +  i, count);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +   if (unlikely(!copy_from_iter_full(&desc, sizeof(desc), &from))) {
> > > > +   vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
> > > > +  i, (size_t)indirect->addr + i * sizeof desc);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +   if (unlikely(desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
> > > > +   vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
> > > > +  i, (size_t)indirect->addr + i * sizeof desc);
> > > > +   return -EINVAL;
> > > > +   }
> > > > +
> > > > +   push_split_desc(vq, &desc, head);
> > > 
> > > The error is ignored.
> > See above:
> > 
> > if (unlikely(vq->ndescs + count > vq->max_descs))
> > 
> > So it can't fail here, we never fetch unless there's space.
> > 
> > I guess we can add a WARN_ON here.
> 
> 
> Yes.
> 
> 
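Something along these lines (untested sketch, assuming push_split_desc()
keeps returning int as in this RFC):

	ret = push_split_desc(vq, &desc, head);
	/* can't fail: vq->ndescs + count was bounds-checked above */
	WARN_ON(ret);
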
> > 
> > > > +   } while ((i = next_desc(vq, &desc)) != -1);
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static int fetch_descs(struct vhost_virtqueue *vq)
> > > > +{
> > > > +   unsigned int i, head, found = 0;
> > > > +   struct vhost_desc *last;
> > > > +   struct vring_desc desc;
> > > > +   __virtio16 avail_idx;
> > > > +   __virtio16 ring_head;
> > > > +   u16 last_avail_idx;
> > > > +   int ret;
> > > > +
> > > > +   /* Check it isn't doing very strange things with descriptor numbers. */
> > > > +   last_avail_idx = vq->last_avail_idx;
> > > > +
> > > > +   if (vq->avail_idx == vq->last_avail_idx) {
> > > > +   if (unlikely(vhost_get_avail_idx(vq, &avail_idx))) {
> > > > +   vq_err(vq, "Failed to access avail idx at %p\n",
> > > > +   &vq->avail->idx);
> > > > +   return -EFAULT;
> > > > +   }
> > > > +   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
> > > > +
> > > > +   if (unlikely((u16)(vq->avail_idx - last_avail_idx) > vq->num)) {
> > > > +   vq_err(vq, "Guest moved used index from %u to 
> > 

Re: [PATCH RFC 03/13] vhost: batching fetches

2020-06-07 Thread Michael S. Tsirkin
On Fri, Jun 05, 2020 at 11:40:17AM +0800, Jason Wang wrote:
> 
> > On 2020/6/4 4:59 PM, Michael S. Tsirkin wrote:
> > On Wed, Jun 03, 2020 at 03:27:39PM +0800, Jason Wang wrote:
> > > > On 2020/6/2 9:06 PM, Michael S. Tsirkin wrote:
> > > > With this patch applied, new and old code perform identically.
> > > > 
> > > > Lots of extra optimizations are now possible, e.g.
> > > > we can fetch multiple heads with copy_from/to_user now.
> > > > We can get rid of maintaining the log array.  Etc etc.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin
> > > > Signed-off-by: Eugenio Pérez
> > > > Link:https://lore.kernel.org/r/20200401183118.8334-4-epere...@redhat.com
> > > > Signed-off-by: Michael S. Tsirkin
> > > > ---
> > > >drivers/vhost/test.c  |  2 +-
> > > >drivers/vhost/vhost.c | 47 ++-
> > > >drivers/vhost/vhost.h |  5 -
> > > >3 files changed, 47 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
> > > > index 9a3a09005e03..02806d6f84ef 100644
> > > > --- a/drivers/vhost/test.c
> > > > +++ b/drivers/vhost/test.c
> > > > @@ -119,7 +119,7 @@ static int vhost_test_open(struct inode *inode, struct file *f)
> > > > dev = &n->dev;
> > > > vqs[VHOST_TEST_VQ] = &n->vqs[VHOST_TEST_VQ];
> > > > n->vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick;
> > > > -   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV,
> > > > +   vhost_dev_init(dev, vqs, VHOST_TEST_VQ_MAX, UIO_MAXIOV + 64,
> > > >    VHOST_TEST_PKT_WEIGHT, VHOST_TEST_WEIGHT, NULL);
> > > > f->private_data = n;
> > > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> > > > index 8f9a07282625..aca2a5b0d078 100644
> > > > --- a/drivers/vhost/vhost.c
> > > > +++ b/drivers/vhost/vhost.c
> > > > @@ -299,6 +299,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
> > > >{
> > > > vq->num = 1;
> > > > vq->ndescs = 0;
> > > > +   vq->first_desc = 0;
> > > > vq->desc = NULL;
> > > > vq->avail = NULL;
> > > > vq->used = NULL;
> > > > @@ -367,6 +368,11 @@ static int vhost_worker(void *data)
> > > > return 0;
> > > >}
> > > > +static int vhost_vq_num_batch_descs(struct vhost_virtqueue *vq)
> > > > +{
> > > > +   return vq->max_descs - UIO_MAXIOV;
> > > > +}
> > > 1 descriptor does not mean 1 iov, e.g. userspace may pass several
> > > 1-byte memory regions for us to translate.
> > > 
> > Yes but I don't see the relevance. This tells us how many descriptors to
> > batch, not how many IOVs.
> 
> 
> Yes, but questions are:
> 
> - this introduces another obstacle to supporting more than a 1K queue size
> - if we support a 1K queue size, does it mean we need to cache 1K descriptors,
> which seems like a large stress on the cache
> 
> Thanks
> 
> 
> > 

Still don't understand the relevance. We have always supported up to 1K
descriptors per buffer just for the IOV. This adds 64 more
descriptors - is that a big deal?
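
To put numbers on it (sketch; UIO_MAXIOV is 1024 in mainline, and
vhost_test_open() above passes UIO_MAXIOV + 64 as max_descs):

	/* vhost_vq_num_batch_descs() == (1024 + 64) - 1024 == 64 */
	int batch = vq->max_descs - UIO_MAXIOV;

i.e. the batch cache is a fixed 64 descriptors on top of the existing 1K
IOV limit.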


Re: [PATCH 5/6] vdpa: introduce virtio pci driver

2020-06-07 Thread Michael S. Tsirkin
On Fri, Jun 05, 2020 at 04:54:17PM +0800, Jason Wang wrote:
> 
> > On 2020/6/2 3:08 PM, Jason Wang wrote:
> > > 
> > > > +static const struct pci_device_id vp_vdpa_id_table[] = {
> > > > +    { PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
> > > > +    { 0 }
> > > > +};
> > > This looks like it'll create a mess with either virtio pci
> > > or vdpa being loaded at random. Maybe just don't specify
> > > any IDs for now. Down the road we could get a
> > > distinct vendor ID or a range of device IDs for this.
> > 
> > 
> > Right, will do.
> > 
> > Thanks
> 
> 
> Rethinking this: if we don't specify any ID, the binding won't work.

We can bind manually. It's not really for production anyway, so
not a big deal imho.

> How about using a dedicated subsystem vendor id for this?
> 
> Thanks

If virtio vendor id is used then standard driver is expected
to bind, right? Maybe use a dedicated vendor id?
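
E.g. something like this (sketch only; VP_VDPA_SUBSYS_VENDOR is a
hypothetical ID that would have to be reserved first):

	static const struct pci_device_id vp_vdpa_id_table[] = {
		{ PCI_DEVICE_SUB(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID,
				 VP_VDPA_SUBSYS_VENDOR, PCI_ANY_ID) },
		{ 0 }
	};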

-- 
MST


Re: [PATCH] virtio_net: Unregister and re-register xdp_rxq across freeze/restore

2020-06-07 Thread Michael S. Tsirkin
On Fri, Jun 05, 2020 at 02:46:24PM -0700, Sean Christopherson wrote:
> Unregister each queue's xdp_rxq during freeze, and re-register the new
> instance during restore.  All queues are released during free and
> recreated during restore, i.e. the pre-freeze xdp_rxq will be lost.
> 
> The bug is detected by WARNs in xdp_rxq_info_unreg() and
> xdp_rxq_info_unreg_mem_model() that fire after a suspend/resume cycle as
> virtnet_close() attempts to unregister an uninitialized xdp_rxq object.
> 
>   [ cut here ]
>   Driver BUG
>   WARNING: CPU: 0 PID: 880 at net/core/xdp.c:163 xdp_rxq_info_unreg+0x48/0x50
>   Modules linked in:
>   CPU: 0 PID: 880 Comm: ip Not tainted 5.7.0-rc5+ #80
>   Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>   RIP: 0010:xdp_rxq_info_unreg+0x48/0x50
>   Code: <0f> 0b eb ca 0f 1f 40 00 0f 1f 44 00 00 53 48 83 ec 10 8b 47 0c 83
>   RSP: 0018:c91ab540 EFLAGS: 00010286
>   RAX:  RBX: 88827f83ac80 RCX: 
>   RDX: 000a RSI: 8253bc2a RDI: 825397ec
>   RBP:  R08: 8253bc20 R09: 000a
>   R10: c91ab548 R11: 0370 R12: 88817a89c000
>   R13:  R14: c91abbc8 R15: 0001
>   FS:  7f48b70e70c0() GS:88817bc0() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: 7f48b6634950 CR3: 000277f1d002 CR4: 00160eb0
>   Call Trace:
>virtnet_close+0x6a/0xb0
>__dev_close_many+0x91/0x100
>__dev_change_flags+0xc1/0x1c0
>dev_change_flags+0x23/0x60
>do_setlink+0x350/0xdf0
>__rtnl_newlink+0x553/0x860
>rtnl_newlink+0x43/0x60
>rtnetlink_rcv_msg+0x289/0x340
>netlink_rcv_skb+0xd1/0x110
>netlink_unicast+0x203/0x310
>netlink_sendmsg+0x32b/0x460
>sock_sendmsg+0x5b/0x60
>sys_sendmsg+0x23e/0x260
>___sys_sendmsg+0x88/0xd0
>__sys_sendmsg+0x63/0xa0
>do_syscall_64+0x4c/0x170
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
>   [ cut here ]
> 
> Cc: Jesper Dangaard Brouer 
> Fixes: 754b8a21a96d5 ("virtio_net: setup xdp_rxq_info")
> Signed-off-by: Sean Christopherson 
> ---
> 
> Disclaimer: I am not remotely confident that this patch is 100% correct
> or complete, my VirtIO knowledge is poor and my networking knowledge is
> downright abysmal.
> 
>  drivers/net/virtio_net.c | 37 +
>  1 file changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba38765dc490..61055be3615e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1469,6 +1469,21 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>   return received;
>  }
>  
> +static int virtnet_reg_xdp(struct xdp_rxq_info *xdp_rxq,
> +    struct net_device *dev, u32 queue_index)
> +{
> + int err;
> +
> + err = xdp_rxq_info_reg(xdp_rxq, dev, queue_index);
> + if (err < 0)
> + return err;
> +
> + err = xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_SHARED, NULL);
> + if (err < 0)
> + xdp_rxq_info_unreg(xdp_rxq);
> + return err;
> +}
> +
>  static int virtnet_open(struct net_device *dev)
>  {
>   struct virtnet_info *vi = netdev_priv(dev);
> @@ -1480,17 +1495,10 @@ static int virtnet_open(struct net_device *dev)
>   if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>   schedule_delayed_work(&vi->refill, 0);
>  
> - err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i);
> + err = virtnet_reg_xdp(&vi->rq[i].xdp_rxq, dev, i);
>   if (err < 0)
>   return err;
>  
> - err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> -  MEM_TYPE_PAGE_SHARED, NULL);
> - if (err < 0) {
> - xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> - return err;
> - }
> -
>   virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>   virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>   }
> @@ -2306,6 +2314,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  
>   if (netif_running(vi->dev)) {
>   for (i = 0; i < vi->max_queue_pairs; i++) {
> + xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>   napi_disable(&vi->rq[i].napi);
>   virtnet_napi_tx_disable(&vi->sq[i].napi);

I suspect the right thing to do is to first disable all NAPI,
then play with XDP. Generally cleanup in the reverse order
of init is a good idea.
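
I.e. something like this (untested sketch, just reversing virtnet_open()'s
init order):

	for (i = 0; i < vi->max_queue_pairs; i++) {
		napi_disable(&vi->rq[i].napi);
		virtnet_napi_tx_disable(&vi->sq[i].napi);
		/* only tear down XDP state once NAPI is quiesced */
		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
	}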


>   }
> @@ -2313,6 +2322,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  }
>  
>  static int init_vqs(struct virtnet_info *vi);
> +static void virtnet_del_vqs(struct virtnet_info *vi);
> +static void