Re: [PATCH v2 00/14] vDPA shadow virtqueue

2022-02-27 Thread Jason Wang


On 2022/2/27 9:40 PM, Eugenio Pérez wrote:

This series enables shadow virtqueue (SVQ) for vhost-vdpa devices. It
is intended as a new method of tracking the memory the devices touch
during a migration process: instead of relying on the vhost device's
dirty logging capability, SVQ intercepts the VQ dataplane, forwarding
the descriptors between VM and device. This way qemu is the effective
writer of the guest's memory, as in qemu's virtio device operation.

When SVQ is enabled, qemu offers a new virtual address space for the
device to read and write into, and it maps the new vrings and the guest
memory in it. SVQ also intercepts kicks and calls between the device
and the guest. Relaying the used buffers causes the dirty memory to be
tracked.
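
In rough terms, the used buffer relay has the following shape. This is a
sketch only, not the patch's code: svq_pop_used() is a hypothetical helper,
and the actual relay in patch 07 is more involved:

    /* Device completed buffers: commit them to the guest's VirtQueue */
    static void svq_flush_used(VhostShadowVirtqueue *svq)
    {
        uint32_t len;
        VirtQueueElement *elem;

        event_notifier_test_and_clear(&svq->hdev_call);
        while ((elem = svq_pop_used(svq, &len))) { /* hypothetical helper */
            /* qemu writes the used entry, so the page is dirtied by qemu */
            virtqueue_fill(svq->vq, elem, len, 0);
            virtqueue_flush(svq->vq, 1);
            g_free(elem);
        }
        event_notifier_set(&svq->svq_call); /* notify the guest */
    }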

This effectively means that vDPA device passthrough is intercepted by
qemu. While SVQ should only be enabled at migration time, the switching
from regular mode to SVQ mode is left for a future series.

It is based on the ideas of DPDK's SW-assisted LM series
(https://patchwork.dpdk.org/cover/48370/). However, that series does
not map the shadow vq in the guest's VA, but in qemu's.

For qemu to use shadow virtqueues the guest virtio driver must not use
features like event_idx, indirect descriptors, packed ring and in_order.
These features are easy to implement on top of this base, but their
support is left for a future series for simplicity.

SVQ needs to be enabled at qemu start time with the vdpa cmdline parameter:

-netdev type=vhost-vdpa,vhostdev=vhost-vdpa-0,id=vhost-vdpa0,x-svq=off
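
For illustration, a fuller invocation could look like the following
(hypothetical paths and device names; x-svq=on enables the shadow
virtqueue):

    qemu-system-x86_64 ... \
        -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=on \
        -device virtio-net-pci,netdev=vhost-vdpa0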

The first three patches enable notification forwarding with the
assistance of qemu. It's easy to enable only this part if the relevant
cmdline portion of the last patch is applied on top of them.

The next four patches implement the actual buffer forwarding. However,
addresses are not translated from HVA, so they need a host device with
an iommu allowing them to access all of the HVA range.

The last part of the series properly uses the host iommu: qemu creates
a new iova address space in the device's range and translates the
buffers into it. Finally, it adds the cmdline parameter.
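
As a made-up example of that translation (the formula used in patch 10 is
iova = map->iova + (hva - map->translated_addr)): if a mapping records
iova = 0x1000 and translated_addr (HVA) = 0x7f0000000000, a buffer starting
at HVA 0x7f0000000500 is exposed to the device at
IOVA 0x1000 + 0x500 = 0x1500.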

Some simple performance tests were done with netperf, using a nested
guest with vp_vdpa and vhost-kernel at the L0 host. Starting with no SVQ,
the baseline average was ~9980.13 Mbit/s:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    30.01    9910.61
131072  16384  16384    30.00    10030.94
131072  16384  16384    30.01    9998.84

Enabling the notification interception reduced performance to an
average of ~9577.73 Mbit/s:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    30.00    9563.03
131072  16384  16384    30.01    9626.65
131072  16384  16384    30.01    9543.51

Finally, enabling buffer forwarding reduced the throughput again, to
~8902.92 Mbit/s:
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

131072  16384  16384    30.01    8643.19
131072  16384  16384    30.01    9033.56
131072  16384  16384    30.01    9032.02

However, many performance improvements were left out of this series for
simplicity, so the difference in performance should shrink in the future.

Comments are welcome.



The series looks good overall; a few comments in the individual patches.

I think if there's no objection, we can try to make it into 7.0
(soft freeze is 2022-03-08).


Thanks




TODO in future series:
* Event idx, indirect descriptors, packed, and other virtio features.
* To support a different set of features between the device<->SVQ and the
   SVQ<->guest communication.
* Support of device host notifier memory regions.
* To separate buffer forwarding into its own AIO context, so we can
   throw more threads at that task and don't need to stop the main
   event loop.
* Support multiqueue virtio-net vdpa.
* Proper documentation.

Changes from v1:
* Feature set at device->SVQ is now the same as SVQ->guest.
* The SVQ size is no longer the device's maximum available size, but the
   guest's negotiated one.
* Add VHOST_FILE_UNBIND kick and call fd treatment.
* Make SVQ a public struct
* Come back to previous approach to iova-tree
* Some assertions are now fail paths. Some errors are now log_guest.
* Only mask _F_LOG feature at vdpa_set_features svq enable path.
* Refactor some errors and messages. Add missing error unwindings.
* Add memory barrier at _F_NO_NOTIFY set.
* Stop checking for features flags out of transport range.
v1 link:
https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3...@redhat.com/

Changes from v4 RFC:
* Support of allocating / freeing iova ranges in IOVA tree. Extending
   already present iova-tree for that.
* Proper validation of guest features. Now SVQ can negotiate a
   different set of features with the device when enabled.
* Support of host notif

Re: [PATCH v2 11/14] vdpa: Adapt vhost_vdpa_get_vring_base to SVQ

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

This is needed to achieve migration, so the destination can restore its
index.



I suggest duplicating the comment below here.

Thanks



Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-vdpa.c | 17 +
  1 file changed, 17 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 56f9f125cd..accc4024c2 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1180,8 +1180,25 @@ static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
  static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
 struct vhost_vring_state *ring)
  {
+struct vhost_vdpa *v = dev->opaque;
  int ret;
  
+if (v->shadow_vqs_enabled) {

+VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+  ring->index);
+
+/*
+ * Setting base as last used idx, so destination will see as available
+ * all the entries that the device did not use, including the in-flight
+ * processing ones.
+ *
+ * TODO: This is ok for networking, but other kinds of devices might
+ * have problems with these retransmissions.
+ */
+ring->num = svq->last_used_idx;
+return 0;
+}
+
  ret = vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
  trace_vhost_vdpa_get_vring_base(dev, ring->index, ring->num);
  return ret;



Re: [PATCH v2 10/14] vdpa: Add custom IOTLB translations to SVQ

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

Use translations added in VhostIOVATree in SVQ.

Only introduce the usage here, not allocation and deallocation. As with
previous patches, we use the dead code paths of shadow_vqs_enabled to
avoid committing too many changes at once. These paths are impossible to
take at the moment.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.h |   6 +-
  include/hw/virtio/vhost-vdpa.h |   3 +
  hw/virtio/vhost-shadow-virtqueue.c |  76 -
  hw/virtio/vhost-vdpa.c | 128 -
  4 files changed, 187 insertions(+), 26 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 04c67685fd..b2f722d101 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -13,6 +13,7 @@
  #include "qemu/event_notifier.h"
  #include "hw/virtio/virtio.h"
  #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-iova-tree.h"
  
  /* Shadow virtqueue to relay notifications */

  typedef struct VhostShadowVirtqueue {
@@ -43,6 +44,9 @@ typedef struct VhostShadowVirtqueue {
  /* Virtio device */
  VirtIODevice *vdev;
  
+/* IOVA mapping */

+VhostIOVATree *iova_tree;
+
  /* Map for use the guest's descriptors */
  VirtQueueElement **ring_id_maps;
  
@@ -78,7 +82,7 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                     VirtQueue *vq);
  void vhost_svq_stop(VhostShadowVirtqueue *svq);
  
-VhostShadowVirtqueue *vhost_svq_new(void);

+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
  
  void vhost_svq_free(gpointer vq);

  G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 009a9f3b6b..ee8e939ad0 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -14,6 +14,7 @@
  
  #include 
  
+#include "hw/virtio/vhost-iova-tree.h"

  #include "hw/virtio/virtio.h"
  #include "standard-headers/linux/vhost_types.h"
  
@@ -30,6 +31,8 @@ typedef struct vhost_vdpa {

  MemoryListener listener;
  struct vhost_vdpa_iova_range iova_range;
  bool shadow_vqs_enabled;
+/* IOVA mapping used by the Shadow Virtqueue */
+VhostIOVATree *iova_tree;
  GPtrArray *shadow_vqs;
  struct vhost_dev *dev;
  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index a38d313755..7e073773d1 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,7 @@
  #include "hw/virtio/vhost-shadow-virtqueue.h"
  
  #include "qemu/error-report.h"

+#include "qemu/log.h"
  #include "qemu/main-loop.h"
  #include "qemu/log.h"
  #include "linux-headers/linux/vhost.h"
@@ -84,7 +85,58 @@ static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
  }
  }
  
+/**

+ * Translate addresses between qemu's virtual address space and the SVQ IOVA
+ *
+ * @svq    Shadow VirtQueue
+ * @vaddr  Translated IOVA addresses
+ * @iovec  Source qemu's VA addresses
+ * @num    Length of iovec and minimum length of vaddr
+ */
+static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
+ void **addrs, const struct iovec *iovec,
+ size_t num)
+{
+if (num == 0) {
+return true;
+}
+
+for (size_t i = 0; i < num; ++i) {
+DMAMap needle = {
+.translated_addr = (hwaddr)iovec[i].iov_base,
+.size = iovec[i].iov_len,
+};
+size_t off;
+
+const DMAMap *map = vhost_iova_tree_find_iova(svq->iova_tree, &needle);
+/*
+ * Map cannot be NULL since iova map contains all guest space and
+ * qemu already has a physical address mapped
+ */
+if (unlikely(!map)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Invalid address 0x%"HWADDR_PRIx" given by guest",
+  needle.translated_addr);
+return false;
+}
+
+off = needle.translated_addr - map->translated_addr;
+addrs[i] = (void *)(map->iova + off);
+
+if (unlikely(int128_gt(int128_add(needle.translated_addr,
+  iovec[i].iov_len),
+   map->translated_addr + map->size))) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "Guest buffer expands over iova range");
+return false;
+}
+}
+
+return true;
+}
+
  static void vhost_vring_write_descs(VhostShadowVirtqueue *svq,
+void * const *vaddr_sg,



Nit: it looks to me that we are not passing vaddr but iova here, so it
might be better to use "sg"?




  const struct iovec *iovec,

Re: [PATCH v2 09/14] vhost: Add VhostIOVATree

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

This tree is able to look for a translated address from an IOVA address.

At first glance it is similar to util/iova-tree. However, SVQ working on
devices with limited IOVA space needs more capabilities, like allocating
IOVA chunks or performing reverse translations (qemu addresses to iova).

The allocation capability, as in "assign a free IOVA address to this chunk
of memory in qemu's address space", allows the shadow virtqueue to create
a new address space that is not restricted by the guest's addressable one,
so we can allocate the shadow vq vrings outside of it.

It duplicates the tree so it can search efficiently in both directions,
and it will signal an overlap if the iova or the translated address is
already present in either tree.
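
A minimal usage sketch of the API declared below (illustrative only: buf is
assumed to be some qemu-allocated buffer, values are made up, error handling
is elided, and DMAMap sizes follow util/iova-tree's inclusive convention):

    VhostIOVATree *t = vhost_iova_tree_new(0, (1ULL << 30) - 1);
    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)buf, /* qemu VA of a buffer */
        .size = 0x1000 - 1,                        /* inclusive size */
        .perm = IOMMU_RW,
    };

    /* Assigns a free IOVA range to the chunk, returned in map.iova */
    int r = vhost_iova_tree_map_alloc(t, &map);
    assert(r == IOVA_OK);

    /* Reverse translation: qemu VA -> stored mapping with its IOVA */
    const DMAMap *found = vhost_iova_tree_find_iova(t, &map);
    assert(found && found->iova == map.iova);

    vhost_iova_tree_remove(t, &map);
    vhost_iova_tree_delete(t);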

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-iova-tree.h |  27 +++
  hw/virtio/vhost-iova-tree.c | 155 
  hw/virtio/meson.build   |   2 +-
  3 files changed, 183 insertions(+), 1 deletion(-)
  create mode 100644 hw/virtio/vhost-iova-tree.h
  create mode 100644 hw/virtio/vhost-iova-tree.c

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
new file mode 100644
index 00..6a4f24e0f9
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.h
@@ -0,0 +1,27 @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VIRTIO_VHOST_IOVA_TREE_H
+#define HW_VIRTIO_VHOST_IOVA_TREE_H
+
+#include "qemu/iova-tree.h"
+#include "exec/memory.h"
+
+typedef struct VhostIOVATree VhostIOVATree;
+
+VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
+const DMAMap *map);
+int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
+void vhost_iova_tree_remove(VhostIOVATree *iova_tree, const DMAMap *map);
+
+#endif
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
new file mode 100644
index 00..03496ac075
--- /dev/null
+++ b/hw/virtio/vhost-iova-tree.c
@@ -0,0 +1,155 @@
+/*
+ * vhost software live migration iova tree
+ *
+ * SPDX-FileCopyrightText: Red Hat, Inc. 2021
+ * SPDX-FileContributor: Author: Eugenio Pérez 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iova-tree.h"
+#include "vhost-iova-tree.h"
+
+#define iova_min_addr qemu_real_host_page_size
+
+/**
+ * VhostIOVATree, able to:
+ * - Translate iova address
+ * - Reverse translate iova address (from translated to iova)
+ * - Allocate IOVA regions for translated range (linear operation)
+ */
+struct VhostIOVATree {
+/* First addressable iova address in the device */
+uint64_t iova_first;
+
+/* Last addressable iova address in the device */
+uint64_t iova_last;
+
+/* IOVA address to qemu memory maps. */
+IOVATree *iova_taddr_map;
+
+/* QEMU virtual memory address to iova maps */
+GTree *taddr_iova_map;
+};
+
+static gint vhost_iova_tree_cmp_taddr(gconstpointer a, gconstpointer b,
+  gpointer data)
+{
+const DMAMap *m1 = a, *m2 = b;
+
+if (m1->translated_addr > m2->translated_addr + m2->size) {
+return 1;
+}
+
+if (m1->translated_addr + m1->size < m2->translated_addr) {
+return -1;
+}
+
+/* Overlapped */
+return 0;
+}
+
+/**
+ * Create a new IOVA tree
+ *
+ * Returns the new IOVA tree
+ */
+VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
+{
+VhostIOVATree *tree = g_new(VhostIOVATree, 1);
+
+/* Some devices do not like 0 addresses */
+tree->iova_first = MAX(iova_first, iova_min_addr);
+tree->iova_last = iova_last;
+
+tree->iova_taddr_map = iova_tree_new();
+tree->taddr_iova_map = g_tree_new_full(vhost_iova_tree_cmp_taddr, NULL,
+   NULL, g_free);
+return tree;
+}
+
+/**
+ * Delete an iova tree
+ */
+void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+{
+iova_tree_destroy(iova_tree->iova_taddr_map);
+g_tree_unref(iova_tree->taddr_iova_map);
+g_free(iova_tree);
+}
+
+/**
+ * Find the IOVA address stored from a memory address
+ *
+ * @tree The iova tree
+ * @map  The map with the memory address
+ *
+ * Return the stored mapping, or NULL if not found.
+ */
+const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *tree,
+const DMAMap *map)
+{
+return g_tree_lookup(tree->taddr_iova_map, map);
+}
+
+/**
+ * Allocate a new mapping
+ *
+ * @tree  The iova tree
+ * @map   The iova map
+ *
+ * Returns:
+ * - IOVA_OK if the map fits in the container
+ * - IOVA_ERR_INVALID if the map does not make sense 

Re: [PATCH v2 08/14] util: Add iova_tree_alloc

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

This iova tree function allows looking for a hole in allocated regions
and returns a totally new translation for a given translated address.

Its usage is mainly to allow devices to access the qemu address space,
remapping the guest's address space into a new iova space where qemu can
add chunks of addresses.
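
For example, a hypothetical standalone use (iova_tree_new() and
iova_tree_destroy() are the existing util/iova-tree entry points; values
are made up):

    IOVATree *tree = iova_tree_new();
    DMAMap map = {
        .translated_addr = 0x4000,
        .size = 0xfff,              /* 4 KiB, sizes are inclusive */
    };

    /* Find a free hole of map.size within [0x1000, HWADDR_MAX] */
    if (iova_tree_alloc_map(tree, &map, 0x1000, HWADDR_MAX) == IOVA_OK) {
        /* map.iova now holds the allocated IOVA (0x1000 on an empty tree) */
    }
    iova_tree_destroy(tree);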

Signed-off-by: Eugenio Pérez 
Reviewed-by: Peter Xu 
---
  include/qemu/iova-tree.h |  18 ++
  util/iova-tree.c | 133 +++
  2 files changed, 151 insertions(+)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 8249edd764..a623136cd8 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -29,6 +29,7 @@
  #define  IOVA_OK   (0)
  #define  IOVA_ERR_INVALID  (-1) /* Invalid parameters */
  #define  IOVA_ERR_OVERLAP  (-2) /* IOVA range overlapped */
+#define  IOVA_ERR_NOMEM(-3) /* Cannot allocate */
  
  typedef struct IOVATree IOVATree;

  typedef struct DMAMap {
@@ -119,6 +120,23 @@ const DMAMap *iova_tree_find_address(const IOVATree *tree, 
hwaddr iova);
   */
  void iova_tree_foreach(IOVATree *tree, iova_tree_iterator iterator);
  
+/**

+ * iova_tree_alloc:



Should be iova_tree_alloc_map.



+ *
+ * @tree: the iova tree to allocate from
+ * @map: the new map (as translated addr & size) to allocate in the iova region
+ * @iova_begin: the minimum address of the allocation
+ * @iova_end: the maximum addressable direction of the allocation
+ *
+ * Allocates a new region of a given size, between iova_min and iova_max.
+ *
+ * Return: Same as iova_tree_insert, but cannot overlap and can return error if
+ * iova tree is out of free contiguous range. The caller gets the assigned iova
+ * in map->iova.
+ */
+int iova_tree_alloc_map(IOVATree *tree, DMAMap *map, hwaddr iova_begin,
+hwaddr iova_end);
+
  /**
   * iova_tree_destroy:
   *
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 23ea35b7a4..302b01f1cc 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -16,6 +16,39 @@ struct IOVATree {
  GTree *tree;
  };
  
+/* Args to pass to iova_tree_alloc foreach function. */

+struct IOVATreeAllocArgs {
+/* Size of the desired allocation */
+size_t new_size;
+
+/* The minimum address allowed in the allocation */
+hwaddr iova_begin;
+
+/* Map at the left of the hole, can be NULL if "this" is first one */
+const DMAMap *prev;
+
+/* Map at the right of the hole, can be NULL if "prev" is the last one */
+const DMAMap *this;
+
+/* If found, we fill in the IOVA here */
+hwaddr iova_result;
+
+/* Whether we have found a valid IOVA */
+bool iova_found;
+};
+
+/**
+ * Iterate args to the next hole
+ *
+ * @args  The alloc arguments
+ * @next  The next mapping in the tree. Can be NULL to signal the last one
+ */
+static void iova_tree_alloc_args_iterate(struct IOVATreeAllocArgs *args,
+ const DMAMap *next) {
+args->prev = args->this;
+args->this = next;
+}
+
  static int iova_tree_compare(gconstpointer a, gconstpointer b, gpointer data)
  {
  const DMAMap *m1 = a, *m2 = b;
@@ -107,6 +140,106 @@ int iova_tree_remove(IOVATree *tree, const DMAMap *map)
  return IOVA_OK;
  }
  
+/**

+ * Try to find an unallocated IOVA range between prev and this elements.
+ *
+ * @args Arguments to allocation
+ *
+ * Cases:
+ *
+ * (1) !prev, !this: No entries allocated, always succeed
+ *
+ * (2) !prev, this: We're iterating at the 1st element.
+ *
+ * (3) prev, !this: We're iterating at the last element.
+ *
+ * (4) prev, this: this is the most common case, we'll try to find a hole
+ * between "prev" and "this" mapping.
+ *
+ * Note that this function assumes the last valid iova is HWADDR_MAX, but it
+ * searches linearly so it's easy to discard the result if it's not the case.
+ */
+static void iova_tree_alloc_map_in_hole(struct IOVATreeAllocArgs *args)
+{
+const DMAMap *prev = args->prev, *this = args->this;
+uint64_t hole_start, hole_last;
+
+if (this && this->iova + this->size < args->iova_begin) {
+return;
+}
+
+hole_start = MAX(prev ? prev->iova + prev->size + 1 : 0, args->iova_begin);
+hole_last = this ? this->iova : HWADDR_MAX;



Do we need to use iova_last instead of HWADDR_MAX?



+
+if (hole_last - hole_start > args->new_size) {
+args->iova_result = hole_start;
+args->iova_found = true;
+}
+}
+
+/**
+ * Foreach dma node in the tree, compare if there is a hole with its previous
+ * node (or minimum iova address allowed) and the node.
+ *
+ * @key   Node iterating
+ * @value Node iterating
+ * @pargs Struct to communicate with the outside world
+ *
+ * Return: false to keep iterating, true if it needs to break.
+ */
+static gboolean iova_tree_alloc_traverse(gpointer key, gpointer value,
+ gpointer pargs)
+{
+struct IOVATreeAllocArgs *args = pargs;
+DMAMap *node = value;
+
+  

Re: [PATCH v2 07/14] vhost: Shadow virtqueue buffers forwarding

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

Initial version of the shadow virtqueue that actually forwards buffers.
There is no iommu support at the moment, and that will be addressed in
future patches of this series. Since all vhost-vdpa devices use a forced
IOMMU, this means that SVQ is not usable at this point of the series on
any device.

For simplicity it only supports modern devices, which expect a vring in
little endian, with a split ring and no event idx or indirect
descriptors. Support for them will not be added in this series.

It reuses the VirtQueue code for the device part. The driver part is
based on Linux's virtio_ring driver, but with stripped functionality
and optimizations so it's easier to review.

However, buffer forwarding has some particular pieces: one of the most
unexpected is that a guest's buffer can expand through more than one
descriptor in SVQ. While this is handled gracefully by qemu's emulated
virtio devices, it may cause an unexpected SVQ queue-full condition. This
patch also solves that by checking for the condition at both the guest's
kicks and the device's calls. The code may be more elegant in the future
if the SVQ code runs in its own iocontext.
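
Schematically, the kick path has this shape (a sketch of the approach, not
the patch's exact code; vhost_svq_add() stands in for the routine that
exposes the element to the device):

    /* Guest kicked: move available guest buffers to the device's vring */
    for (;;) {
        VirtQueueElement *elem;

        if (svq->next_guest_avail_elem) {
            /* Retry the element stashed on a previous queue-full */
            elem = g_steal_pointer(&svq->next_guest_avail_elem);
        } else {
            elem = virtqueue_pop(svq->vq, sizeof(*elem));
        }
        if (!elem) {
            break;
        }

        if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
            /* Queue full: stash it and resume from the device's call */
            svq->next_guest_avail_elem = elem;
            break;
        }
        vhost_svq_add(svq, elem);
    }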

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.h |  29 +++
  hw/virtio/vhost-shadow-virtqueue.c | 360 -
  hw/virtio/vhost-vdpa.c | 165 -
  3 files changed, 542 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 3bbea77082..04c67685fd 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -36,6 +36,33 @@ typedef struct VhostShadowVirtqueue {
  
  /* Guest's call notifier, where the SVQ calls guest. */

  EventNotifier svq_call;
+
+/* Virtio queue shadowing */
+VirtQueue *vq;
+
+/* Virtio device */
+VirtIODevice *vdev;
+
+/* Map for use the guest's descriptors */
+VirtQueueElement **ring_id_maps;
+
+/* Next VirtQueue element that guest made available */
+VirtQueueElement *next_guest_avail_elem;
+
+/* Next head to expose to the device */
+uint16_t avail_idx_shadow;
+
+/* Next free descriptor */
+uint16_t free_head;
+
+/* Last seen used idx */
+uint16_t shadow_used_idx;



I suggest having a consistent name for avail and used instead of using
"shadow" as a prefix for one and a suffix for the other.




+
+/* Next head to consume from the device */
+uint16_t last_used_idx;
+
+/* Cache for the exposed notification flag */
+bool notification;
  } VhostShadowVirtqueue;
  
  bool vhost_svq_valid_features(uint64_t *features);

@@ -47,6 +74,8 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
  size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq);
  size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq);
  
+void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,

+ VirtQueue *vq);
  void vhost_svq_stop(VhostShadowVirtqueue *svq);
  
  VhostShadowVirtqueue *vhost_svq_new(void);

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 2150e2b071..a38d313755 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -12,6 +12,7 @@
  
  #include "qemu/error-report.h"

  #include "qemu/main-loop.h"
+#include "qemu/log.h"
  #include "linux-headers/linux/vhost.h"
  
  /**

@@ -53,22 +54,309 @@ bool vhost_svq_valid_features(uint64_t *features)
  return r;
  }
  
-/** Forward guest notifications */

-static void vhost_handle_guest_kick(EventNotifier *n)
+/**
+ * Number of descriptors that the SVQ can make available from the guest.
+ *
+ * @svq   The svq



Btw, I notice that in most functions there is a colon between the
parameter and its documentation. Maybe we should follow that.




+ */
+static uint16_t vhost_svq_available_slots(const VhostShadowVirtqueue *svq)
+{
+return svq->vring.num - (svq->avail_idx_shadow - svq->shadow_used_idx);
+}
+
+static void vhost_svq_set_notification(VhostShadowVirtqueue *svq, bool enable)
+{
+uint16_t notification_flag;
+
+if (svq->notification == enable) {
+return;
+}
+
+notification_flag = cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+
+svq->notification = enable;
+if (enable) {
+svq->vring.avail->flags &= ~notification_flag;
+} else {
+svq->vring.avail->flags |= notification_flag;
+/* Make sure the flag is written before the read of used_idx */
+smp_mb();



So the comment assumes that a read of used_idx will follow. This makes
me wonder if we can simply split this function into:


vhost_svq_disable_notification() and vhost_svq_enable_notification()

and in vhost_svq_enable_notification, we simply return
vhost_svq_more_used() after the smp_mb().


(Not a must, but I feel it might be clearer.)
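
Something like this, as a sketch of that suggestion (vhost_svq_more_used()
is assumed to compare shadow_used_idx against the device's used index):

    static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
    {
        svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
        svq->notification = false;
    }

    /* Returns whether used buffers arrived while notifications were off */
    static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
    {
        svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
        svq->notification = true;
        /* Make sure the flag is written before the read of used_idx */
        smp_mb();
        return vhost_svq_more_used(svq);
    }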



+}
+}
+
+static void vhost_vring_write_descs(VhostShadowVirtqueue *svq

Re: [PATCH v2 06/14] vdpa: adapt vhost_ops callbacks to svq

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

First half of the buffer forwarding part, preparing the vhost-vdpa
callbacks to offer it to SVQ. QEMU cannot enable it at this moment, so
this is effectively dead code, but it helps to reduce the patch size.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-vdpa.c | 84 --
  1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index d614c435f3..b2c4e92fcf 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -344,6 +344,16 @@ static bool vhost_vdpa_one_time_request(struct vhost_dev *dev)
  return v->index != 0;
  }
  
+static int vhost_vdpa_get_dev_features(struct vhost_dev *dev,

+   uint64_t *features)
+{
+int ret;
+
+ret = vhost_vdpa_call(dev, VHOST_GET_FEATURES, features);
+trace_vhost_vdpa_get_features(dev, *features);
+return ret;
+}
+
  static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 Error **errp)
  {
@@ -356,7 +366,7 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
  return 0;
  }
  
-r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);

+r = vhost_vdpa_get_dev_features(hdev, &dev_features);
  if (r != 0) {
  error_setg_errno(errp, -r, "Can't get vdpa device features");
  return r;
@@ -583,12 +593,26 @@ static int vhost_vdpa_set_mem_table(struct vhost_dev *dev,
  static int vhost_vdpa_set_features(struct vhost_dev *dev,
 uint64_t features)
  {
+struct vhost_vdpa *v = dev->opaque;
  int ret;
  
  if (vhost_vdpa_one_time_request(dev)) {

  return 0;
  }
  
+if (v->shadow_vqs_enabled) {

+uint64_t features_ok = features;
+bool ok;
+
+ok = vhost_svq_valid_features(&features_ok);
+if (unlikely(!ok)) {
+error_report(
+"Invalid guest acked feature flag, acked: 0x%"
+PRIx64", ok: 0x%"PRIx64, features, features_ok);
+return -EINVAL;
+}
+}
+
  trace_vhost_vdpa_set_features(dev, features);
  ret = vhost_vdpa_call(dev, VHOST_SET_FEATURES, &features);
  if (ret) {
@@ -735,6 +759,13 @@ static int vhost_vdpa_get_config(struct vhost_dev *dev, uint8_t *config,
  return ret;
   }
  
+static int vhost_vdpa_set_dev_vring_base(struct vhost_dev *dev,

+ struct vhost_vring_state *ring)
+{
+trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
+return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+}
+
  static int vhost_vdpa_set_vring_dev_kick(struct vhost_dev *dev,
   struct vhost_vring_file *file)
  {
@@ -749,6 +780,18 @@ static int vhost_vdpa_set_vring_dev_call(struct vhost_dev *dev,
  return vhost_vdpa_call(dev, VHOST_SET_VRING_CALL, file);
  }
  
+static int vhost_vdpa_set_vring_dev_addr(struct vhost_dev *dev,

+ struct vhost_vring_addr *addr)
+{
+trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
+addr->desc_user_addr, addr->used_user_addr,
+addr->avail_user_addr,
+addr->log_guest_addr);
+
+return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+
+}
+
  /**
   * Set the shadow virtqueue descriptors to the device
   *
@@ -859,11 +902,17 @@ static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
  static int vhost_vdpa_set_vring_addr(struct vhost_dev *dev,
 struct vhost_vring_addr *addr)
  {
-trace_vhost_vdpa_set_vring_addr(dev, addr->index, addr->flags,
-addr->desc_user_addr, addr->used_user_addr,
-addr->avail_user_addr,
-addr->log_guest_addr);
-return vhost_vdpa_call(dev, VHOST_SET_VRING_ADDR, addr);
+struct vhost_vdpa *v = dev->opaque;
+
+if (v->shadow_vqs_enabled) {
+/*
+ * Device vring addr was set at device start. SVQ base is handled by
+ * VirtQueue code.
+ */
+return 0;
+}
+
+return vhost_vdpa_set_vring_dev_addr(dev, addr);
  }
  
  static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,

@@ -876,8 +925,17 @@ static int vhost_vdpa_set_vring_num(struct vhost_dev *dev,
  static int vhost_vdpa_set_vring_base(struct vhost_dev *dev,
 struct vhost_vring_state *ring)
  {
-trace_vhost_vdpa_set_vring_base(dev, ring->index, ring->num);
-return vhost_vdpa_call(dev, VHOST_SET_VRING_BASE, ring);
+struct vhost_vdpa *v = dev->opaque;
+
+if (v->shadow_vqs_enabled) {
+/*
+ * Device vring base was set at device start. SVQ base is handled

Re: [PATCH v2 04/14] vhost: Add vhost_svq_valid_features to shadow vq

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

This allows SVQ to negotiate features with the guest and the device. For
the device, SVQ is a driver. While this function bypasses all
non-transport features, it needs to disable the features that SVQ does
not support when forwarding buffers. These include the packed vq layout,
indirect descriptors and event idx.

Future changes can add support to offer more features to the guest,
since the use of VirtQueue gives this for free. This is left out at the
moment for simplicity.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.h |  2 ++
  hw/virtio/vhost-shadow-virtqueue.c | 39 ++
  hw/virtio/vhost-vdpa.c | 18 ++
  3 files changed, 59 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 1d4c160d0a..84747655ad 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -33,6 +33,8 @@ typedef struct VhostShadowVirtqueue {
  EventNotifier svq_call;
  } VhostShadowVirtqueue;
  
+bool vhost_svq_valid_features(uint64_t *features);

+
  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
  void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int 
call_fd);
  
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c

index 54c701a196..34354aea2c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -14,6 +14,45 @@
  #include "qemu/main-loop.h"
  #include "linux-headers/linux/vhost.h"
  
+/**

+ * Validate the transport device features that both the guest can use with
+ * the SVQ and the SVQ can use with the device.
+ *
+ * @dev_features  The features. On success, the acknowledged features; on
+ *                failure, the minimal set from it.
+ *
+ * Returns true if SVQ can go with a subset of these, false otherwise.
+ */
+bool vhost_svq_valid_features(uint64_t *features)
+{
+bool r = true;
+
+for (uint64_t b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END;
+ ++b) {
+switch (b) {
+case VIRTIO_F_ANY_LAYOUT:
+continue;
+
+case VIRTIO_F_ACCESS_PLATFORM:
+/* SVQ trusts the host's IOMMU to translate addresses */
+case VIRTIO_F_VERSION_1:
+/* SVQ trusts that the guest vring is little endian */
+if (!(*features & BIT_ULL(b))) {
+set_bit(b, features);
+r = false;
+}
+continue;



It looks to me like *features is only used for logging errors; if that is
the case, I suggest doing error_setg in this function instead of letting
the caller pass a pointer.
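
I.e., roughly (an untested sketch of the suggested signature change):

    static bool vhost_svq_valid_features(uint64_t features, Error **errp)
    {
        uint64_t b;

        for (b = VIRTIO_TRANSPORT_F_START; b <= VIRTIO_TRANSPORT_F_END; ++b) {
            switch (b) {
            case VIRTIO_F_ANY_LAYOUT:
                continue;
            case VIRTIO_F_ACCESS_PLATFORM:
            case VIRTIO_F_VERSION_1:
                if (!(features & BIT_ULL(b))) {
                    error_setg(errp, "SVQ needs feature bit %" PRIu64, b);
                    return false;
                }
                continue;
            default:
                if (features & BIT_ULL(b)) {
                    error_setg(errp, "SVQ can't take feature bit %" PRIu64, b);
                    return false;
                }
            }
        }
        return true;
    }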




+
+default:
+if (*features & BIT_ULL(b)) {
+clear_bit(b, features);
+}



Do we need to check indirect and event idx here?

Thanks



+}
+}
+
+return r;
+}
+
  /** Forward guest notifications */
  static void vhost_handle_guest_kick(EventNotifier *n)
  {
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index c73215751d..d614c435f3 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -348,11 +348,29 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 Error **errp)
  {
  g_autoptr(GPtrArray) shadow_vqs = NULL;
+uint64_t dev_features, svq_features;
+int r;
+bool ok;
  
  if (!v->shadow_vqs_enabled) {

  return 0;
  }
  
+r = hdev->vhost_ops->vhost_get_features(hdev, &dev_features);

+if (r != 0) {
+error_setg_errno(errp, -r, "Can't get vdpa device features");
+return r;
+}
+
+svq_features = dev_features;
+ok = vhost_svq_valid_features(&svq_features);
+if (unlikely(!ok)) {
+error_setg(errp,
+"SVQ Invalid device feature flags, offer: 0x%"PRIx64", ok: 
0x%"PRIx64,
+dev_features, svq_features);
+return -1;
+}
+
  shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
  for (unsigned n = 0; n < hdev->nvqs; ++n) {
  g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new();



Re: [PATCH v2 03/14] vhost: Add Shadow VirtQueue call forwarding capabilities

2022-02-27 Thread Jason Wang


On 2022/2/27 9:41 PM, Eugenio Pérez wrote:

This will make qemu aware of the device's used buffers, allowing it to
write the guest memory with their contents if needed.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.h |  4 
  hw/virtio/vhost-shadow-virtqueue.c | 34 ++
  hw/virtio/vhost-vdpa.c | 31 +--
  3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 1cbc87d5d8..1d4c160d0a 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -28,9 +28,13 @@ typedef struct VhostShadowVirtqueue {
   * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
   */
  EventNotifier svq_kick;
+
+/* Guest's call notifier, where the SVQ calls guest. */
+EventNotifier svq_call;
  } VhostShadowVirtqueue;
  
  void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);

+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd);
  
  void vhost_svq_stop(VhostShadowVirtqueue *svq);
  
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c

index a5d0659f86..54c701a196 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -23,6 +23,38 @@ static void vhost_handle_guest_kick(EventNotifier *n)
  event_notifier_set(&svq->hdev_kick);
  }
  
+/* Forward vhost notifications */

+static void vhost_svq_handle_call(EventNotifier *n)
+{
+VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+ hdev_call);
+event_notifier_test_and_clear(n);
+event_notifier_set(&svq->svq_call);
+}
+
+/**
+ * Set the call notifier for the SVQ to call the guest
+ *
+ * @svq Shadow virtqueue
+ * @call_fd call notifier
+ *
+ * Called on BQL context.
+ */
+void vhost_svq_set_guest_call_notifier(VhostShadowVirtqueue *svq, int call_fd)



I think we need to have consistent naming for both kick and call. Note 
that in patch 2 we had


vhost_svq_set_svq_kick_fd

Maybe it's better to use vhost_svq_set_guest_call_fd() here.



+{
+if (call_fd == VHOST_FILE_UNBIND) {
+/*
+ * Fail event_notifier_set if called handling device call.
+ *
+ * SVQ still needs device notifications, since it needs to keep
+ * forwarding used buffers even with the unbind.
+ */
+memset(&svq->svq_call, 0, sizeof(svq->svq_call));



I may have missed something, but shouldn't we stop polling svq_call here,
like

event_notifier_set_handler(&svq->svq_call, NULL);

?

Thanks



Re: [PATCH v2 02/14] vhost: Add Shadow VirtQueue kick forwarding capabilities

2022-02-27 Thread Jason Wang


On 2022/2/27 9:40 PM, Eugenio Pérez wrote:

In this mode no buffer forwarding is performed: qemu just forwards the
guest's kicks to the device.

Host memory notifier regions are left out for simplicity; they will
not be addressed in this series.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-shadow-virtqueue.h |  14 +++
  include/hw/virtio/vhost-vdpa.h |   4 +
  hw/virtio/vhost-shadow-virtqueue.c |  52 +++
  hw/virtio/vhost-vdpa.c | 145 -
  4 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index f1519e3c7b..1cbc87d5d8 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -18,8 +18,22 @@ typedef struct VhostShadowVirtqueue {
  EventNotifier hdev_kick;
  /* Shadow call notifier, sent to vhost */
  EventNotifier hdev_call;
+
+/*
+ * Borrowed virtqueue's guest to host notifier. To borrow it in this event
+ * notifier allows to recover the VhostShadowVirtqueue from the event loop
+ * easily. If we use the VirtQueue's one, we don't have an easy way to
+ * retrieve VhostShadowVirtqueue.
+ *
+ * So shadow virtqueue must not clean it, or we would lose VirtQueue one.
+ */
+EventNotifier svq_kick;
  } VhostShadowVirtqueue;
  
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);

+
+void vhost_svq_stop(VhostShadowVirtqueue *svq);
+
  VhostShadowVirtqueue *vhost_svq_new(void);
  
  void vhost_svq_free(gpointer vq);

diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index 3ce79a646d..009a9f3b6b 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -12,6 +12,8 @@
  #ifndef HW_VIRTIO_VHOST_VDPA_H
  #define HW_VIRTIO_VHOST_VDPA_H
  
+#include 

+
  #include "hw/virtio/virtio.h"
  #include "standard-headers/linux/vhost_types.h"
  
@@ -27,6 +29,8 @@ typedef struct vhost_vdpa {

  bool iotlb_batch_begin_sent;
  MemoryListener listener;
  struct vhost_vdpa_iova_range iova_range;
+bool shadow_vqs_enabled;
+GPtrArray *shadow_vqs;
  struct vhost_dev *dev;
  VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
  } VhostVDPA;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index 019cf1950f..a5d0659f86 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -11,6 +11,56 @@
  #include "hw/virtio/vhost-shadow-virtqueue.h"
  
  #include "qemu/error-report.h"

+#include "qemu/main-loop.h"
+#include "linux-headers/linux/vhost.h"
+
+/** Forward guest notifications */
+static void vhost_handle_guest_kick(EventNotifier *n)
+{
+VhostShadowVirtqueue *svq = container_of(n, VhostShadowVirtqueue,
+ svq_kick);
+event_notifier_test_and_clear(n);
+event_notifier_set(&svq->hdev_kick);
+}
+
+/**
+ * Set a new file descriptor for the guest to kick the SVQ and notify for avail
+ *
+ * @svq  The svq
+ * @svq_kick_fd  The svq kick fd
+ *
+ * Note that the SVQ will never close the old file descriptor.
+ */
+void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
+{
+EventNotifier *svq_kick = &svq->svq_kick;
+bool poll_stop = VHOST_FILE_UNBIND != event_notifier_get_fd(svq_kick);



I wonder if this is robust. E.g. is there any chance we may end up with
both poll_stop and poll_start being false?

If not, can we simply detect poll_stop as below and treat !poll_start
as poll_stop?


Other looks good.

Thanks



+bool poll_start = svq_kick_fd != VHOST_FILE_UNBIND;
+
+if (poll_stop) {
+event_notifier_set_handler(svq_kick, NULL);
+}
+
+/*
+ * event_notifier_set_handler already checks for guest's notifications if
+ * they arrive at the new file descriptor in the switch, so there is no
+ * need to explicitly check for them.
+ */
+if (poll_start) {
+event_notifier_init_fd(svq_kick, svq_kick_fd);
+event_notifier_set(svq_kick);
+event_notifier_set_handler(svq_kick, vhost_handle_guest_kick);
+}
+}
+
+/**
+ * Stop the shadow virtqueue operation.
+ * @svq Shadow Virtqueue
+ */
+void vhost_svq_stop(VhostShadowVirtqueue *svq)
+{
+event_notifier_set_handler(&svq->svq_kick, NULL);
+}
  
  /**

   * Creates vhost shadow virtqueue, and instructs the vhost device to use the
@@ -39,6 +89,7 @@ VhostShadowVirtqueue *vhost_svq_new(void)
  goto err_init_hdev_call;
  }
  
+event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);

  return g_steal_pointer(&svq);
  
  err_init_hdev_call:

@@ -56,6 +107,7 @@ err_init_hdev_kick:
  void vhost_svq_free(gpointer pvq)
  {
  VhostShadowVirtqueue *vq = pvq;
+vhost_svq_stop(vq);
  event_notifier_cleanup(&vq->hdev_kick);
  event_notifier_cleanup(&vq->hdev_call);
  g_free(vq);
diff --git a/hw/virtio/vhost-vdpa.

Re: [PATCH v2 00/14] vDPA shadow virtqueue

2022-02-27 Thread Jason Wang
On Sun, Feb 27, 2022 at 9:42 PM Eugenio Pérez  wrote:
> [...]
>
> However, many performance improvements were left out of this series for
> simplicity, so the difference in performance should shrink in the future.

I think the performance should be acceptable as a start.

>
> Comments are welcome.
>
> TODO in future series:
> * Event, indirect, packed, and others features of virtio.
> * To support different set of features between the device<->SVQ and the
>   SVQ<->guest communication.
> * Support of device host notifier memory regions.
> * To sepparate buffers forwarding in its own AIO context, so we can
>   throw more threads to that task and we don't need to stop the main
>   event loop.
> * Support multiqueue virtio-net vdpa.
> * Proper documentation.
>
> Changes from v1:
> * Feature set at device->SVQ is now the same as SVQ->guest.
> * Size of SVQ is not max available device size anymore, but guest's
>   negotiated.
> * Add VHOST_FILE_UNBIND kick and call fd treatment.
> * Make SVQ a public struct
> * Come back to previous approach to iova-tree
> * Some assertions are now fail paths. Some errors are now log_guest.
> * Only mask _F_LOG feature at vdpa_set_features svq enable path.
> * Refactor some errors and messages. Add missing error unwindings.
> * Add memory barrier at _F_NO_NOTIFY set.
> * Stop checking for features flags out of transport range.
> v1 link:
> https://lore.kernel.org/virtualization/7d86c715-6d71-8a27-91f5-8d47b71e3...@redhat.com/
>
> Changes from v4 RFC:
> * Support of allocating / freeing iova ranges in IOVA tree. Extending
>   already present iova-tree for that.
> * Proper validation of guest features. Now SVQ c

[PATCH] VMCI: Update maintainers for VMCI

2022-02-27 Thread Jorgen Hansen
Remove myself as maintainer for the VMCI driver, and add Bryan
and Rajesh.

Acked-by: Rajesh Jalisatgi 
Acked-by: Bryan Tan 
Acked-by: Vishnu Dasa 
Signed-off-by: Jorgen Hansen 
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index bb6c9b5a3253..ecf22b62161e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20717,7 +20717,8 @@ S:  Supported
 F: drivers/ptp/ptp_vmw.c
 
 VMWARE VMCI DRIVER
-M: Jorgen Hansen 
+M: Bryan Tan 
+M: Rajesh Jalisatgi 
 M: Vishnu Dasa 
 L: linux-ker...@vger.kernel.org
 L: pv-driv...@vmware.com (private)
-- 
2.25.1



RE: [PATCH v3 04/11] hv: Use driver_set_override() instead of open-coding

2022-02-27 Thread Michael Kelley (LINUX) via Virtualization
From: Krzysztof Kozlowski  Sent: Sunday, February 27, 2022 5:52 AM
> 
> Use a helper for setting driver_override to reduce the amount of
> duplicated code. Make the driver_override field const char, because it
> is not modified by the core and it matches other subsystems.
> 
> Signed-off-by: Krzysztof Kozlowski 
> ---
>  drivers/hv/vmbus_drv.c | 28 
>  include/linux/hyperv.h |  7 ++-
>  2 files changed, 10 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 12a2b37e87f3..a0ff4139c3d2 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -575,31 +575,11 @@ static ssize_t driver_override_store(struct device *dev,
>const char *buf, size_t count)
>  {
>   struct hv_device *hv_dev = device_to_hv_device(dev);
> - char *driver_override, *old, *cp;
> -
> - /* We need to keep extra room for a newline */
> - if (count >= (PAGE_SIZE - 1))
> - return -EINVAL;
> -
> - driver_override = kstrndup(buf, count, GFP_KERNEL);
> - if (!driver_override)
> - return -ENOMEM;
> -
> - cp = strchr(driver_override, '\n');
> - if (cp)
> - *cp = '\0';
> -
> - device_lock(dev);
> - old = hv_dev->driver_override;
> - if (strlen(driver_override)) {
> - hv_dev->driver_override = driver_override;
> - } else {
> - kfree(driver_override);
> - hv_dev->driver_override = NULL;
> - }
> - device_unlock(dev);
> + int ret;
> 
> - kfree(old);
> + ret = driver_set_override(dev, &hv_dev->driver_override, buf, count);
> + if (ret)
> + return ret;
> 
>   return count;
>  }
> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> index fe2e0179ed51..beea11874be2 100644
> --- a/include/linux/hyperv.h
> +++ b/include/linux/hyperv.h
> @@ -1257,7 +1257,12 @@ struct hv_device {
>   u16 device_id;
> 
>   struct device device;
> - char *driver_override; /* Driver name to force a match */
> + /*
> +  * Driver name to force a match.
> +  * Do not set directly, because core frees it.
> +  * Use driver_set_override() to set or clear it.
> +  */
> + const char *driver_override;
> 
>   struct vmbus_channel *channel;
>   struct kset  *channels_kset;
> --
> 2.32.0

Reviewed-by: Michael Kelley 
