Re: [PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-23 Thread Jonah Palmer




On 5/23/24 6:47 AM, Eugenio Perez Martin wrote:

On Thu, May 23, 2024 at 12:30 PM Jonah Palmer  wrote:




On 5/22/24 12:07 PM, Eugenio Perez Martin wrote:

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.

The goal of the virtqueue_ordered_fill operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
now-used element, set its length, and mark the element as filled in
the VirtQueue's used_elems array.

By marking the element as filled, it will indicate that this element has
been processed and is ready to be flushed, so long as the element is
in-order.

Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio.c | 36 +++-
   1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7456d61bc8..01b6b32460 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
   vq->used_elems[idx].ndescs = elem->ndescs;
   }

+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i, steps, max_steps;
+
+i = vq->used_idx;
+steps = 0;
+/*
+ * We shouldn't need to increase 'i' by more than the distance
+ * between used_idx and last_avail_idx.
+ */
+max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
+% vq->vring.num;


I may be missing something, but (+vq->vring.num) is redundant if we (%
vq->vring.num), isn't it?



It ensures the result is always non-negative (e.g. when
vq->last_avail_idx < vq->used_idx).

I wasn't sure how different platforms or compilers would handle
something like -5 % 10, so to be safe I included the '+ vq->vring.num'.

For example, on my system, in test.c;

 #include <stdio.h>

 int main() {
 unsigned int result = -5 % 10;
 printf("Result of -5 %% 10 is: %d\n", result);
 return 0;
 }

# gcc -o test test.c

# ./test
Result of -5 % 10 is: -5



I think the modulo is being done in signed ints in your test, and then
converting a signed int to an unsigned int. Like result = (-5 % 10).

The unsigned wrap is always defined in C, and vq->last_avail_idx and
vq->used_idx are both unsigned. Here is a closer test:
int main(void) {
 unsigned int a = -5, b = 2;
 unsigned int result = (b-a) % 10;
 printf("Result of -5 %% 10 is: %u\n", result);
 return 0;
}

But it is a good catch for signed ints for sure :).

Thanks!



Ah, I see now! Thanks for the clarification. In that case, I'll remove 
the '+ vq->vring.num' in v3.
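
To make the point about unsigned wraparound concrete, here is a minimal
sketch. The helper names are hypothetical and the operands are assumed to be
plain unsigned int values smaller than the ring size; nothing here is taken
from virtio.c. It compares the biased form used in v2 with the form that
relies on wraparound:

```c
#include <stdint.h>

/*
 * Distance from 'tail' to 'head' on a ring of 'num' entries, biased by
 * 'num' so the intermediate value never drops below zero (the v2 form).
 */
static unsigned int ring_distance_biased(unsigned int head, unsigned int tail,
                                         unsigned int num)
{
    return (head + num - tail) % num;
}

/*
 * Same distance, relying on unsigned wraparound. The subtraction is
 * reduced modulo UINT_MAX + 1, so this matches the biased form whenever
 * 'num' divides UINT_MAX + 1, i.e. whenever 'num' is a power of two.
 */
static unsigned int ring_distance_wrapped(unsigned int head, unsigned int tail,
                                          unsigned int num)
{
    return (head - tail) % num;
}
```

With num = 8, head = 2 and tail = 5, both helpers return 5. With a ring size
that is not a power of two the two forms can disagree, so the simplification
is only safe under the power-of-two assumption.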



+
+/* Search for element in vq->used_elems */
+while (steps <= max_steps) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].in_order_filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+steps += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+


Let's report an error if we finish the loop. I think:
qemu_log_mask(LOG_GUEST_ERROR,
"%s: %s cannot fill buffer id %u\n",
__func__, vdev->name, elem->index);

(or similar) should do.

Apart from that,

Reviewed-by: Eugenio Pérez 



Gotcha. Will add this in v3.

Thank you Eugenio!
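
A rough sketch of what a bounded search with an error report could look like
is given below. It is not the v3 code. SketchElem, SketchQueue and
sketch_ordered_fill are hypothetical stand-ins for the QEMU structures, and
fprintf(stderr, ...) stands in for qemu_log_mask(LOG_GUEST_ERROR, ...) so the
example compiles on its own:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the QEMU structures. */
typedef struct {
    unsigned int index;          /* descriptor head id */
    unsigned int len;            /* bytes written by the device */
    unsigned int ndescs;         /* descriptors in this chain */
    bool in_order_filled;
} SketchElem;

typedef struct {
    unsigned int num;            /* ring size */
    unsigned int used_idx;       /* oldest element not yet flushed */
    unsigned int last_avail_idx; /* slot the next pop would use */
    SketchElem *used_elems;
} SketchQueue;

/* Returns true if the element was found and marked as filled. */
static bool sketch_ordered_fill(SketchQueue *vq, unsigned int elem_index,
                                unsigned int len)
{
    unsigned int i = vq->used_idx;
    unsigned int steps = 0;
    unsigned int max_steps =
        (vq->last_avail_idx + vq->num - vq->used_idx) % vq->num;

    while (steps <= max_steps) {
        /* Read the chain length before 'i' moves. */
        unsigned int ndescs = vq->used_elems[i].ndescs;

        if (vq->used_elems[i].index == elem_index) {
            vq->used_elems[i].len = len;
            vq->used_elems[i].in_order_filled = true;
            return true;
        }
        if (ndescs == 0) {
            break;               /* a zero-length chain would never advance */
        }
        steps += ndescs;
        i = (i + ndescs) % vq->num;  /* stays in range for any ndescs */
    }

    /* Stand-in for qemu_log_mask(LOG_GUEST_ERROR, ...). */
    fprintf(stderr, "%s: cannot fill buffer id %u\n", __func__, elem_index);
    return false;
}
```

The loop condition mirrors the v2 hunk. Two choices are specific to this
sketch: the step count uses the chain length read before the cursor moves
(the v2 hunk reads ndescs after 'i' has been advanced), and the cursor is
reduced with a modulo so an oversized ndescs cannot push it out of range.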


   static void virtqueue_packed_fill_desc(VirtQueue *vq,
  const VirtQueueElement *elem,
  unsigned int idx,
@@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
   return;
   }

-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
   virtqueue_packed_fill(vq, elem, len, idx);
   } else {
   virtqueue_split_fill(vq, elem, len, idx);
--
2.39.3











Re: [PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-23 Thread Jonah Palmer




On 5/22/24 12:07 PM, Eugenio Perez Martin wrote:

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.

The goal of the virtqueue_ordered_fill operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
now-used element, set its length, and mark the element as filled in
the VirtQueue's used_elems array.

By marking the element as filled, it will indicate that this element has
been processed and is ready to be flushed, so long as the element is
in-order.

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 36 +++-
  1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7456d61bc8..01b6b32460 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
  vq->used_elems[idx].ndescs = elem->ndescs;
  }

+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i, steps, max_steps;
+
+i = vq->used_idx;
+steps = 0;
+/*
+ * We shouldn't need to increase 'i' by more than the distance
+ * between used_idx and last_avail_idx.
+ */
+max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
+% vq->vring.num;


I may be missing something, but (+vq->vring.num) is redundant if we (%
vq->vring.num), isn't it?



It ensures the result is always non-negative (e.g. when 
vq->last_avail_idx < vq->used_idx).


I wasn't sure how different platforms or compilers would handle 
something like -5 % 10, so to be safe I included the '+ vq->vring.num'.


For example, on my system, in test.c;

   #include <stdio.h>

   int main() {
   unsigned int result = -5 % 10;
   printf("Result of -5 %% 10 is: %d\n", result);
   return 0;
   }

# gcc -o test test.c

# ./test
Result of -5 % 10 is: -5


+
+/* Search for element in vq->used_elems */
+while (steps <= max_steps) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].in_order_filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+steps += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+


Let's report an error if we finish the loop. I think:
qemu_log_mask(LOG_GUEST_ERROR,
   "%s: %s cannot fill buffer id %u\n",
   __func__, vdev->name, elem->index);

(or similar) should do.

Apart from that,

Reviewed-by: Eugenio Pérez 



Gotcha. Will add this in v3.

Thank you Eugenio!


  static void virtqueue_packed_fill_desc(VirtQueue *vq,
 const VirtQueueElement *elem,
 unsigned int idx,
@@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
  return;
  }

-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
  virtqueue_packed_fill(vq, elem, len, idx);
  } else {
  virtqueue_split_fill(vq, elem, len, idx);
--
2.39.3







Re: [PATCH v2 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-23 Thread Jonah Palmer




On 5/22/24 11:45 AM, Eugenio Perez Martin wrote:

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..7456d61bc8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned 
out_num, unsigned in_nu

  static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
  {
-unsigned int i, head, max;
+unsigned int i, head, max, prev_avail_idx;
  VRingMemoryRegionCaches *caches;
  MemoryRegionCache indirect_desc_cache;
  MemoryRegionCache *desc_cache;
@@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
  goto done;
  }

+prev_avail_idx = vq->last_avail_idx;
+
  if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
  goto done;
  }
@@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
  elem->in_sg[i] = iov[out_num + i];
  }

+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {


I think vq->last_avail_idx - 1 could be more clear here.

Either way,

Reviewed-by: Eugenio Pérez 



Sure thing! Will make this change in v3.


+vq->used_elems[prev_avail_idx].index = elem->index;
+vq->used_elems[prev_avail_idx].len = elem->len;
+vq->used_elems[prev_avail_idx].ndescs = elem->ndescs;
+}
+
  vq->inuse++;

  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)

  elem->index = id;
  elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
  vq->last_avail_idx += elem->ndescs;
  vq->inuse += elem->ndescs;

--
2.39.3







Re: [PATCH v2 1/6] virtio: Add bool to VirtQueueElement

2024-05-23 Thread Jonah Palmer




On 5/22/24 11:44 AM, Eugenio Perez Martin wrote:

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:


Add the boolean 'in_order_filled' member to the VirtQueueElement structure.
The use of this boolean will signify whether the element has been processed
and is ready to be flushed (so long as the element is in-order). This
boolean is used to support the VIRTIO_F_IN_ORDER feature.

Tested-by: Lei Yang 


The code has changed from the version that Lei tested, so we should
drop this tag until he re-tests.

Reviewed-by: Eugenio Pérez 



My apologies. I wasn't sure if I should've removed the tag for all 
changes or just the significant changes.



Signed-off-by: Jonah Palmer 
---
  include/hw/virtio/virtio.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..88e70c1ae1 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -69,6 +69,8 @@ typedef struct VirtQueueElement
  unsigned int ndescs;
  unsigned int out_num;
  unsigned int in_num;
+/* Element has been processed (VIRTIO_F_IN_ORDER) */
+bool in_order_filled;
  hwaddr *in_addr;
  hwaddr *out_addr;
  struct iovec *in_sg;
--
2.39.3







[PATCH v2 5/6] vhost, vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-05-20 Thread Jonah Palmer via
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..1dd0a8ef63 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3
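
As an illustration of why listing a bit in a feature-bits array lets it be
disabled when the backend lacks it, here is a small standalone sketch. All
SKETCH_* names and sketch_mask_features are hypothetical; the bit numbers 34
and 35 are the VIRTIO_F_RING_PACKED and VIRTIO_F_IN_ORDER values from the
virtio specification, and the sentinel only plays the role of
VHOST_INVALID_FEATURE_BIT. The real QEMU helper is not shown in this thread:

```c
#include <stdint.h>

#define SKETCH_INVALID_FEATURE_BIT 0xff  /* hypothetical list terminator */
#define SKETCH_F_RING_PACKED       34
#define SKETCH_F_IN_ORDER          35

/* Bits this (imaginary) device type lets the backend veto. */
static const int sketch_feature_bits[] = {
    SKETCH_F_RING_PACKED,
    SKETCH_F_IN_ORDER,
    SKETCH_INVALID_FEATURE_BIT
};

/*
 * For every listed bit, clear it from 'features' unless the backend also
 * advertises it. Bits that are not listed pass through untouched.
 */
static uint64_t sketch_mask_features(uint64_t backend_features,
                                     const int *feature_bits,
                                     uint64_t features)
{
    for (const int *bit = feature_bits;
         *bit != SKETCH_INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Under this model, a backend that advertises only the packed bit would have
the in-order bit cleared from the offered set, while a bit missing from the
array would be offered regardless of what the backend supports.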




[PATCH v2 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..7456d61bc8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned 
out_num, unsigned in_nu
 
 static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
-unsigned int i, head, max;
+unsigned int i, head, max, prev_avail_idx;
 VRingMemoryRegionCaches *caches;
 MemoryRegionCache indirect_desc_cache;
 MemoryRegionCache *desc_cache;
@@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 goto done;
 }
 
+prev_avail_idx = vq->last_avail_idx;
+
 if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
 goto done;
 }
@@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[prev_avail_idx].index = elem->index;
+vq->used_elems[prev_avail_idx].len = elem->len;
+vq->used_elems[prev_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 
 elem->index = id;
 elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->last_avail_idx += elem->ndescs;
 vq->inuse += elem->ndescs;
 
-- 
2.39.3




[PATCH v2 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for the virtqueue_flush operation.

The goal of the virtqueue_ordered_flush operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to write elements to
the used/descriptor ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

For packed VQs, the first entry (at vq->used_idx) is written to the
descriptor ring last so the guest doesn't see any invalid descriptors.

If any elements were written, the used_idx is updated.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 66 +-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 01b6b32460..39b91beece 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1016,6 +1016,68 @@ static void virtqueue_packed_flush(VirtQueue *vq, 
unsigned int count)
 }
 }
 
+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].in_order_filled) {
+return;
+}
+
+/* Search for filled elements in-order */
+while (vq->used_elems[i].in_order_filled) {
+/*
+ * First entry for packed VQs is written last so the guest
+ * doesn't see invalid descriptors.
+ */
+if (packed && i != vq->used_idx) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].in_order_filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) 
{
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 if (virtio_device_disabled(vq->vdev)) {
@@ -1023,7 +1085,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_flush(vq, count);
 } else {
 virtqueue_split_flush(vq, count);
-- 
2.39.3
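
The "first entry is written last" rule for packed VQs described in the commit
message can be sketched in isolation. The types and functions below are
hypothetical, each completion is assumed to occupy a single descriptor, and
'count' is assumed not to exceed the ring size. A real device would also need
a write barrier before the final store, which this single-threaded sketch
omits:

```c
#include <stdbool.h>

#define SKETCH_RING_SIZE 8

typedef struct {
    unsigned int id;
    unsigned int len;
    bool used;                /* what the driver polls on */
} SketchDesc;

typedef struct {
    SketchDesc desc[SKETCH_RING_SIZE];
    unsigned int write_order[SKETCH_RING_SIZE]; /* slots, in write order */
    unsigned int writes;
} SketchPackedRing;

static void sketch_write_desc(SketchPackedRing *r, unsigned int slot,
                              unsigned int id, unsigned int len)
{
    r->desc[slot].id = id;
    r->desc[slot].len = len;
    r->desc[slot].used = true;
    r->write_order[r->writes++] = slot;
}

/* Publish 'count' completions whose first entry belongs at slot 'head'. */
static void sketch_publish_batch(SketchPackedRing *r, unsigned int head,
                                 const unsigned int *ids,
                                 const unsigned int *lens,
                                 unsigned int count)
{
    if (count == 0) {
        return;
    }
    /* Entries after the first go out first... */
    for (unsigned int k = 1; k < count; k++) {
        sketch_write_desc(r, (head + k) % SKETCH_RING_SIZE, ids[k], lens[k]);
    }
    /*
     * ...and the head entry goes out last. The driver starts reading at
     * the head, so it cannot observe a half-written batch.
     */
    sketch_write_desc(r, head % SKETCH_RING_SIZE, ids[0], lens[0]);
}
```

For a batch of three completions starting at slot 6, the slots are written in
the order 7, 0, 6.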




[PATCH v2 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to have a minimal overhead
impact. Devices that may not always use buffers in-order likely will
experience a performance hit. How large that performance hit is will
depend on how frequently elements are completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at which
a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations on
used_elems and are only written to the used/descriptor ring once the
elements on used_elems are able to continue their expected order.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that has returned from
processing and an "unused" element being an element that has not yet
been processed), we added a boolean 'in_order_filled' member to the
VirtQueueElement struct. This flag is set to true when the element comes
back from processing (virtqueue_ordered_fill) and then set back to false
once it's been written to the used/descriptor ring
(virtqueue_ordered_flush).
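
The pop/fill/flush life cycle described above can be modelled with a toy
queue. This is only a sketch under simplifying assumptions: every element
occupies exactly one slot, the ring has four entries, and all Sketch* names
are hypothetical rather than taken from the series:

```c
#include <stdbool.h>

#define SKETCH_NUM 4    /* ring size */
#define SKETCH_LOG 16   /* capacity of the pretend used ring */

typedef struct {
    unsigned int index;
    unsigned int len;
    bool in_order_filled;
} SketchSlot;

typedef struct {
    SketchSlot used_elems[SKETCH_NUM];
    unsigned int last_avail_idx;  /* next slot to hand out */
    unsigned int used_idx;        /* oldest slot not yet flushed */
    unsigned int inuse;
    unsigned int used_ring[SKETCH_LOG]; /* ids in the order the driver sees */
    unsigned int used_count;
} SketchVq;

/* pop: park the element in the slot matching its position in avail order */
static bool sketch_pop(SketchVq *vq, unsigned int id)
{
    if (vq->inuse == SKETCH_NUM) {
        return false;
    }
    SketchSlot *s = &vq->used_elems[vq->last_avail_idx];
    s->index = id;
    s->len = 0;
    s->in_order_filled = false;
    vq->last_avail_idx = (vq->last_avail_idx + 1) % SKETCH_NUM;
    vq->inuse++;
    return true;
}

/* fill: the device finished 'id'; mark it, but do not publish yet */
static bool sketch_fill(SketchVq *vq, unsigned int id, unsigned int len)
{
    unsigned int i = vq->used_idx;

    for (unsigned int n = 0; n < vq->inuse; n++) {
        if (vq->used_elems[i].index == id) {
            vq->used_elems[i].len = len;
            vq->used_elems[i].in_order_filled = true;
            return true;
        }
        i = (i + 1) % SKETCH_NUM;
    }
    return false;
}

/* flush: publish the filled prefix only; returns how many were published */
static unsigned int sketch_flush(SketchVq *vq)
{
    unsigned int flushed = 0;

    while (vq->inuse > 0 && vq->used_count < SKETCH_LOG &&
           vq->used_elems[vq->used_idx].in_order_filled) {
        SketchSlot *s = &vq->used_elems[vq->used_idx];

        vq->used_ring[vq->used_count++] = s->index;
        s->in_order_filled = false;
        vq->used_idx = (vq->used_idx + 1) % SKETCH_NUM;
        vq->inuse--;
        flushed++;
    }
    return flushed;
}
```

If elements 5, 9 and 2 are popped in that order and 9 completes first, a
flush publishes nothing. Once 5 completes, the next flush publishes 5 and 9
together, and 2 stays parked until it is filled.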

---
v2: Make 'in_order_filled' more descriptive.
Change 'j' to more descriptive var name in virtqueue_split_pop.
Use more definitive search conditional in virtqueue_ordered_fill.
Avoid code duplication in virtqueue_ordered_flush.

v1: Move series from RFC to PATCH for submission.

Jonah Palmer (6):
  virtio: Add bool to VirtQueueElement
  virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 119 ++-
 include/hw/virtio/virtio.h   |   6 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 129 insertions(+), 4 deletions(-)

-- 
2.39.3




[PATCH v2 6/6] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-05-20 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 88e70c1ae1..d33345ecc5 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -371,7 +371,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.

The goal of the virtqueue_ordered_fill operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
now-used element, set its length, and mark the element as filled in
the VirtQueue's used_elems array.

By marking the element as filled, it will indicate that this element has
been processed and is ready to be flushed, so long as the element is
in-order.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7456d61bc8..01b6b32460 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
 vq->used_elems[idx].ndescs = elem->ndescs;
 }
 
+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i, steps, max_steps;
+
+i = vq->used_idx;
+steps = 0;
+/*
+ * We shouldn't need to increase 'i' by more than the distance
+ * between used_idx and last_avail_idx.
+ */
+max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
+% vq->vring.num;
+
+/* Search for element in vq->used_elems */
+while (steps <= max_steps) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].in_order_filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+steps += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+
 static void virtqueue_packed_fill_desc(VirtQueue *vq,
const VirtQueueElement *elem,
unsigned int idx,
@@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
-- 
2.39.3




[PATCH v2 1/6] virtio: Add bool to VirtQueueElement

2024-05-20 Thread Jonah Palmer
Add the boolean 'in_order_filled' member to the VirtQueueElement structure.
The use of this boolean will signify whether the element has been processed
and is ready to be flushed (so long as the element is in-order). This
boolean is used to support the VIRTIO_F_IN_ORDER feature.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..88e70c1ae1 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -69,6 +69,8 @@ typedef struct VirtQueueElement
 unsigned int ndescs;
 unsigned int out_num;
 unsigned int in_num;
+/* Element has been processed (VIRTIO_F_IN_ORDER) */
+bool in_order_filled;
 hwaddr *in_addr;
 hwaddr *out_addr;
 struct iovec *in_sg;
-- 
2.39.3




Re: [PATCH 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-05-10 Thread Jonah Palmer




On 5/10/24 3:48 AM, Eugenio Perez Martin wrote:

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support for virtqueue_flush operations.

The goal of the virtqueue_flush operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to write elements to the used/descriptor
ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

If any elements were written, the used_idx is updated.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 75 +-
  1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 064046b5e2..0efed2c88e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1006,6 +1006,77 @@ static void virtqueue_packed_flush(VirtQueue *vq, 
unsigned int count)
  }
  }

+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].filled) {
+return;
+}
+
+/* Write first expected in-order element to used ring (split VQs) */
+if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+
+/* Search for more filled elements in-order */
+while (vq->used_elems[i].filled) {
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+


I may be missing something, but you have split out the first case as a
special one, totally out of the while loop. Can't it be contained in
the loop checking !(packed && i == vq->used_idx)? That would avoid
code duplication.

A comment can be added along the lines of "the first entry of a packed VQ
is written last so the guest does not see invalid descriptors".



Yea this was intentional for the reason you've given above. It was 
either the solution above or, as you suggest, handling this in the while 
loop:


if (!vq->used_elems[i].filled) {
return;
}

while (vq->used_elems[i].filled) {
if (packed && i != vq->used_idx) {
virtqueue_packed_fill_desc(...);
} else {
...
}
...
}

I did consider this option at the time of writing this patch but I 
must've overcomplicated it in my head somehow and thought the current 
solution was the simpler one. However, after looking it over again, your 
suggestion is indeed the cleaner one.


Will adjust this in v2! Thanks for your time reviewing these!


+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) 
{
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
  void virtqueue_flush(VirtQueue *vq, unsigned int count)
  {
  if (virtio_device_disabled(vq->vdev)) {
@@ -1013,7 +1084,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
  return;
  }

-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
  virtqueue_packed_flush(vq, count);
  } else {
  virtqueue_split_flush(vq, count);
--
2.39.3







Re: [PATCH 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-10 Thread Jonah Palmer




On 5/9/24 10:08 AM, Eugenio Perez Martin wrote:

On Mon, May 6, 2024 at 5:05 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.

The goal of the virtqueue_fill operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to search for this now-used element,
set its length, and mark the element as filled in the VirtQueue's
used_elems array.

By marking the element as filled, it will indicate that this element is
ready to be flushed, so long as the element is in-order.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 26 +-
  1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e6eb1bb453..064046b5e2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
  vq->used_elems[idx].ndescs = elem->ndescs;
  }

+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i = vq->used_idx;
+
+/* Search for element in vq->used_elems */
+while (i != vq->last_avail_idx) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}


This has a subtle problem: ndescs and elems->id are controlled by the
guest, so the guest could make QEMU loop forever looking for the right
descriptor. For each iteration, the code must ensure that the
variable "i" will be different for the next iteration, and that there
will be no more than vq->last_avail_idx - vq->used_idx iterations.



Very true and something I was worried about, e.g. what if, for some 
strange reason, we could never get i == vq->last_avail_idx.


Perhaps as a surefire way to make sure we terminate appropriately, as 
you mentioned, 'i' should not increase by more than the distance between 
used_idx and last_avail_idx. If it does, we exit the while loop:


unsigned int steps = 0;
unsigned int max_steps = (vq->last_avail_idx + vq->vring.num -
  vq->used_idx) % vq->vring.num;
while (steps <= max_steps) {
...
steps += vq->used_elems[i].ndescs;
...
}

Though if we exit the loop because steps > max_steps, should we treat 
this as an error or give some kind of warning? I believe that, under 
normal behavior, we should always be able to find the matching 
VirtQueueElement in the used_elems array, and not setting 
'vq->used_elems[i].filled = true' may cause issues later.
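
A minimal sketch of the bounded search discussed above, written against a
simplified stand-in for the used_elems array rather than QEMU's real types.
The names 'struct slot_sketch' and 'ordered_find' are hypothetical. The
step budget uses the same modular-distance formula as the snippet above, so
it shares that formula's caveat: a completely full ring (used_idx equal to
last_avail_idx) yields a budget of 0. Returning a bool lets the caller
decide whether a miss is an error or just worth a warning.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for one used_elems entry. */
struct slot_sketch {
    unsigned int index;   /* descriptor head id (guest-controlled) */
    unsigned int ndescs;  /* descriptors consumed (guest-controlled) */
    bool filled;
};

/*
 * Walk the ring from used_idx towards last_avail_idx looking for 'want'.
 * On success, store the position in *pos and return true.
 * Return false if the element is not found, or if an entry's ndescs is 0
 * or larger than the remaining step budget. Every iteration consumes at
 * least one step, so the loop runs at most max_steps times.
 * 'num' is the ring size and must be non-zero.
 */
static bool ordered_find(const struct slot_sketch *ring, unsigned int num,
                         unsigned int used_idx, unsigned int last_avail_idx,
                         unsigned int want, unsigned int *pos)
{
    /* Distance between the two indices, modulo the ring size. */
    unsigned int max_steps = (last_avail_idx + num - used_idx) % num;
    unsigned int steps = 0;
    unsigned int i = used_idx;

    while (steps < max_steps) {
        unsigned int n = ring[i].ndescs;

        if (ring[i].index == want) {
            *pos = i;
            return true;
        }
        if (n == 0 || n > max_steps - steps) {
            return false;   /* no progress, or would overshoot the bound */
        }
        steps += n;
        i = (i + n) % num;
    }
    return false;
}
```

With this shape, the fill path would set the length and the filled flag
only when the search returns true, and could log or flag the device
otherwise.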



Apart from that, I think it makes more sense to split the logical
sections of the function this way:
/* declarations */
i = vq->used_idx;

/* Search for element in vq->used_elems */
while (vq->used_elems[i].index != elem->index &&
i != vq->last_avail_idx && ...) {
...
}

/* Set length and mark as filled */
vq->used_elems[i].len = len;
vq->used_elems[i].filled = true;
---

But I'm ok either way.



Let me know what you think of the proposed solution above. It doesn't 
explicitly separate the search and find operation like you're proposing 
here but it does clearly show the bounds of our search.


But doing:

while (vq->used_elems[i].index != elem->index &&
   i != vq->last_avail_idx &&
   steps <= max_steps) {
...
}

Works too.


+}
+
  static void virtqueue_packed_fill_desc(VirtQueue *vq,
 const VirtQueueElement *elem,
 unsigned int idx,
@@ -923,7 +945,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
  return;
  }

-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
  virtqueue_packed_fill(vq, elem, len, idx);
  } else {
  virtqueue_split_fill(vq, elem, len, idx);
--
2.39.3







Re: [PATCH 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-10 Thread Jonah Palmer




On 5/9/24 9:13 AM, Eugenio Perez Martin wrote:

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:


Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..e6eb1bb453 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned 
out_num, unsigned in_nu

  static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
  {
-unsigned int i, head, max;
+unsigned int i, j, head, max;
  VRingMemoryRegionCaches *caches;
  MemoryRegionCache indirect_desc_cache;
  MemoryRegionCache *desc_cache;
@@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
  goto done;
  }

+j = vq->last_avail_idx;
+
  if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
  goto done;
  }
@@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
  elem->in_sg[i] = iov[out_num + i];
  }

+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[j].index = elem->index;
+vq->used_elems[j].len = elem->len;
+vq->used_elems[j].ndescs = elem->ndescs;
+}
+
  vq->inuse++;

  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)

  elem->index = id;
  elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+


I suggest using a consistent style between packed and split: Either
always use vq->last_avail_idx or j. If you use j, please rename to
something more related to the usage, as j is usually for iterations.

In my opinion I think vq->last_avail_idx is better.




Totally agree. The reason I used a separate variable in 
virtqueue_split_pop was to capture the value of vq->last_avail_idx 
before it got incremented in the next line.


Not sure if it actually matters whether or not I use the value of 
last_avail_idx before or after it's incremented. I don't think it does 
but, in any case, I opted to use the value before it was incremented so 
as to be consistent with virtqueue_packed_pop, where last_avail_idx is 
used before it's incremented.


I'll change j to something more meaningful though. Maybe 
'init_last_avail_idx'? Hmm... will need to think on it.



  vq->last_avail_idx += elem->ndescs;
  vq->inuse += elem->ndescs;

--
2.39.3







Re: [PATCH 1/6] virtio: Add bool to VirtQueueElement

2024-05-10 Thread Jonah Palmer




On 5/9/24 8:32 AM, Eugenio Perez Martin wrote:

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:


Add the boolean 'filled' member to the VirtQueueElement structure. This
boolean signifies whether the element has been processed and is ready to
be written to the used / descriptor ring. It is used to support the
VIRTIO_F_IN_ORDER feature.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
  include/hw/virtio/virtio.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..9ed9c3763c 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -69,6 +69,7 @@ typedef struct VirtQueueElement
  unsigned int ndescs;
  unsigned int out_num;
  unsigned int in_num;
+bool filled;


in_order_filled? I cannot come up with a good name for this. Maybe we can
add a comment on top of the variable so we know what it is used for?



Will do! I can change the name to be more obvious as well.


  hwaddr *in_addr;
  hwaddr *out_addr;
  struct iovec *in_sg;
--
2.39.3







[PATCH 5/6] vhost, vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-05-06 Thread Jonah Palmer via
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.
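
As an illustration of why listing a bit in these arrays lets it be
disabled, here is one plausible way a sentinel-terminated feature list can
be consumed. This is a sketch under stated assumptions and is not copied
from QEMU: every SKETCH_* name and the function name are hypothetical,
the bit numbers 34 and 35 follow the virtio specification, and the
terminator value is chosen only for this example.

```c
#include <stdint.h>

/* Hypothetical names for this sketch only. */
#define SKETCH_F_RING_PACKED 34
#define SKETCH_F_IN_ORDER    35
#define SKETCH_INVALID_BIT   0xff   /* list terminator (assumed value) */

static const int sketch_feature_bits[] = {
    SKETCH_F_RING_PACKED,
    SKETCH_F_IN_ORDER,
    SKETCH_INVALID_BIT
};

/*
 * For every bit named in 'bits', keep it in 'features' only if the
 * backend advertises it as well. Bits that are not named in the list
 * pass through untouched, which is why a feature has to be listed
 * before a backend's lack of support can clear it.
 */
static uint64_t sketch_mask_features(uint64_t backend_features,
                                     const int *bits, uint64_t features)
{
    for (const int *bit = bits; *bit != SKETCH_INVALID_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

In this model, a backend that does not advertise bit 35 causes the
in-order bit to be dropped from the negotiated set, while a backend that
does advertise it leaves the requested features unchanged.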

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..1dd0a8ef63 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[PATCH 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-05-06 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to have a minimal overhead
impact. Devices that may not always use buffers in-order will likely
experience a performance hit. How large that performance hit is will
depend on how frequently elements are completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at which
a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations on
used_elems and are only written to the used/descriptor ring once the
elements on used_elems are able to continue their expected order.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that has returned from
processing and an "unused" element being an element that has not yet
been processed), we added a boolean 'filled' member to the
VirtQueueElement struct. This flag is set to true when the element comes
back from processing (virtqueue_ordered_fill) and then set back to false
once it's been written to the used/descriptor ring
(virtqueue_ordered_flush).
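
The pop / fill / flush life cycle described above can be modelled in a few
lines. This is a deliberately reduced sketch, not the code from the series:
every element occupies exactly one slot, the ring size is fixed, and all
names (model_vq, model_pop, model_fill, model_flush) are hypothetical.

```c
#include <stdbool.h>

#define RING_SIZE 4u   /* hypothetical ring size for the sketch */

struct model_elem {
    unsigned int id;
    unsigned int len;
    bool filled;       /* processed, waiting to be written in order */
};

struct model_vq {
    struct model_elem used_elems[RING_SIZE];
    unsigned int last_avail_idx;       /* next slot handed out by pop */
    unsigned int used_idx;             /* next slot expected on the used ring */
    unsigned int inuse;                /* popped but not yet flushed */
    unsigned int used_ring[RING_SIZE]; /* ids in the order they were written */
    unsigned int used_written;         /* total entries written so far */
};

/* Record a newly popped element at its in-order position. */
static bool model_pop(struct model_vq *vq, unsigned int id)
{
    if (vq->inuse == RING_SIZE) {
        return false;
    }
    struct model_elem *e = &vq->used_elems[vq->last_avail_idx];
    e->id = id;
    e->len = 0;
    e->filled = false;
    vq->last_avail_idx = (vq->last_avail_idx + 1) % RING_SIZE;
    vq->inuse++;
    return true;
}

/* Mark a processed element as filled; the search is bounded by inuse. */
static bool model_fill(struct model_vq *vq, unsigned int id, unsigned int len)
{
    unsigned int i = vq->used_idx;

    for (unsigned int n = 0; n < vq->inuse; n++) {
        if (vq->used_elems[i].id == id) {
            vq->used_elems[i].len = len;
            vq->used_elems[i].filled = true;
            return true;
        }
        i = (i + 1) % RING_SIZE;
    }
    return false;
}

/* Write filled elements in order; stop at the first unfilled one. */
static unsigned int model_flush(struct model_vq *vq)
{
    unsigned int written = 0;

    while (vq->inuse > 0 && vq->used_elems[vq->used_idx].filled) {
        vq->used_ring[vq->used_written % RING_SIZE] =
            vq->used_elems[vq->used_idx].id;
        vq->used_written++;
        vq->used_elems[vq->used_idx].filled = false;
        vq->used_idx = (vq->used_idx + 1) % RING_SIZE;
        vq->inuse--;
        written++;
    }
    return written;
}
```

In this model, completing the third of three popped elements first writes
nothing on flush. Once the first element completes, one entry is written,
and when the second completes, the remaining two are written together.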

---
v1: Move series from RFC to PATCH for submission.

Jonah Palmer (6):
  virtio: Add bool to VirtQueueElement
  virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 118 ++-
 include/hw/virtio/virtio.h   |   5 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 127 insertions(+), 4 deletions(-)

-- 
2.39.3




[PATCH 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-06 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.

The goal of the virtqueue_fill operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to search for this now-used element,
set its length, and mark the element as filled in the VirtQueue's
used_elems array.

By marking the element as filled, it will indicate that this element is
ready to be flushed, so long as the element is in-order.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index e6eb1bb453..064046b5e2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
 vq->used_elems[idx].ndescs = elem->ndescs;
 }
 
+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i = vq->used_idx;
+
+/* Search for element in vq->used_elems */
+while (i != vq->last_avail_idx) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+
 static void virtqueue_packed_fill_desc(VirtQueue *vq,
const VirtQueueElement *elem,
unsigned int idx,
@@ -923,7 +945,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
-- 
2.39.3




[PATCH 1/6] virtio: Add bool to VirtQueueElement

2024-05-06 Thread Jonah Palmer
Add the boolean 'filled' member to the VirtQueueElement structure. This
boolean signifies whether the element has been processed and is ready to
be written to the used / descriptor ring. It is used to support the
VIRTIO_F_IN_ORDER feature.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..9ed9c3763c 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -69,6 +69,7 @@ typedef struct VirtQueueElement
 unsigned int ndescs;
 unsigned int out_num;
 unsigned int in_num;
+bool filled;
 hwaddr *in_addr;
 hwaddr *out_addr;
 struct iovec *in_sg;
-- 
2.39.3




[PATCH 6/6] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-05-06 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9ed9c3763c..30c23400e3 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -370,7 +370,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[PATCH 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-06 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..e6eb1bb453 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned 
out_num, unsigned in_nu
 
 static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
-unsigned int i, head, max;
+unsigned int i, j, head, max;
 VRingMemoryRegionCaches *caches;
 MemoryRegionCache indirect_desc_cache;
 MemoryRegionCache *desc_cache;
@@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 goto done;
 }
 
+j = vq->last_avail_idx;
+
 if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
 goto done;
 }
@@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[j].index = elem->index;
+vq->used_elems[j].len = elem->len;
+vq->used_elems[j].ndescs = elem->ndescs;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 
 elem->index = id;
 elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->last_avail_idx += elem->ndescs;
 vq->inuse += elem->ndescs;
 
-- 
2.39.3




[PATCH 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-05-06 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_flush operations.

The goal of the virtqueue_flush operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to write elements to the used/descriptor
ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

If any elements were written, the used_idx is updated.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 75 +-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 064046b5e2..0efed2c88e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1006,6 +1006,77 @@ static void virtqueue_packed_flush(VirtQueue *vq, 
unsigned int count)
 }
 }
 
+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].filled) {
+return;
+}
+
+/* Write first expected in-order element to used ring (split VQs) */
+if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+
+/* Search for more filled elements in-order */
+while (vq->used_elems[i].filled) {
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) {
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 if (virtio_device_disabled(vq->vdev)) {
@@ -1013,7 +1084,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_flush(vq, count);
 } else {
 virtqueue_split_flush(vq, count);
-- 
2.39.3




Re: [RFC 1/2] iova_tree: add an id member to DMAMap

2024-04-29 Thread Jonah Palmer




On 4/29/24 4:14 AM, Eugenio Perez Martin wrote:

On Thu, Apr 25, 2024 at 7:44 PM Si-Wei Liu  wrote:




On 4/24/2024 12:33 AM, Eugenio Perez Martin wrote:

On Wed, Apr 24, 2024 at 12:21 AM Si-Wei Liu  wrote:



On 4/22/2024 1:49 AM, Eugenio Perez Martin wrote:

On Sat, Apr 20, 2024 at 1:50 AM Si-Wei Liu  wrote:


On 4/19/2024 1:29 AM, Eugenio Perez Martin wrote:

On Thu, Apr 18, 2024 at 10:46 PM Si-Wei Liu  wrote:

On 4/10/2024 3:03 AM, Eugenio Pérez wrote:

IOVA tree is also used to track the mappings of virtio-net shadow
virtqueue.  These mappings may not match with the GPA->HVA ones.

This causes a problem when overlapped regions (different GPA but same
translated HVA) exist in the tree, as looking them up by HVA will return
them twice.  To solve this, create an id member so we can assign unique
identifiers (GPA) to the maps.

Signed-off-by: Eugenio Pérez 
---
  include/qemu/iova-tree.h | 5 +++--
  util/iova-tree.c | 3 ++-
  2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
index 2a10a7052e..34ee230e7d 100644
--- a/include/qemu/iova-tree.h
+++ b/include/qemu/iova-tree.h
@@ -36,6 +36,7 @@ typedef struct DMAMap {
  hwaddr iova;
  hwaddr translated_addr;
  hwaddr size;/* Inclusive */
+uint64_t id;
  IOMMUAccessFlags perm;
  } QEMU_PACKED DMAMap;
  typedef gboolean (*iova_tree_iterator)(DMAMap *map);
@@ -100,8 +101,8 @@ const DMAMap *iova_tree_find(const IOVATree *tree, const 
DMAMap *map);
   * @map: the mapping to search
   *
   * Search for a mapping in the iova tree that translated_addr overlaps 
with the
- * mapping range specified.  Only the first found mapping will be
- * returned.
+ * mapping range specified and map->id is equal.  Only the first found
+ * mapping will be returned.
   *
   * Return: DMAMap pointer if found, or NULL if not found.  Note that
   * the returned DMAMap pointer is maintained internally.  User should
diff --git a/util/iova-tree.c b/util/iova-tree.c
index 536789797e..0863e0a3b8 100644
--- a/util/iova-tree.c
+++ b/util/iova-tree.c
@@ -97,7 +97,8 @@ static gboolean iova_tree_find_address_iterator(gpointer key, 
gpointer value,

  needle = args->needle;
  if (map->translated_addr + map->size < needle->translated_addr ||
-needle->translated_addr + needle->size < map->translated_addr) {
+needle->translated_addr + needle->size < map->translated_addr ||
+needle->id != map->id) {

It looks like this iterator can also be invoked by SVQ from
vhost_svq_translate_addr() -> iova_tree_find_iova(), where guest GPA
space will be searched on without passing in the ID (GPA), and exact
match for the same GPA range is not actually needed unlike the mapping
removal case. Could we create an API variant, for the SVQ lookup case
specifically? Or alternatively, add a special flag, say skip_id_match to
DMAMap, and the id match check may look like below:

(!needle->skip_id_match && needle->id != map->id)

I think vhost_svq_translate_addr() could just call the API variant or
pass DMAmap with skip_id_match set to true to svq_iova_tree_find_iova().
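
To make the suggested check concrete, here is a small sketch of an
overlap test with an optional id comparison. It uses a trimmed-down,
hypothetical 'struct map_sketch' instead of the real DMAMap, and the
skip_id_match member is the flag proposed above, not an existing field.
As in DMAMap, 'size' is treated as inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down mapping; 'size' is inclusive. */
struct map_sketch {
    uint64_t translated_addr;
    uint64_t size;
    uint64_t id;
    bool skip_id_match;   /* proposed flag, only read from the needle */
};

/* True when the two inclusive translated ranges share an address. */
static bool ranges_overlap(const struct map_sketch *map,
                           const struct map_sketch *needle)
{
    return !(map->translated_addr + map->size < needle->translated_addr ||
             needle->translated_addr + needle->size < map->translated_addr);
}

/*
 * A map matches when its range overlaps the needle's and, unless the
 * needle asks to skip the check, the ids are equal.
 */
static bool map_matches(const struct map_sketch *map,
                        const struct map_sketch *needle)
{
    if (!ranges_overlap(map, needle)) {
        return false;
    }
    return needle->skip_id_match || needle->id == map->id;
}
```

A removal path would leave skip_id_match false so that only the mapping
with the same id is matched, while a pure address translation could set it
to true and accept any overlapping mapping.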


I think you're totally right. But I'd really like to not complicate
the API of the iova_tree more.

I think we can look for the hwaddr using memory_region_from_host and
then get the hwaddr. It is another lookup though...

Yeah, that will be another means of doing translation without having to
complicate the API around iova_tree. I wonder how the lookup through
memory_region_from_host() may perform compared to the iova tree one, the
former looks to be an O(N) linear search on a linked list while the
latter would be roughly O(log N) on an AVL tree?

Even worse, as the reverse lookup (from QEMU vaddr to SVQ IOVA) is
linear too. It is not even ordered.

Oh Sorry, I misread the code and I should look for g_tree_foreach ()
instead of g_tree_search_node(). So the former is indeed linear
iteration, but it looks to be ordered?

https://github.com/GNOME/glib/blob/main/glib/gtree.c#L1115

The GPA / IOVA are ordered but we're looking by QEMU's vaddr.

If we have these translations:
[0x1000, 0x2000] -> [0x10000, 0x11000]
[0x2000, 0x3000] -> [0x6000, 0x7000]

We will see them in this order, so we cannot stop the search at the first node.

Yeah, reverse lookup is unordered indeed, anyway.
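
The point about ordering can be shown with a tiny table walked in IOVA
order. This is only a sketch: 'struct xlate_sketch' and 'find_by_hva' are
hypothetical names, the table stands in for an in-order tree traversal,
and the example values assume the first range translates to
[0x10000, 0x11000].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry; both ranges are inclusive. */
struct xlate_sketch {
    uint64_t iova_start, iova_last;
    uint64_t hva_start, hva_last;
};

/*
 * Reverse lookup by HVA over entries visited in IOVA order. Because the
 * HVA ranges follow no particular order, the walk cannot stop early when
 * it sees an entry whose HVA is larger than the one searched for; it has
 * to inspect entries until a match is found or the table ends.
 */
static const struct xlate_sketch *find_by_hva(const struct xlate_sketch *tbl,
                                              size_t n, uint64_t hva)
{
    for (size_t i = 0; i < n; i++) {
        if (hva >= tbl[i].hva_start && hva <= tbl[i].hva_last) {
            return &tbl[i];
        }
    }
    return NULL;
}
```

With the two translations above, a lookup for HVA 0x6800 first visits the
entry whose HVA range starts at 0x10000 and only then reaches the matching
entry, which is why the search is linear in the number of mappings.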




But apart from this detail you're right, I have the same concerns with
this solution too. If we see a hard performance regression we could go
to more complicated solutions, like maintaining a reverse IOVATree in
vhost-iova-tree too. First RFCs of SVQ did that actually.

Agreed, yeap we can use memory_region_from_host for now.  Any reason why
reverse IOVATree was dropped, lack of users? But now we have one!


No, it is just simplicity. We already have 

[RFC v3 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.

The goal of the virtqueue_fill operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to search for this now-used element,
set its length, and mark the element as filled in the VirtQueue's
used_elems array.

By marking the element as filled, it will indicate that this element is
ready to be flushed, so long as the element is in-order.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 3ad58100b2..0730f26f74 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -872,6 +872,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
 vq->used_elems[idx].ndescs = elem->ndescs;
 }
 
+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i = vq->used_idx;
+
+/* Search for element in vq->used_elems */
+while (i != vq->last_avail_idx) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+
 static void virtqueue_packed_fill_desc(VirtQueue *vq,
const VirtQueueElement *elem,
unsigned int idx,
@@ -922,7 +944,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
-- 
2.39.3




[RFC v3 5/6] vhost, vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-04-08 Thread Jonah Palmer via
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3
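
For readers following the series: the arrays touched above are
sentinel-terminated lists of feature bits, and listing a bit is what lets
it be cleared when the backend does not offer it. A minimal sketch of
that filtering follows. The helper filter_features(), the sentinel
INVALID_FEATURE_BIT (standing in for VHOST_INVALID_FEATURE_BIT) and the
array example_feature_bits are hypothetical names for illustration, not
QEMU's code; only the bit numbers are taken from the virtio
specification.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio specification. */
#define F_VERSION_1   32
#define F_RING_PACKED 34
#define F_IN_ORDER    35

/* Hypothetical sentinel, standing in for VHOST_INVALID_FEATURE_BIT. */
#define INVALID_FEATURE_BIT 0xff

/*
 * Clear from 'wanted' every listed bit that the backend does not offer.
 * Bits that are not listed in 'bits' pass through unchanged.
 */
static uint64_t filter_features(const int *bits, uint64_t backend,
                                uint64_t wanted)
{
    for (const int *bit = bits; *bit != INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend & mask)) {
            wanted &= ~mask;
        }
    }
    return wanted;
}

/* Illustrative array in the same shape as the ones patched above. */
static const int example_feature_bits[] = {
    F_RING_PACKED,
    F_IN_ORDER,
    INVALID_FEATURE_BIT
};
```

In this sketch a bit that is absent from the array is never cleared,
which is why the patch adds VIRTIO_F_IN_ORDER to each list.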




[RFC v3 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to incur minimal overhead.
Devices that may not always use buffers in-order will likely experience
a performance hit. How large that hit is will depend on how frequently
elements are completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at
which a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations on
used_elems and are only written to the used/descriptor ring once the
elements on used_elems are able to continue their expected order.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that has returned from
processing and an "unused" element being an element that has not yet
been processed), we added a boolean 'filled' member to the
VirtQueueElement struct. This flag is set to true when the element comes
back from processing (virtqueue_ordered_fill) and then set back to false
once it's been written to the used/descriptor ring
(virtqueue_ordered_flush).
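
The pop/fill/flush interplay described above can be sketched as a small
standalone model. This is only an illustration under simplifying
assumptions: one descriptor per element, a fixed ring of RING_SIZE
slots, and hypothetical names (Elem, Queue, model_pop, model_fill,
model_flush) that do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8   /* hypothetical queue size */

/* Reduced stand-in for VirtQueueElement: one descriptor per element. */
typedef struct {
    unsigned int index;   /* descriptor head id */
    unsigned int len;     /* bytes written by the device */
    bool filled;          /* processed, but not yet on the used ring */
} Elem;

typedef struct {
    Elem used_elems[RING_SIZE];
    unsigned int used_ring[RING_SIZE]; /* ids in the order the driver sees */
    unsigned int last_avail_idx;       /* next slot to pop into */
    unsigned int used_idx;             /* next slot to flush from */
} Queue;

/* pop: remember the element in its in-order slot. */
static void model_pop(Queue *q, unsigned int id)
{
    Elem *e = &q->used_elems[q->last_avail_idx % RING_SIZE];

    e->index = id;
    e->len = 0;
    e->filled = false;
    q->last_avail_idx++;
}

/* fill: find the element by id and mark it as processed. */
static void model_fill(Queue *q, unsigned int id, unsigned int len)
{
    for (unsigned int i = q->used_idx; i != q->last_avail_idx; i++) {
        Elem *e = &q->used_elems[i % RING_SIZE];

        if (e->index == id) {
            e->len = len;
            e->filled = true;
            return;
        }
    }
}

/* flush: publish the contiguous run of filled elements, return its length. */
static unsigned int model_flush(Queue *q)
{
    unsigned int n = 0;

    while (q->used_idx != q->last_avail_idx) {
        Elem *e = &q->used_elems[q->used_idx % RING_SIZE];

        if (!e->filled) {
            break;
        }
        q->used_ring[q->used_idx % RING_SIZE] = e->index;
        e->filled = false;
        q->used_idx++;
        n++;
    }
    return n;
}
```

If elements 10, 11 and 12 are popped and 11 completes first, model_flush
publishes nothing; once 10 completes, both 10 and 11 are published in
that order, and 12 follows when it completes.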

---
v3: Add elements to used_elems during virtqueue_split/packed_pop
Replace current_seq_idx usage with vq->last_avail_idx
Remove used_seq_idx, leverage used_idx and last_avail_idx for
searching used_elems
Remove seq_idx in VirtQueueElement
Add boolean to VirtQueueElement to signal element status
Add virtqueue_ordered_fill/flush functions for ordering

v2: Use a VirtQueue's used_elems array as a buffer mechanism

v1: Implement custom GLib GHashTable as a buffer mechanism

Jonah Palmer (6):
  virtio: Add bool to VirtQueueElement
  virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 118 ++-
 include/hw/virtio/virtio.h   |   5 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 127 insertions(+), 4 deletions(-)

-- 
2.39.3




[RFC v3 6/6] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-04-08 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 9034719f1d..43ea738e65 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -385,7 +385,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[RFC v3 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This will allow us to keep track of the current order, what elements
have been written, as well as an element's essential data after being
processed.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..3ad58100b2 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1497,7 +1497,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 
 static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
-unsigned int i, head, max;
+unsigned int i, j, head, max;
 VRingMemoryRegionCaches *caches;
 MemoryRegionCache indirect_desc_cache;
 MemoryRegionCache *desc_cache;
@@ -1530,6 +1530,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 goto done;
 }
 
+j = vq->last_avail_idx;
+
 if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
 goto done;
 }
@@ -1621,6 +1623,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[j].index = elem->index;
+vq->used_elems[j].len = elem->len;
+vq->used_elems[j].ndescs = elem->ndescs;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1749,6 +1757,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
 
 elem->index = id;
 elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->last_avail_idx += elem->ndescs;
 vq->inuse += elem->ndescs;
 
-- 
2.39.3
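
A note on the index used for used_elems in the two hunks above. As far
as I understand QEMU's virtqueue code, last_avail_idx is a free-running
16-bit counter for split queues, while for packed queues it is kept
within the ring size and advances by the number of descriptors consumed.
The sketch below shows how a slot could be derived in each case. The
helpers split_slot() and packed_advance() are hypothetical, and reducing
the split index modulo the ring size is an assumption of this sketch,
not something the patch as posted does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: slot in used_elems for a split-queue pop. */
static unsigned int split_slot(uint16_t last_avail_idx, unsigned int num)
{
    /* The counter runs over 0..65535, so fold it into the ring. */
    return last_avail_idx % num;
}

/*
 * Hypothetical helper: advance a packed-queue index by the descriptors
 * an element consumed, toggling *wrap when the index passes the end of
 * the ring. Assumes ndescs <= num.
 */
static uint16_t packed_advance(uint16_t idx, unsigned int ndescs,
                               unsigned int num, bool *wrap)
{
    unsigned int next = idx + ndescs;

    if (next >= num) {
        next -= num;
        *wrap = !*wrap;
    }
    return (uint16_t)next;
}
```

With a ring of 256 entries, a split index of 65535 maps to slot 255, and
a packed index of 254 advanced by a 3-descriptor chain lands on 1 with
the wrap flag toggled.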




[RFC v3 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-04-08 Thread Jonah Palmer
Add VIRTIO_F_IN_ORDER feature support for virtqueue_flush operations.

The goal of the virtqueue_flush operation when the VIRTIO_F_IN_ORDER
feature has been negotiated is to write elements to the used/descriptor
ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

If any elements were written, the used_idx is updated.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 75 +-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 0730f26f74..13451d0cae 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -997,6 +997,77 @@ static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
 }
 }
 
+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].filled) {
+return;
+}
+
+/* Write first expected in-order element to used ring (split VQs) */
+if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+
+/* Search for more filled elements in-order */
+while (vq->used_elems[i].filled) {
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) {
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 if (virtio_device_disabled(vq->vdev)) {
@@ -1004,7 +1075,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_flush(vq, count);
 } else {
 virtqueue_split_flush(vq, count);
-- 
2.39.3




[RFC v3 1/6] virtio: Add bool to VirtQueueElement

2024-04-08 Thread Jonah Palmer
Add the boolean 'filled' member to the VirtQueueElement structure. This
boolean signifies whether the element has been processed and is ready
to be written to the used/descriptor ring. It is used to support the
VIRTIO_F_IN_ORDER feature.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..9034719f1d 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -71,6 +71,7 @@ typedef struct VirtQueueElement
 unsigned int ndescs;
 unsigned int out_num;
 unsigned int in_num;
+bool filled;
 hwaddr *in_addr;
 hwaddr *out_addr;
 struct iovec *in_sg;
-- 
2.39.3




Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-05 Thread Jonah Palmer




On 4/5/24 11:04 AM, Eugenio Perez Martin wrote:

On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer  wrote:




On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:

On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer  wrote:




On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:




On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:


Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 18 ++
 include/hw/virtio/virtio.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
 uint16_t used_idx;
 bool used_wrap_counter;

+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+


I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?



For used_seq_idx, it does work like used_idx except the difference is
when their values get updated, specifically for the split VQ case.

As you know, for the split VQ case, the used_idx is updated during
virtqueue_split_flush. However, imagine a batch of elements coming in
where virtqueue_split_fill is called multiple times before
virtqueue_split_flush. We want to make sure we write these elements to
the used ring in-order and we'll know its order based on used_seq_idx.

Alternatively, I thought about replicating the logic for the packed VQ
case (where this used_seq_idx isn't used) where we start looking at
vq->used_elems[vq->used_idx] and iterate through until we find a used
element, but I wasn't sure how to handle the case where elements get
used (written to the used ring) and new elements get put in used_elems
before the used_idx is updated. Since this search would require us to
always start at index vq->used_idx.

For example, say, of three elements getting filled (elem0 - elem2),
elem1 and elem0 come back first (vq->used_idx = 0):

elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
in-order, write elem0 and elem1 to used ring, mark elements as
used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
(elem1) and iterate to vq->used_idx + 2 (elem2)?

Hmm... now that I'm thinking about it, maybe for the split VQ case we
could continue looking through the vq->used_elems array until we find an
unused element... but then again how would we (1) know if the element is
in-order and (2) know when to stop searching?



Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with this approach I think used_elems should be renamed actually.



If I'm understanding correctly, I don't think adding newly created
elements to vq->used_elems at virtqueue_pop will do much for us.


By knowing what descriptor id must go in each position of the used ring.

Following your example, let's say avail_idx is 10 at that moment.
Then, the driver makes available the three elements you mention, so:
used_elems[10] = elem0
used_elems[11] = elem1
used_elems[12] = elem2

Now the device uses elem1. virtqueue_push can search linearly

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-05 Thread Jonah Palmer




On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:

On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer  wrote:




On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:




On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:


Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.

Signed-off-by: Jonah Palmer 
---
hw/virtio/virtio.c | 18 ++
include/hw/virtio/virtio.h |  1 +
2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
uint16_t used_idx;
bool used_wrap_counter;

+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+


I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?



For used_seq_idx, it does work like used_idx except the difference is
when their values get updated, specifically for the split VQ case.

As you know, for the split VQ case, the used_idx is updated during
virtqueue_split_flush. However, imagine a batch of elements coming in
where virtqueue_split_fill is called multiple times before
virtqueue_split_flush. We want to make sure we write these elements to
the used ring in-order and we'll know its order based on used_seq_idx.

Alternatively, I thought about replicating the logic for the packed VQ
case (where this used_seq_idx isn't used) where we start looking at
vq->used_elems[vq->used_idx] and iterate through until we find a used
element, but I wasn't sure how to handle the case where elements get
used (written to the used ring) and new elements get put in used_elems
before the used_idx is updated. Since this search would require us to
always start at index vq->used_idx.

For example, say, of three elements getting filled (elem0 - elem2),
elem1 and elem0 come back first (vq->used_idx = 0):

elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
   in-order, write elem0 and elem1 to used ring, mark elements as
   used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
(elem1) and iterate to vq->used_idx + 2 (elem2)?

Hmm... now that I'm thinking about it, maybe for the split VQ case we
could continue looking through the vq->used_elems array until we find an
unused element... but then again how would we (1) know if the element is
in-order and (2) know when to stop searching?



Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with this approach I think used_elems should be renamed actually.



If I'm understanding correctly, I don't think adding newly created
elements to vq->used_elems at virtqueue_pop will do much for us.


By knowing what descriptor id must go in each position of the used ring.

Following your example, let's say avail_idx is 10 at that moment.
Then, the driver makes available the three elements you mention, so:
used_elems[10] = elem0
used_elems[11] = elem1
used_elems[12] = elem2

Now the device uses elem1. virtqueue_push can search linearly for
elem->index in used_elems[used_idx]...used_elems[avail_idx] range. As
the device is mis-behaving, no nee
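
The linear search suggested above, over the range from used_idx up to
avail_idx, can be sketched as follows. The function find_slot(), the
ring size NUM and the plain array of descriptor ids are hypothetical
stand-ins for illustration. The sketch assumes both indices are already
within the ring and that the ring is never completely full, since a full
ring would make the step count compute to 0.

```c
#include <assert.h>

#define NUM 16 /* hypothetical ring size */

/*
 * Return the slot holding descriptor id 'id' within the half-open range
 * [used_idx, avail_idx) on a ring of NUM slots, or -1 if it is absent.
 */
static int find_slot(const unsigned int *ids, unsigned int used_idx,
                     unsigned int avail_idx, unsigned int id)
{
    /* Adding NUM keeps the unsigned subtraction from wrapping. */
    unsigned int steps = (avail_idx + NUM - used_idx) % NUM;
    unsigned int i = used_idx;

    for (unsigned int s = 0; s < steps; s++) {
        if (ids[i] == id) {
            return (int)i;
        }
        i = (i + 1) % NUM;
    }
    return -1;
}
```

With elem0..elem2 stored at slots 10..12, looking up elem1's id returns
slot 11, and the search never inspects slots outside the outstanding
range.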

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Jonah Palmer




On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:




On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:


Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.

Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio.c | 18 ++
   include/hw/virtio/virtio.h |  1 +
   2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
   uint16_t used_idx;
   bool used_wrap_counter;

+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+


I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?



For used_seq_idx, it does work like used_idx except the difference is
when their values get updated, specifically for the split VQ case.

As you know, for the split VQ case, the used_idx is updated during
virtqueue_split_flush. However, imagine a batch of elements coming in
where virtqueue_split_fill is called multiple times before
virtqueue_split_flush. We want to make sure we write these elements to
the used ring in-order and we'll know its order based on used_seq_idx.

Alternatively, I thought about replicating the logic for the packed VQ
case (where this used_seq_idx isn't used) where we start looking at
vq->used_elems[vq->used_idx] and iterate through until we find a used
element, but I wasn't sure how to handle the case where elements get
used (written to the used ring) and new elements get put in used_elems
before the used_idx is updated. Since this search would require us to
always start at index vq->used_idx.

For example, say, of three elements getting filled (elem0 - elem2),
elem1 and elem0 come back first (vq->used_idx = 0):

elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
  in-order, write elem0 and elem1 to used ring, mark elements as
  used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
(elem1) and iterate to vq->used_idx + 2 (elem2)?

Hmm... now that I'm thinking about it, maybe for the split VQ case we
could continue looking through the vq->used_elems array until we find an
unused element... but then again how would we (1) know if the element is
in-order and (2) know when to stop searching?



Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with this approach I think used_elems should be renamed actually.



If I'm understanding correctly, I don't think adding newly created 
elements to vq->used_elems at virtqueue_pop will do much for us. We 
could just keep adding processed elements to vq->used_elems at 
virtqueue_fill but instead of:


vq->used_elems[seq_idx].in_num = elem->in_num;
vq->used_elems[seq_idx].out_num = elem->out_num;

We could do:

vq->used_elems[seq_idx].in_num = 1;
vq->used_elems[seq_idx].out_num = 1;

We'd use in_num and out_num as separate flags. in_num could indicate if 
this element has been written to the used ring while out_num could 
indicate if this element has been flushed (1 for no, 0 for yes). In 
other words, when we go to write to the used rin

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-03 Thread Jonah Palmer




On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:


Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 18 ++
  include/hw/virtio/virtio.h |  1 +
  2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
  uint16_t used_idx;
  bool used_wrap_counter;

+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+


I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?



For used_seq_idx, it does work like used_idx except the difference is 
when their values get updated, specifically for the split VQ case.


As you know, for the split VQ case, the used_idx is updated during 
virtqueue_split_flush. However, imagine a batch of elements coming in 
where virtqueue_split_fill is called multiple times before 
virtqueue_split_flush. We want to make sure we write these elements to 
the used ring in-order and we'll know its order based on used_seq_idx.


Alternatively, I thought about replicating the logic for the packed VQ 
case (where this used_seq_idx isn't used) where we start looking at 
vq->used_elems[vq->used_idx] and iterate through until we find a used 
element, but I wasn't sure how to handle the case where elements get 
used (written to the used ring) and new elements get put in used_elems 
before the used_idx is updated. Since this search would require us to 
always start at index vq->used_idx.


For example, say, of three elements getting filled (elem0 - elem2), 
elem1 and elem0 come back first (vq->used_idx = 0):


elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
in-order, write elem0 and elem1 to used ring, mark elements as
used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to 
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1 
(elem1) and iterate to vq->used_idx + 2 (elem2)?


Hmm... now that I'm thinking about it, maybe for the split VQ case we 
could continue looking through the vq->used_elems array until we find an 
unused element... but then again how would we (1) know if the element is 
in-order and (2) know when to stop searching?


In any case, the use of this variable could be seen as an optimization 
as its value will tell us where to start looking in vq->used_elems 
instead of always starting at vq->used_idx.


If this is like a one-shot scenario where one element gets written and 
then flushed after, then yes in this case used_seq_idx == used_idx.


--

For current_seq_idx, this is pretty much just a counter. Every new 
VirtQueueElement created from virtqueue_pop is given a number and the 
counter is incremented. Like grabbing a ticket number and waiting for 
your number to be called. The next person to grab a ticket number will 
be your number + 1.
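
To make the ticket analogy concrete, here is a minimal sketch of the two
counters. SeqState, take_ticket() and is_next() are hypothetical names
for illustration; only the field names and the uint16_t type come from
the patch under discussion.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t current_seq_idx; /* next ticket to hand out */
    uint16_t used_seq_idx;    /* ticket expected next on the used ring */
} SeqState;

/* Hand out the next sequence number, as virtqueue_pop would. */
static uint16_t take_ticket(SeqState *s)
{
    /* uint16_t arithmetic wraps from 65535 back to 0. */
    return s->current_seq_idx++;
}

/* Non-zero if an element carrying 'seq_idx' is next in line. */
static int is_next(const SeqState *s, uint16_t seq_idx)
{
    return seq_idx == s->used_seq_idx;
}
```

Because both counters are 16 bits wide and wrap the same way, the
equality test keeps working across the 65535 to 0 rollover.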


Let me know if I'm making any sense. Thanks :)

Jonah


  /* Last used index value we have signalled on */
  uint16_t signalled_used;

@@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
  elem->in_sg[i] = iov[out_num + i];
  }

+/* Assign sequence index for in-order processing */
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+elem->seq_idx = vq->current_seq_idx++;
+}
+
  vq->inuse++;

  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(V

[RFC v2 0/5] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-28 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to have a minimal overhead
impact. Devices that may not always use buffers in-order will likely
experience a performance hit. How large that performance hit is will
depend on how frequently elements are completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at which
a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations on
used_elems and are only written to the used/descriptor ring once the
elements on used_elems are able to continue their expected order.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that was already written to the
used/descriptor ring and an "unused" element being an element that
wasn't), we use an element's in_num and out_num values. If the sum of
these two values is greater than 0, the element is considered unused. If
the sum is 0, then the element is considered used and invalid. When we
find an order and write the element to the used/descriptor ring, we set
these two values to 0 to indicate that it's been used.
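
As a rough illustration of the in_num/out_num convention described
above, a minimal sketch could look like the following. The DemoElem
type and the demo_* helpers are made-up names for this example only;
they are not part of the patches.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two VirtQueueElement fields used as a marker. */
typedef struct DemoElem {
    unsigned int in_num;
    unsigned int out_num;
} DemoElem;

/*
 * "Unused" in the cover letter's terms: the element has been filled
 * but not yet written to the used/descriptor ring.
 */
bool demo_elem_is_pending(const DemoElem *e)
{
    return e->in_num + e->out_num > 0;
}

/* After writing the element to the ring, clear both counts to mark it used. */
void demo_elem_mark_used(DemoElem *e)
{
    e->in_num = 0;
    e->out_num = 0;
}
```

The point of reusing in_num and out_num is that no extra flag field is
needed in the element, at the cost of overloading their meaning.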

---
v2: Use a VirtQueue's used_elems array as a buffer mechanism

v1: Implement custom GLib GHashTable as a buffer mechanism

Jonah Palmer (5):
  virtio: Initialize sequence variables
  virtio: In-order support for split VQs
  virtio: In-order support for packed VQs
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 118 +++
 include/hw/virtio/virtio.h   |   5 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 119 insertions(+), 12 deletions(-)

-- 
2.39.3




[RFC v2 2/5] virtio: In-order support for split VQs

2024-03-28 Thread Jonah Palmer
Implements VIRTIO_F_IN_ORDER feature support for virtio devices using
the split virtqueue layout.

For a virtio device that has negotiated the VIRTIO_F_IN_ORDER feature
whose virtqueues use a split virtqueue layout, it's essential that
used VirtQueueElements are written to the used ring in-order.

For devices that use this in-order feature, the VirtQueue's used_elems
array is used to hold processed VirtQueueElements until they can be
presented to the driver in-order.

In the split virtqueue case, we check to see if the element was the next
expected element to be written to the used ring. If it's not, nothing
gets written to the used ring and we're done. If it is, the element is
written to the used ring and then we check to see if the next expected
element continues the order. This process is repeated until we're unable
to continue the order.
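
The park-then-drain behaviour described above can be sketched in
isolation roughly as follows. This is a simplified model under stated
assumptions, not the patch itself: DemoQueue, DemoSlot, demo_complete
and the fixed sizes are hypothetical, and a plain "filled" flag stands
in for the in_num + out_num check.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RING_SIZE 4    /* assumed queue size for the sketch */
#define DEMO_LOG_CAP   16   /* capacity of the fake "used ring" log */

typedef struct DemoSlot {
    bool filled;        /* stand-in for "in_num + out_num > 0" */
    unsigned int id;    /* descriptor head index */
} DemoSlot;

typedef struct DemoQueue {
    DemoSlot slots[DEMO_RING_SIZE];   /* plays the role of used_elems */
    uint16_t used_seq_idx;            /* next sequence number expected */
    unsigned int ring[DEMO_LOG_CAP];  /* ids written to the ring, in order */
    unsigned int ring_len;
} DemoQueue;

/*
 * Park a completed element in its in-order slot, then write out every
 * element that continues the expected order. Returns how many elements
 * were written to the (fake) used ring by this call.
 */
unsigned int demo_complete(DemoQueue *q, uint16_t seq_idx, unsigned int id)
{
    unsigned int written = 0;
    DemoSlot *slot = &q->slots[seq_idx % DEMO_RING_SIZE];

    slot->filled = true;
    slot->id = id;

    while (q->ring_len < DEMO_LOG_CAP) {
        DemoSlot *next = &q->slots[q->used_seq_idx % DEMO_RING_SIZE];

        /* Stop as soon as the next expected element has not completed. */
        if (!next->filled) {
            break;
        }
        q->ring[q->ring_len++] = next->id;
        next->filled = false;   /* mark the slot as consumed */
        q->used_seq_idx++;
        written++;
    }
    return written;
}
```

With completions arriving as elem1, elem2, elem0, the first two calls
write nothing and the third writes all three elements in order.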

If no elements were written to the used ring, no update to the used
ring's index is needed.
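
For reference, one way to compute how far one free-running 16-bit
counter is ahead of another is sketched below. This is only an
illustration of the arithmetic; demo_seq_distance is a hypothetical
helper and not what the patch uses (the patch reduces the difference
modulo vring.num instead).

```c
#include <stdint.h>

/*
 * Distance from 'from' to 'to' for free-running 16-bit counters.
 * The operands are promoted to int before the subtraction, so the
 * result is cast back to uint16_t to get the wrapped distance.
 */
uint16_t demo_seq_distance(uint16_t to, uint16_t from)
{
    return (uint16_t)(to - from);
}
```

A distance of 0 corresponds to the "no elements were written" case.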

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 069d96df99..19d3d43816 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -856,16 +856,38 @@ static void virtqueue_split_fill(VirtQueue *vq, const 
VirtQueueElement *elem,
 unsigned int len, unsigned int idx)
 {
 VRingUsedElem uelem;
+uint16_t uelem_idx;
 
 if (unlikely(!vq->vring.used)) {
 return;
 }
 
-idx = (idx + vq->used_idx) % vq->vring.num;
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+/* Write element(s) to used ring if they're in-order */
+while (true) {
+uelem_idx = vq->used_seq_idx % vq->vring.num;
 
-uelem.id = elem->index;
-uelem.len = len;
-vring_used_write(vq, &uelem, idx);
+/* Stop if element has been used */
+if (vq->used_elems[uelem_idx].in_num +
+vq->used_elems[uelem_idx].out_num <= 0) {
+break;
+}
+uelem.id = vq->used_elems[uelem_idx].index;
+uelem.len = vq->used_elems[uelem_idx].len;
+vring_used_write(vq, &uelem, uelem_idx);
+
+/* Mark this element as used */
+vq->used_elems[uelem_idx].in_num = 0;
+vq->used_elems[uelem_idx].out_num = 0;
+vq->used_seq_idx++;
+}
+} else {
+idx = (idx + vq->used_idx) % vq->vring.num;
+
+uelem.id = elem->index;
+uelem.len = len;
+vring_used_write(vq, &uelem, idx);
+}
 }
 
 static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
@@ -918,6 +940,8 @@ static void virtqueue_packed_fill_desc(VirtQueue *vq,
 void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len, unsigned int idx)
 {
+uint16_t seq_idx;
+
 trace_virtqueue_fill(vq, elem, len, idx);
 
 virtqueue_unmap_sg(vq, elem, len);
@@ -926,6 +950,16 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 return;
 }
 
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+seq_idx = elem->seq_idx % vq->vring.num;
+
+vq->used_elems[seq_idx].index = elem->index;
+vq->used_elems[seq_idx].len = elem->len;
+vq->used_elems[seq_idx].ndescs = elem->ndescs;
+vq->used_elems[seq_idx].in_num = elem->in_num;
+vq->used_elems[seq_idx].out_num = elem->out_num;
+}
+
 if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
@@ -944,6 +978,14 @@ static void virtqueue_split_flush(VirtQueue *vq, unsigned 
int count)
 
 /* Make sure buffer is written before we update index. */
 smp_wmb();
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+count = (vq->used_seq_idx - vq->used_idx) % vq->vring.num;
+
+/* No in-order elements were written, nothing to update */
+if (!count) {
+return;
+}
+}
 trace_virtqueue_flush(vq, count);
 old = vq->used_idx;
 new = old + count;
-- 
2.39.3




[RFC v2 3/5] virtio: In-order support for packed VQs

2024-03-28 Thread Jonah Palmer
Implements VIRTIO_F_IN_ORDER feature support for virtio devices using
the packed virtqueue layout.

For a virtio device that has negotiated the VIRTIO_F_IN_ORDER feature
whose virtqueues use a packed virtqueue layout, it's essential that used
VirtQueueElements are written to the descriptor ring in-order.

In the packed virtqueue case, since we already write to the virtqueue's
used_elems array at the start of virtqueue_fill, we don't need to call
virtqueue_packed_fill. Furthermore, due to the change in behavior of the
used_elems array and not knowing how many unused in-order elements
exist, separate logic is required for the flushing operation of packed
virtqueues.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 50 +++---
 1 file changed, 43 insertions(+), 7 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 19d3d43816..dc2eabd18b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -960,7 +960,8 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 vq->used_elems[seq_idx].out_num = elem->out_num;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED) &&
+!virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
@@ -997,18 +998,53 @@ static void virtqueue_split_flush(VirtQueue *vq, unsigned 
int count)
 
 static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
 {
-unsigned int i, ndescs = 0;
+unsigned int i, j, uelem_idx, ndescs = 0;
 
 if (unlikely(!vq->vring.desc)) {
 return;
 }
 
-for (i = 1; i < count; i++) {
-virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
-ndescs += vq->used_elems[i].ndescs;
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+/* First expected element is used, nothing to do */
+if (vq->used_elems[vq->used_idx].in_num +
+vq->used_elems[vq->used_idx].out_num <= 0) {
+return;
+}
+
+j = vq->used_idx;
+
+for (i = j + 1; ; i++) {
+uelem_idx = i % vq->vring.num;
+
+/* Stop if element has been used */
+if (vq->used_elems[uelem_idx].in_num +
+vq->used_elems[uelem_idx].out_num <= 0) {
+break;
+}
+
+virtqueue_packed_fill_desc(vq, &vq->used_elems[uelem_idx],
+   uelem_idx, false);
+ndescs += vq->used_elems[uelem_idx].ndescs;
+
+/* Mark this element as used */
+vq->used_elems[uelem_idx].in_num = 0;
+vq->used_elems[uelem_idx].out_num = 0;
+}
+
+/* Mark first expected element as used */
+vq->used_elems[vq->used_idx].in_num = 0;
+vq->used_elems[vq->used_idx].out_num = 0;
+} else {
+j = 0;
+
+for (i = 1; i < count; i++) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], i, false);
+ndescs += vq->used_elems[i].ndescs;
+}
 }
-virtqueue_packed_fill_desc(vq, &vq->used_elems[0], 0, true);
-ndescs += vq->used_elems[0].ndescs;
+
+virtqueue_packed_fill_desc(vq, &vq->used_elems[j], j, true);
+ndescs += vq->used_elems[j].ndescs;
 
 vq->inuse -= ndescs;
 vq->used_idx += ndescs;
-- 
2.39.3




[RFC v2 4/5] vhost, vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-03-28 Thread Jonah Palmer
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.
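
The idea that a listed feature bit "can be disabled if the backend does
not support it" can be sketched roughly as below. This is a simplified
model, not QEMU's code: demo_mask_features and the DEMO_* names are
hypothetical, and only the bit numbers (34 for RING_PACKED, 35 for
IN_ORDER) are taken from the virtio specification.

```c
#include <stdint.h>

#define DEMO_F_RING_PACKED  34    /* bit number per the virtio spec */
#define DEMO_F_IN_ORDER     35    /* bit number per the virtio spec */
#define DEMO_INVALID_BIT    0xff  /* hypothetical list terminator */

/*
 * Walk a terminator-ended list of feature bit numbers and clear from
 * 'offered' every listed bit that 'backend' does not advertise. Bits
 * that are not in the list are left untouched.
 */
uint64_t demo_mask_features(const int *bits, uint64_t backend,
                            uint64_t offered)
{
    for (; *bits != DEMO_INVALID_BIT; bits++) {
        uint64_t mask = 1ULL << *bits;

        if (!(backend & mask)) {
            offered &= ~mask;
        }
    }
    return offered;
}
```

In this model, a bit that is missing from the list is never cleared,
which is why the feature has to appear in each device's array before
the backend's lack of support can be taken into account.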

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[RFC v2 1/5] virtio: Initialize sequence variables

2024-03-28 Thread Jonah Palmer
Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.
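
A minimal sketch of the counter behaviour described above, assuming
nothing beyond a uint16_t field that is read and then incremented.
DemoCounter and demo_take_seq_idx are made-up names for illustration.

```c
#include <stdint.h>

/* Hypothetical holder for the per-queue counter. */
typedef struct DemoCounter {
    uint16_t current_seq_idx;
} DemoCounter;

/*
 * Hand out the current sequence number, then advance the counter.
 * Because the field is a uint16_t, the value after 65535 is 0.
 */
uint16_t demo_take_seq_idx(DemoCounter *c)
{
    return c->current_seq_idx++;
}
```

Each new element would store the returned value as its own seq_idx, so
consecutive elements get consecutive numbers even across the wrap.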

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 18 ++
 include/hw/virtio/virtio.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
 uint16_t used_idx;
 bool used_wrap_counter;
 
+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+
 /* Last used index value we have signalled on */
 uint16_t signalled_used;
 
@@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+/* Assign sequence index for in-order processing */
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+elem->seq_idx = vq->current_seq_idx++;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t 
sz)
 vq->shadow_avail_idx = vq->last_avail_idx;
 vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
 
+/* Assign sequence index for in-order processing */
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+elem->seq_idx = vq->current_seq_idx++;
+}
+
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
 done:
 address_space_cache_destroy(&indirect_desc_cache);
@@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, 
uint32_t i)
 vdev->vq[i].notification = true;
 vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
 vdev->vq[i].inuse = 0;
+vdev->vq[i].used_seq_idx = 0;
+vdev->vq[i].current_seq_idx = 0;
 virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
queue_size,
 vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
 vdev->vq[i].handle_output = handle_output;
 vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
+vdev->vq[i].used_seq_idx = 0;
+vdev->vq[i].current_seq_idx = 0;
 
 return &vdev->vq[i];
 }
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..910b2a3427 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -75,6 +75,7 @@ typedef struct VirtQueueElement
 hwaddr *out_addr;
 struct iovec *in_sg;
 struct iovec *out_sg;
+uint16_t seq_idx;
 } VirtQueueElement;
 
 #define VIRTIO_QUEUE_MAX 1024
-- 
2.39.3




[RFC v2 5/5] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-03-28 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 910b2a3427..dd0ba6e57f 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -385,7 +385,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-26 Thread Jonah Palmer




On 3/26/24 2:34 PM, Eugenio Perez Martin wrote:

On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer  wrote:




On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:

On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer  wrote:




On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:


The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of GLib's GHashTable. The decision behind using a hash table was to
leverage their ability for quick lookup, insertion, and removal
operations. Given that our keys are simply numbers of an ordered
sequence, a hash table seemed like the best choice for a buffer
mechanism.

-

The strategy behind this implementation is as follows:

We know that buffers that are popped from the available ring and enqueued
for further processing will always be done in the same order in which they
were made available by the driver. Given this, we can note their order
by assigning the resulting VirtQueueElement a key. This key is a number
in a sequence that represents the order in which they were popped from
the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign a key value to them which represents their order (elem0
is popped first, then elem1, then lastly elem2):

elem2   --  elem1   --  elem0   ---> Enqueue for processing
   (key: 2)(key: 1)(key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in-order and we must stash it away in a hash table until an order
can be made. The element's key value is used as the key for placing it
in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in-order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

   exp-seq-num = 0:

elem1   --> elem1.key == exp-seq-num ? --> No, stash it
   (key: 1) |
|
v
  
  |key: 1 - elem1|
  
   -
   exp-seq-num = 0:

elem2   --> elem2.key == exp-seq-num ? --> No, stash it
   (key: 2) |
|
v
  
  |key: 1 - elem1|
  |--|
  |key: 2 - elem2|
  
   -
   exp-seq-num = 0:

elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
   (key: 0)

   exp-seq-num = 1:

   lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
remove elem from table
|
v
  
   

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-26 Thread Jonah Palmer




On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:

On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer  wrote:




On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:


The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of GLib's GHashTable. The decision behind using a hash table was to
leverage their ability for quick lookup, insertion, and removal
operations. Given that our keys are simply numbers of an ordered
sequence, a hash table seemed like the best choice for a buffer
mechanism.

-

The strategy behind this implementation is as follows:

We know that buffers that are popped from the available ring and enqueued
for further processing will always be done in the same order in which they
were made available by the driver. Given this, we can note their order
by assigning the resulting VirtQueueElement a key. This key is a number
in a sequence that represents the order in which they were popped from
the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign a key value to them which represents their order (elem0
is popped first, then elem1, then lastly elem2):

   elem2   --  elem1   --  elem0   ---> Enqueue for processing
  (key: 2)(key: 1)(key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in-order and we must stash it away in a hash table until an order
can be made. The element's key value is used as the key for placing it
in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in-order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

  exp-seq-num = 0:

   elem1   --> elem1.key == exp-seq-num ? --> No, stash it
  (key: 1) |
   |
   v
 
 |key: 1 - elem1|
 
  -
  exp-seq-num = 0:

   elem2   --> elem2.key == exp-seq-num ? --> No, stash it
  (key: 2) |
   |
   v
 
 |key: 1 - elem1|
 |--|
 |key: 2 - elem2|
 
  -
  exp-seq-num = 0:

   elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
  (key: 0)

  exp-seq-num = 1:

  lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
   remove elem from table
   |
   v
 
 |key: 2 - elem2|
 

  exp-seq-num = 2:

  lookup(table, exp-seq-num) !=

Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices

2024-03-25 Thread Jonah Palmer




On 3/22/24 6:46 AM, Eugenio Perez Martin wrote:

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:


Implements in-order handling for most virtio devices using the
VIRTIO_F_IN_ORDER transport feature, specifically those that call
virtqueue_push to push their used elements onto the used ring.

The logic behind this implementation is as follows:

1.) virtqueue_pop always enqueues VirtQueueElements in-order.

virtqueue_pop always retrieves one or more buffer descriptors in-order
from the available ring and converts them into a VirtQueueElement. This
means that VirtQueueElements are enqueued in-order by default.

As a result, as VirtQueueElements are created, we can assign a sequential
key value to them. This preserves the order of buffers that have been
made available to the device by the driver.

As VirtQueueElements are assigned a key value, the current sequence
number is incremented.

2.) Requests can be completed out-of-order.

While most devices complete requests in the same order that they were
enqueued by default, some devices don't (e.g. virtio-blk). The goal of
this out-of-order handling is to reduce the impact on devices that
process elements in-order by default while also guaranteeing compliance
with the VIRTIO_F_IN_ORDER feature.

Below is the logic behind handling completed requests (which may or may
not be in-order).

3.) Does the incoming used VirtQueueElement preserve the correct order?

In other words, is the sequence number (key) assigned to the
VirtQueueElement the expected number that would preserve the original
order?

3a.)
If it does... immediately push the used element onto the used ring.
Then increment the next expected sequence number and check to see if
any previous out-of-order VirtQueueElements stored on the hash table
have a key that matches this next expected sequence number.

For each VirtQueueElement found on the hash table with a matching key:
push the element on the used ring, remove the key-value pair from the
hash table, and then increment the next expected sequence number. Repeat
this process until we're unable to find an element with a matching key.

Note that if the device uses batching (e.g. virtio-net), then we skip
the virtqueue_flush call and let the device call it themselves.

3b.)
If it does not... stash the VirtQueueElement, along with relevant data,
as an InOrderVQElement on the hash table. The key used is the order_key
that was assigned when the VirtQueueElement was created.
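
Steps 3a and 3b above can be sketched in isolation roughly as follows.
To keep the example self-contained, a small fixed-size array stands in
for the GHashTable; that swap, along with the DemoOrder type and the
demo_* functions, is an assumption made for illustration and is not
the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_STASH_SIZE 8   /* assumed upper bound for the sketch */

typedef struct DemoOrder {
    bool stashed[DEMO_STASH_SIZE];        /* stand-in for the hash table */
    unsigned int next_expected;           /* next expected sequence number */
    unsigned int pushed[DEMO_STASH_SIZE]; /* keys pushed to the used ring */
    size_t pushed_len;
} DemoOrder;

/* Record a key as pushed to the (fake) used ring, if there is room. */
static void demo_push(DemoOrder *o, unsigned int key)
{
    if (o->pushed_len < DEMO_STASH_SIZE) {
        o->pushed[o->pushed_len++] = key;
    }
}

/* Handle one completed element identified by its order key. */
void demo_order_element(DemoOrder *o, unsigned int key)
{
    if (key != o->next_expected) {
        /* 3b: out of order, stash it under its own key. */
        o->stashed[key % DEMO_STASH_SIZE] = true;
        return;
    }

    /* 3a: in order, push it and then drain any stashed successors. */
    demo_push(o, key);
    o->next_expected++;

    while (o->stashed[o->next_expected % DEMO_STASH_SIZE]) {
        o->stashed[o->next_expected % DEMO_STASH_SIZE] = false;
        demo_push(o, o->next_expected);
        o->next_expected++;
    }
}
```

Feeding it keys 1, 2, 0 leaves nothing pushed after the first two calls
and pushes 0, 1, 2 on the third, matching the elem1, elem2, elem0
example from the cover letter.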

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 70 --
  include/hw/virtio/virtio.h |  8 +
  2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 40124545d6..40e4377f1e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
  }
  }

+void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
+ unsigned int len, unsigned int idx,
+ unsigned int count)
+{
+InOrderVQElement *in_order_elem;
+
+if (elem->order_key == vq->current_order_idx) {
+/* Element is in-order, push to used ring */
+virtqueue_fill(vq, elem, len, idx);
+
+/* Batching? Don't flush */
+if (count) {
+virtqueue_flush(vq, count);


The "count" parameter is the number of heads used, but here you're
only using one head (elem). Same with the other virtqueue_flush in the
function.



True. This acts more as a flag than an actual count since, unless we're 
batching (which in the current setup, the device would explicitly call 
virtqueue_flush separately), this value will be either 0 or 1.



Also, this function sometimes replaces virtqueue_fill and other times
replaces virtqueue_fill + virtqueue_flush (both examples in patch
6/8). I have the impression the series would be simpler if
virtqueue_order_element were a static function just handling the
virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
virtqueue_fill, so the caller does not need to know if the in_order
feature is on or off.



Originally I wanted this function to replace virtqueue_fill + 
virtqueue_flush but after looking at virtio_net_receive_rcu and 
vhost_svq_flush, where multiple virtqueue_fill's can be called before a 
single virtqueue_flush, I added this 'if (count)' conditional to handle 
both cases.


I did consider virtqueue_order_element just handling the virtqueue_fill 
path but then I wasn't sure how to handle calling virtqueue_flush when 
retrieving out-of-order data from the hash table.


For example, devices that call virtqueue_push would call virtqueue_fill 
and then virtqueue_flush afterwards. In the scenario where, say, elem1 
was found out of order and put into the hash table, and then elem0 comes 
along. For elem0 we'd call virtqueue_fill and then we should call 
virtqueue_flush to keep the ord

Re: [RFC 1/8] virtio: Define InOrderVQElement

2024-03-25 Thread Jonah Palmer




On 3/22/24 5:45 AM, Eugenio Perez Martin wrote:

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:


Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
transport feature implementation.

The InOrderVQElement structure is used to encapsulate out-of-order
VirtQueueElement data that was processed by the host. This data
includes:
  - The processed VirtQueueElement (elem)
  - Length of data (len)
  - VirtQueueElement array index (idx)
  - Number of processed VirtQueueElements (count)

InOrderVQElements will be stored in a buffering mechanism until an
order can be achieved.

Signed-off-by: Jonah Palmer 
---
  include/hw/virtio/virtio.h | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..c8aa435a5e 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -77,6 +77,13 @@ typedef struct VirtQueueElement
  struct iovec *out_sg;
  } VirtQueueElement;

+typedef struct InOrderVQElement {
+const VirtQueueElement *elem;


Some subsystems allocate space for extra elements after
VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
to allocate this extra space by its second argument. Would it work for
this?



I don't see why not. Although this may not be necessary due to me 
missing a key aspect mentioned in your comment below.



+unsigned int len;
+unsigned int idx;
+unsigned int count;


Now I don't get why these fields cannot be obtained from elem->(len,
index, ndescs) ?



Interesting. I didn't realize that these values are equivalent to a 
VirtQueueElement's len, index, and ndescs fields.


Is this always true? Else I would've expected, for example, 
virtqueue_push to not need the 'unsigned int len' parameter if this 
information is already included via the VirtQueueElement being passed in.



+} InOrderVQElement;
+
  #define VIRTIO_QUEUE_MAX 1024

  #define VIRTIO_NO_VECTOR 0xffff
--
2.39.3







Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-25 Thread Jonah Palmer




On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:


The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of GLib's GHashTable. The decision behind using a hash table was to
leverage their ability for quick lookup, insertion, and removal
operations. Given that our keys are simply numbers of an ordered
sequence, a hash table seemed like the best choice for a buffer
mechanism.

-

The strategy behind this implementation is as follows:

We know that buffers are always popped from the available ring and
enqueued for further processing in the same order in which they were
made available by the driver. Given this, we can note their order
by assigning the resulting VirtQueueElement a key. This key is a number
in a sequence that represents the order in which they were popped from
the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign a key value to them which represents their order (elem0
is popped first, then elem1, then lastly elem2):

  elem2   --  elem1   --  elem0   ---> Enqueue for processing
 (key: 2)    (key: 1)    (key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in-order and we must stash it away in a hash table until an order
can be made. The element's key value is used as the key for placing it
in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in-order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

 exp-seq-num = 0:

  elem1   --> elem1.key == exp-seq-num ? --> No, stash it
 (key: 1)                                        |
                                                 |
                                                 v
                                          ================
                                          |key: 1 - elem1|
                                          ================
 ---------------------
 exp-seq-num = 0:

  elem2   --> elem2.key == exp-seq-num ? --> No, stash it
 (key: 2)                                        |
                                                 |
                                                 v
                                          ================
                                          |key: 1 - elem1|
                                          |--------------|
                                          |key: 2 - elem2|
                                          ================
 ---------------------
 exp-seq-num = 0:

  elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
 (key: 0)

 exp-seq-num = 1:

 lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
                                          remove elem from table
                                                 |
                                                 v
                                          ================
                                          |key: 2 - elem2|
                                          ================

 exp-seq-num = 2:

 lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
   

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-21 Thread Jonah Palmer




On 3/21/24 3:48 PM, Dongli Zhang wrote:

Hi Jonah,

Would you mind helping explain how VIRTIO_F_IN_ORDER improves 
performance?

https://lore.kernel.org/all/20240321155717.1392787-1-jonah.pal...@oracle.com/#t

I tried to look for it from prior discussions but could not find why.

https://lore.kernel.org/all/byapr18mb2791df7e6c0f61e2d8698e8fa0...@byapr18mb2791.namprd18.prod.outlook.com/

Thank you very much!

Dongli Zhang



Hey Dongli,

So VIRTIO_F_IN_ORDER can theoretically improve performance under certain 
conditions. Whether it can improve performance today, I'm not sure.


But, if we can guarantee that all buffers are used by the device in the 
same order in which they're made available by the driver (enforcing a 
strict in-order processing and completion of requests), then we can 
leverage this to our advantage.


For example, we could simplify device and driver logic, such as not 
needing complex mechanisms to track the completion of out-of-order 
requests (reducing request management overhead). Though needing 
complex mechanisms to force this data to be in-order kind of defeats 
this benefit.


It could also improve cache utilization since sequential access patterns 
are more cache-friendly compared to random access patterns.


Also, in-order processing is more predictable, making it easier to 
optimize device and driver performance. E.g. it can allow us to 
fine-tune things without having to account for the variability of 
out-of-order completions.


But again, the actual performance impact will vary depending on the use 
case and workload. In scenarios that require high levels of parallelism, 
or where out-of-order completions are efficiently managed, the flexibility 
of out-of-order processing can still be preferable.


Jonah


On 3/21/24 08:57, Jonah Palmer wrote:

The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of GLib's GHashTable. The decision behind using a hash table was to
leverage their ability for quick lookup, insertion, and removal
operations. Given that our keys are simply numbers of an ordered
sequence, a hash table seemed like the best choice for a buffer
mechanism.

-

The strategy behind this implementation is as follows:

We know that buffers are always popped from the available ring and
enqueued for further processing in the same order in which they were
made available by the driver. Given this, we can note their order
by assigning the resulting VirtQueueElement a key. This key is a number
in a sequence that represents the order in which they were popped from
the available ring, relative to the other VirtQueueElements.

For example, given 3 "elements" that were popped from the available
ring, we assign a key value to them which represents their order (elem0
is popped first, then elem1, then lastly elem2):

  elem2   --  elem1   --  elem0   ---> Enqueue for processing
 (key: 2)    (key: 1)    (key: 0)

Then these elements are enqueued for further processing by the host.

While most devices will return these completed elements in the same
order in which they were enqueued, some devices may not (e.g.
virtio-blk). To guarantee that these elements are put on the used ring
in the same order in which they were enqueued, we can use a buffering
mechanism that keeps track of the next expected sequence number of an
element.

In other words, if the completed element does not have a key value that
matches the next expected sequence number, then we know this element is
not in-order and we must stash it away in a hash table until an order
can be made. The element's key value is used as the key for placing it
in the hash table.

If the completed element has a key value that matches the next expected
sequence number, then we know this element is in-order and we can push
it on the used ring. Then we increment the next expected sequence number
and check if the hash table contains an element at this key location.

If so, we retrieve this element, push it to the used ring, delete the
key-value pair from the hash table, increment the next expected sequence
number, and check the hash table again for an element at this new key
location. This process is repeated until we're unable to find an element
in the hash table to continue the order.

So, for example, say the 3 elements we enqueued were completed in the
following order: elem1, elem2, elem0. The next expected sequence number
is 0:

 exp-seq-num = 0:

  elem1   --> elem1.key == exp-seq-num ? --> No, st

[RFC 1/8] virtio: Define InOrderVQElement

2024-03-21 Thread Jonah Palmer
Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
transport feature implementation.

The InOrderVQElement structure is used to encapsulate out-of-order
VirtQueueElement data that was processed by the host. This data
includes:
 - The processed VirtQueueElement (elem)
 - Length of data (len)
 - VirtQueueElement array index (idx)
 - Number of processed VirtQueueElements (count)

InOrderVQElements will be stored in a buffering mechanism until an
order can be achieved.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index b3c74a1bca..c8aa435a5e 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -77,6 +77,13 @@ typedef struct VirtQueueElement
 struct iovec *out_sg;
 } VirtQueueElement;
 
+typedef struct InOrderVQElement {
+const VirtQueueElement *elem;
+unsigned int len;
+unsigned int idx;
+unsigned int count;
+} InOrderVQElement;
+
 #define VIRTIO_QUEUE_MAX 1024
 
 #define VIRTIO_NO_VECTOR 0x
-- 
2.39.3




[RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-03-21 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index eeeda397a9..ffd78830a3 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -400,7 +400,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3
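
As a rough illustration of what a bit property with a default of false amounts to, the sketch below applies boolean defaults to a 64-bit host feature mask. The bit numbers are the ones assigned by the virtio specification; the toy_* helpers are hypothetical and do not reproduce QEMU's property machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as assigned by the virtio specification. */
#define VIRTIO_F_RING_PACKED 34
#define VIRTIO_F_IN_ORDER    35
#define VIRTIO_F_RING_RESET  40

/* Hypothetical helper: apply one boolean property to a feature mask. */
static uint64_t toy_apply_bit(uint64_t features, unsigned int bit, bool on)
{
    assert(bit < 64);
    if (on) {
        return features | (UINT64_C(1) << bit);
    }
    return features & ~(UINT64_C(1) << bit);
}

/* Host features using the defaults listed in the property table. */
static uint64_t toy_default_features(void)
{
    uint64_t f = 0;

    f = toy_apply_bit(f, VIRTIO_F_RING_PACKED, false);
    f = toy_apply_bit(f, VIRTIO_F_RING_RESET, true);
    f = toy_apply_bit(f, VIRTIO_F_IN_ORDER, false);
    return f;
}

static bool toy_has_feature(uint64_t features, unsigned int bit)
{
    return (features & (UINT64_C(1) << bit)) != 0;
}
```

With these defaults the in_order bit is only offered once the user turns the property on, which matches the "disabled unless explicitly enabled" behaviour described in the commit message.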




[RFC 5/8] virtio-net: in-order handling

2024-03-21 Thread Jonah Palmer
Implements in-order handling for the virtio-net device.

Since virtio-net utilizes batching for its Rx VirtQueue, the device is
responsible for calling virtqueue_flush once it has completed its
batching operation.

Note:
-
It's unclear if this implementation is really necessary to "guarantee"
that used VirtQueueElements are put on the used ring in-order since, by
design, virtio-net already does this with its Rx VirtQueue.

Signed-off-by: Jonah Palmer 
---
 hw/net/virtio-net.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9959f1932b..b0375f7e5e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2069,7 +2069,11 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
 
 for (j = 0; j < i; j++) {
 /* signal other side */
-virtqueue_fill(q->rx_vq, elems[j], lens[j], j);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_order_element(q->rx_vq, elems[j], lens[j], j, 0);
+} else {
+virtqueue_fill(q->rx_vq, elems[j], lens[j], j);
+}
 g_free(elems[j]);
 }
 
-- 
2.39.3
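
To make the batching remark concrete, here is a simplified model of the fill-then-flush pattern on a split used ring: each fill writes an entry at used_idx plus an offset, and a single flush publishes the whole batch. ToyUsedRing and the toy_* functions are hypothetical; memory barriers, endianness and guest notification are all left out.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u

typedef struct ToyUsedRing {
    uint32_t id[TOY_RING_SIZE];   /* descriptor head of each used entry */
    uint32_t len[TOY_RING_SIZE];  /* bytes written for each used entry */
    uint16_t used_idx;            /* index published to the driver */
} ToyUsedRing;

/* Write one entry at used_idx + offset without publishing it. */
static void toy_fill(ToyUsedRing *r, uint32_t head, uint32_t len,
                     unsigned int offset)
{
    unsigned int slot = (r->used_idx + offset) % TOY_RING_SIZE;

    r->id[slot] = head;
    r->len[slot] = len;
}

/* Publish 'count' previously filled entries in one step. */
static void toy_flush(ToyUsedRing *r, unsigned int count)
{
    assert(count <= TOY_RING_SIZE);
    r->used_idx = (uint16_t)(r->used_idx + count);
}

/* Batch completion: fill n entries, then flush once. */
static void toy_complete_batch(ToyUsedRing *r, const uint32_t *heads,
                               const uint32_t *lens, unsigned int n)
{
    unsigned int j;

    for (j = 0; j < n; j++) {
        toy_fill(r, heads[j], lens[j], j);
    }
    toy_flush(r, n);
}
```

Because the driver only sees entries once used_idx moves, a device that batches has to own the flush call, which is why the in-order path is invoked here with a count of 0.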




[RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-21 Thread Jonah Palmer
                                                 v
                                          ================
                                          |   *empty*    |
                                          ================

exp-seq-num = 3:

lookup(table, exp-seq-num) != NULL ? --> No, done
-
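
The stash-and-drain walk-through that ends above can be sketched in a few lines of C. This is a minimal model under stated assumptions, not the series' implementation: a small fixed array indexed by key stands in for the GHashTable, the element is reduced to its sequence key, and "push to used ring" is modelled by appending the key to an output list. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLOTS 16u   /* must exceed the number of elements in flight */

typedef struct ToyReorder {
    bool     present[TOY_SLOTS];  /* is a completed element stashed here? */
    uint16_t expected;            /* next sequence number to release */
    uint16_t out[TOY_SLOTS];      /* keys in the order they were released */
    unsigned int n_out;
} ToyReorder;

/* Stand-in for pushing an element onto the used ring. */
static void toy_release(ToyReorder *r, uint16_t key)
{
    assert(r->n_out < TOY_SLOTS);
    r->out[r->n_out++] = key;
}

/* Handle one completed element identified by its sequence key. */
static void toy_complete(ToyReorder *r, uint16_t key)
{
    if (key != r->expected) {
        /* Out of order: stash until its predecessors complete. */
        assert(!r->present[key % TOY_SLOTS]);
        r->present[key % TOY_SLOTS] = true;
        return;
    }

    /* In order: release it, then drain any stashed successors. */
    toy_release(r, key);
    r->expected++;
    while (r->present[r->expected % TOY_SLOTS]) {
        r->present[r->expected % TOY_SLOTS] = false;
        toy_release(r, r->expected);
        r->expected++;
    }
}
```

Completing keys 1, 2 and then 0 leaves the first two stashed and releases all three in order on the third call, mirroring the example.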

Jonah Palmer (8):
  virtio: Define InOrderVQElement
  virtio: Create/destroy/reset VirtQueue In-Order hash table
  virtio: Define order variables
  virtio: Implement in-order handling for virtio devices
  virtio-net: in-order handling
  vhost-svq: in-order handling
  vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c  |   1 +
 hw/net/vhost_net.c |   2 +
 hw/net/virtio-net.c|   6 +-
 hw/scsi/vhost-scsi.c   |   1 +
 hw/scsi/vhost-user-scsi.c  |   1 +
 hw/virtio/vhost-shadow-virtqueue.c |  15 -
 hw/virtio/vhost-user-fs.c  |   1 +
 hw/virtio/vhost-user-vsock.c   |   1 +
 hw/virtio/virtio.c | 103 -
 include/hw/virtio/virtio.h |  20 +-
 net/vhost-vdpa.c   |   1 +
 11 files changed, 145 insertions(+), 7 deletions(-)

-- 
2.39.3




[RFC 6/8] vhost-svq: in-order handling

2024-03-21 Thread Jonah Palmer
Implements in-order handling for vhost devices using shadow virtqueues.

Since vhost's shadow virtqueues utilize batching in their
vhost_svq_flush calls, the vhost device is responsible for calling
virtqueue_flush once it has completed its batching operation.

Note:
-
It's unclear if this implementation is really necessary to "guarantee"
in-order handling since, by design, the vhost_svq_flush function puts
used VirtQueueElements in-order already.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/vhost-shadow-virtqueue.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..3c42adee87 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -493,11 +493,20 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 qemu_log_mask(LOG_GUEST_ERROR,
  "More than %u used buffers obtained in a %u size SVQ",
  i, svq->vring.num);
-virtqueue_fill(vq, elem, len, i);
-virtqueue_flush(vq, i);
+if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_order_element(vq, elem, len, i, i);
+} else {
+virtqueue_fill(vq, elem, len, i);
+virtqueue_flush(vq, i);
+}
 return;
 }
-virtqueue_fill(vq, elem, len, i++);
+
+if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_order_element(vq, elem, len, i++, 0);
+} else {
+virtqueue_fill(vq, elem, len, i++);
+}
 }
 
 virtqueue_flush(vq, i);
-- 
2.39.3




[RFC 3/8] virtio: Define order variables

2024-03-21 Thread Jonah Palmer
Define order variables for their use in a VirtQueue's in-order hash
table. Also initialize current_order variables to 0 when creating or
resetting a VirtQueue. These variables are used when the device has
negotiated the VIRTIO_F_IN_ORDER transport feature.

A VirtQueue's current_order_idx represents the next expected index in
the sequence of VirtQueueElements to be processed (put on the used
ring). The next VirtQueueElement to be processed must match this
sequence number before additional elements can be safely added to the
used ring.

A VirtQueue's current_order_key is essentially a counter whose value is
saved as a key in a VirtQueueElement. After the value has been assigned
to the VirtQueueElement, the counter is incremented. All
VirtQueueElements being used by the device are assigned a key value, and
the sequence in which they're assigned must be preserved when the device
puts these elements on the used ring.

A VirtQueueElement's order_key is the value of a VirtQueue's
current_order_key at the time of the VirtQueueElement's creation. This
value must match the VirtQueue's current_order_idx before it's able
to be put on the used ring by the device.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 6 ++
 include/hw/virtio/virtio.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d2afeeb59a..40124545d6 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -155,6 +155,8 @@ struct VirtQueue
 
 /* In-Order */
 GHashTable *in_order_ht;
+uint16_t current_order_idx;
+uint16_t current_order_key;
 };
 
 const char *virtio_device_names[] = {
@@ -2103,6 +2105,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
 if (vdev->vq[i].in_order_ht != NULL) {
 g_hash_table_remove_all(vdev->vq[i].in_order_ht);
 }
+vdev->vq[i].current_order_idx = 0;
+vdev->vq[i].current_order_key = 0;
 virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2357,6 +2361,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
   free_in_order_vq_element);
 }
+vdev->vq[i].current_order_idx = 0;
+vdev->vq[i].current_order_key = 0;
 
 return &vdev->vq[i];
 }
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8aa435a5e..f83d7e1fee 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -75,6 +75,7 @@ typedef struct VirtQueueElement
 hwaddr *out_addr;
 struct iovec *in_sg;
 struct iovec *out_sg;
+uint16_t order_key;
 } VirtQueueElement;
 
 typedef struct InOrderVQElement {
-- 
2.39.3
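
A small sketch of how the two 16-bit counters interact may help here, in particular across wraparound. It assumes the key is assigned when the element is created and that fewer than 65536 elements are ever in flight; ToyQueue, ToyElem and the toy_* functions are hypothetical and only model the counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyQueue {
    uint16_t current_order_idx;   /* next key allowed onto the used ring */
    uint16_t current_order_key;   /* next key to hand out */
} ToyQueue;

typedef struct ToyElem {
    uint16_t order_key;
} ToyElem;

/* Both counters start at 0 on create and on reset. */
static void toy_reset(ToyQueue *q)
{
    q->current_order_idx = 0;
    q->current_order_key = 0;
}

/* Assign the current key to a new element, then advance the counter. */
static void toy_assign_key(ToyQueue *q, ToyElem *e)
{
    e->order_key = q->current_order_key++;   /* wraps modulo 65536 */
}

/* An element may be used only when its key is the expected one. */
static bool toy_is_next(const ToyQueue *q, const ToyElem *e)
{
    return e->order_key == q->current_order_idx;
}

static void toy_mark_used(ToyQueue *q, const ToyElem *e)
{
    assert(toy_is_next(q, e));
    q->current_order_idx++;                  /* wraps modulo 65536 */
}
```

Since both counters wrap at the same point, a plain equality test stays valid across the 65535 to 0 transition.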




[RFC 2/8] virtio: Create/destroy/reset VirtQueue In-Order hash table

2024-03-21 Thread Jonah Palmer
Define a GLib hash table (GHashTable) member in a device's VirtQueue
and add its creation, destruction, and reset functions appropriately.
Also define a function to handle the deallocation of InOrderVQElement
values whenever they're removed from the hash table or the hash table
is destroyed. This hash table is to be used when the device is using
the VIRTIO_F_IN_ORDER transport feature.

A VirtQueue's in-order hash table will take in a uint16_t key with an
InOrderVQElement value as its key-value pair.

The hash table will be used as a buffer mechanism for completed,
out-of-order VirtQueueElements until they can be used in the same order
in which they were made available to the device.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..d2afeeb59a 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -152,6 +152,9 @@ struct VirtQueue
 EventNotifier host_notifier;
 bool host_notifier_enabled;
 QLIST_ENTRY(VirtQueue) node;
+
+/* In-Order */
+GHashTable *in_order_ht;
 };
 
 const char *virtio_device_names[] = {
@@ -2070,6 +2073,16 @@ static enum virtio_device_endian virtio_current_cpu_endian(void)
 }
 }
 
+/* 
+ * Called when an element is removed from the hash table
+ * or when the hash table is destroyed.
+ */
+static void free_in_order_vq_element(gpointer data)
+{
+InOrderVQElement *elem = (InOrderVQElement *)data;
+g_free(elem);
+}
+
 static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
 {
 vdev->vq[i].vring.desc = 0;
@@ -2087,6 +2100,9 @@ static void __virtio_queue_reset(VirtIODevice *vdev, uint32_t i)
 vdev->vq[i].notification = true;
 vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
 vdev->vq[i].inuse = 0;
+if (vdev->vq[i].in_order_ht != NULL) {
+g_hash_table_remove_all(vdev->vq[i].in_order_ht);
+}
 virtio_virtqueue_reset_region_cache(&vdev->vq[i]);
 }
 
@@ -2334,6 +2350,13 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
 vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
 vdev->vq[i].handle_output = handle_output;
 vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
+vdev->vq[i].in_order_ht = NULL;
+
+if (virtio_host_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vdev->vq[i].in_order_ht =
+g_hash_table_new_full(g_direct_hash, g_direct_equal, NULL,
+  free_in_order_vq_element);
+}
 
 return &vdev->vq[i];
 }
@@ -2345,6 +2368,10 @@ void virtio_delete_queue(VirtQueue *vq)
 vq->handle_output = NULL;
 g_free(vq->used_elems);
 vq->used_elems = NULL;
+if (virtio_host_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+g_hash_table_destroy(vq->in_order_ht);
+vq->in_order_ht = NULL;
+}
 virtio_virtqueue_reset_region_cache(vq);
 }
 
-- 
2.39.3




[RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-03-21 Thread Jonah Palmer
Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..d176ed857e 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..33d1d4b9d3 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3
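
The role of these arrays can be illustrated with a simplified filter: every bit named in the list is cleared from the offered features when the backend lacks it, while bits not named in the list pass through untouched. This is a sketch only; toy_filter_features and TOY_INVALID_FEATURE_BIT are hypothetical stand-ins, and the real terminator's value is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical list terminator standing in for VHOST_INVALID_FEATURE_BIT. */
#define TOY_INVALID_FEATURE_BIT (-1)

/* Feature bit numbers as assigned by the virtio specification. */
#define VIRTIO_F_IN_ORDER   35
#define VIRTIO_F_RING_RESET 40

/*
 * Clear from 'features' every listed bit the backend does not support.
 * Bits that are not in the list are left as they are.
 */
static uint64_t toy_filter_features(uint64_t backend_features,
                                    const int *feature_bits,
                                    uint64_t features)
{
    for (const int *bit = feature_bits;
         *bit != TOY_INVALID_FEATURE_BIT; bit++) {
        uint64_t mask;

        assert(*bit >= 0 && *bit < 64);
        mask = UINT64_C(1) << *bit;
        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Under this model, leaving VIRTIO_F_IN_ORDER out of a list would mean the bit is never masked off, even for a backend that cannot honour it.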




[RFC 4/8] virtio: Implement in-order handling for virtio devices

2024-03-21 Thread Jonah Palmer
Implements in-order handling for most virtio devices using the
VIRTIO_F_IN_ORDER transport feature, specifically those that call
virtqueue_push to push their used elements onto the used ring.

The logic behind this implementation is as follows:

1.) virtqueue_pop always enqueues VirtQueueElements in-order.

virtqueue_pop always retrieves one or more buffer descriptors in-order
from the available ring and converts them into a VirtQueueElement. This
means that VirtQueueElements are enqueued in-order by default.

Because of this, as VirtQueueElements are created, we can assign a
sequential key value to them. This preserves the order of buffers that
have been made available to the device by the driver.

As VirtQueueElements are assigned a key value, the current sequence
number is incremented.

2.) Requests can be completed out-of-order.

While most devices complete requests in the same order that they were
enqueued by default, some devices don't (e.g. virtio-blk). The goal of
this out-of-order handling is to reduce the impact on devices that
process elements in-order by default while also guaranteeing compliance
with the VIRTIO_F_IN_ORDER feature.

Below is the logic behind handling completed requests (which may or may
not be in-order).

3.) Does the incoming used VirtQueueElement preserve the correct order?

In other words, is the sequence number (key) assigned to the
VirtQueueElement the expected number that would preserve the original
order?

3a.)
If it does... immediately push the used element onto the used ring.
Then increment the next expected sequence number and check to see if
any previous out-of-order VirtQueueElements stored on the hash table
have a key that matches this next expected sequence number.

For each VirtQueueElement found on the hash table with a matching key:
push the element on the used ring, remove the key-value pair from the
hash table, and then increment the next expected sequence number. Repeat
this process until we're unable to find an element with a matching key.

Note that if the device uses batching (e.g. virtio-net), then we skip
the virtqueue_flush call and let the device call it itself.

3b.)
If it does not... stash the VirtQueueElement, along with relevant data,
as an InOrderVQElement on the hash table. The key used is the order_key
that was assigned when the VirtQueueElement was created.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 70 --
 include/hw/virtio/virtio.h |  8 +
 2 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 40124545d6..40e4377f1e 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 }
 }
 
+void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
+ unsigned int len, unsigned int idx,
+ unsigned int count)
+{
+InOrderVQElement *in_order_elem;
+
+if (elem->order_key == vq->current_order_idx) {
+/* Element is in-order, push to used ring */
+virtqueue_fill(vq, elem, len, idx);
+
+/* Batching? Don't flush */
+if (count) {
+virtqueue_flush(vq, count);
+}
+
+/* Increment next expected order, search for more in-order elements */
+while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
+GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) {
+/* Found in-order element, push to used ring */
+virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
+   in_order_elem->idx);
+
+/* Batching? Don't flush */
+if (count) {
+virtqueue_flush(vq, in_order_elem->count);
+}
+
+/* Remove key-value pair from hash table */
+g_hash_table_remove(vq->in_order_ht,
+GUINT_TO_POINTER(vq->current_order_idx));
+}
+} else {
+/* Element is out-of-order, stash in hash table */
+in_order_elem = virtqueue_alloc_in_order_element(elem, len, idx,
+ count);
+g_hash_table_insert(vq->in_order_ht, GUINT_TO_POINTER(elem->order_key),
+in_order_elem);
+}
+}
+
 void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
 unsigned int len)
 {
 RCU_READ_LOCK_GUARD();
-virtqueue_fill(vq, elem, len, 0);
-virtqueue_flush(vq, 1);
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_order_element(vq, elem, len, 0, 1);
+} else {
+virtqueue_fill(vq, elem, len, 0);
+virtqueue_flush(vq, 1);
+}
 }
 
 /* Called within rcu_read_lock().  */
@@ -1478,6 +1522,18 @@ void virtqueue_map(Virt

Re: [PATCH v3 for 9.1 0/6] virtio, vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-18 Thread Jonah Palmer




On 3/16/24 11:45 AM, Jiri Pirko wrote:

Fri, Mar 15, 2024 at 05:55:51PM CET, jonah.pal...@oracle.com wrote:

The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

Split VQ
  - Upper 16 bits: shadow_avail_idx
  - Lower 16 bits: virtqueue index

Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
  - Lower 16 bits: virtqueue index
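
As a sketch of how the two layouts above could be unpacked from a 32-bit notification value: the code below assumes the packed wrap counter sits in the most significant bit, which follows the field order in the virtio specification rather than anything stated in this cover letter. ToyNotify and toy_decode_notify are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct ToyNotify {
    uint16_t vq_index;    /* lower 16 bits */
    uint16_t avail_idx;   /* 16 bits (split) or 15 bits (packed) */
    bool     wrap;        /* packed only; always false for split */
} ToyNotify;

/* Decode 32-bit notification data for a split or packed virtqueue. */
static ToyNotify toy_decode_notify(uint32_t data, bool packed)
{
    ToyNotify n;
    uint16_t upper = (uint16_t)(data >> 16);

    n.vq_index = (uint16_t)(data & 0xffffu);
    if (packed) {
        /* Assumption: wrap counter is the top bit, index the low 15. */
        n.avail_idx = upper & 0x7fffu;
        n.wrap = (upper & 0x8000u) != 0;
    } else {
        n.avail_idx = upper;
        n.wrap = false;
    }
    return n;
}
```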

Also, because ioeventfd is not able to carry the extra data provided by
the driver, having both the VIRTIO_F_NOTIFICATION_DATA feature and
ioeventfd enabled is a functional mismatch. The user must explicitly
disable ioeventfd for the device in the QEMU arguments when using this
feature, else the device will fail to complete realization.

For example, a device must explicitly enable notification_data as well
as disable ioeventfd:

-device virtio-scsi-pci,...,ioeventfd=off,notification_data=on
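
The rule behind that command line can be expressed as a small realize-time check. The sketch below only models the two properties involved; ToyDeviceConfig, toy_check_notification_compat and the error string are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyDeviceConfig {
    bool notification_data;   /* notification_data=on|off */
    bool ioeventfd;           /* ioeventfd=on|off */
} ToyDeviceConfig;

/*
 * Return 0 if the configuration can be realized, -1 otherwise.
 * On failure, *errp (if provided) points at a static message.
 */
static int toy_check_notification_compat(const ToyDeviceConfig *cfg,
                                         const char **errp)
{
    assert(cfg != NULL);
    if (cfg->notification_data && cfg->ioeventfd) {
        if (errp != NULL) {
            *errp = "notification_data requires ioeventfd=off";
        }
        return -1;
    }
    if (errp != NULL) {
        *errp = NULL;
    }
    return 0;
}
```

Only the combination of both options being on is rejected; a device that leaves notification_data off keeps ioeventfd exactly as before.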

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

v3: Validate VQ idx via virtio_queue_get_num() (pci, mmio, ccw)
    Rename virtio_queue_set_shadow_avail_data
    Only pass in upper 16 bits of 32-bit extra data (was redundant)
    Make notification compatibility check function static
    Drop tags on patches 1/6, 3/6, and 4/6

v2: Don't disable ioeventfd by default, user must disable it
    Drop tags on patch 2/6

Jonah Palmer (6):
  virtio/virtio-pci: Handle extra notification data
  virtio: Prevent creation of device using notification-data with ioeventfd
  virtio-mmio: Handle extra notification data
  virtio-ccw: Handle extra notification data
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition


Jonah, do you have kernel patches to add this feature as well?

Thanks!


Hi Jiri! I think there are already kernel patches for 
VIRTIO_F_NOTIFICATION_DATA, unless you're referring to something more 
specific that wasn't included in these patches:


[1]: virtio: add VIRTIO_F_NOTIFICATION_DATA feature support
https://lore.kernel.org/lkml/20230324195029.2410503-1-vik...@daynix.com/

[2]: virtio-vdpa: add VIRTIO_F_NOTIFICATION_DATA feature support
https://lore.kernel.org/lkml/20230413081855.36643-3-alvaro.ka...@solid-run.com/

Jonah



[PATCH v3 for 9.1 6/6] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

2024-03-15 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_NOTIFICATION_DATA feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index cdd4f86b61..14858c0924 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -370,7 +370,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("notification_data", _state, _field, \
+  VIRTIO_F_NOTIFICATION_DATA, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[PATCH v3 for 9.1 5/6] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-15 Thread Jonah Palmer
Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.
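
To illustrate the paragraph above, here is a minimal sketch of how a
sentinel-terminated feature-bit list can be used to drop features that a
backend does not support. This is not QEMU's implementation: every demo_*
and DEMO_* name is hypothetical, and the sentinel value is an assumption.

```c
#include <stdint.h>

/* Hypothetical sentinel that terminates a feature-bit list. */
#define DEMO_INVALID_FEATURE_BIT 0xff

/*
 * For every bit named in 'feature_bits', clear it from 'features' when the
 * backend does not advertise it. Bits that are not named in the list are
 * left untouched.
 */
static uint64_t demo_filter_features(const int *feature_bits,
                                     uint64_t backend_features,
                                     uint64_t features)
{
    const int *bit;

    for (bit = feature_bits; *bit != DEMO_INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Under this sketch, a feature bit that is missing from the list is never
checked against the backend, which is why the bit has to be listed for it to
be disabled when backend support is absent.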

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..3d5fe0994d 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2a9ddb4552..5583ce5279 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -61,6 +61,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[PATCH v3 for 9.1 0/6] virtio, vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-15 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

 Split VQ
  - Upper 16 bits: shadow_avail_idx
  - Lower 16 bits: virtqueue index

 Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
  - Lower 16 bits: virtqueue index
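
The two layouts above can be sketched as a small decoder. This is only an
illustration under the bit layout described in this cover letter; the
struct and function names are hypothetical and do not exist in QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields (not a QEMU type). */
struct demo_notify_data {
    uint16_t vq_idx;       /* lower 16 bits: virtqueue index */
    uint16_t avail_idx;    /* split: full upper 16 bits; packed: low 15 */
    bool wrap_counter;     /* packed only: top bit of the upper 16 bits */
};

static struct demo_notify_data demo_decode_notify_data(uint32_t data,
                                                       bool packed)
{
    struct demo_notify_data nd;
    uint16_t upper = data >> 16;

    nd.vq_idx = data & 0xFFFF;
    if (packed) {
        nd.wrap_counter = (upper >> 15) & 0x1;
        nd.avail_idx = upper & 0x7FFF;
    } else {
        nd.wrap_counter = false;
        nd.avail_idx = upper;
    }
    return nd;
}
```

For instance, the value 0x80030002 refers to virtqueue 2 in both layouts; a
split queue reads the index as 0x8003, while a packed queue reads index 3
with the wrap counter set.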

Also, because ioeventfd cannot carry the extra data provided by the
driver, enabling both the VIRTIO_F_NOTIFICATION_DATA feature and
ioeventfd is a functional mismatch. The user must explicitly disable
ioeventfd for the device in the QEMU arguments when using this feature,
else the device will fail to complete realization.

For example, a device must explicitly enable notification_data as well
as disable ioeventfd:

-device virtio-scsi-pci,...,ioeventfd=off,notification_data=on

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

v3: Validate VQ idx via virtio_queue_get_num() (pci, mmio, ccw)
Rename virtio_queue_set_shadow_avail_data
Only pass in upper 16 bits of 32-bit extra data (was redundant)
Make notification compatibility check function static
Drop tags on patches 1/6, 3/6, and 4/6

v2: Don't disable ioeventfd by default, user must disable it
Drop tags on patch 2/6

Jonah Palmer (6):
  virtio/virtio-pci: Handle extra notification data
  virtio: Prevent creation of device using notification-data with ioeventfd
  virtio-mmio: Handle extra notification data
  virtio-ccw: Handle extra notification data
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

 hw/block/vhost-user-blk.c|  1 +
 hw/net/vhost_net.c   |  2 ++
 hw/s390x/s390-virtio-ccw.c   | 17 +++
 hw/scsi/vhost-scsi.c |  1 +
 hw/scsi/vhost-user-scsi.c|  1 +
 hw/virtio/vhost-user-fs.c|  2 +-
 hw/virtio/vhost-user-vsock.c |  1 +
 hw/virtio/virtio-mmio.c  | 10 +++--
 hw/virtio/virtio-pci.c   | 11 +++---
 hw/virtio/virtio.c   | 40 
 include/hw/virtio/virtio.h   |  6 +-
 net/vhost-vdpa.c |  1 +
 12 files changed, 82 insertions(+), 11 deletions(-)

-- 
2.39.3




[PATCH v3 for 9.1 4/6] virtio-ccw: Handle extra notification data

2024-03-15 Thread Jonah Palmer
Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

That data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/s390x/s390-virtio-ccw.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index b1dcb3857f..b550adfc68 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -140,9 +140,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
 uint64_t subch_id = args[0];
-uint64_t queue = args[1];
+uint64_t data = args[1];
 SubchDev *sch;
+VirtIODevice *vdev;
 int cssid, ssid, schid, m;
+uint16_t vq_idx = data;
 
 if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
 return -EINVAL;
@@ -151,12 +153,19 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
 if (!sch || !css_subch_visible(sch)) {
 return -EINVAL;
 }
-if (queue >= VIRTIO_QUEUE_MAX) {
+
+vdev = virtio_ccw_get_vdev(sch);
+if (vq_idx >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, vq_idx)) {
 return -EINVAL;
 }
-virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
-return 0;
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_idx(virtio_get_queue(vdev, vq_idx),
+  (data >> 16) & 0xFFFF);
+}
+
+virtio_queue_notify(vdev, vq_idx);
+return 0;
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
2.39.3




[PATCH v3 for 9.1 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-15 Thread Jonah Palmer
Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
 - upper 16 bits: shadow_avail_idx
 - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
 - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
 - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 11 ---
 hw/virtio/virtio.c | 18 ++
 include/hw/virtio/virtio.h |  2 ++
 3 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..f3e0a08f53 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
 hwaddr pa;
 
 switch (addr) {
@@ -408,8 +408,13 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 vdev->queue_sel = val;
 break;
 case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX && virtio_queue_get_num(vdev, vq_idx)) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_idx(virtio_get_queue(vdev, vq_idx),
+  val >> 16);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..463426ca92 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
 }
 }
 
+void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t shadow_avail_idx)
+{
+if (!vq->vring.desc) {
+return;
+}
+
+/*
+ * 16-bit data for packed VQs include 1-bit wrap counter and
+ * 15-bit shadow_avail_idx.
+ */
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (shadow_avail_idx >> 15) & 0x1;
+vq->shadow_avail_idx = shadow_avail_idx & 0x7FFF;
+} else {
+vq->shadow_avail_idx = shadow_avail_idx;
+}
+}
+
 static void virtio_queue_notify_vq(VirtQueue *vq)
 {
 if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..cdd4f86b61 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -306,6 +306,8 @@ int virtio_queue_ready(VirtQueue *vq);
 
 int virtio_queue_empty(VirtQueue *vq);
 
+void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t idx);
+
 /* Host binding interface.  */
 
 uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
-- 
2.39.3




[PATCH v3 for 9.1 2/6] virtio: Prevent creation of device using notification-data with ioeventfd

2024-03-15 Thread Jonah Palmer
Prevent the realization of a virtio device that attempts to use the
VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
ioeventfd.

Due to ioeventfd not being able to carry the extra data associated with
this feature, having both enabled is a functional mismatch and therefore
Qemu should not continue the device's realization process.

Although the device does not yet know if the feature will be
successfully negotiated, many devices using this feature won't actually
work without this extra data and would fail FEATURES_OK anyway.

If ioeventfd is able to work with the extra notification data in the
future, this compatibility check can be removed.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 463426ca92..f9cb8d1e5c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2971,6 +2971,20 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return ret;
 }
 
+static void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+   Error **errp)
+{
+VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+if (virtio_host_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
+k->ioeventfd_enabled(proxy)) {
+error_setg(errp,
+   "notification_data=on without ioeventfd=off is not supported");
+}
+}
+
 size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
   uint64_t host_features)
 {
@@ -3731,6 +3745,14 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
 }
 }
 
+/* Devices should not use both ioeventfd and notification data feature */
+virtio_device_check_notification_compatibility(vdev, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+vdc->unrealize(dev);
+return;
+}
+
 virtio_bus_device_plugged(vdev, &err);
 if (err != NULL) {
 error_propagate(errp, err);
-- 
2.39.3




[PATCH v3 for 9.1 3/6] virtio-mmio: Handle extra notification data

2024-03-15 Thread Jonah Palmer
Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..003c363f0b 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+uint16_t vq_idx;
 
 trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,13 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, uint64_t value,
 }
 break;
 case VIRTIO_MMIO_QUEUE_NOTIFY:
-if (value < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, value);
+vq_idx = value;
+if (vq_idx < VIRTIO_QUEUE_MAX && virtio_queue_get_num(vdev, vq_idx)) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_idx(virtio_get_queue(vdev, vq_idx),
+  (value >> 16) & 0xFFFF);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
2.39.3




Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Jonah Palmer




On 3/14/24 3:05 PM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 5:06 PM Jonah Palmer  wrote:




On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  wrote:




On 3/13/24 11:01 PM, Jason Wang wrote:

On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
- upper 16 bits: shadow_avail_idx
- lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
- upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
- lower 16 bits: virtqueue index

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
hw/virtio/virtio-pci.c | 10 +++---
hw/virtio/virtio.c | 18 ++
include/hw/virtio/virtio.h |  1 +
3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
{
VirtIOPCIProxy *proxy = opaque;
VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
hwaddr pa;

switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
vdev->queue_sel = val;
break;
case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
}
break;
case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
}
}

+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)


Maybe I didn't explain well, but I think it is better to pass directly
idx to a VirtQueue *. That way only the caller needs to check for a
valid vq idx, and (my understanding is) the virtio.c interface is
migrating to VirtQueue * use anyway.



Oh, are you saying to just pass in a VirtQueue *vq instead of
VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?



No, that needs to be kept. I meant the access to vdev->vq[i] without
checking for a valid i.



Ahh okay I see what you mean. But I thought the following was checking 
for a valid VQ index:


if (vq_idx < VIRTIO_QUEUE_MAX)

Of course the virtio device may not have up to VIRTIO_QUEUE_MAX 
virtqueues, so maybe we should be checking for validity like this?


if (vdev->vq[i].vring.num == 0)

Or was there something else you had in mind? Apologies for the confusion.


You can get the VirtQueue in the caller with virtio_get_queue. Which
also does not check for a valid index, but that way is clearer the
caller needs to check it.



Roger, I'll use this instead for clarity.
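
The idea discussed above, where the caller validates the index before
handing a queue pointer to the setter, can be sketched with stand-in types.
Nothing below is QEMU code: the demo_* types, DEMO_QUEUE_MAX and both
functions are hypothetical, and a queue size of 0 is assumed to mean "not
configured".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_MAX 8            /* stand-in for VIRTIO_QUEUE_MAX */

struct demo_vq {
    uint16_t num;                   /* ring size; 0 means unused queue */
    uint16_t shadow_avail_idx;
};

struct demo_dev {
    struct demo_vq vq[DEMO_QUEUE_MAX];
};

/* Caller-side check: index in range and queue actually configured. */
static struct demo_vq *demo_lookup_vq(struct demo_dev *dev, uint32_t val)
{
    uint16_t vq_idx = val & 0xFFFF;

    if (vq_idx >= DEMO_QUEUE_MAX || dev->vq[vq_idx].num == 0) {
        return NULL;
    }
    return &dev->vq[vq_idx];
}

/* Returns true when the notification was accepted. */
static bool demo_notify(struct demo_dev *dev, uint32_t val)
{
    struct demo_vq *vq = demo_lookup_vq(dev, val);

    if (!vq) {
        return false;
    }
    /* The setter only sees an already validated queue pointer. */
    vq->shadow_avail_idx = val >> 16;
    return true;
}
```

With this split, the setter never indexes the queue array itself, so an
out-of-range or unconfigured index is rejected in exactly one place.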


As a side note, the check for desc != 0 is widespread in QEMU but the
driver may use 0 address for desc, so it's not 100% valid. But to
change that now requires a deeper change out of the scope of this
series, so let's keep it for now :).

Thanks!


I'll add it to the todo list =]


+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);


Do we need to do a sanity check for this value?

Thanks



It can't hurt, right? What kind of check did you have in mind?

if (vq->shadow_avail_idx >= vq->vring.num)



I'm a little bit lost too. shadow_avail_idx can take all uint16_t
values. Maybe you meant checking for a valid vq index, Jason?

Thanks!
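
As background for the point that shadow_avail_idx can take any uint16_t
value: in a split virtqueue the available index is a free-running 16-bit
counter, and the ring slot is only derived from it. The sketch below uses
hypothetical helper names and assumes a non-zero ring size.

```c
#include <stdint.h>

/* Ring slot addressed by a free-running 16-bit index ('num' != 0). */
static uint16_t demo_ring_slot(uint16_t idx, uint16_t num)
{
    return idx % num;
}

/* Entries made available since 'last_seen', computed modulo 2^16. */
static uint16_t demo_pending_entries(uint16_t avail_idx, uint16_t last_seen)
{
    return (uint16_t)(avail_idx - last_seen);
}
```

An index of 65535 on a 256-entry ring is therefore valid and maps to slot
255, so comparing the index against vring.num would reject legitimate
values.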


Or something else?


+}
+}
+
static void virtio_queue_notify_vq(VirtQueue *vq)
{
if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virt

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Jonah Palmer




On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:

On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  wrote:




On 3/13/24 11:01 PM, Jason Wang wrote:

On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
   - upper 16 bits: shadow_avail_idx
   - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
   - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
   - lower 16 bits: virtqueue index

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio-pci.c | 10 +++---
   hw/virtio/virtio.c | 18 ++
   include/hw/virtio/virtio.h |  1 +
   3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
   {
   VirtIOPCIProxy *proxy = opaque;
   VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
   hwaddr pa;

   switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
   vdev->queue_sel = val;
   break;
   case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
   }
   break;
   case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
   }
   }

+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)


Maybe I didn't explain well, but I think it is better to pass directly
idx to a VirtQueue *. That way only the caller needs to check for a
valid vq idx, and (my understanding is) the virtio.c interface is
migrating to VirtQueue * use anyway.



Oh, are you saying to just pass in a VirtQueue *vq instead of 
VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?



+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);


Do we need to do a sanity check for this value?

Thanks



It can't hurt, right? What kind of check did you have in mind?

if (vq->shadow_avail_idx >= vq->vring.num)



I'm a little bit lost too. shadow_avail_idx can take all uint16_t
values. Maybe you meant checking for a valid vq index, Jason?

Thanks!


Or something else?


+}
+}
+
   static void virtio_queue_notify_vq(VirtQueue *vq)
   {
   if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
   void virtio_init_region_cache(VirtIODevice *vdev, int n);
   void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
   void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
   uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
   void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
   int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
--
2.39.3











Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Jonah Palmer




On 3/13/24 11:01 PM, Jason Wang wrote:

On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
  - upper 16 bits: shadow_avail_idx
  - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
  - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
  - lower 16 bits: virtqueue index

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-pci.c | 10 +++---
  hw/virtio/virtio.c | 18 ++
  include/hw/virtio/virtio.h |  1 +
  3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
  {
  VirtIOPCIProxy *proxy = opaque;
  VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
  hwaddr pa;

  switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
  vdev->queue_sel = val;
  break;
  case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
  }
  break;
  case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, int align)
  }
  }

+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);


Do we need to do a sanity check for this value?

Thanks



It can't hurt, right? What kind of check did you have in mind?

if (vq->shadow_avail_idx >= vq->vring.num)

Or something else?


+}
+}
+
  static void virtio_queue_notify_vq(VirtQueue *vq)
  {
  if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
  void virtio_init_region_cache(VirtIODevice *vdev, int n);
  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
  void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
  void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
  int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
--
2.39.3







Re: [PATCH v2 2/6] virtio: Prevent creation of device using notification-data with ioeventfd

2024-03-13 Thread Jonah Palmer




On 3/13/24 10:35 AM, Eugenio Perez Martin wrote:

On Wed, Mar 13, 2024 at 12:55 PM Jonah Palmer  wrote:


Prevent the realization of a virtio device that attempts to use the
VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
ioeventfd.

Due to ioeventfd not being able to carry the extra data associated with
this feature, having both enabled is a functional mismatch and therefore
Qemu should not continue the device's realization process.

Although the device does not yet know if the feature will be
successfully negotiated, many devices using this feature won't actually
work without this extra data and would fail FEATURES_OK anyway.

If ioeventfd is able to work with the extra notification data in the
future, this compatibility check can be removed.

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio.c | 22 ++
  include/hw/virtio/virtio.h |  2 ++
  2 files changed, 24 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bcb9e09df0..d0a433b465 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2971,6 +2971,20 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
  return ret;
  }

+void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+Error **errp)
+{
+VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+if (virtio_host_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
+k->ioeventfd_enabled(proxy)) {
+error_setg(errp,
+   "notification_data=on without ioeventfd=off is not supported");
+}
+}
+
  size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
uint64_t host_features)
  {
@@ -3731,6 +3745,14 @@ static void virtio_device_realize(DeviceState *dev, Error **errp)
  }
  }

+/* Devices should not use both ioeventfd and notification data feature */
+virtio_device_check_notification_compatibility(vdev, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+vdc->unrealize(dev);
+return;
+}
+
  virtio_bus_device_plugged(vdev, &err);
  if (err != NULL) {
  error_propagate(errp, err);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 53915947a7..e0325d84d0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -346,6 +346,8 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_update_irq(VirtIODevice *vdev);
  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
+void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+Error **errp);


Why not make it static?



Great question with no good answer! Will fix this.



  /* Base devices.  */
  typedef struct VirtIOBlkConf VirtIOBlkConf;
--
2.39.3







[PATCH v2 0/6] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-13 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

 Split VQ
  - Upper 16 bits: shadow_avail_idx
  - Lower 16 bits: virtqueue index

 Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
  - Lower 16 bits: virtqueue index

Also, because ioeventfd cannot carry the extra data provided by the
driver, enabling both the VIRTIO_F_NOTIFICATION_DATA feature and
ioeventfd is a functional mismatch. The user must explicitly disable
ioeventfd for the device in the QEMU arguments when using this feature,
else the device will fail to complete realization.

For example, a device must explicitly enable notification_data as well
as disable ioeventfd:

-device virtio-scsi-pci,...,ioeventfd=off,notification_data=on

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

v2: Don't disable ioeventfd by default, user must disable it
Drop tags on patch 2/6

Jonah Palmer (6):
  virtio/virtio-pci: Handle extra notification data
  virtio: Prevent creation of device using notification-data with ioeventfd
  virtio-mmio: Handle extra notification data
  virtio-ccw: Handle extra notification data
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

 hw/block/vhost-user-blk.c|  1 +
 hw/net/vhost_net.c   |  2 ++
 hw/s390x/s390-virtio-ccw.c   | 16 +++
 hw/scsi/vhost-scsi.c |  1 +
 hw/scsi/vhost-user-scsi.c|  1 +
 hw/virtio/vhost-user-fs.c|  2 +-
 hw/virtio/vhost-user-vsock.c |  1 +
 hw/virtio/virtio-mmio.c  |  9 ++--
 hw/virtio/virtio-pci.c   | 10 ++---
 hw/virtio/virtio.c   | 40 
 include/hw/virtio/virtio.h   |  7 ++-
 net/vhost-vdpa.c |  1 +
 12 files changed, 80 insertions(+), 11 deletions(-)

-- 
2.39.3




[PATCH v2 3/6] virtio-mmio: Handle extra notification data

2024-03-13 Thread Jonah Palmer
Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..f99d5851a2 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+uint16_t vq_idx;
 
 trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,12 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 }
 break;
 case VIRTIO_MMIO_QUEUE_NOTIFY:
-if (value < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, value);
+vq_idx = value;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, value);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
2.39.3




[PATCH v2 5/6] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-13 Thread Jonah Palmer
Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.
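
For readers unfamiliar with the feature bits arrays, the idea is roughly
the following. This is a simplified standalone sketch, not the vhost
code itself: mask_unsupported_features() and SKETCH_INVALID_FEATURE_BIT
are names invented here, and the sentinel value is an assumption.

```c
#include <stdint.h>

/* Hypothetical end-of-array marker for a feature bits list. */
#define SKETCH_INVALID_FEATURE_BIT 0xff

/*
 * For every bit named in feature_bits, clear it from 'features' unless
 * the backend advertises it. Bits not named in the array are untouched.
 */
uint64_t mask_unsupported_features(uint64_t backend_features,
                                   const int *feature_bits,
                                   uint64_t features)
{
    const int *bit;

    for (bit = feature_bits; *bit != SKETCH_INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

A feature that is missing from the array is never filtered, which is why
VIRTIO_F_NOTIFICATION_DATA has to be listed before it can be dropped for
a backend that lacks it.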

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..3d5fe0994d 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 2a9ddb4552..5583ce5279 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -61,6 +61,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[PATCH v2 2/6] virtio: Prevent creation of device using notification-data with ioeventfd

2024-03-13 Thread Jonah Palmer
Prevent the realization of a virtio device that attempts to use the
VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
ioeventfd.

Due to ioeventfd not being able to carry the extra data associated with
this feature, having both enabled is a functional mismatch and therefore
QEMU should not continue the device's realization process.

Although the device does not yet know if the feature will be
successfully negotiated, many devices using this feature won't actually
work without this extra data and would fail FEATURES_OK anyway.

If ioeventfd is able to work with the extra notification data in the
future, this compatibility check can be removed.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 22 ++
 include/hw/virtio/virtio.h |  2 ++
 2 files changed, 24 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index bcb9e09df0..d0a433b465 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2971,6 +2971,20 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 return ret;
 }
 
+void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+Error **errp)
+{
+VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
+DeviceState *proxy = DEVICE(BUS(bus)->parent);
+
+if (virtio_host_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
+k->ioeventfd_enabled(proxy)) {
+error_setg(errp,
+   "notification_data=on without ioeventfd=off is not supported");
+}
+}
+
 size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
   uint64_t host_features)
 {
@@ -3731,6 +3745,14 @@ static void virtio_device_realize(DeviceState *dev, 
Error **errp)
 }
 }
 
+/* Devices should not use both ioeventfd and notification data feature */
+virtio_device_check_notification_compatibility(vdev, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+vdc->unrealize(dev);
+return;
+}
+
 virtio_bus_device_plugged(vdev, &err);
 if (err != NULL) {
 error_propagate(errp, err);
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 53915947a7..e0325d84d0 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -346,6 +346,8 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
queue_index);
 void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
 void virtio_update_irq(VirtIODevice *vdev);
 int virtio_set_features(VirtIODevice *vdev, uint64_t val);
+void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
+Error **errp);
 
 /* Base devices.  */
 typedef struct VirtIOBlkConf VirtIOBlkConf;
-- 
2.39.3




[PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-13 Thread Jonah Palmer
Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
 - upper 16 bits: shadow_avail_idx
 - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
 - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
 - lower 16 bits: virtqueue index

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 10 +++---
 hw/virtio/virtio.c | 18 ++
 include/hw/virtio/virtio.h |  1 +
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index cb6940fc0e..0f5c3c3b2f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
 hwaddr pa;
 
 switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 vdev->queue_sel = val;
 break;
 case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
int align)
 }
 }
 
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);
+}
+}
+
 static void virtio_queue_notify_vq(VirtQueue *vq)
 {
 if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
 void virtio_init_region_cache(VirtIODevice *vdev, int n);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
 int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
-- 
2.39.3




[PATCH v2 6/6] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

2024-03-13 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_NOTIFICATION_DATA feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Tested-by: Lei Yang 
Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index e0325d84d0..bc54c5e037 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -371,7 +371,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("notification_data", _state, _field, \
+  VIRTIO_F_NOTIFICATION_DATA, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[PATCH v2 4/6] virtio-ccw: Handle extra notification data

2024-03-13 Thread Jonah Palmer
Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Tested-by: Lei Yang 
Acked-by: Eric Farman 
Acked-by: Thomas Huth 
Signed-off-by: Jonah Palmer 
---
 hw/s390x/s390-virtio-ccw.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index b1dcb3857f..7631e4aa41 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -140,9 +140,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
 uint64_t subch_id = args[0];
-uint64_t queue = args[1];
+uint64_t data = args[1];
 SubchDev *sch;
+VirtIODevice *vdev;
 int cssid, ssid, schid, m;
+uint16_t vq_idx = data;
 
 if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
 return -EINVAL;
@@ -151,12 +153,18 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
 if (!sch || !css_subch_visible(sch)) {
 return -EINVAL;
 }
-if (queue >= VIRTIO_QUEUE_MAX) {
+
+if (vq_idx >= VIRTIO_QUEUE_MAX) {
 return -EINVAL;
 }
-virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
-return 0;
 
+vdev = virtio_ccw_get_vdev(sch);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, data);
+}
+
+virtio_queue_notify(vdev, vq_idx);
+return 0;
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
2.39.3




Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-12 Thread Jonah Palmer




On 3/12/24 10:58 AM, Michael S. Tsirkin wrote:

On Tue, Mar 12, 2024 at 10:33:51AM -0400, Jonah Palmer wrote:



On 3/11/24 11:47 AM, Michael S. Tsirkin wrote:

On Mon, Mar 11, 2024 at 10:53:25AM -0400, Jonah Palmer wrote:



On 3/8/24 2:19 PM, Michael S. Tsirkin wrote:

On Fri, Mar 08, 2024 at 12:45:13PM -0500, Jonah Palmer wrote:



On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:


On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:

Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 


I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently we error out unless
user disabled it for us.
WDYT?



Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.



Will do in v2. I assume we'll also make this the case for virtio-mmio and
virtio-ccw?


Guess so. Pls note freeze is imminent.


Got it. Also, would you mind elaborating a bit more on "error out"? E.g. do
we want to prevent QEMU from starting at all if a device is attempting
to use both VIRTIO_F_NOTIFICATION_DATA and ioeventfd? Or do you mean
something like still keep ioeventfd disabled but also log an error message
unless it was explicitly disabled by the user?



my preference would be to block device instance from being created.



I could very well be missing something here, but I was looking to see how I
could block the device from being created (realized) given the functional
mismatch between negotiating the VIRTIO_F_NOTIFICATION_DATA feature and
ioeventfd being enabled.

However, I realized that feature negotiation only happens after the virtio
device has been realized and it's one of the last steps before the device
becomes fully operational. In other words, we don't know if the guest
(driver) also supports this feature until the feature negotiation phase,
which is after realization.

So, during realization (e.g. virtio_device_realize), we know if the virtio
device (1) intends to negotiate the VIRTIO_F_NOTIFICATION_DATA feature and
(2) has enabled ioeventfd, however, we don't know if the driver will
actually support this notification data feature.

Given this, we could block the device from being created if the device is
*intending* to use the notification data feature along with ioeventfd, but
this seems premature since we don't know if the feature will actually be
successfully negotiated.


Yes this is the option I had in mind. Many devices with this feature
do not actually work if they do not get the extra data
so they fail FEATURES_OK, anyway.




Ah, okay I see. This was the extra context I was missing.

Will do, thanks Michael!


Another option might be to check this during/immediately after feature
negotiation, and then unrealize the device. However, I'm not sure if by this
point it's "too late" to unrealize it.

There are also other options like defaulting to using notification data over
ioeventfd (since a user would need to explicitly enable it, showing intent
to actually use the feature), which is what we're doing now, except we could
add some kind of warning message for the user. Another option could be
setting the device to broken. However, these options don't align with your
suggestion of removing the device completely.

Let me know how you'd like me to proceed with this. Thanks!




---
hw/virtio/virtio-pci.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d12edc567f..287b8f7720 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
}
break;
case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICAT

Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-12 Thread Jonah Palmer




On 3/11/24 11:47 AM, Michael S. Tsirkin wrote:

On Mon, Mar 11, 2024 at 10:53:25AM -0400, Jonah Palmer wrote:



On 3/8/24 2:19 PM, Michael S. Tsirkin wrote:

On Fri, Mar 08, 2024 at 12:45:13PM -0500, Jonah Palmer wrote:



On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:


On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:

Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 


I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently we error out unless
user disabled it for us.
WDYT?



Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.



Will do in v2. I assume we'll also make this the case for virtio-mmio and
virtio-ccw?


Guess so. Pls note freeze is imminent.


Got it. Also, would you mind elaborating a bit more on "error out"? E.g. do
we want to prevent QEMU from starting at all if a device is attempting
to use both VIRTIO_F_NOTIFICATION_DATA and ioeventfd? Or do you mean
something like still keep ioeventfd disabled but also log an error message
unless it was explicitly disabled by the user?



my preference would be to block device instance from being created.



I could very well be missing something here, but I was looking to see 
how I could block the device from being created (realized) given the 
functional mismatch between negotiating the VIRTIO_F_NOTIFICATION_DATA 
feature and ioeventfd being enabled.


However, I realized that feature negotiation only happens after the 
virtio device has been realized and it's one of the last steps before 
the device becomes fully operational. In other words, we don't know if 
the guest (driver) also supports this feature until the feature 
negotiation phase, which is after realization.


So, during realization (e.g. virtio_device_realize), we know if the 
virtio device (1) intends to negotiate the VIRTIO_F_NOTIFICATION_DATA 
feature and (2) has enabled ioeventfd, however, we don't know if the 
driver will actually support this notification data feature.


Given this, we could block the device from being created if the device 
is *intending* to use the notification data feature along with 
ioeventfd, but this seems premature since we don't know if the feature 
will actually be successfully negotiated.


Another option might be to check this during/immediately after feature
negotiation, and then unrealize the device. However, I'm not sure if by 
this point it's "too late" to unrealize it.


There are also other options like defaulting to using notification data
over ioeventfd (since a user would need to explicitly enable it, showing 
intent to actually use the feature), which is what we're doing now, 
except we could add some kind of warning message for the user. Another 
option could be setting the device to broken. However, these options 
don't align with your suggestion of removing the device completely.


Let me know how you'd like me to proceed with this. Thanks!




---
   hw/virtio/virtio-pci.c | 6 --
   1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d12edc567f..287b8f7720 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
   }
   break;
   case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
   virtio_pci_stop_ioeventfd(proxy);
   }

   virtio_set_status(vdev, val & 0xFF);

-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
   virtio_pci_start_ioeventfd(proxy);
   }

--
2.39.3












Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-11 Thread Jonah Palmer




On 3/8/24 2:19 PM, Michael S. Tsirkin wrote:

On Fri, Mar 08, 2024 at 12:45:13PM -0500, Jonah Palmer wrote:



On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:


On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:

Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 


I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently we error out unless
user disabled it for us.
WDYT?



Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.



Will do in v2. I assume we'll also make this the case for virtio-mmio and
virtio-ccw?


Guess so. Pls note freeze is imminent.


Got it. Also, would you mind elaborating a bit more on "error out"? E.g. 
do we want to prevent QEMU from starting at all if a device is
attempting to use both VIRTIO_F_NOTIFICATION_DATA and ioeventfd? Or do 
you mean something like still keep ioeventfd disabled but also log an 
error message unless it was explicitly disabled by the user?





---
  hw/virtio/virtio-pci.c | 6 --
  1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d12edc567f..287b8f7720 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
  }
  break;
  case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
  virtio_pci_stop_ioeventfd(proxy);
  }

  virtio_set_status(vdev, val & 0xFF);

-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
  virtio_pci_start_ioeventfd(proxy);
  }

--
2.39.3










Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Jonah Palmer




On 3/8/24 12:36 PM, Eugenio Perez Martin wrote:
On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:


On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> Prevent ioeventfd from being enabled/disabled when a virtio-pci
> device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> feature.
>
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, the ioeventfd should be left in a disabled state for
> emulated virtio-pci devices using this feature.
>
> Reviewed-by: Eugenio Pérez 
> Signed-off-by: Jonah Palmer 

I thought hard about this. I propose that for now,
instead of disabling ioeventfd silently we error out unless
user disabled it for us.
WDYT?



Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.



Will do in v2. I assume we'll also make this the case for virtio-mmio 
and virtio-ccw?




> ---
>  hw/virtio/virtio-pci.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index d12edc567f..287b8f7720 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_stop_ioeventfd(proxy);
>  }
>
>  virtio_set_status(vdev, val & 0xFF);
>
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_start_ioeventfd(proxy);
>  }
>
> --
> 2.39.3







Re: [PATCH v1 0/8] virtio, vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-08 Thread Jonah Palmer




On 3/8/24 8:28 AM, Lei Yang wrote:

Hi Jonah

QE tested this series v1 with a tap device with vhost=off with
regression tests, and everything works fine. QE also added
"notification_data=true" to the QEMU command line and then got "1" when
performing the command [1] inside the guest.
[1] cut -c39 /sys/devices/pci:00/:00:01.3/:05:00.0/virtio1/features

Tested-by: Lei Yang 



Thank you for double-checking this series for me Lei! I appreciate it :)

Jonah


On Thu, Mar 7, 2024 at 7:18 PM Eugenio Perez Martin  wrote:


On Wed, Mar 6, 2024 at 8:34 AM Michael S. Tsirkin  wrote:


On Wed, Mar 06, 2024 at 08:07:31AM +0100, Eugenio Perez Martin wrote:

On Wed, Mar 6, 2024 at 6:34 AM Jason Wang  wrote:


On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer  wrote:


The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

  Split VQ
   - Upper 16 bits: shadow_avail_idx
   - Lower 16 bits: virtqueue index

  Packed VQ
   - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
   - Lower 16 bits: virtqueue index

Also, due to the limitations of ioeventfd not being able to carry the
extra data provided by the driver, ioeventfd is left disabled for any devices
using this feature.


Is there any method to overcome this? This might help for vhost.



As a half-baked idea, read(2)ing an eventfd descriptor returns an
8-byte integer already. The returned value of read depends on eventfd
flags, but both have to do with the number of writes of the other end.

My proposal is to replace this value with the last value written by
the guest, so we can extract the virtio notification data from there.
The behavior of read is similar to not-EFD_SEMAPHORE, reading a value
and then blocking if read again without writes. The behavior of KVM
writes is different, as it is not a counter anymore.
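
To make the current behavior concrete, here is a small Linux-only sketch of what read(2) returns on an eventfd today. The function name is invented for the example. It only demonstrates the existing counter and semaphore semantics; the "last value written" mode proposed above does not exist in the kernel.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an eventfd with `flags`, write `a` and then
 * `b` to it, and return what a single read(2) yields. Returns 0 on failure.
 */
static uint64_t eventfd_read_after_two_writes(int flags, uint64_t a, uint64_t b)
{
    uint64_t value = 0;
    int fd = eventfd(0, flags);

    if (fd < 0) {
        return 0;
    }
    if (write(fd, &a, sizeof(a)) != (ssize_t)sizeof(a) ||
        write(fd, &b, sizeof(b)) != (ssize_t)sizeof(b) ||
        read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
        value = 0;
    }
    close(fd);
    return value;
}
```

With flags == 0 the two writes are summed, so writing 3 and then 4 reads back 7. With EFD_SEMAPHORE the read returns 1 and decrements the counter. In neither mode can the reader recover the individual values that were written, which is why the notification data cannot simply ride along on an unmodified eventfd.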

Thanks!



I doubt you will be able to support this in ioeventfd...


I agree.


But vhost does not really need the value at all.
So why mask out ioeventfd with vhost?


The interface should not be able to start with vhost-kernel because
the feature is not offered by the vhost-kernel device. So ioeventfd is
always enabled with vhost-kernel.

Or do you mean we should allow it and let vhost-kernel fetch data from
the avail ring as usual? I'm ok with that but then the guest can place
any value to it, so the driver cannot be properly "validated by
software" that way.


vhost-vdpa is probably the only one that might need it...


Right, but vhost-vdpa already supports doorbell memory regions so I
guess it has little use, isn't it?

Thanks!






Thanks



A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

Jonah Palmer (8):
   virtio/virtio-pci: Handle extra notification data
   virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
   virtio-mmio: Handle extra notification data
   virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
   virtio-ccw: Handle extra notification data
   virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
   vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
   virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

  hw/block/vhost-user-blk.c    |  1 +
  hw/net/vhost_net.c           |  2 ++
  hw/s390x/s390-virtio-ccw.c   | 16 ++++++++++++----
  hw/s390x/virtio-ccw.c        |  6 ++++--
  hw/scsi/vhost-scsi.c         |  1 +
  hw/scsi/vhost-user-scsi.c    |  1 +
  hw/virtio/vhost-user-fs.c    |  2 +-
  hw/virtio/vhost-user-vsock.c |  1 +
  hw/virtio/virtio-mmio.c      | 15 +++++++++++----
  hw/virtio/virtio-pci.c       | 16 +++++++++++-----
  hw/virtio/virtio.c           | 18 ++++++++++++++++++
  include/hw/virtio/virtio.h   |  5 ++++-
  net/vhost-vdpa.c             |  1 +
  13 files changed, 68 insertions(+), 17 deletions(-)

--
2.39.3














[PATCH v1 7/8] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-04 Thread Jonah Palmer
Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.
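
As a rough illustration of why listing the bit in these arrays matters: as far as I understand the vhost code, each array is walked up to its VHOST_INVALID_FEATURE_BIT terminator, and any listed bit that the backend does not advertise is cleared from the features offered to the guest. The sketch below is a simplified standalone model of that idea; toy_filter_features and TOY_INVALID_FEATURE_BIT are hypothetical names, not QEMU's API.

```c
#include <stdint.h>

/* Hypothetical sentinel, playing the role of VHOST_INVALID_FEATURE_BIT. */
#define TOY_INVALID_FEATURE_BIT 0xff

/*
 * For every bit listed in `feature_bits`, clear it from `features` when
 * the backend does not advertise it. Bits that are not listed are left
 * untouched, so an unlisted feature is never masked by the backend.
 */
static uint64_t toy_filter_features(const int *feature_bits,
                                    uint64_t backend_features,
                                    uint64_t features)
{
    for (const int *bit = feature_bits; *bit != TOY_INVALID_FEATURE_BIT; bit++) {
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

In this model, a backend that lacks bit 38 causes the bit to be dropped only if 38 appears in the array, which matches the "can be disabled if the backend does not support it" wording above.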

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c    | 1 +
 hw/net/vhost_net.c           | 2 ++
 hw/scsi/vhost-scsi.c         | 1 +
 hw/scsi/vhost-user-scsi.c    | 1 +
 hw/virtio/vhost-user-fs.c    | 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c             | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 58a00336c2..b8048f18e9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index e6bdb4562d..08b822e6ed 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[PATCH v1 5/8] virtio-ccw: Handle extra notification data

2024-03-04 Thread Jonah Palmer
Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Acked-by: Thomas Huth 
Signed-off-by: Jonah Palmer 
---
 hw/s390x/s390-virtio-ccw.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 62804cc228..828052046b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -140,9 +140,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
 uint64_t subch_id = args[0];
-uint64_t queue = args[1];
+uint64_t data = args[1];
 SubchDev *sch;
+VirtIODevice *vdev;
 int cssid, ssid, schid, m;
+uint16_t vq_idx = data;
 
 if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
 return -EINVAL;
@@ -151,12 +153,18 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
 if (!sch || !css_subch_visible(sch)) {
 return -EINVAL;
 }
-if (queue >= VIRTIO_QUEUE_MAX) {
+
+if (vq_idx >= VIRTIO_QUEUE_MAX) {
 return -EINVAL;
 }
-virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
-return 0;
 
+vdev = virtio_ccw_get_vdev(sch);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, data);
+}
+
+virtio_queue_notify(vdev, vq_idx);
+return 0;
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
2.39.3




[PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-04 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index d12edc567f..287b8f7720 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 }
 break;
 case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_stop_ioeventfd(proxy);
 }
 
 virtio_set_status(vdev, val & 0xFF);
 
-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_start_ioeventfd(proxy);
 }
 
-- 
2.39.3




[PATCH v1 3/8] virtio-mmio: Handle extra notification data

2024-03-04 Thread Jonah Palmer
Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..f99d5851a2 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+uint16_t vq_idx;
 
 trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,12 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 }
 break;
 case VIRTIO_MMIO_QUEUE_NOTIFY:
-if (value < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, value);
+vq_idx = value;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, value);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
2.39.3




[PATCH v1 6/8] virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-04 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-ccw device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to the ioeventfd not being able to carry the extra data associated
with this feature, the ioeventfd should be left in a disabled state for
emulated virtio-ccw devices using this feature.

Acked-by: Thomas Huth 
Signed-off-by: Jonah Palmer 
---
 hw/s390x/virtio-ccw.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index b4676909dd..936ba78fda 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -530,14 +530,16 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 if (ret) {
 break;
 }
-if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_stop_ioeventfd(dev);
 }
 if (virtio_set_status(vdev, status) == 0) {
 if (vdev->status == 0) {
 virtio_ccw_reset_virtio(dev);
 }
-if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, 
VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_start_ioeventfd(dev);
 }
 sch->curr_status.scsw.count = ccw.count - sizeof(status);
-- 
2.39.3




[PATCH v1 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-04 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

 Split VQ
  - Upper 16 bits: shadow_avail_idx
  - Lower 16 bits: virtqueue index

 Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
  - Lower 16 bits: virtqueue index
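
The two layouts listed above can be sketched as a small standalone decoder. This is only an illustration of the bit layout as described in this cover letter; struct notify_data and decode_notify_data are hypothetical names and not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a QEMU type. */
struct notify_data {
    uint16_t vq_index;     /* lower 16 bits in both layouts */
    uint16_t avail_idx;    /* split: upper 16 bits; packed: bits 16..30 */
    bool     wrap_counter; /* packed only: bit 31 */
};

/* Split the 32-bit value written by the driver into its documented fields. */
static struct notify_data decode_notify_data(uint32_t data, bool packed)
{
    struct notify_data d;

    d.vq_index = (uint16_t)(data & 0xFFFF);
    if (packed) {
        d.wrap_counter = (data >> 31) & 0x1;
        d.avail_idx = (data >> 16) & 0x7FFF;
    } else {
        d.wrap_counter = false;
        d.avail_idx = (uint16_t)(data >> 16);
    }
    return d;
}
```

For example, the value 0x80050002 decodes to virtqueue 2 in both layouts. With a packed ring it yields index 5 with the wrap counter set, while with a split ring the whole upper half, 0x8005, is the index.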

Also, because ioeventfd cannot carry the extra data provided by the
driver, ioeventfd is left disabled for any devices using this
feature.

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

Jonah Palmer (8):
  virtio/virtio-pci: Handle extra notification data
  virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-mmio: Handle extra notification data
  virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-ccw: Handle extra notification data
  virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

 hw/block/vhost-user-blk.c    |  1 +
 hw/net/vhost_net.c           |  2 ++
 hw/s390x/s390-virtio-ccw.c   | 16 ++++++++++++----
 hw/s390x/virtio-ccw.c        |  6 ++++--
 hw/scsi/vhost-scsi.c         |  1 +
 hw/scsi/vhost-user-scsi.c    |  1 +
 hw/virtio/vhost-user-fs.c    |  2 +-
 hw/virtio/vhost-user-vsock.c |  1 +
 hw/virtio/virtio-mmio.c      | 15 +++++++++++----
 hw/virtio/virtio-pci.c       | 16 +++++++++++-----
 hw/virtio/virtio.c           | 18 ++++++++++++++++++
 include/hw/virtio/virtio.h   |  5 ++++-
 net/vhost-vdpa.c             |  1 +
 13 files changed, 68 insertions(+), 17 deletions(-)

-- 
2.39.3




[PATCH v1 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-04 Thread Jonah Palmer
Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
 - upper 16 bits: shadow_avail_idx
 - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
 - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
 - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c     | 10 +++++++---
 hw/virtio/virtio.c         | 18 ++++++++++++++++++
 include/hw/virtio/virtio.h |  1 +
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a7039fb0c..d12edc567f 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
 hwaddr pa;
 
 switch (addr) {
@@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 vdev->queue_sel = val;
 break;
 case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+vq_idx = val;
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+virtio_queue_set_shadow_avail_data(vdev, val);
+}
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..bcb9e09df0 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
int align)
 }
 }
 
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
+{
+/* Lower 16 bits is the virtqueue index */
+uint16_t i = data;
+VirtQueue *vq = &vdev->vq[i];
+
+if (!vq->vring.desc) {
+return;
+}
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
+vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->shadow_avail_idx = (data >> 16);
+}
+}
+
 static void virtio_queue_notify_vq(VirtQueue *vq)
 {
 if (vq->vring.desc && vq->handle_output) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..53915947a7 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
 void virtio_init_region_cache(VirtIODevice *vdev, int n);
 void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
 void virtio_queue_notify(VirtIODevice *vdev, int n);
+void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
 uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
 void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
 int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
-- 
2.39.3




[PATCH v1 8/8] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

2024-03-04 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_NOTIFICATION_DATA feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Reviewed-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 53915947a7..41ef3c4aef 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -369,7 +369,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("notification_data", _state, _field, \
+  VIRTIO_F_NOTIFICATION_DATA, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[PATCH v1 4/8] virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-04 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-mmio device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-mmio devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index f99d5851a2..f42ed5c512 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -421,7 +421,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 virtio_update_irq(vdev);
 break;
 case VIRTIO_MMIO_STATUS:
-if (!(value & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_stop_ioeventfd(proxy);
 }
 
@@ -433,7 +434,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 virtio_set_status(vdev, value & 0xff);
 
-if (value & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_start_ioeventfd(proxy);
 }
 
-- 
2.39.3




Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-04 Thread Jonah Palmer




On 3/4/24 12:24 PM, Eugenio Perez Martin wrote:

On Mon, Mar 4, 2024 at 6:09 PM Jonah Palmer  wrote:




On 3/1/24 2:31 PM, Eugenio Perez Martin wrote:

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
   - upper 16 bits: last_avail_idx
   - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
   - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
   - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio-pci.c     | 13 ++++++++++---
   hw/virtio/virtio.c         | 13 +++++++++++++
   include/hw/virtio/virtio.h |  1 +
   3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a7039fb0c..c7c577b177 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
   {
   VirtIOPCIProxy *proxy = opaque;
   VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
   hwaddr pa;

   switch (addr) {
@@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
   vdev->queue_sel = val;
   break;
   case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = val & 0xFFFF;


Nitpick, but since vq_idx is already a uint16_t the & 0xFFFF is not
needed.


Ah okay. I wasn't sure if it was worthwhile to keep the '& 0xFFFF' in or
not for the sake of clarity and good practice. In that case I could just
do away with vq_idx here and use explicit casting on 'val'.


I think it's cleaner just to call virtio_set_notification_data
in the has_feature(...) condition, but I'm happy with this too.


Do you mean something like:

if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
  virtio_set_notification_data(vdev, vq_idx, val)) {
  ...
}



Sorry I was not clear, I meant just to take out the common code of the
conditionals:
vq_idx = val;
if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_set_notification_data(vdev, val);
}



Ah, no problem! Thank you for the clarification!


Though I'm not sure what would then go in the body of this conditional,
especially if I did something like:

case VIRTIO_PCI_QUEUE_NOTIFY:
  if ((uint16_t)val < VIRTIO_QUEUE_MAX) {
  if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
  virtio_set_notification_data(vdev, val)) {
  // Not sure what to put here other than a no-op
  }

  virtio_queue_notify(vdev, (uint16_t)val);
  }
  break;

But I'm not sure if you'd prefer this explicit casting of 'val' over
implicit casting like:

uint16_t vq_idx = val;




+virtio_set_notification_data(vdev, vq_idx, val);
+} else {
+vq_idx = val;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
   }
   break;
   case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..a61f69b7fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
   return 0;
   }

+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data)
+{
+VirtQueue *vq = &vdev->vq[i];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->last_avail_wrap_counter = (data >> 31) & 0x1;
+vq->last_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->last_avail_idx = (data >> 16) & 0xFFFF;
+}


It should not set last_avail_idx, only shadow_avail_idx. Otherwise,
QEMU can only see the descriptors placed after the notification.

Or am I missing something?

In that regard, I would call this function
"virtqueue_set_shadow_avail_idx". But I'm very bad at naming :).


Ah that's right. This would make Qemu skip processing descriptors that
might've been made available before the notification but after the
host's last check of last_avail_idx. In other words, ignoring available
descriptors that were placed before the notification but not yet
processed. Good catch, thank you!

So, for the packed VQ layout, we'll still want to save the wrap counter
but for the shadow_avail_idx, right? E.g.

if (virtio_vdev_has_feature(vdev, VIRTIO_F_R

Re: [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-04 Thread Jonah Palmer




On 3/1/24 3:04 PM, Eugenio Perez Martin wrote:

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:


Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.

Signed-off-by: Jonah Palmer 
---
  hw/block/vhost-user-blk.c    | 1 +
  hw/net/vhost_net.c           | 2 ++
  hw/scsi/vhost-scsi.c         | 1 +
  hw/scsi/vhost-user-scsi.c    | 1 +
  hw/virtio/vhost-user-fs.c    | 2 +-
  hw/virtio/vhost-user-vsock.c | 1 +
  net/vhost-vdpa.c             | 1 +
  7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
  VIRTIO_F_RING_PACKED,
  VIRTIO_F_IOMMU_PLATFORM,
  VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
  VHOST_INVALID_FEATURE_BIT
  };

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
  VIRTIO_F_IOMMU_PLATFORM,
  VIRTIO_F_RING_PACKED,
  VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
  VIRTIO_NET_F_HASH_REPORT,
  VHOST_INVALID_FEATURE_BIT
  };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
  /* Features supported by others. */
  static const int user_feature_bits[] = {
  VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
  VIRTIO_RING_F_INDIRECT_DESC,
  VIRTIO_RING_F_EVENT_IDX,



vdpa_feature_bits also needs this feature bit added.


The vdpa_feature_bits in /net/vhost-vdpa.c, right? I did add this 
feature bit to this list, unless you're referring to something else.




Apart from that,

Reviewed-by: Eugenio Pérez 


diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 58a00336c2..b8048f18e9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
  VIRTIO_RING_F_EVENT_IDX,
  VIRTIO_SCSI_F_HOTPLUG,
  VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
  VHOST_INVALID_FEATURE_BIT
  };

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
  VIRTIO_RING_F_EVENT_IDX,
  VIRTIO_SCSI_F_HOTPLUG,
  VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
  VHOST_INVALID_FEATURE_BIT
  };

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
  VIRTIO_F_RING_PACKED,
  VIRTIO_F_IOMMU_PLATFORM,
  VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
  VHOST_INVALID_FEATURE_BIT
  };

diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
  VIRTIO_RING_F_INDIRECT_DESC,
  VIRTIO_RING_F_EVENT_IDX,
  VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
  VHOST_INVALID_FEATURE_BIT
  };

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3726ee5d67..2827d29ce7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
  VIRTIO_F_RING_PACKED,
  VIRTIO_F_RING_RESET,
  VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
  VIRTIO_NET_F_CSUM,
  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
  VIRTIO_NET_F_CTRL_MAC_ADDR,
--
2.39.3







Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-04 Thread Jonah Palmer




On 3/1/24 2:31 PM, Eugenio Perez Martin wrote:

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
  - upper 16 bits: last_avail_idx
  - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
  - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
  - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-pci.c     | 13 ++++++++++---
  hw/virtio/virtio.c         | 13 +++++++++++++
  include/hw/virtio/virtio.h |  1 +
  3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a7039fb0c..c7c577b177 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
  {
  VirtIOPCIProxy *proxy = opaque;
  VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
  hwaddr pa;

  switch (addr) {
@@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
  vdev->queue_sel = val;
  break;
  case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = val & 0xFFFF;


Nitpick, but since vq_idx is already a uint16_t the & 0xFFFF is not
needed. 


Ah okay. I wasn't sure if it was worthwhile to keep the '& 0xFFFF' in or 
not for the sake of clarity and good practice. In that case I could just 
do away with vq_idx here and use explicit casting on 'val'.



I think it's cleaner just to call virtio_set_notification_data
in the has_feature(...) condition, but I'm happy with this too.


Do you mean something like:

if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
virtio_set_notification_data(vdev, vq_idx, val)) {
...
}

Though I'm not sure what would then go in the body of this conditional, 
especially if I did something like:


case VIRTIO_PCI_QUEUE_NOTIFY:
if ((uint16_t)val < VIRTIO_QUEUE_MAX) {
if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
virtio_set_notification_data(vdev, val)) {
// Not sure what to put here other than a no-op
}

virtio_queue_notify(vdev, (uint16_t)val);
}
break;

But I'm not sure if you'd prefer this explicit casting of 'val' over 
implicit casting like:


uint16_t vq_idx = val;




+virtio_set_notification_data(vdev, vq_idx, val);
+} else {
+vq_idx = val;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
  }
  break;
  case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..a61f69b7fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
  return 0;
  }

+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data)
+{
+VirtQueue *vq = &vdev->vq[i];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->last_avail_wrap_counter = (data >> 31) & 0x1;
+vq->last_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->last_avail_idx = (data >> 16) & 0xFFFF;
+}


It should not set last_avail_idx, only shadow_avail_idx. Otherwise,
QEMU can only see the descriptors placed after the notification.

Or am I missing something?

In that regard, I would call this function
"virtqueue_set_shadow_avail_idx". But I'm very bad at naming :).


Ah that's right. This would make QEMU skip descriptors that were made 
available before the notification but after the host last advanced 
last_avail_idx, i.e. descriptors that are available but not yet 
processed. Good catch, thank you!
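
To make the problem concrete, here is a toy model of the two indices. The struct and helper names are hypothetical and only the two field names are taken from this thread. It assumes last_avail_idx is the next entry the device will process and shadow_avail_idx is the device's cached copy of the driver's avail index, so the pending count is their 16-bit difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy queue state; not the QEMU VirtQueue. */
struct toy_vq {
    uint16_t last_avail_idx;   /* next entry the device will process */
    uint16_t shadow_avail_idx; /* cached copy of the driver's avail idx */
};

/* Entries made available but not yet processed (wraps mod 2^16). */
static uint16_t toy_pending(const struct toy_vq *vq)
{
    assert(vq != NULL);
    return (uint16_t)(vq->shadow_avail_idx - vq->last_avail_idx);
}

/* Behaviour of the RFC as posted: overwrite both indices. */
static void toy_notify_rfc(struct toy_vq *vq, uint16_t idx)
{
    assert(vq != NULL);
    vq->last_avail_idx = idx;
    vq->shadow_avail_idx = vq->last_avail_idx;
}

/* Behaviour suggested in review: only refresh the shadow index. */
static void toy_notify_shadow_only(struct toy_vq *vq, uint16_t idx)
{
    assert(vq != NULL);
    vq->shadow_avail_idx = idx;
}
```

With the device at index 5 and a notification carrying 8, toy_notify_rfc() leaves nothing pending, so entries 5, 6 and 7 are never looked at, while toy_notify_shadow_only() leaves three entries pending.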


So, for the packed VQ layout, we'll still want to save the wrap counter 
but for the shadow_avail_idx, right? E.g.


if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
} else {
vq->shadow_avail_idx = (data >> 16);
}



The rest looks good to me.

Thanks!


+vq->shadow_avail_idx = vq->last_avail_idx;
+}
+
  static enum virtio_device_endian virtio_default_endian(void)
  {
  if (target_wo

Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-04 Thread Jonah Palmer




On 3/1/24 2:55 PM, Eugenio Perez Martin wrote:

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:


Add support to virtio-pci devices for handling the extra data sent
from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
transport feature has been negotiated.

The extra data that's passed to the virtio-pci device when this
feature is enabled varies depending on the device's virtqueue
layout.

In a split virtqueue layout, this data includes:
  - upper 16 bits: last_avail_idx
  - lower 16 bits: virtqueue index

In a packed virtqueue layout, this data includes:
  - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
  - lower 16 bits: virtqueue index

Signed-off-by: Jonah Palmer 
---
  hw/virtio/virtio-pci.c | 13 ++---
  hw/virtio/virtio.c | 13 +
  include/hw/virtio/virtio.h |  1 +
  3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 1a7039fb0c..c7c577b177 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
  {
  VirtIOPCIProxy *proxy = opaque;
  VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
-uint16_t vector;
+uint16_t vector, vq_idx;
  hwaddr pa;

  switch (addr) {
@@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
  vdev->queue_sel = val;
  break;
  case VIRTIO_PCI_QUEUE_NOTIFY:
-if (val < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, val);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = val & 0xFFFF;
+virtio_set_notification_data(vdev, vq_idx, val);
+} else {
+vq_idx = val;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
  }
  break;
  case VIRTIO_PCI_STATUS:
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index d229755eae..a61f69b7fd 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
  return 0;
  }

+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data)
+{
+VirtQueue *vq = &vdev->vq[i];


Sorry I sent the previous mail too fast :).

i should also be checked against VIRTIO_QUEUE_MAX and vq->vring.desc
before continuing this function. Otherwise it is an out-of-bounds access.


Missed this, thank you. I will add these checks in!
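
A rough sketch of what those two checks could look like, using stand-in types rather than the QEMU ones. The names chk_vq, chk_dev, CHK_QUEUE_MAX and chk_set_shadow_idx are hypothetical. The sketch assumes a zero descriptor-table address means the queue has not been set up, and it uses the split-ring layout for the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHK_QUEUE_MAX 1024 /* stand-in for VIRTIO_QUEUE_MAX; value assumed */

struct chk_vq {
    uint64_t desc;             /* descriptor table address, 0 if unset */
    uint16_t shadow_avail_idx;
};

struct chk_dev {
    struct chk_vq vq[CHK_QUEUE_MAX];
};

/* Returns false, touching nothing, if the index or the queue is unusable. */
static bool chk_set_shadow_idx(struct chk_dev *dev, uint16_t i, uint32_t data)
{
    assert(dev != NULL);

    /* A uint16_t index can be as large as 65535, so bound it first. */
    if (i >= CHK_QUEUE_MAX) {
        return false;
    }

    struct chk_vq *vq = &dev->vq[i];

    /* Skip queues the driver has not configured. */
    if (!vq->desc) {
        return false;
    }

    vq->shadow_avail_idx = data >> 16; /* split-ring layout */
    return true;
}
```

The bound has to be tested before the array element is addressed; doing the lookup first and validating afterwards would still form an out-of-range pointer.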




+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+vq->last_avail_wrap_counter = (data >> 31) & 0x1;
+vq->last_avail_idx = (data >> 16) & 0x7FFF;
+} else {
+vq->last_avail_idx = (data >> 16) & 0xFFFF;
+}
+vq->shadow_avail_idx = vq->last_avail_idx;
+}
+
  static enum virtio_device_endian virtio_default_endian(void)
  {
  if (target_words_bigendian()) {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c8f72850bc..c92d8afc42 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -345,6 +345,7 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
queue_index);
  void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
  void virtio_update_irq(VirtIODevice *vdev);
  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
+void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
data);

  /* Base devices.  */
  typedef struct VirtIOBlkConf VirtIOBlkConf;
--
2.39.3







[RFC 3/8] virtio-mmio: Handle extra notification data

2024-03-01 Thread Jonah Palmer
Add support to virtio-mmio devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-mmio device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-mmio device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 22f9fbcf5a..2bac77460e 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 {
 VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
 VirtIODevice *vdev = virtio_bus_get_device(&proxy->bus);
+uint16_t vq_idx;
 
 trace_virtio_mmio_write_offset(offset, value);
 
@@ -407,8 +408,15 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 }
 break;
 case VIRTIO_MMIO_QUEUE_NOTIFY:
-if (value < VIRTIO_QUEUE_MAX) {
-virtio_queue_notify(vdev, value);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = value & 0xFFFF;
+virtio_set_notification_data(vdev, vq_idx, value);
+} else {
+vq_idx = value;
+}
+
+if (vq_idx < VIRTIO_QUEUE_MAX) {
+virtio_queue_notify(vdev, vq_idx);
 }
 break;
 case VIRTIO_MMIO_INTERRUPT_ACK:
-- 
2.39.3




[RFC 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-01 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-pci
device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-pci devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-pci.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index c7c577b177..fd9717a0f5 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -420,13 +420,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 }
 break;
 case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_stop_ioeventfd(proxy);
 }
 
 virtio_set_status(vdev, val & 0xFF);
 
-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_pci_start_ioeventfd(proxy);
 }
 
-- 
2.39.3




[RFC 4/8] virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-01 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-mmio device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to ioeventfd not being able to carry the extra data associated with
this feature, the ioeventfd should be left in a disabled state for
emulated virtio-mmio devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-mmio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
index 2bac77460e..fc780a03b2 100644
--- a/hw/virtio/virtio-mmio.c
+++ b/hw/virtio/virtio-mmio.c
@@ -424,7 +424,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 virtio_update_irq(vdev);
 break;
 case VIRTIO_MMIO_STATUS:
-if (!(value & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_stop_ioeventfd(proxy);
 }
 
@@ -436,7 +437,8 @@ static void virtio_mmio_write(void *opaque, hwaddr offset, 
uint64_t value,
 
 virtio_set_status(vdev, value & 0xff);
 
-if (value & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((value & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_mmio_start_ioeventfd(proxy);
 }
 
-- 
2.39.3




[RFC 6/8] virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-01 Thread Jonah Palmer
Prevent ioeventfd from being enabled/disabled when a virtio-ccw device
has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.

Due to the ioeventfd not being able to carry the extra data associated
with this feature, the ioeventfd should be left in a disabled state for
emulated virtio-ccw devices using this feature.

Signed-off-by: Jonah Palmer 
---
 hw/s390x/virtio-ccw.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/s390x/virtio-ccw.c b/hw/s390x/virtio-ccw.c
index b4676909dd..936ba78fda 100644
--- a/hw/s390x/virtio-ccw.c
+++ b/hw/s390x/virtio-ccw.c
@@ -530,14 +530,16 @@ static int virtio_ccw_cb(SubchDev *sch, CCW1 ccw)
 if (ret) {
 break;
 }
-if (!(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+if (!(status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_stop_ioeventfd(dev);
 }
 if (virtio_set_status(vdev, status) == 0) {
 if (vdev->status == 0) {
 virtio_ccw_reset_virtio(dev);
 }
-if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+!virtio_vdev_has_feature(vdev, 
VIRTIO_F_NOTIFICATION_DATA)) {
 virtio_ccw_start_ioeventfd(dev);
 }
 sch->curr_status.scsw.count = ccw.count - sizeof(status);
-- 
2.39.3




[RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-01 Thread Jonah Palmer
The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
feature indicates that a driver will pass extra data (instead of just a
virtqueue's index) when notifying the corresponding device.

The data passed in by the driver when this feature is enabled varies in
format depending on if the device is using a split or packed virtqueue
layout:

 Split VQ
  - Upper 16 bits: last_avail_idx
  - Lower 16 bits: virtqueue index

 Packed VQ
  - Upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
  - Lower 16 bits: virtqueue index
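
As an illustration of the two layouts above, the 32-bit notification value can be unpacked as follows. The struct and function names are made up for this sketch; only the masks and shifts follow the layout described in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a QEMU type. */
struct notify_fields {
    uint16_t vq_idx;       /* lower 16 bits */
    uint16_t avail_idx;    /* split: 16 bits, packed: 15 bits */
    bool     wrap_counter; /* packed only; false for split */
};

static struct notify_fields decode_notify(uint32_t data, bool packed)
{
    struct notify_fields f;

    f.vq_idx = data & 0xFFFF;
    if (packed) {
        /* Bit 31 is the wrap counter, bits 16..30 the index. */
        f.wrap_counter = (data >> 31) & 0x1;
        f.avail_idx = (data >> 16) & 0x7FFF;
        assert(f.avail_idx <= 0x7FFF);
    } else {
        /* All of bits 16..31 are the index. */
        f.wrap_counter = false;
        f.avail_idx = (data >> 16) & 0xFFFF;
    }
    return f;
}
```

For example, 0x80050002 decodes to queue 2, index 5 and wrap counter 1 on a packed ring, but to queue 2 and index 0x8005 on a split ring.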

Also, because ioeventfd cannot carry the extra data provided by the
driver, ioeventfd is left disabled for any devices using this feature.
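
The gating applied in the "Lock ioeventfd state" patches can be summarised by two predicates. This is a simplified sketch with hypothetical helper names that works on a raw status byte and feature word rather than on a VirtIODevice; the constants are the DRIVER_OK status bit (0x04) and feature bit 38.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define S_DRIVER_OK          0x04 /* VIRTIO_CONFIG_S_DRIVER_OK */
#define F_NOTIFICATION_DATA  38   /* VIRTIO_F_NOTIFICATION_DATA bit number */

static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}

/* Stop ioeventfd only when DRIVER_OK is cleared and the feature is off. */
static bool should_stop_ioeventfd(uint8_t status, uint64_t features)
{
    return !(status & S_DRIVER_OK) &&
           !has_feature(features, F_NOTIFICATION_DATA);
}

/* Start ioeventfd only when DRIVER_OK is set and the feature is off. */
static bool should_start_ioeventfd(uint8_t status, uint64_t features)
{
    return (status & S_DRIVER_OK) &&
           !has_feature(features, F_NOTIFICATION_DATA);
}
```

Once the feature bit is set, both predicates are false regardless of the status value, so the ioeventfd state is neither started nor stopped by a status write.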

A significant aspect of this effort has been to maintain compatibility
across different backends. As such, the feature is offered by backend
devices only when supported, with fallback mechanisms where backend
support is absent.

Jonah Palmer (8):
  virtio/virtio-pci: Handle extra notification data
  virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-mmio: Handle extra notification data
  virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  virtio-ccw: Handle extra notification data
  virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
  vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
  virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

 hw/block/vhost-user-blk.c|  1 +
 hw/net/vhost_net.c   |  2 ++
 hw/s390x/s390-virtio-ccw.c   | 18 ++
 hw/s390x/virtio-ccw.c|  6 --
 hw/scsi/vhost-scsi.c |  1 +
 hw/scsi/vhost-user-scsi.c|  1 +
 hw/virtio/vhost-user-fs.c|  2 +-
 hw/virtio/vhost-user-vsock.c |  1 +
 hw/virtio/virtio-mmio.c  | 18 ++
 hw/virtio/virtio-pci.c   | 19 ++-
 hw/virtio/virtio.c   | 13 +
 include/hw/virtio/virtio.h   |  5 -
 net/vhost-vdpa.c |  1 +
 13 files changed, 71 insertions(+), 17 deletions(-)

-- 
2.39.3




[RFC 7/8] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-01 Thread Jonah Palmer
Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
of vhost devices.

The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
for these devices ensures that the backend is capable of offering and
providing support for this feature, and that it can be disabled if the
backend does not support it.
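
As a rough sketch of why a bit must be listed in these arrays: the array names the features that are subject to the backend's veto, and any listed bit the backend lacks is cleared from the offered set. The function below is a simplified stand-in written for this note, not the QEMU helper itself, and TOY_INVALID_FEATURE_BIT is an assumed terminator value.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_INVALID_FEATURE_BIT 0xff /* stand-in for VHOST_INVALID_FEATURE_BIT */

/*
 * Clear from 'features' every bit that is listed in 'feature_bits' but
 * missing from 'backend_features'. Bits not listed are left untouched.
 */
static uint64_t toy_get_features(uint64_t backend_features,
                                 const int *feature_bits,
                                 uint64_t features)
{
    const int *bit;

    for (bit = feature_bits; *bit != TOY_INVALID_FEATURE_BIT; bit++) {
        assert(*bit >= 0 && *bit < 64);
        uint64_t mask = 1ULL << *bit;

        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

Under this model, a feature that is not listed is never filtered, so it would stay offered to the guest even when the backend cannot handle it.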

Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 2 +-
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 6a856ad51a..983c0657da 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index e8e1661646..bb1f975b39 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
 /* Features supported by others. */
 static const int user_feature_bits[] = {
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 58a00336c2..b8048f18e9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..0b050805a8 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..ae48cc1c96 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
-
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..802b44a07d 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_NOTIFICATION_DATA,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 3726ee5d67..2827d29ce7 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFICATION_DATA,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3




[RFC 8/8] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

2024-03-01 Thread Jonah Palmer
Extend the virtio device property definitions to include the
VIRTIO_F_NOTIFICATION_DATA feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index c92d8afc42..5772737dde 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -369,7 +369,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("notification_data", _state, _field, \
+  VIRTIO_F_NOTIFICATION_DATA, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3




[RFC 5/8] virtio-ccw: Handle extra notification data

2024-03-01 Thread Jonah Palmer
Add support to virtio-ccw devices for handling the extra data sent from
the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
feature has been negotiated.

The extra data that's passed to the virtio-ccw device when this feature
is enabled varies depending on the device's virtqueue layout.

The data passed to the virtio-ccw device is in the same format as the
data passed to virtio-pci devices.

Signed-off-by: Jonah Palmer 
---
 hw/s390x/s390-virtio-ccw.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 62804cc228..b8e193956c 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -140,9 +140,11 @@ static void subsystem_reset(void)
 static int virtio_ccw_hcall_notify(const uint64_t *args)
 {
 uint64_t subch_id = args[0];
-uint64_t queue = args[1];
+uint64_t data = args[1];
 SubchDev *sch;
+VirtIODevice *vdev;
 int cssid, ssid, schid, m;
+uint16_t vq_idx;
 
 if (ioinst_disassemble_sch_ident(subch_id, &m, &cssid, &ssid, &schid)) {
 return -EINVAL;
@@ -151,12 +153,20 @@ static int virtio_ccw_hcall_notify(const uint64_t *args)
 if (!sch || !css_subch_visible(sch)) {
 return -EINVAL;
 }
-if (queue >= VIRTIO_QUEUE_MAX) {
+
+vdev = virtio_ccw_get_vdev(sch);
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
+vq_idx = data & 0xFFFF;
+virtio_set_notification_data(vdev, vq_idx, data);
+} else {
+vq_idx = data;
+}
+
+if (vq_idx >= VIRTIO_QUEUE_MAX) {
 return -EINVAL;
 }
-virtio_queue_notify(virtio_ccw_get_vdev(sch), queue);
+virtio_queue_notify(vdev, vq_idx);
 return 0;
-
 }
 
 static int virtio_ccw_hcall_early_printk(const uint64_t *args)
-- 
2.39.3



