from:"Eugenio Perez Martin"

Re: [PATCH v3 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-06-20 Thread Eugenio Perez Martin

On Thu, Jun 20, 2024 at 7:56 PM Jonah Palmer  wrote:
>
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
>
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
>
> The core feature behind this solution is a buffer mechanism in the form
> of a VirtQueue's used_elems VirtQueueElement array. This allows devices
> who always use buffers in-order by default to have a minimal overhead
> impact. Devices that may not always use buffers in-order likely will
> experience a performance hit. How large that performance hit is will
> depend on how frequently elements are completed out-of-order.
>
> A VirtQueue whose device uses this feature will use its used_elems
> VirtQueueElement array to hold used VirtQueueElements. The index that
> used elements are placed in used_elems is the same index on the
> used/descriptor ring that would satisfy the in-order requirement. In
> other words, used elements are placed in their in-order locations on
> used_elems and are only written to the used/descriptor ring once the
> elements on used_elems are able to continue their expected order.
>
> To differentiate between a "used" and "unused" element on the used_elems
> array (a "used" element being an element that has returned from
> processing and an "unused" element being an element that has not yet
> been processed), we added a boolean 'in_order_filled' member to the
> VirtQueueElement struct. This flag is set to true when the element comes
> back from processing (virtqueue_ordered_fill) and then set back to false
> once it's been written to the used/descriptor ring
> (virtqueue_ordered_flush).
>
> Testing:
> 
> Testing was done using the dpdk-testpmd application on both the host and
> guest using the following configurations. Traffic was generated between
> the host and guest after running 'start tx_first' on both the host and
> guest dpdk-testpmd applications. Results are below after traffic was
> generated for several seconds.
>
> Relevant Qemu args:
> ---
> -chardev socket,id=char1,path=/tmp/vhost-user1,server=off
> -chardev socket,id=char2,path=/tmp/vhost-user2,server=off
> -netdev type=vhost-user,id=net1,chardev=char1,vhostforce=on,queues=1
> -netdev type=vhost-user,id=net2,chardev=char2,vhostforce=on,queues=1
> -device virtio-net-pci,in_order=true,packed=true,netdev=net1,
> mac=56:48:4f:53:54:00,mq=on,vectors=4,rx_queue_size=256
> -device virtio-net-pci,in_order=true,packed=true,netdev=net2,
> mac=56:48:4f:53:54:01,mq=on,vectors=4,rx_queue_size=256
>

Hi Jonah,

These tests are great, but others should also be performed. In
particular, QEMU should run ok with "tap" netdev with vhost=off
instead of vhost-user:

-netdev type=tap,id=net1,vhost=off
-netdev type=tap,id=net2,vhost=off

This way, packets are going through the modified code. With this
configuration, QEMU is the one forwarding the packets so testpmd is
not needed in the host. It's still needed in the guest as linux guest
driver does not support in_order. The guest kernel cmdline and testpmd
cmdline should require no changes from the configuration you describe
here.

And then try with in_order=true,packed=false and
in_order=true,packed=off in corresponding virtio-net-pci.

Performance comparison between in_order=true and in_order=false is
also interesting but we're not batching so I don't think we will get
an extreme improvement.

Does the plan work for you?

Thanks!

> Host dpdk-testpmd command:
> --
> dpdk-testpmd -l 0,2,3,4,5 --socket-mem=1024 -n 4
> --vdev 'net_vhost0,iface=/tmp/vhost-user1'
> --vdev 'net_vhost1,iface=/tmp/vhost-user2' --
> --portmask=f -i --rxq=1 --txq=1 --nb-cores=4 --forward-mode=io
>
> Guest dpdk-testpmd command:
> ---
> dpdk-testpmd -l 0,1 -a :00:02.0 -a :00:03.0 -- --portmask=3
> --rxq=1 --txq=1 --nb-cores=1 --forward-mode=io -i
>
> Results:
> 
> +++ Accumulated forward statistics for all ports+++
> RX-packets: 79067488   RX-dropped: 0 RX-total: 79067488
> TX-packets: 79067552   TX-dropped: 0 TX-total: 79067552
> 
>
> ---
> v3: Drop Tested-by tags until patches are re-tested.
> Replace 'prev_avail_idx' with 'vq->last_avail_idx - 1' in
> virtqueue_split_pop.
> Remove redundant '+vq->vring.num' in 'max_steps' calculation in
> virtqueue_ordered_fill.
> Add test results to CV.
>
> v2: Make 'in_order_filled' more descriptive.
> Change 'j' to more descriptive var name in virtqueue_split_pop.
> Use more definitive search conditional in virtqueue_ordered_fill.
> Avoid code duplication in

Re: [PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-23 Thread Eugenio Perez Martin

On Thu, May 23, 2024 at 12:30 PM Jonah Palmer  wrote:
>
>
>
> On 5/22/24 12:07 PM, Eugenio Perez Martin wrote:
> > On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  
> > wrote:
> >>
> >> Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.
> >>
> >> The goal of the virtqueue_ordered_fill operation when the
> >> VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
> >> now-used element, set its length, and mark the element as filled in
> >> the VirtQueue's used_elems array.
> >>
> >> By marking the element as filled, it will indicate that this element has
> >> been processed and is ready to be flushed, so long as the element is
> >> in-order.
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio.c | 36 +++-
> >>   1 file changed, 35 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 7456d61bc8..01b6b32460 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, 
> >> const VirtQueueElement *elem,
> >>   vq->used_elems[idx].ndescs = elem->ndescs;
> >>   }
> >>
> >> +static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement 
> >> *elem,
> >> +   unsigned int len)
> >> +{
> >> +unsigned int i, steps, max_steps;
> >> +
> >> +i = vq->used_idx;
> >> +steps = 0;
> >> +/*
> >> + * We shouldn't need to increase 'i' by more than the distance
> >> + * between used_idx and last_avail_idx.
> >> + */
> >> +max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
> >> +% vq->vring.num;
> >
> > I may be missing something, but (+vq->vring.num) is redundant if we (%
> > vq->vring.num), isn't it?
> >
>
> It ensures the result is always non-negative (e.g. when
> vq->last_avail_idx < vq->used_idx).
>
> I wasn't sure how different platforms or compilers would handle
> something like -5 % 10, so to be safe I included the '+ vq->vring.num'.
>
> For example, on my system, in test.c;
>
> #include 
>
> int main() {
> unsigned int result = -5 % 10;
> printf("Result of -5 %% 10 is: %d\n", result);
> return 0;
> }
>
> # gcc -o test test.c
>
> # ./test
> Result of -5 % 10 is: -5
>

I think the modulo is being done in signed ints in your test, and then
converting a signed int to an unsigned int. Like result = (-5 % 10).

The unsigned wrap is always defined in C, and vq->last_avail_idx and
vq->used_idx are both unsigned. Here is a closer test:
int main(void) {
unsigned int a = -5, b = 2;
unsigned int result = (b-a) % 10;
printf("Result of -5 %% 10 is: %u\n", result);
return 0;
}

But it is a good catch for signed ints for sure :).

Thanks!

> >> +
> >> +/* Search for element in vq->used_elems */
> >> +while (steps <= max_steps) {
> >> +/* Found element, set length and mark as filled */
> >> +if (vq->used_elems[i].index == elem->index) {
> >> +vq->used_elems[i].len = len;
> >> +vq->used_elems[i].in_order_filled = true;
> >> +break;
> >> +}
> >> +
> >> +i += vq->used_elems[i].ndescs;
> >> +steps += vq->used_elems[i].ndescs;
> >> +
> >> +if (i >= vq->vring.num) {
> >> +i -= vq->vring.num;
> >> +}
> >> +}
> >> +}
> >> +
> >
> > Let's report an error if we finish the loop. I think:
> > qemu_log_mask(LOG_GUEST_ERROR,
> >"%s: %s cannot fill buffer id %u\n",
> >__func__, vdev->name, elem->index);
> >
> > (or similar) should do.
> >
> > apart form that,
> >
> > Reviewed-by: Eugenio Pérez 
> >
>
> Gotcha. Will add this in v3.
>
> Thank you Eugenio!
>
> >>   static void virtqueue_packed_fill_desc(VirtQueue *vq,
> >>  const VirtQueueElement *elem,
> >>  unsigned int idx,
> >> @@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const 
> >> VirtQueueElement *elem,
> >>   return;
> >>   }
> >>
> >> -if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> >> +virtqueue_ordered_fill(vq, elem, len);
> >> +} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> >>   virtqueue_packed_fill(vq, elem, len, idx);
> >>   } else {
> >>   virtqueue_split_fill(vq, elem, len, idx);
> >> --
> >> 2.39.3
> >>
> >
>

Re: [PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-22 Thread Eugenio Perez Martin

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.
>
> The goal of the virtqueue_ordered_fill operation when the
> VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
> now-used element, set its length, and mark the element as filled in
> the VirtQueue's used_elems array.
>
> By marking the element as filled, it will indicate that this element has
> been processed and is ready to be flushed, so long as the element is
> in-order.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 36 +++-
>  1 file changed, 35 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 7456d61bc8..01b6b32460 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
> VirtQueueElement *elem,
>  vq->used_elems[idx].ndescs = elem->ndescs;
>  }
>
> +static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
> +   unsigned int len)
> +{
> +unsigned int i, steps, max_steps;
> +
> +i = vq->used_idx;
> +steps = 0;
> +/*
> + * We shouldn't need to increase 'i' by more than the distance
> + * between used_idx and last_avail_idx.
> + */
> +max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
> +% vq->vring.num;

I may be missing something, but (+vq->vring.num) is redundant if we (%
vq->vring.num), isn't it?

> +
> +/* Search for element in vq->used_elems */
> +while (steps <= max_steps) {
> +/* Found element, set length and mark as filled */
> +if (vq->used_elems[i].index == elem->index) {
> +vq->used_elems[i].len = len;
> +vq->used_elems[i].in_order_filled = true;
> +break;
> +}
> +
> +i += vq->used_elems[i].ndescs;
> +steps += vq->used_elems[i].ndescs;
> +
> +if (i >= vq->vring.num) {
> +i -= vq->vring.num;
> +}
> +}
> +}
> +

Let's report an error if we finish the loop. I think:
qemu_log_mask(LOG_GUEST_ERROR,
  "%s: %s cannot fill buffer id %u\n",
  __func__, vdev->name, elem->index);

(or similar) should do.

apart form that,

Reviewed-by: Eugenio Pérez 

>  static void virtqueue_packed_fill_desc(VirtQueue *vq,
> const VirtQueueElement *elem,
> unsigned int idx,
> @@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
>  return;
>  }
>
> -if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> +virtqueue_ordered_fill(vq, elem, len);
> +} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>  virtqueue_packed_fill(vq, elem, len, idx);
>  } else {
>  virtqueue_split_fill(vq, elem, len, idx);
> --
> 2.39.3
>

Re: [PATCH v2 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-22 Thread Eugenio Perez Martin

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
> virtqueue_packed_pop.
>
> VirtQueueElements popped from the available/descritpor ring are added to
> the VirtQueue's used_elems array in-order and in the same fashion as
> they would be added the used and descriptor rings, respectively.
>
> This will allow us to keep track of the current order, what elements
> have been written, as well as an element's essential data after being
> processed.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..7456d61bc8 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, 
> unsigned out_num, unsigned in_nu
>
>  static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>  {
> -unsigned int i, head, max;
> +unsigned int i, head, max, prev_avail_idx;
>  VRingMemoryRegionCaches *caches;
>  MemoryRegionCache indirect_desc_cache;
>  MemoryRegionCache *desc_cache;
> @@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  goto done;
>  }
>
> +prev_avail_idx = vq->last_avail_idx;
> +
>  if (!virtqueue_get_head(vq, vq->last_avail_idx++, )) {
>  goto done;
>  }
> @@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  elem->in_sg[i] = iov[out_num + i];
>  }
>
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {

I think vq->last_avail_idx - 1 could be more clear here.

Either way,

Reviewed-by: Eugenio Pérez 

> +vq->used_elems[prev_avail_idx].index = elem->index;
> +vq->used_elems[prev_avail_idx].len = elem->len;
> +vq->used_elems[prev_avail_idx].ndescs = elem->ndescs;
> +}
> +
>  vq->inuse++;
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, 
> size_t sz)
>
>  elem->index = id;
>  elem->ndescs = (desc_cache == _desc_cache) ? 1 : elem_entries;
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +vq->used_elems[vq->last_avail_idx].index = elem->index;
> +vq->used_elems[vq->last_avail_idx].len = elem->len;
> +vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
> +}
> +
>  vq->last_avail_idx += elem->ndescs;
>  vq->inuse += elem->ndescs;
>
> --
> 2.39.3
>

Re: [PATCH v2 1/6] virtio: Add bool to VirtQueueElement

2024-05-22 Thread Eugenio Perez Martin

On Mon, May 20, 2024 at 3:01 PM Jonah Palmer  wrote:
>
> Add the boolean 'in_order_filled' member to the VirtQueueElement structure.
> The use of this boolean will signify whether the element has been processed
> and is ready to be flushed (so long as the element is in-order). This
> boolean is used to support the VIRTIO_F_IN_ORDER feature.
>
> Tested-by: Lei Yang 

The code has changed from the version that Lei tested, so we should
drop this tag until he re-test again.

Reviewed-by: Eugenio Pérez 

> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..88e70c1ae1 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -69,6 +69,8 @@ typedef struct VirtQueueElement
>  unsigned int ndescs;
>  unsigned int out_num;
>  unsigned int in_num;
> +/* Element has been processed (VIRTIO_F_IN_ORDER) */
> +bool in_order_filled;
>  hwaddr *in_addr;
>  hwaddr *out_addr;
>  struct iovec *in_sg;
> --
> 2.39.3
>

Re: [PATCH 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-05-10 Thread Eugenio Perez Martin

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support for virtqueue_flush operations.
>
> The goal of the virtqueue_flush operation when the VIRTIO_F_IN_ORDER
> feature has been negotiated is to write elements to the used/descriptor
> ring in-order and then update used_idx.
>
> The function iterates through the VirtQueueElement used_elems array
> in-order starting at vq->used_idx. If the element is valid (filled), the
> element is written to the used/descriptor ring. This process continues
> until we find an invalid (not filled) element.
>
> If any elements were written, the used_idx is updated.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 75 +-
>  1 file changed, 74 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 064046b5e2..0efed2c88e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1006,6 +1006,77 @@ static void virtqueue_packed_flush(VirtQueue *vq, 
> unsigned int count)
>  }
>  }
>
> +static void virtqueue_ordered_flush(VirtQueue *vq)
> +{
> +unsigned int i = vq->used_idx;
> +unsigned int ndescs = 0;
> +uint16_t old = vq->used_idx;
> +bool packed;
> +VRingUsedElem uelem;
> +
> +packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
> +
> +if (packed) {
> +if (unlikely(!vq->vring.desc)) {
> +return;
> +}
> +} else if (unlikely(!vq->vring.used)) {
> +return;
> +}
> +
> +/* First expected in-order element isn't ready, nothing to do */
> +if (!vq->used_elems[i].filled) {
> +return;
> +}
> +
> +/* Write first expected in-order element to used ring (split VQs) */
> +if (!packed) {
> +uelem.id = vq->used_elems[i].index;
> +uelem.len = vq->used_elems[i].len;
> +vring_used_write(vq, , i);
> +}
> +
> +ndescs += vq->used_elems[i].ndescs;
> +i += ndescs;
> +if (i >= vq->vring.num) {
> +i -= vq->vring.num;
> +}
> +
> +/* Search for more filled elements in-order */
> +while (vq->used_elems[i].filled) {
> +if (packed) {
> +virtqueue_packed_fill_desc(vq, >used_elems[i], ndescs, 
> false);
> +} else {
> +uelem.id = vq->used_elems[i].index;
> +uelem.len = vq->used_elems[i].len;
> +vring_used_write(vq, , i);
> +}
> +
> +vq->used_elems[i].filled = false;
> +ndescs += vq->used_elems[i].ndescs;
> +i += ndescs;
> +if (i >= vq->vring.num) {
> +i -= vq->vring.num;
> +}
> +}
> +

I may be missing something, but you have split out the first case as a
special one, totally out of the while loop. Can't it be contained in
the loop checking !(packed && i == vq->used_idx)? That would avoid
code duplication.

A comment can be added in the line of "first entry of packed is
written the last so the guest does not see invalid descriptors".

> +if (packed) {
> +virtqueue_packed_fill_desc(vq, >used_elems[vq->used_idx], 0, 
> true);
> +vq->used_idx += ndescs;
> +if (vq->used_idx >= vq->vring.num) {
> +vq->used_idx -= vq->vring.num;
> +vq->used_wrap_counter ^= 1;
> +vq->signalled_used_valid = false;
> +}
> +} else {
> +vring_used_idx_set(vq, i);
> +if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - 
> old))) {
> +vq->signalled_used_valid = false;
> +}
> +}
> +vq->inuse -= ndescs;
> +}
> +
>  void virtqueue_flush(VirtQueue *vq, unsigned int count)
>  {
>  if (virtio_device_disabled(vq->vdev)) {
> @@ -1013,7 +1084,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>  return;
>  }
>
> -if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> +virtqueue_ordered_flush(vq);
> +} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>  virtqueue_packed_flush(vq, count);
>  } else {
>  virtqueue_split_flush(vq, count);
> --
> 2.39.3
>

Re: [PATCH 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-09 Thread Eugenio Perez Martin

On Mon, May 6, 2024 at 5:05 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support for virtqueue_fill operations.
>
> The goal of the virtqueue_fill operation when the VIRTIO_F_IN_ORDER
> feature has been negotiated is to search for this now-used element,
> set its length, and mark the element as filled in the VirtQueue's
> used_elems array.
>
> By marking the element as filled, it will indicate that this element is
> ready to be flushed, so long as the element is in-order.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 26 +-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index e6eb1bb453..064046b5e2 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -873,6 +873,28 @@ static void virtqueue_packed_fill(VirtQueue *vq, const 
> VirtQueueElement *elem,
>  vq->used_elems[idx].ndescs = elem->ndescs;
>  }
>
> +static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
> +   unsigned int len)
> +{
> +unsigned int i = vq->used_idx;
> +
> +/* Search for element in vq->used_elems */
> +while (i != vq->last_avail_idx) {
> +/* Found element, set length and mark as filled */
> +if (vq->used_elems[i].index == elem->index) {
> +vq->used_elems[i].len = len;
> +vq->used_elems[i].filled = true;
> +break;
> +}
> +
> +i += vq->used_elems[i].ndescs;
> +
> +if (i >= vq->vring.num) {
> +i -= vq->vring.num;
> +}
> +}

This has a subtle problem: ndescs and elems->id are controlled by the
guest, so it could make QEMU to loop forever looking for the right
descriptor. For each iteration, the code must control that the
variable "i" will be different for the next iteration, and that there
will be no more than vq->last_avail_idx - vq->used_idx iterations.

Apart of that, I think it makes more sense to split the logical
sections of the function this way:
/* declarations */
i = vq->used_idx

/* Search for element in vq->used_elems */
while (vq->used_elems[i].index != elem->index &&
vq->used_elems[i].index i != vq->last_avail_idx && ...) {
...
}

/* Set length and mark as filled */
vq->used_elems[i].len = len;
vq->used_elems[i].filled = true;
---

But I'm ok either way.

> +}
> +
>  static void virtqueue_packed_fill_desc(VirtQueue *vq,
> const VirtQueueElement *elem,
> unsigned int idx,
> @@ -923,7 +945,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
> *elem,
>  return;
>  }
>
> -if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
> +virtqueue_ordered_fill(vq, elem, len);
> +} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
>  virtqueue_packed_fill(vq, elem, len, idx);
>  } else {
>  virtqueue_split_fill(vq, elem, len, idx);
> --
> 2.39.3
>

Re: [PATCH 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-09 Thread Eugenio Perez Martin

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:
>
> Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
> virtqueue_packed_pop.
>
> VirtQueueElements popped from the available/descritpor ring are added to
> the VirtQueue's used_elems array in-order and in the same fashion as
> they would be added the used and descriptor rings, respectively.
>
> This will allow us to keep track of the current order, what elements
> have been written, as well as an element's essential data after being
> processed.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 893a072c9d..e6eb1bb453 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, 
> unsigned out_num, unsigned in_nu
>
>  static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
>  {
> -unsigned int i, head, max;
> +unsigned int i, j, head, max;
>  VRingMemoryRegionCaches *caches;
>  MemoryRegionCache indirect_desc_cache;
>  MemoryRegionCache *desc_cache;
> @@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  goto done;
>  }
>
> +j = vq->last_avail_idx;
> +
>  if (!virtqueue_get_head(vq, vq->last_avail_idx++, )) {
>  goto done;
>  }
> @@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  elem->in_sg[i] = iov[out_num + i];
>  }
>
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +vq->used_elems[j].index = elem->index;
> +vq->used_elems[j].len = elem->len;
> +vq->used_elems[j].ndescs = elem->ndescs;
> +}
> +
>  vq->inuse++;
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, 
> size_t sz)
>
>  elem->index = id;
>  elem->ndescs = (desc_cache == _desc_cache) ? 1 : elem_entries;
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +vq->used_elems[vq->last_avail_idx].index = elem->index;
> +vq->used_elems[vq->last_avail_idx].len = elem->len;
> +vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
> +}
> +

I suggest using a consistent style between packed and split: Either
always use vq->last_avail_idx or j. If you use j, please rename to
something more related to the usage, as j is usually for iterations.

In my opinion I think vq->last_avail_idx is better.


>  vq->last_avail_idx += elem->ndescs;
>  vq->inuse += elem->ndescs;
>
> --
> 2.39.3
>

Re: [PATCH 1/6] virtio: Add bool to VirtQueueElement

2024-05-09 Thread Eugenio Perez Martin

On Mon, May 6, 2024 at 5:06 PM Jonah Palmer  wrote:
>
> Add the boolean 'filled' member to the VirtQueueElement structure. The
> use of this boolean will signify if the element has been written to the
> used / descriptor ring or not. This boolean is used to support the
> VIRTIO_F_IN_ORDER feature.
>
> Tested-by: Lei Yang 
> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 7d5ffdc145..9ed9c3763c 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -69,6 +69,7 @@ typedef struct VirtQueueElement
>  unsigned int ndescs;
>  unsigned int out_num;
>  unsigned int in_num;
> +bool filled;

in_order_filled? I cannot come with a good name for this. Maybe we can
add a comment on top of the variable so we know what it is used for?

>  hwaddr *in_addr;
>  hwaddr *out_addr;
>  struct iovec *in_sg;
> --
> 2.39.3
>

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-05 Thread Eugenio Perez Martin

On Fri, Apr 5, 2024 at 3:59 PM Jonah Palmer  wrote:
>
>
>
> On 4/4/24 12:33 PM, Eugenio Perez Martin wrote:
> > On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer  wrote:
> >>
> >>
> >>
> >> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> >>> On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>>>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  
> >>>>> wrote:
> >>>>>>
> >>>>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>>>
> >>>>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>>>> The next VirtQueueElement added to the used ring must match this
> >>>>>> sequence number before additional elements can be safely added to the
> >>>>>> used ring. It's also particularly useful for helping find the number of
> >>>>>> new elements added to the used ring.
> >>>>>>
> >>>>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>>>> This value is essentially a counter where the value is assigned to a 
> >>>>>> new
> >>>>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>>>> sequence number can be between 0 and 65,535.
> >>>>>>
> >>>>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >>>>>> the VirtQueueElement when it was created. This value must match with 
> >>>>>> the
> >>>>>> VirtQueue's used_seq_idx before the element can be put on the used ring
> >>>>>> by the device.
> >>>>>>
> >>>>>> Signed-off-by: Jonah Palmer 
> >>>>>> ---
> >>>>>> hw/virtio/virtio.c | 18 ++
> >>>>>> include/hw/virtio/virtio.h |  1 +
> >>>>>> 2 files changed, 19 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>> index fb6b4ccd83..069d96df99 100644
> >>>>>> --- a/hw/virtio/virtio.c
> >>>>>> +++ b/hw/virtio/virtio.c
> >>>>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>>> uint16_t used_idx;
> >>>>>> bool used_wrap_counter;
> >>>>>>
> >>>>>> +/* In-Order sequence indices */
> >>>>>> +uint16_t used_seq_idx;
> >>>>>> +uint16_t current_seq_idx;
> >>>>>> +
> >>>>>
> >>>>> I'm having a hard time understanding the difference between these and
> >>>>> last_avail_idx and used_idx. It seems to me if we replace them
> >>>>> everything will work? What am I missing?
> >>>>>
> >>>>
> >>>> For used_seq_idx, it does work like used_idx except the difference is
> >>>> when their values get updated, specifically for the split VQ case.
> >>>>
> >>>> As you know, for the split VQ case, the used_idx is updated during
> >>>> virtqueue_split_flush. However, imagine a batch of elements coming in
> >>>> where virtqueue_split_fill is called multiple times before
> >>>> virtqueue_split_flush. We want to make sure we write these elements to
> >>>> the used ring in-order and we'll know its order based on used_seq_idx.
> >>>>
> >>>> Alternatively, I thought about replicating the logic for the packed VQ
> >>>> case (where this used_seq_idx isn't used) where we start looking at
> >>>> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >>>> element, but I wasn't sure how to handle t

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Eugenio Perez Martin

On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer  wrote:
>
>
>
> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> > On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:
> >>
> >>
> >>
> >> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  
> >>> wrote:
> >>>>
> >>>> Initialize sequence variables for VirtQueue and VirtQueueElement
> >>>> structures. A VirtQueue's sequence variables are initialized when a
> >>>> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >>>> variable is initialized when a VirtQueueElement is being initialized.
> >>>> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>>>
> >>>> A VirtQueue's used_seq_idx represents the next expected index in a
> >>>> sequence of VirtQueueElements to be processed (put on the used ring).
> >>>> The next VirtQueueElement added to the used ring must match this
> >>>> sequence number before additional elements can be safely added to the
> >>>> used ring. It's also particularly useful for helping find the number of
> >>>> new elements added to the used ring.
> >>>>
> >>>> A VirtQueue's current_seq_idx represents the current sequence index.
> >>>> This value is essentially a counter where the value is assigned to a new
> >>>> VirtQueueElement and then incremented. Given its uint16_t type, this
> >>>> sequence number can be between 0 and 65,535.
> >>>>
> >>>> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >>>> the VirtQueueElement when it was created. This value must match with the
> >>>> VirtQueue's used_seq_idx before the element can be put on the used ring
> >>>> by the device.
> >>>>
> >>>> Signed-off-by: Jonah Palmer 
> >>>> ---
> >>>>hw/virtio/virtio.c | 18 ++
> >>>>include/hw/virtio/virtio.h |  1 +
> >>>>2 files changed, 19 insertions(+)
> >>>>
> >>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>> index fb6b4ccd83..069d96df99 100644
> >>>> --- a/hw/virtio/virtio.c
> >>>> +++ b/hw/virtio/virtio.c
> >>>> @@ -132,6 +132,10 @@ struct VirtQueue
> >>>>uint16_t used_idx;
> >>>>bool used_wrap_counter;
> >>>>
> >>>> +/* In-Order sequence indices */
> >>>> +uint16_t used_seq_idx;
> >>>> +uint16_t current_seq_idx;
> >>>> +
> >>>
> >>> I'm having a hard time understanding the difference between these and
> >>> last_avail_idx and used_idx. It seems to me if we replace them
> >>> everything will work? What am I missing?
> >>>
> >>
> >> For used_seq_idx, it does work like used_idx except the difference is
> >> when their values get updated, specifically for the split VQ case.
> >>
> >> As you know, for the split VQ case, the used_idx is updated during
> >> virtqueue_split_flush. However, imagine a batch of elements coming in
> >> where virtqueue_split_fill is called multiple times before
> >> virtqueue_split_flush. We want to make sure we write these elements to
> >> the used ring in-order and we'll know its order based on used_seq_idx.
> >>
> >> Alternatively, I thought about replicating the logic for the packed VQ
> >> case (where this used_seq_idx isn't used) where we start looking at
> >> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >> element, but I wasn't sure how to handle the case where elements get
> >> used (written to the used ring) and new elements get put in used_elems
> >> before the used_idx is updated. Since this search would require us to
> >> always start at index vq->used_idx.
> >>
> >> For example, say, of three elements getting filled (elem0 - elem2),
> >> elem1 and elem0 come back first (vq->used_idx = 0):
> >>
> >> elem1 - not in-order
> >> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>   in-order, write elem0 and elem1 to used ring, mark elements as
> >>   used
> >>
> >> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> >> ignore the used

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Eugenio Perez Martin

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:
>
>
>
> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  
> > wrote:
> >>
> >> Initialize sequence variables for VirtQueue and VirtQueueElement
> >> structures. A VirtQueue's sequence variables are initialized when a
> >> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >> variable is initialized when a VirtQueueElement is being initialized.
> >> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>
> >> A VirtQueue's used_seq_idx represents the next expected index in a
> >> sequence of VirtQueueElements to be processed (put on the used ring).
> >> The next VirtQueueElement added to the used ring must match this
> >> sequence number before additional elements can be safely added to the
> >> used ring. It's also particularly useful for helping find the number of
> >> new elements added to the used ring.
> >>
> >> A VirtQueue's current_seq_idx represents the current sequence index.
> >> This value is essentially a counter where the value is assigned to a new
> >> VirtQueueElement and then incremented. Given its uint16_t type, this
> >> sequence number can be between 0 and 65,535.
> >>
> >> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >> the VirtQueueElement when it was created. This value must match with the
> >> VirtQueue's used_seq_idx before the element can be put on the used ring
> >> by the device.
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio.c | 18 ++
> >>   include/hw/virtio/virtio.h |  1 +
> >>   2 files changed, 19 insertions(+)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index fb6b4ccd83..069d96df99 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -132,6 +132,10 @@ struct VirtQueue
> >>   uint16_t used_idx;
> >>   bool used_wrap_counter;
> >>
> >> +/* In-Order sequence indices */
> >> +uint16_t used_seq_idx;
> >> +uint16_t current_seq_idx;
> >> +
> >
> > I'm having a hard time understanding the difference between these and
> > last_avail_idx and used_idx. It seems to me if we replace them
> > everything will work? What am I missing?
> >
>
> For used_seq_idx, it does work like used_idx except the difference is
> when their values get updated, specifically for the split VQ case.
>
> As you know, for the split VQ case, the used_idx is updated during
> virtqueue_split_flush. However, imagine a batch of elements coming in
> where virtqueue_split_fill is called multiple times before
> virtqueue_split_flush. We want to make sure we write these elements to
> the used ring in-order and we'll know its order based on used_seq_idx.
>
> Alternatively, I thought about replicating the logic for the packed VQ
> case (where this used_seq_idx isn't used) where we start looking at
> vq->used_elems[vq->used_idx] and iterate through until we find a used
> element, but I wasn't sure how to handle the case where elements get
> used (written to the used ring) and new elements get put in used_elems
> before the used_idx is updated. Since this search would require us to
> always start at index vq->used_idx.
>
> For example, say, of three elements getting filled (elem0 - elem2),
> elem1 and elem0 come back first (vq->used_idx = 0):
>
> elem1 - not in-order
> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>  in-order, write elem0 and elem1 to used ring, mark elements as
>  used
>
> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>
> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> could continue looking through the vq->used_elems array until we find an
> unused element... but then again how would we (1) know if the element is
> in-order and (2) know when to stop searching?
>

Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-03 Thread Eugenio Perez Martin

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:
>
> Initialize sequence variables for VirtQueue and VirtQueueElement
> structures. A VirtQueue's sequence variables are initialized when a
> VirtQueue is being created or reset. A VirtQueueElement's sequence
> variable is initialized when a VirtQueueElement is being initialized.
> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
>
> A VirtQueue's used_seq_idx represents the next expected index in a
> sequence of VirtQueueElements to be processed (put on the used ring).
> The next VirtQueueElement added to the used ring must match this
> sequence number before additional elements can be safely added to the
> used ring. It's also particularly useful for helping find the number of
> new elements added to the used ring.
>
> A VirtQueue's current_seq_idx represents the current sequence index.
> This value is essentially a counter where the value is assigned to a new
> VirtQueueElement and then incremented. Given its uint16_t type, this
> sequence number can be between 0 and 65,535.
>
> A VirtQueueElement's seq_idx represents the sequence number assigned to
> the VirtQueueElement when it was created. This value must match with the
> VirtQueue's used_seq_idx before the element can be put on the used ring
> by the device.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 18 ++
>  include/hw/virtio/virtio.h |  1 +
>  2 files changed, 19 insertions(+)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index fb6b4ccd83..069d96df99 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -132,6 +132,10 @@ struct VirtQueue
>  uint16_t used_idx;
>  bool used_wrap_counter;
>
> +/* In-Order sequence indices */
> +uint16_t used_seq_idx;
> +uint16_t current_seq_idx;
> +

I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?

>  /* Last used index value we have signalled on */
>  uint16_t signalled_used;
>
> @@ -1621,6 +1625,11 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t 
> sz)
>  elem->in_sg[i] = iov[out_num + i];
>  }
>
> +/* Assign sequence index for in-order processing */
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +elem->seq_idx = vq->current_seq_idx++;
> +}
> +
>  vq->inuse++;
>
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
> @@ -1760,6 +1769,11 @@ static void *virtqueue_packed_pop(VirtQueue *vq, 
> size_t sz)
>  vq->shadow_avail_idx = vq->last_avail_idx;
>  vq->shadow_avail_wrap_counter = vq->last_avail_wrap_counter;
>
> +/* Assign sequence index for in-order processing */
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
> +elem->seq_idx = vq->current_seq_idx++;
> +}
> +
>  trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
>  done:
>  address_space_cache_destroy(_desc_cache);
> @@ -2087,6 +2101,8 @@ static void __virtio_queue_reset(VirtIODevice *vdev, 
> uint32_t i)
>  vdev->vq[i].notification = true;
>  vdev->vq[i].vring.num = vdev->vq[i].vring.num_default;
>  vdev->vq[i].inuse = 0;
> +vdev->vq[i].used_seq_idx = 0;
> +vdev->vq[i].current_seq_idx = 0;
>  virtio_virtqueue_reset_region_cache(>vq[i]);
>  }
>
> @@ -2334,6 +2350,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int 
> queue_size,
>  vdev->vq[i].vring.align = VIRTIO_PCI_VRING_ALIGN;
>  vdev->vq[i].handle_output = handle_output;
>  vdev->vq[i].used_elems = g_new0(VirtQueueElement, queue_size);
> +vdev->vq[i].used_seq_idx = 0;
> +vdev->vq[i].current_seq_idx = 0;
>
>  return >vq[i];
>  }
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b3c74a1bca..910b2a3427 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -75,6 +75,7 @@ typedef struct VirtQueueElement
>  hwaddr *out_addr;
>  struct iovec *in_sg;
>  struct iovec *out_sg;
> +uint16_t seq_idx;
>  } VirtQueueElement;
>
>  #define VIRTIO_QUEUE_MAX 1024
> --
> 2.39.3
>

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-26 Thread Eugenio Perez Martin

On Tue, Mar 26, 2024 at 5:49 PM Jonah Palmer  wrote:
>
>
>
> On 3/25/24 4:33 PM, Eugenio Perez Martin wrote:
> > On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer  
> > wrote:
> >>
> >>
> >>
> >> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  
> >>> wrote:
> >>>>
> >>>> The goal of these patches is to add support to a variety of virtio and
> >>>> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >>>> indicates that all buffers are used by the device in the same order in
> >>>> which they were made available by the driver.
> >>>>
> >>>> These patches attempt to implement a generalized, non-device-specific
> >>>> solution to support this feature.
> >>>>
> >>>> The core feature behind this solution is a buffer mechanism in the form
> >>>> of GLib's GHashTable. The decision behind using a hash table was to
> >>>> leverage their ability for quick lookup, insertion, and removal
> >>>> operations. Given that our keys are simply numbers of an ordered
> >>>> sequence, a hash table seemed like the best choice for a buffer
> >>>> mechanism.
> >>>>
> >>>> -
> >>>>
> >>>> The strategy behind this implementation is as follows:
> >>>>
> >>>> We know that buffers that are popped from the available ring and enqueued
> >>>> for further processing will always done in the same order in which they
> >>>> were made available by the driver. Given this, we can note their order
> >>>> by assigning the resulting VirtQueueElement a key. This key is a number
> >>>> in a sequence that represents the order in which they were popped from
> >>>> the available ring, relative to the other VirtQueueElements.
> >>>>
> >>>> For example, given 3 "elements" that were popped from the available
> >>>> ring, we assign a key value to them which represents their order (elem0
> >>>> is popped first, then elem1, then lastly elem2):
> >>>>
> >>>>elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>>>   (key: 2)(key: 1)(key: 0)
> >>>>
> >>>> Then these elements are enqueued for further processing by the host.
> >>>>
> >>>> While most devices will return these completed elements in the same
> >>>> order in which they were enqueued, some devices may not (e.g.
> >>>> virtio-blk). To guarantee that these elements are put on the used ring
> >>>> in the same order in which they were enqueued, we can use a buffering
> >>>> mechanism that keeps track of the next expected sequence number of an
> >>>> element.
> >>>>
> >>>> In other words, if the completed element does not have a key value that
> >>>> matches the next expected sequence number, then we know this element is
> >>>> not in-order and we must stash it away in a hash table until an order
> >>>> can be made. The element's key value is used as the key for placing it
> >>>> in the hash table.
> >>>>
> >>>> If the completed element has a key value that matches the next expected
> >>>> sequence number, then we know this element is in-order and we can push
> >>>> it on the used ring. Then we increment the next expected sequence number
> >>>> and check if the hash table contains an element at this key location.
> >>>>
> >>>> If so, we retrieve this element, push it to the used ring, delete the
> >>>> key-value pair from the hash table, increment the next expected sequence
> >>>> number, and check the hash table again for an element at this new key
> >>>> location. This process is repeated until we're unable to find an element
> >>>> in the hash table to continue the order.
> >>>>
> >>>> So, for example, say the 3 elements we enqueued were completed in the
> >>>> following order: elem1, elem2, elem0. The next expected sequence number
> >>>> is 0:
> >>>>
> >>>>   exp-seq-num = 0:
> >>>>
> >>>&

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-25 Thread Eugenio Perez Martin

On Mon, Mar 25, 2024 at 5:52 PM Jonah Palmer  wrote:
>
>
>
> On 3/22/24 7:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  
> > wrote:
> >>
> >> The goal of these patches is to add support to a variety of virtio and
> >> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> >> indicates that all buffers are used by the device in the same order in
> >> which they were made available by the driver.
> >>
> >> These patches attempt to implement a generalized, non-device-specific
> >> solution to support this feature.
> >>
> >> The core feature behind this solution is a buffer mechanism in the form
> >> of GLib's GHashTable. The decision behind using a hash table was to
> >> leverage their ability for quick lookup, insertion, and removal
> >> operations. Given that our keys are simply numbers of an ordered
> >> sequence, a hash table seemed like the best choice for a buffer
> >> mechanism.
> >>
> >> -
> >>
> >> The strategy behind this implementation is as follows:
> >>
> >> We know that buffers that are popped from the available ring and enqueued
> >> for further processing will always done in the same order in which they
> >> were made available by the driver. Given this, we can note their order
> >> by assigning the resulting VirtQueueElement a key. This key is a number
> >> in a sequence that represents the order in which they were popped from
> >> the available ring, relative to the other VirtQueueElements.
> >>
> >> For example, given 3 "elements" that were popped from the available
> >> ring, we assign a key value to them which represents their order (elem0
> >> is popped first, then elem1, then lastly elem2):
> >>
> >>   elem2   --  elem1   --  elem0   ---> Enqueue for processing
> >>  (key: 2)(key: 1)(key: 0)
> >>
> >> Then these elements are enqueued for further processing by the host.
> >>
> >> While most devices will return these completed elements in the same
> >> order in which they were enqueued, some devices may not (e.g.
> >> virtio-blk). To guarantee that these elements are put on the used ring
> >> in the same order in which they were enqueued, we can use a buffering
> >> mechanism that keeps track of the next expected sequence number of an
> >> element.
> >>
> >> In other words, if the completed element does not have a key value that
> >> matches the next expected sequence number, then we know this element is
> >> not in-order and we must stash it away in a hash table until an order
> >> can be made. The element's key value is used as the key for placing it
> >> in the hash table.
> >>
> >> If the completed element has a key value that matches the next expected
> >> sequence number, then we know this element is in-order and we can push
> >> it on the used ring. Then we increment the next expected sequence number
> >> and check if the hash table contains an element at this key location.
> >>
> >> If so, we retrieve this element, push it to the used ring, delete the
> >> key-value pair from the hash table, increment the next expected sequence
> >> number, and check the hash table again for an element at this new key
> >> location. This process is repeated until we're unable to find an element
> >> in the hash table to continue the order.
> >>
> >> So, for example, say the 3 elements we enqueued were completed in the
> >> following order: elem1, elem2, elem0. The next expected sequence number
> >> is 0:
> >>
> >>  exp-seq-num = 0:
> >>
> >>   elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> >>  (key: 1) |
> >>   |
> >>   v
> >> 
> >> |key: 1 - elem1|
> >> 
> >>  -
> >>  exp-seq-num = 0:
> >>
> >>   elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> >>  (key: 2) |
> >>

Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices

2024-03-25 Thread Eugenio Perez Martin

On Mon, Mar 25, 2024 at 6:35 PM Jonah Palmer  wrote:
>
>
>
> On 3/22/24 6:46 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  
> > wrote:
> >>
> >> Implements in-order handling for most virtio devices using the
> >> VIRTIO_F_IN_ORDER transport feature, specifically those who call
> >> virtqueue_push to push their used elements onto the used ring.
> >>
> >> The logic behind this implementation is as follows:
> >>
> >> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
> >>
> >> virtqueue_pop always retrieves one or more buffer descriptors in-order
> >> from the available ring and converts them into a VirtQueueElement. This
> >> means that the order in which VirtQueueElements are enqueued are
> >> in-order by default.
> >>
> >> By virtue, as VirtQueueElements are created, we can assign a sequential
> >> key value to them. This preserves the order of buffers that have been
> >> made available to the device by the driver.
> >>
> >> As VirtQueueElements are assigned a key value, the current sequence
> >> number is incremented.
> >>
> >> 2.) Requests can be completed out-of-order.
> >>
> >> While most devices complete requests in the same order that they were
> >> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
> >> this out-of-order handling is to reduce the impact of devices that
> >> process elements in-order by default while also guaranteeing compliance
> >> with the VIRTIO_F_IN_ORDER feature.
> >>
> >> Below is the logic behind handling completed requests (which may or may
> >> not be in-order).
> >>
> >> 3.) Does the incoming used VirtQueueElement preserve the correct order?
> >>
> >> In other words, is the sequence number (key) assigned to the
> >> VirtQueueElement the expected number that would preserve the original
> >> order?
> >>
> >> 3a.)
> >> If it does... immediately push the used element onto the used ring.
> >> Then increment the next expected sequence number and check to see if
> >> any previous out-of-order VirtQueueElements stored on the hash table
> >> has a key that matches this next expected sequence number.
> >>
> >> For each VirtQueueElement found on the hash table with a matching key:
> >> push the element on the used ring, remove the key-value pair from the
> >> hash table, and then increment the next expected sequence number. Repeat
> >> this process until we're unable to find an element with a matching key.
> >>
> >> Note that if the device uses batching (e.g. virtio-net), then we skip
> >> the virtqueue_flush call and let the device call it themselves.
> >>
> >> 3b.)
> >> If it does not... stash the VirtQueueElement, along with relevant data,
> >> as a InOrderVQElement on the hash table. The key used is the order_key
> >> that was assigned when the VirtQueueElement was created.
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio.c | 70 --
> >>   include/hw/virtio/virtio.h |  8 +
> >>   2 files changed, 76 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index 40124545d6..40e4377f1e 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int 
> >> count)
> >>   }
> >>   }
> >>
> >> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> >> + unsigned int len, unsigned int idx,
> >> + unsigned int count)
> >> +{
> >> +InOrderVQElement *in_order_elem;
> >> +
> >> +if (elem->order_key == vq->current_order_idx) {
> >> +/* Element is in-order, push to used ring */
> >> +virtqueue_fill(vq, elem, len, idx);
> >> +
> >> +/* Batching? Don't flush */
> >> +if (count) {
> >> +virtqueue_flush(vq, count);
> >
> > The "count" parameter is the number of heads used, but here you're
> > only using one head (elem). Same with the other virtqueue_flush in the
> > function.
> >
>
> True. This acts more as a flag than an actual count since, unless we're
> batching (which in t

Re: [RFC 1/8] virtio: Define InOrderVQElement

2024-03-25 Thread Eugenio Perez Martin

On Mon, Mar 25, 2024 at 6:08 PM Jonah Palmer  wrote:
>
>
>
> On 3/22/24 5:45 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  
> > wrote:
> >>
> >> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
> >> transport feature implementation.
> >>
> >> The InOrderVQElement structure is used to encapsulate out-of-order
> >> VirtQueueElement data that was processed by the host. This data
> >> includes:
> >>   - The processed VirtQueueElement (elem)
> >>   - Length of data (len)
> >>   - VirtQueueElement array index (idx)
> >>   - Number of processed VirtQueueElements (count)
> >>
> >> InOrderVQElements will be stored in a buffering mechanism until an
> >> order can be achieved.
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   include/hw/virtio/virtio.h | 7 +++
> >>   1 file changed, 7 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index b3c74a1bca..c8aa435a5e 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
> >>   struct iovec *out_sg;
> >>   } VirtQueueElement;
> >>
> >> +typedef struct InOrderVQElement {
> >> +const VirtQueueElement *elem;
> >
> > Some subsystems allocate space for extra elements after
> > VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
> > to allocate this extra space by its second argument. Would it work for
> > this?
> >
>
> I don't see why not. Although this may not be necessary due to me
> missing a key aspect mentioned in your comment below.
>
> >> +unsigned int len;
> >> +unsigned int idx;
> >> +unsigned int count;
> >
> > Now I don't get why these fields cannot be obtained from elem->(len,
> > index, ndescs) ?
> >
>
> Interesting. I didn't realize that these values are equivalent to a
> VirtQueueElement's len, index, and ndescs fields.
>
> Is this always true? Else I would've expected, for example,
> virtqueue_push to not need the 'unsigned int len' parameter if this
> information is already included via. the VirtQueueElement being passed in.
>

The code uses "len" to store the written length values of each used
descriptor between virtqueue_fill and virtqueue_flush. But not all
devices use these separately, only the ones that batches: virtio-net
and SVQ.

A smarter / less simpler implementation of virtqueue_push could
certainly avoid storing elem->len. But the performance gain is
probably tiny, and the code complexity grows.

> >> +} InOrderVQElement;
> >> +
> >>   #define VIRTIO_QUEUE_MAX 1024
> >>
> >>   #define VIRTIO_NO_VECTOR 0x
> >> --
> >> 2.39.3
> >>
> >
>

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-03-22 Thread Eugenio Perez Martin

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> The goal of these patches is to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
> indicates that all buffers are used by the device in the same order in
> which they were made available by the driver.
>
> These patches attempt to implement a generalized, non-device-specific
> solution to support this feature.
>
> The core feature behind this solution is a buffer mechanism in the form
> of GLib's GHashTable. The decision behind using a hash table was to
> leverage their ability for quick lookup, insertion, and removal
> operations. Given that our keys are simply numbers of an ordered
> sequence, a hash table seemed like the best choice for a buffer
> mechanism.
>
> -
>
> The strategy behind this implementation is as follows:
>
> We know that buffers that are popped from the available ring and enqueued
> for further processing will always done in the same order in which they
> were made available by the driver. Given this, we can note their order
> by assigning the resulting VirtQueueElement a key. This key is a number
> in a sequence that represents the order in which they were popped from
> the available ring, relative to the other VirtQueueElements.
>
> For example, given 3 "elements" that were popped from the available
> ring, we assign a key value to them which represents their order (elem0
> is popped first, then elem1, then lastly elem2):
>
>  elem2   --  elem1   --  elem0   ---> Enqueue for processing
> (key: 2)(key: 1)(key: 0)
>
> Then these elements are enqueued for further processing by the host.
>
> While most devices will return these completed elements in the same
> order in which they were enqueued, some devices may not (e.g.
> virtio-blk). To guarantee that these elements are put on the used ring
> in the same order in which they were enqueued, we can use a buffering
> mechanism that keeps track of the next expected sequence number of an
> element.
>
> In other words, if the completed element does not have a key value that
> matches the next expected sequence number, then we know this element is
> not in-order and we must stash it away in a hash table until an order
> can be made. The element's key value is used as the key for placing it
> in the hash table.
>
> If the completed element has a key value that matches the next expected
> sequence number, then we know this element is in-order and we can push
> it on the used ring. Then we increment the next expected sequence number
> and check if the hash table contains an element at this key location.
>
> If so, we retrieve this element, push it to the used ring, delete the
> key-value pair from the hash table, increment the next expected sequence
> number, and check the hash table again for an element at this new key
> location. This process is repeated until we're unable to find an element
> in the hash table to continue the order.
>
> So, for example, say the 3 elements we enqueued were completed in the
> following order: elem1, elem2, elem0. The next expected sequence number
> is 0:
>
> exp-seq-num = 0:
>
>  elem1   --> elem1.key == exp-seq-num ? --> No, stash it
> (key: 1) |
>  |
>  v
>
>|key: 1 - elem1|
>
> -
> exp-seq-num = 0:
>
>  elem2   --> elem2.key == exp-seq-num ? --> No, stash it
> (key: 2) |
>  |
>  v
>
>|key: 1 - elem1|
>|--|
>|key: 2 - elem2|
>
> -
> exp-seq-num = 0:
>
>  elem0   --> elem0.key == exp-seq-num ? --> Yes, push to used ring
> (key: 0)
>
> exp-seq-num = 1:
>
> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>  remove elem from table
>  |
>  v
>
>|key: 2 - elem2|
>
>
> exp-seq-num = 2:
>
> lookup(table, exp-seq-num) != NULL ? --> Yes, push to used ring,
>

Re: [RFC 8/8] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-03-22 Thread Eugenio Perez Martin

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Extend the virtio device property definitions to include the
> VIRTIO_F_IN_ORDER feature.
>
> The default state of this feature is disabled, allowing it to be
> explicitly enabled where it's supported.
>

Acked-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index eeeda397a9..ffd78830a3 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -400,7 +400,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>  DEFINE_PROP_BIT64("packed", _state, _field, \
>VIRTIO_F_RING_PACKED, false), \
>  DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> -  VIRTIO_F_RING_RESET, true)
> +  VIRTIO_F_RING_RESET, true), \
> +DEFINE_PROP_BIT64("in_order", _state, _field, \
> +  VIRTIO_F_IN_ORDER, false)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> --
> 2.39.3
>

Re: [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-03-22 Thread Eugenio Perez Martin

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
> devices.
>
> The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
> devices ensures that the backend is capable of offering and providing
> support for this feature, and that it can be disabled if the backend
> does not support it.
>

Acked-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  hw/block/vhost-user-blk.c| 1 +
>  hw/net/vhost_net.c   | 2 ++
>  hw/scsi/vhost-scsi.c | 1 +
>  hw/scsi/vhost-user-scsi.c| 1 +
>  hw/virtio/vhost-user-fs.c| 1 +
>  hw/virtio/vhost-user-vsock.c | 1 +
>  net/vhost-vdpa.c | 1 +
>  7 files changed, 8 insertions(+)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 6a856ad51a..d176ed857e 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..33d1d4b9d3 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_HASH_REPORT,
>  VHOST_INVALID_FEATURE_BIT
>  };
> @@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_RSS,
>  VIRTIO_NET_F_HASH_REPORT,
>  VIRTIO_NET_F_GUEST_USO4,
> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index ae26bc19a4..40e7630191 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..1d59951ab7 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index cca2cd41be..9243dbb128 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_IN_ORDER,
>
>  VHOST_INVALID_FEATURE_BIT
>  };
> diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
> index 9431b9792c..cc7e4e47b4 100644
> --- a/hw/virtio/vhost-user-vsock.c
> +++ b/hw/virtio/vhost-user-vsock.c
> @@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
> +VIRTIO_F_IN_ORDER,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 85e73dd6a7..ed3185acfa 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
>  VIRTIO_F_VERSION_1,
> +VIRTIO_F_IN_ORDER,
>  VIRTIO_NET_F_CSUM,
>  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>  VIRTIO_NET_F_CTRL_MAC_ADDR,
> --
> 2.39.3
>

Re: [RFC 4/8] virtio: Implement in-order handling for virtio devices

2024-03-22 Thread Eugenio Perez Martin

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Implements in-order handling for most virtio devices using the
> VIRTIO_F_IN_ORDER transport feature, specifically those who call
> virtqueue_push to push their used elements onto the used ring.
>
> The logic behind this implementation is as follows:
>
> 1.) virtqueue_pop always enqueues VirtQueueElements in-order.
>
> virtqueue_pop always retrieves one or more buffer descriptors in-order
> from the available ring and converts them into a VirtQueueElement. This
> means that the order in which VirtQueueElements are enqueued are
> in-order by default.
>
> By virtue, as VirtQueueElements are created, we can assign a sequential
> key value to them. This preserves the order of buffers that have been
> made available to the device by the driver.
>
> As VirtQueueElements are assigned a key value, the current sequence
> number is incremented.
>
> 2.) Requests can be completed out-of-order.
>
> While most devices complete requests in the same order that they were
> enqueued by default, some devices don't (e.g. virtio-blk). The goal of
> this out-of-order handling is to reduce the impact of devices that
> process elements in-order by default while also guaranteeing compliance
> with the VIRTIO_F_IN_ORDER feature.
>
> Below is the logic behind handling completed requests (which may or may
> not be in-order).
>
> 3.) Does the incoming used VirtQueueElement preserve the correct order?
>
> In other words, is the sequence number (key) assigned to the
> VirtQueueElement the expected number that would preserve the original
> order?
>
> 3a.)
> If it does... immediately push the used element onto the used ring.
> Then increment the next expected sequence number and check to see if
> any previous out-of-order VirtQueueElements stored on the hash table
> has a key that matches this next expected sequence number.
>
> For each VirtQueueElement found on the hash table with a matching key:
> push the element on the used ring, remove the key-value pair from the
> hash table, and then increment the next expected sequence number. Repeat
> this process until we're unable to find an element with a matching key.
>
> Note that if the device uses batching (e.g. virtio-net), then we skip
> the virtqueue_flush call and let the device call it themselves.
>
> 3b.)
> If it does not... stash the VirtQueueElement, along with relevant data,
> as a InOrderVQElement on the hash table. The key used is the order_key
> that was assigned when the VirtQueueElement was created.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 70 --
>  include/hw/virtio/virtio.h |  8 +
>  2 files changed, 76 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index 40124545d6..40e4377f1e 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -992,12 +992,56 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
>  }
>  }
>
> +void virtqueue_order_element(VirtQueue *vq, const VirtQueueElement *elem,
> + unsigned int len, unsigned int idx,
> + unsigned int count)
> +{
> +InOrderVQElement *in_order_elem;
> +
> +if (elem->order_key == vq->current_order_idx) {
> +/* Element is in-order, push to used ring */
> +virtqueue_fill(vq, elem, len, idx);
> +
> +/* Batching? Don't flush */
> +if (count) {
> +virtqueue_flush(vq, count);

The "count" parameter is the number of heads used, but here you're
only using one head (elem). Same with the other virtqueue_flush in the
function.

Also, this function sometimes replaces virtqueue_fill and other
replaces virtqueue_fill + virtqueue_flush (both examples in patch
6/8). I have the impression the series would be simpler if
virtqueue_order_element is a static function just handling the
virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER) path of
virtqueue_fill, so the caller does not need to know if the in_order
feature is on or off.

> +}
> +
> +/* Increment next expected order, search for more in-order elements 
> */
> +while ((in_order_elem = g_hash_table_lookup(vq->in_order_ht,
> +GUINT_TO_POINTER(++vq->current_order_idx))) != NULL) 
> {
> +/* Found in-order element, push to used ring */
> +virtqueue_fill(vq, in_order_elem->elem, in_order_elem->len,
> +   in_order_elem->idx);
> +
> +/* Batching? Don't flush */
> +if (count) {
> +virtqueue_flush(vq, in_order_elem->count);
> +}
> +
> +/* Remove key-value pair from hash table */
> +g_hash_table_remove(vq->in_order_ht,
> +GUINT_TO_POINTER(vq->current_order_idx));
> +}
> +} else {
> +/* Element is out-of-order, stash in hash table */
> +in_order_elem =

Re: [RFC 1/8] virtio: Define InOrderVQElement

2024-03-22 Thread Eugenio Perez Martin

On Thu, Mar 21, 2024 at 4:57 PM Jonah Palmer  wrote:
>
> Define the InOrderVQElement structure for the VIRTIO_F_IN_ORDER
> transport feature implementation.
>
> The InOrderVQElement structure is used to encapsulate out-of-order
> VirtQueueElement data that was processed by the host. This data
> includes:
>  - The processed VirtQueueElement (elem)
>  - Length of data (len)
>  - VirtQueueElement array index (idx)
>  - Number of processed VirtQueueElements (count)
>
> InOrderVQElements will be stored in a buffering mechanism until an
> order can be achieved.
>
> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index b3c74a1bca..c8aa435a5e 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -77,6 +77,13 @@ typedef struct VirtQueueElement
>  struct iovec *out_sg;
>  } VirtQueueElement;
>
> +typedef struct InOrderVQElement {
> +const VirtQueueElement *elem;

Some subsystems allocate space for extra elements after
VirtQueueElement, like VirtIOBlockReq. You can request virtqueue_pop
to allocate this extra space by its second argument. Would it work for
this?

> +unsigned int len;
> +unsigned int idx;
> +unsigned int count;

Now I don't get why these fields cannot be obtained from elem->(len,
index, ndescs) ?

> +} InOrderVQElement;
> +
>  #define VIRTIO_QUEUE_MAX 1024
>
>  #define VIRTIO_NO_VECTOR 0x
> --
> 2.39.3
>

Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-03-19 Thread Eugenio Perez Martin

On Tue, Mar 19, 2024 at 11:00 AM Kevin Wolf  wrote:
>
> Am 18.03.2024 um 20:27 hat Eugenio Perez Martin geschrieben:
> > On Mon, Mar 18, 2024 at 10:02 AM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, Mar 18, 2024 at 12:31:26PM +0800, Jason Wang wrote:
> > > > On Fri, Mar 15, 2024 at 11:59 PM Kevin Wolf  wrote:
> > > > >
> > > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > > > status flag is set; with the current API of the kernel module, it is
> > > > > impossible to enable the opposite order in our block export code 
> > > > > because
> > > > > userspace is not notified when a virtqueue is enabled.
> > > > >
> > > > > This requirement also mathces the normal initialisation order as done 
> > > > > by
> > > > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > > > this.
> > > > >
> > > > > This changes vdpa-dev to use the normal order again and use the 
> > > > > standard
> > > > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > > > used with vdpa-dev again after this fix.
> > > > >
> > > > > vhost_net intentionally avoided enabling the vrings for vdpa and does
> > > > > this manually later while it does enable them for other vhost 
> > > > > backends.
> > > > > Reflect this in the vhost_net code and return early for vdpa, so that
> > > > > the behaviour doesn't change for this device.
> > > > >
> > > > > Cc: qemu-sta...@nongnu.org
> > > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > > Signed-off-by: Kevin Wolf 
> > > > > ---
> > > > > v2:
> > > > > - Actually make use of the @enable parameter
> > > > > - Change vhost_net to preserve the current behaviour
> > > > >
> > > > > v3:
> > > > > - Updated trace point [Stefano]
> > > > > - Fixed typo in comment [Stefano]
> > > > >
> > > > >  hw/net/vhost_net.c | 10 ++
> > > > >  hw/virtio/vdpa-dev.c   |  5 +
> > > > >  hw/virtio/vhost-vdpa.c | 29 ++---
> > > > >  hw/virtio/vhost.c  |  8 +++-
> > > > >  hw/virtio/trace-events |  2 +-
> > > > >  5 files changed, 45 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > > index e8e1661646..fd1a93701a 100644
> > > > > --- a/hw/net/vhost_net.c
> > > > > +++ b/hw/net/vhost_net.c
> > > > > @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, 
> > > > > int enable)
> > > > >  VHostNetState *net = get_vhost_net(nc);
> > > > >  const VhostOps *vhost_ops = net->dev.vhost_ops;
> > > > >
> > > > > +/*
> > > > > + * vhost-vdpa network devices need to enable dataplane 
> > > > > virtqueues after
> > > > > + * DRIVER_OK, so they can recover device state before starting 
> > > > > dataplane.
> > > > > + * Because of that, we don't enable virtqueues here and leave it 
> > > > > to
> > > > > + * net/vhost-vdpa.c.
> > > > > + */
> > > > > +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > > +return 0;
> > > > > +}
> > > >
> > > > I think we need some inputs from Eugenio, this is only needed for
> > > > shadow virtqueue during live migration but not other cases.
> > > >
> > > > Thanks
> > >
> > >
> > > Yes I think we had a backend flag for this, right? Eugenio can you
> > > comment please?
> > >
> >
> > We have the VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend flag,
> > right. If the backend does not offer it, it is better to enable all
> > the queues here and add a migration blocker in net/vhost-vdpa.c.
> >
> > So the check should be:
> > nc->info->type == VHOST_VDPA && (backend_features &
> > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK).
> >
> > I can manage to add the migration blocker on top of this patch.
>
> Note that my patch preserves the current behaviour for vhost_net. The
> callback wasn't implemented for vdpa so far, so we never called anything
> even if the flag wasn't set. This patch adds an implementation for the
> callback, so we have to skip it here to have everything in vhost_net
> work as before - which is what the condition as written does.
>
> If we add a check for the flag now (I don't know if that's correct or
> not), that would be a second, unrelated change of behaviour in the same
> patch. So if it's necessary, that's a preexisting problem and I'd argue
> it doesn't belong in this patch, but should be done separately.
>

Right, that's a very good point. I'll add proper checking on top of
your patch when it is merged.

Reviewed-by: Eugenio Pérez 

Thanks!

Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-03-18 Thread Eugenio Perez Martin

On Mon, Mar 18, 2024 at 10:02 AM Michael S. Tsirkin  wrote:
>
> On Mon, Mar 18, 2024 at 12:31:26PM +0800, Jason Wang wrote:
> > On Fri, Mar 15, 2024 at 11:59 PM Kevin Wolf  wrote:
> > >
> > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > status flag is set; with the current API of the kernel module, it is
> > > impossible to enable the opposite order in our block export code because
> > > userspace is not notified when a virtqueue is enabled.
> > >
> > > This requirement also mathces the normal initialisation order as done by
> > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > this.
> > >
> > > This changes vdpa-dev to use the normal order again and use the standard
> > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > used with vdpa-dev again after this fix.
> > >
> > > vhost_net intentionally avoided enabling the vrings for vdpa and does
> > > this manually later while it does enable them for other vhost backends.
> > > Reflect this in the vhost_net code and return early for vdpa, so that
> > > the behaviour doesn't change for this device.
> > >
> > > Cc: qemu-sta...@nongnu.org
> > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > > v2:
> > > - Actually make use of the @enable parameter
> > > - Change vhost_net to preserve the current behaviour
> > >
> > > v3:
> > > - Updated trace point [Stefano]
> > > - Fixed typo in comment [Stefano]
> > >
> > >  hw/net/vhost_net.c | 10 ++
> > >  hw/virtio/vdpa-dev.c   |  5 +
> > >  hw/virtio/vhost-vdpa.c | 29 ++---
> > >  hw/virtio/vhost.c  |  8 +++-
> > >  hw/virtio/trace-events |  2 +-
> > >  5 files changed, 45 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > index e8e1661646..fd1a93701a 100644
> > > --- a/hw/net/vhost_net.c
> > > +++ b/hw/net/vhost_net.c
> > > @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int 
> > > enable)
> > >  VHostNetState *net = get_vhost_net(nc);
> > >  const VhostOps *vhost_ops = net->dev.vhost_ops;
> > >
> > > +/*
> > > + * vhost-vdpa network devices need to enable dataplane virtqueues 
> > > after
> > > + * DRIVER_OK, so they can recover device state before starting 
> > > dataplane.
> > > + * Because of that, we don't enable virtqueues here and leave it to
> > > + * net/vhost-vdpa.c.
> > > + */
> > > +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > +return 0;
> > > +}
> >
> > I think we need some inputs from Eugenio, this is only needed for
> > shadow virtqueue during live migration but not other cases.
> >
> > Thanks
>
>
> Yes I think we had a backend flag for this, right? Eugenio can you
> comment please?
>

We have the VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK backend flag,
right. If the backend does not offer it, it is better to enable all
the queues here and add a migration blocker in net/vhost-vdpa.c.

So the check should be:
nc->info->type == VHOST_VDPA && (backend_features &
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK).

I can manage to add the migration blocker on top of this patch.

Thanks!

Re: [PATCH v3 for 9.1 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-15 Thread Eugenio Perez Martin

On Fri, Mar 15, 2024 at 5:57 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
>  - lower 16 bits: virtqueue index
>

Reviewed-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 11 ---
>  hw/virtio/virtio.c | 18 ++
>  include/hw/virtio/virtio.h |  2 ++
>  3 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index cb6940fc0e..f3e0a08f53 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,13 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +vq_idx = val;
> +if (vq_idx < VIRTIO_QUEUE_MAX && virtio_queue_get_num(vdev, vq_idx)) 
> {
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +virtio_queue_set_shadow_avail_idx(virtio_get_queue(vdev, 
> vq_idx),
> +  val >> 16);
> +}
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..463426ca92 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
> int align)
>  }
>  }
>
> +void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t 
> shadow_avail_idx)
> +{
> +if (!vq->vring.desc) {
> +return;
> +}
> +
> +/*
> + * 16-bit data for packed VQs include 1-bit wrap counter and
> + * 15-bit shadow_avail_idx.
> + */
> +if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
> +vq->shadow_avail_wrap_counter = (shadow_avail_idx >> 15) & 0x1;
> +vq->shadow_avail_idx = shadow_avail_idx & 0x7FFF;
> +} else {
> +vq->shadow_avail_idx = shadow_avail_idx;
> +}
> +}
> +
>  static void virtio_queue_notify_vq(VirtQueue *vq)
>  {
>  if (vq->vring.desc && vq->handle_output) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..cdd4f86b61 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -306,6 +306,8 @@ int virtio_queue_ready(VirtQueue *vq);
>
>  int virtio_queue_empty(VirtQueue *vq);
>
> +void virtio_queue_set_shadow_avail_idx(VirtQueue *vq, uint16_t idx);
> +
>  /* Host binding interface.  */
>
>  uint32_t virtio_config_readb(VirtIODevice *vdev, uint32_t addr);
> --
> 2.39.3
>

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-15 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 9:24 PM Jonah Palmer  wrote:
>
>
>
> On 3/14/24 3:05 PM, Eugenio Perez Martin wrote:
> > On Thu, Mar 14, 2024 at 5:06 PM Jonah Palmer  
> > wrote:
> >>
> >>
> >>
> >> On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  
> >>> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 3/13/24 11:01 PM, Jason Wang wrote:
> >>>>> On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  
> >>>>> wrote:
> >>>>>>
> >>>>>> Add support to virtio-pci devices for handling the extra data sent
> >>>>>> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> >>>>>> transport feature has been negotiated.
> >>>>>>
> >>>>>> The extra data that's passed to the virtio-pci device when this
> >>>>>> feature is enabled varies depending on the device's virtqueue
> >>>>>> layout.
> >>>>>>
> >>>>>> In a split virtqueue layout, this data includes:
> >>>>>> - upper 16 bits: shadow_avail_idx
> >>>>>> - lower 16 bits: virtqueue index
> >>>>>>
> >>>>>> In a packed virtqueue layout, this data includes:
> >>>>>> - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> >>>>>> - lower 16 bits: virtqueue index
> >>>>>>
> >>>>>> Tested-by: Lei Yang 
> >>>>>> Reviewed-by: Eugenio Pérez 
> >>>>>> Signed-off-by: Jonah Palmer 
> >>>>>> ---
> >>>>>> hw/virtio/virtio-pci.c | 10 +++---
> >>>>>> hw/virtio/virtio.c | 18 ++
> >>>>>> include/hw/virtio/virtio.h |  1 +
> >>>>>> 3 files changed, 26 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> >>>>>> index cb6940fc0e..0f5c3c3b2f 100644
> >>>>>> --- a/hw/virtio/virtio-pci.c
> >>>>>> +++ b/hw/virtio/virtio-pci.c
> >>>>>> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, 
> >>>>>> uint32_t addr, uint32_t val)
> >>>>>> {
> >>>>>> VirtIOPCIProxy *proxy = opaque;
> >>>>>> VirtIODevice *vdev = virtio_bus_get_device(>bus);
> >>>>>> -uint16_t vector;
> >>>>>> +uint16_t vector, vq_idx;
> >>>>>> hwaddr pa;
> >>>>>>
> >>>>>> switch (addr) {
> >>>>>> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, 
> >>>>>> uint32_t addr, uint32_t val)
> >>>>>> vdev->queue_sel = val;
> >>>>>> break;
> >>>>>> case VIRTIO_PCI_QUEUE_NOTIFY:
> >>>>>> -if (val < VIRTIO_QUEUE_MAX) {
> >>>>>> -virtio_queue_notify(vdev, val);
> >>>>>> +vq_idx = val;
> >>>>>> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> >>>>>> +if (virtio_vdev_has_feature(vdev, 
> >>>>>> VIRTIO_F_NOTIFICATION_DATA)) {
> >>>>>> +virtio_queue_set_shadow_avail_data(vdev, val);
> >>>>>> +}
> >>>>>> +virtio_queue_notify(vdev, vq_idx);
> >>>>>> }
> >>>>>> break;
> >>>>>> case VIRTIO_PCI_STATUS:
> >>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>>>> index d229755eae..bcb9e09df0 100644
> >>>>>> --- a/hw/virtio/virtio.c
> >>>>>> +++ b/hw/virtio/virtio.c
> >>>>>> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, 
> >>>>>> int n, int align)
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t 
> >>>>>> data)
> >>>
> >>> Maybe I didn't explain

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 5:06 PM Jonah Palmer  wrote:
>
>
>
> On 3/14/24 10:55 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  
> > wrote:
> >>
> >>
> >>
> >> On 3/13/24 11:01 PM, Jason Wang wrote:
> >>> On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  
> >>> wrote:
> >>>>
> >>>> Add support to virtio-pci devices for handling the extra data sent
> >>>> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> >>>> transport feature has been negotiated.
> >>>>
> >>>> The extra data that's passed to the virtio-pci device when this
> >>>> feature is enabled varies depending on the device's virtqueue
> >>>> layout.
> >>>>
> >>>> In a split virtqueue layout, this data includes:
> >>>>- upper 16 bits: shadow_avail_idx
> >>>>- lower 16 bits: virtqueue index
> >>>>
> >>>> In a packed virtqueue layout, this data includes:
> >>>>- upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> >>>>- lower 16 bits: virtqueue index
> >>>>
> >>>> Tested-by: Lei Yang 
> >>>> Reviewed-by: Eugenio Pérez 
> >>>> Signed-off-by: Jonah Palmer 
> >>>> ---
> >>>>hw/virtio/virtio-pci.c | 10 +++---
> >>>>hw/virtio/virtio.c | 18 ++
> >>>>include/hw/virtio/virtio.h |  1 +
> >>>>3 files changed, 26 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> >>>> index cb6940fc0e..0f5c3c3b2f 100644
> >>>> --- a/hw/virtio/virtio-pci.c
> >>>> +++ b/hw/virtio/virtio-pci.c
> >>>> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, 
> >>>> uint32_t addr, uint32_t val)
> >>>>{
> >>>>VirtIOPCIProxy *proxy = opaque;
> >>>>VirtIODevice *vdev = virtio_bus_get_device(>bus);
> >>>> -uint16_t vector;
> >>>> +uint16_t vector, vq_idx;
> >>>>hwaddr pa;
> >>>>
> >>>>switch (addr) {
> >>>> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, 
> >>>> uint32_t addr, uint32_t val)
> >>>>vdev->queue_sel = val;
> >>>>break;
> >>>>case VIRTIO_PCI_QUEUE_NOTIFY:
> >>>> -if (val < VIRTIO_QUEUE_MAX) {
> >>>> -virtio_queue_notify(vdev, val);
> >>>> +vq_idx = val;
> >>>> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> >>>> +if (virtio_vdev_has_feature(vdev, 
> >>>> VIRTIO_F_NOTIFICATION_DATA)) {
> >>>> +virtio_queue_set_shadow_avail_data(vdev, val);
> >>>> +}
> >>>> +virtio_queue_notify(vdev, vq_idx);
> >>>>}
> >>>>break;
> >>>>case VIRTIO_PCI_STATUS:
> >>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >>>> index d229755eae..bcb9e09df0 100644
> >>>> --- a/hw/virtio/virtio.c
> >>>> +++ b/hw/virtio/virtio.c
> >>>> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, 
> >>>> int n, int align)
> >>>>}
> >>>>}
> >>>>
> >>>> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t 
> >>>> data)
> >
> > Maybe I didn't explain well, but I think it is better to pass directly
> > idx to a VirtQueue *. That way only the caller needs to check for a
> > valid vq idx, and (my understanding is) the virtio.c interface is
> > migrating to VirtQueue * use anyway.
> >
>
> Oh, are you saying to just pass in a VirtQueue *vq instead of
> VirtIODevice *vdev and get rid of the vq->vring.desc check in the function?
>

No, that needs to be kept. I meant the access to vdev->vq[i] without
checking for a valid i.

You can get the VirtQueue in the caller with virtio_get_queue. Which
also does not check for a valid index, but that way is clearer the
caller needs to check it.

As a side note, the check for desc != 0 is widespread in QEMU but the
driver may use 0 address for desc, so it's not 100% valid. But to
change that now requi

Re: [PATCH v2 1/6] virtio/virtio-pci: Handle extra notification data

2024-03-14 Thread Eugenio Perez Martin

On Thu, Mar 14, 2024 at 1:16 PM Jonah Palmer  wrote:
>
>
>
> On 3/13/24 11:01 PM, Jason Wang wrote:
> > On Wed, Mar 13, 2024 at 7:55 PM Jonah Palmer  
> > wrote:
> >>
> >> Add support to virtio-pci devices for handling the extra data sent
> >> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> >> transport feature has been negotiated.
> >>
> >> The extra data that's passed to the virtio-pci device when this
> >> feature is enabled varies depending on the device's virtqueue
> >> layout.
> >>
> >> In a split virtqueue layout, this data includes:
> >>   - upper 16 bits: shadow_avail_idx
> >>   - lower 16 bits: virtqueue index
> >>
> >> In a packed virtqueue layout, this data includes:
> >>   - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> >>   - lower 16 bits: virtqueue index
> >>
> >> Tested-by: Lei Yang 
> >> Reviewed-by: Eugenio Pérez 
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio-pci.c | 10 +++---
> >>   hw/virtio/virtio.c | 18 ++
> >>   include/hw/virtio/virtio.h |  1 +
> >>   3 files changed, 26 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> >> index cb6940fc0e..0f5c3c3b2f 100644
> >> --- a/hw/virtio/virtio-pci.c
> >> +++ b/hw/virtio/virtio-pci.c
> >> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> >> addr, uint32_t val)
> >>   {
> >>   VirtIOPCIProxy *proxy = opaque;
> >>   VirtIODevice *vdev = virtio_bus_get_device(>bus);
> >> -uint16_t vector;
> >> +uint16_t vector, vq_idx;
> >>   hwaddr pa;
> >>
> >>   switch (addr) {
> >> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, 
> >> uint32_t addr, uint32_t val)
> >>   vdev->queue_sel = val;
> >>   break;
> >>   case VIRTIO_PCI_QUEUE_NOTIFY:
> >> -if (val < VIRTIO_QUEUE_MAX) {
> >> -virtio_queue_notify(vdev, val);
> >> +vq_idx = val;
> >> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> >> +if (virtio_vdev_has_feature(vdev, 
> >> VIRTIO_F_NOTIFICATION_DATA)) {
> >> +virtio_queue_set_shadow_avail_data(vdev, val);
> >> +}
> >> +virtio_queue_notify(vdev, vq_idx);
> >>   }
> >>   break;
> >>   case VIRTIO_PCI_STATUS:
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index d229755eae..bcb9e09df0 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int 
> >> n, int align)
> >>   }
> >>   }
> >>
> >> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)

Maybe I didn't explain well, but I think it is better to pass directly
idx to a VirtQueue *. That way only the caller needs to check for a
valid vq idx, and (my understanding is) the virtio.c interface is
migrating to VirtQueue * use anyway.

> >> +{
> >> +/* Lower 16 bits is the virtqueue index */
> >> +uint16_t i = data;
> >> +VirtQueue *vq = >vq[i];
> >> +
> >> +if (!vq->vring.desc) {
> >> +return;
> >> +}
> >> +
> >> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> >> +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
> >> +vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
> >> +} else {
> >> +vq->shadow_avail_idx = (data >> 16);
> >
> > Do we need to do a sanity check for this value?
> >
> > Thanks
> >
>
> It can't hurt, right? What kind of check did you have in mind?
>
> if (vq->shadow_avail_idx >= vq->vring.num)
>

I'm a little bit lost too. shadow_avail_idx can take all uint16_t
values. Maybe you meant checking for a valid vq index, Jason?

Thanks!

> Or something else?
>
> >> +}
> >> +}
> >> +
> >>   static void virtio_queue_notify_vq(VirtQueue *vq)
> >>   {
> >>   if (vq->vring.desc && vq->handle_output) {
> >> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> >> index c8f72850bc..53915947a7 100644
> >> --- a/include/hw/virtio/virtio.h
> >> +++ b/include/hw/virtio/virtio.h
> >> @@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int 
> >> n);
> >>   void virtio_init_region_cache(VirtIODevice *vdev, int n);
> >>   void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
> >>   void virtio_queue_notify(VirtIODevice *vdev, int n);
> >> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t 
> >> data);
> >>   uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
> >>   void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
> >>   int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> >> --
> >> 2.39.3
> >>
> >
>

Re: [PATCH v2 2/6] virtio: Prevent creation of device using notification-data with ioeventfd

2024-03-13 Thread Eugenio Perez Martin

On Wed, Mar 13, 2024 at 12:55 PM Jonah Palmer  wrote:
>
> Prevent the realization of a virtio device that attempts to use the
> VIRTIO_F_NOTIFICATION_DATA transport feature without disabling
> ioeventfd.
>
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, having both enabled is a functional mismatch and therefore
> Qemu should not continue the device's realization process.
>
> Although the device does not yet know if the feature will be
> successfully negotiated, many devices using this feature wont actually
> work without this extra data and would fail FEATURES_OK anyway.
>
> If ioeventfd is able to work with the extra notification data in the
> future, this compatibility check can be removed.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio.c | 22 ++
>  include/hw/virtio/virtio.h |  2 ++
>  2 files changed, 24 insertions(+)
>
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index bcb9e09df0..d0a433b465 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2971,6 +2971,20 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t 
> val)
>  return ret;
>  }
>
> +void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
> +Error **errp)
> +{
> +VirtioBusState *bus = VIRTIO_BUS(qdev_get_parent_bus(DEVICE(vdev)));
> +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(bus);
> +DeviceState *proxy = DEVICE(BUS(bus)->parent);
> +
> +if (virtio_host_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
> +k->ioeventfd_enabled(proxy)) {
> +error_setg(errp,
> +   "notification_data=on without ioeventfd=off is not 
> supported");
> +}
> +}
> +
>  size_t virtio_get_config_size(const VirtIOConfigSizeParams *params,
>uint64_t host_features)
>  {
> @@ -3731,6 +3745,14 @@ static void virtio_device_realize(DeviceState *dev, 
> Error **errp)
>  }
>  }
>
> +/* Devices should not use both ioeventfd and notification data feature */
> +virtio_device_check_notification_compatibility(vdev, );
> +if (err != NULL) {
> +error_propagate(errp, err);
> +vdc->unrealize(dev);
> +return;
> +}
> +
>  virtio_bus_device_plugged(vdev, );
>  if (err != NULL) {
>  error_propagate(errp, err);
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 53915947a7..e0325d84d0 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -346,6 +346,8 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
> queue_index);
>  void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
>  void virtio_update_irq(VirtIODevice *vdev);
>  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> +void virtio_device_check_notification_compatibility(VirtIODevice *vdev,
> +Error **errp);

Why not make it static?

>
>  /* Base devices.  */
>  typedef struct VirtIOBlkConf VirtIOBlkConf;
> --
> 2.39.3
>

Re: [PATCH v1 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-08 Thread Eugenio Perez Martin

On Fri, Mar 8, 2024 at 6:01 PM Michael S. Tsirkin  wrote:
>
> On Mon, Mar 04, 2024 at 02:46:06PM -0500, Jonah Palmer wrote:
> > Prevent ioeventfd from being enabled/disabled when a virtio-pci
> > device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> > feature.
> >
> > Due to ioeventfd not being able to carry the extra data associated with
> > this feature, the ioeventfd should be left in a disabled state for
> > emulated virtio-pci devices using this feature.
> >
> > Reviewed-by: Eugenio Pérez 
> > Signed-off-by: Jonah Palmer 
>
> I thought hard about this. I propose that for now,
> instead of disabling ioevetfd silently we error out unless
> user disabled it for us.
> WDYT?
>

Yes, error is a better plan than silently disabling it. In the
(unlikely?) case we are able to make notification data work with
eventfd in the future, it makes the change more evident.

>
> > ---
> >  hw/virtio/virtio-pci.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index d12edc567f..287b8f7720 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -417,13 +417,15 @@ static void virtio_ioport_write(void *opaque, 
> > uint32_t addr, uint32_t val)
> >  }
> >  break;
> >  case VIRTIO_PCI_STATUS:
> > -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> >  virtio_pci_stop_ioeventfd(proxy);
> >  }
> >
> >  virtio_set_status(vdev, val & 0xFF);
> >
> > -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> > +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> >  virtio_pci_start_ioeventfd(proxy);
> >  }
> >
> > --
> > 2.39.3
>

Re: [PATCH v1 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-07 Thread Eugenio Perez Martin

On Wed, Mar 6, 2024 at 8:34 AM Michael S. Tsirkin  wrote:
>
> On Wed, Mar 06, 2024 at 08:07:31AM +0100, Eugenio Perez Martin wrote:
> > On Wed, Mar 6, 2024 at 6:34 AM Jason Wang  wrote:
> > >
> > > On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer  
> > > wrote:
> > > >
> > > > The goal of these patches are to add support to a variety of virtio and
> > > > vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> > > > feature indicates that a driver will pass extra data (instead of just a
> > > > virtqueue's index) when notifying the corresponding device.
> > > >
> > > > The data passed in by the driver when this feature is enabled varies in
> > > > format depending on if the device is using a split or packed virtqueue
> > > > layout:
> > > >
> > > >  Split VQ
> > > >   - Upper 16 bits: shadow_avail_idx
> > > >   - Lower 16 bits: virtqueue index
> > > >
> > > >  Packed VQ
> > > >   - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> > > >   - Lower 16 bits: virtqueue index
> > > >
> > > > Also, due to the limitations of ioeventfd not being able to carry the
> > > > extra provided by the driver, ioeventfd is left disabled for any devices
> > > > using this feature.
> > >
> > > Is there any method to overcome this? This might help for vhost.
> > >
> >
> > As a half-baked idea, read(2)ing an eventfd descriptor returns an
> > 8-byte integer already. The returned value of read depends on eventfd
> > flags, but both have to do with the number of writes of the other end.
> >
> > My proposal is to replace this value with the last value written by
> > the guest, so we can extract the virtio notification data from there.
> > The behavior of read is similar to not-EFD_SEMAPHORE, reading a value
> > and then blocking if read again without writes. The behavior of KVM
> > writes is different, as it is not a counter anymore.
> >
> > Thanks!
>
>
> I doubt you will be able to support this in ioeventfd...

I agree.

> But vhost does not really need the value at all.
> So why mask out ioeventfd with vhost?

The interface should not be able to start with vhost-kernel because
the feature is not offered by the vhost-kernel device. So ioeventfd is
always enabled with vhost-kernel.

Or do you mean we should allow it and let vhost-kernel fetch data from
the avail ring as usual? I'm ok with that but then the guest can place
any value to it, so the driver cannot be properly "validated by
software" that way.

> vhost-vdpa is probably the only one that might need it...

Right, but vhost-vdpa already supports doorbell memory regions so I
guess it has little use, isn't it?

Thanks!

>
>
>
> > > Thanks
> > >
> > > >
> > > > A significant aspect of this effort has been to maintain compatibility
> > > > across different backends. As such, the feature is offered by backend
> > > > devices only when supported, with fallback mechanisms where backend
> > > > support is absent.
> > > >
> > > > Jonah Palmer (8):
> > > >   virtio/virtio-pci: Handle extra notification data
> > > >   virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > >   virtio-mmio: Handle extra notification data
> > > >   virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > >   virtio-ccw: Handle extra notification data
> > > >   virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> > > >   vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
> > > >   virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition
> > > >
> > > >  hw/block/vhost-user-blk.c|  1 +
> > > >  hw/net/vhost_net.c   |  2 ++
> > > >  hw/s390x/s390-virtio-ccw.c   | 16 
> > > >  hw/s390x/virtio-ccw.c|  6 --
> > > >  hw/scsi/vhost-scsi.c |  1 +
> > > >  hw/scsi/vhost-user-scsi.c|  1 +
> > > >  hw/virtio/vhost-user-fs.c|  2 +-
> > > >  hw/virtio/vhost-user-vsock.c |  1 +
> > > >  hw/virtio/virtio-mmio.c  | 15 +++
> > > >  hw/virtio/virtio-pci.c   | 16 +++-
> > > >  hw/virtio/virtio.c   | 18 ++
> > > >  include/hw/virtio/virtio.h   |  5 -
> > > >  net/vhost-vdpa.c |  1 +
> > > >  13 files changed, 68 insertions(+), 17 deletions(-)
> > > >
> > > > --
> > > > 2.39.3
> > > >
> > >
>

Re: [PATCH v1 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-05 Thread Eugenio Perez Martin

On Wed, Mar 6, 2024 at 6:34 AM Jason Wang  wrote:
>
> On Tue, Mar 5, 2024 at 3:46 AM Jonah Palmer  wrote:
> >
> > The goal of these patches are to add support to a variety of virtio and
> > vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> > feature indicates that a driver will pass extra data (instead of just a
> > virtqueue's index) when notifying the corresponding device.
> >
> > The data passed in by the driver when this feature is enabled varies in
> > format depending on if the device is using a split or packed virtqueue
> > layout:
> >
> >  Split VQ
> >   - Upper 16 bits: shadow_avail_idx
> >   - Lower 16 bits: virtqueue index
> >
> >  Packed VQ
> >   - Upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
> >   - Lower 16 bits: virtqueue index
> >
> > Also, due to the limitations of ioeventfd not being able to carry the
> > extra provided by the driver, ioeventfd is left disabled for any devices
> > using this feature.
>
> Is there any method to overcome this? This might help for vhost.
>

As a half-baked idea, read(2)ing an eventfd descriptor returns an
8-byte integer already. The returned value of read depends on eventfd
flags, but both have to do with the number of writes of the other end.

My proposal is to replace this value with the last value written by
the guest, so we can extract the virtio notification data from there.
The behavior of read is similar to not-EFD_SEMAPHORE, reading a value
and then blocking if read again without writes. The behavior of KVM
writes is different, as it is not a counter anymore.

Thanks!

> Thanks
>
> >
> > A significant aspect of this effort has been to maintain compatibility
> > across different backends. As such, the feature is offered by backend
> > devices only when supported, with fallback mechanisms where backend
> > support is absent.
> >
> > Jonah Palmer (8):
> >   virtio/virtio-pci: Handle extra notification data
> >   virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> >   virtio-mmio: Handle extra notification data
> >   virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> >   virtio-ccw: Handle extra notification data
> >   virtio-ccw: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA
> >   vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits
> >   virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition
> >
> >  hw/block/vhost-user-blk.c|  1 +
> >  hw/net/vhost_net.c   |  2 ++
> >  hw/s390x/s390-virtio-ccw.c   | 16 
> >  hw/s390x/virtio-ccw.c|  6 --
> >  hw/scsi/vhost-scsi.c |  1 +
> >  hw/scsi/vhost-user-scsi.c|  1 +
> >  hw/virtio/vhost-user-fs.c|  2 +-
> >  hw/virtio/vhost-user-vsock.c |  1 +
> >  hw/virtio/virtio-mmio.c  | 15 +++
> >  hw/virtio/virtio-pci.c   | 16 +++-
> >  hw/virtio/virtio.c   | 18 ++
> >  include/hw/virtio/virtio.h   |  5 -
> >  net/vhost-vdpa.c |  1 +
> >  13 files changed, 68 insertions(+), 17 deletions(-)
> >
> > --
> > 2.39.3
> >
>

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-05 Thread Eugenio Perez Martin

On Tue, Mar 5, 2024 at 4:21 AM Xinying Yu  wrote:
>
> Of course,  I am glad to do.  And I need to clarify that our use case only 
> support VIRTIO_F_NOTIFICATION_DATA  transport feature on DPDK vDPA framework 
> which the backend type is NET_CLIENT_DRIVER_VHOST_USER and use 
> user_feature_bits. So the new feature add on vdpa_feature_bits  will not 
> under verified in our case.  Not sure this meets your expectations?

As long as the driver keeps using notification data it is not able to
tell the difference between one scenario or another, so yes.

> One more thing, I would ask how do  I get the full series patch? Do I copy 
> the RFC line by line from this link[1]?
>

lore.kernel.org is a good alternative as Thomas mentioned. Another
good one IMO is b4, which allows you to download the series and apply
with "git am" too using "b4 mbox
<20240301134330.4191007-1-jonah.pal...@oracle.com>" (untested).

https://pypi.org/project/b4/

Thanks!

> Thanks,
> Xinying
>
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-03/msg00090.html
>
> 
> From: Eugenio Perez Martin 
> Sent: Saturday, March 2, 2024 4:32 AM
> To: Wentao Jia ; Rick Zhong 
> ; Xinying Yu 
> Cc: Jonah Palmer ; qemu-de...@nongnu.org 
> ; m...@redhat.com ; 
> jasow...@redhat.com ; si-wei@oracle.com 
> ; boris.ostrov...@oracle.com 
> ; raph...@enfabrica.net ; 
> kw...@redhat.com ; hre...@redhat.com ; 
> pa...@linux.ibm.com ; borntrae...@linux.ibm.com 
> ; far...@linux.ibm.com ; 
> th...@redhat.com ; richard.hender...@linaro.org 
> ; da...@redhat.com ; 
> i...@linux.ibm.com ; coh...@redhat.com 
> ; pbonz...@redhat.com ; 
> f...@euphon.net ; stefa...@redhat.com ; 
> qemu-block@nongnu.org ; qemu-s3...@nongnu.org 
> ; virtio...@lists.linux.dev 
> Subject: Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support
>
> Hi Wentao / Rick / Xinying Yu,
>
> Would it work for you to test this series on your use cases, so we
> make sure everything works as expected?
>
> Thanks!
>
> On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
> >
> > The goal of these patches are to add support to a variety of virtio and
> > vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> > feature indicates that a driver will pass extra data (instead of just a
> > virtqueue's index) when notifying the corresponding device.
> >
> > The data passed in by the driver when this feature is enabled varies in
> > format depending on if the device is using a split or packed virtqueue
> > layout:
> >
> >  Split VQ
> >   - Upper 16 bits: last_avail_idx
> >   - Lower 16 bits: virtqueue index
> >
> >  Packed VQ
> >   - Upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
> >   - Lower 16 bits: virtqueue index
> >
> > Also, due to the limitations of ioeventfd not being able to carry the
> > extra provided by the driver, ioeventfd is left disabled for any devices
> > using this feature.
> >
> > A significant aspect of this effort has been to maintain compatibility
> > across different backends. As such, the feature is offered by backend
> > devices only when supported, with fallback mechanisms where backend
> > support is absent.
> >
>
> Hi Wentao,
>

Re: [PATCH v1 4/8] virtio-mmio: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-05 Thread Eugenio Perez Martin

On Mon, Mar 4, 2024 at 8:46 PM Jonah Palmer  wrote:
>
> Prevent ioeventfd from being enabled/disabled when a virtio-mmio device
> has negotiated the VIRTIO_F_NOTIFICATION_DATA transport feature.
>
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, the ioeventfd should be left in a disabled state for
> emulated virtio-mmio devices using this feature.
>

Acked-by: Eugenio Pérez 

> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-mmio.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index f99d5851a2..f42ed5c512 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -421,7 +421,8 @@ static void virtio_mmio_write(void *opaque, hwaddr 
> offset, uint64_t value,
>  virtio_update_irq(vdev);
>  break;
>  case VIRTIO_MMIO_STATUS:
> -if (!(value & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +if (!(value & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_mmio_stop_ioeventfd(proxy);
>  }
>
> @@ -433,7 +434,8 @@ static void virtio_mmio_write(void *opaque, hwaddr 
> offset, uint64_t value,
>
>  virtio_set_status(vdev, value & 0xff);
>
> -if (value & VIRTIO_CONFIG_S_DRIVER_OK) {
> +if ((value & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_mmio_start_ioeventfd(proxy);
>  }
>
> --
> 2.39.3
>

Re: [PATCH v1 3/8] virtio-mmio: Handle extra notification data

2024-03-05 Thread Eugenio Perez Martin

On Mon, Mar 4, 2024 at 8:46 PM Jonah Palmer  wrote:
>
> Add support to virtio-mmio devices for handling the extra data sent from
> the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport
> feature has been negotiated.
>
> The extra data that's passed to the virtio-mmio device when this feature
> is enabled varies depending on the device's virtqueue layout.
>
> The data passed to the virtio-mmio device is in the same format as the
> data passed to virtio-pci devices.
>

Acked-by: Eugenio Pérez 

> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-mmio.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-mmio.c b/hw/virtio/virtio-mmio.c
> index 22f9fbcf5a..f99d5851a2 100644
> --- a/hw/virtio/virtio-mmio.c
> +++ b/hw/virtio/virtio-mmio.c
> @@ -248,6 +248,7 @@ static void virtio_mmio_write(void *opaque, hwaddr 
> offset, uint64_t value,
>  {
>  VirtIOMMIOProxy *proxy = (VirtIOMMIOProxy *)opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> +uint16_t vq_idx;
>
>  trace_virtio_mmio_write_offset(offset, value);
>
> @@ -407,8 +408,12 @@ static void virtio_mmio_write(void *opaque, hwaddr 
> offset, uint64_t value,
>  }
>  break;
>  case VIRTIO_MMIO_QUEUE_NOTIFY:
> -if (value < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, value);
> +vq_idx = value;
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +virtio_queue_set_shadow_avail_data(vdev, value);
> +}
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_MMIO_INTERRUPT_ACK:
> --
> 2.39.3
>

Re: [PATCH v1 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-05 Thread Eugenio Perez Martin

On Mon, Mar 4, 2024 at 8:46 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: shadow_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx
>  - lower 16 bits: virtqueue index
>

Reviewed-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 10 +++---
>  hw/virtio/virtio.c | 18 ++
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 1a7039fb0c..d12edc567f 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,12 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +vq_idx = val;
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +virtio_queue_set_shadow_avail_data(vdev, val);
> +}
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..bcb9e09df0 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2255,6 +2255,24 @@ void virtio_queue_set_align(VirtIODevice *vdev, int n, 
> int align)
>  }
>  }
>
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data)
> +{
> +/* Lower 16 bits is the virtqueue index */
> +uint16_t i = data;
> +VirtQueue *vq = >vq[i];
> +
> +if (!vq->vring.desc) {
> +return;
> +}
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +vq->shadow_avail_wrap_counter = (data >> 31) & 0x1;
> +vq->shadow_avail_idx = (data >> 16) & 0x7FFF;
> +} else {
> +vq->shadow_avail_idx = (data >> 16);
> +}
> +}
> +
>  static void virtio_queue_notify_vq(VirtQueue *vq)
>  {
>  if (vq->vring.desc && vq->handle_output) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..53915947a7 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -335,6 +335,7 @@ void virtio_queue_update_rings(VirtIODevice *vdev, int n);
>  void virtio_init_region_cache(VirtIODevice *vdev, int n);
>  void virtio_queue_set_align(VirtIODevice *vdev, int n, int align);
>  void virtio_queue_notify(VirtIODevice *vdev, int n);
> +void virtio_queue_set_shadow_avail_data(VirtIODevice *vdev, uint32_t data);
>  uint16_t virtio_queue_vector(VirtIODevice *vdev, int n);
>  void virtio_queue_set_vector(VirtIODevice *vdev, int n, uint16_t vector);
>  int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
> --
> 2.39.3
>

Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-04 Thread Eugenio Perez Martin

On Mon, Mar 4, 2024 at 6:09 PM Jonah Palmer  wrote:
>
>
>
> On 3/1/24 2:31 PM, Eugenio Perez Martin wrote:
> > On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
> >>
> >> Add support to virtio-pci devices for handling the extra data sent
> >> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> >> transport feature has been negotiated.
> >>
> >> The extra data that's passed to the virtio-pci device when this
> >> feature is enabled varies depending on the device's virtqueue
> >> layout.
> >>
> >> In a split virtqueue layout, this data includes:
> >>   - upper 16 bits: last_avail_idx
> >>   - lower 16 bits: virtqueue index
> >>
> >> In a packed virtqueue layout, this data includes:
> >>   - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
> >>   - lower 16 bits: virtqueue index
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio-pci.c | 13 ++---
> >>   hw/virtio/virtio.c | 13 +
> >>   include/hw/virtio/virtio.h |  1 +
> >>   3 files changed, 24 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> >> index 1a7039fb0c..c7c577b177 100644
> >> --- a/hw/virtio/virtio-pci.c
> >> +++ b/hw/virtio/virtio-pci.c
> >> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> >> addr, uint32_t val)
> >>   {
> >>   VirtIOPCIProxy *proxy = opaque;
> >>   VirtIODevice *vdev = virtio_bus_get_device(>bus);
> >> -uint16_t vector;
> >> +uint16_t vector, vq_idx;
> >>   hwaddr pa;
> >>
> >>   switch (addr) {
> >> @@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, 
> >> uint32_t addr, uint32_t val)
> >>   vdev->queue_sel = val;
> >>   break;
> >>   case VIRTIO_PCI_QUEUE_NOTIFY:
> >> -if (val < VIRTIO_QUEUE_MAX) {
> >> -virtio_queue_notify(vdev, val);
> >> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> >> +vq_idx = val & 0x;
> >
> > Nitpick, but since vq_idx is already a uint16_t the & 0x is not
> > needed.
>
> Ah okay. I wasn't sure if it was worthwhile to keep the '& 0x' in or
> not for the sake of clarity and good practice. In that case I could just
> do away with vq_idx here and use explicit casting on 'val'.
>
> > I think it's cleaner just to call virtio_set_notification data
> > in the has_feature(...) condition, but I'm happy with this too.
>
> Do you mean something like:
>
> if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
>  virtio_set_notification_data(vdev, vq_idx, val)) {
>  ...
> }
>

Sorry I was not clear, I meant just to take out the common code of the
conditionals:
vq_idx = val;
if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) {
virtio_set_notification_data(vdev, val);
}

> Though I'm not sure what would then go in the body of this conditional,
> especially if I did something like:
>
> case VIRTIO_PCI_QUEUE_NOTIFY:
>  if ((uint16_t)val < VIRTIO_QUEUE_MAX) {
>  if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA) &&
>  virtio_set_notification_data(vdev, val)) {
>  // Not sure what to put here other than a no-op
>  }
>
>  virtio_queue_notify(vdev, (uint16_t)val);
>  }
>  break;
>
> But I'm not sure if you'd prefer this explicit casting of 'val' over
> implicit casting like:
>
> uint16_t vq_idx = val;
>
> >
> >> +virtio_set_notification_data(vdev, vq_idx, val);
> >> +} else {
> >> +vq_idx = val;
> >> +}
> >> +
> >> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> >> +virtio_queue_notify(vdev, vq_idx);
> >>   }
> >>   break;
> >>   case VIRTIO_PCI_STATUS:
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index d229755eae..a61f69b7fd 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t 
> >> val)
> >>   return 0;
> >>   }
> >>
> >> +void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, 
> >> uint32_t data)
> >> +{
> >> +V

Re: [RFC 0/8] virtio,vhost: Add VIRTIO_F_NOTIFICATION_DATA support

2024-03-01 Thread Eugenio Perez Martin

Hi Wentao / Rick / Xinying Yu,

Would it work for you to test this series on your use cases, so we
make sure everything works as expected?

Thanks!

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> The goal of these patches are to add support to a variety of virtio and
> vhost devices for the VIRTIO_F_NOTIFICATION_DATA transport feature. This
> feature indicates that a driver will pass extra data (instead of just a
> virtqueue's index) when notifying the corresponding device.
>
> The data passed in by the driver when this feature is enabled varies in
> format depending on if the device is using a split or packed virtqueue
> layout:
>
>  Split VQ
>   - Upper 16 bits: last_avail_idx
>   - Lower 16 bits: virtqueue index
>
>  Packed VQ
>   - Upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
>   - Lower 16 bits: virtqueue index
>
> Also, due to the limitations of ioeventfd not being able to carry the
> extra provided by the driver, ioeventfd is left disabled for any devices
> using this feature.
>
> A significant aspect of this effort has been to maintain compatibility
> across different backends. As such, the feature is offered by backend
> devices only when supported, with fallback mechanisms where backend
> support is absent.
>

Hi Wentao,

Re: [RFC 8/8] virtio: Add VIRTIO_F_NOTIFICATION_DATA property definition

2024-03-01 Thread Eugenio Perez Martin

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> Extend the virtio device property definitions to include the
> VIRTIO_F_NOTIFICATION_DATA feature.
>
> The default state of this feature is disabled, allowing it to be
> explicitly enabled where it's supported.
>

Reviewed-by: Eugenio Pérez 

Thanks!

> Signed-off-by: Jonah Palmer 
> ---
>  include/hw/virtio/virtio.h | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c92d8afc42..5772737dde 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -369,7 +369,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
>  DEFINE_PROP_BIT64("packed", _state, _field, \
>VIRTIO_F_RING_PACKED, false), \
>  DEFINE_PROP_BIT64("queue_reset", _state, _field, \
> -  VIRTIO_F_RING_RESET, true)
> +  VIRTIO_F_RING_RESET, true), \
> +DEFINE_PROP_BIT64("notification_data", _state, _field, \
> +  VIRTIO_F_NOTIFICATION_DATA, false)
>
>  hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
>  bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
> --
> 2.39.3
>

Re: [RFC 7/8] vhost/vhost-user: Add VIRTIO_F_NOTIFICATION_DATA to vhost feature bits

2024-03-01 Thread Eugenio Perez Martin

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> Add support for the VIRTIO_F_NOTIFICATION_DATA feature across a variety
> of vhost devices.
>
> The inclusion of VIRTIO_F_NOTIFICATION_DATA in the feature bits arrays
> for these devices ensures that the backend is capable of offering and
> providing support for this feature, and that it can be disabled if the
> backend does not support it.
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/block/vhost-user-blk.c| 1 +
>  hw/net/vhost_net.c   | 2 ++
>  hw/scsi/vhost-scsi.c | 1 +
>  hw/scsi/vhost-user-scsi.c| 1 +
>  hw/virtio/vhost-user-fs.c| 2 +-
>  hw/virtio/vhost-user-vsock.c | 1 +
>  net/vhost-vdpa.c | 1 +
>  7 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index 6a856ad51a..983c0657da 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> index e8e1661646..bb1f975b39 100644
> --- a/hw/net/vhost_net.c
> +++ b/hw/net/vhost_net.c
> @@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VIRTIO_NET_F_HASH_REPORT,
>  VHOST_INVALID_FEATURE_BIT
>  };
> @@ -55,6 +56,7 @@ static const int kernel_feature_bits[] = {
>  /* Features supported by others. */
>  static const int user_feature_bits[] = {
>  VIRTIO_F_NOTIFY_ON_EMPTY,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>

vdpa_feature_bits also needs this feature bit added.

Apart from that,

Reviewed-by: Eugenio Pérez 

> diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> index 58a00336c2..b8048f18e9 100644
> --- a/hw/scsi/vhost-scsi.c
> +++ b/hw/scsi/vhost-scsi.c
> @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index a63b1f4948..0b050805a8 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_SCSI_F_HOTPLUG,
>  VIRTIO_F_RING_RESET,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index cca2cd41be..ae48cc1c96 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -33,7 +33,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_IOMMU_PLATFORM,
>  VIRTIO_F_RING_RESET,
> -
> +VIRTIO_F_NOTIFICATION_DATA,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
> index 9431b9792c..802b44a07d 100644
> --- a/hw/virtio/vhost-user-vsock.c
> +++ b/hw/virtio/vhost-user-vsock.c
> @@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
>  VIRTIO_RING_F_INDIRECT_DESC,
>  VIRTIO_RING_F_EVENT_IDX,
>  VIRTIO_F_NOTIFY_ON_EMPTY,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VHOST_INVALID_FEATURE_BIT
>  };
>
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index 3726ee5d67..2827d29ce7 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
>  VIRTIO_F_RING_PACKED,
>  VIRTIO_F_RING_RESET,
>  VIRTIO_F_VERSION_1,
> +VIRTIO_F_NOTIFICATION_DATA,
>  VIRTIO_NET_F_CSUM,
>  VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
>  VIRTIO_NET_F_CTRL_MAC_ADDR,
> --
> 2.39.3
>

Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-01 Thread Eugenio Perez Martin

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: last_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
>  - lower 16 bits: virtqueue index
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 13 ++---
>  hw/virtio/virtio.c | 13 +
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 1a7039fb0c..c7c577b177 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +vq_idx = val & 0x;
> +virtio_set_notification_data(vdev, vq_idx, val);
> +} else {
> +vq_idx = val;
> +}
> +
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..a61f69b7fd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>  return 0;
>  }
>
> +void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
> data)
> +{
> +VirtQueue *vq = >vq[i];

Sorry I sent the previous mail too fast :).

i should also be checked against VIRTIO_QUEUE_MAX and vq->vring.desc
before continuing this function. Otherwise is an out of bound access.

> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +vq->last_avail_wrap_counter = (data >> 31) & 0x1;
> +vq->last_avail_idx = (data >> 16) & 0x7FFF;
> +} else {
> +vq->last_avail_idx = (data >> 16) & 0x;
> +}
> +vq->shadow_avail_idx = vq->last_avail_idx;
> +}
> +
>  static enum virtio_device_endian virtio_default_endian(void)
>  {
>  if (target_words_bigendian()) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..c92d8afc42 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -345,6 +345,7 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
> queue_index);
>  void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
>  void virtio_update_irq(VirtIODevice *vdev);
>  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> +void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
> data);
>
>  /* Base devices.  */
>  typedef struct VirtIOBlkConf VirtIOBlkConf;
> --
> 2.39.3
>

Re: [RFC 2/8] virtio-pci: Lock ioeventfd state with VIRTIO_F_NOTIFICATION_DATA

2024-03-01 Thread Eugenio Perez Martin

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> Prevent ioeventfd from being enabled/disabled when a virtio-pci
> device has negotiated the VIRTIO_F_NOTIFICATION_DATA transport
> feature.
>
> Due to ioeventfd not being able to carry the extra data associated with
> this feature, the ioeventfd should be left in a disabled state for
> emulated virtio-pci devices using this feature.
>
> Signed-off-by: Jonah Palmer 

Reviewed-by: Eugenio Pérez 

Thanks!

> ---
>  hw/virtio/virtio-pci.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index c7c577b177..fd9717a0f5 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -420,13 +420,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_stop_ioeventfd(proxy);
>  }
>
>  virtio_set_status(vdev, val & 0xFF);
>
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +if ((val & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +!virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
>  virtio_pci_start_ioeventfd(proxy);
>  }
>
> --
> 2.39.3
>

Re: [RFC 1/8] virtio/virtio-pci: Handle extra notification data

2024-03-01 Thread Eugenio Perez Martin

On Fri, Mar 1, 2024 at 2:44 PM Jonah Palmer  wrote:
>
> Add support to virtio-pci devices for handling the extra data sent
> from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA
> transport feature has been negotiated.
>
> The extra data that's passed to the virtio-pci device when this
> feature is enabled varies depending on the device's virtqueue
> layout.
>
> In a split virtqueue layout, this data includes:
>  - upper 16 bits: last_avail_idx
>  - lower 16 bits: virtqueue index
>
> In a packed virtqueue layout, this data includes:
>  - upper 16 bits: 1-bit wrap counter & 15-bit last_avail_idx
>  - lower 16 bits: virtqueue index
>
> Signed-off-by: Jonah Palmer 
> ---
>  hw/virtio/virtio-pci.c | 13 ++---
>  hw/virtio/virtio.c | 13 +
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 1a7039fb0c..c7c577b177 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -384,7 +384,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = virtio_bus_get_device(>bus);
> -uint16_t vector;
> +uint16_t vector, vq_idx;
>  hwaddr pa;
>
>  switch (addr) {
> @@ -408,8 +408,15 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  vdev->queue_sel = val;
>  break;
>  case VIRTIO_PCI_QUEUE_NOTIFY:
> -if (val < VIRTIO_QUEUE_MAX) {
> -virtio_queue_notify(vdev, val);
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_NOTIFICATION_DATA)) {
> +vq_idx = val & 0x;

Nitpick, but since vq_idx is already a uint16_t the & 0x is not
needed. I think it's cleaner just to call virtio_set_notification data
in the has_feature(...) condition, but I'm happy with this too.

> +virtio_set_notification_data(vdev, vq_idx, val);
> +} else {
> +vq_idx = val;
> +}
> +
> +if (vq_idx < VIRTIO_QUEUE_MAX) {
> +virtio_queue_notify(vdev, vq_idx);
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index d229755eae..a61f69b7fd 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2052,6 +2052,19 @@ int virtio_set_status(VirtIODevice *vdev, uint8_t val)
>  return 0;
>  }
>
> +void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
> data)
> +{
> +VirtQueue *vq = >vq[i];
> +
> +if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
> +vq->last_avail_wrap_counter = (data >> 31) & 0x1;
> +vq->last_avail_idx = (data >> 16) & 0x7FFF;
> +} else {
> +vq->last_avail_idx = (data >> 16) & 0x;
> +}

It should not set last_avail_idx, only shadow_avail_idx. Otherwise,
QEMU can only see the descriptors placed after the notification.

Or am I missing something?

In that regard, I would call this function
"virtqueue_set_shadow_avail_idx". But I'm very bad at naming :).

The rest looks good to me.

Thanks!

> +vq->shadow_avail_idx = vq->last_avail_idx;
> +}
> +
>  static enum virtio_device_endian virtio_default_endian(void)
>  {
>  if (target_words_bigendian()) {
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index c8f72850bc..c92d8afc42 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -345,6 +345,7 @@ void virtio_queue_reset(VirtIODevice *vdev, uint32_t 
> queue_index);
>  void virtio_queue_enable(VirtIODevice *vdev, uint32_t queue_index);
>  void virtio_update_irq(VirtIODevice *vdev);
>  int virtio_set_features(VirtIODevice *vdev, uint64_t val);
> +void virtio_set_notification_data(VirtIODevice *vdev, uint16_t i, uint32_t 
> data);
>
>  /* Base devices.  */
>  typedef struct VirtIOBlkConf VirtIOBlkConf;
> --
> 2.39.3
>

Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-08 Thread Eugenio Perez Martin

On Wed, Feb 7, 2024 at 11:18 AM Kevin Wolf  wrote:
>
> Am 06.02.2024 um 17:44 hat Eugenio Perez Martin geschrieben:
> > On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf  wrote:
> > >
> > > Am 05.02.2024 um 13:22 hat Eugenio Perez Martin geschrieben:
> > > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf  wrote:
> > > > >
> > > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > > > status flag is set; with the current API of the kernel module, it is
> > > > > impossible to enable the opposite order in our block export code 
> > > > > because
> > > > > userspace is not notified when a virtqueue is enabled.
> > > > >
> > > > > This requirement also mathces the normal initialisation order as done 
> > > > > by
> > > > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > > > this.
> > > > >
> > > > > This changes vdpa-dev to use the normal order again and use the 
> > > > > standard
> > > > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > > > used with vdpa-dev again after this fix.
> > > > >
> > > > > Cc: qemu-sta...@nongnu.org
> > > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > > > Signed-off-by: Kevin Wolf 
> > > > > ---
> > > > >  hw/virtio/vdpa-dev.c   |  5 +
> > > > >  hw/virtio/vhost-vdpa.c | 17 +
> > > > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > > > index eb9ecea83b..13e87f06f6 100644
> > > > > --- a/hw/virtio/vdpa-dev.c
> > > > > +++ b/hw/virtio/vdpa-dev.c
> > > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice 
> > > > > *vdev, Error **errp)
> > > > >
> > > > >  s->dev.acked_features = vdev->guest_features;
> > > > >
> > > > > -ret = vhost_dev_start(>dev, vdev, false);
> > > > > +ret = vhost_dev_start(>dev, vdev, true);
> > > > >  if (ret < 0) {
> > > > >  error_setg_errno(errp, -ret, "Error starting vhost");
> > > > >  goto err_guest_notifiers;
> > > > >  }
> > > > > -for (i = 0; i < s->dev.nvqs; ++i) {
> > > > > -vhost_vdpa_set_vring_ready(>vdpa, i);
> > > > > -}
> > > > >  s->started = true;
> > > > >
> > > > >  /*
> > > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > > > index 3a43beb312..c4574d56c5 100644
> > > > > --- a/hw/virtio/vhost-vdpa.c
> > > > > +++ b/hw/virtio/vhost-vdpa.c
> > > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa 
> > > > > *v, unsigned idx)
> > > > >  return r;
> > > > >  }
> > > > >
> > > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int 
> > > > > enable)
> > > > > +{
> > > > > +struct vhost_vdpa *v = dev->opaque;
> > > > > +unsigned int i;
> > > > > +int ret;
> > > > > +
> > > > > +for (i = 0; i < dev->nvqs; ++i) {
> > > > > +ret = vhost_vdpa_set_vring_ready(v, i);
> > > > > +if (ret < 0) {
> > > > > +return ret;
> > > > > +}
> > > > > +}
> > > > > +
> > > > > +return 0;
> > > > > +}
> > > > > +
> > > > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > > > int fd)
> > > > >  {
> > > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > > > >  .vhost_set_features = vhost_vdpa_set_features,
> > > > >  .vhost_reset_device = vhost_vdpa_reset_device,
> > > > >  .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > > > +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > > > >  .

Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-06 Thread Eugenio Perez Martin

On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf  wrote:
>
> Am 05.02.2024 um 13:22 hat Eugenio Perez Martin geschrieben:
> > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf  wrote:
> > >
> > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> > > status flag is set; with the current API of the kernel module, it is
> > > impossible to enable the opposite order in our block export code because
> > > userspace is not notified when a virtqueue is enabled.
> > >
> > > This requirement also mathces the normal initialisation order as done by
> > > the generic vhost code in QEMU. However, commit 6c482547 accidentally
> > > changed the order for vdpa-dev and broke access to VDUSE devices with
> > > this.
> > >
> > > This changes vdpa-dev to use the normal order again and use the standard
> > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> > > used with vdpa-dev again after this fix.
> > >
> > > Cc: qemu-sta...@nongnu.org
> > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> > > Signed-off-by: Kevin Wolf 
> > > ---
> > >  hw/virtio/vdpa-dev.c   |  5 +
> > >  hw/virtio/vhost-vdpa.c | 17 +
> > >  2 files changed, 18 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> > > index eb9ecea83b..13e87f06f6 100644
> > > --- a/hw/virtio/vdpa-dev.c
> > > +++ b/hw/virtio/vdpa-dev.c
> > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice 
> > > *vdev, Error **errp)
> > >
> > >  s->dev.acked_features = vdev->guest_features;
> > >
> > > -ret = vhost_dev_start(>dev, vdev, false);
> > > +ret = vhost_dev_start(>dev, vdev, true);
> > >  if (ret < 0) {
> > >  error_setg_errno(errp, -ret, "Error starting vhost");
> > >  goto err_guest_notifiers;
> > >  }
> > > -for (i = 0; i < s->dev.nvqs; ++i) {
> > > -vhost_vdpa_set_vring_ready(>vdpa, i);
> > > -}
> > >  s->started = true;
> > >
> > >  /*
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index 3a43beb312..c4574d56c5 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, 
> > > unsigned idx)
> > >  return r;
> > >  }
> > >
> > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> > > +{
> > > +struct vhost_vdpa *v = dev->opaque;
> > > +unsigned int i;
> > > +int ret;
> > > +
> > > +for (i = 0; i < dev->nvqs; ++i) {
> > > +ret = vhost_vdpa_set_vring_ready(v, i);
> > > +if (ret < 0) {
> > > +return ret;
> > > +}
> > > +}
> > > +
> > > +return 0;
> > > +}
> > > +
> > >  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> > > int fd)
> > >  {
> > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
> > >  .vhost_set_features = vhost_vdpa_set_features,
> > >  .vhost_reset_device = vhost_vdpa_reset_device,
> > >  .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> > > +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
> > >  .vhost_get_config  = vhost_vdpa_get_config,
> > >  .vhost_set_config = vhost_vdpa_set_config,
> > >  .vhost_requires_shm_log = NULL,
> >
> > vhost-vdpa net enables CVQ before dataplane ones to configure all the
> > device in the destination of a live migration. To go back again to
> > this callback would cause the device to enable the dataplane before
> > virtqueues are configured again.
>
> Not that it makes a difference, but I don't think it would actually be
> going back. Even before your commit 6c482547, we were not making use of
> the generic callback but just called the function in a slightly
> different place (but less different than after commit 6c482547).
>
> But anyway... Why don't the other vhost backend need the same for
> vhost-net then? Do they just not support live migration?
>

They don't support control virtqueue. More specifically, control
virtqueue is handled directly in QEMU.

> I don't know the code well enough to say

Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-05 Thread Eugenio Perez Martin

On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf  wrote:
>
> VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> status flag is set; with the current API of the kernel module, it is
> impossible to enable the opposite order in our block export code because
> userspace is not notified when a virtqueue is enabled.
>
> This requirement also mathces the normal initialisation order as done by
> the generic vhost code in QEMU. However, commit 6c482547 accidentally
> changed the order for vdpa-dev and broke access to VDUSE devices with
> this.
>
> This changes vdpa-dev to use the normal order again and use the standard
> vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
> used with vdpa-dev again after this fix.
>
> Cc: qemu-sta...@nongnu.org
> Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
> Signed-off-by: Kevin Wolf 
> ---
>  hw/virtio/vdpa-dev.c   |  5 +
>  hw/virtio/vhost-vdpa.c | 17 +
>  2 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
> index eb9ecea83b..13e87f06f6 100644
> --- a/hw/virtio/vdpa-dev.c
> +++ b/hw/virtio/vdpa-dev.c
> @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, 
> Error **errp)
>
>  s->dev.acked_features = vdev->guest_features;
>
> -ret = vhost_dev_start(>dev, vdev, false);
> +ret = vhost_dev_start(>dev, vdev, true);
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "Error starting vhost");
>  goto err_guest_notifiers;
>  }
> -for (i = 0; i < s->dev.nvqs; ++i) {
> -vhost_vdpa_set_vring_ready(>vdpa, i);
> -}
>  s->started = true;
>
>  /*
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 3a43beb312..c4574d56c5 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, 
> unsigned idx)
>  return r;
>  }
>
> +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
> +{
> +struct vhost_vdpa *v = dev->opaque;
> +unsigned int i;
> +int ret;
> +
> +for (i = 0; i < dev->nvqs; ++i) {
> +ret = vhost_vdpa_set_vring_ready(v, i);
> +if (ret < 0) {
> +return ret;
> +}
> +}
> +
> +return 0;
> +}
> +
>  static int vhost_vdpa_set_config_call(struct vhost_dev *dev,
> int fd)
>  {
> @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
>  .vhost_set_features = vhost_vdpa_set_features,
>  .vhost_reset_device = vhost_vdpa_reset_device,
>  .vhost_get_vq_index = vhost_vdpa_get_vq_index,
> +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
>  .vhost_get_config  = vhost_vdpa_get_config,
>  .vhost_set_config = vhost_vdpa_set_config,
>  .vhost_requires_shm_log = NULL,
> --
> 2.43.0
>

vhost-vdpa net enables CVQ before dataplane ones to configure all the
device in the destination of a live migration. To go back again to
this callback would cause the device to enable the dataplane before
virtqueues are configured again.

How does VDUSE userspace knows how many queues should enable? Can't
the kernel perform the necessary actions after DRIVER_OK, like
configuring the kick etc?

Thanks!

Re: [PATCH] vhost: mask VIRTIO_F_RING_RESET for vhost and vhost-user devices

2022-11-22 Thread Eugenio Perez Martin

On Tue, Nov 22, 2022 at 4:13 AM Jason Wang  wrote:
>
> On Mon, Nov 21, 2022 at 6:11 PM Stefano Garzarella  
> wrote:
> >
> > Commit 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > enabled VIRTIO_F_RING_RESET by default for all virtio devices.
> >
> > This feature is not currently emulated by QEMU, so for vhost and
> > vhost-user devices we need to make sure it is supported by the offloaded
> > device emulation (in-kernel or in another process).
> > To do this we need to add VIRTIO_F_RING_RESET to the features bitmap
> > passed to vhost_get_features(). This way it will be masked if the device
> > does not support it.
> >
> > This issue was initially discovered with vhost-vsock and vhost-user-vsock,
> > and then also tested with vhost-user-rng which confirmed the same issue.
> > They fail when sending features through VHOST_SET_FEATURES ioctl or
> > VHOST_USER_SET_FEATURES message, since VIRTIO_F_RING_RESET is negotiated
> > by the guest (Linux >= v6.0), but not supported by the device.
> >
> > Fixes: 69e1c14aa2 ("virtio: core: vq reset feature negotation support")
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1318
> > Signed-off-by: Stefano Garzarella 
>
> Acked-by: Jason Wang 
>
> > ---
> >
> > To prevent this problem in the future, perhaps we should provide a function
> > (e.g. vhost_device_get_features) where we go to mask all non-device-specific
> > features (e.g VIRTIO_F_*, VIRTIO_RING_F_*) that are not emulated by QEMU but
> > we expect them to be emulated by the vhost or vhost-user devices.
>
> Adding Eugenio, this might not be true if we want to use shadow
> virtqueue for emulating some features?
>

I think we should be able to introduce that in the (hypothetical)
vhost_device_get_features if SVQ starts emulating a ring feature,
isn't it? I think a first version not aware of SVQ is fine anyway, and
to introduce it later should be easy.

Thanks!

> Thanks
>
> > Then we can call it in all .get_features callbacks just before return the
> > features.
> >
> > What do you think?
> >
> > But maybe better to do that for the next release, I will send an RFC.
> >
> > Thanks,
> > Stefano
> > ---
> >  hw/block/vhost-user-blk.c  |  1 +
> >  hw/net/vhost_net.c |  1 +
> >  hw/scsi/vhost-scsi.c   |  1 +
> >  hw/scsi/vhost-user-scsi.c  |  1 +
> >  hw/virtio/vhost-user-fs.c  |  1 +
> >  hw/virtio/vhost-user-gpio.c|  1 +
> >  hw/virtio/vhost-user-i2c.c |  1 +
> >  hw/virtio/vhost-user-rng.c | 11 +--
> >  hw/virtio/vhost-vsock-common.c |  1 +
> >  net/vhost-vdpa.c   |  1 +
> >  10 files changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 16ad400889..0d5190accf 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -52,6 +52,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > index feda448878..26e4930676 100644
> > --- a/hw/net/vhost_net.c
> > +++ b/hw/net/vhost_net.c
> > @@ -75,6 +75,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_NET_F_MTU,
> >  VIRTIO_F_IOMMU_PLATFORM,
> >  VIRTIO_F_RING_PACKED,
> > +VIRTIO_F_RING_RESET,
> >  VIRTIO_NET_F_RSS,
> >  VIRTIO_NET_F_HASH_REPORT,
> >
> > diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
> > index bdf337a7a2..6a0fd0dfb1 100644
> > --- a/hw/scsi/vhost-scsi.c
> > +++ b/hw/scsi/vhost-scsi.c
> > @@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> > index bc37317d55..b7a71a802c 100644
> > --- a/hw/scsi/vhost-user-scsi.c
> > +++ b/hw/scsi/vhost-user-scsi.c
> > @@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_RING_F_INDIRECT_DESC,
> >  VIRTIO_RING_F_EVENT_IDX,
> >  VIRTIO_SCSI_F_HOTPLUG,
> > +VIRTIO_F_RING_RESET,
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> >
> > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > index 1c40f42045..dc4014cdef 100644
> > --- a/hw/virtio/vhost-user-fs.c
> > +++ b/hw/virtio/vhost-user-fs.c
> > @@ -32,6 +32,7 @@ static const int user_feature_bits[] = {
> >  VIRTIO_F_NOTIFY_ON_EMPTY,
> >  VIRTIO_F_RING_PACKED,
> >  VIRTIO_F_IOMMU_PLATFORM,
> > +VIRTIO_F_RING_RESET,
> >
> >  VHOST_INVALID_FEATURE_BIT
> >  };
> > diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> > index 677d1c7730..5851cb3bc9 100644
> > --- a/hw/virtio/vhost-user-gpio.c
> > +++ b/hw/virtio/vhost-user-gpio.c
> > @@ -24,6 +24,7 @@ static const int feature_bits[] =

Re: [RFC PATCH v3 19/28] hw/virtio: Replace g_memdup() by g_memdup2()

2022-04-01 Thread Eugenio Perez Martin

On Fri, Sep 3, 2021 at 8:11 PM Philippe Mathieu-Daudé  wrote:
>
> Per 
> https://discourse.gnome.org/t/port-your-module-from-g-memdup-to-g-memdup2-now/5538
>
>   The old API took the size of the memory to duplicate as a guint,
>   whereas most memory functions take memory sizes as a gsize. This
>   made it easy to accidentally pass a gsize to g_memdup(). For large
>   values, that would lead to a silent truncation of the size from 64
>   to 32 bits, and result in a heap area being returned which is
>   significantly smaller than what the caller expects. This can likely
>   be exploited in various modules to cause a heap buffer overflow.
>
> Replace g_memdup() by the safer g_memdup2() wrapper.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Should we check in_num/out_num in range?

I'd say it is not needed to check: virtqueue_pop fills them by
iterating through the descriptor chain so the range is restricted to
[0, 1024].

Acked-by: Eugenio Pérez 


> ---
>  hw/net/virtio-net.c   | 3 ++-
>  hw/virtio/virtio-crypto.c | 6 +++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 16d20cdee52..338fbeb8c57 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1449,7 +1449,8 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, 
> VirtQueue *vq)
>  }
>
>  iov_cnt = elem->out_num;
> -iov2 = iov = g_memdup(elem->out_sg, sizeof(struct iovec) * 
> elem->out_num);
> +iov2 = iov = g_memdup2(elem->out_sg,
> +   sizeof(struct iovec) * elem->out_num);
>  s = iov_to_buf(iov, iov_cnt, 0, , sizeof(ctrl));
>  iov_discard_front(, _cnt, sizeof(ctrl));
>  if (s != sizeof(ctrl)) {
> diff --git a/hw/virtio/virtio-crypto.c b/hw/virtio/virtio-crypto.c
> index 54f9bbb789c..59886c1790d 100644
> --- a/hw/virtio/virtio-crypto.c
> +++ b/hw/virtio/virtio-crypto.c
> @@ -242,7 +242,7 @@ static void virtio_crypto_handle_ctrl(VirtIODevice *vdev, 
> VirtQueue *vq)
>  }
>
>  out_num = elem->out_num;
> -out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>  out_iov = out_iov_copy;
>
>  in_num = elem->in_num;
> @@ -605,11 +605,11 @@ virtio_crypto_handle_request(VirtIOCryptoReq *request)
>  }
>
>  out_num = elem->out_num;
> -out_iov_copy = g_memdup(elem->out_sg, sizeof(out_iov[0]) * out_num);
> +out_iov_copy = g_memdup2(elem->out_sg, sizeof(out_iov[0]) * out_num);
>  out_iov = out_iov_copy;
>
>  in_num = elem->in_num;
> -in_iov_copy = g_memdup(elem->in_sg, sizeof(in_iov[0]) * in_num);
> +in_iov_copy = g_memdup2(elem->in_sg, sizeof(in_iov[0]) * in_num);
>  in_iov = in_iov_copy;
>
>  if (unlikely(iov_to_buf(out_iov, out_num, 0, , sizeof(req))
> --
> 2.31.1
>
>

Re: [PATCH v7 0/8] Packed virtqueue for virtio

2019-10-28 Thread Eugenio Perez Martin

Hi Michael.

Totally agree with you. I investigated that for a short time but in
the end I tested manually with the coverage instrumentation, trying to
exercise the same "code" as current tests touches in split virtqueues.

I will automate at least these paths.

Thanks!


On Fri, Oct 25, 2019 at 1:35 PM Michael S. Tsirkin  wrote:
>
> On Fri, Oct 25, 2019 at 10:35:19AM +0200, Eugenio Pérez wrote:
> > Hi:
> >
> > This is an updated version of packed virtqueue support based on Wei and 
> > Jason's
> > V6, mainly solving the clang leak detector error CI gave.
>
>
> Looks good, I will queue this up.
>
> It would be nice to add libqos based tests on top,
> based on Stefan's work.
>
>
> > Please review.
> >
> > Changes from V6:
> > - Commit reorder: Squash bugfix and sepparate big changes into smaller 
> > commits.
> >
> > Changes from V5:
> > - Fix qemu's CI asan error.
> > - Move/copy rcu comments.
> > - Merge duplicated vdev->broken check between split and packet version.
> >
> > Eugenio Pérez (2):
> >   virtio: Free blk virqueues at unrealize()
> >   virtio: Free rnd virqueue at unrealize()
> >
> > Jason Wang (4):
> >   virtio: basic packed virtqueue support
> >   virtio: event suppression support for packed ring
> >   vhost_net: enable packed ring support
> >   virtio: add property to enable packed virtqueue
> >
> > Wei Xu (2):
> >   virtio: basic structure for packed ring
> >   virtio: device/driver area size calculation refactor for split ring
> >
> >  hw/block/virtio-blk.c   |7 +-
> >  hw/char/virtio-serial-bus.c |2 +-
> >  hw/net/vhost_net.c  |2 +
> >  hw/scsi/virtio-scsi.c   |3 +-
> >  hw/virtio/virtio-rng.c  |1 +
> >  hw/virtio/virtio.c  | 1154 
> > ++-
> >  include/hw/virtio/virtio.h  |   14 +-
> >  7 files changed, 1045 insertions(+), 138 deletions(-)
> >
> > --
> > 2.16.5

Re: [PATCH v6 0/9] Packed virtqueue for virtio

2019-10-25 Thread Eugenio Perez Martin

Hi Jason!

I can post a new version. You will have it in a moment.

Thanks!

On Fri, Oct 25, 2019 at 5:20 AM Jason Wang  wrote:
>
>
> On 2019/10/25 上午1:13, Eugenio Pérez wrote:
> > Hi:
> >
> > This is an updated version of packed virtqueue support based on Wei and 
> > Jason's
> > V5, mainly solving the clang leak detector error CI gave.
> >
> > Please review.
> >
> > Changes from V5:
> > - Fix qemu's CI asan error.
> > - Move/copy rcu comments.
> > - Merge duplicated vdev->broken check between split and packet version.
> >
> > Eugenio Pérez (3):
> >virtio: Free rng and blk virqueues
> >virtio: add some rcu comments
> >virtio: Move vdev->broken check to dispatch drop_all
> >
> > Jason Wang (4):
> >virtio: basic packed virtqueue support
> >virtio: event suppression support for packed ring
> >vhost_net: enable packed ring support
> >virtio: add property to enable packed virtqueue
> >
> > Wei Xu (2):
> >virtio: basic structure for packed ring
> >virtio: device/driverr area size calculation refactor for split ring
>
>
> Looks good to me.
>
> Just two nits:
>
> I tend to squash patch 8 and patch 9 into the patch that introduces
> those issues and split patch 3 into two parts.
>
> Btw, if you wish you can add your s-o-b to the series.
>
> Do you want to post a new version or I can tweak them by myself?
>
> Thanks
>
>
> >
> >   hw/block/virtio-blk.c   |7 +-
> >   hw/char/virtio-serial-bus.c |2 +-
> >   hw/net/vhost_net.c  |2 +
> >   hw/scsi/virtio-scsi.c   |3 +-
> >   hw/virtio/virtio-rng.c  |1 +
> >   hw/virtio/virtio.c  | 1154 
> > ++-
> >   include/hw/virtio/virtio.h  |   14 +-
> >   7 files changed, 1045 insertions(+), 138 deletions(-)
> >

50 matches

Mail list logo