Re: [PULL for-9.0 0/1] Block patches

2024-04-04 Thread Peter Maydell
On Thu, 4 Apr 2024 at 14:58, Stefan Hajnoczi  wrote:
>
> The following changes since commit 786fd793b81410fb2a28914315e2f05d2ff6733b:
>
>   Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
> (2024-04-03 12:52:03 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to bbdf9023665f409113cb07b463732861af63fb47:
>
>   block/virtio-blk: Fix memory leak from virtio_blk_zone_report (2024-04-04 
> 09:29:42 -0400)
>
> 
> Pull request
>
> Fix a memory leak in virtio-blk zone report emulation code when the request is
> invalid.
>
> 
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM



Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Eugenio Perez Martin
On Thu, Apr 4, 2024 at 4:42 PM Jonah Palmer  wrote:
>
>
>
> On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:
> > On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:
> >>
> >>
> >>
> >> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  
> >>> wrote:
> 
>  Initialize sequence variables for VirtQueue and VirtQueueElement
>  structures. A VirtQueue's sequence variables are initialized when a
>  VirtQueue is being created or reset. A VirtQueueElement's sequence
>  variable is initialized when a VirtQueueElement is being initialized.
>  These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> 
>  A VirtQueue's used_seq_idx represents the next expected index in a
>  sequence of VirtQueueElements to be processed (put on the used ring).
>  The next VirtQueueElement added to the used ring must match this
>  sequence number before additional elements can be safely added to the
>  used ring. It's also particularly useful for helping find the number of
>  new elements added to the used ring.
> 
>  A VirtQueue's current_seq_idx represents the current sequence index.
>  This value is essentially a counter where the value is assigned to a new
>  VirtQueueElement and then incremented. Given its uint16_t type, this
>  sequence number can be between 0 and 65,535.
> 
>  A VirtQueueElement's seq_idx represents the sequence number assigned to
>  the VirtQueueElement when it was created. This value must match with the
>  VirtQueue's used_seq_idx before the element can be put on the used ring
>  by the device.
> 
>  Signed-off-by: Jonah Palmer 
>  ---
> hw/virtio/virtio.c | 18 ++
> include/hw/virtio/virtio.h |  1 +
> 2 files changed, 19 insertions(+)
> 
>  diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>  index fb6b4ccd83..069d96df99 100644
>  --- a/hw/virtio/virtio.c
>  +++ b/hw/virtio/virtio.c
>  @@ -132,6 +132,10 @@ struct VirtQueue
> uint16_t used_idx;
> bool used_wrap_counter;
> 
>  +/* In-Order sequence indices */
>  +uint16_t used_seq_idx;
>  +uint16_t current_seq_idx;
>  +
> >>>
> >>> I'm having a hard time understanding the difference between these and
> >>> last_avail_idx and used_idx. It seems to me if we replace them
> >>> everything will work? What am I missing?
> >>>
> >>
> >> For used_seq_idx, it does work like used_idx except the difference is
> >> when their values get updated, specifically for the split VQ case.
> >>
> >> As you know, for the split VQ case, the used_idx is updated during
> >> virtqueue_split_flush. However, imagine a batch of elements coming in
> >> where virtqueue_split_fill is called multiple times before
> >> virtqueue_split_flush. We want to make sure we write these elements to
> >> the used ring in-order and we'll know its order based on used_seq_idx.
> >>
> >> Alternatively, I thought about replicating the logic for the packed VQ
> >> case (where this used_seq_idx isn't used) where we start looking at
> >> vq->used_elems[vq->used_idx] and iterate through until we find a used
> >> element, but I wasn't sure how to handle the case where elements get
> >> used (written to the used ring) and new elements get put in used_elems
> >> before the used_idx is updated. Since this search would require us to
> >> always start at index vq->used_idx.
> >>
> >> For example, say, of three elements getting filled (elem0 - elem2),
> >> elem1 and elem0 come back first (vq->used_idx = 0):
> >>
> >> elem1 - not in-order
> >> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
> >>   in-order, write elem0 and elem1 to used ring, mark elements as
> >>   used
> >>
> >> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> >> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> >> (elem1) and iterate to vq->used_idx + 2 (elem2)?
> >>
> >> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> >> could continue looking through the vq->used_elems array until we find an
> >> unused element... but then again how would we (1) know if the element is
> >> in-order and (2) know when to stop searching?
> >>
> >
> > Ok I think I understand the problem now. It is aggravated if we add
> > chained descriptors to the mix.
> >
> > We know that the order of used descriptors must be the exact same as
> > the order they were made available, leaving out in order batching.
> > What if vq->used_elems at virtqueue_pop and then virtqueue_push just
> > marks them as used somehow? Two booleans (or flag) would do for a
> > first iteration.
> >
> > If we go with this approach I think used_elems should be renamed actually.
> >
>
> If I'm understanding correctly, I don't think adding newly created
> elements to vq->used_elems at virtqueue_pop will do much for u

Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Jonah Palmer




On 4/4/24 7:35 AM, Eugenio Perez Martin wrote:

On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:




On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:

On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  wrote:


Initialize sequence variables for VirtQueue and VirtQueueElement
structures. A VirtQueue's sequence variables are initialized when a
VirtQueue is being created or reset. A VirtQueueElement's sequence
variable is initialized when a VirtQueueElement is being initialized.
These variables will be used to support the VIRTIO_F_IN_ORDER feature.

A VirtQueue's used_seq_idx represents the next expected index in a
sequence of VirtQueueElements to be processed (put on the used ring).
The next VirtQueueElement added to the used ring must match this
sequence number before additional elements can be safely added to the
used ring. It's also particularly useful for helping find the number of
new elements added to the used ring.

A VirtQueue's current_seq_idx represents the current sequence index.
This value is essentially a counter where the value is assigned to a new
VirtQueueElement and then incremented. Given its uint16_t type, this
sequence number can be between 0 and 65,535.

A VirtQueueElement's seq_idx represents the sequence number assigned to
the VirtQueueElement when it was created. This value must match with the
VirtQueue's used_seq_idx before the element can be put on the used ring
by the device.

Signed-off-by: Jonah Palmer 
---
   hw/virtio/virtio.c | 18 ++
   include/hw/virtio/virtio.h |  1 +
   2 files changed, 19 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb6b4ccd83..069d96df99 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -132,6 +132,10 @@ struct VirtQueue
   uint16_t used_idx;
   bool used_wrap_counter;

+/* In-Order sequence indices */
+uint16_t used_seq_idx;
+uint16_t current_seq_idx;
+


I'm having a hard time understanding the difference between these and
last_avail_idx and used_idx. It seems to me if we replace them
everything will work? What am I missing?



For used_seq_idx, it does work like used_idx except the difference is
when their values get updated, specifically for the split VQ case.

As you know, for the split VQ case, the used_idx is updated during
virtqueue_split_flush. However, imagine a batch of elements coming in
where virtqueue_split_fill is called multiple times before
virtqueue_split_flush. We want to make sure we write these elements to
the used ring in-order and we'll know its order based on used_seq_idx.

Alternatively, I thought about replicating the logic for the packed VQ
case (where this used_seq_idx isn't used) where we start looking at
vq->used_elems[vq->used_idx] and iterate through until we find a used
element, but I wasn't sure how to handle the case where elements get
used (written to the used ring) and new elements get put in used_elems
before the used_idx is updated. Since this search would require us to
always start at index vq->used_idx.

For example, say, of three elements getting filled (elem0 - elem2),
elem1 and elem0 come back first (vq->used_idx = 0):

elem1 - not in-order
elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
  in-order, write elem0 and elem1 to used ring, mark elements as
  used

Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
(elem1) and iterate to vq->used_idx + 2 (elem2)?

Hmm... now that I'm thinking about it, maybe for the split VQ case we
could continue looking through the vq->used_elems array until we find an
unused element... but then again how would we (1) know if the element is
in-order and (2) know when to stop searching?



Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with this approach I think used_elems should be renamed actually.



If I'm understanding correctly, I don't think adding newly created 
elements to vq->used_elems at virtqueue_pop will do much for us. We 
could just keep adding processed elements to vq->used_elems at 
virtqueue_fill but instead of:


vq->used_elems[seq_idx].in_num = elem->in_num;
vq->used_elems[seq_idx].out_num = elem->out_num;

We could do:

vq->used_elems[seq_idx].in_num = 1;
vq->used_elems[seq_idx].out_num = 1;

We'd use in_num and out_num as separate flags. in_num could indicate if 
this element has been written to the used ring while out_num could 
indicate if this element has been flushed (1 for no, 0 for yes). In 
other words, when we go to write to the used ring, start at index 
vq->used_idx and iterate through the used elem

Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()

2024-04-04 Thread David Hildenbrand

On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella 
---
v3
- enriched commit message and documentation to highlight that we
   want to mimic memfd (David)
---
  docs/system/devices/vhost-user.rst |   5 +-
  qapi/qom.json  |  17 +
  backends/hostmem-shm.c | 118 +
  backends/meson.build   |   1 +
  qemu-options.hx|  11 +++
  5 files changed, 150 insertions(+), 2 deletions(-)
  create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
  
  In order for the daemon to access the VirtIO queues to process the

  requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
  will be passed via the socket as part of the protocol negotiation.
  
  Currently the shared memory object needs to match the size of the main

diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
  '*hugetlbsize': 'size',
  '*seal': 'bool' } }
  
+##

+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+


Acked-by: David Hildenbrand 

One comment: we should maybe just forbid setting share=off. it doesn't 
make any sense and it can even result in an unexpected double memory 
consumption. We missed doing that for memfd, unfortunately.


--
Cheers,

David / dhildenb




Re: [PATCH for-9.1 v3 08/11] contrib/vhost-user-blk: enable it on any POSIX system

2024-04-04 Thread Philippe Mathieu-Daudé

Hi Stefano,

On 4/4/24 14:23, Stefano Garzarella wrote:

Let's make the code more portable by using the "qemu/bswap.h" API
and adding defines from block/file-posix.c to support O_DIRECT in
other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Signed-off-by: Stefano Garzarella 
---
  meson.build |  2 --
  contrib/vhost-user-blk/vhost-user-blk.c | 19 +--
  util/meson.build|  4 +++-
  3 files changed, 20 insertions(+), 5 deletions(-)




diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..462e584857 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/bswap.h"
  #include "standard-headers/linux/virtio_blk.h"
  #include "libvhost-user-glib.h"




@@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
  req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
  in_num--;
  
-type = le32toh(req->out->type);

+type = le32_to_cpu(req->out->type);
  switch (type & ~VIRTIO_BLK_T_BARRIER) {
  case VIRTIO_BLK_T_IN:
  case VIRTIO_BLK_T_OUT: {
  ssize_t ret = 0;
  bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
  if (is_write) {
  ret  = vub_writev(req, &elem->out_sg[1], out_num);
  } else {

Can we switch to the bswap API in a preliminary patch,
converting all the source files?



Re: [PATCH] hw/nvme: Add support for setting the MQES for the NVMe emulation

2024-04-04 Thread Keith Busch
On Thu, Apr 04, 2024 at 01:04:18PM +0100, John Berg wrote:
> The MQES field in the CAP register describes the Maximum Queue Entries
> Supported for the IO queues of an NVMe controller. Adding a +1 to the
> value in this field results in the total queue size. A full queue is
> when a queue of size N contains N - 1 entries, and the minimum queue
> size is 2. Thus the lowest MQES value is 1.
> 
> This patch adds the new mqes property to the NVMe emulation which allows
> a user to specify the maximum queue size by setting this property. This
> is useful as it enables testing of NVMe controller where the MQES is
> relatively small. The smallest NVMe queue size supported in NVMe is 2
> submission and completion entries, which means that the smallest legal
> mqes value is 1.
> 
> The following example shows how the mqes can be set for a the NVMe
> emulation:
> 
> -drive id=nvme0,if=none,file=nvme.img,format=raw
> -device nvme,drive=nvme0,serial=foo,mqes=1
> 
> If the mqes property is not provided then the default mqes will still be
> 0x7ff (the queue size is 2048 entries).

Looks good. I had to double check where nvme_create_sq() was getting its
limit from when processing the host command, and sure enough it's
directly from the register field.

Reviewed-by: Keith Busch 



[PULL for-9.0 0/1] Block patches

2024-04-04 Thread Stefan Hajnoczi
The following changes since commit 786fd793b81410fb2a28914315e2f05d2ff6733b:

  Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging 
(2024-04-03 12:52:03 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to bbdf9023665f409113cb07b463732861af63fb47:

  block/virtio-blk: Fix memory leak from virtio_blk_zone_report (2024-04-04 
09:29:42 -0400)


Pull request

Fix a memory leak in virtio-blk zone report emulation code when the request is
invalid.



Zheyu Ma (1):
  block/virtio-blk: Fix memory leak from virtio_blk_zone_report

 hw/block/virtio-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

-- 
2.44.0




[PULL for-9.0 1/1] block/virtio-blk: Fix memory leak from virtio_blk_zone_report

2024-04-04 Thread Stefan Hajnoczi
From: Zheyu Ma 

This modification ensures that in scenarios where the buffer size is
insufficient for a zone report, the function will now properly set an
error status and proceed to a cleanup label, instead of merely
returning.

The following ASAN log reveals it:

==1767400==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 312 byte(s) in 1 object(s) allocated from:
#0 0x64ac7b3280cd in malloc 
llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
#1 0x735b02fb9738 in g_malloc 
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
#2 0x64ac7d23be96 in virtqueue_split_pop hw/virtio/virtio.c:1612:12
#3 0x64ac7d23728a in virtqueue_pop hw/virtio/virtio.c:1783:16
#4 0x64ac7cfcaacd in virtio_blk_get_request hw/block/virtio-blk.c:228:27
#5 0x64ac7cfca7c7 in virtio_blk_handle_vq hw/block/virtio-blk.c:1123:23
#6 0x64ac7cfecb95 in virtio_blk_handle_output hw/block/virtio-blk.c:1157:5

Signed-off-by: Zheyu Ma 
Message-id: 20240404120040.1951466-1-zheyum...@gmail.com
Signed-off-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 92de315f17..bb86e65f65 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -768,7 +768,8 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq 
*req,
 sizeof(struct virtio_blk_zone_report) +
 sizeof(struct virtio_blk_zone_descriptor)) {
 virtio_error(vdev, "in buffer too small for zone report");
-return;
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
 }
 
 /* start byte offset of the zone report */
-- 
2.44.0




Re: [PATCH for-9.1 v3 05/11] contrib/vhost-user-blk: fix bind() using the right size of the address

2024-04-04 Thread Philippe Mathieu-Daudé

On 4/4/24 14:23, Stefano Garzarella wrote:

On macOS passing `-s /tmp/vhost.socket` parameter to the vhost-user-blk
application, the bind was done on `/tmp/vhost.socke` pathname,
missing the last character.

This sounds like one of the portability problems described in the
unix(7) manpage:

Pathname sockets
When  binding  a socket to a pathname, a few rules should
be observed for maximum portability and ease of coding:

•  The pathname in sun_path should be null-terminated.

•  The length of the pathname, including the  terminating
   null byte, should not exceed the size of sun_path.

•  The  addrlen  argument  that  describes  the enclosing
   sockaddr_un structure should have a value of at least:

   offsetof(struct sockaddr_un, sun_path) +
   strlen(addr.sun_path)+1

   or,  more  simply,  addrlen  can   be   specified   as
   sizeof(struct sockaddr_un).

So let's follow the last advice and simplify the code as well.

Signed-off-by: Stefano Garzarella 
---
  contrib/vhost-user-blk/vhost-user-blk.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH for-9.1 v3 01/11] libvhost-user: set msg.msg_control to NULL when it is empty

2024-04-04 Thread Philippe Mathieu-Daudé

On 4/4/24 14:23, Stefano Garzarella wrote:

On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if
the `struct msghdr` has the field `msg_controllen` set to 0, but
`msg_control` is not NULL.

Reviewed-by: Eric Blake 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
  subprojects/libvhost-user/libvhost-user.c | 1 +
  1 file changed, 1 insertion(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] block/virtio-blk: Fix memory leak from virtio_blk_zone_report

2024-04-04 Thread Stefan Hajnoczi
On Thu, Apr 04, 2024 at 02:00:40PM +0200, Zheyu Ma wrote:
> This modification ensures that in scenarios where the buffer size is
> insufficient for a zone report, the function will now properly set an
> error status and proceed to a cleanup label, instead of merely
> returning.
> 
> The following ASAN log reveals it:
> 
> ==1767400==ERROR: LeakSanitizer: detected memory leaks
> Direct leak of 312 byte(s) in 1 object(s) allocated from:
> #0 0x64ac7b3280cd in malloc 
> llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
> #1 0x735b02fb9738 in g_malloc 
> (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
> #2 0x64ac7d23be96 in virtqueue_split_pop hw/virtio/virtio.c:1612:12
> #3 0x64ac7d23728a in virtqueue_pop hw/virtio/virtio.c:1783:16
> #4 0x64ac7cfcaacd in virtio_blk_get_request hw/block/virtio-blk.c:228:27
> #5 0x64ac7cfca7c7 in virtio_blk_handle_vq hw/block/virtio-blk.c:1123:23
> #6 0x64ac7cfecb95 in virtio_blk_handle_output hw/block/virtio-blk.c:1157:5
> 
> Signed-off-by: Zheyu Ma 
> ---
>  hw/block/virtio-blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 92de315f17..bb86e65f65 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -768,7 +768,8 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq 
> *req,
>  sizeof(struct virtio_blk_zone_report) +
>  sizeof(struct virtio_blk_zone_descriptor)) {
>  virtio_error(vdev, "in buffer too small for zone report");
> -return;
> +err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> +goto out;
>  }
>  
>  /* start byte offset of the zone report */
> -- 
> 2.34.1
> 

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] hw/nvme: Add support for setting the MQES for the NVMe emulation

2024-04-04 Thread Klaus Jensen
On Apr  4 13:04, John Berg wrote:
> From: John Berg 
> 
> The MQES field in the CAP register describes the Maximum Queue Entries
> Supported for the IO queues of an NVMe controller. Adding a +1 to the
> value in this field results in the total queue size. A full queue is
> when a queue of size N contains N - 1 entries, and the minimum queue
> size is 2. Thus the lowest MQES value is 1.
> 
> This patch adds the new mqes property to the NVMe emulation which allows
> a user to specify the maximum queue size by setting this property. This
> is useful as it enables testing of NVMe controller where the MQES is
> relatively small. The smallest NVMe queue size supported in NVMe is 2
> submission and completion entries, which means that the smallest legal
> mqes value is 1.
> 
> The following example shows how the mqes can be set for a the NVMe
> emulation:
> 
> -drive id=nvme0,if=none,file=nvme.img,format=raw
> -device nvme,drive=nvme0,serial=foo,mqes=1
> 
> If the mqes property is not provided then the default mqes will still be
> 0x7ff (the queue size is 2048 entries).
> 
> Signed-off-by: John Berg 
> ---
>  hw/nvme/ctrl.c | 9 -
>  hw/nvme/nvme.h | 1 +
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> index 127c3d2383..86cda9bc73 100644
> --- a/hw/nvme/ctrl.c
> +++ b/hw/nvme/ctrl.c
> @@ -7805,6 +7805,12 @@ static bool nvme_check_params(NvmeCtrl *n, Error 
> **errp)
>  return false;
>  }
>  
> +if (params->mqes < 1)
> +{

Please keep the `{` on the same line as the `if`. I think checkpatch.pl
should catch this.

No need to send a v2, I'll fix it up when I apply it to nvme-next :)

Thanks!


signature.asc
Description: PGP signature


Re: [PATCH] hw/nvme: Add support for setting the MQES for the NVMe emulation

2024-04-04 Thread Klaus Jensen
On Apr  4 13:04, John Berg wrote:
> From: John Berg 
> 
> The MQES field in the CAP register describes the Maximum Queue Entries
> Supported for the IO queues of an NVMe controller. Adding a +1 to the
> value in this field results in the total queue size. A full queue is
> when a queue of size N contains N - 1 entries, and the minimum queue
> size is 2. Thus the lowest MQES value is 1.
> 
> This patch adds the new mqes property to the NVMe emulation which allows
> a user to specify the maximum queue size by setting this property. This
> is useful as it enables testing of NVMe controller where the MQES is
> relatively small. The smallest NVMe queue size supported in NVMe is 2
> submission and completion entries, which means that the smallest legal
> mqes value is 1.
> 
> The following example shows how the mqes can be set for a the NVMe
> emulation:
> 
> -drive id=nvme0,if=none,file=nvme.img,format=raw
> -device nvme,drive=nvme0,serial=foo,mqes=1
> 
> If the mqes property is not provided then the default mqes will still be
> 0x7ff (the queue size is 2048 entries).
> 
> Signed-off-by: John Berg 

LGTM,

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature


[PATCH] hw/nvme: Add support for setting the MQES for the NVMe emulation

2024-04-04 Thread John Berg
From: John Berg 

The MQES field in the CAP register describes the Maximum Queue Entries
Supported for the IO queues of an NVMe controller. Adding a +1 to the
value in this field results in the total queue size. A full queue is
when a queue of size N contains N - 1 entries, and the minimum queue
size is 2. Thus the lowest MQES value is 1.

This patch adds the new mqes property to the NVMe emulation which allows
a user to specify the maximum queue size by setting this property. This
is useful as it enables testing of NVMe controller where the MQES is
relatively small. The smallest NVMe queue size supported in NVMe is 2
submission and completion entries, which means that the smallest legal
mqes value is 1.

The following example shows how the mqes can be set for a the NVMe
emulation:

-drive id=nvme0,if=none,file=nvme.img,format=raw
-device nvme,drive=nvme0,serial=foo,mqes=1

If the mqes property is not provided then the default mqes will still be
0x7ff (the queue size is 2048 entries).

Signed-off-by: John Berg 
---
 hw/nvme/ctrl.c | 9 -
 hw/nvme/nvme.h | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 127c3d2383..86cda9bc73 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -7805,6 +7805,12 @@ static bool nvme_check_params(NvmeCtrl *n, Error **errp)
 return false;
 }
 
+if (params->mqes < 1)
+{
+error_setg(errp, "mqes property cannot be less than 1");
+return false;
+}
+
 if (n->pmr.dev) {
 if (params->msix_exclusive_bar) {
 error_setg(errp, "not enough BARs available to enable PMR");
@@ -8281,7 +8287,7 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice 
*pci_dev)
 
 id->ctratt = cpu_to_le32(ctratt);
 
-NVME_CAP_SET_MQES(cap, 0x7ff);
+NVME_CAP_SET_MQES(cap, n->params.mqes);
 NVME_CAP_SET_CQR(cap, 1);
 NVME_CAP_SET_TO(cap, 0xf);
 NVME_CAP_SET_CSS(cap, NVME_CAP_CSS_NVM);
@@ -8451,6 +8457,7 @@ static Property nvme_props[] = {
   params.sriov_max_vq_per_vf, 0),
 DEFINE_PROP_BOOL("msix-exclusive-bar", NvmeCtrl, params.msix_exclusive_bar,
  false),
+DEFINE_PROP_UINT16("mqes", NvmeCtrl, params.mqes, 0x7ff),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index bed8191bd5..2e7d31c0ae 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -521,6 +521,7 @@ typedef struct NvmeParams {
 uint32_t num_queues; /* deprecated since 5.1 */
 uint32_t max_ioqpairs;
 uint16_t msix_qsize;
+uint16_t mqes;
 uint32_t cmb_size_mb;
 uint8_t  aerl;
 uint32_t aer_max_queued;
-- 
2.34.1




[PATCH] block/virtio-blk: Fix memory leak from virtio_blk_zone_report

2024-04-04 Thread Zheyu Ma
This modification ensures that in scenarios where the buffer size is
insufficient for a zone report, the function will now properly set an
error status and proceed to a cleanup label, instead of merely
returning.

The following ASAN log reveals it:

==1767400==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 312 byte(s) in 1 object(s) allocated from:
#0 0x64ac7b3280cd in malloc 
llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:129:3
#1 0x735b02fb9738 in g_malloc 
(/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x5e738)
#2 0x64ac7d23be96 in virtqueue_split_pop hw/virtio/virtio.c:1612:12
#3 0x64ac7d23728a in virtqueue_pop hw/virtio/virtio.c:1783:16
#4 0x64ac7cfcaacd in virtio_blk_get_request hw/block/virtio-blk.c:228:27
#5 0x64ac7cfca7c7 in virtio_blk_handle_vq hw/block/virtio-blk.c:1123:23
#6 0x64ac7cfecb95 in virtio_blk_handle_output hw/block/virtio-blk.c:1157:5

Signed-off-by: Zheyu Ma 
---
 hw/block/virtio-blk.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 92de315f17..bb86e65f65 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -768,7 +768,8 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq 
*req,
 sizeof(struct virtio_blk_zone_report) +
 sizeof(struct virtio_blk_zone_descriptor)) {
 virtio_error(vdev, "in buffer too small for zone report");
-return;
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
 }
 
 /* start byte offset of the zone report */
-- 
2.34.1




[PATCH for-9.1 v3 08/11] contrib/vhost-user-blk: enable it on any POSIX system

2024-04-04 Thread Stefano Garzarella
Let's make the code more portable by using the "qemu/bswap.h" API
and adding defines from block/file-posix.c to support O_DIRECT in
other systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Signed-off-by: Stefano Garzarella 
---
 meson.build |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 19 +--
 util/meson.build|  4 +++-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 3197a2f62e..b541e5c875 100644
--- a/meson.build
+++ b/meson.build
@@ -1956,8 +1956,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 
'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: 
gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-   error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
error_message: 'vhost_user_blk_server requires vhost-user support') 
\
   .disable_auto_if(not have_tools and not have_system) \
diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..462e584857 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
 
@@ -24,6 +25,20 @@
 #include 
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
 VHOST_USER_BLK_MAX_QUEUES = 8,
 };
@@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
 req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
 in_num--;
 
-type = le32toh(req->out->type);
+type = le32_to_cpu(req->out->type);
 switch (type & ~VIRTIO_BLK_T_BARRIER) {
 case VIRTIO_BLK_T_IN:
 case VIRTIO_BLK_T_OUT: {
 ssize_t ret = 0;
 bool is_write = type & VIRTIO_BLK_T_OUT;
-req->sector_num = le64toh(req->out->sector);
+req->sector_num = le64_to_cpu(req->out->sector);
 if (is_write) {
 ret  = vub_writev(req, &elem->out_sg[1], out_num);
 } else {
diff --git a/util/meson.build b/util/meson.build
index 0ef9886be0..f52682ce96 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -113,10 +113,12 @@ if have_block
 util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-util_ss.add(files('vhost-user-server.c'), vhost_user)
 util_ss.add(files('vfio-helpers.c'))
 util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
 endif
 
 if cpu == 'aarch64'
-- 
2.44.0




[PATCH for-9.1 v3 05/11] contrib/vhost-user-blk: fix bind() using the right size of the address

2024-04-04 Thread Stefano Garzarella
On macOS passing `-s /tmp/vhost.socket` parameter to the vhost-user-blk
application, the bind was done on `/tmp/vhost.socke` pathname,
missing the last character.

This sounds like one of the portability problems described in the
unix(7) manpage:

   Pathname sockets
   When  binding  a socket to a pathname, a few rules should
   be observed for maximum portability and ease of coding:

   •  The pathname in sun_path should be null-terminated.

   •  The length of the pathname, including the  terminating
  null byte, should not exceed the size of sun_path.

   •  The  addrlen  argument  that  describes  the enclosing
  sockaddr_un structure should have a value of at least:

  offsetof(struct sockaddr_un, sun_path) +
  strlen(addr.sun_path)+1

  or,  more  simply,  addrlen  can   be   specified   as
  sizeof(struct sockaddr_un).

So let's follow the last advice and simplify the code as well.

Signed-off-by: Stefano Garzarella 
---
 contrib/vhost-user-blk/vhost-user-blk.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c 
b/contrib/vhost-user-blk/vhost-user-blk.c
index 89e5f11a64..a8ab9269a2 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn)
 {
 int sock;
 struct sockaddr_un un;
-size_t len;
 
 assert(unix_fn);
 
@@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn)
 
 un.sun_family = AF_UNIX;
 (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn);
-len = sizeof(un.sun_family) + strlen(un.sun_path);
 
 (void)unlink(unix_fn);
-if (bind(sock, (struct sockaddr *)&un, len) < 0) {
+if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
 perror("bind");
 goto fail;
 }
-- 
2.44.0




[PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()

2024-04-04 Thread Stefano Garzarella
shm_open() creates and opens a new POSIX shared memory object.
A POSIX shared memory object allows creating memory backend with an
associated file descriptor that can be shared with external processes
(e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella 
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json  |  17 +
 backends/hostmem-shm.c | 118 +
 backends/meson.build   |   1 +
 qemu-options.hx|  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst 
b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main
diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
 '*hugetlbsize': 'size',
 '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -976,6 +989,8 @@
 { 'name': 'memory-backend-memfd',
   'if': 'CONFIG_LINUX' },
 'memory-backend-ram',
+{ 'name': 'memory-backend-shm',
+  'if': 'CONFIG_POSIX' },
 'pef-guest',
 { 'name': 'pr-manager-helper',
   'if': 'CONFIG_LINUX' },
@@ -1047,6 +1062,8 @@
   'memory-backend-memfd':   { 'type': 'MemoryBackendMemfdProperties',
   'if': 'CONFIG_LINUX' },
   'memory-backend-ram': 'MemoryBackendProperties',
+  'memory-backend-shm': { 'type': 'MemoryBackendShmProperties',
+  'if': 'CONFIG_POSIX' },
   'pr-manager-helper':  { 'type': 'PrManagerHelperProperties',
   'if': 'CONFIG_LINUX' },
   'qtest':  'QtestProperties',
diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
new file mode 100644
index 00..7595204d29
--- /dev/null
+++ b/backends/hostmem-shm.c
@@ -0,0 +1,118 @@
+/*
+ * QEMU host POSIX shared memory object backend
+ *
+ * Copyright (C) 2024 Red Hat Inc
+ *
+ * Authors:
+ *   Stefano Garzarella 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+
+#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
+
+OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM)
+
+struct HostMemoryBackendShm {
+HostMemoryBackend parent_obj;
+};
+
+static bool
+shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+g_autoptr(GString) shm_name = g_string_new(NULL);
+g_autofree char *backend_name = NULL;
+uint32_t ram_flags;
+int fd, oflag;
+mode_t mode;
+
+if (!backend->size) {
+error_setg(errp, "can't create backend with size 0");
+return false;
+}
+
+/*
+ * Let's use `mode = 0` because we don't want other processes to open our
+ * memory unless we share the file descriptor with them.
+ */
+mode = 0;
+oflag = O_RDWR | O_CREAT | O_EXCL;
+backend_name = host_memory_backend_get_name(backend);
+
+/*
+ * Some operating

[PATCH for-9.1 v3 01/11] libvhost-user: set msg.msg_control to NULL when it is empty

2024-04-04 Thread Stefano Garzarella
On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if
the `struct msghdr` has the field `msg_controllen` set to 0, but
`msg_control` is not NULL.

Reviewed-by: Eric Blake 
Reviewed-by: David Hildenbrand 
Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index a879149fef..22bea0c775 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -632,6 +632,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg 
*vmsg)
 memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize);
 } else {
 msg.msg_controllen = 0;
+msg.msg_control = NULL;
 }
 
 do {
-- 
2.44.0




[PATCH for-9.1 v3 11/11] tests/qtest/vhost-user-test: add a test case for memory-backend-shm

2024-04-04 Thread Stefano Garzarella
`memory-backend-shm` can be used with vhost-user devices, so let's
add a new test case for it.

Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
 "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," 
\
 " -numa node,memdev=mem"
+#define QEMU_CMD_SHM" -m %d -object memory-backend-shm,id=mem,size=%dM," \
+" -numa node,memdev=mem"
 #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
 TEST_MEMFD_AUTO,
 TEST_MEMFD_YES,
 TEST_MEMFD_NO,
+TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString 
*cmd_line,
 
 if (memfd == TEST_MEMFD_YES) {
 g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+} else if (memfd == TEST_MEMFD_SHM) {
+g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
 } else {
 const char *root = init_hugepagefs() ? : server->tmpfs;
 
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString 
*cmd_line, void *arg)
 return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+TestServer *server = test_server_new("vhost-user-test", arg);
+test_server_listen(server);
+
+append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+server->vu_ops->append_opts(server, cmd_line, "");
+
+g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
 TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
  "virtio-net",
  test_read_guest_mem, &opts);
 
+opts.before = vhost_user_test_setup_shm;
+qos_add_test("vhost-user/read-guest-mem/shm",
+ "virtio-net",
+ test_read_guest_mem, &opts);
+
 if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
 opts.before = vhost_user_test_setup_memfd;
 qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.44.0




[PATCH for-9.1 v3 07/11] libvhost-user: enable it on any POSIX system

2024-04-04 Thread Stefano Garzarella
The vhost-user protocol is not really Linux-specific so let's enable
libvhost-user for any POSIX system.

Compiling it on macOS and FreeBSD some problems came up:
- avoid to include linux/vhost.h which is avaibale only on Linux
  (vhost_types.h contains many of the things we need)
- macOS doesn't provide sys/endian.h, so let's define them
  (note: libvhost-user doesn't include qemu's headers, so we can't use
   use "qemu/bswap.h")
- define eventfd_[write|read] as write/read wrapper when system doesn't
  provide those (e.g. macOS)
- copy SEAL defines from include/qemu/memfd.h to make the code works
  on FreeBSD where MFD_ALLOW_SEALING is defined
- define MAP_NORESERVE if it's not defined (e.g. on FreeBSD)

Signed-off-by: Stefano Garzarella 
---
 meson.build   |  2 +-
 subprojects/libvhost-user/libvhost-user.h |  2 +-
 subprojects/libvhost-user/libvhost-user.c | 60 +--
 3 files changed, 59 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index c19d51501a..3197a2f62e 100644
--- a/meson.build
+++ b/meson.build
@@ -3194,7 +3194,7 @@ endif
 config_host_data.set('CONFIG_FDT', fdt.found())
 
 vhost_user = not_found
-if host_os == 'linux' and have_vhost_user
+if have_vhost_user
   libvhost_user = subproject('libvhost-user')
   vhost_user = libvhost_user.get_variable('vhost_user_dep')
 endif
diff --git a/subprojects/libvhost-user/libvhost-user.h 
b/subprojects/libvhost-user/libvhost-user.h
index deb40e77b3..e13e1d3931 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -18,9 +18,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "standard-headers/linux/virtio_ring.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Based on qemu/hw/virtio/vhost-user.c */
 #define VHOST_USER_F_PROTOCOL_FEATURES 30
diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 1c361ffd51..03edb4bf64 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@
 #include 
 #include 
 #include 
-#include 
 #include 
-#include 
 
 /* Necessary to provide VIRTIO_F_VERSION_1 on system
  * with older linux headers. Must appear before
@@ -39,8 +37,8 @@
 #include "standard-headers/linux/virtio_config.h"
 
 #if defined(__linux__)
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -52,6 +50,62 @@
 
 #endif
 
+#if defined(__APPLE__) && (__MACH__)
+#include 
+#define htobe16(x) OSSwapHostToBigInt16(x)
+#define htole16(x) OSSwapHostToLittleInt16(x)
+#define be16toh(x) OSSwapBigToHostInt16(x)
+#define le16toh(x) OSSwapLittleToHostInt16(x)
+
+#define htobe32(x) OSSwapHostToBigInt32(x)
+#define htole32(x) OSSwapHostToLittleInt32(x)
+#define be32toh(x) OSSwapBigToHostInt32(x)
+#define le32toh(x) OSSwapLittleToHostInt32(x)
+
+#define htobe64(x) OSSwapHostToBigInt64(x)
+#define htole64(x) OSSwapHostToLittleInt64(x)
+#define be64toh(x) OSSwapBigToHostInt64(x)
+#define le64toh(x) OSSwapLittleToHostInt64(x)
+#endif
+
+#ifdef CONFIG_EVENTFD
+#include 
+#else
+#define eventfd_t uint64_t
+
+int eventfd_write(int fd, eventfd_t value)
+{
+return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
+}
+
+int eventfd_read(int fd, eventfd_t *value)
+{
+return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
+}
+#endif
+
+#ifdef MFD_ALLOW_SEALING
+#include 
+
+#ifndef F_LINUX_SPECIFIC_BASE
+#define F_LINUX_SPECIFIC_BASE 1024
+#endif
+
+#ifndef F_ADD_SEALS
+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
+
+#define F_SEAL_SEAL 0x0001  /* prevent further seals from being set */
+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
+#define F_SEAL_GROW 0x0004  /* prevent file from growing */
+#define F_SEAL_WRITE0x0008  /* prevent writes */
+#endif
+#endif
+
+#ifndef MAP_NORESERVE
+#define MAP_NORESERVE 0
+#endif
+
 #include "include/atomic.h"
 
 #include "libvhost-user.h"
-- 
2.44.0




[PATCH for-9.1 v3 04/11] vhost-user-server: do not set memory fd non-blocking

2024-04-04 Thread Stefano Garzarella
In vhost-user-server we set all fd received from the other peer
in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.)
it's not really needed, because we don't use these fd with blocking
operations, but only to map memory.

In addition, in some systems this operation can fail (e.g. in macOS
setting an fd returned by shm_open() non-blocking fails with errno
= ENOTTY).

So, let's avoid setting fd non-blocking for those messages that we
know carry memory fd (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).

Signed-off-by: Stefano Garzarella 
---
v3:
- avoiding setting fd non-blocking for messages where we have memory fd
  (Eric)
---
 util/vhost-user-server.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 3bfb1ad3ec..b19229074a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -65,6 +65,18 @@ static void vmsg_close_fds(VhostUserMsg *vmsg)
 static void vmsg_unblock_fds(VhostUserMsg *vmsg)
 {
 int i;
+
+/*
+ * These messages carry fd used to map memory, not to send/receive 
messages,
+ * so this operation is useless. In addition, in some systems this
+ * operation can fail (e.g. in macOS setting an fd returned by shm_open()
+ * non-blocking fails with errno = ENOTTY)
+ */
+if (vmsg->request == VHOST_USER_ADD_MEM_REG ||
+vmsg->request == VHOST_USER_SET_MEM_TABLE) {
+return;
+}
+
 for (i = 0; i < vmsg->fd_num; i++) {
 qemu_socket_set_nonblock(vmsg->fds[i]);
 }
-- 
2.44.0




[PATCH for-9.1 v3 02/11] libvhost-user: fail vu_message_write() if sendmsg() is failing

2024-04-04 Thread Stefano Garzarella
In vu_message_write() we use sendmsg() to send the message header,
then a write() to send the payload.

If sendmsg() fails we should avoid sending the payload, since we
were unable to send the header.

Discovered before fixing the issue with the previous patch, where
sendmsg() failed on macOS due to wrong parameters, but the frontend
still sent the payload which the backend incorrectly interpreted
as a wrong header.

Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index 22bea0c775..a11afd1960 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg 
*vmsg)
 rc = sendmsg(conn_fd, &msg, 0);
 } while (rc < 0 && (errno == EINTR || errno == EAGAIN));
 
+if (rc <= 0) {
+vu_panic(dev, "Error while writing: %s", strerror(errno));
+return false;
+}
+
 if (vmsg->size) {
 do {
 if (vmsg->data) {
-- 
2.44.0




[PATCH for-9.1 v3 10/11] tests/qtest/vhost-user-blk-test: use memory-backend-shm

2024-04-04 Thread Stefano Garzarella
`memory-backend-memfd` is available only on Linux while the new
`memory-backend-shm` can be used on any POSIX-compliant operating
system. Let's use it so we can run the test in multiple environments.

Signed-off-by: Stefano Garzarella 
---
 tests/qtest/vhost-user-blk-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/vhost-user-blk-test.c 
b/tests/qtest/vhost-user-blk-test.c
index 117b9acd10..e945f6abf2 100644
--- a/tests/qtest/vhost-user-blk-test.c
+++ b/tests/qtest/vhost-user-blk-test.c
@@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int 
vus_instances,
vhost_user_blk_bin);
 
 g_string_append_printf(cmd_line,
-" -object memory-backend-memfd,id=mem,size=256M,share=on "
+" -object memory-backend-shm,id=mem,size=256M,share=on "
 " -M memory-backend=mem -m 256M ");
 
 for (i = 0; i < vus_instances; i++) {
-- 
2.44.0




[PATCH for-9.1 v3 06/11] vhost-user: enable frontends on any POSIX system

2024-04-04 Thread Stefano Garzarella
The vhost-user protocol is not really Linux-specific so let's enable
vhost-user frontends for any POSIX system.

In vhost_net.c we use VHOST_FILE_UNBIND which is defined in a Linux
specific header, let's define it for other systems as well.

Signed-off-by: Stefano Garzarella 
---
 meson.build| 1 -
 hw/net/vhost_net.c | 5 +
 hw/block/Kconfig   | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index c9c3217ba4..c19d51501a 100644
--- a/meson.build
+++ b/meson.build
@@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \
 
 # vhost
 have_vhost_user = get_option('vhost_user') \
-  .disable_auto_if(host_os != 'linux') \
   .require(host_os != 'windows',
error_message: 'vhost-user is not available on Windows').allowed()
 have_vhost_vdpa = get_option('vhost_vdpa') \
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..fced429813 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -34,8 +34,13 @@
 #include "standard-headers/linux/virtio_ring.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-bus.h"
+#if defined(__linux__)
 #include "linux-headers/linux/vhost.h"
+#endif
 
+#ifndef VHOST_FILE_UNBIND
+#define VHOST_FILE_UNBIND -1
+#endif
 
 /* Features supported by host kernel. */
 static const int kernel_feature_bits[] = {
diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 9e8f28f982..29ee09e434 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -40,7 +40,7 @@ config VHOST_USER_BLK
 bool
 # Only PCI devices are provided for now
 default y if VIRTIO_PCI
-depends on VIRTIO && VHOST_USER && LINUX
+depends on VIRTIO && VHOST_USER
 
 config SWIM
 bool
-- 
2.44.0




[PATCH for-9.1 v3 03/11] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported

2024-04-04 Thread Stefano Garzarella
libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able
to create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.

Signed-off-by: Stefano Garzarella 
---
 subprojects/libvhost-user/libvhost-user.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c 
b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..1c361ffd51 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,16 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg 
*vmsg)
 features |= dev->iface->get_protocol_features(dev);
 }
 
+/*
+ * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+ * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+ * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+ * is negotiated. A device implementation can enable it, so let's mask
+ * it to avoid a runtime panic.
+ */
+#ifndef MFD_ALLOW_SEALING
+features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
 vmsg_set_reply_u64(vmsg, features);
 return true;
 }
-- 
2.44.0




[PATCH for-9.1 v3 00/11] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)

2024-04-04 Thread Stefano Garzarella
v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/
v3:
  - rebased on v9.0.0-rc2
  - patch 4: avoiding setting fd non-blocking for messages where we
have memory fd (Eric)
  - patch 9: enriched commit message and documentation to highlight that we
want to mimic memfd (David)

The vhost-user protocol is not really Linux-specific, so let's try support
QEMU's frontends and backends (including libvhost-user) in any POSIX system
with this series. The main use case is to be able to use virtio devices that
we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
in non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS or
FreeBSD that could go even independently of this series.

Patches 6, 7, and 8 enable building of frontends and backends (including
libvhost-user) with associated code changes to succeed in compilation.

Patch 9 adds `memory-backend-shm` that uses the POSIX shm_open() API to
create shared memory which is identified by an fd that can be shared with
vhost-user backends. This is useful on those systems (like macOS) where
we don't have memfd_create() or special filesystems like "/dev/shm".

Patches 10 and 11 use `memory-backend-shm` in some vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered those
problems after testing patches 6 - 9, so I have included them in this series
for now. Please let me know if you prefer that I send them separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39 (x86_64)
in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
-b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
--blockdev 
file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
--blockdev qcow2,file=file,node-name=qcow2 \
--export 
vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
-drive 
file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
-device virtio-net-device,netdev=net0 -netdev user,id=net0 \
-device ramfb -device usb-ehci -device usb-kbd \
-object memory-backend-shm,id=mem,size=512M \
-device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
-object memory-backend-shm,id=mem,size="512M" \
-device vhost-user-blk-pci,num-queues=1,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
-object memory-backend-shm,size="512M" \
-device vhost-user-blk-pci,num-queues=1,chardev=char0 \
-chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at 
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Thanks,
Stefano

Stefano Garzarella (11):
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst|   5 +-
 meson.build   |   5 +-
 qapi/qom.json |  17 
 subprojects/libvhost-user/libvhost-user.h |   2 +-
 backends/hostmem-shm.c| 118 ++
 contrib/vhost-user-blk/vhost-user-blk.c   |  23 -
 hw/net/vhost_net.c|   5 +
 subprojects/libvhost-user/libvhost-user.c |  76 +-
 tests/qtest/vhost-user-blk-test.c |   2 +-
 tests/qtest/vhost-user-test.c |  23 +
 util/vhost-user-server.c  |  12 +++
 backends/meson.build  |   1 +
 hw/block/Kconfig  |   2 +-
 qemu-options.hx   |  11 ++
 util/meson.build  |   4 +-
 15 files changed, 288 insertions(+), 18 deletions(-)
 create mode 100644 backends/hostmem-shm.c

-- 
2.44.0




Re: [RFC v2 1/5] virtio: Initialize sequence variables

2024-04-04 Thread Eugenio Perez Martin
On Wed, Apr 3, 2024 at 6:51 PM Jonah Palmer  wrote:
>
>
>
> On 4/3/24 6:18 AM, Eugenio Perez Martin wrote:
> > On Thu, Mar 28, 2024 at 5:22 PM Jonah Palmer  
> > wrote:
> >>
> >> Initialize sequence variables for VirtQueue and VirtQueueElement
> >> structures. A VirtQueue's sequence variables are initialized when a
> >> VirtQueue is being created or reset. A VirtQueueElement's sequence
> >> variable is initialized when a VirtQueueElement is being initialized.
> >> These variables will be used to support the VIRTIO_F_IN_ORDER feature.
> >>
> >> A VirtQueue's used_seq_idx represents the next expected index in a
> >> sequence of VirtQueueElements to be processed (put on the used ring).
> >> The next VirtQueueElement added to the used ring must match this
> >> sequence number before additional elements can be safely added to the
> >> used ring. It's also particularly useful for helping find the number of
> >> new elements added to the used ring.
> >>
> >> A VirtQueue's current_seq_idx represents the current sequence index.
> >> This value is essentially a counter where the value is assigned to a new
> >> VirtQueueElement and then incremented. Given its uint16_t type, this
> >> sequence number can be between 0 and 65,535.
> >>
> >> A VirtQueueElement's seq_idx represents the sequence number assigned to
> >> the VirtQueueElement when it was created. This value must match with the
> >> VirtQueue's used_seq_idx before the element can be put on the used ring
> >> by the device.
> >>
> >> Signed-off-by: Jonah Palmer 
> >> ---
> >>   hw/virtio/virtio.c | 18 ++
> >>   include/hw/virtio/virtio.h |  1 +
> >>   2 files changed, 19 insertions(+)
> >>
> >> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> >> index fb6b4ccd83..069d96df99 100644
> >> --- a/hw/virtio/virtio.c
> >> +++ b/hw/virtio/virtio.c
> >> @@ -132,6 +132,10 @@ struct VirtQueue
> >>   uint16_t used_idx;
> >>   bool used_wrap_counter;
> >>
> >> +/* In-Order sequence indices */
> >> +uint16_t used_seq_idx;
> >> +uint16_t current_seq_idx;
> >> +
> >
> > I'm having a hard time understanding the difference between these and
> > last_avail_idx and used_idx. It seems to me if we replace them
> > everything will work? What am I missing?
> >
>
> For used_seq_idx, it does work like used_idx except the difference is
> when their values get updated, specifically for the split VQ case.
>
> As you know, for the split VQ case, the used_idx is updated during
> virtqueue_split_flush. However, imagine a batch of elements coming in
> where virtqueue_split_fill is called multiple times before
> virtqueue_split_flush. We want to make sure we write these elements to
> the used ring in-order and we'll know its order based on used_seq_idx.
>
> Alternatively, I thought about replicating the logic for the packed VQ
> case (where this used_seq_idx isn't used) where we start looking at
> vq->used_elems[vq->used_idx] and iterate through until we find a used
> element, but I wasn't sure how to handle the case where elements get
> used (written to the used ring) and new elements get put in used_elems
> before the used_idx is updated. Since this search would require us to
> always start at index vq->used_idx.
>
> For example, say, of three elements getting filled (elem0 - elem2),
> elem1 and elem0 come back first (vq->used_idx = 0):
>
> elem1 - not in-order
> elem0 - in-order, vq->used_elems[vq->used_idx + 1] (elem1) also now
>  in-order, write elem0 and elem1 to used ring, mark elements as
>  used
>
> Then elem2 comes back, but vq->used_idx is still 0, so how do we know to
> ignore the used elements at vq->used_idx (elem0) and vq->used_idx + 1
> (elem1) and iterate to vq->used_idx + 2 (elem2)?
>
> Hmm... now that I'm thinking about it, maybe for the split VQ case we
> could continue looking through the vq->used_elems array until we find an
> unused element... but then again how would we (1) know if the element is
> in-order and (2) know when to stop searching?
>

Ok I think I understand the problem now. It is aggravated if we add
chained descriptors to the mix.

We know that the order of used descriptors must be the exact same as
the order they were made available, leaving out in order batching.
What if vq->used_elems at virtqueue_pop and then virtqueue_push just
marks them as used somehow? Two booleans (or flag) would do for a
first iteration.

If we go with this approach I think used_elems should be renamed actually.

> In any case, the use of this variable could be seen as an optimization
> as its value will tell us where to start looking in vq->used_elems
> instead of always starting at vq->used_idx.
>
> If this is like a one-shot scenario where one element gets written and
> then flushed after, then yes in this case used_seq_idx == used_idx.
>
> --
>
> For current_seq_idx, this is pretty much just a counter. Every new
> VirtQueueElement created from virtqueue_pop is given a number and the
> counter is incremented.

Re: [PATCH for-9.1 6/9] block/nbd: Use URI parsing code from glib

2024-04-04 Thread Richard W.M. Jones
On Thu, Mar 28, 2024 at 04:40:10PM +, Richard W.M. Jones wrote:
> libnbd absolutely does *not* get this right, eg:
> 
>   $ nbdinfo NBD://localhost
>   nbdinfo: nbd_connect_uri: unknown NBD URI scheme: NBD: Invalid argument
> 
> so that's a bug too.

Proposed fix:
https://gitlab.com/nbdkit/libnbd/-/merge_requests/6

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top




[PATCH v5] blockcommit: Reopen base image as RO after abort

2024-04-04 Thread Alexander Ivanov
If a blockcommit is aborted the base image remains in RW mode, that leads
to a fail of subsequent live migration.

How to reproduce:
  $ virsh snapshot-create-as vm snp1 --disk-only

  *** write something to the disk inside the guest ***

  $ virsh blockcommit vm vda --active --shallow && virsh blockjob vm vda --abort
  $ lsof /vzt/vm.qcow2
  COMMAND  PID USER   FD   TYPE DEVICE   SIZE/OFF NODE NAME
  qemu-syst 433203 root   45u   REG  253,0 1724776448  133 /vzt/vm.qcow2
  $ cat /proc/433203/fdinfo/45
  pos:0
  flags:  02140002 < The last 2 means RW mode

If the base image is in RW mode at the end of blockcommit and was in RO
mode before blockcommit, reopen the base BDS in RO.

Signed-off-by: Alexander Ivanov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/mirror.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1bdce3b657..61f0a717b7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -93,6 +93,7 @@ typedef struct MirrorBlockJob {
 int64_t active_write_bytes_in_flight;
 bool prepared;
 bool in_drain;
+bool base_ro;
 } MirrorBlockJob;
 
 typedef struct MirrorBDSOpaque {
@@ -794,6 +795,10 @@ static int mirror_exit_common(Job *job)
 bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
 bdrv_graph_wrunlock();
 
+if (abort && s->base_ro && !bdrv_is_read_only(target_bs)) {
+bdrv_reopen_set_read_only(target_bs, true, NULL);
+}
+
 bdrv_drained_end(target_bs);
 bdrv_unref(target_bs);
 
@@ -1717,6 +1722,7 @@ static BlockJob *mirror_start_job(
  bool is_none_mode, BlockDriverState *base,
  bool auto_complete, const char *filter_node_name,
  bool is_mirror, MirrorCopyMode copy_mode,
+ bool base_ro,
  Error **errp)
 {
 MirrorBlockJob *s;
@@ -1800,6 +1806,7 @@ static BlockJob *mirror_start_job(
 bdrv_unref(mirror_top_bs);
 
 s->mirror_top_bs = mirror_top_bs;
+s->base_ro = base_ro;
 
 /* No resize for the target either; while the mirror is still running, a
  * consistent read isn't necessarily possible. We could possibly allow
@@ -2029,7 +2036,7 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
  speed, granularity, buf_size, backing_mode, zero_target,
  on_source_error, on_target_error, unmap, NULL, NULL,
  &mirror_job_driver, is_none_mode, base, false,
- filter_node_name, true, copy_mode, errp);
+ filter_node_name, true, copy_mode, false, errp);
 }
 
 BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -2058,7 +2065,7 @@ BlockJob *commit_active_start(const char *job_id, 
BlockDriverState *bs,
  on_error, on_error, true, cb, opaque,
  &commit_active_job_driver, false, base, auto_complete,
  filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
- errp);
+ base_read_only, errp);
 if (!job) {
 goto error_restore_flags;
 }
-- 
2.40.1




[PATCH-for-9.0] hw/sd/sdhci: Discard excess of data written to Buffer Data Port register

2024-04-04 Thread Philippe Mathieu-Daudé
Per "SD Host Controller Standard Specification Version 3.00":

  * 1.7 Buffer Control

  - 1.7.1 Control of Buffer Pointer

(3) Buffer Control with Block Size

In case of write operation, the buffer accumulates the data
written through the Buffer Data Port register. When the buffer
pointer reaches the block size, Buffer Write Enable in the
Present State register changes 1 to 0. It means no more data
can be written to the buffer. Excess data of the last write is
ignored. For example, if just lower 2 bytes data can be written
to the buffer and a 32-bit (4-byte) block of data is written to
the Buffer Data Port register, the lower 2 bytes of data is
written to the buffer and the upper 2 bytes is ignored.

Discard the excess of data to avoid overflow reported by fuzzer:

  $ cat << EOF | qemu-system-i386 \
 -display none -nodefaults \
 -machine accel=qtest -m 512M \
 -device sdhci-pci,sd-spec-version=3 \
 -device sd-card,drive=mydrive \
 -drive 
if=none,index=0,file=null-co://,format=raw,id=mydrive -nographic \
 -qtest stdio
  outl 0xcf8 0x80001013
  outl 0xcfc 0x91
  outl 0xcf8 0x80001001
  outl 0xcfc 0x0600
  write 0x912c 0x1 0x05
  write 0x9158 0x1 0x16
  write 0x9105 0x1 0x04
  write 0x9128 0x1 0x08
  write 0x16 0x1 0x21
  write 0x19 0x1 0x20
  write 0x910c 0x1 0x01
  write 0x910e 0x1 0x20
  write 0x910f 0x1 0x00
  write 0x910c 0x1 0x00
  write 0x9120 0x1 0x00
  EOF

Stack trace (part):
=
==89993==ERROR: AddressSanitizer: heap-buffer-overflow on address
0x61529900 at pc 0x55d5f885700d bp 0x7ffc1e1e9470 sp 0x7ffc1e1e9468
WRITE of size 1 at 0x61529900 thread T0
#0 0x55d5f885700c in sdhci_write_dataport hw/sd/sdhci.c:564:39
#1 0x55d5f8849150 in sdhci_write hw/sd/sdhci.c:1223:13
#2 0x55d5fa01db63 in memory_region_write_accessor system/memory.c:497:5
#3 0x55d5fa01d245 in access_with_adjusted_size system/memory.c:573:18
#4 0x55d5fa01b1a9 in memory_region_dispatch_write system/memory.c:1521:16
#5 0x55d5fa09f5c9 in flatview_write_continue system/physmem.c:2711:23
#6 0x55d5fa08f78b in flatview_write system/physmem.c:2753:12
#7 0x55d5fa08f258 in address_space_write system/physmem.c:2860:18
...
0x61529900 is located 0 bytes to the right of 512-byte region
[0x61529700,0x61529900) allocated by thread T0 here:
#0 0x55d5f7237b27 in __interceptor_calloc
#1 0x7f9e36dd4c50 in g_malloc0
#2 0x55d5f88672f7 in sdhci_pci_realize hw/sd/sdhci-pci.c:36:5
#3 0x55d5f844b582 in pci_qdev_realize hw/pci/pci.c:2092:9
#4 0x55d5fa2ee74b in device_set_realized hw/core/qdev.c:510:13
#5 0x55d5fa325bfb in property_set_bool qom/object.c:2358:5
#6 0x55d5fa31ea45 in object_property_set qom/object.c:1472:5
#7 0x55d5fa332509 in object_property_set_qobject om/qom-qobject.c:28:10
#8 0x55d5fa31f6ed in object_property_set_bool qom/object.c:1541:15
#9 0x55d5fa2e2948 in qdev_realize hw/core/qdev.c:292:12
#10 0x55d5f8eed3f1 in qdev_device_add_from_qdict 
system/qdev-monitor.c:719:10
#11 0x55d5f8eef7ff in qdev_device_add system/qdev-monitor.c:738:11
#12 0x55d5f8f211f0 in device_init_func system/vl.c:1200:11
#13 0x55d5fad0877d in qemu_opts_foreach util/qemu-option.c:1135:14
#14 0x55d5f8f0df9c in qemu_create_cli_devices system/vl.c:2638:5
#15 0x55d5f8f0db24 in qmp_x_exit_preconfig system/vl.c:2706:5
#16 0x55d5f8f14dc0 in qemu_init system/vl.c:3737:9
...
SUMMARY: AddressSanitizer: heap-buffer-overflow hw/sd/sdhci.c:564:39
in sdhci_write_dataport

Cc: qemu-sta...@nongnu.org
Fixes: d7dfca0807 ("hw/sdhci: introduce standard SD host controller")
Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58813
Reported-by: Alexander Bulekov 
Reported-by: Chuhong Yuan 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index c5e0bc018b..2dd88fa139 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -552,7 +552,7 @@ static void sdhci_write_block_to_card(SDHCIState *s)
  * register */
 static void sdhci_write_dataport(SDHCIState *s, uint32_t value, unsigned size)
 {
-unsigned i;
+unsigned i, available;
 
 /* Check that there is free space left in a buffer */
 if (!(s->prnsts & SDHC_SPACE_AVAILABLE)) {
@@ -560,6 +560,14 @@ static void sdhci_write_dataport(SDHCIState *s, uint32_t 
value, unsigned size)
 return;
 }
 
+available = s->buf_maxsz - s->data_count;
+if (size > available) {
+qemu_log_mask(LOG_GUEST_ERROR, "SDHC buffer data full (size: 
%"PRIu32")"
+   " discarding %u byte%s\n",
+   s->buf_maxsz, size - available,
+