Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()
On Mon, Apr 08, 2024 at 10:03:15AM +0200, David Hildenbrand wrote:
On 08.04.24 09:58, Stefano Garzarella wrote:
On Thu, Apr 04, 2024 at 04:09:34PM +0200, David Hildenbrand wrote:
On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object. A POSIX
shared memory object allows creating a memory backend with an associated
file descriptor that can be shared with external processes (e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only the fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json                      |  17 +
 backends/hostmem-shm.c             | 118 +
 backends/meson.build               |   1 +
 qemu-options.hx                    |  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main

diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
     '*hugetlbsize': 'size',
     '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+

Acked-by: David Hildenbrand

One comment: we should maybe just forbid setting share=off. It doesn't
make any sense and it can even result in an unexpected double memory
consumption. We missed doing that for memfd, unfortunately.

Good point! IIUC the `share` property is defined by the parent
`hostmem`, so I should find a way to override the property here and
disable the setter, or add an option to `hostmem` to make the property
non-writable.

Right, or simply fail later when you would find "share=off" in
shm_backend_memory_alloc(). 

This seems like the simplest and cleanest approach, I'll go in this
direction!

Whenever we support a named shm_open(), it could make sense for VM
snapshotting. Right now it doesn't really make any sense.

Yeah, I see.

Thanks,
Stefano
Re: [PATCH for-9.1 v3 00/11] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)
FYI I'll be on PTO till May 2nd; I'll send the v4 as soon as I'm back.

Thanks,
Stefano

On Thu, Apr 04, 2024 at 02:23:19PM +0200, Stefano Garzarella wrote:

v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/

v3:
- rebased on v9.0.0-rc2
- patch 4: avoid setting the fd non-blocking for messages where we have
  a memory fd (Eric)
- patch 9: enriched commit message and documentation to highlight that
  we want to mimic memfd (David)

The vhost-user protocol is not really Linux-specific, so let's try to
support QEMU's frontends and backends (including libvhost-user) on any
POSIX system with this series. The main use case is to be able to use
virtio devices that are not built into QEMU (e.g. virtiofsd,
vhost-user-vsock, etc.) even on non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS
or FreeBSD that could go in even independently of this series.
Patches 6, 7, and 8 enable building of frontends and backends
(including libvhost-user) with the associated code changes needed to
compile. Patch 9 adds `memory-backend-shm`, which uses the POSIX
shm_open() API to create shared memory identified by an fd that can be
shared with vhost-user backends. This is useful on those systems (like
macOS) where we don't have memfd_create() or special filesystems like
"/dev/shm". Patches 10 and 11 use `memory-backend-shm` in some
vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered
those problems after testing patches 6-9, so I have included them in
this series for now. Please let me know if you prefer that I send them
separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39
(x86_64) in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
    -b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
    --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
    -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
    -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
    -device ramfb -device usb-ehci -device usb-kbd \
    -object memory-backend-shm,id=mem,size=512M \
    -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
    -object memory-backend-shm,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Thanks,
Stefano

Stefano Garzarella (11):
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst        |   5 +-
 meson.build                               |   5 +-
 qapi/qom.json                             |  17 
 subprojects/libvhost-user/libvhost-user.h |   2 +-
 backends/hostmem-shm.c                    | 118 ++
 contrib/vhost-user-blk/vhost-user-blk.c   |  23 -
 hw/net/vhost_net.c                        |   5 +
 subprojects/libvhost-user/libvhost-user.c |  76 +-
 tests/qtest/vhost-user-blk-test.c         |   2 +-
 tests/qtest/vhost-user-test.c             |  23 +
 util/vhost-user-server.c                  |  12 +++
 backends/meson.build                      |   1 +
 hw/block/Kconfig                          |   2 +-
 qemu-options.hx                           |  11 ++
 util/meson.build
Re: [PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()
On Thu, Apr 04, 2024 at 04:09:34PM +0200, David Hildenbrand wrote:
On 04.04.24 14:23, Stefano Garzarella wrote:

shm_open() creates and opens a new POSIX shared memory object. A POSIX
shared memory object allows creating a memory backend with an associated
file descriptor that can be shared with external processes (e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only the fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json                      |  17 +
 backends/hostmem-shm.c             | 118 +
 backends/meson.build               |   1 +
 qemu-options.hx                    |  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main

diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
     '*hugetlbsize': 'size',
     '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+

Acked-by: David Hildenbrand

One comment: we should maybe just forbid setting share=off. It doesn't
make any sense and it can even result in an unexpected double memory
consumption. We missed doing that for memfd, unfortunately.

Good point! IIUC the `share` property is defined by the parent
`hostmem`, so I should find a way to override the property here and
disable the setter, or add an option to `hostmem` to make the property
non-writable.

Thanks,
Stefano
Re: [PATCH for-9.1 v3 08/11] contrib/vhost-user-blk: enable it on any POSIX system
On Thu, Apr 04, 2024 at 04:00:38PM +0200, Philippe Mathieu-Daudé wrote:

Hi Stefano,

Hi Phil!

On 4/4/24 14:23, Stefano Garzarella wrote:

Let's make the code more portable by using the "qemu/bswap.h" API and
adding defines from block/file-posix.c to support O_DIRECT on other
systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Signed-off-by: Stefano Garzarella
---
 meson.build                             |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 19 +-
 util/meson.build                        |  4 +++-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..462e584857 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
@@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
     req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
     in_num--;
 
-    type = le32toh(req->out->type);
+    type = le32_to_cpu(req->out->type);
     switch (type & ~VIRTIO_BLK_T_BARRIER) {
     case VIRTIO_BLK_T_IN:
     case VIRTIO_BLK_T_OUT: {
         ssize_t ret = 0;
         bool is_write = type & VIRTIO_BLK_T_OUT;
-        req->sector_num = le64toh(req->out->sector);
+        req->sector_num = le64_to_cpu(req->out->sector);
         if (is_write) {
             ret = vub_writev(req, &elem->out_sg[1], out_num);
         } else {

Can we switch to the bswap API in a preliminary patch,

Sure, I tried to minimize the patches because it's already big, but I
can split this.

converting all the source files?

What do you mean by "all the source files"? "le64toh" is used here and
in some subprojects (e.g. libvduse, libvhost-user), where IIUC we can't
use QEMU's bswap.h because we don't want to introduce a dependency on
the QEMU code.

BTW I'll check for other *toh() usages in QEMU code and change them in
the preliminary patch you suggested to add.

Thanks for the review,
Stefano
[PATCH for-9.1 v3 03/11] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
libvhost-user will panic when receiving a VHOST_USER_GET_INFLIGHT_FD
message if MFD_ALLOW_SEALING is not defined, since it's not able to
create a memfd.

VHOST_USER_GET_INFLIGHT_FD is used only if
VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask
that feature if the backend is not able to properly handle these
messages.

Signed-off-by: Stefano Garzarella
---
 subprojects/libvhost-user/libvhost-user.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c
index a11afd1960..1c361ffd51 100644
--- a/subprojects/libvhost-user/libvhost-user.c
+++ b/subprojects/libvhost-user/libvhost-user.c
@@ -1674,6 +1674,16 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg)
         features |= dev->iface->get_protocol_features(dev);
     }
 
+    /*
+     * If MFD_ALLOW_SEALING is not defined, we are not able to handle
+     * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd.
+     * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD
+     * is negotiated. A device implementation can enable it, so let's mask
+     * it to avoid a runtime panic.
+     */
+#ifndef MFD_ALLOW_SEALING
+    features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
+#endif
     vmsg_set_reply_u64(vmsg, features);
     return true;
 }
-- 
2.44.0
[PATCH for-9.1 v3 11/11] tests/qtest/vhost-user-test: add a test case for memory-backend-shm
`memory-backend-shm` can be used with vhost-user devices, so let's add
a new test case for it.

Signed-off-by: Stefano Garzarella
---
 tests/qtest/vhost-user-test.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d4e437265f..8c1d903b2a 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -44,6 +44,8 @@
     "mem-path=%s,share=on -numa node,memdev=mem"
 #define QEMU_CMD_MEMFD  " -m %d -object memory-backend-memfd,id=mem,size=%dM," \
                         " -numa node,memdev=mem"
+#define QEMU_CMD_SHM    " -m %d -object memory-backend-shm,id=mem,size=%dM," \
+                        " -numa node,memdev=mem"
 #define QEMU_CMD_CHR    " -chardev socket,id=%s,path=%s%s"
 #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on"
 
@@ -195,6 +197,7 @@ enum test_memfd {
     TEST_MEMFD_AUTO,
     TEST_MEMFD_YES,
     TEST_MEMFD_NO,
+    TEST_MEMFD_SHM,
 };
 
 static void append_vhost_net_opts(TestServer *s, GString *cmd_line,
@@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString *cmd_line,
     if (memfd == TEST_MEMFD_YES) {
         g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size);
+    } else if (memfd == TEST_MEMFD_SHM) {
+        g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size);
     } else {
         const char *root = init_hugepagefs() ? : server->tmpfs;
@@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg)
     return server;
 }
 
+static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg)
+{
+    TestServer *server = test_server_new("vhost-user-test", arg);
+    test_server_listen(server);
+
+    append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM);
+    server->vu_ops->append_opts(server, cmd_line, "");
+
+    g_test_queue_destroy(vhost_user_test_cleanup, server);
+
+    return server;
+}
+
 static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc)
 {
     TestServer *server = arg;
@@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void)
                  "virtio-net", test_read_guest_mem, &opts);
 
+    opts.before = vhost_user_test_setup_shm;
+    qos_add_test("vhost-user/read-guest-mem/shm",
+                 "virtio-net",
+                 test_read_guest_mem, &opts);
+
     if (qemu_memfd_check(MFD_ALLOW_SEALING)) {
         opts.before = vhost_user_test_setup_memfd;
         qos_add_test("vhost-user/read-guest-mem/memfd",
-- 
2.44.0
[PATCH for-9.1 v3 06/11] vhost-user: enable frontends on any POSIX system
The vhost-user protocol is not really Linux-specific, so let's enable
vhost-user frontends for any POSIX system.

In vhost_net.c we use VHOST_FILE_UNBIND, which is defined in a
Linux-specific header; let's define it for other systems as well.

Signed-off-by: Stefano Garzarella
---
 meson.build        | 1 -
 hw/net/vhost_net.c | 5 +
 hw/block/Kconfig   | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index c9c3217ba4..c19d51501a 100644
--- a/meson.build
+++ b/meson.build
@@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \
 
 # vhost
 have_vhost_user = get_option('vhost_user') \
-  .disable_auto_if(host_os != 'linux') \
   .require(host_os != 'windows',
            error_message: 'vhost-user is not available on Windows').allowed()
 have_vhost_vdpa = get_option('vhost_vdpa') \

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..fced429813 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -34,8 +34,13 @@
 #include "standard-headers/linux/virtio_ring.h"
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/virtio-bus.h"
+#if defined(__linux__)
 #include "linux-headers/linux/vhost.h"
+#endif
+
+#ifndef VHOST_FILE_UNBIND
+#define VHOST_FILE_UNBIND -1
+#endif
 
 /* Features supported by host kernel. */
 static const int kernel_feature_bits[] = {

diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 9e8f28f982..29ee09e434 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -40,7 +40,7 @@ config VHOST_USER_BLK
     bool
     # Only PCI devices are provided for now
     default y if VIRTIO_PCI
-    depends on VIRTIO && VHOST_USER && LINUX
+    depends on VIRTIO && VHOST_USER
 
 config SWIM
     bool
-- 
2.44.0
[PATCH for-9.1 v3 09/11] hostmem: add a new memory backend based on POSIX shm_open()
shm_open() creates and opens a new POSIX shared memory object. A POSIX
shared memory object allows creating a memory backend with an associated
file descriptor that can be shared with external processes (e.g. vhost-user).

The new `memory-backend-shm` can be used as an alternative when
`memory-backend-memfd` is not available (Linux only), since shm_open()
should be provided by any POSIX-compliant operating system.

This backend mimics memfd, allocating memory that is practically
anonymous. In theory shm_open() requires a name, but this is allocated
for a short time interval and shm_unlink() is called right after
shm_open(). After that, only the fd is shared with external processes
(e.g., vhost-user) as if it were associated with anonymous memory.

In the future we may also allow the user to specify the name to be
passed to shm_open(), but for now we keep the backend simple, mimicking
anonymous memory such as memfd.

Signed-off-by: Stefano Garzarella
---
v3
- enriched commit message and documentation to highlight that we
  want to mimic memfd (David)
---
 docs/system/devices/vhost-user.rst |   5 +-
 qapi/qom.json                      |  17 +
 backends/hostmem-shm.c             | 118 +
 backends/meson.build               |   1 +
 qemu-options.hx                    |  11 +++
 5 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 backends/hostmem-shm.c

diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst
index 9b2da106ce..35259d8ec7 100644
--- a/docs/system/devices/vhost-user.rst
+++ b/docs/system/devices/vhost-user.rst
@@ -98,8 +98,9 @@ Shared memory object
 In order for the daemon to access the VirtIO queues to process the
 requests it needs access to the guest's address space. This is
-achieved via the ``memory-backend-file`` or ``memory-backend-memfd``
-objects. A reference to a file-descriptor which can access this object
+achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or
+``memory-backend-shm`` objects.
+A reference to a file-descriptor which can access this object
 will be passed via the socket as part of the protocol negotiation.
 
 Currently the shared memory object needs to match the size of the main

diff --git a/qapi/qom.json b/qapi/qom.json
index 85e6b4f84a..5252ec69e3 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -721,6 +721,19 @@
     '*hugetlbsize': 'size',
     '*seal': 'bool' } }
 
+##
+# @MemoryBackendShmProperties:
+#
+# Properties for memory-backend-shm objects.
+#
+# The @share boolean option is true by default with shm.
+#
+# Since: 9.1
+##
+{ 'struct': 'MemoryBackendShmProperties',
+  'base': 'MemoryBackendProperties',
+  'data': { } }
+
 ##
 # @MemoryBackendEpcProperties:
 #
@@ -976,6 +989,8 @@
     { 'name': 'memory-backend-memfd',
       'if': 'CONFIG_LINUX' },
     'memory-backend-ram',
+    { 'name': 'memory-backend-shm',
+      'if': 'CONFIG_POSIX' },
     'pef-guest',
     { 'name': 'pr-manager-helper',
       'if': 'CONFIG_LINUX' },
@@ -1047,6 +1062,8 @@
     'memory-backend-memfd':  { 'type': 'MemoryBackendMemfdProperties',
                                'if': 'CONFIG_LINUX' },
     'memory-backend-ram':    'MemoryBackendProperties',
+    'memory-backend-shm':    { 'type': 'MemoryBackendShmProperties',
+                               'if': 'CONFIG_POSIX' },
     'pr-manager-helper':     { 'type': 'PrManagerHelperProperties',
                                'if': 'CONFIG_LINUX' },
     'qtest':                 'QtestProperties',

diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c
new file mode 100644
index 00..7595204d29
--- /dev/null
+++ b/backends/hostmem-shm.c
@@ -0,0 +1,118 @@
+/*
+ * QEMU host POSIX shared memory object backend
+ *
+ * Copyright (C) 2024 Red Hat Inc
+ *
+ * Authors:
+ *   Stefano Garzarella
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/hostmem.h"
+#include "qapi/error.h"
+
+#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm"
+
+OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM)
+
+struct HostMemoryBackendShm {
+    HostMemoryBackend parent_obj;
+};
+
+static bool
+shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
+{
+    g_autoptr(GString) shm_name = g_string_new(NULL);
+    g_autofree char *backend_name = NULL;
+    uint32_t ram_flags;
+    int fd, oflag;
+    mode_t mode;
+
+    if (!backend->size) {
+        error_setg(errp, "can't create backend with size 0");
+        return false;
+    }
+
+    /*
+     * Let's use `mode = 0` because we don't want other processes to open our
+     * memory unless we share the file descriptor with them.
+     */
+    mode = 0;
+    oflag = O_RDWR | O_CREAT | O_EXCL;
+    backend_name = host_mem
[PATCH for-9.1 v3 08/11] contrib/vhost-user-blk: enable it on any POSIX system
Let's make the code more portable by using the "qemu/bswap.h" API and
adding defines from block/file-posix.c to support O_DIRECT on other
systems (e.g. macOS).

vhost-user-server.c is a dependency, let's enable it for any POSIX
system.

Signed-off-by: Stefano Garzarella
---
 meson.build                             |  2 --
 contrib/vhost-user-blk/vhost-user-blk.c | 19 +-
 util/meson.build                        |  4 +++-
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index 3197a2f62e..b541e5c875 100644
--- a/meson.build
+++ b/meson.build
@@ -1956,8 +1956,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_
 has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix)
 
 have_vhost_user_blk_server = get_option('vhost_user_blk_server') \
-  .require(host_os == 'linux',
-           error_message: 'vhost_user_blk_server requires linux') \
   .require(have_vhost_user,
            error_message: 'vhost_user_blk_server requires vhost-user support') \
   .disable_auto_if(not have_tools and not have_system) \

diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c
index a8ab9269a2..462e584857 100644
--- a/contrib/vhost-user-blk/vhost-user-blk.c
+++ b/contrib/vhost-user-blk/vhost-user-blk.c
@@ -16,6 +16,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bswap.h"
 #include "standard-headers/linux/virtio_blk.h"
 #include "libvhost-user-glib.h"
@@ -24,6 +25,20 @@
 #include
 #endif
 
+/* OS X does not have O_DSYNC */
+#ifndef O_DSYNC
+#ifdef O_SYNC
+#define O_DSYNC O_SYNC
+#elif defined(O_FSYNC)
+#define O_DSYNC O_FSYNC
+#endif
+#endif
+
+/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */
+#ifndef O_DIRECT
+#define O_DIRECT O_DSYNC
+#endif
+
 enum {
     VHOST_USER_BLK_MAX_QUEUES = 8,
 };
@@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk,
     req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base;
     in_num--;
 
-    type = le32toh(req->out->type);
+    type = le32_to_cpu(req->out->type);
     switch (type & ~VIRTIO_BLK_T_BARRIER) {
     case VIRTIO_BLK_T_IN:
     case VIRTIO_BLK_T_OUT: {
         ssize_t ret = 0;
         bool is_write = type & VIRTIO_BLK_T_OUT;
-        req->sector_num = le64toh(req->out->sector);
+        req->sector_num = le64_to_cpu(req->out->sector);
         if (is_write) {
             ret = vub_writev(req, &elem->out_sg[1], out_num);
         } else {

diff --git a/util/meson.build b/util/meson.build
index 0ef9886be0..f52682ce96 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -113,10 +113,12 @@ if have_block
     util_ss.add(files('filemonitor-stub.c'))
   endif
   if host_os == 'linux'
-    util_ss.add(files('vhost-user-server.c'), vhost_user)
     util_ss.add(files('vfio-helpers.c'))
     util_ss.add(files('chardev_open.c'))
   endif
+  if host_os != 'windows'
+    util_ss.add(files('vhost-user-server.c'), vhost_user)
+  endif
 endif
 
 if cpu == 'aarch64'
-- 
2.44.0
[PATCH for-9.1 v3 04/11] vhost-user-server: do not set memory fd non-blocking
In vhost-user-server we set all fds received from the other peer in
non-blocking mode. For some of them (e.g. memfd, shm_open, etc.) it's
not really needed, because we don't use these fds with blocking
operations, but only to map memory.

In addition, on some systems this operation can fail (e.g. on macOS,
setting an fd returned by shm_open() non-blocking fails with
errno = ENOTTY).

So, let's avoid setting the fd non-blocking for those messages that we
know carry memory fds (e.g. VHOST_USER_ADD_MEM_REG,
VHOST_USER_SET_MEM_TABLE).

Signed-off-by: Stefano Garzarella
---
v3:
- avoiding setting fd non-blocking for messages where we have
  memory fd (Eric)
---
 util/vhost-user-server.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c
index 3bfb1ad3ec..b19229074a 100644
--- a/util/vhost-user-server.c
+++ b/util/vhost-user-server.c
@@ -65,6 +65,18 @@ static void vmsg_close_fds(VhostUserMsg *vmsg)
 static void vmsg_unblock_fds(VhostUserMsg *vmsg)
 {
     int i;
+
+    /*
+     * These messages carry fds used to map memory, not to send/receive
+     * messages, so this operation is useless. In addition, in some systems
+     * this operation can fail (e.g. in macOS setting an fd returned by
+     * shm_open() non-blocking fails with errno = ENOTTY)
+     */
+    if (vmsg->request == VHOST_USER_ADD_MEM_REG ||
+        vmsg->request == VHOST_USER_SET_MEM_TABLE) {
+        return;
+    }
+
     for (i = 0; i < vmsg->fd_num; i++) {
         qemu_socket_set_nonblock(vmsg->fds[i]);
     }
-- 
2.44.0
[PATCH for-9.1 v3 00/11] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)
v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/
v2: https://patchew.org/QEMU/20240326133936.125332-1-sgarz...@redhat.com/

v3:
- rebased on v9.0.0-rc2
- patch 4: avoid setting the fd non-blocking for messages where we have
  a memory fd (Eric)
- patch 9: enriched commit message and documentation to highlight that
  we want to mimic memfd (David)

The vhost-user protocol is not really Linux-specific, so let's try to
support QEMU's frontends and backends (including libvhost-user) on any
POSIX system with this series. The main use case is to be able to use
virtio devices that are not built into QEMU (e.g. virtiofsd,
vhost-user-vsock, etc.) even on non-Linux systems.

The first 5 patches are more like fixes discovered at runtime on macOS
or FreeBSD that could go in even independently of this series.
Patches 6, 7, and 8 enable building of frontends and backends
(including libvhost-user) with the associated code changes needed to
compile. Patch 9 adds `memory-backend-shm`, which uses the POSIX
shm_open() API to create shared memory identified by an fd that can be
shared with vhost-user backends. This is useful on those systems (like
macOS) where we don't have memfd_create() or special filesystems like
"/dev/shm". Patches 10 and 11 use `memory-backend-shm` in some
vhost-user tests.

Maybe the first 5 patches can go separately, but I only discovered
those problems after testing patches 6-9, so I have included them in
this series for now. Please let me know if you prefer that I send them
separately.

I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4
(aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39
(x86_64) in this way:

- Start vhost-user-blk or QSD (same commands for all systems)

  vhost-user-blk -s /tmp/vhost.socket \
    -b Fedora-Cloud-Base-39-1.5.x86_64.raw

  qemu-storage-daemon \
    --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \
    --blockdev qcow2,file=file,node-name=qcow2 \
    --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on

- macOS (aarch64): start QEMU (using hvf accelerator)

  qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \
    -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \
    -device virtio-net-device,netdev=net0 -netdev user,id=net0 \
    -device ramfb -device usb-ehci -device usb-kbd \
    -object memory-backend-shm,id=mem,size=512M \
    -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available)

  qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \
    -object memory-backend-shm,id=mem,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

- Fedora (x86_64): start QEMU (using kvm accelerator)

  qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \
    -object memory-backend-shm,size="512M" \
    -device vhost-user-blk-pci,num-queues=1,chardev=char0 \
    -chardev socket,id=char0,path=/tmp/vhost.socket

Branch pushed (and CI started) at
https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads

Thanks,
Stefano

Stefano Garzarella (11):
  libvhost-user: set msg.msg_control to NULL when it is empty
  libvhost-user: fail vu_message_write() if sendmsg() is failing
  libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
  vhost-user-server: do not set memory fd non-blocking
  contrib/vhost-user-blk: fix bind() using the right size of the address
  vhost-user: enable frontends on any POSIX system
  libvhost-user: enable it on any POSIX system
  contrib/vhost-user-blk: enable it on any POSIX system
  hostmem: add a new memory backend based on POSIX shm_open()
  tests/qtest/vhost-user-blk-test: use memory-backend-shm
  tests/qtest/vhost-user-test: add a test case for memory-backend-shm

 docs/system/devices/vhost-user.rst        |   5 +-
 meson.build                               |   5 +-
 qapi/qom.json                             |  17 
 subprojects/libvhost-user/libvhost-user.h |   2 +-
 backends/hostmem-shm.c                    | 118 ++
 contrib/vhost-user-blk/vhost-user-blk.c   |  23 -
 hw/net/vhost_net.c                        |   5 +
 subprojects/libvhost-user/libvhost-user.c |  76 +-
 tests/qtest/vhost-user-blk-test.c         |   2 +-
 tests/qtest/vhost-user-test.c             |  23 +
 util/vhost-user-server.c                  |  12 +++
 backends/meson.build                      |   1 +
 hw/block/Kconfig                          |   2 +-
 qemu-options.hx                           |  11 ++
 util/meson.build                          |   4 +-
 15 files changed, 288 insertions(+), 18 deletions(-)
 create mode 100644 backends/hostmem-shm.c

-- 
2.44.0
[PATCH for-9.1 v3 07/11] libvhost-user: enable it on any POSIX system
The vhost-user protocol is not really Linux-specific, so let's enable libvhost-user for any POSIX system. Compiling it on macOS and FreeBSD, some problems came up: - avoid including linux/vhost.h, which is available only on Linux (vhost_types.h contains many of the things we need) - macOS doesn't provide sys/endian.h, so let's define the conversion macros ourselves (note: libvhost-user doesn't include QEMU's headers, so we can't use "qemu/bswap.h") - define eventfd_[write|read] as write/read wrappers when the system doesn't provide them (e.g. macOS) - copy the F_SEAL_* defines from include/qemu/memfd.h to make the code work on FreeBSD, where MFD_ALLOW_SEALING is defined - define MAP_NORESERVE if it's not defined (e.g. on FreeBSD) Signed-off-by: Stefano Garzarella --- meson.build | 2 +- subprojects/libvhost-user/libvhost-user.h | 2 +- subprojects/libvhost-user/libvhost-user.c | 60 +-- 3 files changed, 59 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index c19d51501a..3197a2f62e 100644 --- a/meson.build +++ b/meson.build @@ -3194,7 +3194,7 @@ endif config_host_data.set('CONFIG_FDT', fdt.found()) vhost_user = not_found -if host_os == 'linux' and have_vhost_user +if have_vhost_user libvhost_user = subproject('libvhost-user') vhost_user = libvhost_user.get_variable('vhost_user_dep') endif diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h index deb40e77b3..e13e1d3931 100644 --- a/subprojects/libvhost-user/libvhost-user.h +++ b/subprojects/libvhost-user/libvhost-user.h @@ -18,9 +18,9 @@ #include #include #include -#include #include #include "standard-headers/linux/virtio_ring.h" +#include "standard-headers/linux/vhost_types.h" /* Based on qemu/hw/virtio/vhost-user.c */ #define VHOST_USER_F_PROTOCOL_FEATURES 30 diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 1c361ffd51..03edb4bf64 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@ #include #include #include -#include #include -#include /* Necessary to provide VIRTIO_F_VERSION_1 on system * with older linux headers. Must appear before @@ -39,8 +37,8 @@ #include "standard-headers/linux/virtio_config.h" #if defined(__linux__) +#include #include -#include #include #include #include @@ -52,6 +50,62 @@ #endif +#if defined(__APPLE__) && (__MACH__) +#include +#define htobe16(x) OSSwapHostToBigInt16(x) +#define htole16(x) OSSwapHostToLittleInt16(x) +#define be16toh(x) OSSwapBigToHostInt16(x) +#define le16toh(x) OSSwapLittleToHostInt16(x) + +#define htobe32(x) OSSwapHostToBigInt32(x) +#define htole32(x) OSSwapHostToLittleInt32(x) +#define be32toh(x) OSSwapBigToHostInt32(x) +#define le32toh(x) OSSwapLittleToHostInt32(x) + +#define htobe64(x) OSSwapHostToBigInt64(x) +#define htole64(x) OSSwapHostToLittleInt64(x) +#define be64toh(x) OSSwapBigToHostInt64(x) +#define le64toh(x) OSSwapLittleToHostInt64(x) +#endif + +#ifdef CONFIG_EVENTFD +#include +#else +#define eventfd_t uint64_t + +int eventfd_write(int fd, eventfd_t value) +{ +return (write(fd, , sizeof(value)) == sizeof(value)) ? 0 : -1; +} + +int eventfd_read(int fd, eventfd_t *value) +{ +return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1; +} +#endif + +#ifdef MFD_ALLOW_SEALING +#include + +#ifndef F_LINUX_SPECIFIC_BASE +#define F_LINUX_SPECIFIC_BASE 1024 +#endif + +#ifndef F_ADD_SEALS +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9) +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10) + +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */ +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ +#define F_SEAL_GROW 0x0004 /* prevent file from growing */ +#define F_SEAL_WRITE0x0008 /* prevent writes */ +#endif +#endif + +#ifndef MAP_NORESERVE +#define MAP_NORESERVE 0 +#endif + #include "include/atomic.h" #include "libvhost-user.h" -- 2.44.0
[PATCH for-9.1 v3 05/11] contrib/vhost-user-blk: fix bind() using the right size of the address
On macOS, passing the `-s /tmp/vhost.socket` parameter to the vhost-user-blk application, bind() was done on the `/tmp/vhost.socke` pathname, missing the last character. This sounds like one of the portability problems described in the unix(7) manpage: Pathname sockets When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding: • The pathname in sun_path should be null-terminated. • The length of the pathname, including the terminating null byte, should not exceed the size of sun_path. • The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least: offsetof(struct sockaddr_un, sun_path) + strlen(addr.sun_path)+1 or, more simply, addrlen can be specified as sizeof(struct sockaddr_un). So let's follow the last advice and simplify the code as well. Signed-off-by: Stefano Garzarella --- contrib/vhost-user-blk/vhost-user-blk.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index 89e5f11a64..a8ab9269a2 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn) { int sock; struct sockaddr_un un; -size_t len; assert(unix_fn); @@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn) un.sun_family = AF_UNIX; (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn); -len = sizeof(un.sun_family) + strlen(un.sun_path); (void)unlink(unix_fn); -if (bind(sock, (struct sockaddr *)&un, len) < 0) { +if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) { perror("bind"); goto fail; } -- 2.44.0
[PATCH for-9.1 v3 10/11] tests/qtest/vhost-user-blk-test: use memory-backend-shm
`memory-backend-memfd` is available only on Linux while the new `memory-backend-shm` can be used on any POSIX-compliant operating system. Let's use it so we can run the test in multiple environments. Signed-off-by: Stefano Garzarella --- tests/qtest/vhost-user-blk-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c index 117b9acd10..e945f6abf2 100644 --- a/tests/qtest/vhost-user-blk-test.c +++ b/tests/qtest/vhost-user-blk-test.c @@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances, vhost_user_blk_bin); g_string_append_printf(cmd_line, -" -object memory-backend-memfd,id=mem,size=256M,share=on " +" -object memory-backend-shm,id=mem,size=256M,share=on " " -M memory-backend=mem -m 256M "); for (i = 0; i < vus_instances; i++) { -- 2.44.0
[PATCH for-9.1 v3 01/11] libvhost-user: set msg.msg_control to NULL when it is empty
On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if the `struct msghdr` has the field `msg_controllen` set to 0, but `msg_control` is not NULL. Reviewed-by: Eric Blake Reviewed-by: David Hildenbrand Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index a879149fef..22bea0c775 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -632,6 +632,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize); } else { msg.msg_controllen = 0; +msg.msg_control = NULL; } do { -- 2.44.0
[PATCH for-9.1 v3 02/11] libvhost-user: fail vu_message_write() if sendmsg() is failing
In vu_message_write() we use sendmsg() to send the message header, then a write() to send the payload. If sendmsg() fails we should avoid sending the payload, since we were unable to send the header. Discovered before fixing the issue with the previous patch, where sendmsg() failed on macOS due to wrong parameters, but the frontend still sent the payload which the backend incorrectly interpreted as a wrong header. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 5 + 1 file changed, 5 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 22bea0c775..a11afd1960 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) rc = sendmsg(conn_fd, , 0); } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); +if (rc <= 0) { +vu_panic(dev, "Error while writing: %s", strerror(errno)); +return false; +} + if (vmsg->size) { do { if (vmsg->data) { -- 2.44.0
Re: [PATCH for-9.1 v2 09/11] hostmem: add a new memory backend based on POSIX shm_open()
On Wed, Mar 27, 2024 at 12:51:51PM +0100, David Hildenbrand wrote: On 27.03.24 11:23, Stefano Garzarella wrote: On Tue, Mar 26, 2024 at 03:45:52PM +0100, David Hildenbrand wrote: +mode = 0; +oflag = O_RDWR | O_CREAT | O_EXCL; +backend_name = host_memory_backend_get_name(backend); + +/* + * Some operating systems allow creating anonymous POSIX shared memory + * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not + * defined by POSIX, so let's create a unique name. + * + * From Linux's shm_open(3) man-page: + * For portable use, a shared memory object should be identified + * by a name of the form /somename;" + */ +g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(), +backend_name); Hm, shouldn't we just let the user specify a name, and if no name was specified, generate one ourselves? I thought about it and initially did it that way, but then some problems came up so I tried to keep it as simple as possible for the user and for our use case (having an fd associated with memory and sharing it with other processes). The problems I had were: - what mode_t to use if the object does not exist and needs to be created? > - exclude O_EXCL if the user passes the name since they may have already created it? I'd handle both like we handle files. But I understand now that you really just want to "simulate" memfd. Right, maybe I should write that in the commit message and documentation. - call shm_unlink() only at object deallocation? Right, we don't really do that for ordinary files, they keep existing. We only have the "discard-data" property to free up memory. For memfd, it just happens "automatically". They cannot be looked up by name, and once the last user closes the fd, it gets destroyed. Yep, I see. So I thought that for now we only support the "anonymous" mode, and if in the future we have a use case where the user wants to specify the name, we can add it later. 
Okay, so for now you really only want an anonymous fd that behaves like a memfd, got it. Likely we should somehow fail if the "POSIX shared memory object" already exists. Hmm. `O_CREAT | O_EXCL` should provide just this behavior. From shm_open(3) manpage: O_EXCL If O_CREAT was also specified, and a shared memory object with the given name already exists, return an error. The check for the existence of the object, and its creation if it does not exist, are performed atomically. That said, if you think it's already useful from the beginning, I can add the name as an optional parameter. I'm also not quite sure if "host_memory_backend_get_name()" should be used for the purpose here. What problem do you see? As an alternative I thought of a static counter. If it's really just an "internal UUID", as we'll remove the file using shm_unlink(), then the name in fact shouldn't matter and this would be fine. Correct? Right. It's a name that will only "appear" in the system for a very short time since we call shm_unlink() right after shm_open(). So I just need the unique name to prevent several QEMUs launched at the same time from colliding with the name. So I assume if we ever want to have non-anonymous fds here, we'd pass in a new property "name", and change the logic how we open/unlink. Makes sense. Exactly! I can clarify this in the commit description as well. Thanks for the helpful comments! If there is anything I need to change for v3, please tell me ;-) Stefano
Re: vhost-user-blk reconnect issue
Hi Yajun, On Mon, Mar 25, 2024 at 10:54:13AM +, Yajun Wu wrote: Hi experts, With the latest QEMU (8.2.90), we found two vhost-user-blk backend reconnect failure scenarios: Do you know if it has ever worked (in which case this is a regression), or have we always had this problem? Thanks, Stefano 1. Disconnect the vhost-user-blk backend before the guest driver probes the vblk device, then reconnect the backend after the guest driver has probed the device. QEMU won't send out any vhost messages to restore the backend. This is because vhost->vdev is NULL before the guest driver probes the vblk device, so vhost_user_blk_disconnect won't be called and s->connected is still true. The next vhost_user_blk_connect will simply return without doing anything. 2. modprobe -r virtio-blk inside the VM, then disconnect the backend, then reconnect the backend, then modprobe virtio-blk. QEMU won't send messages in vhost_dev_init. This is because rmmod makes QEMU call vhost_user_blk_stop, vhost->vdev also becomes NULL (in vhost_dev_stop), and vhost_user_blk_disconnect won't be called. Again s->connected is still true, even though the chr connection is closed. I think that even if vhost->vdev is NULL, vhost_user_blk_disconnect should be called when the chr connection closes? Hope we can have a fix soon. Thanks, Yajun
Re: [PATCH for-9.1 v2 09/11] hostmem: add a new memory backend based on POSIX shm_open()
On Tue, Mar 26, 2024 at 03:45:52PM +0100, David Hildenbrand wrote: +mode = 0; +oflag = O_RDWR | O_CREAT | O_EXCL; +backend_name = host_memory_backend_get_name(backend); + +/* + * Some operating systems allow creating anonymous POSIX shared memory + * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not + * defined by POSIX, so let's create a unique name. + * + * From Linux's shm_open(3) man-page: + * For portable use, a shared memory object should be identified + * by a name of the form /somename;" + */ +g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(), +backend_name); Hm, shouldn't we just let the user specify a name, and if no name was specified, generate one ourselves? I thought about it and initially did it that way, but then some problems came up so I tried to keep it as simple as possible for the user and for our use case (having an fd associated with memory and sharing it with other processes). The problems I had were: - what mode_t to use if the object does not exist and needs to be created? - exclude O_EXCL if the user passes the name since they may have already created it? - call shm_unlink() only at object deallocation? So I thought that for now we only support the "anonymous" mode, and if in the future we have a use case where the user wants to specify the name, we can add it later. That said, if you think it's already useful from the beginning, I can add the name as an optional parameter. I'm also not quite sure if "host_memory_backend_get_name()" should be used for the purpose here. What problem do you see? As an alternative I thought of a static counter. Thanks, Stefano
Re: [PATCH for-9.1 v2 04/11] vhost-user-server: don't abort if we can't set fd non-blocking
On Tue, Mar 26, 2024 at 09:40:12AM -0500, Eric Blake wrote: On Tue, Mar 26, 2024 at 02:39:29PM +0100, Stefano Garzarella wrote: In vhost-user-server we set all fds received from the other peer in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.) if we fail, it's not really a problem, because we don't use these fds with blocking operations, but only to map memory. In these cases a failure is not bad, so let's just report a warning instead of panicking if we fail to set some fd in non-blocking mode. This occurs, for example, on macOS, where setting an shm_open() fd non-blocking fails (errno: 25). What is errno 25 on MacOS? It should be ENOTTY. I'll add it in the commit description. Signed-off-by: Stefano Garzarella --- util/vhost-user-server.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index 3bfb1ad3ec..064999f0b7 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -66,7 +66,11 @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg) { int i; for (i = 0; i < vmsg->fd_num; i++) { -qemu_socket_set_nonblock(vmsg->fds[i]); +int ret = qemu_socket_try_set_nonblock(vmsg->fds[i]); +if (ret) { Should this be 'if (ret < 0)'? I was confused by the assert() in qemu_socket_set_nonblock(): void qemu_socket_set_nonblock(int fd) { int f; f = qemu_socket_try_set_nonblock(fd); assert(f == 0); } BTW, I see most of the code checks ret < 0, so I'll fix it. +warn_report("Failed to set fd %d nonblock for request %d: %s", +vmsg->fds[i], vmsg->request, strerror(-ret)); +} This now ignores all errors even on pre-existing fds where we NEED non-blocking, rather than just the specific (expected) error we are seeing on MacOS. Should this code be a bit more precise about checking that -ret == EXXX for the expected errno value we are ignoring for the specific fds where non-blocking is not essential?
Good point, maybe I'll just avoid calling vmsg_unblock_fds() when the message is VHOST_USER_ADD_MEM_REG or VHOST_USER_SET_MEM_TABLE. These should be the cases where carried fds are used for mmap() and so there is no need to mark them non-blocking. WDYT? Stefano
Re: [PATCH for-9.1 v2 03/11] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
On Tue, Mar 26, 2024 at 09:36:54AM -0500, Eric Blake wrote: On Tue, Mar 26, 2024 at 02:39:28PM +0100, Stefano Garzarella wrote: libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD message if MFD_ALLOW_SEALING is not defined, since it's not able to create a memfd. VHOST_USER_GET_INFLIGHT_FD is used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask that feature if the backend is not able to properly handle these messages. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index a11afd1960..1c361ffd51 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -1674,6 +1674,16 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg) features |= dev->iface->get_protocol_features(dev); } +/* + * If MFD_ALLOW_SEALING is not defined, we are not able to handle + * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd. + * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD + * is negotiated. A device implementation can enable it, so let's mask + * it to avoid a runtime panic. + */ +#ifndef MFD_ALLOW_SEALING +features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD); +#endif Masking the feature out of advertisement is obviously correct. But should we also fix the code for handling VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD to return an error to any client that requests it in error when the feature was not advertised, instead of panicking? Totally agree! Do I send a separate patch from this series or include it in this series? I would do the former because this one is already long enough. Thanks, Stefano
Re: [PATCH for-9.1 v2 02/11] libvhost-user: fail vu_message_write() if sendmsg() is failing
On Tue, Mar 26, 2024 at 03:36:52PM +0100, David Hildenbrand wrote: On 26.03.24 15:34, Eric Blake wrote: On Tue, Mar 26, 2024 at 02:39:27PM +0100, Stefano Garzarella wrote: In vu_message_write() we use sendmsg() to send the message header, then a write() to send the payload. If sendmsg() fails we should avoid sending the payload, since we were unable to send the header. Discovered before fixing the issue with the previous patch, where sendmsg() failed on macOS due to wrong parameters, but the frontend still sent the payload which the backend incorrectly interpreted as a wrong header. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 5 + 1 file changed, 5 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 22bea0c775..a11afd1960 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) rc = sendmsg(conn_fd, &msg, 0); } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); +if (rc <= 0) { Is rejecting a 0 return value correct? Technically, a 0 return means a successful write of 0 bytes - but then again, it is unwise to attempt to send an fd or other auxiliary data without at least one regular byte, so we should not be attempting a write of 0 bytes. So I guess this one is okay, although I might have used < instead of <=. I blindly copied what they already do at the end of the same function. However, your observation is correct. That said, here we have a sendmsg() to send the header. After this we have a write() loop to send the payload. Now if sendmsg() returns 0, but we have not sent all the header, what should we do? Try sendmsg() again? How many times? I was wondering if we could see some partial sendmsg()/write succeeding. Meaning, we transferred some bytes but not all, and we'd actually need to loop ...
Yep, true, but I would fix it in another patch/series if you agree. Thanks, Stefano
[PATCH for-9.1 v2 11/11] tests/qtest/vhost-user-test: add a test case for memory-backend-shm
`memory-backend-shm` can be used with vhost-user devices, so let's add a new test case for it. Signed-off-by: Stefano Garzarella --- tests/qtest/vhost-user-test.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c index d4e437265f..8c1d903b2a 100644 --- a/tests/qtest/vhost-user-test.c +++ b/tests/qtest/vhost-user-test.c @@ -44,6 +44,8 @@ "mem-path=%s,share=on -numa node,memdev=mem" #define QEMU_CMD_MEMFD " -m %d -object memory-backend-memfd,id=mem,size=%dM," \ " -numa node,memdev=mem" +#define QEMU_CMD_SHM" -m %d -object memory-backend-shm,id=mem,size=%dM," \ +" -numa node,memdev=mem" #define QEMU_CMD_CHR" -chardev socket,id=%s,path=%s%s" #define QEMU_CMD_NETDEV " -netdev vhost-user,id=hs0,chardev=%s,vhostforce=on" @@ -195,6 +197,7 @@ enum test_memfd { TEST_MEMFD_AUTO, TEST_MEMFD_YES, TEST_MEMFD_NO, +TEST_MEMFD_SHM, }; static void append_vhost_net_opts(TestServer *s, GString *cmd_line, @@ -228,6 +231,8 @@ static void append_mem_opts(TestServer *server, GString *cmd_line, if (memfd == TEST_MEMFD_YES) { g_string_append_printf(cmd_line, QEMU_CMD_MEMFD, size, size); +} else if (memfd == TEST_MEMFD_SHM) { +g_string_append_printf(cmd_line, QEMU_CMD_SHM, size, size); } else { const char *root = init_hugepagefs() ? 
: server->tmpfs; @@ -788,6 +793,19 @@ static void *vhost_user_test_setup_memfd(GString *cmd_line, void *arg) return server; } +static void *vhost_user_test_setup_shm(GString *cmd_line, void *arg) +{ +TestServer *server = test_server_new("vhost-user-test", arg); +test_server_listen(server); + +append_mem_opts(server, cmd_line, 256, TEST_MEMFD_SHM); +server->vu_ops->append_opts(server, cmd_line, ""); + +g_test_queue_destroy(vhost_user_test_cleanup, server); + +return server; +} + static void test_read_guest_mem(void *obj, void *arg, QGuestAllocator *alloc) { TestServer *server = arg; @@ -1081,6 +1099,11 @@ static void register_vhost_user_test(void) "virtio-net", test_read_guest_mem, ); +opts.before = vhost_user_test_setup_shm; +qos_add_test("vhost-user/read-guest-mem/shm", + "virtio-net", + test_read_guest_mem, ); + if (qemu_memfd_check(MFD_ALLOW_SEALING)) { opts.before = vhost_user_test_setup_memfd; qos_add_test("vhost-user/read-guest-mem/memfd", -- 2.44.0
[PATCH for-9.1 v2 04/11] vhost-user-server: don't abort if we can't set fd non-blocking
In vhost-user-server we set all fds received from the other peer in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.) if we fail, it's not really a problem, because we don't use these fds with blocking operations, but only to map memory. In these cases a failure is not bad, so let's just report a warning instead of panicking if we fail to set some fd in non-blocking mode. This occurs, for example, on macOS, where setting an shm_open() fd non-blocking fails (errno: 25). Signed-off-by: Stefano Garzarella --- util/vhost-user-server.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index 3bfb1ad3ec..064999f0b7 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -66,7 +66,11 @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg) { int i; for (i = 0; i < vmsg->fd_num; i++) { -qemu_socket_set_nonblock(vmsg->fds[i]); +int ret = qemu_socket_try_set_nonblock(vmsg->fds[i]); +if (ret) { +warn_report("Failed to set fd %d nonblock for request %d: %s", +vmsg->fds[i], vmsg->request, strerror(-ret)); +} } } -- 2.44.0
[PATCH for-9.1 v2 03/11] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD message if MFD_ALLOW_SEALING is not defined, since it's not able to create a memfd. VHOST_USER_GET_INFLIGHT_FD is used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask that feature if the backend is not able to properly handle these messages. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index a11afd1960..1c361ffd51 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -1674,6 +1674,16 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg) features |= dev->iface->get_protocol_features(dev); } +/* + * If MFD_ALLOW_SEALING is not defined, we are not able to handle + * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd. + * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD + * is negotiated. A device implementation can enable it, so let's mask + * it to avoid a runtime panic. + */ +#ifndef MFD_ALLOW_SEALING +features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD); +#endif vmsg_set_reply_u64(vmsg, features); return true; } -- 2.44.0
[PATCH for-9.1 v2 09/11] hostmem: add a new memory backend based on POSIX shm_open()
shm_open() creates and opens a new POSIX shared memory object. A POSIX shared memory object allows creating memory backend with an associated file descriptor that can be shared with external processes (e.g. vhost-user). The new `memory-backend-shm` can be used as an alternative when `memory-backend-memfd` is not available (Linux only), since shm_open() should be provided by any POSIX-compliant operating system. Signed-off-by: Stefano Garzarella --- docs/system/devices/vhost-user.rst | 5 +- qapi/qom.json | 17 + backends/hostmem-shm.c | 118 + backends/meson.build | 1 + qemu-options.hx| 10 +++ 5 files changed, 149 insertions(+), 2 deletions(-) create mode 100644 backends/hostmem-shm.c diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/vhost-user.rst index 9b2da106ce..35259d8ec7 100644 --- a/docs/system/devices/vhost-user.rst +++ b/docs/system/devices/vhost-user.rst @@ -98,8 +98,9 @@ Shared memory object In order for the daemon to access the VirtIO queues to process the requests it needs access to the guest's address space. This is -achieved via the ``memory-backend-file`` or ``memory-backend-memfd`` -objects. A reference to a file-descriptor which can access this object +achieved via the ``memory-backend-file``, ``memory-backend-memfd``, or +``memory-backend-shm`` objects. +A reference to a file-descriptor which can access this object will be passed via the socket as part of the protocol negotiation. Currently the shared memory object needs to match the size of the main diff --git a/qapi/qom.json b/qapi/qom.json index baae3a183f..88136b861d 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -720,6 +720,19 @@ '*hugetlbsize': 'size', '*seal': 'bool' } } +## +# @MemoryBackendShmProperties: +# +# Properties for memory-backend-shm objects. +# +# The @share boolean option is true by default with shm. 
+# +# Since: 9.1 +## +{ 'struct': 'MemoryBackendShmProperties', + 'base': 'MemoryBackendProperties', + 'data': { } } + ## # @MemoryBackendEpcProperties: # @@ -976,6 +989,8 @@ { 'name': 'memory-backend-memfd', 'if': 'CONFIG_LINUX' }, 'memory-backend-ram', +{ 'name': 'memory-backend-shm', + 'if': 'CONFIG_POSIX' }, 'pef-guest', { 'name': 'pr-manager-helper', 'if': 'CONFIG_LINUX' }, @@ -1047,6 +1062,8 @@ 'memory-backend-memfd': { 'type': 'MemoryBackendMemfdProperties', 'if': 'CONFIG_LINUX' }, 'memory-backend-ram': 'MemoryBackendProperties', + 'memory-backend-shm': { 'type': 'MemoryBackendShmProperties', + 'if': 'CONFIG_POSIX' }, 'pr-manager-helper': { 'type': 'PrManagerHelperProperties', 'if': 'CONFIG_LINUX' }, 'qtest': 'QtestProperties', diff --git a/backends/hostmem-shm.c b/backends/hostmem-shm.c new file mode 100644 index 00..7595204d29 --- /dev/null +++ b/backends/hostmem-shm.c @@ -0,0 +1,118 @@ +/* + * QEMU host POSIX shared memory object backend + * + * Copyright (C) 2024 Red Hat Inc + * + * Authors: + * Stefano Garzarella + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "sysemu/hostmem.h" +#include "qapi/error.h" + +#define TYPE_MEMORY_BACKEND_SHM "memory-backend-shm" + +OBJECT_DECLARE_SIMPLE_TYPE(HostMemoryBackendShm, MEMORY_BACKEND_SHM) + +struct HostMemoryBackendShm { +HostMemoryBackend parent_obj; +}; + +static bool +shm_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) +{ +g_autoptr(GString) shm_name = g_string_new(NULL); +g_autofree char *backend_name = NULL; +uint32_t ram_flags; +int fd, oflag; +mode_t mode; + +if (!backend->size) { +error_setg(errp, "can't create backend with size 0"); +return false; +} + +/* + * Let's use `mode = 0` because we don't want other processes to open our + * memory unless we share the file descriptor with them. 
+ */ +mode = 0; +oflag = O_RDWR | O_CREAT | O_EXCL; +backend_name = host_memory_backend_get_name(backend); + +/* + * Some operating systems allow creating anonymous POSIX shared memory + * objects (e.g. FreeBSD provides the SHM_ANON constant), but this is not + * defined by POSIX, so let's create a unique name. + * + * From Linux's shm_open(3) man-page: + * For portable use, a shared memory object should be identified + * by a name of the form /somename;" + */ +g_string_printf(shm_name, "/qemu-" FMT_pid "-shm-%s", getpid(), +backend_name); + +fd = shm_open(shm_nam
[PATCH for-9.1 v2 06/11] vhost-user: enable frontends on any POSIX system
The vhost-user protocol is not really Linux-specific so let's enable vhost-user frontends for any POSIX system. In vhost_net.c we use VHOST_FILE_UNBIND which is defined in a Linux specific header, let's define it for other systems as well. Signed-off-by: Stefano Garzarella --- meson.build| 1 - hw/net/vhost_net.c | 5 + hw/block/Kconfig | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-) diff --git a/meson.build b/meson.build index c9c3217ba4..c19d51501a 100644 --- a/meson.build +++ b/meson.build @@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \ # vhost have_vhost_user = get_option('vhost_user') \ - .disable_auto_if(host_os != 'linux') \ .require(host_os != 'windows', error_message: 'vhost-user is not available on Windows').allowed() have_vhost_vdpa = get_option('vhost_vdpa') \ diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index e8e1661646..b8a59b9e90 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -34,8 +34,13 @@ #include "standard-headers/linux/virtio_ring.h" #include "hw/virtio/vhost.h" #include "hw/virtio/virtio-bus.h" +#if defined(__linux__) #include "linux-headers/linux/vhost.h" +#endif +#ifndef VHOST_FILE_UNBIND +#define VHOST_FILE_UNBIND -1 +#endif /* Features supported by host kernel. */ static const int kernel_feature_bits[] = { diff --git a/hw/block/Kconfig b/hw/block/Kconfig index 9e8f28f982..29ee09e434 100644 --- a/hw/block/Kconfig +++ b/hw/block/Kconfig @@ -40,7 +40,7 @@ config VHOST_USER_BLK bool # Only PCI devices are provided for now default y if VIRTIO_PCI -depends on VIRTIO && VHOST_USER && LINUX +depends on VIRTIO && VHOST_USER config SWIM bool -- 2.44.0
[PATCH for-9.1 v2 00/11] vhost-user: support any POSIX system (tested on macOS, FreeBSD, OpenBSD)
v1: https://patchew.org/QEMU/20240228114759.44758-1-sgarz...@redhat.com/ v2: - fixed Author email and little typos in some patches - added memory-backend-shm (patches 9, 10, 11) [Daniel, David, Markus] - removed changes to memory-backend-file (ex patch 9) - used memory-backend-shm in tests/qtest/vhost-user-blk-test (patch 10) - added test case in tests/qtest/vhost-user-test (patch 11) The vhost-user protocol is not really Linux-specific, so with this series let's try to support QEMU's frontends and backends (including libvhost-user) on any POSIX system. The main use case is to be able to use virtio devices that we don't have built into QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even on non-Linux systems. The first 5 patches are more like fixes discovered at runtime on macOS or FreeBSD that could go in even independently of this series. Patches 6, 7, and 8 enable building of frontends and backends (including libvhost-user) with the associated code changes needed for compilation to succeed. Patch 9 adds `memory-backend-shm`, which uses the POSIX shm_open() API to create shared memory identified by an fd that can be shared with vhost-user backends. This is useful on those systems (like macOS) where we don't have memfd_create() or special filesystems like "/dev/shm". Patches 10 and 11 use `memory-backend-shm` in some vhost-user tests. Maybe the first 5 patches can go separately, but I only discovered those problems after testing patches 6 - 9, so I have included them in this series for now. Please let me know if you prefer that I send them separately.
I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.4 (aarch64), FreeBSD 14 (x86_64), OpenBSD 7.4 (x86_64), and Fedora 39 (x86_64) in this way: - Start vhost-user-blk or QSD (same commands for all systems) vhost-user-blk -s /tmp/vhost.socket \ -b Fedora-Cloud-Base-39-1.5.x86_64.raw qemu-storage-daemon \ --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \ --blockdev qcow2,file=file,node-name=qcow2 \ --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on - macOS (aarch64): start QEMU (using hvf accelerator) qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \ -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \ -device virtio-net-device,netdev=net0 -netdev user,id=net0 \ -device ramfb -device usb-ehci -device usb-kbd \ -object memory-backend-shm,id=mem,size=512M \ -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket - FreeBSD/OpenBSD (x86_64): start QEMU (no accelerators available) qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \ -object memory-backend-shm,id=mem,size="512M" \ -device vhost-user-blk-pci,num-queues=1,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket - Fedora (x86_64): start QEMU (using kvm accelerator) qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \ -object memory-backend-shm,size="512M" \ -device vhost-user-blk-pci,num-queues=1,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket Branch pushed (and CI started) at https://gitlab.com/sgarzarella/qemu/-/tree/macos-vhost-user?ref_type=heads Thanks, Stefano Stefano Garzarella (11): libvhost-user: set msg.msg_control to NULL when it is empty libvhost-user: fail vu_message_write() if sendmsg() is failing libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported vhost-user-server: don't abort if we can't set fd non-blocking 
contrib/vhost-user-blk: fix bind() using the right size of the address vhost-user: enable frontends on any POSIX system libvhost-user: enable it on any POSIX system contrib/vhost-user-blk: enable it on any POSIX system hostmem: add a new memory backend based on POSIX shm_open() tests/qtest/vhost-user-blk-test: use memory-backend-shm tests/qtest/vhost-user-test: add a test case for memory-backend-shm docs/system/devices/vhost-user.rst| 5 +- meson.build | 5 +- qapi/qom.json | 17 subprojects/libvhost-user/libvhost-user.h | 2 +- backends/hostmem-shm.c| 118 ++ contrib/vhost-user-blk/vhost-user-blk.c | 23 - hw/net/vhost_net.c| 5 + subprojects/libvhost-user/libvhost-user.c | 76 +- tests/qtest/vhost-user-blk-test.c | 2 +- tests/qtest/vhost-user-test.c | 23 + util/vhost-user-server.c | 6 +- backends/meson.build | 1 + hw/block/Kconfig | 2 +- qemu-options.hx | 10 ++ util/meson.build | 4 +- 15 files changed, 280 insertions(+), 19 deletions(-) creat
[PATCH for-9.1 v2 01/11] libvhost-user: set msg.msg_control to NULL when it is empty
On some OSes (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if the `struct msghdr` has the field `msg_controllen` set to 0, but `msg_control` is not NULL. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index a879149fef..22bea0c775 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -632,6 +632,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize); } else { msg.msg_controllen = 0; +msg.msg_control = NULL; } do { -- 2.44.0
[PATCH for-9.1 v2 05/11] contrib/vhost-user-blk: fix bind() using the right size of the address
On macOS, passing the `-s /tmp/vhost.socket` parameter to the vhost-user-blk application, the bind was done on the `/tmp/vhost.socke` pathname, missing the last character. This sounds like one of the portability problems described in the unix(7) manpage: Pathname sockets When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding: • The pathname in sun_path should be null-terminated. • The length of the pathname, including the terminating null byte, should not exceed the size of sun_path. • The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least: offsetof(struct sockaddr_un, sun_path) + strlen(addr.sun_path)+1 or, more simply, addrlen can be specified as sizeof(struct sockaddr_un). So let's follow the last advice and simplify the code as well. Signed-off-by: Stefano Garzarella --- contrib/vhost-user-blk/vhost-user-blk.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index 89e5f11a64..a8ab9269a2 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn) { int sock; struct sockaddr_un un; -size_t len; assert(unix_fn); @@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn) un.sun_family = AF_UNIX; (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn); -len = sizeof(un.sun_family) + strlen(un.sun_path); (void)unlink(unix_fn); -if (bind(sock, (struct sockaddr *)&un, len) < 0) { +if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) { perror("bind"); goto fail; } -- 2.44.0
[PATCH for-9.1 v2 07/11] libvhost-user: enable it on any POSIX system
The vhost-user protocol is not really Linux-specific, so let's enable libvhost-user for any POSIX system. When compiling it on macOS and FreeBSD, some problems came up: - avoid including linux/vhost.h, which is available only on Linux (vhost_types.h contains many of the things we need) - macOS doesn't provide sys/endian.h, so let's define the endian-conversion macros ourselves (note: libvhost-user doesn't include QEMU's headers, so we can't use "qemu/bswap.h") - define eventfd_[write|read] as write/read wrappers when the system doesn't provide those (e.g. macOS) - copy the SEAL defines from include/qemu/memfd.h to make the code work on FreeBSD, where MFD_ALLOW_SEALING is defined - define MAP_NORESERVE if it's not defined (e.g. on FreeBSD) Signed-off-by: Stefano Garzarella --- meson.build | 2 +- subprojects/libvhost-user/libvhost-user.h | 2 +- subprojects/libvhost-user/libvhost-user.c | 60 +-- 3 files changed, 59 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index c19d51501a..3197a2f62e 100644 --- a/meson.build +++ b/meson.build @@ -3194,7 +3194,7 @@ endif config_host_data.set('CONFIG_FDT', fdt.found()) vhost_user = not_found -if host_os == 'linux' and have_vhost_user +if have_vhost_user libvhost_user = subproject('libvhost-user') vhost_user = libvhost_user.get_variable('vhost_user_dep') endif diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h index deb40e77b3..e13e1d3931 100644 --- a/subprojects/libvhost-user/libvhost-user.h +++ b/subprojects/libvhost-user/libvhost-user.h @@ -18,9 +18,9 @@ #include #include #include -#include #include #include "standard-headers/linux/virtio_ring.h" +#include "standard-headers/linux/vhost_types.h" /* Based on qemu/hw/virtio/vhost-user.c */ #define VHOST_USER_F_PROTOCOL_FEATURES 30 diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 1c361ffd51..03edb4bf64 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@ #include #include #include -#include #include -#include /* Necessary to provide VIRTIO_F_VERSION_1 on system * with older linux headers. Must appear before @@ -39,8 +37,8 @@ #include "standard-headers/linux/virtio_config.h" #if defined(__linux__) +#include #include -#include #include #include #include @@ -52,6 +50,62 @@ #endif +#if defined(__APPLE__) && (__MACH__) +#include +#define htobe16(x) OSSwapHostToBigInt16(x) +#define htole16(x) OSSwapHostToLittleInt16(x) +#define be16toh(x) OSSwapBigToHostInt16(x) +#define le16toh(x) OSSwapLittleToHostInt16(x) + +#define htobe32(x) OSSwapHostToBigInt32(x) +#define htole32(x) OSSwapHostToLittleInt32(x) +#define be32toh(x) OSSwapBigToHostInt32(x) +#define le32toh(x) OSSwapLittleToHostInt32(x) + +#define htobe64(x) OSSwapHostToBigInt64(x) +#define htole64(x) OSSwapHostToLittleInt64(x) +#define be64toh(x) OSSwapBigToHostInt64(x) +#define le64toh(x) OSSwapLittleToHostInt64(x) +#endif + +#ifdef CONFIG_EVENTFD +#include +#else +#define eventfd_t uint64_t + +int eventfd_write(int fd, eventfd_t value) +{ +return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1; +} + +int eventfd_read(int fd, eventfd_t *value) +{ +return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1; +} +#endif + +#ifdef MFD_ALLOW_SEALING +#include + +#ifndef F_LINUX_SPECIFIC_BASE +#define F_LINUX_SPECIFIC_BASE 1024 +#endif + +#ifndef F_ADD_SEALS +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9) +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10) + +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */ +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ +#define F_SEAL_GROW 0x0004 /* prevent file from growing */ +#define F_SEAL_WRITE0x0008 /* prevent writes */ +#endif +#endif + +#ifndef MAP_NORESERVE +#define MAP_NORESERVE 0 +#endif + #include "include/atomic.h" #include "libvhost-user.h" -- 2.44.0
[PATCH for-9.1 v2 08/11] contrib/vhost-user-blk: enable it on any POSIX system
Let's make the code more portable by using the "qemu/bswap.h" API and adding defines from block/file-posix.c to support O_DIRECT in other systems (e.g. macOS). vhost-user-server.c is a dependency, let's enable it for any POSIX system. Signed-off-by: Stefano Garzarella --- meson.build | 2 -- contrib/vhost-user-blk/vhost-user-blk.c | 19 +-- util/meson.build| 4 +++- 3 files changed, 20 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index 3197a2f62e..b541e5c875 100644 --- a/meson.build +++ b/meson.build @@ -1956,8 +1956,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_ has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix) have_vhost_user_blk_server = get_option('vhost_user_blk_server') \ - .require(host_os == 'linux', - error_message: 'vhost_user_blk_server requires linux') \ .require(have_vhost_user, error_message: 'vhost_user_blk_server requires vhost-user support') \ .disable_auto_if(not have_tools and not have_system) \ diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index a8ab9269a2..462e584857 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -16,6 +16,7 @@ */ #include "qemu/osdep.h" +#include "qemu/bswap.h" #include "standard-headers/linux/virtio_blk.h" #include "libvhost-user-glib.h" @@ -24,6 +25,20 @@ #include #endif +/* OS X does not have O_DSYNC */ +#ifndef O_DSYNC +#ifdef O_SYNC +#define O_DSYNC O_SYNC +#elif defined(O_FSYNC) +#define O_DSYNC O_FSYNC +#endif +#endif + +/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */ +#ifndef O_DIRECT +#define O_DIRECT O_DSYNC +#endif + enum { VHOST_USER_BLK_MAX_QUEUES = 8, }; @@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk, req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base; in_num--; -type = le32toh(req->out->type); +type = le32_to_cpu(req->out->type); switch 
(type & ~VIRTIO_BLK_T_BARRIER) { case VIRTIO_BLK_T_IN: case VIRTIO_BLK_T_OUT: { ssize_t ret = 0; bool is_write = type & VIRTIO_BLK_T_OUT; -req->sector_num = le64toh(req->out->sector); +req->sector_num = le64_to_cpu(req->out->sector); if (is_write) { ret = vub_writev(req, &elem->out_sg[1], out_num); } else { diff --git a/util/meson.build b/util/meson.build index 0ef9886be0..f52682ce96 100644 --- a/util/meson.build +++ b/util/meson.build @@ -113,10 +113,12 @@ if have_block util_ss.add(files('filemonitor-stub.c')) endif if host_os == 'linux' -util_ss.add(files('vhost-user-server.c'), vhost_user) util_ss.add(files('vfio-helpers.c')) util_ss.add(files('chardev_open.c')) endif + if host_os != 'windows' +util_ss.add(files('vhost-user-server.c'), vhost_user) + endif endif if cpu == 'aarch64' -- 2.44.0
[PATCH for-9.1 v2 02/11] libvhost-user: fail vu_message_write() if sendmsg() is failing
In vu_message_write() we use sendmsg() to send the message header, then a write() to send the payload. If sendmsg() fails we should avoid sending the payload, since we were unable to send the header. This was discovered before fixing the issue with the previous patch, where sendmsg() failed on macOS due to wrong parameters, but the frontend still sent the payload, which the backend incorrectly interpreted as a wrong header. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 5 + 1 file changed, 5 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 22bea0c775..a11afd1960 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -639,6 +639,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) rc = sendmsg(conn_fd, &msg, 0); } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); +if (rc <= 0) { +vu_panic(dev, "Error while writing: %s", strerror(errno)); +return false; +} + if (vmsg->size) { do { if (vmsg->data) { -- 2.44.0
[PATCH for-9.1 v2 10/11] tests/qtest/vhost-user-blk-test: use memory-backend-shm
`memory-backend-memfd` is available only on Linux while the new `memory-backend-shm` can be used on any POSIX-compliant operating system. Let's use it so we can run the test in multiple environments. Signed-off-by: Stefano Garzarella --- tests/qtest/vhost-user-blk-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/qtest/vhost-user-blk-test.c b/tests/qtest/vhost-user-blk-test.c index 117b9acd10..e945f6abf2 100644 --- a/tests/qtest/vhost-user-blk-test.c +++ b/tests/qtest/vhost-user-blk-test.c @@ -906,7 +906,7 @@ static void start_vhost_user_blk(GString *cmd_line, int vus_instances, vhost_user_blk_bin); g_string_append_printf(cmd_line, -" -object memory-backend-memfd,id=mem,size=256M,share=on " +" -object memory-backend-shm,id=mem,size=256M,share=on " " -M memory-backend=mem -m 256M "); for (i = 0; i < vus_instances; i++) { -- 2.44.0
[PATCH for-9.0 v2] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
vhost_vdpa_set_vring_ready() could already fail, but if Linux's patch [1] is merged, it may fail more often if userspace does not activate virtqueues before DRIVER_OK when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated. So better to check its return value anyway. [1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u Acked-by: Eugenio Pérez Acked-by: Jason Wang Signed-off-by: Stefano Garzarella --- Based-on: 20240315155949.86066-1-kw...@redhat.com v1: https://patchew.org/QEMU/20240207092702.25242-1-sgarz...@redhat.com/ v2: - added acks - rebased on top of https://patchew.org/QEMU/20240315155949.86066-1-kw...@redhat.com/ --- net/vhost-vdpa.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 85e73dd6a7..eda714d1a4 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -399,7 +399,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc) } for (int i = 0; i < v->dev->nvqs; ++i) { -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +if (ret < 0) { +return ret; +} } return 0; } @@ -1238,7 +1241,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); -vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +if (unlikely(r < 0)) { +return r; +} if (v->shadow_vqs_enabled) { n = VIRTIO_NET(v->dev->vdev); @@ -1277,7 +1283,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) } for (int i = 0; i < v->dev->vq_index; ++i) { -vhost_vdpa_set_vring_ready(v, i); +r = vhost_vdpa_set_vring_ready(v, i); +if (unlikely(r < 0)) { +return r; +} } return 0; -- 2.44.0
Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
On Wed, Mar 20, 2024 at 12:18:14PM +0800, Jason Wang wrote: On Mon, Mar 18, 2024 at 4:27 PM Stefano Garzarella wrote: On Mon, Mar 18, 2024 at 12:31:59PM +0800, Jason Wang wrote: >On Fri, Mar 15, 2024 at 4:23 PM Stefano Garzarella wrote: >> >> On Thu, Mar 14, 2024 at 11:17:01AM +0800, Jason Wang wrote: >> >On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella wrote: >> >> >> >> vhost_vdpa_set_vring_ready() could already fail, but if Linux's >> >> patch [1] will be merged, it may fail with more chance if >> >> userspace does not activate virtqueues before DRIVER_OK when >> >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated. >> > >> >I wonder what happens if we just leave it as is. >> >> Are you referring to this patch or the kernel patch? > >This patch. > >> >> Here I'm just checking the return value of vhost_vdpa_set_vring_ready(). >> It can return an error also without that kernel patch, so IMHO is >> better to check the return value here in QEMU. >> >> What issue do you see with this patch applied? > >For the parent which can enable after driver_ok but not advertise it. But this patch is not changing anything in that sense, it just controls the return value of the VHOST_VDPA_SET_VRING_ENABLE ioctl. Why would QEMU ignore an error if it can't activate vrings? If we really want to ignore it we should document it both in QEMU, but also in the kernel, because honestly the way the code is now it shouldn't fail from what I understand. That said, even if we ignore it, IMHO we should at least print a warning in QEMU. Right. > >(To say the truth, I'm not sure if we need to care about this) I agree on that, but this is related to the patch in the kernel, not this simple patch to fix the QEMU error path, right? Or it's the charge of the Qemu vDPA layer to avoid calling set_vq_ready() after driver_ok if no VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. Or it might be too late. Yeah, maybe it's too late. We already released several versions without that.
> >> >> > >> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could >> >be >> >done after driver_ok. >> >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether >> >enabling could be done after driver_ok or not. >> >> I see your point, indeed I didn't send a v2 of that patch. >> Maybe we should document that, because it could be interpreted that if >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated the enabling >> should always be done before driver_ok (which is true for example >> in >> VDUSE). > >I see, so I think we probably need the fix. > >> >> BTW I think we should discuss it in the kernel patch. >> >> Thanks, >> Stefano >> >> > >> >Thanks >> > >> >> >> >> So better check its return value anyway. >> >> >> >> [1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u >> >> >> >> Signed-off-by: Stefano Garzarella >> >> --- >> >> Note: This patch conflicts with [2], but the resolution is simple, >> >> so for now I sent a patch for the current master, but I'll rebase >> >> this patch if we merge the other one first. > >Will go through [2]. Here I meant that the conflict is only in the code touched, because Kevin's patch remove/move some of the code touched by this patch. And rightly he checked the return value of the ioctl as I would like to do in the other places where we call the same ioctl. So honestly I still don't understand what's wrong with this patch... Nothing wrong now. Acked-by: Jason Wang Thanks for the review, I'll send a v2 carrying your and Eugenio acks, rebasing on top of Kevin's patch, so it should be easy to merge. Stefano
Re: [PATCH 2/9] backends/confidential-guest-support: Add IGVM file parameter
On Tue, Feb 27, 2024 at 02:50:08PM +, Roy Hopkins wrote: In order to add support for parsing IGVM files for secure virtual machines, the path to an IGVM file needs to be specified as part of the guest configuration. It makes sense to add this to the ConfidentialGuestSupport object as this is common to all secure virtual machines that potentially could support IGVM-based configuration. This patch allows the filename to be configured via the QEMU object model in preparation for subsequent patches that will read and parse the IGVM file. Signed-off-by: Roy Hopkins --- backends/confidential-guest-support.c | 21 + include/exec/confidential-guest-support.h | 9 + qapi/qom.json | 13 + qemu-options.hx | 8 +++- 4 files changed, 50 insertions(+), 1 deletion(-) diff --git a/backends/confidential-guest-support.c b/backends/confidential-guest-support.c index 052fde8db0..da436fb736 100644 --- a/backends/confidential-guest-support.c +++ b/backends/confidential-guest-support.c @@ -20,8 +20,29 @@ OBJECT_DEFINE_ABSTRACT_TYPE(ConfidentialGuestSupport, CONFIDENTIAL_GUEST_SUPPORT, OBJECT) +#if defined(CONFIG_IGVM) +static char *get_igvm(Object *obj, Error **errp) +{ +ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj); +return g_strdup(cgs->igvm_filename); +} + +static void set_igvm(Object *obj, const char *value, Error **errp) +{ +ConfidentialGuestSupport *cgs = CONFIDENTIAL_GUEST_SUPPORT(obj); +g_free(cgs->igvm_filename); +cgs->igvm_filename = g_strdup(value); +} +#endif + static void confidential_guest_support_class_init(ObjectClass *oc, void *data) { +#if defined(CONFIG_IGVM) +object_class_property_add_str(oc, "igvm-file", +get_igvm, set_igvm); +object_class_property_set_description(oc, "igvm-file", +"Set the IGVM filename to use"); +#endif } static void confidential_guest_support_init(Object *obj) diff --git a/include/exec/confidential-guest-support.h b/include/exec/confidential-guest-support.h index ba2dd4b5df..b08ad8de4d 100644 ---
a/include/exec/confidential-guest-support.h +++ b/include/exec/confidential-guest-support.h @@ -51,6 +51,15 @@ struct ConfidentialGuestSupport { * so 'ready' is not set, we'll abort. */ bool ready; + +#if defined(CONFIG_IGVM) +/* + * igvm_filename: Optional filename that specifies a file that contains + *the configuration of the guest in Isolated Guest + *Virtual Machine (IGVM) format. + */ +char *igvm_filename; +#endif }; typedef struct ConfidentialGuestSupportClass { diff --git a/qapi/qom.json b/qapi/qom.json index 2a6e49365a..570bdd7d55 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -859,6 +859,18 @@ 'base': 'RngProperties', 'data': { '*filename': 'str' } } +## +# @ConfidentialGuestProperties: +# +# Properties common to objects that are derivatives of confidential-guest-support. +# +# @igvm-file: IGVM file to use to configure guest (default: none) +# +# Since: 8.2 Should it be 9.0 or maybe 9.1 ? +## +{ 'struct': 'ConfidentialGuestProperties', + 'data': { '*igvm-file': 'str' } } + ## # @SevGuestProperties: # @@ -886,6 +898,7 @@ # Since: 2.12 ## { 'struct': 'SevGuestProperties', + 'base': 'ConfidentialGuestProperties', 'data': { '*sev-device': 'str', '*dh-cert-file': 'str', '*session-file': 'str', diff --git a/qemu-options.hx b/qemu-options.hx index 9be1e5817c..49d9226e35 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -5640,7 +5640,7 @@ SRST -object secret,id=sec0,keyid=secmaster0,format=base64,\\ data=$SECRET,iv=$(
Re: [PATCH 0/9] Introduce support for IGVM files
Hi Roy, thanks for this series! On Tue, Feb 27, 2024 at 02:50:06PM +, Roy Hopkins wrote: Hi everyone, This initial patch series submission adds the capability to configure confidential guests using files that conform to the Independent Guest Virtual Machine (IGVM) file format. The series is based on the master branch commit 1b330da. Alternatively, the series is available here: https://github.com/roy-hopkins/qemu/tree/igvm_master_v1 I look forward to welcoming your comments! I saw that the series has been posted for a while and maybe you're going to send a v2, so I'll bring up some little things that I saw, but I haven't done a deep review yet: - We use "Isolated Guest Virtual Machine" or "Independent Guest Virtual Machine", are they interchangeable for IGVM? - `./scripts/checkpatch.pl --codespell` reported some warnings: 5/9 Checking commit 81f60e5cdd01 (i386/pc: Process IGVM file during PC initialization if present) WARNING: 'initalization' may be misspelled - perhaps 'initialization'? #15: initalization of the target. ^ 9/9 Checking commit 66745c0bb940 (docs/system: Add documentation on support for IGVM) WARNING: 'encaspulate' may be misspelled - perhaps 'encapsulate'? #27: FILE: docs/system/igvm.rst:4: +IGVM files are designed to encaspulate all the information required to launch a ^^^ Thanks, Stefano Why do we need Independent Guest Virtual Machine (IGVM) files? == IGVM files describe, using a set of directives, the memory layout and initial configuration of a guest that supports isolation technologies such as AMD SEV-SNP and Intel TDX. By encapsulating all of this information in a single configuration file and applying the directives in the order they are specified when the guest is initialized, it becomes straightforward to pre-calculate the cryptographic measurement of the guest initial state, thus aiding in remote attestation processes. IGVM files can also be used to configure non-standard guest memory layouts, payloads or startup configurations.
A good example of this is to use IGVM to deploy and configure an SVSM module in the guest which supports running at multiple VMPLs. The SVSM can be configured to start directly into 32-bit or 64-bit code. This patch series was developed with this purpose in mind to support the COCONUT-SVSM project: https://github.com/coconut-svsm/svsm More information and background on the IGVM file format can be found on the project page at: https://github.com/microsoft/igvm What this patch series introduces = This series adds a build-time configuration option (--enable-igvm) to add support for launching a guest using an IGVM file. It extends the current ConfidentialGuestSupport object to allow an IGVM filename to be specified. The directives in the IGVM file are parsed and the confidential guest is configured through new virtual methods added to the ConfidentialGuestSupport object. These virtual functions have been implemented for AMD SEV and AMD SEV-ES. Many of the IGVM directives require capabilities that are not supported in SEV and SEV-ES, so support for IGVM directives will need to be considered when support for SEV-SNP, TDX or other technologies is introduced to QEMU. Any directive that is not currently supported results in an error report. Dependencies In order to enable IGVM support, you will need the IGVM library installed. Instructions on building and installing it can be found here: https://github.com/microsoft/igvm/tree/main/igvm_c As mentioned above, this series was developed as part of the effort for COCONUT-SVSM. COCONUT-SVSM requires support for AMD SEV-SNP which is not available in current QEMU. Therefore this series has also been applied on top of the AMD SEV-SNP branch (https://github.com/AMDESE/qemu/tree/snp-v3-wip). 
You can find that version of the series here: https://github.com/roy-hopkins/qemu/commits/snp-v3-wip-igvm_v2/ Generating IGVM files = To try this out you will need to generate an IGVM file that is compatible with the SEV platform you are testing on. I've created a tool that can create a simple IGVM file that packages an OVMF binary for AMD SEV or AMD SEV-ES. The tool is available here: https://github.com/roy-hopkins/buildigvm I have tested this on an AMD EPYC Genoa system configured to support SEV. Both SEV and SEV-ES have been tested using IGVM files generated using the buildigvm tool. The SEV-SNP alternative patch set has also been tested using COCONUT-SVSM. Roy Hopkins (9): meson: Add optional dependency on IGVM library backends/confidential-guest-support: Add IGVM file parameter backends/confidential-guest-support: Add functions to support IGVM backends/igvm: Implement parsing and processing of IGVM files i386/pc: Process IGVM file during PC initialization if present i386/pc: Skip initialization of system FW
Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
On Mon, Mar 18, 2024 at 12:31:59PM +0800, Jason Wang wrote: On Fri, Mar 15, 2024 at 4:23 PM Stefano Garzarella wrote: On Thu, Mar 14, 2024 at 11:17:01AM +0800, Jason Wang wrote: >On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella wrote: >> >> vhost_vdpa_set_vring_ready() could already fail, but if Linux's >> patch [1] will be merged, it may fail with more chance if >> userspace does not activate virtqueues before DRIVER_OK when >> VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated. > >I wonder what happens if we just leave it as is. Are you referring to this patch or the kernel patch? This patch. Here I'm just checking the return value of vhost_vdpa_set_vring_ready(). It can return an error also without that kernel patch, so IMHO is better to check the return value here in QEMU. What issue do you see with this patch applied? For the parent which can enable after driver_ok but not advertise it. But this patch is not changing anything in that sense, it just controls the return value of the VHOST_VDPA_SET_VRING_ENABLE ioctl. Why would QEMU ignore an error if it can't activate vrings? If we really want to ignore it we should document it both in QEMU, but also in the kernel, because honestly the way the code is now it shouldn't fail from what I understand. That said, even if we ignore it, IMHO we should at least print a warning in QEMU. (To say the truth, I'm not sure if we need to care about this) I agree on that, but this is related to the patch in the kernel, not this simple patch to fix QEMU error path, right? > >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could be >done after driver_ok. >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether >enabling could be done after driver_ok or not. I see your point, indeed I didn't send a v2 of that patch. 
Maybe we should document that, because it could be interpreted that if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated the enabling should always be done before driver_ok (which is true for example in VDUSE). I see, so I think we probably need the fix. BTW I think we should discuss it in the kernel patch. Thanks, Stefano > >Thanks > >> >> So better check its return value anyway. >> >> [1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u >> >> Signed-off-by: Stefano Garzarella >> --- >> Note: This patch conflicts with [2], but the resolution is simple, >> so for now I sent a patch for the current master, but I'll rebase >> this patch if we merge the other one first. Will go through [2]. Here I meant that the conflict is only in the code touched, because Kevin's patch remove/move some of the code touched by this patch. And rightly he checked the return value of the ioctl as I would like to do in the other places where we call the same ioctl. So honestly I still don't understand what's wrong with this patch... 
Thanks, Stefano Thanks >> >> [2] >> https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/ >> --- >> hw/virtio/vdpa-dev.c | 8 +++- >> net/vhost-vdpa.c | 15 --- >> 2 files changed, 19 insertions(+), 4 deletions(-) >> >> diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c >> index eb9ecea83b..d57cd76c18 100644 >> --- a/hw/virtio/vdpa-dev.c >> +++ b/hw/virtio/vdpa-dev.c >> @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) >> goto err_guest_notifiers; >> } >> for (i = 0; i < s->dev.nvqs; ++i) { >> -vhost_vdpa_set_vring_ready(>vdpa, i); >> +ret = vhost_vdpa_set_vring_ready(>vdpa, i); >> +if (ret < 0) { >> +error_setg_errno(errp, -ret, "Error starting vring %d", i); >> +goto err_dev_stop; >> +} >> } >> s->started = true; >> >> @@ -274,6 +278,8 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) >> >> return ret; >> >> +err_dev_stop: >> +vhost_dev_stop(>dev, vdev, false); >> err_guest_notifiers: >> k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false); >> err_host_notifiers: >> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c >> index 3726ee5d67..e3d8036479 100644 >> --- a/net/vhost-vdpa.c >> +++ b/net/vhost-vdpa.c >> @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc) >> } >> >> for (int i = 0; i < v->dev->nvqs; ++i) { >> -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); >> +int ret = vhost_vdpa_set_vring_ready(v, i +
Re: [PATCH for-9.0 v3] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Fri, Mar 15, 2024 at 04:59:49PM +0100, Kevin Wolf wrote:

VDUSE requires that virtqueues are first enabled before the DRIVER_OK status flag is set; with the current API of the kernel module, it is impossible to enable the opposite order in our block export code because userspace is not notified when a virtqueue is enabled. This requirement also matches the normal initialisation order as done by the generic vhost code in QEMU. However, commit 6c482547 accidentally changed the order for vdpa-dev and broke access to VDUSE devices with this.

This changes vdpa-dev to use the normal order again and use the standard vhost callback .vhost_set_vring_enable for this. VDUSE devices can be used with vdpa-dev again after this fix.

vhost_net intentionally avoided enabling the vrings for vdpa and does this manually later while it does enable them for other vhost backends. Reflect this in the vhost_net code and return early for vdpa, so that the behaviour doesn't change for this device.

Cc: qemu-sta...@nongnu.org
Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
Signed-off-by: Kevin Wolf
---
v2: - Actually make use of the @enable parameter - Change vhost_net to preserve the current behaviour
v3: - Updated trace point [Stefano] - Fixed typo in comment [Stefano]

hw/net/vhost_net.c | 10 ++ hw/virtio/vdpa-dev.c | 5 + hw/virtio/vhost-vdpa.c | 29 ++--- hw/virtio/vhost.c | 8 +++- hw/virtio/trace-events | 2 +- 5 files changed, 45 insertions(+), 9 deletions(-)

LGTM!

Reviewed-by: Stefano Garzarella

Thanks, Stefano

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index e8e1661646..fd1a93701a 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int enable) VHostNetState *net = get_vhost_net(nc); const VhostOps *vhost_ops = net->dev.vhost_ops; +/* + * vhost-vdpa network devices need to enable dataplane virtqueues after + * DRIVER_OK, so they can recover device state before starting dataplane.
+ * Because of that, we don't enable virtqueues here and leave it to + * net/vhost-vdpa.c. + */ +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +return 0; +} + nc->vring_enable = enable; if (vhost_ops && vhost_ops->vhost_set_vring_enable) { diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c index eb9ecea83b..13e87f06f6 100644 --- a/hw/virtio/vdpa-dev.c +++ b/hw/virtio/vdpa-dev.c @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) s->dev.acked_features = vdev->guest_features; -ret = vhost_dev_start(&s->dev, vdev, false); +ret = vhost_dev_start(&s->dev, vdev, true); if (ret < 0) { error_setg_errno(errp, -ret, "Error starting vhost"); goto err_guest_notifiers; } -for (i = 0; i < s->dev.nvqs; ++i) { -vhost_vdpa_set_vring_ready(&s->vdpa, i); -} s->started = true; /* diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index ddae494ca8..31453466af 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -886,19 +886,41 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx) return idx; } -int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) +static int vhost_vdpa_set_vring_enable_one(struct vhost_vdpa *v, unsigned idx, + int enable) { struct vhost_dev *dev = v->dev; struct vhost_vring_state state = { .index = idx, -.num = 1, +.num = enable, }; int r = vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state); -trace_vhost_vdpa_set_vring_ready(dev, idx, r); +trace_vhost_vdpa_set_vring_enable_one(dev, idx, enable, r); return r; } +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable) +{ +struct vhost_vdpa *v = dev->opaque; +unsigned int i; +int ret; + +for (i = 0; i < dev->nvqs; ++i) { +ret = vhost_vdpa_set_vring_enable_one(v, i, enable); +if (ret < 0) { +return ret; +} +} + +return 0; +} + +int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) +{ +return vhost_vdpa_set_vring_enable_one(v, idx, 1); +} + static int vhost_vdpa_set_config_call(struct vhost_dev
*dev, int fd) { @@ -1514,6 +1536,7 @@ const VhostOps vdpa_ops = { .vhost_set_features = vhost_vdpa_set_features, .vhost_reset_device = vhost_vdpa_reset_device, .vhost_get_vq_index = vhost_vdpa_get_vq_index, +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable, .vhost_get_config = vhost_vdpa_get_config, .vhost_set_config = vhost_vdpa_set_config, .vhost_requires_shm_log = NULL, diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 2c9ac79468..a66186 100644 --- a/
Re: [PATCH for-9.0 v2] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Fri, Mar 15, 2024 at 03:03:31PM +0100, Kevin Wolf wrote:

VDUSE requires that virtqueues are first enabled before the DRIVER_OK status flag is set; with the current API of the kernel module, it is impossible to enable the opposite order in our block export code because userspace is not notified when a virtqueue is enabled. This requirement also matches the normal initialisation order as done by the generic vhost code in QEMU. However, commit 6c482547 accidentally changed the order for vdpa-dev and broke access to VDUSE devices with this.

This changes vdpa-dev to use the normal order again and use the standard vhost callback .vhost_set_vring_enable for this. VDUSE devices can be used with vdpa-dev again after this fix.

vhost_net intentionally avoided enabling the vrings for vdpa and does this manually later while it does enable them for other vhost backends. Reflect this in the vhost_net code and return early for vdpa, so that the behaviour doesn't change for this device.

Cc: qemu-sta...@nongnu.org
Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
Signed-off-by: Kevin Wolf
---
v2: - Actually make use of the @enable parameter - Change vhost_net to preserve the current behaviour

hw/net/vhost_net.c | 10 ++ hw/virtio/vdpa-dev.c | 5 + hw/virtio/vhost-vdpa.c | 27 +-- hw/virtio/vhost.c | 8 +++- 4 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index e8e1661646..fd1a93701a 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -541,6 +541,16 @@ int vhost_set_vring_enable(NetClientState *nc, int enable) VHostNetState *net = get_vhost_net(nc); const VhostOps *vhost_ops = net->dev.vhost_ops; +/* + * vhost-vdpa network devices need to enable dataplane virtqueues after + * DRIVER_OK, so they can recover device state before starting dataplane. + * Because of that, we don't enable virtqueues here and leave it to + * net/vhost-vdpa.c.
+ */ +if (nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +return 0; +} + nc->vring_enable = enable; if (vhost_ops && vhost_ops->vhost_set_vring_enable) { diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c index eb9ecea83b..13e87f06f6 100644 --- a/hw/virtio/vdpa-dev.c +++ b/hw/virtio/vdpa-dev.c @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) s->dev.acked_features = vdev->guest_features; -ret = vhost_dev_start(>dev, vdev, false); +ret = vhost_dev_start(>dev, vdev, true); if (ret < 0) { error_setg_errno(errp, -ret, "Error starting vhost"); goto err_guest_notifiers; } -for (i = 0; i < s->dev.nvqs; ++i) { -vhost_vdpa_set_vring_ready(>vdpa, i); -} s->started = true; /* diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c index ddae494ca8..401afac2f5 100644 --- a/hw/virtio/vhost-vdpa.c +++ b/hw/virtio/vhost-vdpa.c @@ -886,12 +886,13 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx) return idx; } -int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) +static int vhost_vdpa_set_vring_enable_one(struct vhost_vdpa *v, unsigned idx, + int enable) { struct vhost_dev *dev = v->dev; struct vhost_vring_state state = { .index = idx, -.num = 1, +.num = enable, }; int r = vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, ); After this line we now have: trace_vhost_vdpa_set_vring_ready(dev, idx, r); Should we rename it or move it to the new function? If we rename it, we should trace also `enable`. 
@@ -899,6 +900,27 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) return r; } +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable) +{ +struct vhost_vdpa *v = dev->opaque; +unsigned int i; +int ret; + +for (i = 0; i < dev->nvqs; ++i) { +ret = vhost_vdpa_set_vring_enable_one(v, i, enable); +if (ret < 0) { +return ret; +} +} + +return 0; +} + +int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) +{ +return vhost_vdpa_set_vring_enable_one(v, idx, 1); +} + static int vhost_vdpa_set_config_call(struct vhost_dev *dev, int fd) { @@ -1514,6 +1536,7 @@ const VhostOps vdpa_ops = { .vhost_set_features = vhost_vdpa_set_features, .vhost_reset_device = vhost_vdpa_reset_device, .vhost_get_vq_index = vhost_vdpa_get_vq_index, +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable, .vhost_get_config = vhost_vdpa_get_config, .vhost_set_config = vhost_vdpa_set_config, .vhost_requires_shm_log = NULL, diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c index 2c9ac79468..decfb85184 100644 --- a/hw/virtio/vhost.c +++ b/hw/virtio/vhost.c @@ -1984,7 +1984,13 @@ static int
Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
On Thu, Mar 14, 2024 at 11:17:01AM +0800, Jason Wang wrote:
On Wed, Feb 7, 2024 at 5:27 PM Stefano Garzarella wrote:

vhost_vdpa_set_vring_ready() could already fail, but if Linux's patch [1] will be merged, it may fail with more chance if userspace does not activate virtqueues before DRIVER_OK when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated.

I wonder what happens if we just leave it as is.

Are you referring to this patch or the kernel patch? Here I'm just checking the return value of vhost_vdpa_set_vring_ready(). It can return an error also without that kernel patch, so IMHO it's better to check the return value here in QEMU. What issue do you see with this patch applied?

VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We do know enabling could be done after driver_ok. Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK: We don't know whether enabling could be done after driver_ok or not.

I see your point; indeed I didn't send a v2 of that patch. Maybe we should document that, because it could be interpreted to mean that if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated, the enabling should always be done before driver_ok (which is true, for example, in VDUSE).

BTW I think we should discuss it in the kernel patch.

Thanks, Stefano

Thanks

So better check its return value anyway.

[1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u

Signed-off-by: Stefano Garzarella
---
Note: This patch conflicts with [2], but the resolution is simple, so for now I sent a patch for the current master, but I'll rebase this patch if we merge the other one first.
[2] https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/
---
hw/virtio/vdpa-dev.c | 8 +++- net/vhost-vdpa.c | 15 --- 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c index eb9ecea83b..d57cd76c18 100644 --- a/hw/virtio/vdpa-dev.c +++ b/hw/virtio/vdpa-dev.c @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) goto err_guest_notifiers; } for (i = 0; i < s->dev.nvqs; ++i) { -vhost_vdpa_set_vring_ready(&s->vdpa, i); +ret = vhost_vdpa_set_vring_ready(&s->vdpa, i); +if (ret < 0) { +error_setg_errno(errp, -ret, "Error starting vring %d", i); +goto err_dev_stop; +} } s->started = true; @@ -274,6 +278,8 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) return ret; +err_dev_stop: +vhost_dev_stop(&s->dev, vdev, false); err_guest_notifiers: k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false); err_host_notifiers: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 3726ee5d67..e3d8036479 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc) } for (int i = 0; i < v->dev->nvqs; ++i) { -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +if (ret < 0) { +return ret; +} } return 0; } @@ -1213,7 +1216,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); -vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +if (unlikely(r < 0)) { +return r; +} if (v->shadow_vqs_enabled) { n = VIRTIO_NET(v->dev->vdev); @@ -1252,7 +1258,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) } for (int i = 0; i < v->dev->vq_index; ++i) { -vhost_vdpa_set_vring_ready(v, i); +r = vhost_vdpa_set_vring_ready(v, i); +if (unlikely(r < 0)) { +return r; +} } return 0; -- 2.43.0
Re: [PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
Gentle ping :-) On Wed, Feb 7, 2024 at 10:27 AM Stefano Garzarella wrote: > > vhost_vdpa_set_vring_ready() could already fail, but if Linux's > patch [1] will be merged, it may fail with more chance if > userspace does not activate virtqueues before DRIVER_OK when > VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated. > > So better check its return value anyway. > > [1] > https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u > > Signed-off-by: Stefano Garzarella > --- > Note: This patch conflicts with [2], but the resolution is simple, > so for now I sent a patch for the current master, but I'll rebase > this patch if we merge the other one first. > > [2] > https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/ > --- > hw/virtio/vdpa-dev.c | 8 +++- > net/vhost-vdpa.c | 15 --- > 2 files changed, 19 insertions(+), 4 deletions(-) > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c > index eb9ecea83b..d57cd76c18 100644 > --- a/hw/virtio/vdpa-dev.c > +++ b/hw/virtio/vdpa-dev.c > @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, > Error **errp) > goto err_guest_notifiers; > } > for (i = 0; i < s->dev.nvqs; ++i) { > -vhost_vdpa_set_vring_ready(>vdpa, i); > +ret = vhost_vdpa_set_vring_ready(>vdpa, i); > +if (ret < 0) { > +error_setg_errno(errp, -ret, "Error starting vring %d", i); > +goto err_dev_stop; > +} > } > s->started = true; > > @@ -274,6 +278,8 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, > Error **errp) > > return ret; > > +err_dev_stop: > +vhost_dev_stop(>dev, vdev, false); > err_guest_notifiers: > k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false); > err_host_notifiers: > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c > index 3726ee5d67..e3d8036479 100644 > --- a/net/vhost-vdpa.c > +++ b/net/vhost-vdpa.c > @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc) > } > > for (int i = 0; i < v->dev->nvqs; ++i) { > 
-vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); > +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); > +if (ret < 0) { > +return ret; > +} > } > return 0; > } > @@ -1213,7 +1216,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) > > assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); > > -vhost_vdpa_set_vring_ready(v, v->dev->vq_index); > +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index); > +if (unlikely(r < 0)) { > +return r; > +} > > if (v->shadow_vqs_enabled) { > n = VIRTIO_NET(v->dev->vdev); > @@ -1252,7 +1258,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) > } > > for (int i = 0; i < v->dev->vq_index; ++i) { > -vhost_vdpa_set_vring_ready(v, i); > +r = vhost_vdpa_set_vring_ready(v, i); > +if (unlikely(r < 0)) { > +return r; > +} > } > > return 0; > -- > 2.43.0 >
Re: [PATCH 9/9] hostmem-file: support POSIX shm_open()
On Thu, Feb 29, 2024 at 11:28:37AM +0100, Markus Armbruster wrote: Stefano Garzarella writes: On Wed, Feb 28, 2024 at 01:32:17PM +0100, Markus Armbruster wrote: Stefano Garzarella writes: [...] +# @shm: if true, shm_open(3) is used to create/open POSIX shared memory +# object; if false, an open(2) is used. (default: false) (since 9.0) +# Please format like this for consistency: Sure. # @shm: if true, shm_open(3) is used to create/open POSIX shared memory # object; if false, an open(2) is used (default: false) (since 9.0) I just noticed that I followed the property just above (@rom). Should we fix that one? Yes, please. Done: https://patchew.org/QEMU/20240229105826.16354-1-sgarz...@redhat.com/ Thanks, Stefano See commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions).
[PATCH] qapi: Fix format of the memory-backend-file's @rom property doc comment
Reflow paragraph following commit a937b6aa73 ("qapi: Reformat doc comments to conform to current conventions"): use 4 spaces indentation, 70 columns width, and two spaces to separate sentences. Suggested-by: Markus Armbruster Signed-off-by: Stefano Garzarella --- qapi/qom.json | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 2a6e49365a..db1b0fdea2 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -668,19 +668,20 @@ # @readonly: if true, the backing file is opened read-only; if false, # it is opened read-write. (default: false) # -# @rom: whether to create Read Only Memory (ROM) that cannot be modified -# by the VM. Any write attempts to such ROM will be denied. Most -# use cases want writable RAM instead of ROM. However, selected use -# cases, like R/O NVDIMMs, can benefit from ROM. If set to 'on', -# create ROM; if set to 'off', create writable RAM; if set to -# 'auto', the value of the @readonly property is used. This -# property is primarily helpful when we want to have proper RAM in -# configurations that would traditionally create ROM before this -# property was introduced: VM templating, where we want to open a -# file readonly (@readonly set to true) and mark the memory to be -# private for QEMU (@share set to false). For this use case, we need -# writable RAM instead of ROM, and want to set this property to 'off'. -# (default: auto, since 8.2) +# @rom: whether to create Read Only Memory (ROM) that cannot be +# modified by the VM. Any write attempts to such ROM will be +# denied. Most use cases want writable RAM instead of ROM. +# However, selected use cases, like R/O NVDIMMs, can benefit from +# ROM. If set to 'on', create ROM; if set to 'off', create +# writable RAM; if set to 'auto', the value of the @readonly +# property is used. 
This property is primarily helpful when we +# want to have proper RAM in configurations that would +# traditionally create ROM before this property was introduced: VM +# templating, where we want to open a file readonly (@readonly set +# to true) and mark the memory to be private for QEMU (@share set +# to false). For this use case, we need writable RAM instead of +# ROM, and want to set this property to 'off'. (default: auto, +# since 8.2) # # Since: 2.1 ## -- 2.44.0
Re: [PATCH 9/9] hostmem-file: support POSIX shm_open()
On Wed, Feb 28, 2024 at 01:32:17PM +0100, Markus Armbruster wrote: Stefano Garzarella writes: Add a new `shm` bool option for `-object memory-backend-file`. When this option is set to true, the POSIX shm_open(3) is used instead of open(2). So a file will not be created in the filesystem, but a "POSIX shared memory object" will be instantiated. In Linux this turns into a file in /dev/shm, but in other OSes this may not happen (for example in macOS or FreeBSD nothing is shown in any filesystem). This new feature is useful when we need to share guest memory with another process (e.g. vhost-user backend), but we don't have memfd_create() or any special filesystems (e.g. /dev/shm) available as in macOS. Signed-off-by: Stefano Garzarella --- I am not sure this is the best way to support shm_open() in QEMU. Other solutions I had in mind were: - create a new memory-backend-shm How would that look like? Would it involve duplicating code? I was looking at it just now, and apart from some boilerplate code to create the object, the rest in the end is pretty specific and a lot of things in memory-backend-file wouldn't be supported by memory-backend-shm anyway, so I'll give it a try for v2 by adding it. - extend memory-backend-memfd to use shm_open() on systems where memfd is not available (problem: shm_open wants a name to assign to the object, but we can do a workaround by using a random name and do the unlink right away) Hmm. Too much magic? I don't know... Yeah, I agree. Any preference/suggestion? [...] diff --git a/qapi/qom.json b/qapi/qom.json index 2a6e49365a..bfb01b909f 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -682,6 +682,9 @@ # @mem-path: the path to either a shared memory or huge page # filesystem mount Does this need adjustment? Good point. For now I think I will drop this patch and add memory-backend-shm in v2, but if I go back I will fix it! [...] # writable RAM instead of ROM, and want to set this property to 'off'. 
# (default: auto, since 8.2) # +# @shm: if true, shm_open(3) is used to create/open POSIX shared memory +# object; if false, an open(2) is used. (default: false) (since 9.0) +# Please format like this for consistency: Sure. # @shm: if true, shm_open(3) is used to create/open POSIX shared memory # object; if false, an open(2) is used (default: false) (since 9.0) I just noticed that I followed the property just above (@rom). Should we fix that one? Thanks, Stefano
Re: [PATCH 9/9] hostmem-file: support POSIX shm_open()
On Wed, Feb 28, 2024 at 12:08:37PM +, Daniel P. Berrangé wrote: On Wed, Feb 28, 2024 at 12:47:59PM +0100, Stefano Garzarella wrote: Add a new `shm` bool option for `-object memory-backend-file`. When this option is set to true, the POSIX shm_open(3) is used instead of open(2). So a file will not be created in the filesystem, but a "POSIX shared memory object" will be instantiated. In Linux this turns into a file in /dev/shm, but in other OSes this may not happen (for example in macOS or FreeBSD nothing is shown in any filesystem). This new feature is useful when we need to share guest memory with another process (e.g. vhost-user backend), but we don't have memfd_create() or any special filesystems (e.g. /dev/shm) available as in macOS. Signed-off-by: Stefano Garzarella --- I am not sure this is the best way to support shm_open() in QEMU. Other solutions I had in mind were: - create a new memory-backend-shm - extend memory-backend-memfd to use shm_open() on systems where memfd is not available (problem: shm_open wants a name to assign to the object, but we can do a workaround by using a random name and do the unlink right away) IMHO, create a new memory-backend-shm, don't overload memory-backend-memfd, as this lets users choose between shm & memfd, even on Linux. Yeah, good point! I think there's enough of a consensus on adding memory-backend-shm, so I'm going to go toward that direction in v2. Thanks, Stefano
Re: [PATCH 9/9] hostmem-file: support POSIX shm_open()
On Wed, Feb 28, 2024 at 01:01:55PM +0100, David Hildenbrand wrote: On 28.02.24 12:47, Stefano Garzarella wrote: Add a new `shm` bool option for `-object memory-backend-file`. When this option is set to true, the POSIX shm_open(3) is used instead of open(2). So a file will not be created in the filesystem, but a "POSIX shared memory object" will be instantiated. In Linux this turns into a file in /dev/shm, but in other OSes this may not happen (for example in macOS or FreeBSD nothing is shown in any filesystem). This new feature is useful when we need to share guest memory with another process (e.g. vhost-user backend), but we don't have memfd_create() or any special filesystems (e.g. /dev/shm) available as in macOS. Signed-off-by: Stefano Garzarella --- I am not sure this is the best way to support shm_open() in QEMU. Other solutions I had in mind were: - create a new memory-backend-shm - extend memory-backend-memfd to use shm_open() on systems where memfd is not available (problem: shm_open wants a name to assign to the object, but we can do a workaround by using a random name and do the unlink right away) Any preference/suggestion? Both sound like reasonable options, and IMHO better than hostmem-file with things that are not necessarily "real" files. Yeah, I see. Regarding memory-backend-memfd, we similarly have to pass a name to memfd_create(), although for different purpose: "The name supplied in name is used as a filename and will be displayed as the target of the corresponding symbolic link in the directory /proc/self/fd/". So we simply pass TYPE_MEMORY_BACKEND_MEMFD. Okay, so I guess it must be unique only in the process, while for shm_open() it is global. Likely, memory-backend-shm that directly maps to shm_open() and only provides properties reasonable for shm_open() is the cleanest approach. So that would currently be my preference :) Thank you for your thoughts, I think I will go toward this direction (memory-backend-shm). 
It was also my first choice, but in order to have a working RFC right away, I modified memory-backend-file.

Stefano
Re: Intention to work on GSoC project
Hi Sahil, On Sun, Feb 25, 2024 at 10:38 PM Sahil wrote: > > Hi, > > My name is Sahil and I go by the pseudonym 'valdaarhun' on Github. I have > never contributed to QEMU before but I have used it a few times as an end > user. I developed an interest in virtualization during my internship at > VMware and would like to dive deeper in this subfield. > > My current full-time job does not allow me to take part in external programs > that are paid. I would like to work on one of the proposed projects outside > of GSoC. Sure, not a problem at all, also because for this year QEMU was not accepted in GSoC, so anybody can work on those projects if they have time > I have gone through QEMU's list of GSoC '24 projects [1] and am > interested in two of them: > > 1. Add packed virtqueue to Shadow Virtqueue > 2. vhost-user memory isolation > > Based on what I have understood, they are somewhat related and are part > of the migration subsystem. I feel the learning curve of the first project > will be less steep and will make me better prepared to tackle the second > project as well. The first project is for sure related with migration. While vhost-user memory isolation is not really related to migration, but both are related to virtio devices. Anyway, your plan looks good to me! > > I have read the "Getting Started for Developers" [2] wiki page. I have also > built QEMU from source. Great! > > I think my next step should be to read the documentation on the migration > subsystem [3], the blog posts attached in the first project's description > and virtqueue's implementation. Would you also recommend that I work on a > QEMU issue that is open on Gitlab and related to virtqueues/virtio to > familiarize > myself with the codebase? I went through the issues tagged as "device:virtio" > [4] > but can't really tell if any of them are good for beginners. One of them has > the > "bite-size" tag [5]. It also has a patch attached but hasn't been merged. 
> Shall I > work on getting that merged? Yeah, "bite-size" issues should be better to understand how to contribute to QEMU. Feel free to work on any issue, doing the work or helping to complete old patches. > > I have worked on a few smaller systems programming issues in other > organizations (eg: strace [6], htop [7]) in the past. > > I look forward to hearing from you. Feel free to reach us if you have more questions on the projects. Thanks, Stefano
Re: [PATCH 0/9] vhost-user: support any POSIX system (tested on macOS and FreeBSD)
I just noticed that I forgot to add the RFC tag and to fix the Author to match the SOB in some patches, sorry!

Stefano

On Wed, Feb 28, 2024 at 12:48 PM Stefano Garzarella wrote:
>
> The vhost-user protocol is not really Linux-specific, so let's try to support
> QEMU's frontends and backends (including libvhost-user) in any POSIX system
> with this series. The main use case is to be able to use virtio devices that
> we don't have built-in in QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even
> in non-Linux systems.
>
> The first 5 patches are more like fixes discovered at runtime on macOS or
> FreeBSD that could go in even independently of this series.
>
> Patches 6, 7, and 8 enable building of frontends and backends (including
> libvhost-user) with associated code changes to succeed in compilation.
>
> The latest patch (9) adds support for the POSIX shm_open() API to create
> shared memory which is identified by an fd that can be shared with vhost-user
> backends. This is useful on those systems (like macOS) where we don't have
> memfd_create() or special filesystems like "/dev/shm".
>
> I put the whole series in RFC because I have some questions especially in
> patches 6 and 9, but in general any comment/suggestion/test is really welcome.
>
> Maybe the first 5 patches can go separately, but I only discovered those
> problems after testing patches 6 - 9, so I have included them in this series
> for now. Please let me know if you prefer that I send them separately.
[PATCH 7/9] libvhost-user: enable it on any POSIX system
The vhost-user protocol is not really Linux-specific, so let's enable libvhost-user for any POSIX system. When compiling it on macOS and FreeBSD, some problems came up: - avoid including linux/vhost.h, which is available only on Linux (vhost_types.h contains many of the things we need) - macOS doesn't provide sys/endian.h, so let's define the conversion macros ourselves (note: libvhost-user doesn't include QEMU's headers, so we can't use "qemu/bswap.h") - define eventfd_[write|read] as write/read wrappers when the system doesn't provide them (e.g. macOS) - copy the SEAL defines from include/qemu/memfd.h to make the code work on FreeBSD, where MFD_ALLOW_SEALING is defined - define MAP_NORESERVE if it's not defined (e.g. on FreeBSD) Signed-off-by: Stefano Garzarella --- meson.build | 2 +- subprojects/libvhost-user/libvhost-user.h | 2 +- subprojects/libvhost-user/libvhost-user.c | 60 +-- 3 files changed, 59 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index 462f2d7593..af15178969 100644 --- a/meson.build +++ b/meson.build @@ -3188,7 +3188,7 @@ endif config_host_data.set('CONFIG_FDT', fdt.found()) vhost_user = not_found -if host_os == 'linux' and have_vhost_user +if have_vhost_user libvhost_user = subproject('libvhost-user') vhost_user = libvhost_user.get_variable('vhost_user_dep') endif diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h index c2352904f0..e684b999b4 100644 --- a/subprojects/libvhost-user/libvhost-user.h +++ b/subprojects/libvhost-user/libvhost-user.h @@ -18,9 +18,9 @@ #include #include #include -#include #include #include "standard-headers/linux/virtio_ring.h" +#include "standard-headers/linux/vhost_types.h" /* Based on qemu/hw/virtio/vhost-user.c */ #define VHOST_USER_F_PROTOCOL_FEATURES 30 diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 5da8de838b..1075fd3147 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c
@@ -28,9 +28,7 @@ #include #include #include -#include #include -#include /* Necessary to provide VIRTIO_F_VERSION_1 on systems * with older linux headers. Must appear before @@ -39,8 +37,8 @@ #include "standard-headers/linux/virtio_config.h" #if defined(__linux__) +#include #include -#include #include #include @@ -50,6 +48,62 @@ #endif +#if defined(__APPLE__) && (__MACH__) +#include <libkern/OSByteOrder.h> +#define htobe16(x) OSSwapHostToBigInt16(x) +#define htole16(x) OSSwapHostToLittleInt16(x) +#define be16toh(x) OSSwapBigToHostInt16(x) +#define le16toh(x) OSSwapLittleToHostInt16(x) + +#define htobe32(x) OSSwapHostToBigInt32(x) +#define htole32(x) OSSwapHostToLittleInt32(x) +#define be32toh(x) OSSwapBigToHostInt32(x) +#define le32toh(x) OSSwapLittleToHostInt32(x) + +#define htobe64(x) OSSwapHostToBigInt64(x) +#define htole64(x) OSSwapHostToLittleInt64(x) +#define be64toh(x) OSSwapBigToHostInt64(x) +#define le64toh(x) OSSwapLittleToHostInt64(x) +#endif + +#ifdef CONFIG_EVENTFD +#include <sys/eventfd.h> +#else +#define eventfd_t uint64_t + +int eventfd_write(int fd, eventfd_t value) +{ +return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1; +} + +int eventfd_read(int fd, eventfd_t *value) +{ +return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1; +} +#endif + +#ifdef MFD_ALLOW_SEALING +#include + +#ifndef F_LINUX_SPECIFIC_BASE +#define F_LINUX_SPECIFIC_BASE 1024 +#endif + +#ifndef F_ADD_SEALS +#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9) +#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10) + +#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */ +#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */ +#define F_SEAL_GROW 0x0004 /* prevent file from growing */ +#define F_SEAL_WRITE 0x0008 /* prevent writes */ +#endif +#endif + +#ifndef MAP_NORESERVE +#define MAP_NORESERVE 0 +#endif + #include "include/atomic.h" #include "libvhost-user.h" -- 2.43.2
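The eventfd fallback added above is small enough to exercise in isolation. A standalone sketch of the same pattern (functions carry a `my_` prefix here so they cannot clash with glibc's own `eventfd_write`/`eventfd_read`; a pipe stands in for the eventfd):

```c
#include <stdint.h>
#include <unistd.h>

typedef uint64_t my_eventfd_t;

/* Succeed only if the full 8-byte counter value was transferred,
 * mirroring the write()/read() wrappers the patch defines for
 * systems without <sys/eventfd.h> (e.g. macOS). */
static int my_eventfd_write(int fd, my_eventfd_t value)
{
    return (write(fd, &value, sizeof(value)) == sizeof(value)) ? 0 : -1;
}

static int my_eventfd_read(int fd, my_eventfd_t *value)
{
    return (read(fd, value, sizeof(*value)) == sizeof(*value)) ? 0 : -1;
}
```

Unlike a real eventfd, a plain fd does not aggregate counter values, but libvhost-user only relies on the signal/wait semantics, so the wrappers are sufficient.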
[PATCH 9/9] hostmem-file: support POSIX shm_open()
Add a new `shm` bool option for `-object memory-backend-file`. When this option is set to true, POSIX shm_open(3) is used instead of open(2), so no file is created in the filesystem; instead a "POSIX shared memory object" is instantiated. On Linux this turns into a file in /dev/shm, but on other OSes this may not happen (for example, on macOS or FreeBSD nothing is shown in any filesystem). This new feature is useful when we need to share guest memory with another process (e.g. vhost-user backend), but we don't have memfd_create() or any special filesystems (e.g. /dev/shm) available, as on macOS. Signed-off-by: Stefano Garzarella --- I am not sure this is the best way to support shm_open() in QEMU. Other solutions I had in mind were: - create a new memory-backend-shm - extend memory-backend-memfd to use shm_open() on systems where memfd is not available (problem: shm_open wants a name to assign to the object, but we can work around this by using a random name and doing the unlink right away) Any preference/suggestion? Thanks, Stefano --- qapi/qom.json | 4 +++ backends/hostmem-file.c | 57 - backends/meson.build| 2 +- qemu-options.hx | 10 +++- 4 files changed, 70 insertions(+), 3 deletions(-) diff --git a/qapi/qom.json b/qapi/qom.json index 2a6e49365a..bfb01b909f 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -682,6 +682,9 @@ # writable RAM instead of ROM, and want to set this property to 'off'. # (default: auto, since 8.2) # +# @shm: if true, shm_open(3) is used to create/open POSIX shared memory +# object; if false, an open(2) is used.
(default: false) (since 9.0) +# # Since: 2.1 ## { 'struct': 'MemoryBackendFileProperties', @@ -692,6 +695,7 @@ 'mem-path': 'str', '*pmem': { 'type': 'bool', 'if': 'CONFIG_LIBPMEM' }, '*readonly': 'bool', +'*shm': 'bool', '*rom': 'OnOffAuto' } } ## diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c index ac3e433cbd..9d60375c1f 100644 --- a/backends/hostmem-file.c +++ b/backends/hostmem-file.c @@ -34,6 +34,7 @@ struct HostMemoryBackendFile { bool is_pmem; bool readonly; OnOffAuto rom; +bool shm; }; static bool @@ -86,7 +87,37 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp) ram_flags |= fb->rom == ON_OFF_AUTO_ON ? RAM_READONLY : 0; ram_flags |= backend->reserve ? 0 : RAM_NORESERVE; ram_flags |= fb->is_pmem ? RAM_PMEM : 0; +/* TODO: check if this should be enabled if shm is enabled */ ram_flags |= RAM_NAMED_FILE; + +if (fb->shm) { +mode_t mode = S_IRUSR | S_IWUSR; +int fd, oflag = 0; + +oflag |= fb->readonly ? O_RDONLY : O_RDWR; +oflag |= O_CREAT; + +fd = shm_open(fb->mem_path, oflag, mode); +if (fd < 0) { +error_setg_errno(errp, errno, + "failed to create POSIX shared memory"); +return false; +} + +if (ftruncate(fd, backend->size) == -1) { +error_setg_errno(errp, errno, + "failed to resize POSIX shared memory to %" PRIu64, + backend->size); +shm_unlink(fb->mem_path); +return false; +} + +return memory_region_init_ram_from_fd(&backend->mr, OBJECT(backend), + name, backend->size, ram_flags, + fd, fb->offset, errp); + +} + return memory_region_init_ram_from_file(&backend->mr, OBJECT(backend), name, backend->size, fb->align, ram_flags, fb->mem_path, fb->offset, errp); @@ -254,17 +285,36 @@ static void file_memory_backend_set_rom(Object *obj, Visitor *v, visit_type_OnOffAuto(v, name, &fb->rom, errp); } +static bool file_memory_backend_get_shm(Object *obj, Error **errp) +{ +return MEMORY_BACKEND_FILE(obj)->shm; +} + +static void file_memory_backend_set_shm(Object *obj, bool value, + Error **errp) +{ +MEMORY_BACKEND_FILE(obj)->shm = value; +} + static void
file_backend_unparent(Object *obj) { HostMemoryBackend *backend = MEMORY_BACKEND(obj); HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(obj); -if (host_memory_backend_mr_inited(backend) && fb->discard_data) { +if (!host_memory_backend_mr_inited(backend)) { +return; +} + +if (fb->discard_data) { void *ptr = memory_region_get_ram_ptr(&backend->mr); uint64_t sz = memory_region_size(&backend->mr); qemu_madvise(ptr, sz, QEMU_MADV_REMOVE); } + +if (fb->shm) { +shm_unlink(fb->mem_path); +} } static void @@ -300,6 +350,11 @@ file_backen
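For reference, the shm_open()/ftruncate()/mmap()/shm_unlink() lifecycle that the `shm=on` path relies on can be sketched as a standalone helper (not QEMU code; the object name is arbitrary and error handling is reduced to early returns):

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a POSIX shared memory object, size it, and map it.  The fd is
 * what would be shared with an external process (e.g. a vhost-user
 * backend); the name only exists long enough to create the object. */
static void *shm_backend_alloc(const char *name, size_t size, int *fd_out)
{
    void *ptr;
    int fd = shm_open(name, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);

    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, size) < 0) {
        shm_unlink(name);
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    shm_unlink(name);           /* from now on only the fd refers to it */
    if (ptr == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return ptr;
}
```

Unlinking the name immediately after creation is what makes the object behave like the anonymous memory that memfd provides on Linux: once the name is gone, only processes holding the fd can reach it.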
[PATCH 6/9] vhost-user: enable frontends on any POSIX system
The vhost-user protocol is not really Linux-specific, so let's enable vhost-user frontends for any POSIX system. In vhost_net.c we use VHOST_FILE_UNBIND, which is defined in a Linux-specific header; let's define it for other systems as well. Signed-off-by: Stefano Garzarella --- If we want to be more conservative, maybe we can leave `have_vhost_user` auto only on Linux. I removed it mostly to test with CI whether we have any problems on the other POSIX systems. In hw/net/vhost_net.c maybe we can just do: #ifndef VHOST_FILE_UNBIND #define VHOST_FILE_UNBIND -1 #endif Any suggestion? Thanks, Stefano --- meson.build| 1 - hw/net/vhost_net.c | 8 +++- hw/block/Kconfig | 2 +- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/meson.build b/meson.build index 0ef1654e86..462f2d7593 100644 --- a/meson.build +++ b/meson.build @@ -151,7 +151,6 @@ have_tpm = get_option('tpm') \ # vhost have_vhost_user = get_option('vhost_user') \ - .disable_auto_if(host_os != 'linux') \ .require(host_os != 'windows', error_message: 'vhost-user is not available on Windows').allowed() have_vhost_vdpa = get_option('vhost_vdpa') \ diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index e8e1661646..346ef74eb1 100644 --- a/hw/net/vhost_net.c +++ b/hw/net/vhost_net.c @@ -34,8 +34,14 @@ #include "standard-headers/linux/virtio_ring.h" #include "hw/virtio/vhost.h" #include "hw/virtio/virtio-bus.h" -#include "linux-headers/linux/vhost.h" +#if defined(__linux__) +#include "linux-headers/linux/vhost.h" +#else +#ifndef VHOST_FILE_UNBIND +#define VHOST_FILE_UNBIND -1 +#endif +#endif /* Features supported by host kernel. */ static const int kernel_feature_bits[] = { diff --git a/hw/block/Kconfig b/hw/block/Kconfig index 9e8f28f982..29ee09e434 100644 --- a/hw/block/Kconfig +++ b/hw/block/Kconfig @@ -40,7 +40,7 @@ config VHOST_USER_BLK bool # Only PCI devices are provided for now default y if VIRTIO_PCI -depends on VIRTIO && VHOST_USER && LINUX +depends on VIRTIO && VHOST_USER config SWIM bool -- 2.43.2
[PATCH 8/9] contrib/vhost-user-blk: enable it on any POSIX system
Let's make the code more portable by using the "qemu/bswap.h" API and adding defines from block/file-posix.c to support O_DIRECT on other systems (e.g. macOS). vhost-user-server.c is a dependency, so let's enable it for any POSIX system. Signed-off-by: Stefano Garzarella --- meson.build | 2 -- contrib/vhost-user-blk/vhost-user-blk.c | 19 +-- util/meson.build| 4 +++- 3 files changed, 20 insertions(+), 5 deletions(-) diff --git a/meson.build b/meson.build index af15178969..8afda86d3d 100644 --- a/meson.build +++ b/meson.build @@ -1954,8 +1954,6 @@ has_statx = cc.has_header_symbol('sys/stat.h', 'STATX_BASIC_STATS', prefix: gnu_ has_statx_mnt_id = cc.has_header_symbol('sys/stat.h', 'STATX_MNT_ID', prefix: gnu_source_prefix) have_vhost_user_blk_server = get_option('vhost_user_blk_server') \ - .require(host_os == 'linux', - error_message: 'vhost_user_blk_server requires linux') \ .require(have_vhost_user, error_message: 'vhost_user_blk_server requires vhost-user support') \ .disable_auto_if(not have_tools and not have_system) \ diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index a8ab9269a2..462e584857 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -16,6 +16,7 @@ */ #include "qemu/osdep.h" +#include "qemu/bswap.h" #include "standard-headers/linux/virtio_blk.h" #include "libvhost-user-glib.h" @@ -24,6 +25,20 @@ #include #endif +/* OS X does not have O_DSYNC */ +#ifndef O_DSYNC +#ifdef O_SYNC +#define O_DSYNC O_SYNC +#elif defined(O_FSYNC) +#define O_DSYNC O_FSYNC +#endif +#endif + +/* Approximate O_DIRECT with O_DSYNC if O_DIRECT isn't available */ +#ifndef O_DIRECT +#define O_DIRECT O_DSYNC +#endif + enum { VHOST_USER_BLK_MAX_QUEUES = 8, }; @@ -267,13 +282,13 @@ static int vub_virtio_process_req(VubDev *vdev_blk, req->in = (struct virtio_blk_inhdr *)elem->in_sg[in_num - 1].iov_base; in_num--; -type = le32toh(req->out->type); +type = le32_to_cpu(req->out->type); switch
(type & ~VIRTIO_BLK_T_BARRIER) { case VIRTIO_BLK_T_IN: case VIRTIO_BLK_T_OUT: { ssize_t ret = 0; bool is_write = type & VIRTIO_BLK_T_OUT; -req->sector_num = le64toh(req->out->sector); +req->sector_num = le64_to_cpu(req->out->sector); if (is_write) { ret = vub_writev(req, &elem->out_sg[1], out_num); } else { diff --git a/util/meson.build b/util/meson.build index 0ef9886be0..f52682ce96 100644 --- a/util/meson.build +++ b/util/meson.build @@ -113,10 +113,12 @@ if have_block util_ss.add(files('filemonitor-stub.c')) endif if host_os == 'linux' -util_ss.add(files('vhost-user-server.c'), vhost_user) util_ss.add(files('vfio-helpers.c')) util_ss.add(files('chardev_open.c')) endif + if host_os != 'windows' +util_ss.add(files('vhost-user-server.c'), vhost_user) + endif endif if cpu == 'aarch64' -- 2.43.2
[PATCH 3/9] libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported
libvhost-user will panic when receiving VHOST_USER_GET_INFLIGHT_FD message if MFD_ALLOW_SEALING is not defined, since it's not able to create a memfd. VHOST_USER_GET_INFLIGHT_FD is used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is negotiated. So, let's mask that feature if the backend is not able to properly handle these messages. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index 527a550345..5da8de838b 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -1610,6 +1610,16 @@ vu_get_protocol_features_exec(VuDev *dev, VhostUserMsg *vmsg) features |= dev->iface->get_protocol_features(dev); } +/* + * If MFD_ALLOW_SEALING is not defined, we are not able to handle + * VHOST_USER_GET_INFLIGHT_FD messages, since we can't create a memfd. + * Those messages are used only if VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD + * is negotiated. A device implementation can enable it, so let's mask + * it to avoid a runtime panic. + */ +#ifndef MFD_ALLOW_SEALING +features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD); +#endif vmsg_set_reply_u64(vmsg, features); return true; } -- 2.43.2
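The masking itself is just a bit operation on the 64-bit protocol-feature set. A minimal standalone sketch (the bit value 12 for VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD is taken from the vhost-user specification; `have_memfd_sealing` stands in for the compile-time `MFD_ALLOW_SEALING` check in the patch):

```c
#include <stdint.h>

#define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD 12

/* Clear the inflight-shmfd protocol feature when memfd sealing is
 * unavailable, so the frontend never negotiates it and therefore
 * never sends VHOST_USER_GET_INFLIGHT_FD. */
static uint64_t mask_protocol_features(uint64_t features,
                                       int have_memfd_sealing)
{
    if (!have_memfd_sealing) {
        features &= ~(1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD);
    }
    return features;
}
```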
[PATCH 4/9] vhost-user-server: don't abort if we can't set fd non-blocking
From: Stefano Garzarella In vhost-user-server we set all fds received from the other peer in non-blocking mode. For some of them (e.g. memfd, shm_open, etc.) a failure is not really a problem, because we don't use these fds with blocking operations, but only to map memory. So let's just report a warning instead of panicking if we fail to set some fd in non-blocking mode. This occurs, for example, on macOS, where setting a shm_open() fd non-blocking fails (errno -25). Signed-off-by: Stefano Garzarella --- util/vhost-user-server.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index 3bfb1ad3ec..064999f0b7 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -66,7 +66,11 @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg) { int i; for (i = 0; i < vmsg->fd_num; i++) { -qemu_socket_set_nonblock(vmsg->fds[i]); +int ret = qemu_socket_try_set_nonblock(vmsg->fds[i]); +if (ret) { +warn_report("Failed to set fd %d nonblock for request %d: %s", +vmsg->fds[i], vmsg->request, strerror(-ret)); +} } } -- 2.43.2
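Outside of QEMU's qemu_socket_try_set_nonblock() helper, the same "warn, don't abort" policy boils down to a plain fcntl() call whose failure is merely reported. A standalone sketch:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Best-effort O_NONBLOCK: fds that are only ever mmap()ed (memfd,
 * shm_open, ...) may refuse the flag on some OSes, and that is fine. */
static int try_set_nonblock(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        fprintf(stderr, "warning: fd %d left blocking: %s\n",
                fd, strerror(errno));
        return -errno;
    }
    return 0;
}
```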
[PATCH 5/9] contrib/vhost-user-blk: fix bind() using the right size of the address
From: Stefano Garzarella On macOS, passing the `-s /tmp/vhost.socket` parameter to the vhost-user-blk application, the bind was done on the `/tmp/vhost.socke` pathname, missing the last character. This sounds like one of the portability problems described in the unix(7) manpage: Pathname sockets When binding a socket to a pathname, a few rules should be observed for maximum portability and ease of coding: • The pathname in sun_path should be null-terminated. • The length of the pathname, including the terminating null byte, should not exceed the size of sun_path. • The addrlen argument that describes the enclosing sockaddr_un structure should have a value of at least: offsetof(struct sockaddr_un, sun_path) + strlen(addr.sun_path)+1 or, more simply, addrlen can be specified as sizeof(struct sockaddr_un). So let's follow the last advice and simplify the code as well. Signed-off-by: Stefano Garzarella --- contrib/vhost-user-blk/vhost-user-blk.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/contrib/vhost-user-blk/vhost-user-blk.c b/contrib/vhost-user-blk/vhost-user-blk.c index 89e5f11a64..a8ab9269a2 100644 --- a/contrib/vhost-user-blk/vhost-user-blk.c +++ b/contrib/vhost-user-blk/vhost-user-blk.c @@ -469,7 +469,6 @@ static int unix_sock_new(char *unix_fn) { int sock; struct sockaddr_un un; -size_t len; assert(unix_fn); @@ -481,10 +480,9 @@ static int unix_sock_new(char *unix_fn) un.sun_family = AF_UNIX; (void)snprintf(un.sun_path, sizeof(un.sun_path), "%s", unix_fn); -len = sizeof(un.sun_family) + strlen(un.sun_path); (void)unlink(unix_fn); -if (bind(sock, (struct sockaddr *)&un, len) < 0) { +if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) { perror("bind"); goto fail; } -- 2.43.2
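The resulting bind() pattern, following the unix(7) advice quoted above (sun_path NUL-terminated via snprintf(), addrlen passed as sizeof(struct sockaddr_un)), can be written as a standalone function:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int unix_sock_bind(const char *path)
{
    struct sockaddr_un un;
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);

    if (sock < 0) {
        return -1;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    /* snprintf() guarantees NUL termination (and truncates long paths) */
    snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);

    unlink(path);
    /* sizeof(un) is portable; sizeof(sun_family) + strlen() was not,
     * and dropped the last pathname byte on macOS */
    if (bind(sock, (struct sockaddr *)&un, sizeof(un)) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}
```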
[PATCH 0/9] vhost-user: support any POSIX system (tested on macOS and FreeBSD)
The vhost-user protocol is not really Linux-specific, so let's try to support QEMU's frontends and backends (including libvhost-user) on any POSIX system with this series. The main use case is to be able to use virtio devices that are not built into QEMU (e.g. virtiofsd, vhost-user-vsock, etc.) even on non-Linux systems. The first 5 patches are more like fixes discovered at runtime on macOS or FreeBSD that could go in even independently of this series. Patches 6, 7, and 8 enable building of frontends and backends (including libvhost-user), with the associated code changes needed for compilation to succeed. The last patch (9) adds support for the POSIX shm_open() API to create shared memory which is identified by an fd that can be shared with vhost-user backends. This is useful on those systems (like macOS) where we don't have memfd_create() or special filesystems like "/dev/shm". I am sending the whole series as an RFC because I have some questions, especially on patches 6 and 9, but in general any comment/suggestion/test is really welcome. Maybe the first 5 patches can go separately, but I only discovered those problems after testing patches 6 - 9, so I have included them in this series for now. Please let me know if you prefer that I send them separately.
For now I tested this series using vhost-user-blk and QSD on macOS Sonoma 14.3.1 (aarch64), FreeBSD 14 (x86_64), and Fedora 39 (x86_64) in this way: - Start vhost-user-blk or QSD (same commands for all systems) vhost-user-blk -s /tmp/vhost.socket \ -b Fedora-Cloud-Base-39-1.5.x86_64.raw qemu-storage-daemon \ --blockdev file,filename=Fedora-Cloud-Base-39-1.5.x86_64.qcow2,node-name=file \ --blockdev qcow2,file=file,node-name=qcow2 \ --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.socket,id=vub,num-queues=1,node-name=qcow2,writable=on - macOS (aarch64): start QEMU (using hvf accelerator) qemu-system-aarch64 -smp 2 -cpu host -M virt,accel=hvf,memory-backend=mem \ -drive file=./build/pc-bios/edk2-aarch64-code.fd,if=pflash,format=raw,readonly=on \ -device virtio-net-device,netdev=net0 -netdev user,id=net0 \ -device ramfb -device usb-ehci -device usb-kbd \ -object memory-backend-file,mem-path="/mem0",shm=on,share=on,id=mem,size=512M \ -device vhost-user-blk-pci,num-queues=1,disable-legacy=on,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket - FreeBSD (x86_64): start QEMU (no accelerators available) qemu-system-x86_64 -smp 2 -M q35,memory-backend=mem \ -object memory-backend-file,mem-path="/mem0",shm=on,share=on,id=mem,size="512M" \ -device vhost-user-blk-pci,num-queues=1,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket - Fedora (x86_64): start QEMU (using kvm accelerator) qemu-system-x86_64 -smp 2 -M q35,accel=kvm,memory-backend=mem \ -object memory-backend-file,mem-path="/mem0",shm=on,share=on,id=mem,size="512M" \ -device vhost-user-blk-pci,num-queues=1,chardev=char0 \ -chardev socket,id=char0,path=/tmp/vhost.socket Thanks, Stefano Stefano Garzarella (9): libvhost-user: set msg.msg_control to NULL when it is empty libvhost-user: fail vu_message_write() if sendmsg() is failing libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported vhost-user-server: don't abort if we can't set fd non-blocking contrib/vhost-user-blk: fix 
bind() using the right size of the address vhost-user: enable frontends on any POSIX system libvhost-user: enable it on any POSIX system contrib/vhost-user-blk: enable it on any POSIX system hostmem-file: support POSIX shm_open() meson.build | 5 +- qapi/qom.json | 4 ++ subprojects/libvhost-user/libvhost-user.h | 2 +- backends/hostmem-file.c | 57 - contrib/vhost-user-blk/vhost-user-blk.c | 23 +-- hw/net/vhost_net.c| 8 ++- subprojects/libvhost-user/libvhost-user.c | 76 ++- util/vhost-user-server.c | 6 +- backends/meson.build | 2 +- hw/block/Kconfig | 2 +- qemu-options.hx | 10 ++- util/meson.build | 4 +- 12 files changed, 179 insertions(+), 20 deletions(-) -- 2.43.2
[PATCH 2/9] libvhost-user: fail vu_message_write() if sendmsg() is failing
In vu_message_write() we use sendmsg() to send the message header, then a write() to send the payload. If sendmsg() fails, we should avoid sending the payload, since we were unable to send the header. Discovered before fixing the issue with the previous patch, where sendmsg() failed on macOS due to wrong parameters, but the frontend still sent the payload, which the backend incorrectly interpreted as a wrong header. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 5 + 1 file changed, 5 insertions(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index b9c18eee53..527a550345 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -394,6 +394,11 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) rc = sendmsg(conn_fd, &msg, 0); } while (rc < 0 && (errno == EINTR || errno == EAGAIN)); +if (rc <= 0) { +vu_panic(dev, "Error while writing: %s", strerror(errno)); +return false; +} + if (vmsg->size) { do { if (vmsg->data) { -- 2.43.2
[PATCH 1/9] libvhost-user: set msg.msg_control to NULL when it is empty
From: Stefano Garzarella On some OS (e.g. macOS) sendmsg() returns -1 (errno EINVAL) if the `struct msghdr` has the field `msg_controllen` set to 0, but `msg_control` is not NULL. Signed-off-by: Stefano Garzarella --- subprojects/libvhost-user/libvhost-user.c | 1 + 1 file changed, 1 insertion(+) diff --git a/subprojects/libvhost-user/libvhost-user.c b/subprojects/libvhost-user/libvhost-user.c index a3b158c671..b9c18eee53 100644 --- a/subprojects/libvhost-user/libvhost-user.c +++ b/subprojects/libvhost-user/libvhost-user.c @@ -387,6 +387,7 @@ vu_message_write(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) memcpy(CMSG_DATA(cmsg), vmsg->fds, fdsize); } else { msg.msg_controllen = 0; +msg.msg_control = NULL; } do { -- 2.43.2
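A reduced reproduction of the pattern after the fix: the control buffer exists for the fd-passing case, but when no fds are attached both msg_controllen and msg_control are cleared before sendmsg(), since macOS rejects a zero msg_controllen combined with a non-NULL msg_control with EINVAL:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send a buffer with no ancillary data attached. */
static ssize_t send_plain(int fd, const void *buf, size_t len)
{
    char control[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = control,
        .msg_controllen = sizeof(control),
    };

    /* no fds to pass: clear BOTH fields, as the patch does */
    msg.msg_controllen = 0;
    msg.msg_control = NULL;

    return sendmsg(fd, &msg, 0);
}
```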
Re: [PATCH v1 00/15] libvhost-user: support more memslots and cleanup memslot handling code
On Fri, Feb 02, 2024 at 10:53:17PM +0100, David Hildenbrand wrote: This series adds support for more memslots (509) to libvhost-user, to make it fully compatible with virtio-mem, which uses up to 256 memslots across all memory devices in "dynamic-memslot" mode (more details in patch #3). One simple fix upfront. With that in place, this series optimizes and extends memory region handling in libvhost-user: * Heavily deduplicate and clean up the memory region handling code * Speed up GPA->VA translation with many memslots using binary search * Optimize mmap_offset handling to use it as fd_offset for mmap() * Avoid ring remapping when adding a single memory region * Avoid dumping all guest memory, possibly allocating memory in sparse memory mappings when the process crashes I'm being very careful not to break some weird corner case that modern QEMU might no longer trigger, but older ones could have triggered or some other frontend might trigger. The only thing where I am not careful is to forbid memory regions that overlap in GPA space: it doesn't make any sense. With this series, virtio-mem (with dynamic-memslots=on) + qemu-storage-daemon works flawlessly and as expected in my tests. I don't know much about this code, but I didn't find anything wrong with it. Thank you also for the great cleanup! Acked-by: Stefano Garzarella
Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Wed, Feb 07, 2024 at 11:18:54AM +0100, Kevin Wolf wrote: On 06.02.2024 at 17:44, Eugenio Perez Martin wrote: On Mon, Feb 5, 2024 at 2:49 PM Kevin Wolf wrote: > > On 05.02.2024 at 13:22, Eugenio Perez Martin wrote: > > On Fri, Feb 2, 2024 at 2:25 PM Kevin Wolf wrote: > > > > > > VDUSE requires that virtqueues are first enabled before the DRIVER_OK > > > status flag is set; with the current API of the kernel module, it is > > > impossible to enable the opposite order in our block export code because > > > userspace is not notified when a virtqueue is enabled. > > > > > > This requirement also matches the normal initialisation order as done by > > > the generic vhost code in QEMU. However, commit 6c482547 accidentally > > > changed the order for vdpa-dev and broke access to VDUSE devices with > > > this. > > > > > > This changes vdpa-dev to use the normal order again and use the standard > > > vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > > > used with vdpa-dev again after this fix.
> > > > > > Cc: qemu-sta...@nongnu.org > > > Fixes: 6c4825476a4351530bcac17abab72295b75ffe98 > > > Signed-off-by: Kevin Wolf > > > --- > > > hw/virtio/vdpa-dev.c | 5 + > > > hw/virtio/vhost-vdpa.c | 17 + > > > 2 files changed, 18 insertions(+), 4 deletions(-) > > > > > > diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c > > > index eb9ecea83b..13e87f06f6 100644 > > > --- a/hw/virtio/vdpa-dev.c > > > +++ b/hw/virtio/vdpa-dev.c > > > @@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) > > > > > > s->dev.acked_features = vdev->guest_features; > > > > > > -ret = vhost_dev_start(&s->dev, vdev, false); > > > +ret = vhost_dev_start(&s->dev, vdev, true); > > > if (ret < 0) { > > > error_setg_errno(errp, -ret, "Error starting vhost"); > > > goto err_guest_notifiers; > > > } > > > -for (i = 0; i < s->dev.nvqs; ++i) { > > > -vhost_vdpa_set_vring_ready(&s->vdpa, i); > > > -} > > > s->started = true; > > > > > > /* > > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c > > > index 3a43beb312..c4574d56c5 100644 > > > --- a/hw/virtio/vhost-vdpa.c > > > +++ b/hw/virtio/vhost-vdpa.c > > > @@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx) > > > return r; > > > } > > > > > > +static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable) > > > +{ > > > +struct vhost_vdpa *v = dev->opaque; > > > +unsigned int i; > > > +int ret; > > > + > > > +for (i = 0; i < dev->nvqs; ++i) { > > > +ret = vhost_vdpa_set_vring_ready(v, i); > > > +if (ret < 0) { > > > +return ret; > > > +} > > > +} > > > + > > > +return 0; > > > +} > > > + > > > static int vhost_vdpa_set_config_call(struct vhost_dev *dev, > > > int fd) > > > { > > > @@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = { > > > .vhost_set_features = vhost_vdpa_set_features, > > > .vhost_reset_device = vhost_vdpa_reset_device, > > > .vhost_get_vq_index = vhost_vdpa_get_vq_index, > > > +.vhost_set_vring_enable = vhost_vdpa_set_vring_enable, > > >
.vhost_get_config = vhost_vdpa_get_config, > > > .vhost_set_config = vhost_vdpa_set_config, > > > .vhost_requires_shm_log = NULL, > > > > vhost-vdpa net enables CVQ before dataplane ones to configure all the > > device in the destination of a live migration. To go back again to > > this callback would cause the device to enable the dataplane before > > virtqueues are configured again. > > Not that it makes a difference, but I don't think it would actually be > going back. Even before your commit 6c482547, we were not making use of > the generic callback but just called the function in a slightly > different place (but less different than after commit 6c482547). > > But anyway... Why don't the other vhost backend need the same for > vhost-net then? Do they just not support live migration? They don't support control virtqueue. More specifically, control virtqueue is handled directly in QEMU. So the network device already has to special case vdpa instead of using the same code for all vhost backends? :-/ > I don't know the code well enough to say where the problem is, but if > vhost-vdpa networking code relies on the usual vhost operations not > being implemented and bypasses VhostOps to replace it, that sounds like > a design problem to me. I don't follow this. What vhost operation is expected not to be implemented? You were concerned about implementing .vhost_set_vring_enable in vdpa_ops like my patch does. So it seems that the networking code requires that it is not implemented? On the other hand, for vhost-vdpa, the callback seems to be called in exactly the right place where virtqueues need to be enabled, like for other vhost devices. > Maybe
[PATCH] vhost-vdpa: check vhost_vdpa_set_vring_ready() return value
vhost_vdpa_set_vring_ready() could already fail, but if Linux patch [1] is merged, it will be more likely to fail when userspace does not activate virtqueues before DRIVER_OK and VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated. So better to check its return value anyway. [1] https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u Signed-off-by: Stefano Garzarella --- Note: This patch conflicts with [2], but the resolution is simple, so for now I sent a patch for the current master, but I'll rebase this patch if we merge the other one first. [2] https://lore.kernel.org/qemu-devel/20240202132521.32714-1-kw...@redhat.com/ --- hw/virtio/vdpa-dev.c | 8 +++- net/vhost-vdpa.c | 15 --- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c index eb9ecea83b..d57cd76c18 100644 --- a/hw/virtio/vdpa-dev.c +++ b/hw/virtio/vdpa-dev.c @@ -259,7 +259,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp) goto err_guest_notifiers; } for (i = 0; i < s->dev.nvqs; ++i) { -vhost_vdpa_set_vring_ready(&s->vdpa, i); +ret = vhost_vdpa_set_vring_ready(&s->vdpa, i); +if (ret < 0) { +error_setg_errno(errp, -ret, "Error starting vring %d", i); +goto err_dev_stop; +} } s->started = true; @@ -274,6 +278,8 @@ return ret; +err_dev_stop: +vhost_dev_stop(&s->dev, vdev, false); err_guest_notifiers: k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false); err_host_notifiers: diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 3726ee5d67..e3d8036479 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -381,7 +381,10 @@ static int vhost_vdpa_net_data_load(NetClientState *nc) } for (int i = 0; i < v->dev->nvqs; ++i) { -vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +int ret = vhost_vdpa_set_vring_ready(v, i + v->dev->vq_index); +if (ret < 0) { +return ret; +} } return 0; } @@ -1213,7 +1216,10 @@ static int
vhost_vdpa_net_cvq_load(NetClientState *nc) assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA); -vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +r = vhost_vdpa_set_vring_ready(v, v->dev->vq_index); +if (unlikely(r < 0)) { +return r; +} if (v->shadow_vqs_enabled) { n = VIRTIO_NET(v->dev->vdev); @@ -1252,7 +1258,10 @@ static int vhost_vdpa_net_cvq_load(NetClientState *nc) } for (int i = 0; i < v->dev->vq_index; ++i) { -vhost_vdpa_set_vring_ready(v, i); +r = vhost_vdpa_set_vring_ready(v, i); +if (unlikely(r < 0)) { +return r; +} } return 0; -- 2.43.0
Re: Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Wed, Feb 07, 2024 at 04:05:29PM +0800, Cindy Lu wrote: Jason Wang 于2024年2月7日周三 11:17写道: On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella wrote: > > On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote: > >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella wrote: > >> > >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: > >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK > >> >status flag is set; with the current API of the kernel module, it is > >> >impossible to enable the opposite order in our block export code because > >> >userspace is not notified when a virtqueue is enabled. > > > >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? > > It's not specific to virtio-blk, but to the generic vdpa device we have > in QEMU (i.e. vhost-vdpa-device). Yep, after commit > 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after > DRIVER_OK. Right. > > >Sepc is not clear about this and that's why we introduce > >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. > > Ah, I didn't know about this new feature. So after commit > 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not > complying with the specification, right? Kind of, but as stated, it's just because spec is unclear about the behaviour. There's a chance that spec will explicitly support it in the future. > > > > >> > >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, > > > >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is > >negotiated. > > At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not > negotiated, should we return an error in vhost-vdpa kernel module if > VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set? I'm not sure if this can break some setups or not. It might be better to leave it as is? Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if parent support vq_ready after driver_ok. 
With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know parent support vq_ready after driver_ok. > > >If this is truth, it seems a little more complicated, for > >example the get_backend_features needs to be forward to the userspace? > > I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES > for this? Or do you mean userspace on the VDUSE side? Yes, since in this case the parent is in the userspace, there's no way for VDUSE to know if user space supports vq_ready after driver_ok or not. As you may have noticed, we don't have a message for vq_ready which implies that vq_ready after driver_ok can't be supported. > > >This seems suboptimal to implement this in the spec first and then we > >can leverage the features. Or we can have another parameter for the > >ioctl that creates the vduse device. > > I got a little lost, though in vhost-user, the device can always expect > a vring_enable/disable, so I thought it was not complicated in VDUSE. Yes, the problem is assuming we have a message for vq_ready, there could be a "legacy" userspace that doesn't support that. So in that case, VDUSE needs to know if the userspace parent can support that or not. > > > > >> I'll start another thread about that, but in the meantime I agree that > >> we should fix QEMU since we need to work properly with old kernels as > >> well. > >> > >> > > >> >This requirement also mathces the normal initialisation order as done by > >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally > >> >changed the order for vdpa-dev and broke access to VDUSE devices with > >> >this. > >> > > >> >This changes vdpa-dev to use the normal order again and use the standard > >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > >> >used with vdpa-dev again after this fix. 
> >> > >> I like this approach and the patch LGTM, but I'm a bit worried about > >> this function in hw/net/vhost_net.c: > >> > >> int vhost_set_vring_enable(NetClientState *nc, int enable) > >> { > >> VHostNetState *net = get_vhost_net(nc); > >> const VhostOps *vhost_ops = net->dev.vhost_ops; > >> > >> nc->vring_enable = enable; > >> > >> if (vhost_ops && vhost_ops->vhost_set_vring_enable) { > >> return vhost_ops->vhost_set_vring_enable(>dev, enable); > >> } > >> > >> return 0; > >> } > >> > >> @Eugenio, @Jason, should we change
Re: Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Wed, Feb 07, 2024 at 11:17:34AM +0800, Jason Wang wrote: On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella wrote: On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote: >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella wrote: >> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK >> >status flag is set; with the current API of the kernel module, it is >> >impossible to enable the opposite order in our block export code because >> >userspace is not notified when a virtqueue is enabled. > >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? It's not specific to virtio-blk, but to the generic vdpa device we have in QEMU (i.e. vhost-vdpa-device). Yep, after commit 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after DRIVER_OK. Right. >Sepc is not clear about this and that's why we introduce >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. Ah, I didn't know about this new feature. So after commit 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not complying with the specification, right? Kind of, but as stated, it's just because spec is unclear about the behaviour. There's a chance that spec will explicitly support it in the future. > >> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, > >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is >negotiated. At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not negotiated, should we return an error in vhost-vdpa kernel module if VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set? I'm not sure if this can break some setups or not. It might be better to leave it as is? 
As I also answered in the kernel patch, IMHO we have to return an error, because any setups are broken anyway (see the problem we're fixing with this patch), so at this point it's better to return an error, so they don't spend time figuring out why the VDUSE device doesn't work. Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if parent support vq_ready after driver_ok. So we have to assume that it doesn't support it, to be more conservative, right? With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know parent support vq_ready after driver_ok. >If this is truth, it seems a little more complicated, for >example the get_backend_features needs to be forward to the userspace? I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES for this? Or do you mean userspace on the VDUSE side? Yes, since in this case the parent is in the userspace, there's no way for VDUSE to know if user space supports vq_ready after driver_ok or not. As you may have noticed, we don't have a message for vq_ready which implies that vq_ready after driver_ok can't be supported. Yep, I see. >This seems suboptimal to implement this in the spec first and then we >can leverage the features. Or we can have another parameter for the >ioctl that creates the vduse device. I got a little lost, though in vhost-user, the device can always expect a vring_enable/disable, so I thought it was not complicated in VDUSE. Yes, the problem is assuming we have a message for vq_ready, there could be a "legacy" userspace that doesn't support that. So in that case, VDUSE needs to know if the userspace parent can support that or not. I think VDUSE needs to negotiate features (if it doesn't already) as vhost-user does with the backend. Also for new future functionality. > >> I'll start another thread about that, but in the meantime I agree that >> we should fix QEMU since we need to work properly with old kernels as >> well. 
>> >> > >> >This requirement also mathces the normal initialisation order as done by >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally >> >changed the order for vdpa-dev and broke access to VDUSE devices with >> >this. >> > >> >This changes vdpa-dev to use the normal order again and use the standard >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be >> >used with vdpa-dev again after this fix. >> >> I like this approach and the patch LGTM, but I'm a bit worried about >> this function in hw/net/vhost_net.c: >> >> int vhost_set_vring_enable(NetClientState *nc, int enable) >> { >> VHostNetState *net = get_vhost_net(nc); >> const VhostOps *vhost_ops = net->dev.vhost_ops; >> >> nc->vring_enable = enable; >> >> if (vhost_ops && vhost_ops->vhost_set_vring_enable) { >> return vhost_ops->vhost_set_vring_enable(>dev, enable); >>
Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Tue, Feb 6, 2024 at 9:31 AM Stefano Garzarella wrote: > > On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote: > >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella > >wrote: > >> > >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote: > >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK > >> >status flag is set; with the current API of the kernel module, it is > >> >impossible to enable the opposite order in our block export code because > >> >userspace is not notified when a virtqueue is enabled. > > > >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK? > > It's not specific to virtio-blk, but to the generic vdpa device we have > in QEMU (i.e. vhost-vdpa-device). Yep, after commit > 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after > DRIVER_OK. > > >Sepc is not clear about this and that's why we introduce > >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK. > > Ah, I didn't know about this new feature. So after commit > 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not > complying with the specification, right? > > > > >> > >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message, > > > >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is > >negotiated. > > At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not > negotiated, should we return an error in vhost-vdpa kernel module if > VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set? I just sent a patch: https://lore.kernel.org/virtualization/20240206145154.118044-1-sgarz...@redhat.com/T/#u Then I discovered that we never check the return value of vhost_vdpa_set_vring_ready() in QEMU. I'll send a patch for that! Stefano > > >If this is truth, it seems a little more complicated, for > >example the get_backend_features needs to be forward to the userspace? > > I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES > for this? 
Or do you mean userspace on the VDUSE side? > > >This seems suboptimal to implement this in the spec first and then we > >can leverage the features. Or we can have another parameter for the > >ioctl that creates the vduse device. > > I got a little lost, though in vhost-user, the device can always expect > a vring_enable/disable, so I thought it was not complicated in VDUSE. > > > > >> I'll start another thread about that, but in the meantime I agree that > >> we should fix QEMU since we need to work properly with old kernels as > >> well. > >> > >> > > >> >This requirement also mathces the normal initialisation order as done by > >> >the generic vhost code in QEMU. However, commit 6c482547 accidentally > >> >changed the order for vdpa-dev and broke access to VDUSE devices with > >> >this. > >> > > >> >This changes vdpa-dev to use the normal order again and use the standard > >> >vhost callback .vhost_set_vring_enable for this. VDUSE devices can be > >> >used with vdpa-dev again after this fix. > >> > >> I like this approach and the patch LGTM, but I'm a bit worried about > >> this function in hw/net/vhost_net.c: > >> > >> int vhost_set_vring_enable(NetClientState *nc, int enable) > >> { > >> VHostNetState *net = get_vhost_net(nc); > >> const VhostOps *vhost_ops = net->dev.vhost_ops; > >> > >> nc->vring_enable = enable; > >> > >> if (vhost_ops && vhost_ops->vhost_set_vring_enable) { > >> return vhost_ops->vhost_set_vring_enable(>dev, enable); > >> } > >> > >> return 0; > >> } > >> > >> @Eugenio, @Jason, should we change some things there if vhost-vdpa > >> implements the vhost_set_vring_enable callback? > > > >Eugenio may know more, I remember we need to enable cvq first for > >shadow virtqueue to restore some states. > > > >> > >> Do you remember why we didn't implement it from the beginning? > > > >It seems the vrings parameter is introduced after vhost-vdpa is > >implemented. 
> > Sorry, I mean why we didn't implement the vhost_set_vring_enable > callback for vhost-vdpa from the beginning. > > Thanks, > Stefano
Re: Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella wrote:
On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
>VDUSE requires that virtqueues are first enabled before the DRIVER_OK
>status flag is set; with the current API of the kernel module, it is
>impossible to enable the opposite order in our block export code because
>userspace is not notified when a virtqueue is enabled.

Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?

It's not specific to virtio-blk, but to the generic vdpa device we have
in QEMU (i.e. vhost-vdpa-device). Yep, after commit
6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
DRIVER_OK.

Spec is not clear about this and that's why we introduce
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.

Ah, I didn't know about this new feature. So after commit
6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
complying with the specification, right?

Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,

I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
negotiated.

At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
negotiated, should we return an error in the vhost-vdpa kernel module if
VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?

If this is true, it seems a little more complicated, for example the
get_backend_features needs to be forwarded to the userspace?

I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
for this? Or do you mean userspace on the VDUSE side?

This seems suboptimal to implement this in the spec first and then we
can leverage the features. Or we can have another parameter for the
ioctl that creates the vduse device.

I got a little lost, though in vhost-user, the device can always expect
a vring_enable/disable, so I thought it was not complicated in VDUSE.

I'll start another thread about that, but in the meantime I agree that
we should fix QEMU since we need to work properly with old kernels as
well.

>This requirement also matches the normal initialisation order as done by
>the generic vhost code in QEMU. However, commit 6c482547 accidentally
>changed the order for vdpa-dev and broke access to VDUSE devices with
>this.
>
>This changes vdpa-dev to use the normal order again and use the standard
>vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
>used with vdpa-dev again after this fix.

I like this approach and the patch LGTM, but I'm a bit worried about
this function in hw/net/vhost_net.c:

int vhost_set_vring_enable(NetClientState *nc, int enable)
{
    VHostNetState *net = get_vhost_net(nc);
    const VhostOps *vhost_ops = net->dev.vhost_ops;

    nc->vring_enable = enable;

    if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
        return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
    }

    return 0;
}

@Eugenio, @Jason, should we change some things there if vhost-vdpa
implements the vhost_set_vring_enable callback?

Eugenio may know more, I remember we need to enable cvq first for
shadow virtqueue to restore some states.

Do you remember why we didn't implement it from the beginning?

It seems the vrings parameter is introduced after vhost-vdpa is
implemented.

Sorry, I mean why we didn't implement the vhost_set_vring_enable
callback for vhost-vdpa from the beginning.

Thanks,
Stefano
Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility
On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
VDUSE requires that virtqueues are first enabled before the DRIVER_OK
status flag is set; with the current API of the kernel module, it is
impossible to enable the opposite order in our block export code because
userspace is not notified when a virtqueue is enabled.

Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
I'll start another thread about that, but in the meantime I agree that
we should fix QEMU since we need to work properly with old kernels as
well.

This requirement also matches the normal initialisation order as done by
the generic vhost code in QEMU. However, commit 6c482547 accidentally
changed the order for vdpa-dev and broke access to VDUSE devices with
this.

This changes vdpa-dev to use the normal order again and use the standard
vhost callback .vhost_set_vring_enable for this. VDUSE devices can be
used with vdpa-dev again after this fix.

I like this approach and the patch LGTM, but I'm a bit worried about
this function in hw/net/vhost_net.c:

int vhost_set_vring_enable(NetClientState *nc, int enable)
{
    VHostNetState *net = get_vhost_net(nc);
    const VhostOps *vhost_ops = net->dev.vhost_ops;

    nc->vring_enable = enable;

    if (vhost_ops && vhost_ops->vhost_set_vring_enable) {
        return vhost_ops->vhost_set_vring_enable(&net->dev, enable);
    }

    return 0;
}

@Eugenio, @Jason, should we change some things there if vhost-vdpa
implements the vhost_set_vring_enable callback?

Do you remember why we didn't implement it from the beginning?

Thanks,
Stefano

Cc: qemu-sta...@nongnu.org
Fixes: 6c4825476a4351530bcac17abab72295b75ffe98
Signed-off-by: Kevin Wolf
---
 hw/virtio/vdpa-dev.c   |  5 +
 hw/virtio/vhost-vdpa.c | 17 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vdpa-dev.c b/hw/virtio/vdpa-dev.c
index eb9ecea83b..13e87f06f6 100644
--- a/hw/virtio/vdpa-dev.c
+++ b/hw/virtio/vdpa-dev.c
@@ -253,14 +253,11 @@ static int vhost_vdpa_device_start(VirtIODevice *vdev, Error **errp)
 
     s->dev.acked_features = vdev->guest_features;
 
-    ret = vhost_dev_start(&s->dev, vdev, false);
+    ret = vhost_dev_start(&s->dev, vdev, true);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Error starting vhost");
         goto err_guest_notifiers;
     }
-    for (i = 0; i < s->dev.nvqs; ++i) {
-        vhost_vdpa_set_vring_ready(&s->vdpa, i);
-    }
     s->started = true;
 
     /*
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 3a43beb312..c4574d56c5 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -904,6 +904,22 @@ int vhost_vdpa_set_vring_ready(struct vhost_vdpa *v, unsigned idx)
     return r;
 }
 
+static int vhost_vdpa_set_vring_enable(struct vhost_dev *dev, int enable)
+{
+    struct vhost_vdpa *v = dev->opaque;
+    unsigned int i;
+    int ret;
+
+    for (i = 0; i < dev->nvqs; ++i) {
+        ret = vhost_vdpa_set_vring_ready(v, i);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
 static int vhost_vdpa_set_config_call(struct vhost_dev *dev, int fd)
 {
@@ -1524,6 +1540,7 @@ const VhostOps vdpa_ops = {
     .vhost_set_features = vhost_vdpa_set_features,
     .vhost_reset_device = vhost_vdpa_reset_device,
     .vhost_get_vq_index = vhost_vdpa_get_vq_index,
+    .vhost_set_vring_enable = vhost_vdpa_set_vring_enable,
     .vhost_get_config = vhost_vdpa_get_config,
     .vhost_set_config = vhost_vdpa_set_config,
     .vhost_requires_shm_log = NULL,
-- 
2.43.0
Re: [PATCH] blkio: Respect memory-alignment for bounce buffer allocations
On Wed, Jan 31, 2024 at 06:31:40PM +0100, Kevin Wolf wrote:
blkio_alloc_mem_region() requires that the requested buffer size is a
multiple of the memory-alignment property. If it isn't, the allocation
fails with a return value of -EINVAL.

Fix the call in blkio_resize_bounce_pool() to make sure the requested
size is properly aligned.

I observed this problem with vhost-vdpa, which requires page aligned
memory. As the virtio-blk device behind it still had 512 byte blocks, we
got bs->bl.request_alignment = 512, but actually any request that needed
a bounce buffer and was not aligned to 4k would fail without this fix.

Suggested-by: Stefano Garzarella
Signed-off-by: Kevin Wolf
---
 block/blkio.c | 3 +++
 1 file changed, 3 insertions(+)

Thanks for fixing this!

Reviewed-by: Stefano Garzarella

diff --git a/block/blkio.c b/block/blkio.c
index 0a0a6c0f5f..b989617608 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -89,6 +89,9 @@ static int blkio_resize_bounce_pool(BDRVBlkioState *s, int64_t bytes)
     /* Pad size to reduce frequency of resize calls */
     bytes += 128 * 1024;
 
+    /* Align the pool size to avoid blkio_alloc_mem_region() failure */
+    bytes = QEMU_ALIGN_UP(bytes, s->mem_region_alignment);
+
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         int ret;
-- 
2.43.0
Re: [PATCH v3 2/2] vhost-scsi: Add support for a worker thread per virtqueue
On Mon, Dec 04, 2023 at 05:16:18PM -0600, Mike Christie wrote:
This adds support for vhost-scsi to be able to create a worker thread
per virtqueue. Right now for vhost-net we get a worker thread per tx/rx
virtqueue pair which scales nicely as we add more virtqueues and CPUs,
but for scsi we get the single worker thread that's shared by all
virtqueues. When trying to send IO to more than 2 virtqueues the single
thread becomes a bottleneck.

This patch adds a new setting, worker_per_virtqueue, which can be set to:

false: Existing behavior where we get the single worker thread.
true: Create a worker per IO virtqueue.

Signed-off-by: Mike Christie
Reviewed-by: Stefan Hajnoczi
---
 hw/scsi/vhost-scsi.c            | 62 +
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 63 insertions(+)

Thanks for adding the warning! LGTM!

Reviewed-by: Stefano Garzarella

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3126df9e1d9d..08aa7534df51 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -165,6 +165,59 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
     .pre_save = vhost_scsi_pre_save,
 };
 
+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, bool per_virtqueue)
+{
+    struct vhost_dev *dev = &vsc->dev;
+    struct vhost_vring_worker vq_worker;
+    struct vhost_worker_state worker;
+    int i, ret;
+
+    /* Use default worker */
+    if (!per_virtqueue || dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
+        return 0;
+    }
+
+    /*
+     * ctl/evt share the first worker since it will be rare for them
+     * to send cmds while IO is running.
+     */
+    for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+        memset(&worker, 0, sizeof(worker));
+
+        ret = dev->vhost_ops->vhost_new_worker(dev, &worker);
+        if (ret == -ENOTTY) {
+            /*
+             * worker ioctls are not implemented so just ignore and
+             * continue device setup.
+             */
+            warn_report("vhost-scsi: Backend supports a single worker. "
+                        "Ignoring worker_per_virtqueue=true setting.");
+            ret = 0;
+            break;
+        } else if (ret) {
+            break;
+        }
+
+        memset(&vq_worker, 0, sizeof(vq_worker));
+        vq_worker.worker_id = worker.worker_id;
+        vq_worker.index = i;
+
+        ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
+        if (ret == -ENOTTY) {
+            /*
+             * It's a bug for the kernel to have supported the worker creation
+             * ioctl but not attach.
+             */
+            dev->vhost_ops->vhost_free_worker(dev, &worker);
+            break;
+        } else if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -232,6 +285,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
         goto free_vqs;
     }
 
+    ret = vhost_scsi_set_workers(vsc, vs->conf.worker_per_virtqueue);
+    if (ret < 0) {
+        error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+                   strerror(-ret));
+        goto free_vqs;
+    }
+
     /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -297,6 +357,8 @@ static Property vhost_scsi_properties[] = {
                      VIRTIO_SCSI_F_T10_PI, false),
     DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+    DEFINE_PROP_BOOL("worker_per_virtqueue", VirtIOSCSICommon,
+                     conf.worker_per_virtqueue, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d28..0e9a1867665e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -51,6 +51,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
     uint32_t num_queues;
     uint32_t virtqueue_size;
+    bool worker_per_virtqueue;
     bool seg_max_adjust;
     uint32_t max_sectors;
     uint32_t cmd_per_lun;
-- 
2.34.1
Re: [PATCH v2 2/2] vhost-scsi: Add support for a worker thread per virtqueue
On Sun, Nov 26, 2023 at 06:28:34PM -0600, Mike Christie wrote:
This adds support for vhost-scsi to be able to create a worker thread
per virtqueue. Right now for vhost-net we get a worker thread per tx/rx
virtqueue pair which scales nicely as we add more virtqueues and CPUs,
but for scsi we get the single worker thread that's shared by all
virtqueues. When trying to send IO to more than 2 virtqueues the single
thread becomes a bottleneck.

This patch adds a new setting, workers_per_virtqueue, which can be set to:

false: Existing behavior where we get the single worker thread.
true: Create a worker per IO virtqueue.

Signed-off-by: Mike Christie
---
 hw/scsi/vhost-scsi.c            | 60 +
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 61 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3126df9e1d9d..77eef9474c23 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -165,6 +165,57 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
     .pre_save = vhost_scsi_pre_save,
 };
 
+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, bool per_virtqueue)
+{
+    struct vhost_dev *dev = &vsc->dev;
+    struct vhost_vring_worker vq_worker;
+    struct vhost_worker_state worker;
+    int i, ret;
+
+    /* Use default worker */
+    if (!per_virtqueue || dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
+        return 0;
+    }
+
+    /*
+     * ctl/evt share the first worker since it will be rare for them
+     * to send cmds while IO is running.
+     */
+    for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+        memset(&worker, 0, sizeof(worker));
+
+        ret = dev->vhost_ops->vhost_new_worker(dev, &worker);
+        if (ret == -ENOTTY) {
+            /*
+             * worker ioctls are not implemented so just ignore and
+             * continue device setup.
+             */

IIUC here the user has asked to use a worker for each virtqueue, but
the kernel does not support it so we ignore it.
Should we at least print a warning?

The rest LGTM!

Stefano

+            ret = 0;
+            break;
+        } else if (ret) {
+            break;
+        }
+
+        memset(&vq_worker, 0, sizeof(vq_worker));
+        vq_worker.worker_id = worker.worker_id;
+        vq_worker.index = i;
+
+        ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
+        if (ret == -ENOTTY) {
+            /*
+             * It's a bug for the kernel to have supported the worker creation
+             * ioctl but not attach.
+             */
+            dev->vhost_ops->vhost_free_worker(dev, &worker);
+            break;
+        } else if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -232,6 +283,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
         goto free_vqs;
     }
 
+    ret = vhost_scsi_set_workers(vsc, vs->conf.worker_per_virtqueue);
+    if (ret < 0) {
+        error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+                   strerror(-ret));
+        goto free_vqs;
+    }
+
     /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -297,6 +355,8 @@ static Property vhost_scsi_properties[] = {
                      VIRTIO_SCSI_F_T10_PI, false),
     DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+    DEFINE_PROP_BOOL("worker_per_virtqueue", VirtIOSCSICommon,
+                     conf.worker_per_virtqueue, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d28..0e9a1867665e 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -51,6 +51,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
     uint32_t num_queues;
     uint32_t virtqueue_size;
+    bool worker_per_virtqueue;
     bool seg_max_adjust;
     uint32_t max_sectors;
     uint32_t cmd_per_lun;
-- 
2.34.1
Re: [PATCH v2 1/2] vhost: Add worker backend callouts
On Sun, Nov 26, 2023 at 06:28:33PM -0600, Mike Christie wrote: This adds the vhost backend callouts for the worker ioctls added in the 6.4 linux kernel commit: c1ecd8e95007 ("vhost: allow userspace to create workers") Signed-off-by: Mike Christie --- hw/virtio/vhost-backend.c | 28 include/hw/virtio/vhost-backend.h | 14 ++ 2 files changed, 42 insertions(+) Reviewed-by: Stefano Garzarella diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c index 17f3fc6a0823..833804dd40f2 100644 --- a/hw/virtio/vhost-backend.c +++ b/hw/virtio/vhost-backend.c @@ -158,6 +158,30 @@ static int vhost_kernel_set_vring_busyloop_timeout(struct vhost_dev *dev, return vhost_kernel_call(dev, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, s); } +static int vhost_kernel_new_worker(struct vhost_dev *dev, + struct vhost_worker_state *worker) +{ +return vhost_kernel_call(dev, VHOST_NEW_WORKER, worker); +} + +static int vhost_kernel_free_worker(struct vhost_dev *dev, +struct vhost_worker_state *worker) +{ +return vhost_kernel_call(dev, VHOST_FREE_WORKER, worker); +} + +static int vhost_kernel_attach_vring_worker(struct vhost_dev *dev, +struct vhost_vring_worker *worker) +{ +return vhost_kernel_call(dev, VHOST_ATTACH_VRING_WORKER, worker); +} + +static int vhost_kernel_get_vring_worker(struct vhost_dev *dev, + struct vhost_vring_worker *worker) +{ +return vhost_kernel_call(dev, VHOST_GET_VRING_WORKER, worker); +} + static int vhost_kernel_set_features(struct vhost_dev *dev, uint64_t features) { @@ -313,6 +337,10 @@ const VhostOps kernel_ops = { .vhost_set_vring_err = vhost_kernel_set_vring_err, .vhost_set_vring_busyloop_timeout = vhost_kernel_set_vring_busyloop_timeout, +.vhost_get_vring_worker = vhost_kernel_get_vring_worker, +.vhost_attach_vring_worker = vhost_kernel_attach_vring_worker, +.vhost_new_worker = vhost_kernel_new_worker, +.vhost_free_worker = vhost_kernel_free_worker, .vhost_set_features = vhost_kernel_set_features, .vhost_get_features = vhost_kernel_get_features, 
.vhost_set_backend_cap = vhost_kernel_set_backend_cap, diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h index 96ccc18cd33b..9f16d0884e8f 100644 --- a/include/hw/virtio/vhost-backend.h +++ b/include/hw/virtio/vhost-backend.h @@ -33,6 +33,8 @@ struct vhost_memory; struct vhost_vring_file; struct vhost_vring_state; struct vhost_vring_addr; +struct vhost_vring_worker; +struct vhost_worker_state; struct vhost_scsi_target; struct vhost_iotlb_msg; struct vhost_virtqueue; @@ -73,6 +75,14 @@ typedef int (*vhost_set_vring_err_op)(struct vhost_dev *dev, struct vhost_vring_file *file); typedef int (*vhost_set_vring_busyloop_timeout_op)(struct vhost_dev *dev, struct vhost_vring_state *r); +typedef int (*vhost_attach_vring_worker_op)(struct vhost_dev *dev, +struct vhost_vring_worker *worker); +typedef int (*vhost_get_vring_worker_op)(struct vhost_dev *dev, + struct vhost_vring_worker *worker); +typedef int (*vhost_new_worker_op)(struct vhost_dev *dev, + struct vhost_worker_state *worker); +typedef int (*vhost_free_worker_op)(struct vhost_dev *dev, +struct vhost_worker_state *worker); typedef int (*vhost_set_features_op)(struct vhost_dev *dev, uint64_t features); typedef int (*vhost_get_features_op)(struct vhost_dev *dev, @@ -151,6 +161,10 @@ typedef struct VhostOps { vhost_set_vring_call_op vhost_set_vring_call; vhost_set_vring_err_op vhost_set_vring_err; vhost_set_vring_busyloop_timeout_op vhost_set_vring_busyloop_timeout; +vhost_new_worker_op vhost_new_worker; +vhost_free_worker_op vhost_free_worker; +vhost_get_vring_worker_op vhost_get_vring_worker; +vhost_attach_vring_worker_op vhost_attach_vring_worker; vhost_set_features_op vhost_set_features; vhost_get_features_op vhost_get_features; vhost_set_backend_cap_op vhost_set_backend_cap; -- 2.34.1
Re: [PATCH 2/2] vhost-scsi: Add support for a worker thread per virtqueue
On Mon, Nov 13, 2023 at 06:36:44PM -0600, Mike Christie wrote:

This adds support for vhost-scsi to be able to create a worker thread per virtqueue. Right now for vhost-net we get a worker thread per tx/rx virtqueue pair which scales nicely as we add more virtqueues and CPUs, but for scsi we get the single worker thread that's shared by all virtqueues. When trying to send IO to more than 2 virtqueues the single thread becomes a bottleneck.

This patch adds a new setting, virtqueue_workers, which can be set to:

1: Existing behavior where we get the single thread.
-1: Create a worker per IO virtqueue.

I find this setting a bit odd. What about a boolean instead?

`per_virtqueue_workers`:

false: Existing behavior where we get the single thread.
true: Create a worker per IO virtqueue.

Signed-off-by: Mike Christie
---
 hw/scsi/vhost-scsi.c            | 68 +
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 69 insertions(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 3126df9e1d9d..5cf669b6563b 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -31,6 +31,9 @@
 #include "qemu/cutils.h"
 #include "sysemu/sysemu.h"
 
+#define VHOST_SCSI_WORKER_PER_VQ    -1
+#define VHOST_SCSI_WORKER_DEF       1
+
 /* Features supported by host kernel. */
 static const int kernel_feature_bits[] = {
     VIRTIO_F_NOTIFY_ON_EMPTY,
@@ -165,6 +168,62 @@ static const VMStateDescription vmstate_virtio_vhost_scsi = {
     .pre_save = vhost_scsi_pre_save,
 };
 
+static int vhost_scsi_set_workers(VHostSCSICommon *vsc, int workers_cnt)
+{
+    struct vhost_dev *dev = &vsc->dev;
+    struct vhost_vring_worker vq_worker;
+    struct vhost_worker_state worker;
+    int i, ret;
+
+    /* Use default worker */
+    if (workers_cnt == VHOST_SCSI_WORKER_DEF ||
+        dev->nvqs == VHOST_SCSI_VQ_NUM_FIXED + 1) {
+        return 0;
+    }
+
+    if (workers_cnt != VHOST_SCSI_WORKER_PER_VQ) {
+        return -EINVAL;
+    }
+
+    /*
+     * ctl/evt share the first worker since it will be rare for them
+     * to send cmds while IO is running.
+     */
+    for (i = VHOST_SCSI_VQ_NUM_FIXED + 1; i < dev->nvqs; i++) {
+        memset(&worker, 0, sizeof(worker));
+
+        ret = dev->vhost_ops->vhost_new_worker(dev, &worker);

Should we call vhost_free_worker() in the vhost_scsi_unrealize() or are workers automatically freed when `vhostfd` is closed?

The rest LGTM.

Thanks,
Stefano

+        if (ret == -ENOTTY) {
+            /*
+             * worker ioctls are not implemented so just ignore and
+             * continue device setup.
+             */
+            ret = 0;
+            break;
+        } else if (ret) {
+            break;
+        }
+
+        memset(&vq_worker, 0, sizeof(vq_worker));
+        vq_worker.worker_id = worker.worker_id;
+        vq_worker.index = i;
+
+        ret = dev->vhost_ops->vhost_attach_vring_worker(dev, &vq_worker);
+        if (ret == -ENOTTY) {
+            /*
+             * It's a bug for the kernel to have supported the worker creation
+             * ioctl but not attach.
+             */
+            dev->vhost_ops->vhost_free_worker(dev, &worker);
+            break;
+        } else if (ret) {
+            break;
+        }
+    }
+
+    return ret;
+}
+
 static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 {
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
@@ -232,6 +291,13 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
         goto free_vqs;
     }
 
+    ret = vhost_scsi_set_workers(vsc, vs->conf.virtqueue_workers);
+    if (ret < 0) {
+        error_setg(errp, "vhost-scsi: vhost worker setup failed: %s",
+                   strerror(-ret));
+        goto free_vqs;
+    }
+
     /* At present, channel and lun both are 0 for bootable vhost-scsi disk */
     vsc->channel = 0;
     vsc->lun = 0;
@@ -297,6 +363,8 @@ static Property vhost_scsi_properties[] = {
                      VIRTIO_SCSI_F_T10_PI, false),
     DEFINE_PROP_BOOL("migratable", VHostSCSICommon, migratable, false),
+    DEFINE_PROP_INT32("virtqueue_workers", VirtIOSCSICommon,
+                      conf.virtqueue_workers, VHOST_SCSI_WORKER_DEF),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 779568ab5d28..f70624ece564 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -51,6 +51,7 @@ typedef struct virtio_scsi_config VirtIOSCSIConfig;
 struct VirtIOSCSIConf {
     uint32_t num_queues;
     uint32_t virtqueue_size;
+    int virtqueue_workers;
     bool seg_max_adjust;
     uint32_t max_sectors;
     uint32_t cmd_per_lun;
-- 
2.34.1
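For reference, a hypothetical command line exercising the property as posted in this version. The property name and the -1 sentinel may still change, given the suggestion above to switch to a boolean, and the wwpn value is made up:

```shell
qemu-system-x86_64 \
    -device vhost-scsi-pci,wwpn=naa.5001405324af0985,num_queues=4,virtqueue_workers=-1
```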
Re: [PATCH] vhost-user: Fix protocol feature bit conflict
On Mon, Oct 16, 2023 at 10:32:01AM +0200, Hanna Czenczek wrote:

The VHOST_USER_PROTOCOL_F_XEN_MMAP feature bit was defined in f21e95ee97d, which has been part of qemu's 8.1.0 release. However, it seems it was never added to qemu's code, but it is well possible that it is already used by different front-ends outside of qemu (i.e., Xen).

Yep, and also some backends (e.g. we released rust-vmm/vhost v0.8.0 with F_XEN_MMAP = 17 defined).

VHOST_USER_PROTOCOL_F_SHARED_OBJECT in contrast was added to qemu's code in 16094766627, but never defined in the vhost-user specification. As a consequence, both bits were defined to be 17, which cannot work. Regardless of whether actual code or the specification should take precedence, F_XEN_MMAP is already part of a qemu release, while F_SHARED_OBJECT is not. Therefore, bump the latter to take number 18 instead of 17, and add this to the specification.

Take the opportunity to add at least a little note on the VhostUserShared structure to the specification. This structure is referenced by the new commands introduced in 16094766627, but was not defined.

Fixes: 160947666276c5b7f6bca4d746bcac2966635d79 ("vhost-user: add shared_object msg")
Signed-off-by: Hanna Czenczek
---
 docs/interop/vhost-user.rst               | 11 +++
 include/hw/virtio/vhost-user.h            |  3 ++-
 subprojects/libvhost-user/libvhost-user.h |  3 ++-
 3 files changed, 15 insertions(+), 2 deletions(-)

Thanks for fixing this!

Reviewed-by: Stefano Garzarella

diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 415bb47a19..768fb5c28c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -275,6 +275,16 @@ Inflight description
 
 :queue size: a 16-bit size of virtqueues
 
+VhostUserShared
+^^^^^^^^^^^^^^^
+
++------+
+| UUID |
++------+
+
+:UUID: 16 bytes UUID, whose first three components (a 32-bit value, then
+  two 16-bit values) are stored in big endian.
+
 C structure
 -----------
 
@@ -885,6 +895,7 @@ Protocol features
   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
   #define VHOST_USER_PROTOCOL_F_STATUS              16
   #define VHOST_USER_PROTOCOL_F_XEN_MMAP            17
+  #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT       18
 
 Front-end message types
 -----------------------

diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index 9f9ddf878d..1d4121431b 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -29,7 +29,8 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
     VHOST_USER_PROTOCOL_F_STATUS = 16,
-    VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 17,
+    /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+    VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
     VHOST_USER_PROTOCOL_F_MAX
 };

diff --git a/subprojects/libvhost-user/libvhost-user.h b/subprojects/libvhost-user/libvhost-user.h
index b36a42a7ca..c2352904f0 100644
--- a/subprojects/libvhost-user/libvhost-user.h
+++ b/subprojects/libvhost-user/libvhost-user.h
@@ -65,7 +65,8 @@ enum VhostUserProtocolFeature {
     VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS = 14,
     VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS = 15,
     /* Feature 16 is reserved for VHOST_USER_PROTOCOL_F_STATUS. */
-    VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 17,
+    /* Feature 17 reserved for VHOST_USER_PROTOCOL_F_XEN_MMAP. */
+    VHOST_USER_PROTOCOL_F_SHARED_OBJECT = 18,
     VHOST_USER_PROTOCOL_F_MAX
 };
-- 
2.41.0
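Why two features sharing number 17 "cannot work" is easy to see once you remember that protocol features are negotiated as bit masks, so two features assigned the same number are indistinguishable on the wire. A small self-contained illustration (constants taken from the patch above):

```c
#include <assert.h>
#include <stdint.h>

/* Protocol feature *numbers* as they clashed before this fix,
 * and as they stand after it. */
enum {
    F_XEN_MMAP          = 17,  /* in the spec, shipped in QEMU 8.1.0 */
    F_SHARED_OBJECT_OLD = 17,  /* what the code wrongly used         */
    F_SHARED_OBJECT_NEW = 18,  /* after this patch                   */
};

/* Feature negotiation exchanges masks of the form (1 << number). */
static uint64_t feature_mask(unsigned bit)
{
    return UINT64_C(1) << bit;
}
```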
Re: [PATCH v9 00/12] Add VIRTIO sound card
On Thu, Sep 14, 2023 at 01:02:05PM +0300, Manos Pitsidianakis wrote: On Thu, 14 Sep 2023 12:54, Stefano Garzarella wrote: We are seeing something strange with the virtio-sound Linux driver. It seems that the driver modifies the buffers after exposing them to the device via the avail ring. I need more information about this bug. What is the unexpected behavior that made you find that the buffer was modified in the first place? CCing Matias for more details, but initially can you just run the test Matias suggested to check if you experience the same behaviour or not? Stefano
Re: [PATCH v9 00/12] Add VIRTIO sound card
On Wed, Sep 13, 2023 at 10:33:07AM +0300, Emmanouil Pitsidianakis wrote: This patch series adds an audio device implementing the recent virtio sound spec (1.2) and a corresponding PCI wrapper device. v9 can be found online at: https://gitlab.com/epilys/qemu/-/tree/virtio-snd-v9 Ref 06e6b17186 Main differences with v8 patch series [^v8] : - Addressed [^v8] review comments. - Add cpu_to_le32(_) and le32_to_cpu(_) conversions for messages from/to the guest according to the virtio spec. - Inlined some functions and types to reduce review complexity. - Corrected the replies to IO messages; now both Playback and Capture work correctly for me. (If you hear cracks in pulseaudio+guest, try pipewire+guest). We are seeing something strange with the virtio-sound Linux driver. It seems that the driver modifies the buffers after exposing them to the device via the avail ring. It seems we have this strange behaviour with this built-in QEMU device, but also with the vhost-device-sound, so it could be some spec violation in the Linux driver. Matias also reported on the v8 of this series: https://lore.kernel.org/qemu-devel/ZPg60lzXWxHPQJEa@fedora/ Can you check if you have the same behaviour? Nothing that blocks this series of course, but just to confirm that there may be something to fix in the Linux driver. Thanks, Stefano
Re: [PATCH] subprojects: add wrap file for libblkio
On Wed, Sep 06, 2023 at 12:45:29PM +0200, Paolo Bonzini wrote:

This allows building libblkio at the same time as QEMU, if QEMU is configured with --enable-blkio --enable-download.

Signed-off-by: Paolo Bonzini
---
 subprojects/libblkio.wrap | 6 ++
 1 file changed, 6 insertions(+)
 create mode 100644 subprojects/libblkio.wrap

diff --git a/subprojects/libblkio.wrap b/subprojects/libblkio.wrap
new file mode 100644
index 000..f77af72210c
--- /dev/null
+++ b/subprojects/libblkio.wrap
@@ -0,0 +1,6 @@
+[wrap-git]
+url = https://gitlab.com/libblkio/libblkio
+revision = f84cc963a444e4cb34813b2dcfc5bf8526947dc0
+
+[provide]
+blkio = libblkio_dep
-- 
2.41.0

I'm trying it in a clean container and I have this issue:

HEAD is now at f84cc96 allow building libblkio as a subproject
Executing subproject libblkio

../subprojects/libblkio/meson.build:1:0: ERROR: Command `/root/qemu/subprojects/libblkio/./package-version.py` failed with status 1.

A full log can be found at /root/qemu/build/meson-logs/meson-log.txt
ERROR: meson setup failed

Looking in the meson-log.txt, I have this:

Package 'blkio', required by 'virtual:world', not found
---
Run-time dependency blkio found: NO (tried pkgconfig)
Looking for a fallback subproject for the dependency blkio
Executing subproject libblkio
Running command: /root/qemu/subprojects/libblkio/./package-version.py
--- stdout ---
--- stderr ---
Traceback (most recent call last):
  File "/root/qemu/subprojects/libblkio/./package-version.py", line 5, in <module>
    out = subprocess.check_output(["cargo", "read-manifest"])
  File "/usr/lib64/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib64/python3.11/subprocess.py", line 548, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/lib64/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib64/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cargo'

Installing `cargo` fixed the issue on my side. I don't know if it is possible, but can we put some dependency check in the wrap file?

Anyway, after installing cargo it works pretty well :-)

Tested-by: Stefano Garzarella
Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
On Wed, Aug 30, 2023 at 11:26:41AM +0200, Laszlo Ersek wrote:
On 8/30/23 10:39, Stefano Garzarella wrote:
On Sun, Aug 27, 2023 at 08:29:37PM +0200, Laszlo Ersek wrote:

(1) The virtio-1.0 specification <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html> writes:

What about referring to the latest spec available now (1.2)?

I didn't want to do that because the OVMF guest driver was written against 1.0 (and the spec and the device are backwards compatible). But, I don't feel strongly about this; I'm OK updating the reference / quote to 1.2.

3 General Initialization And Device Operation
3.1 Device Initialization
3.1.1 Driver Requirements: Device Initialization

[...]

7. Perform device-specific setup, including discovery of virtqueues for the device, optional per-bus setup, reading and possibly writing the device’s virtio configuration space, and population of virtqueues.

8. Set the DRIVER_OK status bit. At this point the device is “live”.

and

4 Virtio Transport Options
4.1 Virtio Over PCI Bus
4.1.4 Virtio Structure PCI Capabilities
4.1.4.3 Common configuration structure layout
4.1.4.3.2 Driver Requirements: Common configuration structure layout

[...] The driver MUST configure the other virtqueue fields before enabling the virtqueue with queue_enable. [...]

These together mean that the following sub-sequence of steps is valid for a virtio-1.0 guest driver:

(1.1) set "queue_enable" for the needed queues as the final part of device initialization step (7),
(1.2) set DRIVER_OK in step (8),
(1.3) immediately start sending virtio requests to the device.

(2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES special virtio feature is negotiated, then virtio rings start in disabled state, according to <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>. In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for enabling vrings.
Therefore setting "queue_enable" from the guest (1.1) is a *control plane* operation, which travels from the guest through QEMU to the vhost-user backend, using a unix domain socket. Whereas sending a virtio request (1.3) is a *data plane* operation, which evades QEMU -- it travels from guest to the vhost-user backend via eventfd. This means that steps (1.1) and (1.3) travel through different channels, and their relative order can be reversed, as perceived by the vhost-user backend. That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs against the Rust-language virtiofsd version 1.7.2. (Which uses version 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost crate.) Namely, when VirtioFsDxe binds a virtiofs device, it goes through the device initialization steps (i.e., control plane operations), and immediately sends a FUSE_INIT request too (i.e., performs a data plane operation). In the Rust-language virtiofsd, this creates a race between two components that run *concurrently*, i.e., in different threads or processes: - Control plane, handling vhost-user protocol messages: The "VhostUserSlaveReqHandlerMut::set_vring_enable" method [crates/vhost-user-backend/src/handler.rs] handles VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled" flag according to the message processed. - Data plane, handling virtio / FUSE requests: The "VringEpollHandler::handle_event" method [crates/vhost-user-backend/src/event_loop.rs] handles the incoming virtio / FUSE request, consuming the virtio kick at the same time. If the vring's "enabled" flag is set, the virtio / FUSE request is processed genuinely. If the vring's "enabled" flag is clear, then the virtio / FUSE request is discarded. Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*. 
However, if the data plane processor in virtiofsd wins the race, then it sees the FUSE_INIT *before* the control plane processor took notice of VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane processor. Therefore the latter drops FUSE_INIT on the floor, and goes back to waiting for further virtio / FUSE requests with epoll_wait. Meanwhile OVMF is stuck waiting for the FUSE_INIT response -- a deadlock.

The deadlock is not deterministic. OVMF hangs infrequently during first boot. However, OVMF hangs almost certainly during reboots from the UEFI shell.

The race can be "reliably masked" by inserting a very small delay -- a single debug message -- at the top of "VringEpollHandler::handle_event", i.e., just before the data plane processor checks the "enabled" field of the vring. That delay suffices for the control plane processor to act upon VHOST_USER_SET_VRING_ENABLE.

We can deterministically prevent the race in QEMU, by blocking OVMF inside step (1.1) -- i.e., in the write to the "queue_enable" register -- until
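The race described above can be replayed deterministically in a toy model: the outcome depends purely on which handler runs first, which is exactly what the unsynchronized control channel (unix socket) and data channel (eventfd) leave up to chance. A hypothetical sketch, not virtiofsd code:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of a vring's state in the backend. */
struct vring {
    bool enabled;
};

/* Control plane: the VHOST_USER_SET_VRING_ENABLE handler. */
static void control_plane_enable(struct vring *vq)
{
    vq->enabled = true;
}

/* Data plane: consume a kick; returns true if the request is processed,
 * false if it is dropped because the ring still looks disabled. */
static bool data_plane_handle_kick(const struct vring *vq)
{
    return vq->enabled;
}

/* Replay the race with an explicit ordering instead of real threads. */
static bool replay(bool control_first)
{
    struct vring vq = { .enabled = false };

    if (control_first) {
        control_plane_enable(&vq);
    }
    bool processed = data_plane_handle_kick(&vq);  /* FUSE_INIT arrives */
    if (!control_first) {
        control_plane_enable(&vq);                 /* too late */
    }
    return processed;
}
```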
Re: [PATCH 0/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
On Sun, Aug 27, 2023 at 08:29:30PM +0200, Laszlo Ersek wrote:

The last patch in the series states and fixes the problem; prior patches only refactor the code.

Thanks for the fix and great cleanup! I fully reviewed the series and LGTM.

An additional step that we can take (not in this series) crossed my mind, though. In some places we repeat the following pattern:

    vhost_user_write(dev, &msg, NULL, 0);
    ...
    if (reply_supported) {
        return process_message_reply(dev, &msg);
    }

So what about extending the vhost_user_write_msg() added in this series to support also these cases and remove some code. Or maybe integrate vhost_user_write_msg() in vhost_user_write().

I mean something like this (untested):

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 01e0ca90c5..9ee2a78afa 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1130,13 +1130,19 @@ static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
     return 0;
 }
 
+typedef enum {
+    NO_REPLY,
+    REPLY_IF_SUPPORTED,
+    REPLY_FORCED,
+} VhostUserReply;
+
 /* Note: "msg->hdr.flags" may be modified. */
 static int vhost_user_write_msg(struct vhost_dev *dev, VhostUserMsg *msg,
-                                bool wait_for_reply)
+                                VhostUserReply reply)
 {
     int ret;
 
-    if (wait_for_reply) {
+    if (reply != NO_REPLY) {
         bool reply_supported = virtio_has_feature(dev->protocol_features,
                                                   VHOST_USER_PROTOCOL_F_REPLY_ACK);
         if (reply_supported) {
@@ -1149,7 +1155,7 @@ static int vhost_user_write_msg(struct vhost_dev *dev, VhostUserMsg *msg,
         return ret;
     }
 
-    if (wait_for_reply) {
+    if (reply != NO_REPLY) {
         uint64_t dummy;
 
         if (msg->hdr.flags & VHOST_USER_NEED_REPLY_MASK) {
@@ -1162,7 +1168,9 @@ static int vhost_user_write_msg(struct vhost_dev *dev, VhostUserMsg *msg,
          * Send VHOST_USER_GET_FEATURES which makes all backends
          * send a reply.
          */
-        return vhost_user_get_features(dev, &dummy);
+        if (reply == REPLY_FORCED) {
+            return vhost_user_get_features(dev, &dummy);
+        }
     }
 
     return 0;
@@ -2207,9 +2228,6 @@ static bool vhost_user_can_merge(struct vhost_dev *dev,
 static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
 {
     VhostUserMsg msg;
-    bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
-    int ret;
 
     if (!(dev->protocol_features & (1ULL << VHOST_USER_PROTOCOL_F_NET_MTU))) {
         return 0;
@@ -2219,21 +2237,9 @@ static int vhost_user_net_set_mtu(struct vhost_dev *dev, uint16_t mtu)
     msg.payload.u64 = mtu;
     msg.hdr.size = sizeof(msg.payload.u64);
     msg.hdr.flags = VHOST_USER_VERSION;
-    if (reply_supported) {
-        msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-    }
-
-    ret = vhost_user_write(dev, &msg, NULL, 0);
-    if (ret < 0) {
-        return ret;
-    }
 
     /* If reply_ack supported, backend has to ack specified MTU is valid */
-    if (reply_supported) {
-        return process_message_reply(dev, &msg);
-    }
-
-    return 0;
+    return vhost_user_write_msg(dev, &msg, REPLY_IF_SUPPORTED);
 }
 
 static int vhost_user_send_device_iotlb_msg(struct vhost_dev *dev,
@@ -2313,10 +2319,7 @@ static int vhost_user_get_config(struct vhost_dev *dev, uint8_t *config,
 static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
                                  uint32_t offset, uint32_t size, uint32_t flags)
 {
-    int ret;
     uint8_t *p;
-    bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
 
     VhostUserMsg msg = {
         .hdr.request = VHOST_USER_SET_CONFIG,
@@ -2329,10 +2332,6 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
         return -ENOTSUP;
     }
 
-    if (reply_supported) {
-        msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
-    }
-
     if (size > VHOST_USER_MAX_CONFIG_SIZE) {
         return -EINVAL;
     }
@@ -2343,16 +2342,7 @@ static int vhost_user_set_config(struct vhost_dev *dev, const uint8_t *data,
     p = msg.payload.config.region;
     memcpy(p, data, size);
 
-    ret = vhost_user_write(dev, &msg, NULL, 0);
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (reply_supported) {
-        return process_message_reply(dev, &msg);
-    }
-
-    return 0;
+    return vhost_user_write_msg(dev, &msg, REPLY_IF_SUPPORTED);
 }

Thanks,
Stefano
Re: [PATCH 7/7] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature has been negotiated, or
- performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.

Thanks for the excellent analysis (and fix of course!).

Cc: "Michael S. Tsirkin" (supporter:vhost)
Cc: Eugenio Perez Martin
Cc: German Maglione
Cc: Liu Jiang
Cc: Sergio Lopez Pascual
Cc: Stefano Garzarella
Signed-off-by: Laszlo Ersek
---
 hw/virtio/vhost-user.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index beb4b832245e..01e0ca90c538 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1235,7 +1235,7 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
         .num = enable,
     };

How about adding a small comment here summarizing the commit message in a few lines?

Should we cc stable for this fix?

In any case, the fix LGTM, so:

Reviewed-by: Stefano Garzarella

-    ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
+    ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, true);
 
     if (ret < 0) {
         /*
          * Restoring the previous state is likely infeasible, as well as
Re: [PATCH 6/7] vhost-user: allow "vhost_set_vring" to wait for a reply
On Sun, Aug 27, 2023 at 08:29:36PM +0200, Laszlo Ersek wrote: The "vhost_set_vring" function already centralizes the common parts of "vhost_user_set_vring_num", "vhost_user_set_vring_base" and "vhost_user_set_vring_enable". We'll want to allow some of those callers to wait for a reply. Therefore, rebase "vhost_set_vring" from just "vhost_user_write" to "vhost_user_write_msg", exposing the "wait_for_reply" parameter. This is purely refactoring -- there is no observable change. That's because: - all three callers pass in "false" for "wait_for_reply", which disables all logic in "vhost_user_write_msg" except the call to "vhost_user_write"; - the fds=NULL and fd_num=0 arguments of the original "vhost_user_write" call inside "vhost_set_vring" are hard-coded within "vhost_user_write_msg". Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Stefano Garzarella
Re: [PATCH 4/7] vhost-user: flatten "enforce_reply" into "vhost_user_write_msg"
On Sun, Aug 27, 2023 at 08:29:34PM +0200, Laszlo Ersek wrote: At this point, only "vhost_user_write_msg" calls "enforce_reply"; embed the latter into the former. This is purely refactoring -- no observable change. Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 32 1 file changed, 13 insertions(+), 19 deletions(-) Reviewed-by: Stefano Garzarella
Re: [PATCH 5/7] vhost-user: hoist "write_msg", "get_features", "get_u64"
On Sun, Aug 27, 2023 at 08:29:35PM +0200, Laszlo Ersek wrote: In order to avoid a forward-declaration for "vhost_user_write_msg" in a subsequent patch, hoist "vhost_user_write_msg" -> "vhost_user_get_features" -> "vhost_user_get_u64" just above "vhost_set_vring". This is purely code movement -- no observable change. Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 170 ++-- 1 file changed, 85 insertions(+), 85 deletions(-) Reviewed-by: Stefano Garzarella
Re: [PATCH 3/7] vhost-user: factor out "vhost_user_write_msg"
On Sun, Aug 27, 2023 at 08:29:33PM +0200, Laszlo Ersek wrote: The tails of the "vhost_user_set_vring_addr" and "vhost_user_set_u64" functions are now byte-for-byte identical. Factor the common tail out to a new function called "vhost_user_write_msg". This is purely refactoring -- no observable change. Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 66 +--- 1 file changed, 28 insertions(+), 38 deletions(-) diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index 64eac317bfb2..36f99b66a644 100644 --- a/hw/virtio/vhost-user.c +++ b/hw/virtio/vhost-user.c @@ -1320,10 +1320,35 @@ static int enforce_reply(struct vhost_dev *dev, return vhost_user_get_features(dev, ); } +/* Note: "msg->hdr.flags" may be modified. */ +static int vhost_user_write_msg(struct vhost_dev *dev, VhostUserMsg *msg, +bool wait_for_reply) The difference between vhost_user_write() and vhost_user_write_msg() is not immediately obvious from the function name, so I would propose something different, like vhost_user_write_sync() or vhost_user_write_wait(). Anyway, I'm not good with names and don't have a strong opinion, so this version is fine with me as well :-) Reviewed-by: Stefano Garzarella
Re: [PATCH 2/7] vhost-user: tighten "reply_supported" scope in "set_vring_addr"
On Sun, Aug 27, 2023 at 08:29:32PM +0200, Laszlo Ersek wrote: In the vhost_user_set_vring_addr() function, we calculate "reply_supported" unconditionally, even though we'll only need it if "wait_for_reply" is also true. Restrict the scope of "reply_supported" to the minimum. This is purely refactoring -- no observable change. Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 11 ++- 1 file changed, 6 insertions(+), 5 deletions(-) Reviewed-by: Stefano Garzarella
Re: [PATCH 1/7] vhost-user: strip superfluous whitespace
On Sun, Aug 27, 2023 at 08:29:31PM +0200, Laszlo Ersek wrote: Cc: "Michael S. Tsirkin" (supporter:vhost) Cc: Eugenio Perez Martin Cc: German Maglione Cc: Liu Jiang Cc: Sergio Lopez Pascual Cc: Stefano Garzarella Signed-off-by: Laszlo Ersek --- hw/virtio/vhost-user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Stefano Garzarella
Re: [PATCH] Revert "virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events"
Hi Michael, On Thu, Aug 3, 2023 at 10:02 PM Michael S. Tsirkin wrote: > > On Wed, Jul 05, 2023 at 09:15:23AM +0200, Stefano Garzarella wrote: > > This reverts commit 8cc5583abe6419e7faaebc9fbd109f34f4c850f2. > > > > That commit causes several problems in Linux as described in the BZ. > > In particular, after a while, other devices on the bus are no longer > > usable even if those devices are not affected by the hotunplug. > > This may be a problem in Linux, but we have not been able to identify > > it so far. So better to revert this patch until we find a solution. > > > > Also, Oracle, which initially proposed this patch for a problem with > > Solaris, seems to have already reversed it downstream: > > https://linux.oracle.com/errata/ELSA-2023-12065.html > > > > Suggested-by: Thomas Huth > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2176702 > > Cc: qemu-sta...@nongnu.org > > Cc: Mark Kanda > > Signed-off-by: Stefano Garzarella > > scsi maintainers, what should be done about this patch? > we don't want a regression in the release ... > revert for now and think what's the right thing to do is > after the release? We found the issue and fixed it in this release with commit 9472083e64 ("scsi: fetch unit attention when creating the request"). So we don't need to revert it anymore. Thanks, Stefano
[PATCH v2 2/2] block/blkio: add more comments on the fd passing handling
As Hanna pointed out, it is not clear in the code why qemu_open() can fail, and why blkio_set_int("fd") is not enough to discover the `fd` property support. Let's fix them by adding more details in the code comments. Suggested-by: Hanna Czenczek Reviewed-by: Hanna Czenczek Signed-off-by: Stefano Garzarella --- block/blkio.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index baba2f0b67..1dd495617c 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -713,6 +713,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, */ fd = qemu_open(path, O_RDWR, NULL); if (fd < 0) { +/* + * qemu_open() can fail if the user specifies a path that is not + * a file or device, for example in the case of Unix Domain Socket + * for the virtio-blk-vhost-user driver. In such cases let's have + * libblkio open the path directly. + */ fd_supported = false; } else { ret = blkio_set_int(s->blkio, "fd", fd); @@ -741,9 +747,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, } /* - * If the libblkio driver doesn't support the `fd` property, blkio_connect() - * will fail with -EINVAL. So let's try calling blkio_connect() again by - * directly setting `path`. + * Before https://gitlab.com/libblkio/libblkio/-/merge_requests/208 + * (libblkio <= v1.3.0), setting the `fd` property is not enough to check + * whether the driver supports the `fd` property or not. In that case, + * blkio_connect() will fail with -EINVAL. + * So let's try calling blkio_connect() again by directly setting `path` + * to cover this scenario. */ if (fd_supported && ret == -EINVAL) { /* -- 2.41.0
[PATCH v2 0/2] block/blkio: fix fd leak and add more comments for the fd passing
Hanna discovered an fd leak in the error path, and a few comments to improve in the code. v2: - avoid to use `fd_supported` to track a valid fd [Hanna] v1: https://lore.kernel.org/qemu-devel/20230801160332.122564-1-sgarz...@redhat.com/ Stefano Garzarella (2): block/blkio: close the fd when blkio_connect() fails block/blkio: add more comments on the fd passing handling block/blkio.c | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) -- 2.41.0
[PATCH v2 1/2] block/blkio: close the fd when blkio_connect() fails
libblkio drivers take ownership of `fd` only after a successful blkio_connect(), so if it fails, we are still the owners. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Hanna Czenczek Signed-off-by: Stefano Garzarella --- Notes: v2: - avoid to use `fd_supported` to track a valid fd [Hanna] block/blkio.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 8e7ce42c79..baba2f0b67 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -678,7 +678,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, const char *path = qdict_get_try_str(options, "path"); BDRVBlkioState *s = bs->opaque; bool fd_supported = false; -int fd, ret; +int fd = -1, ret; if (!path) { error_setg(errp, "missing 'path' option"); @@ -719,6 +719,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, if (ret < 0) { fd_supported = false; qemu_close(fd); +fd = -1; } } } @@ -733,14 +734,18 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, } ret = blkio_connect(s->blkio); +if (ret < 0 && fd >= 0) { +/* Failed to give the FD to libblkio, close it */ +qemu_close(fd); +fd = -1; +} + /* * If the libblkio driver doesn't support the `fd` property, blkio_connect() * will fail with -EINVAL. So let's try calling blkio_connect() again by * directly setting `path`. */ if (fd_supported && ret == -EINVAL) { -qemu_close(fd); - /* * We need to clear the `fd` property we set previously by setting * it to -1. -- 2.41.0
Re: [PATCH 1/2] block/blkio: close the fd when blkio_connect() fails
On Wed, Aug 02, 2023 at 01:15:40PM +0200, Hanna Czenczek wrote: On 01.08.23 18:03, Stefano Garzarella wrote: libblkio drivers take ownership of `fd` only after a successful blkio_connect(), so if it fails, we are still the owners. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Hanna Czenczek Signed-off-by: Stefano Garzarella --- block/blkio.c | 9 + 1 file changed, 9 insertions(+) Works, so: Reviewed-by: Hanna Czenczek But personally, instead of having `fd_supported` track whether we have a valid FD or not, I’d find it more intuitive to track ownership through the `fd` variable itself, i.e. initialize it to -1, set it to -1 when ownership is transferred, and then close it once we don’t need it anymore, but failed to transfer ownership to blkio. The elaborate way would be something like ... -int fd, ret; +int fd = -1; +int ret; ... ret = blkio_connect(s->blkio); +if (!ret) { + /* If we had an FD, libblkio now has ownership of it */ + fd = -1; +} +if (fd >= 0) { + /* We still have FD ownership, but no longer need it, so close it */ + qemu_close(fd); + fd = -1; +} /* * [...] */ if (fd_supported && ret == -EINVAL) { - qemu_close(fd); - ... Or the shorter, less-verbose version would be: ... -int fd, ret; +int fd = -1; +int ret; ... ret = blkio_connect(s->blkio); +if (fd >= 0 && ret < 0) { + /* Failed to give the FD to libblkio, close it */ + qemu_close(fd); +} I like this, I'll do it in v2! Thanks, Stefano
[PATCH 1/2] block/blkio: close the fd when blkio_connect() fails
libblkio drivers take ownership of `fd` only after a successful blkio_connect(), so if it fails, we are still the owners. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Hanna Czenczek Signed-off-by: Stefano Garzarella --- block/blkio.c | 9 + 1 file changed, 9 insertions(+) diff --git a/block/blkio.c b/block/blkio.c index 8e7ce42c79..2d53a865e7 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -739,6 +739,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, * directly setting `path`. */ if (fd_supported && ret == -EINVAL) { +fd_supported = false; qemu_close(fd); /* @@ -763,6 +764,14 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, } if (ret < 0) { +if (fd_supported) { +/* + * libblkio drivers take ownership of `fd` only after a successful + * blkio_connect(), so if it fails, we are still the owners. + */ +qemu_close(fd); +} + error_setg_errno(errp, -ret, "blkio_connect failed: %s", blkio_get_error_msg()); return ret; -- 2.41.0
[PATCH 0/2] block/blkio: fix fd leak and add more comments for the fd passing
Hanna discovered an fd leak in the error path, and pointed out a few comments to improve in the code. Stefano Garzarella (2): block/blkio: close the fd when blkio_connect() fails block/blkio: add more comments on the fd passing handling block/blkio.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) -- 2.41.0
[PATCH 2/2] block/blkio: add more comments on the fd passing handling
As Hanna pointed out, it is not clear in the code why qemu_open() can fail, and why blkio_set_int("fd") is not enough to discover support for the `fd` property. Let's clarify both by adding more details in the code comments. Suggested-by: Hanna Czenczek Signed-off-by: Stefano Garzarella --- block/blkio.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 2d53a865e7..848b8189d0 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -713,6 +713,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, */ fd = qemu_open(path, O_RDWR, NULL); if (fd < 0) { +/* + * qemu_open() can fail if the user specifies a path that is not + * a file or device, for example in the case of Unix Domain Socket + * for the virtio-blk-vhost-user driver. In such cases let's have + * libblkio open the path directly. + */ fd_supported = false; } else { ret = blkio_set_int(s->blkio, "fd", fd); @@ -734,9 +740,12 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, ret = blkio_connect(s->blkio); /* - * If the libblkio driver doesn't support the `fd` property, blkio_connect() - * will fail with -EINVAL. So let's try calling blkio_connect() again by - * directly setting `path`. + * Before https://gitlab.com/libblkio/libblkio/-/merge_requests/208 + * (libblkio <= v1.3.0), setting the `fd` property is not enough to check + * whether the driver supports the `fd` property or not. In that case, + * blkio_connect() will fail with -EINVAL. + * So let's try calling blkio_connect() again by directly setting `path` + * to cover this scenario. */ if (fd_supported && ret == -EINVAL) { fd_supported = false; -- 2.41.0
[PATCH v2 0/4] block/blkio: fix opening virtio-blk drivers
There is a problem with virtio-blk-vhost-vdpa. The first patch does some preparatory changes. The second and third patches fix the issues; the last patch prepares QEMU for a future version of libblkio where we can use blkio_set_fd() to check whether the property is supported or not. While testing, I realized that the main problem was that qemu_open() does not support UDS, but the problem of blkio_connect() failing still remains. v2: - added first patch in preparation for the others - reworked patch 2 retrying blkio_connect [Stefan] - added patch 3 since qemu_open() fails on UDS - changed patch 4 commit description [Stefan] v1: https://lore.kernel.org/qemu-devel/20230724154611.178858-1-sgarz...@redhat.com/ Based on stefanha/block branch. Stefano Garzarella (4): block/blkio: move blkio_connect() in the drivers functions block/blkio: retry blkio_connect() if it fails using `fd` block/blkio: fall back on using `path` when `fd` setting fails block/blkio: use blkio_set_int("fd") to check fd support block/blkio.c | 108 +++--- 1 file changed, 75 insertions(+), 33 deletions(-) -- 2.41.0
[PATCH v2 4/4] block/blkio: use blkio_set_int("fd") to check fd support
Setting the `fd` property fails with virtio-blk-* libblkio drivers that do not support fd passing since https://gitlab.com/libblkio/libblkio/-/merge_requests/208. Getting the `fd` property, on the other hand, always succeeds for virtio-blk-* libblkio drivers even when they don't support fd passing. This patch switches to setting the `fd` property because it is a better mechanism for probing fd passing support than getting the `fd` property. Signed-off-by: Stefano Garzarella --- Notes: v2: - changed commit description [Stefan] block/blkio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blkio.c b/block/blkio.c index 72b46d61fd..8e7ce42c79 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -690,7 +690,7 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, return -EINVAL; } -if (blkio_get_int(s->blkio, "fd", &fd) == 0) { +if (blkio_set_int(s->blkio, "fd", -1) == 0) { fd_supported = true; } -- 2.41.0
[PATCH v2 3/4] block/blkio: fall back on using `path` when `fd` setting fails
qemu_open() fails if called with a Unix domain socket in this way: -blockdev node-name=drive0,driver=virtio-blk-vhost-user,path=vhost-user-blk.sock,cache.direct=on: Could not open 'vhost-user-blk.sock': No such device or address Since virtio-blk-vhost-user does not support fd passing, let's always fall back on using `path` if fd passing fails. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Reported-by: Qing Wang Signed-off-by: Stefano Garzarella --- block/blkio.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 60d2d0f129..72b46d61fd 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -711,19 +711,19 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, * In order to open the device read-only, we are using the `read-only` * property of the libblkio driver in blkio_file_open(). */ -fd = qemu_open(path, O_RDWR, errp); +fd = qemu_open(path, O_RDWR, NULL); if (fd < 0) { -return -EINVAL; +fd_supported = false; +} else { +ret = blkio_set_int(s->blkio, "fd", fd); +if (ret < 0) { +fd_supported = false; +qemu_close(fd); +} } +} -ret = blkio_set_int(s->blkio, "fd", fd); -if (ret < 0) { -error_setg_errno(errp, -ret, "failed to set fd: %s", - blkio_get_error_msg()); -qemu_close(fd); -return ret; -} -} else { +if (!fd_supported) { ret = blkio_set_str(s->blkio, "path", path); if (ret < 0) { error_setg_errno(errp, -ret, "failed to set path: %s", -- 2.41.0
[PATCH v2 2/4] block/blkio: retry blkio_connect() if it fails using `fd`
libblkio 1.3.0 added support for the "fd" property of the virtio-blk-vhost-vdpa driver. In QEMU, starting from commit cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") we are using `blkio_get_int(..., "fd")` to check if the "fd" property is supported for all the virtio-blk-* drivers. Unfortunately that property is also available for those drivers that do not support it, such as virtio-blk-vhost-user. So, `blkio_get_int()` is not enough to check whether the driver supports the `fd` property or not. This is because the virtio-blk common libblkio driver only checks whether or not `fd` is set during `blkio_connect()` and fails with -EINVAL for those transports that do not support it (all except vhost-vdpa for now). So let's handle the `blkio_connect()` failure, retrying it using `path` directly. Fixes: cad2ccc395 ("block/blkio: use qemu_open() to support fd passing for virtio-blk") Suggested-by: Stefan Hajnoczi Signed-off-by: Stefano Garzarella --- Notes: v2: - reworked retrying blkio_connect() [Stefan] block/blkio.c | 29 + 1 file changed, 29 insertions(+) diff --git a/block/blkio.c b/block/blkio.c index 8ad7c0b575..60d2d0f129 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -733,6 +733,35 @@ static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, } ret = blkio_connect(s->blkio); +/* + * If the libblkio driver doesn't support the `fd` property, blkio_connect() + * will fail with -EINVAL. So let's try calling blkio_connect() again by + * directly setting `path`. 
+ */ +ret = blkio_set_int(s->blkio, "fd", -1); +if (ret < 0) { +error_setg_errno(errp, -ret, "failed to set fd: %s", + blkio_get_error_msg()); +return ret; +} + +ret = blkio_set_str(s->blkio, "path", path); +if (ret < 0) { +error_setg_errno(errp, -ret, "failed to set path: %s", + blkio_get_error_msg()); +return ret; +} + +ret = blkio_connect(s->blkio); +} + if (ret < 0) { error_setg_errno(errp, -ret, "blkio_connect failed: %s", blkio_get_error_msg()); -- 2.41.0
[PATCH v2 1/4] block/blkio: move blkio_connect() in the drivers functions
This is in preparation for the next patch, where for virtio-blk drivers we need to handle the failure of blkio_connect(). Let's also rename the *_open() functions to *_connect() to make the code reflect the changes applied. Signed-off-by: Stefano Garzarella --- block/blkio.c | 67 ++- 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 7eb1b94820..8ad7c0b575 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -603,8 +603,8 @@ static void blkio_unregister_buf(BlockDriverState *bs, void *host, size_t size) } } -static int blkio_io_uring_open(BlockDriverState *bs, QDict *options, int flags, - Error **errp) +static int blkio_io_uring_connect(BlockDriverState *bs, QDict *options, + int flags, Error **errp) { const char *filename = qdict_get_str(options, "filename"); BDRVBlkioState *s = bs->opaque; @@ -627,11 +627,18 @@ static int blkio_io_uring_open(BlockDriverState *bs, QDict *options, int flags, } } +ret = blkio_connect(s->blkio); +if (ret < 0) { +error_setg_errno(errp, -ret, "blkio_connect failed: %s", + blkio_get_error_msg()); +return ret; +} + return 0; } -static int blkio_nvme_io_uring(BlockDriverState *bs, QDict *options, int flags, - Error **errp) +static int blkio_nvme_io_uring_connect(BlockDriverState *bs, QDict *options, + int flags, Error **errp) { const char *path = qdict_get_try_str(options, "path"); BDRVBlkioState *s = bs->opaque; @@ -655,11 +662,18 @@ static int blkio_nvme_io_uring(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; } +ret = blkio_connect(s->blkio); +if (ret < 0) { +error_setg_errno(errp, -ret, "blkio_connect failed: %s", + blkio_get_error_msg()); +return ret; +} + return 0; } -static int blkio_virtio_blk_common_open(BlockDriverState *bs, -QDict *options, int flags, Error **errp) +static int blkio_virtio_blk_connect(BlockDriverState *bs, QDict *options, +int flags, Error **errp) { const char *path = qdict_get_try_str(options, "path"); BDRVBlkioState *s = bs->opaque; @@ -718,6 
+732,13 @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs, } } +ret = blkio_connect(s->blkio); +if (ret < 0) { +error_setg_errno(errp, -ret, "blkio_connect failed: %s", + blkio_get_error_msg()); +return ret; +} + qdict_del(options, "path"); return 0; @@ -737,24 +758,6 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags, return ret; } -if (strcmp(blkio_driver, "io_uring") == 0) { -ret = blkio_io_uring_open(bs, options, flags, errp); -} else if (strcmp(blkio_driver, "nvme-io_uring") == 0) { -ret = blkio_nvme_io_uring(bs, options, flags, errp); -} else if (strcmp(blkio_driver, "virtio-blk-vfio-pci") == 0) { -ret = blkio_virtio_blk_common_open(bs, options, flags, errp); -} else if (strcmp(blkio_driver, "virtio-blk-vhost-user") == 0) { -ret = blkio_virtio_blk_common_open(bs, options, flags, errp); -} else if (strcmp(blkio_driver, "virtio-blk-vhost-vdpa") == 0) { -ret = blkio_virtio_blk_common_open(bs, options, flags, errp); -} else { -g_assert_not_reached(); -} -if (ret < 0) { -blkio_destroy(&s->blkio); -return ret; -} - if (!(flags & BDRV_O_RDWR)) { ret = blkio_set_bool(s->blkio, "read-only", true); if (ret < 0) { @@ -765,10 +768,20 @@ static int blkio_file_open(BlockDriverState *bs, QDict *options, int flags, } } -ret = blkio_connect(s->blkio); +if (strcmp(blkio_driver, "io_uring") == 0) { +ret = blkio_io_uring_connect(bs, options, flags, errp); +} else if (strcmp(blkio_driver, "nvme-io_uring") == 0) { +ret = blkio_nvme_io_uring_connect(bs, options, flags, errp); +} else if (strcmp(blkio_driver, "virtio-blk-vfio-pci") == 0) { +ret = blkio_virtio_blk_connect(bs, options, flags, errp); +} else if (strcmp(blkio_driver, "virtio-blk-vhost-user") == 0) { +ret = blkio_virtio_blk_connect(bs, options, flags, errp); +} else if (strcmp(blkio_driver, "virtio-blk-vhost-vdpa") == 0) { +ret = blkio_virtio_blk_connect(bs, options, flags, errp); +} else { +g_assert_not_reached(); +} if (ret < 0) { -error_setg_errno(errp, -ret, "blkio_connect failed: %s", - blkio_get_error_msg()); blkio_destroy(&s->blkio); return ret; } -- 2.41.0
Re: [PATCH 1/2] block/blkio: fix opening virtio-blk drivers
On Wed, Jul 26, 2023 at 11:32:10AM -0400, Stefan Hajnoczi wrote: On Wed, Jul 26, 2023 at 09:26:45AM +0200, Stefano Garzarella wrote: On Tue, Jul 25, 2023 at 04:00:38PM -0400, Stefan Hajnoczi wrote: > On Mon, Jul 24, 2023 at 05:46:10PM +0200, Stefano Garzarella wrote: > > libblkio 1.3.0 added support for the "fd" property of the virtio-blk-vhost-vdpa > > driver. In QEMU, starting from commit cad2ccc395 ("block/blkio: use > > qemu_open() to support fd passing for virtio-blk") we are using > > `blkio_get_int(..., "fd")` to check if the "fd" property is supported > > for all the virtio-blk-* drivers. > > > > Unfortunately that property is also available for those drivers that do > > not support it, such as virtio-blk-vhost-user. Indeed now QEMU is > > failing if used with virtio-blk-vhost-user in this way: > > > >-blockdev node-name=drive0,driver=virtio-blk-vhost-user,path=vhost-user-blk.sock,cache.direct=on: Could not open 'vhost-user-blk.sock': No such device or address > > > > So, `blkio_get_int()` is not enough to check whether the driver supports > > the `fd` property or not. This is because the virtio-blk common libblkio > > driver only checks whether or not `fd` is set during `blkio_connect()` > > and fails for those transports that do not support it (all except > > vhost-vdpa for now). > > > > So for now let's also check that the driver is virtio-blk-vhost-vdpa, > > since that's the only one that supports it. > > What happens when more virtio-blk-* libblkio drivers gain support for > `fd`? I think we'll be back to the same problem because QEMU will be > unable to distinguish between old and new libraries. If we release a v1.3.1 version of libblkio with https://gitlab.com/libblkio/libblkio/-/merge_requests/208 we can set a minimum requirement in QEMU and use blkio_set_fd() here. > > How about retrying with `path` if opening with `fd` fails? 
IIUC the only way is to check if blkio_connect() will fail with -EINVAL, which can also be caused by other issues, and then retry forcing `path`. Do you see other ways? No. Checking for -EINVAL and then retrying with `path` is what I had in mind. The code wouldn't be great, but we could do it for now and then, when we release a new version of libblkio, do the revert and use blkio_set_int(fd) to see if it's supported or not. I don't know if it is worth it, or if it is better to merge this, release libblkio v1.3.1 and force the minimum requirement. WDYT? I prefer retrying on -EINVAL because even if libblkio 1.3.1 is released today, we don't have control over when it becomes available in distros. Fair point! I'll send v2 using that approach! Thanks, Stefano