On Fri, Oct 17, 2025 at 01:24:52PM +0200, Albert Esteve wrote:
On Fri, Oct 17, 2025 at 11:23 AM Stefano Garzarella <[email protected]> wrote:

On Thu, Oct 16, 2025 at 04:38:21PM +0200, Albert Esteve wrote:
>Add SHMEM_MAP/UNMAP requests to vhost-user for dynamic management of
>VIRTIO Shared Memory mappings.
>
>This implementation introduces VirtioSharedMemoryMapping as a unified
>QOM object that manages both the mapping metadata and MemoryRegion
>lifecycle. This object provides reference-counted lifecycle management
>with automatic cleanup of file descriptors and memory regions
>through QOM finalization.
>
>This request allows backends to dynamically map file descriptors into a
>VIRTIO Shared Memory Region identified by their shmid. Maps are created
>using memory_region_init_ram_from_fd() with configurable read/write
>permissions, and the resulting MemoryRegions are added as subregions to
>the shmem container region. The mapped memory is then advertised to the
>guest VIRTIO drivers as a base address plus offset for reading and
>writing according to the requested mmap flags.
>
>The backend can unmap memory ranges within a given VIRTIO Shared Memory
>Region to free resources. Upon receiving this message, the frontend
>removes the MemoryRegion as a subregion and automatically unreferences
>the VirtioSharedMemoryMapping object, triggering cleanup if no other
>references exist.
>
>Error handling has been improved to ensure consistent behavior across
>handlers that manage their own vhost_user_send_resp() calls. Since
>these handlers clear the VHOST_USER_NEED_REPLY_MASK flag, explicit
>error checking ensures proper connection closure on failures,
>maintaining the expected error flow.
>
>Note the memory region commit for these operations needs to be delayed
>until after we reply to the backend to avoid deadlocks. Otherwise,
>the MemoryListener would send a VHOST_USER_SET_MEM_TABLE message
>before the reply.
>
>Reviewed-by: Stefan Hajnoczi <[email protected]>
>Signed-off-by: Albert Esteve <[email protected]>
>---
> hw/virtio/vhost-user.c                    | 267 ++++++++++++++++++++++
> hw/virtio/virtio.c                        | 199 ++++++++++++++++
> include/hw/virtio/virtio.h                | 135 +++++++++++
> subprojects/libvhost-user/libvhost-user.c |  70 ++++++
> subprojects/libvhost-user/libvhost-user.h |  54 +++++
> 5 files changed, 725 insertions(+)
>
>diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>index 36c9c2e04d..890be55937 100644
>--- a/hw/virtio/vhost-user.c
>+++ b/hw/virtio/vhost-user.c
>@@ -104,6 +104,7 @@ typedef enum VhostUserRequest {
>     VHOST_USER_GET_SHARED_OBJECT = 41,
>     VHOST_USER_SET_DEVICE_STATE_FD = 42,
>     VHOST_USER_CHECK_DEVICE_STATE = 43,
>+    VHOST_USER_GET_SHMEM_CONFIG = 44,
>     VHOST_USER_MAX
> } VhostUserRequest;
>
>@@ -115,6 +116,8 @@ typedef enum VhostUserBackendRequest {
>     VHOST_USER_BACKEND_SHARED_OBJECT_ADD = 6,
>     VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE = 7,
>     VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP = 8,
>+    VHOST_USER_BACKEND_SHMEM_MAP = 9,
>+    VHOST_USER_BACKEND_SHMEM_UNMAP = 10,
>     VHOST_USER_BACKEND_MAX
> }  VhostUserBackendRequest;
>
>@@ -136,6 +139,12 @@ typedef struct VhostUserMemRegMsg {
>     VhostUserMemoryRegion region;
> } VhostUserMemRegMsg;
>
>+typedef struct VhostUserShMemConfig {
>+    uint32_t nregions;
>+    uint32_t padding;
>+    uint64_t memory_sizes[VIRTIO_MAX_SHMEM_REGIONS];
>+} VhostUserShMemConfig;
>+
> typedef struct VhostUserLog {
>     uint64_t mmap_size;
>     uint64_t mmap_offset;
>@@ -192,6 +201,23 @@ typedef struct VhostUserShared {
>     unsigned char uuid[16];
> } VhostUserShared;
>
>+/* For the flags field of VhostUserMMap */
>+#define VHOST_USER_FLAG_MAP_RW (1u << 0)
>+
>+typedef struct {
>+    /* VIRTIO Shared Memory Region ID */
>+    uint8_t shmid;
>+    uint8_t padding[7];
>+    /* File offset */
>+    uint64_t fd_offset;
>+    /* Offset within the VIRTIO Shared Memory Region */
>+    uint64_t shm_offset;
>+    /* Size of the mapping */
>+    uint64_t len;
>+    /* Flags for the mmap operation, from VHOST_USER_FLAG_MAP_* */
>+    uint64_t flags;
>+} VhostUserMMap;
>+
> typedef struct {
>     VhostUserRequest request;
>
>@@ -224,6 +250,8 @@ typedef union {
>         VhostUserInflight inflight;
>         VhostUserShared object;
>         VhostUserTransferDeviceState transfer_state;
>+        VhostUserMMap mmap;
>+        VhostUserShMemConfig shmem;
> } VhostUserPayload;
>
> typedef struct VhostUserMsg {
>@@ -1768,6 +1796,196 @@ vhost_user_backend_handle_shared_object_lookup(struct vhost_user *u,
>     return 0;
> }
>
>+/**
>+ * vhost_user_backend_handle_shmem_map() - Handle SHMEM_MAP backend request
>+ * @dev: vhost device
>+ * @ioc: QIOChannel for communication
>+ * @hdr: vhost-user message header
>+ * @payload: message payload containing mapping details
>+ * @fd: file descriptor for the shared memory region
>+ *
>+ * Handles VHOST_USER_BACKEND_SHMEM_MAP requests from the backend. Creates
>+ * a VirtioSharedMemoryMapping to manage the shared memory mapping and adds
>+ * it to the appropriate VIRTIO Shared Memory Region. The mapping object
>+ * serves as an intermediate parent for the MemoryRegion, ensuring proper
>+ * lifecycle management with reference counting.
>+ *
>+ * Returns: 0 on success, negative errno on failure
>+ */
>+static int
>+vhost_user_backend_handle_shmem_map(struct vhost_dev *dev,
>+                                    QIOChannel *ioc,
>+                                    VhostUserHeader *hdr,
>+                                    VhostUserPayload *payload,
>+                                    int fd)
>+{
>+    VirtioSharedMemory *shmem;
>+    VhostUserMMap *vu_mmap = &payload->mmap;
>+    VirtioSharedMemoryMapping *existing;
>+    Error *local_err = NULL;
>+    int ret = 0;
>+
>+    if (fd < 0) {
>+        error_report("Bad fd for map");
>+        ret = -EBADF;
>+        goto send_reply;
>+    }
>+
>+    if (QSIMPLEQ_EMPTY(&dev->vdev->shmem_list)) {
>+        error_report("Device has no VIRTIO Shared Memory Regions. "
>+                     "Requested ID: %d", vu_mmap->shmid);
>+        ret = -EFAULT;
>+        goto send_reply;
>+    }
>+
>+    shmem = virtio_find_shmem_region(dev->vdev, vu_mmap->shmid);
>+    if (!shmem) {
>+        error_report("VIRTIO Shared Memory Region at "
>+                     "ID %d not found or uninitialized", vu_mmap->shmid);
>+        ret = -EFAULT;
>+        goto send_reply;
>+    }
>+
>+    if ((vu_mmap->shm_offset + vu_mmap->len) < vu_mmap->len ||
>+        (vu_mmap->shm_offset + vu_mmap->len) > shmem->mr.size) {
>+        error_report("Bad offset/len for mmap %" PRIx64 "+%" PRIx64,
>+                     vu_mmap->shm_offset, vu_mmap->len);
>+        ret = -EFAULT;
>+        goto send_reply;
>+    }
>+
>+    QTAILQ_FOREACH(existing, &shmem->mmaps, link) {
>+        if (ranges_overlap(existing->offset, existing->len,
>+                           vu_mmap->shm_offset, vu_mmap->len)) {
>+            error_report("VIRTIO Shared Memory mapping overlap");
>+            ret = -EFAULT;
>+            goto send_reply;
>+        }
>+    }
>+
>+    memory_region_transaction_begin();
>+
>+    /* Create VirtioSharedMemoryMapping object */
>+    VirtioSharedMemoryMapping *mapping = virtio_shared_memory_mapping_new(
>+        vu_mmap->shmid, fd, vu_mmap->fd_offset, vu_mmap->shm_offset,
>+        vu_mmap->len, vu_mmap->flags & VHOST_USER_FLAG_MAP_RW);
>+
>+    if (!mapping) {
>+        ret = -EFAULT;
>+        goto send_reply_commit;
>+    }
>+
>+    /* Add the mapping to the shared memory region */
>+    if (virtio_add_shmem_map(shmem, mapping) != 0) {
>+        error_report("Failed to add shared memory mapping");
>+        object_unref(OBJECT(mapping));
>+        ret = -EFAULT;
>+        goto send_reply_commit;
>+    }
>+
>+send_reply_commit:
>+    /* Send reply and commit after transaction started */
>+    if (hdr->flags & VHOST_USER_NEED_REPLY_MASK) {
>+        payload->u64 = !!ret;
>+        hdr->size = sizeof(payload->u64);
>+        if (!vhost_user_send_resp(ioc, hdr, payload, &local_err)) {
>+            error_report_err(local_err);
>+            memory_region_transaction_commit();
>+            return -EFAULT;
>+        }
>+    }
>+    memory_region_transaction_commit();

Sorry to be late, I did a quick review. My only doubt is here; maybe it
was already discussed, but why do we commit after responding to the
backend?

Should we do it first to prevent the backend from “seeing” something
that hasn't been committed yet?

There is a race that leads to a deadlock. hw/virtio/vhost.c has a
MemoryListener that sends VHOST_USER_SET_MEM_TABLE messages in its
.commit() callback. If this happens before the reply, the backend will
not process it, as it is stuck waiting for the SHMEM reply, and the
handler in QEMU will not send the reply, as it is waiting for the reply
to the SET_MEM_TABLE. So we have to delay the transaction commit until
immediately after the reply.

Okay, I see now that you mentioned that in the commit description; I should have read it more carefully! IMO it would be worth adding a comment here, but I definitely won't ask you to send a v11 for this (maybe a follow-up patch later).
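
Something along these lines, perhaps (just a sketch of the wording, taken
from your commit description):

```
    /*
     * Commit only after the reply has been sent: hw/virtio/vhost.c has a
     * MemoryListener that sends VHOST_USER_SET_MEM_TABLE from its .commit()
     * callback. If that runs before the reply, both sides deadlock: the
     * backend waits for the SHMEM_MAP reply while QEMU waits for the
     * SET_MEM_TABLE reply.
     */
    memory_region_transaction_commit();
```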



Also, if vhost_user_send_resp() fails, should we call
virtio_del_shmem_map()?

If vhost_user_send_resp() fails, the connection with the backend is
closed, so the mappings will indeed never be removed unless we reset.

Maybe better than removing the single mapping would be to loop
through the mappings in the shared memory region and clean them all, e.g.:

```
       while (!QTAILQ_EMPTY(&shmem->mmaps)) {
           VirtioSharedMemoryMapping *mapping = QTAILQ_FIRST(&shmem->mmaps);

           /* Removes the subregion and unrefs the mapping object */
           virtio_del_shmem_map(shmem, mapping->offset, mapping->mr->size);
       }
```

But since a backend may use more than one shared memory region,
and we do not know which shared memory regions belong to a given
backend, whatever we do here will be incomplete (?).

I don't know if this is the right place to do this kind of cleanup, maybe further up in the stack.


I think the only
solution after this happens is to reset (virtio_reset) to remove all
mappings from all shared regions, and re-establish the backend
channel (is it possible?). Even if the channel cannot be re-established,
I wouldn't bother removing just one mapping; I would assume it needs a
reset.
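
For illustration, a reset-time cleanup could look something like this
(only a sketch; it assumes the shmem_list/mmaps layout from this patch,
and the QSIMPLEQ entry field name is a guess):

```
    VirtioSharedMemory *shmem;

    /* On reset, drop every mapping in every VIRTIO Shared Memory Region.
     * virtio_del_shmem_map() removes the subregion and unrefs the mapping. */
    QSIMPLEQ_FOREACH(shmem, &vdev->shmem_list, entry) {
        while (!QTAILQ_EMPTY(&shmem->mmaps)) {
            VirtioSharedMemoryMapping *mapping = QTAILQ_FIRST(&shmem->mmaps);
            virtio_del_shmem_map(shmem, mapping->offset, mapping->mr->size);
        }
    }
```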

So, in conclusion, we are saying that if we can no longer communicate with the backend, there is no point in maintaining a consistent state, because we have to reset the device anyway.
Are we already doing this, or should we be doing it?

BTW, I don't want to stop this series, I just found this error path strange.

Thanks,
Stefano