On 4/7/25 15:49, Chenyi Qiang wrote:
Commit 852f0048f3 ("RAMBlock: make guest_memfd require uncoordinated
discard") highlighted that subsystems like VFIO may disable RAM block
discard. However, guest_memfd relies on discard operations for page
conversion between private and shared memory, which can lead to stale
IOMMU mappings when hardware devices are assigned to confidential VMs
via shared memory. To address this, it is crucial to ensure that
subsystems like VFIO refresh their IOMMU mappings.

PrivateSharedManager is introduced to manage private and shared states in
confidential VMs, similar to RamDiscardManager, which supports
coordinated RAM discard in VFIO. Integrating PrivateSharedManager with
guest_memfd can facilitate the adjustment of VFIO mappings in response
to page conversion events.

Since guest_memfd is not an object, it cannot directly implement the
PrivateSharedManager interface. Implementing it in HostMemoryBackend is
not appropriate because guest_memfd is per RAMBlock, and some RAMBlocks
have a memory backend while others do not. Notably, virtual BIOS
RAMBlocks using memory_region_init_ram_guest_memfd() do not have a
backend.

To manage RAMBlocks with guest_memfd, define a new object named
RamBlockAttribute to implement the PrivateSharedManager interface. This
object stores guest_memfd information such as the shared_bitmap, and
handles page conversion notifications. The memory state is tracked at
host page size granularity, since the minimum conversion size can be a
single page per request. Additionally, VFIO expects the DMA mapping for
a specific IOVA to be mapped and unmapped with the same granularity.
Confidential VMs may perform partial conversions, such as converting a
small region within a larger one. To prevent invalid cases, and until a
cut_mapping operation is supported, all operations are performed at 4K
granularity.

Just for your information, IOMMUFD plans to introduce support for the
cut operation. The kickoff patch series is under discussion here:

https://lore.kernel.org/linux-iommu/0-v2-5c26bde5c22d+58b-iommu_pt_...@nvidia.com/

This new cut support is expected to be exclusive to IOMMUFD and not
directly available in the VFIO container context. The VFIO uAPI for map/
unmap is being superseded by IOMMUFD, and all new features will only be
available in IOMMUFD.


Signed-off-by: Chenyi Qiang <chenyi.qi...@intel.com>

<...>

+
+int ram_block_attribute_realize(RamBlockAttribute *attr, MemoryRegion *mr)
+{
+    uint64_t shared_bitmap_size;
+    const int block_size = qemu_real_host_page_size();
+    int ret;
+
+    shared_bitmap_size = ROUND_UP(mr->size, block_size) / block_size;
+
+    attr->mr = mr;
+    ret = memory_region_set_generic_state_manager(mr,
+                                                  GENERIC_STATE_MANAGER(attr));
+    if (ret) {
+        return ret;
+    }
+    attr->shared_bitmap_size = shared_bitmap_size;
+    attr->shared_bitmap = bitmap_new(shared_bitmap_size);

The above introduces a bitmap to track the private/shared state of each
4KB page. While functional, for large RAM blocks managed by guest_memfd
this can lead to significant memory consumption: at one bit per 4KB
page, a 1 TiB RAM block needs a 32 MiB bitmap.

Have you considered an alternative like a Maple Tree or a generic
interval tree? Both are often more memory-efficient for tracking ranges
of contiguous states, since conversions typically cover contiguous
extents rather than scattered single pages.

+
+    return ret;
+}
+
+void ram_block_attribute_unrealize(RamBlockAttribute *attr)
+{
+    g_free(attr->shared_bitmap);
+    memory_region_set_generic_state_manager(attr->mr, NULL);
+}

Thanks,
baolu
