On 10/16/25 11:53, Yi Liu wrote:
On 2025/10/16 16:48, Duan, Zhenzhong wrote:

How about an empty iova_tree? If the guest has not mapped anything for
the device, the tree is empty, and it is fine to not unmap anything.
However, if the device is attached to an identity domain, the iova_tree
is empty as well. Are we sure that we need not unmap anything here? It
looks like the answer is yes. But I'm suspecting the unmap failure will
happen on the vfio side? If yes, we need to consider a complete fix. :)

I don't get what failure will happen, could you elaborate?
In case of an identity domain, the IOMMU memory region is disabled, so
no iommu notifier will ever be triggered. vfio_listener monitors the
memory address space; if any memory region is disabled, vfio_listener
will catch it and do dirty tracking.

My question comes from the reason why DMA unmap fails. It is because a
big range is given to the kernel while the kernel does not support it.
So if VFIO gives a big range as well, it should fail too. And this is
possible when the guest (a VM with a large amount of memory) switches
from an identity domain to a paging domain. In this case, vfio_listener
will unmap all the system MRs, and that can be a big range if the VM is
big enough.

Got your point. Yes, currently the vfio_type1 driver limits
unmap_bitmap to an 8TB range. If guest memory is large enough to lead
to a memory region of more than 8TB, unmap_bitmap will fail. It's a
rare case to live migrate a VM with more than 8TB of memory; instead of
fixing it in QEMU with a complex change, I'd suggest bumping the macro
value below to enlarge the limit in the kernel, or switching to the
iommufd backend which doesn't have such a limit.
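
For reference, here is a minimal sketch of where the 8TB figure comes
from, assuming the type1 limit is INT_MAX 4KiB pages (as I recall,
DIRTY_BITMAP_PAGES_MAX in drivers/vfio/vfio_iommu_type1.c); please
double check the exact macro against the kernel source:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Assumed type1 limit: INT_MAX pages of the minimum 4KiB size */
        unsigned long long max_pages = (unsigned long long)INT_MAX;
        unsigned long long page_size = 4096;
        unsigned long long max_range = max_pages * page_size;

        /* Prints roughly 8.00 TiB, i.e. the unmap_bitmap range limit */
        printf("max unmap_bitmap range: %.2f TiB\n",
               (double)max_range / (1ULL << 40));
        return 0;
    }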

This limit shall not affect the usage of device dirty tracking, right?
If yes, add something to tell the user that the iommufd backend is
better, e.g. if the memory size is bigger than vfio iommu type1's dirty
bitmap limit (query cap_mig.max_dirty_bitmap_size), then fail the user
if the user wants migration capability.
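
For context, a rough sketch of how that value could be read from the
type1 info capability chain, modeled on the vfio_get_cap() style walk
QEMU already uses; treat the struct and flag names as my reading of the
UAPI headers rather than a copy of the real code:

    #include <stdint.h>
    #include <linux/vfio.h>

    /*
     * Sketch: walk the capability chain of a vfio_iommu_type1_info
     * buffer (obtained via VFIO_IOMMU_GET_INFO with a large enough
     * argsz) and pull out max_dirty_bitmap_size. Returns 0 if the
     * migration capability is absent.
     */
    static uint64_t get_max_dirty_bitmap_size(struct vfio_iommu_type1_info *info)
    {
        struct vfio_info_cap_header *hdr;
        void *ptr = info;

        if (!(info->flags & VFIO_IOMMU_INFO_CAPS) || !info->cap_offset) {
            return 0;
        }

        /* hdr->next == 0 ends the chain by bringing hdr back to ptr */
        for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) {
            if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION) {
                struct vfio_iommu_type1_info_cap_migration *cap_mig =
                    (void *)hdr;
                return cap_mig->max_dirty_bitmap_size;
            }
        }

        return 0;
    }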

Do you mean just dirty tracking instead of migration, like dirtyrate?
In that case, there is an error print as above; I think that's enough
as a hint?

It's not related to dirtyrate.

I guess you mean to add a migration blocker if the limit is reached?
It's hard because the limit only matters for the identity domain; a DMA
domain in the guest doesn't have such a limit, and we can't know the
guest's choice of domain type for each attached VFIO device.

I meant a blocker to boot QEMU if the limit is hit, something like
below:

    if (VM memory > 8TB && legacy_container_backend && migration_enabled)
        fail the VM boot.

OK, will add below to vfio_migration_realize() with an extra patch:

yeah, let's see Alex and Cedric's feedback.

     if (!vbasedev->iommufd && current_machine->ram_size > 8 * TiB) {
         /*
          * The 8TB comes from the default kernel and QEMU config; it may
          * be conservative here as the VM can use large pages or run with
          * vIOMMU, so the limitation may be relaxed. But 8TB is already
          * quite large for live migration. One can also switch to the
          * IOMMUFD backend if there is a need to migrate a large VM.
          */
         /* Assumed completion of the snippet, exact message TBD */
         error_setg(errp, "Cannot enable migration for a VM larger than "
                    "8TiB with the legacy VFIO container, use the "
                    "iommufd backend");
         return false;
     }

Instead of hard-coding 8TB, maybe convert cap_mig.max_dirty_bitmap_size
to a memory size. :)

Yes. It would reflect better that it's a VFIO dirty tracking
limitation.
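
A minimal sketch of that conversion, assuming max_dirty_bitmap_size is
in bytes and one bitmap bit covers the minimum 4KiB page; the helper
and variable names below are illustrative, not the actual QEMU code:

    /*
     * Illustrative helper: one bitmap byte covers 8 pages, so the
     * largest trackable RAM size is max_dirty_bitmap_size * 8 * 4KiB.
     * With the default ~256MiB bitmap limit this works out to the 8TiB
     * mentioned above.
     */
    static uint64_t vfio_dirty_tracking_ram_limit(uint64_t max_dirty_bitmap_size)
    {
        return max_dirty_bitmap_size * 8 * 4096;
    }

    /* e.g. in vfio_migration_realize(), instead of a hard-coded 8 * TiB */
    if (!vbasedev->iommufd &&
        current_machine->ram_size >
            vfio_dirty_tracking_ram_limit(max_dirty_bitmap_size)) {
        error_setg(errp, "RAM size exceeds the VFIO dirty tracking limit, "
                   "use the iommufd backend");
        return false;
    }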


Zhenzhong,

Soft freeze is w45. I plan to send a PR next week, w43, and I will be out
w44. I will have some (limited) time to address more changes on w45.

Thanks,

C.


