>-----Original Message-----
>From: Liu, Yi L <[email protected]>
>Subject: Re: [PATCH 4/5] intel_iommu: Optimize unmap_bitmap during
>migration
>
>On 2025/9/10 10:37, Zhenzhong Duan wrote:
>> If a VFIO device in guest switches from IOMMU domain to block domain,
>> vtd_address_space_unmap() is called to unmap whole address space.
>>
>> If that happens during migration, migration fails with legacy VFIO
>> backend as below:
>>
>> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90,
>0x100000000000, 0x100000000000) = -7 (Argument list too long))
>
>this should be a giant and busy VM. right? Is a fix tag needed by the way?

VM size is unrelated, it's not a bug, just current code doesn't work well with 
migration.

When device switches from IOMMU domain to block domain, the whole iommu
memory region is disabled, this trigger the unmap on the whole iommu memory
region, no matter how many or how large the mappings are in the iommu MR.

>
>>
>> Because legacy VFIO limits maximum bitmap size to 256MB which maps to
>8TB on
>> 4K page system, when 16TB sized UNMAP notification is sent,
>unmap_bitmap
>> ioctl fails.
>>
>> There is no such limitation with iommufd backend, but it's still not optimal
>> to allocate large bitmap.
>>
>> Optimize it by iterating over DMAMap list to unmap each range with active
>> mapping when migration is active. If migration is not active, unmapping the
>> whole address space in one go is optimal.
>>
>> Signed-off-by: Zhenzhong Duan <[email protected]>
>> Tested-by: Giovannio Cabiddu <[email protected]>
>> ---
>>   hw/i386/intel_iommu.c | 42
>++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 83c5e44413..6876dae727 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -37,6 +37,7 @@
>>   #include "system/system.h"
>>   #include "hw/i386/apic_internal.h"
>>   #include "kvm/kvm_i386.h"
>> +#include "migration/misc.h"
>>   #include "migration/vmstate.h"
>>   #include "trace.h"
>>
>> @@ -4423,6 +4424,42 @@ static void
>vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
>>       vtd_iommu_unlock(s);
>>   }
>>
>> +/*
>> + * Unmapping a large range in one go is not optimal during migration
>because
>> + * a large dirty bitmap needs to be allocated while there may be only small
>> + * mappings, iterate over DMAMap list to unmap each range with active
>mapping.
>> + */
>> +static void vtd_address_space_unmap_in_migration(VTDAddressSpace
>*as,
>> +
>IOMMUNotifier *n)
>> +{
>> +    const DMAMap *map;
>> +    const DMAMap target = {
>> +        .iova = n->start,
>> +        .size = n->end,
>> +    };
>> +    IOVATree *tree = as->iova_tree;
>> +
>> +    /*
>> +     * DMAMap is created during IOMMU page table sync, it's either 4KB
>or huge
>> +     * page size and always a power of 2 in size. So the range of
>DMAMap could
>> +     * be used for UNMAP notification directly.
>> +     */
>> +    while ((map = iova_tree_find(tree, &target))) {
>
>how about an empty iova_tree? If guest has not mapped anything for the
>device, the tree is empty. And it is fine to not unmap anyting. While,
>if the device is attached to an identify domain, the iova_tree is empty
>as well. Are we sure that we need not to unmap anything here? It looks
>the answer is yes. But I'm suspecting the unmap failure will happen in
>the vfio side? If yes, need to consider a complete fix. :)

Not get what failure will happen, could you elaborate?
In case of identity domain, IOMMU memory region is disabled, no iommu
notifier will ever be triggered. vfio_listener monitors memory address space,
if any memory region is disabled, vfio_listener will catch it and do dirty 
tracking.


>
>> +        IOMMUTLBEvent event;
>> +
>> +        event.type = IOMMU_NOTIFIER_UNMAP;
>> +        event.entry.iova = map->iova;
>> +        event.entry.addr_mask = map->size;
>> +        event.entry.target_as = &address_space_memory;
>> +        event.entry.perm = IOMMU_NONE;
>> +        /* This field is meaningless for unmap */
>> +        event.entry.translated_addr = 0;
>> +        memory_region_notify_iommu_one(n, &event);
>> +
>> +        iova_tree_remove(tree, *map);
>> +    }
>> +}
>> +
>>   /* Unmap the whole range in the notifier's scope. */
>>   static void vtd_address_space_unmap(VTDAddressSpace *as,
>IOMMUNotifier *n)
>>   {
>> @@ -4432,6 +4469,11 @@ static void
>vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
>>       IntelIOMMUState *s = as->iommu_state;
>>       DMAMap map;
>>
>> +    if (migration_is_running()) {
>
>If the range is not big enough, it is still better to unmap in one-go.
>right? If so, might add a check on the range here to go to the iova_tee
>iteration conditionally.

We don't want to ditry track IOVA holes between IOVA ranges because it's time 
consuming and useless work. The hole may be large depending on guest behavior.
Meanwhile the time iterating on iova_tree is trivial. So we prefer tracking the 
exact iova ranges that may be dirty actually.

Thanks
Zhenzhong

Reply via email to