>-----Original Message-----
>From: Liu, Yi L <[email protected]>
>Subject: Re: [PATCH v5 7/9] vfio/listener: Construct iotlb entry when unmap
>memory address space
>
>On 2025/11/6 12:20, Zhenzhong Duan wrote:
>> If a VFIO device in guest switches from passthrough(PT) domain to block
>> domain, the whole memory address space is unmapped, but we passed a
>NULL
>> iotlb entry to unmap_bitmap, then bitmap query didn't happen and we lost
>> dirty pages.
>
>this is a good catch. :) Have you observed problem in testing or just
>identified it with patch iteration?

Patch iteration.

>
>> By constructing an iotlb entry with iova = gpa for unmap_bitmap, it can
>> set dirty bits correctly.
>>
>> For IOMMU address space, we still send NULL iotlb because VFIO don't
>> know the actual mappings in guest. It's vIOMMU's responsibility to send
>> actual unmapping notifications, e.g.,
>vtd_address_space_unmap_in_migration()
>>
>> Signed-off-by: Zhenzhong Duan <[email protected]>
>> Tested-by: Giovannio Cabiddu <[email protected]>
>> ---
>>   hw/vfio/listener.c | 15 ++++++++++++++-
>>   1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>> index 2109101158..3b48f6796c 100644
>> --- a/hw/vfio/listener.c
>> +++ b/hw/vfio/listener.c
>> @@ -713,14 +713,27 @@ static void
>vfio_listener_region_del(MemoryListener *listener,
>>
>>       if (try_unmap) {
>>           bool unmap_all = false;
>> +        IOMMUTLBEntry entry = {}, *iotlb = NULL;
>>
>>           if (int128_eq(llsize, int128_2_64())) {
>>               assert(!iova);
>>               unmap_all = true;
>>               llsize = int128_zero();
>>           }
>> +
>> +        /*
>> +         * Fake an IOTLB entry for identity mapping which is needed by
>dirty
>> +         * tracking. In fact, in unmap_bitmap, only translated_addr field
>is
>> +         * used to set dirty bitmap.
>
>Just say sync dirty is needed per unmap. So you may add a check
>in_migration as well. If not in migration, it is no needed to do it.

Dirty tracking is not only for migration, but also ditry rate/limit. So a 
non-null iotlb
is always passed for ram MR. That iotlb pointer is used only when
vfio_container_dirty_tracking_is_started() return true.

I can add a check on global_dirty_tracking if you prefer to add a check.

>
>> +         */
>> +        if (!memory_region_is_iommu(section->mr)) {
>> +            entry.iova = iova;
>> +            entry.translated_addr = iova;
>> +            iotlb = &entry;
>> +        }
>> +
>
>While, I'm still wondering how to deal with iommu MR case. Let's see a
>scenario first. When switching from DMA domain to PT, QEMU will switch
>to PT. This shall trigger the vfio_listener_region_del() and unregister
>the iommu notifier. This means vIOMMU side needs to do unmap prior to
>switching AS. If not, the iommu notifier is gone when vIOMMU wants to
>unmap with an IOTLBEvent. For virtual intel_iommu, it is calling
>vtd_address_space_unmap_in_migration() prior to calling
>vtd_switch_address_space(). So I think you need to tweak the intel_iommu
>a bit to suit the order requirement. :)

VTD doesn't support switching from DMA domain to PT atomically, so switches
to block domain in between, see intel_iommu_attach_device() in kernel.

So with this sequence is DMA->block->PT domain, we have guaranteed the order
you shared? See vtd_pasid_cache_sync_locked().

>
>BTW. should the iommu MRs even go to this try_unmap branch? I think for
>such MRs, it relies on the vIOMMU to unmap explicitly (hence trigger the
>vfio_iommu_map_notify()).

Yes, it's unnecessary, but it's hard for VFIO to distinguish if try_unmap is 
due to
domain switch or a real unmap. I think it's harmless because the second 
try_unmap
unmaps nothing.

Thanks
Zhenzhong

>
>>           ret = vfio_container_dma_unmap(bcontainer, iova,
>int128_get64(llsize),
>> -                                       NULL, unmap_all);
>> +                                       iotlb, unmap_all);
>>           if (ret) {
>>               error_report("vfio_container_dma_unmap(%p,
>0x%"HWADDR_PRIx", "
>>                            "0x%"HWADDR_PRIx") = %d (%s)",
>
>Regards,
>Yi Liu

Reply via email to