>-----Original Message-----
>From: Liu, Yi L <[email protected]>
>Subject: Re: [PATCH v8 17/23] vfio/listener: Bypass readonly region for dirty
>tracking
>
>On 2025/11/28 10:08, Duan, Zhenzhong wrote:
>> Hi Yi, Cedric,
>>
>> Could you also help comment on this patch? This is a pure VFIO migration
>related optimization, I think it's better to let it go with the "vfio: relax 
>the
>vIOMMU check" series.
>> I'd like to move it in next respin of "vfio: relax the vIOMMU check" series 
>> if
>you think it make sense.
>
>It makes sense to me.
>
>> Thanks
>> Zhenzhong
>>
>>> -----Original Message-----
>>> From: Duan, Zhenzhong <[email protected]>
>>> Subject: [PATCH v8 17/23] vfio/listener: Bypass readonly region for dirty
>>> tracking
>>>
>>> When doing ditry tracking or calculating dirty tracking range, readonly
>
>s/ditry/dirty/

Will fix, it's strange --codespell didn't catch it.

>
>>> regions can be bypassed, because corresponding DMA mappings are
>readonly
>>> and never become dirty.
>>>
>>> This can optimize dirty tracking a bit for passthrough device.
>>>
>>> Signed-off-by: Zhenzhong Duan <[email protected]>
>>> ---
>>> hw/vfio/listener.c   | 46
>+++++++++++++++++++++++++++++++++-----------
>>> hw/vfio/trace-events |  1 +
>>> 2 files changed, 36 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
>>> index 3b48f6796c..ca2377d860 100644
>>> --- a/hw/vfio/listener.c
>>> +++ b/hw/vfio/listener.c
>>> @@ -76,8 +76,13 @@ static bool vfio_log_sync_needed(const
>VFIOContainer
>>> *bcontainer)
>>>      return true;
>>> }
>>>
>>> -static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>>> +static bool vfio_listener_skipped_section(MemoryRegionSection
>*section,
>>> +                                          bool bypass_ro)
>>> {
>>> +    if (bypass_ro && section->readonly) {
>>> +        return true;
>>> +    }
>>> +
>>>      return (!memory_region_is_ram(section->mr) &&
>>>              !memory_region_is_iommu(section->mr)) ||
>>>             memory_region_is_protected(section->mr) ||
>>> @@ -368,9 +373,9 @@ static bool
>>> vfio_known_safe_misalignment(MemoryRegionSection *section)
>>> }
>>>
>>> static bool vfio_listener_valid_section(MemoryRegionSection *section,
>>> -                                        const char *name)
>>> +                                        bool bypass_ro, const
>char
>>> *name)
>>> {
>>> -    if (vfio_listener_skipped_section(section)) {
>>> +    if (vfio_listener_skipped_section(section, bypass_ro)) {
>>>          trace_vfio_listener_region_skip(name,
>>>                  section->offset_within_address_space,
>>>                  section->offset_within_address_space +
>>> @@ -497,7 +502,7 @@ void vfio_container_region_add(VFIOContainer
>>> *bcontainer,
>>>      int ret;
>>>      Error *err = NULL;
>>>
>>> -    if (!vfio_listener_valid_section(section, "region_add")) {
>>> +    if (!vfio_listener_valid_section(section, false, "region_add")) {
>>>          return;
>>>      }
>>>
>>> @@ -663,7 +668,7 @@ static void
>vfio_listener_region_del(MemoryListener
>>> *listener,
>>>      int ret;
>>>      bool try_unmap = true;
>>>
>>> -    if (!vfio_listener_valid_section(section, "region_del")) {
>>> +    if (!vfio_listener_valid_section(section, false, "region_del")) {
>>>          return;
>>>      }
>>>
>>> @@ -722,11 +727,11 @@ static void
>>> vfio_listener_region_del(MemoryListener *listener,
>>>          }
>>>
>>>          /*
>>> -         * Fake an IOTLB entry for identity mapping which is needed by
>>> dirty
>>> -         * tracking. In fact, in unmap_bitmap, only translated_addr field
>is
>>> -         * used to set dirty bitmap.
>>> +         * Fake an IOTLB entry for writable identity mapping which is
>>> needed
>>> +         * by dirty tracking. In fact, in unmap_bitmap, only
>>> translated_addr
>>> +         * field is used to set dirty bitmap.
>>>           */
>>> -        if (!memory_region_is_iommu(section->mr)) {
>>> +        if (!memory_region_is_iommu(section->mr)
>>> && !section->readonly) {
>>>              entry.iova = iova;
>>>              entry.translated_addr = iova;
>>>              iotlb = &entry;
>>> @@ -834,7 +839,8 @@ static void
>>> vfio_dirty_tracking_update(MemoryListener *listener,
>>>          container_of(listener, VFIODirtyRangesListener, listener);
>>>      hwaddr iova, end;
>>>
>>> -    if (!vfio_listener_valid_section(section, "tracking_update") ||
>>> +    /* Bypass readonly section as it never becomes dirty */
>>> +    if (!vfio_listener_valid_section(section, true, "tracking_update") ||
>>>          !vfio_get_section_iova_range(dirty->bcontainer, section,
>>>                                       &iova, &end, NULL)) {
>>>          return;
>>> @@ -1093,6 +1099,19 @@ static void
>>> vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry
>*iotlb)
>>>      if (!mr) {
>>>          goto out_unlock;
>>>      }
>>> +
>>> +    /*
>>> +     * The mapping is readonly when either it's a readonly mapping in
>guest
>>> +     * or mapped target is readonly, bypass it for dirty tracking as it
>>> +     * never becomes dirty.
>>> +     */
>>> +    if (!(iotlb->perm & IOMMU_WO) || mr->readonly) {
>
>is it possible that guest maps RW, while the backend mr is readonly?

I think a normal OS does not need to map a readonly region as RW, because 
writing will always fail.
If it does, then shadow mapping on host side is readonly. This emulates same 
behavior as on baremetal.

>
>>> +        trace_vfio_iommu_map_dirty_notify_skip_ro(iova,
>>> +                                                  iova +
>>> iotlb->addr_mask);
>>> +        rcu_read_unlock();
>>> +        return;
>>> +    }
>>> +
>>>      translated_addr = memory_region_get_ram_addr(mr) + xlat;
>>>
>>>      ret = vfio_container_query_dirty_bitmap(bcontainer, iova,
>>> iotlb->addr_mask + 1,
>>> @@ -1228,7 +1247,12 @@ static void
>>> vfio_listener_log_sync(MemoryListener *listener,
>>>      int ret;
>>>      Error *local_err = NULL;
>>>
>>> -    if (vfio_listener_skipped_section(section)) {
>>> +    /*
>>> +     * Bypass readonly section as it never becomes dirty, iommu
>memory
>>> section
>>> +     * is RW and never bypassed. The readonly mappings in iommu MR
>are
>>> bypassed
>>> +     * in vfio_iommu_map_dirty_notify().
>>> +     */
>>> +    if (vfio_listener_skipped_section(section, true)) {
>>>          return;
>>>      }
>>>
>
>Looks the vfio_iommu_map_notify() is missed. It also has unmap op. is
>it?

Right, because we don't know existing shadow mapping is readonly or not,
vfio_listener_log_sync() also doesn't provide that info.
That means we do dirty traking in this case, but returned bitmaps should be all 
zero.

Thanks
Zhenzhong

Reply via email to