+ David Hildenbrand

>-----Original Message-----
>From: Steven Sistare <steven.sist...@oracle.com>
>Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section
>
>On 8/31/2025 10:13 PM, Duan, Zhenzhong wrote:
>>> -----Original Message-----
>>> From: Steven Sistare <steven.sist...@oracle.com>
>>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a
>section
>>>
>>> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote:
>>>> If there are multiple containers and unmap-all fails for some container,
>we
>>>> need to remap vaddr for the other containers for which unmap-all
>>> succeeded.
>>>> When ram discard is enabled, we should only remap populated parts in a
>>>> section instead of the whole section.
>>>>
>>>> Export vfio_ram_discard_notify_populate() and use it to do population.
>>>>
>>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr
>>> failure")
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com>
>>>> ---
>>>> btw: I didn't find easy to test this corner case, only code inspecting
>>>
>>> Thanks Zhenzhong, this looks correct.
>>>
>>> However, I never liked patch
>>>    eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr
>failure")
>>>
>>> I think it adds too much complexity for a rare case.  In fact, if we
>>> examine all the possible error return codes, I believe they all would
>>> be caused by other qemu application bugs, or kernel bugs:
>>>
>>> vfio_dma_do_unmap()
>>>    returns -EBUSY if an mdev exists.  qemu blocks live update blocker
>>>      when mdev is present.  If this occurs, the blocker has a bug.
>>>    returns -EINVAL if the vaddr was already invalidated.  qemu already
>>>      invalidated it, or never remapped the vaddr after a previous live
>>>      update.  Both are qemu bugs.
>>>
>>> iopt_unmap_all
>>>    iopt_unmap_iova_range
>>>      -EBUSY - qemu is concurrently performing other dma map or unmap
>>>               operations.  a bug.
>>>
>>>      -EDEADLOCK - Something is not responding to unmap requests.
>>>
>>> Therefore, I think we should just revert eba1f657cbb1, and assert that
>>> the qemu vfio_dma_unmap_vaddr_all() call succeeds.
>>>
>>> Thoughts?
>>
>> I agree it's a rare case and your suggestion will make code simple, but I 
>> feel
>it's aggressive to kill QEMU instance if live update fails, try to restore and
>keep current instance running is important in cloud env and looks more
>moderate.
>
>OK.
>
>Reviewed-by: Steve Sistare <steven.sist...@oracle.com>
>
>(but you should also seek an RB from someone who is more familiar with ram
>discard and
>its callbacks).

Hi David, look forward to your comments, suggestions. Thanks

BRs,
Zhenzhong

Reply via email to