+ David Hildenbrand >-----Original Message----- >From: Steven Sistare <steven.sist...@oracle.com> >Subject: Re: [PATCH] vfio/container: Remap only populated parts in a section > >On 8/31/2025 10:13 PM, Duan, Zhenzhong wrote: >>> -----Original Message----- >>> From: Steven Sistare <steven.sist...@oracle.com> >>> Subject: Re: [PATCH] vfio/container: Remap only populated parts in a >section >>> >>> On 8/13/2025 11:24 PM, Zhenzhong Duan wrote: >>>> If there are multiple containers and unmap-all fails for some container, >we >>>> need to remap vaddr for the other containers for which unmap-all >>> succeeded. >>>> When ram discard is enabled, we should only remap populated parts in a >>>> section instead of the whole section. >>>> >>>> Export vfio_ram_discard_notify_populate() and use it to do population. >>>> >>>> Fixes: eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr >>> failure") >>>> Signed-off-by: Zhenzhong Duan <zhenzhong.d...@intel.com> >>>> --- >>>> btw: I didn't find easy to test this corner case, only code inspecting >>> >>> Thanks Zhenzhong, this looks correct. >>> >>> However, I never liked patch >>> eba1f657cbb1 ("vfio/container: recover from unmap-all-vaddr >failure") >>> >>> I think it adds too much complexity for a rare case. In fact, if we >>> examine all the possible error return codes, I believe they all would >>> be caused by other qemu application bugs, or kernel bugs: >>> >>> vfio_dma_do_unmap() >>> returns -EBUSY if an mdev exists. qemu blocks live update blocker >>> when mdev is present. If this occurs, the blocker has a bug. >>> returns -EINVAL if the vaddr was already invalidated. qemu already >>> invalidated it, or never remapped the vaddr after a previous live >>> update. Both are qemu bugs. >>> >>> iopt_unmap_all >>> iopt_unmap_iova_range >>> -EBUSY - qemu is concurrently performing other dma map or unmap >>> operations. a bug. >>> >>> -EDEADLOCK - Something is not responding to unmap requests. >>> >>> Therefore, I think we should just revert eba1f657cbb1, and assert that >>> the qemu vfio_dma_unmap_vaddr_all() call succeeds. >>> >>> Thoughts? >> >> I agree it's a rare case and your suggestion will make code simple, but I >> feel >it's aggressive to kill QEMU instance if live update fails, try to restore and >keep current instance running is important in cloud env and looks more >moderate. > >OK. > >Reviewed-by: Steve Sistare <steven.sist...@oracle.com> > >(but you should also seek an RB from someone who is more familiar with ram >discard and >its callbacks).
Hi David, look forward to your comments, suggestions. Thanks BRs, Zhenzhong