>-----Original Message-----
>From: Peter Xu <pet...@redhat.com>
>Sent: Friday, June 9, 2023 3:53 AM
>To: Jason Gunthorpe <j...@nvidia.com>
>Cc: Liu, Yi L <yi.l....@intel.com>; Duan, Zhenzhong
><zhenzhong.d...@intel.com>; qemu-devel@nongnu.org; m...@redhat.com;
>jasow...@redhat.com; pbonz...@redhat.com; richard.hender...@linaro.org;
>edua...@habkost.net; marcel.apfelb...@gmail.com;
>alex.william...@redhat.com; c...@redhat.com; da...@redhat.com;
>phi...@linaro.org; kwankh...@nvidia.com; c...@nvidia.com; Peng, Chao P
><chao.p.p...@intel.com>
>Subject: Re: [PATCH v3 5/5] intel_iommu: Optimize out some unnecessary
>UNMAP calls
>
>On Thu, Jun 08, 2023 at 01:27:50PM -0300, Jason Gunthorpe wrote:
>> On Thu, Jun 08, 2023 at 11:40:55AM -0400, Peter Xu wrote:
>>
>> > > But if there are the proper locks to prevent a map/unmap race, then
>> > > there should also be the proper locks to check that there is no map in
>> > > the first place and avoid the kernel call..
>> >
>> > The problem is IIRC the guest iommu driver can do smart things like
>> > batching invalidations; it means that when QEMU gets one from the guest
>> > OS it may already not match the mapped objects.
>>
>> qemu has to fix it.  The kernel API is object based, not page
>> based.  You cannot unmap portions of a prior mapping.
>>
>> I assume for this kind of emulation it is doing 4k objects because
>> it has no idea what size of mapping the client will use?
>
>MAP is fine: before notify() to VFIO or anything else, qemu scans the
>pgtable and handles it in page size or huge page size, so it can be >4K but
>is always guest iommu pgsize aligned.
>
>I think we rely on the guest behaving right, so it should also always
>operate on that minimum size when mapped huge.  It shouldn't violate the
>"per-object" protocol of iommufd.
>
>IIUC the same applies to vfio type1v2 in that respect.
>
>It's more about UNMAP batching, but I assume iommufd is fine if it's fine
>with holes inside the range for that case.
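[As an aside, the "object based, not page based" rule above can be sketched
roughly like this -- a hypothetical model, not the iommufd code or UAPI:
each MAP creates one object, UNMAP over a range may cover whole objects
(holes are fine) but may never split one.]

```python
# Hypothetical model of an object-based IOAS: one object per prior MAP,
# UNMAP removes only whole objects inside the range, never partial ones.
class ObjectIoas:
    def __init__(self):
        self.objs = {}  # iova -> size, one entry per prior MAP

    def map(self, iova, size):
        self.objs[iova] = size

    def unmap(self, iova, size):
        """Unmap whole objects inside [iova, iova+size); reject splits."""
        end = iova + size
        removed = 0
        for o_iova, o_size in list(self.objs.items()):
            o_end = o_iova + o_size
            if o_end <= iova or o_iova >= end:
                continue  # fully outside the range: untouched (a "hole")
            if o_iova < iova or o_end > end:
                raise ValueError("cannot unmap part of a prior mapping")
            del self.objs[o_iova]
            removed += o_size
        return removed  # 0 models the "nothing mapped there" case
```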
>The -ENOENT "not exist" case seems to be just the same as before, as long
>as QEMU treats it as 0 like before.
>
>Though that does look slightly special, because the whole empty UNMAP
>region can be seen as a hole too; I'm not sure when that -ENOENT will be
>useful if qemu should always bypass it anyway.  Indeed not a problem for
>qemu.
>
>>
>> > We can definitely look up every single object and explicitly unmap it,
>> > but that loses part of the point of the batching that the guest OS does.
>>
>> You don't need every single object, but it would be faster to check
>> where things are mapped and then call the kernel correctly instead of
>> trying to iterate with the unmapped results.
>
>Maybe yes.  If so, it'll be great if Zhenzhong could attach some proof of
>that, and meanwhile drop the "iommufd UNMAP warnings" section in the
>commit message.
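To make sure I read the suggestion right: instead of forwarding the guest's
batched invalidation range as-is and iterating on the unmapped results, we
would look up what is actually mapped inside it and only issue kernel
unmaps for those exact objects.  A rough sketch (names are illustrative,
not the QEMU or iommufd API):

```python
# Illustrative only: given QEMU's record of prior MAP objects, compute
# the exact (iova, size) unmap calls for one batched guest invalidation,
# skipping holes instead of probing the kernel for them.
def unmaps_for_range(mapped, inv_start, inv_size):
    """mapped: dict iova -> size of prior MAP objects.
    Returns the list of (iova, size) pairs to hand to the kernel."""
    inv_end = inv_start + inv_size
    calls = []
    for iova in sorted(mapped):
        size = mapped[iova]
        if iova >= inv_start and iova + size <= inv_end:
            calls.append((iova, size))  # whole object inside the batch
    return calls
```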
Seems vtd_page_walk_one() already works in the above way, checking for
mapping changes and calling the kernel only for changed entries?

Thanks
Zhenzhong
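[For readers following along, the shadow-walk behaviour being described --
notify only for entries whose state actually changed -- is roughly this.
A minimal sketch with made-up names, not the real intel_iommu code:]

```python
# Sketch of a diffing page walk: compare the guest page table against
# what QEMU already shadowed, and emit MAP/UNMAP notifies only for
# entries that changed, so no UNMAP is ever sent for a hole.
def walk_and_notify(guest_pt, shadow, notify):
    """guest_pt/shadow: dict iova -> phys.  notify(op, iova) per change."""
    for iova, phys in guest_pt.items():
        if shadow.get(iova) != phys:
            notify("MAP", iova)      # new or remapped entry
            shadow[iova] = phys
    for iova in list(shadow):
        if iova not in guest_pt:
            notify("UNMAP", iova)    # only entries that really existed
            del shadow[iova]
```

With this shape, a batched guest invalidation covering unmapped ranges
simply produces no kernel calls for those ranges.
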