> From: Peter Xu [mailto:pet...@redhat.com]
> Sent: Friday, December 02, 2016 1:59 PM
> On Thu, Dec 01, 2016 at 04:21:38AM +0000, Tian, Kevin wrote:
> > > From: Peter Xu
> > > Sent: Wednesday, November 30, 2016 5:24 PM
> > >
> > > On Mon, Nov 28, 2016 at 05:51:50PM +0200, Aviv B.D wrote:
> > > > * intel_iommu's replay op is not implemented yet (May come in different
> > > > patch set).
> > > > The replay function is required for hotplug vfio device and to move
> > > > devices between existing domains.
> > >
> > > I have been thinking about this replay thing recently, and I am starting
> > > to doubt whether the whole VT-d vIOMMU framework suits this...
> > >
> > > Generally speaking, current work is throwing away the IOMMU "domain"
> > > layer here. We maintain the mapping only per device, and we don't care
> > > too much about which domain it belongs to. This seems problematic.
> > >
> > > The simplest failing case for this is (let's assume cache-mode is
> > > enabled): if we have two assigned devices A and B, both belong to the
> > > same domain 1. Meanwhile, in domain 1 assume we have one mapping which
> > > is the first page (iova range 0-0xfff). Then, if guest wants to
> > > invalidate the page, it'll notify VT-d vIOMMU with an invalidation
> > > message. If we do this invalidation per-device, we'll need to UNMAP
> > > the region twice - once for A, once for B (if we have more devices, we
> > > will unmap more times), and we can never know we have done duplicated
> > > work since we don't keep domain info, so we don't know they are using
> > > the same address space. The first unmap will work, and then the
> > > remaining DMA unmaps will possibly fail with errors.
> > Tianyu and I discussed this and found a bigger problem: today VFIO assumes
> > only one address space per container, which is fine w/o vIOMMU (all devices
> > in the same container share the same GPA->HPA translation). However, that is
> > not the case when vIOMMU is enabled, because guest Linux implements
> > per-device IOVA space. If a VFIO container includes multiple devices, it
> > means multiple address spaces are required per container...
> IIUC the vfio container is created in:
> Along the way (for vfio_get_group()), we have:
> group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
> Here the address space is per device. Without a vIOMMU, all devices will
> point to the same system address space. With a vIOMMU, however, that
> address space will be per-device, no?
Yes, I hadn't noticed that. Tianyu also pointed it out in his reply. :-)
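
To make the duplicated-unmap case from Peter's example concrete, here is a
minimal sketch in plain C. It is a toy model only, not QEMU or VFIO code:
HostAddressSpace, Device, host_map(), host_unmap(), invalidate_per_device()
and invalidate_per_domain() are all hypothetical names invented for this
illustration. It assumes two devices in domain 1 sharing one host-side
address space with a single mapping at iova 0-0xfff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in QEMU or VFIO. */
#define MAX_MAPPINGS 16

typedef struct {
    uint64_t iova;
    uint64_t size;
    bool used;
} Mapping;

/* One host-side address space, i.e. what a single container would back. */
typedef struct {
    Mapping maps[MAX_MAPPINGS];
} HostAddressSpace;

typedef struct {
    int domain_id;
    HostAddressSpace *as;   /* devices in the same domain share this */
} Device;

/* Record a mapping; returns 0 on success, -1 if the table is full. */
static int host_map(HostAddressSpace *as, uint64_t iova, uint64_t size)
{
    assert(as != NULL);
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        if (!as->maps[i].used) {
            as->maps[i] = (Mapping){ .iova = iova, .size = size, .used = true };
            return 0;
        }
    }
    return -1;
}

/* Remove a mapping; returns -1 if no such mapping exists (already gone). */
static int host_unmap(HostAddressSpace *as, uint64_t iova, uint64_t size)
{
    assert(as != NULL);
    for (size_t i = 0; i < MAX_MAPPINGS; i++) {
        Mapping *m = &as->maps[i];
        if (m->used && m->iova == iova && m->size == size) {
            m->used = false;
            return 0;
        }
    }
    return -1;
}

/* Per-device handling: one unmap per device. Returns the number of failures. */
static int invalidate_per_device(Device *devs, size_t n, int domain_id,
                                 uint64_t iova, uint64_t size)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].domain_id == domain_id &&
            host_unmap(devs[i].as, iova, size) != 0) {
            failures++;
        }
    }
    return failures;
}

/* Domain-aware handling: unmap once per distinct address space in the domain. */
static int invalidate_per_domain(Device *devs, size_t n, int domain_id,
                                 uint64_t iova, uint64_t size)
{
    int failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (devs[i].domain_id != domain_id) {
            continue;
        }
        bool seen = false;
        for (size_t j = 0; j < i; j++) {
            if (devs[j].domain_id == domain_id && devs[j].as == devs[i].as) {
                seen = true;
            }
        }
        if (!seen && host_unmap(devs[i].as, iova, size) != 0) {
            failures++;
        }
    }
    return failures;
}
```

With devices A and B both in domain 1 and one mapping at 0-0xfff, the
per-device walk issues two unmaps and the second one fails, while the
domain-aware walk issues exactly one. How a real implementation would track
domains is a separate design question that this sketch does not answer.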