On Thu, Dec 01, 2016 at 02:44:14PM +0800, Lan Tianyu wrote:
> I think there are still other gaps to enable a passthrough device with
> vIOMMU's DMA translation support.
> 1. Since this patchset shadows the guest IO page table to the
> pIOMMU (physical IOMMU) via vfio_dma_map/unmap(), there will be fault
> events from the pIOMMU if the guest OS misconfigures the table. We
> should report these fault events to the guest. This means we need to
> pass the fault event from the pIOMMU driver to the vIOMMU in QEMU. I
> suppose a channel should be added in VFIO to connect the pIOMMU and vIOMMU.
Thanks for raising this - IMHO it's a good question.
> The task should be divided into three parts:
> 1) The pIOMMU driver reports fault events to the vIOMMU via a new VFIO
> interface
Here you mean "how the host kernel captures the DMAR fault", right?
IMHO we can have something like a notifier/notifiee mechanism in DMAR
fault reporting as well - consumers (like the vfio driver in the
kernel) can register for fault reports related to specific devices.
When DMAR receives faults for those devices, it triggers the
notification list, and vfio can then be notified.
> 2) Add new channel in VFIO subsystem to connect pIOMMU driver and
> vIOMMU in Qemu
> 3) vIOMMU in Qemu get fault event from VFIO subsystem in Qemu and inject
> virtual fault event to guest.
> Such a VFIO channel is also required by the device's PRS (Page Request
> Services) support. This is also part of SVM (Shared Virtual Memory)
> support in a VM. Here is the SVM design doc link.
> 2. How to restore the GPA->HPA mapping when IOVA is disabled by the guest.
> When the guest enables IOVA for a device, the vIOMMU will invalidate all
> previous GPA->HPA mappings and update the IOVA->HPA mappings to the
> pIOMMU via the iommu notifier. But if IOVA is disabled, I think we should
> restore the GPA->HPA mapping for the device, otherwise the device won't
> work again in the VM.
If we can have a workable replay mechanism, this problem will be
solved IMHO. A more general issue is moving a device into an existing
domain (no matter whether that is the system address space or another
existing IOMMU domain) - we need to unmap the pages of the old domain
and remap the pages of the new one. But this is not the best solution
IMHO: its issue is that we may need to maintain several containers
that have exactly the same shadow page tables. The best solution is to
have a domain layer, with one address space (container) per domain:
- when a new domain is added: we create a new container for this new
  domain, and re-build the shadow page table in that domain if there
  are existing mappings
- when a device moves between domains (either into/from the system
  address space, or into another IOMMU domain): we can just try to put
  that device (aka the corresponding group) into the existing container
  that the new domain belongs to