Hi All,

Kindly ping, comments are welcome 😊

There are still some patches lacking an R-b: #2, #9-#11, #13-#14 and #19-#21.
Hi Eric, Yi,

I'd like to know whether your previous R-b on #9-#11 and #13-#14 still stands. I dropped them due to the code movement into intel_iommu_accel.c.

Thanks
Zhenzhong

>-----Original Message-----
>From: Duan, Zhenzhong <[email protected]>
>Subject: [PATCH v8 00/23] intel_iommu: Enable first stage translation for passthrough device
>
>Hi,
>
>For a passthrough device with intel_iommu.x-flts=on, we don't shadow the
>guest page table but pass the first stage page table to the host side to
>construct a nested HWPT. There was some earlier effort to enable this
>feature; see [1] for details.
>
>The key design is to utilize the dual-stage IOMMU translation (also known
>as IOMMU nested translation) capability of the host IOMMU. As the diagram
>below shows, the guest I/O page table pointer in GPA (guest physical
>address) is passed to the host and used to perform the first stage address
>translation. Along with it, modifications to present mappings in the guest
>I/O page table should be followed by an IOTLB invalidation.
>
>        .-------------.  .---------------------------.
>        |   vIOMMU    |  | Guest I/O page table      |
>        |             |  '---------------------------'
>        .----------------/
>        | PASID Entry |--- PASID cache flush --+
>        '-------------'                        |
>        |             |                        V
>        |             |           I/O page table pointer in GPA
>        '-------------'
>    Guest
>    ------| Shadow |---------------------------|--------
>          v        v                           v
>    Host
>        .-------------.  .-----------------------------.
>        |   pIOMMU    |  | First stage for GIOVA->GPA  |
>        |             |  '-----------------------------'
>        .----------------/  |
>        | PASID Entry |     V (Nested xlate)
>        '----------------\.--------------------------------------------.
>        |             |   | Second stage for GPA->HPA, unmanaged domain|
>        |             |   '--------------------------------------------'
>        '-------------'
>    <Intel VT-d Nested translation>
>
>This series reuses the VFIO device's default HWPT as the nesting parent
>instead of creating a new one. This avoids duplicating code for a new
>memory listener, and all existing features of the VFIO listener can be
>shared, e.g., RAM discard, dirty tracking, etc. There are two limitations:
>1) a VFIO device under a PCI bridge together with an emulated device is
>not supported, because the emulated device wants the IOMMU AS while the
>VFIO device sticks to the system AS; 2) kexec or reboot from
>"intel_iommu=on,sm_on" to "intel_iommu=on,sm_off" is not supported on
>platforms with ERRATA_772415_SPR17, because the VFIO device's default HWPT
>is created with the NEST_PARENT flag and the kernel inhibits RO mappings
>when switching to shadow mode.
>
>This series is also prerequisite work for vSVA, i.e., sharing a guest
>application's address space with passthrough devices.
>
>There are some interactions between VFIO and vIOMMU:
>* vIOMMU registers PCIIOMMUOps [set|unset]_iommu_device with the PCI
>  subsystem. VFIO calls them to register/unregister the HostIOMMUDevice
>  instance with the vIOMMU at the VFIO device realize stage.
>* vIOMMU registers PCIIOMMUOps get_viommu_flags with the PCI subsystem.
>  VFIO calls it to get the flags exposed by the vIOMMU.
>* vIOMMU calls the HostIOMMUDeviceIOMMUFD interface [at|de]tach_hwpt
>  to bind/unbind the device to IOMMUFD-backed domains, either nested
>  or not.
>
>See the diagram below:
>
>        VFIO Device                                  Intel IOMMU
>    .-----------------.                         .-------------------.
>    |                 |                         |                   |
>    |       .---------|PCIIOMMUOps              |.-------------.    |
>    |       | IOMMUFD |(set/unset_iommu_device) || Host IOMMU  |    |
>    |       | Device  |------------------------>|| Device list |    |
>    |       .---------|(get_viommu_flags)       |.-------------.    |
>    |                 |                         |       |           |
>    |                 |                         |       V           |
>    |       .---------|  HostIOMMUDeviceIOMMUFD |  .-------------.  |
>    |       | IOMMUFD |            (attach_hwpt)|  | Host IOMMU  |  |
>    |       | link    |<------------------------|  |   Device    |  |
>    |       .---------|            (detach_hwpt)|  .-------------.  |
>    |                 |                         |         |         |
>    |                 |                         |        ...        |
>    .-----------------.                         .-------------------.
>
>Below is an example of enabling first stage translation for a passthrough
>device:
>
>    -M q35,...
>    -device intel-iommu,x-scalable-mode=on,x-flts=on...
>    -object iommufd,id=iommufd0 -device vfio-pci,iommufd=iommufd0,...
>
>Tests done:
>- VFIO device hotplug/unplug
>- different VFIO devices linked to different iommufds
>- vhost net device ping test
>- migration with QAT passthrough
>
>PATCH01-08: Some preparatory work
>PATCH09-10: Compatibility check between vIOMMU and host IOMMU
>PATCH11-16: Implement first stage translation for passthrough device
>PATCH17-18: Add migration support and optimization
>PATCH19-21: Workaround for ERRATA_772415_SPR17
>PATCH22: Enable first stage translation for passthrough device
>PATCH23: Add doc
>
>QEMU code can be found at [2]; it's based on
>vfio-next + migration_relax_series[3].
>
>Fault event injection to the guest isn't supported in this series; we
>presume the guest kernel always constructs a correct first stage page table
>for the passthrough device. For emulated devices, the emulation code already
>provides first stage fault injection.
>
>TODO:
>- Fault event injection to guest when HW first stage page table faults
>
>[1] https://patchwork.kernel.org/project/kvm/cover/20210302203827.437645-1
>[email protected]/
>[2] https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting.v8
>[3] https://lore.kernel.org/qemu-devel/20251106042027.856594-1-zhenzhong.d
>[email protected]/
>
>Thanks
>Zhenzhong
>
>Changelog:
>v8:
>- add hw/i386/intel_iommu_accel.[hc] to hold accel code (Eric)
>- return bool for all vtd accel related functions (Cedric, Eric)
>- introduce a new PCIIOMMUOps::get_host_iommu_quirks() (Eric, Nicolin)
>- minor polish to comments and code (Cedric, Eric)
>- drop some R-b as they have changes needing review again
>
>v7:
>- s/host_iommu_extract_vendor_caps/host_iommu_extract_quirks (Nicolin)
>- s/RID_PASID/PASID_0 (Eric)
>- drop rid2pasid check in vtd_do_iommu_translate (Eric)
>- refine DID check in vtd_pasid_cache_sync_locked (Liuyi)
>- refine commit log (Nicolin, Eric, Liuyi)
>- fix doc build (Cedric)
>- add migration support
>
>v6:
>- delete RPS capability related supporting code (Eric, Yi)
>- use terminology 'first/second stage' to replace 'first/second level'
>  (Eric, Yi)
>- use get_viommu_flags() instead of get_viommu_caps() (Nicolin)
>- drop non-RID_PASID related code and simplify pasid invalidation handling
>  (Eric, Yi)
>- drop the patch that handles pasid replay on context invalidation (Eric)
>- move vendor specific cap check from VFIO core to backend/iommufd.c
>  (Nicolin)
>
>v5:
>- refine commit log of patch2 (Cedric, Nicolin)
>- introduce helper vfio_pci_from_vfio_device() (Cedric)
>- introduce helper vfio_device_viommu_get_nested() (Cedric)
>- pass 'bool bypass_ro' argument to vfio_listener_valid_section() instead
>  of 'VFIOContainerBase *' (Cedric)
>- fix a potential build error reported by Jim Shu
>
>v4:
>- s/VIOMMU_CAP_STAGE1/VIOMMU_CAP_HW_NESTED (Eric, Nicolin, Donald, Shameer)
>- clarify that get_viommu_cap() returns pure emulated caps and explain the
>  reason in the commit log (Eric)
>- retrieve the ce only if vtd_as->pasid in vtd_as_to_iommu_pasid_locked (Eric)
>- refine doc comment and commit log in patch10-11 (Eric)
>
>v3:
>- define enum type for VIOMMU_CAP_* (Eric)
>- drop inline flag in the patch which uses the helper (Eric)
>- use extract64 in newly introduced MACRO (Eric)
>- polish comments and fix typos (Eric)
>- split workaround patch for ERRATA_772415_SPR17 into two patches (Eric)
>- optimize bind/unbind error path processing
>
>v2:
>- introduce get_viommu_cap() to get STAGE1 flag to create nesting parent
>  HWPT (Liuyi)
>- reuse VFIO's default HWPT as parent HWPT of nested translation (Nicolin,
>  Liuyi)
>- abandon support of VFIO device under pcie-to-pci bridge to simplify design
>  (Liuyi)
>- bypass RO mapping in VFIO's default HWPT if ERRATA_772415_SPR17 (Liuyi)
>- drop vtd_dev_to_context_entry optimization (Liuyi)
>
>v1:
>- simplify vendor specific checking in vtd_check_hiod (Cedric, Nicolin)
>- rebase to master
>
>
>Yi Liu (3):
>  intel_iommu_accel: Propagate PASID-based iotlb invalidation to host
>  intel_iommu: Replay all pasid bindings when either SRTP or TE bit is
>    changed
>  intel_iommu: Replay pasid bindings after context cache invalidation
>
>Zhenzhong Duan (20):
>  intel_iommu: Rename vtd_ce_get_rid2pasid_entry to
>    vtd_ce_get_pasid_entry
>  intel_iommu: Delete RPS capability related supporting code
>  intel_iommu: Update terminology to match VTD spec
>  hw/pci: Export pci_device_get_iommu_bus_devfn() and return bool
>  hw/pci: Introduce pci_device_get_viommu_flags()
>  intel_iommu: Implement get_viommu_flags() callback
>  intel_iommu: Introduce a new structure VTDHostIOMMUDevice
>  vfio/iommufd: Force creating nesting parent HWPT
>  intel_iommu_accel: Check for compatibility with IOMMUFD backed device
>    when x-flts=on
>  intel_iommu_accel: Fail passthrough device under PCI bridge if
>    x-flts=on
>  intel_iommu_accel: Stick to system MR for IOMMUFD backed host device
>    when x-flts=on
>  intel_iommu: Add some macros and inline functions
>  intel_iommu_accel: Bind/unbind guest page table to host
>  vfio/listener: Bypass readonly region for dirty tracking
>  intel_iommu: Add migration support with x-flts=on
>  hw/pci: Introduce pci_device_get_host_iommu_quirks()
>  intel_iommu_accel: Implement get_host_iommu_quirks() callback
>  Workaround for ERRATA_772415_SPR17
>  intel_iommu: Enable host device when x-flts=on in scalable mode
>  docs/devel: Add IOMMUFD nesting documentation
>
> MAINTAINERS                      |   2 +
> docs/devel/vfio-iommufd.rst      |  25 ++
> hw/i386/intel_iommu_accel.h      |  55 ++++
> hw/i386/intel_iommu_internal.h   | 155 ++++++---
> include/hw/i386/intel_iommu.h    |   5 +-
> include/hw/iommu.h               |  30 ++
> include/hw/pci/pci.h             |  55 ++++
> include/hw/vfio/vfio-container.h |   1 +
> include/hw/vfio/vfio-device.h    |   5 +
> hw/i386/intel_iommu.c            | 530 ++++++++++++++++++-------------
> hw/i386/intel_iommu_accel.c      | 272 ++++++++++++++++
> hw/pci/pci.c                     |  35 +-
> hw/vfio/device.c                 |  26 ++
> hw/vfio/iommufd.c                |  18 +-
> hw/vfio/listener.c               |  48 ++-
> tests/qtest/intel-iommu-test.c   |   4 +-
> hw/i386/Kconfig                  |   5 +
> hw/i386/meson.build              |   1 +
> hw/i386/trace-events             |   4 +
> hw/vfio/trace-events             |   1 +
> 20 files changed, 979 insertions(+), 298 deletions(-)
> create mode 100644 hw/i386/intel_iommu_accel.h
> create mode 100644 include/hw/iommu.h
> create mode 100644 hw/i386/intel_iommu_accel.c
>
>-- 
>2.47.1
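For readers less familiar with nested translation, the two-stage lookup in the quoted VT-d diagram can be modeled with a toy C sketch: the guest-owned first stage maps GIOVA to GPA and the host-owned second stage (the nesting parent) maps GPA to HPA. This is illustrative only, not QEMU, kernel or VT-d code. `ToyPageTable`, `toy_stage_xlate()` and `toy_nested_xlate()` are hypothetical names, each stage is assumed to be a single-level table of 4 KiB pages, and the sketch ignores permissions, IOTLB caching, and the fact that real hardware also walks the first stage table itself through the second stage, because that table lives in GPA.

```c
/*
 * Toy model of nested (two-stage) IOMMU translation. All names are
 * hypothetical; single-level tables of 4 KiB pages, no permissions,
 * no IOTLB. Not QEMU, kernel or VT-d code.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_NR_PAGES   16

/* entry[in_pfn] holds out_pfn; 0 means "not present" in this toy. */
typedef struct ToyPageTable {
    uint64_t entry[TOY_NR_PAGES];
} ToyPageTable;

/* Translate through one stage; returns false on a (toy) translation fault. */
static bool toy_stage_xlate(const ToyPageTable *pt, uint64_t in, uint64_t *out)
{
    uint64_t pfn = in >> TOY_PAGE_SHIFT;

    assert(pt && out);
    if (pfn >= TOY_NR_PAGES || pt->entry[pfn] == 0) {
        return false;
    }
    /* Keep the page offset, replace the page frame number. */
    *out = (pt->entry[pfn] << TOY_PAGE_SHIFT) | (in & TOY_PAGE_MASK);
    return true;
}

/*
 * First stage (guest I/O page table): GIOVA -> GPA.
 * Second stage (host nesting parent HWPT): GPA -> HPA.
 */
bool toy_nested_xlate(const ToyPageTable *first_stage,
                      const ToyPageTable *second_stage,
                      uint64_t giova, uint64_t *hpa)
{
    uint64_t gpa;

    if (!toy_stage_xlate(first_stage, giova, &gpa)) {
        return false;           /* fault in the guest-owned first stage */
    }
    return toy_stage_xlate(second_stage, gpa, hpa);  /* may fault in stage 2 */
}
```

With first stage entry 1 set to 3 and second stage entry 3 set to 7, `toy_nested_xlate()` maps GIOVA 0x1234 to HPA 0x7234; a missing entry in either stage makes the lookup fail, which is where real hardware would raise a fault.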
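The VFIO/vIOMMU interactions listed in the quoted cover letter can likewise be sketched as a callback table that the vIOMMU registers with the bus and that the VFIO side calls at realize time. Everything here is hypothetical (`ToyIOMMUOps`, `ToyBus`, the `TOY_*` flags, `toy_vfio_realize()`). It only mirrors the shape of the PCIIOMMUOps [set|unset]_iommu_device and get_viommu_flags callbacks and is not QEMU's API; the call order is also simplified. The quirk check models the ERRATA_772415_SPR17 workaround (bypass RO mappings in a nesting parent HWPT) by reading the quirk straight from the host device, whereas the series routes it through the new get_host_iommu_quirks() callback, which this sketch does not model.

```c
/*
 * Hypothetical model of the VFIO <-> vIOMMU handshake; all Toy and TOY_
 * names are invented for illustration and are not QEMU's PCIIOMMUOps API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_DEVFN 8
/* Hypothetical vIOMMU flag: create the default HWPT as nesting parent. */
#define TOY_VIOMMU_FLAG_WANT_NESTING_PARENT (UINT64_C(1) << 0)
/* Hypothetical host quirk: skip read-only mappings in a nesting parent. */
#define TOY_HOST_QUIRK_NEST_PARENT_BYPASS_RO (UINT64_C(1) << 0)

typedef struct ToyHostIOMMUDevice {
    int devfn;
    uint64_t quirks;            /* as reported by the host IOMMU backend */
} ToyHostIOMMUDevice;

/* Callback table a vIOMMU registers with the (toy) bus. */
typedef struct ToyIOMMUOps {
    bool (*set_iommu_device)(void *opaque, ToyHostIOMMUDevice *hiod);
    void (*unset_iommu_device)(void *opaque, int devfn);
    uint64_t (*get_viommu_flags)(void *opaque);
} ToyIOMMUOps;

typedef struct ToyBus {
    const ToyIOMMUOps *ops;
    void *opaque;               /* vIOMMU state handed back to callbacks */
} ToyBus;

typedef struct ToyVIOMMU {
    bool flts;                  /* models intel-iommu x-flts=on */
    ToyHostIOMMUDevice *devs[TOY_MAX_DEVFN];
} ToyVIOMMU;

static bool toy_set_iommu_device(void *opaque, ToyHostIOMMUDevice *hiod)
{
    ToyVIOMMU *s = opaque;

    if (hiod->devfn < 0 || hiod->devfn >= TOY_MAX_DEVFN ||
        s->devs[hiod->devfn]) {
        return false;           /* out of range or already registered */
    }
    s->devs[hiod->devfn] = hiod;
    return true;
}

static void toy_unset_iommu_device(void *opaque, int devfn)
{
    ToyVIOMMU *s = opaque;

    if (devfn >= 0 && devfn < TOY_MAX_DEVFN) {
        s->devs[devfn] = NULL;
    }
}

static uint64_t toy_get_viommu_flags(void *opaque)
{
    ToyVIOMMU *s = opaque;

    return s->flts ? TOY_VIOMMU_FLAG_WANT_NESTING_PARENT : 0;
}

const ToyIOMMUOps toy_viommu_ops = {
    .set_iommu_device = toy_set_iommu_device,
    .unset_iommu_device = toy_unset_iommu_device,
    .get_viommu_flags = toy_get_viommu_flags,
};

typedef struct ToyVFIODevice {
    bool nest_parent;           /* default HWPT created as nesting parent */
    bool bypass_ro;             /* listener would skip read-only sections */
} ToyVFIODevice;

/* VFIO side at realize: query vIOMMU flags, pick HWPT policy, register. */
bool toy_vfio_realize(ToyBus *bus, ToyHostIOMMUDevice *hiod,
                      ToyVFIODevice *vdev)
{
    uint64_t flags;

    assert(bus && bus->ops && hiod && vdev);
    flags = bus->ops->get_viommu_flags(bus->opaque);
    vdev->nest_parent = (flags & TOY_VIOMMU_FLAG_WANT_NESTING_PARENT) != 0;
    vdev->bypass_ro = vdev->nest_parent &&
                      (hiod->quirks & TOY_HOST_QUIRK_NEST_PARENT_BYPASS_RO);
    return bus->ops->set_iommu_device(bus->opaque, hiod);
}
```

In this toy, the nesting-parent and bypass-RO decisions are made once at realize time from what the vIOMMU and host device report, so the registration path is the same whether or not x-flts is on; only the resulting HWPT policy differs.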
