On 5/30/25 7:41 AM, Michael S. Tsirkin wrote:
On Fri, May 02, 2025 at 02:15:45AM +0000, Alejandro Jimenez wrote:
This series adds support for guests using the AMD vIOMMU to enable DMA
remapping for VFIO devices. In addition to the currently supported
passthrough (PT) mode, guest kernels are now able to provide DMA
address translation and access permission checking to VFs attached to
paging domains, using the AMD v1 I/O page table format.
Please see v1[0] cover letter for additional details such as example
QEMU command line parameters used in testing.
are you working on v3?
Yes, there are suggestions from Sairaj that I will address in v3. I am
also planning to include two small patches from Joao Martins that add
support for the HATDis feature (this is something that Sairaj suggested
earlier). The Linux changes are being reviewed here:
https://lore.kernel.org/all/cover.1746613368.git.ankit.s...@amd.com/
I will be offline from 6/2 to 6/6, so I didn't want to send a new
revision and disappear. In general, the changes from v2->v3 are minor
and well contained, so any reviews I receive for v2 will remain valid.
That being said, I can send v3 today if you'd prefer that. Please let me
know.
there was a bug you wanted to fix.
I assume the bug is Sairaj's report of a dmesg warning with NVMe
passthrough on a 4.15 kernel, but unfortunately I have not been able to
reproduce that problem. We agreed that given the age of the kernel (and
reports of the same warning on NVMe devices in unrelated scenarios),
this is likely a guest driver issue, and should not be a blocker.
More details:
I have tested an Ubuntu image with a 4.15 kernel, but I cannot hit any
issues when I pass through a CX-6 VF (I don't have access to an NVMe VF).
The kernel is old enough that I have to force bind the mlx5_core driver
to the VF on the guest, but once I do, the VF comes up with no errors and
I can see DMA map/unmap activity in the traces.
Sairaj: Are you passing a full NVMe device to the guest (i.e. a PF)? I
ask because the BDF in '-device vfio-pci,host=0000:44:00.0' doesn't look
like a typical VF...
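
The PF-versus-VF question can also be checked on the host through sysfs: Linux exposes a `physfn` symlink in the PCI device directory of an SR-IOV VF, and a PF has no such link. Below is a minimal sketch in C, assuming a Linux host with sysfs mounted at /sys. The helper names `pci_physfn_path()` and `pci_is_vf()` are hypothetical and are not part of QEMU or the kernel.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Build the sysfs path of the "physfn" link for a PCI device given as a
 * full BDF string such as "0000:44:00.0". Returns 0 on success, -1 if the
 * buffer is too small. (Hypothetical helper, for illustration only.)
 */
static int pci_physfn_path(const char *bdf, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/physfn", bdf);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Returns 1 if the device looks like an SR-IOV VF (physfn link present),
 * 0 if it does not (PF, non-SR-IOV device, or device not found), and
 * -1 if the sysfs path could not be built.
 */
static int pci_is_vf(const char *bdf)
{
    char path[256];

    if (pci_physfn_path(bdf, path, sizeof(path)) < 0) {
        return -1;
    }
    return access(path, F_OK) == 0 ? 1 : 0;
}
```

Calling `pci_is_vf("0000:44:00.0")` on the host would return 1 only if that function is a VF; a result of 0 does not by itself distinguish a PF from a missing device.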
Thank you,
Alejandro
Changes since v1[0]:
- Added documentation entry for '-device amd-iommu'
- Code movement with no functional changes to avoid use of forward
  declarations in later patches [Sairaj, mst]
- Moved addr_translation and dma-remap property to separate commits.
  The dma-remap feature is only available for users to enable after
  all required functionality is implemented [Sairaj]
- Explicit initialization of significant fields like addr_translation
  and notifier_flags [Sairaj]
- Fixed bug in decoding of invalidation size [Sairaj]
- Changed fetch_pte() to use an out parameter for pte, and be able to
  check for error conditions via negative return value [Clement]
- Removed UNMAP-only notifier optimization, leaving vhost support for
  a later series [Sairaj]
- Fixed ordering between address space unmap and memory region activation
  on devtab invalidation [Sairaj]
- Fixed commit message with "V=1, TV=0" [Sairaj]
- Dropped patch removing the page_fault event. That area is better
  addressed in a separate series.
- Independent testing by Sairaj (thank you!)
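
The fetch_pte() item in the list above describes a common C calling convention: the looked-up value is returned through an out parameter, and the return value carries only success or a negative error code. The sketch below illustrates that convention only. It is not the QEMU fetch_pte(); the function name, the flat in-memory table, the chosen errno values and the `SKETCH_PTE_PRESENT` bit are all assumptions made for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTE_PRESENT (1ULL << 0)   /* assumed "present" bit */

/*
 * Hypothetical lookup over a flat, in-memory table.
 *
 * Returns 0 and stores the entry in *pte on success, or a negative errno
 * value on failure, in which case *pte is left untouched.
 */
static int sketch_fetch_pte(const uint64_t *table, size_t nr_entries,
                            size_t index, uint64_t *pte)
{
    if (!table || !pte) {
        return -EINVAL;
    }
    if (index >= nr_entries) {
        return -EFAULT;      /* the entry could not be read */
    }
    *pte = table[index];     /* may legitimately be 0 (not present) */
    return 0;
}
```

With this shape, a caller can tell a failed read (negative return) apart from a successfully read entry that is simply not present (return 0 with `*pte == 0`).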
Thank you,
Alejandro
[0] https://lore.kernel.org/all/20250414020253.443831-1-alejandro.j.jime...@oracle.com/
Alejandro Jimenez (20):
  memory: Adjust event ranges to fit within notifier boundaries
  amd_iommu: Document '-device amd-iommu' common options
  amd_iommu: Reorder device and page table helpers
  amd_iommu: Helper to decode size of page invalidation command
  amd_iommu: Add helper function to extract the DTE
  amd_iommu: Return an error when unable to read PTE from guest memory
  amd_iommu: Add helpers to walk AMD v1 Page Table format
  amd_iommu: Add a page walker to sync shadow page tables on
    invalidation
  amd_iommu: Add basic structure to support IOMMU notifier updates
  amd_iommu: Sync shadow page tables on page invalidation
  amd_iommu: Use iova_tree records to determine large page size on UNMAP
  amd_iommu: Unmap all address spaces under the AMD IOMMU on reset
  amd_iommu: Add replay callback
  amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL
  amd_iommu: Toggle memory regions based on address translation mode
  amd_iommu: Set all address spaces to default translation mode on reset
  amd_iommu: Add dma-remap property to AMD vIOMMU device
  amd_iommu: Toggle address translation mode on devtab entry
    invalidation
  amd_iommu: Do not assume passthrough translation when DTE[TV]=0
  amd_iommu: Refactor amdvi_page_walk() to use common code for page walk
 hw/i386/amd_iommu.c | 1005 ++++++++++++++++++++++++++++++++++++-------
 hw/i386/amd_iommu.h |   52 +++
 qemu-options.hx     |   23 +
 system/memory.c     |   10 +-
 4 files changed, 934 insertions(+), 156 deletions(-)
base-commit: 5134cf9b5d3aee4475fe7e1c1c11b093731073cf
--
2.43.5