On 6/7/24 5:10 PM, Joao Martins wrote:
On 06/06/2024 16:43, Cédric Le Goater wrote:
Hello Joao,

On 6/22/23 23:48, Joao Martins wrote:
Hey,

This series introduces support for vIOMMU with VFIO device migration,
particularly related to how we do the dirty page tracking.

Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
provide DMA translation services for guests, allowing some form of
guest-kernel-managed DMA, e.g. for nested-virt-based usage; (1) is
especially required for big VMs with VFs and more than 255 vCPUs. We
tackle both and remove the migration blocker when a vIOMMU is present,
provided the conditions are met. I have both use cases here in one series,
but I am happy to tackle them in separate series.

As I found out, we don't necessarily need to expose the whole vIOMMU
functionality in order to just support interrupt remapping. x86 IOMMUs
on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
can instantiate an IOMMU just for interrupt remapping without needing
to advertise/support DMA translation. The AMD IOMMU can in theory provide
the same, but Linux doesn't quite support the IR-only part there yet;
only intel-iommu does.
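For reference, an IR-only intel-iommu can be instantiated along these lines (illustrative command line, trimmed to the relevant options; intremap=on requires a split irqchip on q35, and dma-translation=off needs qemu 7.1+):

```shell
# Illustrative config fragment: intel-iommu doing interrupt remapping
# only, with DMA translation disabled towards the guest.
qemu-system-x86_64 \
  -machine q35,kernel-irqchip=split \
  -device intel-iommu,intremap=on,dma-translation=off \
  ...
```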

The series is organized as follows:

Patches 1-5: Today we can't gather vIOMMU details before the guest
establishes its first DMA mapping via the vIOMMU. So these first
patches add a way for vIOMMUs to be asked about their properties at
start of day. I chose the least-churn way for now (as opposed to a
treewide conversion) and allow easy conversion a posteriori. As
suggested by Peter Xu[7], I have resurrected Yi's patches[5][6], which
allow us to fetch PCI backing vIOMMU attributes without necessarily
tying the caller (VFIO or anyone else) to an IOMMU MR like I
was doing in v3.

Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
DMA translation allowed. Today the 'dma-translation' attribute is
x86-iommu only, but the way this series is structured nothing stops
other vIOMMUs from supporting it too, as long as they use
pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
are handled. The blocker is thus relaxed when vIOMMUs are able to
toggle/report the DMA_TRANSLATION attribute. With the patches up to this
point, we've then tackled item (1) of the second paragraph.

Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
IOVA address space, leveraging the logic we use to compose the dirty ranges.
The blocker is once again relaxed for vIOMMUs that advertise their IOVA
addressing limits. This tackles item (2). So far I mainly use it with
intel-iommu, although I have a small set of patches for virtio-iommu per
Alex's suggestion in v2.

Comments, suggestions welcome. Thanks for the review!


I spent some time refreshing your series on upstream QEMU (see [1]) and
gave migration a try with a CX-7 VF. LGTM. It doesn't seem we are far
from acceptance in QEMU 9.1. Are we?

Yeah.

There was a comment from Zhenzhong on vfio_viommu_preset() here[0]. I
was looking at that to remind myself what it was that we had to change,
but even re-reading the thread I can't spot any flaw that needs a change.

[0]
https://lore.kernel.org/qemu-devel/de2b72d2-f56b-9350-ce0f-70edfb58e...@intel.com/#r

I introduced a vfio_devices_all_viommu_preset() routine to check all devices
in a container and a simplified version of vfio_viommu_get_max_iova()
returning the space max_iova.


First, I will resend these with the changes I made:

   vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
   vfio/common: Move dirty tracking ranges update to helper()

I guess the PCIIOMMUOps::get_iommu_attr needs a close review. Is
IOMMU_ATTR_DMA_TRANSLATION a must-have?

It's sort of the 'correct way' of relaxing the vIOMMU checks, because you
are 100% guaranteed that the guest won't do DMA. The other outstanding
thing related to that, for older kernels, is to use the direct map for
dirty page tracking; the moment a mapping is attempted, migration doesn't
start, or if it's in progress it gets aborted[*]:

https://lore.kernel.org/qemu-devel/20230908120521.50903-1-joao.m.mart...@oracle.com/

The above link and DMA_TRANSLATION are mostly for the use case that cares
about vIOMMU only for interrupt remapping and not DMA translation
services. But we can't just disable dma-translation in qemu because it
may crash older kernels, so this way it supports both old and new.

[*] Recently I noticed you improved error reporting, so
vfio_set_migration_error(-EOPNOTSUPP) probably has a better way of getting 
there.

Yes. So, I did a little more change to improve vfio_dirty_tracking_init().

The rest is mostly VFIO internals for dirty tracking.

Right.

I derailed with other work and also stuff required for iommu dirty tracking that
I forgot about these patches, sorry.

That's fine.

I am trying to sort out which patches could be merged in advance for QEMU
9.1, and your series has shrunk a lot since it was last sent. I might
resend the whole thing to cherry-pick the simple changes and get some
R-b tags.


For the record, here is my watch list:

QEMU series under review:

* [v7] Add a host IOMMU device abstraction to check with vIOMMU
  https://lore.kernel.org/all/20240605083043.317831-1-zhenzhong.d...@intel.com

  Needs feedback on the PCI IOMMU ops. vIOMMU "iommufd" property?
  Pushed on vfio-9.1 branch.

* [RFC v2] VIRTIO-IOMMU/VFIO: Fix host iommu geometry
  https://lore.kernel.org/all/20240607143905.765133-1-eric.au...@redhat.com
  Pushed on vfio-9.1 branch.

Need a resend:

* [v4] vfio: VFIO migration support with vIOMMU
  https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/
  Refreshed the patchset on upstream and pushed on vfio-9.1 branch.

* [RFCv2] vfio/iommufd: IOMMUFD Dirty Tracking
  https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.mart...@oracle.com/

* [v1] vfio: container: Fix missing allocation of VFIOSpaprContainer
  https://lore.kernel.org/qemu-devel/171528203026.8420.10620440513237875837.st...@ltcd48-lp2.aus.stglabs.ibm.com/

Other interesting series (IOMMU related):

* [rfcv2] intel_iommu: Enable stage-1 translation for emulated device
  https://lore.kernel.org/all/20240522062313.453317-1-zhenzhong.d...@intel.com/

* [PATCH ats_vtd v2 00/25] ATS support for VT-d
  https://lore.kernel.org/all/20240515071057.33990-1-clement.mathieu--d...@eviden.com/

* [RFC v3] SMMUv3 nested translation support
  https://lore.kernel.org/qemu-devel/20240429032403.74910-1-smost...@google.com/

* [PATCH RFCv1 00/14] Add Tegra241 (Grace) CMDQV Support (part 2/2)
  https://lore.kernel.org/all/cover.1712978212.git.nicol...@nvidia.com/

  Yet to be sent,
  https://github.com/nicolinc/qemu/commits/wip/iommufd_vcmdq/

* [RFC] Multifd device state transfer support with VFIO consumer
  https://lore.kernel.org/all/cover.1713269378.git.maciej.szmigi...@oracle.com/


Thanks,

C.


