Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, 7 Apr 2022 12:23:31 -0300 Jason Gunthorpe wrote: > On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: > > > For the specific case of overriding PCIe No Snoop (which is more problematic > > from an Arm SMMU PoV) when assigning to a VM, would that not be easier > > solved by just having vfio-pci clear the "Enable No Snoop" control bit in > > the endpoint's PCIe capability? > > Ideally. > > That was rediscussed recently, apparently there are non-compliant > devices and drivers that just ignore the bit. > > Presumably this is why x86 had to move to an IOMMU enforced feature.. I considered this option when implementing the current solution, but ultimately I didn't have confidence in being able to prevent drivers from using device specific means to effect the change anyway. GPUs especially have various back channels to config space. Thanks, Alex ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Thu, Apr 07, 2022 at 04:17:11PM +0100, Robin Murphy wrote: >> My take is that the drivers using this API are doing it to make sure >> their HW blocks are setup in a way that is consistent with the DMA API >> they are also using, and run in constrained embedded-style >> environments that know the firmware support is present. >> >> So in the end it does not seem suitable right now for linking to >> IOMMU_CACHE.. > > That seems a pretty good summary - I think they're basically all "firmware > told Linux I'm coherent so I'd better act coherent" cases, but that still > doesn't necessarily mean that they're *forced* to respect that. Yes. And the interface is horribly misnamed for that. I'll see what I can do to clean this up as I've noticed various other not very nice things in that area. > One of the > things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force > DMA cache maintenance for coherent devices, primarily to hook up in > Panfrost (where there is a bit of a performance to claw back on the > coherent AmLogic SoCs by leaving certain buffers non-cacheable). This has been an explicit request from the amdgpu folks and thus been on my TODO list for quite a while as well. Note that I don't think it should be a flag to dma_alloc_attrs, but rather for dma_alloc_pages as the drivers that want non-snoop generally also want to actually be able to deal with pages. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On 2022-04-07 14:59, Jason Gunthorpe wrote: On Thu, Apr 07, 2022 at 07:18:48AM +, Tian, Kevin wrote: From: Jason Gunthorpe Sent: Thursday, April 7, 2022 1:17 AM On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: Oh, I didn't know about device_get_dma_attr().. Which is completely broken for any non-OF, non-ACPI plaform. I saw that, but I spent some time searching and could not find an iommu driver that would load independently of OF or ACPI. ie no IOMMU platform drivers are created by board files. Things like Intel/AMD discover only from ACPI, etc. Intel discovers IOMMUs (and optionally ACPI namespace devices) from ACPI, but there is no ACPI description for PCI devices i.e. the current logic of device_get_dma_attr() cannot be used on PCI devices. Oh? So on x86 acpi_get_dma_attr() returns DEV_DMA_NON_COHERENT or DEV_DMA_NOT_SUPPORTED? I think it _should_ return DEV_DMA_COHERENT on x86/IA-64 (unless a _CCA method was actually present to say otherwise), based on acpi_init_coherency(), but I only know for sure what happens on arm64. I think I should give up on this and just redefine the existing iommu cap flag to IOMMU_CAP_CACHE_SUPPORTED or something. TBH I don't see any issue with current name, but I'd certainly be happy to nail down a specific definition for it, along the lines of "this means that IOMMU_CACHE mappings are generally coherent". That works for things like Arm's S2FWB making it OK to assign an otherwise-non-coherent device without extra hassle. For the specific case of overriding PCIe No Snoop (which is more problematic from an Arm SMMU PoV) when assigning to a VM, would that not be easier solved by just having vfio-pci clear the "Enable No Snoop" control bit in the endpoint's PCIe capability? We could alternatively use existing device_get_dma_attr() as a default with an iommu wrapper and push the exception down through the iommu driver and s390 can override it. if going this way probably device_get_dma_attr() should be renamed to device_fwnode_get_dma_attr() instead to make it clearer? I'm looking at the few users: drivers/ata/ahci_ceva.c drivers/ata/ahci_qoriq.c - These are ARM only drivers. They are trying to copy the dma-coherent property from its DT/ACPI definition to internal register settings which look like they tune how the AXI bus transactions are created. I'm guessing the SATA IP block's AXI interface can be configured to generate coherent or non-coherent requests and it has to be set in a way that is consistent with the SOC architecture and match what the DMA API expects the device will do. drivers/crypto/ccp/sp-platform.c - Only used on ARM64 and also programs a HW register similar to the sata drivers. Refuses to work if the FW property is not present. drivers/net/ethernet/amd/xgbe/xgbe-platform.c - Seems to be configuring another ARM AXI block drivers/gpu/drm/panfrost/panfrost_drv.c - Robin's commit comment here is good, and one of the things this controls is if the coherent_walk is set for the io-pgtable-arm.c code which avoids DMA API calls drivers/gpu/drm/tegra/uapi.c - Returns DRM_TEGRA_CHANNEL_CAP_CACHE_COHERENT to userspace. No idea. My take is that the drivers using this API are doing it to make sure their HW blocks are setup in a way that is consistent with the DMA API they are also using, and run in constrained embedded-style environments that know the firmware support is present. So in the end it does not seem suitable right now for linking to IOMMU_CACHE.. That seems a pretty good summary - I think they're basically all "firmware told Linux I'm coherent so I'd better act coherent" cases, but that still doesn't necessarily mean that they're *forced* to respect that. One of the things on my to-do list is to try adding a DMA_ATTR_NO_SNOOP that can force DMA cache maintenance for coherent devices, primarily to hook up in Panfrost (where there is a bit of a performance to claw back on the coherent AmLogic SoCs by leaving certain buffers non-cacheable). Cheers, Robin. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
> From: Jason Gunthorpe > Sent: Thursday, April 7, 2022 1:17 AM > > On Wed, Apr 06, 2022 at 06:10:31PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > > > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > > > Oh, I didn't know about device_get_dma_attr().. > > > > > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > > > > > I saw that, but I spent some time searching and could not find an > > > iommu driver that would load independently of OF or ACPI. ie no IOMMU > > > platform drivers are created by board files. Things like Intel/AMD > > > discover only from ACPI, etc. Intel discovers IOMMUs (and optionally ACPI namespace devices) from ACPI, but there is no ACPI description for PCI devices i.e. the current logic of device_get_dma_attr() cannot be used on PCI devices. > > > > s390? > > Ah, I missed looking in s390, hyperv and virtio.. > > hyperv is not creating iommu_domains, just IRQ remapping > > virtio is using OF > > And s390 indeed doesn't obviously have OF or ACPI parts.. > > This seems like it would be consistent with other things: > > enum dev_dma_attr device_get_dma_attr(struct device *dev) > { > const struct fwnode_handle *fwnode = dev_fwnode(dev); > struct acpi_device *adev = to_acpi_device_node(fwnode); > > if (is_of_node(fwnode)) { > if (of_dma_is_coherent(to_of_node(fwnode))) > return DEV_DMA_COHERENT; > return DEV_DMA_NON_COHERENT; > } else if (adev) { > return acpi_get_dma_attr(adev); > } > > /* Platform is always DMA coherent */ > if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) && > !IS_ENABLED(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) && > device_iommu_mapped(dev)) > return DEV_DMA_COHERENT; > return DEV_DMA_NOT_SUPPORTED; > } > EXPORT_SYMBOL_GPL(device_get_dma_attr); > > ie s390 has no of or acpi but the entire platform is known DMA > coherent at config time so allow it. Not sure we need the > device_iommu_mapped() or not. Probably not. If adding an iommu may change the coherency it would come from specific IOPTE attributes for a device. The fact whether the device is mapped by iommu doesn't tell that information. > > We could alternatively use existing device_get_dma_attr() as a default > with an iommu wrapper and push the exception down through the iommu > driver and s390 can override it. > if going this way probably device_get_dma_attr() should be renamed to device_fwnode_get_dma_attr() instead to make it clearer? Thanks Kevin ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Wed, Apr 06, 2022 at 01:06:23PM -0300, Jason Gunthorpe wrote: > On Wed, Apr 06, 2022 at 05:50:56PM +0200, Christoph Hellwig wrote: > > On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > > > Oh, I didn't know about device_get_dma_attr().. > > > > Which is completely broken for any non-OF, non-ACPI plaform. > > I saw that, but I spent some time searching and could not find an > iommu driver that would load independently of OF or ACPI. ie no IOMMU > platform drivers are created by board files. Things like Intel/AMD > discover only from ACPI, etc. s390? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Wed, Apr 06, 2022 at 12:18:23PM -0300, Jason Gunthorpe wrote: > > Oh, I didn't know about device_get_dma_attr().. Which is completely broken for any non-OF, non-ACPI plaform. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On 2022-04-06 15:14, Jason Gunthorpe wrote: On Wed, Apr 06, 2022 at 03:51:50PM +0200, Christoph Hellwig wrote: On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote: Didn't see it I'll move dev_is_dma_coherent to device.h along with device_iommu_mapped() and others then No. It it is internal for a reason. It also doesn't actually work outside of the dma core. E.g. for non-swiotlb ARM configs it will not actually work. Really? It is the only condition that dma_info_to_prot() tests to decide of IOMMU_CACHE is used or not, so you are saying that there is a condition where a device can be attached to an iommu_domain and dev_is_dma_coherent() returns the wrong information? How does dma-iommu.c safely use it then? The common iommu-dma layer happens to be part of the subset of the DMA core which *does* play the dev->dma_coherent game. 32-bit Arm has its own IOMMU DMA ops which do not. I don't know if the set of PowerPCs with CONFIG_NOT_COHERENT_CACHE intersects the set of PowerPCs that can do VFIO, but that would be another example if so. In any case I still need to do something about the places checking IOMMU_CAP_CACHE_COHERENCY and thinking that means IOMMU_CACHE works. Any idea? Can we improve the IOMMU drivers such that that *can* be the case (within a reasonable margin of error)? That's kind of where I was hoping to head with device_iommu_capable(), e.g. [1]. Robin. [1] https://gitlab.arm.com/linux-arm/linux-rm/-/commit/53390e9505b3791adedc0974e251e5c7360e402e ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Wed, Apr 06, 2022 at 11:14:46AM -0300, Jason Gunthorpe wrote: > Really? It is the only condition that dma_info_to_prot() tests to > decide of IOMMU_CACHE is used or not, so you are saying that there is > a condition where a device can be attached to an iommu_domain and > dev_is_dma_coherent() returns the wrong information? How does > dma-iommu.c safely use it then? arm does not use dma-iommu.c ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On 2022-04-05 17:16, Jason Gunthorpe wrote: vdpa and usnic are trying to test if IOMMU_CACHE is supported. The correct way to do this is via dev_is_dma_coherent() Not necessarily... Disregarding the complete disaster of PCIe No Snoop on Arm-Based systems, there's the more interesting effectively-opposite scenario where an SMMU bridges non-coherent devices to a coherent interconnect. It's not something we take advantage of yet in Linux, and it can only be properly described in ACPI, but there do exist situations where IOMMU_CACHE is capable of making the device's traffic snoop, but dev_is_dma_coherent() - and device_get_dma_attr() for external users - would still say non-coherent because they can't assume that the SMMU is enabled and programmed in just the right way. I've also not thought too much about how things might look with S2FWB thrown into the mix in future... Robin. like the DMA API does. If IOMMU_CACHE is not supported then these drivers won't work as they don't call any coherency-restoring routines around their DMAs. Signed-off-by: Jason Gunthorpe --- drivers/infiniband/hw/usnic/usnic_uiom.c | 16 +++- drivers/vhost/vdpa.c | 3 ++- 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 760b254ba42d6b..24d118198ac756 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -42,6 +42,7 @@ #include #include #include +#include #include "usnic_log.h" #include "usnic_uiom.h" @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) struct usnic_uiom_dev *uiom_dev; int err; + if (!dev_is_dma_coherent(dev)) { + usnic_err("IOMMU of %s does not support cache coherency\n", + dev_name(dev)); + return -EINVAL; + } + uiom_dev = kzalloc(sizeof(*uiom_dev), GFP_ATOMIC); if (!uiom_dev) return -ENOMEM; @@ -483,13 +490,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) if (err) goto out_free_dev; - if (!iommu_capable(dev->bus, IOMMU_CAP_CACHE_COHERENCY)) { - usnic_err("IOMMU of %s does not support cache coherency\n", - dev_name(dev)); - err = -EINVAL; - goto out_detach_device; - } - spin_lock(>lock); list_add_tail(_dev->link, >devs); pd->dev_cnt++; @@ -497,8 +497,6 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd *pd, struct device *dev) return 0; -out_detach_device: - iommu_detach_device(pd->domain, dev); out_free_dev: kfree(uiom_dev); return err; diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 4c2f0bd062856a..05ea5800febc37 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "vhost.h" @@ -929,7 +930,7 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v) if (!bus) return -EFAULT; - if (!iommu_capable(bus, IOMMU_CAP_CACHE_COHERENCY)) + if (!dev_is_dma_coherent(dma_dev)) return -ENOTSUPP; v->domain = iommu_domain_alloc(bus); ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Wed, Apr 06, 2022 at 09:07:30AM -0300, Jason Gunthorpe wrote: > Didn't see it > > I'll move dev_is_dma_coherent to device.h along with > device_iommu_mapped() and others then No. It it is internal for a reason. It also doesn't actually work outside of the dma core. E.g. for non-swiotlb ARM configs it will not actually work. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 1/5] iommu: Replace uses of IOMMU_CAP_CACHE_COHERENCY with dev_is_dma_coherent()
On Tue, Apr 05, 2022 at 01:16:00PM -0300, Jason Gunthorpe wrote: > diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c > b/drivers/infiniband/hw/usnic/usnic_uiom.c > index 760b254ba42d6b..24d118198ac756 100644 > --- a/drivers/infiniband/hw/usnic/usnic_uiom.c > +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c > @@ -42,6 +42,7 @@ > #include > #include > #include > +#include > > #include "usnic_log.h" > #include "usnic_uiom.h" > @@ -474,6 +475,12 @@ int usnic_uiom_attach_dev_to_pd(struct usnic_uiom_pd > *pd, struct device *dev) > struct usnic_uiom_dev *uiom_dev; > int err; > > + if (!dev_is_dma_coherent(dev)) { Which part of the comment at the top of dma-map-ops.h is not clear enough to you? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization