RE: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function
> From: Jacob Pan > Sent: Wednesday, January 29, 2020 2:02 PM > Subject: [PATCH V9 06/10] iommu/vt-d: Add svm/sva invalidate function > > When Shared Virtual Address (SVA) is enabled for a guest OS via vIOMMU, we > need to provide invalidation support at IOMMU API and driver level. This patch > adds Intel VT-d specific function to implement iommu passdown invalidate API > for shared virtual address. > > The use case is for supporting caching structure invalidation of assigned SVM > capable devices. Emulated IOMMU exposes queue invalidation capability and > passes down all descriptors from the guest to the physical IOMMU. > > The assumption is that guest to host device ID mapping should be resolved > prior > to calling IOMMU driver. Based on the device handle, host IOMMU driver can > replace certain fields before submit to the invalidation queue. > > Signed-off-by: Jacob Pan > Signed-off-by: Ashok Raj > Signed-off-by: Liu, Yi L May be update my email to Liu Yi L :-) @linux.intel.com account no more work for me. :-( > --- > drivers/iommu/intel-iommu.c | 173 > > 1 file changed, 173 insertions(+) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index > 8a4136e805ac..b8aa6479b87f 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5605,6 +5605,178 @@ static void intel_iommu_aux_detach_device(struct > iommu_domain *domain, > aux_domain_remove_dev(to_dmar_domain(domain), dev); } > > +/* > + * 2D array for converting and sanitizing IOMMU generic TLB granularity > +to > + * VT-d granularity. Invalidation is typically included in the unmap > +operation > + * as a result of DMA or VFIO unmap. However, for assigned device where > +guest > + * could own the first level page tables without being shadowed by > +QEMU. In > + * this case there is no pass down unmap to the host IOMMU as a result > +of unmap > + * in the guest. Only invalidations are trapped and passed down. > + * In all cases, only first level TLB invalidation (request with PASID) > +can be > + * passed down, therefore we do not include IOTLB granularity for > +request > + * without PASID (second level). > + * > + * For an example, to find the VT-d granularity encoding for IOTLB > + * type and page selective granularity within PASID: > + * X: indexed by iommu cache type > + * Y: indexed by enum iommu_inv_granularity > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR] > + * > + * Granu_map array indicates validity of the table. 1: valid, 0: > +invalid > + * > + */ > +const static int > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_N > R] = { > + /* PASID based IOTLB, support PASID selective and page selective */ > + {0, 1, 1}, > + /* PASID based dev TLBs, only support all PASIDs or single PASID */ > + {1, 1, 0}, > + /* PASID cache */ > + {1, 1, 0} > +}; > + > +const static u64 > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_ > NR] = { > + /* PASID based IOTLB */ > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID}, > + /* PASID based dev TLBs */ > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0}, > + /* PASID cache */ > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0}, }; > + > +static inline int to_vtd_granularity(int type, int granu, u64 > +*vtd_granu) { > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= > IOMMU_INV_GRANU_NR || > + !inv_type_granu_map[type][granu]) > + return -EINVAL; > + > + *vtd_granu = inv_type_granu_table[type][granu]; > + > + return 0; > +} > + > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules) { > + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT; > + > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc. > + * IOMMU cache invalidate API passes granu_size in bytes, and number > of > + * granu size in contiguous memory. > + */ > + return order_base_2(nr_pages); > +} > + > +#ifdef CONFIG_INTEL_IOMMU_SVM > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain, > + struct device *dev, struct iommu_cache_invalidate_info > *inv_info) { > + struct dmar_domain *dmar_domain = to_dmar_domain(domain); > + struct device_domain_info *info; > + struct intel_iommu *iommu; > + unsigned long flags; > + int cache_type; > + u8 bus, devfn; > + u16 did, sid; > + int ret = 0; > + u64 size; > + > + if (!inv_info || !dmar_domain || > + inv_info->version != > IOMMU_CACHE_INVALIDATE_INFO_VERSION_1) > + return -EINVAL; > + > + if (!dev || !dev_is_pci(dev)) > + return -ENODEV; > + > + iommu = device_to_iommu(dev, , ); > + if (!iommu) > + return -ENODEV; > + > + spin_lock_irqsave(_domain_lock, flags); > + spin_lock(>lock); > + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn); > + if (!info) { >
Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
On 2020/2/21 上午12:06, Halil Pasic wrote: Currently if one intends to run a memory protection enabled VM with virtio devices and linux as the guest OS, one needs to specify the VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest linux use the DMA API, which in turn handles the memory encryption/protection stuff if the guest decides to turn itself into a protected one. This however makes no sense due to multiple reasons: * The device is not changed by the fact that the guest RAM is protected. The so called IOMMU bypass quirk is not affected. * This usage is not congruent with standardised semantics of VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason for using DMA API in virtio (orthogonal with respect to what is expressed by VIRTIO_F_IOMMU_PLATFORM). This series aims to decouple 'have to use DMA API because my (guest) RAM is protected' and 'have to use DMA API because the device told me VIRTIO_F_IOMMU_PLATFORM'. Please find more detailed explanations about the conceptual aspects in the individual patches. There is however also a very practical problem that is addressed by this series. For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side effect The vhost code assumes it the addresses on the virtio descriptor ring are not guest physical addresses but iova's, and insists on doing a translation of these regardless of what transport is used (e.g. whether we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b "vhost: new device IOTLB API".) On s390 this results in severe performance degradation (c.a. factor 10). Do you see a consistent degradation on the performance, or it only happen when for during the beginning of the test? BTW with ccw I/O there is (architecturally) no IOMMU, so the whole address translation makes no sense in the context of virtio-ccw. I suspect we can do optimization in qemu side. E.g send memtable entry via IOTLB API when vIOMMU is not enabled. If this makes sense, I can draft patch to see if there's any difference. Thanks Halil Pasic (2): mm: move force_dma_unencrypted() to mem_encrypt.h virtio: let virtio use DMA API when guest RAM is protected drivers/virtio/virtio_ring.c | 3 +++ include/linux/dma-direct.h | 9 - include/linux/mem_encrypt.h | 10 ++ 3 files changed, 13 insertions(+), 9 deletions(-) base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On 2020/2/21 上午10:59, David Gibson wrote: On Thu, Feb 20, 2020 at 05:13:09PM +0100, Christoph Hellwig wrote: On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 867c7ebd3f10..fafc8f924955 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev) if (!virtio_has_iommu_quirk(vdev)) return true; + if (force_dma_unencrypted(>dev)) + return true; Hell no. This is a detail of the platform DMA direct implementation. Drivers have no business looking at this flag, and virtio finally needs to be fixed to use the DMA API properly for everything but legacy devices. So, this patch definitely isn't right as it stands, but I'm struggling to understand what it is you're saying is the right way. By "legacy devices" I assume you mean pre-virtio-1.0 devices, that lack the F_VERSION_1 feature flag. Is that right? Because I don't see how being a legacy device or not relates to use of the DMA API. I *think* what you are suggesting here is that virtio devices that have !F_IOMMU_PLATFORM should have their dma_ops set up so that the DMA API treats IOVA==PA, which will satisfy what the device expects. Can this work for swiotlb? Thanks Then the virtio driver can use the DMA API the same way for both F_IOMMU_PLATFORM and !F_IOMMU_PLATFORM devices. But if that works for !F_IOMMU_PLATFORM_DEVICES+F_VERSION_1 devices, then AFAICT it will work equally well for legacy devices. Using the DMA API for *everything* in virtio, legacy or not, seems like a reasonable approach to me. But, AFAICT, that does require the DMA layer to have some kind of explicit call to turn on this behaviour, which the virtio driver would call during initializsation. I don't think we can do it 100% within the DMA layer, because only the driver can reasonably know when a device has this weird non-standard DMA behaviour. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On Thu, Feb 20, 2020 at 05:17:48PM -0800, Ram Pai wrote: > On Thu, Feb 20, 2020 at 03:55:14PM -0500, Michael S. Tsirkin wrote: > > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: > > > Currently the advanced guest memory protection technologies (AMD SEV, > > > powerpc secure guest technology and s390 Protected VMs) abuse the > > > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which > > > is in turn necessary, to make IO work with guest memory protection. > > > > > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a > > > different beast: with virtio devices whose implementation runs on an SMP > > > CPU we are still fine with doing all the usual optimizations, it is just > > > that we need to make sure that the memory protection mechanism does not > > > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the > > > side of the guest (and possibly he host side as well) than we actually > > > need. > > > > > > An additional benefit of teaching the guest to make the right decision > > > (and use DMA API) on it's own is: removing the need, to mandate special > > > VM configuration for guests that may run with protection. This is > > > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all > > > the virtio control structures into the first 2G of guest memory: > > > something we don't necessarily want to do per-default. > > > > > > Signed-off-by: Halil Pasic > > > Tested-by: Ram Pai > > > Tested-by: Michael Mueller > > > > This might work for you but it's fragile, since without > > VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets > > GPA's, not DMA addresses. > > > > > > > > IOW this looks like another iteration of: > > > > virtio: Support encrypted memory on powerpc secure guests > > > > which I was under the impression was abandoned as unnecessary. > > It has been abondoned on powerpc. We enabled VIRTIO_F_ACCESS_PLATFORM; > by default, flag on powerpc. Uh... we haven't yet, though we're working on it. > We would like to enable secure guests on powerpc without this flag > aswell enabled, but past experience has educated us that its not a easy > path. However if Halil makes some inroads in this path for s390, we > will like to support him. > > > RP > -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
On Thu, Feb 20, 2020 at 05:31:35PM +0100, Christoph Hellwig wrote: > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote: > > >From a users perspective it makes absolutely perfect sense to use the > > bounce buffers when they are NEEDED. > > Forcing the user to specify iommu_platform just because you need bounce > > buffers > > really feels wrong. And obviously we have a severe performance issue > > because of the indirections. > > The point is that the user should not have to specify iommu_platform. > We need to make sure any new hypervisor (especially one that might require > bounce buffering) always sets it, So, I have draft qemu patches which enable iommu_platform by default. But that's really because of other problems with !iommu_platform, not anything to do with bounce buffering or secure VMs. The thing is that the hypervisor *doesn't* require bounce buffering. In the POWER (and maybe s390 as well) models for Secure VMs, it's the *guest*'s choice to enter secure mode, so the hypervisor has no reason to know whether the guest needs bounce buffering. As far as the hypervisor and qemu are concerned that's a guest internal detail, it just expects to get addresses it can access whether those are GPAs (iommu_platform=off) or IOVAs (iommu_platform=on). > as was a rather bogus legacy hack It was certainly a bad idea, but it was a bad idea that went into a public spec and has been widely deployed for many years. We can't just pretend it didn't happen and move on. Turning iommu_platform=on by default breaks old guests, some of which we still care about. We can't (automatically) do it only for guests that need bounce buffering, because the hypervisor doesn't know that ahead of time. > that isn't extensibe for cases that for example require bounce buffering. In fact bounce buffering isn't really the issue from the hypervisor (or spec's) point of view. It's the fact that not all of guest memory is accessible to the hypervisor. Bounce buffering is just one way the guest might deal with that. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On Thu, Feb 20, 2020 at 05:13:09PM +0100, Christoph Hellwig wrote: > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > > index 867c7ebd3f10..fafc8f924955 100644 > > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device > > *vdev) > > if (!virtio_has_iommu_quirk(vdev)) > > return true; > > > > + if (force_dma_unencrypted(>dev)) > > + return true; > > Hell no. This is a detail of the platform DMA direct implementation. > Drivers have no business looking at this flag, and virtio finally needs > to be fixed to use the DMA API properly for everything but legacy devices. So, this patch definitely isn't right as it stands, but I'm struggling to understand what it is you're saying is the right way. By "legacy devices" I assume you mean pre-virtio-1.0 devices, that lack the F_VERSION_1 feature flag. Is that right? Because I don't see how being a legacy device or not relates to use of the DMA API. I *think* what you are suggesting here is that virtio devices that have !F_IOMMU_PLATFORM should have their dma_ops set up so that the DMA API treats IOVA==PA, which will satisfy what the device expects. Then the virtio driver can use the DMA API the same way for both F_IOMMU_PLATFORM and !F_IOMMU_PLATFORM devices. But if that works for !F_IOMMU_PLATFORM_DEVICES+F_VERSION_1 devices, then AFAICT it will work equally well for legacy devices. Using the DMA API for *everything* in virtio, legacy or not, seems like a reasonable approach to me. But, AFAICT, that does require the DMA layer to have some kind of explicit call to turn on this behaviour, which the virtio driver would call during initializsation. I don't think we can do it 100% within the DMA layer, because only the driver can reasonably know when a device has this weird non-standard DMA behaviour. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On Thu, Feb 20, 2020 at 03:55:14PM -0500, Michael S. Tsirkin wrote: > On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: > > Currently the advanced guest memory protection technologies (AMD SEV, > > powerpc secure guest technology and s390 Protected VMs) abuse the > > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which > > is in turn necessary, to make IO work with guest memory protection. > > > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a > > different beast: with virtio devices whose implementation runs on an SMP > > CPU we are still fine with doing all the usual optimizations, it is just > > that we need to make sure that the memory protection mechanism does not > > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the > > side of the guest (and possibly he host side as well) than we actually > > need. > > > > An additional benefit of teaching the guest to make the right decision > > (and use DMA API) on it's own is: removing the need, to mandate special > > VM configuration for guests that may run with protection. This is > > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all > > the virtio control structures into the first 2G of guest memory: > > something we don't necessarily want to do per-default. > > > > Signed-off-by: Halil Pasic > > Tested-by: Ram Pai > > Tested-by: Michael Mueller > > This might work for you but it's fragile, since without > VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets > GPA's, not DMA addresses. > > > > IOW this looks like another iteration of: > > virtio: Support encrypted memory on powerpc secure guests > > which I was under the impression was abandoned as unnecessary. It has been abondoned on powerpc. We enabled VIRTIO_F_ACCESS_PLATFORM; by default, flag on powerpc. We would like to enable secure guests on powerpc without this flag aswell enabled, but past experience has educated us that its not a easy path. However if Halil makes some inroads in this path for s390, we will like to support him. RP ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/vt-d: fix the wrong printing in RHSA parsing
Hi, On 2020/1/15 18:28, Zhenzhong Duan wrote: When base address in RHSA structure doesn't match base address in each DRHD structure, the base address in last DRHD is printed out. This doesn't make sense when there are multiple DRHD units, fix it by printing the buggy RHSA's base address. Signed-off-by: Zhenzhong Duan Cc: David Woodhouse Cc: Lu Baolu Cc: Joerg Roedel Cc:iommu@lists.linux-foundation.org Queued for v5.7. Thanks! Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment
On Thu, 20 Feb 2020, Robin Murphy wrote: > > > > > Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of > > > > > IOVAs to some desired PAGE_SIZE order, specified by > > > > > CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of > > > > > fragmentation caused by the current IOVA alignment scheme, and > > > > > gives better IOVA space utilization. > > > > > > > > Even if the general change did prove reasonable, this IOVA allocator is > > > > not > > > > owned by the DMA API, so entirely removing the option of strict > > > > size-alignment feels a bit uncomfortable. Personally I'd replace the > > > > bool > > > > argument with an actual alignment value to at least hand the authority > > > > out > > > > to individual callers. > > > > > > > > Furthermore, even in DMA API terms, is anyone really ever going to > > > > bother > > > > tuning that config? Since iommu-dma is supposed to be a transparent > > > > layer, > > > > arguably it shouldn't behave unnecessarily differently from CMA, so > > > > simply > > > > piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical. > > > > > > Agreed, reusing CONFIG_CMA_ALIGNMENT makes a lot of sense here as callers > > > relying on natural alignment of DMA buffer allocations already have to > > > deal with that limitation. We could fix it as an optional parameter at > > > init time (init_iova_domain()), and have the DMA IOMMU implementation > > > pass it in there. > > > > > > > My concern with using CONFIG_CMA_ALIGNMENT alignment is that for us this > > would either involve further fragmenting our CMA regions (moving our CMA > > max alignment from 1MB to max 2MB) or losing so of our 2MB IOVA block > > mappings (changing our IOVA max alignment form 2MB to 1MB). > > > > At least for us CMA allocations are often not DMA mapped into stage 1 page > > tables so moving the CMA max alignment to 2MB in our case would, I think, > > only provide the disadvantage of having to increase the size our CMA > > regions to accommodate this large alignment (which isn’t optimal for > > memory utilization since CMA regions can't satisfy unmovable page > > allocations). > > > > As an alternative would it be possible for the dma-iommu layer to use the > > size of the allocation and the domain pgsize_bitmap field to pick a max > > IOVA alignment, which it can pass in for that IOVA allocation, which will > > maximize block mappings but not waste IOVA space? > > Given that we already have DMA_ATTR_ALOC_SINGLE_PAGES for video drivers and > suchlike that know enough to know they want "large buffer" allocation > behaviour, would it suffice to have a similar attribute that says "I'm not too > fussed about alignment"? That way there's no visible change for anyone who > doesn't opt in and might be relying on the existing behaviour, intentionally > or otherwise. > > Then if necessary, the implementation can consider both flags together to > decide whether to try to round down to the next block size or just shove it in > anywhere. > This should work for us. My only concern is that many of our users would be using DMA-Buf memory, so DMA mapping would be done using dma_buf_map_attachment which I believe still doesn't support specifying any DMA attributes. I had previously tried to get support added upstream but wasn't successful. https://lkml.org/lkml/2019/1/18/826 https://lkml.org/lkml/2019/1/18/827 But perhaps this new attribute will provide enough justification for DMA attributes (in some form, either explicitly or via flags) to be supported via dma_buf_map_attachment. Liam Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote: > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side > effect The vhost code assumes it the addresses on the virtio descriptor > ring are not guest physical addresses but iova's, and insists on doing a > translation of these regardless of what transport is used (e.g. whether > we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b > "vhost: new device IOTLB API".) On s390 this results in severe > performance degradation (c.a. factor 10). BTW with ccw I/O there is > (architecturally) no IOMMU, so the whole address translation makes no > sense in the context of virtio-ccw. So it sounds like a host issue: the emulation of s390 unnecessarily complicated. Working around it by the guest looks wrong ... > Halil Pasic (2): > mm: move force_dma_unencrypted() to mem_encrypt.h > virtio: let virtio use DMA API when guest RAM is protected > > drivers/virtio/virtio_ring.c | 3 +++ > include/linux/dma-direct.h | 9 - > include/linux/mem_encrypt.h | 10 ++ > 3 files changed, 13 insertions(+), 9 deletions(-) > > > base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2 > -- > 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote: > * This usage is not congruent with standardised semantics of > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason > for using DMA API in virtio (orthogonal with respect to what is > expressed by VIRTIO_F_IOMMU_PLATFORM). Quoting the spec: \item[VIRTIO_F_ACCESS_PLATFORM(33)] This feature indicates that the device can be used on a platform where device access to data in memory is limited and/or translated. E.g. this is the case if the device can be located behind an IOMMU that translates bus addresses from the device into physical addresses in memory, if the device can be limited to only access certain memory addresses or if special commands such as a cache flush can be needed to synchronise data in memory with the device. Whether accesses are actually limited or translated is described by platform-specific means. If this feature bit is set to 0, then the device has same access to memory addresses supplied to it as the driver has. In particular, the device will always use physical addresses matching addresses used by the driver (typically meaning physical addresses used by the CPU) and not translated further, and can access any address supplied to it by the driver. When clear, this overrides any platform-specific description of whether device access is limited or translated in any way, e.g. whether an IOMMU may be present. since device can't access encrypted memory, this seems to match your case reasonably well. -- MST ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: > Currently the advanced guest memory protection technologies (AMD SEV, > powerpc secure guest technology and s390 Protected VMs) abuse the > VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which > is in turn necessary, to make IO work with guest memory protection. > > But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a > different beast: with virtio devices whose implementation runs on an SMP > CPU we are still fine with doing all the usual optimizations, it is just > that we need to make sure that the memory protection mechanism does not > get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the > side of the guest (and possibly he host side as well) than we actually > need. > > An additional benefit of teaching the guest to make the right decision > (and use DMA API) on it's own is: removing the need, to mandate special > VM configuration for guests that may run with protection. This is > especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all > the virtio control structures into the first 2G of guest memory: > something we don't necessarily want to do per-default. > > Signed-off-by: Halil Pasic > Tested-by: Ram Pai > Tested-by: Michael Mueller This might work for you but it's fragile, since without VIRTIO_F_ACCESS_PLATFORM hypervisor assumes it gets GPA's, not DMA addresses. IOW this looks like another iteration of: virtio: Support encrypted memory on powerpc secure guests which I was under the impression was abandoned as unnecessary. To summarize, the necessary conditions for a hack along these lines (using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that: - secure guest mode is enabled - so we know that since we don't share most memory regular virtio code won't work, even though the buggy hypervisor didn't set VIRTIO_F_ACCESS_PLATFORM - DMA API is giving us addresses that are actually also physical addresses - Hypervisor is buggy and didn't enable VIRTIO_F_ACCESS_PLATFORM I don't see how this patch does this. > --- > drivers/virtio/virtio_ring.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 867c7ebd3f10..fafc8f924955 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev) > if (!virtio_has_iommu_quirk(vdev)) > return true; > > + if (force_dma_unencrypted(>dev)) > + return true; > + > /* Otherwise, we are left to guess. */ > /* >* In theory, it's possible to have a buggy QEMU-supposed > -- > 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
On Thu, Feb 20, 2020 at 05:06:04PM +0100, Halil Pasic wrote: > Currently if one intends to run a memory protection enabled VM with > virtio devices and linux as the guest OS, one needs to specify the > VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest > linux use the DMA API, which in turn handles the memory > encryption/protection stuff if the guest decides to turn itself into > a protected one. This however makes no sense due to multiple reasons: > * The device is not changed by the fact that the guest RAM is > protected. The so called IOMMU bypass quirk is not affected. > * This usage is not congruent with standardised semantics of > VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason > for using DMA API in virtio (orthogonal with respect to what is > expressed by VIRTIO_F_IOMMU_PLATFORM). > > This series aims to decouple 'have to use DMA API because my (guest) RAM > is protected' and 'have to use DMA API because the device told me > VIRTIO_F_IOMMU_PLATFORM'. > > Please find more detailed explanations about the conceptual aspects in > the individual patches. There is however also a very practical problem > that is addressed by this series. > > For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side > effect The vhost code assumes it the addresses on the virtio descriptor > ring are not guest physical addresses but iova's, and insists on doing a > translation of these regardless of what transport is used (e.g. whether > we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b > "vhost: new device IOTLB API".) On s390 this results in severe > performance degradation (c.a. factor 10). BTW with ccw I/O there is > (architecturally) no IOMMU, so the whole address translation makes no > sense in the context of virtio-ccw. That's just a QEMU thing. It uses the same flag for VIRTIO_F_ACCESS_PLATFORM and vhost IOTLB. QEMU can separate them, no need to change linux. > Halil Pasic (2): > mm: move force_dma_unencrypted() to mem_encrypt.h > virtio: let virtio use DMA API when guest RAM is protected > > drivers/virtio/virtio_ring.c | 3 +++ > include/linux/dma-direct.h | 9 - > include/linux/mem_encrypt.h | 10 ++ > 3 files changed, 13 insertions(+), 9 deletions(-) > > > base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2 > -- > 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2] iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge page onto its corresponding physical address. This commit fixes the bug by accomodating the level of page entry for the IOVA and adds IOVA's lower address to the physical address. Signed-off-by: Yonghyun Hwang --- Changes from v1: - level cannot be 0. So, the condition, "if (level > 1)", is removed, which results in a simple code. - a macro, BIT_MASK, is used to have a bit mask --- drivers/iommu/intel-iommu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 932267f49f9a..4fd5c6287b6d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -5554,7 +5554,9 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain, pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, ); if (pte) - phys = dma_pte_addr(pte); + phys = dma_pte_addr(pte) + + (iova & (BIT_MASK(level_to_offset_bits(level) + + VTD_PAGE_SHIFT) - 1)); return phys; } -- 2.25.0.265.gbab2e86ba0-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
Hi Yonghyun, On Thu, Feb 20, 2020 at 11:44:31AM -0800, Yonghyun Hwang wrote: > intel_iommu_iova_to_phys() has a bug when it translates an IOVA for a huge > page onto its corresponding physical address. This commit fixes the bug by > accomodating the level of page entry for the IOVA and adds IOVA's lower > address to the physical address. > D'oh I meant to add a Cc: sta...@vger.kernel.org here ... :) > Signed-off-by: Yonghyun Hwang > --- > > Changes from v1: > - level cannot be 0. So, the condition, "if (level > 1)", is removed, which > results in a simple code. > - a macro, BIT_MASK, is used to have a bit mask > > --- > drivers/iommu/intel-iommu.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c > index 932267f49f9a..4fd5c6287b6d 100644 > --- a/drivers/iommu/intel-iommu.c > +++ b/drivers/iommu/intel-iommu.c > @@ -5554,7 +5554,9 @@ static phys_addr_t intel_iommu_iova_to_phys(struct > iommu_domain *domain, > > pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, ); > if (pte) > - phys = dma_pte_addr(pte); > + phys = dma_pte_addr(pte) + > + (iova & (BIT_MASK(level_to_offset_bits(level) + > + VTD_PAGE_SHIFT) - 1)); > > return phys; > } > -- > 2.25.0.265.gbab2e86ba0-goog > Cheers, Moritz ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On 02/20 08:45 am, Will Deacon wrote: > On Wed, Feb 19, 2020 at 12:06:28PM -0800, isa...@codeaurora.org wrote: > > On 2020-02-19 03:15, Will Deacon wrote: > > > On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote: > > > > Does this mean that the driver should be managing the IOVA space and > > > > mappings for this device using the IOMMU API? If so, is the > > > > rationale for > > > > this because the device driver can have the information of what IOVA > > > > ranges > > > > can and cannot be used? Shouldn't there be a generic way of > > > > informing an > > > > IOMMU driver about these reserved ranges? Perhaps through a device > > > > tree > > > > property, instead of deferring this type of management to the driver? > > > > > > Before we dive into designing that, can you please clarify whether the > > > reserved IOVA range applies to all DMA masters mastering through a > > > particular SMMU, or whether it's just about one specific master? I was > > > assuming the former, but wanted to be sure. > > > > > This situation currently applies to one master. > > Interesting. Is it problematic if the range is reserved for all masters > sharing that SMMU? > Yes, that would be an overkill. It certainly is useful and in some cases quite helpful to not waste that range of IOVA space for other masters on the same SMMU that can use it. -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] iommu/iova: Add a best-fit algorithm
On 20/02/2020 6:38 am, isa...@codeaurora.org wrote: On 2020-02-17 08:03, Robin Murphy wrote: On 14/02/2020 11:06 pm, Isaac J. Manjarres wrote: From: Liam Mark Using the best-fit algorithm, instead of the first-fit algorithm, may reduce fragmentation when allocating IOVAs. What kind of pathological allocation patterns make that a serious problem? Is there any scope for simply changing the order of things in the callers? Do these drivers also run under other DMA API backends (e.g. 32-bit Arm)? The usecases where the IOVA space has been fragmented have non-deterministic allocation patterns, and thus, it's not feasible to change the allocation order to avoid fragmenting the IOVA space. What about combining smaller buffers into larger individual allocations; any scope for that sort of thing? Certainly if you're consistently allocating small things less than PAGE_SIZE then DMA pools would be useful to avoid wanton memory wastage in general. From what we've observed, the usecases involve allocations of two types of buffers: one type of buffer between 1 KB to 4 MB in size, and another type of buffer between 1 KB to 400 MB in size. The pathological scenarios seem to arise when there are many (100+) randomly distributed non-power of two allocations, which in some cases leaves behind holes of up to 100+ MB in the IOVA space. Here are some examples that show the state of the IOVA space under which failure to allocate an IOVA was observed: Instance 1: Currently mapped total size : ~1.3GB Free space available : ~2GB Map for ~162MB fails. Max contiguous space available : < 162MB Instance 2: Currently mapped total size : ~950MB Free space available : ~2.3GB Map for ~320MB fails. Max contiguous space available : ~189MB Instance 3: Currently mapped total size : ~1.2GB Free space available : ~2.7GB Map for ~162MB fails. Max contiguous space available : <162MB We are still in the process of collecting data with the best-fit algorithm enabled to provide some numbers to show that it results in less IOVA space fragmentation. Thanks for those examples, and I'd definitely like to see the comparative figures. To dig a bit further, at the point where things start failing, where are the cached nodes pointing? IIRC there is still a pathological condition where empty space between limit_pfn and cached32_node gets 'lost' if nothing in between is freed, so the bigger the range of allocation sizes, the worse the effect, e.g.: (considering an empty domain, pfn 0 *not* reserved, 32-bit limit_pfn) alloc 4K, succeeds, cached32_node now at 4G-4K alloc 2G, succeeds, cached32_node now at 0 alloc 4K, fails despite almost 2G contiguous free space within limit_pfn (and max32_alloc_size==1 now fast-forwards *any* further allocation attempt to failure) If you're falling foul of this case (I was never sure how realistic a problem it would be in practice), there are at least a couple of much less invasive tweaks I can think of that would be worth exploring. To answer your question about whether if this driver run under other DMA API backends: yes, such as 32 bit ARM. OK, that's what I suspected :) AFAICS arch/arm's __alloc_iova() is also a first-fit algorithm, so if you get better behaviour there then it would suggest that this aspect isn't really the most important issue. Certainly, the fact that the "best fit" logic here also happens to ignore the cached nodes does start drawing me back to the point above. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 4/4] drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU
The main DRM device is actually a virtual device so it doesn't have the iommus property, which is instead on the DMA masters, in this case the mixers. Add a call to of_dma_configure with the mixers DT node but on the DRM virtual device to configure it in the same way than the mixers. Signed-off-by: Maxime Ripard --- drivers/gpu/drm/sun4i/sun8i_mixer.c | 13 + 1 file changed, 13 insertions(+) diff --git a/drivers/gpu/drm/sun4i/sun8i_mixer.c b/drivers/gpu/drm/sun4i/sun8i_mixer.c index 7c24f8f832a5..85b8930e334c 100644 --- a/drivers/gpu/drm/sun4i/sun8i_mixer.c +++ b/drivers/gpu/drm/sun4i/sun8i_mixer.c @@ -372,6 +372,19 @@ static int sun8i_mixer_bind(struct device *dev, struct device *master, mixer->engine.ops = _engine_ops; mixer->engine.node = dev->of_node; + if (of_find_property(dev->of_node, "iommus", NULL)) { + /* +* This assume we have the same DMA constraints for +* all our the mixers in our pipeline. This sounds +* bad, but it has always been the case for us, and +* DRM doesn't do per-device allocation either, so we +* would need to fix DRM first... +*/ + ret = of_dma_configure(drm->dev, dev->of_node, true); + if (ret) + return ret; + } + /* * While this function can fail, we shouldn't do anything * if this happens. Some early DE2 DT entries don't provide -- git-series 0.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 1/4] dt-bindings: iommu: Add Allwinner H6 IOMMU bindings
The Allwinner H6 has introduced an IOMMU. Let's add a device tree binding for it. Reviewed-by: Rob Herring Signed-off-by: Maxime Ripard --- Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml | 61 + 1 file changed, 61 insertions(+) create mode 100644 Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml diff --git a/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml b/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml new file mode 100644 index ..5e125cf2a88b --- /dev/null +++ b/Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml @@ -0,0 +1,61 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/iommu/allwinner,sun50i-h6-iommu.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Allwinner H6 IOMMU Device Tree Bindings + +maintainers: + - Chen-Yu Tsai + - Maxime Ripard + +properties: + "#iommu-cells": +const: 1 +description: + The content of the cell is the master ID. + + compatible: +const: allwinner,sun50i-h6-iommu + + reg: +maxItems: 1 + + interrupts: +maxItems: 1 + + clocks: +maxItems: 1 + + resets: +maxItems: 1 + +required: + - "#iommu-cells" + - compatible + - reg + - interrupts + - clocks + - resets + +additionalProperties: false + +examples: + - | + #include + #include + + #include + #include + + iommu: iommu@30f { + compatible = "allwinner,sun50i-h6-iommu"; + reg = <0x030f 0x1>; + interrupts = ; + clocks = < CLK_BUS_IOMMU>; + resets = < RST_BUS_IOMMU>; + #iommu-cells = <1>; + }; + +... -- git-series 0.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 0/4] iommu: Add Allwinner H6 IOMMU driver
Hi, Here's a series adding support for the IOMMU introduced in the Allwinner H6. The driver from Allwinner hints at more SoCs using it in the future (with more masters), so we can bet on that IOMMU becoming pretty much standard in new SoCs from Allwinner. One thing I wasn't really sure about was how to expose the statistics reported by the IOMMU PMU (TLB hit rates, latencies, and so on). The Allwinner driver exposes them through custom sysfs files, while they would be best represented through perf I guess? Anyway, I'm planning to support them later on. Let me know what you think, Maxime Changes from v1: - Add a patch to configure the IOMMU on the virtual DRM device - Rework the domain allocation / freeing - Remove the runtime_pm handling to power up the device and rely on refcounting - use map gfp argument for kmem cache allocation - Removed unused macros - Switched from BIT(0) to 1 for the page table entry valid flag to make it more obvious that it's over multiple bits. - Switch to module_initcall - Make accesses to the fwspec more consistant - Removed dev_info logs - Reworked invalidation / flushing - Allow for compilation with COMPILE_TEST Maxime Ripard (4): dt-bindings: iommu: Add Allwinner H6 IOMMU bindings iommu: Add Allwinner H6 IOMMU driver arm64: dts: allwinner: h6: Add IOMMU drm/sun4i: mixer: Call of_dma_configure if there's an IOMMU Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml | 61 - arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 10 +- drivers/gpu/drm/sun4i/sun8i_mixer.c| 13 +- drivers/iommu/Kconfig |9 +- drivers/iommu/Makefile |1 +- drivers/iommu/sun50i-iommu.c | 1072 - 6 files changed, 1166 insertions(+) create mode 100644 Documentation/devicetree/bindings/iommu/allwinner,sun50i-h6-iommu.yaml create mode 100644 drivers/iommu/sun50i-iommu.c base-commit: bb6d3fb354c5ee8d6bde2d576eb7220ea09862b9 -- git-series 0.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 3/4] arm64: dts: allwinner: h6: Add IOMMU
Now that we have a driver for the IOMMU, let's start using it. Signed-off-by: Maxime Ripard --- arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 10 ++ 1 file changed, 10 insertions(+) diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi index 3329283e38ab..2d0777ad39aa 100644 --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi @@ -131,6 +131,7 @@ clock-names = "bus", "mod"; resets = <_clocks RST_MIXER0>; + iommus = < 0>; ports { #address-cells = <1>; @@ -370,6 +371,15 @@ #interrupt-cells = <3>; }; + iommu: iommu@30f { + compatible = "allwinner,sun50i-h6-iommu"; + reg = <0x030f 0x1>; + interrupts = ; + clocks = < CLK_BUS_IOMMU>; + resets = < RST_BUS_IOMMU>; + #iommu-cells = <1>; + }; + mmc0: mmc@402 { compatible = "allwinner,sun50i-h6-mmc", "allwinner,sun50i-a64-mmc"; -- git-series 0.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/4] iommu: Add Allwinner H6 IOMMU driver
The Allwinner H6 has introduced an IOMMU for a few DMA controllers, mostly video related: the display engine, the video decoders / encoders, the camera capture controller, etc. The design is pretty simple compared to other IOMMUs found in SoCs: there's a single instance, controlling all the masters, with a single address space. It also features a performance monitoring unit that allows to retrieve various informations (per-master and global TLB accesses, hits and misses, access latency, etc) that isn't supported at the moment. Signed-off-by: Maxime Ripard --- drivers/iommu/Kconfig|9 +- drivers/iommu/Makefile |1 +- drivers/iommu/sun50i-iommu.c | 1072 +++- 3 files changed, 1082 insertions(+) create mode 100644 drivers/iommu/sun50i-iommu.c diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index d2fade984999..87677ea98427 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -302,6 +302,15 @@ config ROCKCHIP_IOMMU Say Y here if you are using a Rockchip SoC that includes an IOMMU device. +config SUN50I_IOMMU + bool "Allwinner H6 IOMMU Support" + depends on ARCH_SUNXI || COMPILE_TEST + select ARM_DMA_USE_IOMMU + select IOMMU_API + select IOMMU_DMA + help + Support for the IOMMU introduced in the Allwinner H6 SoCs. + config TEGRA_IOMMU_GART bool "Tegra GART IOMMU Support" depends on ARCH_TEGRA_2x_SOC diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 2104fb8afc06..dd1ff336b9b9 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -29,6 +29,7 @@ obj-$(CONFIG_MTK_IOMMU_V1) += mtk_iommu_v1.o obj-$(CONFIG_OMAP_IOMMU) += omap-iommu.o obj-$(CONFIG_OMAP_IOMMU_DEBUG) += omap-iommu-debug.o obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o +obj-$(CONFIG_SUN50I_IOMMU) += sun50i-iommu.o obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o diff --git a/drivers/iommu/sun50i-iommu.c b/drivers/iommu/sun50i-iommu.c new file mode 100644 index ..81ba5f562bd2 --- /dev/null +++ b/drivers/iommu/sun50i-iommu.c @@ -0,0 +1,1072 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +// Copyright (C) 2016-2018, Allwinner Technology CO., LTD. +// Copyright (C) 2019-2020, Cerno + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define IOMMU_RESET_REG0x010 +#define IOMMU_ENABLE_REG 0x020 +#define IOMMU_ENABLE_ENABLEBIT(0) + +#define IOMMU_BYPASS_REG 0x030 +#define IOMMU_AUTO_GATING_REG 0x040 +#define IOMMU_AUTO_GATING_ENABLE BIT(0) + +#define IOMMU_WBUF_CTRL_REG0x044 +#define IOMMU_OOO_CTRL_REG 0x048 +#define IOMMU_4KB_BDY_PRT_CTRL_REG 0x04c +#define IOMMU_TTB_REG 0x050 +#define IOMMU_TLB_ENABLE_REG 0x060 +#define IOMMU_TLB_PREFETCH_REG 0x070 +#define IOMMU_TLB_PREFETCH_MASTER_ENABLE(m)BIT(m) + +#define IOMMU_TLB_FLUSH_REG0x080 +#define IOMMU_TLB_FLUSH_PTW_CACHE BIT(17) +#define IOMMU_TLB_FLUSH_MACRO_TLB BIT(16) +#define IOMMU_TLB_FLUSH_MICRO_TLB(i) (BIT(i) & GENMASK(5, 0)) + +#define IOMMU_TLB_IVLD_ADDR_REG0x090 +#define IOMMU_TLB_IVLD_ADDR_MASK_REG 0x094 +#define IOMMU_TLB_IVLD_ENABLE_REG 0x098 +#define IOMMU_TLB_IVLD_ENABLE_ENABLE BIT(0) + +#define IOMMU_PC_IVLD_ADDR_REG 0x0a0 +#define IOMMU_PC_IVLD_ENABLE_REG 0x0a8 +#define IOMMU_PC_IVLD_ENABLE_ENABLEBIT(0) + +#define IOMMU_DM_AUT_CTRL_REG(d) (0x0b0 + ((d) / 2) * 4) +#define IOMMU_DM_AUT_CTRL_RD_UNAVAIL(d, m) (1 << (((d & 1) * 16) + ((m) * 2))) +#define IOMMU_DM_AUT_CTRL_WR_UNAVAIL(d, m) (1 << (((d & 1) * 16) + ((m) * 2) + 1)) + +#define IOMMU_DM_AUT_OVWT_REG 0x0d0 +#define IOMMU_INT_ENABLE_REG 0x100 +#define IOMMU_INT_CLR_REG 0x104 +#define IOMMU_INT_STA_REG 0x108 +#define IOMMU_INT_ERR_ADDR_REG(i) (0x110 + (i) * 4) +#define IOMMU_INT_ERR_ADDR_L1_REG 0x130 +#define IOMMU_INT_ERR_ADDR_L2_REG 0x134 +#define IOMMU_INT_ERR_DATA_REG(i) (0x150 + (i) * 4) +#define IOMMU_L1PG_INT_REG 0x0180 +#define IOMMU_L2PG_INT_REG 0x0184 + +#define IOMMU_INT_INVALID_L2PG BIT(17) +#define IOMMU_INT_INVALID_L1PG BIT(16) +#define IOMMU_INT_MASTER_PERMISSION(m) BIT(m) +#define IOMMU_INT_MASTER_MASK (IOMMU_INT_MASTER_PERMISSION(0) | \ +IOMMU_INT_MASTER_PERMISSION(1) | \ +
Re: [PATCH v2] iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support
On Mon, Feb 17, 2020 at 1:17 PM Robin Murphy wrote: > > On 13/02/2020 9:49 pm, Rob Herring wrote: > > On Thu, Jan 30, 2020 at 11:34 AM Robin Murphy wrote: > >> > >> On 30/01/2020 3:06 pm, Auger Eric wrote: > >>> Hi Rob, > >>> On 1/17/20 10:16 PM, Rob Herring wrote: > Arm SMMUv3.2 adds support for TLB range invalidate operations. > Support for range invalidate is determined by the RIL bit in the IDR3 > register. > > The range invalidate is in units of the leaf page size and operates on > 1-32 chunks of a power of 2 multiple pages. First, we determine from the > size what power of 2 multiple we can use. Then we calculate how many > chunks (1-31) of the power of 2 size for the range on the iteration. On > each iteration, we move up in size by at least 5 bits. > > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Will Deacon > Cc: Robin Murphy > Cc: Joerg Roedel > Signed-off-by: Rob Herring > > > > > @@ -2003,7 +2024,7 @@ static void arm_smmu_tlb_inv_range(unsigned long > iova, size_t size, > { > u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS]; > struct arm_smmu_device *smmu = smmu_domain->smmu; > -unsigned long start = iova, end = iova + size; > +unsigned long start = iova, end = iova + size, num_pages = 0, tg = > 0; > int i = 0; > struct arm_smmu_cmdq_ent cmd = { > .tlbi = { > @@ -2022,12 +2043,50 @@ static void arm_smmu_tlb_inv_range(unsigned long > iova, size_t size, > cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; > } > > +if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { > +/* Get the leaf page size */ > +tg = __ffs(smmu_domain->domain.pgsize_bitmap); > + > +/* Convert page size of 12,14,16 (log2) to 1,2,3 */ > +cmd.tlbi.tg = ((tg - ilog2(SZ_4K)) / 2) + 1; > >> > >> Given the comment, I think "(tg - 10) / 2" would suffice ;) > > > > Well, duh... > > > >> > + > +/* Determine what level the granule is at */ > +cmd.tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3)); > >> > >> Is it possible to rephrase that with logs and shifts to avoid the division? > > > > Well, with a loop is the only other way I came up with: > > > > bpl = tg - 3; > > ttl = 3; > > mask = BIT(bpl) - 1; > > while ((granule & (mask << ((4 - ttl) * bpl + 3))) == 0) > > ttl--; > > > > Doesn't seem like much improvement to me given we have h/w divide... > > Sure, it doesn't take too many extra instructions to start matching > typical IDIV latency, so there's no point being silly if there really > isn't a clean alternative. > > Some quick scribbling suggests "4 - ilog2(granule) / 10" might actually > be close enough, but perhaps that's a bit too cheeky. How does divide by 10 save a divide? > +num_pages = size / (1UL << tg); > >> > >> Similarly, in my experience GCC has always seemed too cautious to elide > >> non-constant division even in a seemingly-obvious case like this, so > >> explicit "size >> tg" might be preferable. > > > > My experience has been gcc is smart enough. I checked and there's only > > one divide and it is the prior one. But I'll change it anyways. > > Now that I think about it, the case that frustrated me may have had the > shift and divide in separate statements - that's probably a lot harder > to analyse than a single expression. Either way, the simple right shift > is easier to read IMO. > > +} > + > while (iova < end) { > if (i == CMDQ_BATCH_ENTRIES) { > arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, false); > i = 0; > } > > +if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { > +/* > + * On each iteration of the loop, the range is 5 > bits > + * worth of the aligned size remaining. > + * The range in pages is: > + * > + * range = (num_pages & (0x1f << __ffs(num_pages))) > + */ > +unsigned long scale, num; > + > +/* Determine the power of 2 multiple number of > pages */ > +scale = __ffs(num_pages); > +cmd.tlbi.scale = scale; > + > +/* Determine how many chunks of 2^scale size we > have */ > +num = (num_pages >> scale) & > CMDQ_TLBI_RANGE_NUM_MAX; > +cmd.tlbi.num = num - 1; > + > +/* range is num * 2^scale * pgsize */ > +granule = num << (scale + tg); > + > +/*
Re: [RFC PATCH] iommu/iova: Support limiting IOVA alignment
On 19/02/2020 11:22 pm, Liam Mark wrote: On Wed, 19 Feb 2020, Will Deacon wrote: On Mon, Feb 17, 2020 at 04:46:14PM +, Robin Murphy wrote: On 14/02/2020 8:30 pm, Liam Mark wrote: When the IOVA framework applies IOVA alignment it aligns all IOVAs to the smallest PAGE_SIZE order which is greater than or equal to the requested IOVA size. We support use cases that requires large buffers (> 64 MB in size) to be allocated and mapped in their stage 1 page tables. However, with this alignment scheme we find ourselves running out of IOVA space for 32 bit devices, so we are proposing this config, along the similar vein as CONFIG_CMA_ALIGNMENT for CMA allocations. As per [1], I'd really like to better understand the allocation patterns that lead to such a sparsely-occupied address space to begin with, given that the rbtree allocator is supposed to try to maintain locality as far as possible, and the rcaches should further improve on that. Are you also frequently cycling intermediate-sized buffers which are smaller than 64MB but still too big to be cached? Are there a lot of non-power-of-two allocations? Right, information on the allocation pattern would help with this change and also the choice of IOVA allocation algorithm. Without it, we're just shooting in the dark. Thanks for the responses. I am looking into how much of our allocation pattern details I can share. My general understanding is that this issue occurs on a 32bit devices which have additional restrictions on the IOVA range they can use within those 32bits. An example is a use case which involves allocating a lot of buffers ~80MB is size, the current algorithm will require an alignment of 128MB for those buffers. My understanding is that it simply can't accommodate the number of 80MB buffers that are required because the of amount of IOVA space which can't be used because of the 128MB alignment requirement. OK, that's a case I can understand :) Add CONFIG_IOMMU_LIMIT_IOVA_ALIGNMENT to limit the alignment of IOVAs to some desired PAGE_SIZE order, specified by CONFIG_IOMMU_IOVA_ALIGNMENT. This helps reduce the impact of fragmentation caused by the current IOVA alignment scheme, and gives better IOVA space utilization. Even if the general change did prove reasonable, this IOVA allocator is not owned by the DMA API, so entirely removing the option of strict size-alignment feels a bit uncomfortable. Personally I'd replace the bool argument with an actual alignment value to at least hand the authority out to individual callers. Furthermore, even in DMA API terms, is anyone really ever going to bother tuning that config? Since iommu-dma is supposed to be a transparent layer, arguably it shouldn't behave unnecessarily differently from CMA, so simply piggy-backing off CONFIG_CMA_ALIGNMENT would seem logical. Agreed, reusing CONFIG_CMA_ALIGNMENT makes a lot of sense here as callers relying on natural alignment of DMA buffer allocations already have to deal with that limitation. We could fix it as an optional parameter at init time (init_iova_domain()), and have the DMA IOMMU implementation pass it in there. My concern with using CONFIG_CMA_ALIGNMENT alignment is that for us this would either involve further fragmenting our CMA regions (moving our CMA max alignment from 1MB to max 2MB) or losing so of our 2MB IOVA block mappings (changing our IOVA max alignment form 2MB to 1MB). At least for us CMA allocations are often not DMA mapped into stage 1 page tables so moving the CMA max alignment to 2MB in our case would, I think, only provide the disadvantage of having to increase the size our CMA regions to accommodate this large alignment (which isn’t optimal for memory utilization since CMA regions can't satisfy unmovable page allocations). As an alternative would it be possible for the dma-iommu layer to use the size of the allocation and the domain pgsize_bitmap field to pick a max IOVA alignment, which it can pass in for that IOVA allocation, which will maximize block mappings but not waste IOVA space? Given that we already have DMA_ATTR_ALOC_SINGLE_PAGES for video drivers and suchlike that know enough to know they want "large buffer" allocation behaviour, would it suffice to have a similar attribute that says "I'm not too fussed about alignment"? That way there's no visible change for anyone who doesn't opt in and might be relying on the existing behaviour, intentionally or otherwise. Then if necessary, the implementation can consider both flags together to decide whether to try to round down to the next block size or just shove it in anywhere. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: question about iommu_need_mapping
On Thu Feb 20 20, Jerry Snitselaar wrote: On Thu Feb 20 20, Lu Baolu wrote: Hi Jerry, On 2020/2/20 7:55, Jerry Snitselaar wrote: Is it possible for a device to end up with dev->archdata.iommu == NULL on iommu_need_mapping in the following instance: 1. iommu_group has dma domain for default 2. device gets private identity domain in intel_iommu_add_device 3. iommu_need_mapping gets called with that device. 4. dmar_remove_one_dev_info sets dev->archdata.iommu = NULL via unlink_domain_info. 5. request_default_domain_for_dev exits after checking that group->default_domain exists, and group->default_domain->type is dma. 6. iommu_request_dma_domain_for_dev returns 0 from request_default_domain_for_dev and a private dma domain isn't created for the device. Yes. It's possible. The case I was seeing went away with commit 9235cb13d7d1 ("iommu/vt-d: Allow devices with RMRRs to use identity domain"), because it changed which domain the group and devices were using, but it seems like it is still a possibility with the code. Baolu, you mentioned possibly removing the domain switch. Commit 98b2fffb5e27 ("iommu/vt-d: Handle 32bit device with identity default domain") makes it sound like the domain switch is required. It's more "nice to have" than "required" if the iommu driver doesn't disable swiotlb explicitly. The device access of system memory higher than the device's addressing capability could go through the bounced buffer implemented in swiotlb. Best regards, baolu Hi Baolu, Would this mean switching to bounce_dma_ops instead? Never mind. I see that it would go into the dma_direct code. Regards, Jerry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 04/11] cpufreq: Remove Calxeda driver
On 2/18/20 11:13 AM, Rob Herring wrote: Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux...@vger.kernel.org Signed-off-by: Rob Herring --- Acked-by: Mark Langsdorf ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 02/11] ata: Remove Calxeda AHCI driver
On 2/18/20 11:13 AM, Rob Herring wrote: Cc: Jens Axboe Cc: linux-...@vger.kernel.org Signed-off-by: Rob Herring --- Acked-by: Mark Langsdorf ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] dma-mapping: support setting memory uncached in place
On 20/02/2020 5:01 pm, Christoph Hellwig wrote: We currently only support remapping memory as uncached through vmap or a magic uncached segment provided by some architectures. But there is a simpler and much better way available on some architectures where we can just remap the memory in place. The advantages are: 1) no aliasing is possible, which prevents speculating into the cached alias 2) there is no need to allocate new ptes and thus no need for a special pre-allocated pool of memory that can be used with GFP_ATOMIC DMA allocations The downside is that architectures must provide a way to set arbitrary pages uncached in the kernel mapping, which might not be possible on architecture that have a special implicit kernel mapping, and requires splitting of huge page kernel mappings where they exist. Signed-off-by: Christoph Hellwig --- include/linux/dma-noncoherent.h | 3 +++ kernel/dma/Kconfig | 8 kernel/dma/direct.c | 28 ++-- 3 files changed, 29 insertions(+), 10 deletions(-) diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index ca9b5770caee..0820ec58f119 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -111,4 +111,7 @@ static inline void arch_dma_prep_coherent(struct page *page, size_t size) void *uncached_kernel_address(void *addr); void *cached_kernel_address(void *addr); +int arch_dma_set_uncached(void *cpu_addr, size_t size); +void arch_dma_clear_uncached(void *cpu_addr, size_t size); + #endif /* _LINUX_DMA_NONCOHERENT_H */ diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..7bc0b77f1243 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -83,6 +83,14 @@ config DMA_DIRECT_REMAP bool select DMA_REMAP +# +# Should be selected if the architecture can remap memory from the page +# allocator and CMA as uncached and provides the arch_dma_set_uncached and +# arch_dma_clear_uncached helpers +# +config ARCH_HAS_DMA_SET_UNCACHED + bool + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 6af7ae83c4ad..73fe65a4cbc0 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -169,11 +169,8 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); - if (!ret) { - dma_free_contiguous(dev, page, size); - return ret; - } - + if (!ret) + goto out_free_pages; memset(ret, 0, size); goto done; } @@ -186,8 +183,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, * so log an error and fail. */ dev_info(dev, "Rejecting highmem page from CMA.\n"); - dma_free_contiguous(dev, page, size); - return NULL; + goto out_free_pages; } ret = page_address(page); @@ -196,10 +192,15 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, memset(ret, 0, size); - if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && - dma_alloc_need_uncached(dev, attrs)) { + if (dma_alloc_need_uncached(dev, attrs)) { arch_dma_prep_coherent(page, size); - ret = uncached_kernel_address(ret); + + if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) { + if (!arch_dma_set_uncached(ret, size)) + goto out_free_pages; + } else if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT)) { + ret = uncached_kernel_address(ret); Hmm, would we actually need to keep ARCH_HAS_UNCACHED_SEGMENT? If arch_dma_set_uncached() returned void*/ERR_PTR instead, then it could work for both cases (with arch_dma_clear_uncached() being a no-op for segments). Robin. + } } done: if (force_dma_unencrypted(dev)) @@ -207,6 +208,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; +out_free_pages: + dma_free_contiguous(dev, page, size); + return NULL; } void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, @@ -230,6 +234,8 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) vunmap(cpu_addr); + else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) + arch_dma_clear_uncached(cpu_addr, size);
provide in-place uncached remapping for dma-direct (resend)
Hi all, this series provides support for remapping places uncached in-place in the generic dma-direct code, and moves openrisc over from its own in-place remapping scheme. The arm64 folks also had interest in such a scheme to avoid problems with speculating into cache aliases. Also all architectures that always use small page mappings for the kernel and have non-coherent DMA should look into enabling this scheme, as it is much more efficient than the vmap remapping. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] openrisc: use the generic in-place uncached DMA allocator
Switch openrisc to use the dma-direct allocator and just provide the hooks for setting memory uncached or cached. Signed-off-by: Christoph Hellwig --- arch/openrisc/Kconfig | 1 + arch/openrisc/kernel/dma.c | 51 +- 2 files changed, 7 insertions(+), 45 deletions(-) diff --git a/arch/openrisc/Kconfig b/arch/openrisc/Kconfig index 1928e061ff96..041fff4326dc 100644 --- a/arch/openrisc/Kconfig +++ b/arch/openrisc/Kconfig @@ -7,6 +7,7 @@ config OPENRISC def_bool y select ARCH_32BIT_OFF_T + select ARCH_HAS_DMA_SET_UNCACHED select ARCH_HAS_SYNC_DMA_FOR_DEVICE select OF select OF_EARLY_FLATTREE diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c index adec711ad39d..c73d2b3ae267 100644 --- a/arch/openrisc/kernel/dma.c +++ b/arch/openrisc/kernel/dma.c @@ -11,8 +11,6 @@ * Copyright (C) 2010-2011 Jonas Bonn * * DMA mapping callbacks... - * As alloc_coherent is the only DMA callback being used currently, that's - * the only thing implemented properly. The rest need looking into... */ #include @@ -67,62 +65,25 @@ static const struct mm_walk_ops clear_nocache_walk_ops = { .pte_entry = page_clear_nocache, }; -/* - * Alloc "coherent" memory, which for OpenRISC means simply uncached. - * - * This function effectively just calls __get_free_pages, sets the - * cache-inhibit bit on those pages, and makes sure that the pages are - * flushed out of the cache before they are used. - * - * If the NON_CONSISTENT attribute is set, then this function just - * returns "normal", cachable memory. - * - * There are additional flags WEAK_ORDERING and WRITE_COMBINE to take - * into consideration here, too. All current known implementations of - * the OR1K support only strongly ordered memory accesses, so that flag - * is being ignored for now; uncached but write-combined memory is a - * missing feature of the OR1K. - */ -void * -arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t gfp, unsigned long attrs) +int arch_dma_set_uncached(void *cpu_addr, size_t size) { - unsigned long va; - void *page; - - page = alloc_pages_exact(size, gfp | __GFP_ZERO); - if (!page) - return NULL; - - /* This gives us the real physical address of the first page. */ - *dma_handle = __pa(page); - - va = (unsigned long)page; + unsigned long va = (unsigned long)cpu_addr; /* * We need to iterate through the pages, clearing the dcache for * them and setting the cache-inhibit bit. */ - if (walk_page_range(_mm, va, va + size, _nocache_walk_ops, - NULL)) { - free_pages_exact(page, size); - return NULL; - } - - return (void *)va; + return walk_page_range(_mm, va, va + size, _nocache_walk_ops, + NULL); } -void -arch_dma_free(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_handle, unsigned long attrs) +void arch_dma_clear_uncached(void *cpu_addr, size_t size) { - unsigned long va = (unsigned long)vaddr; + unsigned long va = (unsigned long)cpu_addr; /* walk_page_range shouldn't be able to fail here */ WARN_ON(walk_page_range(_mm, va, va + size, _nocache_walk_ops, NULL)); - - free_pages_exact(vaddr, size); } void arch_sync_dma_for_device(phys_addr_t addr, size_t size, -- 2.24.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] dma-mapping: support setting memory uncached in place
We currently only support remapping memory as uncached through vmap or a magic uncached segment provided by some architectures. But there is a simpler and much better way available on some architectures where we can just remap the memory in place. The advantages are: 1) no aliasing is possible, which prevents speculating into the cached alias 2) there is no need to allocate new ptes and thus no need for a special pre-allocated pool of memory that can be used with GFP_ATOMIC DMA allocations The downside is that architectures must provide a way to set arbitrary pages uncached in the kernel mapping, which might not be possible on architecture that have a special implicit kernel mapping, and requires splitting of huge page kernel mappings where they exist. Signed-off-by: Christoph Hellwig --- include/linux/dma-noncoherent.h | 3 +++ kernel/dma/Kconfig | 8 kernel/dma/direct.c | 28 ++-- 3 files changed, 29 insertions(+), 10 deletions(-) diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index ca9b5770caee..0820ec58f119 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -111,4 +111,7 @@ static inline void arch_dma_prep_coherent(struct page *page, size_t size) void *uncached_kernel_address(void *addr); void *cached_kernel_address(void *addr); +int arch_dma_set_uncached(void *cpu_addr, size_t size); +void arch_dma_clear_uncached(void *cpu_addr, size_t size); + #endif /* _LINUX_DMA_NONCOHERENT_H */ diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..7bc0b77f1243 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -83,6 +83,14 @@ config DMA_DIRECT_REMAP bool select DMA_REMAP +# +# Should be selected if the architecture can remap memory from the page +# allocator and CMA as uncached and provides the arch_dma_set_uncached and +# arch_dma_clear_uncached helpers +# +config ARCH_HAS_DMA_SET_UNCACHED + bool + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 6af7ae83c4ad..73fe65a4cbc0 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -169,11 +169,8 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); - if (!ret) { - dma_free_contiguous(dev, page, size); - return ret; - } - + if (!ret) + goto out_free_pages; memset(ret, 0, size); goto done; } @@ -186,8 +183,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, * so log an error and fail. */ dev_info(dev, "Rejecting highmem page from CMA.\n"); - dma_free_contiguous(dev, page, size); - return NULL; + goto out_free_pages; } ret = page_address(page); @@ -196,10 +192,15 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, memset(ret, 0, size); - if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && - dma_alloc_need_uncached(dev, attrs)) { + if (dma_alloc_need_uncached(dev, attrs)) { arch_dma_prep_coherent(page, size); - ret = uncached_kernel_address(ret); + + if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) { + if (!arch_dma_set_uncached(ret, size)) + goto out_free_pages; + } else if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT)) { + ret = uncached_kernel_address(ret); + } } done: if (force_dma_unencrypted(dev)) @@ -207,6 +208,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; +out_free_pages: + dma_free_contiguous(dev, page, size); + return NULL; } void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, @@ -230,6 +234,8 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) vunmap(cpu_addr); + else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED)) + arch_dma_clear_uncached(cpu_addr, size); dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size); } @@ -238,6 +244,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && +
Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
On 20.02.20 17:31, Christoph Hellwig wrote: > On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote: >> >From a users perspective it makes absolutely perfect sense to use the >> bounce buffers when they are NEEDED. >> Forcing the user to specify iommu_platform just because you need bounce >> buffers >> really feels wrong. And obviously we have a severe performance issue >> because of the indirections. > > The point is that the user should not have to specify iommu_platform. I (and Halil) agree on that. From a user perspective we want to have the right thing in the guest without any command option. The iommu_plaform thing was indeed a first step to make things work. I might mis-read Halils patch, but isnt one effect of this patch set that things WILL work without iommu_platform? > We need to make sure any new hypervisor (especially one that might require > bounce buffering) always sets it, as was a rather bogus legacy hack > that isn't extensibe for cases that for example require bounce buffering. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
On Thu, Feb 20, 2020 at 05:23:20PM +0100, Christian Borntraeger wrote: > >From a users perspective it makes absolutely perfect sense to use the > bounce buffers when they are NEEDED. > Forcing the user to specify iommu_platform just because you need bounce > buffers > really feels wrong. And obviously we have a severe performance issue > because of the indirections. The point is that the user should not have to specify iommu_platform. We need to make sure any new hypervisor (especially one that might require bounce buffering) always sets it, as was a rather bogus legacy hack that isn't extensibe for cases that for example require bounce buffering. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: question about iommu_need_mapping
On Thu Feb 20 20, Lu Baolu wrote: Hi Jerry, On 2020/2/20 7:55, Jerry Snitselaar wrote: Is it possible for a device to end up with dev->archdata.iommu == NULL on iommu_need_mapping in the following instance: 1. iommu_group has dma domain for default 2. device gets private identity domain in intel_iommu_add_device 3. iommu_need_mapping gets called with that device. 4. dmar_remove_one_dev_info sets dev->archdata.iommu = NULL via unlink_domain_info. 5. request_default_domain_for_dev exits after checking that group->default_domain exists, and group->default_domain->type is dma. 6. iommu_request_dma_domain_for_dev returns 0 from request_default_domain_for_dev and a private dma domain isn't created for the device. Yes. It's possible. The case I was seeing went away with commit 9235cb13d7d1 ("iommu/vt-d: Allow devices with RMRRs to use identity domain"), because it changed which domain the group and devices were using, but it seems like it is still a possibility with the code. Baolu, you mentioned possibly removing the domain switch. Commit 98b2fffb5e27 ("iommu/vt-d: Handle 32bit device with identity default domain") makes it sound like the domain switch is required. It's more "nice to have" than "required" if the iommu driver doesn't disable swiotlb explicitly. The device access of system memory higher than the device's addressing capability could go through the bounced buffer implemented in swiotlb. Best regards, baolu Hi Baolu, Would this mean switching to bounce_dma_ops instead? Regards, Jerry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
On 20.02.20 17:11, Christoph Hellwig wrote: > On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote: >> Currently force_dma_unencrypted() is only used by the direct >> implementation of the DMA API, and thus resides in dma-direct.h. But >> there is nothing dma-direct specific about it: if one was -- for >> whatever reason -- to implement custom DMA ops that have to in the >> encrypted/protected scenarios dma-direct currently deals with, one would >> need exactly this kind of information. > > I really don't think it has business being anywhre else, and your completely > bogus second patch just proves the point. >From a users perspective it makes absolutely perfect sense to use the bounce buffers when they are NEEDED. Forcing the user to specify iommu_platform just because you need bounce buffers really feels wrong. And obviously we have a severe performance issue because of the indirections. Now: I understand that you want to get this fixes differently, but maybe you could help to outline how this could be fixed proper. Christian ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
On Thu, Feb 20, 2020 at 05:06:06PM +0100, Halil Pasic wrote: > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c > index 867c7ebd3f10..fafc8f924955 100644 > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev) > if (!virtio_has_iommu_quirk(vdev)) > return true; > > + if (force_dma_unencrypted(>dev)) > + return true; Hell no. This is a detail of the platform DMA direct implementation. Drivers have no business looking at this flag, and virtio finally needs to be fixed to use the DMA API properly for everything but legacy devices. No amount of desparate hacks is going to fix that fundamental problem, and I'm starting to get really sick and tired of all the crap patches published in this area. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
On Thu, Feb 20, 2020 at 05:06:05PM +0100, Halil Pasic wrote: > Currently force_dma_unencrypted() is only used by the direct > implementation of the DMA API, and thus resides in dma-direct.h. But > there is nothing dma-direct specific about it: if one was -- for > whatever reason -- to implement custom DMA ops that have to in the > encrypted/protected scenarios dma-direct currently deals with, one would > need exactly this kind of information. I really don't think it has business being anywhre else, and your completely bogus second patch just proves the point. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/2] virtio: decouple protected guest RAM form VIRTIO_F_IOMMU_PLATFORM
Currently if one intends to run a memory protection enabled VM with virtio devices and linux as the guest OS, one needs to specify the VIRTIO_F_IOMMU_PLATFORM flag for each virtio device to make the guest linux use the DMA API, which in turn handles the memory encryption/protection stuff if the guest decides to turn itself into a protected one. This however makes no sense due to multiple reasons: * The device is not changed by the fact that the guest RAM is protected. The so called IOMMU bypass quirk is not affected. * This usage is not congruent with standardised semantics of VIRTIO_F_IOMMU_PLATFORM. Guest memory protected is an orthogonal reason for using DMA API in virtio (orthogonal with respect to what is expressed by VIRTIO_F_IOMMU_PLATFORM). This series aims to decouple 'have to use DMA API because my (guest) RAM is protected' and 'have to use DMA API because the device told me VIRTIO_F_IOMMU_PLATFORM'. Please find more detailed explanations about the conceptual aspects in the individual patches. There is however also a very practical problem that is addressed by this series. For vhost-net the feature VIRTIO_F_IOMMU_PLATFORM has the following side effect The vhost code assumes it the addresses on the virtio descriptor ring are not guest physical addresses but iova's, and insists on doing a translation of these regardless of what transport is used (e.g. whether we emulate a PCI or a CCW device). (For details see commit 6b1e6cc7855b "vhost: new device IOTLB API".) On s390 this results in severe performance degradation (c.a. factor 10). BTW with ccw I/O there is (architecturally) no IOMMU, so the whole address translation makes no sense in the context of virtio-ccw. Halil Pasic (2): mm: move force_dma_unencrypted() to mem_encrypt.h virtio: let virtio use DMA API when guest RAM is protected drivers/virtio/virtio_ring.c | 3 +++ include/linux/dma-direct.h | 9 - include/linux/mem_encrypt.h | 10 ++ 3 files changed, 13 insertions(+), 9 deletions(-) base-commit: ca7e1fd1026c5af6a533b4b5447e1d2f153e28f2 -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] mm: move force_dma_unencrypted() to mem_encrypt.h
Currently force_dma_unencrypted() is only used by the direct implementation of the DMA API, and thus resides in dma-direct.h. But there is nothing dma-direct specific about it: if one was -- for whatever reason -- to implement custom DMA ops that have to in the encrypted/protected scenarios dma-direct currently deals with, one would need exactly this kind of information. More importantly, virtio has to use DMA API (so that the memory encryption (x86) or protection (power, s390) is handled) under the very same circumstances force_dma_unencrypted() returns true. Furthermore, the in these cases the reason to go via the DMA API is distinct, compared to the reason indicated by VIRTIO_F_IOMMU_PLATFORM: we need to use DMA API independently of the device's properties with regards to access to memory. I.e. the addresses in the descriptors are still guest physical addresses, the device may still be implemented by a SMP CPU, and thus the device shall use those without any further translation. See [1]. Let's move force_dma_unencrypted() the so virtio, or other implementations of DMA ops can make the right decisions. [1] https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-416 (In the spec VIRTIO_F_IOMMU_PLATFORM is called VIRTIO_F_ACCESS_PLATFORM). Signed-off-by: Halil Pasic Tested-by: Ram Pai Tested-by: Michael Mueller --- include/linux/dma-direct.h | 9 - include/linux/mem_encrypt.h | 10 ++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 24b8684aa21d..590b15d881b0 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -26,15 +26,6 @@ static inline phys_addr_t __dma_to_phys(struct device *dev, dma_addr_t dev_addr) } #endif /* !CONFIG_ARCH_HAS_PHYS_TO_DMA */ -#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED -bool force_dma_unencrypted(struct device *dev); -#else -static inline bool force_dma_unencrypted(struct device *dev) -{ - return false; -} -#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */ - /* * If memory encryption is supported, phys_to_dma will set the memory encryption * bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h index 5c4a18a91f89..64a48c4b01a2 100644 --- a/include/linux/mem_encrypt.h +++ b/include/linux/mem_encrypt.h @@ -22,6 +22,16 @@ static inline bool mem_encrypt_active(void) { return false; } #endif /* CONFIG_ARCH_HAS_MEM_ENCRYPT */ +struct device; +#ifdef CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED +bool force_dma_unencrypted(struct device *dev); +#else +static inline bool force_dma_unencrypted(struct device *dev) +{ + return false; +} +#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */ + #ifdef CONFIG_AMD_MEM_ENCRYPT /* * The __sme_set() and __sme_clr() macros are useful for adding or removing -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] virtio: let virtio use DMA API when guest RAM is protected
Currently the advanced guest memory protection technologies (AMD SEV, powerpc secure guest technology and s390 Protected VMs) abuse the VIRTIO_F_IOMMU_PLATFORM flag to make virtio core use the DMA API, which is in turn necessary, to make IO work with guest memory protection. But VIRTIO_F_IOMMU_PLATFORM a.k.a. VIRTIO_F_ACCESS_PLATFORM is really a different beast: with virtio devices whose implementation runs on an SMP CPU we are still fine with doing all the usual optimizations, it is just that we need to make sure that the memory protection mechanism does not get in the way. The VIRTIO_F_ACCESS_PLATFORM mandates more work on the side of the guest (and possibly he host side as well) than we actually need. An additional benefit of teaching the guest to make the right decision (and use DMA API) on it's own is: removing the need, to mandate special VM configuration for guests that may run with protection. This is especially interesting for s390 as VIRTIO_F_IOMMU_PLATFORM pushes all the virtio control structures into the first 2G of guest memory: something we don't necessarily want to do per-default. Signed-off-by: Halil Pasic Tested-by: Ram Pai Tested-by: Michael Mueller --- drivers/virtio/virtio_ring.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 867c7ebd3f10..fafc8f924955 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -243,6 +243,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev) if (!virtio_has_iommu_quirk(vdev)) return true; + if (force_dma_unencrypted(>dev)) + return true; + /* Otherwise, we are left to guess. */ /* * In theory, it's possible to have a buggy QEMU-supposed -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] iommu/vt-d: consider real PCI device when checking if mapping is needed
Hi, On 2020/2/20 18:06, Daniel Drake wrote: On Wed, Feb 19, 2020 at 11:40 AM Lu Baolu wrote: With respect, this is problematical. The parent and all subdevices share a single translation entry. The DMA mask should be consistent. Otherwise, for example, subdevice A has 64-bit DMA capability and uses an identity domain for DMA translation. While subdevice B has 32-bit DMA capability and is forced to switch to DMA domain. Subdevice A will be impacted without any notification. Looking closer, this problematic codepath may already be happening for VMD, under intel_iommu_add_device(). Consider this function running for a VMD subdevice, we hit: domain = iommu_get_domain_for_dev(dev); I can't quite grasp the code flow here, but domain->type now always seems to return the domain type of the real DMA device, which seems like pretty reasonable behaviour. if (domain->type == IOMMU_DOMAIN_DMA) { and as detailed in previous mails, the real VMD device seems to be in a DMA domain by default, so we continue. if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) { Now we checked the default domain type of the subdevice. This seems rather likely to return IDENTITY because that's effectively the default type... ret = iommu_request_dm_for_dev(dev); if (ret) { dmar_remove_one_dev_info(dev); dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN; domain_add_dev_info(si_domain, dev); dev_info(dev, "Device uses a private identity domain.\n"); } } and now we're doing the bad stuff that Lu pointed out: we only have one mapping shared for all the subdevices, so if we end up changing it for one subdevice, we're likely to be breaking another. Yes. Agreed. By the way, do all subdevices stay in a same iommu group? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4] iommu/vt-d: consider real PCI device when checking if mapping is needed
> On Wed, Feb 19, 2020 at 11:40 AM Lu Baolu wrote: > > With respect, this is problematical. The parent and all subdevices share > > a single translation entry. The DMA mask should be consistent. > > > > Otherwise, for example, subdevice A has 64-bit DMA capability and uses > > an identity domain for DMA translation. While subdevice B has 32-bit DMA > > capability and is forced to switch to DMA domain. Subdevice A will be > > impacted without any notification. Looking closer, this problematic codepath may already be happening for VMD, under intel_iommu_add_device(). Consider this function running for a VMD subdevice, we hit: domain = iommu_get_domain_for_dev(dev); I can't quite grasp the code flow here, but domain->type now always seems to return the domain type of the real DMA device, which seems like pretty reasonable behaviour. if (domain->type == IOMMU_DOMAIN_DMA) { and as detailed in previous mails, the real VMD device seems to be in a DMA domain by default, so we continue. if (device_def_domain_type(dev) == IOMMU_DOMAIN_IDENTITY) { Now we checked the default domain type of the subdevice. This seems rather likely to return IDENTITY because that's effectively the default type... ret = iommu_request_dm_for_dev(dev); if (ret) { dmar_remove_one_dev_info(dev); dmar_domain->flags |= DOMAIN_FLAG_LOSE_CHILDREN; domain_add_dev_info(si_domain, dev); dev_info(dev, "Device uses a private identity domain.\n"); } } and now we're doing the bad stuff that Lu pointed out: we only have one mapping shared for all the subdevices, so if we end up changing it for one subdevice, we're likely to be breaking another. In this case iommu_request_dm_for_dev() returns -EBUSY without doing anything and the following private identity code fortunately seems to have no consequential effects - the real DMA device continues to operate in the DMA domain, and all subdevice DMA requests go through the DMA mapping codepath. That's probably why VMD appears to be working fine anyway, but this seems fragile. The following changes enforce that the real DMA device is in the DMA domain, and avoid the intel_iommu_add_device() codepaths that would try to change it to a different domain type. Let me know if I'm on the right lines... --- drivers/iommu/intel-iommu.c | 16 drivers/pci/controller/intel-nvme-remap.c | 6 ++ 2 files changed, 22 insertions(+) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 9644a5b3e0ae..8872b8d1780d 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2911,6 +2911,9 @@ static int device_def_domain_type(struct device *dev) if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); + if (pci_real_dma_dev(pdev) != pdev) + return IOMMU_DOMAIN_DMA; + if (device_is_rmrr_locked(dev)) return IOMMU_DOMAIN_DMA; @@ -5580,6 +5583,7 @@ static bool intel_iommu_capable(enum iommu_cap cap) static int intel_iommu_add_device(struct device *dev) { + struct device *real_dev = dev; struct dmar_domain *dmar_domain; struct iommu_domain *domain; struct intel_iommu *iommu; @@ -5591,6 +5595,17 @@ static int intel_iommu_add_device(struct device *dev) if (!iommu) return -ENODEV; + if (dev_is_pci(dev)) + real_dev = _real_dma_dev(to_pci_dev(dev))->dev; + + if (real_dev != dev) { + domain = iommu_get_domain_for_dev(real_dev); + if (domain->type != IOMMU_DOMAIN_DMA) { + dev_err(dev, "Real DMA device not in DMA domain; can't handle DMA\n"); + return -ENODEV; + } + } + iommu_device_link(>iommu, dev); if (translation_pre_enabled(iommu)) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH] iommu/dma: Allow drivers to reserve an iova range
On Wed, Feb 19, 2020 at 12:06:28PM -0800, isa...@codeaurora.org wrote: > On 2020-02-19 03:15, Will Deacon wrote: > > On Tue, Feb 18, 2020 at 05:57:18PM -0800, isa...@codeaurora.org wrote: > > > Does this mean that the driver should be managing the IOVA space and > > > mappings for this device using the IOMMU API? If so, is the > > > rationale for > > > this because the device driver can have the information of what IOVA > > > ranges > > > can and cannot be used? Shouldn't there be a generic way of > > > informing an > > > IOMMU driver about these reserved ranges? Perhaps through a device > > > tree > > > property, instead of deferring this type of management to the driver? > > > > Before we dive into designing that, can you please clarify whether the > > reserved IOVA range applies to all DMA masters mastering through a > > particular SMMU, or whether it's just about one specific master? I was > > assuming the former, but wanted to be sure. > > > This situation currently applies to one master. Interesting. Is it problematic if the range is reserved for all masters sharing that SMMU? Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: sytem reboots when initializing Edirol FA-101 firewire audio interface
Hi, On Wed, Feb 19, 2020 at 06:19:56AM -0800, Steve Morris wrote: > I'm running Arch Linux x86_64 > > My system consistently reboots when I power on my FA-101 when running > kernels 5.5.1-4. Downgrading to 5.4.15 allows everything to work > properly. > > Here's the outpu of: > journalctl | grep -E 'Reboot|firewire|fw|bebob|alsa|jack' > > This is a good init under 5.4.15: > -- Reboot -- > powered on interface while running 5.4.15, initialized properly > Feb 18 18:35:37 hostname kernel: firewire_ohci :05:00.0: enabling device > (0080 -> 0083) > Feb 18 18:35:37 hostname kernel: firewire_ohci :05:00.0: added OHCI v1.10 > device as card 0, 4 IR + 8 IT contexts, quirks 0x11 > Feb 18 18:35:37 hostname kernel: firewire_core :05:00.0: created device > fw0: GUID 00112277, S400 > Feb 18 18:35:37 hostname systemd-udevd[541]: controlC0: Process > '/usr/bin/alsactl restore 0' failed with exit code 99. > Feb 18 18:35:37 hostname systemd-udevd[541]: controlC1: Process > '/usr/bin/alsactl restore 1' failed with exit code 99. > Feb 18 18:35:38 hostname kernel: amdgpu: [powerplay] smu driver if version = > 0x0033, smu fw if version = 0x0035, smu fw version = 0x002a3200 > (42.50.0) > Feb 18 18:36:08 hostname kernel: firewire_ohci :05:00.0: isochronous > cycle inconsistent > Feb 18 18:36:08 hostname kernel: firewire_core :05:00.0: created device > fw1: GUID 0040abc20bc1, S400 > Feb 18 18:36:08 hostname kernel: firewire_core :05:00.0: phy config: new > root=ffc1, gap_count=5 > Feb 18 18:36:10 hostname kernel: firewire_core :05:00.0: BM lock failed > (timeout), making local node (ffc0) root > Feb 18 18:36:10 hostname kernel: firewire_core :05:00.0: phy config: new > root=ffc0, gap_count=5 > Feb 18 18:36:12 hostname systemd-udevd[1532]: controlC2: Process > '/usr/bin/alsactl restore 2' failed with exit code 99. > Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: transaction failed: no ack > Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: fail to get an input for > MSU in plug 7: -5 > Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: transaction failed: no ack > Feb 18 18:36:15 hostname kernel: snd-bebob fw1.0: fail to get an input for > MSU in plug 7: -5 > > This is a bad init under 5.5.4: > -- Reboot -- > Powered on interface when machine was running. > Feb 18 18:13:37 hostname kernel: firewire_ohci :05:00.0: enabling device > (0080 -> 0083) > Feb 18 18:13:37 hostname kernel: firewire_ohci :05:00.0: added OHCI v1.10 > device as card 0, 4 IR + 8 IT contexts, quirks 0x11 > Feb 18 18:13:37 hostname kernel: firewire_core :05:00.0: created device > fw0: GUID 00112277, S400 > Feb 18 18:13:37 hostname kernel: audit: type=1130 audit(1582078417.360:3): > pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=ufw comm="systemd" > exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > Feb 18 18:13:37 hostname audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 > ses=4294967295 msg='unit=ufw comm="systemd" exe="/usr/lib/systemd/systemd" > hostname=? addr=? terminal=? res=success' > Feb 18 18:13:37 hostname systemd-udevd[539]: controlC0: Process > '/usr/bin/alsactl restore 0' failed with exit code 99. > Feb 18 18:13:37 hostname systemd-udevd[530]: controlC1: Process > '/usr/bin/alsactl restore 1' failed with exit code 99. > Feb 18 18:13:38 hostname kernel: amdgpu: [powerplay] smu driver if version = > 0x0033, smu fw if version = 0x0035, smu fw version = 0x002a3200 > (42.50.0) > Feb 18 18:19:45 hostname kernel: firewire_core :05:00.0: phy config: new > root=ffc1, gap_count=5 > Feb 18 18:19:48 hostname kernel: firewire_core :05:00.0: phy config: new > root=ffc1, gap_count=5 > Feb 18 18:19:48 hostname kernel: firewire_core :05:00.0: created device > fw1: GUID 0040abc20bc1, S400 > Feb 18 18:19:52 hostname systemd-udevd[1682]: controlC2: Process > '/usr/bin/alsactl restore 2' failed with exit code 99. > Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: transaction failed: no ack > Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: fail to get an input for > MSU in plug 7: -5 > Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: transaction failed: no ack > Feb 18 18:20:08 hostname kernel: snd-bebob fw1.0: fail to get an input for > MSU in plug 7: -5 > Feb 18 18:20:12 hostname kernel: firewire_core :05:00.0: phy config: new > root=ffc1, gap_count=5 > Feb 18 18:21:24 hostname kernel: firewire_ohci :05:00.0: AMD-Vi: Event > logged [IO_PAGE_FAULT domain=0x0001 address=0xd508 flags=0x] > Followed by 125 more O_PAGE_FAULT and then the reboot, > > I'll be happy to provide additional information if needed. Hm. I think your system rushed into too complicated issue, At least, three software stacks are related to your issue: - ALSA bebob driver(snd-bebob) and helper module(snd-firewire-lib) - Linux firewire core(firewire-core) and 1394 OHCI controller driver (firewire-ohci) - Driver