Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API
On Wed, Jan 27, 2021 at 10:35:02PM +0100, Ricardo Ribalda wrote: > I have used the current API here: > > https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/log/?h=uvc-noncontiguous > > And I think the result is very clean. Great work! > > I have tested it in X86 and in arm64, with similar performance as the > previous patchset. > > Maybe you want to cherry pick that commit into your series I can also > send the patch to the list for review if you prefer so. > > At least in 5.11 rc5 I the same dmadev worked in arm64 and x86. Given that Sergey mentioned the v4l patchset needs more work I'll use your commit as the example for sending out my series. Thanks for all the work! ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
Hi, On 1/28/21 4:01 AM, Chuck Lever wrote: From: Isaac J. Manjarres Add support for IOMMU drivers to have their own map_sg() callbacks. This completes the path for having iommu_map_sg() invoke an IOMMU driver's map_sg() callback, which can then invoke the io-pgtable map_sg() callback with the entire scatter-gather list, so that it can be processed entirely in the io-pgtable layer. For IOMMU drivers that do not provide a callback, the default implementation of iterating through the scatter-gather list, while calling iommu_map() will be used. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan [ cel: adjusted new iotlb_sync_map call site ] Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c | 12 include/linux/iommu.h |5 + 2 files changed, 17 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ed879a4d7fac..bd7adbd0339b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2551,6 +2551,18 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, unsigned int i = 0; int ret; + if (ops->map_sg) { + ret = ops->map_sg(domain, iova, sg, nents, prot, gfp, ); + + if (ops->iotlb_sync_map) + ops->iotlb_sync_map(domain, iova, mapped); + + if (ret) + goto out_err; + + return mapped; + } + while (i <= nents) { phys_addr_t s_phys = sg_phys(sg); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cd5f35022a25..667edc7b034a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -192,6 +192,8 @@ struct iommu_iotlb_gather { * @attach_dev: attach device to an iommu domain * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain + * @map_sg: map a scatter-gather list of physically contiguous chunks to + * an iommu domain. * @unmap: unmap a physically contiguous memory region from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware @@ -243,6 +245,9 @@ struct iommu_ops { void (*detach_dev)(struct iommu_domain *domain, struct device *dev); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_sg)(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot, + gfp_t gfp, size_t *mapped); I might be oversensitive. But what if the vendor iommu driver uses iova beyond the pre-allocated range? It's safer if we could pass an iova length parameter as well, so that the iommu driver could do some sanity check. Best regards, baolu size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
On 1/28/21 4:00 AM, Chuck Lever wrote: From: Yong Wu iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole mapping. This patch adds iova and size as the parameters in it. then the IOMMU driver could flush tlb with the whole range once after iova mapping to improve performance. Signed-off-by: Yong Wu Reviewed-by: Robin Murphy Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c |4 ++-- drivers/iommu/tegra-gart.c |7 +-- include/linux/iommu.h |3 ++- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c304a6a30d42..3d099a31ddca 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova, ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); if (ret == 0 && ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); + ops->iotlb_sync_map(domain, iova, size); return ret; } @@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, } if (ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); + ops->iotlb_sync_map(domain, iova, mapped); return mapped; out_err: diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c index fac720273889..05e8e19b8269 100644 --- a/drivers/iommu/tegra-gart.c +++ b/drivers/iommu/tegra-gart.c @@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev, return 0; } -static void gart_iommu_sync_map(struct iommu_domain *domain) +static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova, + size_t size) { FLUSH_GART_REGS(gart_handle); } @@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain) static void gart_iommu_sync(struct iommu_domain *domain, struct iommu_iotlb_gather *gather) { - gart_iommu_sync_map(domain); + size_t length = gather->end - gather->start; + + gart_iommu_sync_map(domain, gather->start, length); } static const struct iommu_ops gart_iommu_ops = { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b3f0e2018c62..9ce0aa9e236b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -246,7 +246,8 @@ struct iommu_ops { size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); - void (*iotlb_sync_map)(struct iommu_domain *domain); + void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, + size_t size); void (*iotlb_sync)(struct iommu_domain *domain, struct iommu_iotlb_gather *iotlb_gather); phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); Reviewed-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map
Hi, On 1/28/21 4:00 AM, Chuck Lever wrote: From: Yong Wu In the end of __iommu_map, It alway call iotlb_sync_map. This patch moves iotlb_sync_map out from __iommu_map since it is unnecessary to call this for each sg segment especially iotlb_sync_map is flush tlb all currently. Add a little helper _iommu_map for this. Signed-off-by: Yong Wu Reviewed-by: Robin Murphy Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ffeebda8d6de..c304a6a30d42 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, size -= pgsize; } - if (ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); - /* unroll mapping in case something went wrong */ if (ret) iommu_unmap(domain, orig_iova, orig_size - size); @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, return ret; } +static int _iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +{ + const struct iommu_ops *ops = domain->ops; + int ret; + + ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); + if (ret == 0 && ops->iotlb_sync_map) + ops->iotlb_sync_map(domain); Previous code called iotlb_sync_map() regardless of whether the mapping was successful or not. Here the logic changes, and the callback is only called if mapping successfully. Any reason? It's safer to always call iotlb_sync_map() even in failed mapping case. In this way, we can ensure the consistency of cache as much as possible. Best regards, baolu + + return ret; +} + int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { might_sleep(); - return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); + return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); } EXPORT_SYMBOL_GPL(iommu_map); int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { - return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC); + return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC); } EXPORT_SYMBOL_GPL(iommu_map_atomic); @@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, struct scatterlist *sg, unsigned int nents, int prot, gfp_t gfp) { + const struct iommu_ops *ops = domain->ops; size_t len = 0, mapped = 0; phys_addr_t start; unsigned int i = 0; @@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, sg = sg_next(sg); } + if (ops->iotlb_sync_map) + ops->iotlb_sync_map(domain); return mapped; out_err: ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RFC 0/9] Possible set of VT-d optimizations
Hi Chuck, On 1/28/21 4:00 AM, Chuck Lever wrote: Hi- This collection of patches seems to get the best throughtput results so far. The NFS WRITE result is fully restored, and the NFS READ result is very close to fully restored. Very glad to see this. Thanks! Can you please add below link if you have a next version? https://lore.kernel.org/linux-iommu/d81314ed-5673-44a6-b597-090e3cb83...@oracle.com/ It helps people to understand what regression you have seen. Best regards, baolu Children see throughput for 12 initial writers = 5008474.03 kB/sec Parent sees throughput for 12 initial writers = 4996927.80 kB/sec Min throughput per process = 416956.88 kB/sec Max throughput per process = 417910.22 kB/sec Avg throughput per process = 417372.84 kB/sec Min xfer= 1046272.00 kB CPU Utilization: Wall time2.515CPU time1.996CPU utilization 79.37 % Children see throughput for 12 rewriters= 5020584.59 kB/sec Parent sees throughput for 12 rewriters = 5012539.29 kB/sec Min throughput per process = 417799.00 kB/sec Max throughput per process = 419082.22 kB/sec Avg throughput per process = 418382.05 kB/sec Min xfer= 1046528.00 kB CPU utilization: Wall time2.507CPU time2.024CPU utilization 80.73 % Children see throughput for 12 readers = 5805484.25 kB/sec Parent sees throughput for 12 readers = 5799535.68 kB/sec Min throughput per process = 482888.16 kB/sec Max throughput per process = 48.16 kB/sec Avg throughput per process = 483790.35 kB/sec Min xfer= 1045760.00 kB CPU utilization: Wall time2.167CPU time1.964CPU utilization 90.63 % Children see throughput for 12 re-readers = 5812227.16 kB/sec Parent sees throughput for 12 re-readers= 5803793.06 kB/sec Min throughput per process = 483242.97 kB/sec Max throughput per process = 485724.41 kB/sec Avg throughput per process = 484352.26 kB/sec Min xfer= 1043456.00 kB CPU utilization: Wall time2.161CPU time1.976CPU utilization 91.45 % I've included a simple-minded implementation of a map_sg op for the Intel IOMMU. This is nothing more than a copy of the loop in __iommu_map_sg() with the call to __iommu_map() replaced with a call to intel_iommu_map(). --- Chuck Lever (1): iommu/vt-d: Introduce map_sg() for Intel IOMMUs Isaac J. Manjarres (5): iommu/io-pgtable: Introduce map_sg() as a page table op iommu/io-pgtable-arm: Hook up map_sg() iommu/io-pgtable-arm-v7s: Hook up map_sg() iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers iommu/arm-smmu: Hook up map_sg() Lu Baolu (1): iommu/vt-d: Add iotlb_sync_map callback Yong Wu (2): iommu: Move iotlb_sync_map out from __iommu_map iommu: Add iova and size as parameters in iotlb_sync_map drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 drivers/iommu/intel/iommu.c | 131 -- drivers/iommu/io-pgtable-arm-v7s.c| 90 ++ drivers/iommu/io-pgtable-arm.c| 86 + drivers/iommu/iommu.c | 47 +++-- drivers/iommu/tegra-gart.c| 7 +- include/linux/iommu.h | 16 +++- 7 files changed, 353 insertions(+), 43 deletions(-) -- Chuck Lever ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/3] Adding offset keeping option when mapping data via SWIOTLB.*
NVMe driver and other applications may depend on the data offset to operate correctly. Currently when unaligned data is mapped via SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. This patch adds an option to make sure the mapped data preserves its offset of the orginal addrss. Without the patch when creating xfs formatted disk on NVMe backends, with swiotlb=force in kernel boot option, creates the following error: meta-data=/dev/nvme2n1 isize=512agcount=4, agsize=131072 blks = sectsz=512 attr=2, projid32bit=1 = crc=1finobt=1, sparse=0, rmapbt=0, refl ink=0 data = bsize=4096 blocks=524288, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 ftype=1 log =internal log bsize=4096 blocks=2560, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 mkfs.xfs: pwrite failed: Input/output error Jianxiong Gao (3): Adding page_offset_mask to device_dma_parameters Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero. Adding device_dma_parameters->offset_preserve_mask to NVMe driver. drivers/nvme/host/pci.c | 4 include/linux/device.h | 1 + include/linux/dma-mapping.h | 17 + kernel/dma/swiotlb.c| 16 +++- 4 files changed, 37 insertions(+), 1 deletion(-) -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3] iommu/vt-d: do not use flush-queue when caching-mode is on
On 1/28/21 1:53 AM, Nadav Amit wrote: From: Nadav Amit When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on. Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one. Getting batched TLB flushes to use page-specific invalidations again in such circumstances is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability. Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization. Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by: Nadav Amit Cc: David Woodhouse Cc: Lu Baolu Cc: Joerg Roedel Cc: Will Deacon Cc: sta...@vger.kernel.org --- v2->v3: * Fix the fixes tag in the commit-log (Lu). * Minor English rephrasing of the commit-log. v1->v2: * disable flush queue for all domains if caching-mode is on for any IOMMU (Lu). --- drivers/iommu/intel/iommu.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 788119c5b021..de3dd617cf60 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5373,6 +5373,36 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain, return ret; } +static bool domain_use_flush_queue(void) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + bool r = true; + + if (intel_iommu_strict) + return false; + + /* +* The flush queue implementation does not perform page-selective +* invalidations that are required for efficient TLB flushes in virtual +* environments. The benefit of batching is likely to be much lower than +* the overhead of synchronizing the virtual and physical IOMMU +* page-tables. +*/ + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!cap_caching_mode(iommu->cap)) + continue; + + pr_warn_once("IOMMU batching is disabled due to virtualization"); + r = false; + break; + } + rcu_read_unlock(); + + return r; +} + static int intel_iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr attr, void *data) @@ -5383,7 +5413,7 @@ intel_iommu_domain_get_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !intel_iommu_strict; + *(int *)data = domain_use_flush_queue(); return 0; default: return -ENODEV; Acked-by: Lu Baolu Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
> -Original Message- > From: Jason Gunthorpe [mailto:j...@ziepe.ca] > Sent: Wednesday, January 27, 2021 7:20 AM > To: Song Bao Hua (Barry Song) > Cc: Wangzhou (B) ; Greg Kroah-Hartman > ; Arnd Bergmann ; Zhangfei Gao > ; linux-accelerat...@lists.ozlabs.org; > linux-ker...@vger.kernel.org; iommu@lists.linux-foundation.org; > linux...@kvack.org; Liguozhu (Kenneth) ; chensihang > (A) > Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device > > On Tue, Jan 26, 2021 at 01:26:45AM +, Song Bao Hua (Barry Song) wrote: > > > On Mon, Jan 25, 2021 at 11:35:22PM +, Song Bao Hua (Barry Song) wrote: > > > > > > > > On Mon, Jan 25, 2021 at 10:21:14PM +, Song Bao Hua (Barry Song) > wrote: > > > > > > mlock, while certainly be able to prevent swapping out, it won't > > > > > > be able to stop page moving due to: > > > > > > * memory compaction in alloc_pages() > > > > > > * making huge pages > > > > > > * numa balance > > > > > > * memory compaction in CMA > > > > > > > > > > Enabling those things is a major reason to have SVA device in the > > > > > first place, providing a SW API to turn it all off seems like the > > > > > wrong direction. > > > > > > > > I wouldn't say this is a major reason to have SVA. If we read the > > > > history of SVA and papers, people would think easy programming due > > > > to data struct sharing between cpu and device, and process space > > > > isolation in device would be the major reasons for SVA. SVA also > > > > declares it supports zero-copy while zero-copy doesn't necessarily > > > > depend on SVA. > > > > > > Once you have to explicitly make system calls to declare memory under > > > IO, you loose all of that. > > > > > > Since you've asked the app to be explicit about the DMAs it intends to > > > do, there is not really much reason to use SVA for those DMAs anymore. > > > > Let's see a non-SVA case. We are not using SVA, we can have > > a memory pool by hugetlb or pin, and app can allocate memory > > from this pool, and get stable I/O performance on the memory > > from the pool. But device has its separate page table which > > is not bound with this process, thus lacking the protection > > of process space isolation. Plus, CPU and device are using > > different address. > > So you are relying on the platform to do the SVA for the device? > Sorry for late response. uacce and its userspace framework UADK depend on SVA, leveraging the enhanced security by isolated process address space. This patch is mainly an extension for performance optimization to get stable high-performance I/O on pinned memory even though the hardware supports IO page fault to get pages back after swapping out or page migration. But IO page fault will cause serious latency jitter for high-speed I/O. For slow speed device, they don't need to use this extension. > This feels like it goes back to another topic where I felt the SVA > setup uAPI should be shared and not buried into every driver's unique > ioctls. > > Having something like this in a shared SVA system is somewhat less > strange. Sounds reasonable. On the other hand, uacce seems to be an common uAPI for SVA, and probably the only one for this moment. uacce is a framework not a specific driver as any accelerators can hook into this framework as long as a device provides uacce_ops and register itself by uacce_register(). Uacce, for itself, doesn't bind with any specific hardware. So uacce interfaces are kind of common uAPI :-) > > Jason Thanks Barry ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/3] Adding page_offset_mask to device_dma_parameters
Some devices rely on the address offset in a page to function correctly (NVMe driver as an example). These devices may use a different page size than the Linux kernel. The address offset has to be preserved upon mapping, and in order to do so, we need to record the page_offset_mask first. Signed-off-by: Jianxiong Gao --- include/linux/device.h | 1 + include/linux/dma-mapping.h | 17 + 2 files changed, 18 insertions(+) diff --git a/include/linux/device.h b/include/linux/device.h index 1779f90eeb4c..f44e0659fc66 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -292,6 +292,7 @@ struct device_dma_parameters { */ unsigned int max_segment_size; unsigned long segment_boundary_mask; + unsigned int page_offset_mask; }; /** diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 2e49996a8f39..5529a31fefba 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -500,6 +500,23 @@ static inline int dma_set_seg_boundary(struct device *dev, unsigned long mask) return -EIO; } +static inline unsigned int dma_get_page_offset_mask(struct device *dev) +{ + if (dev->dma_parms) + return dev->dma_parms->page_offset_mask; + return 0; +} + +static inline int dma_set_page_offset_mask(struct device *dev, + unsigned int page_offset_mask) +{ + if (dev->dma_parms) { + dev->dma_parms->page_offset_mask = page_offset_mask; + return 0; + } + return -EIO; +} + static inline int dma_get_cache_alignment(void) { #ifdef ARCH_DMA_MINALIGN -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/3] Add swiotlb offset preserving mapping when dma_dma_parameters->page_offset_mask is non zero.
For devices that need to preserve address offset on mapping through swiotlb, this patch adds offset preserving based on page_offset_mask and keeps the offset if the mask is non zero. This is needed for device drivers like NVMe. Signed-off-by: Jianxiong Gao --- kernel/dma/swiotlb.c | 25 ++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 7c42df6e6100..4cab35f2c9bc 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -468,7 +468,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start); unsigned long flags; phys_addr_t tlb_addr; - unsigned int nslots, stride, index, wrap; + unsigned int nslots, stride, index, wrap, page_offset_mask, page_offset; int i; unsigned long mask; unsigned long offset_slots; @@ -500,12 +500,16 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT); + page_offset_mask = dma_get_page_offset_mask(hwdev); + page_offset = orig_addr & page_offset_mask; + alloc_size += page_offset; + /* * For mappings greater than or equal to a page, we limit the stride * (and hence alignment) to a page size. */ nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; - if (alloc_size >= PAGE_SIZE) + if ((alloc_size >= PAGE_SIZE) || (page_offset_mask > (1 << IO_TLB_SHIFT))) stride = (1 << (PAGE_SHIFT - IO_TLB_SHIFT)); else stride = 1; @@ -583,6 +587,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr, */ for (i = 0; i < nslots; i++) io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT); + /* +* When keeping the offset of the original data, we need to advance +* the tlb_addr by the offset of orig_addr. +*/ + tlb_addr += page_offset; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE); @@ -598,7 +607,9 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, enum dma_data_direction dir, unsigned long attrs) { unsigned long flags; - int i, count, nslots = ALIGN(alloc_size, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; + unsigned int num_page_offset_slabs, page_offset_mask = dma_get_page_offset_mask(hwdev); + int i, count; + int nslots = ALIGN(alloc_size + tlb_addr & page_offset_mask, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT; int index = (tlb_addr - io_tlb_start) >> IO_TLB_SHIFT; phys_addr_t orig_addr = io_tlb_orig_addr[index]; @@ -610,6 +621,14 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, ((dir == DMA_FROM_DEVICE) || (dir == DMA_BIDIRECTIONAL))) swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_FROM_DEVICE); + /* +* When dma_get_page_offset_mask is used, we may have padded more slabs +* when padding exceeds one slab. We need to move index back to the +* beginning of the padding. +*/ + num_page_offset_slabs = (tlb_addr & page_offset_mask) / (1 << IO_TLB_SHIFT); + index -= num_page_offset_slabs; + /* * Return the buffer to the free list by setting the corresponding * entries to indicate the number of contiguous entries available. -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 3/3] Adding device_dma_parameters->offset_preserve_mask to NVMe driver.
NVMe driver relies on the address offset to function properly. This patch adds the offset preserve mask to NVMe driver when mapping via dma_map_sg_attrs and unmapping via nvme_unmap_sg. The mask depends on the page size defined by CC.MPS register of NVMe controller. Signed-off-by: Jianxiong Gao --- drivers/nvme/host/pci.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 856aa31931c1..0b23f04068be 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -580,12 +580,15 @@ static void nvme_free_sgls(struct nvme_dev *dev, struct request *req) static void nvme_unmap_sg(struct nvme_dev *dev, struct request *req) { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); - + if (dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1)) + dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n"); if (is_pci_p2pdma_page(sg_page(iod->sg))) pci_p2pdma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req)); else dma_unmap_sg(dev->dev, iod->sg, iod->nents, rq_dma_dir(req)); + if (dma_set_page_offset_mask(dev->dev, 0)) + dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n"); } static void nvme_unmap_data(struct nvme_dev *dev, struct request *req) @@ -842,7 +845,7 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, { struct nvme_iod *iod = blk_mq_rq_to_pdu(req); blk_status_t ret = BLK_STS_RESOURCE; - int nr_mapped; + int nr_mapped, offset_ret; if (blk_rq_nr_phys_segments(req) == 1) { struct bio_vec bv = req_bvec(req); @@ -868,12 +871,24 @@ static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req, if (!iod->nents) goto out_free_sg; + offset_ret = dma_set_page_offset_mask(dev->dev, NVME_CTRL_PAGE_SIZE - 1); + if (offset_ret) { + dev_warn(dev->dev, "dma_set_page_offset_mask failed to set offset\n"); + goto out_free_sg; + } + if (is_pci_p2pdma_page(sg_page(iod->sg))) nr_mapped = pci_p2pdma_map_sg_attrs(dev->dev, iod->sg, iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN); else nr_mapped = dma_map_sg_attrs(dev->dev, iod->sg, iod->nents, rq_dma_dir(req), DMA_ATTR_NO_WARN); + + offset_ret = dma_set_page_offset_mask(dev->dev, 0); + if (offset_ret) { + dev_warn(dev->dev, "dma_set_page_offset_mask failed to reset offset\n"); + goto out_free_sg; + } if (!nr_mapped) goto out_free_sg; -- 2.27.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/2] vfio/iommu_type1: Fix some sanity checks in detach group
On Fri, 22 Jan 2021 17:26:35 +0800 Keqian Zhu wrote: > vfio_sanity_check_pfn_list() is used to check whether pfn_list and > notifier are empty when remove the external domain, so it makes a > wrong assumption that only external domain will use the pinning > interface. > > Now we apply the pfn_list check when a vfio_dma is removed and apply > the notifier check when all domains are removed. > > Fixes: a54eb55045ae ("vfio iommu type1: Add support for mediated devices") > Signed-off-by: Keqian Zhu > --- > drivers/vfio/vfio_iommu_type1.c | 33 ++--- > 1 file changed, 10 insertions(+), 23 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 161725395f2f..d8c10f508321 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -957,6 +957,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, > struct vfio_dma *dma, > > static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma) > { > + WARN_ON(!RB_EMPTY_ROOT(>pfn_list)); > vfio_unmap_unpin(iommu, dma, true); > vfio_unlink_dma(iommu, dma); > put_task_struct(dma->task); > @@ -2250,23 +2251,6 @@ static void vfio_iommu_unmap_unpin_reaccount(struct > vfio_iommu *iommu) > } > } > > -static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu) > -{ > - struct rb_node *n; > - > - n = rb_first(>dma_list); > - for (; n; n = rb_next(n)) { > - struct vfio_dma *dma; > - > - dma = rb_entry(n, struct vfio_dma, node); > - > - if (WARN_ON(!RB_EMPTY_ROOT(>pfn_list))) > - break; > - } > - /* mdev vendor driver must unregister notifier */ > - WARN_ON(iommu->notifier.head); > -} > - > /* > * Called when a domain is removed in detach. It is possible that > * the removed domain decided the iova aperture window. Modify the > @@ -2366,10 +2350,10 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, > kfree(group); > > if (list_empty(>external_domain->group_list)) { > - vfio_sanity_check_pfn_list(iommu); > - > - if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > + WARN_ON(iommu->notifier.head); > vfio_iommu_unmap_unpin_all(iommu); > + } > > kfree(iommu->external_domain); > iommu->external_domain = NULL; > @@ -2403,10 +2387,12 @@ static void vfio_iommu_type1_detach_group(void > *iommu_data, >*/ > if (list_empty(>group_list)) { > if (list_is_singular(>domain_list)) { > - if (!iommu->external_domain) > + if (!iommu->external_domain) { > + WARN_ON(iommu->notifier.head); > vfio_iommu_unmap_unpin_all(iommu); > - else > + } else { > vfio_iommu_unmap_unpin_reaccount(iommu); > + } > } > iommu_domain_free(domain->domain); > list_del(>next); > @@ -2488,9 +2474,10 @@ static void vfio_iommu_type1_release(void *iommu_data) > struct vfio_iommu *iommu = iommu_data; > struct vfio_domain *domain, *domain_tmp; > > + WARN_ON(iommu->notifier.head); I don't see that this does any harm, but isn't it actually redundant? It seems vfio-core only calls the iommu backend release function after removing all the groups, so the tests in _detach_group should catch all cases. We're expecting the vfio bus/mdev driver to remove the notifier when a device is closed, which necessarily occurs before detaching the group. Thanks, Alex > + > if (iommu->external_domain) { > vfio_release_domain(iommu->external_domain, true); > - vfio_sanity_check_pfn_list(iommu); > kfree(iommu->external_domain); > } > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/5] kernel/dma: remove unnecessary unmap_kernel_range
Excerpts from Christoph Hellwig's message of January 27, 2021 5:10 pm: > On Tue, Jan 26, 2021 at 05:08:46PM -0500, Konrad Rzeszutek Wilk wrote: >> On Tue, Jan 26, 2021 at 02:54:01PM +1000, Nicholas Piggin wrote: >> > vunmap will remove ptes. >> >> Should there be some ASSERT after the vunmap to make sure that is the >> case? > > Not really. removing the PTEs is the whole point of vunmap. Everything > else is just house keeping. Agree. I did double check this and wrote a quick test to check ptes were there before the vunmap and cleared after, just to make sure I didn't make a silly mistake with the patch. But in general drivers should be able to trust code behind the API call will do the right thing. Such assertions should go in the vunmap() implementation as appropriate. Thanks, Nick ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API
Hi Christoph On Wed, Jan 27, 2021 at 4:56 PM . Christoph Hellwig wrote: > > On Wed, Jan 27, 2021 at 12:29:08AM +0100, Ricardo Ribalda wrote: > > - Is there any platform where dma_alloc_noncontiguos can fail? > > This is, !ops->alloc_noncontiguous and !dev->coherent_dma_mask > > If yes then we need to add a function to let the driver know in > > advance that it has to use the coherent allocator (usb_alloc_coherent > > for uvc) > > dev->coherent_dma_mask is set by the driver. So the only reason why > dma_alloc_noncontiguos will fail is because is because it can't > allocate any memory. > > > - In dma_alloc_noncontiguos, on the dma_alloc_pages fallback. If we > > have a device where the dma happens in only one direction, could not > > get more performance with DMA_FROM/TO_DEVICE instead of > > DMA_BIDIRECTIONAL ? > > Yes, we could probably do that. > > > > > > > Then I have tried to use the API, and I have encountered a problem: on > > uvcvideo the device passed to the memory allocator is different for > > DMA_PAGES and NON_CONTIGUOUS: > > https://github.com/ribalda/linux/blob/042cd497739f71c8d4a83a67ee970369e2baca4a/drivers/media/usb/uvc/uvc_video.c#L1236 > > > > I need to dig a bit tomorrow to figure out why this is, I have > > hardware to test both paths, so it should not be too difficult. > > I always found the USB dma alloc API a little weird, but we might have > to follow the scheme of the usb coherent wrappers there. I have used the current API here: https://git.kernel.org/pub/scm/linux/kernel/git/ribalda/linux.git/log/?h=uvc-noncontiguous And I think the result is very clean. Great work! I have tested it in X86 and in arm64, with similar performance as the previous patchset. Maybe you want to cherry pick that commit into your series I can also send the patch to the list for review if you prefer so. At least in 5.11 rc5 I the same dmadev worked in arm64 and x86. Best regards! -- Ricardo Ribalda ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 9/9] iommu/vt-d: Introduce map_sg() for Intel IOMMUs
Attempt to reduce indirect call overhead when mapping a substantial scatter-gather list. Signed-off-by: Chuck Lever --- drivers/iommu/intel/iommu.c | 37 + 1 file changed, 37 insertions(+) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 013097b6d55f..deae39f1477a 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4952,6 +4952,42 @@ static int intel_iommu_map(struct iommu_domain *domain, hpa >> VTD_PAGE_SHIFT, size, prot); } +static int intel_iommu_map_sg(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, + int prot, gfp_t gfp, size_t *mapped) +{ + unsigned int i = 0; + phys_addr_t start; + size_t len = 0; + int ret; + + while (i <= nents) { + phys_addr_t s_phys = sg_phys(sg); + + if (len && s_phys != start + len) { + ret = intel_iommu_map(domain, iova + *mapped, start, + len, prot, gfp); + if (ret) + return ret; + + *mapped += len; + len = 0; + } + + if (len) { + len += sg->length; + } else { + len = sg->length; + start = s_phys; + } + + if (++i < nents) + sg = sg_next(sg); + } + + return 0; +} + static size_t intel_iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *gather) @@ -5519,6 +,7 @@ const struct iommu_ops intel_iommu_ops = { .aux_detach_dev = intel_iommu_aux_detach_device, .aux_get_pasid = intel_iommu_aux_get_pasid, .map= intel_iommu_map, + .map_sg = intel_iommu_map_sg, .iotlb_sync_map = intel_iommu_iotlb_sync_map, .unmap = intel_iommu_unmap, .flush_iotlb_all= intel_flush_iotlb_all, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 4/9] iommu/io-pgtable: Introduce map_sg() as a page table op
From: Isaac J. Manjarres While mapping a scatter-gather list, iommu_map_sg() calls into the IOMMU driver through an indirect call, which can call into the io-pgtable code through another indirect call. This sequence of going through the IOMMU core code, the IOMMU driver, and finally the io-pgtable code, occurs for every element in the scatter-gather list, in the worse case, which is not optimal. Introduce a map_sg callback in the io-pgtable ops so that IOMMU drivers can invoke it with the complete scatter-gather list, so that it can be processed within the io-pgtable code entirely, reducing the number of indirect calls, and boosting overall iommu_map_sg() performance. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan Signed-off-by: Chuck Lever --- include/linux/io-pgtable.h |6 ++ 1 file changed, 6 insertions(+) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index ea727eb1a1a9..6d0e73172603 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -147,6 +147,9 @@ struct io_pgtable_cfg { * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers. * * @map: Map a physically contiguous memory region. + * @map_sg: Map a scatter-gather list of physically contiguous memory + *chunks. The mapped pointer argument is used to store how + *many bytes are mapped. * @unmap:Unmap a physically contiguous memory region. * @iova_to_phys: Translate iova to physical address. * @@ -156,6 +159,9 @@ struct io_pgtable_cfg { struct io_pgtable_ops { int (*map)(struct io_pgtable_ops *ops, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_sg)(struct io_pgtable_ops *ops, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot, + gfp_t gfp, size_t *mapped); size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova, size_t size, struct iommu_iotlb_gather *gather); phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 8/9] iommu/arm-smmu: Hook up map_sg()
From: Isaac J. Manjarres Now that everything is in place for iommu_map_sg() to defer mapping a scatter-gather list to the io-pgtable layer, implement the map_sg() callback in the SMMU driver, so that iommu_map_sg() can invoke it with the entire scatter-gather list that will be mapped. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan Signed-off-by: Chuck Lever --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index d8c6bfde6a61..52acc6858512 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1208,6 +1208,24 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, return ret; } +static int arm_smmu_map_sg(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot, + gfp_t gfp, size_t *mapped) +{ + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; + int ret; + + if (!ops) + return -ENODEV; + + arm_smmu_rpm_get(smmu); + ret = ops->map_sg(ops, iova, sg, nents, prot, gfp, mapped); + arm_smmu_rpm_put(smmu); + + return ret; +} + static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *gather) { @@ -1624,6 +1642,7 @@ static struct iommu_ops arm_smmu_ops = { .domain_free= arm_smmu_domain_free, .attach_dev = arm_smmu_attach_dev, .map= arm_smmu_map, + .map_sg = arm_smmu_map_sg, .unmap = arm_smmu_unmap, .flush_iotlb_all= arm_smmu_flush_iotlb_all, .iotlb_sync = arm_smmu_iotlb_sync, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
On Tue, 19 Jan 2021 18:52:03 +0800, Yong Wu wrote: > The commit e0d072782c73 ("dma-mapping: introduce DMA range map, > supplanting dma_pfn_offset") always update dma_range_map even though it was > already set, like in the sunxi_mbus driver. the issue is reported at [1]. > This patch avoid this(Updating it only when dev has valid dma-ranges). > > Meanwhile, dma_range_map contains the devices' dma_ranges information, > This patch moves dma_range_map before of_iommu_configure. The iommu > driver may need to know the dma_address requirements of its iommu > consumer devices. > > [1] > https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/ > > CC: Rob Herring > CC: Frank Rowand > Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting > dma_pfn_offset"), > Suggested-by: Robin Murphy > Signed-off-by: Yong Wu > Signed-off-by: Paul Kocialkowski > Reviewed-by: Rob Herring > --- > drivers/of/device.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > Applied, thanks! ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 3/9] iommu/vt-d: Add iotlb_sync_map callback
From: Lu Baolu Some Intel VT-d hardware implementations don't support memory coherency for page table walk (presented by the Page-Walk-coherency bit in the ecap register), so that software must flush the corresponding CPU cache lines explicitly after each page table entry update. The iommu_map_sg() code iterates through the given scatter-gather list and invokes iommu_map() for each element in the scatter-gather list, which calls into the vendor IOMMU driver through iommu_ops callback. As the result, a single sg mapping may lead to multiple cache line flushes, which leads to the degradation of I/O performance after the commit ("iommu/vt-d: Convert intel iommu driver to the iommu ops"). Fix this by adding iotlb_sync_map callback and centralizing the clflush operations after all sg mappings. Fixes: c588072bba6b5 ("iommu/vt-d: Convert intel iommu driver to the iommu ops") Reported-by: Chuck Lever Link: https://lore.kernel.org/linux-iommu/d81314ed-5673-44a6-b597-090e3cb83...@oracle.com/ Signed-off-by: Lu Baolu Cc: Robin Murphy [ cel: removed @first_pte, which is no longer used ] Signed-off-by: Chuck Lever --- drivers/iommu/intel/iommu.c | 90 +-- 1 file changed, 60 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index f665322a0991..013097b6d55f 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -2298,9 +2298,9 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long phys_pfn, unsigned long nr_pages, int prot) { - struct dma_pte *first_pte = NULL, *pte = NULL; unsigned int largepage_lvl = 0; unsigned long lvl_pages = 0; + struct dma_pte *pte = NULL; phys_addr_t pteval; u64 attr; @@ -2322,7 +2322,7 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, largepage_lvl = hardware_largepage_caps(domain, iov_pfn, phys_pfn, nr_pages); - first_pte = pte = pfn_to_dma_pte(domain, iov_pfn, _lvl); + pte = pfn_to_dma_pte(domain, iov_pfn, _lvl); if (!pte) return -ENOMEM; /* It is large page*/ @@ -2383,34 +2383,14 @@ __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, * recalculate 'pte' and switch back to smaller pages for the * end of the mapping, if the trailing size is not enough to * use another superpage (i.e. nr_pages < lvl_pages). +* +* We leave clflush for the leaf pte changes to iotlb_sync_map() +* callback. */ pte++; if (!nr_pages || first_pte_in_page(pte) || - (largepage_lvl > 1 && nr_pages < lvl_pages)) { - domain_flush_cache(domain, first_pte, - (void *)pte - (void *)first_pte); + (largepage_lvl > 1 && nr_pages < lvl_pages)) pte = NULL; - } - } - - return 0; -} - -static int -domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, - unsigned long phys_pfn, unsigned long nr_pages, int prot) -{ - int iommu_id, ret; - struct intel_iommu *iommu; - - /* Do the real mapping first */ - ret = __domain_mapping(domain, iov_pfn, phys_pfn, nr_pages, prot); - if (ret) - return ret; - - for_each_domain_iommu(iommu_id, domain) { - iommu = g_iommus[iommu_id]; - __mapping_notify_one(iommu, domain, iov_pfn, nr_pages); } return 0; @@ -4943,7 +4923,6 @@ static int intel_iommu_map(struct iommu_domain *domain, struct dmar_domain *dmar_domain = to_dmar_domain(domain); u64 max_addr; int prot = 0; - int ret; if (iommu_prot & IOMMU_READ) prot |= DMA_PTE_READ; @@ -4969,9 +4948,8 @@ static int intel_iommu_map(struct iommu_domain *domain, /* Round up size to next multiple of PAGE_SIZE, if it and the low bits of hpa would take us onto the next page */ size = aligned_nrpages(hpa, size); - ret = domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT, -hpa >> VTD_PAGE_SHIFT, size, prot); - return ret; + return __domain_mapping(dmar_domain, iova >> VTD_PAGE_SHIFT, + hpa >> VTD_PAGE_SHIFT, size, prot); } static size_t intel_iommu_unmap(struct iommu_domain *domain, @@ -5478,6 +5456,57 @@ static bool risky_device(struct pci_dev *pdev) return false; } +static void clflush_sync_map(struct dmar_domain *domain, unsigned long clf_pfn, +unsigned long clf_pages) +{ + struct dma_pte
[PATCH RFC 7/9] iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers
From: Isaac J. Manjarres Add support for IOMMU drivers to have their own map_sg() callbacks. This completes the path for having iommu_map_sg() invoke an IOMMU driver's map_sg() callback, which can then invoke the io-pgtable map_sg() callback with the entire scatter-gather list, so that it can be processed entirely in the io-pgtable layer. For IOMMU drivers that do not provide a callback, the default implementation of iterating through the scatter-gather list, while calling iommu_map() will be used. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan [ cel: adjusted new iotlb_sync_map call site ] Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c | 12 include/linux/iommu.h |5 + 2 files changed, 17 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ed879a4d7fac..bd7adbd0339b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2551,6 +2551,18 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, unsigned int i = 0; int ret; + if (ops->map_sg) { + ret = ops->map_sg(domain, iova, sg, nents, prot, gfp, ); + + if (ops->iotlb_sync_map) + ops->iotlb_sync_map(domain, iova, mapped); + + if (ret) + goto out_err; + + return mapped; + } + while (i <= nents) { phys_addr_t s_phys = sg_phys(sg); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cd5f35022a25..667edc7b034a 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -192,6 +192,8 @@ struct iommu_iotlb_gather { * @attach_dev: attach device to an iommu domain * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain + * @map_sg: map a scatter-gather list of physically contiguous chunks to + * an iommu domain. * @unmap: unmap a physically contiguous memory region from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware @@ -243,6 +245,9 @@ struct iommu_ops { void (*detach_dev)(struct iommu_domain *domain, struct device *dev); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_sg)(struct iommu_domain *domain, unsigned long iova, + struct scatterlist *sg, unsigned int nents, int prot, + gfp_t gfp, size_t *mapped); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 5/9] iommu/io-pgtable-arm: Hook up map_sg()
From: Isaac J. Manjarres Implement the map_sg io-pgtable op for the ARM LPAE io-pgtable code, so that IOMMU drivers can call it when they need to map a scatter-gather list. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan Signed-off-by: Chuck Lever --- drivers/iommu/io-pgtable-arm.c | 86 drivers/iommu/iommu.c | 12 +++--- include/linux/iommu.h |8 3 files changed, 101 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58e79b5..0c11529442b8 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -473,6 +473,91 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, return ret; } +static int arm_lpae_map_by_pgsize(struct io_pgtable_ops *ops, + unsigned long iova, phys_addr_t paddr, + size_t size, int iommu_prot, gfp_t gfp, + size_t *mapped) +{ + struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable_cfg *cfg = >iop.cfg; + arm_lpae_iopte *ptep = data->pgd; + int ret, lvl = data->start_level; + arm_lpae_iopte prot = arm_lpae_prot_to_pte(data, iommu_prot); + unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap); + long iaext = (s64)(iova + size - 1) >> cfg->ias; + size_t pgsize; + + if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) { + pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n", + iova, , size, min_pagesz); + return -EINVAL; + } + + if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) + iaext = ~iaext; + if (WARN_ON(iaext || (paddr + size - 1) >> cfg->oas)) + return -ERANGE; + + while (size) { + pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size); + ret = __arm_lpae_map(data, iova, paddr, pgsize, prot, lvl, ptep, +gfp); + if (ret) + return ret; + + iova += pgsize; + paddr += pgsize; + *mapped += pgsize; + size -= pgsize; + } + + return 0; +} + +static int arm_lpae_map_sg(struct io_pgtable_ops *ops, unsigned long iova, + struct scatterlist *sg, unsigned int nents, + int iommu_prot, gfp_t gfp, size_t *mapped) +{ + + size_t len = 0; + unsigned int i = 0; + int ret; + phys_addr_t start; + + *mapped = 0; + + /* If no access, then nothing to do */ + if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE))) + return 0; + + while (i <= nents) { + phys_addr_t s_phys = sg_phys(sg); + + if (len && s_phys != start + len) { + ret = arm_lpae_map_by_pgsize(ops, iova + *mapped, start, +len, iommu_prot, gfp, +mapped); + + if (ret) + return ret; + + len = 0; + } + + if (len) { + len += sg->length; + } else { + len = sg->length; + start = s_phys; + } + + if (++i < nents) + sg = sg_next(sg); + } + + return 0; +} + static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, arm_lpae_iopte *ptep) { @@ -750,6 +835,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) data->iop.ops = (struct io_pgtable_ops) { .map= arm_lpae_map, + .map_sg = arm_lpae_map_sg, .unmap = arm_lpae_unmap, .iova_to_phys = arm_lpae_iova_to_phys, }; diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 3d099a31ddca..ed879a4d7fac 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2346,8 +2346,8 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) } EXPORT_SYMBOL_GPL(iommu_iova_to_phys); -static size_t iommu_pgsize(struct iommu_domain *domain, - unsigned long addr_merge, size_t size) +size_t iommu_pgsize(unsigned long pgsize_bitmap, unsigned long addr_merge, + size_t size) { unsigned int pgsize_idx; size_t pgsize; @@ -2366,7 +2366,7 @@ static size_t iommu_pgsize(struct iommu_domain *domain, pgsize = (1UL << (pgsize_idx + 1)) - 1; /* throw away page sizes not supported by the hardware */ - pgsize &= domain->pgsize_bitmap; + pgsize &= pgsize_bitmap; /* make sure
[PATCH RFC 6/9] iommu/io-pgtable-arm-v7s: Hook up map_sg()
From: Isaac J. Manjarres Implement the map_sg io-pgtable op for the ARMv7s io-pgtable code, so that IOMMU drivers can call it when they need to map a scatter-gather list. Signed-off-by: Isaac J. Manjarres Tested-by: Sai Prakash Ranjan Signed-off-by: Chuck Lever --- drivers/iommu/io-pgtable-arm-v7s.c | 90 1 file changed, 90 insertions(+) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac948db7..8665dabb753b 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -545,6 +545,95 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, return ret; } +static int arm_v7s_map_by_pgsize(struct io_pgtable_ops *ops, +unsigned long iova, phys_addr_t paddr, +size_t size, int prot, gfp_t gfp, +size_t *mapped) +{ + struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops); + struct io_pgtable *iop = >iop; + struct io_pgtable_cfg *cfg = >cfg; + unsigned int min_pagesz = 1 << __ffs(cfg->pgsize_bitmap); + int ret; + size_t pgsize; + + if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) { + pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n", + iova, , size, min_pagesz); + return -EINVAL; + } + + if (WARN_ON((iova + size - 1) >= (1ULL << cfg->ias) || + (paddr + size - 1) >= (1ULL << cfg->oas))) + return -ERANGE; + + while (size) { + pgsize = iommu_pgsize(cfg->pgsize_bitmap, iova | paddr, size); + ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, + data->pgd, gfp); + + if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { + io_pgtable_tlb_flush_walk(>iop, iova, size, + ARM_V7S_BLOCK_SIZE(2)); + } else { + wmb(); + } + + if (ret) + return ret; + + iova += pgsize; + paddr += pgsize; + *mapped += pgsize; + size -= pgsize; + } + + return 0; +} + +static int arm_v7s_map_sg(struct io_pgtable_ops *ops, unsigned long iova, + struct scatterlist *sg, unsigned int nents, + int iommu_prot, gfp_t gfp, size_t *mapped) +{ + size_t len = 0; + unsigned int i = 0; + int ret; + phys_addr_t start; + + *mapped = 0; + + /* If no access, then nothing to do */ + if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE))) + return 0; + + while (i <= nents) { + phys_addr_t s_phys = sg_phys(sg); + + if (len && s_phys != start + len) { + ret = arm_v7s_map_by_pgsize(ops, iova + *mapped, start, + len, iommu_prot, gfp, + mapped); + + if (ret) + return ret; + + len = 0; + } + + if (len) { + len += sg->length; + } else { + len = sg->length; + start = s_phys; + } + + if (++i < nents) + sg = sg_next(sg); + } + + return 0; +} + static void arm_v7s_free_pgtable(struct io_pgtable *iop) { struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop); @@ -783,6 +872,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, data->iop.ops = (struct io_pgtable_ops) { .map= arm_v7s_map, + .map_sg = arm_v7s_map_sg, .unmap = arm_v7s_unmap, .iova_to_phys = arm_v7s_iova_to_phys, }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 2/9] iommu: Add iova and size as parameters in iotlb_sync_map
From: Yong Wu iotlb_sync_map allow IOMMU drivers tlb sync after completing the whole mapping. This patch adds iova and size as the parameters in it. then the IOMMU driver could flush tlb with the whole range once after iova mapping to improve performance. Signed-off-by: Yong Wu Reviewed-by: Robin Murphy Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c |4 ++-- drivers/iommu/tegra-gart.c |7 +-- include/linux/iommu.h |3 ++- 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c304a6a30d42..3d099a31ddca 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2443,7 +2443,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova, ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); if (ret == 0 && ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); + ops->iotlb_sync_map(domain, iova, size); return ret; } @@ -2575,7 +2575,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, } if (ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); + ops->iotlb_sync_map(domain, iova, mapped); return mapped; out_err: diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c index fac720273889..05e8e19b8269 100644 --- a/drivers/iommu/tegra-gart.c +++ b/drivers/iommu/tegra-gart.c @@ -261,7 +261,8 @@ static int gart_iommu_of_xlate(struct device *dev, return 0; } -static void gart_iommu_sync_map(struct iommu_domain *domain) +static void gart_iommu_sync_map(struct iommu_domain *domain, unsigned long iova, + size_t size) { FLUSH_GART_REGS(gart_handle); } @@ -269,7 +270,9 @@ static void gart_iommu_sync_map(struct iommu_domain *domain) static void gart_iommu_sync(struct iommu_domain *domain, struct iommu_iotlb_gather *gather) { - gart_iommu_sync_map(domain); + size_t length = gather->end - gather->start; + + gart_iommu_sync_map(domain, gather->start, length); } static const struct iommu_ops gart_iommu_ops = { diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b3f0e2018c62..9ce0aa9e236b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -246,7 +246,8 @@ struct iommu_ops { size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); - void (*iotlb_sync_map)(struct iommu_domain *domain); + void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, + size_t size); void (*iotlb_sync)(struct iommu_domain *domain, struct iommu_iotlb_gather *iotlb_gather); phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 1/9] iommu: Move iotlb_sync_map out from __iommu_map
From: Yong Wu In the end of __iommu_map, It alway call iotlb_sync_map. This patch moves iotlb_sync_map out from __iommu_map since it is unnecessary to call this for each sg segment especially iotlb_sync_map is flush tlb all currently. Add a little helper _iommu_map for this. Signed-off-by: Yong Wu Reviewed-by: Robin Murphy Signed-off-by: Chuck Lever --- drivers/iommu/iommu.c | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index ffeebda8d6de..c304a6a30d42 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2426,9 +2426,6 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, size -= pgsize; } - if (ops->iotlb_sync_map) - ops->iotlb_sync_map(domain); - /* unroll mapping in case something went wrong */ if (ret) iommu_unmap(domain, orig_iova, orig_size - size); @@ -2438,18 +2435,31 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, return ret; } +static int _iommu_map(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +{ + const struct iommu_ops *ops = domain->ops; + int ret; + + ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); + if (ret == 0 && ops->iotlb_sync_map) + ops->iotlb_sync_map(domain); + + return ret; +} + int iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { might_sleep(); - return __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); + return _iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL); } EXPORT_SYMBOL_GPL(iommu_map); int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot) { - return __iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC); + return _iommu_map(domain, iova, paddr, size, prot, GFP_ATOMIC); } EXPORT_SYMBOL_GPL(iommu_map_atomic); @@ -2533,6 +2543,7 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, struct scatterlist *sg, unsigned int nents, int prot, gfp_t gfp) { + const struct iommu_ops *ops = domain->ops; size_t len = 0, mapped = 0; phys_addr_t start; unsigned int i = 0; @@ -2563,6 +2574,8 @@ static size_t __iommu_map_sg(struct iommu_domain *domain, unsigned long iova, sg = sg_next(sg); } + if (ops->iotlb_sync_map) + ops->iotlb_sync_map(domain); return mapped; out_err: ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH RFC 0/9] Possible set of VT-d optimizations
Hi- This collection of patches seems to get the best throughtput results so far. The NFS WRITE result is fully restored, and the NFS READ result is very close to fully restored. Children see throughput for 12 initial writers = 5008474.03 kB/sec Parent sees throughput for 12 initial writers = 4996927.80 kB/sec Min throughput per process = 416956.88 kB/sec Max throughput per process = 417910.22 kB/sec Avg throughput per process = 417372.84 kB/sec Min xfer= 1046272.00 kB CPU Utilization: Wall time2.515CPU time1.996CPU utilization 79.37 % Children see throughput for 12 rewriters= 5020584.59 kB/sec Parent sees throughput for 12 rewriters = 5012539.29 kB/sec Min throughput per process = 417799.00 kB/sec Max throughput per process = 419082.22 kB/sec Avg throughput per process = 418382.05 kB/sec Min xfer= 1046528.00 kB CPU utilization: Wall time2.507CPU time2.024CPU utilization 80.73 % Children see throughput for 12 readers = 5805484.25 kB/sec Parent sees throughput for 12 readers = 5799535.68 kB/sec Min throughput per process = 482888.16 kB/sec Max throughput per process = 48.16 kB/sec Avg throughput per process = 483790.35 kB/sec Min xfer= 1045760.00 kB CPU utilization: Wall time2.167CPU time1.964CPU utilization 90.63 % Children see throughput for 12 re-readers = 5812227.16 kB/sec Parent sees throughput for 12 re-readers= 5803793.06 kB/sec Min throughput per process = 483242.97 kB/sec Max throughput per process = 485724.41 kB/sec Avg throughput per process = 484352.26 kB/sec Min xfer= 1043456.00 kB CPU utilization: Wall time2.161CPU time1.976CPU utilization 91.45 % I've included a simple-minded implementation of a map_sg op for the Intel IOMMU. This is nothing more than a copy of the loop in __iommu_map_sg() with the call to __iommu_map() replaced with a call to intel_iommu_map(). --- Chuck Lever (1): iommu/vt-d: Introduce map_sg() for Intel IOMMUs Isaac J. Manjarres (5): iommu/io-pgtable: Introduce map_sg() as a page table op iommu/io-pgtable-arm: Hook up map_sg() iommu/io-pgtable-arm-v7s: Hook up map_sg() iommu: Introduce map_sg() as an IOMMU op for IOMMU drivers iommu/arm-smmu: Hook up map_sg() Lu Baolu (1): iommu/vt-d: Add iotlb_sync_map callback Yong Wu (2): iommu: Move iotlb_sync_map out from __iommu_map iommu: Add iova and size as parameters in iotlb_sync_map drivers/iommu/arm/arm-smmu/arm-smmu.c | 19 drivers/iommu/intel/iommu.c | 131 -- drivers/iommu/io-pgtable-arm-v7s.c| 90 ++ drivers/iommu/io-pgtable-arm.c| 86 + drivers/iommu/iommu.c | 47 +++-- drivers/iommu/tegra-gart.c| 7 +- include/linux/iommu.h | 16 +++- 7 files changed, 353 insertions(+), 43 deletions(-) -- Chuck Lever ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
On 2021-01-27 19:09, Rob Herring wrote: On Wed, Jan 27, 2021 at 7:13 AM Robin Murphy wrote: [ + Christoph, Marek ] On 2021-01-27 13:00, Paul Kocialkowski wrote: Hi, On Tue 19 Jan 21, 18:52, Yong Wu wrote: The commit e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset") always update dma_range_map even though it was already set, like in the sunxi_mbus driver. the issue is reported at [1]. This patch avoid this(Updating it only when dev has valid dma-ranges). Meanwhile, dma_range_map contains the devices' dma_ranges information, This patch moves dma_range_map before of_iommu_configure. The iommu driver may need to know the dma_address requirements of its iommu consumer devices. Just a gentle ping on this issue, it would be nice to have this fix merged ASAP, in the next RC :) Ack to that - Rob, Frank, do you want to take this through the OF tree, or shall we take it through the DMA-mapping tree like the original culprit? I've already got some fixes queued up and can take it. Brilliant, thanks! Suggested-by doesn't mean you are happy with the implementation. So Acked-by or Reviewed-by? It still feels slightly awkward to give a tag to say "yes, this is exactly what I suggested", but for the avoidance of doubt, Reviewed-by: Robin Murphy Cheers, Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
On 2021-01-27 19:07, Rob Herring wrote: On Tue, Jan 19, 2021 at 4:52 AM Yong Wu wrote: The commit e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset") always update dma_range_map even though it was already set, like in the sunxi_mbus driver. the issue is reported at [1]. This patch avoid this(Updating it only when dev has valid dma-ranges). only when dev *doesn't* have valid dma-ranges? I believe the intent is to mean when a valid "dma-ranges" property is specified in DT. The implementation allows DT to take precedence even if platform code *has* already installed a dma_range_map, although we shouldn't expect that to ever happen (except perhaps in the wild early days of platform bringup). Robin. Meanwhile, dma_range_map contains the devices' dma_ranges information, This patch moves dma_range_map before of_iommu_configure. The iommu driver may need to know the dma_address requirements of its iommu consumer devices. [1] https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/ CC: Rob Herring CC: Frank Rowand Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset"), Suggested-by: Robin Murphy Signed-off-by: Yong Wu Signed-off-by: Paul Kocialkowski Reviewed-by: Rob Herring --- drivers/of/device.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/of/device.c b/drivers/of/device.c index aedfaaafd3e7..1122daa8e273 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -162,9 +162,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, mask = DMA_BIT_MASK(ilog2(end) + 1); dev->coherent_dma_mask &= mask; *dev->dma_mask &= mask; - /* ...but only set bus limit if we found valid dma-ranges earlier */ - if (!ret) + /* ...but only set bus limit and range map if we found valid dma-ranges earlier */ + if (!ret) { dev->bus_dma_limit = end; + dev->dma_range_map = map; + } coherent = of_dma_is_coherent(np); dev_dbg(dev, "device is%sdma coherent\n", @@ -172,6 +174,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, iommu = of_iommu_configure(dev, np, id); if (PTR_ERR(iommu) == -EPROBE_DEFER) { + /* Don't touch range map if it wasn't set from a valid dma-ranges */ + if (!ret) + dev->dma_range_map = NULL; kfree(map); return -EPROBE_DEFER; } @@ -181,7 +186,6 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); - dev->dma_range_map = map; return 0; } EXPORT_SYMBOL_GPL(of_dma_configure_id); -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
On Wed, Jan 27, 2021 at 7:13 AM Robin Murphy wrote: > > [ + Christoph, Marek ] > > On 2021-01-27 13:00, Paul Kocialkowski wrote: > > Hi, > > > > On Tue 19 Jan 21, 18:52, Yong Wu wrote: > >> The commit e0d072782c73 ("dma-mapping: introduce DMA range map, > >> supplanting dma_pfn_offset") always update dma_range_map even though it was > >> already set, like in the sunxi_mbus driver. the issue is reported at [1]. > >> This patch avoid this(Updating it only when dev has valid dma-ranges). > >> > >> Meanwhile, dma_range_map contains the devices' dma_ranges information, > >> This patch moves dma_range_map before of_iommu_configure. The iommu > >> driver may need to know the dma_address requirements of its iommu > >> consumer devices. > > > > Just a gentle ping on this issue, it would be nice to have this fix merged > > ASAP, in the next RC :) > > Ack to that - Rob, Frank, do you want to take this through the OF tree, > or shall we take it through the DMA-mapping tree like the original culprit? I've already got some fixes queued up and can take it. Suggested-by doesn't mean you are happy with the implementation. So Acked-by or Reviewed-by? Rob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
On Tue, Jan 19, 2021 at 4:52 AM Yong Wu wrote: > > The commit e0d072782c73 ("dma-mapping: introduce DMA range map, > supplanting dma_pfn_offset") always update dma_range_map even though it was > already set, like in the sunxi_mbus driver. the issue is reported at [1]. > This patch avoid this(Updating it only when dev has valid dma-ranges). only when dev *doesn't* have valid dma-ranges? > Meanwhile, dma_range_map contains the devices' dma_ranges information, > This patch moves dma_range_map before of_iommu_configure. The iommu > driver may need to know the dma_address requirements of its iommu > consumer devices. > > [1] > https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/ > > CC: Rob Herring > CC: Frank Rowand > Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting > dma_pfn_offset"), > Suggested-by: Robin Murphy > Signed-off-by: Yong Wu > Signed-off-by: Paul Kocialkowski > Reviewed-by: Rob Herring > --- > drivers/of/device.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/of/device.c b/drivers/of/device.c > index aedfaaafd3e7..1122daa8e273 100644 > --- a/drivers/of/device.c > +++ b/drivers/of/device.c > @@ -162,9 +162,11 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > mask = DMA_BIT_MASK(ilog2(end) + 1); > dev->coherent_dma_mask &= mask; > *dev->dma_mask &= mask; > - /* ...but only set bus limit if we found valid dma-ranges earlier */ > - if (!ret) > + /* ...but only set bus limit and range map if we found valid > dma-ranges earlier */ > + if (!ret) { > dev->bus_dma_limit = end; > + dev->dma_range_map = map; > + } > > coherent = of_dma_is_coherent(np); > dev_dbg(dev, "device is%sdma coherent\n", > @@ -172,6 +174,9 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > > iommu = of_iommu_configure(dev, np, id); > if (PTR_ERR(iommu) == -EPROBE_DEFER) { > + /* Don't touch range map if it wasn't set from a valid > dma-ranges */ > + if (!ret) > + dev->dma_range_map = NULL; > kfree(map); > return -EPROBE_DEFER; > } > @@ -181,7 +186,6 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > > arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); > > - dev->dma_range_map = map; > return 0; > } > EXPORT_SYMBOL_GPL(of_dma_configure_id); > -- > 2.18.0 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: do not use flush-queue when caching-mode is on
> On Jan 27, 2021, at 3:25 AM, Lu Baolu wrote: > > On 2021/1/27 14:17, Nadav Amit wrote: >> From: Nadav Amit >> When an Intel IOMMU is virtualized, and a physical device is >> passed-through to the VM, changes of the virtual IOMMU need to be >> propagated to the physical IOMMU. The hypervisor therefore needs to >> monitor PTE mappings in the IOMMU page-tables. Intel specifications >> provide "caching-mode" capability that a virtual IOMMU uses to report >> that the IOMMU is virtualized and a TLB flush is needed after mapping to >> allow the hypervisor to propagate virtual IOMMU mappings to the physical >> IOMMU. To the best of my knowledge no real physical IOMMU reports >> "caching-mode" as turned on. >> Synchronizing the virtual and the physical IOMMU tables is expensive if >> the hypervisor is unaware which PTEs have changed, as the hypervisor is >> required to walk all the virtualized tables and look for changes. >> Consequently, domain flushes are much more expensive than page-specific >> flushes on virtualized IOMMUs with passthrough devices. The kernel >> therefore exploited the "caching-mode" indication to avoid domain >> flushing and use page-specific flushing in virtualized environments. See >> commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching >> mode.") >> This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use >> of iova deferred flushing"). Now, when batched TLB flushing is used (the >> default), full TLB domain flushes are performed frequently, requiring >> the hypervisor to perform expensive synchronization between the virtual >> TLB and the physical one. >> Getting batched TLB flushes to use in such circumstances page-specific >> invalidations again is not easy, since the TLB invalidation scheme >> assumes that "full" domain TLB flushes are performed for scalability. >> Disable batched TLB flushes when caching-mode is on, as the performance >> benefit from using batched TLB invalidations is likely to be much >> smaller than the overhead of the virtual-to-physical IOMMU page-tables >> synchronization. >> Fixes: 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") > > Isn't it > > Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") > > ? Of course it is - bad copy-paste. I will send v3. Thanks again, Nadav ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3] iommu/vt-d: do not use flush-queue when caching-mode is on
From: Nadav Amit When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on. Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one. Getting batched TLB flushes to use page-specific invalidations again in such circumstances is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability. Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization. Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") Signed-off-by: Nadav Amit Cc: David Woodhouse Cc: Lu Baolu Cc: Joerg Roedel Cc: Will Deacon Cc: sta...@vger.kernel.org --- v2->v3: * Fix the fixes tag in the commit-log (Lu). * Minor English rephrasing of the commit-log. v1->v2: * disable flush queue for all domains if caching-mode is on for any IOMMU (Lu). --- drivers/iommu/intel/iommu.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 788119c5b021..de3dd617cf60 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5373,6 +5373,36 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain, return ret; } +static bool domain_use_flush_queue(void) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + bool r = true; + + if (intel_iommu_strict) + return false; + + /* +* The flush queue implementation does not perform page-selective +* invalidations that are required for efficient TLB flushes in virtual +* environments. The benefit of batching is likely to be much lower than +* the overhead of synchronizing the virtual and physical IOMMU +* page-tables. +*/ + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!cap_caching_mode(iommu->cap)) + continue; + + pr_warn_once("IOMMU batching is disabled due to virtualization"); + r = false; + break; + } + rcu_read_unlock(); + + return r; +} + static int intel_iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr attr, void *data) @@ -5383,7 +5413,7 @@ intel_iommu_domain_get_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !intel_iommu_strict; + *(int *)data = domain_use_flush_queue(); return 0; default: return -ENODEV; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] iommu/io-pgtable: Remove TLBI_ON_MAP quirk
IO_PGTABLE_QUIRK_TLBI_ON_MAP is now fully superseded by the core API's iotlb_sync_map callback. Signed-off-by: Robin Murphy --- drivers/iommu/io-pgtable-arm-v7s.c | 8 +--- include/linux/io-pgtable.h | 5 - 2 files changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1d92ac948db7..3a92f06aad43 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -535,12 +535,7 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, * Synchronise all PTE updates for the new mapping before there's * a chance for anything to kick off a table walk for the new iova. */ - if (iop->cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { - io_pgtable_tlb_flush_walk(iop, iova, size, - ARM_V7S_BLOCK_SIZE(2)); - } else { - wmb(); - } + wmb(); return ret; } @@ -759,7 +754,6 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_PERMS | - IO_PGTABLE_QUIRK_TLBI_ON_MAP | IO_PGTABLE_QUIRK_ARM_MTK_EXT | IO_PGTABLE_QUIRK_NON_STRICT)) return NULL; diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 2a5686ca2ba3..06753323a15e 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -68,10 +68,6 @@ struct io_pgtable_cfg { * hardware which does not implement the permissions of a given * format, and/or requires some format-specific default value. * -* IO_PGTABLE_QUIRK_TLBI_ON_MAP: If the format forbids caching invalid -* (unmapped) entries but the hardware might do so anyway, perform -* TLB maintenance when mapping as well as when unmapping. -* * IO_PGTABLE_QUIRK_ARM_MTK_EXT: (ARM v7s format) MediaTek IOMMUs extend * to support up to 34 bits PA where the bit32 and bit33 are * encoded in the bit9 and bit4 of the PTE respectively. @@ -88,7 +84,6 @@ struct io_pgtable_cfg { */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) - #define IO_PGTABLE_QUIRK_TLBI_ON_MAPBIT(2) #define IO_PGTABLE_QUIRK_ARM_MTK_EXTBIT(3) #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4) #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] iommu/msm: Hook up iotlb_sync_map
The core API can now accommodate invalidate-on-map style behaviour in a single efficient call, so hook that up instead of having io-pgatble do it piecemeal. Signed-off-by: Robin Murphy --- Just a little quick cleanup on top of the Mediatek TLB patches now queued - CC'ing a bunch of folks from the Git history in the hope that someone can still test MSM... :) drivers/iommu/msm_iommu.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index 040e85f70861..f0ba6a09b434 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -343,7 +343,6 @@ static int msm_iommu_domain_config(struct msm_priv *priv) spin_lock_init(>pgtlock); priv->cfg = (struct io_pgtable_cfg) { - .quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP, .pgsize_bitmap = msm_iommu_ops.pgsize_bitmap, .ias = 32, .oas = 32, @@ -490,6 +489,14 @@ static int msm_iommu_map(struct iommu_domain *domain, unsigned long iova, return ret; } +static void msm_iommu_sync_map(struct iommu_domain *domain, unsigned long iova, + size_t size) +{ + struct msm_priv *priv = to_msm_priv(domain); + + __flush_iotlb_range(iova, size, SZ_4K, false, priv); +} + static size_t msm_iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t len, struct iommu_iotlb_gather *gather) { @@ -680,6 +687,7 @@ static struct iommu_ops msm_iommu_ops = { * kick starting the other master. */ .iotlb_sync = NULL, + .iotlb_sync_map = msm_iommu_sync_map, .iova_to_phys = msm_iommu_iova_to_phys, .probe_device = msm_iommu_probe_device, .release_device = msm_iommu_release_device, base-commit: 0954d61a59e3c014e52b8d938bc12dc5a2e4949c -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH RESEND] dma-mapping: benchmark: fix kernel crash when dma_map_single fails
Thanks, applied. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 5/6] media: uvcvideo: Use dma_alloc_noncontiguos API
On Wed, Jan 27, 2021 at 12:29:08AM +0100, Ricardo Ribalda wrote: > - Is there any platform where dma_alloc_noncontiguos can fail? > This is, !ops->alloc_noncontiguous and !dev->coherent_dma_mask > If yes then we need to add a function to let the driver know in > advance that it has to use the coherent allocator (usb_alloc_coherent > for uvc) dev->coherent_dma_mask is set by the driver. So the only reason why dma_alloc_noncontiguos will fail is because is because it can't allocate any memory. > - In dma_alloc_noncontiguos, on the dma_alloc_pages fallback. If we > have a device where the dma happens in only one direction, could not > get more performance with DMA_FROM/TO_DEVICE instead of > DMA_BIDIRECTIONAL ? Yes, we could probably do that. > > > Then I have tried to use the API, and I have encountered a problem: on > uvcvideo the device passed to the memory allocator is different for > DMA_PAGES and NON_CONTIGUOUS: > https://github.com/ribalda/linux/blob/042cd497739f71c8d4a83a67ee970369e2baca4a/drivers/media/usb/uvc/uvc_video.c#L1236 > > I need to dig a bit tomorrow to figure out why this is, I have > hardware to test both paths, so it should not be too difficult. I always found the USB dma alloc API a little weird, but we might have to follow the scheme of the usb coherent wrappers there. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 10/10] iommu/arm-smmu-v3: Add stall support for platform devices
The SMMU provides a Stall model for handling page faults in platform devices. It is similar to PCIe PRI, but doesn't require devices to have their own translation cache. Instead, faulting transactions are parked and the OS is given a chance to fix the page tables and retry the transaction. Enable stall for devices that support it (opt-in by firmware). When an event corresponds to a translation error, call the IOMMU fault handler. If the fault is recoverable, it will call us back to terminate or continue the stall. To use stall device drivers need to enable IOMMU_DEV_FEAT_IOPF, which initializes the fault queue for the device. Tested-by: Zhangfei Gao Reviewed-by: Jonathan Cameron Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 43 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 59 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 189 +- 3 files changed, 276 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index 7b15b7580c6e..59af0bbd2f7b 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -354,6 +354,13 @@ #define CMDQ_PRI_1_GRPID GENMASK_ULL(8, 0) #define CMDQ_PRI_1_RESPGENMASK_ULL(13, 12) +#define CMDQ_RESUME_0_RESP_TERM0UL +#define CMDQ_RESUME_0_RESP_RETRY 1UL +#define CMDQ_RESUME_0_RESP_ABORT 2UL +#define CMDQ_RESUME_0_RESP GENMASK_ULL(13, 12) +#define CMDQ_RESUME_0_SID GENMASK_ULL(63, 32) +#define CMDQ_RESUME_1_STAG GENMASK_ULL(15, 0) + #define CMDQ_SYNC_0_CS GENMASK_ULL(13, 12) #define CMDQ_SYNC_0_CS_NONE0 #define CMDQ_SYNC_0_CS_IRQ 1 @@ -370,6 +377,25 @@ #define EVTQ_0_ID GENMASK_ULL(7, 0) +#define EVT_ID_TRANSLATION_FAULT 0x10 +#define EVT_ID_ADDR_SIZE_FAULT 0x11 +#define EVT_ID_ACCESS_FAULT0x12 +#define EVT_ID_PERMISSION_FAULT0x13 + +#define EVTQ_0_SSV (1UL << 11) +#define EVTQ_0_SSIDGENMASK_ULL(31, 12) +#define EVTQ_0_SID GENMASK_ULL(63, 32) +#define EVTQ_1_STAGGENMASK_ULL(15, 0) +#define EVTQ_1_STALL (1UL << 31) +#define EVTQ_1_PnU (1UL << 33) +#define EVTQ_1_InD (1UL << 34) +#define EVTQ_1_RnW (1UL << 35) +#define EVTQ_1_S2 (1UL << 39) +#define EVTQ_1_CLASS GENMASK_ULL(41, 40) +#define EVTQ_1_TT_READ (1UL << 44) +#define EVTQ_2_ADDRGENMASK_ULL(63, 0) +#define EVTQ_3_IPA GENMASK_ULL(51, 12) + /* PRI queue */ #define PRIQ_ENT_SZ_SHIFT 4 #define PRIQ_ENT_DWORDS((1 << PRIQ_ENT_SZ_SHIFT) >> 3) @@ -464,6 +490,13 @@ struct arm_smmu_cmdq_ent { enum pri_resp resp; } pri; + #define CMDQ_OP_RESUME 0x44 + struct { + u32 sid; + u16 stag; + u8 resp; + } resume; + #define CMDQ_OP_CMD_SYNC0x46 struct { u64 msiaddr; @@ -522,6 +555,7 @@ struct arm_smmu_cmdq_batch { struct arm_smmu_evtq { struct arm_smmu_queue q; + struct iopf_queue *iopf; u32 max_stalls; }; @@ -659,7 +693,9 @@ struct arm_smmu_master { struct arm_smmu_stream *streams; unsigned intnum_streams; boolats_enabled; + boolstall_enabled; boolsva_enabled; + booliopf_enabled; struct list_headbonds; unsigned intssid_bits; }; @@ -678,6 +714,7 @@ struct arm_smmu_domain { struct io_pgtable_ops *pgtbl_ops; boolnon_strict; + boolstall_enabled; atomic_tnr_ats_masters; enum arm_smmu_domain_stage stage; @@ -719,6 +756,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master *master); bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master); int arm_smmu_master_enable_sva(struct arm_smmu_master *master); int arm_smmu_master_disable_sva(struct arm_smmu_master *master); +bool arm_smmu_master_iopf_supported(struct arm_smmu_master *master); struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata); void
[PATCH v12 09/10] ACPI/IORT: Enable stall support for platform devices
Copy the "Stall supported" bit, that tells whether a named component supports stall, into the dma-can-stall device property. Acked-by: Jonathan Cameron Signed-off-by: Jean-Philippe Brucker --- drivers/acpi/arm64/iort.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index c9a8bbb74b09..42820d7eb869 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -968,13 +968,15 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data) static void iort_named_component_init(struct device *dev, struct acpi_iort_node *node) { - struct property_entry props[2] = {}; + struct property_entry props[3] = {}; struct acpi_iort_named_component *nc; nc = (struct acpi_iort_named_component *)node->node_data; props[0] = PROPERTY_ENTRY_U32("pasid-num-bits", FIELD_GET(ACPI_IORT_NC_PASID_BITS, nc->node_flags)); + if (nc->node_flags & ACPI_IORT_NC_STALL_SUPPORTED) + props[1] = PROPERTY_ENTRY_BOOL("dma-can-stall"); if (device_add_properties(dev, props)) dev_warn(dev, "Could not add device properties\n"); -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 07/10] iommu/arm-smmu-v3: Maintain a SID->device structure
When handling faults from the event or PRI queue, we need to find the struct device associated with a SID. Add a rb_tree to keep track of SIDs. Acked-by: Jonathan Cameron Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 13 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 161 2 files changed, 144 insertions(+), 30 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index f985817c967a..7b15b7580c6e 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -639,6 +639,15 @@ struct arm_smmu_device { /* IOMMU core code handle */ struct iommu_device iommu; + + struct rb_root streams; + struct mutexstreams_mutex; +}; + +struct arm_smmu_stream { + u32 id; + struct arm_smmu_master *master; + struct rb_node node; }; /* SMMU private data for each master */ @@ -647,8 +656,8 @@ struct arm_smmu_master { struct device *dev; struct arm_smmu_domain *domain; struct list_headdomain_head; - u32 *sids; - unsigned intnum_sids; + struct arm_smmu_stream *streams; + unsigned intnum_streams; boolats_enabled; boolsva_enabled; struct list_headbonds; diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 88dd9feb32f4..3afec6ed8075 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -909,8 +909,8 @@ static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, spin_lock_irqsave(_domain->devices_lock, flags); list_for_each_entry(master, _domain->devices, domain_head) { - for (i = 0; i < master->num_sids; i++) { - cmd.cfgi.sid = master->sids[i]; + for (i = 0; i < master->num_streams; i++) { + cmd.cfgi.sid = master->streams[i].id; arm_smmu_cmdq_batch_add(smmu, , ); } } @@ -1355,6 +1355,32 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) return 0; } +__maybe_unused +static struct arm_smmu_master * +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid) +{ + struct rb_node *node; + struct arm_smmu_stream *stream; + struct arm_smmu_master *master = NULL; + + mutex_lock(>streams_mutex); + node = smmu->streams.rb_node; + while (node) { + stream = rb_entry(node, struct arm_smmu_stream, node); + if (stream->id < sid) { + node = node->rb_right; + } else if (stream->id > sid) { + node = node->rb_left; + } else { + master = stream->master; + break; + } + } + mutex_unlock(>streams_mutex); + + return master; +} + /* IRQ and event handlers */ static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev) { @@ -1588,8 +1614,8 @@ static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) arm_smmu_atc_inv_to_cmd(0, 0, 0, ); - for (i = 0; i < master->num_sids; i++) { - cmd.atc.sid = master->sids[i]; + for (i = 0; i < master->num_streams; i++) { + cmd.atc.sid = master->streams[i].id; arm_smmu_cmdq_issue_cmd(master->smmu, ); } @@ -1632,8 +1658,8 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, if (!master->ats_enabled) continue; - for (i = 0; i < master->num_sids; i++) { - cmd.atc.sid = master->sids[i]; + for (i = 0; i < master->num_streams; i++) { + cmd.atc.sid = master->streams[i].id; arm_smmu_cmdq_batch_add(smmu_domain->smmu, , ); } } @@ -2065,13 +2091,13 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) int i, j; struct arm_smmu_device *smmu = master->smmu; - for (i = 0; i < master->num_sids; ++i) { - u32 sid = master->sids[i]; + for (i = 0; i < master->num_streams; ++i) { + u32 sid = master->streams[i].id; __le64 *step = arm_smmu_get_step_for_sid(smmu, sid); /* Bridged PCI devices may end up with duplicated IDs */ for (j = 0; j < i; j++) - if (master->sids[j] == sid) + if
[PATCH v12 08/10] dt-bindings: document stall property for IOMMU masters
On ARM systems, some platform devices behind an IOMMU may support stall, which is the ability to recover from page faults. Let the firmware tell us when a device supports stall. Reviewed-by: Rob Herring Signed-off-by: Jean-Philippe Brucker --- .../devicetree/bindings/iommu/iommu.txt| 18 ++ 1 file changed, 18 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/iommu.txt b/Documentation/devicetree/bindings/iommu/iommu.txt index 3c36334e4f94..26ba9e530f13 100644 --- a/Documentation/devicetree/bindings/iommu/iommu.txt +++ b/Documentation/devicetree/bindings/iommu/iommu.txt @@ -92,6 +92,24 @@ Optional properties: tagging DMA transactions with an address space identifier. By default, this is 0, which means that the device only has one address space. +- dma-can-stall: When present, the master can wait for a transaction to + complete for an indefinite amount of time. Upon translation fault some + IOMMUs, instead of aborting the translation immediately, may first + notify the driver and keep the transaction in flight. This allows the OS + to inspect the fault and, for example, make physical pages resident + before updating the mappings and completing the transaction. Such IOMMU + accepts a limited number of simultaneous stalled transactions before + having to either put back-pressure on the master, or abort new faulting + transactions. + + Firmware has to opt-in stalling, because most buses and masters don't + support it. In particular it isn't compatible with PCI, where + transactions have to complete before a time limit. More generally it + won't work in systems and masters that haven't been designed for + stalling. For example the OS, in order to handle a stalled transaction, + may attempt to retrieve pages from secondary storage in a stalled + domain, leading to a deadlock. + Notes: == -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 06/10] iommu: Add a page fault handler
Some systems allow devices to handle I/O Page Faults in the core mm. For example systems implementing the PCIe PRI extension or Arm SMMU stall model. Infrastructure for reporting these recoverable page faults was added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device fault report API"). Add a page fault handler for host SVA. IOMMU driver can now instantiate several fault workqueues and link them to IOPF-capable devices. Drivers can choose between a single global workqueue, one per IOMMU device, one per low-level fault queue, one per domain, etc. When it receives a fault event, most commonly in an IRQ handler, the IOMMU driver reports the fault using iommu_report_device_fault(), which calls the registered handler. The page fault handler then calls the mm fault handler, and reports either success or failure with iommu_page_response(). After the handler succeeds, the hardware retries the access. The iopf_param pointer could be embedded into iommu_fault_param. But putting iopf_param into the iommu_param structure allows us not to care about ordering between calls to iopf_queue_add_device() and iommu_register_device_fault_handler(). Reviewed-by: Jonathan Cameron Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/Makefile| 1 + drivers/iommu/iommu-sva-lib.h | 53 include/linux/iommu.h | 2 + drivers/iommu/io-pgfault.c| 461 ++ 4 files changed, 517 insertions(+) create mode 100644 drivers/iommu/io-pgfault.c diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 61bd30cd8369..60fafc23dee6 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h index b40990aef3fd..031155010ca8 100644 --- a/drivers/iommu/iommu-sva-lib.h +++ b/drivers/iommu/iommu-sva-lib.h @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max); void iommu_sva_free_pasid(struct mm_struct *mm); struct mm_struct *iommu_sva_find(ioasid_t pasid); +/* I/O Page fault */ +struct device; +struct iommu_fault; +struct iopf_queue; + +#ifdef CONFIG_IOMMU_SVA_LIB +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie); + +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev); +int iopf_queue_remove_device(struct iopf_queue *queue, +struct device *dev); +int iopf_queue_flush_dev(struct device *dev); +struct iopf_queue *iopf_queue_alloc(const char *name); +void iopf_queue_free(struct iopf_queue *queue); +int iopf_queue_discard_partial(struct iopf_queue *queue); + +#else /* CONFIG_IOMMU_SVA_LIB */ +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) +{ + return -ENODEV; +} + +static inline int iopf_queue_add_device(struct iopf_queue *queue, + struct device *dev) +{ + return -ENODEV; +} + +static inline int iopf_queue_remove_device(struct iopf_queue *queue, + struct device *dev) +{ + return -ENODEV; +} + +static inline int iopf_queue_flush_dev(struct device *dev) +{ + return -ENODEV; +} + +static inline struct iopf_queue *iopf_queue_alloc(const char *name) +{ + return NULL; +} + +static inline void iopf_queue_free(struct iopf_queue *queue) +{ +} + +static inline int iopf_queue_discard_partial(struct iopf_queue *queue) +{ + return -ENODEV; +} +#endif /* CONFIG_IOMMU_SVA_LIB */ #endif /* _IOMMU_SVA_LIB_H */ diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 00348e4c3c26..edc9be443a74 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -366,6 +366,7 @@ struct iommu_fault_param { * struct dev_iommu - Collection of per-device IOMMU data * * @fault_param: IOMMU detected device fault reporting data + * @iopf_param: I/O Page Fault queue and data * @fwspec: IOMMU fwspec data * @iommu_dev: IOMMU device this device is linked to * @priv: IOMMU Driver private data @@ -376,6 +377,7 @@ struct iommu_fault_param { struct dev_iommu { struct mutex lock; struct iommu_fault_param*fault_param; + struct iopf_device_param*iopf_param; struct iommu_fwspec *fwspec; struct iommu_device *iommu_dev; void*priv; diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c new file mode 100644 index ..1df8c1dcae77 --- /dev/null +++ b/drivers/iommu/io-pgfault.c @@ -0,0 +1,461 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Handle device page faults + * + * Copyright (C) 2020 ARM Ltd. + */ + +#include +#include +#include +#include +#include +
[PATCH v12 05/10] uacce: Enable IOMMU_DEV_FEAT_IOPF
The IOPF (I/O Page Fault) feature is now enabled independently from the SVA feature, because some IOPF implementations are device-specific and do not require IOMMU support for PCIe PRI or Arm SMMU stall. Enable IOPF unconditionally when enabling SVA for now. In the future, if a device driver implementing a uacce interface doesn't need IOPF support, it will need to tell the uacce module, for example with a new flag. Acked-by: Zhangfei Gao Signed-off-by: Jean-Philippe Brucker --- Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Cc: Zhangfei Gao Cc: Zhou Wang --- drivers/misc/uacce/uacce.c | 39 +- 1 file changed, 30 insertions(+), 9 deletions(-) diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c index d07af4edfcac..6db7a98486ec 100644 --- a/drivers/misc/uacce/uacce.c +++ b/drivers/misc/uacce/uacce.c @@ -385,6 +385,33 @@ static void uacce_release(struct device *dev) kfree(uacce); } +static unsigned int uacce_enable_sva(struct device *parent, unsigned int flags) +{ + if (!(flags & UACCE_DEV_SVA)) + return flags; + + flags &= ~UACCE_DEV_SVA; + + if (iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_IOPF)) + return flags; + + if (iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA)) { + iommu_dev_disable_feature(parent, IOMMU_DEV_FEAT_IOPF); + return flags; + } + + return flags | UACCE_DEV_SVA; +} + +static void uacce_disable_sva(struct uacce_device *uacce) +{ + if (!(uacce->flags & UACCE_DEV_SVA)) + return; + + iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA); + iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_IOPF); +} + /** * uacce_alloc() - alloc an accelerator * @parent: pointer of uacce parent device @@ -404,11 +431,7 @@ struct uacce_device *uacce_alloc(struct device *parent, if (!uacce) return ERR_PTR(-ENOMEM); - if (flags & UACCE_DEV_SVA) { - ret = iommu_dev_enable_feature(parent, IOMMU_DEV_FEAT_SVA); - if (ret) - flags &= ~UACCE_DEV_SVA; - } + flags = uacce_enable_sva(parent, flags); uacce->parent = parent; uacce->flags = flags; @@ -432,8 +455,7 @@ struct uacce_device *uacce_alloc(struct device *parent, return uacce; err_with_uacce: - if (flags & UACCE_DEV_SVA) - iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA); + uacce_disable_sva(uacce); kfree(uacce); return ERR_PTR(ret); } @@ -487,8 +509,7 @@ void uacce_remove(struct uacce_device *uacce) mutex_unlock(>queues_lock); /* disable sva now since no opened queues */ - if (uacce->flags & UACCE_DEV_SVA) - iommu_dev_disable_feature(uacce->parent, IOMMU_DEV_FEAT_SVA); + uacce_disable_sva(uacce); if (uacce->cdev) cdev_device_del(uacce->cdev, >dev); -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 04/10] iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF
Allow drivers to query and enable IOMMU_DEV_FEAT_IOPF, which amounts to checking whether PRI is enabled. Reviewed-by: Lu Baolu Signed-off-by: Jean-Philippe Brucker --- Cc: David Woodhouse Cc: Lu Baolu --- drivers/iommu/intel/iommu.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index f665322a0991..c777bd94df5d 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5330,6 +5330,8 @@ static int siov_find_pci_dvsec(struct pci_dev *pdev) static bool intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat) { + struct device_domain_info *info = get_domain_info(dev); + if (feat == IOMMU_DEV_FEAT_AUX) { int ret; @@ -5344,13 +5346,13 @@ intel_iommu_dev_has_feat(struct device *dev, enum iommu_dev_features feat) return !!siov_find_pci_dvsec(to_pci_dev(dev)); } - if (feat == IOMMU_DEV_FEAT_SVA) { - struct device_domain_info *info = get_domain_info(dev); + if (feat == IOMMU_DEV_FEAT_IOPF) + return info && info->pri_supported; + if (feat == IOMMU_DEV_FEAT_SVA) return info && (info->iommu->flags & VTD_FLAG_SVM_CAPABLE) && info->pasid_supported && info->pri_supported && info->ats_supported; - } return false; } @@ -5361,6 +5363,9 @@ intel_iommu_dev_enable_feat(struct device *dev, enum iommu_dev_features feat) if (feat == IOMMU_DEV_FEAT_AUX) return intel_iommu_enable_auxd(dev); + if (feat == IOMMU_DEV_FEAT_IOPF) + return intel_iommu_dev_has_feat(dev, feat) ? 0 : -ENODEV; + if (feat == IOMMU_DEV_FEAT_SVA) { struct device_domain_info *info = get_domain_info(dev); -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA
Some devices manage I/O Page Faults (IOPF) themselves instead of relying on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without mandating IOMMU-managed IOPF. The other device drivers now need to first enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA. Enabling IOMMU_DEV_FEAT_IOPF on its own doesn't have any effect visible to the device driver, it is used in combination with other features. Signed-off-by: Jean-Philippe Brucker --- Cc: Arnd Bergmann Cc: David Woodhouse Cc: Greg Kroah-Hartman Cc: Joerg Roedel Cc: Lu Baolu Cc: Will Deacon Cc: Zhangfei Gao Cc: Zhou Wang --- include/linux/iommu.h | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b7ea11fc1a93..00348e4c3c26 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -156,10 +156,24 @@ struct iommu_resv_region { enum iommu_resv_typetype; }; -/* Per device IOMMU features */ +/** + * enum iommu_dev_features - Per device IOMMU features + * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature + * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses + * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally + * enabling %IOMMU_DEV_FEAT_SVA requires + * %IOMMU_DEV_FEAT_IOPF, but some devices manage I/O Page + * Faults themselves instead of relying on the IOMMU. When + * supported, this feature must be enabled before and + * disabled after %IOMMU_DEV_FEAT_SVA. + * + * Device drivers query whether a feature is supported using + * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature(). + */ enum iommu_dev_features { - IOMMU_DEV_FEAT_AUX, /* Aux-domain feature */ - IOMMU_DEV_FEAT_SVA, /* Shared Virtual Addresses */ + IOMMU_DEV_FEAT_AUX, + IOMMU_DEV_FEAT_SVA, + IOMMU_DEV_FEAT_IOPF, }; #define IOMMU_PASID_INVALID(-1U) -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 01/10] iommu: Fix comment for struct iommu_fwspec
Commit 986d5ecc5699 ("iommu: Move fwspec->iommu_priv to struct dev_iommu") removed iommu_priv from fwspec and commit 5702ee24182f ("ACPI/IORT: Check ATS capability in root complex nodes") added @flags. Update the struct doc. Acked-by: Jonathan Cameron Signed-off-by: Jean-Philippe Brucker --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b3f0e2018c62..bdf3f34a4457 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -570,7 +570,7 @@ struct iommu_group *fsl_mc_device_group(struct device *dev); * struct iommu_fwspec - per-device IOMMU instance data * @ops: ops for this device's IOMMU * @iommu_fwnode: firmware handle for this device's IOMMU - * @iommu_priv: IOMMU driver private data for this device + * @flags: IOMMU_FWSPEC_* flags * @num_pasid_bits: number of PASID bits supported by this device * @num_ids: number of associated device IDs * @ids: IDs which this device may present to the IOMMU -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 02/10] iommu/arm-smmu-v3: Use device properties for pasid-num-bits
The pasid-num-bits property shouldn't need a dedicated fwspec field, it's a job for device properties. Add properties for IORT, and access the number of PASID bits using device_property_read_u32(). Suggested-by: Robin Murphy Acked-by: Jonathan Cameron Signed-off-by: Jean-Philippe Brucker --- include/linux/iommu.h | 2 -- drivers/acpi/arm64/iort.c | 13 +++-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 ++- drivers/iommu/of_iommu.c| 5 - 4 files changed, 9 insertions(+), 14 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index bdf3f34a4457..b7ea11fc1a93 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -571,7 +571,6 @@ struct iommu_group *fsl_mc_device_group(struct device *dev); * @ops: ops for this device's IOMMU * @iommu_fwnode: firmware handle for this device's IOMMU * @flags: IOMMU_FWSPEC_* flags - * @num_pasid_bits: number of PASID bits supported by this device * @num_ids: number of associated device IDs * @ids: IDs which this device may present to the IOMMU */ @@ -579,7 +578,6 @@ struct iommu_fwspec { const struct iommu_ops *ops; struct fwnode_handle*iommu_fwnode; u32 flags; - u32 num_pasid_bits; unsigned intnum_ids; u32 ids[]; }; diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c index d4eac6d7e9fb..c9a8bbb74b09 100644 --- a/drivers/acpi/arm64/iort.c +++ b/drivers/acpi/arm64/iort.c @@ -968,15 +968,16 @@ static int iort_pci_iommu_init(struct pci_dev *pdev, u16 alias, void *data) static void iort_named_component_init(struct device *dev, struct acpi_iort_node *node) { + struct property_entry props[2] = {}; struct acpi_iort_named_component *nc; - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); - - if (!fwspec) - return; nc = (struct acpi_iort_named_component *)node->node_data; - fwspec->num_pasid_bits = FIELD_GET(ACPI_IORT_NC_PASID_BITS, - nc->node_flags); + props[0] = PROPERTY_ENTRY_U32("pasid-num-bits", + FIELD_GET(ACPI_IORT_NC_PASID_BITS, + nc->node_flags)); + + if (device_add_properties(dev, props)) + dev_warn(dev, "Could not add device properties\n"); } static int iort_nc_iommu_map(struct device *dev, struct acpi_iort_node *node) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index baebaac34a83..88dd9feb32f4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2392,7 +2392,8 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev) } } - master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits); + device_property_read_u32(dev, "pasid-num-bits", >ssid_bits); + master->ssid_bits = min(smmu->ssid_bits, master->ssid_bits); /* * Note that PASID must be enabled before, and disabled after ATS: diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index e505b9130a1c..a9d2df001149 100644 --- a/drivers/iommu/of_iommu.c +++ b/drivers/iommu/of_iommu.c @@ -210,11 +210,6 @@ const struct iommu_ops *of_iommu_configure(struct device *dev, of_pci_iommu_init, ); } else { err = of_iommu_configure_device(master_np, dev, id); - - fwspec = dev_iommu_fwspec_get(dev); - if (!err && fwspec) - of_property_read_u32(master_np, "pasid-num-bits", ->num_pasid_bits); } /* -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 00/10] iommu: I/O page faults for SMMUv3
Add stall support to the SMMUv3, along with a common I/O Page Fault handler. Since v11 I added a sanity check in arm_smmu_dev_disable_feature(), patch 10: return -EBUSY if device drivers attempt to disable the IOPF feature before SVA. That would leak the IOPF param and fault handler. v11: https://lore.kernel.org/linux-iommu/20210125110650.3232195-1-jean-phili...@linaro.org/ v10: https://lore.kernel.org/linux-iommu/20210121123623.2060416-1-jean-phili...@linaro.org/ v9: https://lore.kernel.org/linux-iommu/20210108145217.2254447-1-jean-phili...@linaro.org/ Jean-Philippe Brucker (10): iommu: Fix comment for struct iommu_fwspec iommu/arm-smmu-v3: Use device properties for pasid-num-bits iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA iommu/vt-d: Support IOMMU_DEV_FEAT_IOPF uacce: Enable IOMMU_DEV_FEAT_IOPF iommu: Add a page fault handler iommu/arm-smmu-v3: Maintain a SID->device structure dt-bindings: document stall property for IOMMU masters ACPI/IORT: Enable stall support for platform devices iommu/arm-smmu-v3: Add stall support for platform devices drivers/iommu/Makefile| 1 + .../devicetree/bindings/iommu/iommu.txt | 18 + drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 56 ++- drivers/iommu/iommu-sva-lib.h | 53 ++ include/linux/iommu.h | 26 +- drivers/acpi/arm64/iort.c | 15 +- .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 59 ++- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 351 +++-- drivers/iommu/intel/iommu.c | 11 +- drivers/iommu/io-pgfault.c| 461 ++ drivers/iommu/of_iommu.c | 5 - drivers/misc/uacce/uacce.c| 39 +- 12 files changed, 1021 insertions(+), 74 deletions(-) create mode 100644 drivers/iommu/io-pgfault.c -- 2.30.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 00/13] iommu/amd: Add Generic IO Page Table Framework Support
On 1/27/21 7:06 PM, Joerg Roedel wrote: Hi Suravee, On Tue, Dec 15, 2020 at 01:36:52AM -0600, Suravee Suthikulpanit wrote: Suravee Suthikulpanit (13): iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline iommu/amd: Prepare for generic IO page table framework iommu/amd: Move pt_root to struct amd_io_pgtable iommu/amd: Convert to using amd_io_pgtable iommu/amd: Declare functions as extern iommu/amd: Move IO page table related functions iommu/amd: Restructure code for freeing page table iommu/amd: Remove amd_iommu_domain_get_pgtable iommu/amd: Rename variables to be consistent with struct io_pgtable_ops iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable iommu/amd: Introduce iommu_v1_iova_to_phys iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page iommu/amd: Adopt IO page table framework for AMD IOMMU v1 page table Applied this series, thanks for the work! Given testing goes well you can consider this queued for 5.12. Thanks, Joerg Thanks Joerg and Will, and welcome back!!! Suravee ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 0/7] MediaTek IOMMU improve tlb flush performance in map/unmap
On Thu, 7 Jan 2021 20:29:02 +0800, Yong Wu wrote: > This patchset is to improve tlb flushing performance in iommu_map/unmap > for MediaTek IOMMU. > > For iommu_map, currently MediaTek IOMMU use IO_PGTABLE_QUIRK_TLBI_ON_MAP > to do tlb_flush for each a memory chunk. this is so unnecessary. we could > improve it by tlb flushing one time at the end of iommu_map. > > [...] After discussion with Joerg, I'll queue this (and hopefully the next posting of your IOMMU driver) along with the Arm SMMU patches, and then send that all together. Applied to will (for-joerg/mtk), thanks! [1/7] iommu: Move iotlb_sync_map out from __iommu_map https://git.kernel.org/arm64/c/d8c1df02ac7f [2/7] iommu: Add iova and size as parameters in iotlb_sync_map https://git.kernel.org/arm64/c/2ebbd25873ce [3/7] iommu/mediatek: Add iotlb_sync_map to sync whole the iova range https://git.kernel.org/arm64/c/20143451eff0 [4/7] iommu: Switch gather->end to the inclusive end https://git.kernel.org/arm64/c/862c3715de8f [5/7] iommu/io-pgtable: Allow io_pgtable_tlb ops optional https://git.kernel.org/arm64/c/77e0992aee4e [6/7] iommu/mediatek: Gather iova in iommu_unmap to achieve tlb sync once https://git.kernel.org/arm64/c/f21ae3b10084 [7/7] iommu/mediatek: Remove the tlb-ops for v7s https://git.kernel.org/arm64/c/0954d61a59e3 Cheers, -- Will https://fixes.arm64.dev https://next.arm64.dev https://will.arm64.dev ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
[ + Christoph, Marek ] On 2021-01-27 13:00, Paul Kocialkowski wrote: Hi, On Tue 19 Jan 21, 18:52, Yong Wu wrote: The commit e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset") always update dma_range_map even though it was already set, like in the sunxi_mbus driver. the issue is reported at [1]. This patch avoid this(Updating it only when dev has valid dma-ranges). Meanwhile, dma_range_map contains the devices' dma_ranges information, This patch moves dma_range_map before of_iommu_configure. The iommu driver may need to know the dma_address requirements of its iommu consumer devices. Just a gentle ping on this issue, it would be nice to have this fix merged ASAP, in the next RC :) Ack to that - Rob, Frank, do you want to take this through the OF tree, or shall we take it through the DMA-mapping tree like the original culprit? Thanks, Robin. Cheers, Paul [1] https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/ CC: Rob Herring CC: Frank Rowand Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting dma_pfn_offset"), Suggested-by: Robin Murphy Signed-off-by: Yong Wu Signed-off-by: Paul Kocialkowski Reviewed-by: Rob Herring --- drivers/of/device.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/of/device.c b/drivers/of/device.c index aedfaaafd3e7..1122daa8e273 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -162,9 +162,11 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, mask = DMA_BIT_MASK(ilog2(end) + 1); dev->coherent_dma_mask &= mask; *dev->dma_mask &= mask; - /* ...but only set bus limit if we found valid dma-ranges earlier */ - if (!ret) + /* ...but only set bus limit and range map if we found valid dma-ranges earlier */ + if (!ret) { dev->bus_dma_limit = end; + dev->dma_range_map = map; + } coherent = of_dma_is_coherent(np); dev_dbg(dev, "device is%sdma coherent\n", @@ -172,6 +174,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, iommu = of_iommu_configure(dev, np, id); if (PTR_ERR(iommu) == -EPROBE_DEFER) { + /* Don't touch range map if it wasn't set from a valid dma-ranges */ + if (!ret) + dev->dma_range_map = NULL; kfree(map); return -EPROBE_DEFER; } @@ -181,7 +186,6 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); - dev->dma_range_map = map; return 0; } EXPORT_SYMBOL_GPL(of_dma_configure_id); -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] of/device: Update dma_range_map only when dev has valid dma-ranges
Hi, On Tue 19 Jan 21, 18:52, Yong Wu wrote: > The commit e0d072782c73 ("dma-mapping: introduce DMA range map, > supplanting dma_pfn_offset") always update dma_range_map even though it was > already set, like in the sunxi_mbus driver. the issue is reported at [1]. > This patch avoid this(Updating it only when dev has valid dma-ranges). > > Meanwhile, dma_range_map contains the devices' dma_ranges information, > This patch moves dma_range_map before of_iommu_configure. The iommu > driver may need to know the dma_address requirements of its iommu > consumer devices. Just a gentle ping on this issue, it would be nice to have this fix merged ASAP, in the next RC :) Cheers, Paul > [1] > https://lore.kernel.org/linux-arm-kernel/5c7946f3-b56e-da00-a750-be097c7ce...@arm.com/ > > CC: Rob Herring > CC: Frank Rowand > Fixes: e0d072782c73 ("dma-mapping: introduce DMA range map, supplanting > dma_pfn_offset"), > Suggested-by: Robin Murphy > Signed-off-by: Yong Wu > Signed-off-by: Paul Kocialkowski > Reviewed-by: Rob Herring > --- > drivers/of/device.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/of/device.c b/drivers/of/device.c > index aedfaaafd3e7..1122daa8e273 100644 > --- a/drivers/of/device.c > +++ b/drivers/of/device.c > @@ -162,9 +162,11 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > mask = DMA_BIT_MASK(ilog2(end) + 1); > dev->coherent_dma_mask &= mask; > *dev->dma_mask &= mask; > - /* ...but only set bus limit if we found valid dma-ranges earlier */ > - if (!ret) > + /* ...but only set bus limit and range map if we found valid dma-ranges > earlier */ > + if (!ret) { > dev->bus_dma_limit = end; > + dev->dma_range_map = map; > + } > > coherent = of_dma_is_coherent(np); > dev_dbg(dev, "device is%sdma coherent\n", > @@ -172,6 +174,9 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > > iommu = of_iommu_configure(dev, np, id); > if (PTR_ERR(iommu) == -EPROBE_DEFER) { > + /* Don't touch range map if it wasn't set from a valid > dma-ranges */ > + if (!ret) > + dev->dma_range_map = NULL; > kfree(map); > return -EPROBE_DEFER; > } > @@ -181,7 +186,6 @@ int of_dma_configure_id(struct device *dev, struct > device_node *np, > > arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); > > - dev->dma_range_map = map; > return 0; > } > EXPORT_SYMBOL_GPL(of_dma_configure_id); > -- > 2.18.0 > -- Paul Kocialkowski, Bootlin Embedded Linux and kernel engineering https://bootlin.com signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v1 2/2] iommu: add Unisoc iommu basic driver
On Fri, 22 Jan 2021 at 05:46, Robin Murphy wrote: > > On 2021-01-21 11:23, Chunyan Zhang wrote: > > From: Chunyan Zhang > > > > This patch only adds display iommu support, the driver was tested with sprd > > dpu and image codec processor. > > > > The iommu support for others would be added once finished tests with those > > devices, such as a few signal processors, including VSP(video), > > GSP(graphic), ISP(image), and camera CPP, etc. > > > > Signed-off-by: Chunyan Zhang > > --- > > drivers/iommu/Kconfig | 12 + > > drivers/iommu/Makefile | 1 + > > drivers/iommu/sprd-iommu.c | 566 + > > 3 files changed, 579 insertions(+) > > create mode 100644 drivers/iommu/sprd-iommu.c > > > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > > index 192ef8f61310..79af62c519ae 100644 > > --- a/drivers/iommu/Kconfig > > +++ b/drivers/iommu/Kconfig > > @@ -408,4 +408,16 @@ config VIRTIO_IOMMU > > > > Say Y here if you intend to run this kernel as a guest. > > > > +config SPRD_IOMMU > > + tristate "Unisoc IOMMU Support" > > + depends on ARCH_SPRD || COMPILE_TEST > > + select IOMMU_API > > + help > > + Support for IOMMU on Unisoc's SoCs on which multi-media subsystems > > + need IOMMU, such as DPU, Image codec(jpeg) processor, and a few > > + signal processors, including VSP(video), GSP(graphic), ISP(image), > > and > > + CPP, etc. > > + > > + Say Y here if you want multi-media functions. > > + > > endif # IOMMU_SUPPORT > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > > index 61bd30cd8369..5925b6af2123 100644 > > --- a/drivers/iommu/Makefile > > +++ b/drivers/iommu/Makefile > > @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o > > obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o > > +obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o > > diff --git a/drivers/iommu/sprd-iommu.c b/drivers/iommu/sprd-iommu.c > > new file mode 100644 > > index ..44cde44017fa > > --- /dev/null > > +++ b/drivers/iommu/sprd-iommu.c > > @@ -0,0 +1,566 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * Unisoc IOMMU driver > > + * > > + * Copyright (C) 2020 Unisoc, Inc. > > + * Author: Chunyan Zhang > > + */ > > + > > +#include > > +#include > > +#include > > You need since you're using the DMA API. > > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/* SPRD IOMMU page is 4K size alignment */ > > +#define SPRD_IOMMU_PAGE_SHIFT12 > > +#define SPRD_IOMMU_PAGE_SIZE SZ_4K > > + > > +#define SPRD_EX_CFG 0x0 > > +#define SPRD_IOMMU_VAOR_BYPASS BIT(4) > > +#define SPRD_IOMMU_GATE_EN BIT(1) > > +#define SPRD_IOMMU_ENBIT(0) > > +#define SPRD_EX_UPDATE 0x4 > > +#define SPRD_EX_FIRST_VPN0x8 > > +#define SPRD_EX_VPN_RANGE0xc > > +#define SPRD_EX_FIRST_PPN0x10 > > +#define SPRD_EX_DEFAULT_PPN 0x14 > > + > > +#define SPRD_IOMMU_VERSION 0x0 > > +#define SPRD_VERSION_MASKGENMASK(15, 8) > > +#define SPRD_VERSION_SHIFT 0x8 > > +#define SPRD_VAU_CFG 0x4 > > +#define SPRD_VAU_UPDATE 0x8 > > +#define SPRD_VAU_AUTH_CFG0xc > > +#define SPRD_VAU_FIRST_PPN 0x10 > > +#define SPRD_VAU_DEFAULT_PPN_RD 0x14 > > +#define SPRD_VAU_DEFAULT_PPN_WR 0x18 > > +#define SPRD_VAU_FIRST_VPN 0x1c > > +#define SPRD_VAU_VPN_RANGE 0x20 > > + > > +enum sprd_iommu_version { > > + SPRD_IOMMU_EX, > > + SPRD_IOMMU_VAU, > > +}; > > + > > +struct sprd_iommu_match_data { > > + unsigned long reg_offset; > > +}; > > + > > +/* > > + * struct sprd_iommu_device - high-level sprd iommu device representation, > > + * including hardware information and configuration, also driver data, etc > > + * > > + * @mdata: hardware configuration and information > > + * @ver: sprd iommu device version > > + * @prot_page: protect page base address, data would be written to > > here > > + * while translation fault > > + * @base:mapped base address for accessing registers > > + * @dev: pointer to basic device structure > > + * @iommu: IOMMU core representation > > + * @group: IOMMU group > > + */ > > +struct sprd_iommu_device { > > + const struct sprd_iommu_match_data *mdata; > > + enum sprd_iommu_version ver; > > + u32 *prot_page_va; > > + dma_addr_t prot_page_pa; > > + void __iomem*base; > > + struct device *dev; > > + struct iommu_device iommu; > > + struct iommu_group *group; > > +}; > > + > > +struct sprd_iommu_domain { > > + spinlock_t pgtlock; /* lock for page table */ > > + struct iommu_domain domain; > > + u32 *pgt_va; /* page table virtual address base */ > > + dma_addr_t pgt_pa; /*
Re: [PATCH v4 00/13] iommu/amd: Add Generic IO Page Table Framework Support
Hi Suravee, On Tue, Dec 15, 2020 at 01:36:52AM -0600, Suravee Suthikulpanit wrote: > Suravee Suthikulpanit (13): > iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline > iommu/amd: Prepare for generic IO page table framework > iommu/amd: Move pt_root to struct amd_io_pgtable > iommu/amd: Convert to using amd_io_pgtable > iommu/amd: Declare functions as extern > iommu/amd: Move IO page table related functions > iommu/amd: Restructure code for freeing page table > iommu/amd: Remove amd_iommu_domain_get_pgtable > iommu/amd: Rename variables to be consistent with struct > io_pgtable_ops > iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable > iommu/amd: Introduce iommu_v1_iova_to_phys > iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page > iommu/amd: Adopt IO page table framework for AMD IOMMU v1 page table Applied this series, thanks for the work! Given testing goes well you can consider this queued for 5.12. Thanks, Joerg ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] iommu/amd: remove h from printk format specifier
On Tue, Dec 15, 2020 at 01:30:21PM -0800, t...@redhat.com wrote: > drivers/iommu/amd/init.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH -next] iommu: amd: Use DEFINE_SPINLOCK() for spinlock
On Mon, Dec 28, 2020 at 09:51:12PM +0800, Zheng Yongjun wrote: > spinlock can be initialized automatically with DEFINE_SPINLOCK() > rather than explicitly calling spin_lock_init(). > > Signed-off-by: Zheng Yongjun > --- > drivers/iommu/amd/iommu_v2.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH -next] iommu/amd/init: convert comma to semicolon
On Mon, Dec 14, 2020 at 09:44:38PM +0800, Zheng Yongjun wrote: > Replace a comma between expression statements by a semicolon. > > Signed-off-by: Zheng Yongjun > --- > drivers/iommu/amd/init.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) What tree is this against? This code does not exist in v5.11-rc5. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 3/3] iommu/arm-smmu-v3: Reserving the entire SMMU register space
commit 52f3fab0067d ("iommu/arm-smmu-v3: Don't reserve implementation defined register space") only reserves the basic SMMU register space. So the ECMDQ register space is not covered, it should be mapped again. Due to the size of this ECMDQ resource is not fixed, depending on SMMU_IDR6.CMDQ_CONTROL_PAGE_LOG2NUMQ. Processing its resource reservation to avoid resource conflict with PMCG is a bit more complicated. Therefore, the resources of the PMCG are not reserved, and the entire SMMU resources are reserved. Suggested-by: Robin Murphy Signed-off-by: Zhen Lei --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 -- 2 files changed, 4 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index f04c55a7503c790..fde24403b06a9e3 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3476,14 +3476,6 @@ static int arm_smmu_set_bus_ops(struct iommu_ops *ops) return err; } -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start, - resource_size_t size) -{ - struct resource res = DEFINE_RES_MEM(start, size); - - return devm_ioremap_resource(dev, ); -} - static int arm_smmu_device_probe(struct platform_device *pdev) { int irq, ret; @@ -3519,22 +3511,14 @@ static int arm_smmu_device_probe(struct platform_device *pdev) } ioaddr = res->start; - /* -* Don't map the IMPLEMENTATION DEFINED regions, since they may contain -* the PMCG registers which are reserved by the PMU driver. -*/ - smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ); + smmu->base = devm_ioremap_resource(dev, res); if (IS_ERR(smmu->base)) return PTR_ERR(smmu->base); - if (arm_smmu_resource_size(smmu) > SZ_64K) { - smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K, - ARM_SMMU_REG_SZ); - if (IS_ERR(smmu->page1)) - return PTR_ERR(smmu->page1); - } else { + if (arm_smmu_resource_size(smmu) > SZ_64K) + smmu->page1 = smmu->base + SZ_64K; + else smmu->page1 = smmu->base; - } /* Interrupt lines */ diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h index da525f46dab4fc1..63f2b476987d6ae 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h @@ -152,8 +152,6 @@ #define ARM_SMMU_PRIQ_IRQ_CFG1 0xd8 #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc -#define ARM_SMMU_REG_SZ0xe00 - /* Common MSI config fields */ #define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2) #define MSI_CFG2_SHGENMASK(5, 4) -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 1/3] perf/smmuv3: Don't reserve the PMCG register spaces
According to the SMMUv3 specification: Each PMCG counter group is represented by one 4KB page (Page 0) with one optional additional 4KB page (Page 1), both of which are at IMPLEMENTATION DEFINED base addresses. This means that the PMCG register spaces may be within the 64KB pages of the SMMUv3 register space. When both the SMMU and PMCG drivers reserve their own resources, a resource conflict occurs. To avoid this conflict, don't reserve the PMCG regions. Suggested-by: Robin Murphy Signed-off-by: Zhen Lei --- drivers/perf/arm_smmuv3_pmu.c | 27 +-- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index 74474bb322c3f26..e5e505a0804fe53 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -761,6 +761,29 @@ static void smmu_pmu_get_acpi_options(struct smmu_pmu *smmu_pmu) dev_notice(smmu_pmu->dev, "option mask 0x%x\n", smmu_pmu->options); } +static void __iomem * +smmu_pmu_get_and_ioremap_resource(struct platform_device *pdev, + unsigned int index, + struct resource **res) +{ + void __iomem *base; + struct resource *r; + + r = platform_get_resource(pdev, IORESOURCE_MEM, index); + if (!r) { + dev_err(>dev, "invalid resource\n"); + return ERR_PTR(-EINVAL); + } + if (res) + *res = r; + + base = devm_ioremap(>dev, r->start, resource_size(r)); + if (!base) + return ERR_PTR(-ENOMEM); + + return base; +} + static int smmu_pmu_probe(struct platform_device *pdev) { struct smmu_pmu *smmu_pmu; @@ -793,7 +816,7 @@ static int smmu_pmu_probe(struct platform_device *pdev) .capabilities = PERF_PMU_CAP_NO_EXCLUDE, }; - smmu_pmu->reg_base = devm_platform_get_and_ioremap_resource(pdev, 0, _0); + smmu_pmu->reg_base = smmu_pmu_get_and_ioremap_resource(pdev, 0, _0); if (IS_ERR(smmu_pmu->reg_base)) return PTR_ERR(smmu_pmu->reg_base); @@ -801,7 +824,7 @@ static int smmu_pmu_probe(struct platform_device *pdev) /* Determine if page 1 is present */ if (cfgr & SMMU_PMCG_CFGR_RELOC_CTRS) { - smmu_pmu->reloc_base = devm_platform_ioremap_resource(pdev, 1); + smmu_pmu->reloc_base = smmu_pmu_get_and_ioremap_resource(pdev, 1, NULL); if (IS_ERR(smmu_pmu->reloc_base)) return PTR_ERR(smmu_pmu->reloc_base); } else { -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 0/3] perf/smmuv3: Don't reserve the PMCG register spaces
v2 --> v3: Patch 3 is updated because https://lkml.org/lkml/2021/1/22/532 has been queued in advance. v1 --> v2: According to Robin Murphy's suggestion: https://lkml.org/lkml/2021/1/20/470 Don't reserve the PMCG register spaces, and reserve the entire SMMU register space. v1: Since the PMCG may implement its resigters space(4KB Page0 and 4KB Page1) within the SMMUv3 64KB Page0. In this case, when the SMMUv3 driver reserves the 64KB Page0 resource in advance, the PMCG driver try to reserve its Page0 and Page1 resources, a resource conflict occurs. commit 52f3fab0067d6fa ("iommu/arm-smmu-v3: Don't reserve implementation defined register space") reduce the resource reservation range of the SMMUv3 driver, it only reserves the first 0xe00 bytes in the 64KB Page0, to avoid the above-mentioned resource conflicts. But the SMMUv3.3 add support for ECMDQ, its registers space is also implemented in the SMMUv3 64KB Page0. This means we need to build two separate mappings. New features may be added in the future, and more independent mappings may be required. The simple problem is complicated because the user expects to map the entire SMMUv3 64KB Page0. Therefore, the proper solution is: If the PMCG register resources are located in the 64KB Page0 of the SMMU, the PMCG driver does not reserve the conflict resources when the SMMUv3 driver has reserved the conflict resources before. Instead, the PMCG driver only performs devm_ioremap() to ensure that it can work properly. Zhen Lei (3): perf/smmuv3: Don't reserve the PMCG register spaces perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU iommu/arm-smmu-v3: Reserving the entire SMMU register space drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 -- drivers/perf/arm_smmuv3_pmu.c | 28 ++-- 3 files changed, 30 insertions(+), 24 deletions(-) -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 2/3] perf/smmuv3: Add a MODULE_SOFTDEP() to indicate dependency on SMMU
The MODULE_SOFTDEP() gives user space a hint of the loading sequence. And when command "modprobe arm_smmuv3_pmu" is executed, the arm_smmu_v3.ko is automatically loaded in advance. Signed-off-by: Zhen Lei --- drivers/perf/arm_smmuv3_pmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index e5e505a0804fe53..9a305ac51208cd2 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -950,6 +950,7 @@ static void __exit arm_smmu_pmu_exit(void) module_exit(arm_smmu_pmu_exit); MODULE_DESCRIPTION("PMU driver for ARM SMMUv3 Performance Monitors Extension"); +MODULE_SOFTDEP("pre: arm_smmu_v3"); MODULE_AUTHOR("Neil Leeder "); MODULE_AUTHOR("Shameer Kolothum "); MODULE_LICENSE("GPL v2"); -- 1.8.3 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RESEND PATCH 1/1] iommu/amd: Remove unnecessary assignment
On Thu, Dec 10, 2020 at 10:13:30AM +0800, Adrian Huang wrote: > drivers/iommu/amd/init.c | 5 ++--- > 1 file changed, 2 insertions(+), 3 deletions(-) Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 0/6] IOMMU: Some more IOVA and core code tidy-up
On Wed, Jan 06, 2021 at 09:35:05PM +0800, John Garry wrote: > Just some tidy-up to IOVA and core code. > > Based on v5.11-rc2 > > Differences to v1: > - Add core IOMMU patches > > John Garry (6): > iova: Make has_iova_flush_queue() private > iova: Delete copy_reserved_iova() > iova: Stop exporting some more functions > iommu: Stop exporting iommu_map_sg_atomic() > iommu: Delete iommu_domain_window_disable() > iommu: Delete iommu_dev_has_feature() Applied, thanks. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/vt-d: do not use flush-queue when caching-mode is on
On 2021/1/27 14:17, Nadav Amit wrote: From: Nadav Amit When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on. Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one. Getting batched TLB flushes to use in such circumstances page-specific invalidations again is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability. Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization. Fixes: 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.") Isn't it Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing") ? Best regards, baolu Signed-off-by: Nadav Amit Cc: David Woodhouse Cc: Lu Baolu Cc: Joerg Roedel Cc: Will Deacon Cc: sta...@vger.kernel.org > --- v1->v2: * disable flush queue for all domains if caching-mode is on for any IOMMU (Lu). --- drivers/iommu/intel/iommu.c | 32 +++- 1 file changed, 31 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 788119c5b021..de3dd617cf60 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5373,6 +5373,36 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain, return ret; } +static bool domain_use_flush_queue(void) +{ + struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu; + bool r = true; + + if (intel_iommu_strict) + return false; + + /* +* The flush queue implementation does not perform page-selective +* invalidations that are required for efficient TLB flushes in virtual +* environments. The benefit of batching is likely to be much lower than +* the overhead of synchronizing the virtual and physical IOMMU +* page-tables. +*/ + rcu_read_lock(); + for_each_active_iommu(iommu, drhd) { + if (!cap_caching_mode(iommu->cap)) + continue; + + pr_warn_once("IOMMU batching is disabled due to virtualization"); + r = false; + break; + } + rcu_read_unlock(); + + return r; +} + static int intel_iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr attr, void *data) @@ -5383,7 +5413,7 @@ intel_iommu_domain_get_attr(struct iommu_domain *domain, case IOMMU_DOMAIN_DMA: switch (attr) { case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: - *(int *)data = !intel_iommu_strict; + *(int *)data = domain_use_flush_queue(); return 0; default: return -ENODEV; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 07/33] iommu: Avoid reallocate default domain for a group
On Tue, 2021-01-26 at 22:23 +, Will Deacon wrote: > On Mon, Jan 11, 2021 at 07:18:48PM +0800, Yong Wu wrote: > > If group->default_domain exists, avoid reallocate it. > > > > In some iommu drivers, there may be several devices share a group. Avoid > > realloc the default domain for this case. > > > > Signed-off-by: Yong Wu > > --- > > drivers/iommu/iommu.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 3d099a31ddca..f4b87e6abe80 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -266,7 +266,8 @@ int iommu_probe_device(struct device *dev) > > * support default domains, so the return value is not yet > > * checked. > > */ > > - iommu_alloc_default_domain(group, dev); > > + if (!group->default_domain) > > + iommu_alloc_default_domain(group, dev); > > I don't really get what this achieves, since iommu_alloc_default_domain() > looks like this: > > static int iommu_alloc_default_domain(struct iommu_group *group, > struct device *dev) > { > unsigned int type; > > if (group->default_domain) > return 0; > > ... > > in which case, it should be fine? oh. sorry, I overlooked this. the current code is enough. I will remove this patch. and send the next version in this week. Thanks very much. > > Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/arm-smmu-v3: add support for BBML
On 2021-01-27 07:36, Keqian Zhu wrote: On 2021/1/27 10:01, Leizhen (ThunderTown) wrote: On 2021/1/26 18:12, Will Deacon wrote: On Mon, Jan 25, 2021 at 08:23:40PM +, Robin Murphy wrote: Now we probably will need some degreee of BBML feature awareness for the sake of SVA if and when we start using it for CPU pagetables, but I still cannot see any need to consider it in io-pgtable. Agreed; I don't think this is something that io-pgtable should have to care about. Hi, I have a question here :-). If the old table is not live, then the break procedure seems unnecessary. Do I miss something? The MMU is allowed to prefetch translations at any time, so not following the proper update procedure could still potentially lead to a TLB conflict, even if there's no device traffic to worry about disrupting. Robin. Thanks, Keqian Yes, the SVA works in stall mode, and the failed device access requests are not discarded. Let me look for examples. The BBML usage scenario was told by a former colleague. Will . ___ linux-arm-kernel mailing list linux-arm-ker...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel . ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/arm-smmu-v3: Use DEFINE_RES_MEM() to simplify code
On 2021/1/27 17:23, Will Deacon wrote: > On Wed, Jan 27, 2021 at 10:05:50AM +0800, Leizhen (ThunderTown) wrote: >> I've sent another set of patches. https://lkml.org/lkml/2021/1/26/1065 >> If those patches are acceptable, then this one should be ignored. > > I've already queued this one, so if you want me to drop it then you need to > send me a revert. Thanks. Since it's queued, keep it. I'll update the new patch set. > > Will > > . > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/1] iommu/arm-smmu-v3: Use DEFINE_RES_MEM() to simplify code
On Wed, Jan 27, 2021 at 10:05:50AM +0800, Leizhen (ThunderTown) wrote: > I've sent another set of patches. https://lkml.org/lkml/2021/1/26/1065 > If those patches are acceptable, then this one should be ignored. I've already queued this one, so if you want me to drop it then you need to send me a revert. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu