Re: [PATCH 1/2] iommu: Fix race condition during default domain allocation
On 6/11/2021 6:19 PM, Robin Murphy wrote:

On 2021-06-11 11:45, Will Deacon wrote:

On Thu, Jun 10, 2021 at 09:46:53AM +0530, Ashish Mhetre wrote:

Domain is getting created more than once during asynchronous probe of multiple display heads (devices). All the display heads share the same SID and are expected to be in the same domain. As the iommu_alloc_default_domain() call is not protected, group->default_domain and group->domain end up with different domains, leading to subsequent IOMMU faults. Fix this by protecting the iommu_alloc_default_domain() call with group->mutex.

Can you provide some more information about exactly what the h/w configuration is, and the callstack which exhibits the race, please?

It'll be basically the same as the issue reported long ago with PCI groups in the absence of ACS not being constructed correctly. Triggering the iommu_probe_device() replay in of_iommu_configure() off the back of driver probe is way too late and allows calls to happen in the wrong order, or indeed race in parallel as here. Fixing that is still on my radar, but will not be simple, and will probably go hand-in-hand with phasing out the bus ops (for the multiple-driver-coexistence problem).

For iommu group creation, the call flow during the race is:

Display device 1: iommu_probe_device -> iommu_group_get_for_dev -> arm_smmu_device_group
Display device 2: iommu_probe_device -> iommu_group_get_for_dev -> arm_smmu_device_group

This way it ends up creating 2 groups for the 2 display devices sharing the same SID. Ideally, for the 2nd display device, the iommu_group_get call from iommu_group_get_for_dev should return the same group as for the 1st display device, but due to the race it ends up with 2 groups.
For the default domain, the call flow during the race is:

Display device 1: iommu_probe_device -> iommu_alloc_default_domain -> arm_smmu_domain_alloc
Display device 2: iommu_probe_device -> iommu_alloc_default_domain -> arm_smmu_domain_alloc

Here too, the 2nd device should already have a domain allocated, and the 'if (group->default_domain)' condition in iommu_alloc_default_domain should be true for the 2nd device. The issue is that IOVA accesses from the 2nd device result in context faults.

Signed-off-by: Ashish Mhetre
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70..2700500 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -273,7 +273,9 @@ int iommu_probe_device(struct device *dev)
 	 * support default domains, so the return value is not yet
 	 * checked.
 	 */
+	mutex_lock(&group->mutex);
 	iommu_alloc_default_domain(group, dev);
+	mutex_unlock(&group->mutex);

It feels wrong to serialise this for everybody just to cater for systems with aliasing SIDs between devices.

If two or more devices are racing at this point then they're already going to be serialised by at least iommu_group_add_device(), so I doubt there would be much impact - only the first device through here will hold the mutex for any appreciable length of time. Every other path which modifies group->domain does so with the mutex held (note the "expected" default domain allocation flow in bus_iommu_probe() in particular), so not holding it here does seem like a straightforward oversight.

Robin.

Serialization will only happen for the devices sharing the same group. Only the first device in the group will hold the mutex until the domain is created. The rest of the devices will just check for an existing domain in iommu_alloc_default_domain, then return and release the mutex.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
Christoph Hellwig wrote on Thu, Jun 17, 2021 at 07:12:32AM +0200:
> On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:
> > Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> > > Thank you for testing that - and this is a bummer indeed.
> >
> > Hm, actually not that surprising if it was working without the offset adjustments and doing non-aligned mappings -- perhaps the nvme code just needs to round the offsets down instead of expecting swiotlb to do it?
>
> It can't. The whole point of the series was to keep the original offsets.

Right, now I'm reading this again there are two kinds of offsets (quoting code from today's master):

---
static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
			   enum dma_data_direction dir)
{
	struct io_tlb_mem *mem = io_tlb_default_mem;
	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
	phys_addr_t orig_addr = mem->slots[index].orig_addr;
---

There is:
- the (tlb_addr - mem->start) alignment that Linus added up
- the mem->slots[index].orig_addr alignment (within IO_TLB_SIZE blocks)

I would assume that series made it possible to preserve offsets within a block for orig_addr, but in the process broke the offsets of a bounce within a memory slot (the first one); I assume we want to restore here the offset within the IO_TLB_SIZE block in orig_addr, so it needs another offsetting by that orig_addr offset.

E.g. taking a block and offsets within blocks, we have at the start of the function:

|-----------|---------|-----------------|
^           ^         ^
block start slot      orig addr
                      tlb_addr

and want the orig_addr variable to align with tlb_addr.

So I was a bit hasty in saying nvme needs to remove offsets; it's more that the current code only has the second one working, while the quick fix breaks the second one in the process of fixing the first...

Jianxiong Gao, before spending more time on this, could you also try Chanho Park's patch?
https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f

I frankly don't understand many details of that code at this point; in particular I have no idea why or if the patch needs another offset with mem->start, or where the dma_get_min_align_mask(dev) comes from, but it'll be interesting to test.

Thanks,
--
Dominique
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
On Thu, Jun 17, 2021 at 09:39:15AM +0900, Dominique MARTINET wrote:
> Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> > Thank you for testing that - and this is a bummer indeed.
>
> Hm, actually not that surprising if it was working without the offset adjustments and doing non-aligned mappings -- perhaps the nvme code just needs to round the offsets down instead of expecting swiotlb to do it?

It can't. The whole point of the series was to keep the original offsets.
Re: Plan for /dev/ioasid RFC v2
Hi Alex,

On Wed, 16 Jun 2021 13:39:37 -0600, Alex Williamson wrote:

> On Wed, 16 Jun 2021 06:43:23 +
> "Tian, Kevin" wrote:
>
> > > From: Alex Williamson
> > > Sent: Wednesday, June 16, 2021 12:12 AM
> > >
> > > On Tue, 15 Jun 2021 02:31:39 +
> > > "Tian, Kevin" wrote:
> > >
> > > > > From: Alex Williamson
> > > > > Sent: Tuesday, June 15, 2021 12:28 AM
> > > >
> > > > [...]
> > > > > > IOASID. Today the group fd requires an IOASID before it hands out a device_fd. With iommu_fd the device_fd will not allow IOCTLs until it has a blocked DMA IOASID and is successfully joined to an iommu_fd.
> > > > >
> > > > > Which is the root of my concern. Who owns ioctls to the device fd? It's my understanding this is a vfio provided file descriptor and it's therefore vfio's responsibility. A device-level IOASID interface therefore requires that vfio manage the group aspect of device access. AFAICT, that means that device access can therefore only begin when all devices for a given group are attached to the IOASID and must halt for all devices in the group if any device is ever detached from an IOASID, even temporarily. That suggests a lot more oversight of the IOASIDs by vfio than I'd prefer.
> > > >
> > > > This is possibly the point that is worthy of more clarification and alignment, as it sounds like the root of the controversy here.
> > > >
> > > > I feel the goal of vfio group management is more about ownership, i.e. all devices within a group must be assigned to a single user. Following the three rules defined by Jason, what we really care about is whether a group of devices can be isolated from the rest of the world, i.e. no access to memory/device outside of its security context and no access to its security context from devices outside of this group.
> > > > This can be achieved as long as every device in the group is either in block-DMA state when it's not attached to any security context, or attached to an IOASID context in IOMMU fd.
> > > >
> > > > As long as group-level isolation is satisfied, how devices within a group are further managed is decided by the user (unattached, all attached to the same IOASID, attached to different IOASIDs) as long as the user understands the implication of the lack of isolation within the group. This is where a device-centric model comes into play. Misconfiguration just hurts the user itself.
> > > >
> > > > If this rationale can be agreed on, then I didn't see the point of having VFIO mandate that all devices in the group must be attached/detached in lockstep.
> > >
> > > In theory this sounds great, but there are still too many assumptions and too much hand-waving about where isolation occurs for me to feel like I really have the complete picture. So let's walk through some examples. Please fill in and correct where I'm wrong.
> >
> > Thanks for putting these examples. They are helpful for clearing up the whole picture.
> >
> > Before filling in, let's first align on what the key difference is between the current VFIO model and this new proposal. With this comparison we'll know which of the following questions are answered with the existing VFIO mechanism and which are handled differently.
> >
> > With Yi's help we figured out the current mechanism:
> >
> > 1) vfio_group_viable. The code comment explains the intention clearly:
> >
> > --
> >  * A vfio group is viable for use by userspace if all devices are in
> >  * one of the following states:
> >  *  - driver-less
> >  *  - bound to a vfio driver
> >  *  - bound to an otherwise allowed driver
> >  *  - a PCI interconnect device
> > --
> >
> > Note this check is not related to an IOMMU security context.
> Because this is a pre-requisite for imposing that IOMMU security context.
>
> > 2) vfio_iommu_group_notifier. When an IOMMU_GROUP_NOTIFY_BOUND_DRIVER event is notified, vfio_group_viable is re-evaluated. If the affected group was previously viable but now becomes not viable, BUG_ON(), as it implies that this device is bound to a non-vfio driver which breaks the group isolation.
>
> This notifier action is conditional on there being users of devices within a secure group IOMMU context.
>
> > 3) vfio_group_get_device_fd. User can acquire a device fd only after a) the group is viable; b) the group is attached to a container; c) iommu is set on the container (implying a security context established);
>
> The order is actually b) a) c) but arguably b) is a no-op until:
>
> d) a device fd is provided to the user

Per the code in QEMU vfio_get_group(). The
Re: [RFC PATCH] iommu: add domain->nested
On 2021/6/16 10:44 PM, Christoph Hellwig wrote:

On Wed, Jun 16, 2021 at 10:38:02PM +0800, Zhangfei Gao wrote:

+++ b/include/linux/iommu.h
@@ -87,6 +87,7 @@ struct iommu_domain {
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
 	void *iova_cookie;
+	int nested;

This should probably be a bool : 1;

Also this needs a user, so please just queue up a variant of this for the code that eventually relies on this information.

Thanks Christoph, got it, will do this.
Re: [PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, 16 Jun 2021, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
>
> Signed-off-by: Claire Chang
> ---
>  include/linux/swiotlb.h | 11 +++
>  kernel/dma/direct.c     |  2 +-
>  kernel/dma/direct.h     |  2 +-
>  kernel/dma/swiotlb.c    |  4
>  4 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd1c30a83058..8d8855c77d9a 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
>   * unmap calls.
>   * @debugfs:	The dentry to debugfs.
>   * @late_alloc:	%true if allocated using the page allocator
> + * @force_bounce: %true if swiotlb bouncing is forced
>   */
>  struct io_tlb_mem {
>  	phys_addr_t start;
> @@ -94,6 +95,7 @@ struct io_tlb_mem {
>  	spinlock_t lock;
>  	struct dentry *debugfs;
>  	bool late_alloc;
> +	bool force_bounce;
>  	struct io_tlb_slot {
>  		phys_addr_t orig_addr;
>  		size_t alloc_size;
> @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  	return mem && paddr >= mem->start && paddr < mem->end;
>  }
>
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> +	return dev->dma_io_tlb_mem->force_bounce;
> +}
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
>  	return false;
>  }
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> +	return false;
> +}
>  static inline void swiotlb_exit(void)
>  {
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 7a88c34d0867..a92465b4eb12 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device
> *dev)
> {
>  	/* If SWIOTLB is active, use its maximum mapping size */
>  	if (is_swiotlb_active(dev) &&
> -	    (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> +	    (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
>  		return swiotlb_max_mapping_size(dev);
>  	return SIZE_MAX;
>  }
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 13e9e7158d94..4632b0f4f72e 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
>  	phys_addr_t phys = page_to_phys(page) + offset;
>  	dma_addr_t dma_addr = phys_to_dma(dev, phys);
>
> -	if (unlikely(swiotlb_force == SWIOTLB_FORCE))
> +	if (is_swiotlb_force_bounce(dev))
>  		return swiotlb_map(dev, phys, size, dir, attrs);
>
>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {

Should we also make the same change in drivers/xen/swiotlb-xen.c:xen_swiotlb_map_page?

If I make that change, I can see that everything is working as expected for a restricted-dma device with Linux running as dom0 on Xen. However, is_swiotlb_force_bounce returns non-zero even for normal non-restricted-dma devices. That shouldn't happen, right?

It looks like struct io_tlb_slot is not zeroed on allocation. Adding memset(mem, 0x0, struct_size) in swiotlb_late_init_with_tbl solves the issue.

With those two changes, the series passes my tests and you can add my tested-by.
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
Konrad Rzeszutek Wilk wrote on Wed, Jun 16, 2021 at 08:27:39PM -0400:
> Thank you for testing that - and this is a bummer indeed.

Hm, actually not that surprising if it was working without the offset adjustments and doing non-aligned mappings -- perhaps the nvme code just needs to round the offsets down instead of expecting swiotlb to do it?

Note I didn't look at that part of the code at all, so I might be stating the obvious in a way that's difficult to adjust...

> Dominique, Horia,
>
> Are those crypto devices somehow easily available to test out the
> patches?

The one I have is included in the iMX8MP and iMX8MQ SoCs; the latter is included in the MNT Reform and Librem 5, and both have evaluation toolkits, but I wouldn't quite say they are easy to get...

I'm happy to test different patch variants if Horia doesn't beat me to it though; it's not as practical as having the device, but don't hesitate to ask if I can run with extra debugs or something.

--
Dominique
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
On Wed, Jun 16, 2021 at 01:49:54PM -0700, Jianxiong Gao wrote:
> On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk wrote:
> >
> > On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> > > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > > > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > > > since the patch set has been backported.
> > >
> > > FYI, there has been a patch on the list that should have fixed this
> > > for about a month:
> > >
> > > https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> > >
> > > but it seems like it never got picked up.
> >
> > Jianxiong,
> > Would you be up for testing this patch on your NVMe rig please? I don't
> > foresee a problem.. but just in case...
>
> I have tested the attached patch and it generates an error when
> formatting a disk to xfs format in a RHEL 8 environment:

Thank you for testing that - and this is a bummer indeed.

Jianxiong,

How unique is this NVMe? Should I be able to reproduce this with any type, or is it specific to Google Cloud?

Dominique, Horia,

Are those crypto devices somehow easily available to test out the patches?

P.S. Most unfortunate timing - I am out in rural areas in the US with not-great Internet, so won't be able to get fully down to this until Monday.
Re: [PATCH v12 11/12] dt-bindings: of: Add restricted DMA pool
On Wed, 16 Jun 2021, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the reserved-memory node.
>
> Signed-off-by: Claire Chang
> ---
>  .../reserved-memory/reserved-memory.txt | 36 +--
>  1 file changed, 33 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..46804f24df05 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,23 @@ compatible (optional) - standard definition
>            used as a shared pool of DMA buffers for a set of devices. It can
>            be used by an operating system to instantiate the necessary pool
>            management subsystem if necessary.
> +        - restricted-dma-pool: This indicates a region of memory meant to be
> +          used as a pool of restricted DMA buffers for a set of devices. The
> +          memory region would be the only region accessible to those devices.
> +          When using this, the no-map and reusable properties must not be set,
> +          so the operating system can create a virtual mapping that will be
> +          used for synchronization. The main purpose for restricted DMA is to
> +          mitigate the lack of DMA access control on systems without an IOMMU,
> +          which could result in the DMA accessing the system memory at
> +          unexpected times and/or unexpected addresses, possibly leading to
> +          data leakage or corruption. The feature on its own provides a basic
> +          level of protection against the DMA overwriting buffer contents at
> +          unexpected times. However, to protect against general data leakage
> +          and system memory corruption, the system needs to provide a way to
> +          lock down the memory access, e.g., MPU.
> +          Note that since coherent allocation needs remapping, one must set
> +          up another device coherent pool by shared-dma-pool and use
> +          dma_alloc_from_dev_coherent instead for atomic coherent allocation.
>         - vendor specific string in the form <vendor>,[<device>-]<usage>
>  no-map (optional) - empty property
>      - Indicates the operating system must not create a virtual mapping
> @@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
>
>  Example
>  -------
> -This example defines 3 contiguous regions are defined for Linux kernel:
> +This example defines 4 contiguous regions for Linux kernel:
>  one default of all device drivers (named linux,cma@7200 and 64MiB in size),
> -one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
> -one for multimedia processing (named multimedia-memory@7700, 64MiB).
> +one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
> +one for multimedia processing (named multimedia-memory@7700, 64MiB), and
> +one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
>
>  / {
>  	#address-cells = <1>;
> @@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
>  		compatible = "acme,multimedia-memory";
>  		reg = <0x7700 0x400>;
>  	};
> +
> +	restricted_dma_reserved: restricted_dma_reserved {
> +		compatible = "restricted-dma-pool";
> +		reg = <0x5000 0x400>;
> +	};
>  };
>
>  /* ... */
> @@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
>  		memory-region = <&multimedia_reserved>;
>  		/* ... */
>  	};
> +
> +	pcie_device: pcie_device@0,0 {
> +		reg = <0x8301 0x0 0x 0x0 0x0010
> +		       0x8301 0x0 0x0010 0x0 0x0010>;
> +		memory-region = <&restricted_dma_mem_reserved>;

Shouldn't it be &restricted_dma_reserved?
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
On Fri, Jun 11, 2021 at 3:35 AM Konrad Rzeszutek Wilk wrote:
>
> On Fri, Jun 11, 2021 at 08:21:53AM +0200, Christoph Hellwig wrote:
> > On Thu, Jun 10, 2021 at 05:52:07PM +0300, Horia Geantă wrote:
> > > I've noticed the failure also in v5.10 and v5.11 stable kernels,
> > > since the patch set has been backported.
> >
> > FYI, there has been a patch on the list that should have fixed this
> > for about a month:
> >
> > https://lore.kernel.org/linux-iommu/20210510091816.ga2...@lst.de/T/#m0d0df6490350a08dcc24c9086c8edc165b402d6f
> >
> > but it seems like it never got picked up.
>
> Jianxiong,
> Would you be up for testing this patch on your NVMe rig please? I don't
> foresee a problem.. but just in case...

I have tested the attached patch and it generates an error when formatting a disk to xfs format in a RHEL 8 environment:

sudo mkfs.xfs -f /dev/nvme0n2
meta-data=/dev/nvme0n2           isize=512    agcount=4, agsize=32768000 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=131072000, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=64000, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
bad magic number
bad magic number
Metadata corruption detected at 0x56211de4c0c8, xfs_sb block 0x0/0x200
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x200
releasing dirty buffer (bulk) to free list!

I applied the patch on commit 06af8679449d.
Re: Plan for /dev/ioasid RFC v2
On Wed, 16 Jun 2021 06:43:23 +
"Tian, Kevin" wrote:

> > From: Alex Williamson
> > Sent: Wednesday, June 16, 2021 12:12 AM
> >
> > On Tue, 15 Jun 2021 02:31:39 +
> > "Tian, Kevin" wrote:
> >
> > > > From: Alex Williamson
> > > > Sent: Tuesday, June 15, 2021 12:28 AM
> > >
> > > [...]
> > > > > IOASID. Today the group fd requires an IOASID before it hands out a device_fd. With iommu_fd the device_fd will not allow IOCTLs until it has a blocked DMA IOASID and is successfully joined to an iommu_fd.
> > > >
> > > > Which is the root of my concern. Who owns ioctls to the device fd? It's my understanding this is a vfio provided file descriptor and it's therefore vfio's responsibility. A device-level IOASID interface therefore requires that vfio manage the group aspect of device access. AFAICT, that means that device access can therefore only begin when all devices for a given group are attached to the IOASID and must halt for all devices in the group if any device is ever detached from an IOASID, even temporarily. That suggests a lot more oversight of the IOASIDs by vfio than I'd prefer.
> > >
> > > This is possibly the point that is worthy of more clarification and alignment, as it sounds like the root of the controversy here.
> > >
> > > I feel the goal of vfio group management is more about ownership, i.e. all devices within a group must be assigned to a single user. Following the three rules defined by Jason, what we really care about is whether a group of devices can be isolated from the rest of the world, i.e. no access to memory/device outside of its security context and no access to its security context from devices outside of this group. This can be achieved as long as every device in the group is either in block-DMA state when it's not attached to any security context, or attached to an IOASID context in IOMMU fd.
> > > As long as group-level isolation is satisfied, how devices within a group are further managed is decided by the user (unattached, all attached to the same IOASID, attached to different IOASIDs) as long as the user understands the implication of the lack of isolation within the group. This is where a device-centric model comes into play. Misconfiguration just hurts the user itself.
> > >
> > > If this rationale can be agreed on, then I didn't see the point of having VFIO mandate that all devices in the group must be attached/detached in lockstep.
> >
> > In theory this sounds great, but there are still too many assumptions and too much hand-waving about where isolation occurs for me to feel like I really have the complete picture. So let's walk through some examples. Please fill in and correct where I'm wrong.
>
> Thanks for putting these examples. They are helpful for clearing up the whole picture.
>
> Before filling in, let's first align on what the key difference is between the current VFIO model and this new proposal. With this comparison we'll know which of the following questions are answered with the existing VFIO mechanism and which are handled differently.
>
> With Yi's help we figured out the current mechanism:
>
> 1) vfio_group_viable. The code comment explains the intention clearly:
>
> --
>  * A vfio group is viable for use by userspace if all devices are in
>  * one of the following states:
>  *  - driver-less
>  *  - bound to a vfio driver
>  *  - bound to an otherwise allowed driver
>  *  - a PCI interconnect device
> --
>
> Note this check is not related to an IOMMU security context.

Because this is a pre-requisite for imposing that IOMMU security context.

> 2) vfio_iommu_group_notifier. When an IOMMU_GROUP_NOTIFY_BOUND_DRIVER event is notified, vfio_group_viable is re-evaluated.
> If the affected group was previously viable but now becomes not viable, BUG_ON(), as it implies that this device is bound to a non-vfio driver which breaks the group isolation.

This notifier action is conditional on there being users of devices within a secure group IOMMU context.

> 3) vfio_group_get_device_fd. User can acquire a device fd only after a) the group is viable; b) the group is attached to a container; c) iommu is set on the container (implying a security context established);

The order is actually b) a) c) but arguably b) is a no-op until:

d) a device fd is provided to the user

> The new device-centric proposal suggests:
>
> 1) vfio_group_viable;
> 2) vfio_iommu_group_notifier;
> 3) block-DMA if a device is detached from previous domain (instead of switching back to default domain as today);

I'm literally begging for specifics in this thread, but none are provided here. What is the "previous domain"? How is a device placed into a DMA-blocking IOMMU context? Is this the IOMMU default domain?
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On Tue, Jun 15, 2021 at 01:15:43PM -0600, Rob Herring wrote:
> If a property has an 'items' list, then a 'minItems' or 'maxItems' with the
> same size as the list is redundant and can be dropped. Note that this is
> DT-schema-specific behavior and not standard json-schema behavior. The
> tooling will fix up the final schema, adding any unspecified
> minItems/maxItems.
>
> This condition is partially checked with the meta-schema already, but
> only if both 'minItems' and 'maxItems' are equal to the 'items' length.
> An improved meta-schema is pending.
>
> Cc: Jens Axboe
> Cc: Stephen Boyd
> Cc: Herbert Xu
> Cc: "David S. Miller"
> Cc: David Airlie
> Cc: Daniel Vetter
> Cc: Vinod Koul
> Cc: Bartosz Golaszewski
> Cc: Kamal Dasu
> Cc: Jonathan Cameron
> Cc: Lars-Peter Clausen
> Cc: Thomas Gleixner
> Cc: Marc Zyngier
> Cc: Joerg Roedel
> Cc: Jassi Brar
> Cc: Mauro Carvalho Chehab
> Cc: Krzysztof Kozlowski
> Cc: Ulf Hansson
> Cc: Jakub Kicinski
> Cc: Wolfgang Grandegger
> Cc: Marc Kleine-Budde
> Cc: Andrew Lunn
> Cc: Vivien Didelot
> Cc: Vladimir Oltean
> Cc: Bjorn Helgaas
> Cc: Kishon Vijay Abraham I
> Cc: Linus Walleij
> Cc: "Uwe Kleine-König"
> Cc: Lee Jones
> Cc: Ohad Ben-Cohen
> Cc: Mathieu Poirier
> Cc: Philipp Zabel
> Cc: Paul Walmsley
> Cc: Palmer Dabbelt
> Cc: Albert Ou
> Cc: Alessandro Zummo
> Cc: Alexandre Belloni
> Cc: Greg Kroah-Hartman
> Cc: Mark Brown
> Cc: Zhang Rui
> Cc: Daniel Lezcano
> Cc: Wim Van Sebroeck
> Cc: Guenter Roeck
> Signed-off-by: Rob Herring

Acked-by: Wolfram Sang # for I2C
[PATCH v4 7/7] iommu/amd: Use only natural aligned flushes in a VM
From: Nadav Amit

When running on an AMD vIOMMU, it is better to avoid TLB flushes of unmodified PTEs. vIOMMUs require the hypervisor to synchronize the virtualized IOMMU's PTEs with the physical ones. This process induces overhead.

The AMD IOMMU allows us to flush any range that is aligned to a power of 2. So when running on top of a vIOMMU, break the range into sub-ranges that are naturally aligned, and flush each one separately. This approach is better when running with a vIOMMU, but on physical IOMMUs the penalty of IOTLB misses due to unnecessarily flushed entries is likely to be low.

Repurpose (i.e., keeping the name, changing the logic) domain_flush_pages() so it is used to choose whether to perform one flush of the whole range or multiple ones to avoid flushing unnecessary ranges. Use NpCache, as usual, to infer whether the IOMMU is physical or virtual.

Cc: Joerg Roedel
Cc: Will Deacon
Cc: Jiajun Cao
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Cc: linux-ker...@vger.kernel.org
Suggested-by: Robin Murphy
Signed-off-by: Nadav Amit
---
 drivers/iommu/amd/iommu.c | 47 ++-
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index ce8e970aac9a..ec0b6ad27e48 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1262,15 +1262,52 @@ static void __domain_flush_pages(struct protection_domain *domain,
 }
 
 static void domain_flush_pages(struct protection_domain *domain,
-			       u64 address, size_t size)
+			       u64 address, size_t size, int pde)
 {
-	__domain_flush_pages(domain, address, size, 0);
+	if (likely(!amd_iommu_np_cache)) {
+		__domain_flush_pages(domain, address, size, pde);
+		return;
+	}
+
+	/*
+	 * When NpCache is on, we infer that we run in a VM and use a vIOMMU.
+	 * In such setups it is best to avoid flushes of ranges which are not
+	 * naturally aligned, since it would lead to flushes of unmodified
+	 * PTEs. Such flushes would require the hypervisor to do more work than
+	 * necessary.
Therefore, perform repeated flushes of aligned ranges +* until the whole range is covered. Each iteration flushes the smaller of +* the natural alignment of the address that we flush and the highest +* bit that is set in the remaining size. +*/ + while (size != 0) { + int addr_alignment = __ffs(address); + int size_alignment = __fls(size); + int min_alignment; + size_t flush_size; + + /* +* size is always non-zero, but address might be zero, causing +* addr_alignment to be negative. As the casting of the +* argument in __ffs(address) to long might trim the high bits +* of the address on x86-32, cast to long when doing the check. +*/ + if (likely((unsigned long)address != 0)) + min_alignment = min(addr_alignment, size_alignment); + else + min_alignment = size_alignment; + + flush_size = 1ul << min_alignment; + + __domain_flush_pages(domain, address, flush_size, pde); + address += flush_size; + size -= flush_size; + } } /* Flush the whole IO/TLB for a given protection domain - including PDE */ void amd_iommu_domain_flush_tlb_pde(struct protection_domain *domain) { - __domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1); + domain_flush_pages(domain, 0, CMD_INV_IOMMU_ALL_PAGES_ADDRESS, 1); } void amd_iommu_domain_flush_complete(struct protection_domain *domain) @@ -1297,7 +1334,7 @@ static void domain_flush_np_cache(struct protection_domain *domain, unsigned long flags; spin_lock_irqsave(&domain->lock, flags); - domain_flush_pages(domain, iova, size); + domain_flush_pages(domain, iova, size, 1); amd_iommu_domain_flush_complete(domain); spin_unlock_irqrestore(&domain->lock, flags); } @@ -2205,7 +2242,7 @@ static void amd_iommu_iotlb_sync(struct iommu_domain *domain, unsigned long flags; spin_lock_irqsave(&dom->lock, flags); - __domain_flush_pages(dom, gather->start, gather->end - gather->start, 1); + domain_flush_pages(dom, gather->start, gather->end - gather->start, 1); amd_iommu_domain_flush_complete(dom); spin_unlock_irqrestore(&dom->lock, flags); } -- 2.25.1 ___ iommu mailing list
iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 6/7] iommu/amd: Sync once for scatter-gather operations
From: Nadav Amit On virtual machines, software must flush the IOTLB after each page table entry update. The iommu_map_sg() code iterates through the given scatter-gather list and invokes iommu_map() for each element in the scatter-gather list, which calls into the vendor IOMMU driver through an iommu_ops callback. As a result, a single sg mapping may lead to multiple IOTLB flushes. Fix this by adding an amd_iommu_iotlb_sync_map() callback and flushing at this point, after all sg mappings have been set. This commit follows and is inspired by commit 933fcd01e97e2 ("iommu/vt-d: Add iotlb_sync_map callback"). Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Nadav Amit --- drivers/iommu/amd/iommu.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 63048aabaf5d..ce8e970aac9a 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2027,6 +2027,16 @@ static int amd_iommu_attach_device(struct iommu_domain *dom, return ret; } +static void amd_iommu_iotlb_sync_map(struct iommu_domain *dom, +unsigned long iova, size_t size) +{ + struct protection_domain *domain = to_pdomain(dom); + struct io_pgtable_ops *ops = &domain->iop.iop.ops; + + if (ops->map) + domain_flush_np_cache(domain, iova, size); +} + static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, phys_addr_t paddr, size_t page_size, int iommu_prot, gfp_t gfp) @@ -2045,10 +2055,8 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, if (iommu_prot & IOMMU_WRITE) prot |= IOMMU_PROT_IW; - if (ops->map) { + if (ops->map) ret = ops->map(ops, iova, paddr, page_size, prot, gfp); - domain_flush_np_cache(domain, iova, page_size); - } return ret; } @@ -2228,6 +2236,7 @@ const struct iommu_ops amd_iommu_ops = { .attach_dev = amd_iommu_attach_device, .detach_dev = amd_iommu_detach_device, .map = amd_iommu_map, +
.iotlb_sync_map = amd_iommu_iotlb_sync_map, .unmap = amd_iommu_unmap, .iova_to_phys = amd_iommu_iova_to_phys, .probe_device = amd_iommu_probe_device, -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 4/7] iommu: Factor iommu_iotlb_gather_is_disjoint() out
From: Nadav Amit Refactor iommu_iotlb_gather_add_page() and factor out the logic that detects whether the IOTLB gather range and a new range are disjoint. To be used by the next patch that implements different gathering logic for AMD. Note that updating gather->pgsize unconditionally does not affect correctness as the function had (and has) an invariant, in which gather->pgsize always represents the flushing granularity of its range. Arguably, "size" should never be zero, but let's assume for the matter of discussion that it might. If "size" equals "gather->pgsize", then the assignment in question has no impact. Otherwise, if "size" is non-zero, then iommu_iotlb_sync() would initialize the size and range (see iommu_iotlb_gather_init()), and the invariant is kept. Otherwise, "size" is zero, and "gather" already holds a range, so gather->pgsize is non-zero and (gather->pgsize && gather->pgsize != size) is true. Therefore, again, iommu_iotlb_sync() would be called and initialize the size. Cc: Joerg Roedel Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Acked-by: Will Deacon Signed-off-by: Nadav Amit --- include/linux/iommu.h | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e554871db46f..979a5ceeea55 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -497,6 +497,28 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain, iommu_iotlb_gather_init(iotlb_gather); } +/** + * iommu_iotlb_gather_is_disjoint - Checks whether a new range is disjoint + * + * @gather: TLB gather data + * @iova: start of page to invalidate + * @size: size of page to invalidate + * + * Helper for IOMMU drivers to check whether a new range and the gathered range + * are disjoint. For many IOMMUs, flushing the IOMMU in this case is better + * than merging the two, which might lead to unnecessary invalidations.
+ */ +static inline +bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather, + unsigned long iova, size_t size) +{ + unsigned long start = iova, end = start + size - 1; + + return gather->end != 0 && + (end + 1 < gather->start || start > gather->end + 1); +} + + /** * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation * @gather: TLB gather data @@ -533,20 +555,16 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain, struct iommu_iotlb_gather *gather, unsigned long iova, size_t size) { - unsigned long start = iova, end = start + size - 1; - /* * If the new page is disjoint from the current range or is mapped at * a different granularity, then sync the TLB so that the gather * structure can be rewritten. */ - if (gather->pgsize != size || - end + 1 < gather->start || start > gather->end + 1) { - if (gather->pgsize) - iommu_iotlb_sync(domain, gather); - gather->pgsize = size; - } + if ((gather->pgsize && gather->pgsize != size) || + iommu_iotlb_gather_is_disjoint(gather, iova, size)) + iommu_iotlb_sync(domain, gather); + gather->pgsize = size; iommu_iotlb_gather_add_range(gather, iova, size); } -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 5/7] iommu/amd: Tailored gather logic for AMD
From: Nadav Amit AMD's IOMMU can flush efficiently (i.e., in a single flush) any range. This is in contrast, for instance, to Intel IOMMUs that have a limit on the number of pages that can be flushed in a single flush. In addition, AMD's IOMMU does not care about the page-size, so changes of the page size do not need to trigger a TLB flush. So in most cases, a TLB flush due to a disjoint range is not needed for AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized IOMMU's PTEs with the physical ones. This process induces overheads, so it is better not to cause unnecessary flushes, i.e., flushes of PTEs that were not modified. Implement amd_iommu_iotlb_gather_add_page() and use it instead of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions unless the "non-present cache" feature is reported by the IOMMU capabilities, as this is an indication we are running on a physical IOMMU. A similar indication is used by VT-d (see "caching mode"). The new logic retains the same flushing behavior that we had before the introduction of page-selective IOTLB flushes for AMD. On virtualized environments, check if the newly flushed region and the gathered one are disjoint and flush if they are. Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Cc: Robin Murphy Signed-off-by: Nadav Amit --- drivers/iommu/amd/iommu.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 3e40f6610b6a..63048aabaf5d 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2053,6 +2053,27 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova, return ret; } +static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather, + unsigned long iova, size_t size) +{ + /* +* AMD's IOMMU can flush as many pages as necessary in a single flush.
+* Unless we run in a virtual machine, which can be inferred according +* to whether "non-present cache" is on, it is probably best to prefer +* (potentially) too extensive TLB flushing (i.e., more misses) over +* multiple TLB flushes (i.e., more flushes). For virtual machines the +* hypervisor needs to synchronize the host IOMMU PTEs with those of +* the guest, and the trade-off is different: unnecessary TLB flushes +* should be avoided. +*/ + if (amd_iommu_np_cache && gather->end != 0 && + iommu_iotlb_gather_is_disjoint(gather, iova, size)) + iommu_iotlb_sync(domain, gather); + + iommu_iotlb_gather_add_range(gather, iova, size); +} + static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, size_t page_size, struct iommu_iotlb_gather *gather) @@ -2067,7 +2088,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0; - iommu_iotlb_gather_add_page(dom, gather, iova, page_size); + amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size); return r; } -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 3/7] iommu: Improve iommu_iotlb_gather helpers
From: Robin Murphy The Mediatek driver is not the only one which might want a basic address-based gathering behaviour, so although it's arguably simple enough to open-code, let's factor it out for the sake of cleanliness. Let's also take this opportunity to document the intent of these helpers for clarity. Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Robin Murphy Signed-off-by: Nadav Amit --- Changes from Robin's version: * Added iommu_iotlb_gather_add_range() stub !CONFIG_IOMMU_API * Use iommu_iotlb_gather_add_range() in iommu_iotlb_gather_add_page() --- drivers/iommu/mtk_iommu.c | 6 +- include/linux/iommu.h | 38 +- 2 files changed, 34 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index e06b8a0e2b56..cd457487ce81 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -521,12 +521,8 @@ static size_t mtk_iommu_unmap(struct iommu_domain *domain, struct iommu_iotlb_gather *gather) { struct mtk_iommu_domain *dom = to_mtk_domain(domain); - unsigned long end = iova + size - 1; - if (gather->start > iova) - gather->start = iova; - if (gather->end < end) - gather->end = end; + iommu_iotlb_gather_add_range(gather, iova, size); return dom->iop->unmap(dom->iop, iova, size, gather); } diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..e554871db46f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -497,6 +497,38 @@ static inline void iommu_iotlb_sync(struct iommu_domain *domain, iommu_iotlb_gather_init(iotlb_gather); } +/** + * iommu_iotlb_gather_add_range - Gather for address-based TLB invalidation + * @gather: TLB gather data + * @iova: start of page to invalidate + * @size: size of page to invalidate + * + * Helper for IOMMU drivers to build arbitrarily-sized invalidation commands + * where only the address range matters, and simply minimising 
intermediate + * syncs is preferred. + */ +static inline void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather, + unsigned long iova, size_t size) +{ + unsigned long end = iova + size - 1; + + if (gather->start > iova) + gather->start = iova; + if (gather->end < end) + gather->end = end; +} + +/** + * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation + * @domain: IOMMU domain to be invalidated + * @gather: TLB gather data + * @iova: start of page to invalidate + * @size: size of page to invalidate + * + * Helper for IOMMU drivers to build invalidation commands based on individual + * pages, or with page size/table level hints which cannot be gathered if they + * differ. + */ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain, struct iommu_iotlb_gather *gather, unsigned long iova, size_t size) @@ -515,11 +547,7 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain, gather->pgsize = size; } - if (gather->end < end) - gather->end = end; - - if (gather->start > start) - gather->start = start; + iommu_iotlb_gather_add_range(gather, iova, size); } /* PCI device grouping function */ -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 2/7] iommu/amd: Do not use flush-queue when NpCache is on
From: Nadav Amit Do not use flush-queue on virtualized environments, where the NpCache capability of the IOMMU is set. This is required to reduce virtualization overheads. This change follows a similar change to Intel's VT-d and a detailed explanation as for the rationale is described in commit 29b32839725f ("iommu/vt-d: Do not use flush-queue when caching-mode is on"). Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Nadav Amit --- drivers/iommu/amd/init.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index d006724f4dc2..4a52d22d0d6f 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1850,8 +1850,13 @@ static int __init iommu_init_pci(struct amd_iommu *iommu) if (ret) return ret; - if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) + if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) { + if (!amd_iommu_unmap_flush) + pr_warn("IOMMU batching is disabled due to virtualization"); + amd_iommu_np_cache = true; + amd_iommu_unmap_flush = true; + } init_iommu_perf_ctr(iommu); -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 1/7] iommu/amd: Selective flush on unmap
From: Nadav Amit A recent patch attempted to enable selective page flushes on AMD IOMMU but neglected to adapt amd_iommu_iotlb_sync() to use the selective flushes. Adapt amd_iommu_iotlb_sync() to use selective flushes and change amd_iommu_unmap() to collect the flushes. As a defensive measure, to avoid potential issues similar to those that the Intel IOMMU driver encountered recently, flush the page-walk caches by always setting the "pde" parameter. This can be removed later. Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org Signed-off-by: Nadav Amit --- drivers/iommu/amd/iommu.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 3ac42bbdefc6..3e40f6610b6a 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -2059,12 +2059,17 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova, { struct protection_domain *domain = to_pdomain(dom); struct io_pgtable_ops *ops = &domain->iop.iop.ops; + size_t r; if ((amd_iommu_pgtable == AMD_IOMMU_V1) && (domain->iop.mode == PAGE_MODE_NONE)) return 0; - return (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0; + r = (ops->unmap) ?
ops->unmap(ops, iova, page_size, gather) : 0; + + iommu_iotlb_gather_add_page(dom, gather, iova, page_size); + + return r; } static phys_addr_t amd_iommu_iova_to_phys(struct iommu_domain *dom, @@ -2167,7 +2172,13 @@ static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain) static void amd_iommu_iotlb_sync(struct iommu_domain *domain, struct iommu_iotlb_gather *gather) { - amd_iommu_flush_iotlb_all(domain); + struct protection_domain *dom = to_pdomain(domain); + unsigned long flags; + + spin_lock_irqsave(&dom->lock, flags); + __domain_flush_pages(dom, gather->start, gather->end - gather->start, 1); + amd_iommu_domain_flush_complete(dom); + spin_unlock_irqrestore(&dom->lock, flags); } static int amd_iommu_def_domain_type(struct device *dev) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v4 0/7] iommu/amd: Enable page-selective flushes
From: Nadav Amit The previous patch, commit 268aa4548277 ("iommu/amd: Page-specific invalidations for more than one page") was supposed to enable page-selective IOTLB flushes on AMD. Besides the bug that was already fixed by commit a017c567915f ("iommu/amd: Fix wrong parentheses on page-specific invalidations") there are several remaining matters to enable and benefit from page-selective IOTLB flushes on AMD: 1. Enable selective flushes on unmap (patch 1) 2. Avoid using flush-queue on vIOMMUs (patch 2) 3. Relaxed flushes when gathering, excluding vIOMMUs (patches 3-5) 4. Syncing once on scatter-gather map operations (patch 6) 5. Breaking flushes to naturally aligned ranges on vIOMMU (patch 7) The main difference in this version is that the logic that flushes vIOMMU was improved based on Robin's feedback. Batching decisions are not based on alignment anymore, but instead the flushing range is broken into naturally aligned regions on sync. Doing so allows us to flush only the entries that we modified with the minimal number of flushes. 
Cc: Joerg Roedel Cc: Will Deacon Cc: Jiajun Cao Cc: Robin Murphy Cc: Lu Baolu Cc: iommu@lists.linux-foundation.org Cc: linux-ker...@vger.kernel.org --- v3->v4: * Breaking flushes to naturally aligned ranges on vIOMMU [Robin] * Removing unnecessary stubs; fixing comment [Robin] * Removing unused variable [Yong] * Changing pr_warn_once() to pr_warn() [Robin] * Improving commit log [Will] v2->v3: * Rebase on v5.13-rc5 * Refactoring (patches 4-5) [Robin] * Rework flush logic (patch 5): more relaxed on native * Syncing once on scatter-gather operations (patch 6) v1->v2: * Rebase on v5.13-rc3 Nadav Amit (6): iommu/amd: Selective flush on unmap iommu/amd: Do not use flush-queue when NpCache is on iommu: Factor iommu_iotlb_gather_is_disjoint() out iommu/amd: Tailored gather logic for AMD iommu/amd: Sync once for scatter-gather operations iommu/amd: Use only natural aligned flushes in a VM Robin Murphy (1): iommu: Improve iommu_iotlb_gather helpers drivers/iommu/amd/init.c | 7 ++- drivers/iommu/amd/iommu.c | 96 +++ drivers/iommu/mtk_iommu.c | 6 +-- include/linux/iommu.h | 72 +++-- 4 files changed, 153 insertions(+), 28 deletions(-) -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()
On 2021-06-10 08:51, Jean-Philippe Brucker wrote: dma-iommu uses the address bounds described in domain->geometry during IOVA allocation. The address size parameters of iommu_setup_dma_ops() are useful for describing additional limits set by the platform firmware, but aren't needed for drivers that call this function from probe_finalize(). The base parameter can be zero because dma-iommu already removes the first IOVA page, and the limit parameter can be U64_MAX because it's only checked against the domain geometry. Simplify calls to iommu_setup_dma_ops(). Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/amd/iommu.c | 9 + drivers/iommu/dma-iommu.c | 4 +++- drivers/iommu/intel/iommu.c | 10 +- 3 files changed, 5 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index 94b96d81fcfd..d3123bc05c08 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1708,14 +1708,7 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev) static void amd_iommu_probe_finalize(struct device *dev) { - struct iommu_domain *domain; - - /* Domains are initialized for this device - have a look what we ended up with */ - domain = iommu_get_domain_for_dev(dev); - if (domain->type == IOMMU_DOMAIN_DMA) - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX); - else - set_dma_ops(dev, NULL); + iommu_setup_dma_ops(dev, 0, U64_MAX); } static void amd_iommu_release_device(struct device *dev) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index c62e19bed302..175f8eaeb5b3 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) if (domain->type == IOMMU_DOMAIN_DMA) { if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) goto out_err; - dev->dma_ops = &iommu_dma_ops; + set_dma_ops(dev, &iommu_dma_ops); + } else { + set_dma_ops(dev, NULL); I'm not keen on moving this here, since
iommu-dma only knows that its own ops are right for devices it *is* managing; it can't assume any particular ops are appropriate for devices it isn't. The idea here is that arch_setup_dma_ops() may have already set the appropriate ops for the non-IOMMU case, so if the default domain type is passthrough then we leave those in place. For example, I do still plan to revisit my conversion of arch/arm someday, at which point I'd have to undo this for that reason. Simplifying the base and size arguments is of course fine, but TBH I'd say rip the whole bloody lot out of the arch_setup_dma_ops() flow now. It's a considerable faff passing them around for nothing but a tenuous sanity check in iommu_dma_init_domain(), and now that dev->dma_range_map is a common thing we should expect that to give us any relevant limitations if we even still care. That said, those are all things which can be fixed up later if the series is otherwise ready to go and there's still a chance of landing it for 5.14. If you do have any other reason to respin, then I think the x86 probe_finalize functions simply want an unconditional set_dma_ops(dev, NULL) before the iommu_setup_dma_ops() call. Cheers, Robin. 
} return; diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 85f18342603c..8d866940692a 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device *dev) static void intel_iommu_probe_finalize(struct device *dev) { - dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT; - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); - struct dmar_domain *dmar_domain = to_dmar_domain(domain); - - if (domain && domain->type == IOMMU_DOMAIN_DMA) - iommu_setup_dma_ops(dev, base, - __DOMAIN_MAX_ADDR(dmar_domain->gaw)); - else - set_dma_ops(dev, NULL); + iommu_setup_dma_ops(dev, 0, U64_MAX); } static void intel_iommu_get_resv_regions(struct device *device, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 6/6] iommu/virtio: Enable x86 support
Hi jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > With the VIOT support in place, x86 platforms can now use the > virtio-iommu. > > Because the other x86 IOMMU drivers aren't yet ready to use the > acpi_dma_setup() path, x86 doesn't implement arch_setup_dma_ops() at the > moment. Similarly to Vt-d and AMD IOMMU, call iommu_setup_dma_ops() from > probe_finalize(). > > Acked-by: Joerg Roedel > Acked-by: Michael S. Tsirkin > Signed-off-by: Jean-Philippe Brucker Reviewed-by: Eric Auger Eric > --- > drivers/iommu/Kconfig| 3 ++- > drivers/iommu/dma-iommu.c| 1 + > drivers/iommu/virtio-iommu.c | 8 > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index aff8a4830dd1..07b7c25cbed8 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -400,8 +400,9 @@ config HYPERV_IOMMU > config VIRTIO_IOMMU > tristate "Virtio IOMMU driver" > depends on VIRTIO > - depends on ARM64 > + depends on (ARM64 || X86) > select IOMMU_API > + select IOMMU_DMA > select INTERVAL_TREE > select ACPI_VIOT if ACPI > help > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 175f8eaeb5b3..46ed43c400cf 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -1332,6 +1332,7 @@ void iommu_setup_dma_ops(struct device *dev, u64 > dma_base, u64 dma_limit) >pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA > ops\n", >dev_name(dev)); > } > +EXPORT_SYMBOL_GPL(iommu_setup_dma_ops); > > static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev, > phys_addr_t msi_addr, struct iommu_domain *domain) > diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c > index 218fe8560e8d..77aee1207ced 100644 > --- a/drivers/iommu/virtio-iommu.c > +++ b/drivers/iommu/virtio-iommu.c > @@ -1026,6 +1026,13 @@ static struct iommu_device *viommu_probe_device(struct > device *dev) > return ERR_PTR(ret); > } > > +static void 
viommu_probe_finalize(struct device *dev) > +{ > +#ifndef CONFIG_ARCH_HAS_SETUP_DMA_OPS > + iommu_setup_dma_ops(dev, 0, U64_MAX); > +#endif > +} > + > static void viommu_release_device(struct device *dev) > { > struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > @@ -1062,6 +1069,7 @@ static struct iommu_ops viommu_ops = { > .iova_to_phys = viommu_iova_to_phys, > .iotlb_sync = viommu_iotlb_sync, > .probe_device = viommu_probe_device, > + .probe_finalize = viommu_probe_finalize, > .release_device = viommu_release_device, > .device_group = viommu_device_group, > .get_resv_regions = viommu_get_resv_regions, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 5/6] iommu/dma: Simplify calls to iommu_setup_dma_ops()
Hi Jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > dma-iommu uses the address bounds described in domain->geometry during > IOVA allocation. The address size parameters of iommu_setup_dma_ops() > are useful for describing additional limits set by the platform > firmware, but aren't needed for drivers that call this function from > probe_finalize(). The base parameter can be zero because dma-iommu > already removes the first IOVA page, and the limit parameter can be > U64_MAX because it's only checked against the domain geometry. Simplify > calls to iommu_setup_dma_ops(). > > Signed-off-by: Jean-Philippe Brucker Reviewed-by: Eric Auger Eric > --- > drivers/iommu/amd/iommu.c | 9 + > drivers/iommu/dma-iommu.c | 4 +++- > drivers/iommu/intel/iommu.c | 10 +- > 3 files changed, 5 insertions(+), 18 deletions(-) > > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > index 94b96d81fcfd..d3123bc05c08 100644 > --- a/drivers/iommu/amd/iommu.c > +++ b/drivers/iommu/amd/iommu.c > @@ -1708,14 +1708,7 @@ static struct iommu_device > *amd_iommu_probe_device(struct device *dev) > > static void amd_iommu_probe_finalize(struct device *dev) > { > - struct iommu_domain *domain; > - > - /* Domains are initialized for this device - have a look what we ended > up with */ > - domain = iommu_get_domain_for_dev(dev); > - if (domain->type == IOMMU_DOMAIN_DMA) > - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX); > - else > - set_dma_ops(dev, NULL); > + iommu_setup_dma_ops(dev, 0, U64_MAX); > } > > static void amd_iommu_release_device(struct device *dev) > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index c62e19bed302..175f8eaeb5b3 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -1322,7 +1322,9 @@ void iommu_setup_dma_ops(struct device *dev, u64 > dma_base, u64 dma_limit) > if (domain->type == IOMMU_DOMAIN_DMA) { > if (iommu_dma_init_domain(domain, dma_base, dma_limit, dev)) > goto out_err; > - 
dev->dma_ops = &iommu_dma_ops; > + set_dma_ops(dev, &iommu_dma_ops); > + } else { > + set_dma_ops(dev, NULL); > } > > return; > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 85f18342603c..8d866940692a 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -5165,15 +5165,7 @@ static void intel_iommu_release_device(struct device > *dev) > > static void intel_iommu_probe_finalize(struct device *dev) > { > - dma_addr_t base = IOVA_START_PFN << VTD_PAGE_SHIFT; > - struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > - struct dmar_domain *dmar_domain = to_dmar_domain(domain); > - > - if (domain && domain->type == IOMMU_DOMAIN_DMA) > - iommu_setup_dma_ops(dev, base, > - __DOMAIN_MAX_ADDR(dmar_domain->gaw)); > - else > - set_dma_ops(dev, NULL); > + iommu_setup_dma_ops(dev, 0, U64_MAX); > } > > static void intel_iommu_get_resv_regions(struct device *device, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On Wed, Jun 16, 2021 at 11:39:07AM -0400, Boris Ostrovsky wrote: > > On 6/16/21 11:35 AM, Christoph Hellwig wrote: > > On Wed, Jun 16, 2021 at 11:33:50AM -0400, Boris Ostrovsky wrote: > >> Isn't the expectation of virt_to_page() that it only works on > >> non-vmalloc'd addresses? (This is not a rhetorical question, I actually > >> don't know). > > Yes. This is why I'd suggest to just do the vmalloc_to_page or > > virt_to_page dance in ops_helpers.c and just continue using that. > > > Ah, OK, so something along the lines of what I suggested. (I thought by > "helpers" you meant virt_to_page()). Yes. Just keeping it contained in the common code without duplicating it into a xen-specific version. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On 6/16/21 11:35 AM, Christoph Hellwig wrote: > On Wed, Jun 16, 2021 at 11:33:50AM -0400, Boris Ostrovsky wrote: >> Isn't the expectation of virt_to_page() that it only works on non-vmalloc'd >> addresses? (This is not a rhetorical question, I actually don't know). > Yes. This is why I'd suggest to just do the vmalloc_to_page or > virt_to_page dance in ops_helpers.c and just continue using that. Ah, OK, so something along the lines of what I suggested. (I thought by "helpers" you meant virt_to_page()). -boris ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On Wed, Jun 16, 2021 at 11:33:50AM -0400, Boris Ostrovsky wrote: > Isn't the expectation of virt_to_page() that it only works on non-vmalloc'd > addresses? (This is not a rhetorical question, I actually don't know). Yes. This is why I'd suggest to just do the vmalloc_to_page or virt_to_page dance in ops_helpers.c and just continue using that. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On 6/16/21 10:21 AM, Christoph Hellwig wrote: > On Wed, Jun 16, 2021 at 10:12:55AM -0400, Boris Ostrovsky wrote: >> I wonder now whether we could avoid code duplication between here and >> dma_common_mmap()/dma_common_get_sgtable() and use your helper there. >> >> >> Christoph, would that work? I.e. something like > You should not duplicate the code at all, and just make the common > helpers work with vmalloc addresses. Isn't the expectation of virt_to_page() that it only works on non-vmalloc'd addresses? (This is not a rhetorical question, I actually don't know). -boris
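[Editorial note: the dispatch being discussed above can be sketched in a few lines. This is only an illustrative userspace model, not kernel code: the boundary constant, the enum tag, and all `model_*` names are invented for the example, standing in for `is_vmalloc_addr()`, `virt_to_page()`, and `vmalloc_to_page()`.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: pretend addresses at or above this boundary came
 * from vmalloc() and everything below is in the linear map.  The real kernel
 * predicate is is_vmalloc_addr(). */
#define MODEL_VMALLOC_START 0xffff800000000000ULL

static bool model_is_vmalloc_addr(uint64_t addr)
{
	return addr >= MODEL_VMALLOC_START;
}

/* Tag standing in for the struct page * the kernel helpers would return. */
enum page_lookup { VIA_VIRT_TO_PAGE, VIA_VMALLOC_TO_PAGE };

/* The shape of the helper being proposed for ops_helpers.c: choose the
 * conversion based on where cpu_addr lives, because virt_to_page() is only
 * valid for linear-map addresses. */
static enum page_lookup model_cpu_addr_to_page(uint64_t cpu_addr)
{
	return model_is_vmalloc_addr(cpu_addr) ? VIA_VMALLOC_TO_PAGE
					       : VIA_VIRT_TO_PAGE;
}
```

Putting this dispatch in the common helpers, as suggested, means swiotlb-xen never needs its own copies of mmap()/get_sgtable().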
Re: [PATCH v4 4/6] iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
Hi Jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > Passing a 64-bit address width to iommu_setup_dma_ops() is valid on > virtual platforms, but isn't currently possible. The overflow check in > iommu_dma_init_domain() prevents this even when @dma_base isn't 0. Pass > a limit address instead of a size, so callers don't have to fake a size > to work around the check. > > Signed-off-by: Jean-Philippe Brucker > --- > include/linux/dma-iommu.h | 4 ++-- > arch/arm64/mm/dma-mapping.c | 2 +- > drivers/iommu/amd/iommu.c | 2 +- > drivers/iommu/dma-iommu.c | 12 ++-- > drivers/iommu/intel/iommu.c | 2 +- > 5 files changed, 11 insertions(+), 11 deletions(-) > > diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h > index 6e75a2d689b4..758ca4694257 100644 > --- a/include/linux/dma-iommu.h > +++ b/include/linux/dma-iommu.h > @@ -19,7 +19,7 @@ int iommu_get_msi_cookie(struct iommu_domain *domain, > dma_addr_t base); > void iommu_put_dma_cookie(struct iommu_domain *domain); > > /* Setup call for arch DMA mapping code */ > -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size); > +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); > > /* The DMA API isn't _quite_ the whole story, though... */ > /* > @@ -50,7 +50,7 @@ struct msi_msg; > struct device; > > static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, > - u64 size) > +u64 dma_limit) > { > } > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > index 4bf1dd3eb041..7bd1d2199141 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -50,7 +50,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, > u64 size, > > dev->dma_coherent = coherent; > if (iommu) > - iommu_setup_dma_ops(dev, dma_base, size); > + iommu_setup_dma_ops(dev, dma_base, size - dma_base - 1); I don't get size - dma_base - 1? 
> > #ifdef CONFIG_XEN > if (xen_swiotlb_detect()) > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c > index 3ac42bbdefc6..94b96d81fcfd 100644 > --- a/drivers/iommu/amd/iommu.c > +++ b/drivers/iommu/amd/iommu.c > @@ -1713,7 +1713,7 @@ static void amd_iommu_probe_finalize(struct device *dev) > /* Domains are initialized for this device - have a look what we ended > up with */ > domain = iommu_get_domain_for_dev(dev); > if (domain->type == IOMMU_DOMAIN_DMA) > - iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, 0); > + iommu_setup_dma_ops(dev, IOVA_START_PFN << PAGE_SHIFT, U64_MAX); > else > set_dma_ops(dev, NULL); > } > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c > index 7bcdd1205535..c62e19bed302 100644 > --- a/drivers/iommu/dma-iommu.c > +++ b/drivers/iommu/dma-iommu.c > @@ -319,16 +319,16 @@ static bool dev_is_untrusted(struct device *dev) > * iommu_dma_init_domain - Initialise a DMA mapping domain > * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() > * @base: IOVA at which the mappable address space starts > - * @size: Size of IOVA space > + * @limit: Last address of the IOVA space > * @dev: Device the domain is being initialised for > * > - * @base and @size should be exact multiples of IOMMU page granularity to > + * @base and @limit + 1 should be exact multiples of IOMMU page granularity > to > * avoid rounding surprises. If necessary, we reserve the page at address 0 > * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but > * any change which could make prior IOVAs invalid will fail. 
> */ > static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t > base, > - u64 size, struct device *dev) > + dma_addr_t limit, struct device *dev) > { > struct iommu_dma_cookie *cookie = domain->iova_cookie; > unsigned long order, base_pfn; > @@ -346,7 +346,7 @@ static int iommu_dma_init_domain(struct iommu_domain > *domain, dma_addr_t base, > /* Check the domain allows at least some access to the device... */ > if (domain->geometry.force_aperture) { > if (base > domain->geometry.aperture_end || > - base + size <= domain->geometry.aperture_start) { > + limit < domain->geometry.aperture_start) { > pr_warn("specified DMA range outside IOMMU > capability\n"); > return -EFAULT; > } > @@ -1308,7 +1308,7 @@ static const struct dma_map_ops iommu_dma_ops = { > * The IOMMU core code allocates the default DMA domain, which the underlying > * IOMMU driver needs to support via the dma-iommu layer. > */ > -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size) > +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) > { > struct
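[Editorial note: the arithmetic questioned above is worth spelling out. Converting an exclusive (base, size) DMA window to the inclusive limit form that this patch introduces is `dma_base + size - 1`; the `size - dma_base - 1` in the arm64 hunk appears to be a typo for that. A minimal sketch of the intended conversion, with an invented helper name:]

```c
#include <stdint.h>

/* Convert an exclusive (base, size) DMA window to the inclusive-limit form
 * taken by iommu_setup_dma_ops() after this patch: the last addressable
 * byte of the window is base + size - 1. */
static uint64_t dma_window_limit(uint64_t dma_base, uint64_t size)
{
	return dma_base + size - 1;
}
```

For a 4 GB window starting at 0 this yields 0xffffffff, matching the aperture check `limit < domain->geometry.aperture_start` in the reworked `iommu_dma_init_domain()`.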
Re: [RFC PATCH] iommu: add domain->nested
On Wed, Jun 16, 2021 at 10:38:02PM +0800, Zhangfei Gao wrote: > +++ b/include/linux/iommu.h > @@ -87,6 +87,7 @@ struct iommu_domain { > void *handler_token; > struct iommu_domain_geometry geometry; > void *iova_cookie; > + int nested; This should probably be a bool : 1; Also this needs a user, so please just queue up a variant of this for the code that eventually relies on this information.
[RFC PATCH] iommu: add domain->nested
Add domain->nested to decide whether domain is in nesting mode, since attr DOMAIN_ATTR_NESTING is removed in the patches: 7876a83 iommu: remove iommu_domain_{get,set}_attr 7e14754 iommu: remove DOMAIN_ATTR_NESTING Signed-off-by: Zhangfei Gao --- Nesting info is still required for vsva according to https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210301084257.945454-16-...@lst.de/ drivers/iommu/iommu.c | 8 +++- include/linux/iommu.h | 1 + 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70..ba26ad0 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2684,11 +2684,17 @@ core_initcall(iommu_init); int iommu_enable_nesting(struct iommu_domain *domain) { + int ret; + if (domain->type != IOMMU_DOMAIN_UNMANAGED) return -EINVAL; if (!domain->ops->enable_nesting) return -EINVAL; - return domain->ops->enable_nesting(domain); + ret = domain->ops->enable_nesting(domain); + if (!ret) + domain->nested = 1; + + return ret; } EXPORT_SYMBOL_GPL(iommu_enable_nesting); diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d4480..179f849 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -87,6 +87,7 @@ struct iommu_domain { void *handler_token; struct iommu_domain_geometry geometry; void *iova_cookie; + int nested; }; enum iommu_cap { -- 2.7.4 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
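[Editorial note: the intended semantics of the new flag can be modeled compactly: it is set only when the driver's enable_nesting() callback succeeds. The sketch below uses invented stand-in types and error codes, not the kernel's:]

```c
#include <stddef.h>

enum { DOMAIN_UNMANAGED, DOMAIN_DMA };

struct model_domain {
	int type;
	int (*enable_nesting)(struct model_domain *d);
	int nested;	/* the field added by this patch */
};

/* Hypothetical driver callback that always succeeds, for illustration. */
static int model_driver_enable_nesting(struct model_domain *d)
{
	(void)d;
	return 0;
}

/* Mirrors the patched iommu_enable_nesting(): validate the domain type,
 * call the driver op, and record success in domain->nested. */
static int model_enable_nesting(struct model_domain *domain)
{
	int ret;

	if (domain->type != DOMAIN_UNMANAGED)
		return -1;
	if (!domain->enable_nesting)
		return -1;
	ret = domain->enable_nesting(domain);
	if (!ret)
		domain->nested = 1;
	return ret;
}
```

As noted in the review, a `bool nested : 1;` bitfield would express the same state more tightly than an `int`.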
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On Wed, Jun 16, 2021 at 10:12:55AM -0400, Boris Ostrovsky wrote: > I wonder now whether we could avoid code duplication between here and > dma_common_mmap()/dma_common_get_sgtable() and use your helper there. > > > Christoph, would that work? I.e. something like You should not duplicate the code at all, and just make the common helpers work with vmalloc addresses.
Re: [PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
On 6/16/21 7:42 AM, Roman Skakun wrote: > This commit is dedicated to fix incorrect conversion from > cpu_addr to page address in cases when we get virtual > address which allocated through xen_swiotlb_alloc_coherent() > and can be mapped in the vmalloc range. > As the result, virt_to_page() cannot convert this address > properly and return incorrect page address. > > Need to detect such cases and obtains the page address using > vmalloc_to_page() instead. > > The reference code for mmap() and get_sgtable() was copied > from kernel/dma/ops_helpers.c and modified to provide > additional detections as described above. > > In order to simplify code there was added a new > dma_cpu_addr_to_page() helper. > > Signed-off-by: Roman Skakun > Reviewed-by: Andrii Anisov > --- > drivers/xen/swiotlb-xen.c | 42 +++ > 1 file changed, 34 insertions(+), 8 deletions(-) > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > index 90bc5fc321bc..9331a8500547 100644 > --- a/drivers/xen/swiotlb-xen.c > +++ b/drivers/xen/swiotlb-xen.c > @@ -118,6 +118,14 @@ static int is_xen_swiotlb_buffer(struct device *dev, > dma_addr_t dma_addr) > return 0; > } > > +static struct page *cpu_addr_to_page(void *cpu_addr) > +{ > + if (is_vmalloc_addr(cpu_addr)) > + return vmalloc_to_page(cpu_addr); > + else > + return virt_to_page(cpu_addr); > +} > + > static int > xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs) > { > @@ -337,7 +345,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t > size, void *vaddr, > int order = get_order(size); > phys_addr_t phys; > u64 dma_mask = DMA_BIT_MASK(32); > - struct page *page; > + struct page *page = cpu_addr_to_page(vaddr); > > if (hwdev && hwdev->coherent_dma_mask) > dma_mask = hwdev->coherent_dma_mask; > @@ -349,11 +357,6 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t > size, void *vaddr, > /* Convert the size to actually allocated. 
*/ > size = 1UL << (order + XEN_PAGE_SHIFT); > > - if (is_vmalloc_addr(vaddr)) > - page = vmalloc_to_page(vaddr); > - else > - page = virt_to_page(vaddr); > - > if (!WARN_ON((dev_addr + size - 1 > dma_mask) || >range_straddles_page_boundary(phys, size)) && > TestClearPageXenRemapped(page)) > @@ -573,7 +576,23 @@ xen_swiotlb_dma_mmap(struct device *dev, struct > vm_area_struct *vma, >void *cpu_addr, dma_addr_t dma_addr, size_t size, >unsigned long attrs) > { > - return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs); > + unsigned long user_count = vma_pages(vma); > + unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; > + unsigned long off = vma->vm_pgoff; > + struct page *page = cpu_addr_to_page(cpu_addr); > + int ret; > + > + vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs); > + > + if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, )) > + return ret; > + > + if (off >= count || user_count > count - off) > + return -ENXIO; > + > + return remap_pfn_range(vma, vma->vm_start, > + page_to_pfn(page) + vma->vm_pgoff, > + user_count << PAGE_SHIFT, vma->vm_page_prot); > } I wonder now whether we could avoid code duplication between here and dma_common_mmap()/dma_common_get_sgtable() and use your helper there. Christoph, would that work? I.e. 
something like diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c index 910ae69cae77..43411c2fa47b 100644 --- a/kernel/dma/ops_helpers.c +++ b/kernel/dma/ops_helpers.c @@ -12,7 +12,7 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - struct page *page = virt_to_page(cpu_addr); + struct page *page = cpu_addr_to_page(cpu_addr); int ret; ret = sg_alloc_table(sgt, 1, GFP_KERNEL); @@ -43,7 +43,7 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, return -ENXIO; return remap_pfn_range(vma, vma->vm_start, - page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff, + page_to_pfn(cpu_addr_to_page(cpu_addr)) + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); #else return -ENXIO; -boris > > /* > @@ -585,7 +604,14 @@ xen_swiotlb_get_sgtable(struct device *dev, struct > sg_table *sgt, > void *cpu_addr, dma_addr_t handle, size_t size, > unsigned long attrs) > { > - return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs); > + struct page *page = cpu_addr_to_page(cpu_addr); > + int ret; > + > + ret = sg_alloc_table(sgt, 1,
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On 15.06.2021 13:15:43, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that this is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. [...] > Documentation/devicetree/bindings/net/can/bosch,m_can.yaml | 2 -- Acked-by: Marc Kleine-Budde regards, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917- |
[PATCH v7 02/15] iommu: Add an unmap_pages() op for IOMMU drivers
From: "Isaac J. Manjarres" Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to unmap a virtually contiguous range of pages of the same size. For IOMMU drivers that do not specify an unmap_pages() callback, the existing logic of unmapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Will Deacon Acked-by: Lu Baolu Signed-off-by: Georgi Djakov --- include/linux/iommu.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..25a844121be5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -181,6 +181,7 @@ struct iommu_iotlb_gather { * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain * @unmap: unmap a physically contiguous memory region from an iommu domain + * @unmap_pages: unmap a number of pages of the same size from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush @@ -231,6 +232,9 @@ struct iommu_ops { phys_addr_t paddr, size_t size, int prot, gfp_t gfp); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); + size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather); void (*flush_iotlb_all)(struct iommu_domain *domain); void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, size_t size); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 09/15] iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries
From: "Isaac J. Manjarres" The PTE methods currently operate on a single entry. In preparation for manipulating multiple PTEs in one map or unmap call, allow them to handle multiple PTEs. Signed-off-by: Isaac J. Manjarres Suggested-by: Robin Murphy Signed-off-by: Georgi Djakov --- drivers/iommu/io-pgtable-arm.c | 78 -- 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58e79b5..ea66b10c04c4 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -232,20 +232,23 @@ static void __arm_lpae_free_pages(void *pages, size_t size, free_pages((unsigned long)pages, get_order(size)); } -static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, +static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries, struct io_pgtable_cfg *cfg) { dma_sync_single_for_device(cfg->iommu_dev, __arm_lpae_dma_addr(ptep), - sizeof(*ptep), DMA_TO_DEVICE); + sizeof(*ptep) * num_entries, DMA_TO_DEVICE); } static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, - struct io_pgtable_cfg *cfg) + int num_entries, struct io_pgtable_cfg *cfg) { - *ptep = pte; + int i; + + for (i = 0; i < num_entries; i++) + ptep[i] = pte; if (!cfg->coherent_walk) - __arm_lpae_sync_pte(ptep, cfg); + __arm_lpae_sync_pte(ptep, num_entries, cfg); } static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, @@ -255,47 +258,54 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, phys_addr_t paddr, arm_lpae_iopte prot, - int lvl, arm_lpae_iopte *ptep) + int lvl, int num_entries, arm_lpae_iopte *ptep) { arm_lpae_iopte pte = prot; + struct io_pgtable_cfg *cfg = >iop.cfg; + size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data); + int i; if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1) pte |= ARM_LPAE_PTE_TYPE_PAGE; else pte |= ARM_LPAE_PTE_TYPE_BLOCK; - pte |= paddr_to_iopte(paddr, data); + for (i = 
0; i < num_entries; i++) + ptep[i] = pte | paddr_to_iopte(paddr + i * sz, data); - __arm_lpae_set_pte(ptep, pte, >iop.cfg); + if (!cfg->coherent_walk) + __arm_lpae_sync_pte(ptep, num_entries, cfg); } static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, unsigned long iova, phys_addr_t paddr, -arm_lpae_iopte prot, int lvl, +arm_lpae_iopte prot, int lvl, int num_entries, arm_lpae_iopte *ptep) { - arm_lpae_iopte pte = *ptep; - - if (iopte_leaf(pte, lvl, data->iop.fmt)) { - /* We require an unmap first */ - WARN_ON(!selftest_running); - return -EEXIST; - } else if (iopte_type(pte) == ARM_LPAE_PTE_TYPE_TABLE) { - /* -* We need to unmap and free the old table before -* overwriting it with a block entry. -*/ - arm_lpae_iopte *tblp; - size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data); - - tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data); - if (__arm_lpae_unmap(data, NULL, iova, sz, lvl, tblp) != sz) { - WARN_ON(1); - return -EINVAL; + int i; + + for (i = 0; i < num_entries; i++) + if (iopte_leaf(ptep[i], lvl, data->iop.fmt)) { + /* We require an unmap first */ + WARN_ON(!selftest_running); + return -EEXIST; + } else if (iopte_type(ptep[i]) == ARM_LPAE_PTE_TYPE_TABLE) { + /* +* We need to unmap and free the old table before +* overwriting it with a block entry. +*/ + arm_lpae_iopte *tblp; + size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data); + + tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data); + if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, +lvl, tblp) != sz) { + WARN_ON(1); + return -EINVAL; + } } - } - __arm_lpae_init_pte(data, paddr, prot, lvl, ptep); + __arm_lpae_init_pte(data, paddr, prot, lvl, num_entries, ptep); return 0; } @@ -323,7
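[Editorial note: the core of this patch reduces to two loops: replicate one PTE value into `num_entries` consecutive slots, and for leaf entries advance the physical address by the block size per slot. A toy model, with invented `model_*` names and the DMA sync omitted:]

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t model_iopte;

/* Model of __arm_lpae_set_pte() after the patch: write the same value into
 * num_entries consecutive slots (the non-coherent-walk sync is omitted). */
static void model_set_pte(model_iopte *ptep, model_iopte pte, int num_entries)
{
	for (int i = 0; i < num_entries; i++)
		ptep[i] = pte;
}

/* Model of the leaf path in __arm_lpae_init_pte(): each entry maps the next
 * block, so the address portion advances by the block size 'sz' per entry. */
static void model_init_pte(model_iopte *ptep, uint64_t paddr, model_iopte prot,
			   size_t sz, int num_entries)
{
	for (int i = 0; i < num_entries; i++)
		ptep[i] = prot | (paddr + (uint64_t)i * sz);
}
```

Batching the PTE writes this way is what lets the later map_pages()/unmap_pages() patches sync a whole run of entries with a single `dma_sync_single_for_device()` call.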
[PATCH v7 06/15] iommu: Split 'addr_merge' argument to iommu_pgsize() into separate parts
From: Will Deacon The 'addr_merge' parameter to iommu_pgsize() is a fabricated address intended to describe the alignment requirements to consider when choosing an appropriate page size. On the iommu_map() path, this address is the logical OR of the virtual and physical addresses. Subsequent improvements to iommu_pgsize() will need to check the alignment of the virtual and physical components of 'addr_merge' independently, so pass them in as separate parameters and reconstruct 'addr_merge' locally. No functional change. Signed-off-by: Will Deacon Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/iommu.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 80e471ada358..80e14c139d40 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2375,12 +2375,13 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) } EXPORT_SYMBOL_GPL(iommu_iova_to_phys); -static size_t iommu_pgsize(struct iommu_domain *domain, - unsigned long addr_merge, size_t size) +static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size) { unsigned int pgsize_idx; unsigned long pgsizes; size_t pgsize; + unsigned long addr_merge = paddr | iova; /* Page sizes supported by the hardware and small enough for @size */ pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0); @@ -2433,7 +2434,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, , size); while (size) { - size_t pgsize = iommu_pgsize(domain, iova | paddr, size); + size_t pgsize = iommu_pgsize(domain, iova, paddr, size); pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n", iova, , pgsize); @@ -2521,8 +2522,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain, * or we hit an area that isn't mapped. 
*/ while (unmapped < size) { - size_t pgsize = iommu_pgsize(domain, iova, size - unmapped); + size_t pgsize; + pgsize = iommu_pgsize(domain, iova, iova, size - unmapped); unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather); if (!unmapped_page) break; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
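[Editorial note: the split parameters make the alignment logic easier to see: a candidate page size must be supported by the hardware, no larger than the remaining size, and no larger than the alignment of `paddr | iova`. A userspace model of `iommu_pgsize()`, with invented helper names and 64-bit `unsigned long` assumed:]

```c
#include <stddef.h>

#define MODEL_GENMASK(h, l) (((~0UL) << (l)) & (~0UL >> (63 - (h))))

static unsigned long model_fls(unsigned long x)	/* index of highest set bit */
{
	return 63 - (unsigned long)__builtin_clzl(x);
}

static unsigned long model_ffs(unsigned long x)	/* index of lowest set bit */
{
	return (unsigned long)__builtin_ctzl(x);
}

/* Model of iommu_pgsize(): pick the largest hardware page size that fits in
 * 'size' and respects the alignment of both the virtual and physical address. */
static size_t model_iommu_pgsize(unsigned long pgsize_bitmap,
				 unsigned long iova, unsigned long paddr,
				 size_t size)
{
	unsigned long addr_merge = paddr | iova;
	unsigned long pgsizes;

	/* Page sizes supported by the hardware and small enough for @size. */
	pgsizes = pgsize_bitmap & MODEL_GENMASK(model_fls(size), 0);

	/* Constrain the sizes further by the alignment of the addresses. */
	if (addr_merge)
		pgsizes &= MODEL_GENMASK(model_ffs(addr_merge), 0);

	return 1UL << model_fls(pgsizes);
}
```

With a bitmap of 4 KB | 2 MB | 1 GB, a 4 MB request at fully aligned addresses selects 2 MB, while the same request with a 4 KB-aligned IOVA falls back to 4 KB, which is exactly the behavior the later map_pages() patches build on.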
[PATCH v7 15/15] iommu/arm-smmu: Implement the map_pages() IOMMU driver callback
From: "Isaac J. Manjarres" Implement the map_pages() callback for the ARM SMMU driver to allow calls from iommu_map to map multiple pages of the same size in one call. Also, remove the map() callback for the ARM SMMU driver, as it will no longer be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Georgi Djakov --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 593a15cfa8d5..c1ca3b49a620 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1193,8 +1193,9 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return ret; } -static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t pgsize, size_t pgcount, + int prot, gfp_t gfp, size_t *mapped) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; @@ -1204,7 +1205,7 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, return -ENODEV; arm_smmu_rpm_get(smmu); - ret = ops->map(ops, iova, paddr, size, prot, gfp); + ret = ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped); arm_smmu_rpm_put(smmu); return ret; @@ -1574,7 +1575,7 @@ static struct iommu_ops arm_smmu_ops = { .domain_alloc = arm_smmu_domain_alloc, .domain_free= arm_smmu_domain_free, .attach_dev = arm_smmu_attach_dev, - .map= arm_smmu_map, + .map_pages = arm_smmu_map_pages, .unmap_pages= arm_smmu_unmap_pages, .flush_iotlb_all= arm_smmu_flush_iotlb_all, .iotlb_sync = arm_smmu_iotlb_sync, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 00/15] Optimizing iommu_[map/unmap] performance
When unmapping a buffer from an IOMMU domain, the IOMMU framework unmaps the buffer at a granule of the largest page size that is supported by the IOMMU hardware and fits within the buffer. For every block that is unmapped, the IOMMU framework will call into the IOMMU driver, and then the io-pgtable framework to walk the page tables to find the entry that corresponds to the IOVA, and then unmaps the entry. This can be suboptimal in scenarios where a buffer or a piece of a buffer can be split into several contiguous page blocks of the same size. For example, consider an IOMMU that supports 4 KB page blocks, 2 MB page blocks, and 1 GB page blocks, and a buffer that is 4 MB in size is being unmapped at IOVA 0. The current call-flow will result in 4 indirect calls, and 2 page table walks, to unmap 2 entries that are next to each other in the page-tables, when both entries could have been unmapped in one shot by clearing both page table entries in the same call. The same optimization is applicable to mapping buffers as well, so these patches implement a set of callbacks called unmap_pages and map_pages to the io-pgtable code and IOMMU drivers which unmaps or maps an IOVA range that consists of a number of pages of the same page size that is supported by the IOMMU hardware, and allows for manipulating multiple page table entries in the same set of indirect calls. The reason for introducing these callbacks is to give other IOMMU drivers/io-pgtable formats time to change to using the new callbacks, so that the transition to using this approach can be done piecemeal. Changes since V6: (https://lore.kernel.org/r/1623776913-390160-1-git-send-email-quic_c_gdj...@quicinc.com/) * Fix compiler warning (patch 08/15) * Free underlying page tables for large mappings (patch 10/15) Consider the case where a 2N--where N > 1--MB buffer is composed entirely of 4 KB pages. 
This means that at the second to last level, the buffer will have N non-leaf entries that point to page tables with 4 KB mappings. When the buffer is unmapped, all N entries will be cleared at the second to last level. However, the existing logic only checks if it needs to free the underlying page tables for the first non-leaf entry. Therefore, the page table memory for the other entries N-1 entries will be leaked. Fix this memory leak by ensuring that we apply the same check to all N entries that are being unmapped. When unmapping multiple entries, __arm_lpae_unmap() should unmap one entry at a time and perform TLB maintenance as required for that entry. Changes since V5: (https://lore.kernel.org/r/20210408171402.12607-1-isa...@codeaurora.org/) * Rebased on next-20210515. * Fixed minor checkpatch warnings - indentation, extra blank lines. * Use the correct function argument in __arm_lpae_map(). (chenxiang) Changes since V4: * Fixed type for addr_merge from phys_addr_t to unsigned long so that GENMASK() can be used. * Hooked up arm_v7s_[unmap/map]_pages to the io-pgtable ops. * Introduced a macro for calculating the number of page table entries for the ARM LPAE io-pgtable format. Changes since V3: * Removed usage of ULL variants of bitops from Will's patches, as they were not needed. * Instead of unmapping/mapping pgcount pages, unmap_pages() and map_pages() will at most unmap and map pgcount pages, allowing for part of the pages in pgcount to be mapped and unmapped. This was done to simplify the handling in the io-pgtable layer. * Extended the existing PTE manipulation methods in io-pgtable-arm to handle multiple entries, per Robin's suggestion, eliminating the need to add functions to clear multiple PTEs. * Implemented a naive form of [map/unmap]_pages() for ARM v7s io-pgtable format. * arm_[v7s/lpae]_[map/unmap] will call arm_[v7s/lpae]_[map_pages/unmap_pages] with an argument of 1 page. 
* The arm_smmu_[map/unmap] functions have been removed, since they have been replaced by arm_smmu_[map/unmap]_pages. Changes since V2: * Added a check in __iommu_map() to check for the existence of either the map or map_pages callback as per Lu's suggestion. Changes since V1: * Implemented the map_pages() callbacks * Integrated Will's patches into this series which address several concerns about how iommu_pgsize() partitioned a buffer (I made a minor change to the patch which changes iommu_pgsize() to use bitmaps by using the ULL variants of the bitops) Isaac J. Manjarres (12): iommu/io-pgtable: Introduce unmap_pages() as a page table op iommu: Add an unmap_pages() op for IOMMU drivers iommu/io-pgtable: Introduce map_pages() as a page table op iommu: Add a map_pages() op for IOMMU drivers iommu: Add support for the map_pages() callback iommu/io-pgtable-arm: Prepare PTE methods for handling multiple entries iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages() iommu/io-pgtable-arm: Implement arm_lpae_map_pages() iommu/io-pgtable-arm-v7s: Implement
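[Editorial note: the saving described in the 4 MB example above can be made concrete by counting indirect calls: per-block unmap costs one driver call per 2 MB block, while unmap_pages() covers both blocks in one call. A toy accounting model with invented names:]

```c
#include <stddef.h>

static int model_indirect_calls;

/* Old path: one indirect driver call per page block. */
static size_t model_unmap_one(unsigned long iova, size_t pgsize)
{
	(void)iova;
	model_indirect_calls++;
	return pgsize;
}

/* New path: one indirect driver call covers pgcount blocks of one size. */
static size_t model_unmap_pages(unsigned long iova, size_t pgsize,
				size_t pgcount)
{
	(void)iova;
	model_indirect_calls++;
	return pgsize * pgcount;
}

/* Unmap a 4 MB buffer mapped with 2 MB blocks both ways and report how many
 * indirect calls each approach took (-1 if the full size was not unmapped). */
static int model_calls_for(int use_pages_op)
{
	size_t total = 4UL << 20, pgsize = 2UL << 20, done = 0;

	model_indirect_calls = 0;
	if (use_pages_op) {
		done = model_unmap_pages(0, pgsize, total / pgsize);
	} else {
		while (done < total)
			done += model_unmap_one(done, pgsize);
	}
	return done == total ? model_indirect_calls : -1;
}
```

The real series saves page-table walks as well as the calls themselves, since adjacent entries are cleared in one walk.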
[PATCH v7 12/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_unmap_pages()
From: "Isaac J. Manjarres" Implement the unmap_pages() callback for the ARM v7s io-pgtable format. Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/io-pgtable-arm-v7s.c | 24 +--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index d4004bcf333a..1af060686985 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -710,15 +710,32 @@ static size_t __arm_v7s_unmap(struct arm_v7s_io_pgtable *data, return __arm_v7s_unmap(data, gather, iova, size, lvl + 1, ptep); } -static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova, - size_t size, struct iommu_iotlb_gather *gather) +static size_t arm_v7s_unmap_pages(struct io_pgtable_ops *ops, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather) { struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops); + size_t unmapped = 0, ret; if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias))) return 0; - return __arm_v7s_unmap(data, gather, iova, size, 1, data->pgd); + while (pgcount--) { + ret = __arm_v7s_unmap(data, gather, iova, pgsize, 1, data->pgd); + if (!ret) + break; + + unmapped += pgsize; + iova += pgsize; + } + + return unmapped; +} + +static size_t arm_v7s_unmap(struct io_pgtable_ops *ops, unsigned long iova, + size_t size, struct iommu_iotlb_gather *gather) +{ + return arm_v7s_unmap_pages(ops, iova, size, 1, gather); } static phys_addr_t arm_v7s_iova_to_phys(struct io_pgtable_ops *ops, @@ -781,6 +798,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, data->iop.ops = (struct io_pgtable_ops) { .map= arm_v7s_map, .unmap = arm_v7s_unmap, + .unmap_pages= arm_v7s_unmap_pages, .iova_to_phys = arm_v7s_iova_to_phys, }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 14/15] iommu/arm-smmu: Implement the unmap_pages() IOMMU driver callback
From: "Isaac J. Manjarres" Implement the unmap_pages() callback for the ARM SMMU driver to allow calls from iommu_unmap to unmap multiple pages of the same size in one call. Also, remove the unmap() callback for the SMMU driver, as it will no longer be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Georgi Djakov --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 61233bcc4588..593a15cfa8d5 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1210,8 +1210,9 @@ static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova, return ret; } -static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, -size_t size, struct iommu_iotlb_gather *gather) +static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather) { struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; struct arm_smmu_device *smmu = to_smmu_domain(domain)->smmu; @@ -1221,7 +1222,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova, return 0; arm_smmu_rpm_get(smmu); - ret = ops->unmap(ops, iova, size, gather); + ret = ops->unmap_pages(ops, iova, pgsize, pgcount, iotlb_gather); arm_smmu_rpm_put(smmu); return ret; @@ -1574,7 +1575,7 @@ static struct iommu_ops arm_smmu_ops = { .domain_free= arm_smmu_domain_free, .attach_dev = arm_smmu_attach_dev, .map= arm_smmu_map, - .unmap = arm_smmu_unmap, + .unmap_pages= arm_smmu_unmap_pages, .flush_iotlb_all= arm_smmu_flush_iotlb_all, .iotlb_sync = arm_smmu_iotlb_sync, .iova_to_phys = arm_smmu_iova_to_phys, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 08/15] iommu: Add support for the map_pages() callback
From: "Isaac J. Manjarres" Since iommu_pgsize can calculate how many pages of the same size can be mapped/unmapped before the next largest page size boundary, add support for invoking an IOMMU driver's map_pages() callback, if it provides one. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Georgi Djakov --- drivers/iommu/iommu.c | 43 +++++++++++++++++++++++++++++++++++-------- 1 file changed, 35 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 725622c7e603..70a729ce88b1 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2429,6 +2429,30 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, return pgsize; } +static int __iommu_map_pages(struct iommu_domain *domain, unsigned long iova, +phys_addr_t paddr, size_t size, int prot, +gfp_t gfp, size_t *mapped) +{ + const struct iommu_ops *ops = domain->ops; + size_t pgsize, count; + int ret; + + pgsize = iommu_pgsize(domain, iova, paddr, size, &count); + + pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx count %zu\n", +iova, &paddr, pgsize, count); + + if (ops->map_pages) { + ret = ops->map_pages(domain, iova, paddr, pgsize, count, prot, +gfp, mapped); + } else { + ret = ops->map(domain, iova, paddr, pgsize, prot, gfp); + *mapped = ret ? 0 : pgsize;
+ } + + return ret; +} + static int __iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp) { @@ -2439,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, phys_addr_t orig_paddr = paddr; int ret = 0; - if (unlikely(ops->map == NULL || + if (unlikely(!(ops->map || ops->map_pages) || domain->pgsize_bitmap == 0UL)) return -ENODEV; @@ -2463,18 +2487,21 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size); while (size) { - size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL); + size_t mapped = 0; - pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n", -iova, &paddr, pgsize); - ret = ops->map(domain, iova, paddr, pgsize, prot, gfp); + ret = __iommu_map_pages(domain, iova, paddr, size, prot, gfp, + &mapped); + /* +* Some pages may have been mapped, even if an error occurred, +* so we should account for those so they can be unmapped. +*/ + size -= mapped; if (ret) break; - iova += pgsize; - paddr += pgsize; - size -= pgsize; + iova += mapped; + paddr += mapped; } /* unroll mapping in case something went wrong */ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
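The partial-progress accounting in __iommu_map() above is subtle: 'mapped' is subtracted from 'size' before the error check, so the unwind path knows exactly how many bytes to tear down even when a map_pages() call fails midway. A rough userspace model of that loop (illustrative names only, not the kernel API; sizes are assumed to be multiples of 4K):

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_PGSIZE 0x1000UL	/* pretend the domain only supports 4K pages */

/*
 * Stand-in for ops->map_pages(): maps at most 'budget' bytes in 4K
 * steps, then fails, reporting partial progress through *mapped.
 */
static int fake_map_pages(size_t size, size_t budget, size_t *mapped)
{
	while (size && budget) {
		size -= FAKE_PGSIZE;
		budget -= FAKE_PGSIZE;
		*mapped += FAKE_PGSIZE;
	}
	return size ? -12 /* -ENOMEM */ : 0;
}

/*
 * Mirrors the loop in __iommu_map(): subtract 'mapped' even on error,
 * then report how many bytes the error path would have to unmap.
 */
static size_t model_iommu_map(size_t size, size_t budget)
{
	size_t orig_size = size;
	int ret;

	while (size) {
		size_t mapped = 0;

		ret = fake_map_pages(size, budget, &mapped);
		/* Account for pages mapped before the failure */
		size -= mapped;
		if (ret)
			break;
	}
	return orig_size - size;
}
```

If the fake callback can only map 4 of 16 pages before failing, the model reports exactly 4 pages' worth of bytes to unroll, never more or less.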
[PATCH v7 01/15] iommu/io-pgtable: Introduce unmap_pages() as a page table op
From: "Isaac J. Manjarres" The io-pgtable code expects to operate on a single block or granule of memory that is supported by the IOMMU hardware when unmapping memory. This means that when a large buffer that consists of multiple such blocks is unmapped, the io-pgtable code will walk the page tables to the correct level to unmap each block, even for blocks that are virtually contiguous and at the same level, which can incur an overhead in performance. Introduce the unmap_pages() page table op to express to the io-pgtable code that it should unmap a number of blocks of the same size, instead of a single block. Doing so allows multiple blocks to be unmapped in one call to the io-pgtable code, reducing the number of page table walks, and indirect calls. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Will Deacon Signed-off-by: Georgi Djakov --- include/linux/io-pgtable.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 4d40dfa75b55..9391c5fa71e6 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -144,6 +144,7 @@ struct io_pgtable_cfg { * * @map: Map a physically contiguous memory region. * @unmap:Unmap a physically contiguous memory region. + * @unmap_pages: Unmap a range of virtually contiguous pages of the same size. * @iova_to_phys: Translate iova to physical address. 
* * These functions map directly onto the iommu_ops member functions with @@ -154,6 +155,9 @@ struct io_pgtable_ops { phys_addr_t paddr, size_t size, int prot, gfp_t gfp); size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova, size_t size, struct iommu_iotlb_gather *gather); + size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather); phys_addr_t (*iova_to_phys)(struct io_pgtable_ops *ops, unsigned long iova); }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
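The saving this patch describes — one io-pgtable call per run of same-size blocks instead of one call per block — can be made concrete with a toy call counter (hypothetical helper, not kernel code):

```c
#include <assert.h>
#include <stddef.h>

#define SZ_4K 0x1000UL

/*
 * Count how many op invocations it takes to unmap 'size' bytes when
 * each call can take up to 'pgcount' same-size pages at once.
 * pgcount == 1 models the old unmap() path; a larger value models
 * the batched unmap_pages() path introduced here.
 */
static unsigned long unmap_calls(size_t size, size_t pgsize, size_t pgcount)
{
	unsigned long calls = 0;

	while (size) {
		size_t chunk = pgsize * pgcount;

		if (chunk > size)
			chunk = size;
		size -= chunk;
		calls++;
	}
	return calls;
}
```

For a 2 MiB buffer backed by 4 KiB pages, the per-block path costs 512 indirect calls (each with its own page table walk), while a batched call that accepts 512 pages costs one.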
[PATCH v7 10/15] iommu/io-pgtable-arm: Implement arm_lpae_unmap_pages()
From: "Isaac J. Manjarres" Implement the unmap_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Georgi Djakov --- drivers/iommu/io-pgtable-arm.c | 120 + 1 file changed, 74 insertions(+), 46 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index ea66b10c04c4..fe8fa0ee9c98 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -46,6 +46,9 @@ #define ARM_LPAE_PGD_SIZE(d) \ (sizeof(arm_lpae_iopte) << (d)->pgd_bits) +#define ARM_LPAE_PTES_PER_TABLE(d) \ + (ARM_LPAE_GRANULE(d) >> ilog2(sizeof(arm_lpae_iopte))) + /* * Calculate the index at level l used to map virtual address a using the * pagetable in d. @@ -239,22 +242,19 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries, sizeof(*ptep) * num_entries, DMA_TO_DEVICE); } -static void __arm_lpae_set_pte(arm_lpae_iopte *ptep, arm_lpae_iopte pte, - int num_entries, struct io_pgtable_cfg *cfg) +static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg) { - int i; - for (i = 0; i < num_entries; i++) - ptep[i] = pte; + *ptep = 0; if (!cfg->coherent_walk) - __arm_lpae_sync_pte(ptep, num_entries, cfg); + __arm_lpae_sync_pte(ptep, 1, cfg); } static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, struct iommu_iotlb_gather *gather, - unsigned long iova, size_t size, int lvl, - arm_lpae_iopte *ptep); + unsigned long iova, size_t size, size_t pgcount, + int lvl, arm_lpae_iopte *ptep); static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, phys_addr_t paddr, arm_lpae_iopte prot, @@ -298,7 +298,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data); tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data); - if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, + if (__arm_lpae_unmap(data, NULL, iova + i * sz, sz, 1, lvl, tblp) != sz) { WARN_ON(1); return -EINVAL; @@ -526,14 
+526,15 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, struct iommu_iotlb_gather *gather, unsigned long iova, size_t size, arm_lpae_iopte blk_pte, int lvl, - arm_lpae_iopte *ptep) + arm_lpae_iopte *ptep, size_t pgcount) { struct io_pgtable_cfg *cfg = &data->iop.cfg; arm_lpae_iopte pte, *tablep; phys_addr_t blk_paddr; size_t tablesz = ARM_LPAE_GRANULE(data); size_t split_sz = ARM_LPAE_BLOCK_SIZE(lvl, data); - int i, unmap_idx = -1; + int ptes_per_table = ARM_LPAE_PTES_PER_TABLE(data); + int i, unmap_idx_start = -1, num_entries = 0, max_entries; if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) return 0; @@ -542,15 +543,18 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, if (!tablep) return 0; /* Bytes unmapped */ - if (size == split_sz) - unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data); + if (size == split_sz) { + unmap_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data); + max_entries = ptes_per_table - unmap_idx_start; + num_entries = min_t(int, pgcount, max_entries); + } blk_paddr = iopte_to_paddr(blk_pte, data); pte = iopte_prot(blk_pte); - for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) { + for (i = 0; i < ptes_per_table; i++, blk_paddr += split_sz) { /* Unmap! */ - if (i == unmap_idx) + if (i >= unmap_idx_start && i < (unmap_idx_start + num_entries)) continue; __arm_lpae_init_pte(data, blk_paddr, pte, lvl, 1, &tablep[i]); @@ -568,76 +572,92 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, return 0; tablep = iopte_deref(pte, data); - } else if (unmap_idx >= 0) { - io_pgtable_tlb_add_page(&data->iop, gather, iova, size); - return size; + } else if (unmap_idx_start >= 0) { + for (i = 0; i < num_entries; i++) +
[PATCH v7 13/15] iommu/io-pgtable-arm-v7s: Implement arm_v7s_map_pages()
From: "Isaac J. Manjarres" Implement the map_pages() callback for the ARM v7s io-pgtable format. Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/io-pgtable-arm-v7s.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c index 1af060686985..5db90d7ce2ec 100644 --- a/drivers/iommu/io-pgtable-arm-v7s.c +++ b/drivers/iommu/io-pgtable-arm-v7s.c @@ -519,11 +519,12 @@ static int __arm_v7s_map(struct arm_v7s_io_pgtable *data, unsigned long iova, return __arm_v7s_map(data, iova, paddr, size, prot, lvl + 1, cptep, gfp); } -static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, - phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +static int arm_v7s_map_pages(struct io_pgtable_ops *ops, unsigned long iova, +phys_addr_t paddr, size_t pgsize, size_t pgcount, +int prot, gfp_t gfp, size_t *mapped) { struct arm_v7s_io_pgtable *data = io_pgtable_ops_to_data(ops); - int ret; + int ret = -EINVAL; if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) || paddr >= (1ULL << data->iop.cfg.oas))) @@ -533,7 +534,17 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, if (!(prot & (IOMMU_READ | IOMMU_WRITE))) return 0; - ret = __arm_v7s_map(data, iova, paddr, size, prot, 1, data->pgd, gfp); + while (pgcount--) { + ret = __arm_v7s_map(data, iova, paddr, pgsize, prot, 1, data->pgd, + gfp); + if (ret) + break; + + iova += pgsize; + paddr += pgsize; + if (mapped) + *mapped += pgsize; + } /* * Synchronise all PTE updates for the new mapping before there's * a chance for anything to kick off a table walk for the new iova. 
@@ -543,6 +554,12 @@ static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, return ret; } +static int arm_v7s_map(struct io_pgtable_ops *ops, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp) +{ + return arm_v7s_map_pages(ops, iova, paddr, size, 1, prot, gfp, NULL); +} + static void arm_v7s_free_pgtable(struct io_pgtable *iop) { struct arm_v7s_io_pgtable *data = io_pgtable_to_data(iop); @@ -797,6 +814,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg, data->iop.ops = (struct io_pgtable_ops) { .map= arm_v7s_map, + .map_pages = arm_v7s_map_pages, .unmap = arm_v7s_unmap, .unmap_pages= arm_v7s_unmap_pages, .iova_to_phys = arm_v7s_iova_to_phys, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 07/15] iommu: Hook up '->unmap_pages' driver callback
From: Will Deacon Extend iommu_pgsize() to populate an optional 'count' parameter so that we can direct unmapping operation to the ->unmap_pages callback if it has been provided by the driver. Signed-off-by: Will Deacon Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/iommu.c | 59 +++ 1 file changed, 50 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 80e14c139d40..725622c7e603 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2376,11 +2376,11 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) EXPORT_SYMBOL_GPL(iommu_iova_to_phys); static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size) + phys_addr_t paddr, size_t size, size_t *count) { - unsigned int pgsize_idx; + unsigned int pgsize_idx, pgsize_idx_next; unsigned long pgsizes; - size_t pgsize; + size_t offset, pgsize, pgsize_next; unsigned long addr_merge = paddr | iova; /* Page sizes supported by the hardware and small enough for @size */ @@ -2396,7 +2396,36 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long iova, /* Pick the biggest page size remaining */ pgsize_idx = __fls(pgsizes); pgsize = BIT(pgsize_idx); + if (!count) + return pgsize; + /* Find the next biggest support page size, if it exists */ + pgsizes = domain->pgsize_bitmap & ~GENMASK(pgsize_idx, 0); + if (!pgsizes) + goto out_set_count; + + pgsize_idx_next = __ffs(pgsizes); + pgsize_next = BIT(pgsize_idx_next); + + /* +* There's no point trying a bigger page size unless the virtual +* and physical addresses are similarly offset within the larger page. +*/ + if ((iova ^ paddr) & (pgsize_next - 1)) + goto out_set_count; + + /* Calculate the offset to the next page size alignment boundary */ + offset = pgsize_next - (addr_merge & (pgsize_next - 1)); + + /* +* If size is big enough to accommodate the larger page, reduce +* the number of smaller pages. 
+*/ + if (offset + pgsize_next <= size) + size = offset; + +out_set_count: + *count = size >> pgsize_idx; return pgsize; } @@ -2434,7 +2463,7 @@ static int __iommu_map(struct iommu_domain *domain, unsigned long iova, pr_debug("map: iova 0x%lx pa %pa size 0x%zx\n", iova, &paddr, size); while (size) { - size_t pgsize = iommu_pgsize(domain, iova, paddr, size); + size_t pgsize = iommu_pgsize(domain, iova, paddr, size, NULL); pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n", iova, &paddr, pgsize); @@ -2485,6 +2514,19 @@ int iommu_map_atomic(struct iommu_domain *domain, unsigned long iova, } EXPORT_SYMBOL_GPL(iommu_map_atomic); +static size_t __iommu_unmap_pages(struct iommu_domain *domain, + unsigned long iova, size_t size, + struct iommu_iotlb_gather *iotlb_gather) +{ + const struct iommu_ops *ops = domain->ops; + size_t pgsize, count; + + pgsize = iommu_pgsize(domain, iova, iova, size, &count); + return ops->unmap_pages ? + ops->unmap_pages(domain, iova, pgsize, count, iotlb_gather) : + ops->unmap(domain, iova, pgsize, iotlb_gather); +} + static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather) @@ -2494,7 +2536,7 @@ static size_t __iommu_unmap(struct iommu_domain *domain, unsigned long orig_iova = iova; unsigned int min_pagesz; - if (unlikely(ops->unmap == NULL || + if (unlikely(!(ops->unmap || ops->unmap_pages) || domain->pgsize_bitmap == 0UL)) return 0; @@ -2522,10 +2564,9 @@ static size_t __iommu_unmap(struct iommu_domain *domain, * or we hit an area that isn't mapped. */ while (unmapped < size) { - size_t pgsize; - - pgsize = iommu_pgsize(domain, iova, iova, size - unmapped); - unmapped_page = ops->unmap(domain, iova, pgsize, iotlb_gather); + unmapped_page = __iommu_unmap_pages(domain, iova, + size - unmapped, + iotlb_gather); if (!unmapped_page) break; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
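The count calculation added to iommu_pgsize() above can be exercised outside the kernel. The sketch below transcribes the logic into plain C, substituting compiler builtins for the kernel's __fls()/__ffs() and redefining BIT()/GENMASK() locally; pgsize_model() is an invented name for illustration, not an existing interface:

```c
#include <assert.h>
#include <stddef.h>

#define BIT(n)		(1UL << (n))
#define GENMASK(h, l)	(((~0UL) >> (63 - (h))) & (~0UL << (l)))

static unsigned int __fls_ul(unsigned long x) { return 63 - __builtin_clzl(x); }
static unsigned int __ffs_ul(unsigned long x) { return __builtin_ctzl(x); }

/* Userspace transcription of the extended iommu_pgsize() logic */
static size_t pgsize_model(unsigned long pgsize_bitmap, unsigned long iova,
			   unsigned long paddr, size_t size, size_t *count)
{
	unsigned int pgsize_idx, pgsize_idx_next;
	unsigned long pgsizes;
	size_t offset, pgsize, pgsize_next;
	unsigned long addr_merge = paddr | iova;

	/* Page sizes supported by the hardware and small enough for size */
	pgsizes = pgsize_bitmap & GENMASK(__fls_ul(size), 0);
	if (addr_merge)
		pgsizes &= GENMASK(__ffs_ul(addr_merge), 0);
	assert(pgsizes);
	pgsize_idx = __fls_ul(pgsizes);
	pgsize = BIT(pgsize_idx);
	if (!count)
		return pgsize;

	/* Find the next biggest supported page size, if it exists */
	pgsizes = pgsize_bitmap & ~GENMASK(pgsize_idx, 0);
	if (!pgsizes)
		goto out_set_count;
	pgsize_idx_next = __ffs_ul(pgsizes);
	pgsize_next = BIT(pgsize_idx_next);

	/* IOVA and PA must be equally offset within the larger page */
	if ((iova ^ paddr) & (pgsize_next - 1))
		goto out_set_count;

	/* Stop the run at the next page size alignment boundary */
	offset = pgsize_next - (addr_merge & (pgsize_next - 1));
	if (offset + pgsize_next <= size)
		size = offset;

out_set_count:
	*count = size >> pgsize_idx;
	return pgsize;
}
```

With a 4K|2M|1G bitmap, mapping 4 MiB starting 4 KiB past a 2 MiB boundary yields 4 KiB pages with a count of 511 — exactly up to the boundary, after which the caller can switch to 2 MiB pages.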
[PATCH v7 11/15] iommu/io-pgtable-arm: Implement arm_lpae_map_pages()
From: "Isaac J. Manjarres" Implement the map_pages() callback for the ARM LPAE io-pgtable format. Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/io-pgtable-arm.c | 41 +++++++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index fe8fa0ee9c98..053df4048a29 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -341,20 +341,30 @@ static arm_lpae_iopte arm_lpae_install_table(arm_lpae_iopte *table, } static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, - phys_addr_t paddr, size_t size, arm_lpae_iopte prot, - int lvl, arm_lpae_iopte *ptep, gfp_t gfp) + phys_addr_t paddr, size_t size, size_t pgcount, + arm_lpae_iopte prot, int lvl, arm_lpae_iopte *ptep, + gfp_t gfp, size_t *mapped) { arm_lpae_iopte *cptep, pte; size_t block_size = ARM_LPAE_BLOCK_SIZE(lvl, data); size_t tblsz = ARM_LPAE_GRANULE(data); struct io_pgtable_cfg *cfg = &data->iop.cfg; int ret = 0, num_entries, max_entries, map_idx_start; /* Find our entry at the current level */ - ptep += ARM_LPAE_LVL_IDX(iova, lvl, data); + map_idx_start = ARM_LPAE_LVL_IDX(iova, lvl, data); + ptep += map_idx_start; /* If we can install a leaf entry at this level, then do so */ - if (size == block_size) - return arm_lpae_init_pte(data, iova, paddr, prot, lvl, 1, ptep); + if (size == block_size) { + max_entries = ARM_LPAE_PTES_PER_TABLE(data) - map_idx_start; + num_entries = min_t(int, pgcount, max_entries); + ret = arm_lpae_init_pte(data, iova, paddr, prot, lvl, num_entries, ptep); + if (!ret && mapped) + *mapped += num_entries * size; + + return ret; + } /* We can't allocate tables at the final level */ if (WARN_ON(lvl >= ARM_LPAE_MAX_LEVELS - 1)) @@ -383,7 +393,8 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, } /* Rinse, repeat */ - return
__arm_lpae_map(data, iova, paddr, size, pgcount, prot, lvl + 1, + cptep, gfp, mapped); } static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, @@ -450,8 +461,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, return pte; } -static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, - phys_addr_t paddr, size_t size, int iommu_prot, gfp_t gfp) +static int arm_lpae_map_pages(struct io_pgtable_ops *ops, unsigned long iova, + phys_addr_t paddr, size_t pgsize, size_t pgcount, + int iommu_prot, gfp_t gfp, size_t *mapped) { struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops); struct io_pgtable_cfg *cfg = &data->iop.cfg; @@ -460,7 +472,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, arm_lpae_iopte prot; long iaext = (s64)iova >> cfg->ias; - if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size)) + if (WARN_ON(!pgsize || (pgsize & cfg->pgsize_bitmap) != pgsize)) return -EINVAL; if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) @@ -473,7 +485,8 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, return 0; prot = arm_lpae_prot_to_pte(data, iommu_prot); - ret = __arm_lpae_map(data, iova, paddr, size, prot, lvl, ptep, gfp); + ret = __arm_lpae_map(data, iova, paddr, pgsize, pgcount, prot, lvl, +ptep, gfp, mapped); /* * Synchronise all PTE updates for the new mapping before there's * a chance for anything to kick off a table walk for the new iova. 
@@ -483,6 +496,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, return ret; } +static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, + phys_addr_t paddr, size_t size, int iommu_prot, gfp_t gfp) +{ + return arm_lpae_map_pages(ops, iova, paddr, size, 1, iommu_prot, gfp, + NULL); +} + static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, arm_lpae_iopte *ptep) { @@ -787,6 +807,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg) data->iop.ops = (struct io_pgtable_ops) { .map= arm_lpae_map, + .map_pages = arm_lpae_map_pages, .unmap = arm_lpae_unmap,
[PATCH v7 05/15] iommu: Use bitmap to calculate page size in iommu_pgsize()
From: Will Deacon Avoid the potential for shifting values by amounts greater than the width of their type by using a bitmap to compute page size in iommu_pgsize(). Signed-off-by: Will Deacon Signed-off-by: Isaac J. Manjarres Signed-off-by: Georgi Djakov --- drivers/iommu/iommu.c | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5419c4b9f27a..80e471ada358 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -8,6 +8,7 @@ #include <linux/device.h> #include <linux/kernel.h> +#include <linux/bits.h> #include <linux/bug.h> #include <linux/types.h> #include <linux/init.h> @@ -2378,30 +2379,22 @@ static size_t iommu_pgsize(struct iommu_domain *domain, unsigned long addr_merge, size_t size) { unsigned int pgsize_idx; + unsigned long pgsizes; size_t pgsize; - /* Max page size that still fits into 'size' */ - pgsize_idx = __fls(size); + /* Page sizes supported by the hardware and small enough for @size */ + pgsizes = domain->pgsize_bitmap & GENMASK(__fls(size), 0); - /* need to consider alignment requirements ? */ - if (likely(addr_merge)) { - /* Max page size allowed by address */ - unsigned int align_pgsize_idx = __ffs(addr_merge); - pgsize_idx = min(pgsize_idx, align_pgsize_idx); - } - - /* build a mask of acceptable page sizes */ - pgsize = (1UL << (pgsize_idx + 1)) - 1; - - /* throw away page sizes not supported by the hardware */ - pgsize &= domain->pgsize_bitmap; + /* Constrain the page sizes further based on the maximum alignment */ + if (likely(addr_merge)) + pgsizes &= GENMASK(__ffs(addr_merge), 0); - /* make sure we're still sane */ - BUG_ON(!pgsize); + /* Make sure we have at least one suitable page size */ + BUG_ON(!pgsizes); - /* pick the biggest page */ - pgsize_idx = __fls(pgsize); - pgsize = 1UL << pgsize_idx; + /* Pick the biggest page size remaining */ + pgsize_idx = __fls(pgsizes); + pgsize = BIT(pgsize_idx); return pgsize; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
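The rewrite above is meant to be behavior-preserving wherever the old '(1UL << (pgsize_idx + 1)) - 1' did not overflow. That can be spot-checked with a small userspace harness — a hedged sketch with invented function names, using compiler builtins in place of the kernel's __fls()/__ffs():

```c
#include <assert.h>
#include <stddef.h>

#define BIT(n)		(1UL << (n))
#define GENMASK(h, l)	(((~0UL) >> (63 - (h))) & (~0UL << (l)))

static unsigned int fls_idx(unsigned long x) { return 63 - __builtin_clzl(x); }
static unsigned int ffs_idx(unsigned long x) { return __builtin_ctzl(x); }

/* Old-style selection: build a mask of acceptable sizes, pick the biggest. */
static size_t pgsize_old(unsigned long bitmap, unsigned long addr_merge,
			 size_t size)
{
	unsigned int pgsize_idx = fls_idx(size);
	size_t pgsize;

	if (addr_merge) {
		unsigned int align_idx = ffs_idx(addr_merge);

		if (align_idx < pgsize_idx)
			pgsize_idx = align_idx;
	}
	pgsize = (1UL << (pgsize_idx + 1)) - 1; /* overflows if pgsize_idx == 63 */
	pgsize &= bitmap;
	assert(pgsize);
	return 1UL << fls_idx(pgsize);
}

/* New selection from this patch, expressed with GENMASK()/BIT() */
static size_t pgsize_new(unsigned long bitmap, unsigned long addr_merge,
			 size_t size)
{
	unsigned long pgsizes = bitmap & GENMASK(fls_idx(size), 0);

	if (addr_merge)
		pgsizes &= GENMASK(ffs_idx(addr_merge), 0);
	assert(pgsizes);
	return BIT(fls_idx(pgsizes));
}
```

For a 4K|2M|1G bitmap both selectors agree, e.g. picking 2 MiB for an aligned 4 MiB request and falling back to 4 KiB when the address is only 4 KiB aligned.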
[PATCH v7 04/15] iommu: Add a map_pages() op for IOMMU drivers
From: "Isaac J. Manjarres" Add a callback for IOMMU drivers to provide a path for the IOMMU framework to call into an IOMMU driver, which can call into the io-pgtable code, to map a physically contiguous range of pages of the same size. For IOMMU drivers that do not specify a map_pages() callback, the existing logic of mapping memory one page block at a time will be used. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Acked-by: Lu Baolu Signed-off-by: Georgi Djakov --- include/linux/iommu.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 25a844121be5..d7989d4a7404 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -180,6 +180,8 @@ struct iommu_iotlb_gather { * @attach_dev: attach device to an iommu domain * @detach_dev: detach device from an iommu domain * @map: map a physically contiguous memory region to an iommu domain + * @map_pages: map a physically contiguous set of pages of the same size to + * an iommu domain. * @unmap: unmap a physically contiguous memory region from an iommu domain * @unmap_pages: unmap a number of pages of the same size from an iommu domain * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain @@ -230,6 +232,9 @@ struct iommu_ops { void (*detach_dev)(struct iommu_domain *domain, struct device *dev); int (*map)(struct iommu_domain *domain, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_pages)(struct iommu_domain *domain, unsigned long iova, +phys_addr_t paddr, size_t pgsize, size_t pgcount, +int prot, gfp_t gfp, size_t *mapped); size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, size_t size, struct iommu_iotlb_gather *iotlb_gather); size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v7 03/15] iommu/io-pgtable: Introduce map_pages() as a page table op
From: "Isaac J. Manjarres" Mapping memory into io-pgtables follows the same semantics that unmapping memory used to follow (i.e. a buffer will be mapped one page block per call to the io-pgtable code). This means that it can be optimized in the same way that unmapping memory was, so add a map_pages() callback to the io-pgtable ops structure, so that a range of pages of the same size can be mapped within the same call. Signed-off-by: Isaac J. Manjarres Suggested-by: Will Deacon Signed-off-by: Georgi Djakov --- include/linux/io-pgtable.h | 4 1 file changed, 4 insertions(+) diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 9391c5fa71e6..c43f3b899d2a 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -143,6 +143,7 @@ struct io_pgtable_cfg { * struct io_pgtable_ops - Page table manipulation API for IOMMU drivers. * * @map: Map a physically contiguous memory region. + * @map_pages:Map a physically contiguous range of pages of the same size. * @unmap:Unmap a physically contiguous memory region. * @unmap_pages: Unmap a range of virtually contiguous pages of the same size. * @iova_to_phys: Translate iova to physical address. @@ -153,6 +154,9 @@ struct io_pgtable_cfg { struct io_pgtable_ops { int (*map)(struct io_pgtable_ops *ops, unsigned long iova, phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_pages)(struct io_pgtable_ops *ops, unsigned long iova, +phys_addr_t paddr, size_t pgsize, size_t pgcount, +int prot, gfp_t gfp, size_t *mapped); size_t (*unmap)(struct io_pgtable_ops *ops, unsigned long iova, size_t size, struct iommu_iotlb_gather *gather); size_t (*unmap_pages)(struct io_pgtable_ops *ops, unsigned long iova, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 3/6] ACPI: Add driver for the VIOT table
Hi Jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > The ACPI Virtual I/O Translation Table describes topology of > para-virtual platforms, similarly to vendor tables DMAR, IVRS and IORT. > For now it describes the relation between virtio-iommu and the endpoints > it manages. > > Three steps are needed to configure DMA of endpoints: > > (1) acpi_viot_init(): parse the VIOT table, find or create the fwnode > associated to each vIOMMU device. > > (2) When probing the vIOMMU device, the driver registers its IOMMU ops > within the IOMMU subsystem. This step doesn't require any > intervention from the VIOT driver. > > (3) viot_iommu_configure(): before binding the endpoint to a driver, > find the associated IOMMU ops. Register them, along with the > endpoint ID, into the device's iommu_fwspec. > > If step (3) happens before step (2), it is deferred until the IOMMU is > initialized, then retried. > > Signed-off-by: Jean-Philippe Brucker > --- > drivers/acpi/Kconfig | 3 + > drivers/iommu/Kconfig | 1 + > drivers/acpi/Makefile | 2 + > include/linux/acpi_viot.h | 19 ++ > drivers/acpi/bus.c| 2 + > drivers/acpi/scan.c | 3 + > drivers/acpi/viot.c | 364 ++ > MAINTAINERS | 8 + > 8 files changed, 402 insertions(+) > create mode 100644 include/linux/acpi_viot.h > create mode 100644 drivers/acpi/viot.c > > diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig > index eedec61e3476..3758c6940ed7 100644 > --- a/drivers/acpi/Kconfig > +++ b/drivers/acpi/Kconfig > @@ -526,6 +526,9 @@ endif > > source "drivers/acpi/pmic/Kconfig" > > +config ACPI_VIOT > + bool > + > endif# ACPI > > config X86_PM_TIMER > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index 1f111b399bca..aff8a4830dd1 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -403,6 +403,7 @@ config VIRTIO_IOMMU > depends on ARM64 > select IOMMU_API > select INTERVAL_TREE > + select ACPI_VIOT if ACPI > help > Para-virtualised IOMMU driver with virtio. 
> > diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile > index 700b41adf2db..a6e644c48987 100644 > --- a/drivers/acpi/Makefile > +++ b/drivers/acpi/Makefile > @@ -118,3 +118,5 @@ video-objs+= acpi_video.o > video_detect.o > obj-y+= dptf/ > > obj-$(CONFIG_ARM64) += arm64/ > + > +obj-$(CONFIG_ACPI_VIOT) += viot.o > diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h > new file mode 100644 > index ..1eb8ee5b0e5f > --- /dev/null > +++ b/include/linux/acpi_viot.h > @@ -0,0 +1,19 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > + > +#ifndef __ACPI_VIOT_H__ > +#define __ACPI_VIOT_H__ > + > +#include > + > +#ifdef CONFIG_ACPI_VIOT > +void __init acpi_viot_init(void); > +int viot_iommu_configure(struct device *dev); > +#else > +static inline void acpi_viot_init(void) {} > +static inline int viot_iommu_configure(struct device *dev) > +{ > + return -ENODEV; > +} > +#endif > + > +#endif /* __ACPI_VIOT_H__ */ > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c > index be7da23fad76..b835ca702ff0 100644 > --- a/drivers/acpi/bus.c > +++ b/drivers/acpi/bus.c > @@ -27,6 +27,7 @@ > #include > #endif > #include > +#include > #include > #include > #include > @@ -1339,6 +1340,7 @@ static int __init acpi_init(void) > pci_mmcfg_late_init(); > acpi_iort_init(); > acpi_scan_init(); > + acpi_viot_init(); > acpi_ec_init(); > acpi_debugfs_init(); > acpi_sleep_proc_init(); > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c > index 0c53c8533300..4fa684fdfda8 100644 > --- a/drivers/acpi/scan.c > +++ b/drivers/acpi/scan.c > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -1556,6 +1557,8 @@ static const struct iommu_ops > *acpi_iommu_configure_id(struct device *dev, > return ops; > > err = iort_iommu_configure_id(dev, id_in); > + if (err && err != -EPROBE_DEFER) > + err = viot_iommu_configure(dev); > > /* >* If we have reason to believe the IOMMU driver missed the initial > diff --git a/drivers/acpi/viot.c 
b/drivers/acpi/viot.c > new file mode 100644 > index ..892cd9fa7b6d > --- /dev/null > +++ b/drivers/acpi/viot.c > @@ -0,0 +1,364 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Virtual I/O topology > + * > + * The Virtual I/O Translation Table (VIOT) describes the topology of > + * para-virtual IOMMUs and the endpoints they manage. The OS uses it to > + * initialize devices in the right order, preventing endpoints from issuing > DMA > + * before their IOMMU is ready. > + * > + * When binding a driver to a device, before calling the device driver's > probe() > + * method, the driver infrastructure calls
Re: [PATCH v4 0/6] Add support for ACPI VIOT
Hi Jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > Add a driver for the ACPI VIOT table, which provides topology > information for para-virtual IOMMUs. Enable virtio-iommu on > non-devicetree platforms, including x86. > > Since v3 [1] I fixed a build bug for !CONFIG_IOMMU_API. Joerg offered to > take this series through the IOMMU tree, which requires Acks for patches > 1-3. > > You can find a QEMU implementation at [2], with extra support for > testing all VIOT nodes including MMIO-based endpoints and IOMMU. > This series is at [3]. > > [1] > https://lore.kernel.org/linux-iommu/2021060215.1077006-1-jean-phili...@linaro.org/ > [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi > [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi I tested the series on both aarch64 and x86_64 with qemu. It works for me. Feel free to add my T-b. Tested-by: Eric Auger Thanks Eric > > > Jean-Philippe Brucker (6): > ACPI: arm64: Move DMA setup operations out of IORT > ACPI: Move IOMMU setup code out of IORT > ACPI: Add driver for the VIOT table > iommu/dma: Pass address limit rather than size to > iommu_setup_dma_ops() > iommu/dma: Simplify calls to iommu_setup_dma_ops() > iommu/virtio: Enable x86 support > > drivers/acpi/Kconfig | 3 + > drivers/iommu/Kconfig| 4 +- > drivers/acpi/Makefile| 2 + > drivers/acpi/arm64/Makefile | 1 + > include/acpi/acpi_bus.h | 3 + > include/linux/acpi.h | 3 + > include/linux/acpi_iort.h| 14 +- > include/linux/acpi_viot.h| 19 ++ > include/linux/dma-iommu.h| 4 +- > arch/arm64/mm/dma-mapping.c | 2 +- > drivers/acpi/arm64/dma.c | 50 + > drivers/acpi/arm64/iort.c| 129 ++--- > drivers/acpi/bus.c | 2 + > drivers/acpi/scan.c | 78 +++- > drivers/acpi/viot.c | 364 +++ > drivers/iommu/amd/iommu.c| 9 +- > drivers/iommu/dma-iommu.c| 17 +- > drivers/iommu/intel/iommu.c | 10 +- > drivers/iommu/virtio-iommu.c | 8 + > MAINTAINERS | 8 + > 20 files changed, 580 insertions(+), 150 deletions(-) > create mode 100644 include/linux/acpi_viot.h > create 
mode 100644 drivers/acpi/arm64/dma.c > create mode 100644 drivers/acpi/viot.c > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On Tue, 15 Jun 2021 13:15:43 -0600 Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that this is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. > ... > .../devicetree/bindings/iio/adc/amlogic,meson-saradc.yaml | 1 - For this one, the fact that it overrides maxItems elsewhere makes this a little bit odd. I guess we can get used to it being implicit. > .../devicetree/bindings/iio/adc/st,stm32-dfsdm-adc.yaml | 2 -- Acked-by: Jonathan Cameron ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v12 00/12] Restricted DMA
Hi Claire, On Wed, Jun 16, 2021 at 02:21:45PM +0800, Claire Chang wrote: > This series implements mitigations for lack of DMA access control on > systems without an IOMMU, which could result in the DMA accessing the > system memory at unexpected times and/or unexpected addresses, possibly > leading to data leakage or corruption. > > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is > not behind an IOMMU. As PCI-e, by design, gives the device full access to > system memory, a vulnerability in the Wi-Fi firmware could easily escalate > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a > full chain of exploits; [2], [3]). > > To mitigate the security concerns, we introduce restricted DMA. Restricted > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a > specially allocated region and does memory allocation from the same region. > The feature on its own provides a basic level of protection against the DMA > overwriting buffer contents at unexpected times. However, to protect > against general data leakage and system memory corruption, the system needs > to provide a way to restrict the DMA to a predefined memory region (this is > usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]). 
> > [1a] > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html > [1b] > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html > [2] https://blade.tencent.com/en/advisories/qualpwn/ > [3] > https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/ > [4] > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132 > > v12: > Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and > is_swiotlb_for_alloc (patch 09/12) I took this for a spin in an arm64 KVM guest with virtio devices using the DMA API and it works as expected on top of swiotlb devel/for-linus-5.14, so: Tested-by: Will Deacon Thanks! Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
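The bounce-buffering idea at the core of restricted DMA can be illustrated outside the kernel. The sketch below is a minimal userspace analogy, not the swiotlb implementation: all names here (`restricted_pool`, `bounce_map`, `bounce_unmap`) are invented for illustration, and the real code additionally deals with physical addresses, alignment and cache maintenance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace analogy of restricted DMA: the "device" may only touch
 * restricted_pool, so streaming DMA is bounced through it. All names
 * here are invented for illustration. */
#define POOL_SIZE 4096
static unsigned char restricted_pool[POOL_SIZE];

/* "Map" a CPU buffer for DMA: copy it into the restricted region and
 * hand the device an address inside that region only. */
static void *bounce_map(const void *cpu_buf, size_t len)
{
    assert(len <= POOL_SIZE);
    memcpy(restricted_pool, cpu_buf, len);
    return restricted_pool;
}

/* "Unmap" after the device wrote its result: copy back to the CPU buffer. */
static void bounce_unmap(void *cpu_buf, size_t len)
{
    memcpy(cpu_buf, restricted_pool, len);
}
```

Note that a buggy or malicious device can still scribble over the pool at any time; that is why the cover letter stresses that full protection additionally needs the region enforced below the OS (e.g. an MPU configured by firmware).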
[PATCH 9/9] memory: mtk-smi: mt8195: Add initial setting for smi-larb
To improve performance, we add some initial settings for the smi larbs. There are two parts: 1) Each port has a special ostd (outstanding) value in each larb. 2) Two general settings for each larb. In some SoCs these settings may be changed dynamically for special cases like 4K, but this initial setting is enough on mt8195. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 74 +++- 1 file changed, 73 insertions(+), 1 deletion(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index 08b28e96fd8c..33f497b58f7b 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -32,6 +32,14 @@ #define SMI_DUMMY 0x444 /* SMI LARB */ +#define SMI_LARB_CMD_THRT_CON 0x24 +#define SMI_LARB_THRT_EN 0x370256 + +#define SMI_LARB_SW_FLAG 0x40 +#define SMI_LARB_SW_FLAG_1 0x1 + +#define SMI_LARB_OSTDL_PORT0x200 +#define SMI_LARB_OSTDL_PORTx(id) (SMI_LARB_OSTDL_PORT + (((id) & 0x1f) << 2)) /* Below are about mmu enable registers, they are different in SoCs */ /* mt2701 */ @@ -67,6 +75,11 @@ }) #define SMI_COMMON_INIT_REGS_NR6 +#define SMI_LARB_PORT_NR_MAX 32 + +#define MTK_SMI_FLAG_LARB_THRT_EN BIT(0) +#define MTK_SMI_FLAG_LARB_SW_FLAG BIT(1) +#define MTK_SMI_CAPS(flags, _x)(!!((flags) & (_x))) struct mtk_smi_reg_pair { unsigned intoffset; @@ -100,6 +113,8 @@ struct mtk_smi_larb_gen { int port_in_larb[MTK_LARB_NR_MAX + 1]; void (*config_port)(struct device *dev); unsigned intlarb_direct_to_common_mask; + const u8(*ostd)[SMI_LARB_PORT_NR_MAX]; + unsigned intflags_general; }; struct mtk_smi { @@ -187,12 +202,22 @@ static void mtk_smi_larb_config_port_mt8173(struct device *dev) static void mtk_smi_larb_config_port_gen2_general(struct device *dev) { struct mtk_smi_larb *larb = dev_get_drvdata(dev); - u32 reg; + u32 reg, flags_general = larb->larb_gen->flags_general; + const u8 *larbostd = larb->larb_gen->ostd[larb->larbid]; int i; if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask) return; + if (MTK_SMI_CAPS(flags_general,
MTK_SMI_FLAG_LARB_THRT_EN)) + writel_relaxed(SMI_LARB_THRT_EN, larb->base + SMI_LARB_CMD_THRT_CON); + + if (MTK_SMI_CAPS(flags_general, MTK_SMI_FLAG_LARB_SW_FLAG)) + writel_relaxed(SMI_LARB_SW_FLAG_1, larb->base + SMI_LARB_SW_FLAG); + + for (i = 0; i < SMI_LARB_PORT_NR_MAX && larbostd && !!larbostd[i]; i++) + writel_relaxed(larbostd[i], larb->base + SMI_LARB_OSTDL_PORTx(i)); + for_each_set_bit(i, (unsigned long *)larb->mmu, 32) { reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i)); reg |= F_MMU_EN; @@ -263,6 +288,51 @@ static const struct component_ops mtk_smi_larb_component_ops = { .unbind = mtk_smi_larb_unbind, }; +static const u8 mtk_smi_larb_mt8195_ostd[][SMI_LARB_PORT_NR_MAX] = { + [0] = {0x0a, 0xc, 0x22, 0x22, 0x01, 0x0a,}, /* larb0 */ + [1] = {0x0a, 0xc, 0x22, 0x22, 0x01, 0x0a,}, /* larb1 */ + [2] = {0x12, 0x12, 0x12, 0x12, 0x0a,}, /* ... */ + [3] = {0x12, 0x12, 0x12, 0x12, 0x28, 0x28, 0x0a,}, + [4] = {0x06, 0x01, 0x17, 0x06, 0x0a,}, + [5] = {0x06, 0x01, 0x17, 0x06, 0x06, 0x01, 0x06, 0x0a,}, + [6] = {0x06, 0x01, 0x06, 0x0a,}, + [7] = {0x0c, 0x0c, 0x12,}, + [8] = {0x0c, 0x0c, 0x12,}, + [9] = {0x0a, 0x08, 0x04, 0x06, 0x01, 0x01, 0x10, 0x18, 0x11, 0x0a, + 0x08, 0x04, 0x11, 0x06, 0x02, 0x06, 0x01, 0x11, 0x11, 0x06,}, + [10] = {0x18, 0x08, 0x01, 0x01, 0x20, 0x12, 0x18, 0x06, 0x05, 0x10, + 0x08, 0x08, 0x10, 0x08, 0x08, 0x18, 0x0c, 0x09, 0x0b, 0x0d, + 0x0d, 0x06, 0x10, 0x10,}, + [11] = {0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x0e, 0x01, 0x01, 0x01, 0x01,}, + [12] = {0x09, 0x09, 0x05, 0x05, 0x0c, 0x18, 0x02, 0x02, 0x04, 0x02,}, + [13] = {0x02, 0x02, 0x12, 0x12, 0x02, 0x02, 0x02, 0x02, 0x08, 0x01,}, + [14] = {0x12, 0x12, 0x02, 0x02, 0x02, 0x02, 0x16, 0x01, 0x16, 0x01, + 0x01, 0x02, 0x02, 0x08, 0x02,}, + [15] = {}, /* */ + [16] = {0x28, 0x02, 0x02, 0x12, 0x02, 0x12, 0x10, 0x02, 0x02, 0x0a, + 0x12, 0x02, 0x0a, 0x16, 0x02, 0x04,}, + [17] = {0x1a, 0x0e, 0x0a, 0x0a, 0x0c, 0x0e, 0x10,}, + [18] = {0x12, 0x06, 0x12, 0x06,}, + [19] = {0x01, 0x04, 0x01, 0x01, 0x01, 0x01, 0x01, 
0x04, 0x04, 0x01, + 0x01, 0x01, 0x04, 0x0a, 0x06, 0x01, 0x01, 0x01, 0x0a, 0x06, + 0x01, 0x01, 0x05, 0x03, 0x03, 0x04, 0x01,}, + [20] = {0x01, 0x04, 0x01, 0x01, 0x01, 0x01, 0x01, 0x04, 0x04, 0x01, + 0x01, 0x01, 0x04, 0x0a, 0x06, 0x01, 0x01, 0x01, 0x0a, 0x06, +
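The OSTD tables above are programmed by the loop added in mtk_smi_larb_config_port_gen2_general(), which stops at the first zero entry, so each per-larb array only needs its leading non-zero values (larb15's empty initializer programs nothing). Below is a standalone sketch of that termination logic, with the register write replaced by a counter; `program_ostd` is an illustrative name, not a driver function:

```c
#include <assert.h>
#include <stddef.h>

#define SMI_LARB_PORT_NR_MAX 32

/* Sketch of the OSTD programming loop from the patch: walk the per-larb
 * table and stop at the first zero entry, so each array only lists the
 * leading non-zero per-port values. The writel_relaxed() to
 * SMI_LARB_OSTDL_PORTx(i) is replaced by a counter here. */
static int program_ostd(const unsigned char *larbostd)
{
    int i, written = 0;

    for (i = 0; i < SMI_LARB_PORT_NR_MAX && larbostd && larbostd[i]; i++)
        written++;
    return written;
}
```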
[PATCH 7/9] memory: mtk-smi: mt8195: Add smi support
mt8195 has two smi-commons. The IP is the same; only the larbs that connect with each smi-common are different, thus the bus_sel differs for the two smi-commons. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index fa3a14605dc2..8b1bfef47ecd 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -286,6 +286,10 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt8192 = { .config_port= mtk_smi_larb_config_port_gen2_general, }; +static const struct mtk_smi_larb_gen mtk_smi_larb_mt8195 = { + .config_port= mtk_smi_larb_config_port_gen2_general, +}; + static const struct of_device_id mtk_smi_larb_of_ids[] = { {.compatible = "mediatek,mt2701-smi-larb", .data = &mtk_smi_larb_mt2701}, {.compatible = "mediatek,mt2712-smi-larb", .data = &mtk_smi_larb_mt2712}, @@ -294,6 +298,7 @@ static const struct of_device_id mtk_smi_larb_of_ids[] = { {.compatible = "mediatek,mt8173-smi-larb", .data = &mtk_smi_larb_mt8173}, {.compatible = "mediatek,mt8183-smi-larb", .data = &mtk_smi_larb_mt8183}, {.compatible = "mediatek,mt8192-smi-larb", .data = &mtk_smi_larb_mt8192}, + {.compatible = "mediatek,mt8195-smi-larb", .data = &mtk_smi_larb_mt8195}, {} }; @@ -408,6 +413,21 @@ static const struct mtk_smi_common_plat mtk_smi_common_mt8192 = { F_MMU1_LARB(6), }; +static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vdo = { + .type = MTK_SMI_GEN2, + .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(3) | F_MMU1_LARB(5) | + F_MMU1_LARB(7), +}; + +static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vpp = { + .type = MTK_SMI_GEN2, + .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(7), +}; + +static const struct mtk_smi_common_plat mtk_smi_sub_common_mt8195 = { + .type = MTK_SMI_GEN2_SUB_COMM, +}; + static const struct of_device_id mtk_smi_common_of_ids[] = { {.compatible = "mediatek,mt2701-smi-common", .data = &mtk_smi_common_gen1}, {.compatible = "mediatek,mt2712-smi-common", .data = &mtk_smi_common_gen2}, @@ -416,6 +436,9 @@ static const struct of_device_id mtk_smi_common_of_ids[] = { {.compatible = "mediatek,mt8173-smi-common", .data = &mtk_smi_common_gen2}, {.compatible = "mediatek,mt8183-smi-common", .data = &mtk_smi_common_mt8183}, {.compatible = "mediatek,mt8192-smi-common", .data = &mtk_smi_common_mt8192}, + {.compatible = "mediatek,mt8195-smi-common-vdo", .data = &mtk_smi_common_mt8195_vdo}, + {.compatible = "mediatek,mt8195-smi-common-vpp", .data = &mtk_smi_common_mt8195_vpp}, + {.compatible = "mediatek,mt8195-smi-sub-common", .data = &mtk_smi_sub_common_mt8195}, {} }; -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
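The bus_sel values above are built from the driver's F_MMU1_LARB() macro: SMI_BUS_SEL holds a 2-bit field per larb, and setting a larb's field to 1 routes it to output MMU1 (all larbs default to MMU0). A quick check of the resulting register value for the mt8195 vdo smi-common — the macros are copied from the driver, the helper function is just for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Macros reproduced from mtk-smi.c: 2 bits per larb in SMI_BUS_SEL,
 * value 1 in a larb's field selects output port MMU1. */
#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1)
#define F_MMU1_LARB(larbid)        (0x1 << SMI_BUS_LARB_SHIFT(larbid))

/* bus_sel of mtk_smi_common_mt8195_vdo, as in the patch. */
static uint32_t mt8195_vdo_bus_sel(void)
{
    return F_MMU1_LARB(1) | F_MMU1_LARB(3) | F_MMU1_LARB(5) | F_MMU1_LARB(7);
}
```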
[PATCH 8/9] memory: mtk-smi: mt8195: Add initial setting for smi-common
To improve performance, this patch adds initial settings for smi-common. Some registers use fixed settings (suggested by the DE). Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 42 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index 8b1bfef47ecd..08b28e96fd8c 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -18,11 +18,19 @@ #include /* SMI COMMON */ +#define SMI_L1LEN 0x100 + #define SMI_BUS_SEL0x220 #define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1) /* All are MMU0 defaultly. Only specialize mmu1 here. */ #define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid)) +#define SMI_M4U_TH 0x234 +#define SMI_FIFO_TH1 0x238 +#define SMI_FIFO_TH2 0x23c +#define SMI_DCM0x300 +#define SMI_DUMMY 0x444 + /* SMI LARB */ /* Below are about mmu enable registers, they are different in SoCs */ @@ -58,6 +66,13 @@ (_id << 8 | _id << 10 | _id << 12 | _id << 14); \ }) +#define SMI_COMMON_INIT_REGS_NR6 + +struct mtk_smi_reg_pair { + unsigned intoffset; + u32 value; +}; + enum mtk_smi_type { MTK_SMI_GEN1, MTK_SMI_GEN2, /* gen2 smi common */ @@ -77,6 +92,8 @@ static const char * const mtk_smi_larb_clocks[] = { struct mtk_smi_common_plat { enum mtk_smi_type type; u32 bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */ + + const struct mtk_smi_reg_pair *init; }; struct mtk_smi_larb_gen { @@ -387,6 +404,15 @@ static struct platform_driver mtk_smi_larb_driver = { } }; +static const struct mtk_smi_reg_pair mtk_smi_common_mt8195_init[SMI_COMMON_INIT_REGS_NR] = { + {SMI_L1LEN, 0xb}, + {SMI_M4U_TH, 0xe100e10}, + {SMI_FIFO_TH1, 0x506090a}, + {SMI_FIFO_TH2, 0x506090a}, + {SMI_DCM, 0x4f1}, + {SMI_DUMMY, 0x1}, +}; + static const struct mtk_smi_common_plat mtk_smi_common_gen1 = { .type = MTK_SMI_GEN1, }; @@ -417,11 +443,13 @@ static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vdo = { .type = MTK_SMI_GEN2, .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(3) | F_MMU1_LARB(5) | F_MMU1_LARB(7), +
.init = mtk_smi_common_mt8195_init, }; static const struct mtk_smi_common_plat mtk_smi_common_mt8195_vpp = { .type = MTK_SMI_GEN2, .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(7), + .init = mtk_smi_common_mt8195_init, }; static const struct mtk_smi_common_plat mtk_smi_sub_common_mt8195 = { @@ -514,15 +542,21 @@ static int mtk_smi_common_remove(struct platform_device *pdev) static int __maybe_unused mtk_smi_common_resume(struct device *dev) { struct mtk_smi *common = dev_get_drvdata(dev); - u32 bus_sel = common->plat->bus_sel; - int ret; + const struct mtk_smi_reg_pair *init = common->plat->init; + u32 bus_sel = common->plat->bus_sel; /* default is 0 */ + int ret, i; ret = clk_bulk_prepare_enable(common->clk_num, common->clks); if (ret) return ret; - if (common->plat->type == MTK_SMI_GEN2 && bus_sel) - writel(bus_sel, common->base + SMI_BUS_SEL); + if (common->plat->type != MTK_SMI_GEN2) + return 0; + + for (i = 0; i < SMI_COMMON_INIT_REGS_NR && init && init[i].offset; i++) + writel_relaxed(init[i].value, common->base + init[i].offset); + + writel(bus_sel, common->base + SMI_BUS_SEL); return 0; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
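The resume path above walks the .init table until it hits a zero offset (or SMI_COMMON_INIT_REGS_NR entries), and SoCs without a table simply pass NULL. Below is a sketch of that guard logic with a plain array standing in for the MMIO region; `apply_init` and the register-array indexing are illustrative, not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

#define SMI_COMMON_INIT_REGS_NR 6

struct mtk_smi_reg_pair {
    unsigned int offset;
    uint32_t value;
};

/* Illustrative stand-in for the resume-time loop: write each pair until
 * a zero offset terminates the table, or skip everything when no table
 * was provided for this SoC. regs[] models the MMIO register file;
 * the driver uses writel_relaxed() on the real base address. */
static int apply_init(const struct mtk_smi_reg_pair *init, uint32_t *regs)
{
    int i;

    for (i = 0; i < SMI_COMMON_INIT_REGS_NR && init && init[i].offset; i++)
        regs[init[i].offset / 4] = init[i].value;
    return i;
}
```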
[PATCH 6/9] memory: mtk-smi: Add smi sub common support
This patch adds smi-sub-common support. Some larbs may connect with a smi-sub-common, which then connects with the smi-common. Previously we created a device link between the smi-larb and the smi-common; if we have a sub-common, we should instead link the smi-larb to the smi-sub-common, then link the smi-sub-common to the smi-common. This enables the clocks/power automatically. Move the device link code into a new helper for reuse. There is no extra SW setting for the smi-sub-common. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 78 ++-- 1 file changed, 52 insertions(+), 26 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index 6858877ac859..fa3a14605dc2 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -60,7 +60,8 @@ enum mtk_smi_type { MTK_SMI_GEN1, - MTK_SMI_GEN2 + MTK_SMI_GEN2, /* gen2 smi common */ + MTK_SMI_GEN2_SUB_COMM, /* gen2 smi sub common */ }; #define MTK_SMI_CLK_NR_MAX 4 @@ -93,13 +94,14 @@ struct mtk_smi { void __iomem*smi_ao_base; /* only for gen1 */ void __iomem*base;/* only for gen2 */ }; + struct device *smi_common_dev; /* for sub common */ const struct mtk_smi_common_plat *plat; }; struct mtk_smi_larb { /* larb: local arbiter */ struct mtk_smi smi; void __iomem*base; - struct device *smi_common_dev; + struct device *smi_common_dev; /* common or sub-common dev */ const struct mtk_smi_larb_gen *larb_gen; int larbid; u32 *mmu; @@ -206,6 +208,39 @@ mtk_smi_larb_unbind(struct device *dev, struct device *master, void *data) /* Do nothing as the iommu is always enabled.
*/ } +static int mtk_smi_device_link_common(struct device *dev, struct device **com_dev) +{ + struct platform_device *smi_com_pdev; + struct device_node *smi_com_node; + struct device *smi_com_dev; + struct device_link *link; + + smi_com_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0); + if (!smi_com_node) + return -EINVAL; + + smi_com_pdev = of_find_device_by_node(smi_com_node); + of_node_put(smi_com_node); + if (smi_com_pdev) { + /* smi common is the supplier, Make sure it is ready before */ + if (!platform_get_drvdata(smi_com_pdev)) + return -EPROBE_DEFER; + smi_com_dev = _com_pdev->dev; + link = device_link_add(dev, smi_com_dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); + if (!link) { + dev_err(dev, "Unable to link smi-common dev\n"); + return -ENODEV; + } + *com_dev = smi_com_dev; + } else { + dev_err(dev, "Failed to get the smi_common device\n"); + return -EINVAL; + } + + return 0; +} + static const struct component_ops mtk_smi_larb_component_ops = { .bind = mtk_smi_larb_bind, .unbind = mtk_smi_larb_unbind, @@ -267,9 +302,6 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) struct mtk_smi_larb *larb; struct resource *res; struct device *dev = >dev; - struct device_node *smi_node; - struct platform_device *smi_pdev; - struct device_link *link; int i, ret; larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL); @@ -291,27 +323,9 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) return ret; larb->smi.dev = dev; - - smi_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0); - if (!smi_node) - return -EINVAL; - - smi_pdev = of_find_device_by_node(smi_node); - of_node_put(smi_node); - if (smi_pdev) { - if (!platform_get_drvdata(smi_pdev)) - return -EPROBE_DEFER; - larb->smi_common_dev = _pdev->dev; - link = device_link_add(dev, larb->smi_common_dev, - DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS); - if (!link) { - dev_err(dev, "Unable to link smi-common dev\n"); - return -ENODEV; - } - } else { - dev_err(dev, "Failed to get the 
smi_common device\n"); - return -EINVAL; - } + ret = mtk_smi_device_link_common(dev, >smi_common_dev); + if (ret < 0) + return ret; pm_runtime_enable(dev); platform_set_drvdata(pdev, larb); @@ -451,6 +465,14 @@ static int mtk_smi_common_probe(struct platform_device *pdev) if (IS_ERR(common->base))
[PATCH 5/9] memory: mtk-smi: Adjust some code position
This patch has no functional change; it only moves code around to make it more readable. 1. Put the smi-common registers above the smi-larb ones, preparing to add many other register settings. 2. Put mtk_smi_larb_bind near larb_unbind. 3. Sort the SoC data alphabetically and put them on one line as the current kernel allows it. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 219 --- 1 file changed, 90 insertions(+), 129 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index 8eb39b46a6c8..6858877ac859 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -17,12 +17,15 @@ #include #include -/* mt8173 */ -#define SMI_LARB_MMU_EN0xf00 +/* SMI COMMON */ +#define SMI_BUS_SEL0x220 +#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1) +/* All are MMU0 defaultly. Only specialize mmu1 here. */ +#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid)) -/* mt8167 */ -#define MT8167_SMI_LARB_MMU_EN 0xfc0 +/* SMI LARB */ +/* Below are about mmu enable registers, they are different in SoCs */ /* mt2701 */ #define REG_SMI_SECUR_CON_BASE 0x5c0 @@ -41,20 +44,20 @@ /* mt2701 domain should be set to 3 */ #define SMI_SECUR_CON_VAL_DOMAIN(id) (0x3 << id) & 0x7) << 2) + 1)) -/* mt2712 */ -#define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4)) -#define F_MMU_EN BIT(0) -#define BANK_SEL(id) ({ \ +/* mt8167 */ +#define MT8167_SMI_LARB_MMU_EN 0xfc0 + +/* mt8173 */ +#define MT8173_SMI_LARB_MMU_EN 0xf00 + +/* larb gen2 */ +#define SMI_LARB_NONSEC_CON(id)(0x380 + ((id) * 4)) +#define F_MMU_EN BIT(0) +#define BANK_SEL(id) ({ \ u32 _id = (id) & 0x3; \ (_id << 8 | _id << 10 | _id << 12 | _id << 14); \ }) -/* SMI COMMON */ -#define SMI_BUS_SEL0x220 -#define SMI_BUS_LARB_SHIFT(larbid) ((larbid) << 1) -/* All are MMU0 defaultly. Only specialize mmu1 here.
*/ -#define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid)) - enum mtk_smi_type { MTK_SMI_GEN1, MTK_SMI_GEN2 @@ -117,55 +120,6 @@ void mtk_smi_larb_put(struct device *larbdev) } EXPORT_SYMBOL_GPL(mtk_smi_larb_put); -static int -mtk_smi_larb_bind(struct device *dev, struct device *master, void *data) -{ - struct mtk_smi_larb *larb = dev_get_drvdata(dev); - struct mtk_smi_larb_iommu *larb_mmu = data; - unsigned int i; - - for (i = 0; i < MTK_LARB_NR_MAX; i++) { - if (dev == larb_mmu[i].dev) { - larb->larbid = i; - larb->mmu = _mmu[i].mmu; - larb->bank = larb_mmu[i].bank; - return 0; - } - } - return -ENODEV; -} - -static void mtk_smi_larb_config_port_gen2_general(struct device *dev) -{ - struct mtk_smi_larb *larb = dev_get_drvdata(dev); - u32 reg; - int i; - - if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask) - return; - - for_each_set_bit(i, (unsigned long *)larb->mmu, 32) { - reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i)); - reg |= F_MMU_EN; - reg |= BANK_SEL(larb->bank[i]); - writel(reg, larb->base + SMI_LARB_NONSEC_CON(i)); - } -} - -static void mtk_smi_larb_config_port_mt8173(struct device *dev) -{ - struct mtk_smi_larb *larb = dev_get_drvdata(dev); - - writel(*larb->mmu, larb->base + SMI_LARB_MMU_EN); -} - -static void mtk_smi_larb_config_port_mt8167(struct device *dev) -{ - struct mtk_smi_larb *larb = dev_get_drvdata(dev); - - writel(*larb->mmu, larb->base + MT8167_SMI_LARB_MMU_EN); -} - static void mtk_smi_larb_config_port_gen1(struct device *dev) { struct mtk_smi_larb *larb = dev_get_drvdata(dev); @@ -197,6 +151,55 @@ static void mtk_smi_larb_config_port_gen1(struct device *dev) } } +static void mtk_smi_larb_config_port_mt8167(struct device *dev) +{ + struct mtk_smi_larb *larb = dev_get_drvdata(dev); + + writel(*larb->mmu, larb->base + MT8167_SMI_LARB_MMU_EN); +} + +static void mtk_smi_larb_config_port_mt8173(struct device *dev) +{ + struct mtk_smi_larb *larb = dev_get_drvdata(dev); + + writel(*larb->mmu, larb->base + 
MT8173_SMI_LARB_MMU_EN); +} + +static void mtk_smi_larb_config_port_gen2_general(struct device *dev) +{ + struct mtk_smi_larb *larb = dev_get_drvdata(dev); + u32 reg; + int i; + + if (BIT(larb->larbid) & larb->larb_gen->larb_direct_to_common_mask) + return; + + for_each_set_bit(i, (unsigned long *)larb->mmu, 32) { + reg = readl_relaxed(larb->base + SMI_LARB_NONSEC_CON(i)); + reg |= F_MMU_EN; + reg
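The gen2 per-port enable logic moved above hinges on two macros: SMI_LARB_NONSEC_CON(id) addresses port id's register, and BANK_SEL(id) replicates a 2-bit bank number into four bit-fields of that register before F_MMU_EN switches the port onto the IOMMU. Copied out of the driver for a quick sanity check (BANK_SEL uses a GCC/clang statement expression, as in the kernel source; the BIT() definition is supplied here since it normally comes from kernel headers):

```c
#include <assert.h>
#include <stdint.h>

/* Macros as in the driver: one NONSEC_CON register per port; BANK_SEL
 * masks the bank id to 2 bits and replicates it into bits 8/10/12/14. */
#define BIT(nr)                 (1U << (nr))
#define SMI_LARB_NONSEC_CON(id) (0x380 + ((id) * 4))
#define F_MMU_EN                BIT(0)
#define BANK_SEL(id) ({ \
    uint32_t _id = (id) & 0x3; \
    (_id << 8 | _id << 10 | _id << 12 | _id << 14); \
})
```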
[PATCH 3/9] memory: mtk-smi: Use clk_bulk instead of the clk ops
SMI has many clocks: apb/smi/gals. This patch uses the clk_bulk interface instead of the original one to simplify the code. gals is an optional clock (some larbs may not have gals), so use the optional clk_bulk variant, and then remove the has_gals flag. Also remove clk failure logs since the bulk interface already outputs them. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 124 +++ 1 file changed, 34 insertions(+), 90 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index c5fb51f73b34..bcd2bf130655 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -60,9 +60,18 @@ enum mtk_smi_gen { MTK_SMI_GEN2 }; +#define MTK_SMI_CLK_NR_MAX 4 + +static const char * const mtk_smi_common_clocks[] = { + "apb", "smi", "gals0", "gals1", /* glas is optional */ +}; + +static const char * const mtk_smi_larb_clocks[] = { + "apb", "smi", "gals" +}; + struct mtk_smi_common_plat { enum mtk_smi_gen gen; - bool has_gals; u32 bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */ }; @@ -70,13 +79,12 @@ struct mtk_smi_larb_gen { int port_in_larb[MTK_LARB_NR_MAX + 1]; void (*config_port)(struct device *dev); unsigned intlarb_direct_to_common_mask; - boolhas_gals; }; struct mtk_smi { struct device *dev; - struct clk *clk_apb, *clk_smi; - struct clk *clk_gals0, *clk_gals1; + unsigned intclk_num; + struct clk_bulk_dataclks[MTK_SMI_CLK_NR_MAX]; struct clk *clk_async; /*only needed by mt2701*/ union { void __iomem*smi_ao_base; /* only for gen1 */ @@ -95,45 +103,6 @@ struct mtk_smi_larb { /* larb: local arbiter */ unsigned char *bank; }; -static int mtk_smi_clk_enable(const struct mtk_smi *smi) -{ - int ret; - - ret = clk_prepare_enable(smi->clk_apb); - if (ret) - return ret; - - ret = clk_prepare_enable(smi->clk_smi); - if (ret) - goto err_disable_apb; - - ret = clk_prepare_enable(smi->clk_gals0); - if (ret) - goto err_disable_smi; - - ret = clk_prepare_enable(smi->clk_gals1); - if (ret) - goto err_disable_gals0; - - return 0; - -err_disable_gals0: - clk_disable_unprepare(smi->clk_gals0); -err_disable_smi: - clk_disable_unprepare(smi->clk_smi); -err_disable_apb: - clk_disable_unprepare(smi->clk_apb); - return ret; -} - -static void mtk_smi_clk_disable(const struct mtk_smi *smi) -{ - clk_disable_unprepare(smi->clk_gals1); - clk_disable_unprepare(smi->clk_gals0); - clk_disable_unprepare(smi->clk_smi); - clk_disable_unprepare(smi->clk_apb); -} - int mtk_smi_larb_get(struct device *larbdev) { int ret = pm_runtime_resume_and_get(larbdev); @@ -270,7 +239,6 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt6779 = { }; static const struct mtk_smi_larb_gen mtk_smi_larb_mt8183 = { - .has_gals = true, .config_port= mtk_smi_larb_config_port_gen2_general, .larb_direct_to_common_mask = BIT(2) | BIT(3) | BIT(7), /* IPU0 | IPU1 | CCU */ @@ -320,6 +288,7 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) struct device_node *smi_node; struct platform_device *smi_pdev; struct device_link *link; + int i, ret; larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL); if (!larb) @@ -331,22 +300,14 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) if (IS_ERR(larb->base)) return PTR_ERR(larb->base); - larb->smi.clk_apb = devm_clk_get(dev, "apb"); - if (IS_ERR(larb->smi.clk_apb)) - return PTR_ERR(larb->smi.clk_apb); - - larb->smi.clk_smi = devm_clk_get(dev, "smi"); - if (IS_ERR(larb->smi.clk_smi)) - return PTR_ERR(larb->smi.clk_smi); - - if (larb->larb_gen->has_gals) { - /* The larbs may still haven't gals even if the SoC support.*/ - larb->smi.clk_gals0 = devm_clk_get(dev, "gals"); - if (PTR_ERR(larb->smi.clk_gals0) == -ENOENT) - larb->smi.clk_gals0 = NULL; - else if (IS_ERR(larb->smi.clk_gals0)) - return PTR_ERR(larb->smi.clk_gals0); - } + larb->smi.clk_num = ARRAY_SIZE(mtk_smi_larb_clocks); + for (i = 0; i < larb->smi.clk_num; i++) + larb->smi.clks[i].id = mtk_smi_larb_clocks[i]; + + ret = devm_clk_bulk_get_optional(dev, larb->smi.clk_num, larb->smi.clks); + if (ret) + return ret; + larb->smi.dev = dev;
clk_disable_unprepare(smi->clk_gals0); -err_disable_smi: - clk_disable_unprepare(smi->clk_smi); -err_disable_apb: - clk_disable_unprepare(smi->clk_apb); - return ret; -} - -static void mtk_smi_clk_disable(const struct mtk_smi *smi) -{ - clk_disable_unprepare(smi->clk_gals1); - clk_disable_unprepare(smi->clk_gals0); - clk_disable_unprepare(smi->clk_smi); - clk_disable_unprepare(smi->clk_apb); -} - int mtk_smi_larb_get(struct device *larbdev) { int ret = pm_runtime_resume_and_get(larbdev); @@ -270,7 +239,6 @@ static const struct mtk_smi_larb_gen mtk_smi_larb_mt6779 = { }; static const struct mtk_smi_larb_gen mtk_smi_larb_mt8183 = { - .has_gals = true, .config_port= mtk_smi_larb_config_port_gen2_general, .larb_direct_to_common_mask = BIT(2) | BIT(3) | BIT(7), /* IPU0 | IPU1 | CCU */ @@ -320,6 +288,7 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) struct device_node *smi_node; struct platform_device *smi_pdev; struct device_link *link; + int i, ret; larb = devm_kzalloc(dev, sizeof(*larb), GFP_KERNEL); if (!larb) @@ -331,22 +300,14 @@ static int mtk_smi_larb_probe(struct platform_device *pdev) if (IS_ERR(larb->base)) return PTR_ERR(larb->base); - larb->smi.clk_apb = devm_clk_get(dev, "apb"); - if (IS_ERR(larb->smi.clk_apb)) - return PTR_ERR(larb->smi.clk_apb); - - larb->smi.clk_smi = devm_clk_get(dev, "smi"); - if (IS_ERR(larb->smi.clk_smi)) - return PTR_ERR(larb->smi.clk_smi); - - if (larb->larb_gen->has_gals) { - /* The larbs may still haven't gals even if the SoC support.*/ - larb->smi.clk_gals0 = devm_clk_get(dev, "gals"); - if (PTR_ERR(larb->smi.clk_gals0) == -ENOENT) - larb->smi.clk_gals0 = NULL; - else if (IS_ERR(larb->smi.clk_gals0)) - return PTR_ERR(larb->smi.clk_gals0); - } + larb->smi.clk_num = ARRAY_SIZE(mtk_smi_larb_clocks); + for (i = 0; i < larb->smi.clk_num; i++) + larb->smi.clks[i].id = mtk_smi_larb_clocks[i]; + + ret = devm_clk_bulk_get_optional(dev, larb->smi.clk_num, larb->smi.clks); + if (ret) + return ret; + larb->smi.dev = dev;
[PATCH 4/9] memory: mtk-smi: Rename smi_gen to smi_type
This is a preparatory patch for adding smi sub common. Regarding the previous smi_gen, we have gen1/gen2 that stand for the generation number of the HW. I plan to add a new type (sub_common), so the name "gen" is no longer proper. This patch only changes it to "type"; no functional change. Signed-off-by: Yong Wu --- drivers/memory/mtk-smi.c | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c index bcd2bf130655..8eb39b46a6c8 100644 --- a/drivers/memory/mtk-smi.c +++ b/drivers/memory/mtk-smi.c @@ -55,7 +55,7 @@ /* All are MMU0 defaultly. Only specialize mmu1 here. */ #define F_MMU1_LARB(larbid)(0x1 << SMI_BUS_LARB_SHIFT(larbid)) -enum mtk_smi_gen { +enum mtk_smi_type { MTK_SMI_GEN1, MTK_SMI_GEN2 }; @@ -71,8 +71,8 @@ static const char * const mtk_smi_larb_clocks[] = { }; struct mtk_smi_common_plat { - enum mtk_smi_gen gen; - u32 bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */ + enum mtk_smi_type type; + u32 bus_sel; /* Balance some larbs to enter mmu0 or mmu1 */ }; struct mtk_smi_larb_gen { @@ -387,27 +387,27 @@ static struct platform_driver mtk_smi_larb_driver = { }; static const struct mtk_smi_common_plat mtk_smi_common_gen1 = { - .gen = MTK_SMI_GEN1, + .type = MTK_SMI_GEN1, }; static const struct mtk_smi_common_plat mtk_smi_common_gen2 = { - .gen = MTK_SMI_GEN2, + .type = MTK_SMI_GEN2, }; static const struct mtk_smi_common_plat mtk_smi_common_mt6779 = { - .gen= MTK_SMI_GEN2, - .bus_sel= F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(4) | - F_MMU1_LARB(5) | F_MMU1_LARB(6) | F_MMU1_LARB(7), + .type = MTK_SMI_GEN2, + .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(4) | + F_MMU1_LARB(5) | F_MMU1_LARB(6) | F_MMU1_LARB(7), }; static const struct mtk_smi_common_plat mtk_smi_common_mt8183 = { - .gen = MTK_SMI_GEN2, + .type = MTK_SMI_GEN2, .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) | F_MMU1_LARB(7), }; static const struct mtk_smi_common_plat mtk_smi_common_mt8192 = { - .gen = MTK_SMI_GEN2, +
.type = MTK_SMI_GEN2, .bus_sel = F_MMU1_LARB(1) | F_MMU1_LARB(2) | F_MMU1_LARB(5) | F_MMU1_LARB(6), }; @@ -471,7 +471,7 @@ static int mtk_smi_common_probe(struct platform_device *pdev) * clock into emi clock domain, but for mtk smi gen2, there's no smi ao * base. */ - if (common->plat->gen == MTK_SMI_GEN1) { + if (common->plat->type == MTK_SMI_GEN1) { res = platform_get_resource(pdev, IORESOURCE_MEM, 0); common->smi_ao_base = devm_ioremap_resource(dev, res); if (IS_ERR(common->smi_ao_base)) @@ -511,7 +511,7 @@ static int __maybe_unused mtk_smi_common_resume(struct device *dev) if (ret) return ret; - if (common->plat->gen == MTK_SMI_GEN2 && bus_sel) + if (common->plat->type == MTK_SMI_GEN2 && bus_sel) writel(bus_sel, common->base + SMI_BUS_SEL); return 0; } -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/9] dt-bindings: memory: mediatek: Add mt8195 smi sub common
This patch adds the binding for smi-sub-common. The SMI block diagram like this: IOMMU | | smi-common -- | | larb0 larb7 <-max is 8 The smi-common connects with smi-larb and IOMMU. The maximum larbs number that connects with a smi-common is 8. If the engines number is over 8, sometimes we use a smi-sub-common which is nearly same with smi-common. It supports up to 8 input and 1 output(smi-common has 2 output) Something like: IOMMU | | smi-common - | | ... larb0 sub-common ... <-max is 8 --- ||... <-max is 8 too. larb2 larb5 We don't need extra SW setting for smi-sub-common, only the sub-common has special clocks need to enable when the engines access dram. If it is sub-common, it should have a "mediatek,smi" phandle to point to its smi-common. also, the sub-common only has one gals clock. Signed-off-by: Yong Wu --- .../mediatek,smi-common.yaml | 25 +++ 1 file changed, 25 insertions(+) diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml index 6317025bd203..11515afdfb2e 100644 --- a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml @@ -38,6 +38,7 @@ properties: - mediatek,mt8192-smi-common - mediatek,mt8195-smi-common-vdo - mediatek,mt8195-smi-common-vpp + - mediatek,mt8195-smi-sub-common - description: for mt7623 items: @@ -69,6 +70,10 @@ properties: minItems: 2 maxItems: 4 + mediatek,smi: +$ref: /schemas/types.yaml#/definitions/phandle-array +description: a phandle to the smi-common node above. Only for sub-common. 
+ required: - compatible - reg @@ -95,6 +100,26 @@ allOf: - const: smi - const: async + - if: # only for sub common + properties: +compatible: + contains: +enum: + - mediatek,mt8195-smi-sub-common +then: + required: +- mediatek,smi + properties: +clock: + items: +minItems: 3 +maxItems: 3 +clock-names: + items: +- const: apb +- const: smi +- const: gals0 + - if: # for gen2 HW that have gals properties: compatible: -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/9] dt-bindings: memory: mediatek: Add mt8195 smi binding
This patch adds mt8195 smi support in the bindings. In mt8195 there are two smi-common HW instances: one is for vdo (video output), the other is for vpp (video processing pipe). They connect with different smi-larbs, so some settings (bus_sel) differ. Differentiate them with the compatible string. Something like this: IOMMU(VDO) IOMMU(VPP) | | SMI_COMMON_VDO SMI_COMMON_VPP --- | | ... | | ... larb0 larb2 ...larb1 larb3... Signed-off-by: Yong Wu --- .../bindings/memory-controllers/mediatek,smi-common.yaml| 6 +- .../bindings/memory-controllers/mediatek,smi-larb.yaml | 3 +++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml index a08a32340987..6317025bd203 100644 --- a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-common.yaml @@ -16,7 +16,7 @@ description: | MediaTek SMI have two generations of HW architecture, here is the list which generation the SoCs use: generation 1: mt2701 and mt7623. - generation 2: mt2712, mt6779, mt8167, mt8173, mt8183 and mt8192. + generation 2: mt2712, mt6779, mt8167, mt8173, mt8183, mt8192 and mt8195. There's slight differences between the two SMI, for generation 2, the register which control the iommu port is at each larb's register base.
But @@ -36,6 +36,8 @@ properties: - mediatek,mt8173-smi-common - mediatek,mt8183-smi-common - mediatek,mt8192-smi-common + - mediatek,mt8195-smi-common-vdo + - mediatek,mt8195-smi-common-vpp - description: for mt7623 items: @@ -100,6 +102,8 @@ allOf: - mediatek,mt6779-smi-common - mediatek,mt8183-smi-common - mediatek,mt8192-smi-common +- mediatek,mt8195-smi-common-vdo +- mediatek,mt8195-smi-common-vpp then: properties: diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml index 7ed7839ff0a7..a100283903bd 100644 --- a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml +++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.yaml @@ -24,6 +24,7 @@ properties: - mediatek,mt8173-smi-larb - mediatek,mt8183-smi-larb - mediatek,mt8192-smi-larb + - mediatek,mt8195-smi-larb - description: for mt7623 items: @@ -75,6 +76,7 @@ allOf: compatible: enum: - mediatek,mt8183-smi-larb +- mediatek,mt8195-smi-larb then: properties: @@ -109,6 +111,7 @@ allOf: - mediatek,mt6779-smi-larb - mediatek,mt8167-smi-larb - mediatek,mt8192-smi-larb + - mediatek,mt8195-smi-larb then: required: -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 0/9] MT8195 SMI support
This patchset mainly adds SMI support for mt8195. Compared with the previous version, it adds two new features: a) smi sub common support; b) initial settings for smi-common and smi-larb. Yong Wu (9): dt-bindings: memory: mediatek: Add mt8195 smi binding dt-bindings: memory: mediatek: Add mt8195 smi sub common memory: mtk-smi: Use clk_bulk instead of the clk ops memory: mtk-smi: Rename smi_gen to smi_type memory: mtk-smi: Adjust some code position memory: mtk-smi: Add smi sub common support memory: mtk-smi: mt8195: Add smi support memory: mtk-smi: mt8195: Add initial setting for smi-common memory: mtk-smi: mt8195: Add initial setting for smi-larb .../mediatek,smi-common.yaml | 31 +- .../memory-controllers/mediatek,smi-larb.yaml | 3 + drivers/memory/mtk-smi.c | 568 ++ 3 files changed, 347 insertions(+), 255 deletions(-) -- 2.18.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] swiotlb-xen: override common mmap and get_sgtable dma ops
> We make sure that we allocate contiguous memory in > xen_swiotlb_alloc_coherent(). I understood. Thanks! -- Best Regards, Roman. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/2] swiotlb-xen: override common mmap and get_sgtable dma ops
This commit fixes an incorrect conversion from cpu_addr to page address for virtual addresses that were allocated through xen_swiotlb_alloc_coherent() and may be mapped in the vmalloc range. In that case, virt_to_page() cannot convert the address properly and returns an incorrect page address. We need to detect such cases and obtain the page address using vmalloc_to_page() instead. The reference code for mmap() and get_sgtable() was copied from kernel/dma/ops_helpers.c and modified to add the detection described above. To simplify the code, a new cpu_addr_to_page() helper was added. Signed-off-by: Roman Skakun Reviewed-by: Andrii Anisov --- drivers/xen/swiotlb-xen.c | 42 +++ 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 90bc5fc321bc..9331a8500547 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -118,6 +118,14 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr) return 0; } +static struct page *cpu_addr_to_page(void *cpu_addr) +{ + if (is_vmalloc_addr(cpu_addr)) + return vmalloc_to_page(cpu_addr); + else + return virt_to_page(cpu_addr); +} + static int xen_swiotlb_fixup(void *buf, size_t size, unsigned long nslabs) { @@ -337,7 +345,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr, int order = get_order(size); phys_addr_t phys; u64 dma_mask = DMA_BIT_MASK(32); - struct page *page; + struct page *page = cpu_addr_to_page(vaddr); if (hwdev && hwdev->coherent_dma_mask) dma_mask = hwdev->coherent_dma_mask; @@ -349,11 +357,6 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr, /* Convert the size to actually allocated.
*/ size = 1UL << (order + XEN_PAGE_SHIFT); - if (is_vmalloc_addr(vaddr)) - page = vmalloc_to_page(vaddr); - else - page = virt_to_page(vaddr); - if (!WARN_ON((dev_addr + size - 1 > dma_mask) || range_straddles_page_boundary(phys, size)) && TestClearPageXenRemapped(page)) @@ -573,7 +576,23 @@ xen_swiotlb_dma_mmap(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs); + unsigned long user_count = vma_pages(vma); + unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; + unsigned long off = vma->vm_pgoff; + struct page *page = cpu_addr_to_page(cpu_addr); + int ret; + + vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs); + + if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret)) + return ret; + + if (off >= count || user_count > count - off) + return -ENXIO; + + return remap_pfn_range(vma, vma->vm_start, + page_to_pfn(page) + vma->vm_pgoff, + user_count << PAGE_SHIFT, vma->vm_page_prot); } /* @@ -585,7 +604,14 @@ xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t handle, size_t size, unsigned long attrs) { - return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs); + struct page *page = cpu_addr_to_page(cpu_addr); + int ret; + + ret = sg_alloc_table(sgt, 1, GFP_KERNEL); + if (!ret) + sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); + + return ret; } const struct dma_map_ops xen_swiotlb_dma_ops = { -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/2] Revert "swiotlb-xen: remove xen_swiotlb_dma_mmap and xen_swiotlb_dma_get_sgtable"
This reverts commit 922659ea771b3fd728149262c5ea15608fab9719. Signed-off-by: Roman Skakun --- drivers/xen/swiotlb-xen.c | 29 +++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 2b385c1b4a99..90bc5fc321bc 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -563,6 +563,31 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask) return xen_virt_to_bus(hwdev, xen_io_tlb_end - 1) <= mask; } +/* + * Create userspace mapping for the DMA-coherent memory. + * This function should be called with the pages from the current domain only, + * passing pages mapped from other domains would lead to memory corruption. + */ +static int +xen_swiotlb_dma_mmap(struct device *dev, struct vm_area_struct *vma, +void *cpu_addr, dma_addr_t dma_addr, size_t size, +unsigned long attrs) +{ + return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, attrs); +} + +/* + * This function should be called with the pages from the current domain only, + * passing pages mapped from other domains would lead to memory corruption. + */ +static int +xen_swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, + void *cpu_addr, dma_addr_t handle, size_t size, + unsigned long attrs) +{ + return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size, attrs); +} + const struct dma_map_ops xen_swiotlb_dma_ops = { .alloc = xen_swiotlb_alloc_coherent, .free = xen_swiotlb_free_coherent, @@ -575,8 +600,8 @@ const struct dma_map_ops xen_swiotlb_dma_ops = { .map_page = xen_swiotlb_map_page, .unmap_page = xen_swiotlb_unmap_page, .dma_supported = xen_swiotlb_dma_supported, - .mmap = dma_common_mmap, - .get_sgtable = dma_common_get_sgtable, + .mmap = xen_swiotlb_dma_mmap, + .get_sgtable = xen_swiotlb_get_sgtable, .alloc_pages = dma_common_alloc_pages, .free_pages = dma_common_free_pages, }; -- 2.25.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On Tue, Jun 15, 2021 at 01:15:43PM -0600, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. Acked-by: Mark Brown signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
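The redundancy Rob describes is easiest to see in a binding fragment. The sketch below mirrors the fsl,imx-src interrupts property quoted later in this thread; the layout is illustrative rather than copied verbatim from any file:

```yaml
# Illustrative only: with a fixed two-entry 'items' list, an explicit
# maxItems equal to the list length adds no information -- the DT schema
# tooling infers it when fixing up the final schema.
interrupts:
  items:
    - description: SRC interrupt
    - description: CPU WDOG interrupts out of SRC
  minItems: 1     # kept: it differs from the 'items' length, so it constrains
  # maxItems: 2   # redundant: equals the 'items' length, can be dropped
```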
[PATCH v13 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
iommu_set_dma_strict() is now only ever used to enable strict mode, so just remove the argument. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/amd/init.c| 2 +- drivers/iommu/intel/iommu.c | 6 +++--- drivers/iommu/iommu.c | 5 ++--- include/linux/iommu.h | 2 +- 4 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index fb3618af643b..7bc460052678 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3099,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str) for (; *str; ++str) { if (strncmp(str, "fullflush", 9) == 0) { pr_warn("amd_iommu=fullflush deprecated; use iommu.strict instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index d586990fa751..0618c35cfb51 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,7 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict instead\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super page\n"); intel_iommu_superpage = 0; @@ -4382,7 +4382,7 @@ int __init intel_iommu_init(void) */ if (cap_caching_mode(iommu->cap)) { pr_warn("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -5699,7 +5699,7 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - iommu_set_dma_strict(true); + iommu_set_dma_strict(); } }
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..ff221d3ddcbc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +void iommu_set_dma_strict(void) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + iommu_dma_strict = true; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..754f67d6dd90 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks); -void iommu_set_dma_strict(bool val); +void iommu_set_dma_strict(void); bool iommu_get_dma_strict(struct iommu_domain *domain); extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 5/6] iommu/amd: Add support for IOMMU default DMA mode build options
From: Zhen Lei Make IOMMU_DEFAULT_LAZY the default when the AMD_IOMMU config is set, which matches current behaviour. For the "fullflush" param, just call iommu_set_dma_strict(true) directly. Since we already get a strict vs lazy mode print in iommu_subsys_init(), and maintain a deprecation print when the "fullflush" param is passed, drop the prints in amd_iommu_init_dma_ops(). Finally drop the global flag amd_iommu_unmap_flush, as it no longer has any purpose. [jpg: Rebase for relocated file and drop amd_iommu_unmap_flush] Signed-off-by: Zhen Lei Signed-off-by: John Garry --- drivers/iommu/Kconfig | 2 +- drivers/iommu/amd/amd_iommu_types.h | 6 -- drivers/iommu/amd/init.c| 3 +-- drivers/iommu/amd/iommu.c | 6 -- 4 files changed, 2 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index c214a36eb2dc..fd1ad28dd5ee 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,7 +94,7 @@ choice prompt "IOMMU default DMA IOTLB invalidation mode" depends on IOMMU_DMA - default IOMMU_DEFAULT_LAZY if INTEL_IOMMU + default IOMMU_DEFAULT_LAZY if (AMD_IOMMU || INTEL_IOMMU) default IOMMU_DEFAULT_STRICT help This option allows an IOMMU DMA IOTLB invalidation mode to be diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h index 94c1a7a9876d..8dbe61e2b3c1 100644 --- a/drivers/iommu/amd/amd_iommu_types.h +++ b/drivers/iommu/amd/amd_iommu_types.h @@ -779,12 +779,6 @@ extern u16 amd_iommu_last_bdf; /* allocation bitmap for domain ids */ extern unsigned long *amd_iommu_pd_alloc_bitmap; -/* - * If true, the addresses will be flushed on unmap time, not when - * they are reused - */ -extern bool amd_iommu_unmap_flush; - /* Smallest max PASID supported by any IOMMU in the system */ extern u32 amd_iommu_max_pasid; diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 9f3096d650aa..fb3618af643b 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -161,7 +161,6 @@ u16 amd_iommu_last_bdf;
/* largest PCI device id we have to handle */ LIST_HEAD(amd_iommu_unity_map);/* a list of required unity mappings we find in ACPI */ -bool amd_iommu_unmap_flush;/* if true, flush on every unmap */ LIST_HEAD(amd_iommu_list); /* list of all AMD IOMMUs in the system */ @@ -3100,7 +3099,7 @@ static int __init parse_amd_iommu_options(char *str) for (; *str; ++str) { if (strncmp(str, "fullflush", 9) == 0) { pr_warn("amd_iommu=fullflush deprecated; use iommu.strict instead\n"); - amd_iommu_unmap_flush = true; + iommu_set_dma_strict(true); } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c index b1fbf2c83df5..32b541ee2e11 100644 --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -1775,12 +1775,6 @@ void amd_iommu_domain_update(struct protection_domain *domain) static void __init amd_iommu_init_dma_ops(void) { swiotlb = (iommu_default_passthrough() || sme_me_mask) ? 1 : 0; - - if (amd_iommu_unmap_flush) - pr_info("IO/TLB flush on unmap enabled\n"); - else - pr_info("Lazy IO/TLB flushing enabled\n"); - iommu_set_dma_strict(amd_iommu_unmap_flush); } int __init amd_iommu_init_api(void) -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 4/6] iommu/vt-d: Add support for IOMMU default DMA mode build options
From: Zhen Lei Make IOMMU_DEFAULT_LAZY default for when INTEL_IOMMU config is set, as is current behaviour. Also delete global flag intel_iommu_strict: - In intel_iommu_setup(), call iommu_set_dma_strict(true) directly. Also remove the print, as iommu_subsys_init() prints the mode and we have already marked this param as deprecated. - For cap_caching_mode() check in intel_iommu_setup(), call iommu_set_dma_strict(true) directly, and reword the accompanying print and add the missing '\n'. - For Ironlake GPU, again call iommu_set_dma_strict(true) directly and keep the accompanying print. [jpg: Remove intel_iommu_strict] Signed-off-by: Zhen Lei Signed-off-by: John Garry --- drivers/iommu/Kconfig | 1 + drivers/iommu/intel/iommu.c | 15 ++- 2 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 0327a942fdb7..c214a36eb2dc 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ choice prompt "IOMMU default DMA IOTLB invalidation mode" depends on IOMMU_DMA + default IOMMU_DEFAULT_LAZY if INTEL_IOMMU default IOMMU_DEFAULT_STRICT help This option allows an IOMMU DMA IOTLB invalidation mode to be diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 821d8227a4e6..d586990fa751 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -361,7 +361,6 @@ int intel_iommu_enabled = 0; EXPORT_SYMBOL_GPL(intel_iommu_enabled); static int dmar_map_gfx = 1; -static int intel_iommu_strict; static int intel_iommu_superpage = 1; static int iommu_identity_mapping; static int iommu_skip_te_disable; @@ -455,8 +454,7 @@ static int __init intel_iommu_setup(char *str) iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { pr_warn("intel_iommu=strict deprecated; use iommu.strict instead\n"); - pr_info("Disable batched IOTLB flush\n"); - intel_iommu_strict = 1; + iommu_set_dma_strict(true); } else if (!strncmp(str, "sp_off", 6)) { pr_info("Disable supported super 
page\n"); intel_iommu_superpage = 0; @@ -4382,9 +4380,9 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) { - pr_warn("IOMMU batching is disabled due to virtualization"); - intel_iommu_strict = 1; + if (cap_caching_mode(iommu->cap)) { + pr_warn("IOMMU batching disallowed due to virtualization\n"); + iommu_set_dma_strict(true); } iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, @@ -4393,7 +4391,6 @@ int __init intel_iommu_init(void) } up_read(&dmar_global_lock); - iommu_set_dma_strict(intel_iommu_strict); bus_set_iommu(&pci_bus_type, &intel_iommu_ops); if (si_domain && !hw_pass_through) register_memory_notifier(&intel_iommu_memory_nb); @@ -5702,8 +5699,8 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev) } else if (dmar_map_gfx) { /* we have to ensure the gfx device is idle before we flush */ pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n"); - intel_iommu_strict = 1; - } + iommu_set_dma_strict(true); + } } DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt); DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_calpella_no_shadow_gtt); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 3/6] iommu: Enhance IOMMU default DMA mode build options
From: Zhen Lei First, add build options IOMMU_DEFAULT_{LAZY|STRICT}, so that we have the opportunity to set {lazy|strict} mode as default at build time. Then put the two config options in a choice, as they are mutually exclusive. [jpg: Make choice between strict and lazy only (and not passthrough)] Signed-off-by: Zhen Lei Signed-off-by: John Garry Reviewed-by: Robin Murphy --- .../admin-guide/kernel-parameters.txt | 3 +- drivers/iommu/Kconfig | 40 +++ drivers/iommu/iommu.c | 2 +- 3 files changed, 43 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index fcbb36d6eea7..d8fb36363be0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2052,9 +2052,10 @@ throughput at the cost of reduced device isolation. Will fall back to strict mode if not supported by the relevant IOMMU driver. - 1 - Strict mode (default). + 1 - Strict mode. DMA unmap operations invalidate IOMMU hardware TLBs synchronously. + unset - Use value of CONFIG_IOMMU_DEFAULT_{LAZY,STRICT}. Note: on x86, the default behaviour depends on the equivalent driver-specific parameters, but a strict mode explicitly specified by either method takes diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 1f111b399bca..0327a942fdb7 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -90,6 +90,46 @@ config IOMMU_DEFAULT_PASSTHROUGH If unsure, say N here.
+ +config IOMMU_DEFAULT_STRICT + bool "strict" + help + For every IOMMU DMA unmap operation, the flush operation of IOTLB and + the free operation of IOVA are guaranteed to be done in the unmap + function. + +config IOMMU_DEFAULT_LAZY + bool "lazy" + help + Support lazy mode, where for every IOMMU DMA unmap operation, the + flush operation of IOTLB and the free operation of IOVA are deferred. + They are only guaranteed to be done before the related IOVA will be + reused. + + The isolation provided in this mode is not as secure as STRICT mode, + such that a vulnerable time window may be created between the DMA + unmap and the mappings cached in the IOMMU IOTLB or device TLB + finally being invalidated, where the device could still access the + memory which has already been unmapped by the device driver. + However this mode may provide better performance in high throughput + scenarios, and is still considerably more secure than passthrough + mode or no IOMMU. + +endchoice + config OF_IOMMU def_bool y depends on OF && IOMMU_API diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index cf58949cc2f3..60b1ec42e73b 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -29,7 +29,7 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); static unsigned int iommu_def_domain_type __read_mostly; -static bool iommu_dma_strict __read_mostly = true; +static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_STRICT); static u32 iommu_cmd_line __read_mostly; struct iommu_group { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 2/6] iommu: Print strict or lazy mode at init time
As well as the default domain type, it's useful to know whether DMA domains are strict or lazy, so add this info in a separate print. The (strict/lazy) mode may also be set via the iommu.strict early param, but this will be processed prior to iommu_subsys_init(), so the print will be accurate for drivers which don't set the mode via custom means. For the drivers which set the mode via custom means - the AMD and Intel drivers - they maintain prints to inform of a change in policy or that the custom cmdline methods to change policy are deprecated. Signed-off-by: John Garry Reviewed-by: Robin Murphy --- drivers/iommu/iommu.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 5419c4b9f27a..cf58949cc2f3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -138,6 +138,11 @@ static int __init iommu_subsys_init(void) (iommu_cmd_line & IOMMU_CMD_LINE_DMA_API) ? "(set via kernel command line)" : ""); + pr_info("DMA domain TLB invalidation policy: %s mode %s\n", + iommu_dma_strict ? "strict" : "lazy", + (iommu_cmd_line & IOMMU_CMD_LINE_STRICT) ? + "(set via kernel command line)" : ""); + return 0; } subsys_initcall(iommu_subsys_init); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 0/6] iommu: Enhance IOMMU default DMA mode build options
This is a reboot of Zhen Lei's series from a couple of years ago, which never made it across the line. I still think that it has some value, so taking up the mantle. Motivation: allow lazy mode to be the default mode for DMA domains on all ARCHs, not only those which hardcode it (to be lazy). For ARM64, currently we must use a kernel command line parameter to use lazy mode, which is less than ideal. I have now included the print for strict/lazy mode, which I originally sent in: https://lore.kernel.org/linux-iommu/72eb3de9-1d1c-ae46-c5a9-95f26525d...@huawei.com/ There was some concern there about drivers and their custom prints conflicting with the print in that patch, but I think that it should be ok. Differences to v12: - Rebase to next-20210611 and include patch "iommu: Update "iommu.strict" documentation" as a baseline - Add Robin's RB tags (thanks!) - Please let me know if not ok with kernel-parameters.txt update in 3/6 - Add a patch to mark x86 strict cmdline params as deprecated - Improve wording in Kconfig change and tweak iommu_dma_strict declaration Differences to v11: - Rebase to next-20210610 - Drop strict mode globals in Intel and AMD drivers - Include patch to print strict vs lazy mode - Include patch to remove argument from iommu_set_dma_strict() Differences to v10: - Rebase to v5.13-rc4 - Correct comment and typo in Kconfig (Randy) - Make Kconfig choice depend on relevant architectures - Update kernel-parameters.txt for CONFIG_IOMMU_DEFAULT_{LAZY,STRICT} John Garry (3): iommu: Deprecate Intel and AMD cmdline methods to enable strict mode iommu: Print strict or lazy mode at init time iommu: Remove mode argument from iommu_set_dma_strict() Zhen Lei (3): iommu: Enhance IOMMU default DMA mode build options iommu/vt-d: Add support for IOMMU default DMA mode build options iommu/amd: Add support for IOMMU default DMA mode build options .../admin-guide/kernel-parameters.txt | 8 ++-- drivers/iommu/Kconfig | 41 +++ drivers/iommu/amd/amd_iommu_types.h | 6 ---
drivers/iommu/amd/init.c | 7 ++-- drivers/iommu/amd/iommu.c | 6 --- drivers/iommu/intel/iommu.c | 16 drivers/iommu/iommu.c | 12 -- include/linux/iommu.h | 2 +- 8 files changed, 66 insertions(+), 32 deletions(-) -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v13 1/6] iommu: Deprecate Intel and AMD cmdline methods to enable strict mode
Now that the x86 drivers support iommu.strict, deprecate the custom methods. Signed-off-by: John Garry --- Documentation/admin-guide/kernel-parameters.txt | 5 +++-- drivers/iommu/amd/init.c| 4 +++- drivers/iommu/intel/iommu.c | 1 + 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 30e9dd52464e..fcbb36d6eea7 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -290,7 +290,8 @@ amd_iommu= [HW,X86-64] Pass parameters to the AMD IOMMU driver in the system. Possible values are: - fullflush - enable flushing of IO/TLB entries when + fullflush [Deprecated, use iommu.strict instead] + - enable flushing of IO/TLB entries when they are unmapped. Otherwise they are flushed before they will be reused, which is a lot of faster @@ -1947,7 +1948,7 @@ bypassed by not enabling DMAR with this option. In this case, gfx device will use physical address for DMA. - strict [Default Off] + strict [Default Off] [Deprecated, use iommu.strict instead] With this option on every unmap_single operation will result in a hardware IOTLB flush operation as opposed to batching them for performance. 
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index 46280e6e1535..9f3096d650aa 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3098,8 +3098,10 @@ static int __init parse_amd_iommu_intr(char *str) static int __init parse_amd_iommu_options(char *str) { for (; *str; ++str) { - if (strncmp(str, "fullflush", 9) == 0) + if (strncmp(str, "fullflush", 9) == 0) { + pr_warn("amd_iommu=fullflush deprecated; use iommu.strict instead\n"); amd_iommu_unmap_flush = true; + } if (strncmp(str, "force_enable", 12) == 0) amd_iommu_force_enable = true; if (strncmp(str, "off", 3) == 0) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index bd93c7ec879e..821d8227a4e6 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -454,6 +454,7 @@ static int __init intel_iommu_setup(char *str) pr_warn("intel_iommu=forcedac deprecated; use iommu.forcedac instead\n"); iommu_dma_forcedac = true; } else if (!strncmp(str, "strict", 6)) { + pr_warn("intel_iommu=strict deprecated; use iommu.strict instead\n"); pr_info("Disable batched IOTLB flush\n"); intel_iommu_strict = 1; } else if (!strncmp(str, "sp_off", 6)) { -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On Tue, 2021-06-15 at 13:15 -0600, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. [...] > Documentation/devicetree/bindings/reset/fsl,imx-src.yaml| 1 - [...] > diff --git a/Documentation/devicetree/bindings/reset/fsl,imx-src.yaml > b/Documentation/devicetree/bindings/reset/fsl,imx-src.yaml > index 27c5e34a3ac6..b11ac533f914 100644 > --- a/Documentation/devicetree/bindings/reset/fsl,imx-src.yaml > +++ b/Documentation/devicetree/bindings/reset/fsl,imx-src.yaml > @@ -59,7 +59,6 @@ properties: >- description: SRC interrupt >- description: CPU WDOG interrupts out of SRC > minItems: 1 > -maxItems: 2 > >'#reset-cells': > const: 1 Acked-by: Philipp Zabel regards Philipp ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[GIT PULL] iommu/arm-smmu: Updates for 5.14
Hi Joerg, Please pull these Arm SMMU updates for 5.14. Of particular note is the support for stalling faults with platform devices using SMMUv3 -- this concludes the bulk of the SVA work from Jean-Philippe. Other than that, one thing to note is that the patch from Thierry adding the '->probe_finalize' implementation hook is also shared with the memory-controller tree, since they build on top of it to get the SMMU working with an NVIDIA SoC. Unfortunately, that patch also caused a NULL dereference elsewhere, so there's a subsequent patch on top here addressing that. Due to the above, the sooner this lands in -next, the better. Cheers, Will --->8 The following changes since commit c4681547bcce777daf576925a966ffa824edd09d: Linux 5.13-rc3 (2021-05-23 11:42:48 -1000) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git tags/arm-smmu-updates for you to fetch changes up to ddd25670d39b2181c7bec33301f2d24cdcf25dde: Merge branch 'for-thierry/arm-smmu' into for-joerg/arm-smmu/updates (2021-06-16 11:30:55 +0100) Arm SMMU updates for 5.14 - SMMUv3: * Support stalling faults for platform devices * Decrease default sizes for the event and PRI queues - SMMUv2: * Support for a new '->probe_finalize' hook, needed by Nvidia * Even more Qualcomm compatible strings * Avoid Adreno TTBR1 quirk for DB820C platform - Misc: * Trivial cleanups/refactoring Amey Narkhede (1): iommu/arm: Cleanup resources in case of probe error path Bixuan Cui (1): iommu/arm-smmu-v3: Change *array into *const array Eric Anholt (2): iommu/arm-smmu-qcom: Skip the TTBR1 quirk for db820c. arm64: dts: msm8996: Mark the GPU's SMMU as an adreno one.
Jean-Philippe Brucker (4): dt-bindings: Document stall property for IOMMU masters ACPI/IORT: Enable stall support for platform devices iommu/arm-smmu-v3: Add stall support for platform devices iommu/arm-smmu-v3: Ratelimit event dump Martin Botka (1): iommu/arm-smmu-qcom: Add sm6125 compatible Sai Prakash Ranjan (2): iommu/arm-smmu-qcom: Add SC7280 SMMU compatible iommu/arm-smmu-qcom: Move the adreno smmu specific impl Shawn Guo (2): iommu/arm-smmu-qcom: hook up qcom_smmu_impl for ACPI boot iommu/arm-smmu-qcom: Protect acpi_match_platform_list() call with CONFIG_ACPI Thierry Reding (1): iommu/arm-smmu: Implement ->probe_finalize() Will Deacon (2): iommu/arm-smmu: Check smmu->impl pointer before dereferencing Merge branch 'for-thierry/arm-smmu' into for-joerg/arm-smmu/updates Xiyu Yang (2): iommu/arm-smmu: Fix arm_smmu_device refcount leak when arm_smmu_rpm_get fails iommu/arm-smmu: Fix arm_smmu_device refcount leak in address translation Zhen Lei (2): iommu/arm-smmu-v3: Decrease the queue size of evtq and priq iommu/arm-smmu-v3: Remove unnecessary oom message Documentation/devicetree/bindings/iommu/iommu.txt | 18 ++ arch/arm64/boot/dts/qcom/msm8996.dtsi | 2 +- drivers/acpi/arm64/iort.c | 4 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 59 +- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 222 -- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 48 - drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c| 43 - drivers/iommu/arm/arm-smmu/arm-smmu.c | 38 +++- drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 + drivers/iommu/arm/arm-smmu/qcom_iommu.c | 13 +- 10 files changed, 409 insertions(+), 39 deletions(-)
Re: [PATCH] dt-bindings: Drop redundant minItems/maxItems
On 15-06-21, 13:15, Rob Herring wrote: > If a property has an 'items' list, then a 'minItems' or 'maxItems' with the > same size as the list is redundant and can be dropped. Note that this is DT > schema specific behavior and not standard json-schema behavior. The tooling > will fixup the final schema adding any unspecified minItems/maxItems. > > This condition is partially checked with the meta-schema already, but > only if both 'minItems' and 'maxItems' are equal to the 'items' length. > An improved meta-schema is pending. > .../devicetree/bindings/dma/renesas,rcar-dmac.yaml | 1 - > Documentation/devicetree/bindings/phy/brcm,sata-phy.yaml| 1 - > Documentation/devicetree/bindings/phy/mediatek,tphy.yaml| 2 -- > .../devicetree/bindings/phy/phy-cadence-sierra.yaml | 2 -- > .../devicetree/bindings/phy/phy-cadence-torrent.yaml| 4 > .../devicetree/bindings/phy/qcom,ipq806x-usb-phy-hs.yaml| 1 - > .../devicetree/bindings/phy/qcom,ipq806x-usb-phy-ss.yaml| 1 - > Documentation/devicetree/bindings/phy/qcom,qmp-phy.yaml | 1 - > Documentation/devicetree/bindings/phy/qcom,qusb2-phy.yaml | 2 -- > Documentation/devicetree/bindings/phy/renesas,usb2-phy.yaml | 2 -- > Documentation/devicetree/bindings/phy/renesas,usb3-phy.yaml | 1 - Acked-By: Vinod Koul -- ~Vinod
Re: [PATCH 1/1] iommu/arm-smmu-v3: remove unnecessary oom message
On Wed, Jun 16, 2021 at 09:47:18AM +0800, Leizhen (ThunderTown) wrote: > > > On 2021/6/15 19:55, Will Deacon wrote: > > On Tue, Jun 15, 2021 at 12:51:38PM +0100, Robin Murphy wrote: > >> On 2021-06-15 12:34, Will Deacon wrote: > >>> On Tue, Jun 15, 2021 at 07:22:10PM +0800, Leizhen (ThunderTown) wrote: > > > On 2021/6/11 18:32, Will Deacon wrote: > > On Wed, Jun 09, 2021 at 08:54:38PM +0800, Zhen Lei wrote: > >> Fixes scripts/checkpatch.pl warning: > >> WARNING: Possible unnecessary 'out of memory' message > >> > >> Remove it can help us save a bit of memory. > >> > >> Signed-off-by: Zhen Lei > >> --- > >> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 ++-- > >> 1 file changed, 2 insertions(+), 6 deletions(-) > >> > >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > >> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > >> index 2ddc3cd5a7d1..fd7c55b44881 100644 > >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c > >> @@ -2787,10 +2787,8 @@ static int arm_smmu_init_l1_strtab(struct > >> arm_smmu_device *smmu) > >>void *strtab = smmu->strtab_cfg.strtab; > >>cfg->l1_desc = devm_kzalloc(smmu->dev, size, GFP_KERNEL); > >> - if (!cfg->l1_desc) { > >> - dev_err(smmu->dev, "failed to allocate l1 stream table > >> desc\n"); > >> + if (!cfg->l1_desc) > > > > What error do you get if devm_kzalloc() fails? I'd like to make sure > > it's > > easy to track down _which_ allocation failed in that case -- does it > > give > > you a line number, for example? > > When devm_kzalloc() fails, the OOM information is printed. No line > number information, but the > size(order) and call stack is printed. It doesn't matter which > allocation failed, the failure > is caused by insufficient system memory rather than the fault of the > SMMU driver. Therefore, > the current printing is not helpful for locating the problem of > insufficient memory. 
After all, > when memory allocation fails, the SMMU driver cannot work at a lower > specification. > >>> > >>> I don't entirely agree. Another reason for the failure is because the > >>> driver > >>> might be asking for a huge (or negative) allocation, in which case it > >>> might > >>> be instructive to have a look at the actual caller, particularly if the > >>> size is derived from hardware or firmware properties. > >> > >> Agreed - other than deliberately-contrived situations I don't think I've > >> ever hit a genuine OOM, but I definitely have debugged attempts to allocate > >> -1 of something. If the driver-specific message actually calls out the > >> critical information, e.g. "failed to allocate %d stream table entries", it > >> gives debugging a head start since the miscalculation is obvious, but a > >> static message that only identifies the callsite really only saves a quick > >> trip to scripts/faddr2line, and personally I've never found that > >> particularly valuable. > > > > So it sounds like this particular patch is fine, but the one for smmuv2 > > should leave the IRQ allocation message alone (by virtue of it printing > > something a bit more useful -- the number of irqs). > > num_irqs = 0; > while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) > { > num_irqs++; > } > > As the above code, num_irqs is calculated based on the number of dtb or acpi > configuration items, it can't be too large. That is, there is almost zero > chance > that devm_kcalloc() will fail because num_irqs is too large. Right, because firmware is never wrong about anything :) Will
Re: [PATCH v4 1/6] ACPI: arm64: Move DMA setup operations out of IORT
Hi jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > Extract generic DMA setup code out of IORT, so it can be reused by VIOT. > Keep it in drivers/acpi/arm64 for now, since it could break x86 > platforms that haven't run this code so far, if they have invalid > tables. > > Signed-off-by: Jean-Philippe Brucker Reviewed-by: Eric Auger Eric > --- > drivers/acpi/arm64/Makefile | 1 + > include/linux/acpi.h| 3 +++ > include/linux/acpi_iort.h | 6 ++--- > drivers/acpi/arm64/dma.c| 50 ++ > drivers/acpi/arm64/iort.c | 54 ++--- > drivers/acpi/scan.c | 2 +- > 6 files changed, 66 insertions(+), 50 deletions(-) > create mode 100644 drivers/acpi/arm64/dma.c > > diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile > index 6ff50f4ed947..66acbe77f46e 100644 > --- a/drivers/acpi/arm64/Makefile > +++ b/drivers/acpi/arm64/Makefile > @@ -1,3 +1,4 @@ > # SPDX-License-Identifier: GPL-2.0-only > obj-$(CONFIG_ACPI_IORT) += iort.o > obj-$(CONFIG_ACPI_GTDT) += gtdt.o > +obj-y+= dma.o > diff --git a/include/linux/acpi.h b/include/linux/acpi.h > index c60745f657e9..7aaa9559cc19 100644 > --- a/include/linux/acpi.h > +++ b/include/linux/acpi.h > @@ -259,9 +259,12 @@ void acpi_numa_x2apic_affinity_init(struct > acpi_srat_x2apic_cpu_affinity *pa); > > #ifdef CONFIG_ARM64 > void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa); > +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size); > #else > static inline void > acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { } > +static inline void > +acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { } > #endif > > int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma); > diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h > index 1a12baa58e40..f7f054833afd 100644 > --- a/include/linux/acpi_iort.h > +++ b/include/linux/acpi_iort.h > @@ -34,7 +34,7 @@ struct irq_domain *iort_get_device_domain(struct device > *dev, u32 id, > void 
acpi_configure_pmsi_domain(struct device *dev); > int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); > /* IOMMU interface */ > -void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); > +int iort_dma_get_ranges(struct device *dev, u64 *size); > const struct iommu_ops *iort_iommu_configure_id(struct device *dev, > const u32 *id_in); > int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head > *head); > @@ -48,8 +48,8 @@ static inline struct irq_domain *iort_get_device_domain( > { return NULL; } > static inline void acpi_configure_pmsi_domain(struct device *dev) { } > /* IOMMU interface */ > -static inline void iort_dma_setup(struct device *dev, u64 *dma_addr, > - u64 *size) { } > +static inline int iort_dma_get_ranges(struct device *dev, u64 *size) > +{ return -ENODEV; } > static inline const struct iommu_ops *iort_iommu_configure_id( > struct device *dev, const u32 *id_in) > { return NULL; } > diff --git a/drivers/acpi/arm64/dma.c b/drivers/acpi/arm64/dma.c > new file mode 100644 > index ..f16739ad3cc0 > --- /dev/null > +++ b/drivers/acpi/arm64/dma.c > @@ -0,0 +1,50 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +#include > +#include > +#include > +#include > + > +void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) > +{ > + int ret; > + u64 end, mask; > + u64 dmaaddr = 0, size = 0, offset = 0; > + > + /* > + * If @dev is expected to be DMA-capable then the bus code that created > + * it should have initialised its dma_mask pointer by this point. For > + * now, we'll continue the legacy behaviour of coercing it to the > + * coherent mask if not, but we'll no longer do so quietly. 
> + */ > + if (!dev->dma_mask) { > + dev_warn(dev, "DMA mask not set\n"); > + dev->dma_mask = &dev->coherent_dma_mask; > + } > + > + if (dev->coherent_dma_mask) > + size = max(dev->coherent_dma_mask, dev->coherent_dma_mask + 1); > + else > + size = 1ULL << 32; > + > + ret = acpi_dma_get_range(dev, &dmaaddr, &offset, &size); > + if (ret == -ENODEV) > + ret = iort_dma_get_ranges(dev, &size); > + if (!ret) { > + /* > + * Limit coherent and dma mask based on size retrieved from > + * firmware. > + */ > + end = dmaaddr + size - 1; > + mask = DMA_BIT_MASK(ilog2(end) + 1); > + dev->bus_dma_limit = end; > + dev->coherent_dma_mask = min(dev->coherent_dma_mask, mask); > + *dev->dma_mask = min(*dev->dma_mask, mask); > + } > + > + *dma_addr = dmaaddr; > + *dma_size = size; > + > + ret =
Re: [PATCH v4 2/6] ACPI: Move IOMMU setup code out of IORT
Hi jean, On 6/10/21 9:51 AM, Jean-Philippe Brucker wrote: > Extract the code that sets up the IOMMU infrastructure from IORT, since > it can be reused by VIOT. Move it one level up into a new > acpi_iommu_configure_id() function, which calls the IORT parsing > function which in turn calls the acpi_iommu_fwspec_init() helper. > > Signed-off-by: Jean-Philippe Brucker > --- > include/acpi/acpi_bus.h | 3 ++ > include/linux/acpi_iort.h | 8 ++--- > drivers/acpi/arm64/iort.c | 75 +-- > drivers/acpi/scan.c | 73 - > 4 files changed, 87 insertions(+), 72 deletions(-) > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h > index 3a82faac5767..41f092a269f6 100644 > --- a/include/acpi/acpi_bus.h > +++ b/include/acpi/acpi_bus.h > @@ -588,6 +588,9 @@ struct acpi_pci_root { > > bool acpi_dma_supported(struct acpi_device *adev); > enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev); > +int acpi_iommu_fwspec_init(struct device *dev, u32 id, > +struct fwnode_handle *fwnode, > +const struct iommu_ops *ops); > int acpi_dma_get_range(struct device *dev, u64 *dma_addr, u64 *offset, > u64 *size); > int acpi_dma_configure_id(struct device *dev, enum dev_dma_attr attr, > diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h > index f7f054833afd..f1f0842a2cb2 100644 > --- a/include/linux/acpi_iort.h > +++ b/include/linux/acpi_iort.h > @@ -35,8 +35,7 @@ void acpi_configure_pmsi_domain(struct device *dev); > int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); > /* IOMMU interface */ > int iort_dma_get_ranges(struct device *dev, u64 *size); > -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, > - const u32 *id_in); > +int iort_iommu_configure_id(struct device *dev, const u32 *id_in); > int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head > *head); > phys_addr_t acpi_iort_dma_get_max_cpu_address(void); > #else > @@ -50,9 +49,8 @@ static inline void acpi_configure_pmsi_domain(struct device > *dev) { } > /* IOMMU 
interface */ > static inline int iort_dma_get_ranges(struct device *dev, u64 *size) > { return -ENODEV; } > -static inline const struct iommu_ops *iort_iommu_configure_id( > - struct device *dev, const u32 *id_in) > -{ return NULL; } > +static inline int iort_iommu_configure_id(struct device *dev, const u32 > *id_in) > +{ return -ENODEV; } > static inline > int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head > *head) > { return 0; } > diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c > index a940be1cf2af..b5b021e064b6 100644 > --- a/drivers/acpi/arm64/iort.c > +++ b/drivers/acpi/arm64/iort.c > @@ -806,23 +806,6 @@ static struct acpi_iort_node > *iort_get_msi_resv_iommu(struct device *dev) > return NULL; > } > > -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device > *dev) > -{ > - struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); > - > - return (fwspec && fwspec->ops) ? fwspec->ops : NULL; > -} > - > -static inline int iort_add_device_replay(struct device *dev) > -{ > - int err = 0; > - > - if (dev->bus && !device_iommu_mapped(dev)) > - err = iommu_probe_device(dev); > - > - return err; > -} > - > /** > * iort_iommu_msi_get_resv_regions - Reserved region driver helper > * @dev: Device from iommu_get_resv_regions() > @@ -900,18 +883,6 @@ static inline bool iort_iommu_driver_enabled(u8 type) > } > } > > -static int arm_smmu_iort_xlate(struct device *dev, u32 streamid, > -struct fwnode_handle *fwnode, > -const struct iommu_ops *ops) > -{ > - int ret = iommu_fwspec_init(dev, fwnode, ops); > - > - if (!ret) > - ret = iommu_fwspec_add_ids(dev, &streamid, 1); > - > - return ret; > -} > - > static bool iort_pci_rc_supports_ats(struct acpi_iort_node *node) > { > struct acpi_iort_root_complex *pci_rc; > @@ -946,7 +917,7 @@ static int iort_iommu_xlate(struct device *dev, struct > acpi_iort_node *node, > return iort_iommu_driver_enabled(node->type) ?
> -EPROBE_DEFER : -ENODEV; > > - return arm_smmu_iort_xlate(dev, streamid, iort_fwnode, ops); > + return acpi_iommu_fwspec_init(dev, streamid, iort_fwnode, ops); > } > > struct iort_pci_alias_info { > @@ -1020,24 +991,14 @@ static int iort_nc_iommu_map_id(struct device *dev, > * @dev: device to configure > * @id_in: optional input id const value pointer > * > - * Returns: iommu_ops pointer on configuration success > - * NULL on configuration failure > + * Returns: 0 on success, <0 on failure > */ > -const struct iommu_ops *iort_iommu_configure_id(struct device *dev, > -
Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
On 2021-06-16 12:28, Sai Prakash Ranjan wrote: Hi Robin, On 2021-06-15 19:23, Robin Murphy wrote: On 2021-06-15 12:51, Sai Prakash Ranjan wrote: ... Hi @Robin, from these discussions it seems they are not ok with the change for all SoC vendor implementations and do not have any data on such impact. As I mentioned above, on QCOM platforms we do have several optimizations in HW for TLBIs and would like to make use of it and reduce the unmap latency. What do you think, should this be made implementation specific? Yes, it sounds like there's enough uncertainty for now that this needs to be an opt-in feature. However, I still think that non-strict mode could use it generically, since that's all about over-invalidating to save time on individual unmaps - and relatively non-deterministic - already. So maybe we have a second set of iommu_flush_ops, or just a flag somewhere to control the tlb_flush_walk functions internally, and the choice can be made in the iommu_get_dma_strict() test, but also forced on all the time by your init_context hook. What do you reckon? Sounds good to me. Since you mentioned non-strict mode using it generically, can't we just set tlb_flush_all() in io_pgtable_tlb_flush_walk() like below based on quirk so that we don't need to add any check in iommu_get_dma_strict() and just force the new flush_ops in init_context hook? if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) { iop->cfg.tlb->tlb_flush_all(iop->cookie); return; } Instead of flush_ops in init_context hook, perhaps an io_pgtable quirk since this is related to tlb, probably a bad name but IO_PGTABLE_QUIRK_TLB_INV which will be set in init_context impl hook and the previous condition in io_pgtable_tlb_flush_walk() becomes something like below. Seems very minimal and neat instead of poking into tlb_flush_walk functions or touching dma strict with some flag?
if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT || iop->cfg.quirks & IO_PGTABLE_QUIRK_TLB_INV) { iop->cfg.tlb->tlb_flush_all(iop->cookie); return; } Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH v12 3/5] iommu/vt-d: Add support for IOMMU default DMA mode build options
On 2021/6/15 16:25, Robin Murphy wrote: On 2021-06-15 08:26, Lu Baolu wrote: Hi John, On 6/14/21 4:03 PM, John Garry wrote: On 12/06/2021 03:14, Lu Baolu wrote: On 2021/6/11 20:20, John Garry wrote: diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 2a71347611d4..4467353f981b 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -94,6 +94,7 @@ choice prompt "IOMMU default DMA mode" depends on IOMMU_DMA + default IOMMU_DEFAULT_LAZY if INTEL_IOMMU default IOMMU_DEFAULT_STRICT If the two default values are different, which one will be overridden? If I understand your question correctly, I think you are asking if both are set: CONFIG_IOMMU_DEFAULT_LAZY=y CONFIG_IOMMU_DEFAULT_STRICT=y If this happens, then make defconfig complains about both being set, and selects the 2nd, whatever that is. On x86, Intel or AMD, this will be prompt "IOMMU default DMA mode" depends on IOMMU_DMA default IOMMU_DEFAULT_LAZY default IOMMU_DEFAULT_STRICT which will be default, LAZY or STRICT? Will it cause complaints? See Documentation/kbuild/kconfig-language.rst: A config option can have any number of default values. If multiple default values are visible, only the first defined one is active. Got it. Thank you, Robin. Best regards, baolu
Re: [PATCH v12 09/12] swiotlb: Add restricted DMA alloc/free support
On Wed, Jun 16, 2021 at 02:21:54PM +0800, Claire Chang wrote: > Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to > support the memory allocation from restricted DMA pool. > > The restricted DMA pool is preferred if available. > > Note that since coherent allocation needs remapping, one must set up > another device coherent pool by shared-dma-pool and use > dma_alloc_from_dev_coherent instead for atomic coherent allocation. > > Signed-off-by: Claire Chang Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
On Wed, Jun 16, 2021 at 02:21:51PM +0800, Claire Chang wrote: > Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and > use it to determine whether to bounce the data or not. This will be > useful later to allow for different pools. > > Signed-off-by: Claire Chang Looks good, Reviewed-by: Christoph Hellwig
Re: [PATCH] iommu/io-pgtable-arm: Optimize partial walk flush for large scatter-gather list
Hi Robin, On 2021-06-15 19:23, Robin Murphy wrote: On 2021-06-15 12:51, Sai Prakash Ranjan wrote: ... Hi @Robin, from these discussions it seems they are not ok with the change for all SoC vendor implementations and do not have any data on such impact. As I mentioned above, on QCOM platforms we do have several optimizations in HW for TLBIs and would like to make use of it and reduce the unmap latency. What do you think, should this be made implementation specific? Yes, it sounds like there's enough uncertainty for now that this needs to be an opt-in feature. However, I still think that non-strict mode could use it generically, since that's all about over-invalidating to save time on individual unmaps - and relatively non-deterministic - already. So maybe we have a second set of iommu_flush_ops, or just a flag somewhere to control the tlb_flush_walk functions internally, and the choice can be made in the iommu_get_dma_strict() test, but also forced on all the time by your init_context hook. What do you reckon? Sounds good to me. Since you mentioned non-strict mode using it generically, can't we just set tlb_flush_all() in io_pgtable_tlb_flush_walk() like below based on quirk so that we don't need to add any check in iommu_get_dma_strict() and just force the new flush_ops in init_context hook? if (iop->cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT) { iop->cfg.tlb->tlb_flush_all(iop->cookie); return; } Thanks, Sai -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
RE: Plan for /dev/ioasid RFC v2
> From: Alex Williamson > Sent: Wednesday, June 16, 2021 12:56 AM > > On Tue, 15 Jun 2021 01:21:35 + > "Tian, Kevin" wrote: > > > > From: Jason Gunthorpe > > > Sent: Monday, June 14, 2021 9:38 PM > > > > > > On Mon, Jun 14, 2021 at 03:09:31AM +, Tian, Kevin wrote: > > > > > > > If a device can be always blocked from accessing memory in the IOMMU > > > > before it's bound to a driver or more specifically before the driver > > > > moves it to a new security context, then there is no need for VFIO > > > > to track whether IOASIDfd has taken over ownership of the DMA > > > > context for all devices within a group. > > > > > > I've been assuming we'd do something like this, where when a device is > > > first turned into a VFIO it tells the IOMMU layer that this device > > > should be DMA blocked unless an IOASID is attached to > > > it. Disconnecting an IOASID returns it to blocked. > > > > Or just make sure a device is in block-DMA when it's unbound from a > > driver or a security context. Then no need to explicitly tell IOMMU layer > > to do so when it's bound to a new driver. > > > > Currently the default domain type applies even when a device is not > > bound. This implies that if iommu=passthrough a device is always > > allowed to access arbitrary system memory with or without a driver. > > I feel the current domain type (identity, dma, unmanged) should apply > > only when a driver is loaded... > > Note that vfio does not currently require all devices in the group to > be bound to drivers. Other devices within the group, those bound to > vfio drivers, can be used in this configuration. This is not > necessarily recommended though as a non-vfio, non-stub driver binding > to one of those devices can trigger a BUG_ON. This is a good learning that I didn't realize before. As explained in previous mail, we need reuse the group_viable mechanism to trigger BUG_ON. > > > > > If this works I didn't see the need for vfio to keep the sequence. 
> > > > VFIO still keeps group fd to claim ownership of all devices in a > > > > group. > > > > > > As Alex says you still have to deal with the problem that device A in > > > a group can gain control of device B in the same group. > > > > There is no isolation in the group then how could vfio prevent device > > A from gaining control of device B? for example when both are attached > > to the same GPA address space with device MMIO bar included, devA > > can do p2p to devB. It's all user's policy how to deal with devices within > > the group. > > The latter is user policy, yes, but it's a system security issue that > the user cannot use device A to control device B if the user doesn't > have access to both devices, ie. doesn't own the group. vfio would > prevent this by not allowing access to device A while device B is > insecure and would require that all devices within the group remain in > a secure, user owned state for the extent of access to device A. > > > > This means device A and B can not be used from to two different > > > security contexts. > > > > It depends on how the security context is defined. From iommu layer > > p.o.v, an IOASID is a security context which isolates a device from > > the rest of the system (but not the sibling in the same group). As you > > suggested earlier, it's completely sane if an user wants to attach > > devices in a group to different IOASIDs. Here I just talk about this fact. > > This is sane, yes, but that doesn't give us license to allow the user > to access device A regardless of the state of device B. > > > > > > > If the /dev/iommu FD is the security context then the tracking is > > > needed there. > > > > > > > As I replied to Alex, my point is that VFIO doesn't need to know the > > attaching status of each device in a group before it can allow user to > > access a device. 
As long as a device in a group either in block DMA > > or switch to a new address space created via /dev/iommu FD, there's > > no problem to allow user accessing it. User cannot do harm to the > > world outside of the group. User knows there is no isolation within > > the group. That is it. > > This is self contradictory, "vfio doesn't need to know the attachment > status"... "[a]s long as a device in a group either in block DMA or > switch to a new address space". So vfio does need to know the latter. > How does it know that? Thanks, > My point was that, if a device can only be in two states: block-DMA or attached to a new address space, both of which are secure, then vfio doesn't need to track which state a device is actually in. Thanks Kevin
RE: Plan for /dev/ioasid RFC v2
> From: Alex Williamson > Sent: Wednesday, June 16, 2021 12:12 AM > > On Tue, 15 Jun 2021 02:31:39 + > "Tian, Kevin" wrote: > > > > From: Alex Williamson > > > Sent: Tuesday, June 15, 2021 12:28 AM > > > > > [...] > > > > IOASID. Today the group fd requires an IOASID before it hands out a > > > > device_fd. With iommu_fd the device_fd will not allow IOCTLs until it > > > > has a blocked DMA IOASID and is successefully joined to an iommu_fd. > > > > > > Which is the root of my concern. Who owns ioctls to the device fd? > > > It's my understanding this is a vfio provided file descriptor and it's > > > therefore vfio's responsibility. A device-level IOASID interface > > > therefore requires that vfio manage the group aspect of device access. > > > AFAICT, that means that device access can therefore only begin when all > > > devices for a given group are attached to the IOASID and must halt for > > > all devices in the group if any device is ever detached from an IOASID, > > > even temporarily. That suggests a lot more oversight of the IOASIDs by > > > vfio than I'd prefer. > > > > > > > This is possibly the point that is worthy of more clarification and > > alignment, as it sounds like the root of controversy here. > > > > I feel the goal of vfio group management is more about ownership, i.e. > > all devices within a group must be assigned to a single user. Following > > the three rules defined by Jason, what we really care is whether a group > > of devices can be isolated from the rest of the world, i.e. no access to > > memory/device outside of its security context and no access to its > > security context from devices outside of this group. This can be achieved > > as long as every device in the group is either in block-DMA state when > > it's not attached to any security context or attached to an IOASID context > > in IOMMU fd. 
> > > > As long as group-level isolation is satisfied, how devices within a group > > are further managed is decided by the user (unattached, all attached to > > same IOASID, attached to different IOASIDs) as long as the user > > understands the implication of lacking of isolation within the group. This > > is what a device-centric model comes to play. Misconfiguration just hurts > > the user itself. > > > > If this rationale can be agreed, then I didn't see the point of having VFIO > > to mandate all devices in the group must be attached/detached in > > lockstep. > > In theory this sounds great, but there are still too many assumptions > and too much hand waving about where isolation occurs for me to feel > like I really have the complete picture. So let's walk through some > examples. Please fill in and correct where I'm wrong. Thanks for putting these examples. They are helpful for clearing the whole picture. Before filling in let's first align on what is the key difference between current VFIO model and this new proposal. With this comparison we'll know which of following questions are answered with existing VFIO mechanism and which are handled differently. With Yi's help we figured out the current mechanism: 1) vfio_group_viable. The code comment explains the intention clearly: -- * A vfio group is viable for use by userspace if all devices are in * one of the following states: * - driver-less * - bound to a vfio driver * - bound to an otherwise allowed driver * - a PCI interconnect device -- Note this check is not related to an IOMMU security context. 2) vfio_iommu_group_notifier. When an IOMMU_GROUP_NOTIFY_ BOUND_DRIVER event is notified, vfio_group_viable is re-evaluated. If the affected group was previously viable but now becomes not viable, BUG_ON() as it implies that this device is bound to a non-vfio driver which breaks the group isolation. 3) vfio_group_get_device_fd. 
User can acquire a device fd only after a) the group is viable; b) the group is attached to a container; c) iommu is set on the container (implying a security context established); The new device-centric proposal suggests: 1) vfio_group_viable; 2) vfio_iommu_group_notifier; 3) block-DMA if a device is detached from previous domain (instead of switching back to default domain as today); 4) vfio_group_get_device_fd. User can acquire a device fd once the group is viable; 5) device-centric when binding to IOMMU fd or attaching to IOASID In this model the group viability mechanism is kept but there is no need for VFIO to track the actual attaching status. Now let's look at how the new model works. > > 1) A dual-function PCIe e1000e NIC where the functions are grouped >together due to ACS isolation issues. > >a) Initial state: functions 0 & 1 are both bound to e1000e driver. > >b) Admin uses driverctl to bind function 1 to vfio-pci, creating > vfio device file, which is chmod'd to grant to a user. This implies that function 1 is in block-DMA mode when it's unbound from e1000e. > >c) User opens vfio function
Re: [PATCH v4 0/6] Add support for ACPI VIOT
Hi Rafael,

On Thu, Jun 10, 2021 at 09:51:27AM +0200, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which provides topology
> information for para-virtual IOMMUs. Enable virtio-iommu on
> non-devicetree platforms, including x86.
>
> Since v3 [1] I fixed a build bug for !CONFIG_IOMMU_API. Joerg offered to
> take this series through the IOMMU tree, which requires Acks for patches
> 1-3.

I was wondering if you could take a look at patches 1-3, otherwise we'll miss the mark for 5.14 since I won't be able to resend next week. The series adds support for virtio-iommu on QEMU and Cloud Hypervisor.

Thanks,
Jean

> You can find a QEMU implementation at [2], with extra support for
> testing all VIOT nodes including MMIO-based endpoints and IOMMU.
> This series is at [3].
>
> [1] https://lore.kernel.org/linux-iommu/2021060215.1077006-1-jean-phili...@linaro.org/
> [2] https://jpbrucker.net/git/qemu/log/?h=virtio-iommu/acpi
> [3] https://jpbrucker.net/git/linux/log/?h=virtio-iommu/acpi
>
> Jean-Philippe Brucker (6):
>   ACPI: arm64: Move DMA setup operations out of IORT
>   ACPI: Move IOMMU setup code out of IORT
>   ACPI: Add driver for the VIOT table
>   iommu/dma: Pass address limit rather than size to iommu_setup_dma_ops()
>   iommu/dma: Simplify calls to iommu_setup_dma_ops()
>   iommu/virtio: Enable x86 support
>
>  drivers/acpi/Kconfig         |   3 +
>  drivers/iommu/Kconfig        |   4 +-
>  drivers/acpi/Makefile        |   2 +
>  drivers/acpi/arm64/Makefile  |   1 +
>  include/acpi/acpi_bus.h      |   3 +
>  include/linux/acpi.h         |   3 +
>  include/linux/acpi_iort.h    |  14 +-
>  include/linux/acpi_viot.h    |  19 ++
>  include/linux/dma-iommu.h    |   4 +-
>  arch/arm64/mm/dma-mapping.c  |   2 +-
>  drivers/acpi/arm64/dma.c     |  50 +
>  drivers/acpi/arm64/iort.c    | 129 ++---
>  drivers/acpi/bus.c           |   2 +
>  drivers/acpi/scan.c          |  78 +++-
>  drivers/acpi/viot.c          | 364 +++
>  drivers/iommu/amd/iommu.c    |   9 +-
>  drivers/iommu/dma-iommu.c    |  17 +-
>  drivers/iommu/intel/iommu.c  |  10 +-
>  drivers/iommu/virtio-iommu.c |   8 +
>  MAINTAINERS                  |   8 +
> 20 files changed, 580 insertions(+), 150 deletions(-)
> create mode 100644 include/linux/acpi_viot.h
> create mode 100644 drivers/acpi/arm64/dma.c
> create mode 100644 drivers/acpi/viot.c
>
> --
> 2.31.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v11 00/12] Restricted DMA
v12: https://lore.kernel.org/patchwork/cover/1447254/

On Wed, Jun 16, 2021 at 11:52 AM Claire Chang wrote:
>
> This series implements mitigations for the lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that show a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> > [1a] > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html > [1b] > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html > [2] https://blade.tencent.com/en/advisories/qualpwn/ > [3] > https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/ > [4] > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132 > > v11: > - Rebase against swiotlb devel/for-linus-5.14 > - s/mempry/memory/g > - exchange the order of patch 09/12 and 10/12 > https://lore.kernel.org/patchwork/cover/1446882/ > > v10: > Address the comments in v9 to > - fix the dev->dma_io_tlb_mem assignment > - propagate swiotlb_force setting into io_tlb_default_mem->force > - move set_memory_decrypted out of swiotlb_init_io_tlb_mem > - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block > - add swiotlb_ prefix to find_slots and release_slots > - merge the 3 alloc/free related patches > - move the CONFIG_DMA_RESTRICTED_POOL later > > v9: > Address the comments in v7 to > - set swiotlb active pool to dev->dma_io_tlb_mem > - get rid of get_io_tlb_mem > - dig out the device struct for is_swiotlb_active > - move debugfs_create_dir out of swiotlb_create_debugfs > - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem > - use IS_ENABLED in kernel/dma/direct.c > - fix redefinition of 'of_dma_set_restricted_buffer' > https://lore.kernel.org/patchwork/cover/1445081/ > > v8: > - Fix reserved-memory.txt and add the reg property in example. > - Fix sizeof for of_property_count_elems_of_size in > drivers/of/address.c#of_dma_set_restricted_buffer. > - Apply Will's suggestion to try the OF node having DMA configuration in > drivers/of/address.c#of_dma_set_restricted_buffer. > - Fix typo in the comment of > drivers/of/address.c#of_dma_set_restricted_buffer. 
> - Add error message for PageHighMem in > kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to > rmem_swiotlb_setup. > - Fix the message string in rmem_swiotlb_setup. > https://lore.kernel.org/patchwork/cover/1437112/ > > v7: > Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init > https://lore.kernel.org/patchwork/cover/1431031/ > > v6: > Address the comments in v5 > https://lore.kernel.org/patchwork/cover/1423201/ > > v5: > Rebase on latest linux-next > https://lore.kernel.org/patchwork/cover/1416899/ > > v4: > - Fix spinlock bad magic > - Use rmem->name for debugfs entry > - Address the comments in v3 > https://lore.kernel.org/patchwork/cover/1378113/ > > v3: > Using only one reserved memory region for both streaming DMA and memory > allocation. > https://lore.kernel.org/patchwork/cover/1360992/ > > v2: > Building on top of swiotlb. > https://lore.kernel.org/patchwork/cover/1280705/ > > v1: > Using dma_map_ops. > https://lore.kernel.org/patchwork/cover/1271660/ > > Claire Chang (12): > swiotlb: Refactor swiotlb init functions > swiotlb: Refactor swiotlb_create_debugfs > swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used > swiotlb: Update is_swiotlb_buffer to add a struct device argument > swiotlb: Update is_swiotlb_active to add a struct device argument > swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing > swiotlb: Move alloc_size to swiotlb_find_slots > swiotlb: Refactor swiotlb_tbl_unmap_single > swiotlb: Add restricted DMA alloc/free support > swiotlb: Add restricted DMA pool initialization > dt-bindings: of: Add restricted DMA pool > of: Add plumbing for restricted DMA pool > >
[PATCH v12 12/12] of: Add plumbing for restricted DMA pool
If a device is not behind an IOMMU, we look up the device node and set up the restricted DMA when the restricted-dma-pool is presented. Signed-off-by: Claire Chang --- drivers/of/address.c| 33 + drivers/of/device.c | 3 +++ drivers/of/of_private.h | 6 ++ 3 files changed, 42 insertions(+) diff --git a/drivers/of/address.c b/drivers/of/address.c index 73ddf2540f3f..cdf700fba5c4 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map) of_node_put(node); return ret; } + +int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np) +{ + struct device_node *node, *of_node = dev->of_node; + int count, i; + + count = of_property_count_elems_of_size(of_node, "memory-region", + sizeof(u32)); + /* +* If dev->of_node doesn't exist or doesn't contain memory-region, try +* the OF node having DMA configuration. +*/ + if (count <= 0) { + of_node = np; + count = of_property_count_elems_of_size( + of_node, "memory-region", sizeof(u32)); + } + + for (i = 0; i < count; i++) { + node = of_parse_phandle(of_node, "memory-region", i); + /* +* There might be multiple memory regions, but only one +* restricted-dma-pool region is allowed. 
+*/ + if (of_device_is_compatible(node, "restricted-dma-pool") && + of_device_is_available(node)) + return of_reserved_mem_device_init_by_idx(dev, of_node, + i); + } + + return 0; +} #endif /* CONFIG_HAS_DMA */ /** diff --git a/drivers/of/device.c b/drivers/of/device.c index 6cb86de404f1..e68316836a7a 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np, arch_setup_dma_ops(dev, dma_start, size, iommu, coherent); + if (!iommu) + return of_dma_set_restricted_buffer(dev, np); + return 0; } EXPORT_SYMBOL_GPL(of_dma_configure_id); diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h index d9e6a324de0a..25cebbed5f02 100644 --- a/drivers/of/of_private.h +++ b/drivers/of/of_private.h @@ -161,12 +161,18 @@ struct bus_dma_region; #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA) int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map); +int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np); #else static inline int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map) { return -ENODEV; } +static inline int of_dma_set_restricted_buffer(struct device *dev, + struct device_node *np) +{ + return -ENODEV; +} #endif #endif /* _LINUX_OF_PRIVATE_H */ -- 2.32.0.272.g935e593368-goog
[PATCH v12 11/12] dt-bindings: of: Add restricted DMA pool
Introduce the new compatible string, restricted-dma-pool, for restricted DMA. One can specify the address and length of the restricted DMA memory region by restricted-dma-pool in the reserved-memory node. Signed-off-by: Claire Chang --- .../reserved-memory/reserved-memory.txt | 36 +-- 1 file changed, 33 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt index e8d3096d922c..46804f24df05 100644 --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt @@ -51,6 +51,23 @@ compatible (optional) - standard definition used as a shared pool of DMA buffers for a set of devices. It can be used by an operating system to instantiate the necessary pool management subsystem if necessary. +- restricted-dma-pool: This indicates a region of memory meant to be + used as a pool of restricted DMA buffers for a set of devices. The + memory region would be the only region accessible to those devices. + When using this, the no-map and reusable properties must not be set, + so the operating system can create a virtual mapping that will be used + for synchronization. The main purpose for restricted DMA is to + mitigate the lack of DMA access control on systems without an IOMMU, + which could result in the DMA accessing the system memory at + unexpected times and/or unexpected addresses, possibly leading to data + leakage or corruption. The feature on its own provides a basic level + of protection against the DMA overwriting buffer contents at + unexpected times. However, to protect against general data leakage and + system memory corruption, the system needs to provide a way to lock down + the memory access, e.g., MPU.
Note that since coherent allocation + needs remapping, one must set up another device coherent pool by + shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic + coherent allocation. - vendor specific string in the form <vendor>,[<device>-]<usage> no-map (optional) - empty property - Indicates the operating system must not create a virtual mapping @@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding Example --- -This example defines 3 contiguous regions are defined for Linux kernel: +This example defines 4 contiguous regions for Linux kernel: one default of all device drivers (named linux,cma@72000000 and 64MiB in size), -one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and -one for multimedia processing (named multimedia-memory@77000000, 64MiB). +one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), +one for multimedia processing (named multimedia-memory@77000000, 64MiB), and +one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB). / { #address-cells = <1>; @@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB). compatible = "acme,multimedia-memory"; reg = <0x77000000 0x4000000>; }; + + restricted_dma_reserved: restricted_dma_reserved { + compatible = "restricted-dma-pool"; + reg = <0x50000000 0x4000000>; + }; }; /* ... */ @@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB). memory-region = <&multimedia_reserved>; /* ... */ }; + + pcie_device: pcie_device@0,0 { + reg = <0x83010000 0x0 0x00000000 0x0 0x00100000 + 0x83010000 0x0 0x00100000 0x0 0x00100000>; + memory-region = <&restricted_dma_reserved>; + /* ... */ + }; }; -- 2.32.0.272.g935e593368-goog
[PATCH v12 10/12] swiotlb: Add restricted DMA pool initialization
Add the initialization function to create restricted DMA pools from matching reserved-memory nodes. Regardless of swiotlb setting, the restricted DMA pool is preferred if available. The restricted DMA pools provide a basic level of protection against the DMA overwriting buffer contents at unexpected times. However, to protect against general data leakage and system memory corruption, the system needs to provide a way to lock down the memory access, e.g., MPU. Signed-off-by: Claire Chang Reviewed-by: Christoph Hellwig --- include/linux/swiotlb.h | 3 +- kernel/dma/Kconfig | 14 kernel/dma/swiotlb.c| 76 + 3 files changed, 92 insertions(+), 1 deletion(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index a73fad460162..175b6c113ed8 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force; * range check to see if the memory was in fact allocated by this * API. * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and - * @end. This is command line adjustable via setup_io_tlb_npages. + * @end. For default swiotlb, this is command line adjustable via + * setup_io_tlb_npages. * @used: The number of used IO TLB block. * @list: The free list describing the number of free entries available * from each index. diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 77b405508743..3e961dc39634 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -80,6 +80,20 @@ config SWIOTLB bool select NEED_DMA_MAP_STATE +config DMA_RESTRICTED_POOL + bool "DMA Restricted Pool" + depends on OF && OF_RESERVED_MEM + select SWIOTLB + help + This enables support for restricted DMA pools which provide a level of + DMA memory protection on systems with limited hardware protection + capabilities, such as those lacking an IOMMU. + + For more information see + + and . + If unsure, say "n". + # # Should be selected if we can mmap non-coherent mappings to userspace. 
# The only thing that is really required is a way to set an uncached bit diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index d3d4f1a25fee..8a4d4ad4335e 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -39,6 +39,13 @@ #ifdef CONFIG_DEBUG_FS #include #endif +#ifdef CONFIG_DMA_RESTRICTED_POOL +#include +#include +#include +#include +#include +#endif #include #include @@ -735,4 +742,73 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size) return true; } +static int rmem_swiotlb_device_init(struct reserved_mem *rmem, + struct device *dev) +{ + struct io_tlb_mem *mem = rmem->priv; + unsigned long nslabs = rmem->size >> IO_TLB_SHIFT; + + /* +* Since multiple devices can share the same pool, the private data, +* io_tlb_mem struct, will be initialized by the first device attached +* to it. +*/ + if (!mem) { + mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL); + if (!mem) + return -ENOMEM; + + swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false); + mem->force_bounce = true; + mem->for_alloc = true; + set_memory_decrypted((unsigned long)phys_to_virt(rmem->base), +rmem->size >> PAGE_SHIFT); + + rmem->priv = mem; + + if (IS_ENABLED(CONFIG_DEBUG_FS)) { + mem->debugfs = + debugfs_create_dir(rmem->name, debugfs_dir); + swiotlb_create_debugfs_files(mem); + } + } + + dev->dma_io_tlb_mem = mem; + + return 0; +} + +static void rmem_swiotlb_device_release(struct reserved_mem *rmem, + struct device *dev) +{ + dev->dma_io_tlb_mem = io_tlb_default_mem; +} + +static const struct reserved_mem_ops rmem_swiotlb_ops = { + .device_init = rmem_swiotlb_device_init, + .device_release = rmem_swiotlb_device_release, +}; + +static int __init rmem_swiotlb_setup(struct reserved_mem *rmem) +{ + unsigned long node = rmem->fdt_node; + + if (of_get_flat_dt_prop(node, "reusable", NULL) || + of_get_flat_dt_prop(node, "linux,cma-default", NULL) || + of_get_flat_dt_prop(node, "linux,dma-default", NULL) || + of_get_flat_dt_prop(node, "no-map", NULL)) + 
return -EINVAL; + + if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) { + pr_err("Restricted DMA pool must be accessible within the linear mapping."); + return -EINVAL; + } + + rmem->ops = &rmem_swiotlb_ops; +
[PATCH v12 09/12] swiotlb: Add restricted DMA alloc/free support
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to support the memory allocation from restricted DMA pool. The restricted DMA pool is preferred if available. Note that since coherent allocation needs remapping, one must set up another device coherent pool by shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic coherent allocation. Signed-off-by: Claire Chang --- include/linux/swiotlb.h | 26 ++ kernel/dma/direct.c | 49 +++-- kernel/dma/swiotlb.c| 38 ++-- 3 files changed, 99 insertions(+), 14 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 8d8855c77d9a..a73fad460162 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force; * @debugfs: The dentry to debugfs. * @late_alloc:%true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced + * @for_alloc: %true if the pool is used for memory allocation */ struct io_tlb_mem { phys_addr_t start; @@ -96,6 +97,7 @@ struct io_tlb_mem { struct dentry *debugfs; bool late_alloc; bool force_bounce; + bool for_alloc; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size) extern void swiotlb_print_info(void); extern void swiotlb_set_max_segment(unsigned int); +#ifdef CONFIG_DMA_RESTRICTED_POOL +struct page *swiotlb_alloc(struct device *dev, size_t size); +bool swiotlb_free(struct device *dev, struct page *page, size_t size); + +static inline bool is_swiotlb_for_alloc(struct device *dev) +{ + return dev->dma_io_tlb_mem->for_alloc; +} +#else +static inline struct page *swiotlb_alloc(struct device *dev, size_t size) +{ + return NULL; +} +static inline bool swiotlb_free(struct device *dev, struct page *page, + size_t size) +{ + return false; +} +static inline bool is_swiotlb_for_alloc(struct device *dev) +{ + return false; +} +#endif /* CONFIG_DMA_RESTRICTED_POOL */ + #endif /* 
__LINUX_SWIOTLB_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index a92465b4eb12..2de33e5d302b 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit); } +static void __dma_direct_free_pages(struct device *dev, struct page *page, + size_t size) +{ + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) && + swiotlb_free(dev, page, size)) + return; + dma_free_contiguous(dev, page, size); +} + static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp) { @@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit); + if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) && + is_swiotlb_for_alloc(dev)) { + page = swiotlb_alloc(dev, size); + if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { + __dma_direct_free_pages(dev, page, size); + return NULL; + } + return page; + } + page = dma_alloc_contiguous(dev, size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, size); @@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, gfp |= __GFP_NOWARN; if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && - !force_dma_unencrypted(dev)) { + !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) { page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO); if (!page) return NULL; @@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size, if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && - !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - !dev_is_dma_coherent(dev) + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) && + !is_swiotlb_for_alloc(dev)) return arch_dma_alloc(dev, size, dma_handle, gfp, attrs); /* * Remapping or decrypting memory may block.
If either is required and * we can't block, allocate the memory from the atomic pools. +* If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must +* set up another device coherent pool by shared-dma-pool and use +*
[PATCH v12 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single
Add a new function, swiotlb_release_slots, to make the code reusable for supporting different bounce buffer pools. Signed-off-by: Claire Chang Reviewed-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index b59e689aa79d..688c6e0c43ff 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -555,27 +555,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, return tlb_addr; } -/* - * tlb_addr is the physical address of the bounce buffer to unmap. - */ -void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, - size_t mapping_size, enum dma_data_direction dir, - unsigned long attrs) +static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr) { - struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem; + struct io_tlb_mem *mem = dev->dma_io_tlb_mem; unsigned long flags; - unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr); + unsigned int offset = swiotlb_align_offset(dev, tlb_addr); int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT; int nslots = nr_slots(mem->slots[index].alloc_size + offset); int count, i; - /* -* First, sync the memory before unmapping the entry -*/ - if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && - (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) - swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE); - /* * Return the buffer to the free list by setting the corresponding * entries to indicate the number of contiguous entries available. @@ -610,6 +598,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr, spin_unlock_irqrestore(&mem->lock, flags); }
+ */ +void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr, + size_t mapping_size, enum dma_data_direction dir, + unsigned long attrs) +{ + /* +* First, sync the memory before unmapping the entry +*/ + if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && + (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)) + swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE); + + swiotlb_release_slots(dev, tlb_addr); +} + void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr, size_t size, enum dma_data_direction dir) { -- 2.32.0.272.g935e593368-goog
[PATCH v12 07/12] swiotlb: Move alloc_size to swiotlb_find_slots
Rename find_slots to swiotlb_find_slots and move the maintenance of alloc_size to it for better code reusability later. Signed-off-by: Claire Chang Reviewed-by: Christoph Hellwig --- kernel/dma/swiotlb.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index b5a9c4c0b4db..b59e689aa79d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -431,8 +431,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index) * Find a suitable number of IO TLB entries size that will fit this request and * allocate a buffer from that IO TLB pool. */ -static int find_slots(struct device *dev, phys_addr_t orig_addr, - size_t alloc_size) +static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr, + size_t alloc_size) { struct io_tlb_mem *mem = dev->dma_io_tlb_mem; unsigned long boundary_mask = dma_get_seg_boundary(dev); @@ -487,8 +487,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr, return -1; found: - for (i = index; i < index + nslots; i++) + for (i = index; i < index + nslots; i++) { mem->slots[i].list = 0; + mem->slots[i].alloc_size = + alloc_size - ((i - index) << IO_TLB_SHIFT); + } for (i = index - 1; io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 && mem->slots[i].list; i--) @@ -529,7 +532,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, return (phys_addr_t)DMA_MAPPING_ERROR; } - index = find_slots(dev, orig_addr, alloc_size + offset); + index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset); if (index == -1) { if (!(attrs & DMA_ATTR_NO_WARN)) dev_warn_ratelimited(dev, @@ -543,11 +546,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr, * This is needed when we sync the memory. Then we sync the buffer if * needed. 
*/ - for (i = 0; i < nr_slots(alloc_size + offset); i++) { + for (i = 0; i < nr_slots(alloc_size + offset); i++) mem->slots[index + i].orig_addr = slot_addr(orig_addr, i); - mem->slots[index + i].alloc_size = - alloc_size - (i << IO_TLB_SHIFT); - } tlb_addr = slot_addr(mem->start, index) + offset; if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) && (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)) -- 2.32.0.272.g935e593368-goog
[PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and use it to determine whether to bounce the data or not. This will be useful later to allow for different pools. Signed-off-by: Claire Chang --- include/linux/swiotlb.h | 11 +++ kernel/dma/direct.c | 2 +- kernel/dma/direct.h | 2 +- kernel/dma/swiotlb.c| 4 4 files changed, 17 insertions(+), 2 deletions(-) diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index dd1c30a83058..8d8855c77d9a 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force; * unmap calls. * @debugfs: The dentry to debugfs. * @late_alloc:%true if allocated using the page allocator + * @force_bounce: %true if swiotlb bouncing is forced */ struct io_tlb_mem { phys_addr_t start; @@ -94,6 +95,7 @@ struct io_tlb_mem { spinlock_t lock; struct dentry *debugfs; bool late_alloc; + bool force_bounce; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) return mem && paddr >= mem->start && paddr < mem->end; } +static inline bool is_swiotlb_force_bounce(struct device *dev) +{ + return dev->dma_io_tlb_mem->force_bounce; +} + void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { return false; } +static inline bool is_swiotlb_force_bounce(struct device *dev) +{ + return false; +} static inline void swiotlb_exit(void) { } diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 7a88c34d0867..a92465b4eb12 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev) { /* If SWIOTLB is active, use its maximum mapping size */ if (is_swiotlb_active(dev) && - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE)) + 
(dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev))) return swiotlb_max_mapping_size(dev); return SIZE_MAX; } diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h index 13e9e7158d94..4632b0f4f72e 100644 --- a/kernel/dma/direct.h +++ b/kernel/dma/direct.h @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev, phys_addr_t phys = page_to_phys(page) + offset; dma_addr_t dma_addr = phys_to_dma(dev, phys); - if (unlikely(swiotlb_force == SWIOTLB_FORCE)) + if (is_swiotlb_force_bounce(dev)) return swiotlb_map(dev, phys, size, dir, attrs); if (unlikely(!dma_capable(dev, dma_addr, size, true))) { diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index 101abeb0a57d..b5a9c4c0b4db 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start, mem->end = mem->start + bytes; mem->index = 0; mem->late_alloc = late_alloc; + + if (swiotlb_force == SWIOTLB_FORCE) + mem->force_bounce = true; + spin_lock_init(&mem->lock); for (i = 0; i < mem->nslabs; i++) { mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i); -- 2.32.0.272.g935e593368-goog
[PATCH v12 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument
Update is_swiotlb_active to add a struct device argument. This will be useful later to allow for different pools. Signed-off-by: Claire Chang Reviewed-by: Christoph Hellwig --- drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +- drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +- drivers/pci/xen-pcifront.c | 2 +- include/linux/swiotlb.h | 4 ++-- kernel/dma/direct.c | 2 +- kernel/dma/swiotlb.c | 4 ++-- 6 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index a9d65fc8aa0e..4b7afa0fc85d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj) max_order = MAX_ORDER; #ifdef CONFIG_SWIOTLB - if (is_swiotlb_active()) { + if (is_swiotlb_active(obj->base.dev->dev)) { unsigned int max_segment; max_segment = swiotlb_max_segment(); diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index 9662522aa066..be15bfd9e0ee 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm) } #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86) - need_swiotlb = is_swiotlb_active(); + need_swiotlb = is_swiotlb_active(dev->dev); #endif ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver, diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c index b7a8f3a1921f..0d56985bfe81 100644 --- a/drivers/pci/xen-pcifront.c +++ b/drivers/pci/xen-pcifront.c @@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev) spin_unlock(&pcifront_dev_lock); - if (!err && !is_swiotlb_active()) { + if (!err && !is_swiotlb_active(&pdev->xdev->dev)) { err = pci_xen_swiotlb_init_late(); if (err) dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n"); diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index
d1f3d95881cd..dd1c30a83058 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); -bool is_swiotlb_active(void); +bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); #else #define swiotlb_force SWIOTLB_NO_FORCE @@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev) return SIZE_MAX; } -static inline bool is_swiotlb_active(void) +static inline bool is_swiotlb_active(struct device *dev) { return false; } diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 84c9feb5474a..7a88c34d0867 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask) size_t dma_direct_max_mapping_size(struct device *dev) { /* If SWIOTLB is active, use its maximum mapping size */ - if (is_swiotlb_active() && + if (is_swiotlb_active(dev) && (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE)) return swiotlb_max_mapping_size(dev); return SIZE_MAX; diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index a9f5c08dd94a..101abeb0a57d 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -663,9 +663,9 @@ size_t swiotlb_max_mapping_size(struct device *dev) return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE; } -bool is_swiotlb_active(void) +bool is_swiotlb_active(struct device *dev) { - return io_tlb_default_mem != NULL; + return dev->dma_io_tlb_mem != NULL; } EXPORT_SYMBOL_GPL(is_swiotlb_active); -- 2.32.0.272.g935e593368-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v12 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
---
 drivers/iommu/dma-iommu.c | 12 ++++++------
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ++++---
 kernel/dma/direct.c       |  6 +++---
 kernel/dma/direct.h       |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
 	__iommu_dma_unmap(dev, dma_addr, size);
 
-	if (unlikely(is_swiotlb_buffer(phys)))
+	if (unlikely(is_swiotlb_buffer(dev, phys)))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
 	}
 
 	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
 		swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
 	return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu(phys, size, dir);
 
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(dev, phys))
 		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-	if (is_swiotlb_buffer(phys))
+	if (is_swiotlb_buffer(dev, phys))
 		swiotlb_sync_single_for_device(dev, phys, size, dir);
 
 	if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(dev, sg_phys(sg)))
 			swiotlb_sync_single_for_cpu(dev, sg_phys(sg), sg->length, dir);
 }
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
 		return;
 
 	for_each_sg(sgl, sg, nelems, i) {
-		if (is_swiotlb_buffer(sg_phys(sg)))
+		if (is_swiotlb_buffer(dev, sg_phys(sg)))
 			swiotlb_sync_single_for_device(dev, sg_phys(sg),
 						       sg->length, dir);
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 	 * in our domain. Therefore _only_ check address within our domain.
 	 */
 	if (pfn_valid(PFN_DOWN(paddr)))
-		return is_swiotlb_buffer(paddr);
+		return is_swiotlb_buffer(dev, paddr);
 	return 0;
 }
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include <linux/device.h>
 #include <linux/dma-direction.h>
 #include <linux/init.h>
 #include <linux/types.h>
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
 	return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
 	return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
 	for_each_sg(sgl, sg, nents, i) {
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-		if (unlikely(is_swiotlb_buffer(paddr)))
+		if (unlikely(is_swiotlb_buffer(dev, paddr)))
 			swiotlb_sync_single_for_device(dev, paddr, sg->length,
[PATCH v12 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
---
 drivers/base/core.c    | 4 ++++
 include/linux/device.h | 4 ++++
 kernel/dma/swiotlb.c   | 8 ++++----
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include <linux/netdevice.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
+#include <linux/swiotlb.h>
 #include <linux/sysfs.h>
 #include <linux/dma-map-ops.h> /* for dma_default_coherent */
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
 	dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+	dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools:	Dma pools (if dma'ble device).
  * @dma_mem:	Internal for coherent mem override.
  * @cma_area:	Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used. Not for driver use.
  * @archdata:	For arch-specific additions.
  * @of_node:	Associated device tree node.
  * @fwnode:	Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
 	struct cma *cma_area;	/* contiguous memory area for dma allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+	struct io_tlb_mem *dma_io_tlb_mem;
 #endif
 	/* arch specific additions */
 	struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index af416bcd1914..a9f5c08dd94a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
 			   enum dma_data_direction dir)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
 	unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -430,7 +430,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
 		      size_t alloc_size)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned long boundary_mask = dma_get_seg_boundary(dev);
 	dma_addr_t tbl_dma_addr =
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -507,7 +507,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 		size_t mapping_size, size_t alloc_size,
 		enum dma_data_direction dir, unsigned long attrs)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int i;
 	int index;
@@ -558,7 +558,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 			      size_t mapping_size, enum dma_data_direction dir,
 			      unsigned long attrs)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
+	struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
 	unsigned long flags;
 	unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
 	int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.272.g935e593368-goog
[PATCH v12 02/12] swiotlb: Refactor swiotlb_create_debugfs
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang
Reviewed-by: Christoph Hellwig
---
 kernel/dma/swiotlb.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3ba0f08a39a1..af416bcd1914 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,19 +670,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-	struct io_tlb_mem *mem = io_tlb_default_mem;
-
-	if (!mem)
-		return 0;
-	mem->debugfs = debugfs_create_dir("swiotlb", NULL);
 	debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
 	debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+	struct io_tlb_mem *mem = io_tlb_default_mem;
+
+	debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+	if (mem) {
+		mem->debugfs = debugfs_dir;
+		swiotlb_create_debugfs_files(mem);
+	}
 	return 0;
 }
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 #endif
-- 
2.32.0.272.g935e593368-goog