Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
Hi, Baolu,

On Fri, Sep 27, 2019 at 10:27:24AM +0800, Lu Baolu wrote:
> > > > > +	spin_lock(&(domain)->page_table_lock);			\
> > > >
> > > > Is this intended to lock here instead of taking the lock during the
> > > > whole page table walk? Is it safe?
> > > >
> > > > Taking the example where nm==PTE: when we reach here how do we
> > > > guarantee that the PMD page that has this PTE is still valid?
> > >
> > > We will always keep the non-leaf pages in the table,
> >
> > I see. Though, could I ask why? It seems to me that the existing 2nd
> > level page table does not keep these when unmapping, and it doesn't
> > even use locking at all by leveraging cmpxchg()?
>
> I still need some time to understand how cmpxchg() solves the race issue
> when reclaiming pages. For example:
>
> Thread A                       Thread B
> -A1: check all PTEs empty      -B1: up-level PDE valid
> -A2: clear the up-level PDE
> -A3: reclaim the page          -B2: populate the PTEs
>
> Both (A1,A2) and (B1,B2) should be atomic. Otherwise, a race could happen.

I'm not sure of this, but IMHO it is similarly because we need to allocate the iova ranges from the iova allocator first, so thread A (who's going to unmap pages) and thread B (who's going to map new pages) should never have overlapping regions if happening concurrently. I'm referring to intel_unmap(), in which we won't free the iova region before domain_unmap() completes (which should cover the whole process of A1-A3), so the same iova range to be unmapped won't be allocated to any new pages in some other thread. There's also a hint in domain_unmap():

  /* we don't need lock here; nobody else touches the iova range */

> Actually, the iova allocator always packs IOVA ranges close to the top
> of the address space. This results in requiring a minimal number of
> pages to map the allocated IOVA ranges, which makes memory consumption
> by IOMMU page tables tolerable. Hence, we don't need to reclaim the
> pages until the whole page table is about to be torn down.
> The real data on my test machine also proves this.

Do you mean you have run the code with 1st-level-capable IOMMU hardware? IMHO any data point would be good to have in the cover letter as a reference.

[...]

> > > > > +static struct page *
> > > > > +mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
> > > > > +		  unsigned long addr, unsigned long end,
> > > > > +		  struct page *freelist, bool reclaim)
> > > > > +{
> > > > > +	int i;
> > > > > +	unsigned long start;
> > > > > +	pte_t *pte, *first_pte;
> > > > > +
> > > > > +	start = addr;
> > > > > +	pte = pte_offset_kernel(pmd, addr);
> > > > > +	first_pte = pte;
> > > > > +	do {
> > > > > +		set_pte(pte, __pte(0));
> > > > > +	} while (pte++, addr += PAGE_SIZE, addr != end);
> > > > > +
> > > > > +	domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
> > > > > +
> > > > > +	/* Add page to free list if all entries are empty. */
> > > > > +	if (reclaim) {
> > > >
> > > > Shouldn't we know whether to reclaim from the (addr, end) specified,
> > > > as long as they cover the whole range of this PMD?
> > >
> > > Current policy is that we don't reclaim any pages until the whole page
> > > table is torn down.
> >
> > Ah OK. But I saw that you're passing in reclaim==!start_addr.
> > Shouldn't that erroneously trigger if one wants to unmap the 1st page
> > as well, even if not the whole address space?
>
> IOVA 0 is assumed to be reserved by the allocator. Otherwise, we have no
> means to check whether an IOVA is valid.

Is this an assumption of the allocator? Could that change in the future? IMHO that dependency isn't necessary; after all, it's as simple as replacing (!start_addr) with (start == 0 && end == END). I see that in domain_unmap() it has a similar check when freeing the pgd:

  if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw))

Thanks,

-- 
Peter Xu
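The cmpxchg() scheme Peter refers to (used by the existing second-level table) can be modeled in user space with C11 atomics. This is an illustrative sketch, not the driver code; `populate_slot()` and the `calloc()` stand-in for `alloc_pgtable_page()` are assumptions for the example:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of cmpxchg-based page-table population: a PDE slot holds 0
 * ("not present") or a pointer to a lower-level table page. Concurrent
 * populators race on the compare-exchange; exactly one wins, losers
 * free their freshly allocated page and reuse the winner's. */
static uint64_t *populate_slot(_Atomic(uintptr_t) *slot)
{
	uintptr_t cur = atomic_load(slot);
	if (cur)
		return (uint64_t *)cur;		/* already populated */

	/* stands in for alloc_pgtable_page(): a zeroed table page */
	uint64_t *newpg = calloc(512, sizeof(*newpg));
	if (!newpg)
		return NULL;

	uintptr_t expected = 0;
	/* Seq-cst cmpxchg here; the kernel instead pairs an smp_wmb()
	 * (publish the zeroed page first) with the reader's
	 * data-dependent load, as the quoted __pte_alloc() comment says. */
	if (atomic_compare_exchange_strong(slot, &expected, (uintptr_t)newpg))
		return newpg;			/* we won the race */

	free(newpg);				/* lost: winner's page is in 'expected' */
	return (uint64_t *)expected;
}
```

The quoted A1-A3/B1-B2 interleaving is exactly what this model does not handle: if a reclaimer could free the lower-level page, a concurrent walker might still dereference it. That is why the series keeps non-leaf pages around, and why intel_unmap() releases the IOVA range only after domain_unmap() completes.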
Re: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
Hi Peter,

On 9/26/19 11:49 AM, Peter Xu wrote:

On Thu, Sep 26, 2019 at 10:35:24AM +0800, Lu Baolu wrote:
[...]

@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * intel-pgtable.c - Intel IOMMU page table manipulation library

Could this be a bit misleading? Normally I'll use "IOMMU page table" to refer to the 2nd level page table only, and I'm always understanding it as "the new IOMMU will understand MMU page table as the 1st level". At least mention "IOMMU 1st level page table"?

This file is a placeholder for all code that manipulates iommu page tables (both first level and second level). Instead of putting everything in intel_iommu.c, let's make the code more structured so that it's easier to read and maintain. This is the motivation of this file.

I see.

+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef CONFIG_X86
+/*
+ * mmmap: Map a range of IO virtual address to physical addresses.

"... to physical addresses using MMU page table"? Might be clearer?

Yes.

+ */
+#define pgtable_populate(domain, nm)				\
+do {								\
+	void *__new = alloc_pgtable_page(domain->nid);		\
+	if (!__new)						\
+		return -ENOMEM;					\
+	smp_wmb();						\

Could I ask what's this wmb used for?

Sure. This is answered by a comment in __pte_alloc() in mm/memory.c. Let me post it here.

/*
 * Ensure all pte setup (eg. pte page lock and page clearing) are
 * visible before the pte is made visible to other CPUs by being
 * put into page tables.
 *
 * The other side of the story is the pointer chasing in the page
 * table walking code (when walking the page table without locking;
 * ie. most of the time). Fortunately, these data accesses consist
 * of a chain of data-dependent loads, meaning most CPUs (alpha
 * being the notable exception) will already guarantee loads are
 * seen in-order.
 * See the alpha page table accessors for the
 * smp_read_barrier_depends() barriers in page table walking code.
 */
smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */

Ok. I don't understand much of the rationale behind it, but the comment seems to make sense... Could you add a comment above the line, like "please refer to the comment in __pte_alloc"?

Yes.

+	spin_lock(&(domain)->page_table_lock);			\

Is this intended to lock here instead of taking the lock during the whole page table walk? Is it safe?

Taking the example where nm==PTE: when we reach here how do we guarantee that the PMD page that has this PTE is still valid?

We will always keep the non-leaf pages in the table,

I see. Though, could I ask why? It seems to me that the existing 2nd level page table does not keep these when unmapping, and it doesn't even use locking at all by leveraging cmpxchg()?

I still need some time to understand how cmpxchg() solves the race issue when reclaiming pages. For example:

Thread A                       Thread B
-A1: check all PTEs empty      -B1: up-level PDE valid
-A2: clear the up-level PDE
-A3: reclaim the page          -B2: populate the PTEs

Both (A1,A2) and (B1,B2) should be atomic. Otherwise, a race could happen.

Actually, the iova allocator always packs IOVA ranges close to the top of the address space. This results in requiring a minimal number of pages to map the allocated IOVA ranges, which makes memory consumption by IOMMU page tables tolerable. Hence, we don't need to reclaim the pages until the whole page table is about to be torn down. The real data on my test machine also proves this.

hence we only need a spin lock to serialize multiple tries of populating an entry for a pde. As for a pte, we can assume there is only a single thread which can access it at a time, because different mappings will have different iovas.

Ah yes sorry, nm will never be pte here...
so do you mean the upper layer, e.g., the iova allocator, will make sure the ranges to be mapped will never overlap with each other, so setting PTEs does not need a lock?

Yes.

+	if (nm ## _present(*nm)) {				\
+		free_pgtable_page(__new);			\
+	} else {						\
+		set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));\

It seems to me that PV could trap calls to set_pte(). Then these could also be trapped by e.g. Xen? Are
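The lock-based scheme in the quoted pgtable_populate() macro can also be modeled in user space, for contrast with the cmpxchg() alternative discussed above. A mutex stands in for the spinlock, and all names are illustrative rather than the patch's:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Model of the quoted macro: the lock only serializes the short
 * "check present / install new page" step. That is sufficient because
 * (a) non-leaf pages are never freed, so the pointer chased to reach
 * 'slot' stays valid, and (b) leaf PTEs belong to disjoint IOVA
 * ranges, so no two mappers touch the same PTE concurrently. */
struct domain {
	pthread_mutex_t page_table_lock;	/* spinlock in the driver */
};

static uint64_t *pgtable_populate(struct domain *d, uint64_t **slot)
{
	/* stands in for alloc_pgtable_page(): a zeroed table page */
	uint64_t *newpg = calloc(512, sizeof(*newpg));
	if (!newpg)
		return NULL;

	/* smp_wmb() goes here in the driver: make the zeroed page
	 * visible before it can be linked into the table. */
	pthread_mutex_lock(&d->page_table_lock);
	if (*slot)			/* someone else populated first */
		free(newpg);
	else
		*slot = newpg;		/* install our page */
	newpg = *slot;
	pthread_mutex_unlock(&d->page_table_lock);

	return newpg;
}
```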
Re: [RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable
On Thu, 26 Sep 2019 14:04:13 +0100 Robin Murphy wrote:

> On 26/09/2019 13:37, Halil Pasic wrote:
> > On Mon, 23 Sep 2019 17:21:17 +0200
> > Christoph Hellwig wrote:
> >
> >> On Mon, Sep 23, 2019 at 02:34:16PM +0200, Halil Pasic wrote:
> >>> Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
> >>> common code") tweaking the client-code-supplied GFP_* flags used to be
> >>> an issue handled in the architecture-specific code. The commit message
> >>> suggests that fixing the client code would actually be a better way
> >>> of dealing with this.
> >>>
> >>> On s390 common I/O devices are generally capable of using the full 64
> >>> bit address space for DMA I/O, but some chunks of the DMA memory need to
> >>> be 31 bit addressable (in physical address space) because the
> >>> instructions involved mandate it. Before switching to DMA API this used
> >>> to be a non-issue; we used to allocate those chunks from ZONE_DMA.
> >>> Currently our only option with the DMA API is to restrict the devices
> >>> (via dma_mask and coherent_dma_mask) to 31 bit, which is sub-optimal.
> >>>
> >>> Thus on s390 we would benefit from having control over what flags are
> >>> dropped.
> >>
> >> No way, sorry. You need to express that using a dma mask instead of
> >> overloading the GFP flags.
> >
> > Thanks for your feedback and sorry for the delay. Can you help me figure
> > out how I can express that using a dma mask?
> >
> > IMHO what you ask from me is frankly impossible.
> >
> > What I need is the ability to ask for (considering the physical
> > address) 31 bit addressable DMA memory if the chunk is supposed to host
> > control-type data that needs to be 31 bit addressable because that is
> > how the architecture is, without affecting the normal data path. So
> > normally a 64 bit mask is fine, but occasionally (control) we would need
> > a 31 bit mask.
>
> If it's possible to rework the "data" path to use streaming mappings
> instead of coherent allocations, you could potentially mimic what virtio
> does for a similar situation - see commit a0be1db4304f.

Thank you for your feedback. Just to be sure we are on the same page, I read commit a0be1db4304f like this:

1) virtio_pci_legacy needs to allocate the virtqueues so that the base address fits 44 bits
2) if 64 bit dma is possible they set coherent_dma_mask to DMA_BIT_MASK(44) and dma_mask to DMA_BIT_MASK(64)
3) since the queues get allocated with coherent allocations, 1) is satisfied
4) when the streaming mappings see a buffer that is beyond DMA_BIT_MASK(44), they have to treat it as non-coherent memory and do the syncing magic (which isn't actually required; it's just a side effect of the workaround)

I'm actually trying to get virtio_ccw working nicely with Protected Virtualization (best think of encrypted memory). So the "data" path is mostly the same as for virtio_pci. But unlike virtio_pci_legacy, we are perfectly fine with virtqueues at an arbitrary address. We can make coherent_dma_mask == DMA_BIT_MASK(31) != dma_mask == DMA_BIT_MASK(64), but that affects all dma coherent allocations and needlessly forces the virtio control structures into the [0..2G] range.

Furthermore, this whole issue has nothing to do with memory coherence. Memory at addresses above 2G is no less coherent for ccw devices than memory at addresses below 2G.

I've already implemented a patch (see after the scissors line) that takes a similar route as commit a0be1db4304f, but I consider that a workaround at best. But if that is what the community wants... I have to get the job done one way or the other.

Many thanks for your help and your time.
---8<

From: Halil Pasic
Date: Thu, 25 Jul 2019 18:44:21 +0200
Subject: [PATCH 1/1] s390/cio: fix virtio-ccw DMA without PV

Commit 37db8985b211 ("s390/cio: add basic protected virtualization support") breaks virtio-ccw devices with VIRTIO_F_IOMMU_PLATFORM for non Protected Virtualization (PV) guests. The problem is that the dma_mask of the ccw device, which is used by virtio core, gets changed from 64 to 31 bit, because some of the DMA allocations do require 31 bit addressable memory.

For PV the only drawback is that some of the virtio structures must end up in ZONE_DMA, because we have to bounce the buffers mapped via DMA API anyway.

But for non PV guests we have a problem: because of the 31 bit mask, guests bigger than 2G are likely to try bouncing buffers. The swiotlb however is only initialized for PV guests, because we don't want to bounce anything for non PV guests. The first such map kills the guest.

Since the DMA API won't allow us to specify for each allocation whether we need memory from ZONE_DMA (31 bit addressable) or whether any DMA capable memory will do, let us abuse coherent_dma_mask (which is used for allocations) to force allocating from ZONE_DMA, while changing dma_mask to DMA_BIT_MASK(64).

Signed-off-by: Halil Pasic
Reported-by: Marc
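The mask arithmetic behind the 31-bit/64-bit split described above is just the kernel's DMA_BIT_MASK() macro; here is a small user-space illustration. The macro is reproduced from include/linux/dma-mapping.h, while `dev_masks` and `apply_ccw_workaround()` are made-up stand-ins for the ccw device fields:

```c
#include <stdint.h>

/* DMA_BIT_MASK(n) as defined in include/linux/dma-mapping.h: the
 * highest address expressible with n bits. */
#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Plain struct standing in for the relevant struct device fields. */
struct dev_masks {
	uint64_t coherent_dma_mask;	/* bounds dma_alloc_coherent() */
	uint64_t dma_mask;		/* bounds streaming dma_map_*() */
};

/* The workaround discussed in the patch sets the masks asymmetrically:
 * coherent allocations (the 31-bit-addressable control blocks) are
 * forced below 2G, while the streaming data path keeps the full
 * 64-bit range. */
static void apply_ccw_workaround(struct dev_masks *m)
{
	m->coherent_dma_mask = DMA_BIT_MASK(31);	/* control: ZONE_DMA */
	m->dma_mask = DMA_BIT_MASK(64);			/* data: unrestricted */
}
```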
Re: [PATCH 1/6] iommu/qcom: Use the asid read from device-tree if specified
On Thu, Sep 26, 2019 at 5:05 AM wrote: > > From: AngeloGioacchino Del Regno > > As specified in this driver, the context banks are 0x1000 apart. > Problem is that sometimes the context number (our asid) does not > match this logic and we end up using the wrong one: this starts > being a problem in the case that we need to send TZ commands > to do anything on a specific context. > > For this reason, read the ASID from the DT if the property > "qcom,ctx-num" is present on the IOMMU context node. > > Signed-off-by: AngeloGioacchino Del Regno > --- > .../devicetree/bindings/iommu/qcom,iommu.txt| 1 + > drivers/iommu/qcom_iommu.c | 17 ++--- > 2 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt > b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt > index 059139abce35..98102b323196 100644 > --- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt > +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt > @@ -38,6 +38,7 @@ to non-secure vs secure interrupt line. > - "qcom,msm-iommu-v1-sec" : secure context bank >- reg: Base address and size of context bank within the iommu >- interrupts : The context fault irq. 
> + - qcom,ctx-num : The number associated to the context bank I guess this should be more like: + and the following optional properties: + - qcom,ctx-num : The number associated to the context bank (since this is an optional property) BR, -R > > ** Optional properties: > > diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c > index dadc707573a2..5837556af147 100644 > --- a/drivers/iommu/qcom_iommu.c > +++ b/drivers/iommu/qcom_iommu.c > @@ -557,7 +557,8 @@ static int qcom_iommu_of_xlate(struct device *dev, struct > of_phandle_args *args) > * index into qcom_iommu->ctxs: > */ > if (WARN_ON(asid < 1) || > - WARN_ON(asid > qcom_iommu->num_ctxs)) > + WARN_ON(asid > qcom_iommu->num_ctxs) || > + WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL)) > return -EINVAL; > > if (!fwspec->iommu_priv) { > @@ -665,7 +666,8 @@ static int qcom_iommu_sec_ptbl_init(struct device *dev) > > static int get_asid(const struct device_node *np) > { > - u32 reg; > + u32 reg, val; > + int asid; > > /* read the "reg" property directly to get the relative address > * of the context bank, and calculate the asid from that: > @@ -673,7 +675,16 @@ static int get_asid(const struct device_node *np) > if (of_property_read_u32_index(np, "reg", 0, )) > return -ENODEV; > > - return reg / 0x1000; /* context banks are 0x1000 apart */ > + /* Context banks are 0x1000 apart but, in some cases, the ASID > +* number doesn't match to this logic and needs to be passed > +* from the DT configuration explicitly. > +*/ > + if (of_property_read_u32(np, "qcom,ctx-num", )) > + asid = reg / 0x1000; > + else > + asid = val; > + > + return asid; > } > > static int qcom_iommu_ctx_probe(struct platform_device *pdev) > -- > 2.21.0 > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching code
Hi Geert,

Thanks for the feedback.

> Subject: Re: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching code
>
> Hi Biju,
>
> On Tue, Sep 24, 2019 at 9:43 AM Biju Das wrote:
> > Support RZ/G2N (R8A774B1) IPMMU.
> >
> > Signed-off-by: Biju Das
>
> Thanks for your patch!
>
> > --- a/drivers/iommu/ipmmu-vmsa.c
> > +++ b/drivers/iommu/ipmmu-vmsa.c
> > static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = {
> > 	{ .soc_id = "r8a774c0", },
> > +	{ .soc_id = "r8a774b1", },
>
> Please preserve alphabetical sort order.

OK. Will send V2.

> > 	{ .soc_id = "r8a7795", .revision = "ES3.*" },
> > 	{ .soc_id = "r8a77965", },
> > 	{ .soc_id = "r8a77990", },
>
> With the above fixed:
> Reviewed-by: Geert Uytterhoeven
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
> -- Linus Torvalds
Re: [PATCH] iommu/ipmmu-vmsa: Hook up r8a774b1 DT matching code
Hi Biju, On Tue, Sep 24, 2019 at 9:43 AM Biju Das wrote: > Support RZ/G2N (R8A774B1) IPMMU. > > Signed-off-by: Biju Das Thanks for your patch! > --- a/drivers/iommu/ipmmu-vmsa.c > +++ b/drivers/iommu/ipmmu-vmsa.c > static const struct soc_device_attribute soc_rcar_gen3_whitelist[] = { > { .soc_id = "r8a774c0", }, > + { .soc_id = "r8a774b1", }, Please preserve alphabetical sort order. > { .soc_id = "r8a7795", .revision = "ES3.*" }, > { .soc_id = "r8a77965", }, > { .soc_id = "r8a77990", }, With the above fixed: Reviewed-by: Geert Uytterhoeven Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
Re: [RFC PATCH 1/3] dma-mapping: make overriding GFP_* flags arch customizable
On 26/09/2019 13:37, Halil Pasic wrote:
> On Mon, 23 Sep 2019 17:21:17 +0200
> Christoph Hellwig wrote:
>
>> On Mon, Sep 23, 2019 at 02:34:16PM +0200, Halil Pasic wrote:
>>> Before commit 57bf5a8963f8 ("dma-mapping: clear harmful GFP_* flags in
>>> common code") tweaking the client-code-supplied GFP_* flags used to be
>>> an issue handled in the architecture-specific code. The commit message
>>> suggests that fixing the client code would actually be a better way
>>> of dealing with this.
>>>
>>> On s390 common I/O devices are generally capable of using the full 64
>>> bit address space for DMA I/O, but some chunks of the DMA memory need to
>>> be 31 bit addressable (in physical address space) because the
>>> instructions involved mandate it. Before switching to DMA API this used
>>> to be a non-issue; we used to allocate those chunks from ZONE_DMA.
>>> Currently our only option with the DMA API is to restrict the devices
>>> (via dma_mask and coherent_dma_mask) to 31 bit, which is sub-optimal.
>>>
>>> Thus on s390 we would benefit from having control over what flags are
>>> dropped.
>>
>> No way, sorry. You need to express that using a dma mask instead of
>> overloading the GFP flags.
>
> Thanks for your feedback and sorry for the delay. Can you help me figure
> out how I can express that using a dma mask?
>
> IMHO what you ask from me is frankly impossible.
>
> What I need is the ability to ask for (considering the physical
> address) 31 bit addressable DMA memory if the chunk is supposed to host
> control-type data that needs to be 31 bit addressable because that is
> how the architecture is, without affecting the normal data path. So
> normally a 64 bit mask is fine, but occasionally (control) we would need
> a 31 bit mask.

If it's possible to rework the "data" path to use streaming mappings instead of coherent allocations, you could potentially mimic what virtio does for a similar situation - see commit a0be1db4304f.

Robin.
Re: [PATCH] dt-bindings: iommu: ipmmu-vmsa: Add r8a774b1 support
On Tue, Sep 24, 2019 at 9:41 AM Biju Das wrote:
> Document RZ/G2N (R8A774B1) SoC bindings.
>
> Signed-off-by: Biju Das

Reviewed-by: Geert Uytterhoeven

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
[PATCH 6/6] iommu/qcom: Add support for QCIOMMUv2 and QCIOMMU-500 secured contexts
From: AngeloGioacchino Del Regno This IOMMU is yet another Qualcomm variant of known IOMMUs, found in Family-B SoCs, such as MSM8956, MSM8976, MSM8953, MSM8917 and others, and that firmware perfectly adheres to this driver logic. This time, though, the catch is that the secure contexts are also secured, meaning that these are programmed by the bootloader or TZ and their "interesting" registers are locked out, so the hypervisor disallows touching them from the non-secure world: in this case the OS is supposed to blindly trust the secure configuration of these contexts and just use them "as they are". For this reason, it is necessary to distinguish between the v1 and 500/v2 secure contexts in this driver in order to adhere to this specification. To do this, add a new DT compatible, named "qcom,msm-iommu-v2-sec" that will trigger the new behavior. For the sake of completeness, also add a "qcom,msm-iommu-v2-ns" so that the human eye gets pleased with it when reading the contexts in the final SoC DT. Of course, the latter is just cosmetic. Signed-off-by: AngeloGioacchino Del Regno --- .../devicetree/bindings/iommu/qcom,iommu.txt | 2 ++ drivers/iommu/qcom_iommu.c| 19 +-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt index a4dd76b8c566..44676d3221d5 100644 --- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt @@ -36,6 +36,8 @@ to non-secure vs secure interrupt line. - compatible : Should be one of: - "qcom,msm-iommu-v1-ns" : non-secure context bank - "qcom,msm-iommu-v1-sec" : secure context bank +- "qcom,msm-iommu-v2-ns" : non-secure QSMMUv2/QSMMU500 context bank +- "qcom,msm-iommu-v2-sec" : secure QSMMUv2/QSMMU500 context bank - reg: Base address and size of context bank within the iommu - interrupts : The context fault irq. 
- qcom,ctx-num : The number associated to the context bank diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 03c68fe9439b..2f65a4cdca78 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -56,6 +56,7 @@ struct qcom_iommu_ctx { struct device *dev; void __iomem*base; bool secure_init; + bool secured_ctx; u8 asid; /* asid and ctx bank # are 1:1 */ struct iommu_domain *domain; }; @@ -281,6 +282,12 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, ctx->secure_init = true; } + /* Secured QSMMU-500/QSMMU-v2 contexts cannot be programmed */ + if (ctx->secured_ctx) { + ctx->domain = domain; + break; + } + qcom_iommu_reset_ctx(ctx); tcr[0] = pgtbl_cfg.arm_lpae_s1_cfg.tcr; @@ -762,10 +769,15 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev) return -ENODEV; } + if (of_device_is_compatible(dev->of_node, "qcom,msm-iommu-v2-sec")) + ctx->secured_ctx = true; + /* clear IRQs before registering fault handler, just in case the * boot-loader left us a surprise: */ - iommu_writel(ctx, ARM_SMMU_CB_FSR, iommu_readl(ctx, ARM_SMMU_CB_FSR)); + if (!ctx->secured_ctx) + iommu_writel(ctx, ARM_SMMU_CB_FSR, +iommu_readl(ctx, ARM_SMMU_CB_FSR)); ret = devm_request_irq(dev, irq, qcom_iommu_fault, @@ -807,6 +819,8 @@ static int qcom_iommu_ctx_remove(struct platform_device *pdev) static const struct of_device_id ctx_of_match[] = { { .compatible = "qcom,msm-iommu-v1-ns" }, { .compatible = "qcom,msm-iommu-v1-sec" }, + { .compatible = "qcom,msm-iommu-v2-ns" }, + { .compatible = "qcom,msm-iommu-v2-sec" }, { /* sentinel */ } }; @@ -824,7 +838,8 @@ static bool qcom_iommu_has_secure_context(struct qcom_iommu_dev *qcom_iommu) struct device_node *child; for_each_child_of_node(qcom_iommu->dev->of_node, child) - if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec")) + if (of_device_is_compatible(child, "qcom,msm-iommu-v1-sec") || + of_device_is_compatible(child, "qcom,msm-iommu-v2-sec")) return true; return false; -- 2.21.0 
[PATCH 4/6] iommu/qcom: Add support for AArch64 IOMMU pagetables
From: AngeloGioacchino Del Regno Some IOMMUs associated with some TZ firmwares may support switching to the AArch64 pagetable format by sending a "set pagetable format" scm command indicating the IOMMU secure ID and the context number to switch. Add a DT property "qcom,use-aarch64-pagetables" for this driver to send this command to the secure world and to switch the pagetable format to benefit of the ARM64 IOMMU pagetables, where possible. Note that, even though the command should be valid to switch each context, the property is made global because: 1. It doesn't make too much sense to switch only one or two context(s) to AA64 instead of just the entire thing 2. Some IOMMUs will go crazy and produce spectacular results when trying to mix up the pagetables on a per-context basis. Signed-off-by: AngeloGioacchino Del Regno --- .../devicetree/bindings/iommu/qcom,iommu.txt | 2 + drivers/iommu/qcom_iommu.c| 55 +++ 2 files changed, 47 insertions(+), 10 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt index 98102b323196..a4dd76b8c566 100644 --- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt @@ -39,6 +39,8 @@ to non-secure vs secure interrupt line. - reg: Base address and size of context bank within the iommu - interrupts : The context fault irq. 
- qcom,ctx-num : The number associated to the context bank + - qcom,use-aarch64-pagetables : Switch to AArch64 pagetable format on all + contexts declared in this IOMMU ** Optional properties: diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 2f31da2e7add..233ef496af27 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -48,6 +48,7 @@ struct qcom_iommu_dev { void __iomem*local_base; u32 sec_id; u8 num_ctxs; + bool use_aarch64_pt; struct qcom_iommu_ctx *ctxs[0]; /* indexed by asid-1 */ }; @@ -153,11 +154,17 @@ static void qcom_iommu_tlb_inv_range_nosync(unsigned long iova, size_t size, reg = leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA; for (i = 0; i < fwspec->num_ids; i++) { + struct qcom_iommu_dev *qcom_iommu = to_iommu(fwspec); struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]); size_t s = size; - iova &= ~12UL; - iova |= ctx->asid; + if (qcom_iommu->use_aarch64_pt) { + iova >>= 12; + iova |= (u64)ctx->asid << 48; + } else { + iova &= ~12UL; + iova |= ctx->asid; + } do { iommu_writel(ctx, reg, iova); iova += granule; @@ -222,6 +229,8 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, struct qcom_iommu_domain *qcom_domain = to_qcom_iommu_domain(domain); struct io_pgtable_ops *pgtbl_ops; struct io_pgtable_cfg pgtbl_cfg; + enum io_pgtable_fmt pgtbl_fmt; + unsigned long ias, oas; int i, ret = 0; u32 reg; @@ -229,16 +238,25 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, if (qcom_domain->iommu) goto out_unlock; + if (qcom_iommu->use_aarch64_pt) { + pgtbl_fmt = ARM_64_LPAE_S1; + ias = oas = 48; + } else { + pgtbl_fmt = ARM_32_LPAE_S1; + ias = 32; + oas = 40; + } + pgtbl_cfg = (struct io_pgtable_cfg) { .pgsize_bitmap = qcom_iommu_ops.pgsize_bitmap, - .ias= 32, - .oas= 40, + .ias= ias, + .oas= oas, .tlb= _gather_ops, .iommu_dev = qcom_iommu->dev, }; qcom_domain->iommu = qcom_iommu; - pgtbl_ops = alloc_io_pgtable_ops(ARM_32_LPAE_S1, _cfg, fwspec); + pgtbl_ops = 
alloc_io_pgtable_ops(pgtbl_fmt, _cfg, fwspec); if (!pgtbl_ops) { dev_err(qcom_iommu->dev, "failed to allocate pagetable ops\n"); ret = -ENOMEM; @@ -252,6 +270,7 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, for (i = 0; i < fwspec->num_ids; i++) { struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]); + u32 tcr[2]; if (!ctx->secure_init) { ret = qcom_scm_restore_sec_cfg(qcom_iommu->sec_id, ctx->asid); @@ -264,12 +283,25 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, qcom_iommu_reset_ctx(ctx);
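The TLB-invalidate address encoding that this patch switches on can be sketched as plain bit arithmetic. `tlbiva_aarch64()` is an illustrative helper mirroring the patch's shift/OR, not a function in the driver:

```c
#include <stdint.h>

/* AArch64-format TLBIVA value as written in the patch: the virtual
 * page number (iova >> 12) in the low bits, and the ASID in bits
 * [63:48]. The AArch32 long-descriptor path instead keeps the iova
 * in place and ORs the ASID into the low bits. */
static uint64_t tlbiva_aarch64(uint64_t iova, uint16_t asid)
{
	return (iova >> 12) | ((uint64_t)asid << 48);
}
```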
[PATCH 2/6] iommu/qcom: Write TCR before TTBRs to fix ASID access behavior
From: AngeloGioacchino Del Regno As also stated in the arm-smmu driver, we must write the TCR before writing the TTBRs, since the TCR determines the access behavior of some fields. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/qcom_iommu.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 5837556af147..8431fb97a50f 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -245,6 +245,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, ctx->secure_init = true; } + /* TCR */ + iommu_writel(ctx, ARM_SMMU_CB_TCR2, + (pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) | + FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM)); + iommu_writel(ctx, ARM_SMMU_CB_TCR, + pgtbl_cfg.arm_lpae_s1_cfg.tcr); + /* TTBRs */ iommu_writeq(ctx, ARM_SMMU_CB_TTBR0, pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] | @@ -253,13 +260,6 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] | FIELD_PREP(TTBRn_ASID, ctx->asid)); - /* TCR */ - iommu_writel(ctx, ARM_SMMU_CB_TCR2, - (pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) | - FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM)); - iommu_writel(ctx, ARM_SMMU_CB_TCR, - pgtbl_cfg.arm_lpae_s1_cfg.tcr); - /* MAIRs (stage-1 only) */ iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0, pgtbl_cfg.arm_lpae_s1_cfg.mair[0]); -- 2.21.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 5/6] iommu/qcom: Index contexts by asid number to allow asid 0
From: AngeloGioacchino Del Regno This driver was indexing the contexts by asid-1, which is probably done under the assumption that the first ASID is always 1. Unfortunately this is not entirely true: at least in the MSM8956 and MSM8976 GPU IOMMU, the gpu_user context's ASID number is zero. To allow using an asid number of zero, stop indexing the contexts by asid-1 and rather index them by asid. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/qcom_iommu.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 233ef496af27..03c68fe9439b 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -49,7 +49,7 @@ struct qcom_iommu_dev { u32 sec_id; u8 num_ctxs; bool use_aarch64_pt; - struct qcom_iommu_ctx *ctxs[0]; /* indexed by asid-1 */ + struct qcom_iommu_ctx *ctxs[0]; /* indexed by asid */ }; struct qcom_iommu_ctx { @@ -87,7 +87,7 @@ static struct qcom_iommu_ctx * to_ctx(struct iommu_fwspec *fwspec, unsigned asid struct qcom_iommu_dev *qcom_iommu = to_iommu(fwspec); if (!qcom_iommu) return NULL; - return qcom_iommu->ctxs[asid - 1]; + return qcom_iommu->ctxs[asid]; } static inline void @@ -604,12 +604,10 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) qcom_iommu = platform_get_drvdata(iommu_pdev); /* make sure the asid specified in dt is valid, so we don't have -* to sanity check this elsewhere, since 'asid - 1' is used to -* index into qcom_iommu->ctxs: +* to sanity check this elsewhere: */ - if (WARN_ON(asid < 1) || - WARN_ON(asid > qcom_iommu->num_ctxs) || - WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL)) + if (WARN_ON(asid > qcom_iommu->num_ctxs) || + WARN_ON(qcom_iommu->ctxs[asid] == NULL)) return -EINVAL; if (!fwspec->iommu_priv) { @@ -789,7 +787,7 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev) dev_dbg(dev, "found asid %u\n", ctx->asid); - qcom_iommu->ctxs[ctx->asid - 1] = ctx; + qcom_iommu->ctxs[ctx->asid] 
= ctx; return 0; } @@ -801,7 +799,7 @@ static int qcom_iommu_ctx_remove(struct platform_device *pdev) platform_set_drvdata(pdev, NULL); - qcom_iommu->ctxs[ctx->asid - 1] = NULL; + qcom_iommu->ctxs[ctx->asid] = NULL; return 0; } @@ -846,7 +844,8 @@ static int qcom_iommu_device_probe(struct platform_device *pdev) for_each_child_of_node(dev->of_node, child) max_asid = max(max_asid, get_asid(child)); - sz = sizeof(*qcom_iommu) + (max_asid * sizeof(qcom_iommu->ctxs[0])); + sz = sizeof(*qcom_iommu); + sz += (max_asid + 1) * sizeof(qcom_iommu->ctxs[0]); qcom_iommu = devm_kzalloc(dev, sz, GFP_KERNEL); if (!qcom_iommu) -- 2.21.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/6] iommu/qcom: Use the asid read from device-tree if specified
From: AngeloGioacchino Del Regno As specified in this driver, the context banks are 0x1000 apart. Problem is that sometimes the context number (our asid) does not match this logic and we end up using the wrong one: this starts being a problem in the case that we need to send TZ commands to do anything on a specific context. For this reason, read the ASID from the DT if the property "qcom,ctx-num" is present on the IOMMU context node. Signed-off-by: AngeloGioacchino Del Regno --- .../devicetree/bindings/iommu/qcom,iommu.txt| 1 + drivers/iommu/qcom_iommu.c | 17 ++--- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt index 059139abce35..98102b323196 100644 --- a/Documentation/devicetree/bindings/iommu/qcom,iommu.txt +++ b/Documentation/devicetree/bindings/iommu/qcom,iommu.txt @@ -38,6 +38,7 @@ to non-secure vs secure interrupt line. - "qcom,msm-iommu-v1-sec" : secure context bank - reg: Base address and size of context bank within the iommu - interrupts : The context fault irq. 
+ - qcom,ctx-num : The number associated to the context bank ** Optional properties: diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index dadc707573a2..5837556af147 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -557,7 +557,8 @@ static int qcom_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) * index into qcom_iommu->ctxs: */ if (WARN_ON(asid < 1) || - WARN_ON(asid > qcom_iommu->num_ctxs)) + WARN_ON(asid > qcom_iommu->num_ctxs) || + WARN_ON(qcom_iommu->ctxs[asid - 1] == NULL)) return -EINVAL; if (!fwspec->iommu_priv) { @@ -665,7 +666,8 @@ static int qcom_iommu_sec_ptbl_init(struct device *dev) static int get_asid(const struct device_node *np) { - u32 reg; + u32 reg, val; + int asid; /* read the "reg" property directly to get the relative address * of the context bank, and calculate the asid from that: @@ -673,7 +675,16 @@ static int get_asid(const struct device_node *np) if (of_property_read_u32_index(np, "reg", 0, &reg)) return -ENODEV; - return reg / 0x1000; /* context banks are 0x1000 apart */ + /* Context banks are 0x1000 apart but, in some cases, the ASID +* number doesn't match to this logic and needs to be passed +* from the DT configuration explicitly. +*/ + if (of_property_read_u32(np, "qcom,ctx-num", &val)) + asid = reg / 0x1000; + else + asid = val; + + return asid; } static int qcom_iommu_ctx_probe(struct platform_device *pdev) -- 2.21.0
[PATCH 0/6] Add support for QCOM IOMMU v2 and 500
From: AngeloGioacchino Del Regno Some Qualcomm Family-B SoCs have got a different version of the QCOM IOMMU, specifically v2 and 500, which perfectly adhere to the current qcom_iommu driver, but need some variations due to slightly different hypervisor behavior. The personal aim is to upstream MSM8956 as much as possible. This code has been tested on two Sony phones featuring the Qualcomm MSM8956 SoC. AngeloGioacchino Del Regno (6): iommu/qcom: Use the asid read from device-tree if specified iommu/qcom: Write TCR before TTBRs to fix ASID access behavior iommu/qcom: Properly reset the IOMMU context iommu/qcom: Add support for AArch64 IOMMU pagetables iommu/qcom: Index contexts by asid number to allow asid 0 iommu/qcom: Add support for QCIOMMUv2 and QCIOMMU-500 secured contexts .../devicetree/bindings/iommu/qcom,iommu.txt | 5 + drivers/iommu/qcom_iommu.c| 133 ++ 2 files changed, 111 insertions(+), 27 deletions(-) -- 2.21.0
[PATCH 3/6] iommu/qcom: Properly reset the IOMMU context
From: AngeloGioacchino Del Regno To avoid context faults reset the context entirely on detach and to ensure a fresh clean start also do a complete reset before programming the context for domain initialization. Signed-off-by: AngeloGioacchino Del Regno --- drivers/iommu/qcom_iommu.c | 23 +-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c index 8431fb97a50f..2f31da2e7add 100644 --- a/drivers/iommu/qcom_iommu.c +++ b/drivers/iommu/qcom_iommu.c @@ -198,6 +198,23 @@ static irqreturn_t qcom_iommu_fault(int irq, void *dev) return IRQ_HANDLED; } +static void qcom_iommu_reset_ctx(struct qcom_iommu_ctx *ctx) +{ + iommu_writel(ctx, ARM_SMMU_CB_FAR, 0); + iommu_writel(ctx, ARM_SMMU_CB_FSR, 0); + iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR1, 0); + iommu_writel(ctx, ARM_SMMU_CB_PAR, 0); + iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0, 0); + iommu_writel(ctx, ARM_SMMU_CB_SCTLR, 0); + iommu_writel(ctx, ARM_SMMU_CB_TCR2, 0); + iommu_writel(ctx, ARM_SMMU_CB_TCR, 0); + iommu_writeq(ctx, ARM_SMMU_CB_TTBR0, 0); + iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0); + + /* Should we issue a TLBSYNC there instead? 
*/ + mb(); +} + static int qcom_iommu_init_domain(struct iommu_domain *domain, struct qcom_iommu_dev *qcom_iommu, struct iommu_fwspec *fwspec) @@ -245,6 +262,8 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain, ctx->secure_init = true; } + qcom_iommu_reset_ctx(ctx); + /* TCR */ iommu_writel(ctx, ARM_SMMU_CB_TCR2, (pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) | @@ -390,8 +409,8 @@ static void qcom_iommu_detach_dev(struct iommu_domain *domain, struct device *de for (i = 0; i < fwspec->num_ids; i++) { struct qcom_iommu_ctx *ctx = to_ctx(fwspec, fwspec->ids[i]); - /* Disable the context bank: */ - iommu_writel(ctx, ARM_SMMU_CB_SCTLR, 0); + /* Disable and reset the context bank */ + qcom_iommu_reset_ctx(ctx); ctx->domain = NULL; } -- 2.21.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 6/6] iommu/amd: Lock code paths traversing protection_domain->dev_list
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel The traversing of this list requires protection_domain->lock to be taken to avoid nasty races with attach/detach code. Make sure the lock is held on all code-paths traversing this list. Reported-by: Filippo Sironi Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 25 - 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index bac4e20a5919..9c26976a0f99 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1334,8 +1334,12 @@ static void domain_flush_np_cache(struct protection_domain *domain, dma_addr_t iova, size_t size) { if (unlikely(amd_iommu_np_cache)) { + unsigned long flags; + + spin_lock_irqsave(>lock, flags); domain_flush_pages(domain, iova, size); domain_flush_complete(domain); + spin_unlock_irqrestore(>lock, flags); } } @@ -1700,8 +1704,13 @@ static int iommu_map_page(struct protection_domain *dom, ret = 0; out: - if (updated) + if (updated) { + unsigned long flags; + + spin_lock_irqsave(>lock, flags); update_domain(dom); + spin_unlock_irqrestore(>lock, flags); + } /* Everything flushed out, free pages now */ free_page_list(freelist); @@ -1857,8 +1866,12 @@ static void free_gcr3_table(struct protection_domain *domain) static void dma_ops_domain_flush_tlb(struct dma_ops_domain *dom) { + unsigned long flags; + + spin_lock_irqsave(>domain.lock, flags); domain_flush_tlb(>domain); domain_flush_complete(>domain); + spin_unlock_irqrestore(>domain.lock, flags); } static void iova_domain_flush_tlb(struct iova_domain *iovad) @@ -2414,6 +2427,7 @@ static dma_addr_t __map_single(struct device *dev, { dma_addr_t offset = paddr & ~PAGE_MASK; dma_addr_t address, start, ret; + unsigned long flags; unsigned int pages; int prot = 0; int i; @@ -2451,8 +2465,10 @@ static dma_addr_t __map_single(struct device *dev, iommu_unmap_page(_dom->domain, start, PAGE_SIZE); } 
+ spin_lock_irqsave(_dom->domain.lock, flags); domain_flush_tlb(_dom->domain); domain_flush_complete(_dom->domain); + spin_unlock_irqrestore(_dom->domain.lock, flags); dma_ops_free_iova(dma_dom, address, pages); @@ -2481,8 +2497,12 @@ static void __unmap_single(struct dma_ops_domain *dma_dom, } if (amd_iommu_unmap_flush) { + unsigned long flags; + + spin_lock_irqsave(_dom->domain.lock, flags); domain_flush_tlb(_dom->domain); domain_flush_complete(_dom->domain); + spin_unlock_irqrestore(_dom->domain.lock, flags); dma_ops_free_iova(dma_dom, dma_addr, pages); } else { pages = __roundup_pow_of_two(pages); @@ -3246,9 +3266,12 @@ static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain, static void amd_iommu_flush_iotlb_all(struct iommu_domain *domain) { struct protection_domain *dom = to_pdomain(domain); + unsigned long flags; + spin_lock_irqsave(>lock, flags); domain_flush_tlb_pde(dom); domain_flush_complete(dom); + spin_unlock_irqrestore(>lock, flags); } static void amd_iommu_iotlb_sync(struct iommu_domain *domain, -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Reviewed-by: Jerry Snitselaar ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 5/6] iommu/amd: Lock dev_data in attach/detach code paths
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel Make sure that attaching a detaching a device can't race against each other and protect the iommu_dev_data with a spin_lock in these code paths. Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 9 + drivers/iommu/amd_iommu_types.h | 3 +++ 2 files changed, 12 insertions(+) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 459247c32dc0..bac4e20a5919 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -201,6 +201,7 @@ static struct iommu_dev_data *alloc_dev_data(u16 devid) if (!dev_data) return NULL; + spin_lock_init(_data->lock); dev_data->devid = devid; ratelimit_default_init(_data->rs); @@ -2157,6 +2158,8 @@ static int attach_device(struct device *dev, dev_data = get_dev_data(dev); + spin_lock(_data->lock); + ret = -EBUSY; if (dev_data->domain != NULL) goto out; @@ -2199,6 +2202,8 @@ static int attach_device(struct device *dev, domain_flush_complete(domain); out: + spin_unlock(_data->lock); + spin_unlock_irqrestore(>lock, flags); return ret; @@ -2218,6 +2223,8 @@ static void detach_device(struct device *dev) spin_lock_irqsave(>lock, flags); + spin_lock(_data->lock); + /* * First check if the device is still attached. 
It might already * be detached from its domain because the generic @@ -2240,6 +2247,8 @@ static void detach_device(struct device *dev) dev_data->ats.enabled = false; out: + spin_unlock(_data->lock); + spin_unlock_irqrestore(>lock, flags); } diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h index 0186501ab971..c9c1612d52e0 100644 --- a/drivers/iommu/amd_iommu_types.h +++ b/drivers/iommu/amd_iommu_types.h @@ -633,6 +633,9 @@ struct devid_map { * This struct contains device specific data for the IOMMU */ struct iommu_dev_data { + /*Protect against attach/detach races */ + spinlock_t lock; + struct list_head list;/* For domain->dev_list */ struct llist_node dev_data_list; /* For global dev_data_list */ struct protection_domain *domain; /* Domain the device is bound to */ -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Reviewed-by: Jerry Snitselaar ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/6] iommu/amd: Check for busy devices earlier in attach_device()
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel Check early in attach_device whether the device is already attached to a domain. This also simplifies the code path so that __attach_device() can be removed. Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 25 +++-- 1 file changed, 7 insertions(+), 18 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 2919168577ff..459247c32dc0 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2072,23 +2072,6 @@ static void do_detach(struct iommu_dev_data *dev_data) domain->dev_cnt -= 1; } -/* - * If a device is not yet associated with a domain, this function makes the - * device visible in the domain - */ -static int __attach_device(struct iommu_dev_data *dev_data, - struct protection_domain *domain) -{ - if (dev_data->domain != NULL) - return -EBUSY; - - /* Attach alias group root */ - do_attach(dev_data, domain); - - return 0; -} - - static void pdev_iommuv2_disable(struct pci_dev *pdev) { pci_disable_ats(pdev); @@ -2174,6 +2157,10 @@ static int attach_device(struct device *dev, dev_data = get_dev_data(dev); + ret = -EBUSY; + if (dev_data->domain != NULL) + goto out; + if (!dev_is_pci(dev)) goto skip_ats_check; @@ -2198,7 +2185,9 @@ static int attach_device(struct device *dev, } skip_ats_check: - ret = __attach_device(dev_data, domain); + ret = 0; + + do_attach(dev_data, domain); /* * We might boot into a crash-kernel here. The crashed kernel -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Reviewed-by: Jerry Snitselaar ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 3/6] iommu/amd: Take domain->lock for complete attach/detach path
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel The code-paths before __attach_device() and __detach_device() are called also access and modify domain state, so take the domain lock there too. This allows to get rid of the __detach_device() function. Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 65 --- 1 file changed, 26 insertions(+), 39 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 37a9c04fc728..2919168577ff 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2079,27 +2079,13 @@ static void do_detach(struct iommu_dev_data *dev_data) static int __attach_device(struct iommu_dev_data *dev_data, struct protection_domain *domain) { - unsigned long flags; - int ret; - - /* lock domain */ - spin_lock_irqsave(>lock, flags); - - ret = -EBUSY; if (dev_data->domain != NULL) - goto out_unlock; + return -EBUSY; /* Attach alias group root */ do_attach(dev_data, domain); - ret = 0; - -out_unlock: - - /* ready */ - spin_unlock_irqrestore(>lock, flags); - - return ret; + return 0; } @@ -2181,8 +2167,11 @@ static int attach_device(struct device *dev, { struct pci_dev *pdev; struct iommu_dev_data *dev_data; + unsigned long flags; int ret; + spin_lock_irqsave(>lock, flags); + dev_data = get_dev_data(dev); if (!dev_is_pci(dev)) @@ -2190,12 +2179,13 @@ static int attach_device(struct device *dev, pdev = to_pci_dev(dev); if (domain->flags & PD_IOMMUV2_MASK) { + ret = -EINVAL; if (!dev_data->passthrough) - return -EINVAL; + goto out; if (dev_data->iommu_v2) { if (pdev_iommuv2_enable(pdev) != 0) - return -EINVAL; + goto out; dev_data->ats.enabled = true; dev_data->ats.qdep= pci_ats_queue_depth(pdev); @@ -2219,24 +2209,10 @@ static int attach_device(struct device *dev, domain_flush_complete(domain); - return ret; -} - -/* - * Removes a device from a protection domain (unlocked) - */ -static void __detach_device(struct iommu_dev_data 
*dev_data) -{ - struct protection_domain *domain; - unsigned long flags; - - domain = dev_data->domain; - - spin_lock_irqsave(>lock, flags); - - do_detach(dev_data); - +out: spin_unlock_irqrestore(>lock, flags); + + return ret; } /* @@ -2246,10 +,13 @@ static void detach_device(struct device *dev) { struct protection_domain *domain; struct iommu_dev_data *dev_data; + unsigned long flags; dev_data = get_dev_data(dev); domain = dev_data->domain; + spin_lock_irqsave(>lock, flags); + /* * First check if the device is still attached. It might already * be detached from its domain because the generic @@ -2257,12 +2236,12 @@ static void detach_device(struct device *dev) * our alias handling. */ if (WARN_ON(!dev_data->domain)) - return; + goto out; - __detach_device(dev_data); + do_detach(dev_data); if (!dev_is_pci(dev)) - return; + goto out; if (domain->flags & PD_IOMMUV2_MASK && dev_data->iommu_v2) pdev_iommuv2_disable(to_pci_dev(dev)); @@ -2270,6 +2249,9 @@ static void detach_device(struct device *dev) pci_disable_ats(to_pci_dev(dev)); dev_data->ats.enabled = false; + +out: + spin_unlock_irqrestore(>lock, flags); } static int amd_iommu_add_device(struct device *dev) @@ -2904,13 +2886,18 @@ int __init amd_iommu_init_dma_ops(void) static void cleanup_domain(struct protection_domain *domain) { struct iommu_dev_data *entry; + unsigned long flags; + + spin_lock_irqsave(>lock, flags); while (!list_empty(>dev_list)) { entry = list_first_entry(>dev_list, struct iommu_dev_data, list); BUG_ON(!entry->domain); - __detach_device(entry); + do_detach(entry); } + + spin_unlock_irqrestore(>lock, flags); } static void protection_domain_free(struct protection_domain *domain) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Reviewed-by: Jerry Snitselaar ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 2/6] iommu/amd: Remove amd_iommu_devtable_lock
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel The lock is not necessary because the device table does not contain shared state that needs protection. Locking is only needed on an individual entry basis, and that needs to happen on the iommu_dev_data level. Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 042854bbc5bc..37a9c04fc728 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -70,7 +70,6 @@ */ #define AMD_IOMMU_PGSIZES ((~0xFFFUL) & ~(2ULL << 38)) -static DEFINE_SPINLOCK(amd_iommu_devtable_lock); static DEFINE_SPINLOCK(pd_bitmap_lock); /* List of all available dev_data structures */ @@ -2080,10 +2079,11 @@ static void do_detach(struct iommu_dev_data *dev_data) static int __attach_device(struct iommu_dev_data *dev_data, struct protection_domain *domain) { + unsigned long flags; int ret; /* lock domain */ - spin_lock(>lock); + spin_lock_irqsave(>lock, flags); ret = -EBUSY; if (dev_data->domain != NULL) @@ -2097,7 +2097,7 @@ static int __attach_device(struct iommu_dev_data *dev_data, out_unlock: /* ready */ - spin_unlock(>lock); + spin_unlock_irqrestore(>lock, flags); return ret; } @@ -2181,7 +2181,6 @@ static int attach_device(struct device *dev, { struct pci_dev *pdev; struct iommu_dev_data *dev_data; - unsigned long flags; int ret; dev_data = get_dev_data(dev); @@ -2209,9 +2208,7 @@ static int attach_device(struct device *dev, } skip_ats_check: - spin_lock_irqsave(_iommu_devtable_lock, flags); ret = __attach_device(dev_data, domain); - spin_unlock_irqrestore(_iommu_devtable_lock, flags); /* * We might boot into a crash-kernel here. 
The crashed kernel @@ -2231,14 +2228,15 @@ static int attach_device(struct device *dev, static void __detach_device(struct iommu_dev_data *dev_data) { struct protection_domain *domain; + unsigned long flags; domain = dev_data->domain; - spin_lock(>lock); + spin_lock_irqsave(>lock, flags); do_detach(dev_data); - spin_unlock(>lock); + spin_unlock_irqrestore(>lock, flags); } /* @@ -2248,7 +2246,6 @@ static void detach_device(struct device *dev) { struct protection_domain *domain; struct iommu_dev_data *dev_data; - unsigned long flags; dev_data = get_dev_data(dev); domain = dev_data->domain; @@ -2262,10 +2259,7 @@ static void detach_device(struct device *dev) if (WARN_ON(!dev_data->domain)) return; - /* lock device table */ - spin_lock_irqsave(_iommu_devtable_lock, flags); __detach_device(dev_data); - spin_unlock_irqrestore(_iommu_devtable_lock, flags); if (!dev_is_pci(dev)) return; @@ -2910,9 +2904,6 @@ int __init amd_iommu_init_dma_ops(void) static void cleanup_domain(struct protection_domain *domain) { struct iommu_dev_data *entry; - unsigned long flags; - - spin_lock_irqsave(_iommu_devtable_lock, flags); while (!list_empty(>dev_list)) { entry = list_first_entry(>dev_list, @@ -2920,8 +2911,6 @@ static void cleanup_domain(struct protection_domain *domain) BUG_ON(!entry->domain); __detach_device(entry); } - - spin_unlock_irqrestore(_iommu_devtable_lock, flags); } static void protection_domain_free(struct protection_domain *domain) -- 2.17.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu Reviewed-by: Jerry Snitselaar ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 1/6] iommu/amd: Remove domain->updated
On Wed Sep 25 19, Joerg Roedel wrote: From: Joerg Roedel This struct member was used to track whether a domain change requires updates to the device-table and IOMMU cache flushes. The problem is, that access to this field is racy since locking in the common mapping code-paths has been eliminated. Move the updated field to the stack to get rid of all potential races and remove the field from the struct. Fixes: 92d420ec028d ("iommu/amd: Relax locking in dma_ops path") Signed-off-by: Joerg Roedel --- drivers/iommu/amd_iommu.c | 49 + drivers/iommu/amd_iommu_types.h | 1 - 2 files changed, 25 insertions(+), 25 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 7bdce3b10f3d..042854bbc5bc 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -1458,10 +1458,11 @@ static void free_pagetable(struct protection_domain *domain) * another level increases the size of the address space by 9 bits to a size up * to 64 bits. */ -static void increase_address_space(struct protection_domain *domain, +static bool increase_address_space(struct protection_domain *domain, gfp_t gfp) { unsigned long flags; + bool ret = false; u64 *pte; spin_lock_irqsave(>lock, flags); @@ -1478,19 +1479,21 @@ static void increase_address_space(struct protection_domain *domain, iommu_virt_to_phys(domain->pt_root)); domain->pt_root = pte; domain->mode+= 1; - domain->updated = true; + + ret = true; out: spin_unlock_irqrestore(>lock, flags); - return; + return ret; } static u64 *alloc_pte(struct protection_domain *domain, unsigned long address, unsigned long page_size, u64 **pte_page, - gfp_t gfp) + gfp_t gfp, + bool *updated) { int level, end_lvl; u64 *pte, *page; @@ -1498,7 +1501,7 @@ static u64 *alloc_pte(struct protection_domain *domain, BUG_ON(!is_power_of_2(page_size)); while (address > PM_LEVEL_SIZE(domain->mode)) - increase_address_space(domain, gfp); + *updated = increase_address_space(domain, gfp) || *updated; level = domain->mode - 1; pte = 
>pt_root[PM_LEVEL_INDEX(level, address)]; @@ -1530,7 +1533,7 @@ static u64 *alloc_pte(struct protection_domain *domain, for (i = 0; i < count; ++i) cmpxchg64([i], __pte, 0ULL); - domain->updated = true; + *updated = true; continue; } @@ -1547,7 +1550,7 @@ static u64 *alloc_pte(struct protection_domain *domain, if (cmpxchg64(pte, __pte, __npte) != __pte) free_page((unsigned long)page); else if (IOMMU_PTE_PRESENT(__pte)) - domain->updated = true; + *updated = true; continue; } @@ -1656,28 +1659,29 @@ static int iommu_map_page(struct protection_domain *dom, gfp_t gfp) { struct page *freelist = NULL; + bool updated = false; u64 __pte, *pte; - int i, count; + int ret, i, count; BUG_ON(!IS_ALIGNED(bus_addr, page_size)); BUG_ON(!IS_ALIGNED(phys_addr, page_size)); + ret = -EINVAL; if (!(prot & IOMMU_PROT_MASK)) - return -EINVAL; + goto out; count = PAGE_SIZE_PTE_COUNT(page_size); - pte = alloc_pte(dom, bus_addr, page_size, NULL, gfp); + pte = alloc_pte(dom, bus_addr, page_size, NULL, gfp, ); - if (!pte) { - update_domain(dom); - return -ENOMEM; - } + ret = -ENOMEM; + if (!pte) + goto out; for (i = 0; i < count; ++i) freelist = free_clear_pte([i], pte[i], freelist); if (freelist != NULL) - dom->updated = true; + updated = true; if (count > 1) { __pte = PAGE_SIZE_PTE(__sme_set(phys_addr), page_size); @@ -1693,12 +1697,16 @@ static int iommu_map_page(struct protection_domain *dom, for (i = 0; i < count; ++i) pte[i] = __pte; - update_domain(dom); + ret = 0; + +out: + if (updated) + update_domain(dom); /* Everything flushed out, free pages now */ free_page_list(freelist); - return 0; + return ret; } static unsigned long iommu_unmap_page(struct protection_domain *dom, @@ -2399,15 +2407,10 @@ static void update_device_table(struct protection_domain *domain) static void update_domain(struct protection_domain *domain) { - if (!domain->updated) -
[LINUX PATCH] dma-mapping: Control memset operation using gfp flags
In case of 4k video buffer, the allocation from a reserved memory is taking a long time, ~500ms. This is root caused to the memset() operations on the allocated memory which is consuming more cpu cycles. Due to this delay, we see that initial frames are being dropped. To fix this, we have wrapped the default memset, done when allocating coherent memory, under the __GFP_ZERO flag. So, we only clear allocated memory if __GFP_ZERO flag is enabled. We believe this should be safe as the video decoder always writes before reading. This optimizes decoder initialization as we do not set the __GFP_ZERO flag when allocating memory for decoder. With this optimization, we don't see initial frame drops and decoder initialization time is ~100ms. This patch adds plumbing through dma_alloc functions to pass gfp flag set by user to __dma_alloc_from_coherent(). Here gfp flag is checked for __GFP_ZERO. If present, we memset the buffer to 0 otherwise we skip memset. Signed-off-by: Dylan Yip --- arch/arm/mm/dma-mapping-nommu.c | 2 +- include/linux/dma-mapping.h | 11 +++ kernel/dma/coherent.c | 15 +-- kernel/dma/mapping.c| 2 +- 4 files changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c index 52b8255..242b2c3 100644 --- a/arch/arm/mm/dma-mapping-nommu.c +++ b/arch/arm/mm/dma-mapping-nommu.c @@ -35,7 +35,7 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size, unsigned long attrs) { - void *ret = dma_alloc_from_global_coherent(size, dma_handle); + void *ret = dma_alloc_from_global_coherent(size, dma_handle, gfp); /* * dma_alloc_from_global_coherent() may fail because: diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index f7d1eea..b715c9f 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -160,24 +160,27 @@ static inline int is_device_dma_capable(struct device *dev) * Don't use them in device drivers. 
*/ int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size, - dma_addr_t *dma_handle, void **ret); + dma_addr_t *dma_handle, void **ret, + gfp_t flag); int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr); int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, size_t size, int *ret); -void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle); +void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle, +gfp_t flag); int dma_release_from_global_coherent(int order, void *vaddr); int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr, size_t size, int *ret); #else -#define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0) +#define dma_alloc_from_dev_coherent(dev, size, handle, ret, flag) (0) #define dma_release_from_dev_coherent(dev, order, vaddr) (0) #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0) static inline void *dma_alloc_from_global_coherent(ssize_t size, - dma_addr_t *dma_handle) + dma_addr_t *dma_handle, + gfp_t flag) { return NULL; } diff --git a/kernel/dma/coherent.c b/kernel/dma/coherent.c index 29fd659..d85fab5 100644 --- a/kernel/dma/coherent.c +++ b/kernel/dma/coherent.c @@ -136,7 +136,7 @@ void dma_release_declared_memory(struct device *dev) EXPORT_SYMBOL(dma_release_declared_memory); static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, - ssize_t size, dma_addr_t *dma_handle) + ssize_t size, dma_addr_t *dma_handle, gfp_t gfp_flag) { int order = get_order(size); unsigned long flags; @@ -158,7 +158,8 @@ static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem, *dma_handle = mem->device_base + (pageno << PAGE_SHIFT); ret = mem->virt_base + (pageno << PAGE_SHIFT); spin_unlock_irqrestore(>spinlock, flags); - memset(ret, 0, size); + if (gfp_flag & __GFP_ZERO) + memset(ret, 0, size); return ret; err: spin_unlock_irqrestore(>spinlock, flags); @@ -172,6 +173,7 @@ static void 
*__dma_alloc_from_coherent(struct dma_coherent_mem *mem, * @dma_handle:This will be filled with the correct dma handle * @ret: This pointer will be filled with the virtual address * to allocated area. + * @flag: gfp flag set by user * * This function should be only called from per-arch dma_alloc_coherent() * to support allocation from per-device coherent memory pools. @@ -180,24 +182,25