Re: [PATCH v6 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA
FYI, as of the last one I'm fine now, but I really need an ACK from the
arm64 maintainers.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 1/2] dma-contiguous: provide the ability to reserve per-numa CMA
On 8/20/20 7:26 PM, Barry Song wrote:
>
>
> Cc: Jonathan Cameron
> Cc: Christoph Hellwig
> Cc: Marek Szyprowski
> Cc: Will Deacon
> Cc: Robin Murphy
> Cc: Ganapatrao Kulkarni
> Cc: Catalin Marinas
> Cc: Nicolas Saenz Julienne
> Cc: Steve Capper
> Cc: Andrew Morton
> Cc: Mike Rapoport
> Signed-off-by: Barry Song
> ---
> v6: rebase on top of 5.9-rc1;
>     doc cleanup
>
>  .../admin-guide/kernel-parameters.txt |   9 ++
>  include/linux/dma-contiguous.h        |   6 ++
>  kernel/dma/Kconfig                    |  10 ++
>  kernel/dma/contiguous.c               | 100 --
>  4 files changed, 115 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bdc1f33fd3d1..3f33b89aeab5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -599,6 +599,15 @@
>  			altogether. For more information, see
>  			include/linux/dma-contiguous.h
>
> +	pernuma_cma=nn[MG]

memparse() allows any one of these suffixes: K, M, G, T, P, E
and nothing in the option parsing function cares what suffix is used...

> +			[ARM64,KNL]
> +			Sets the size of kernel per-numa memory area for
> +			contiguous memory allocations. A value of 0 disables
> +			per-numa CMA altogether. DMA users on node nid will
> +			first try to allocate buffer from the pernuma area
> +			which is located in node nid, if the allocation fails,
> +			they will fallback to the global default memory area.
> +
>  	cmo_free_hint=	[PPC] Format: { yes | no }
>  			Specify whether pages are marked as being inactive
>  			when they are freed.  This is used in CMO environments

> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index cff7e60968b9..89b95f10e56d 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -69,6 +69,19 @@ static int __init early_cma(char *p)
>  }
>  early_param("cma", early_cma);
>
> +#ifdef CONFIG_DMA_PERNUMA_CMA
> +
> +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
> +static phys_addr_t pernuma_size_bytes __initdata;

why phys_addr_t? couldn't it just be unsigned long long?

OK, so cma_declare_contiguous_nid() uses phys_addr_t. Fine.

> +
> +static int __init early_pernuma_cma(char *p)
> +{
> +	pernuma_size_bytes = memparse(p, &p);
> +	return 0;
> +}
> +early_param("pernuma_cma", early_pernuma_cma);
> +#endif
> +
>  #ifdef CONFIG_CMA_SIZE_PERCENTAGE
>
>  static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
> @@ -96,6 +109,34 @@ static inline __maybe_unused phys_addr_t cma_early_percent_memory(void)
>
>  #endif
>
> +#ifdef CONFIG_DMA_PERNUMA_CMA
> +void __init dma_pernuma_cma_reserve(void)
> +{
> +	int nid;
> +
> +	if (!pernuma_size_bytes)
> +		return;
> +
> +	for_each_node_state(nid, N_ONLINE) {
> +		int ret;
> +		char name[20];
> +		struct cma **cma = &dma_contiguous_pernuma_area[nid];
> +
> +		snprintf(name, sizeof(name), "pernuma%d", nid);
> +		ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
> +						 0, false, name, cma, nid);
> +		if (ret) {
> +			pr_warn("%s: reservation failed: err %d, node %d", __func__,
> +				ret, nid);
> +			continue;
> +		}
> +
> +		pr_debug("%s: reserved %llu MiB on node %d\n", __func__,
> +			(unsigned long long)pernuma_size_bytes / SZ_1M, nid);

Conversely, if you want to leave pernuma_size_bytes as phys_addr_t,
you should use %pa (or %pap) to print it.

> +	}
> +}
> +#endif

-- 
~Randy
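Randy's first comment is about how memparse() treats size suffixes. As a rough userspace illustration (not the kernel implementation: `sketch_memparse` is a made-up name, and it uses the C library's strtoull() in place of the kernel's own number parser), the suffix handling can be pictured as a chain of 10-bit shifts that each suffix enters at a different point:

```c
#include <stdlib.h>

/*
 * Userspace approximation of suffix parsing in the style of memparse().
 * sketch_memparse is a hypothetical name used for illustration only.
 */
static unsigned long long sketch_memparse(const char *ptr, char **retptr)
{
	char *end;
	unsigned long long ret = strtoull(ptr, &end, 0);

	/* Each larger suffix falls through the smaller ones below it. */
	switch (*end) {
	case 'E': case 'e':
		ret <<= 10;
		/* fall through */
	case 'P': case 'p':
		ret <<= 10;
		/* fall through */
	case 'T': case 't':
		ret <<= 10;
		/* fall through */
	case 'G': case 'g':
		ret <<= 10;
		/* fall through */
	case 'M': case 'm':
		ret <<= 10;
		/* fall through */
	case 'K': case 'k':
		ret <<= 10;
		end++;		/* consume the suffix character */
		break;
	default:
		break;		/* no suffix: plain byte count */
	}

	if (retptr)
		*retptr = end;
	return ret;
}
```

Under this sketch, "64M" and "65536K" produce the same byte count, which is the substance of the review comment: documenting only [MG] understates what the parser accepts.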
[PATCH v6 1/2] dma-contiguous: provide the ability to reserve per-numa CMA
Right now, drivers like ARM SMMU are using dma_alloc_coherent() to get coherent DMA buffers to save their command queues and page tables. As there is only one default CMA in the whole system, SMMUs on nodes other than node0 will get remote memory. This leads to significant latency. This patch provides per-numa CMA so that drivers like SMMU can get local memory. Tests show localizing CMA can decrease dma_unmap latency much. For instance, before this patch, SMMU on node2 has to wait for more than 560ns for the completion of CMD_SYNC in an empty command queue; with this patch, it needs 240ns only. A positive side effect of this patch would be improving performance even further for those users who are worried about performance more than DMA security and use iommu.passthrough=1 to skip IOMMU. With local CMA, all drivers can get local coherent DMA buffers. Cc: Jonathan Cameron Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Will Deacon Cc: Robin Murphy Cc: Ganapatrao Kulkarni Cc: Catalin Marinas Cc: Nicolas Saenz Julienne Cc: Steve Capper Cc: Andrew Morton Cc: Mike Rapoport Signed-off-by: Barry Song --- v6: rebase on top of 5.9-rc1; doc cleanup .../admin-guide/kernel-parameters.txt | 9 ++ include/linux/dma-contiguous.h| 6 ++ kernel/dma/Kconfig| 10 ++ kernel/dma/contiguous.c | 100 -- 4 files changed, 115 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index bdc1f33fd3d1..3f33b89aeab5 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -599,6 +599,15 @@ altogether. For more information, see include/linux/dma-contiguous.h + pernuma_cma=nn[MG] + [ARM64,KNL] + Sets the size of kernel per-numa memory area for + contiguous memory allocations. A value of 0 disables + per-numa CMA altogether. 
DMA users on node nid will + first try to allocate buffer from the pernuma area + which is located in node nid, if the allocation fails, + they will fallback to the global default memory area. + cmo_free_hint= [PPC] Format: { yes | no } Specify whether pages are marked as being inactive when they are freed. This is used in CMO environments diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h index 03f8e98e3bcc..fe55e004f1f4 100644 --- a/include/linux/dma-contiguous.h +++ b/include/linux/dma-contiguous.h @@ -171,6 +171,12 @@ static inline void dma_free_contiguous(struct device *dev, struct page *page, #endif +#ifdef CONFIG_DMA_PERNUMA_CMA +void dma_pernuma_cma_reserve(void); +#else +static inline void dma_pernuma_cma_reserve(void) { } +#endif + #endif #endif diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 847a9d1fa634..db7a37ed35eb 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -118,6 +118,16 @@ config DMA_CMA If unsure, say "n". if DMA_CMA + +config DMA_PERNUMA_CMA + bool "Enable separate DMA Contiguous Memory Area for each NUMA Node" + help + Enable this option to get pernuma CMA areas so that devices like + ARM64 SMMU can get local memory by DMA coherent APIs. + + You can set the size of pernuma CMA by specifying "pernuma_cma=size" + on the kernel's command line. 
+ comment "Default contiguous memory area size:" config CMA_SIZE_MBYTES diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c index cff7e60968b9..89b95f10e56d 100644 --- a/kernel/dma/contiguous.c +++ b/kernel/dma/contiguous.c @@ -69,6 +69,19 @@ static int __init early_cma(char *p) } early_param("cma", early_cma); +#ifdef CONFIG_DMA_PERNUMA_CMA + +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES]; +static phys_addr_t pernuma_size_bytes __initdata; + +static int __init early_pernuma_cma(char *p) +{ + pernuma_size_bytes = memparse(p, &p); + return 0; +} +early_param("pernuma_cma", early_pernuma_cma); +#endif + #ifdef CONFIG_CMA_SIZE_PERCENTAGE static phys_addr_t __init __maybe_unused cma_early_percent_memory(void) @@ -96,6 +109,34 @@ static inline __maybe_unused phys_addr_t cma_early_percent_memory(void) #endif +#ifdef CONFIG_DMA_PERNUMA_CMA +void __init dma_pernuma_cma_reserve(void) +{ + int nid; + + if (!pernuma_size_bytes) + return; + + for_each_node_state(nid, N_ONLINE) { + int ret; + char name[20]; + struct cma **cma = &dma_contiguous_pernuma_area[nid]; + + snprintf(name, sizeof(name), "pernuma%d", nid); + ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
[PATCH v6 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA
Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
memory to save command queues[1]. I also did a similar job in the patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2] without realizing Ganapatrao had done that before.

But it seems much better to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on an ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
Saving queues and tables remotely will increase the latency of ARM SMMU
significantly. For example, when SMMU is at node2 and the default global
CMA is at node0, after sending a CMD_SYNC in an empty command queue, we
have to wait more than 550ns for the completion of the command CMD_SYNC.
However, if we save them locally, we only need to wait for 240ns.

With per-numa CMA, smmu will get memory from the local numa node to save
command queues and page tables. That means dma_unmap latency will be
reduced considerably.
Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid the travel
between numa nodes.

I only have ARM64 server platforms to test, but I believe this patch will
benefit X86 somehow. Hopefully, some X86 guys will bring it up on x86.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html

-v6:
  * rebase on top of 5.9-rc1
  * doc cleanup

-v5:
  refine code according to Christoph Hellwig's comments
  * remove Kconfig option for pernuma cma size;
  * add Kconfig option for pernuma cma enable;
  * code cleanup like line over 80 char

  I haven't removed the cma NULL check code in cma_alloc() as it requires
  a bundle of other changes. So I prefer to handle this issue separately.

-v4:
  * rebase on top of Christoph Hellwig's patch:
    [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
    https://lore.kernel.org/linux-iommu/20200723120133.94105-1-...@lst.de/
  * cleanup according to Christoph's comment
  * rebase on top of linux-next to avoid arch/arm64 conflicts
  * reserve cma by checking N_MEMORY rather than N_ONLINE

-v3:
  * move to use page_to_nid() while freeing cma with respect to Robin's
    comment, but this will only work after applying my below patch:
    "mm/cma.c: use exact_nid true to fix possible per-numa cma leak"
    https://marc.info/?l=linux-mm&m=159333034726647&w=2

  * handle the case count <= 1 more properly according to Robin's
    comment;

  * add pernuma_cma parameter to support dynamic setting of per-numa
    cma size;
    ideally we can leverage the CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and
    "cma=" kernel parameter and avoid a new parameter separately for
    per-numa cma. Practically, it is really too complicated considering
    the below problems:
    (1) if we leverage the size of default numa for per-numa, we have to
        avoid creating two cma with same size in node0 since default cma
        is probably on node0.
    (2) default cma can consider the address limitation for old devices
        while per-numa cma doesn't support GFP_DMA and GFP_DMA32. All
        allocations with limitation flags will fallback to the default
        one.
    (3) hard to apply CMA_SIZE_PERCENTAGE to per-numa. It is hard to
        decide if the percentage should apply to the whole memory size
        or only apply to the memory size of a specific numa node.
    (4) default cma size has CMA_SIZE_SEL_MIN and CMA_SIZE_SEL_MAX, it
        makes things even more complicated for per-numa cma.

    I haven't figured out a good way to leverage the size of default cma
    for per-numa cma. It seems a separate parameter for per-numa could
    make life easier.

  * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() to
    reuse the comment before hugetlb_cma_reserve() with respect to
    Robin's comment

-v2:
  * fix some issues reported by kernel test robot
  * fallback to default cma while allocation fails in per-numa cma
    free memory properly

Barry Song (2):
  dma-contiguous: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA to localize coherent dma buffers

 .../admin-guide/kernel-parameters.txt |   9 ++
 arch/arm64/mm/init.c                  |   2 +
 include/linux/dma-contiguous.h        |   6 ++
 kernel/dma/Kconfig                    |  10 ++
 kernel/dma/contiguous.c               | 100 --
 5 files changed, 117 insertions(+), 10 deletions(-)

-- 
2.27.0
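The allocation policy described in this cover letter (try the per-NUMA area of the device's node first, fall back to the default area if that fails) can be modelled in a few lines of userspace C. This is only a toy model under stated assumptions: every `sketch_*` name, the page-counter representation of a CMA area and the fixed node count are invented here and do not correspond to kernel APIs.

```c
#include <stddef.h>

#define SKETCH_MAX_NODES 4	/* assumed node count for the toy model */

/* Toy stand-in for a CMA area: just a count of free pages. */
struct sketch_area {
	size_t free_pages;
};

struct sketch_system {
	struct sketch_area pernuma[SKETCH_MAX_NODES];	/* one area per node */
	struct sketch_area global;			/* default area */
};

enum sketch_source {
	SKETCH_FROM_LOCAL,	/* served by the node-local area */
	SKETCH_FROM_DEFAULT,	/* fell back to the default area */
	SKETCH_FAILED,		/* neither area could satisfy the request */
};

/* Take 'count' pages from an area; returns 1 on success, 0 on failure. */
static int sketch_area_alloc(struct sketch_area *area, size_t count)
{
	if (area->free_pages < count)
		return 0;
	area->free_pages -= count;
	return 1;
}

/* Local area first, default area second. */
static enum sketch_source sketch_alloc(struct sketch_system *sys, int nid,
				       size_t count)
{
	if (nid >= 0 && nid < SKETCH_MAX_NODES &&
	    sketch_area_alloc(&sys->pernuma[nid], count))
		return SKETCH_FROM_LOCAL;

	if (sketch_area_alloc(&sys->global, count))
		return SKETCH_FROM_DEFAULT;

	return SKETCH_FAILED;
}
```

The model leaves out the address-limit case from the v3 notes (requests with GFP_DMA or GFP_DMA32 going straight to the default area); it only shows the ordering of the two attempts.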
[PATCH v6 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers
Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on an ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
With this patch, smmu will get memory from the local numa node to save
command queues and page tables. That means dma_unmap latency will be
reduced considerably.
Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid the travel
between numa nodes.

Cc: Christoph Hellwig
Cc: Marek Szyprowski
Cc: Will Deacon
Cc: Robin Murphy
Cc: Ganapatrao Kulkarni
Cc: Catalin Marinas
Cc: Nicolas Saenz Julienne
Cc: Steve Capper
Cc: Andrew Morton
Cc: Mike Rapoport
Signed-off-by: Barry Song
---
 -v6: rebase on top of 5.9-rc1

 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..f1c75957ff3c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -429,6 +429,8 @@ void __init bootmem_init(void)
 	arm64_hugetlb_cma_reserve();
 #endif
 
+	dma_pernuma_cma_reserve();
+
 	/*
 	 * sparse_init() tries to allocate memory from memblock, so must be
 	 * done after the fixed reservations
-- 
2.27.0
RE: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
Hi Alex, > From: Alex Williamson > Sent: Friday, August 21, 2020 9:49 AM > > On Fri, 21 Aug 2020 00:37:19 + > "Liu, Yi L" wrote: > > > Hi Alex, > > > > > From: Alex Williamson > > > Sent: Friday, August 21, 2020 4:51 AM > > > > > > On Mon, 27 Jul 2020 23:27:36 -0700 > > > Liu Yi L wrote: > > > > > > > This patch allows userspace to request PASID allocation/free, e.g. > > > > when serving the request from the guest. > > > > > > > > PASIDs that are not freed by userspace are automatically freed when > > > > the IOASID set is destroyed when process exits. > > > > > > > > Cc: Kevin Tian > > > > CC: Jacob Pan > > > > Cc: Alex Williamson > > > > Cc: Eric Auger > > > > Cc: Jean-Philippe Brucker > > > > Cc: Joerg Roedel > > > > Cc: Lu Baolu > > > > Signed-off-by: Liu Yi L > > > > Signed-off-by: Yi Sun > > > > Signed-off-by: Jacob Pan > > > > --- > > > > v5 -> v6: > > > > *) address comments from Eric against v5. remove the alloc/free helper. > > > > > > > > v4 -> v5: > > > > *) address comments from Eric Auger. > > > > *) the comments for the PASID_FREE request is addressed in patch 5/15 of > > > >this series. > > > > > > > > v3 -> v4: > > > > *) address comments from v3, except the below comment against the range > > > >of PASID_FREE request. needs more help on it. > > > > "> +if (req.range.min > req.range.max) > > > > > > > > Is it exploitable that a user can spin the kernel for a long time > > > > in > > > > the case of a free by calling this with [0, MAX_UINT] regardless of > > > > their actual allocations?" 
> > > > > > > > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/ > > > > > > > > v1 -> v2: > > > > *) move the vfio_mm related code to be a seprate module > > > > *) use a single structure for alloc/free, could support a range of > > > > PASIDs > > > > *) fetch vfio_mm at group_attach time instead of at iommu driver open > > > > time > > > > --- > > > > drivers/vfio/Kconfig| 1 + > > > > drivers/vfio/vfio_iommu_type1.c | 69 > > > + > > > > drivers/vfio/vfio_pasid.c | 10 ++ > > > > include/linux/vfio.h| 6 > > > > include/uapi/linux/vfio.h | 37 ++ > > > > 5 files changed, 123 insertions(+) > > > > > > > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index > > > > 3d8a108..95d90c6 100644 > > > > --- a/drivers/vfio/Kconfig > > > > +++ b/drivers/vfio/Kconfig > > > > @@ -2,6 +2,7 @@ > > > > config VFIO_IOMMU_TYPE1 > > > > tristate > > > > depends on VFIO > > > > + select VFIO_PASID if (X86) > > > > default n > > > > > > > > config VFIO_IOMMU_SPAPR_TCE > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644 > > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > > @@ -76,6 +76,7 @@ struct vfio_iommu { > > > > booldirty_page_tracking; > > > > boolpinned_page_dirty_scope; > > > > struct iommu_nesting_info *nesting_info; > > > > + struct vfio_mm *vmm; > > > > }; > > > > > > > > struct vfio_domain { > > > > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct > > > > vfio_iommu *iommu, > > > > > > > > static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu) > > > > { > > > > + if (iommu->vmm) { > > > > + vfio_mm_put(iommu->vmm); > > > > + iommu->vmm = NULL; > > > > + } > > > > + > > > > kfree(iommu->nesting_info); > > > > iommu->nesting_info = NULL; > > > > } > > > > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void > > > *iommu_data, > > > > iommu->nesting_info); > > > > if (ret) > > > > 
goto out_detach; > > > > + > > > > + if (iommu->nesting_info->features & > > > > + > > > > IOMMU_NESTING_FEAT_SYSWIDE_PASID) > > > { > > > > + struct vfio_mm *vmm; > > > > + int sid; > > > > + > > > > + vmm = vfio_mm_get_from_task(current); > > > > + if (IS_ERR(vmm)) { > > > > + ret = PTR_ERR(vmm); > > > > + goto out_detach; > > > > + } > > > > + iommu->vmm = vmm; > > > > + > > > > + sid = vfio_mm_ioasid_sid(vmm); > > > > + ret = iommu_domain_set_attr(domain->domain, > > > > + > > > > DOMAIN_ATTR_IOASID_SID, > > > > + &sid); > > > > + if (ret) > > > > + goto out_detach; > >
[patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain
To allow utilizing the irq domain pointer in struct device it is necessary
to make XEN/MSI irq domain compatible.

While the right solution would be to truly convert XEN to irq domains, this
is an exercise which is not possible for mere mortals with limited XENology.

Provide a plain irqdomain wrapper around XEN. While this is a blatant
violation of the irqdomain design, it's the only solution for a XEN ignorant
person to make progress on the issue which triggered this change.

Signed-off-by: Thomas Gleixner
Cc: linux-...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
---
Note: This is completely untested, but it compiles so it must be perfect.
---
 arch/x86/pci/xen.c |   63 +
 1 file changed, 63 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -406,6 +406,63 @@ static void xen_teardown_msi_irq(unsigne
 	WARN_ON_ONCE(1);
 }
 
+static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
+				     struct device *dev, int nvec)
+{
+	int type;
+
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return -EINVAL;
+
+	if (first_msi_entry(dev)->msi_attrib.is_msix)
+		type = PCI_CAP_ID_MSIX;
+	else
+		type = PCI_CAP_ID_MSI;
+
+	return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+}
+
+static void xen_msi_domain_free_irqs(struct irq_domain *domain,
+				     struct device *dev)
+{
+	if (WARN_ON_ONCE(!dev_is_pci(dev)))
+		return;
+
+	x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+}
+
+static struct msi_domain_ops xen_pci_msi_domain_ops = {
+	.domain_alloc_irqs	= xen_msi_domain_alloc_irqs,
+	.domain_free_irqs	= xen_msi_domain_free_irqs,
+};
+
+static struct msi_domain_info xen_pci_msi_domain_info = {
+	.ops			= &xen_pci_msi_domain_ops,
+};
+
+/*
+ * This irq domain is a blatant violation of the irq domain design, but
+ * disentangling XEN into real irq domains is not a job for mere mortals with
+ * limited XENology. But it's the least dangerous way for a mere mortal to
+ * get rid of the arch_*_msi_irqs() hackery in order to store the irq
+ * domain pointer in struct device. This irq domain wrappery allows to do
+ * that without breaking XEN terminally.
+ */
+static __init struct irq_domain *xen_create_pci_msi_domain(void)
+{
+	struct irq_domain *d = NULL;
+	struct fwnode_handle *fn;
+
+	fn = irq_domain_alloc_named_fwnode("XEN-MSI");
+	if (fn)
+		d = msi_create_irq_domain(fn, &xen_pci_msi_domain_info, NULL);
+
+	/* FIXME: No idea how to survive if this fails */
+	BUG_ON(!d);
+
+	return d;
+}
+
 static __init void xen_setup_pci_msi(void)
 {
 	if (xen_initial_domain()) {
@@ -426,6 +483,12 @@ static __init void xen_setup_pci_msi(voi
 	}
 
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+
+	/*
+	 * Override the PCI/MSI irq domain init function. No point
+	 * in allocating the native domain and never use it.
+	 */
+	x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
 }
 
 #else /* CONFIG_PCI_MSI */
[patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper
Provide a helper function to check whether a PCI device is handled by a
non-standard PCI/MSI domain. This will be used to exclude devices which
hang off a special bus, e.g. VMD, from the irq domain override in irq
remapping.

Signed-off-by: Thomas Gleixner
Cc: Bjorn Helgaas
Cc: linux-...@vger.kernel.org
---
 drivers/pci/msi.c   |   22 ++
 include/linux/msi.h |    1 +
 2 files changed, 23 insertions(+)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1553,4 +1553,26 @@ struct irq_domain *pci_msi_get_device_do
 					     DOMAIN_BUS_PCI_MSI);
 	return dom;
 }
+
+/**
+ * pci_dev_has_special_msi_domain - Check whether the device is handled by
+ *				    a non-standard PCI-MSI domain
+ * @pdev:	The PCI device to check.
+ *
+ * Returns: True if the device irqdomain or the bus irqdomain is
+ * non-standard PCI/MSI.
+ */
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
+{
+	struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
+
+	if (!dom)
+		dom = dev_get_msi_domain(&pdev->bus->dev);
+
+	if (!dom)
+		return true;
+
+	return dom->bus_token != DOMAIN_BUS_PCI_MSI;
+}
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -374,6 +374,7 @@ int pci_msi_domain_check_cap(struct irq_
 			     struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev)
 {
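The decision logic of the helper is small enough to model in userspace. The following is a minimal sketch, assuming reduced stand-in types: the `sketch_*` structures and enum are hypothetical and carry only the fields the check needs. They are not the kernel's struct pci_dev or struct irq_domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel types. */
enum sketch_bus_token {
	SKETCH_BUS_PCI_MSI,	/* regular PCI/MSI domain */
	SKETCH_BUS_VMD_MSI,	/* example of a non-standard domain */
};

struct sketch_irq_domain {
	enum sketch_bus_token bus_token;
};

struct sketch_pci_dev {
	struct sketch_irq_domain *dev_domain;	/* domain set on the device */
	struct sketch_irq_domain *bus_domain;	/* domain set on its bus */
};

static bool sketch_has_special_msi_domain(const struct sketch_pci_dev *pdev)
{
	const struct sketch_irq_domain *dom = pdev->dev_domain;

	/* Fall back to the bus domain when the device has none. */
	if (!dom)
		dom = pdev->bus_domain;

	/* No domain at all is treated as "special", as in the patch. */
	if (!dom)
		return true;

	return dom->bus_token != SKETCH_BUS_PCI_MSI;
}
```

Note the conservative default: a device with no discoverable domain is reported as special, so a caller deciding whether to override the irq domain pointer would leave such a device alone.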
[patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
Devices on the VMD bus use their own MSI irq domain, but it is not
distinguishable from regular PCI/MSI irq domains. Being able to tell them
apart is required to exclude VMD devices from getting the irq domain pointer
set by interrupt remapping.

Override the default bus token.

Signed-off-by: Thomas Gleixner
Cc: Bjorn Helgaas
Cc: Lorenzo Pieralisi
Cc: Jonathan Derrick
Cc: linux-...@vger.kernel.org
---
 drivers/pci/controller/vmd.c |    6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
 		return -ENODEV;
 	}
 
+	/*
+	 * Override the irq domain bus token so the domain can be distinguished
+	 * from a regular PCI/MSI domain.
+	 */
+	irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+
 	pci_add_resource(&resources, &vmd->resources[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
 	pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);
[patch RFC 36/38] platform-msi: Add device MSI infrastructure
Add device specific MSI domain infrastructure for devices which have their own resource management and interrupt chip. These devices are not related to PCI and contrary to platform MSI they do not share a common resource and interrupt chip. They provide their own domain specific resource management and interrupt chip. This utilizes the new alloc/free override in a non evil way which avoids having yet another set of specialized alloc/free functions. Just using msi_domain_alloc/free_irqs() is sufficient While initially it was suggested and tried to piggyback device MSI on platform MSI, the better variant is to reimplement platform MSI on top of device MSI. Signed-off-by: Thomas Gleixner Cc: Greg Kroah-Hartman Cc: Marc Zyngier Cc: "Rafael J. Wysocki" --- drivers/base/platform-msi.c | 129 include/linux/irqdomain.h |1 include/linux/msi.h | 24 kernel/irq/Kconfig |4 + 4 files changed, 158 insertions(+) --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -412,3 +412,132 @@ int platform_msi_domain_alloc(struct irq return err; } + +#ifdef CONFIG_DEVICE_MSI +/* + * Device specific MSI domain infrastructure for devices which have their + * own resource management and interrupt chip. These devices are not + * related to PCI and contrary to platform MSI they do not share a common + * resource and interrupt chip. They provide their own domain specific + * resource management and interrupt chip. + */ + +static void device_msi_free_msi_entries(struct device *dev) +{ + struct list_head *msi_list = dev_to_msi_list(dev); + struct msi_desc *entry, *tmp; + + list_for_each_entry_safe(entry, tmp, msi_list, list) { + list_del(&entry->list); + free_msi_entry(entry); + } +} + +/** + * device_msi_free_irqs - Free MSI interrupts assigned to a device + * @dev: Pointer to the device + * + * Frees the interrupt and the MSI descriptors. 
+ */ +static void device_msi_free_irqs(struct irq_domain *domain, struct device *dev) +{ + __msi_domain_free_irqs(domain, dev); + device_msi_free_msi_entries(dev); +} + +/** + * device_msi_alloc_irqs - Allocate MSI interrupts for a device + * @dev: Pointer to the device + * @nvec: Number of vectors + * + * Allocates the required number of MSI descriptors and the corresponding + * interrupt descriptors. + */ +static int device_msi_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec) +{ + int i, ret = -ENOMEM; + + for (i = 0; i < nvec; i++) { + struct msi_desc *entry = alloc_msi_entry(dev, 1, NULL); + + if (!entry) + goto fail; + list_add_tail(&entry->list, dev_to_msi_list(dev)); + } + + ret = __msi_domain_alloc_irqs(domain, dev, nvec); + if (!ret) + return 0; +fail: + device_msi_free_msi_entries(dev); + return ret; +} + +static void device_msi_update_dom_ops(struct msi_domain_info *info) +{ + if (!info->ops->domain_alloc_irqs) + info->ops->domain_alloc_irqs = device_msi_alloc_irqs; + if (!info->ops->domain_free_irqs) + info->ops->domain_free_irqs = device_msi_free_irqs; + if (!info->ops->msi_prepare) + info->ops->msi_prepare = arch_msi_prepare; +} + +/** + * device_msi_create_msi_irq_domain - Create an irq domain for devices + * @fwnode:Firmware node of the interrupt controller + * @info: MSI domain info to configure the new domain + * @parent:Parent domain + */ +struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn, + struct msi_domain_info *info, + struct irq_domain *parent) +{ + struct irq_domain *domain; + + if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS) + platform_msi_update_chip_ops(info); + + if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS) + device_msi_update_dom_ops(info); + + domain = msi_create_irq_domain(fn, info, parent); + if (domain) + irq_domain_update_bus_token(domain, DOMAIN_BUS_DEVICE_MSI); + return domain; +} + +#ifdef CONFIG_PCI +#include + +/** + * pci_subdevice_msi_create_irq_domain - Create an irq domain for 
subdevices + * @pdev: Pointer to PCI device for which the subdevice domain is created + * @info: MSI domain info to configure the new domain + */ +struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev, + struct msi_domain_info *info) +{ + struct irq_domain *domain, *pdev_msi; + struct fwnode_handle *fn; + + /* +* Retrieve the parent domain of the underlying PCI device's MSI +* domain. This is going to be the parent of the new subdevice +* domain as well. +
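The allocation path in device_msi_alloc_irqs() follows a common pattern: create one descriptor per vector, run the core allocation, and on any failure release every descriptor created so far. A userspace sketch of that pattern is shown here under loud assumptions: all `sketch_*` names are invented, a hand-rolled singly linked list replaces the kernel's list_head, and a callback stands in for __msi_domain_alloc_irqs().

```c
#include <stdlib.h>

struct sketch_desc {
	struct sketch_desc *next;
};

struct sketch_dev {
	struct sketch_desc *descs;	/* list of descriptors owned by the device */
	int ndescs;
};

/* Release every descriptor attached to the device. */
static void sketch_free_descs(struct sketch_dev *dev)
{
	while (dev->descs) {
		struct sketch_desc *d = dev->descs;

		dev->descs = d->next;
		free(d);
	}
	dev->ndescs = 0;
}

/*
 * Allocate nvec descriptors, then run the core allocation step.
 * core_alloc is a placeholder for the generic allocator and returns 0
 * on success. Any failure rolls back all descriptors.
 */
static int sketch_dev_alloc_irqs(struct sketch_dev *dev, int nvec,
				 int (*core_alloc)(struct sketch_dev *, int))
{
	int i, ret = -1;

	for (i = 0; i < nvec; i++) {
		struct sketch_desc *d = calloc(1, sizeof(*d));

		if (!d)
			goto fail;
		/* Head insertion for brevity; the patch appends at the tail. */
		d->next = dev->descs;
		dev->descs = d;
		dev->ndescs++;
	}

	ret = core_alloc(dev, nvec);
	if (!ret)
		return 0;
fail:
	sketch_free_descs(dev);
	return ret;
}

/* Two trivial core allocators used to exercise both paths. */
static int sketch_core_ok(struct sketch_dev *dev, int nvec)
{
	(void)dev;
	(void)nvec;
	return 0;
}

static int sketch_core_fail(struct sketch_dev *dev, int nvec)
{
	(void)dev;
	(void)nvec;
	return -5;
}
```

Keeping the rollback in one place behind a single label mirrors the structure of the patch, where both the descriptor allocation failure and the core allocation failure end in the same cleanup call.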
[patch RFC 25/38] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
To support MSI irq domains which do not fit at all into the regular MSI irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0, it's necessary to allow to override the alloc/free implementation. This is a preperatory step to switch X86 away from arch_*_msi_irqs() and store the irq domain pointer right in struct device. No functional change for existing MSI irq domain users. Aside of the evil XEN wrapper this is also useful for special MSI domains which need to do extra alloc/free work before/after calling the generic core function. Work like allocating/freeing MSI descriptors, MSI storage space etc. Signed-off-by: Thomas Gleixner Cc: Marc Zyngier --- include/linux/msi.h | 27 kernel/irq/msi.c| 70 +++- 2 files changed, 75 insertions(+), 22 deletions(-) --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -241,6 +241,10 @@ struct msi_domain_info; * @msi_finish:Optional callback to finalize the allocation * @set_desc: Set the msi descriptor for an interrupt * @handle_error: Optional error handler if the allocation fails + * @domain_alloc_irqs: Optional function to override the default allocation + * function. + * @domain_free_irqs: Optional function to override the default free + * function. * * @get_hwirq, @msi_init and @msi_free are callbacks used by * msi_create_irq_domain() and related interfaces @@ -248,6 +252,22 @@ struct msi_domain_info; * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error * are callbacks used by msi_domain_alloc_irqs() and related * interfaces which are based on msi_desc. + * + * @domain_alloc_irqs, @domain_free_irqs can be used to override the + * default allocation/free functions (__msi_domain_alloc/free_irqs). This + * is initially for a wrapper around XENs seperate MSI universe which can't + * be wrapped into the regular irq domains concepts by mere mortals. This + * allows to universally use msi_domain_alloc/free_irqs without having to + * special case XEN all over the place. 
+ * + * Contrary to other operations @domain_alloc_irqs and @domain_free_irqs + * are set to the default implementation if NULL and even when + * MSI_FLAG_USE_DEF_DOM_OPS is not set to avoid breaking existing users and + * because these callbacks are obviously mandatory. + * + * This is NOT meant to be abused, but it can be useful to build wrappers + * for specialized MSI irq domains which need extra work before and after + * calling __msi_domain_alloc_irqs()/__msi_domain_free_irqs(). */ struct msi_domain_ops { irq_hw_number_t (*get_hwirq)(struct msi_domain_info *info, @@ -270,6 +290,10 @@ struct msi_domain_ops { struct msi_desc *desc); int (*handle_error)(struct irq_domain *domain, struct msi_desc *desc, int error); + int (*domain_alloc_irqs)(struct irq_domain *domain, +struct device *dev, int nvec); + void(*domain_free_irqs)(struct irq_domain *domain, + struct device *dev); }; /** @@ -327,8 +351,11 @@ int msi_domain_set_affinity(struct irq_d struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode, struct msi_domain_info *info, struct irq_domain *parent); +int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, + int nvec); int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec); +void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev); void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev); struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain); --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -229,11 +229,13 @@ static int msi_domain_ops_check(struct i } static struct msi_domain_ops msi_domain_ops_default = { - .get_hwirq = msi_domain_ops_get_hwirq, - .msi_init = msi_domain_ops_init, - .msi_check = msi_domain_ops_check, - .msi_prepare= msi_domain_ops_prepare, - .set_desc = msi_domain_ops_set_desc, + .get_hwirq = msi_domain_ops_get_hwirq, + .msi_init = msi_domain_ops_init, + .msi_check = msi_domain_ops_check, + .msi_prepare= msi_domain_ops_prepare, + 
.set_desc = msi_domain_ops_set_desc, + .domain_alloc_irqs = __msi_domain_alloc_irqs, + .domain_free_irqs = __msi_domain_free_irqs, }; static void msi_domain_update_dom_ops(struct msi_domain_info *info) @@ -245,6 +247,14 @@ static void msi_domain_update_dom_ops(st return; }
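The defaulting rule described in the comment above (the alloc/free callbacks fall back to the core implementation whenever they are NULL, independent of MSI_FLAG_USE_DEF_DOM_OPS) can be sketched in user space roughly as follows. This is only an illustration under assumptions: demo_ops, demo_update_ops() and the core_*() helpers are hypothetical stand-ins, not the kernel's struct msi_domain_ops or msi_domain_update_dom_ops().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msi_domain_ops; not the kernel type. */
struct demo_ops {
	int  (*alloc_irqs)(int nvec);
	void (*free_irqs)(void);
	int  (*prepare)(int nvec);	/* an ordinary optional callback */
};

/* Stand-ins for the generic core alloc/free/prepare functions. */
static int core_alloc_irqs(int nvec) { return nvec > 0 ? 0 : -1; }
static void core_free_irqs(void) { }
static int core_prepare(int nvec) { (void)nvec; return 0; }

/*
 * alloc/free are mandatory, so they are filled in unconditionally when
 * NULL. Other callbacks are only defaulted when the caller opts in.
 */
static void demo_update_ops(struct demo_ops *ops, int use_def_ops)
{
	if (!ops->alloc_irqs)
		ops->alloc_irqs = core_alloc_irqs;
	if (!ops->free_irqs)
		ops->free_irqs = core_free_irqs;

	if (!use_def_ops)
		return;

	if (!ops->prepare)
		ops->prepare = core_prepare;
}

/* A wrapper domain does its extra work around the core function. */
static int wrapped_alloc_irqs(int nvec)
{
	/* ... allocate descriptors or storage space here ... */
	return core_alloc_irqs(nvec);
}
```

A domain that sets only alloc_irqs to its wrapper keeps that wrapper and still gets the default free function, which is the behaviour the comment calls "obviously mandatory".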
[patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq()
Retrieve the PCI device from the msi descriptor instead of doing so at the call sites. Signed-off-by: Thomas Gleixner Cc: linux-...@vger.kernel.org --- arch/x86/kernel/apic/msi.c |2 +- drivers/pci/msi.c | 13 ++--- include/linux/msi.h|3 +-- 3 files changed, 8 insertions(+), 10 deletions(-) --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare); void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) { - arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc); + arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc); } EXPORT_SYMBOL_GPL(pci_msi_set_desc); --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1346,17 +1346,17 @@ void pci_msi_domain_write_msg(struct irq /** * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source - * @dev: Pointer to the PCI device * @desc: Pointer to the MSI descriptor * * The ID number is only used within the irqdomain. */ -irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev, - struct msi_desc *desc) +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc) { + struct pci_dev *pdev = msi_desc_to_pci_dev(desc); + return (irq_hw_number_t)desc->msi_attrib.entry_nr | - pci_dev_id(dev) << 11 | - (pci_domain_nr(dev->bus) & 0x) << 27; + pci_dev_id(pdev) << 11 | + (pci_domain_nr(pdev->bus) & 0x) << 27; } static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc) @@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_ struct msi_desc *desc) { arg->desc = desc; - arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc), - desc); + arg->hwirq = pci_msi_domain_calc_hwirq(desc); } #else #define pci_msi_domain_set_descNULL --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode, struct msi_domain_info *info, struct irq_domain *parent); -irq_hw_number_t pci_msi_domain_calc_hwirq(struct 
pci_dev *dev, - struct msi_desc *desc); +irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc); int pci_msi_domain_check_cap(struct irq_domain *domain, struct msi_domain_info *info, struct device *dev); u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
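For readers unfamiliar with the hwirq layout used by pci_msi_domain_calc_hwirq(), the bit packing can be sketched stand-alone as below. The shifts (11 and 27) are taken from the diff; everything else is an assumption for illustration: demo_calc_hwirq() and demo_dev_id() are hypothetical names, the entry number is assumed to fit below bit 11, and the domain mask is left out because its constant is truncated in this archive.

```c
#include <stdint.h>

/*
 * Illustrative layout only: MSI entry number in the low bits, PCI
 * device id (bus/devfn) starting at bit 11, PCI domain number starting
 * at bit 27. The mask applied to the domain number in the patch is not
 * reproduced here.
 */
static uint64_t demo_calc_hwirq(uint16_t entry_nr, uint16_t dev_id,
				uint32_t domain_nr)
{
	return (uint64_t)entry_nr |
	       ((uint64_t)dev_id << 11) |
	       ((uint64_t)domain_nr << 27);
}

/* Device id as bus number in bits 15:8 and devfn in bits 7:0. */
static uint16_t demo_dev_id(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}
```

Because all inputs come from the descriptor and its device, the function needs only the descriptor, which is why the separate device argument can be dropped.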
[patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation
None of the DMAR specific fields are required. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hw_irq.h |6 -- arch/x86/kernel/apic/msi.c| 10 +- 2 files changed, 5 insertions(+), 11 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -83,12 +83,6 @@ struct irq_alloc_info { irq_hw_number_t msi_hwirq; }; #endif -#ifdef CONFIG_DMAR_TABLE - struct { - int dmar_id; - void*dmar_data; - }; -#endif #ifdef CONFIG_X86_UV struct { int uv_limit; --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info, msi_alloc_info_t *arg) { - return arg->dmar_id; + return arg->hwirq; } static int dmar_msi_init(struct irq_domain *domain, struct msi_domain_info *info, unsigned int virq, irq_hw_number_t hwirq, msi_alloc_info_t *arg) { - irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL, - handle_edge_irq, arg->dmar_data, "edge"); + irq_domain_set_info(domain, virq, arg->devid, info->chip, NULL, + handle_edge_irq, arg->data, "edge"); return 0; } @@ -384,8 +384,8 @@ int dmar_alloc_hwirq(int id, int node, v init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_DMAR; - info.dmar_id = id; - info.dmar_data = arg; + info.devid = id; + info.data = arg; return irq_domain_alloc_irqs(domain, 1, node, &info); }
[patch RFC 31/38] x86/irq: Cleanup the arch_*_msi_irqs() leftovers
Get rid of all the gunk and enable CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS. Signed-off-by: Thomas Gleixner Cc: xen-de...@lists.xenproject.org Cc: linux-...@vger.kernel.org --- arch/x86/Kconfig|1 + arch/x86/include/asm/pci.h | 11 --- arch/x86/include/asm/x86_init.h |1 - arch/x86/kernel/apic/msi.c | 22 -- arch/x86/kernel/x86_init.c | 18 -- arch/x86/pci/xen.c |7 --- 6 files changed, 1 insertion(+), 59 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -225,6 +225,7 @@ config X86 select NEED_SG_DMA_LENGTH select PCI_DOMAINS if PCI select PCI_LOCKLESS_CONFIG if PCI + select PCI_MSI_DISABLE_ARCH_FALLBACKS select PERF_EVENTS select RTC_LIB select RTC_MC146818_LIB --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -105,17 +105,6 @@ static inline void early_quirks(void) { extern void pci_iommu_alloc(void); -#ifdef CONFIG_PCI_MSI -/* implemented in arch/x86/kernel/apic/io_apic. */ -struct msi_desc; -int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); -void native_teardown_msi_irq(unsigned int irq); -void native_restore_msi_irqs(struct pci_dev *dev); -#else -#define native_setup_msi_irqs NULL -#define native_teardown_msi_irqNULL -#endif - /* generic pci stuff */ #include --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -277,7 +277,6 @@ struct pci_dev; struct x86_msi_ops { int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type); - void (*teardown_msi_irq)(unsigned int irq); void (*teardown_msi_irqs)(struct pci_dev *dev); void (*restore_msi_irqs)(struct pci_dev *dev); }; --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -182,28 +182,6 @@ static struct irq_chip pci_msi_controlle .flags = IRQCHIP_SKIP_SET_WAKE, }; -int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) -{ - struct irq_domain *domain; - struct irq_alloc_info info; - - init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI; - - domain = irq_remapping_get_irq_domain(&info); - if (domain 
== NULL) - domain = x86_pci_msi_default_domain; - if (domain == NULL) - return -ENOSYS; - - return msi_domain_alloc_irqs(domain, &dev->dev, nvec); -} - -void native_teardown_msi_irq(unsigned int irq) -{ - irq_domain_free_irqs(irq, 1); -} - int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg) { --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -145,28 +145,10 @@ EXPORT_SYMBOL_GPL(x86_platform); #if defined(CONFIG_PCI_MSI) struct x86_msi_ops x86_msi __ro_after_init = { - .setup_msi_irqs = native_setup_msi_irqs, - .teardown_msi_irq = native_teardown_msi_irq, - .teardown_msi_irqs = default_teardown_msi_irqs, .restore_msi_irqs = default_restore_msi_irqs, }; /* MSI arch specific hooks */ -int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) -{ - return x86_msi.setup_msi_irqs(dev, nvec, type); -} - -void arch_teardown_msi_irqs(struct pci_dev *dev) -{ - x86_msi.teardown_msi_irqs(dev); -} - -void arch_teardown_msi_irq(unsigned int irq) -{ - x86_msi.teardown_msi_irq(irq); -} - void arch_restore_msi_irqs(struct pci_dev *dev) { x86_msi.restore_msi_irqs(dev); --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -401,11 +401,6 @@ static void xen_pv_teardown_msi_irqs(str xen_teardown_msi_irqs(dev); } -static void xen_teardown_msi_irq(unsigned int irq) -{ - WARN_ON_ONCE(1); -} - static int xen_msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, int nvec) { @@ -482,8 +477,6 @@ static __init void xen_setup_pci_msi(voi return; } - x86_msi.teardown_msi_irq = xen_teardown_msi_irq; - /* * Override the PCI/MSI irq domain init function. No point * in allocating the native domain and never use it.
[patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device()
Now that interrupt remapping sets the irqdomain pointer when a PCI device is added it's possible to store the default irq domain in the device struct in pcibios_add_device(). If the bus to which a device is connected has an irq domain associated then this domain is used otherwise the default domain (PCI/MSI native or XEN PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains like VMD work. This makes XEN and the non-remapped native case work solely based on the irq domain pointer in struct device for PCI/MSI and allows to remove the arch fallback and make most of the x86_msi ops private to XEN in the next steps. Signed-off-by: Thomas Gleixner Cc: linux-...@vger.kernel.org --- arch/x86/include/asm/irqdomain.h |2 ++ arch/x86/kernel/apic/msi.c |2 +- arch/x86/pci/common.c| 18 +- 3 files changed, 20 insertions(+), 2 deletions(-) --- a/arch/x86/include/asm/irqdomain.h +++ b/arch/x86/include/asm/irqdomain.h @@ -53,9 +53,11 @@ extern int mp_irqdomain_ioapic_idx(struc #ifdef CONFIG_PCI_MSI void x86_create_pci_msi_domain(void); struct irq_domain *native_create_pci_msi_domain(void); +extern struct irq_domain *x86_pci_msi_default_domain; #else static inline void x86_create_pci_msi_domain(void) { } #define native_create_pci_msi_domain NULL +#define x86_pci_msi_default_domain NULL #endif #endif --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -21,7 +21,7 @@ #include #include -static struct irq_domain *x86_pci_msi_default_domain __ro_after_init; +struct irq_domain *x86_pci_msi_default_domain __ro_after_init; static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) { --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -19,6 +19,7 @@ #include #include #include +#include unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 | PCI_PROBE_MMCONF; @@ -633,8 +634,9 @@ static void set_dev_domain_options(struc int pcibios_add_device(struct pci_dev *dev) { - struct setup_data *data; struct pci_setup_rom 
*rom; + struct irq_domain *msidom; + struct setup_data *data; u64 pa_data; pa_data = boot_params.hdr.setup_data; @@ -661,6 +663,20 @@ int pcibios_add_device(struct pci_dev *d memunmap(data); } set_dev_domain_options(dev); + + /* +* Setup the initial MSI domain of the device. If the underlying +* bus has a PCI/MSI irqdomain associated use the bus domain, +* otherwise set the default domain. This ensures that special irq +* domains e.g. VMD are preserved. The default ensures initial +* operation if irq remapping is not active. If irq remapping is +* active it will overwrite the domain pointer when the device is +* associated to a remapping domain. +*/ + msidom = dev_get_msi_domain(&dev->bus->dev); + if (!msidom) + msidom = x86_pci_msi_default_domain; + dev_set_msi_domain(&dev->dev, msidom); return 0; }
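The selection rule in the comment above (use the bus domain if there is one, otherwise the default domain) can be modelled in a few lines of plain C. This is a simplified sketch, not kernel code: the demo_* types and demo_add_device() are hypothetical, and real devices reach the domain through dev_get_msi_domain()/dev_set_msi_domain() as in the patch.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced models of irq_domain, pci_bus, pci_dev. */
struct demo_domain { const char *name; };
struct demo_bus    { struct demo_domain *msi_domain; };
struct demo_dev    { struct demo_bus *bus; struct demo_domain *msi_domain; };

/* Stand-in for x86_pci_msi_default_domain (native or XEN PCI/MSI). */
static struct demo_domain demo_default_domain = { "default" };

/*
 * Prefer the domain attached to the bus, so that special bus domains
 * (VMD in the patch description) are preserved. Fall back to the
 * default domain otherwise.
 */
static void demo_add_device(struct demo_dev *dev)
{
	struct demo_domain *dom = dev->bus->msi_domain;

	if (!dom)
		dom = &demo_default_domain;
	dev->msi_domain = dom;
}
```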
[patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks
If an architecture does not require the MSI setup/teardown fallback functions, then allow them to be replaced by stub functions which emit a warning. Signed-off-by: Thomas Gleixner Cc: Bjorn Helgaas Cc: linux-...@vger.kernel.org --- drivers/pci/Kconfig |3 +++ drivers/pci/msi.c |3 ++- include/linux/msi.h | 31 ++- 3 files changed, 31 insertions(+), 6 deletions(-) --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN depends on PCI_MSI select GENERIC_MSI_IRQ_DOMAIN +config PCI_MSI_DISABLE_ARCH_FALLBACKS + bool + config PCI_QUIRKS default y bool "Enable PCI quirk workarounds" if EXPERT --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st #define pci_msi_teardown_msi_irqs arch_teardown_msi_irqs #endif +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS /* Arch hooks */ - int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc) { struct msi_controller *chip = dev->bus->msi; @@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc { return default_teardown_msi_irqs(dev); } +#endif /* !CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS */ static void default_restore_msi_irq(struct pci_dev *dev, int irq) { --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d void pci_msi_unmask_irq(struct irq_data *data); /* - * The arch hooks to setup up msi irqs. Those functions are - * implemented as weak symbols so that they /can/ be overriden by - * architecture specific code if needed. + * The arch hooks to setup up msi irqs. Default functions are implemented + * as weak symbols so that they /can/ be overriden by architecture specific + * code if needed. + * + * They can be replaced by stubs with warnings via + * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully + * utilizes direct irqdomain based setup. 
*/ +#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc); void arch_teardown_msi_irq(unsigned int irq); int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type); void arch_teardown_msi_irqs(struct pci_dev *dev); -void arch_restore_msi_irqs(struct pci_dev *dev); - void default_teardown_msi_irqs(struct pci_dev *dev); +#else +static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) +{ + WARN_ON_ONCE(1); + return -ENODEV; +} + +static inline void arch_teardown_msi_irqs(struct pci_dev *dev) +{ + WARN_ON_ONCE(1); +} +#endif + +/* + * The restore hooks are still available as they are useful even + * for fully irq domain based setups. Courtesy to XEN/X86. + */ +void arch_restore_msi_irqs(struct pci_dev *dev); void default_restore_msi_irqs(struct pci_dev *dev); struct msi_controller {
[patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI
Rename it to x86_msi_prepare() and handle the allocation type setup depending on the device type. Add a new arch_msi_prepare define which will be utilized by the upcoming device MSI support. Define it to NULL if not provided by an architecture in the generic MSI header. One arch specific function for MSI support is truly enough. Signed-off-by: Thomas Gleixner Cc: linux-...@vger.kernel.org Cc: linux-hyp...@vger.kernel.org --- arch/x86/include/asm/msi.h |4 +++- arch/x86/kernel/apic/msi.c | 27 --- drivers/pci/controller/pci-hyperv.c |2 +- include/linux/msi.h |4 4 files changed, 28 insertions(+), 9 deletions(-) --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -6,7 +6,9 @@ typedef struct irq_alloc_info msi_alloc_info_t; -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg); +#define arch_msi_prepare x86_msi_prepare + #endif /* _ASM_X86_MSI_H */ --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -182,26 +182,39 @@ static struct irq_chip pci_msi_controlle .flags = IRQCHIP_SKIP_SET_WAKE, }; -int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, - msi_alloc_info_t *arg) +static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg) { - struct pci_dev *pdev = to_pci_dev(dev); - struct msi_desc *desc = first_pci_msi_entry(pdev); + struct msi_desc *desc = first_msi_entry(dev); - init_irq_alloc_info(arg, NULL); if (desc->msi_attrib.is_msix) { arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX; } else { arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI; arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; } +} + +static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg) +{ + arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI; +} + +int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, + msi_alloc_info_t *arg) +{ + init_irq_alloc_info(arg, NULL); + + if (dev_is_pci(dev)) + 
pci_msi_prepare(dev, arg); + else + dev_msi_prepare(dev, arg); return 0; } -EXPORT_SYMBOL_GPL(pci_msi_prepare); +EXPORT_SYMBOL_GPL(x86_msi_prepare); static struct msi_domain_ops pci_msi_domain_ops = { - .msi_prepare= pci_msi_prepare, + .msi_prepare= x86_msi_prepare, }; static struct msi_domain_info pci_msi_domain_info = { --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip = }; static struct msi_domain_ops hv_msi_ops = { - .msi_prepare= pci_msi_prepare, + .msi_prepare= arch_msi_prepare, .msi_free = hv_msi_free, }; --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi } #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */ +#ifndef arch_msi_prepare +# define arch_msi_prepare NULL +#endif + #endif /* LINUX_MSI_H */
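The shape of the new x86_msi_prepare() (common initialisation first, then a per-device-type helper) can be shown with a small user-space model. All names below are hypothetical stand-ins chosen for the sketch; in particular the two booleans replace dev_is_pci() and the msi_attrib.is_msix lookup on the first MSI descriptor.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical counterparts of the X86_IRQ_ALLOC_TYPE_* values. */
enum demo_alloc_type {
	DEMO_TYPE_PCI_MSI,
	DEMO_TYPE_PCI_MSIX,
	DEMO_TYPE_DEV_MSI,
};

#define DEMO_CONTIGUOUS_VECTORS	0x1u

struct demo_alloc_info {
	enum demo_alloc_type type;
	unsigned int flags;
};

struct demo_device {
	bool is_pci;	/* models dev_is_pci(dev) */
	bool is_msix;	/* models desc->msi_attrib.is_msix */
};

static void demo_pci_prepare(const struct demo_device *dev,
			     struct demo_alloc_info *arg)
{
	if (dev->is_msix) {
		arg->type = DEMO_TYPE_PCI_MSIX;
	} else {
		/* Multi-MSI needs a contiguous block of vectors. */
		arg->type = DEMO_TYPE_PCI_MSI;
		arg->flags |= DEMO_CONTIGUOUS_VECTORS;
	}
}

static void demo_dev_prepare(struct demo_alloc_info *arg)
{
	arg->type = DEMO_TYPE_DEV_MSI;
}

/* One entry point: reset the info, then dispatch on the device type. */
static int demo_msi_prepare(const struct demo_device *dev,
			    struct demo_alloc_info *arg)
{
	memset(arg, 0, sizeof(*arg));

	if (dev->is_pci)
		demo_pci_prepare(dev, arg);
	else
		demo_dev_prepare(arg);
	return 0;
}
```

Keeping the reset in the single entry point means neither helper can forget it, which matches moving init_irq_alloc_info() out of pci_msi_prepare() in the diff.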
[patch RFC 33/38] x86/irq: Add DEV_MSI allocation type
For the upcoming device MSI support a new allocation type is required. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hw_irq.h |1 + 1 file changed, 1 insertion(+) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -40,6 +40,7 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_PCI_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, + X86_IRQ_ALLOC_TYPE_DEV_MSI, X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT, X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, };
[patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation
Move the IOAPIC specific fields into their own struct and reuse the common devid. Get rid of the #ifdeffery as it does not matter at all whether the alloc info is a couple of bytes longer or not. Signed-off-by: Thomas Gleixner Cc: Wei Liu Cc: "K. Y. Srinivasan" Cc: Stephen Hemminger Cc: Joerg Roedel Cc: linux-hyp...@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: Haiyang Zhang Cc: Jon Derrick Cc: Lu Baolu --- arch/x86/include/asm/hw_irq.h | 23 ++- arch/x86/kernel/apic/io_apic.c | 70 ++-- drivers/iommu/amd/iommu.c | 14 +++ drivers/iommu/hyperv-iommu.c|2 - drivers/iommu/intel/irq_remapping.c | 18 - 5 files changed, 64 insertions(+), 63 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -44,6 +44,15 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, }; +struct ioapic_alloc_info { + int pin; + int node; + u32 trigger : 1; + u32 polarity : 1; + u32 valid : 1; + struct IO_APIC_route_entry *entry; +}; + /** * irq_alloc_info - X86 specific interrupt allocation info * @type: X86 specific allocation type @@ -53,6 +62,8 @@ enum irq_alloc_type { * @mask: CPU mask for vector allocation * @desc: Pointer to msi descriptor * @data: Allocation specific data + * + * @ioapic:IOAPIC specific allocation data */ struct irq_alloc_info { enum irq_alloc_type type; @@ -64,6 +75,7 @@ struct irq_alloc_info { void*data; union { + struct ioapic_alloc_infoioapic; int unused; #ifdef CONFIG_PCI_MSI struct { @@ -71,17 +83,6 @@ struct irq_alloc_info { irq_hw_number_t msi_hwirq; }; #endif -#ifdef CONFIG_X86_IO_APIC - struct { - int ioapic_id; - int ioapic_pin; - int ioapic_node; - u32 ioapic_trigger : 1; - u32 ioapic_polarity : 1; - u32 ioapic_valid : 1; - struct IO_APIC_route_entry *ioapic_entry; - }; -#endif #ifdef CONFIG_DMAR_TABLE struct { int dmar_id; --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -860,10 +860,10 @@ void ioapic_set_alloc_attr(struct irq_al { init_irq_alloc_info(info, NULL); info->type = 
X86_IRQ_ALLOC_TYPE_IOAPIC; - info->ioapic_node = node; - info->ioapic_trigger = trigger; - info->ioapic_polarity = polarity; - info->ioapic_valid = 1; + info->ioapic.node = node; + info->ioapic.trigger = trigger; + info->ioapic.polarity = polarity; + info->ioapic.valid = 1; } #ifndef CONFIG_ACPI @@ -878,32 +878,32 @@ static void ioapic_copy_alloc_attr(struc copy_irq_alloc_info(dst, src); dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC; - dst->ioapic_id = mpc_ioapic_id(ioapic_idx); - dst->ioapic_pin = pin; - dst->ioapic_valid = 1; - if (src && src->ioapic_valid) { - dst->ioapic_node = src->ioapic_node; - dst->ioapic_trigger = src->ioapic_trigger; - dst->ioapic_polarity = src->ioapic_polarity; + dst->devid = mpc_ioapic_id(ioapic_idx); + dst->ioapic.pin = pin; + dst->ioapic.valid = 1; + if (src && src->ioapic.valid) { + dst->ioapic.node = src->ioapic.node; + dst->ioapic.trigger = src->ioapic.trigger; + dst->ioapic.polarity = src->ioapic.polarity; } else { - dst->ioapic_node = NUMA_NO_NODE; + dst->ioapic.node = NUMA_NO_NODE; if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) { - dst->ioapic_trigger = trigger; - dst->ioapic_polarity = polarity; + dst->ioapic.trigger = trigger; + dst->ioapic.polarity = polarity; } else { /* * PCI interrupts are always active low level * triggered. */ - dst->ioapic_trigger = IOAPIC_LEVEL; - dst->ioapic_polarity = IOAPIC_POL_LOW; + dst->ioapic.trigger = IOAPIC_LEVEL; + dst->ioapic.polarity = IOAPIC_POL_LOW; } } } static int ioapic_alloc_attr_node(struct irq_alloc_info *info) { - return (info && info->ioapic_valid) ? info->ioapic_node : NUMA_NO_NODE; +
[patch RFC 17/38] x86/pci: Reduce #ifdeffery in PCI init code
Adding a function call before the first #ifdef in arch_pci_init() triggers a 'mixed declarations and code' warning if PCI_DIRECT is enabled. Use stub functions and move the #ifdeffery to the header file where it is not in the way. Signed-off-by: Thomas Gleixner Cc: linux-...@vger.kernel.org --- arch/x86/include/asm/pci_x86.h | 11 +++ arch/x86/pci/init.c| 10 +++--- 2 files changed, 14 insertions(+), 7 deletions(-) --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire extern bool port_cf9_safe; /* arch_initcall level */ +#ifdef CONFIG_PCI_DIRECT extern int pci_direct_probe(void); extern void pci_direct_init(int type); +#else +static inline int pci_direct_probe(void) { return -1; } +static inline void pci_direct_init(int type) { } +#endif + +#ifdef CONFIG_PCI_BIOS extern void pci_pcbios_init(void); +#else +static inline void pci_pcbios_init(void) { } +#endif + extern void __init dmi_check_pciprobe(void); extern void __init dmi_check_skip_isa_align(void); --- a/arch/x86/pci/init.c +++ b/arch/x86/pci/init.c @@ -8,11 +8,9 @@ in the right sequence from here. */ static __init int pci_arch_init(void) { -#ifdef CONFIG_PCI_DIRECT - int type = 0; + int type; type = pci_direct_probe(); -#endif if (!(pci_probe & PCI_PROBE_NOEARLY)) pci_mmcfg_early_init(); @@ -20,18 +18,16 @@ static __init int pci_arch_init(void) if (x86_init.pci.arch_init && !x86_init.pci.arch_init()) return 0; -#ifdef CONFIG_PCI_BIOS pci_pcbios_init(); -#endif + /* * don't check for raw_pci_ops here because we want pcbios as last * fallback, yet it's needed to run first to set pcibios_last_bus * in case legacy PCI probing is used. otherwise detecting peer busses * fails. 
*/ -#ifdef CONFIG_PCI_DIRECT pci_direct_init(type); -#endif + if (!raw_pci_ops && !raw_pci_ext_ops) printk(KERN_ERR "PCI: Fatal: No config space access function found\n");
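The stub-function pattern used in this patch can be tried out in isolation. The sketch below is an assumption-laden miniature: DEMO_HAVE_DIRECT plays the role of CONFIG_PCI_DIRECT and the demo_*() functions are hypothetical; only the structure (conditional code in the "header" part, unconditional calls in the caller) mirrors the patch.

```c
#include <stdio.h>

/* Stand-in for a Kconfig option; build with -DDEMO_HAVE_DIRECT=1 to enable. */
#ifndef DEMO_HAVE_DIRECT
#define DEMO_HAVE_DIRECT 0
#endif

/* "Header" part: the only place that knows about the option. */
#if DEMO_HAVE_DIRECT
static int demo_direct_probe(void) { return 1; }
static void demo_direct_init(int type) { printf("direct init, type %d\n", type); }
#else
static inline int demo_direct_probe(void) { return -1; }
static inline void demo_direct_init(int type) { (void)type; }
#endif

/*
 * "Caller" part: no #ifdef left. The declaration of 'type' is no longer
 * conditional, so statements added before the probe cannot end up in
 * front of a declaration.
 */
static int demo_arch_init(void)
{
	int type;

	type = demo_direct_probe();
	/* ... other init steps would go here ... */
	demo_direct_init(type);
	return type;
}
```

With the option disabled the compiler can drop the empty inline stubs entirely, so the cleaner call site costs nothing at run time.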
[patch RFC 15/38] x86/msi: Use generic MSI domain ops
pci_msi_get_hwirq() and pci_msi_set_desc() are no longer special. Enable the generic MSI domain ops in the core and PCI MSI code unconditionally and get rid of the x86 specific implementations in the X86 MSI code and in the hyperv PCI driver. Signed-off-by: Thomas Gleixner Cc: Wei Liu Cc: Stephen Hemminger Cc: Haiyang Zhang Cc: linux-...@vger.kernel.org Cc: linux-hyp...@vger.kernel.org --- arch/x86/include/asm/msi.h |2 -- arch/x86/kernel/apic/msi.c | 15 --- drivers/pci/controller/pci-hyperv.c |8 drivers/pci/msi.c |4 kernel/irq/msi.c|6 -- 5 files changed, 35 deletions(-) --- a/arch/x86/include/asm/msi.h +++ b/arch/x86/include/asm/msi.h @@ -9,6 +9,4 @@ typedef struct irq_alloc_info msi_alloc_ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg); -void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc); - #endif /* _ASM_X86_MSI_H */ --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -204,12 +204,6 @@ void native_teardown_msi_irq(unsigned in irq_domain_free_irqs(irq, 1); } -static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info, -msi_alloc_info_t *arg) -{ - return arg->hwirq; -} - int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg) { @@ -228,17 +222,8 @@ int pci_msi_prepare(struct irq_domain *d } EXPORT_SYMBOL_GPL(pci_msi_prepare); -void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) -{ - arg->desc = desc; - arg->hwirq = pci_msi_domain_calc_hwirq(desc); -} -EXPORT_SYMBOL_GPL(pci_msi_set_desc); - static struct msi_domain_ops pci_msi_domain_ops = { - .get_hwirq = pci_msi_get_hwirq, .msi_prepare= pci_msi_prepare, - .set_desc = pci_msi_set_desc, }; static struct msi_domain_info pci_msi_domain_info = { --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1531,16 +1531,8 @@ static struct irq_chip hv_msi_irq_chip = .irq_unmask = hv_irq_unmask, }; -static irq_hw_number_t 
hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info, - msi_alloc_info_t *arg) -{ - return arg->hwirq; -} - static struct msi_domain_ops hv_msi_ops = { - .get_hwirq = hv_msi_domain_ops_get_hwirq, .msi_prepare= pci_msi_prepare, - .set_desc = pci_msi_set_desc, .msi_free = hv_msi_free, }; --- a/drivers/pci/msi.c +++ b/drivers/pci/msi.c @@ -1401,16 +1401,12 @@ static int pci_msi_domain_handle_error(s return error; } -#ifdef GENERIC_MSI_DOMAIN_OPS static void pci_msi_domain_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) { arg->desc = desc; arg->hwirq = pci_msi_domain_calc_hwirq(desc); } -#else -#define pci_msi_domain_set_descNULL -#endif static struct msi_domain_ops pci_msi_domain_ops_default = { .set_desc = pci_msi_domain_set_desc, --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -187,7 +187,6 @@ static const struct irq_domain_ops msi_d .deactivate = msi_domain_deactivate, }; -#ifdef GENERIC_MSI_DOMAIN_OPS static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info, msi_alloc_info_t *arg) { @@ -206,11 +205,6 @@ static void msi_domain_ops_set_desc(msi_ { arg->desc = desc; } -#else -#define msi_domain_ops_get_hwirq NULL -#define msi_domain_ops_prepare NULL -#define msi_domain_ops_set_descNULL -#endif /* !GENERIC_MSI_DOMAIN_OPS */ static int msi_domain_ops_init(struct irq_domain *domain, struct msi_domain_info *info,
[patch RFC 09/38] x86/msi: Consolidate HPET allocation
None of the magic HPET fields are required in any way. Signed-off-by: Thomas Gleixner Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Cc: Lu Baolu --- arch/x86/include/asm/hw_irq.h |7 --- arch/x86/kernel/apic/msi.c | 14 +++--- drivers/iommu/amd/iommu.c |2 +- drivers/iommu/intel/irq_remapping.c |4 ++-- 4 files changed, 10 insertions(+), 17 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -65,13 +65,6 @@ struct irq_alloc_info { union { int unused; -#ifdef CONFIG_HPET_TIMER - struct { - int hpet_id; - int hpet_index; - void*hpet_data; - }; -#endif #ifdef CONFIG_PCI_MSI struct { struct pci_dev *msi_dev; --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -427,7 +427,7 @@ static struct irq_chip hpet_msi_controll static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info, msi_alloc_info_t *arg) { - return arg->hpet_index; + return arg->hwirq; } static int hpet_msi_init(struct irq_domain *domain, @@ -435,8 +435,8 @@ static int hpet_msi_init(struct irq_doma irq_hw_number_t hwirq, msi_alloc_info_t *arg) { irq_set_status_flags(virq, IRQ_MOVE_PCNTXT); - irq_domain_set_info(domain, virq, arg->hpet_index, info->chip, NULL, - handle_edge_irq, arg->hpet_data, "edge"); + irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL, + handle_edge_irq, arg->data, "edge"); return 0; } @@ -477,7 +477,7 @@ struct irq_domain *hpet_create_irq_domai init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; - info.hpet_id = hpet_id; + info.devid = hpet_id; parent = irq_remapping_get_irq_domain(&info); if (parent == NULL) parent = x86_vector_domain; @@ -506,9 +506,9 @@ int hpet_assign_irq(struct irq_domain *d init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_HPET; - info.hpet_data = hc; - info.hpet_id = hpet_dev_id(domain); - info.hpet_index = dev_num; + info.data = hc; + info.devid = hpet_dev_id(domain); + info.hwirq = dev_num; return irq_domain_alloc_irqs(domain, 1, 
NUMA_NO_NODE, &info); } --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3511,7 +3511,7 @@ static int get_devid(struct irq_alloc_in return get_ioapic_devid(info->ioapic_id); case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: - return get_hpet_devid(info->hpet_id); + return get_hpet_devid(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: return get_device_id(&info->msi_dev->dev); --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1115,7 +1115,7 @@ static struct irq_domain *intel_get_irq_ case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: return map_ioapic_to_ir(info->ioapic_id); case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: - return map_hpet_to_ir(info->hpet_id); + return map_hpet_to_ir(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: return map_dev_to_ir(info->msi_dev); @@ -1285,7 +1285,7 @@ static void intel_irq_remapping_prepare_ case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: if (info->type == X86_IRQ_ALLOC_TYPE_HPET) - set_hpet_sid(irte, info->hpet_id); + set_hpet_sid(irte, info->devid); else set_msi_sid(irte, info->msi_dev);
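The consolidation in this and the neighbouring patches follows one idea: identifiers every allocation type needs (device id, hardware irq number, opaque data pointer) live in common fields, and only truly type-specific data stays in the union. A reduced model, with hypothetical demo_* names and field widths chosen for the sketch rather than taken from struct irq_alloc_info, might look like this:

```c
#include <stddef.h>

enum demo_alloc_type { DEMO_ALLOC_HPET, DEMO_ALLOC_DMAR, DEMO_ALLOC_IOAPIC };

/* Type-specific data that does not map onto the common fields. */
struct demo_ioapic_info {
	int pin;
	int node;
};

struct demo_alloc_info {
	enum demo_alloc_type type;
	/* Common fields shared by all allocation types. */
	unsigned int devid;
	unsigned long hwirq;
	void *data;
	union {
		struct demo_ioapic_info ioapic;
		int unused;
	};
};

/* HPET: id, channel index and channel pointer fit the common fields. */
static void demo_fill_hpet(struct demo_alloc_info *info, unsigned int hpet_id,
			   unsigned long index, void *channel)
{
	info->type = DEMO_ALLOC_HPET;
	info->devid = hpet_id;
	info->hwirq = index;
	info->data = channel;
}

/* DMAR: same fields, different meaning of the opaque pointer. */
static void demo_fill_dmar(struct demo_alloc_info *info, unsigned int id,
			   void *arg)
{
	info->type = DEMO_ALLOC_DMAR;
	info->devid = id;
	info->data = arg;
}

/* Consumers no longer need a per-type accessor for the device id. */
static unsigned int demo_get_devid(const struct demo_alloc_info *info)
{
	return info->devid;
}
```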
[patch RFC 24/38] x86/xen: Consolidate XEN-MSI init
X86 cannot store the irq domain pointer in struct device without breaking XEN because the irq domain pointer takes precedence over arch_*_msi_irqs() fallbacks. To achieve this XEN MSI interrupt management needs to be wrapped into an irq domain. Move the x86_msi ops setup into a single function to prepare for this. Signed-off-by: Thomas Gleixner --- arch/x86/pci/xen.c | 51 --- 1 file changed, 32 insertions(+), 19 deletions(-) --- a/arch/x86/pci/xen.c +++ b/arch/x86/pci/xen.c @@ -371,7 +371,10 @@ static void xen_initdom_restore_msi_irqs WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret); } } -#endif +#else /* CONFIG_XEN_DOM0 */ +#define xen_initdom_setup_msi_irqs NULL +#define xen_initdom_restore_msi_irqs NULL +#endif /* !CONFIG_XEN_DOM0 */ static void xen_teardown_msi_irqs(struct pci_dev *dev) { @@ -403,7 +406,31 @@ static void xen_teardown_msi_irq(unsigne WARN_ON_ONCE(1); } -#endif +static __init void xen_setup_pci_msi(void) +{ + if (xen_initial_domain()) { + x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs; + x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; + x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; + pci_msi_ignore_mask = 1; + } else if (xen_pv_domain()) { + x86_msi.setup_msi_irqs = xen_setup_msi_irqs; + x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs; + pci_msi_ignore_mask = 1; + } else if (xen_hvm_domain()) { + x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs; + x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; + } else { + WARN_ON_ONCE(1); + return; + } + + x86_msi.teardown_msi_irq = xen_teardown_msi_irq; +} + +#else /* CONFIG_PCI_MSI */ +static inline void xen_setup_pci_msi(void) { } +#endif /* CONFIG_PCI_MSI */ int __init pci_xen_init(void) { @@ -420,12 +447,7 @@ int __init pci_xen_init(void) /* Keep ACPI out of the picture */ acpi_noirq_set(); -#ifdef CONFIG_PCI_MSI - x86_msi.setup_msi_irqs = xen_setup_msi_irqs; - x86_msi.teardown_msi_irq = xen_teardown_msi_irq; - x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs; - 
pci_msi_ignore_mask = 1; -#endif + xen_setup_pci_msi(); return 0; } @@ -445,10 +467,7 @@ static void __init xen_hvm_msi_init(void ((eax & XEN_HVM_CPUID_APIC_ACCESS_VIRT) && boot_cpu_has(X86_FEATURE_APIC))) return; } - - x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs; - x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; - x86_msi.teardown_msi_irq = xen_teardown_msi_irq; + xen_setup_pci_msi(); } #endif @@ -481,13 +500,7 @@ int __init pci_xen_initial_domain(void) { int irq; -#ifdef CONFIG_PCI_MSI - x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs; - x86_msi.teardown_msi_irq = xen_teardown_msi_irq; - x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs; - x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs; - pci_msi_ignore_mask = 1; -#endif + xen_setup_pci_msi(); __acpi_register_gsi = acpi_register_gsi_xen; __acpi_unregister_gsi = NULL; /* ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
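The selection logic that xen_setup_pci_msi() centralizes can be modeled in plain userspace C. This is only a sketch under stated assumptions: the `model_*` types and the `SETUP_*`/`TEARDOWN_*` enumerators are hypothetical stand-ins for the real callbacks, and the sketch assumes that an initial domain may also report itself as a PV domain, which is why the initial-domain check has to come first.

```c
#include <stdbool.h>

/* Hypothetical description of the environment the kernel booted in. */
struct model_xen_env {
	bool initial_domain;
	bool pv_domain;
	bool hvm_domain;
};

/* Enumerators stand in for the function pointers the patch assigns. */
enum model_setup_fn { SETUP_NONE, SETUP_INITDOM, SETUP_PV, SETUP_HVM };
enum model_teardown_fn { TEARDOWN_NONE, TEARDOWN_GENERIC, TEARDOWN_PV };

struct model_x86_msi {
	enum model_setup_fn setup_msi_irqs;
	enum model_teardown_fn teardown_msi_irqs;
	bool has_restore;	/* only the initial domain sets restore_msi_irqs */
	bool ignore_mask;	/* models pci_msi_ignore_mask */
};

/* Returns 0 on success, -1 when none of the Xen domain types matches. */
static int model_setup_pci_msi(const struct model_xen_env *env,
			       struct model_x86_msi *ops)
{
	if (env->initial_domain) {
		ops->setup_msi_irqs = SETUP_INITDOM;
		ops->teardown_msi_irqs = TEARDOWN_GENERIC;
		ops->has_restore = true;
		ops->ignore_mask = true;
	} else if (env->pv_domain) {
		ops->setup_msi_irqs = SETUP_PV;
		ops->teardown_msi_irqs = TEARDOWN_PV;
		ops->ignore_mask = true;
	} else if (env->hvm_domain) {
		ops->setup_msi_irqs = SETUP_HVM;
		ops->teardown_msi_irqs = TEARDOWN_GENERIC;
	} else {
		return -1;	/* the patch does WARN_ON_ONCE(1) and returns */
	}

	return 0;
}
```

With all three call sites funneled through one function, the per-domain differences are visible in a single if/else chain instead of three separate #ifdef blocks.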
[patch RFC 28/38] iommu/amd: Store irq domain in struct device
As the next step to make X86 utilize the direct MSI irq domain operations,
store the irq domain pointer in the device struct when a device is probed.

It only overrides the irqdomain of devices which are handled by a regular
PCI/MSI irq domain which protects PCI devices behind special busses like VMD
which have their own irq domain.

No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interrupt.

Signed-off-by: Thomas Gleixner
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -729,7 +729,21 @@ static void iommu_poll_ga_log(struct amd
 		}
 	}
 }
-#endif /* CONFIG_IRQ_REMAP */
+
+static void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
+{
+	if (!irq_remapping_enabled || !dev_is_pci(dev) ||
+	    pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+		return;
+
+	dev_set_msi_domain(dev, iommu->msi_domain);
+}
+
+#else	/* CONFIG_IRQ_REMAP */
+static inline void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
+#endif	/* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK	\
 	(MMIO_STATUS_EVT_INT_MASK | \
@@ -2157,6 +2171,7 @@ static struct iommu_device *amd_iommu_pr
 		iommu_dev = ERR_PTR(ret);
 		iommu_ignore_device(dev);
 	} else {
+		amd_iommu_set_pci_msi_domain(dev, iommu);
 		iommu_dev = &iommu->iommu;
 	}
 
[patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks
For devices which don't have a standard storage for MSI messages like the
upcoming IMS (Interrupt Message Storm) it's required to allocate storage
space before allocating interrupts and to free it after freeing them.

This could be achieved with the existing callbacks, but that would be
awkward because they operate on msi_alloc_info_t which is not uniform across
architectures. Also these callbacks are invoked per interrupt but the
allocation might have bulk requirements depending on the device.

As such devices can operate on different architectures it is simpler to have
separate callbacks which operate on struct device. The resulting storage
information has to be stored in struct msi_desc so the underlying irq chip
implementation can retrieve it for the relevant operations.

Signed-off-by: Thomas Gleixner
Cc: Marc Zyngier
---
 include/linux/msi.h |    8 ++++++++
 kernel/irq/msi.c    |   11 +++++++++++
 2 files changed, 19 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -279,6 +279,10 @@ struct msi_domain_info;
  *			function.
  * @domain_free_irqs:	Optional function to override the default free
  *			function.
+ * @msi_alloc_store:	Optional callback to allocate storage in a device
+ *			specific non-standard MSI store
+ * @msi_free_store:	Optional callback to free storage in a device
+ *			specific non-standard MSI store
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -328,6 +332,10 @@ struct msi_domain_ops {
 					     struct device *dev, int nvec);
 	void		(*domain_free_irqs)(struct irq_domain *domain,
 					    struct device *dev);
+	int		(*msi_alloc_store)(struct irq_domain *domain,
+					   struct device *dev, int nvec);
+	void		(*msi_free_store)(struct irq_domain *domain,
+					  struct device *dev);
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -410,6 +410,12 @@ int __msi_domain_alloc_irqs(struct irq_d
 	if (ret)
 		return ret;
 
+	if (ops->msi_alloc_store) {
+		ret = ops->msi_alloc_store(domain, dev, nvec);
+		if (ret)
+			return ret;
+	}
+
 	for_each_msi_entry(desc, dev) {
 		ops->set_desc(&arg, desc);
 
@@ -509,6 +515,8 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
+	struct msi_domain_info *info = domain->host_data;
+	struct msi_domain_ops *ops = info->ops;
 	struct msi_desc *desc;
 
 	for_each_msi_entry(desc, dev) {
@@ -522,6 +530,9 @@ void __msi_domain_free_irqs(struct irq_d
 			desc->irq = 0;
 		}
 	}
+
+	if (ops->msi_free_store)
+		ops->msi_free_store(domain, dev);
 }
 
 /**
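The ordering described above (reserve storage in bulk before any interrupt is allocated, release it only after all interrupts are freed) can be sketched in userspace. Everything named `model_*` is hypothetical and only mirrors the shape of the optional callbacks. The 8-slot limit is an arbitrary assumption for the example.

```c
#include <stddef.h>

struct model_dev {
	int store_slots;	/* slots reserved in the device specific store */
	int irqs;		/* interrupts currently allocated */
};

struct model_msi_ops {
	/* Both callbacks are optional and may be NULL. */
	int  (*msi_alloc_store)(struct model_dev *dev, int nvec);
	void (*msi_free_store)(struct model_dev *dev);
};

static int model_alloc_irqs(const struct model_msi_ops *ops,
			    struct model_dev *dev, int nvec)
{
	int ret;

	/* Storage is reserved once, in bulk, before any interrupt is set up. */
	if (ops->msi_alloc_store) {
		ret = ops->msi_alloc_store(dev, nvec);
		if (ret)
			return ret;
	}

	dev->irqs = nvec;
	return 0;
}

static void model_free_irqs(const struct model_msi_ops *ops,
			    struct model_dev *dev)
{
	dev->irqs = 0;

	/* Storage is released only after all interrupts are gone. */
	if (ops->msi_free_store)
		ops->msi_free_store(dev);
}

/* Example callbacks a device driver might supply. */
static int model_store_alloc(struct model_dev *dev, int nvec)
{
	if (nvec > 8)
		return -1;	/* pretend the store has only 8 slots */

	dev->store_slots = nvec;
	return 0;
}

static void model_store_free(struct model_dev *dev)
{
	dev->store_slots = 0;
}
```

A domain that leaves both pointers NULL behaves exactly as before, which is what keeps the change invisible to existing PCI/MSI users.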
[patch RFC 14/38] x86/msi: Consolidate MSI allocation
Convert the interrupt remap drivers to retrieve the pci device from the msi descriptor and use info::hwirq. This is the first step to prepare x86 for using the generic MSI domain ops. Signed-off-by: Thomas Gleixner Cc: Wei Liu Cc: Stephen Hemminger Cc: Joerg Roedel Cc: linux-...@vger.kernel.org Cc: linux-hyp...@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: Haiyang Zhang Cc: Lu Baolu --- arch/x86/include/asm/hw_irq.h |8 arch/x86/kernel/apic/msi.c |7 +++ drivers/iommu/amd/iommu.c |5 +++-- drivers/iommu/intel/irq_remapping.c |4 ++-- drivers/pci/controller/pci-hyperv.c |2 +- 5 files changed, 9 insertions(+), 17 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -85,14 +85,6 @@ struct irq_alloc_info { union { struct ioapic_alloc_infoioapic; struct uv_alloc_infouv; - - int unused; -#ifdef CONFIG_PCI_MSI - struct { - struct pci_dev *msi_dev; - irq_hw_number_t msi_hwirq; - }; -#endif }; }; --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -189,7 +189,6 @@ int native_setup_msi_irqs(struct pci_dev init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI; - info.msi_dev = dev; domain = irq_remapping_get_irq_domain(&info); if (domain == NULL) @@ -208,7 +207,7 @@ void native_teardown_msi_irq(unsigned in static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info, msi_alloc_info_t *arg) { - return arg->msi_hwirq; + return arg->hwirq; } int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, @@ -218,7 +217,6 @@ int pci_msi_prepare(struct irq_domain *d struct msi_desc *desc = first_pci_msi_entry(pdev); init_irq_alloc_info(arg, NULL); - arg->msi_dev = pdev; if (desc->msi_attrib.is_msix) { arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX; } else { @@ -232,7 +230,8 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare); void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) { - arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc); + arg->desc = desc; + arg->hwirq = 
pci_msi_domain_calc_hwirq(desc); } EXPORT_SYMBOL_GPL(pci_msi_set_desc); --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3514,7 +3514,7 @@ static int get_devid(struct irq_alloc_in return get_hpet_devid(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - return get_device_id(&info->msi_dev->dev); + return get_device_id(msi_desc_to_dev(info->desc)); default: WARN_ON_ONCE(1); return -1; @@ -3688,7 +3688,8 @@ static int irq_remapping_alloc(struct ir info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) { bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI); - index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev); + index = alloc_irq_index(devid, nr_irqs, align, + msi_desc_to_pci_dev(info->desc)); } else { index = alloc_irq_index(devid, nr_irqs, false, NULL); } --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1118,7 +1118,7 @@ static struct irq_domain *intel_get_irq_ return map_hpet_to_ir(info->devid); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - return map_dev_to_ir(info->msi_dev); + return map_dev_to_ir(msi_desc_to_pci_dev(info->desc)); default: WARN_ON_ONCE(1); return NULL; @@ -1287,7 +1287,7 @@ static void intel_irq_remapping_prepare_ if (info->type == X86_IRQ_ALLOC_TYPE_HPET) set_hpet_sid(irte, info->devid); else - set_msi_sid(irte, info->msi_dev); + set_msi_sid(irte, msi_desc_to_pci_dev(info->desc)); msg->address_hi = MSI_ADDR_BASE_HI; msg->data = sub_handle; --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip = static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info, msi_alloc_info_t *arg) { - return arg->msi_hwirq; + return arg->hwirq; } static struct msi_domain_ops hv_msi_ops = { ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place
No point to call it from both 32bit and 64bit implementations of
default_setup_apic_routing(). Move it to the caller.

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/apic/apic.c     |    3 +++
 arch/x86/kernel/apic/probe_32.c |    3 ---
 arch/x86/kernel/apic/probe_64.c |    3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1429,6 +1429,9 @@ void __init apic_intr_mode_init(void)
 		break;
 	}
 
+	if (x86_platform.apic_post_init)
+		x86_platform.apic_post_init();
+
 	apic_bsp_setup(upmode);
 }
 
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -170,9 +170,6 @@ void __init default_setup_apic_routing(v
 
 	if (apic->setup_apic_routing)
 		apic->setup_apic_routing();
-
-	if (x86_platform.apic_post_init)
-		x86_platform.apic_post_init();
 }
 
 void __init generic_apic_probe(void)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -32,9 +32,6 @@ void __init default_setup_apic_routing(v
 			break;
 		}
 	}
-
-	if (x86_platform.apic_post_init)
-		x86_platform.apic_post_init();
 }
 
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)
[patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info
struct irq_alloc_info is a horrible zoo of unnamed structs in a union. Many of the struct fields can be generic and don't have to be type specific like hpet_id, ioapic_id... Provide a generic set of members to prepare for the consolidation. The goal is to make irq_alloc_info have the same basic member as the generic msi_alloc_info so generic MSI domain ops can be reused and yet more mess can be avoided when (non-PCI) device MSI support comes along. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/hw_irq.h | 22 -- 1 file changed, 16 insertions(+), 6 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -44,10 +44,25 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, }; +/** + * irq_alloc_info - X86 specific interrupt allocation info + * @type: X86 specific allocation type + * @flags: Flags for allocation tweaks + * @devid: Device ID for allocations + * @hwirq: Associated hw interrupt number in the domain + * @mask: CPU mask for vector allocation + * @desc: Pointer to msi descriptor + * @data: Allocation specific data + */ struct irq_alloc_info { enum irq_alloc_type type; u32 flags; - const struct cpumask*mask; /* CPU mask for vector allocation */ + u32 devid; + irq_hw_number_t hwirq; + const struct cpumask*mask; + struct msi_desc *desc; + void*data; + union { int unused; #ifdef CONFIG_HPET_TIMER @@ -88,11 +103,6 @@ struct irq_alloc_info { char*uv_name; }; #endif -#if IS_ENABLED(CONFIG_VMD) - struct { - struct msi_desc *desc; - }; -#endif }; }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch RFC 35/38] platform-msi: Provide default irq_chip::ack
For the upcoming device MSI support it's required to have a default
irq_chip::ack implementation (irq_chip_ack_parent) so the drivers do not
need to care.

Signed-off-by: Thomas Gleixner
Cc: Greg Kroah-Hartman
---
 drivers/base/platform-msi.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -95,6 +95,8 @@ static void platform_msi_update_chip_ops
 		chip->irq_mask = irq_chip_mask_parent;
 	if (!chip->irq_unmask)
 		chip->irq_unmask = irq_chip_unmask_parent;
+	if (!chip->irq_ack)
+		chip->irq_ack = irq_chip_ack_parent;
 	if (!chip->irq_eoi)
 		chip->irq_eoi = irq_chip_eoi_parent;
 	if (!chip->irq_set_affinity)
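The "fill in a default only when the driver left the callback unset" pattern used here is easy to show in isolation. The following userspace sketch uses hypothetical `model_*` types in place of struct irq_chip and struct irq_data. It is not the platform-msi code itself.

```c
#include <stddef.h>

/* Hypothetical, reduced model of struct irq_data and struct irq_chip. */
struct model_irq_data {
	int acked_by_parent;
};

struct model_irq_chip {
	void (*irq_ack)(struct model_irq_data *d);
	void (*irq_eoi)(struct model_irq_data *d);
};

/* Stand-in for irq_chip_ack_parent(): records that the parent did the ack. */
static void model_ack_parent(struct model_irq_data *d)
{
	d->acked_by_parent = 1;
}

/* Stand-in for irq_chip_eoi_parent(); nothing to record in this model. */
static void model_eoi_parent(struct model_irq_data *d)
{
	(void)d;
}

/* Only callbacks that are still NULL get a default; set ones are kept. */
static void model_update_chip_ops(struct model_irq_chip *chip)
{
	if (!chip->irq_ack)
		chip->irq_ack = model_ack_parent;
	if (!chip->irq_eoi)
		chip->irq_eoi = model_eoi_parent;
}
```

After model_update_chip_ops() every callback can be invoked without a NULL check, which is the property the upcoming device MSI code relies on for irq_ack.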
[patch RFC 27/38] iommu/vt-d: Store irq domain in struct device
As a first step to make X86 utilize the direct MSI irq domain operations,
store the irq domain pointer in the device struct when a device is probed.

This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irqdomain of devices which
are handled by a regular PCI/MSI irq domain which protects PCI devices
behind special busses like VMD which have their own irq domain.

No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interrupt.

Signed-off-by: Thomas Gleixner
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu
---
 drivers/iommu/intel/dmar.c          |    3 +++
 drivers/iommu/intel/irq_remapping.c |   16 ++++++++++++++++
 include/linux/intel-iommu.h         |    5 +++++
 3 files changed, 24 insertions(+)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -316,6 +316,9 @@ static int dmar_pci_bus_add_dev(struct d
 	if (ret < 0 && dmar_dev_scope_status == 0)
 		dmar_dev_scope_status = ret;
 
+	if (ret >= 0)
+		intel_irq_remap_add_device(info);
+
 	return ret;
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1086,6 +1086,22 @@ static int reenable_irq_remapping(int ei
 	return -1;
 }
 
+/*
+ * Store the MSI remapping domain pointer in the device if enabled.
+ *
+ * This is called from dmar_pci_bus_add_dev() so it works even when DMA
+ * remapping is disabled. Only update the pointer if the device is not
+ * already handled by a non default PCI/MSI interrupt domain. This protects
+ * e.g. VMD devices.
+ */
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
+{
+	if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+		return;
+
+	dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));
+}
+
 static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 {
 	memset(irte, 0, sizeof(*irte));
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -439,6 +439,11 @@ struct ir_table {
 	struct irte *base;
 	unsigned long *bitmap;
 };
+
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info);
+#else
+static inline void
+intel_irq_remap_add_device(struct dmar_pci_notify_info *info) { }
 #endif
 
 struct iommu_flush {
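The guard in intel_irq_remap_add_device() boils down to "install the remapping domain unless remapping is off or the device already sits behind a special MSI domain". A minimal userspace sketch of that rule follows. The `model_*` types and the `has_special_msi_domain` flag are hypothetical simplifications of struct pci_dev and pci_dev_has_special_msi_domain().

```c
#include <stdbool.h>
#include <stddef.h>

struct model_irq_domain {
	const char *name;
};

struct model_pci_dev {
	struct model_irq_domain *msi_domain;
	bool has_special_msi_domain;	/* e.g. a device behind VMD */
};

/* Install @remap_domain only for devices using the regular PCI/MSI domain. */
static void model_remap_add_device(bool irq_remapping_enabled,
				   struct model_pci_dev *pdev,
				   struct model_irq_domain *remap_domain)
{
	if (!irq_remapping_enabled || pdev->has_special_msi_domain)
		return;

	pdev->msi_domain = remap_domain;
}
```

Once the pointer is stored per device, an allocation path can read it directly instead of looking the domain up for every interrupt.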
[patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference
Dereferencing irq_data before checking it for NULL is suboptimal.

Signed-off-by: Thomas Gleixner
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3717,8 +3717,8 @@ static int irq_remapping_alloc(struct ir
 
 	for (i = 0; i < nr_irqs; i++) {
 		irq_data = irq_domain_get_irq_data(domain, virq + i);
-		cfg = irqd_cfg(irq_data);
-		if (!irq_data || !cfg) {
+		cfg = irq_data ? irqd_cfg(irq_data) : NULL;
+		if (!cfg) {
 			ret = -EINVAL;
 			goto out_free_data;
 		}
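The check-before-use ordering from this fix can be reproduced in a few lines of userspace C. The types and helpers below (`fake_cfg`, `fake_irq_data`, `fake_irqd_cfg()`) are hypothetical stand-ins, not the kernel definitions. The stand-in accessor dereferences its argument unconditionally so that the ordering actually matters in the sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct irq_cfg and struct irq_data. */
struct fake_cfg {
	unsigned int vector;
};

struct fake_irq_data {
	struct fake_cfg *chip_data;
};

/* This stand-in dereferences @d without checking it. */
static struct fake_cfg *fake_irqd_cfg(struct fake_irq_data *d)
{
	return d->chip_data;
}

/*
 * Check the pointer first, dereference second. A NULL irq_data and a
 * NULL cfg both end up in the same error path.
 */
static int fake_lookup_vector(struct fake_irq_data *d, unsigned int *vector)
{
	struct fake_cfg *cfg = d ? fake_irqd_cfg(d) : NULL;

	if (!cfg)
		return -1;	/* the patch returns -EINVAL at this point */

	*vector = cfg->vector;
	return 0;
}
```

Folding the NULL test into the assignment also lets the later condition shrink to a single `!cfg` check, as in the patch.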
[patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time
No point in initializing the default PCI/MSI interrupt domain early and no point to create it when XEN PV/HVM/DOM0 are active. Move the initialization to pci_arch_init() and convert it to init ops so that XEN can override it as XEN has its own PCI/MSI management. The XEN override comes in a later step. Signed-off-by: Thomas Gleixner Cc: linux-...@vger.kernel.org --- arch/x86/include/asm/irqdomain.h |6 -- arch/x86/include/asm/x86_init.h |3 +++ arch/x86/kernel/apic/msi.c | 26 -- arch/x86/kernel/apic/vector.c|2 -- arch/x86/kernel/x86_init.c |3 ++- arch/x86/pci/init.c |3 +++ 6 files changed, 28 insertions(+), 15 deletions(-) --- a/arch/x86/include/asm/irqdomain.h +++ b/arch/x86/include/asm/irqdomain.h @@ -51,9 +51,11 @@ extern int mp_irqdomain_ioapic_idx(struc #endif /* CONFIG_X86_IO_APIC */ #ifdef CONFIG_PCI_MSI -extern void arch_init_msi_domain(struct irq_domain *domain); +void x86_create_pci_msi_domain(void); +struct irq_domain *native_create_pci_msi_domain(void); #else -static inline void arch_init_msi_domain(struct irq_domain *domain) { } +static inline void x86_create_pci_msi_domain(void) { } +#define native_create_pci_msi_domain NULL #endif #endif --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -8,6 +8,7 @@ struct mpc_bus; struct mpc_cpu; struct mpc_table; struct cpuinfo_x86; +struct irq_domain; /** * struct x86_init_mpparse - platform specific mpparse ops @@ -42,12 +43,14 @@ struct x86_init_resources { * @intr_init: interrupt init code * @intr_mode_select: interrupt delivery mode selection * @intr_mode_init:interrupt delivery mode setup + * @create_pci_msi_domain: Create the PCI/MSI interrupt domain */ struct x86_init_irqs { void (*pre_vector_init)(void); void (*intr_init)(void); void (*intr_mode_select)(void); void (*intr_mode_init)(void); + struct irq_domain *(*create_pci_msi_domain)(void); }; /** --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -21,7 +21,7 @@ #include #include -static struct irq_domain 
*msi_default_domain; +static struct irq_domain *x86_pci_msi_default_domain __ro_after_init; static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg) { @@ -192,7 +192,7 @@ int native_setup_msi_irqs(struct pci_dev domain = irq_remapping_get_irq_domain(&info); if (domain == NULL) - domain = msi_default_domain; + domain = x86_pci_msi_default_domain; if (domain == NULL) return -ENOSYS; @@ -243,25 +243,31 @@ static struct msi_domain_info pci_msi_do .handler_name = "edge", }; -void __init arch_init_msi_domain(struct irq_domain *parent) +struct irq_domain * __init native_create_pci_msi_domain(void) { struct fwnode_handle *fn; + struct irq_domain *d; if (disable_apic) - return; + return NULL; fn = irq_domain_alloc_named_fwnode("PCI-MSI"); if (fn) { - msi_default_domain = - pci_msi_create_irq_domain(fn, &pci_msi_domain_info, - parent); + d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info, + x86_vector_domain); } - if (!msi_default_domain) { + if (!d) { irq_domain_free_fwnode(fn); - pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n"); + pr_warn("Failed to initialize PCI-MSI irqdomain.\n"); } else { - msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK; + d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK; } + return d; +} + +void __init x86_create_pci_msi_domain(void) +{ + x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain(); } #ifdef CONFIG_IRQ_REMAP --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -713,8 +713,6 @@ int __init arch_early_irq_init(void) BUG_ON(x86_vector_domain == NULL); irq_set_default_host(x86_vector_domain); - arch_init_msi_domain(x86_vector_domain); - BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL)); /* --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -76,7 +76,8 @@ struct x86_init_ops x86_init __initdata .pre_vector_init= init_ISA_irqs, .intr_init = native_init_IRQ, .intr_mode_select = apic_intr_mode_select, - .intr_mode_init = apic_intr_mode_init + 
.intr_mode_init = apic_intr_mode_init, + .create_pci_msi_domain = native_create_pci_msi_domain, }, .oem = { --- a/arch/x86/pci/init.c +++ b/arch/x86/pci/init.c @@ -3,6 +3,7 @@ #include #i
[patch RFC 12/38] x86/irq: Consolidate UV domain allocation
Move the UV specific fields into their own struct for readability sake. Get rid of the #ifdeffery as it does not matter at all whether the alloc info is a couple of bytes longer or not. Signed-off-by: Thomas Gleixner Cc: Steve Wahl Cc: Dimitri Sivanich Cc: Russ Anderson --- arch/x86/include/asm/hw_irq.h | 21 - arch/x86/platform/uv/uv_irq.c | 16 2 files changed, 20 insertions(+), 17 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -53,6 +53,14 @@ struct ioapic_alloc_info { struct IO_APIC_route_entry *entry; }; +struct uv_alloc_info { + int limit; + int blade; + unsigned long offset; + char*name; + +}; + /** * irq_alloc_info - X86 specific interrupt allocation info * @type: X86 specific allocation type @@ -64,7 +72,8 @@ struct ioapic_alloc_info { * @data: Allocation specific data * * @ioapic:IOAPIC specific allocation data - */ + * @uv:UV specific allocation data +*/ struct irq_alloc_info { enum irq_alloc_type type; u32 flags; @@ -76,6 +85,8 @@ struct irq_alloc_info { union { struct ioapic_alloc_infoioapic; + struct uv_alloc_infouv; + int unused; #ifdef CONFIG_PCI_MSI struct { @@ -83,14 +94,6 @@ struct irq_alloc_info { irq_hw_number_t msi_hwirq; }; #endif -#ifdef CONFIG_X86_UV - struct { - int uv_limit; - int uv_blade; - unsigned long uv_offset; - char*uv_name; - }; -#endif }; }; --- a/arch/x86/platform/uv/uv_irq.c +++ b/arch/x86/platform/uv/uv_irq.c @@ -90,15 +90,15 @@ static int uv_domain_alloc(struct irq_do ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg); if (ret >= 0) { - if (info->uv_limit == UV_AFFINITY_CPU) + if (info->uv.limit == UV_AFFINITY_CPU) irq_set_status_flags(virq, IRQ_NO_BALANCING); else irq_set_status_flags(virq, IRQ_MOVE_PCNTXT); - chip_data->pnode = uv_blade_to_pnode(info->uv_blade); - chip_data->offset = info->uv_offset; + chip_data->pnode = uv_blade_to_pnode(info->uv.blade); + chip_data->offset = info->uv.offset; irq_domain_set_info(domain, virq, virq, &uv_irq_chip, chip_data, - 
handle_percpu_irq, NULL, info->uv_name); + handle_percpu_irq, NULL, info->uv.name); } else { kfree(chip_data); } @@ -193,10 +193,10 @@ int uv_setup_irq(char *irq_name, int cpu init_irq_alloc_info(&info, cpumask_of(cpu)); info.type = X86_IRQ_ALLOC_TYPE_UV; - info.uv_limit = limit; - info.uv_blade = mmr_blade; - info.uv_offset = mmr_offset; - info.uv_name = irq_name; + info.uv.limit = limit; + info.uv.blade = mmr_blade; + info.uv.offset = mmr_offset; + info.uv.name = irq_name; return irq_domain_alloc_irqs(domain, 1, uv_blade_to_memory_nid(mmr_blade), &info); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup
Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN types, consolidate the two getter functions. Signed-off-by: Thomas Gleixner Cc: Wei Liu Cc: Joerg Roedel Cc: linux-hyp...@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Jon Derrick Cc: Lu Baolu --- arch/x86/include/asm/irq_remapping.h |8 arch/x86/kernel/apic/io_apic.c |2 +- arch/x86/kernel/apic/msi.c |2 +- drivers/iommu/amd/iommu.c|1 - drivers/iommu/hyperv-iommu.c |4 ++-- drivers/iommu/intel/irq_remapping.c |1 - drivers/iommu/irq_remapping.c| 23 +-- drivers/iommu/irq_remapping.h|5 + 8 files changed, 6 insertions(+), 40 deletions(-) --- a/arch/x86/include/asm/irq_remapping.h +++ b/arch/x86/include/asm/irq_remapping.h @@ -45,8 +45,6 @@ extern int irq_remap_enable_fault_handli extern void panic_if_irq_remap(const char *msg); extern struct irq_domain * -irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info); -extern struct irq_domain * irq_remapping_get_irq_domain(struct irq_alloc_info *info); /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. 
*/ @@ -74,12 +72,6 @@ static inline void panic_if_irq_remap(co } static inline struct irq_domain * -irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info) -{ - return NULL; -} - -static inline struct irq_domain * irq_remapping_get_irq_domain(struct irq_alloc_info *info) { return NULL; --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2298,7 +2298,7 @@ static int mp_irqdomain_create(int ioapi init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; info.ioapic_id = mpc_ioapic_id(ioapic); - parent = irq_remapping_get_ir_irq_domain(&info); + parent = irq_remapping_get_irq_domain(&info); if (!parent) parent = x86_vector_domain; else --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -478,7 +478,7 @@ struct irq_domain *hpet_create_irq_domai init_irq_alloc_info(&info, NULL); info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; info.hpet_id = hpet_id; - parent = irq_remapping_get_ir_irq_domain(&info); + parent = irq_remapping_get_irq_domain(&info); if (parent == NULL) parent = x86_vector_domain; else --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3561,7 +3561,6 @@ struct irq_remap_ops amd_iommu_irq_ops = .disable= amd_iommu_disable, .reenable = amd_iommu_reenable, .enable_faulting= amd_iommu_enable_faulting, - .get_ir_irq_domain = get_irq_domain, .get_irq_domain = get_irq_domain, }; --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -182,7 +182,7 @@ static int __init hyperv_enable_irq_rema return IRQ_REMAP_X2APIC_MODE; } -static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info) +static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info) { if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT) return ioapic_ir_domain; @@ -193,7 +193,7 @@ static struct irq_domain *hyperv_get_ir_ struct irq_remap_ops hyperv_irq_remap_ops = { .prepare= hyperv_prepare_irq_remapping, .enable = hyperv_enable_irq_remapping, - 
.get_ir_irq_domain = hyperv_get_ir_irq_domain, + .get_irq_domain = hyperv_get_irq_domain, }; #endif --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1131,7 +1131,6 @@ struct irq_remap_ops intel_irq_remap_ops .disable= disable_irq_remapping, .reenable = reenable_irq_remapping, .enable_faulting= enable_drhd_fault_handling, - .get_ir_irq_domain = intel_get_irq_domain, .get_irq_domain = intel_get_irq_domain, }; --- a/drivers/iommu/irq_remapping.c +++ b/drivers/iommu/irq_remapping.c @@ -160,33 +160,12 @@ void panic_if_irq_remap(const char *msg) } /** - * irq_remapping_get_ir_irq_domain - Get the irqdomain associated with the IOMMU - * device serving request @info - * @info: interrupt allocation information, used to identify the IOMMU device - * - * It's used to get parent irqdomain for HPET and IOAPIC irqdomains. - * Returns pointer to IRQ domain, or NULL on failure. - */ -struct irq_domain * -irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info) -{ - if (!remap_ops || !remap_ops->get_ir_irq_domain) - return NULL; - - return remap_ops->get_ir_irq_domain(info); -} - -/** * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info * @info: interrupt allocation information,
[patch RFC 23/38] x86/xen: Rework MSI teardown
X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

XEN's MSI teardown relies on default_teardown_msi_irqs() which invokes
arch_teardown_msi_irq(). default_teardown_msi_irqs() is a trivial iterator
over the msi entries associated to a device.

Implement this loop in xen_teardown_msi_irqs() to prepare for removal of the
fallbacks for X86.

This is a preparatory step to wrap XEN MSI alloc/free into an irq domain
which in turn allows storing the irq domain pointer in struct device and
using the irq domain functions directly.

Signed-off-by: Thomas Gleixner
---
 arch/x86/pci/xen.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -376,20 +376,31 @@ static void xen_initdom_restore_msi_irqs
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
 	struct msi_desc *msidesc;
+	int i;
+
+	for_each_pci_msi_entry(msidesc, dev) {
+		if (msidesc->irq) {
+			for (i = 0; i < msidesc->nvec_used; i++)
+				xen_destroy_irq(msidesc->irq + i);
+		}
+	}
+}
+
+static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
+{
+	struct msi_desc *msidesc = first_pci_msi_entry(dev);
 
-	msidesc = first_pci_msi_entry(dev);
 	if (msidesc->msi_attrib.is_msix)
 		xen_pci_frontend_disable_msix(dev);
 	else
 		xen_pci_frontend_disable_msi(dev);
 
-	/* Free the IRQ's and the msidesc using the generic code. */
-	default_teardown_msi_irqs(dev);
+	xen_teardown_msi_irqs(dev);
 }
 
 static void xen_teardown_msi_irq(unsigned int irq)
 {
-	xen_destroy_irq(irq);
+	WARN_ON_ONCE(1);
 }
 
 #endif
@@ -412,7 +423,7 @@ int __init pci_xen_init(void)
 #ifdef CONFIG_PCI_MSI
 	x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+	x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
 	pci_msi_ignore_mask = 1;
 #endif
 	return 0;
@@ -436,6 +447,7 @@ static void __init xen_hvm_msi_init(void
 	}
 
 	x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
 }
 #endif
@@ -472,6 +484,7 @@ int __init pci_xen_initial_domain(void)
 #ifdef CONFIG_PCI_MSI
 	x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
 	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+	x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
 	x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
 	pci_msi_ignore_mask = 1;
 #endif
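The open-coded iteration that replaces default_teardown_msi_irqs() is a nested loop: over every MSI descriptor of the device, then over every vector the descriptor uses. A userspace sketch of that walk, with a hypothetical `model_msi_desc` array in place of the device's descriptor list and a bitmap-like table in place of Xen's irq bookkeeping:

```c
#include <stddef.h>

#define MODEL_MAX_IRQ	64

/* Hypothetical bookkeeping: 1 means the irq number is still bound. */
static unsigned char model_irq_bound[MODEL_MAX_IRQ];

struct model_msi_desc {
	unsigned int irq;	/* first irq number, 0 if nothing was allocated */
	unsigned int nvec_used;	/* number of consecutive irqs in use */
};

/* Stand-in for xen_destroy_irq(). */
static void model_destroy_irq(unsigned int irq)
{
	if (irq < MODEL_MAX_IRQ)
		model_irq_bound[irq] = 0;
}

/* Walk all descriptors and tear down every vector of each allocated one. */
static void model_teardown_msi_irqs(struct model_msi_desc *descs, size_t ndescs)
{
	size_t n;
	unsigned int i;

	for (n = 0; n < ndescs; n++) {
		if (!descs[n].irq)
			continue;	/* descriptor never got an interrupt */

		for (i = 0; i < descs[n].nvec_used; i++)
			model_destroy_irq(descs[n].irq + i);
	}
}
```

Descriptors whose irq is 0 are skipped, matching the `if (msidesc->irq)` test in the patch, so a partially failed allocation can be torn down safely.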
[patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
PCI devices behind a VMD bus are not subject to interrupt remapping, but the
irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI irq
domain.

Add a new domain bus token and allow it in the bus token check in
msi_check_reservation_mode() to keep the functionality the same once VMD
uses this token.

Signed-off-by: Thomas Gleixner
Cc: Jon Derrick
---
 include/linux/irqdomain.h |    1 +
 kernel/irq/msi.c          |    7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -84,6 +84,7 @@ enum irq_domain_bus_token {
 	DOMAIN_BUS_FSL_MC_MSI,
 	DOMAIN_BUS_TI_SCI_INTA_MSI,
 	DOMAIN_BUS_WAKEUP,
+	DOMAIN_BUS_VMD_MSI,
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
 {
 	struct msi_desc *desc;
 
-	if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
+	switch(domain->bus_token) {
+	case DOMAIN_BUS_PCI_MSI:
+	case DOMAIN_BUS_VMD_MSI:
+		break;
+	default:
 		return false;
+	}
 
 	if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
 		return false;
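The effect of the widened token check can be seen in a reduced userspace model. The enum below is a hypothetical, shortened copy of the bus tokens with illustrative values, and the function only mirrors the first two tests of msi_check_reservation_mode(). The remaining checks of the real function are deliberately left out.

```c
#include <stdbool.h>

/* Shortened, illustrative token list; not the kernel's numbering. */
enum model_bus_token {
	MODEL_BUS_ANY,
	MODEL_BUS_PCI_MSI,
	MODEL_BUS_PLATFORM_MSI,
	MODEL_BUS_VMD_MSI,
};

#define MODEL_FLAG_MUST_REACTIVATE	(1u << 0)

static bool model_check_reservation_mode(enum model_bus_token token,
					  unsigned int flags)
{
	/* Both regular PCI/MSI and VMD MSI domains pass the token check. */
	switch (token) {
	case MODEL_BUS_PCI_MSI:
	case MODEL_BUS_VMD_MSI:
		break;
	default:
		return false;
	}

	return (flags & MODEL_FLAG_MUST_REACTIVATE) != 0;
}
```

Using a switch instead of a chained comparison keeps the accepted tokens in one visible list, so a further token would only need another case label.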
[patch RFC 32/38] x86/irq: Make most MSI ops XEN private
Nothing except XEN uses the setup/teardown ops. Hide them there.

Signed-off-by: Thomas Gleixner
Cc: xen-de...@lists.xenproject.org
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/x86_init.h |    2 --
 arch/x86/pci/xen.c              |   23 +++
 2 files changed, 15 insertions(+), 10 deletions(-)

--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -276,8 +276,6 @@ struct x86_platform_ops {
 struct pci_dev;
 
 struct x86_msi_ops {
-	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-	void (*teardown_msi_irqs)(struct pci_dev *dev);
 	void (*restore_msi_irqs)(struct pci_dev *dev);
 };
 
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -156,6 +156,13 @@ static int acpi_register_gsi_xen(struct
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
 
+struct xen_msi_ops {
+	int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
+	void (*teardown_msi_irqs)(struct pci_dev *dev);
+};
+
+static struct xen_msi_ops xen_msi_ops __ro_after_init;
+
 static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
 	int irq, ret, i;
@@ -414,7 +421,7 @@ static int xen_msi_domain_alloc_irqs(str
 	else
 		type = PCI_CAP_ID_MSI;
 
-	return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+	return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type);
 }
 
 static void xen_msi_domain_free_irqs(struct irq_domain *domain,
@@ -423,7 +430,7 @@ static void xen_msi_domain_free_irqs(str
 	if (WARN_ON_ONCE(!dev_is_pci(dev)))
 		return;
 
-	x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+	xen_msi_ops.teardown_msi_irqs(to_pci_dev(dev));
 }
 
 static struct msi_domain_ops xen_pci_msi_domain_ops = {
@@ -461,17 +468,17 @@ static __init struct irq_domain *xen_cre
 static __init void xen_setup_pci_msi(void)
 {
 	if (xen_initial_domain()) {
-		x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-		x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+		xen_msi_ops.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
 		x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
 		pci_msi_ignore_mask = 1;
 	} else if (xen_pv_domain()) {
-		x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-		x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+		xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
+		xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
 		pci_msi_ignore_mask = 1;
 	} else if (xen_hvm_domain()) {
-		x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-		x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+		xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+		xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
 	} else {
 		WARN_ON_ONCE(1);
 		return;
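For readers who want to see the shape of the change without the Xen details, here is a minimal userspace sketch of the same idea: a file-private table of function pointers that is filled in once at init time depending on the environment, with callers going through the table. All names below (domain_kind, msi_ops, msi_ops_init and friends) are made up for illustration and are not kernel symbols; the return values are arbitrary markers so the three variants can be told apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three Xen domain flavours. */
enum domain_kind { DOM_INITIAL, DOM_PV, DOM_HVM, DOM_UNKNOWN };

/* File-private ops table, similar in spirit to struct xen_msi_ops. */
struct msi_ops {
	int (*setup)(int nvec);
	void (*teardown)(void);
};

/* static: nothing outside this file can see or overwrite the table. */
static struct msi_ops ops;
static int torn_down;

/* Dummy implementations; the factors only mark which one ran. */
static int setup_initdom(int nvec) { return nvec; }
static int setup_pv(int nvec) { return nvec * 10; }
static int setup_hvm(int nvec) { return nvec * 100; }
static void teardown_common(void) { torn_down = 1; }
static void teardown_pv(void) { torn_down = 2; }

/* Fill the table once; returns 0 on success, -1 for an unknown kind. */
static int msi_ops_init(enum domain_kind kind)
{
	switch (kind) {
	case DOM_INITIAL:
		ops.setup = setup_initdom;
		ops.teardown = teardown_common;
		return 0;
	case DOM_PV:
		ops.setup = setup_pv;
		ops.teardown = teardown_pv;
		return 0;
	case DOM_HVM:
		ops.setup = setup_hvm;
		ops.teardown = teardown_common;
		return 0;
	default:
		return -1;
	}
}

/* Callers dispatch through the private table only. */
static int alloc_irqs(int nvec)
{
	return ops.setup ? ops.setup(nvec) : -1;
}

static void free_irqs(void)
{
	if (ops.teardown)
		ops.teardown();
}
```

The point of keeping the table static is the same as in the patch: once no other code can reach the setup/teardown pointers, the generic structure can drop them.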
[patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING
A generic IMS irq chip and irq domain implementation for IMS based devices which utilize a MSI message store array on chip. Allows IMS devices with a MSI message store array to reuse this code for different array sizes. Allocation and freeing of interrupts happens via the generic msi_domain_alloc/free_irqs() interface. No special purpose IMS magic required as long as the interrupt domain is stored in the underlying device struct. Completely untested of course and mostly for illustration and educational purpose. This should of course be a modular irq chip, but adding that support is left as an exercise for the people who care about this deeply. Signed-off-by: Thomas Gleixner Cc: Marc Zyngier Cc: Megha Dey Cc: Jason Gunthorpe Cc: Dave Jiang Cc: Alex Williamson Cc: Jacob Pan Cc: Baolu Lu Cc: Kevin Tian Cc: Dan Williams --- drivers/irqchip/Kconfig |8 + drivers/irqchip/Makefile|1 drivers/irqchip/irq-ims-msi.c | 169 include/linux/irqchip/irq-ims-msi.h | 41 4 files changed, 219 insertions(+) --- a/drivers/irqchip/Kconfig +++ b/drivers/irqchip/Kconfig @@ -571,4 +571,12 @@ config LOONGSON_PCH_MSI help Support for the Loongson PCH MSI Controller. 
+config IMS_MSI + bool "IMS Interrupt Message Store MSI controller" + depends on PCI + select DEVICE_MSI + help + Support for IMS Interrupt Message Store MSI controller + with IMS slot storage in a slot array + endmenu --- a/drivers/irqchip/Makefile +++ b/drivers/irqchip/Makefile @@ -111,3 +111,4 @@ obj-$(CONFIG_LOONGSON_HTPIC)+= irq-loo obj-$(CONFIG_LOONGSON_HTVEC) += irq-loongson-htvec.o obj-$(CONFIG_LOONGSON_PCH_PIC) += irq-loongson-pch-pic.o obj-$(CONFIG_LOONGSON_PCH_MSI) += irq-loongson-pch-msi.o +obj-$(CONFIG_IMS_MSI) += irq-ims-msi.o --- /dev/null +++ b/drivers/irqchip/irq-ims-msi.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: GPL-2.0 +// (C) Copyright 2020 Thomas Gleixner +/* + * Shared interrupt chip and irq domain for Intel IMS devices + */ +#include +#include +#include +#include + +#include + +struct ims_data { + struct ims_array_info info; + unsigned long map[0]; +}; + +static void ims_mask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem; + u32 __iomem *ctrl = &slot->ctrl; + + iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl); +} + +static void ims_unmask_irq(struct irq_data *data) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem; + u32 __iomem *ctrl = &slot->ctrl; + + iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_UNMASK, ctrl); +} + +static void ims_write_msi_msg(struct irq_data *data, struct msi_msg *msg) +{ + struct msi_desc *desc = irq_data_get_msi_desc(data); + struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem; + + iowrite32(msg->address_lo, &slot->address_lo); + iowrite32(msg->address_hi, &slot->address_hi); + iowrite32(msg->data, &slot->data); +} + +static const struct irq_chip ims_msi_controller = { + .name = "IMS", + .irq_mask = ims_mask_irq, + .irq_unmask = ims_unmask_irq, + .irq_write_msi_msg = ims_write_msi_msg, + .irq_retrigger = 
irq_chip_retrigger_hierarchy, + .flags = IRQCHIP_SKIP_SET_WAKE, +}; + +static void ims_reset_slot(struct ims_array_slot __iomem *slot) +{ + iowrite32(0, &slot->address_lo); + iowrite32(0, &slot->address_hi); + iowrite32(0, &slot->data); + iowrite32(0, &slot->ctrl); +} + +static void ims_free_msi_store(struct irq_domain *domain, struct device *dev) +{ + struct msi_domain_info *info = domain->host_data; + struct ims_data *ims = info->data; + struct msi_desc *entry; + + for_each_msi_entry(entry, dev) { + if (entry->device_msi.priv_iomem) { + clear_bit(entry->device_msi.hwirq, ims->map); + ims_reset_slot(entry->device_msi.priv_iomem); + entry->device_msi.priv_iomem = NULL; + entry->device_msi.hwirq = 0; + } + } +} + +static int ims_alloc_msi_store(struct irq_domain *domain, struct device *dev, + int nvec) +{ + struct msi_domain_info *info = domain->host_data; + struct ims_data *ims = info->data; + struct msi_desc *entry; + + for_each_msi_entry(entry, dev) { + unsigned int idx; + + idx = find_first_zero_bit(ims->map, ims->info.max_slots); + if (idx >= ims->info.max_slots) + goto fail; + set_
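The slot handling in ims_alloc_msi_store() and ims_free_msi_store() above follows a common pattern: find the first clear bit in a bitmap, mark it used, and clear it again on free. The quoted patch is cut off right at that point, so the following is only a userspace sketch of that pattern under my own assumptions, not the rest of the driver. MAX_SLOTS, slot_map and the slot_*() helpers are hypothetical names, and the open-coded loop stands in for the kernel's find_first_zero_bit().

```c
#include <assert.h>
#include <limits.h>

#define MAX_SLOTS	70	/* arbitrary size, deliberately not a word multiple */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MAP_WORDS	((MAX_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per slot; a set bit means the slot is in use. */
static unsigned long slot_map[MAP_WORDS];

static int slot_test(unsigned int idx)
{
	return (slot_map[idx / BITS_PER_WORD] >> (idx % BITS_PER_WORD)) & 1UL;
}

/* Returns the lowest free slot index and marks it used, or -1 if full. */
static int slot_alloc(void)
{
	unsigned int idx;

	for (idx = 0; idx < MAX_SLOTS; idx++) {
		if (!slot_test(idx)) {
			slot_map[idx / BITS_PER_WORD] |= 1UL << (idx % BITS_PER_WORD);
			return (int)idx;
		}
	}
	return -1;
}

/* Marks a slot free again; out-of-range indices are ignored. */
static void slot_free(unsigned int idx)
{
	if (idx < MAX_SLOTS)
		slot_map[idx / BITS_PER_WORD] &= ~(1UL << (idx % BITS_PER_WORD));
}
```

A freed index is handed out again by the next allocation because the search always starts from bit 0, which matches the first-zero-bit behaviour the patch relies on.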
[patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
The only user is in the same file and the name is too generic because this
function is only ever used for HVM domains.

Signed-off-by: Thomas Gleixner
Cc: Konrad Rzeszutek Wilk
Cc: linux-...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: Juergen Gross
Cc: Boris Ostrovsky
Cc: Stefano Stabellini
---
 arch/x86/pci/xen.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -419,7 +419,7 @@ int __init pci_xen_init(void)
 }
 
 #ifdef CONFIG_PCI_MSI
-void __init xen_msi_init(void)
+static void __init xen_hvm_msi_init(void)
 {
 	if (!disable_apic) {
 		/*
@@ -459,7 +459,7 @@ int __init pci_xen_hvm_init(void)
 	 * We need to wait until after x2apic is initialized
 	 * before we can set MSI IRQ ops.
 	 */
-	x86_platform.apic_post_init = xen_msi_init;
+	x86_platform.apic_post_init = xen_hvm_msi_init;
 #endif
 	return 0;
 }
[patch RFC 02/38] x86/init: Remove unused init ops
Some past platform removal forgot to get rid of this unused ballast. Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/mpspec.h | 10 -- arch/x86/include/asm/x86_init.h | 10 -- arch/x86/kernel/mpparse.c | 26 -- arch/x86/kernel/x86_init.c |4 4 files changed, 4 insertions(+), 46 deletions(-) --- a/arch/x86/include/asm/mpspec.h +++ b/arch/x86/include/asm/mpspec.h @@ -67,21 +67,11 @@ static inline void find_smp_config(void) #ifdef CONFIG_X86_MPPARSE extern void e820__memblock_alloc_reserved_mpc_new(void); extern int enable_update_mptable; -extern int default_mpc_apic_id(struct mpc_cpu *m); -extern void default_smp_read_mpc_oem(struct mpc_table *mpc); -# ifdef CONFIG_X86_IO_APIC -extern void default_mpc_oem_bus_info(struct mpc_bus *m, char *str); -# else -# define default_mpc_oem_bus_info NULL -# endif extern void default_find_smp_config(void); extern void default_get_smp_config(unsigned int early); #else static inline void e820__memblock_alloc_reserved_mpc_new(void) { } #define enable_update_mptable 0 -#define default_mpc_apic_id NULL -#define default_smp_read_mpc_oem NULL -#define default_mpc_oem_bus_info NULL #define default_find_smp_config x86_init_noop #define default_get_smp_config x86_init_uint_noop #endif --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -11,22 +11,12 @@ struct cpuinfo_x86; /** * struct x86_init_mpparse - platform specific mpparse ops - * @mpc_record:platform specific mpc record accounting * @setup_ioapic_ids: platform specific ioapic id override - * @mpc_apic_id: platform specific mpc apic id assignment - * @smp_read_mpc_oem: platform specific oem mpc table setup - * @mpc_oem_pci_bus: platform specific pci bus setup (default NULL) - * @mpc_oem_bus_info: platform specific mpc bus info * @find_smp_config: find the smp configuration * @get_smp_config:get the smp configuration */ struct x86_init_mpparse { - void (*mpc_record)(unsigned int mode); void (*setup_ioapic_ids)(void); - int (*mpc_apic_id)(struct mpc_cpu 
*m); - void (*smp_read_mpc_oem)(struct mpc_table *mpc); - void (*mpc_oem_pci_bus)(struct mpc_bus *m); - void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name); void (*find_smp_config)(void); void (*get_smp_config)(unsigned int early); }; --- a/arch/x86/kernel/mpparse.c +++ b/arch/x86/kernel/mpparse.c @@ -46,11 +46,6 @@ static int __init mpf_checksum(unsigned return sum & 0xFF; } -int __init default_mpc_apic_id(struct mpc_cpu *m) -{ - return m->apicid; -} - static void __init MP_processor_info(struct mpc_cpu *m) { int apicid; @@ -61,7 +56,7 @@ static void __init MP_processor_info(str return; } - apicid = x86_init.mpparse.mpc_apic_id(m); + apicid = m->apicid; if (m->cpuflag & CPU_BOOTPROCESSOR) { bootup_cpu = " (Bootup-CPU)"; @@ -73,7 +68,7 @@ static void __init MP_processor_info(str } #ifdef CONFIG_X86_IO_APIC -void __init default_mpc_oem_bus_info(struct mpc_bus *m, char *str) +static void __init mpc_oem_bus_info(struct mpc_bus *m, char *str) { memcpy(str, m->bustype, 6); str[6] = 0; @@ -84,7 +79,7 @@ static void __init MP_bus_info(struct mp { char str[7]; - x86_init.mpparse.mpc_oem_bus_info(m, str); + mpc_oem_bus_info(m, str); #if MAX_MP_BUSSES < 256 if (m->busid >= MAX_MP_BUSSES) { @@ -100,9 +95,6 @@ static void __init MP_bus_info(struct mp mp_bus_id_to_type[m->busid] = MP_BUS_ISA; #endif } else if (strncmp(str, BUSTYPE_PCI, sizeof(BUSTYPE_PCI) - 1) == 0) { - if (x86_init.mpparse.mpc_oem_pci_bus) - x86_init.mpparse.mpc_oem_pci_bus(m); - clear_bit(m->busid, mp_bus_not_pci); #ifdef CONFIG_EISA mp_bus_id_to_type[m->busid] = MP_BUS_PCI; @@ -198,8 +190,6 @@ static void __init smp_dump_mptable(stru 1, mpc, mpc->length, 1); } -void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { } - static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early) { char str[16]; @@ -218,14 +208,7 @@ static int __init smp_read_mpc(struct mp if (early) return 1; - if (mpc->oemptr) - x86_init.mpparse.smp_read_mpc_oem(mpc); - - /* -* Now process the configuration blocks. 
-*/ - x86_init.mpparse.mpc_record(0); - + /* Now process the configuration blocks. */ while (count < mpc->length) { switch (*mpt) { case MP_PROCESSOR: @@ -256,7 +239,6 @@ static int __init smp_read_mpc(struct mp count = mpc->length; break; } - x86_init.mpparse.mpc_record(1)
[patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
No functional change. Signed-off-by: Thomas Gleixner Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org --- arch/x86/include/asm/hw_irq.h |4 ++-- arch/x86/kernel/apic/msi.c |6 +++--- drivers/iommu/amd/iommu.c | 24 drivers/iommu/intel/irq_remapping.c | 18 +- 4 files changed, 26 insertions(+), 26 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -36,8 +36,8 @@ struct msi_desc; enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_IOAPIC = 1, X86_IRQ_ALLOC_TYPE_HPET, - X86_IRQ_ALLOC_TYPE_MSI, - X86_IRQ_ALLOC_TYPE_MSIX, + X86_IRQ_ALLOC_TYPE_PCI_MSI, + X86_IRQ_ALLOC_TYPE_PCI_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, }; --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -188,7 +188,7 @@ int native_setup_msi_irqs(struct pci_dev struct irq_alloc_info info; init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_MSI; + info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI; info.msi_dev = dev; domain = irq_remapping_get_irq_domain(&info); @@ -220,9 +220,9 @@ int pci_msi_prepare(struct irq_domain *d init_irq_alloc_info(arg, NULL); arg->msi_dev = pdev; if (desc->msi_attrib.is_msix) { - arg->type = X86_IRQ_ALLOC_TYPE_MSIX; + arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX; } else { - arg->type = X86_IRQ_ALLOC_TYPE_MSI; + arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI; arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; } --- a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3514,8 +3514,8 @@ static int get_devid(struct irq_alloc_in case X86_IRQ_ALLOC_TYPE_HPET: devid = get_hpet_devid(info->hpet_id); break; - case X86_IRQ_ALLOC_TYPE_MSI: - case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_PCI_MSI: + case X86_IRQ_ALLOC_TYPE_PCI_MSIX: devid = get_device_id(&info->msi_dev->dev); break; default: @@ -3553,8 +3553,8 @@ static struct irq_domain *get_irq_domain return NULL; switch (info->type) { - case X86_IRQ_ALLOC_TYPE_MSI: - case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_PCI_MSI: + case X86_IRQ_ALLOC_TYPE_PCI_MSIX: devid 
= get_device_id(&info->msi_dev->dev); if (devid < 0) return NULL; @@ -3615,8 +3615,8 @@ static void irq_remapping_prepare_irte(s break; case X86_IRQ_ALLOC_TYPE_HPET: - case X86_IRQ_ALLOC_TYPE_MSI: - case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_PCI_MSI: + case X86_IRQ_ALLOC_TYPE_PCI_MSIX: msg->address_hi = MSI_ADDR_BASE_HI; msg->address_lo = MSI_ADDR_BASE_LO; msg->data = irte_info->index; @@ -3660,15 +3660,15 @@ static int irq_remapping_alloc(struct ir if (!info) return -EINVAL; - if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI && - info->type != X86_IRQ_ALLOC_TYPE_MSIX) + if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI && + info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX) return -EINVAL; /* * With IRQ remapping enabled, don't need contiguous CPU vectors * to support multiple MSI interrupts. */ - if (info->type == X86_IRQ_ALLOC_TYPE_MSI) + if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI) info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; devid = get_devid(info); @@ -3700,9 +3700,9 @@ static int irq_remapping_alloc(struct ir } else { index = -ENOMEM; } - } else if (info->type == X86_IRQ_ALLOC_TYPE_MSI || - info->type == X86_IRQ_ALLOC_TYPE_MSIX) { - bool align = (info->type == X86_IRQ_ALLOC_TYPE_MSI); + } else if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI || + info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) { + bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI); index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev); } else { --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1115,8 +1115,8 @@ static struct irq_domain *intel_get_ir_i case X86_IRQ_ALLOC_TYPE_HPET: iommu = map_hpet_to_ir(info->hpet_id); break; - case X86_IRQ_ALLOC_TYPE_MSI: - case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_PCI_MSI: + case X86_IRQ_ALLOC_TYPE_PCI_MSIX: iommu = map_dev_to_ir(info->msi_dev); break; default: @@ -1135,8 +1135,8 @@ static struct irq_domain *intel_get_irq_ return N
[patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI
First of all, sorry for the horribly long Cc list, which was unfortunately
unavoidable as this touches the world and some more.

This patch series aims to provide a base to support device MSI (non PCI
based) in a halfway architecture independent way.

It's a mixed bag of bug fixes, cleanups and general improvements which are
worthwhile independent of the device MSI stuff. Unfortunately this also
comes with an evil abuse of the irqdomain system to coerce XEN on x86 into
compliance without rewriting XEN from scratch.

As discussed at length in this mail thread:

  https://lore.kernel.org/r/87h7tcgbs2@nanos.tec.linutronix.de

the initial attempt of piggybacking device MSI support on platform MSI is
doomed for various reasons, but creating independent interrupt domains for
these upcoming magic PCI subdevices which are not PCI, but might be exposed
as PCI devices, is not as trivial as it seems.

The initially suggested and evaluated approach of extending platform MSI
turned out to be the completely wrong direction and in fact platform MSI
should be rewritten on top of device MSI or completely replaced by it.

One of the main issues is that x86 does not support the concept of irq
domain associations stored in device::msi_domain and still relies on the
arch_*_msi_irqs() fallback implementations which have their own set of
problems as outlined in

  https://lore.kernel.org/r/87bljg7u4f@nanos.tec.linutronix.de/

in the very same thread.

The main obstacle of storing that pointer is XEN which has its own
historical notion of handling PCI MSI interrupts.

This series tries to address these issues in several steps:

 1) Accidental bug fixes

      iommu/amd: Prevent NULL pointer dereference

 2) Janitoring

      x86/init: Remove unused init ops

 3) Simplification of the x86 specific interrupt allocation mechanism

      x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
      x86/irq: Add allocation type for parent domain retrieval
      iommu/vt-d: Consolidate irq domain getter
      iommu/amd: Consolidate irq domain getter
      iommu/irq_remapping: Consolidate irq domain lookup

 4) Consolidation of the X86 specific interrupt allocation mechanism to be
    as close as possible to the generic MSI allocation mechanism which
    allows getting rid of quite a bunch of x86'isms which are pointless

      x86/irq: Prepare consolidation of irq_alloc_info
      x86/msi: Consolidate HPET allocation
      x86/ioapic: Consolidate IOAPIC allocation
      x86/irq: Consolidate DMAR irq allocation
      x86/irq: Consolidate UV domain allocation
      PCI: MSI: Rework pci_msi_domain_calc_hwirq()
      x86/msi: Consolidate MSI allocation
      x86/msi: Use generic MSI domain ops

 5) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()

      x86/irq: Move apic_post_init() invocation to one place
      x86/pci: Reducde #ifdeffery in PCI init code
      x86/irq: Initialize PCI/MSI domain at PCI init time
      irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
      PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
      PCI: MSI: Provide pci_dev_has_special_msi_domain() helper
      x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
      x86/xen: Rework MSI teardown
      x86/xen: Consolidate XEN-MSI init
      irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
      x86/xen: Wrap XEN MSI management into irqdomain
      iommm/vt-d: Store irq domain in struct device
      iommm/amd: Store irq domain in struct device
      x86/pci: Set default irq domain in pcibios_add_device()
      PCI/MSI: Allow to disable arch fallbacks
      x86/irq: Cleanup the arch_*_msi_irqs() leftovers
      x86/irq: Make most MSI ops XEN private

    This one is paving the way to device MSI support, but it comes with an
    ugly and evil hack. The ability of overriding the default
    allocation/free functions of an MSI irq domain is useful in general as
    (hopefully) demonstrated with the device MSI POC, but the abuse in
    context of XEN is evil. OTOH without enough XENology and without
    rewriting XEN from scratch wrapping XEN MSI handling into a pseudo irq
    domain is a reasonable step forward for mere mortals with severely
    limited XENology. One day the XEN folks might make it a real irq
    domain. Perhaps when they have to support the same mess on other
    architectures. Hope dies last...

    At least the mechanism to override alloc/free turned out to be useful
    for implementing the base infrastructure for device MSI. So it's not a
    completely lost case.

 6) X86 specific preparation for device MSI

      x86/irq: Add DEV_MSI allocation type
      x86/msi: Let pci_msi_prepare() handle non-PCI MSI

 7) Generic device MSI infrastructure

      platform-msi: Provide default irq_chip:ack
      platform-msi: Add device MSI infrastructure

 8) Infrastructure for and a POC of an IMS (Interrupt Message
[patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval
irq_remapping_ir_irq_domain() is used to retrieve the remapping parent domain for an allocation type. irq_remapping_irq_domain() is for retrieving the actual device domain for allocating interrupts for a device. The two functions are similar and can be unified by using explicit modes for parent irq domain retrieval. Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu implementations. Drop the parent domain retrieval for PCI_MSI/X as that is unused. Signed-off-by: Thomas Gleixner Cc: Joerg Roedel Cc: x...@kernel.org Cc: linux-hyp...@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: Haiyang Zhang Cc: Jon Derrick Cc: Lu Baolu --- arch/x86/include/asm/hw_irq.h |2 ++ arch/x86/kernel/apic/io_apic.c |2 +- arch/x86/kernel/apic/msi.c |2 +- drivers/iommu/amd/iommu.c |8 drivers/iommu/hyperv-iommu.c|2 +- drivers/iommu/intel/irq_remapping.c |8 ++-- 6 files changed, 15 insertions(+), 9 deletions(-) --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -40,6 +40,8 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_PCI_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, + X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT, + X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT, }; struct irq_alloc_info { --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -2296,7 +2296,7 @@ static int mp_irqdomain_create(int ioapi return 0; init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_IOAPIC; + info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT; info.ioapic_id = mpc_ioapic_id(ioapic); parent = irq_remapping_get_ir_irq_domain(&info); if (!parent) --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -476,7 +476,7 @@ struct irq_domain *hpet_create_irq_domai domain_info->data = (void *)(long)hpet_id; init_irq_alloc_info(&info, NULL); - info.type = X86_IRQ_ALLOC_TYPE_HPET; + info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT; info.hpet_id = hpet_id; parent = irq_remapping_get_ir_irq_domain(&info); if (parent == NULL) --- 
a/drivers/iommu/amd/iommu.c +++ b/drivers/iommu/amd/iommu.c @@ -3534,6 +3534,14 @@ static struct irq_domain *get_ir_irq_dom if (!info) return NULL; + switch (info->type) { + case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: + case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: + break; + default: + return NULL; + } + devid = get_devid(info); if (devid >= 0) { iommu = amd_iommu_rlookup_table[devid]; --- a/drivers/iommu/hyperv-iommu.c +++ b/drivers/iommu/hyperv-iommu.c @@ -184,7 +184,7 @@ static int __init hyperv_enable_irq_rema static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info) { - if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC) + if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT) return ioapic_ir_domain; else return NULL; --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -1109,16 +1109,12 @@ static struct irq_domain *intel_get_ir_i return NULL; switch (info->type) { - case X86_IRQ_ALLOC_TYPE_IOAPIC: + case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: iommu = map_ioapic_to_ir(info->ioapic_id); break; - case X86_IRQ_ALLOC_TYPE_HPET: + case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: iommu = map_hpet_to_ir(info->hpet_id); break; - case X86_IRQ_ALLOC_TYPE_PCI_MSI: - case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - iommu = map_dev_to_ir(info->msi_dev); - break; default: BUG_ON(1); break; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter
The irq domain request mode is now indicated in irq_alloc_info::type. Consolidate the two getter functions into one. Signed-off-by: Thomas Gleixner Cc: Joerg Roedel Cc: iommu@lists.linux-foundation.org Cc: Lu Baolu --- drivers/iommu/intel/irq_remapping.c | 67 1 file changed, 24 insertions(+), 43 deletions(-) --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -204,35 +204,40 @@ static int modify_irte(struct irq_2_iomm return rc; } -static struct intel_iommu *map_hpet_to_ir(u8 hpet_id) +static struct irq_domain *map_hpet_to_ir(u8 hpet_id) { int i; - for (i = 0; i < MAX_HPET_TBS; i++) + for (i = 0; i < MAX_HPET_TBS; i++) { if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu) - return ir_hpet[i].iommu; + return ir_hpet[i].iommu->ir_domain; + } return NULL; } -static struct intel_iommu *map_ioapic_to_ir(int apic) +static struct intel_iommu *map_ioapic_to_iommu(int apic) { int i; - for (i = 0; i < MAX_IO_APICS; i++) + for (i = 0; i < MAX_IO_APICS; i++) { if (ir_ioapic[i].id == apic && ir_ioapic[i].iommu) return ir_ioapic[i].iommu; + } return NULL; } -static struct intel_iommu *map_dev_to_ir(struct pci_dev *dev) +static struct irq_domain *map_ioapic_to_ir(int apic) { - struct dmar_drhd_unit *drhd; + struct intel_iommu *iommu = map_ioapic_to_iommu(apic); - drhd = dmar_find_matched_drhd_unit(dev); - if (!drhd) - return NULL; + return iommu ? iommu->ir_domain : NULL; +} + +static struct irq_domain *map_dev_to_ir(struct pci_dev *dev) +{ + struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev); - return drhd->iommu; + return drhd ? 
drhd->iommu->ir_msi_domain : NULL; } static int clear_entries(struct irq_2_iommu *irq_iommu) @@ -996,7 +1001,7 @@ static int __init parse_ioapics_under_ir for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) { int ioapic_id = mpc_ioapic_id(ioapic_idx); - if (!map_ioapic_to_ir(ioapic_id)) { + if (!map_ioapic_to_iommu(ioapic_id)) { pr_err(FW_BUG "ioapic %d has no mapping iommu, " "interrupt remapping will be disabled\n", ioapic_id); @@ -1101,47 +1106,23 @@ static void prepare_irte(struct irte *ir irte->redir_hint = 1; } -static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info) +static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info) { - struct intel_iommu *iommu = NULL; - if (!info) return NULL; switch (info->type) { case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT: - iommu = map_ioapic_to_ir(info->ioapic_id); - break; + return map_ioapic_to_ir(info->ioapic_id); case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT: - iommu = map_hpet_to_ir(info->hpet_id); - break; - default: - BUG_ON(1); - break; - } - - return iommu ? 
iommu->ir_domain : NULL; -} - -static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info) -{ - struct intel_iommu *iommu; - - if (!info) - return NULL; - - switch (info->type) { + return map_hpet_to_ir(info->hpet_id); case X86_IRQ_ALLOC_TYPE_PCI_MSI: case X86_IRQ_ALLOC_TYPE_PCI_MSIX: - iommu = map_dev_to_ir(info->msi_dev); - if (iommu) - return iommu->ir_msi_domain; - break; + return map_dev_to_ir(info->msi_dev); default: - break; + WARN_ON_ONCE(1); + return NULL; } - - return NULL; } struct irq_remap_ops intel_irq_remap_ops = { @@ -1150,7 +1131,7 @@ struct irq_remap_ops intel_irq_remap_ops .disable= disable_irq_remapping, .reenable = reenable_irq_remapping, .enable_faulting= enable_drhd_fault_handling, - .get_ir_irq_domain = intel_get_ir_irq_domain, + .get_ir_irq_domain = intel_get_irq_domain, .get_irq_domain = intel_get_irq_domain, }; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch RFC 06/38] iommu/amd: Consolidate irq domain getter
The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner
Cc: Joerg Roedel
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |   65 ++
 1 file changed, 21 insertions(+), 44 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3505,77 +3505,54 @@ static void irte_ga_clear_allocated(stru
 
 static int get_devid(struct irq_alloc_info *info)
 {
-	int devid = -1;
-
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
-		devid = get_ioapic_devid(info->ioapic_id);
-		break;
+	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+		return get_ioapic_devid(info->ioapic_id);
 	case X86_IRQ_ALLOC_TYPE_HPET:
-		devid = get_hpet_devid(info->hpet_id);
-		break;
+	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+		return get_hpet_devid(info->hpet_id);
 	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
 	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		devid = get_device_id(&info->msi_dev->dev);
-		break;
+		return get_device_id(&info->msi_dev->dev);
 	default:
-		BUG_ON(1);
-		break;
+		WARN_ON_ONCE(1);
+		return -1;
 	}
-
-	return devid;
 }
 
-static struct irq_domain *get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
+						   int devid)
 {
-	struct amd_iommu *iommu;
-	int devid;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-	if (!info)
+	if (!iommu)
 		return NULL;
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
 	case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-		break;
+		return iommu->ir_domain;
+	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
+		return iommu->msi_domain;
 	default:
+		WARN_ON_ONCE(1);
 		return NULL;
 	}
-
-	devid = get_devid(info);
-	if (devid >= 0) {
-		iommu = amd_iommu_rlookup_table[devid];
-		if (iommu)
-			return iommu->ir_domain;
-	}
-
-	return NULL;
 }
 
 static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
 {
-	struct amd_iommu *iommu;
 	int devid;
 
 	if (!info)
 		return NULL;
 
-	switch (info->type) {
-	case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-	case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-		devid = get_device_id(&info->msi_dev->dev);
-		if (devid < 0)
-			return NULL;
-
-		iommu = amd_iommu_rlookup_table[devid];
-		if (iommu)
-			return iommu->msi_domain;
-		break;
-	default:
-		break;
-	}
-
-	return NULL;
+	devid = get_devid(info);
+	if (devid < 0)
+		return NULL;
+	return get_irq_domain_for_devid(info, devid);
 }
 
 struct irq_remap_ops amd_iommu_irq_ops = {
@@ -3584,7 +3561,7 @@ struct irq_remap_ops amd_iommu_irq_ops =
 	.disable		= amd_iommu_disable,
 	.reenable		= amd_iommu_reenable,
 	.enable_faulting	= amd_iommu_enable_faulting,
-	.get_ir_irq_domain	= get_ir_irq_domain,
+	.get_ir_irq_domain	= get_irq_domain,
 	.get_irq_domain		= get_irq_domain,
 };
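To make the control flow of the consolidated getter easier to follow, here is a small userspace model of the two-step lookup: the request type first selects how the device id is derived, then the same type selects which of the IOMMU's domains is returned. This is only a sketch under simplifying assumptions: the types, the rlookup[] table and the single id field are invented for illustration, and the WARN_ON_ONCE() calls of the patch are reduced to plain error returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the allocation types used in the patch. */
enum alloc_type {
	T_IOAPIC = 1,
	T_HPET,
	T_PCI_MSI,
	T_PCI_MSIX,
	T_IOAPIC_GET_PARENT,
	T_HPET_GET_PARENT,
};

struct domain { const char *name; };

struct iommu {
	struct domain *ir_domain;	/* remapping parent domain */
	struct domain *msi_domain;	/* PCI/MSI device domain */
};

struct alloc_info {
	enum alloc_type type;
	int id;		/* stands in for ioapic_id, hpet_id or the PCI device */
};

#define NR_DEVIDS 4

static struct domain ir = { "ir" }, msi = { "msi" };
static struct iommu iommu0 = { &ir, &msi };
/* Reverse lookup table: devid -> iommu; NULL means no IOMMU for that id. */
static struct iommu *rlookup[NR_DEVIDS] = { &iommu0, NULL, &iommu0, NULL };

/* Step 1: derive the device id from the request, -1 on failure. */
static int get_devid(const struct alloc_info *info)
{
	switch (info->type) {
	case T_IOAPIC:
	case T_IOAPIC_GET_PARENT:
	case T_HPET:
	case T_HPET_GET_PARENT:
	case T_PCI_MSI:
	case T_PCI_MSIX:
		return (info->id >= 0 && info->id < NR_DEVIDS) ? info->id : -1;
	default:
		return -1;
	}
}

/* Step 2: pick the domain that matches the request mode. */
static struct domain *get_domain_for_devid(const struct alloc_info *info,
					   int devid)
{
	struct iommu *iommu = rlookup[devid];

	if (!iommu)
		return NULL;

	switch (info->type) {
	case T_IOAPIC_GET_PARENT:
	case T_HPET_GET_PARENT:
		return iommu->ir_domain;
	case T_PCI_MSI:
	case T_PCI_MSIX:
		return iommu->msi_domain;
	default:
		return NULL;	/* the patch warns once here */
	}
}

/* Single entry point serving both former getters. */
static struct domain *get_domain(const struct alloc_info *info)
{
	int devid;

	if (!info)
		return NULL;

	devid = get_devid(info);
	if (devid < 0)
		return NULL;
	return get_domain_for_devid(info, devid);
}
```

Note that a plain T_IOAPIC request still resolves to a valid device id but yields no domain, which mirrors how the patch keeps the non-parent types usable in get_devid() while only the *_GET_PARENT and PCI MSI types are valid for domain retrieval.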
Re: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
On Fri, 21 Aug 2020 00:37:19 + "Liu, Yi L" wrote: > Hi Alex, > > > From: Alex Williamson > > Sent: Friday, August 21, 2020 4:51 AM > > > > On Mon, 27 Jul 2020 23:27:36 -0700 > > Liu Yi L wrote: > > > > > This patch allows userspace to request PASID allocation/free, e.g. > > > when serving the request from the guest. > > > > > > PASIDs that are not freed by userspace are automatically freed when > > > the IOASID set is destroyed when process exits. > > > > > > Cc: Kevin Tian > > > CC: Jacob Pan > > > Cc: Alex Williamson > > > Cc: Eric Auger > > > Cc: Jean-Philippe Brucker > > > Cc: Joerg Roedel > > > Cc: Lu Baolu > > > Signed-off-by: Liu Yi L > > > Signed-off-by: Yi Sun > > > Signed-off-by: Jacob Pan > > > --- > > > v5 -> v6: > > > *) address comments from Eric against v5. remove the alloc/free helper. > > > > > > v4 -> v5: > > > *) address comments from Eric Auger. > > > *) the comments for the PASID_FREE request is addressed in patch 5/15 of > > >this series. > > > > > > v3 -> v4: > > > *) address comments from v3, except the below comment against the range > > >of PASID_FREE request. needs more help on it. > > > "> +if (req.range.min > req.range.max) > > > > > > Is it exploitable that a user can spin the kernel for a long time in > > > the case of a free by calling this with [0, MAX_UINT] regardless of > > > their actual allocations?" 
> > > > > > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/ > > > > > > v1 -> v2: > > > *) move the vfio_mm related code to be a seprate module > > > *) use a single structure for alloc/free, could support a range of > > > PASIDs > > > *) fetch vfio_mm at group_attach time instead of at iommu driver open > > > time > > > --- > > > drivers/vfio/Kconfig| 1 + > > > drivers/vfio/vfio_iommu_type1.c | 69 > > + > > > drivers/vfio/vfio_pasid.c | 10 ++ > > > include/linux/vfio.h| 6 > > > include/uapi/linux/vfio.h | 37 ++ > > > 5 files changed, 123 insertions(+) > > > > > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index > > > 3d8a108..95d90c6 100644 > > > --- a/drivers/vfio/Kconfig > > > +++ b/drivers/vfio/Kconfig > > > @@ -2,6 +2,7 @@ > > > config VFIO_IOMMU_TYPE1 > > > tristate > > > depends on VFIO > > > + select VFIO_PASID if (X86) > > > default n > > > > > > config VFIO_IOMMU_SPAPR_TCE > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644 > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > @@ -76,6 +76,7 @@ struct vfio_iommu { > > > booldirty_page_tracking; > > > boolpinned_page_dirty_scope; > > > struct iommu_nesting_info *nesting_info; > > > + struct vfio_mm *vmm; > > > }; > > > > > > struct vfio_domain { > > > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct > > > vfio_iommu *iommu, > > > > > > static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu) > > > { > > > + if (iommu->vmm) { > > > + vfio_mm_put(iommu->vmm); > > > + iommu->vmm = NULL; > > > + } > > > + > > > kfree(iommu->nesting_info); > > > iommu->nesting_info = NULL; > > > } > > > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void > > *iommu_data, > > > iommu->nesting_info); > > > if (ret) > > > goto out_detach; > > > + > > > + if (iommu->nesting_info->features & > > > + IOMMU_NESTING_FEAT_SYSWIDE_PASID) > > { > 
> > + struct vfio_mm *vmm; > > > + int sid; > > > + > > > + vmm = vfio_mm_get_from_task(current); > > > + if (IS_ERR(vmm)) { > > > + ret = PTR_ERR(vmm); > > > + goto out_detach; > > > + } > > > + iommu->vmm = vmm; > > > + > > > + sid = vfio_mm_ioasid_sid(vmm); > > > + ret = iommu_domain_set_attr(domain->domain, > > > + DOMAIN_ATTR_IOASID_SID, > > > + &sid); > > > + if (ret) > > > + goto out_detach; > > > + } > > > } > > > > > > /* Get aperture info */ > > > @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct > > vfio_iommu *iommu, > > > return -EINVAL; > > > } > > > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu, > > > + unsigned long arg) > > > +{ > > > + struct vfio_iommu_type1_pasid_request req; > > > + unsigned long minsz; > > > + int ret; > > > + > > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
RE: [PATCH v6 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
Hi Alex, > From: Alex Williamson > Sent: Friday, August 21, 2020 5:49 AM > > On Mon, 27 Jul 2020 23:27:41 -0700 > Liu Yi L wrote: > > > Recent years, mediated device pass-through framework (e.g. vfio-mdev) > > is used to achieve flexible device sharing across domains (e.g. VMs). > > Also there are hardware assisted mediated pass-through solutions from > > platform vendors. e.g. Intel VT-d scalable mode which supports Intel > > Scalable I/O Virtualization technology. Such mdevs are called IOMMU- > > backed mdevs as there are IOMMU enforced DMA isolation for such mdevs. > > In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain > > Or a physical IOMMU backing device. got it. :-) > > concept, which means mdevs are protected by an iommu domain which is > > auxiliary to the domain that the kernel driver primarily uses for DMA > > API. Details can be found in the KVM presentation as below: > > > > https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\ > > Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf > > I think letting the line exceed 80 columns is preferable so that it's > clickable. Thanks, yeah, it's clickable now. will do it. :-) Thanks, Yi Liu > Alex > > > This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The > > main requirement is to use the auxiliary domain associated with mdev. > > > > Cc: Kevin Tian > > CC: Jacob Pan > > CC: Jun Tian > > Cc: Alex Williamson > > Cc: Eric Auger > > Cc: Jean-Philippe Brucker > > Cc: Joerg Roedel > > Cc: Lu Baolu > > Reviewed-by: Eric Auger > > Signed-off-by: Liu Yi L > > --- > > v5 -> v6: > > *) add review-by from Eric Auger. 
> > > > v1 -> v2: > > *) check the iommu_device to ensure the handling mdev is IOMMU-backed > > --- > > drivers/vfio/vfio_iommu_type1.c | 40 > > > > 1 file changed, 36 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c index bf95a0f..9d8f252 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -2379,20 +2379,41 @@ static int vfio_iommu_resv_refresh(struct > vfio_iommu *iommu, > > return ret; > > } > > > > +static struct device *vfio_get_iommu_device(struct vfio_group *group, > > + struct device *dev) > > +{ > > + if (group->mdev_group) > > + return vfio_mdev_get_iommu_device(dev); > > + else > > + return dev; > > +} > > + > > static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) { > > struct domain_capsule *dc = (struct domain_capsule *)data; > > unsigned long arg = *(unsigned long *)dc->data; > > + struct device *iommu_device; > > + > > + iommu_device = vfio_get_iommu_device(dc->group, dev); > > + if (!iommu_device) > > + return -EINVAL; > > > > - return iommu_uapi_sva_bind_gpasid(dc->domain, dev, (void __user *)arg); > > + return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device, > > + (void __user *)arg); > > } > > > > static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) > > { > > struct domain_capsule *dc = (struct domain_capsule *)data; > > unsigned long arg = *(unsigned long *)dc->data; > > + struct device *iommu_device; > > > > - iommu_uapi_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg); > > + iommu_device = vfio_get_iommu_device(dc->group, dev); > > + if (!iommu_device) > > + return -EINVAL; > > + > > + iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device, > > +(void __user *)arg); > > return 0; > > } > > > > @@ -2401,8 +2422,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device > *dev, void *data) > > struct domain_capsule *dc = (struct domain_capsule *)data; > > struct iommu_gpasid_bind_data 
*unbind_data = > > (struct iommu_gpasid_bind_data *)dc->data; > > + struct device *iommu_device; > > + > > + iommu_device = vfio_get_iommu_device(dc->group, dev); > > + if (!iommu_device) > > + return -EINVAL; > > > > - iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data); > > + iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data); > > return 0; > > } > > > > @@ -3060,8 +3086,14 @@ static int vfio_dev_cache_invalidate_fn(struct > > device *dev, void *data) { > > struct domain_capsule *dc = (struct domain_capsule *)data; > > unsigned long arg = *(unsigned long *)dc->data; > > + struct device *iommu_device; > > + > > + iommu_device = vfio_get_iommu_device(dc->group, dev); > > + if (!iommu_device) > > + return -EINVAL; > > > > - iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg); > > + iommu_uapi_cache_invalidate(dc->domain, iommu_device, > > + (void __user *)arg); > > return 0; > > }
RE: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace
Hi Alex, > From: Alex Williamson > Sent: Friday, August 21, 2020 3:52 AM > > On Mon, 27 Jul 2020 23:27:33 -0700 > Liu Yi L wrote: > > > This patch exports iommu nesting capability info to user space through > > VFIO. Userspace is expected to check this info for supported uAPIs (e.g. > > PASID alloc/free, bind page table, and cache invalidation) and the vendor > > specific format information for first level/stage page table that will be > > bound to. > > > > The nesting info is available only after container set to be NESTED type. > > Current implementation imposes one limitation - one nesting container > > should include at most one iommu group. The philosophy of vfio container > > is having all groups/devices within the container share the same IOMMU > > context. When vSVA is enabled, one IOMMU context could include one 2nd- > > level address space and multiple 1st-level address spaces. While the > > 2nd-level address space is reasonably sharable by multiple groups, blindly > > sharing 1st-level address spaces across all groups within the container > > might instead break the guest expectation. In the future sub/super container > > concept might be introduced to allow partial address space sharing within > > an IOMMU context. But for now let's go with this restriction by requiring > > singleton container for using nesting iommu features. Below link has the > > related discussion about this decision. > > > > https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/ > > > > This patch also changes the NESTING type container behaviour. Something > > that would have succeeded before will now fail: Before this series, if > > user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even > > if the SMMU didn't support stage-2, as the driver would have silently > > fallen back on stage-1 mappings (which work exactly the same as stage-2 > > only since there was no nesting supported). 
After the series, we do check > > for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING > and > > the SMMU doesn't support stage-2, the ioctl fails. But it should be a good > > fix and completely harmless. Detail can be found in below link as well. > > > > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/ > > > > Cc: Kevin Tian > > CC: Jacob Pan > > Cc: Alex Williamson > > Cc: Eric Auger > > Cc: Jean-Philippe Brucker > > Cc: Joerg Roedel > > Cc: Lu Baolu > > Signed-off-by: Liu Yi L > > --- > > v5 -> v6: > > *) address comments against v5 from Eric Auger. > > *) don't report nesting cap to userspace if the nesting_info->format is > >invalid. > > > > v4 -> v5: > > *) address comments from Eric Auger. > > *) return struct iommu_nesting_info for > VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as > >cap is much "cheap", if needs extension in future, just define another > > cap. > >https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/ > > > > v3 -> v4: > > *) address comments against v3. 
> > > > v1 -> v2: > > *) added in v2 > > --- > > drivers/vfio/vfio_iommu_type1.c | 106 > +++- > > include/uapi/linux/vfio.h | 19 +++ > > 2 files changed, 113 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c > > index 3bd70ff..18ff0c3 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit, > > "Maximum number of user DMA mappings per container (65535)."); > > > > struct vfio_iommu { > > - struct list_headdomain_list; > > - struct list_headiova_list; > > - struct vfio_domain *external_domain; /* domain for external user */ > > - struct mutexlock; > > - struct rb_root dma_list; > > - struct blocking_notifier_head notifier; > > - unsigned intdma_avail; > > - uint64_tpgsize_bitmap; > > - boolv2; > > - boolnesting; > > - booldirty_page_tracking; > > - boolpinned_page_dirty_scope; > > + struct list_headdomain_list; > > + struct list_headiova_list; > > + /* domain for external user */ > > + struct vfio_domain *external_domain; > > + struct mutexlock; > > + struct rb_root dma_list; > > + struct blocking_notifier_head notifier; > > + unsigned intdma_avail; > > + uint64_tpgsize_bitmap; > > + boolv2; > > + boolnesting; > > + booldirty_page_tracking; > > + boolpinned_page_dirty_scope; > > + struct iommu_nesting_info *nesting_info; > > }; > > > > struct vfio_domain { > > @@ -130,6 +132,9 @@ struct vfio_regions { > > #define IS_IOMMU_CAP_DOMAIN_
RE: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
Hi Alex, > From: Alex Williamson > Sent: Friday, August 21, 2020 4:51 AM > > On Mon, 27 Jul 2020 23:27:36 -0700 > Liu Yi L wrote: > > > This patch allows userspace to request PASID allocation/free, e.g. > > when serving the request from the guest. > > > > PASIDs that are not freed by userspace are automatically freed when > > the IOASID set is destroyed when process exits. > > > > Cc: Kevin Tian > > CC: Jacob Pan > > Cc: Alex Williamson > > Cc: Eric Auger > > Cc: Jean-Philippe Brucker > > Cc: Joerg Roedel > > Cc: Lu Baolu > > Signed-off-by: Liu Yi L > > Signed-off-by: Yi Sun > > Signed-off-by: Jacob Pan > > --- > > v5 -> v6: > > *) address comments from Eric against v5. remove the alloc/free helper. > > > > v4 -> v5: > > *) address comments from Eric Auger. > > *) the comments for the PASID_FREE request is addressed in patch 5/15 of > >this series. > > > > v3 -> v4: > > *) address comments from v3, except the below comment against the range > >of PASID_FREE request. needs more help on it. > > "> +if (req.range.min > req.range.max) > > > > Is it exploitable that a user can spin the kernel for a long time in > > the case of a free by calling this with [0, MAX_UINT] regardless of > > their actual allocations?" 
> > > > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/ > > > > v1 -> v2: > > *) move the vfio_mm related code to be a seprate module > > *) use a single structure for alloc/free, could support a range of > > PASIDs > > *) fetch vfio_mm at group_attach time instead of at iommu driver open > > time > > --- > > drivers/vfio/Kconfig| 1 + > > drivers/vfio/vfio_iommu_type1.c | 69 > + > > drivers/vfio/vfio_pasid.c | 10 ++ > > include/linux/vfio.h| 6 > > include/uapi/linux/vfio.h | 37 ++ > > 5 files changed, 123 insertions(+) > > > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index > > 3d8a108..95d90c6 100644 > > --- a/drivers/vfio/Kconfig > > +++ b/drivers/vfio/Kconfig > > @@ -2,6 +2,7 @@ > > config VFIO_IOMMU_TYPE1 > > tristate > > depends on VFIO > > + select VFIO_PASID if (X86) > > default n > > > > config VFIO_IOMMU_SPAPR_TCE > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644 > > --- a/drivers/vfio/vfio_iommu_type1.c > > +++ b/drivers/vfio/vfio_iommu_type1.c > > @@ -76,6 +76,7 @@ struct vfio_iommu { > > booldirty_page_tracking; > > boolpinned_page_dirty_scope; > > struct iommu_nesting_info *nesting_info; > > + struct vfio_mm *vmm; > > }; > > > > struct vfio_domain { > > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct > > vfio_iommu *iommu, > > > > static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu) > > { > > + if (iommu->vmm) { > > + vfio_mm_put(iommu->vmm); > > + iommu->vmm = NULL; > > + } > > + > > kfree(iommu->nesting_info); > > iommu->nesting_info = NULL; > > } > > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void > *iommu_data, > > iommu->nesting_info); > > if (ret) > > goto out_detach; > > + > > + if (iommu->nesting_info->features & > > + IOMMU_NESTING_FEAT_SYSWIDE_PASID) > { > > + struct vfio_mm *vmm; > > + int sid; > > + > > + vmm = vfio_mm_get_from_task(current); > > + if (IS_ERR(vmm)) { > > + ret = 
PTR_ERR(vmm); > > + goto out_detach; > > + } > > + iommu->vmm = vmm; > > + > > + sid = vfio_mm_ioasid_sid(vmm); > > + ret = iommu_domain_set_attr(domain->domain, > > + DOMAIN_ATTR_IOASID_SID, > > + &sid); > > + if (ret) > > + goto out_detach; > > + } > > } > > > > /* Get aperture info */ > > @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct > vfio_iommu *iommu, > > return -EINVAL; > > } > > > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu, > > + unsigned long arg) > > +{ > > + struct vfio_iommu_type1_pasid_request req; > > + unsigned long minsz; > > + int ret; > > + > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range); > > + > > + if (copy_from_user(&req, (void __user *)arg, minsz)) > > + return -EFAULT; > > + > > + if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK)) > > + return -EINVAL; > > + > > + if (req.range.min > req.range.max) > > +
Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support
On 2020-08-20 21:16, Dmitry Osipenko wrote:
> 20.08.2020 18:08, Robin Murphy wrote:
>> Now that arch/arm is wired up for default domains and iommu-dma,
>> implement the corresponding driver-side support for DMA domains.
>>
>> Signed-off-by: Robin Murphy
>> ---
>> drivers/iommu/tegra-gart.c | 17 -
>> 1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
>> index fac720273889..e081387080f6 100644
>> --- a/drivers/iommu/tegra-gart.c
>> +++ b/drivers/iommu/tegra-gart.c
>> @@ -9,6 +9,7 @@
>> #define dev_fmt(fmt) "gart: " fmt
>> +#include
>> #include
>> #include
>> #include
>> @@ -145,16 +146,22 @@ static struct iommu_domain *gart_iommu_domain_alloc(unsigned type)
>> {
>> struct iommu_domain *domain;
>
> Hello, Robin!
>
> Tegra20 GART isn't a real IOMMU, but a small relocation aperture. We
> would only want to use it for a temporal mappings (managed by GPU
> driver) for the time while GPU hardware is busy and working with a
> sparse DMA buffers, the driver will take care of unmapping the sparse
> buffers once GPU work is finished [1]. In a case of contiguous DMA
> buffers, we want to bypass the IOMMU and use buffer's phys address
> because GART aperture is small and all buffers simply can't fit into
> GART for a complex GPU operations that involve multiple buffers [2][3].
> The upstream GPU driver still doesn't support GART, but eventually it
> needs to be changed.
>
> [1] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L489
> [2] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L542
> [3] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L90
>
>> - if (type != IOMMU_DOMAIN_UNMANAGED)
>> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>> return NULL;
>
> Will a returned NULL tell to IOMMU core that implicit domain shouldn't
> be used? Is it possible to leave this driver as-is?

The aim of this patch was just to make the conversion without functional
changes wherever possible, i.e. maintain an equivalent to the existing
ARM behaviour of allocating its own implicit domains for everything. It
doesn't represent any judgement of whether that was ever appropriate for
this driver in the first place ;)

Hopefully my other reply already covered the degree of control drivers
can have with proper default domains, but do shout if anything wasn't
clear.

Cheers,
Robin.
RE: [PATCH v6 08/15] iommu: Pass domain to sva_unbind_gpasid()
Hi Alex, > From: Alex Williamson > Sent: Friday, August 21, 2020 5:06 AM > > On Mon, 27 Jul 2020 23:27:37 -0700 > Liu Yi L wrote: > > > From: Yi Sun > > > > Current interface is good enough for SVA virtualization on an assigned > > physical PCI device, but when it comes to mediated devices, a physical > > device may attached with multiple aux-domains. Also, for guest unbind, > > s/may/may be/ got it. > > > the PASID to be unbind should be allocated to the VM. This check > > requires to know the ioasid_set which is associated with the domain. > > > > So this interface needs to pass in domain info. Then the iommu driver > > is able to know which domain will be used for the 2nd stage > > translation of the nesting mode and also be able to do PASID ownership > > check. This patch passes @domain per the above reason. Also, the > > prototype of &pasid is changed frnt" to "u32" as the below link. > > s/frnt"/from an "int"/ got it. > > https://lore.kernel.org/kvm/27ac7880-bdd3-2891-139e-b4a7cd18420b@redha > > t.com/ > > This is really confusing, the link is to Eric's comment asking that the > conversion from > (at the time) int to ioasid_t be included in the commit log. The text here > implies that > it's pointing to some sort of justification for the change, which it isn't. > It just notes > that it happened, not why it happened, with a mostly irrelevant link. really sorry, a mistake from me. it should be the below link. [PATCH v6 01/12] iommu: Change type of pasid to u32 https://lore.kernel.org/linux-iommu/1594684087-61184-2-git-send-email-fenghua...@intel.com/ > > Cc: Kevin Tian > > CC: Jacob Pan > > Cc: Alex Williamson > > Cc: Eric Auger > > Cc: Jean-Philippe Brucker > > Cc: Joerg Roedel > > Cc: Lu Baolu > > Reviewed-by: Eric Auger > > Signed-off-by: Yi Sun > > Signed-off-by: Liu Yi L > > --- > > v5 -> v6: > > *) use "u32" prototype for @pasid. > > *) add review-by from Eric Auger. 
> > I'd probably hold off on adding Eric's R-b given the additional change in > this version > FWIW. Thanks, ok, will hold on it. :-) Regards, Yi Liu > Alex > > > v2 -> v3: > > *) pass in domain info only > > *) use u32 for pasid instead of int type > > > > v1 -> v2: > > *) added in v2. > > --- > > drivers/iommu/intel/svm.c | 3 ++- > > drivers/iommu/iommu.c | 2 +- > > include/linux/intel-iommu.h | 3 ++- > > include/linux/iommu.h | 3 ++- > > 4 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > > index c27d16a..c85b8d5 100644 > > --- a/drivers/iommu/intel/svm.c > > +++ b/drivers/iommu/intel/svm.c > > @@ -436,7 +436,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, > struct device *dev, > > return ret; > > } > > > > -int intel_svm_unbind_gpasid(struct device *dev, int pasid) > > +int intel_svm_unbind_gpasid(struct iommu_domain *domain, > > + struct device *dev, u32 pasid) > > { > > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); > > struct intel_svm_dev *sdev; > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index > > 1ce2a61..bee79d7 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -2145,7 +2145,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain > *domain, struct device *dev, > > if (unlikely(!domain->ops->sva_unbind_gpasid)) > > return -ENODEV; > > > > - return domain->ops->sva_unbind_gpasid(dev, data->hpasid); > > + return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid); > > } > > EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); > > > > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > > index 0d0ab32..f98146b 100644 > > --- a/include/linux/intel-iommu.h > > +++ b/include/linux/intel-iommu.h > > @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu > > *iommu); extern int intel_svm_finish_prq(struct intel_iommu *iommu); > > int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev, > > struct iommu_gpasid_bind_data *data); -int > > intel_svm_unbind_gpasid(struct device *dev, int pasid); > > +int intel_svm_unbind_gpasid(struct iommu_domain *domain, > > + struct device *dev, u32 pasid); > > struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, > > void *drvdata); > > void intel_svm_unbind(struct iommu_sva *handle); diff --git > > a/include/linux/iommu.h b/include/linux/iommu.h index b1ff702..80467fc > > 100644 > > --- a/include/linux/iommu.h > > +++ b/include/linux/iommu.h > > @@ -303,7 +303,8 @@ struct iommu_ops { > > int (*sva_bind_gpasid)(struct iommu_domain *domain, > > struct device *dev, struct iommu_gpasid_bind_data > > *data); > > > > - int (*sva_unbind_gpasid)(struct device *dev, int pasid); > > + int (*sva_unbind_gpasid)(struct iommu_domain *domain, > > +struct d
Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround
On 2020-08-20 20:51, Dmitry Osipenko wrote:
> 20.08.2020 18:08, Robin Murphy wrote:
>> Now that arch/arm is wired up for default domains and iommu-dma, we no
>> longer need to work around the arch-private mapping.
>>
>> Signed-off-by: Robin Murphy
>> ---
>> drivers/staging/media/tegra-vde/iommu.c | 12
>> 1 file changed, 12 deletions(-)
>>
>> diff --git a/drivers/staging/media/tegra-vde/iommu.c b/drivers/staging/media/tegra-vde/iommu.c
>> index 6af863d92123..4f770189ed34 100644
>> --- a/drivers/staging/media/tegra-vde/iommu.c
>> +++ b/drivers/staging/media/tegra-vde/iommu.c
>> @@ -10,10 +10,6 @@
>> #include
>> #include
>> -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
>> -#include
>> -#endif
>> -
>> #include "vde.h"
>> int tegra_vde_iommu_map(struct tegra_vde *vde,
>> @@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde)
>> if (!vde->group)
>> return 0;
>> -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
>> - if (dev->archdata.mapping) {
>> - struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
>> -
>> - arm_iommu_detach_device(dev);
>> - arm_iommu_release_mapping(mapping);
>> - }
>> -#endif
>> vde->domain = iommu_domain_alloc(&platform_bus_type);
>> if (!vde->domain) {
>> err = -ENOMEM;
>
> Hello, Robin! Thank you for yours work!
>
> Some drivers, like this Tegra VDE (Video Decoder Engine) driver for
> example, do not want to use implicit IOMMU domain.

That isn't (intentionally) changing here - the only difference should be
that instead of having the ARM-special implicit domain, which you have
to kick out of the way with the ARM-specific API before you're able to
attach your own domain, the implicit domain is now a proper IOMMU API
default domain, which automatically gets bumped by your attach. The
default domains should still only be created in the same cases that the
ARM dma_iommu_mappings were.

> Tegra VDE driver relies on explicit IOMMU domain in a case of Tegra
> SMMU because VDE hardware can't access last page of the AS and because
> driver wants to reserve some fixed addresses [1].
>
> [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/iommu.c#L100
>
> Tegra30 SoC supports up to 4 domains, hence it's not possible to afford
> wasting unused implicit domains. I think this needs to be addressed
> before this patch could be applied.

Yeah, there is one subtle change in behaviour from removing the ARM
layer on top of the core API, in that the IOMMU driver will no longer
see an explicit detach call. Thus it does stand to benefit from being a
bit cleverer about noticing devices being moved from one domain to
another by an attach call, either by releasing the hardware context for
the inactive domain once the device(s) are moved across to the new one,
or by simply reprogramming the hardware context in-place for the new
domain's address space without allocating a new one at all (most of the
drivers that don't have multiple contexts already handle the latter
approach quite well).

> Would it be possible for IOMMU drivers to gain support for filtering
> out devices in iommu_domain_alloc(dev, type)? Then perhaps Tegra SMMU
> driver could simply return NULL in a case of type=IOMMU_DOMAIN_DMA and
> dev=tegra-vde.

If you can implement IOMMU_DOMAIN_IDENTITY by allowing the relevant
devices to bypass translation entirely without needing a hardware
context (or at worst, can spare one context which all identity-mapped
logical domains can share), then you could certainly do that kind of
filtering with the .def_domain_type callback if you really wanted to. As
above, the intent is that that shouldn't be necessary for this
particular case, since only one of a group's default domain and
explicitly attached domain can be live at any given time, so the driver
should be able to take advantage of that.

If you simply have more active devices (groups) than available contexts
then yes, you probably would want to do some filtering to decide who
deserves a translation domain and who doesn't, but in that case you
should already have had a long-standing problem with the ARM implicit
domains.

> Alternatively, the Tegra SMMU could be changed such that the devices
> will be attached to a domain at the time of a first IOMMU mapping
> invocation instead of attaching at the time of attach_dev() callback
> invocation.
>
> Or maybe even IOMMU core could be changed to attach devices at the time
> of the first IOMMU mapping invocation? This could be a universal
> solution for all drivers.

I suppose technically you could do that within an IOMMU driver already
(similar to how some defer most of setup that logically belongs to
->domain_alloc until the first ->attach_dev). It's a bit grim from the
caller's PoV though, in terms of the failure mode being non-obvious and
having no real way to recover. Again, you'd be better off simply making
decisions up-front at domain_alloc or attach time based on the domain
type.

Robin.
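The per-device filtering Robin describes can be pictured with a small userspace sketch. It only models the decision a .def_domain_type-style callback would make; every sketch_ name, the enum values and the bypass list are hypothetical stand-ins rather than kernel definitions, and it assumes the hardware can really bypass translation without spending a context.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for domain types; not the kernel's values. */
enum sketch_domain_type {
	SKETCH_DOMAIN_NO_PREFERENCE = 0,
	SKETCH_DOMAIN_IDENTITY,
	SKETCH_DOMAIN_DMA,
};

/* Hypothetical, minimal device description used only by this sketch. */
struct sketch_device {
	const char *name;
};

/*
 * Models a per-device default-domain choice: devices that always attach
 * their own explicit domain are steered to an identity (bypass) default,
 * so no translation context is spent on a default domain they never use.
 */
int sketch_def_domain_type(const struct sketch_device *dev)
{
	/* Illustrative bypass list; a real driver would match differently. */
	static const char *const bypass[] = { "tegra-vde" };
	size_t i;

	assert(dev && dev->name);

	for (i = 0; i < sizeof(bypass) / sizeof(bypass[0]); i++) {
		if (strcmp(dev->name, bypass[i]) == 0)
			return SKETCH_DOMAIN_IDENTITY;
	}

	/* No preference: leave the choice to the caller's usual default. */
	return SKETCH_DOMAIN_NO_PREFERENCE;
}
```

Calling sketch_def_domain_type() on a device named "tegra-vde" yields the identity type, while any other name yields "no preference". As the reply notes, this kind of filtering should not be needed when a group's default domain and its explicitly attached domain are never live at the same time.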
Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround
On 2020-08-20 20:55, Sakari Ailus wrote:
> On Thu, Aug 20, 2020 at 06:25:19PM +0100, Robin Murphy wrote:
>> On 2020-08-20 17:53, Sakari Ailus wrote:
>>> Hi Robin,
>>>
>>> On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
>>>> Now that arch/arm is wired up for default domains and iommu-dma,
>>>> devices behind IOMMUs will get mappings set up automatically as
>>>> appropriate, so there is no need for drivers to do so manually.
>>>>
>>>> Signed-off-by: Robin Murphy
>>>
>>> Thanks for the patch.
>>
>> Many thanks for testing so quickly!
>>
>>> I haven't looked at the details but it seems that this causes the
>>> buffer memory allocation to be physically contiguous, which causes a
>>> failure to allocate video buffers of entirely normal size. I guess
>>> that was not intentional?
>>
>> Hmm, it looks like the device ends up with the wrong DMA ops, which
>> implies something didn't go as expected with the earlier IOMMU setup
>> and default domain creation. Chances are that either I missed some
>> subtlety in the omap_iommu change, or I've fundamentally misjudged how
>> the ISP probing works and it never actually goes down the
>> of_iommu_configure() path in the first place. Do you get any messages
>> from the IOMMU layer earlier on during boot?
>
> I do get these:
>
> [2.934936] iommu: Default domain type: Translated
> [2.940917] omap-iommu 480bd400.mmu: 480bd400.mmu registered
> [2.946899] platform 480bc000.isp: Adding to iommu group 0

So that much looks OK, if there are no obvious errors. Unfortunately
there's no easy way to tell exactly what of_iommu_configure() is doing
(beyond enabling a couple of vague debug messages). The first thing I'll
do tomorrow is double-check whether it's really working on my boards
here, or whether I was just getting lucky with CMA... (I assume you
don't have CMA enabled if you're ending up in remap_allocator_alloc())

Robin.
Re: [PATCH v6 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs
On Mon, 27 Jul 2020 23:27:41 -0700 Liu Yi L wrote: > Recent years, mediated device pass-through framework (e.g. vfio-mdev) > is used to achieve flexible device sharing across domains (e.g. VMs). > Also there are hardware assisted mediated pass-through solutions from > platform vendors. e.g. Intel VT-d scalable mode which supports Intel > Scalable I/O Virtualization technology. Such mdevs are called IOMMU- > backed mdevs as there are IOMMU enforced DMA isolation for such mdevs. > In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain Or a physical IOMMU backing device. > concept, which means mdevs are protected by an iommu domain which is > auxiliary to the domain that the kernel driver primarily uses for DMA > API. Details can be found in the KVM presentation as below: > > https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\ > Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf I think letting the line exceed 80 columns is preferable so that it's clickable. Thanks, Alex > This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The > main requirement is to use the auxiliary domain associated with mdev. > > Cc: Kevin Tian > CC: Jacob Pan > CC: Jun Tian > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Reviewed-by: Eric Auger > Signed-off-by: Liu Yi L > --- > v5 -> v6: > *) add review-by from Eric Auger. 
> > v1 -> v2: > *) check the iommu_device to ensure the handling mdev is IOMMU-backed > --- > drivers/vfio/vfio_iommu_type1.c | 40 > 1 file changed, 36 insertions(+), 4 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index bf95a0f..9d8f252 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -2379,20 +2379,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu > *iommu, > return ret; > } > > +static struct device *vfio_get_iommu_device(struct vfio_group *group, > + struct device *dev) > +{ > + if (group->mdev_group) > + return vfio_mdev_get_iommu_device(dev); > + else > + return dev; > +} > + > static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data) > { > struct domain_capsule *dc = (struct domain_capsule *)data; > unsigned long arg = *(unsigned long *)dc->data; > + struct device *iommu_device; > + > + iommu_device = vfio_get_iommu_device(dc->group, dev); > + if (!iommu_device) > + return -EINVAL; > > - return iommu_uapi_sva_bind_gpasid(dc->domain, dev, (void __user *)arg); > + return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device, > + (void __user *)arg); > } > > static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data) > { > struct domain_capsule *dc = (struct domain_capsule *)data; > unsigned long arg = *(unsigned long *)dc->data; > + struct device *iommu_device; > > - iommu_uapi_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg); > + iommu_device = vfio_get_iommu_device(dc->group, dev); > + if (!iommu_device) > + return -EINVAL; > + > + iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device, > + (void __user *)arg); > return 0; > } > > @@ -2401,8 +2422,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device > *dev, void *data) > struct domain_capsule *dc = (struct domain_capsule *)data; > struct iommu_gpasid_bind_data *unbind_data = > (struct iommu_gpasid_bind_data *)dc->data; > + struct device *iommu_device; > + > + iommu_device = 
vfio_get_iommu_device(dc->group, dev); > + if (!iommu_device) > + return -EINVAL; > > - iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data); > + iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data); > return 0; > } > > @@ -3060,8 +3086,14 @@ static int vfio_dev_cache_invalidate_fn(struct device > *dev, void *data) > { > struct domain_capsule *dc = (struct domain_capsule *)data; > unsigned long arg = *(unsigned long *)dc->data; > + struct device *iommu_device; > + > + iommu_device = vfio_get_iommu_device(dc->group, dev); > + if (!iommu_device) > + return -EINVAL; > > - iommu_uapi_cache_invalidate(dc->domain, iommu_device, (void __user *)arg); > + iommu_uapi_cache_invalidate(dc->domain, iommu_device, > + (void __user *)arg); > return 0; > }
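The pattern the quoted patch repeats in each callback (resolve the device that actually sits behind the IOMMU, then fail with -EINVAL if there is none) can be sketched in plain userspace C. The sketch_ types and functions below are hypothetical stand-ins, not the VFIO structures, and the iommu_device field only imitates what vfio_mdev_get_iommu_device() is described as returning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device: an mdev may point at a backing IOMMU device. */
struct sketch_device {
	struct sketch_device *iommu_device; /* NULL if not IOMMU-backed */
};

/* Hypothetical group: non-zero mdev_group means it holds mediated devices. */
struct sketch_group {
	int mdev_group;
};

/* Pick the device the IOMMU operation should target. */
static struct sketch_device *
sketch_get_iommu_device(const struct sketch_group *group,
			struct sketch_device *dev)
{
	if (group->mdev_group)
		return dev->iommu_device;
	return dev;
}

/*
 * Models the shape of the bind/unbind/invalidate callbacks: the operation
 * is refused with -EINVAL when an mdev has no IOMMU backing device,
 * otherwise *target receives the device the operation would be issued on.
 */
int sketch_bind(const struct sketch_group *group, struct sketch_device *dev,
		struct sketch_device **target)
{
	struct sketch_device *iommu_device;

	assert(group && dev && target);

	iommu_device = sketch_get_iommu_device(group, dev);
	if (!iommu_device)
		return -EINVAL;

	*target = iommu_device;
	return 0;
}
```

For a physical device in a non-mdev group the target is the device itself; for an mdev it is the backing device, and an mdev without one is rejected.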
Re: [GIT PULL] dma-mapping fixes for 5.9
The pull request you sent on Thu, 20 Aug 2020 18:41:58 +0200: > git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-1 has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/d271b51c60ebe71e0435a9059b315a3d8bb8a099 Thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/prtracker.html
Re: [PATCH v6 08/15] iommu: Pass domain to sva_unbind_gpasid()
On Mon, 27 Jul 2020 23:27:37 -0700 Liu Yi L wrote: > From: Yi Sun > > Current interface is good enough for SVA virtualization on an assigned > physical PCI device, but when it comes to mediated devices, a physical > device may attached with multiple aux-domains. Also, for guest unbind, s/may/may be/ > the PASID to be unbind should be allocated to the VM. This check requires > to know the ioasid_set which is associated with the domain. > > So this interface needs to pass in domain info. Then the iommu driver is > able to know which domain will be used for the 2nd stage translation of > the nesting mode and also be able to do PASID ownership check. This patch > passes @domain per the above reason. Also, the prototype of &pasid is > changed frnt" to "u32" as the below link. s/frnt"/from an "int"/ > https://lore.kernel.org/kvm/27ac7880-bdd3-2891-139e-b4a7cd184...@redhat.com/ This is really confusing, the link is to Eric's comment asking that the conversion from (at the time) int to ioasid_t be included in the commit log. The text here implies that it's pointing to some sort of justification for the change, which it isn't. It just notes that it happened, not why it happened, with a mostly irrelevant link. > Cc: Kevin Tian > CC: Jacob Pan > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Reviewed-by: Eric Auger > Signed-off-by: Yi Sun > Signed-off-by: Liu Yi L > --- > v5 -> v6: > *) use "u32" prototype for @pasid. > *) add review-by from Eric Auger. I'd probably hold off on adding Eric's R-b given the additional change in this version FWIW. Thanks, Alex > v2 -> v3: > *) pass in domain info only > *) use u32 for pasid instead of int type > > v1 -> v2: > *) added in v2. 
> --- > drivers/iommu/intel/svm.c | 3 ++- > drivers/iommu/iommu.c | 2 +- > include/linux/intel-iommu.h | 3 ++- > include/linux/iommu.h | 3 ++- > 4 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > index c27d16a..c85b8d5 100644 > --- a/drivers/iommu/intel/svm.c > +++ b/drivers/iommu/intel/svm.c > @@ -436,7 +436,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, > struct device *dev, > return ret; > } > > -int intel_svm_unbind_gpasid(struct device *dev, int pasid) > +int intel_svm_unbind_gpasid(struct iommu_domain *domain, > + struct device *dev, u32 pasid) > { > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev); > struct intel_svm_dev *sdev; > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 1ce2a61..bee79d7 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2145,7 +2145,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain > *domain, struct device *dev, > if (unlikely(!domain->ops->sva_unbind_gpasid)) > return -ENODEV; > > - return domain->ops->sva_unbind_gpasid(dev, data->hpasid); > + return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid); > } > EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid); > > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h > index 0d0ab32..f98146b 100644 > --- a/include/linux/intel-iommu.h > +++ b/include/linux/intel-iommu.h > @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu > *iommu); > extern int intel_svm_finish_prq(struct intel_iommu *iommu); > int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, > struct iommu_gpasid_bind_data *data); > -int intel_svm_unbind_gpasid(struct device *dev, int pasid); > +int intel_svm_unbind_gpasid(struct iommu_domain *domain, > + struct device *dev, u32 pasid); > struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, >void *drvdata); > void intel_svm_unbind(struct iommu_sva *handle); > diff 
--git a/include/linux/iommu.h b/include/linux/iommu.h > index b1ff702..80467fc 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -303,7 +303,8 @@ struct iommu_ops { > int (*sva_bind_gpasid)(struct iommu_domain *domain, > struct device *dev, struct iommu_gpasid_bind_data > *data); > > - int (*sva_unbind_gpasid)(struct device *dev, int pasid); > + int (*sva_unbind_gpasid)(struct iommu_domain *domain, > + struct device *dev, u32 pasid); > > int (*def_domain_type)(struct device *dev); >
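To illustrate why the callback gains a domain argument, here is a small userspace sketch of the new call shape. It is not the kernel code: the structs are simplified mocks, and the pasid_min/pasid_max range is a hypothetical placeholder for the ioasid_set that the commit message says is associated with the domain.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct device { int id; };
struct iommu_domain;

/* Mock ops table carrying only the callback whose prototype the patch changes. */
struct iommu_ops {
	int (*sva_unbind_gpasid)(struct iommu_domain *domain,
				 struct device *dev, uint32_t pasid);
};

struct iommu_domain {
	const struct iommu_ops *ops;
	/* Hypothetical stand-in for the ioasid_set tied to this domain. */
	uint32_t pasid_min;
	uint32_t pasid_max;
};

/* Mock driver callback: with the domain in hand it can check PASID ownership. */
static int mock_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
			      uint32_t pasid)
{
	(void)dev;
	if (pasid < domain->pasid_min || pasid > domain->pasid_max)
		return -EINVAL;
	return 0;
}

/* Same shape as the core wrapper in the patch: forward the domain to the op. */
static int mock_sva_unbind_gpasid(struct iommu_domain *domain,
				  struct device *dev, uint32_t hpasid)
{
	assert(domain && domain->ops);
	if (!domain->ops->sva_unbind_gpasid)
		return -ENODEV;
	return domain->ops->sva_unbind_gpasid(domain, dev, hpasid);
}
```

With the old (dev, pasid) prototype the mock callback would have no way to reach pasid_min and pasid_max, which is the gap the commit message describes for devices attached to multiple aux-domains.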
Re: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
On Mon, 27 Jul 2020 23:27:36 -0700 Liu Yi L wrote: > This patch allows userspace to request PASID allocation/free, e.g. when > serving the request from the guest. > > PASIDs that are not freed by userspace are automatically freed when the > IOASID set is destroyed when process exits. > > Cc: Kevin Tian > CC: Jacob Pan > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Signed-off-by: Liu Yi L > Signed-off-by: Yi Sun > Signed-off-by: Jacob Pan > --- > v5 -> v6: > *) address comments from Eric against v5. remove the alloc/free helper. > > v4 -> v5: > *) address comments from Eric Auger. > *) the comments for the PASID_FREE request is addressed in patch 5/15 of >this series. > > v3 -> v4: > *) address comments from v3, except the below comment against the range >of PASID_FREE request. needs more help on it. > "> +if (req.range.min > req.range.max) > > Is it exploitable that a user can spin the kernel for a long time in > the case of a free by calling this with [0, MAX_UINT] regardless of > their actual allocations?" 
> https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/ > > v1 -> v2: > *) move the vfio_mm related code to be a seprate module > *) use a single structure for alloc/free, could support a range of PASIDs > *) fetch vfio_mm at group_attach time instead of at iommu driver open time > --- > drivers/vfio/Kconfig| 1 + > drivers/vfio/vfio_iommu_type1.c | 69 > + > drivers/vfio/vfio_pasid.c | 10 ++ > include/linux/vfio.h| 6 > include/uapi/linux/vfio.h | 37 ++ > 5 files changed, 123 insertions(+) > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig > index 3d8a108..95d90c6 100644 > --- a/drivers/vfio/Kconfig > +++ b/drivers/vfio/Kconfig > @@ -2,6 +2,7 @@ > config VFIO_IOMMU_TYPE1 > tristate > depends on VFIO > + select VFIO_PASID if (X86) > default n > > config VFIO_IOMMU_SPAPR_TCE > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 18ff0c3..ea89c7c 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -76,6 +76,7 @@ struct vfio_iommu { > booldirty_page_tracking; > boolpinned_page_dirty_scope; > struct iommu_nesting_info *nesting_info; > + struct vfio_mm *vmm; > }; > > struct vfio_domain { > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct > vfio_iommu *iommu, > > static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu) > { > + if (iommu->vmm) { > + vfio_mm_put(iommu->vmm); > + iommu->vmm = NULL; > + } > + > kfree(iommu->nesting_info); > iommu->nesting_info = NULL; > } > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void > *iommu_data, > iommu->nesting_info); > if (ret) > goto out_detach; > + > + if (iommu->nesting_info->features & > + IOMMU_NESTING_FEAT_SYSWIDE_PASID) { > + struct vfio_mm *vmm; > + int sid; > + > + vmm = vfio_mm_get_from_task(current); > + if (IS_ERR(vmm)) { > + ret = PTR_ERR(vmm); > + goto out_detach; > + } > + iommu->vmm = vmm; > + > + sid = vfio_mm_ioasid_sid(vmm); > + ret = 
iommu_domain_set_attr(domain->domain, > + DOMAIN_ATTR_IOASID_SID, > + &sid); > + if (ret) > + goto out_detach; > + } > } > > /* Get aperture info */ > @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct > vfio_iommu *iommu, > return -EINVAL; > } > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu, > + unsigned long arg) > +{ > + struct vfio_iommu_type1_pasid_request req; > + unsigned long minsz; > + int ret; > + > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range); > + > + if (copy_from_user(&req, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK)) > + return -EINVAL; > + > + if (req.range.min > req.range.max) > + return -EINVAL; > + > + mutex_lock(&iommu->lock); > + if (!iommu->vmm) { > + mutex_unlock(&iommu->lock); > + return -EOPNOTSUPP; > + } > + > + switch (req.flags & VFIO_PASID_REQUEST_MASK) { > +
Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support
20.08.2020 18:08, Robin Murphy wrote: > Now that arch/arm is wired up for default domains and iommu-dma, > implement the corresponding driver-side support for DMA domains. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/tegra-gart.c | 17 - > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c > index fac720273889..e081387080f6 100644 > --- a/drivers/iommu/tegra-gart.c > +++ b/drivers/iommu/tegra-gart.c > @@ -9,6 +9,7 @@ > > #define dev_fmt(fmt) "gart: " fmt > > +#include > #include > #include > #include > @@ -145,16 +146,22 @@ static struct iommu_domain > *gart_iommu_domain_alloc(unsigned type) > { > struct iommu_domain *domain; Hello, Robin! Tegra20 GART isn't a real IOMMU, but a small relocation aperture. We would only want to use it for temporary mappings (managed by the GPU driver) while the GPU hardware is busy working with sparse DMA buffers; the driver will take care of unmapping the sparse buffers once the GPU work is finished [1]. In the case of contiguous DMA buffers, we want to bypass the IOMMU and use the buffer's physical address, because the GART aperture is small and all buffers simply can't fit into GART for complex GPU operations that involve multiple buffers [2][3]. The upstream GPU driver still doesn't support GART, but eventually it needs to be changed. [1] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L489 [2] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L542 [3] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L90 > - if (type != IOMMU_DOMAIN_UNMANAGED) > + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) > return NULL; Will a returned NULL tell the IOMMU core that an implicit domain shouldn't be used? Is it possible to leave this driver as-is?
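The type check being discussed can be sketched in userspace as follows. Everything here is a mock: the enum values, struct mock_driver and its accepts_dma flag are hypothetical and only model the driver-side filter before the patch (accepts_dma false) and after it (accepts_dma true). What the IOMMU core does with a NULL return is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mirror of the domain types named in the thread. */
enum mock_domain_type {
	MOCK_DOMAIN_UNMANAGED,
	MOCK_DOMAIN_DMA,
	MOCK_DOMAIN_IDENTITY,
};

struct mock_driver {
	bool accepts_dma; /* false: behaviour before the patch, true: after it */
};

struct mock_domain {
	enum mock_domain_type type;
};

/* Driver-side filter: unsupported domain types are declined with NULL. */
static struct mock_domain *mock_domain_alloc(const struct mock_driver *drv,
					     enum mock_domain_type type)
{
	struct mock_domain *domain;

	assert(drv);
	if (type != MOCK_DOMAIN_UNMANAGED &&
	    !(drv->accepts_dma && type == MOCK_DOMAIN_DMA))
		return NULL;

	domain = calloc(1, sizeof(*domain));
	if (domain)
		domain->type = type;
	return domain;
}
```

Leaving the driver as-is corresponds to accepts_dma being false in this model: a request for a DMA domain is declined while unmanaged domains keep working.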
Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround
20.08.2020 22:51, Dmitry Osipenko wrote: > Alternatively, the Tegra SMMU could be changed such that the devices > will be attached to a domain at the time of a first IOMMU mapping > invocation instead of attaching at the time of attach_dev() callback > invocation. > > Or maybe even IOMMU core could be changed to attach devices at the time > of the first IOMMU mapping invocation? This could be a universal > solution for all drivers. Although, please scratch this :) I'll need to revisit how the DMA mapping API works.
Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround
On Thu, Aug 20, 2020 at 06:25:19PM +0100, Robin Murphy wrote: > On 2020-08-20 17:53, Sakari Ailus wrote: > > Hi Robin, > > > > On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote: > > > Now that arch/arm is wired up for default domains and iommu-dma, devices > > > behind IOMMUs will get mappings set up automatically as appropriate, so > > > there is no need for drivers to do so manually. > > > > > > Signed-off-by: Robin Murphy > > > > Thanks for the patch. > > Many thanks for testing so quickly! > > > I haven't looked at the details but it seems that this causes the buffer > > memory allocation to be physically contiguous, which causes a failure to > > allocate video buffers of entirely normal size. I guess that was not > > intentional? > > Hmm, it looks like the device ends up with the wrong DMA ops, which implies > something didn't go as expected with the earlier IOMMU setup and default > domain creation. Chances are that either I missed some subtlety in the > omap_iommu change, or I've fundamentally misjudged how the ISP probing works > and it never actually goes down the of_iommu_configure() path in the first > place. Do you get any messages from the IOMMU layer earlier on during boot? I do get these: [2.934936] iommu: Default domain type: Translated [2.940917] omap-iommu 480bd400.mmu: 480bd400.mmu registered [2.946899] platform 480bc000.isp: Adding to iommu group 0 -- Sakari Ailus
Re: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace
On Mon, 27 Jul 2020 23:27:33 -0700 Liu Yi L wrote: > This patch exports iommu nesting capability info to user space through > VFIO. Userspace is expected to check this info for supported uAPIs (e.g. > PASID alloc/free, bind page table, and cache invalidation) and the vendor > specific format information for first level/stage page table that will be > bound to. > > The nesting info is available only after container set to be NESTED type. > Current implementation imposes one limitation - one nesting container > should include at most one iommu group. The philosophy of vfio container > is having all groups/devices within the container share the same IOMMU > context. When vSVA is enabled, one IOMMU context could include one 2nd- > level address space and multiple 1st-level address spaces. While the > 2nd-level address space is reasonably sharable by multiple groups, blindly > sharing 1st-level address spaces across all groups within the container > might instead break the guest expectation. In the future sub/super container > concept might be introduced to allow partial address space sharing within > an IOMMU context. But for now let's go with this restriction by requiring > singleton container for using nesting iommu features. Below link has the > related discussion about this decision. > > https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/ > > This patch also changes the NESTING type container behaviour. Something > that would have succeeded before will now fail: Before this series, if > user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even > if the SMMU didn't support stage-2, as the driver would have silently > fallen back on stage-1 mappings (which work exactly the same as stage-2 > only since there was no nesting supported). After the series, we do check > for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and > the SMMU doesn't support stage-2, the ioctl fails. But it should be a good > fix and completely harmless. 
Detail can be found in below link as well. > > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/ > > Cc: Kevin Tian > CC: Jacob Pan > Cc: Alex Williamson > Cc: Eric Auger > Cc: Jean-Philippe Brucker > Cc: Joerg Roedel > Cc: Lu Baolu > Signed-off-by: Liu Yi L > --- > v5 -> v6: > *) address comments against v5 from Eric Auger. > *) don't report nesting cap to userspace if the nesting_info->format is >invalid. > > v4 -> v5: > *) address comments from Eric Auger. > *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as >cap is much "cheap", if needs extension in future, just define another cap. >https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/ > > v3 -> v4: > *) address comments against v3. > > v1 -> v2: > *) added in v2 > --- > drivers/vfio/vfio_iommu_type1.c | 106 > +++- > include/uapi/linux/vfio.h | 19 +++ > 2 files changed, 113 insertions(+), 12 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 3bd70ff..18ff0c3 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit, >"Maximum number of user DMA mappings per container (65535)."); > > struct vfio_iommu { > - struct list_headdomain_list; > - struct list_headiova_list; > - struct vfio_domain *external_domain; /* domain for external user */ > - struct mutexlock; > - struct rb_root dma_list; > - struct blocking_notifier_head notifier; > - unsigned intdma_avail; > - uint64_tpgsize_bitmap; > - boolv2; > - boolnesting; > - booldirty_page_tracking; > - boolpinned_page_dirty_scope; > + struct list_headdomain_list; > + struct list_headiova_list; > + /* domain for external user */ > + struct vfio_domain *external_domain; > + struct mutexlock; > + struct rb_root dma_list; > + struct blocking_notifier_head notifier; > + unsigned intdma_avail; > + uint64_tpgsize_bitmap; > + boolv2; > + boolnesting; > + booldirty_page_tracking; > + 
boolpinned_page_dirty_scope; > + struct iommu_nesting_info *nesting_info; > }; > > struct vfio_domain { > @@ -130,6 +132,9 @@ struct vfio_regions { > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ > (!list_empty(&iommu->domain_list)) > > +#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \ > + (!list_empty(&(i
Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround
20.08.2020 18:08, Robin Murphy wrote: > Now that arch/arm is wired up for default domains and iommu-dma, we no > longer need to work around the arch-private mapping. > > Signed-off-by: Robin Murphy > --- > drivers/staging/media/tegra-vde/iommu.c | 12 > 1 file changed, 12 deletions(-) > > diff --git a/drivers/staging/media/tegra-vde/iommu.c > b/drivers/staging/media/tegra-vde/iommu.c > index 6af863d92123..4f770189ed34 100644 > --- a/drivers/staging/media/tegra-vde/iommu.c > +++ b/drivers/staging/media/tegra-vde/iommu.c > @@ -10,10 +10,6 @@ > #include > #include > > -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) > -#include > -#endif > - > #include "vde.h" > > int tegra_vde_iommu_map(struct tegra_vde *vde, > @@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde) > if (!vde->group) > return 0; > > -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) > - if (dev->archdata.mapping) { > - struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); > - > - arm_iommu_detach_device(dev); > - arm_iommu_release_mapping(mapping); > - } > -#endif > vde->domain = iommu_domain_alloc(&platform_bus_type); > if (!vde->domain) { > err = -ENOMEM; > Hello, Robin! Thank you for your work! Some drivers, like this Tegra VDE (Video Decoder Engine) driver for example, do not want to use an implicit IOMMU domain. The Tegra VDE driver relies on an explicit IOMMU domain in the case of Tegra SMMU because the VDE hardware can't access the last page of the AS and because the driver wants to reserve some fixed addresses [1]. [1] https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/iommu.c#L100 Tegra30 SoC supports up to 4 domains, hence it's not possible to afford wasting unused implicit domains. I think this needs to be addressed before this patch could be applied. Would it be possible for IOMMU drivers to gain support for filtering out devices in iommu_domain_alloc(dev, type)? Then perhaps the Tegra SMMU driver could simply return NULL in the case of type=IOMMU_DOMAIN_DMA and dev=tegra-vde. 
Alternatively, the Tegra SMMU could be changed such that the devices will be attached to a domain at the time of the first IOMMU mapping invocation instead of attaching at the time of the attach_dev() callback invocation. Or maybe even the IOMMU core could be changed to attach devices at the time of the first IOMMU mapping invocation? This could be a universal solution for all drivers.
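The per-device filtering proposed in the mail above can be sketched in userspace as follows. The (dev, type) signature is the proposal from the mail, not an existing kernel interface, and struct mock_smmu, mock_domain_alloc() and the string comparison on the device name are all hypothetical. The limit of 4 comes from the statement in the mail that Tegra30 supports up to 4 domains.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MOCK_MAX_DOMAINS 4 /* the mail states Tegra30 supports up to 4 domains */

enum mock_domain_type { MOCK_DOMAIN_UNMANAGED, MOCK_DOMAIN_DMA };

struct mock_smmu {
	int domains_in_use;
};

/* Hypothetical per-device filter; returns true if a domain was handed out. */
static bool mock_domain_alloc(struct mock_smmu *smmu, const char *dev_name,
			      enum mock_domain_type type)
{
	assert(smmu && dev_name);

	/* Decline the implicit DMA domain for the device that manages its own. */
	if (type == MOCK_DOMAIN_DMA && strcmp(dev_name, "tegra-vde") == 0)
		return false;

	/* All hardware domains are taken. */
	if (smmu->domains_in_use >= MOCK_MAX_DOMAINS)
		return false;

	smmu->domains_in_use++;
	return true;
}
```

In this model the declined DMA request does not consume one of the four slots, so the explicit domain that the VDE driver allocates later still fits alongside three other devices.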
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 6:52 PM Christoph Hellwig wrote: > > On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote: > > > Of course this still uses the scatterlist structure with its annoying > > > mix of input and output parameters, so I'd rather not expose it as > > > an official API at the DMA layer. > > > > The problem with the above open coded approach is that it requires > > explicit handling of the non-IOMMU and IOMMU cases and this is exactly > > what we don't want to have in vb2 and what was actually the job of the > > DMA API to hide. Is the plan to actually move the IOMMU handling out > > of the DMA API? > > > > Do you think we could instead turn it into a dma_alloc_noncoherent() > > helper, which has similar semantics as dma_alloc_attrs() and handles > > the various corner cases (e.g. invalidate_kernel_vmap_range and > > flush_kernel_vmap_range) to achieve the desired functionality without > > delegating the "hell", as you called it, to the users? > > Yes, I guess I could do something in that direction. At least for > dma-iommu, which thanks to Robin should be all you'll need in the > foreseeable future. That would be really great. Let me know if we can help by testing with V4L2/vb2 or in any other way. Best regards, Tomasz
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 6:54 PM Christoph Hellwig wrote: > > On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote: > > The UAPI and V4L2/videobuf2 changes are in good shape and the only > > wrong part is the use of DMA API, which was based on an earlier email > > guidance anyway, and a change to the synchronization part. I find > > conclusions like the above insulting for people who put many hours > > into designing and implementing the related functionality, given the > > complexity of the videobuf2 framework and how ill-defined the DMA API > > was, and would feel better if such could be avoided in future > > communication. > > It wasn't meant to be too insulting, but I found this out when trying > to figure out how to just disable it. But it also ends up using > the actual dma attr flags for its own consistency checks, so just > not setting the flag did not turn out to work that easily. > Yes, sadly videobuf2 ended up becoming quite counterintuitive after growing over the years, and that is reflected in the design of this feature as well. I think we need to do something about it. > But in general it helps to add a few more people to the Cc list for > such things that do stranger things. Especially if you think you did > it based on the advice of those people. Indeed, we should have CCed you and other DMA folks. Sergey, who worked on this series, is quite new to these areas of the kernel (although not to the kernel itself) and it's my fault for not explicitly letting him know to do that. Best regards, Tomasz
Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround
On 2020-08-20 17:53, Sakari Ailus wrote: > Hi Robin, > > On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote: > > Now that arch/arm is wired up for default domains and iommu-dma, devices > > behind IOMMUs will get mappings set up automatically as appropriate, so > > there is no need for drivers to do so manually. > > > > Signed-off-by: Robin Murphy > > Thanks for the patch. Many thanks for testing so quickly! > I haven't looked at the details but it seems that this causes the buffer > memory allocation to be physically contiguous, which causes a failure to > allocate video buffers of entirely normal size. I guess that was not > intentional? Hmm, it looks like the device ends up with the wrong DMA ops, which implies something didn't go as expected with the earlier IOMMU setup and default domain creation. Chances are that either I missed some subtlety in the omap_iommu change, or I've fundamentally misjudged how the ISP probing works and it never actually goes down the of_iommu_configure() path in the first place. Do you get any messages from the IOMMU layer earlier on during boot? Robin. 
-8<--- [ 218.934448] WARNING: CPU: 0 PID: 1994 at mm/page_alloc.c:4859 __alloc_pages_nodemask+0x9c/0xb1c [ 218.943847] Modules linked in: omap3_isp videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common leds_as3645a smiapp v4l2_flash_led_class led_class_flash v4l2_fwnode smiapp_pll videodev leds_gpio mc led_class [ 218.964660] CPU: 0 PID: 1994 Comm: yavta Not tainted 5.9.0-rc1-dirty #1818 [ 218.972442] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 218.978973] Backtrace: [ 218.981842] [] (dump_backtrace) from [] (show_stack+0x20/0x24) [ 218.989715] r7: r6:0009 r5:c08f03bc r4:c08f2fef [ 218.995880] [] (show_stack) from [] (dump_stack+0x28/0x30) [ 219.003631] [] (dump_stack) from [] (__warn+0x100/0x118) [ 219.010955] r5:c08f03bc r4: [ 219.014953] [] (__warn) from [] (warn_slowpath_fmt+0x84/0xa8) [ 219.022949] r9:c0232090 r8:c08f03bc r7:c0b08a88 r6:0009 r5:12fb r4: [ 219.031036] [] (warn_slowpath_fmt) from [] (__alloc_pages_nodemask+0x9c/0xb1c) [ 219.040557] r9:c0185c3c r8: r7:010ec000 r6: r5:000d r4: [ 219.048858] [] (__alloc_pages_nodemask) from [] (__dma_alloc_buffer.constprop.14+0x3c/0x90) [ 219.059570] r10:0cc0 r9:c0185c3c r8: r7:010ec000 r6:000d r5:c0b08a88 [ 219.067901] r4:0cc0 [ 219.070587] [] (__dma_alloc_buffer.constprop.14) from [] (remap_allocator_alloc+0x34/0x7c) [ 219.081207] r9:c0185c3c r8:0247 r7:e6d7fb84 r6:010ec000 r5:c0b08a88 r4:0001 [ 219.089263] [] (remap_allocator_alloc) from [] (__dma_alloc+0x124/0x21c) [ 219.098236] r9:ed99fc10 r8:e69aa890 r7: r6: r5:c0b08a88 r4:e6fdd680 [ 219.106536] [] (__dma_alloc) from [] (arm_dma_alloc+0x68/0x74) [ 219.114654] r10:0cc0 r9:c0185c3c r8:0cc0 r7:e69aa890 r6:010ec000 r5:ed99fc10 [ 219.122985] r4: [ 219.125671] [] (arm_dma_alloc) from [] (dma_alloc_attrs+0xe4/0x120) [ 219.134216] r9: r8:e69aa890 r7:010ec000 r6:c0b08a88 r5:ed99fc10 r4:c010f634 [ 219.142517] [] (dma_alloc_attrs) from [] (vb2_dc_alloc+0xcc/0x108 [videobuf2_dma_contig]) [ 219.153076] r10:e6885ca8 r9:e6abfc48 r8:0002 r7: 
r6:010ec000 r5:ed99fc10 [ 219.161407] r4:e69aa880 [ 219.164184] [] (vb2_dc_alloc [videobuf2_dma_contig]) from [] (__vb2_queue_alloc+0x258/0x4a4 [videobuf2_common]) [ 219.176696] r8:bf095b70 r7:010ec000 r6: r5:e6885ca8 r4:e6abfc00 [ 219.183959] [] (__vb2_queue_alloc [videobuf2_common]) from [] (vb2_core_reqbufs+0x408/0x498 [videobuf2_common]) [ 219.196533] r10:e6885ce8 r9: r8:e6d7fe24 r7:e6d7fcec r6:bf09ced4 r5:bf088580 [ 219.204895] r4:e6885ca8 [ 219.207672] [] (vb2_core_reqbufs [videobuf2_common]) from [] (vb2_reqbufs+0x64/0x70 [videobuf2_v4l2]) [ 219.219268] r10: r9:bf032bc0 r8:c0145608 r7:bf0ad4a4 r6:e6885ca8 r5: [ 219.227600] r4:e6d7fe24 [ 219.230499] [] (vb2_reqbufs [videobuf2_v4l2]) from [] (isp_video_reqbufs+0x40/0x54 [omap3_isp]) [ 219.241607] r7:bf0ad4a4 r6:e6d7fe24 r5:e6885c00 r4:e6cca928 [ 219.247924] [] (isp_video_reqbufs [omap3_isp]) from [] (v4l_reqbufs+0x4c/0x50 [videodev]) [ 219.258514] r7:bf0ad4a4 r6:e6885c00 r5:e6d7fe24 r4:e7efbec0 [ 219.264984] [] (v4l_reqbufs [videodev]) from [] (__video_do_ioctl+0x2d8/0x414 [videodev]) [ 219.275512] r7:bf01de00 r6: r5: r4:e6cca2e0 [ 219.281982] [] (__video_do_ioctl [videodev]) from [] (video_usercopy+0x144/0x508 [videodev]) [ 219.292816] r10:e7efbec0 r9:c0145608 r8:e6d7fe24 r7: r6: r5:bf01ebdc [ 219.300933] r4:c0145608 [ 219.304168] [] (video_usercopy [videodev]) from [] (video_ioctl2+0x1c/0x24 [videodev]) [ 219.314453] r10:e7fbfda0 r9:e7efbec0 r8:0003 r7: r6:bee658f4 r5:c0145608 [ 219.322784] r4:e7efbec0 [ 219.325775] [] (video_ioctl2 [videodev]) from []
Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support
On Thu, Aug 20, 2020 at 9:58 AM Robin Murphy wrote: > > On 2020-08-20 16:55, Rob Clark wrote: > > Side note, I suspect we'll end up needing something like > > 0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b > > device out of the closet and dust it off, the fix is easy enough. > > Just wanted to mention that here so anyone with a 32b device could > > find what is needed. > > FWIW there shouldn't be any material change here - the generic default > domain is installed under the same circumstances as the Arm > dma_iommu_mapping was, so if any platform does have an issue, then it > should already have started 4 years ago with f78ebca8ff3d ("iommu/msm: Add > support for generic master bindings"). ok, it has, I guess, been a while since playing with 32b things.. someone on IRC had mentioned a problem that sounded like what 0e764a01015dfebff8a8ffd297d74663772e248a solved, unless they disabled some ARCH_HAS_xyz thing (IIRC), which I guess is related.. BR, -R > Robin. > > > > > BR, > > -R > > > > On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy wrote: > >> > >> Now that arch/arm is wired up for default domains and iommu-dma, > >> implement the corresponding driver-side support for DMA domains. 
> >> > >> Signed-off-by: Robin Murphy > >> --- > >> drivers/iommu/msm_iommu.c | 7 ++- > >> 1 file changed, 6 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c > >> index 3615cd6241c4..f34efcbb0b2b 100644 > >> --- a/drivers/iommu/msm_iommu.c > >> +++ b/drivers/iommu/msm_iommu.c > >> @@ -8,6 +8,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> #include > >> #include > >> @@ -314,13 +315,16 @@ static struct iommu_domain > >> *msm_iommu_domain_alloc(unsigned type) > >> { > >> struct msm_priv *priv; > >> > >> - if (type != IOMMU_DOMAIN_UNMANAGED) > >> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) > >> return NULL; > >> > >> priv = kzalloc(sizeof(*priv), GFP_KERNEL); > >> if (!priv) > >> goto fail_nomem; > >> > >> + if (type == IOMMU_DOMAIN_DMA && > >> iommu_get_dma_cookie(&priv->domain)) > >> + goto fail_nomem; > >> + > >> INIT_LIST_HEAD(&priv->list_attached); > >> > >> priv->domain.geometry.aperture_start = 0; > >> @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain > >> *domain) > >> struct msm_priv *priv; > >> unsigned long flags; > >> > >> + iommu_put_dma_cookie(domain); > >> spin_lock_irqsave(&msm_iommu_lock, flags); > >> priv = to_msm_priv(domain); > >> kfree(priv); > >> -- > >> 2.28.0.dirty > >> > >> ___ > >> dri-devel mailing list > >> dri-de...@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
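The allocate, attach cookie, free pattern in the quoted patch can be modelled in userspace to show the pairing between the two paths. This is a sketch, not the driver: mock_get_dma_cookie() and mock_put_dma_cookie() are hypothetical stand-ins for iommu_get_dma_cookie() and iommu_put_dma_cookie(), the simulate_failure flag exists only for illustration, and the mock assumes that putting a cookie on a domain that never had one is a harmless no-op.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

enum mock_domain_type {
	MOCK_DOMAIN_UNMANAGED,
	MOCK_DOMAIN_DMA,
	MOCK_DOMAIN_IDENTITY,
};

struct mock_domain {
	enum mock_domain_type type;
	void *cookie; /* only set for DMA domains */
};

/* Counts outstanding heap objects so a leak on the error path is visible. */
static int mock_live_allocations;

static int mock_get_dma_cookie(struct mock_domain *domain, bool simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	domain->cookie = malloc(1);
	if (!domain->cookie)
		return -ENOMEM;
	mock_live_allocations++;
	return 0;
}

/* Assumed to be safe on a domain without a cookie. */
static void mock_put_dma_cookie(struct mock_domain *domain)
{
	if (!domain->cookie)
		return;
	free(domain->cookie);
	domain->cookie = NULL;
	mock_live_allocations--;
}

static struct mock_domain *mock_domain_alloc(enum mock_domain_type type,
					     bool cookie_fails)
{
	struct mock_domain *domain;

	if (type != MOCK_DOMAIN_UNMANAGED && type != MOCK_DOMAIN_DMA)
		return NULL;

	domain = calloc(1, sizeof(*domain));
	if (!domain)
		return NULL;
	mock_live_allocations++;
	domain->type = type;

	/* Only DMA domains need a cookie; undo the allocation if that fails. */
	if (type == MOCK_DOMAIN_DMA && mock_get_dma_cookie(domain, cookie_fails)) {
		free(domain);
		mock_live_allocations--;
		return NULL;
	}
	return domain;
}

static void mock_domain_free(struct mock_domain *domain)
{
	assert(domain);
	mock_put_dma_cookie(domain);
	free(domain);
	mock_live_allocations--;
}
```

The free path calls the put helper unconditionally, as the patch does, so unmanaged and DMA domains share one teardown routine.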
Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround
Hi Robin, On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote: > Now that arch/arm is wired up for default domains and iommu-dma, devices > behind IOMMUs will get mappings set up automatically as appropriate, so > there is no need for drivers to do so manually. > > Signed-off-by: Robin Murphy Thanks for the patch. I haven't looked at the details but it seems that this causes the buffer memory allocation to be physically contiguous, which causes a failure to allocate video buffers of entirely normal size. I guess that was not intentional? -8<--- [ 218.934448] WARNING: CPU: 0 PID: 1994 at mm/page_alloc.c:4859 __alloc_pages_nodemask+0x9c/0xb1c [ 218.943847] Modules linked in: omap3_isp videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common leds_as3645a smiapp v4l2_flash_led_class led_class_flash v4l2_fwnode smiapp_pll videodev leds_gpio mc led_class [ 218.964660] CPU: 0 PID: 1994 Comm: yavta Not tainted 5.9.0-rc1-dirty #1818 [ 218.972442] Hardware name: Generic OMAP36xx (Flattened Device Tree) [ 218.978973] Backtrace: [ 218.981842] [] (dump_backtrace) from [] (show_stack+0x20/0x24) [ 218.989715] r7: r6:0009 r5:c08f03bc r4:c08f2fef [ 218.995880] [] (show_stack) from [] (dump_stack+0x28/0x30) [ 219.003631] [] (dump_stack) from [] (__warn+0x100/0x118) [ 219.010955] r5:c08f03bc r4: [ 219.014953] [] (__warn) from [] (warn_slowpath_fmt+0x84/0xa8) [ 219.022949] r9:c0232090 r8:c08f03bc r7:c0b08a88 r6:0009 r5:12fb r4: [ 219.031036] [] (warn_slowpath_fmt) from [] (__alloc_pages_nodemask+0x9c/0xb1c) [ 219.040557] r9:c0185c3c r8: r7:010ec000 r6: r5:000d r4: [ 219.048858] [] (__alloc_pages_nodemask) from [] (__dma_alloc_buffer.constprop.14+0x3c/0x90) [ 219.059570] r10:0cc0 r9:c0185c3c r8: r7:010ec000 r6:000d r5:c0b08a88 [ 219.067901] r4:0cc0 [ 219.070587] [] (__dma_alloc_buffer.constprop.14) from [] (remap_allocator_alloc+0x34/0x7c) [ 219.081207] r9:c0185c3c r8:0247 r7:e6d7fb84 r6:010ec000 r5:c0b08a88 r4:0001 [ 219.089263] [] (remap_allocator_alloc) from [] 
(__dma_alloc+0x124/0x21c) [ 219.098236] r9:ed99fc10 r8:e69aa890 r7: r6: r5:c0b08a88 r4:e6fdd680 [ 219.106536] [] (__dma_alloc) from [] (arm_dma_alloc+0x68/0x74) [ 219.114654] r10:0cc0 r9:c0185c3c r8:0cc0 r7:e69aa890 r6:010ec000 r5:ed99fc10 [ 219.122985] r4: [ 219.125671] [] (arm_dma_alloc) from [] (dma_alloc_attrs+0xe4/0x120) [ 219.134216] r9: r8:e69aa890 r7:010ec000 r6:c0b08a88 r5:ed99fc10 r4:c010f634 [ 219.142517] [] (dma_alloc_attrs) from [] (vb2_dc_alloc+0xcc/0x108 [videobuf2_dma_contig]) [ 219.153076] r10:e6885ca8 r9:e6abfc48 r8:0002 r7: r6:010ec000 r5:ed99fc10 [ 219.161407] r4:e69aa880 [ 219.164184] [] (vb2_dc_alloc [videobuf2_dma_contig]) from [] (__vb2_queue_alloc+0x258/0x4a4 [videobuf2_common]) [ 219.176696] r8:bf095b70 r7:010ec000 r6: r5:e6885ca8 r4:e6abfc00 [ 219.183959] [] (__vb2_queue_alloc [videobuf2_common]) from [] (vb2_core_reqbufs+0x408/0x498 [videobuf2_common]) [ 219.196533] r10:e6885ce8 r9: r8:e6d7fe24 r7:e6d7fcec r6:bf09ced4 r5:bf088580 [ 219.204895] r4:e6885ca8 [ 219.207672] [] (vb2_core_reqbufs [videobuf2_common]) from [] (vb2_reqbufs+0x64/0x70 [videobuf2_v4l2]) [ 219.219268] r10: r9:bf032bc0 r8:c0145608 r7:bf0ad4a4 r6:e6885ca8 r5: [ 219.227600] r4:e6d7fe24 [ 219.230499] [] (vb2_reqbufs [videobuf2_v4l2]) from [] (isp_video_reqbufs+0x40/0x54 [omap3_isp]) [ 219.241607] r7:bf0ad4a4 r6:e6d7fe24 r5:e6885c00 r4:e6cca928 [ 219.247924] [] (isp_video_reqbufs [omap3_isp]) from [] (v4l_reqbufs+0x4c/0x50 [videodev]) [ 219.258514] r7:bf0ad4a4 r6:e6885c00 r5:e6d7fe24 r4:e7efbec0 [ 219.264984] [] (v4l_reqbufs [videodev]) from [] (__video_do_ioctl+0x2d8/0x414 [videodev]) [ 219.275512] r7:bf01de00 r6: r5: r4:e6cca2e0 [ 219.281982] [] (__video_do_ioctl [videodev]) from [] (video_usercopy+0x144/0x508 [videodev]) [ 219.292816] r10:e7efbec0 r9:c0145608 r8:e6d7fe24 r7: r6: r5:bf01ebdc [ 219.300933] r4:c0145608 [ 219.304168] [] (video_usercopy [videodev]) from [] (video_ioctl2+0x1c/0x24 [videodev]) [ 219.314453] r10:e7fbfda0 r9:e7efbec0 r8:0003 r7: r6:bee658f4 
r5:c0145608 [ 219.322784] r4:e7efbec0 [ 219.325775] [] (video_ioctl2 [videodev]) from [] (v4l2_ioctl+0x50/0x64 [videodev]) [ 219.335845] [] (v4l2_ioctl [videodev]) from [] (vfs_ioctl+0x30/0x44) [ 219.344482] r7: r6:e7efbec0 r5:bee658f4 r4:c0145608 [ 219.350402] [] (vfs_ioctl) from [] (sys_ioctl+0xdc/0x7ec) [ 219.358062] [] (sys_ioctl) from [] (ret_fast_syscall+0x0/0x28) [ 219.366149] Exception stack(0xe6d7ffa8 to 0xe6d7fff0) [ 219.371673] ffa0: bee65c1a 0003 c0145608 bee658f4 0001 [ 219.380157] ffc0: bee65c1a 0036 09a0 ef30 010
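For readers following the trace above, here is a minimal userspace sketch of why a request of this size would likely trip the order check in the page allocator. It rests on assumptions not stated in the mail: that 010ec000 in the register dump is the buffer size in bytes, that 000d is the resulting allocation order, that pages are 4 KiB, and that MAX_ORDER has its usual default of 11. The helper names are hypothetical and only mirror the idea of the kernel's get_order().

```c
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MAX_ORDER  11	/* assumption: default value, not overridden by Kconfig */

/*
 * Hypothetical helper: smallest order such that (1 << order) pages
 * cover "size" bytes, in the spirit of the kernel's get_order().
 */
static unsigned int order_for_size(size_t size)
{
	size_t pages = (size + ((size_t)1 << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: non-zero if one physically contiguous request
 * of this size is too large for the page allocator to even attempt.
 */
static int exceeds_max_order(size_t size)
{
	return order_for_size(size) >= MAX_ORDER;
}
```

Under those assumptions, order_for_size(0x10ec000) comes out as 13, which matches the 000d seen in the dump and is above the assumed limit, so the allocation would be refused before any memory is searched for.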
Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support
On 2020-08-20 16:55, Rob Clark wrote: Side note, I suspect we'll end up needing something like 0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b device out of the closet and dust it off, the fix is easy enough. Just wanted to mention that here so anyone with a 32b device could find what is needed. FWIW there shouldn't be any material change here - the generic default domain is installed under the same circumstances as the Arm dma_iommu_mapping was, so if any platform does have an issue, then it should already have started 4 years ago with f78ebca8ff3d ("iommu/msm: Add support for generic master bindings"). Robin. BR, -R On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy wrote: Now that arch/arm is wired up for default domains and iommu-dma, implement the corresponding driver-side support for DMA domains. Signed-off-by: Robin Murphy --- drivers/iommu/msm_iommu.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index 3615cd6241c4..f34efcbb0b2b 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -314,13 +315,16 @@ static struct iommu_domain *msm_iommu_domain_alloc(unsigned type) { struct msm_priv *priv; - if (type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) return NULL; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) goto fail_nomem; + if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain)) + goto fail_nomem; + INIT_LIST_HEAD(&priv->list_attached); priv->domain.geometry.aperture_start = 0; @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain *domain) struct msm_priv *priv; unsigned long flags; + iommu_put_dma_cookie(domain); spin_lock_irqsave(&msm_iommu_lock, flags); priv = to_msm_priv(domain); kfree(priv); -- 2.28.0.dirty
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote: > The UAPI and V4L2/videobuf2 changes are in good shape and the only > wrong part is the use of DMA API, which was based on an earlier email > guidance anyway, and a change to the synchronization part. I find > conclusions like the above insulting for people who put many hours > into designing and implementing the related functionality, given the > complexity of the videobuf2 framework and how ill-defined the DMA API > was, and would feel better if such could be avoided in future > communication. It wasn't meant to be too insulting, but I found this out when trying to figure out how to just disable it. But it also ends up using the actual dma attr flags for its own consistency checks, so just not setting the flag did not turn out to work that easily. But in general it helps to add a few more people to the Cc list for such things that do stranger things. Especially if you think you did it based on the advice of those people.
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote: > > Of course this still uses the scatterlist structure with its annoying > > mix of input and output parameters, so I'd rather not expose it as > > an official API at the DMA layer. > > The problem with the above open coded approach is that it requires > explicit handling of the non-IOMMU and IOMMU cases and this is exactly > what we don't want to have in vb2 and what was actually the job of the > DMA API to hide. Is the plan to actually move the IOMMU handling out > of the DMA API? > > Do you think we could instead turn it into a dma_alloc_noncoherent() > helper, which has similar semantics as dma_alloc_attrs() and handles > the various corner cases (e.g. invalidate_kernel_vmap_range and > flush_kernel_vmap_range) to achieve the desired functionality without > delegating the "hell", as you called it, to the users? Yes, I guess I could do something in that direction. At least for dma-iommu, which thanks to Robin should be all you'll need in the foreseeable future.
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 12:09:34PM +0200, Tomasz Figa wrote: > > I'm happy to Cc any active participant in the discussion. I'm not > > going to add all reviewers because even with the trimmed CC list > > I'm already hitting the number of recipients limit on various lists. > > Fair enough. > > We'll make your job easier and just turn my MAINTAINERS entry into a > maintainer. :) Sounds like a plan.
[GIT PULL] dma-mapping fixes for 5.9
The following changes since commit a1d21081a60dfb7fddf4a38b66d9cef603b317a9: Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2020-08-13 20:03:11 -0700) are available in the Git repository at: git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-1 for you to fetch changes up to d7e673ec2c8e0ea39c4c70fc490d67d7fbda869d: dma-pool: Only allocate from CMA when in same memory zone (2020-08-14 16:27:05 +0200) dma-mapping fixes for 5.9 - fix more fallout from the dma-pool changes (Nicolas Saenz Julienne, me) Christoph Hellwig (1): dma-pool: fix coherent pool allocations for IOMMU mappings Nicolas Saenz Julienne (1): dma-pool: Only allocate from CMA when in same memory zone drivers/iommu/dma-iommu.c | 4 +- include/linux/dma-direct.h | 3 - include/linux/dma-mapping.h | 5 +- kernel/dma/direct.c | 13 ++-- kernel/dma/pool.c | 145 5 files changed, 92 insertions(+), 78 deletions(-)
Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support
Side note, I suspect we'll end up needing something like 0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b device out of the closet and dust it off, the fix is easy enough. Just wanted to mention that here so anyone with a 32b device could find what is needed. BR, -R On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy wrote: > > Now that arch/arm is wired up for default domains and iommu-dma, > implement the corresponding driver-side support for DMA domains. > > Signed-off-by: Robin Murphy > --- > drivers/iommu/msm_iommu.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c > index 3615cd6241c4..f34efcbb0b2b 100644 > --- a/drivers/iommu/msm_iommu.c > +++ b/drivers/iommu/msm_iommu.c > @@ -8,6 +8,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -314,13 +315,16 @@ static struct iommu_domain > *msm_iommu_domain_alloc(unsigned type) > { > struct msm_priv *priv; > > - if (type != IOMMU_DOMAIN_UNMANAGED) > + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) > return NULL; > > priv = kzalloc(sizeof(*priv), GFP_KERNEL); > if (!priv) > goto fail_nomem; > > + if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain)) > + goto fail_nomem; > + > INIT_LIST_HEAD(&priv->list_attached); > > priv->domain.geometry.aperture_start = 0; > @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain > *domain) > struct msm_priv *priv; > unsigned long flags; > > + iommu_put_dma_cookie(domain); > spin_lock_irqsave(&msm_iommu_lock, flags); > priv = to_msm_priv(domain); > kfree(priv); > -- > 2.28.0.dirty
[PATCH 18/18] ARM/dma-mapping: Remove legacy dma-iommu API
With no users left and generic iommu-dma now doing all the work, clean up the last traces of the arch-specific API, plus the temporary workarounds that you'd forgotten about because you were thinking about zebras instead. Signed-off-by: Robin Murphy --- arch/arm/common/dmabounce.c | 1 - arch/arm/include/asm/device.h| 9 -- arch/arm/include/asm/dma-iommu.h | 29 - arch/arm/mm/dma-mapping.c| 200 +-- drivers/iommu/dma-iommu.c| 38 ++ 5 files changed, 11 insertions(+), 266 deletions(-) delete mode 100644 arch/arm/include/asm/dma-iommu.h diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index f4b719bde763..064349df7bbf 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -30,7 +30,6 @@ #include #include -#include #undef STATS diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h index be666f58bf7a..db33f389c94e 100644 --- a/arch/arm/include/asm/device.h +++ b/arch/arm/include/asm/device.h @@ -8,9 +8,6 @@ struct dev_archdata { #ifdef CONFIG_DMABOUNCE struct dmabounce_device_info *dmabounce; -#endif -#ifdef CONFIG_ARM_DMA_USE_IOMMU - struct dma_iommu_mapping*mapping; #endif unsigned int dma_coherent:1; unsigned int dma_ops_setup:1; @@ -24,10 +21,4 @@ struct pdev_archdata { #endif }; -#ifdef CONFIG_ARM_DMA_USE_IOMMU -#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping) -#else -#define to_dma_iommu_mapping(dev) NULL -#endif - #endif diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h deleted file mode 100644 index f39cfa509fe4.. 
--- a/arch/arm/include/asm/dma-iommu.h +++ /dev/null @@ -1,29 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef ASMARM_DMA_IOMMU_H -#define ASMARM_DMA_IOMMU_H - -#ifdef __KERNEL__ - -#include -#include -#include -#include - -struct dma_iommu_mapping { - /* iommu specific data */ - struct iommu_domain *domain; - - struct kref kref; -}; - -struct dma_iommu_mapping * -arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size); - -void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping); - -int arm_iommu_attach_device(struct device *dev, - struct dma_iommu_mapping *mapping); -void arm_iommu_detach_device(struct device *dev); - -#endif /* __KERNEL__ */ -#endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 2ef0afc17645..ff6c4962161a 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -33,7 +33,6 @@ #include #include #include -#include #include #include #include @@ -1073,201 +1072,6 @@ static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) return coherent ? &arm_coherent_dma_ops : &arm_dma_ops; } -#ifdef CONFIG_ARM_DMA_USE_IOMMU - -extern const struct dma_map_ops iommu_dma_ops; -extern int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, - u64 size, struct device *dev); -/** - * arm_iommu_create_mapping - * @bus: pointer to the bus holding the client device (for IOMMU calls) - * @base: start address of the valid IO address space - * @size: maximum size of the valid IO address space - * - * Creates a mapping structure which holds information about used/unused - * IO address ranges, which is required to perform memory allocation and - * mapping with IOMMU aware functions. - * - * The client device need to be attached to the mapping with - * arm_iommu_attach_device function. 
- */ -struct dma_iommu_mapping * -arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size) -{ - struct dma_iommu_mapping *mapping; - int err = -ENOMEM; - - mapping = kzalloc(sizeof(*mapping), GFP_KERNEL); - if (!mapping) - goto err; - - mapping->domain = iommu_domain_alloc(bus); - if (!mapping->domain) - goto err2; - - err = iommu_get_dma_cookie(mapping->domain); - if (err) - goto err3; - - err = iommu_dma_init_domain(mapping->domain, base, size, NULL); - if (err) - goto err4; - - kref_init(&mapping->kref); - return mapping; -err4: - iommu_put_dma_cookie(mapping->domain); -err3: - iommu_domain_free(mapping->domain); -err2: - kfree(mapping); -err: - return ERR_PTR(err); -} -EXPORT_SYMBOL_GPL(arm_iommu_create_mapping); - -static void release_iommu_mapping(struct kref *kref) -{ - struct dma_iommu_mapping *mapping = - container_of(kref, struct dma_iommu_mapping, kref); - - iommu_put_dma_cookie(mapping->domain); - iommu_domain_free(mapping->domain); - kfree(mapping); -} - -void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping) -{ - if (mapping) - kref_put(&mapping->kref, release_iommu_mapping); -} -EXPORT_SYMBOL_GPL(arm_iommu_release_ma
[PATCH 07/18] iommu/arm-smmu: Remove arch/arm workaround
Now that arch/arm is wired up for default domains and iommu-dma, remove the add_device workaround. Signed-off-by: Robin Murphy --- drivers/iommu/arm/arm-smmu/arm-smmu.c | 10 -- 1 file changed, 10 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c index 09c42af9f31e..4e52d8cb67dd 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1164,17 +1164,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) return -ENXIO; } - /* -* FIXME: The arch/arm DMA API code tries to attach devices to its own -* domains between of_xlate() and probe_device() - we have no way to cope -* with that, so until ARM gets converted to rely on groups and default -* domains, just say no (but more politely than by dereferencing NULL). -* This should be at least a WARN_ON once that's sorted. -*/ cfg = dev_iommu_priv_get(dev); - if (!cfg) - return -ENODEV; - smmu = cfg->smmu; ret = arm_smmu_rpm_get(smmu); -- 2.28.0.dirty
[PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support
Now that arch/arm is wired up for default domains and iommu-dma, implement the corresponding driver-side support for DMA domains. Signed-off-by: Robin Murphy --- drivers/iommu/msm_iommu.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index 3615cd6241c4..f34efcbb0b2b 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -8,6 +8,7 @@ #include #include #include +#include #include #include #include @@ -314,13 +315,16 @@ static struct iommu_domain *msm_iommu_domain_alloc(unsigned type) { struct msm_priv *priv; - if (type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) return NULL; priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) goto fail_nomem; + if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain)) + goto fail_nomem; + INIT_LIST_HEAD(&priv->list_attached); priv->domain.geometry.aperture_start = 0; @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain *domain) struct msm_priv *priv; unsigned long flags; + iommu_put_dma_cookie(domain); spin_lock_irqsave(&msm_iommu_lock, flags); priv = to_msm_priv(domain); kfree(priv); -- 2.28.0.dirty
[PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support
Now that arch/arm is wired up for default domains and iommu-dma, implement the corresponding driver-side support for DMA domains. Signed-off-by: Robin Murphy --- drivers/iommu/tegra-gart.c | 17 - 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c index fac720273889..e081387080f6 100644 --- a/drivers/iommu/tegra-gart.c +++ b/drivers/iommu/tegra-gart.c @@ -9,6 +9,7 @@ #define dev_fmt(fmt) "gart: " fmt +#include #include #include #include @@ -145,16 +146,22 @@ static struct iommu_domain *gart_iommu_domain_alloc(unsigned type) { struct iommu_domain *domain; - if (type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) return NULL; domain = kzalloc(sizeof(*domain), GFP_KERNEL); - if (domain) { - domain->geometry.aperture_start = gart_handle->iovmm_base; - domain->geometry.aperture_end = gart_handle->iovmm_end - 1; - domain->geometry.force_aperture = true; + if (!domain) + return NULL; + + if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) { + kfree(domain); + return NULL; } + domain->geometry.aperture_start = gart_handle->iovmm_base; + domain->geometry.aperture_end = gart_handle->iovmm_end - 1; + domain->geometry.force_aperture = true; + return domain; } -- 2.28.0.dirty
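The msm and tegra-gart patches above follow the same shape: accept IOMMU_DOMAIN_DMA alongside IOMMU_DOMAIN_UNMANAGED, acquire a DMA cookie only for DMA domains, unwind the allocation if that fails, and release the cookie when the domain is freed. A rough userspace sketch of that pattern follows. Everything in it is a hypothetical stand-in (the mock_* names, the enum values, malloc in place of kzalloc); it is not the kernel API, only an illustration of the control flow.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's domain types. */
enum { DOMAIN_UNMANAGED = 1, DOMAIN_DMA = 2, DOMAIN_IDENTITY = 3 };

/* Hypothetical stand-in for struct iommu_domain. */
struct mock_domain {
	unsigned int type;
	void *iova_cookie;
};

/* Mock of acquiring a DMA cookie: returns 0 on success, non-zero on failure. */
static int mock_get_dma_cookie(struct mock_domain *d)
{
	d->iova_cookie = malloc(1);
	return d->iova_cookie ? 0 : -1;
}

/* Mock of releasing a cookie; harmless when no cookie was ever taken. */
static void mock_put_dma_cookie(struct mock_domain *d)
{
	free(d->iova_cookie);	/* free(NULL) is a no-op */
	d->iova_cookie = NULL;
}

static struct mock_domain *mock_domain_alloc(unsigned int type)
{
	struct mock_domain *d;

	/* Only unmanaged and DMA domains are supported. */
	if (type != DOMAIN_UNMANAGED && type != DOMAIN_DMA)
		return NULL;

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;
	d->type = type;

	/* DMA domains additionally need a cookie; undo the allocation on failure. */
	if (type == DOMAIN_DMA && mock_get_dma_cookie(d)) {
		free(d);
		return NULL;
	}

	return d;
}

static void mock_domain_free(struct mock_domain *d)
{
	/* Called unconditionally, as in the patches; a no-op for unmanaged domains. */
	mock_put_dma_cookie(d);
	free(d);
}
```

The point of calling the put helper unconditionally in the free path is that it keeps the teardown code free of type checks, which only works because releasing a cookie that was never acquired does nothing.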
[PATCH 17/18] media/omap3isp: Clean up IOMMU workaround
Now that arch/arm is wired up for default domains and iommu-dma, devices behind IOMMUs will get mappings set up automatically as appropriate, so there is no need for drivers to do so manually. Signed-off-by: Robin Murphy --- drivers/media/platform/omap3isp/isp.c | 68 ++- drivers/media/platform/omap3isp/isp.h | 3 -- 2 files changed, 3 insertions(+), 68 deletions(-) diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c index b91e472ee764..196522883231 100644 --- a/drivers/media/platform/omap3isp/isp.c +++ b/drivers/media/platform/omap3isp/isp.c @@ -56,10 +56,6 @@ #include #include -#ifdef CONFIG_ARM_DMA_USE_IOMMU -#include -#endif - #include #include #include @@ -1942,51 +1938,6 @@ static int isp_initialize_modules(struct isp_device *isp) return ret; } -static void isp_detach_iommu(struct isp_device *isp) -{ -#ifdef CONFIG_ARM_DMA_USE_IOMMU - arm_iommu_detach_device(isp->dev); - arm_iommu_release_mapping(isp->mapping); - isp->mapping = NULL; -#endif -} - -static int isp_attach_iommu(struct isp_device *isp) -{ -#ifdef CONFIG_ARM_DMA_USE_IOMMU - struct dma_iommu_mapping *mapping; - int ret; - - /* -* Create the ARM mapping, used by the ARM DMA mapping core to allocate -* VAs. This will allocate a corresponding IOMMU domain. -*/ - mapping = arm_iommu_create_mapping(&platform_bus_type, SZ_1G, SZ_2G); - if (IS_ERR(mapping)) { - dev_err(isp->dev, "failed to create ARM IOMMU mapping\n"); - return PTR_ERR(mapping); - } - - isp->mapping = mapping; - - /* Attach the ARM VA mapping to the device. 
*/ - ret = arm_iommu_attach_device(isp->dev, mapping); - if (ret < 0) { - dev_err(isp->dev, "failed to attach device to VA mapping\n"); - goto error; - } - - return 0; - -error: - arm_iommu_release_mapping(isp->mapping); - isp->mapping = NULL; - return ret; -#else - return -ENODEV; -#endif -} - /* * isp_remove - Remove ISP platform device * @pdev: Pointer to ISP platform device @@ -2002,10 +1953,6 @@ static int isp_remove(struct platform_device *pdev) isp_cleanup_modules(isp); isp_xclk_cleanup(isp); - __omap3isp_get(isp, false); - isp_detach_iommu(isp); - __omap3isp_put(isp, false); - media_entity_enum_cleanup(&isp->crashed); v4l2_async_notifier_cleanup(&isp->notifier); @@ -2383,18 +2330,11 @@ static int isp_probe(struct platform_device *pdev) isp->mmio_hist_base_phys = mem->start + isp_res_maps[m].offset[OMAP3_ISP_IOMEM_HIST]; - /* IOMMU */ - ret = isp_attach_iommu(isp); - if (ret < 0) { - dev_err(&pdev->dev, "unable to attach to IOMMU\n"); - goto error_isp; - } - /* Interrupt */ ret = platform_get_irq(pdev, 0); if (ret <= 0) { ret = -ENODEV; - goto error_iommu; + goto error_isp; } isp->irq_num = ret; @@ -2402,13 +2342,13 @@ static int isp_probe(struct platform_device *pdev) "OMAP3 ISP", isp)) { dev_err(isp->dev, "Unable to request IRQ\n"); ret = -EINVAL; - goto error_iommu; + goto error_isp; } /* Entities */ ret = isp_initialize_modules(isp); if (ret < 0) - goto error_iommu; + goto error_isp; ret = isp_register_entities(isp); if (ret < 0) @@ -2433,8 +2373,6 @@ static int isp_probe(struct platform_device *pdev) isp_unregister_entities(isp); error_modules: isp_cleanup_modules(isp); -error_iommu: - isp_detach_iommu(isp); error_isp: isp_xclk_cleanup(isp); __omap3isp_put(isp, false); diff --git a/drivers/media/platform/omap3isp/isp.h b/drivers/media/platform/omap3isp/isp.h index a9d760fbf349..b50459106d89 100644 --- a/drivers/media/platform/omap3isp/isp.h +++ b/drivers/media/platform/omap3isp/isp.h @@ -145,7 +145,6 @@ struct isp_xclk { * @syscon: Regmap for the syscon 
register space * @syscon_offset: Offset of the CSIPHY control register in syscon * @phy_type: ISP_PHY_TYPE_{3430,3630} - * @mapping: IOMMU mapping * @stat_lock: Spinlock for handling statistics * @isp_mutex: Mutex for serializing requests to ISP. * @stop_failure: Indicates that an entity failed to stop. @@ -185,8 +184,6 @@ struct isp_device { u32 syscon_offset; u32 phy_type; - struct dma_iommu_mapping *mapping; - /* ISP Obj */ spinlock_t stat_lock; /* common lock for statistic drivers */ struct mutex isp_mutex; /* For handling ref_count field */ -- 2.28.0.dirty
[PATCH 15/18] drm/nouveau/tegra: Clean up IOMMU workaround
Now that arch/arm is wired up for default domains and iommu-dma, we no longer need to work around the arch-private mapping. Signed-off-by: Robin Murphy --- drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c index d0d52c1d4aee..410ee1f83e0b 100644 --- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c +++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c @@ -23,10 +23,6 @@ #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER #include "priv.h" -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) -#include -#endif - static int nvkm_device_tegra_power_up(struct nvkm_device_tegra *tdev) { @@ -109,15 +105,6 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev) unsigned long pgsize_bitmap; int ret; -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) - if (dev->archdata.mapping) { - struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - - arm_iommu_detach_device(dev); - arm_iommu_release_mapping(mapping); - } -#endif - if (!tdev->func->iommu_bit) return; -- 2.28.0.dirty
[PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround
Now that arch/arm is wired up for default domains and iommu-dma, we no longer need to work around the arch-private mapping. Signed-off-by: Robin Murphy --- drivers/staging/media/tegra-vde/iommu.c | 12 1 file changed, 12 deletions(-) diff --git a/drivers/staging/media/tegra-vde/iommu.c b/drivers/staging/media/tegra-vde/iommu.c index 6af863d92123..4f770189ed34 100644 --- a/drivers/staging/media/tegra-vde/iommu.c +++ b/drivers/staging/media/tegra-vde/iommu.c @@ -10,10 +10,6 @@ #include #include -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) -#include -#endif - #include "vde.h" int tegra_vde_iommu_map(struct tegra_vde *vde, @@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde) if (!vde->group) return 0; -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU) - if (dev->archdata.mapping) { - struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev); - - arm_iommu_detach_device(dev); - arm_iommu_release_mapping(mapping); - } -#endif vde->domain = iommu_domain_alloc(&platform_bus_type); if (!vde->domain) { err = -ENOMEM; -- 2.28.0.dirty
[PATCH 06/18] ARM/dma-mapping: Support IOMMU default domains
Now that iommu-dma is wired up, we can let it work as normal without the dma_iommu_mapping hacks if the IOMMU driver already supports default domains. Signed-off-by: Robin Murphy --- arch/arm/mm/dma-mapping.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 0f69ede44cd7..2ef0afc17645 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1220,6 +1220,13 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size, if (!iommu) return false; + /* If a default domain exists, just let iommu-dma work normally */ + if (iommu_get_domain_for_dev(dev)) { + iommu_setup_dma_ops(dev, dma_base, size); + return true; + } + + /* Otherwise, use the workaround until the IOMMU driver is updated */ mapping = arm_iommu_create_mapping(dev->bus, dma_base, size); if (IS_ERR(mapping)) { pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n", @@ -1234,6 +1241,7 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size, return false; } + set_dma_ops(dev, &iommu_dma_ops); return true; } @@ -1263,8 +1271,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev) { } void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, const struct iommu_ops *iommu, bool coherent) { - const struct dma_map_ops *dma_ops; - dev->archdata.dma_coherent = coherent; #ifdef CONFIG_SWIOTLB dev->dma_coherent = coherent; @@ -1278,12 +1284,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, if (dev->dma_ops) return; - if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu)) - dma_ops = &iommu_dma_ops; - else - dma_ops = arm_get_dma_map_ops(coherent); + set_dma_ops(dev, arm_get_dma_map_ops(coherent)); - set_dma_ops(dev, dma_ops); + arm_setup_iommu_dma_ops(dev, dma_base, size, iommu); #ifdef CONFIG_XEN if (xen_initial_domain()) -- 2.28.0.dirty
[PATCH 14/18] drm/exynos: Consolidate IOMMU mapping code
Now that arch/arm is wired up for default domains and iommu-dma, we can consolidate the shared mapping code onto the generic IOMMU API version, and retire the arch-specific implementation. Signed-off-by: Robin Murphy --- This is a cheeky revert of 07dc3678bacc ("drm/exynos: Fix cleanup of IOMMU related objects"), plus removal of the remaining arm_iommu_* references on top. --- drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 5 +- drivers/gpu/drm/exynos/exynos7_drm_decon.c| 5 +- drivers/gpu/drm/exynos/exynos_drm_dma.c | 61 +++ drivers/gpu/drm/exynos/exynos_drm_drv.h | 6 +- drivers/gpu/drm/exynos/exynos_drm_fimc.c | 5 +- drivers/gpu/drm/exynos/exynos_drm_fimd.c | 5 +- drivers/gpu/drm/exynos/exynos_drm_g2d.c | 5 +- drivers/gpu/drm/exynos/exynos_drm_gsc.c | 5 +- drivers/gpu/drm/exynos/exynos_drm_rotator.c | 5 +- drivers/gpu/drm/exynos/exynos_drm_scaler.c| 6 +- drivers/gpu/drm/exynos/exynos_mixer.c | 7 +-- 11 files changed, 29 insertions(+), 86 deletions(-) diff --git a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c index 1f79bc2a881e..8428ae12dfa5 100644 --- a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c +++ b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c @@ -55,7 +55,6 @@ static const char * const decon_clks_name[] = { struct decon_context { struct device *dev; struct drm_device *drm_dev; - void*dma_priv; struct exynos_drm_crtc *crtc; struct exynos_drm_plane planes[WINDOWS_NR]; struct exynos_drm_plane_config configs[WINDOWS_NR]; @@ -645,7 +644,7 @@ static int decon_bind(struct device *dev, struct device *master, void *data) decon_clear_channels(ctx->crtc); - return exynos_drm_register_dma(drm_dev, dev, &ctx->dma_priv); + return exynos_drm_register_dma(drm_dev, dev); } static void decon_unbind(struct device *dev, struct device *master, void *data) @@ -655,7 +654,7 @@ static void decon_unbind(struct device *dev, struct device *master, void *data) decon_atomic_disable(ctx->crtc); /* detach this sub driver from iommu mapping 
if supported. */ - exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev, &ctx->dma_priv); + exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev); } static const struct component_ops decon_component_ops = { diff --git a/drivers/gpu/drm/exynos/exynos7_drm_decon.c b/drivers/gpu/drm/exynos/exynos7_drm_decon.c index f2d87a7445c7..e7b58097ccdc 100644 --- a/drivers/gpu/drm/exynos/exynos7_drm_decon.c +++ b/drivers/gpu/drm/exynos/exynos7_drm_decon.c @@ -40,7 +40,6 @@ struct decon_context { struct device *dev; struct drm_device *drm_dev; - void*dma_priv; struct exynos_drm_crtc *crtc; struct exynos_drm_plane planes[WINDOWS_NR]; struct exynos_drm_plane_config configs[WINDOWS_NR]; @@ -128,13 +127,13 @@ static int decon_ctx_initialize(struct decon_context *ctx, decon_clear_channels(ctx->crtc); - return exynos_drm_register_dma(drm_dev, ctx->dev, &ctx->dma_priv); + return exynos_drm_register_dma(drm_dev, ctx->dev); } static void decon_ctx_remove(struct decon_context *ctx) { /* detach this sub driver from iommu mapping if supported. */ - exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev, &ctx->dma_priv); + exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev); } static u32 decon_calc_clkdiv(struct decon_context *ctx, diff --git a/drivers/gpu/drm/exynos/exynos_drm_dma.c b/drivers/gpu/drm/exynos/exynos_drm_dma.c index 58b89ec11b0e..fd5f9bcf1857 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_dma.c +++ b/drivers/gpu/drm/exynos/exynos_drm_dma.c @@ -14,19 +14,6 @@ #include "exynos_drm_drv.h" -#if defined(CONFIG_ARM_DMA_USE_IOMMU) -#include -#else -#define arm_iommu_create_mapping(...) ({ NULL; }) -#define arm_iommu_attach_device(...) ({ -ENODEV; }) -#define arm_iommu_release_mapping(...) ({ }) -#define arm_iommu_detach_device(...) ({ }) -#define to_dma_iommu_mapping(dev) NULL -#endif - -#if !defined(CONFIG_IOMMU_DMA) -#define iommu_dma_init_domain(...) 
({ -EINVAL; }) -#endif #define EXYNOS_DEV_ADDR_START 0x2000 #define EXYNOS_DEV_ADDR_SIZE 0x4000 @@ -58,7 +45,7 @@ static inline void clear_dma_max_seg_size(struct device *dev) * mapping. */ static int drm_iommu_attach_device(struct drm_device *drm_dev, - struct device *subdrv_dev, void **dma_priv) + struct device *subdrv_dev) { struct exynos_drm_private *priv = drm_dev->dev_private; int ret = 0; @@ -73,22 +60,7 @@ static int drm_iommu_attach_device(struct drm_device *drm_dev, if (ret) return ret; - if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU))
[PATCH 05/18] ARM/dma-mapping: Switch to iommu_dma_ops
With the IOMMU ops now looking much the same shape as iommu_dma_ops, switch them out in favour of the iommu-dma library, currently enhanced with temporary workarounds that allow it to also sit underneath the arch-specific API. With that in place, we can now start converting the remaining IOMMU drivers and consumers to work with IOMMU API default domains instead. Signed-off-by: Robin Murphy --- arch/arm/Kconfig | 24 +- arch/arm/include/asm/dma-iommu.h | 8 - arch/arm/mm/dma-mapping.c| 887 +-- drivers/iommu/Kconfig| 8 - drivers/media/platform/Kconfig | 1 - 5 files changed, 22 insertions(+), 906 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b91273f9fd43..79406fe5cd6b 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -133,31 +133,11 @@ config ARM_HAS_SG_CHAIN bool config ARM_DMA_USE_IOMMU - bool + def_bool IOMMU_SUPPORT select ARM_HAS_SG_CHAIN + select IOMMU_DMA select NEED_SG_DMA_LENGTH -if ARM_DMA_USE_IOMMU - -config ARM_DMA_IOMMU_ALIGNMENT - int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers" - range 4 9 - default 8 - help - DMA mapping framework by default aligns all buffers to the smallest - PAGE_SIZE order which is greater than or equal to the requested buffer - size. This works well for buffers up to a few hundreds kilobytes, but - for larger buffers it just a waste of address space. Drivers which has - relatively small addressing window (like 64Mib) might run out of - virtual space with just a few allocations. - - With this parameter you can specify the maximum PAGE_SIZE order for - DMA IOMMU buffers. Larger buffers will be aligned only to this - specified order. The order is expressed as a power of two multiplied - by the PAGE_SIZE. 
- -endif - config SYS_SUPPORTS_APM_EMULATION bool diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h index 86405cc81385..f39cfa509fe4 100644 --- a/arch/arm/include/asm/dma-iommu.h +++ b/arch/arm/include/asm/dma-iommu.h @@ -13,14 +13,6 @@ struct dma_iommu_mapping { /* iommu specific data */ struct iommu_domain *domain; - unsigned long **bitmaps; /* array of bitmaps */ - unsigned intnr_bitmaps; /* nr of elements in array */ - unsigned intextensions; - size_t bitmap_size;/* size of a single bitmap */ - size_t bits; /* per bitmap */ - dma_addr_t base; - - spinlock_t lock; struct kref kref; }; diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 0537c97cebe1..0f69ede44cd7 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -1074,812 +1075,9 @@ static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent) #ifdef CONFIG_ARM_DMA_USE_IOMMU -static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs) -{ - int prot = 0; - - if (attrs & DMA_ATTR_PRIVILEGED) - prot |= IOMMU_PRIV; - - switch (dir) { - case DMA_BIDIRECTIONAL: - return prot | IOMMU_READ | IOMMU_WRITE; - case DMA_TO_DEVICE: - return prot | IOMMU_READ; - case DMA_FROM_DEVICE: - return prot | IOMMU_WRITE; - default: - return prot; - } -} - -/* IOMMU */ - -static int extend_iommu_mapping(struct dma_iommu_mapping *mapping); - -static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping, - size_t size) -{ - unsigned int order = get_order(size); - unsigned int align = 0; - unsigned int count, start; - size_t mapping_size = mapping->bits << PAGE_SHIFT; - unsigned long flags; - dma_addr_t iova; - int i; - - if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT) - order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT; - - count = PAGE_ALIGN(size) >> PAGE_SHIFT; - align = (1 << order) - 1; - - spin_lock_irqsave(&mapping->lock, flags); - for (i = 0; i < 
mapping->nr_bitmaps; i++) { - start = bitmap_find_next_zero_area(mapping->bitmaps[i], - mapping->bits, 0, count, align); - - if (start > mapping->bits) - continue; - - bitmap_set(mapping->bitmaps[i], start, count); - break; - } - - /* -* No unused range found. Try to extend the existing mapping -* and perform a second attempt to reserve an IO virtual -* address range of size bytes. -*/ - if (i == mapping->nr_bitmaps) { -
[PATCH 08/18] iommu/renesas: Remove arch/arm workaround
Now that arch/arm is wired up for default domains and iommu-dma, remove
the shared mapping workaround and rely on groups there as well.

Signed-off-by: Robin Murphy
---
 drivers/iommu/ipmmu-vmsa.c | 69 --
 1 file changed, 69 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 0f18abda0e20..8ad74a76f402 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -26,15 +26,6 @@
 #include
 #include
 
-#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
-#include
-#else
-#define arm_iommu_create_mapping(...)	NULL
-#define arm_iommu_attach_device(...)	-ENODEV
-#define arm_iommu_release_mapping(...)	do {} while (0)
-#define arm_iommu_detach_device(...)	do {} while (0)
-#endif
-
 #define IPMMU_CTX_MAX		8U
 #define IPMMU_CTX_INVALID	-1
 
@@ -67,7 +58,6 @@ struct ipmmu_vmsa_device {
 	s8 utlb_ctx[IPMMU_UTLB_MAX];
 
 	struct iommu_group *group;
-	struct dma_iommu_mapping *mapping;
 };
 
 struct ipmmu_vmsa_domain {
@@ -805,50 +795,6 @@ static int ipmmu_of_xlate(struct device *dev,
 	return ipmmu_init_platform_device(dev, spec);
 }
 
-static int ipmmu_init_arm_mapping(struct device *dev)
-{
-	struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-	int ret;
-
-	/*
-	 * Create the ARM mapping, used by the ARM DMA mapping core to allocate
-	 * VAs. This will allocate a corresponding IOMMU domain.
-	 *
-	 * TODO:
-	 * - Create one mapping per context (TLB).
-	 * - Make the mapping size configurable ? We currently use a 2GB mapping
-	 *   at a 1GB offset to ensure that NULL VAs will fault.
-	 */
-	if (!mmu->mapping) {
-		struct dma_iommu_mapping *mapping;
-
-		mapping = arm_iommu_create_mapping(&platform_bus_type,
-						   SZ_1G, SZ_2G);
-		if (IS_ERR(mapping)) {
-			dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
-			ret = PTR_ERR(mapping);
-			goto error;
-		}
-
-		mmu->mapping = mapping;
-	}
-
-	/* Attach the ARM VA mapping to the device. */
-	ret = arm_iommu_attach_device(dev, mmu->mapping);
-	if (ret < 0) {
-		dev_err(dev, "Failed to attach device to VA mapping\n");
-		goto error;
-	}
-
-	return 0;
-
-error:
-	if (mmu->mapping)
-		arm_iommu_release_mapping(mmu->mapping);
-
-	return ret;
-}
-
 static struct iommu_device *ipmmu_probe_device(struct device *dev)
 {
 	struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
@@ -862,20 +808,8 @@ static struct iommu_device *ipmmu_probe_device(struct device *dev)
 	return &mmu->iommu;
 }
 
-static void ipmmu_probe_finalize(struct device *dev)
-{
-	int ret = 0;
-
-	if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA))
-		ret = ipmmu_init_arm_mapping(dev);
-
-	if (ret)
-		dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not work\n");
-}
-
 static void ipmmu_release_device(struct device *dev)
 {
-	arm_iommu_detach_device(dev);
 }
 
 static struct iommu_group *ipmmu_find_group(struct device *dev)
@@ -905,7 +839,6 @@ static const struct iommu_ops ipmmu_ops = {
 	.iova_to_phys = ipmmu_iova_to_phys,
 	.probe_device = ipmmu_probe_device,
 	.release_device = ipmmu_release_device,
-	.probe_finalize = ipmmu_probe_finalize,
 	.device_group = IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)
 			? generic_device_group : ipmmu_find_group,
 	.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
@@ -1118,8 +1051,6 @@ static int ipmmu_remove(struct platform_device *pdev)
 	iommu_device_sysfs_remove(&mmu->iommu);
 	iommu_device_unregister(&mmu->iommu);
 
-	arm_iommu_release_mapping(mmu->mapping);
-
 	ipmmu_device_reset(mmu);
 
 	return 0;
-- 
2.28.0.dirty
[PATCH 13/18] iommu/tegra: Add IOMMU_DOMAIN_DMA support
Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy
---
 drivers/iommu/tegra-smmu.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 124c8848ab7e..8e276eac84d9 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -5,6 +5,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -278,7 +279,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
 	struct tegra_smmu_as *as;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	as = kzalloc(sizeof(*as), GFP_KERNEL);
@@ -288,25 +289,19 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
 	as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-	if (!as->pd) {
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pd)
+		goto out_free_as;
 
 	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
-	if (!as->count) {
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->count)
+		goto out_free_all;
 
 	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
-	if (!as->pts) {
-		kfree(as->count);
-		__free_page(as->pd);
-		kfree(as);
-		return NULL;
-	}
+	if (!as->pts)
+		goto out_free_all;
+
+	if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&as->domain))
+		goto out_free_all;
 
 	/* setup aperture */
 	as->domain.geometry.aperture_start = 0;
@@ -314,12 +309,22 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 	as->domain.geometry.force_aperture = true;
 
 	return &as->domain;
+
+out_free_all:
+	kfree(as->pts);
+	kfree(as->count);
+	__free_page(as->pd);
+out_free_as:
+	kfree(as);
+	return NULL;
 }
 
 static void tegra_smmu_domain_free(struct iommu_domain *domain)
 {
 	struct tegra_smmu_as *as = to_smmu_as(domain);
 
+	iommu_put_dma_cookie(domain);
+
 	/* TODO: free page directory and page tables */
 	WARN_ON_ONCE(as->use_count);
 
-- 
2.28.0.dirty
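
[Editor's illustration, not part of the patch] The error handling in
tegra_smmu_domain_alloc() above is reworked into label-based unwinding. A
minimal userspace sketch of that pattern follows, assuming calloc()/free()
stand in for kzalloc()/kfree(); the names fake_as, fake_as_alloc(),
fake_as_free() and the fail_at parameter are hypothetical and exist only to
make the unwind paths easy to exercise.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_PDE 1024	/* illustrative size, not the hardware value */

struct fake_as {
	void *pd;
	uint32_t *count;
	void **pts;
};

/*
 * fail_at forces a failure at step 1, 2 or 3 so that each unwind
 * path can be exercised; 0 means no forced failure.
 */
struct fake_as *fake_as_alloc(int fail_at)
{
	struct fake_as *as = calloc(1, sizeof(*as));

	if (!as)
		return NULL;

	as->pd = fail_at == 1 ? NULL : malloc(4096);
	if (!as->pd)
		goto out_free_as;

	as->count = fail_at == 2 ? NULL : calloc(NUM_PDE, sizeof(*as->count));
	if (!as->count)
		goto out_free_all;

	as->pts = fail_at == 3 ? NULL : calloc(NUM_PDE, sizeof(*as->pts));
	if (!as->pts)
		goto out_free_all;

	return as;

out_free_all:
	/* as was zero-initialised, so unset members are NULL and free(NULL) is a no-op */
	free(as->pts);
	free(as->count);
	free(as->pd);
out_free_as:
	free(as);
	return NULL;
}

void fake_as_free(struct fake_as *as)
{
	if (!as)
		return;
	free(as->pts);
	free(as->count);
	free(as->pd);
	free(as);
}
```

A single out_free_all label is enough here only because the structure starts
out zeroed, which is also why the patch can jump to it from several steps.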
[PATCH 09/18] iommu/mediatek-v1: Add IOMMU_DOMAIN_DMA support
Now that arch/arm is wired up for default domains and iommu-dma, implement the corresponding driver-side support for groups and DMA domains to replace the shared mapping workaround. Signed-off-by: Robin Murphy --- drivers/iommu/mtk_iommu.h| 2 - drivers/iommu/mtk_iommu_v1.c | 153 +++ 2 files changed, 48 insertions(+), 107 deletions(-) diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h index 122925dbe547..6253e98d810c 100644 --- a/drivers/iommu/mtk_iommu.h +++ b/drivers/iommu/mtk_iommu.h @@ -67,8 +67,6 @@ struct mtk_iommu_data { struct iommu_device iommu; const struct mtk_iommu_plat_data *plat_data; - struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */ - struct list_headlist; struct mtk_smi_larb_iommu larb_imu[MTK_LARB_NR_MAX]; }; diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index 82ddfe9170d4..40c89b8d3ac4 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -28,7 +28,6 @@ #include #include #include -#include #include #include #include @@ -240,13 +239,18 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type) { struct mtk_iommu_domain *dom; - if (type != IOMMU_DOMAIN_UNMANAGED) + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA) return NULL; dom = kzalloc(sizeof(*dom), GFP_KERNEL); if (!dom) return NULL; + if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&dom->domain)) { + kfree(dom); + return NULL; + } + return &dom->domain; } @@ -257,6 +261,7 @@ static void mtk_iommu_domain_free(struct iommu_domain *domain) dma_free_coherent(data->dev, M2701_IOMMU_PGT_SIZE, dom->pgt_va, dom->pgt_pa); + iommu_put_dma_cookie(domain); kfree(to_mtk_domain(domain)); } @@ -265,14 +270,8 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain, { struct mtk_iommu_data *data = dev_iommu_priv_get(dev); struct mtk_iommu_domain *dom = to_mtk_domain(domain); - struct dma_iommu_mapping *mtk_mapping; int ret; - /* Only allow the domain created internally. 
*/ - mtk_mapping = data->mapping; - if (mtk_mapping->domain != domain) - return 0; - if (!data->m4u_dom) { data->m4u_dom = dom; ret = mtk_iommu_domain_finalise(data); @@ -358,18 +357,39 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain, static const struct iommu_ops mtk_iommu_ops; -/* - * MTK generation one iommu HW only support one iommu domain, and all the client - * sharing the same iova address space. - */ -static int mtk_iommu_create_mapping(struct device *dev, - struct of_phandle_args *args) +static struct iommu_device *mtk_iommu_probe_device(struct device *dev) { struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); struct mtk_iommu_data *data; + + if (!fwspec || fwspec->ops != &mtk_iommu_ops) + return ERR_PTR(-ENODEV); /* Not a iommu client device */ + + data = dev_iommu_priv_get(dev); + + return &data->iommu; +} + +static void mtk_iommu_release_device(struct device *dev) +{ + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + + if (!fwspec || fwspec->ops != &mtk_iommu_ops) + return; + + iommu_fwspec_free(dev); +} + +static struct iommu_group *mtk_iommu_device_group(struct device *dev) +{ + struct mtk_iommu_data *data = dev_iommu_priv_get(dev); + + return iommu_group_ref_get(data->m4u_group); +} + +static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args) +{ struct platform_device *m4updev; - struct dma_iommu_mapping *mtk_mapping; - int ret; if (args->args_count != 1) { dev_err(dev, "invalid #iommu-cells(%d) property for IOMMU\n", @@ -377,15 +397,6 @@ static int mtk_iommu_create_mapping(struct device *dev, return -EINVAL; } - if (!fwspec) { - ret = iommu_fwspec_init(dev, &args->np->fwnode, &mtk_iommu_ops); - if (ret) - return ret; - fwspec = dev_iommu_fwspec_get(dev); - } else if (dev_iommu_fwspec_get(dev)->ops != &mtk_iommu_ops) { - return -EINVAL; - } - if (!dev_iommu_priv_get(dev)) { /* Get the m4u device */ m4updev = of_find_device_by_node(args->np); @@ -395,83 +406,7 @@ static int 
mtk_iommu_create_mapping(struct device *dev, dev_iommu_priv_set(dev, platform_get_drvdata(m4updev)); } - ret = iommu_fwspec_add_ids(dev, args->args, 1); - if (ret) - return ret; - - data = dev_iommu_priv_get(dev); -
[PATCH 11/18] iommu/omap: Add IOMMU_DOMAIN_DMA support
Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy
---
 drivers/iommu/omap-iommu.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 71f29c0927fc..ea25c2fe0418 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -9,6 +9,7 @@
  *		Paul Mundt and Toshihiro Kobayashi
  */
 
+#include
 #include
 #include
 #include
@@ -1574,13 +1575,19 @@ static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
 {
 	struct omap_iommu_domain *omap_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	omap_domain = kzalloc(sizeof(*omap_domain), GFP_KERNEL);
 	if (!omap_domain)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA &&
+	    iommu_get_dma_cookie(&omap_domain->domain)) {
+		kfree(omap_domain);
+		return NULL;
+	}
+
 	spin_lock_init(&omap_domain->lock);
 
 	omap_domain->domain.geometry.aperture_start = 0;
@@ -1601,6 +1608,7 @@ static void omap_iommu_domain_free(struct iommu_domain *domain)
 	if (omap_domain->dev)
 		_omap_iommu_detach_dev(omap_domain, omap_domain->dev);
 
+	iommu_put_dma_cookie(&omap_domain->domain);
 	kfree(omap_domain);
 }
 
@@ -1736,6 +1744,17 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
 	return group;
 }
 
+static int omap_iommu_of_xlate(struct device *dev,
+			       struct of_phandle_args *args)
+{
+	/*
+	 * Logically, some of the housekeeping from _omap_iommu_add_device()
+	 * should probably move here, but the minimum we *need* is simply to
+	 * cooperate with of_iommu at all to let iommu-dma work.
+	 */
+	return 0;
+}
+
 static const struct iommu_ops omap_iommu_ops = {
 	.domain_alloc	= omap_iommu_domain_alloc,
 	.domain_free	= omap_iommu_domain_free,
@@ -1747,6 +1766,7 @@ static const struct iommu_ops omap_iommu_ops = {
 	.probe_device	= omap_iommu_probe_device,
 	.release_device	= omap_iommu_release_device,
 	.device_group	= omap_iommu_device_group,
+	.of_xlate	= omap_iommu_of_xlate,
 	.pgsize_bitmap	= OMAP_IOMMU_PGSIZES,
 };
 
-- 
2.28.0.dirty
[PATCH 03/18] ARM/dma-mapping: Merge IOMMU ops
The dma_sync_* operations are now the only difference between the
coherent and non-coherent IOMMU ops. Some minor tweaks to make those
safe for coherent devices with minimal overhead, and we can condense
down to a single set of DMA ops.

Signed-off-by: Robin Murphy
---
 arch/arm/mm/dma-mapping.c | 41 +--
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 1bb7e9608f75..0537c97cebe1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1677,6 +1677,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
 	struct scatterlist *s;
 	int i;
 
+	if (dev->dma_coherent)
+		return;
+
 	for_each_sg(sg, s, nents, i)
 		__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
@@ -1696,6 +1699,9 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
 	struct scatterlist *s;
 	int i;
 
+	if (dev->dma_coherent)
+		return;
+
 	for_each_sg(sg, s, nents, i)
 		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
@@ -1829,12 +1835,13 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 {
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page;
 	unsigned int offset = handle & ~PAGE_MASK;
 
-	if (!iova)
+	if (dev->dma_coherent || !iova)
 		return;
 
+	page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
 	__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -1843,12 +1850,13 @@ static void arm_iommu_sync_single_for_device(struct device *dev,
 {
 	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
 	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page;
 	unsigned int offset = handle & ~PAGE_MASK;
 
-	if (!iova)
+	if (dev->dma_coherent || !iova)
 		return;
 
+	page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
 	__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -1872,22 +1880,6 @@ static const struct dma_map_ops iommu_ops = {
 	.unmap_resource		= arm_iommu_unmap_resource,
 };
 
-static const struct dma_map_ops iommu_coherent_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page	= arm_iommu_map_page,
-	.unmap_page	= arm_iommu_unmap_page,
-
-	.map_sg		= arm_iommu_map_sg,
-	.unmap_sg	= arm_iommu_unmap_sg,
-
-	.map_resource	= arm_iommu_map_resource,
-	.unmap_resource	= arm_iommu_unmap_resource,
-};
-
 /**
  * arm_iommu_create_mapping
  * @bus: pointer to the bus holding the client device (for IOMMU calls)
@@ -2067,11 +2059,6 @@ void arm_iommu_detach_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
 
-static const struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
-{
-	return coherent ? &iommu_coherent_ops : &iommu_ops;
-}
-
 static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
 				    const struct iommu_ops *iommu)
 {
@@ -2118,8 +2105,6 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 static void arm_teardown_iommu_dma_ops(struct device *dev) { }
 
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
 #endif	/* CONFIG_ARM_DMA_USE_IOMMU */
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -2141,7 +2126,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 		return;
 
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-		dma_ops = arm_get_iommu_dma_map_ops(coherent);
+		dma_ops = &iommu_ops;
 	else
 		dma_ops = arm_get_dma_map_ops(coherent);
 
-- 
2.28.0.dirty
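
[Editor's illustration, not part of the patch] The approach in patch 03 can
be sketched outside the kernel: one sync callback serves both kinds of
device and returns early when the device is coherent, so a single ops table
is sufficient. Everything named fake_* below is hypothetical; the counter
merely stands in for the real cache maintenance done by
__dma_page_dev_to_cpu().

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	bool dma_coherent;
	unsigned int cache_ops;	/* counts simulated cache maintenance */
};

/* Stand-in for real cache maintenance: just record that work happened */
static void fake_cache_invalidate(struct fake_dev *dev, size_t size)
{
	(void)size;
	dev->cache_ops++;
}

/* Single implementation: coherent devices return before any cache work */
void fake_sync_single_for_cpu(struct fake_dev *dev, unsigned long iova,
			      size_t size)
{
	if (dev->dma_coherent || !iova)
		return;

	fake_cache_invalidate(dev, size);
}

struct fake_dma_ops {
	void (*sync_single_for_cpu)(struct fake_dev *, unsigned long, size_t);
};

/* One ops table serves both coherent and non-coherent devices */
const struct fake_dma_ops fake_iommu_ops = {
	.sync_single_for_cpu = fake_sync_single_for_cpu,
};
```

The cost for coherent devices is one flag test per call, which is the
"minimal overhead" the commit message refers to.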
[PATCH 04/18] iommu/dma: Add temporary hacks for arch/arm
In order to wrangle arch/arm over to iommu_dma_ops, we first need to
convert all its associated IOMMU drivers over to default domains, and
deal with users of its public dma_iommu_mapping API. Since that can't
reasonably be done in a single patch, we've no choice but to go through
an ugly transitional phase.

That starts with exposing some hooks into iommu-dma's internals so that
it can start to do most of the heavy lifting. Before you start thinking
about how horrible that is, here's a zebra:

  , c@ `)\ < /

Signed-off-by: Robin Murphy
---
 drivers/iommu/dma-iommu.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..ab157d155bf7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -25,6 +25,19 @@
 #include
 #include
 
+#ifdef CONFIG_ARM
+#include
+#endif
+static struct iommu_domain *__iommu_get_dma_domain(struct device *dev)
+{
+#ifdef CONFIG_ARM
+	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	if (mapping)
+		return mapping->domain;
+#endif
+	return iommu_get_dma_domain(dev);
+}
+
 struct iommu_dma_msi_page {
 	struct list_head	list;
 	dma_addr_t		iova;
@@ -298,8 +311,11 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
  * avoid rounding surprises. If necessary, we reserve the page at address 0
  * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
  * any change which could make prior IOVAs invalid will fail.
+ *
+ * XXX: Not formally exported, but needs to be referenced
+ * from arch/arm/mm/dma-mapping.c temporarily
  */
-static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
 		u64 size, struct device *dev)
 {
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -456,7 +472,7 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 		size_t size)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_domain *domain = __iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	size_t iova_off = iova_offset(iovad, dma_addr);
@@ -478,7 +494,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 		size_t size, int prot, u64 dma_mask)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_domain *domain = __iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	size_t iova_off = iova_offset(iovad, phys);
@@ -582,7 +598,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_domain *domain = __iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	bool coherent = dev_is_dma_coherent(dev);
@@ -678,7 +694,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (dev_is_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
+	phys = iommu_iova_to_phys(__iommu_get_dma_domain(dev), dma_handle);
 	arch_sync_dma_for_cpu(phys, size, dir);
 }
 
@@ -690,7 +706,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
 	if (dev_is_dma_coherent(dev))
 		return;
 
-	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
+	phys = iommu_iova_to_phys(__iommu_get_dma_domain(dev), dma_handle);
 	arch_sync_dma_for_device(phys, size, dir);
 }
 
@@ -831,7 +847,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
 		int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-	struct iommu_domain *domain = iommu_get_dma_domain(dev);
+	struct iommu_domain *domain = __iommu_get_dma_domain(dev);
 	struct iommu_dma_cookie *cookie = domain->iova_cookie;
 	struct iova_domain *iovad = &cookie->iovad;
 	struct scatterlist *s, *prev = NULL;
@@ -1112,12 +1128,16 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 
 static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
 {
-	struct iommu_domain *domain = iommu_
[PATCH 00/18] Convert arch/arm to use iommu-dma
Hi all,

After 5 years or so of intending to get round to this, finally the
time comes! The changes themselves actually turn out to be relatively
mechanical; the bigger concern appears to be how to get everything
merged across about 5 different trees given the dependencies.

I've lightly boot-tested things on Rockchip RK3288 and Exynos 4412
(Odroid-U3), to the degree that their display drivers should be using
IOMMU-backed buffers and don't explode (the Odroid doesn't manage to
send a working HDMI signal to the one monitor I have that it actually
detects, but that's a pre-existing condition...) Confirmation that the
Mediatek, OMAP and Tegra changes work will be most welcome.

Patches are based on 5.9-rc1, branch available here:

  git://linux-arm.org/linux-rm arm/dma

Robin.


Robin Murphy (18):
  ARM/dma-mapping: Drop .dma_supported for IOMMU ops
  ARM/dma-mapping: Consolidate IOMMU ops callbacks
  ARM/dma-mapping: Merge IOMMU ops
  iommu/dma: Add temporary hacks for arch/arm
  ARM/dma-mapping: Switch to iommu_dma_ops
  ARM/dma-mapping: Support IOMMU default domains
  iommu/arm-smmu: Remove arch/arm workaround
  iommu/renesas: Remove arch/arm workaround
  iommu/mediatek-v1: Add IOMMU_DOMAIN_DMA support
  iommu/msm: Add IOMMU_DOMAIN_DMA support
  iommu/omap: Add IOMMU_DOMAIN_DMA support
  iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support
  iommu/tegra: Add IOMMU_DOMAIN_DMA support
  drm/exynos: Consolidate IOMMU mapping code
  drm/nouveau/tegra: Clean up IOMMU workaround
  staging/media/tegra-vde: Clean up IOMMU workaround
  media/omap3isp: Clean up IOMMU workaround
  ARM/dma-mapping: Remove legacy dma-iommu API

 arch/arm/Kconfig                              |   28 +-
 arch/arm/common/dmabounce.c                   |    1 -
 arch/arm/include/asm/device.h                 |    9 -
 arch/arm/include/asm/dma-iommu.h              |   37 -
 arch/arm/mm/dma-mapping.c                     | 1198 +
 drivers/gpu/drm/exynos/exynos5433_drm_decon.c |    5 +-
 drivers/gpu/drm/exynos/exynos7_drm_decon.c    |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_dma.c       |   61 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.h       |    6 +-
 drivers/gpu/drm/exynos/exynos_drm_fimc.c      |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_fimd.c      |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c       |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_gsc.c       |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_rotator.c   |    5 +-
 drivers/gpu/drm/exynos/exynos_drm_scaler.c    |    6 +-
 drivers/gpu/drm/exynos/exynos_mixer.c         |    7 +-
 .../drm/nouveau/nvkm/engine/device/tegra.c    |   13 -
 drivers/iommu/Kconfig                         |    8 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c         |   10 -
 drivers/iommu/ipmmu-vmsa.c                    |   69 -
 drivers/iommu/msm_iommu.c                     |    7 +-
 drivers/iommu/mtk_iommu.h                     |    2 -
 drivers/iommu/mtk_iommu_v1.c                  |  153 +--
 drivers/iommu/omap-iommu.c                    |   22 +-
 drivers/iommu/tegra-gart.c                    |   17 +-
 drivers/iommu/tegra-smmu.c                    |   37 +-
 drivers/media/platform/Kconfig                |    1 -
 drivers/media/platform/omap3isp/isp.c         |   68 +-
 drivers/media/platform/omap3isp/isp.h         |    3 -
 drivers/staging/media/tegra-vde/iommu.c       |   12 -
 30 files changed, 150 insertions(+), 1660 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

-- 
2.28.0.dirty
[PATCH 01/18] ARM/dma-mapping: Drop .dma_supported for IOMMU ops
When an IOMMU is present, we trust that it should be capable
of remapping any physical memory, and since the device masks
represent the input (virtual) addresses to the IOMMU it makes
no sense to validate them against physical PFNs anyway.

Signed-off-by: Robin Murphy
---
 arch/arm/mm/dma-mapping.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c..ffa387f343dc 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1997,8 +1997,6 @@ static const struct dma_map_ops iommu_ops = {
 
 	.map_resource		= arm_iommu_map_resource,
 	.unmap_resource		= arm_iommu_unmap_resource,
-
-	.dma_supported		= arm_dma_supported,
 };
 
 static const struct dma_map_ops iommu_coherent_ops = {
@@ -2015,8 +2013,6 @@ static const struct dma_map_ops iommu_coherent_ops = {
 
 	.map_resource		= arm_iommu_map_resource,
 	.unmap_resource		= arm_iommu_unmap_resource,
-
-	.dma_supported		= arm_dma_supported,
 };
 
 /**
-- 
2.28.0.dirty
[PATCH 02/18] ARM/dma-mapping: Consolidate IOMMU ops callbacks
Merge the coherent and non-coherent callbacks down to a single implementation each, relying on the generic dev->dma_coherent flag at the points where the difference matters. Signed-off-by: Robin Murphy --- arch/arm/Kconfig | 4 +- arch/arm/mm/dma-mapping.c | 281 +++--- 2 files changed, 79 insertions(+), 206 deletions(-) diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index e00d94b16658..b91273f9fd43 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -19,8 +19,8 @@ config ARM select ARCH_HAS_SET_MEMORY select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL select ARCH_HAS_STRICT_MODULE_RWX if MMU - select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB - select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB + select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB || ARM_DMA_USE_IOMMU + select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB || ARM_DMA_USE_IOMMU select ARCH_HAS_TEARDOWN_DMA_OPS if MMU select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAVE_CUSTOM_GPIO_H diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index ffa387f343dc..1bb7e9608f75 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1418,13 +1418,13 @@ static void __iommu_free_atomic(struct device *dev, void *cpu_addr, __free_from_pool(cpu_addr, size); } -static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size, - dma_addr_t *handle, gfp_t gfp, unsigned long attrs, - int coherent_flag) +static void *arm_iommu_alloc_attrs(struct device *dev, size_t size, + dma_addr_t *handle, gfp_t gfp, unsigned long attrs) { pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL); struct page **pages; void *addr = NULL; + int coherent_flag = dev->dma_coherent ? 
COHERENT : NORMAL; *handle = DMA_MAPPING_ERROR; size = PAGE_ALIGN(size); @@ -1467,19 +1467,7 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size, return NULL; } -static void *arm_iommu_alloc_attrs(struct device *dev, size_t size, - dma_addr_t *handle, gfp_t gfp, unsigned long attrs) -{ - return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, NORMAL); -} - -static void *arm_coherent_iommu_alloc_attrs(struct device *dev, size_t size, - dma_addr_t *handle, gfp_t gfp, unsigned long attrs) -{ - return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, COHERENT); -} - -static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, +static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { @@ -1493,35 +1481,24 @@ static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma if (vma->vm_pgoff >= nr_pages) return -ENXIO; + if (!dev->dma_coherent) + vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot); + err = vm_map_pages(vma, pages, nr_pages); if (err) pr_err("Remapping memory failed: %d\n", err); return err; } -static int arm_iommu_mmap_attrs(struct device *dev, - struct vm_area_struct *vma, void *cpu_addr, - dma_addr_t dma_addr, size_t size, unsigned long attrs) -{ - vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot); - - return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs); -} - -static int arm_coherent_iommu_mmap_attrs(struct device *dev, - struct vm_area_struct *vma, void *cpu_addr, - dma_addr_t dma_addr, size_t size, unsigned long attrs) -{ - return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs); -} /* * free a page as defined by the above mapping. * Must not be called with IRQs disabled. 
*/ -static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr, - dma_addr_t handle, unsigned long attrs, int coherent_flag) +static void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, unsigned long attrs) { + int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL; struct page **pages; size = PAGE_ALIGN(size); @@ -1543,19 +1520,6 @@ static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_ad __iommu_free_buffer(dev, pages, size, attrs); } -static void arm_iommu_free_attrs(struct device *dev, size_t size, -void *cpu_addr, dma_addr_t handle, -unsigned long attrs) -{ - __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, NORMAL); -} - -static void arm_coherent_iommu_free_attrs(struct device *dev, size_t size, - void *cpu_addr, dma_addr_t handle,
Re: [PATCH RESEND v10 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset
Hi Andy, On Tue, Aug 18, 2020 at 4:14 AM Andy Shevchenko wrote: > > On Mon, Aug 17, 2020 at 05:53:09PM -0400, Jim Quinlan wrote: > > The new field 'dma_range_map' in struct device is used to facilitate the > > use of single or multiple offsets between mapping regions of cpu addrs and > > dma addrs. It subsumes the role of "dev->dma_pfn_offset" which was only > > capable of holding a single uniform offset and had no region bounds > > checking. > > > > The function of_dma_get_range() has been modified so that it takes a single > > argument -- the device node -- and returns a map, NULL, or an error code. > > The map is an array that holds the information regarding the DMA regions. > > Each range entry contains the address offset, the cpu_start address, the > > dma_start address, and the size of the region. > > > > of_dma_configure() is the typical manner to set range offsets but there are > > a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel > > driver code. These cases now invoke the function > > dma_attach_offset_range(dev, cpu_addr, dma_addr, size). > > ... > > > + if (dev) { > > + phys_addr_t paddr = PFN_PHYS(pfn); > > + > > > + pfn -= (dma_offset_from_phys_addr(dev, paddr) >> PAGE_SHIFT); > > PFN_DOWN() ? Yep. > > > + } > > ... > > > + pfn += (dma_offset_from_dma_addr(dev, addr) >> PAGE_SHIFT); > > Ditto. Yep. > > > ... 
> > > +static inline u64 dma_offset_from_dma_addr(struct device *dev, dma_addr_t > > dma_addr) > > +{ > > + const struct bus_dma_region *m = dev->dma_range_map; > > + > > + if (!m) > > + return 0; > > + for (; m->size; m++) > > + if (dma_addr >= m->dma_start && dma_addr - m->dma_start < > > m->size) > > + return m->offset; > > + return 0; > > +} > > + > > +static inline u64 dma_offset_from_phys_addr(struct device *dev, > > phys_addr_t paddr) > > +{ > > + const struct bus_dma_region *m = dev->dma_range_map; > > + > > + if (!m) > > + return 0; > > + for (; m->size; m++) > > + if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size) > > + return m->offset; > > + return 0; > > +} > > Perhaps for these the form with one return 0 is easier to read > > if (m) { > for (; m->size; m++) > if (paddr >= m->cpu_start && paddr - m->cpu_start < > m->size) > return m->offset; > } > return 0; > > ? I see what you are saying but I don't think there is enough difference between the two to justify changing it. > > ... > > > + if (mem->use_dev_dma_pfn_offset) { > > + u64 base_addr = (u64)mem->pfn_base << PAGE_SHIFT; > > PFN_PHYS() ? Yep. > > > + > > + return base_addr - dma_offset_from_phys_addr(dev, base_addr); > > + } > > ... > > > + * It returns -ENOMEM if out of memory, 0 otherwise. > > This doesn't describe cases dev->dma_range_map != NULL and offset == 0. Okay, I'll fix this. 
> > > +int dma_set_offset_range(struct device *dev, phys_addr_t cpu_start, > > + dma_addr_t dma_start, u64 size) > > +{ > > + struct bus_dma_region *map; > > + u64 offset = (u64)cpu_start - (u64)dma_start; > > + > > + if (!offset) > > + return 0; > > + > > + if (dev->dma_range_map) { > > + dev_err(dev, "attempt to add DMA range to existing map\n"); > > + return -EINVAL; > > + } > > + > > + map = kcalloc(2, sizeof(*map), GFP_KERNEL); > > + if (!map) > > + return -ENOMEM; > > + map[0].cpu_start = cpu_start; > > + map[0].dma_start = dma_start; > > + map[0].offset = offset; > > + map[0].size = size; > > + dev->dma_range_map = map; > > + > > + return 0; > > +} > > ... > > > +void *dma_copy_dma_range_map(const struct bus_dma_region *map) > > +{ > > + int num_ranges; > > + struct bus_dma_region *new_map; > > + const struct bus_dma_region *r = map; > > + > > + for (num_ranges = 0; r->size; num_ranges++) > > + r++; > > > + new_map = kcalloc(num_ranges + 1, sizeof(*map), GFP_KERNEL); > > + if (new_map) > > + memcpy(new_map, map, sizeof(*map) * num_ranges); > > Looks like krealloc() on the first glance... It's not. We are making a distinct copy of the original, not resizing it. > > > + > > + return new_map; > > +} > > -- > With Best Regards, > Andy Shevchenko Thanks again, Jim > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
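For readers following the review, the lookup and copy semantics discussed above can be tried out in userspace. The following is only a sketch under stated assumptions: `struct region`, `example_map`, `offset_from_cpu_addr()`, `offset_from_dma_addr()` and `copy_map()` are hypothetical stand-ins for the patch's `struct bus_dma_region` and its helpers, the addresses are invented, and `calloc()` replaces `kcalloc()`. It shows that a zero-size entry terminates the map, that an address outside every region yields offset 0, and that the copy is a distinct allocation rather than a resize.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the patch's struct bus_dma_region (assumption). */
struct region {
	uint64_t cpu_start;
	uint64_t dma_start;
	uint64_t size;
	uint64_t offset;	/* cpu_start - dma_start */
};

/* Invented two-region map; the zero-size entry terminates the array. */
static const struct region example_map[] = {
	{ 0x080000000ULL, 0x00000000ULL, 0x40000000ULL, 0x80000000ULL },
	{ 0x100000000ULL, 0x40000000ULL, 0x40000000ULL, 0xC0000000ULL },
	{ 0, 0, 0, 0 },
};

/* Return the offset of the region containing paddr, or 0 if none does. */
static uint64_t offset_from_cpu_addr(const struct region *m, uint64_t paddr)
{
	if (!m)
		return 0;
	for (; m->size; m++)
		if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
			return m->offset;
	return 0;
}

/* Same lookup, keyed on the device-side (DMA) address. */
static uint64_t offset_from_dma_addr(const struct region *m, uint64_t dma_addr)
{
	if (!m)
		return 0;
	for (; m->size; m++)
		if (dma_addr >= m->dma_start && dma_addr - m->dma_start < m->size)
			return m->offset;
	return 0;
}

/*
 * Make an independent copy of a map. The extra zeroed slot from calloc()
 * becomes the terminator, so only the populated entries are copied.
 */
static struct region *copy_map(const struct region *map)
{
	size_t n = 0;
	struct region *new_map;

	assert(map);
	while (map[n].size)
		n++;
	new_map = calloc(n + 1, sizeof(*map));
	if (new_map)
		memcpy(new_map, map, n * sizeof(*map));
	return new_map;
}
```

Note that returning 0 for "no matching region" is indistinguishable from a region whose offset really is 0, which is part of why the return-value documentation came up in the review.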
Re: [PATCH v4 0/3] iommu/arm-smmu-v3: permit users to disable msi polling
On 2020-08-19 00:38, Barry Song wrote: patch 1/3 and patch 2/3 are the preparation of patch 3/3 which permits users to disable MSI-based polling by cmd line. -v4: with respect to Robin's comments * cleanup the code of the existing module parameter disable_bypass * add ARM_SMMU_OPT_MSIPOLL flag. on the other hand, we only need to check a bit in options rather than two bits in features Thanks Barry - for all 3 patches, Reviewed-by: Robin Murphy I'd be inclined to squash #2 into #1, but I'll leave that up to Will. Cheers, Robin. Barry Song (3): iommu/arm-smmu-v3: replace symbolic permissions by octal permissions for module parameter iommu/arm-smmu-v3: replace module_param_named by module_param for disable_bypass iommu/arm-smmu-v3: permit users to disable msi polling drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v2 2/2] iommu/iova: Free global iova rcache on iova alloc failure
From: Vijayanand Jitta

Whenever an iova alloc request fails, we free the iova ranges present in the percpu iova rcaches and then retry. The global iova rcache is not freed, however, so the alloc can still fail after the retry: the global rcache keeps holding iovas, which can cause fragmentation. So free the global iova rcache as well before retrying.

Signed-off-by: Vijayanand Jitta --- drivers/iommu/iova.c | 23 +++ include/linux/iova.h | 6 ++ 2 files changed, 29 insertions(+) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 4e77116..5836c87 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -442,6 +442,7 @@ struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn) flush_rcache = false; for_each_online_cpu(cpu) free_cpu_cached_iovas(cpu, iovad); + free_global_cached_iovas(iovad); goto retry; } @@ -1055,5 +1056,27 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad) } } +/* + * free all the IOVA ranges of global cache + */ +void free_global_cached_iovas(struct iova_domain *iovad) +{ + struct iova_rcache *rcache; + unsigned long flags; + int i, j; + + for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) { + rcache = &iovad->rcaches[i]; + spin_lock_irqsave(&rcache->lock, flags); + for (j = 0; j < rcache->depot_size; ++j) { + iova_magazine_free_pfns(rcache->depot[j], iovad); + iova_magazine_free(rcache->depot[j]); + rcache->depot[j] = NULL; + } + rcache->depot_size = 0; + spin_unlock_irqrestore(&rcache->lock, flags); + } +} + MODULE_AUTHOR("Anil S Keshavamurthy "); MODULE_LICENSE("GPL"); diff --git a/include/linux/iova.h b/include/linux/iova.h index a0637ab..a905726 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad, struct iova *split_and_remove_iova(struct iova_domain *iovad, struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi); void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain 
*iovad); +void free_global_cached_iovas(struct iova_domain *iovad); #else static inline int iova_cache_get(void) { @@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad) { } + +static inline void free_global_cached_iovas(struct iova_domain *iovad) +{ +} + #endif #endif -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
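The drain loop in the patch above can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `struct magazine`, `struct rcache`, `depot_push()`, `drain_depot()`, `released_pfns` and the two capacity constants are hypothetical, `free()` stands in for `iova_magazine_free()`, a counter stands in for `iova_magazine_free_pfns()`, and the spinlock that the patch takes around the loop is omitted.

```c
#include <assert.h>
#include <stdlib.h>

#define DEPOT_MAX 32	/* illustrative depot capacity (assumption) */
#define MAG_SIZE 128	/* illustrative magazine capacity (assumption) */

/* A magazine caches up to MAG_SIZE freed pfns for reuse. */
struct magazine {
	unsigned long size;
	unsigned long pfns[MAG_SIZE];
};

/* The "global" part of a range cache: a depot of full magazines. */
struct rcache {
	unsigned long depot_size;
	struct magazine *depot[DEPOT_MAX];
};

/* Counts pfns handed back; stands in for returning ranges to the tree. */
static unsigned long released_pfns;

/* Park a magazine holding nr_pfns cached entries in the depot. */
static int depot_push(struct rcache *rc, unsigned long nr_pfns)
{
	struct magazine *mag;

	if (rc->depot_size >= DEPOT_MAX || nr_pfns > MAG_SIZE)
		return -1;
	mag = calloc(1, sizeof(*mag));
	if (!mag)
		return -1;
	mag->size = nr_pfns;
	rc->depot[rc->depot_size++] = mag;
	return 0;
}

/*
 * Release every magazine in the depot and reset it to empty, mirroring
 * the inner loop of free_global_cached_iovas() (locking omitted here).
 */
static void drain_depot(struct rcache *rc)
{
	unsigned long j;

	assert(rc);
	for (j = 0; j < rc->depot_size; ++j) {
		released_pfns += rc->depot[j]->size;
		free(rc->depot[j]);
		rc->depot[j] = NULL;
	}
	rc->depot_size = 0;
}
```

Clearing each slot and resetting `depot_size` matters because the depot is reused afterwards; a stale pointer left behind would be freed a second time on the next drain.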
[PATCH v2 1/2] iommu/iova: Retry from last rb tree node if iova search fails
From: Vijayanand Jitta

Whenever a new iova alloc request arrives, the search always starts from the cached node and proceeds through the nodes below it. So even if free iova space is available above the cached node, the allocation can still fail.

Consider the following sequence of iova allocs and frees on 1GB of iova space:

1) alloc - 500MB
2) alloc - 12MB
3) alloc - 499MB
4) free - 12MB, which was allocated in step 2
5) alloc - 13MB

After the above sequence we have 12MB of free iova space, and the cached node points to the iova pfn of the last alloc of 13MB, which is the lowest iova pfn of that iova space. If an alloc request of 2MB now arrives, we search only from the cached node towards lower iova pfns, and as there are none free, the alloc fails even though 12MB of iova space is free.

To avoid such iova search failures, retry from the last rb tree node when the search fails; this searches the entire tree and finds an iova if one is available.

Signed-off-by: Vijayanand Jitta --- drivers/iommu/iova.c | 23 +-- 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 49fc01f..4e77116 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, struct rb_node *curr, *prev; struct iova *curr_iova; unsigned long flags; - unsigned long new_pfn; + unsigned long new_pfn, low_pfn_new; unsigned long align_mask = ~0UL; + unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn; if (size_aligned) align_mask <<= fls_long(size - 1); @@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad, curr = __get_cached_rbnode(iovad, limit_pfn); curr_iova = rb_entry(curr, struct iova, node); + low_pfn_new = curr_iova->pfn_hi + 1; + +retry: do { - limit_pfn = min(limit_pfn, curr_iova->pfn_lo); - new_pfn = (limit_pfn - size) & align_mask; + high_pfn = min(high_pfn, curr_iova->pfn_lo); + new_pfn = (high_pfn - size) & align_mask; prev = curr; curr = rb_prev(curr); curr_iova = rb_entry(curr, struct iova, node); - } while (curr && new_pfn <= curr_iova->pfn_hi); - - if (limit_pfn < size || new_pfn < iovad->start_pfn) { + } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn); + + if (high_pfn < size || new_pfn < low_pfn) { + if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) { + high_pfn = limit_pfn; + low_pfn = low_pfn_new; + curr = &iovad->anchor.node; + curr_iova = rb_entry(curr, struct iova, node); + goto retry; + } iovad->max32_alloc_size = size; goto iova32_full; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation 1.9.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
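The 1GB scenario from the commit message can be replayed with a simplified model. The sketch below makes several assumptions that are not in the patch: allocated ranges live in a sorted array instead of an rb tree, addresses are in MB units rather than pfns, alignment is ignored, and `struct range`, `example_ranges`, `search_down()` and `alloc_with_retry()` are hypothetical names. It only illustrates why a search that starts at the cached node misses the 12MB hole, and why a second pass from the top, bounded below by the cached node, finds it.

```c
#include <assert.h>

/* One allocated range, inclusive bounds, in MB units (assumption). */
struct range {
	long lo, hi;
};

/* State after steps 1-5; the hole 512..523 was left by step 4. */
static const struct range example_ranges[] = {
	{ 0, 12 },	/* step 5: 13MB, the cached node */
	{ 13, 511 },	/* step 3: 499MB */
	{ 524, 1023 },	/* step 1: 500MB */
};

/*
 * Gap i lies just below r[i] (gap n lies below limit). Walk gaps from
 * 'start' downward and return the base of the first one that can hold
 * 'size' units inside [low, limit), or -1 if there is none.
 */
static long search_down(const struct range *r, int n, int start,
			long limit, long low, long size)
{
	int i;

	assert(size > 0 && start >= 0 && start <= n);
	for (i = start; i >= 0; i--) {
		long top = (i < n) ? r[i].lo : limit;		/* exclusive */
		long bottom = (i > 0) ? r[i - 1].hi + 1 : 0;	/* inclusive */

		if (top > limit)
			top = limit;
		if (bottom < low)
			bottom = low;
		if (top - size >= bottom)
			return top - size;
	}
	return -1;
}

/*
 * First search from the cached node; on failure retry once from the
 * top of the space, looking only above the cached node.
 */
static long alloc_with_retry(const struct range *r, int n, int cached,
			     long limit, long size)
{
	long base;

	assert(cached >= 0 && cached < n);
	base = search_down(r, n, cached, limit, 0, size);
	if (base < 0)
		base = search_down(r, n, n, limit, r[cached].hi + 1, size);
	return base;
}
```

In this model a 2MB request fails when searched from the cached node alone and lands at 522..523 after the retry, while a 13MB request still fails because the hole is only 12MB.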
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 7:02 AM Christoph Hellwig wrote:
>
> On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
> >> FWIW, I asked back in time what the plan is for non-coherent
> >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> >> dma_sync_*() was supposed to be the right thing to go with. [2] The
> >> same thread also explains why dma_alloc_pages() isn't suitable for the
> >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT
> > and *replacing* it with something streaming-API-based - i.e. this series -
> > not encouraging mixing the existing APIs. It doesn't seem impossible to
> > implement a remapping version of this new dma_alloc_pages() for
> > IOMMU-backed ops if it's really warranted (although at that point it seems
> > like "non-coherent" vb2-dc starts to have significant conceptual overlap
> > with vb2-sg).
>
> You can always vmap the returned pages from dma_alloc_pages, but it will
> make cache invalidation hell - you'll need to use
> invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
> handle virtually indexed caches.
>
> Or with remapping you mean using the iommu to de-scatter/gather?

Ideally, both.

For remapping in the CPU sense, there are drivers which rely on a
contiguous kernel mapping of the vb2 buffers, which was provided by
dma_alloc_attrs(). I think they could be reworked to work on single
pages, but that would significantly complicate the code. At the same
time, such drivers would actually benefit from a cached mapping,
because they often have non-bursty, random access patterns.

Then, in the IOMMU sense, the whole idea of videobuf2-dma-contig is to
rely on the DMA API to always provide device-contiguous memory, as
required by the hardware which only has a single pointer and size.

>
> You can trivially implement it yourself for the iommu
> case:
>
> {
>         merge_boundary = dma_get_merge_boundary(dev);
>         if (!merge_boundary || merge_boundary > chunk_size - 1) {
>                 /* can't coalesce */
>                 return -EINVAL;
>         }
>
>         nents = DIV_ROUND_UP(total_size, chunk_size);
>         sg = sgl_alloc();
>         for_each_sgl() {
>                 sg->page = __alloc_pages(get_order(chunk_size))
>                 sg->len = chunk_size;
>         }
>         dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
>         // you are guaranteed to get a single dma_addr out
> }
>
> Of course this still uses the scatterlist structure with its annoying
> mix of input and output parameters, so I'd rather not expose it as
> an official API at the DMA layer.

The problem with the above open-coded approach is that it requires
explicit handling of the non-IOMMU and IOMMU cases, and this is exactly
what we don't want to have in vb2 and what was actually the job of the
DMA API to hide. Is the plan to actually move the IOMMU handling out
of the DMA API?

Do you think we could instead turn it into a dma_alloc_noncoherent()
helper, which has similar semantics to dma_alloc_attrs() and handles
the various corner cases (e.g. invalidate_kernel_vmap_range and
flush_kernel_vmap_range) to achieve the desired functionality without
delegating the "hell", as you called it, to the users?

Best regards,
Tomasz
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 6:45 AM Christoph Hellwig wrote:
>
> On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > > series related to the subsystem-facing DMA API changes, since
> > > > videobuf2 is one of the biggest users of it.
> > >
> > > The cc list is too long - I cc lists and key maintainers. As a reviewer
> > > you should watch your subsystem's lists closely.
> >
> > Well, I guess we can disagree on this, because there is no clear
> > policy. I'm listed in the MAINTAINERS file for the subsystem and I
> > believe the purpose of the file is to list the people to CC on
> > relevant patches. We're all overloaded with work and having to look
> > through the huge volume of mailing lists like linux-media doesn't help
> > and thus I'd still appreciate being added on CC.
>
> I'm happy to Cc any active participant in the discussion. I'm not
> going to add all reviewers because even with the trimmed CC list
> I'm already hitting the number of recipients limit on various lists.

Fair enough. We'll make your job easier and just turn my MAINTAINERS
entry into a maintainer. :)

Best regards,
Tomasz
Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT
On Thu, Aug 20, 2020 at 7:20 AM Christoph Hellwig wrote:
>
> On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > > Could you explain what makes you think it's unused? It's a feature of
> > > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > > by Chromium OS to get any kind of reasonable performance when
> > > > > accessing V4L2 buffers in the userspace.
> > > >
> > > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > > so by definition it isn't used by any of these media drivers.
> > >
> > > It's still a UAPI feature, so we can't simply remove the flag, it
> > > must stay there as a no-op, until the problem is resolved.
> >
> > Ok, I'll switch to just ignoring it for the next version.
>
> So I took a deeper look. I don't really think it qualifies as a UAPI
> in our traditional sense. For one it only appeared in 5.9-rc1, so we
> can trivially expedite the patch into 5.9-rc and not actually make it
> show up in any released kernel version. And even as of the current
> Linus' tree the only user is a test driver. So I really think the best
> way to go ahead is to just revert it ASAP as the design wasn't thought
> out at all.

The UAPI and V4L2/videobuf2 changes are in good shape and the only
wrong part is the use of the DMA API, which was based on an earlier
email guidance anyway, and a change to the synchronization part. I
find conclusions like the above insulting for people who put many
hours into designing and implementing the related functionality, given
the complexity of the videobuf2 framework and how ill-defined the DMA
API was, and would feel better if such could be avoided in future
communication.

That said, we can revert it on the basis of the implementation issues,
but I feel like we wouldn't get anything by doing so, because as I
said, the design is sane and most of the implementation is fine as
well. Instead, I'd suggest simply removing the use of the attribute
being removed, so that the feature stays a no-op until the DMA API
provides a way to implement it or we just migrate videobuf2 to stop
using the DMA API as much as possible, like many drivers in the DRM
subsystem did.

Best regards,
Tomasz