date:20200820

Re: [PATCH v6 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-20 Thread Christoph Hellwig

FYI, as of the last one I'm fine now, but I really need an ACK from
the arm64 maintainers.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 1/2] dma-contiguous: provide the ability to reserve per-numa CMA

2020-08-20 Thread Randy Dunlap

On 8/20/20 7:26 PM, Barry Song wrote:
> 
> 
> Cc: Jonathan Cameron 
> Cc: Christoph Hellwig 
> Cc: Marek Szyprowski 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Cc: Ganapatrao Kulkarni 
> Cc: Catalin Marinas 
> Cc: Nicolas Saenz Julienne 
> Cc: Steve Capper 
> Cc: Andrew Morton 
> Cc: Mike Rapoport 
> Signed-off-by: Barry Song 
> ---
>  v6: rebase on top of 5.9-rc1;
>  doc cleanup
> 
>  .../admin-guide/kernel-parameters.txt |   9 ++
>  include/linux/dma-contiguous.h|   6 ++
>  kernel/dma/Kconfig|  10 ++
>  kernel/dma/contiguous.c   | 100 --
>  4 files changed, 115 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index bdc1f33fd3d1..3f33b89aeab5 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -599,6 +599,15 @@
>   altogether. For more information, see
>   include/linux/dma-contiguous.h
>  
> + pernuma_cma=nn[MG]

memparse() allows any one of these suffixes: K, M, G, T, P, E
and nothing in the option parsing function cares what suffix is used...
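
To illustrate the point about suffixes, here is a minimal userspace sketch of
memparse()-style parsing. It is an approximation written for this note, not
the kernel implementation; sketch_memparse is a hypothetical name, and the
fall-through scaling mirrors how each suffix multiplies by a further 1024.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace approximation of memparse(): parse a number, then scale it by
 * an optional K/M/G/T/P/E suffix. sketch_memparse is a hypothetical name.
 */
unsigned long long sketch_memparse(const char *s, char **end)
{
	char *p;
	unsigned long long v;

	assert(s != NULL);
	v = strtoull(s, &p, 0);

	switch (*p) {
	case 'E': case 'e':
		v <<= 10;
		/* fall through */
	case 'P': case 'p':
		v <<= 10;
		/* fall through */
	case 'T': case 't':
		v <<= 10;
		/* fall through */
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		p++;		/* consume the suffix character */
		break;
	default:
		break;		/* no suffix: value is in bytes */
	}

	if (end)
		*end = p;
	return v;
}
```

With a parser of this shape, "pernuma_cma=64M" and "pernuma_cma=65536K" yield
the same byte count, which is why documenting only [MG] understates what is
accepted.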

> + [ARM64,KNL]
> + Sets the size of kernel per-numa memory area for
> + contiguous memory allocations. A value of 0 disables
> + per-numa CMA altogether. DMA users on node nid will
> + first try to allocate buffer from the pernuma area
> + which is located in node nid, if the allocation fails,
> + they will fallback to the global default memory area.
> +
>   cmo_free_hint=  [PPC] Format: { yes | no }
>   Specify whether pages are marked as being inactive
>   when they are freed.  This is used in CMO environments

> diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
> index cff7e60968b9..89b95f10e56d 100644
> --- a/kernel/dma/contiguous.c
> +++ b/kernel/dma/contiguous.c
> @@ -69,6 +69,19 @@ static int __init early_cma(char *p)
>  }
>  early_param("cma", early_cma);
>  
> +#ifdef CONFIG_DMA_PERNUMA_CMA
> +
> +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
> +static phys_addr_t pernuma_size_bytes __initdata;

why phys_addr_t? couldn't it just be unsigned long long?

OK, so cma_declare_contiguous_nid() uses phys_addr_t. Fine.

> +
> +static int __init early_pernuma_cma(char *p)
> +{
> + pernuma_size_bytes = memparse(p, &p);
> + return 0;
> +}
> +early_param("pernuma_cma", early_pernuma_cma);
> +#endif
> +
>  #ifdef CONFIG_CMA_SIZE_PERCENTAGE
>  
>  static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
> @@ -96,6 +109,34 @@ static inline __maybe_unused phys_addr_t 
> cma_early_percent_memory(void)
>  
>  #endif
>  
> +#ifdef CONFIG_DMA_PERNUMA_CMA
> +void __init dma_pernuma_cma_reserve(void)
> +{
> + int nid;
> +
> + if (!pernuma_size_bytes)
> + return;
> +
> + for_each_node_state(nid, N_ONLINE) {
> + int ret;
> + char name[20];
> + struct cma **cma = &dma_contiguous_pernuma_area[nid];
> +
> + snprintf(name, sizeof(name), "pernuma%d", nid);
> + ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,
> +  0, false, name, cma, nid);
> + if (ret) {
> + pr_warn("%s: reservation failed: err %d, node %d", 
> __func__,
> + ret, nid);
> + continue;
> + }
> +
> + pr_debug("%s: reserved %llu MiB on node %d\n", __func__,
> + (unsigned long long)pernuma_size_bytes / SZ_1M, nid);

Conversely, if you want to leave pernuma_size_bytes as phys_addr_t,
you should use %pa (or %pap) to print it.

> + }
> +}
> +#endif



-- 
~Randy


[PATCH v6 1/2] dma-contiguous: provide the ability to reserve per-numa CMA

2020-08-20 Thread Barry Song

Right now, drivers like ARM SMMU are using dma_alloc_coherent() to get
coherent DMA buffers to save their command queues and page tables. As
there is only one default CMA in the whole system, SMMUs on nodes other
than node0 will get remote memory. This leads to significant latency.

This patch provides per-numa CMA so that drivers like SMMU can get local
memory. Tests show that localizing CMA significantly reduces dma_unmap
latency. For instance, before this patch, an SMMU on node2 has to wait more
than 560ns for the completion of CMD_SYNC in an empty command queue; with
this patch, it needs only 240ns.

A positive side effect of this patch would be improving performance even
further for those users who are worried about performance more than DMA
security and use iommu.passthrough=1 to skip IOMMU. With local CMA, all
drivers can get local coherent DMA buffers.
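
The allocation order described in the kernel-parameters text of this patch
(try the area on the device's own node first, then fall back to the global
default area) can be modelled in userspace roughly as follows. This is only a
sketch under stated assumptions: the cma_sketch_* names and the page-budget
model are hypothetical and stand in for struct cma and the real allocator.

```c
#include <assert.h>
#include <stddef.h>

#define CMA_SKETCH_MAX_NODES 4

/* Hypothetical stand-in for struct cma: a named pool with a page budget. */
struct cma_sketch_area {
	const char *name;
	size_t free_pages;
};

/* One optional area per node; NULL means no per-numa area was reserved. */
struct cma_sketch_area *cma_sketch_pernuma[CMA_SKETCH_MAX_NODES];
struct cma_sketch_area cma_sketch_default = { "default", 0 };

/* Take count pages from one area; returns 1 on success, 0 on failure. */
int cma_sketch_area_alloc(struct cma_sketch_area *area, size_t count)
{
	if (!area || area->free_pages < count)
		return 0;
	area->free_pages -= count;
	return 1;
}

/*
 * Try the area local to nid first, then fall back to the global default
 * area. Returns the area that satisfied the request, or NULL.
 */
struct cma_sketch_area *cma_sketch_alloc(int nid, size_t count)
{
	assert(count > 0);

	if (nid >= 0 && nid < CMA_SKETCH_MAX_NODES &&
	    cma_sketch_area_alloc(cma_sketch_pernuma[nid], count))
		return cma_sketch_pernuma[nid];

	if (cma_sketch_area_alloc(&cma_sketch_default, count))
		return &cma_sketch_default;

	return NULL;
}
```

In this model a device on node2 is served from the node2 pool while it has
room, and only spills into the default pool once the local one is exhausted.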

Cc: Jonathan Cameron 
Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
 v6: rebase on top of 5.9-rc1;
 doc cleanup

 .../admin-guide/kernel-parameters.txt |   9 ++
 include/linux/dma-contiguous.h|   6 ++
 kernel/dma/Kconfig|  10 ++
 kernel/dma/contiguous.c   | 100 --
 4 files changed, 115 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index bdc1f33fd3d1..3f33b89aeab5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -599,6 +599,15 @@
altogether. For more information, see
include/linux/dma-contiguous.h
 
+   pernuma_cma=nn[MG]
+   [ARM64,KNL]
+   Sets the size of kernel per-numa memory area for
+   contiguous memory allocations. A value of 0 disables
+   per-numa CMA altogether. DMA users on node nid will
+   first try to allocate buffer from the pernuma area
+   which is located in node nid, if the allocation fails,
+   they will fallback to the global default memory area.
+
cmo_free_hint=  [PPC] Format: { yes | no }
Specify whether pages are marked as being inactive
when they are freed.  This is used in CMO environments
diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
index 03f8e98e3bcc..fe55e004f1f4 100644
--- a/include/linux/dma-contiguous.h
+++ b/include/linux/dma-contiguous.h
@@ -171,6 +171,12 @@ static inline void dma_free_contiguous(struct device *dev, 
struct page *page,
 
 #endif
 
+#ifdef CONFIG_DMA_PERNUMA_CMA
+void dma_pernuma_cma_reserve(void);
+#else
+static inline void dma_pernuma_cma_reserve(void) { }
+#endif
+
 #endif
 
 #endif
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 847a9d1fa634..db7a37ed35eb 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -118,6 +118,16 @@ config DMA_CMA
  If unsure, say "n".
 
 if  DMA_CMA
+
+config DMA_PERNUMA_CMA
+   bool "Enable separate DMA Contiguous Memory Area for each NUMA Node"
+   help
+ Enable this option to get pernuma CMA areas so that devices like
+ ARM64 SMMU can get local memory by DMA coherent APIs.
+
+ You can set the size of pernuma CMA by specifying "pernuma_cma=size"
+ on the kernel's command line.
+
 comment "Default contiguous memory area size:"
 
 config CMA_SIZE_MBYTES
diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c
index cff7e60968b9..89b95f10e56d 100644
--- a/kernel/dma/contiguous.c
+++ b/kernel/dma/contiguous.c
@@ -69,6 +69,19 @@ static int __init early_cma(char *p)
 }
 early_param("cma", early_cma);
 
+#ifdef CONFIG_DMA_PERNUMA_CMA
+
+static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES];
+static phys_addr_t pernuma_size_bytes __initdata;
+
+static int __init early_pernuma_cma(char *p)
+{
+   pernuma_size_bytes = memparse(p, &p);
+   return 0;
+}
+early_param("pernuma_cma", early_pernuma_cma);
+#endif
+
 #ifdef CONFIG_CMA_SIZE_PERCENTAGE
 
 static phys_addr_t __init __maybe_unused cma_early_percent_memory(void)
@@ -96,6 +109,34 @@ static inline __maybe_unused phys_addr_t 
cma_early_percent_memory(void)
 
 #endif
 
+#ifdef CONFIG_DMA_PERNUMA_CMA
+void __init dma_pernuma_cma_reserve(void)
+{
+   int nid;
+
+   if (!pernuma_size_bytes)
+   return;
+
+   for_each_node_state(nid, N_ONLINE) {
+   int ret;
+   char name[20];
+   struct cma **cma = &dma_contiguous_pernuma_area[nid];
+
+   snprintf(name, sizeof(name), "pernuma%d", nid);
+   ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0,

[PATCH v6 0/2] make dma_alloc_coherent NUMA-aware by per-NUMA CMA

2020-08-20 Thread Barry Song

Ganapatrao Kulkarni has put some effort into making arm-smmu-v3 use local
memory to save command queues[1]. I also did a similar job in patch
"iommu/arm-smmu-v3: allocate the memory of queues in local numa node"
[2] without realizing Ganapatrao had done that before.

But it seems much better to make dma_alloc_coherent() inherently
NUMA-aware on NUMA-capable systems.

Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
Saving queues and tables remotely will increase the latency of ARM SMMU
significantly. For example, when SMMU is at node2 and the default global
CMA is at node0, after sending a CMD_SYNC in an empty command queue, we
have to wait more than 550ns for the completion of the command CMD_SYNC.
However, if we save them locally, we only need to wait for 240ns.

With per-numa CMA, smmu will get memory from the local numa node to save
command queues and page tables. That means dma_unmap latency will be reduced
significantly.

Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid the travel between
numa nodes.

I only have ARM64 server platforms to test, but I believe this patch will
benefit X86 somehow. Hopefully, some X86 guys will bring it up on x86.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.html
[2] https://www.spinics.net/lists/iommu/msg44767.html


-v6:
 * rebase on top of 5.9-rc1
 * doc cleanup

-v5:
 refine code according to Christoph Hellwig's comments
 * remove Kconfig option for pernuma cma size;
 * add Kconfig option for pernuma cma enable;
 * code cleanup like line over 80 char

 I haven't removed the cma NULL check code in cma_alloc() as it requires
 a bundle of other changes. So I prefer to handle this issue separately.

-v4:
 * rebase on top of Christoph Hellwig's patch:
 [PATCH v2] dma-contiguous: cleanup dma_alloc_contiguous
 https://lore.kernel.org/linux-iommu/20200723120133.94105-1-...@lst.de/
 * cleanup according to Christoph's comment
 * rebase on top of linux-next to avoid arch/arm64 conflicts
 * reserve cma by checking N_MEMORY rather than N_ONLINE

-v3:
  * move to use page_to_nid() while freeing cma with respect to Robin's
  comment, but this will only work after applying my below patch:
  "mm/cma.c: use exact_nid true to fix possible per-numa cma leak"
  https://marc.info/?l=linux-mm&m=159333034726647&w=2

  * handle the case count <= 1 more properly according to Robin's
  comment;

  * add pernuma_cma parameter to support dynamic setting of per-numa
  cma size;
  ideally we can leverage the CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and
  "cma=" kernel parameter and avoid a new parameter separately for
  per-numa cma. Practically, it is really too complicated considering the
  below problems:
  (1) if we leverage the size of default numa for per-numa, we have to
  avoid creating two cma with same size in node0 since default cma is
  probably on node0.
  (2) default cma can consider the address limitation for old devices
  while per-numa cma doesn't support GFP_DMA and GFP_DMA32. all
  allocations with limitation flags will fallback to default one.
  (3) hard to apply CMA_SIZE_PERCENTAGE to per-numa. it is hard to
  decide if the percentage should apply to the whole memory size
  or only apply to the memory size of a specific numa node.
  (4) default cma size has CMA_SIZE_SEL_MIN and CMA_SIZE_SEL_MAX, it
  makes things even more complicated to per-numa cma.

  I haven't figured out a good way to leverage the size of default cma
  for per-numa cma. it seems a separate parameter for per-numa could
  make life easier.

  * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() to
  reuse the comment before hugetlb_cma_reserve() with respect to
  Robin's comment

-v2: 
  * fix some issues reported by kernel test robot
  * fallback to default cma while allocation fails in per-numa cma
 free memory properly

Barry Song (2):
  dma-contiguous: provide the ability to reserve per-numa CMA
  arm64: mm: reserve per-numa CMA to localize coherent dma buffers

 .../admin-guide/kernel-parameters.txt |   9 ++
 arch/arm64/mm/init.c  |   2 +
 include/linux/dma-contiguous.h|   6 ++
 kernel/dma/Kconfig|  10 ++
 kernel/dma/contiguous.c   | 100 --
 5 files changed, 117 insertions(+), 10 deletions(-)

-- 
2.27.0



[PATCH v6 2/2] arm64: mm: reserve per-numa CMA to localize coherent dma buffers

2020-08-20 Thread Barry Song

Right now, smmu is using dma_alloc_coherent() to get memory to save queues
and tables. Typically, on ARM64 server, there is a default CMA located at
node0, which could be far away from node2, node3 etc.
With this patch, smmu will get memory from the local numa node to save
command queues and page tables. That means dma_unmap latency will be reduced
significantly.
Meanwhile, when iommu.passthrough is on, device drivers which call
dma_alloc_coherent() will also get local memory and avoid the travel between
numa nodes.

Cc: Christoph Hellwig 
Cc: Marek Szyprowski 
Cc: Will Deacon 
Cc: Robin Murphy 
Cc: Ganapatrao Kulkarni 
Cc: Catalin Marinas 
Cc: Nicolas Saenz Julienne 
Cc: Steve Capper 
Cc: Andrew Morton 
Cc: Mike Rapoport 
Signed-off-by: Barry Song 
---
 -v6: rebase on top of 5.9-rc1

 arch/arm64/mm/init.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 481d22c32a2e..f1c75957ff3c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -429,6 +429,8 @@ void __init bootmem_init(void)
arm64_hugetlb_cma_reserve();
 #endif
 
+   dma_pernuma_cma_reserve();
+
/*
 * sparse_init() tries to allocate memory from memblock, so must be
 * done after the fixed reservations
-- 
2.27.0



RE: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-08-20 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, August 21, 2020 9:49 AM
> 
> On Fri, 21 Aug 2020 00:37:19 +
> "Liu, Yi L"  wrote:
> 
> > Hi Alex,
> >
> > > From: Alex Williamson 
> > > Sent: Friday, August 21, 2020 4:51 AM
> > >
> > > On Mon, 27 Jul 2020 23:27:36 -0700
> > > Liu Yi L  wrote:
> > >
> > > > This patch allows userspace to request PASID allocation/free, e.g.
> > > > when serving the request from the guest.
> > > >
> > > > PASIDs that are not freed by userspace are automatically freed when
> > > > the IOASID set is destroyed when process exits.
> > > >
> > > > Cc: Kevin Tian 
> > > > CC: Jacob Pan 
> > > > Cc: Alex Williamson 
> > > > Cc: Eric Auger 
> > > > Cc: Jean-Philippe Brucker 
> > > > Cc: Joerg Roedel 
> > > > Cc: Lu Baolu 
> > > > Signed-off-by: Liu Yi L 
> > > > Signed-off-by: Yi Sun 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > > v5 -> v6:
> > > > *) address comments from Eric against v5. remove the alloc/free helper.
> > > >
> > > > v4 -> v5:
> > > > *) address comments from Eric Auger.
> > > > *) the comments for the PASID_FREE request is addressed in patch 5/15 of
> > > >this series.
> > > >
> > > > v3 -> v4:
> > > > *) address comments from v3, except the below comment against the range
> > > >of PASID_FREE request. needs more help on it.
> > > > "> +if (req.range.min > req.range.max)
> > > >
> > > >  Is it exploitable that a user can spin the kernel for a long time 
> > > > in
> > > >  the case of a free by calling this with [0, MAX_UINT] regardless of
> > > >  their actual allocations?"
> > > >
> > > > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/
> > > >
> > > > v1 -> v2:
> > > > *) move the vfio_mm related code to be a seprate module
> > > > *) use a single structure for alloc/free, could support a range of
> > > > PASIDs
> > > > *) fetch vfio_mm at group_attach time instead of at iommu driver open
> > > > time
> > > > ---
> > > >  drivers/vfio/Kconfig|  1 +
> > > >  drivers/vfio/vfio_iommu_type1.c | 69
> > > +
> > > >  drivers/vfio/vfio_pasid.c   | 10 ++
> > > >  include/linux/vfio.h|  6 
> > > >  include/uapi/linux/vfio.h   | 37 ++
> > > >  5 files changed, 123 insertions(+)
> > > >
> > > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > > > 3d8a108..95d90c6 100644
> > > > --- a/drivers/vfio/Kconfig
> > > > +++ b/drivers/vfio/Kconfig
> > > > @@ -2,6 +2,7 @@
> > > >  config VFIO_IOMMU_TYPE1
> > > > tristate
> > > > depends on VFIO
> > > > +   select VFIO_PASID if (X86)
> > > > default n
> > > >
> > > >  config VFIO_IOMMU_SPAPR_TCE
> > > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644
> > > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > > @@ -76,6 +76,7 @@ struct vfio_iommu {
> > > > booldirty_page_tracking;
> > > > boolpinned_page_dirty_scope;
> > > > struct iommu_nesting_info   *nesting_info;
> > > > +   struct vfio_mm  *vmm;
> > > >  };
> > > >
> > > >  struct vfio_domain {
> > > > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct
> > > > vfio_iommu *iommu,
> > > >
> > > >  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > > > {
> > > > +   if (iommu->vmm) {
> > > > +   vfio_mm_put(iommu->vmm);
> > > > +   iommu->vmm = NULL;
> > > > +   }
> > > > +
> > > > kfree(iommu->nesting_info);
> > > > iommu->nesting_info = NULL;
> > > >  }
> > > > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void
> > > *iommu_data,
> > > > iommu->nesting_info);
> > > > if (ret)
> > > > goto out_detach;
> > > > +
> > > > +   if (iommu->nesting_info->features &
> > > > +   
> > > > IOMMU_NESTING_FEAT_SYSWIDE_PASID)
> > > {
> > > > +   struct vfio_mm *vmm;
> > > > +   int sid;
> > > > +
> > > > +   vmm = vfio_mm_get_from_task(current);
> > > > +   if (IS_ERR(vmm)) {
> > > > +   ret = PTR_ERR(vmm);
> > > > +   goto out_detach;
> > > > +   }
> > > > +   iommu->vmm = vmm;
> > > > +
> > > > +   sid = vfio_mm_ioasid_sid(vmm);
> > > > +   ret = iommu_domain_set_attr(domain->domain,
> > > > +   
> > > > DOMAIN_ATTR_IOASID_SID,
> > > > +   &sid);
> > > > +   if (ret)
> > > > +   goto out_detach;
> >

[patch RFC 26/38] x86/xen: Wrap XEN MSI management into irqdomain

2020-08-20 Thread Thomas Gleixner

To allow utilizing the irq domain pointer in struct device it is necessary
to make XEN/MSI irq domain compatible.

While the right solution would be to truly convert XEN to irq domains, this
is an exercise which is not possible for mere mortals with limited XENology.

Provide a plain irqdomain wrapper around XEN. While this is a blatant
violation of the irqdomain design, it's the only solution for a XEN-ignorant
person to make progress on the issue which triggered this change.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
---
Note: This is completely untested, but it compiles so it must be perfect.
---
 arch/x86/pci/xen.c |   63 +
 1 file changed, 63 insertions(+)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -406,6 +406,63 @@ static void xen_teardown_msi_irq(unsigne
WARN_ON_ONCE(1);
 }
 
+static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
+struct device *dev,  int nvec)
+{
+   int type;
+
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return -EINVAL;
+
+   if (first_msi_entry(dev)->msi_attrib.is_msix)
+   type = PCI_CAP_ID_MSIX;
+   else
+   type = PCI_CAP_ID_MSI;
+
+   return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+}
+
+static void xen_msi_domain_free_irqs(struct irq_domain *domain,
+struct device *dev)
+{
+   if (WARN_ON_ONCE(!dev_is_pci(dev)))
+   return;
+
+   x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+}
+
+static struct msi_domain_ops xen_pci_msi_domain_ops = {
+   .domain_alloc_irqs  = xen_msi_domain_alloc_irqs,
+   .domain_free_irqs   = xen_msi_domain_free_irqs,
+};
+
+static struct msi_domain_info xen_pci_msi_domain_info = {
+   .ops= &xen_pci_msi_domain_ops,
+};
+
+/*
+ * This irq domain is a blatant violation of the irq domain design, but
+ * disentangling XEN into real irq domains is not a job for mere mortals with
+ * limited XENology. But it's the least dangerous way for a mere mortal to
+ * get rid of the arch_*_msi_irqs() hackery in order to store the irq
+ * domain pointer in struct device. This irq domain wrappery allows to do
+ * that without breaking XEN terminally.
+ */
+static __init struct irq_domain *xen_create_pci_msi_domain(void)
+{
+   struct irq_domain *d = NULL;
+   struct fwnode_handle *fn;
+
+   fn = irq_domain_alloc_named_fwnode("XEN-MSI");
+   if (fn)
+   d = msi_create_irq_domain(fn, &xen_pci_msi_domain_info, NULL);
+
+   /* FIXME: No idea how to survive if this fails */
+   BUG_ON(!d);
+
+   return d;
+}
+
 static __init void xen_setup_pci_msi(void)
 {
if (xen_initial_domain()) {
@@ -426,6 +483,12 @@ static __init void xen_setup_pci_msi(voi
}
 
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+
+   /*
+* Override the PCI/MSI irq domain init function. No point
+* in allocating the native domain and never use it.
+*/
+   x86_init.irqs.create_pci_msi_domain = xen_create_pci_msi_domain;
 }
 
 #else /* CONFIG_PCI_MSI */


[patch RFC 21/38] PCI: MSI: Provide pci_dev_has_special_msi_domain() helper

2020-08-20 Thread Thomas Gleixner

Provide a helper function to check whether a PCI device is handled by a
non-standard PCI/MSI domain. This will be used to exclude devices which
hang off a special bus, e.g. VMD, from the irq domain override in irq
remapping.
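
The decision made by the helper in this patch can be modelled in userspace as
below. This is a sketch only: the sketch_* types are hypothetical stand-ins
that keep just the fields the check needs, and the lookup order (device
domain, then bus domain, then "special" if neither exists) follows the diff
in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
enum sketch_bus_token { SKETCH_BUS_PCI_MSI, SKETCH_BUS_VMD_MSI };

struct sketch_domain {
	enum sketch_bus_token bus_token;
};

struct sketch_bus {
	struct sketch_domain *msi_domain;
};

struct sketch_pci_dev {
	struct sketch_domain *msi_domain;
	struct sketch_bus *bus;
};

/* True if neither the device nor its bus has a regular PCI/MSI domain. */
bool sketch_has_special_msi_domain(const struct sketch_pci_dev *pdev)
{
	struct sketch_domain *dom;

	assert(pdev != NULL && pdev->bus != NULL);

	dom = pdev->msi_domain;
	if (!dom)
		dom = pdev->bus->msi_domain;	/* fall back to the bus domain */

	if (!dom)
		return true;			/* no domain at all: treat as special */

	return dom->bus_token != SKETCH_BUS_PCI_MSI;
}
```

Note that a device-level domain takes precedence over the bus-level one, so a
device with its own regular PCI/MSI domain is not considered special even if
it sits on a bus with a different token.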

Signed-off-by: Thomas Gleixner 
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/msi.c   |   22 ++
 include/linux/msi.h |1 +
 2 files changed, 23 insertions(+)

--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1553,4 +1553,26 @@ struct irq_domain *pci_msi_get_device_do
 DOMAIN_BUS_PCI_MSI);
return dom;
 }
+
+/**
+ * pci_dev_has_special_msi_domain - Check whether the device is handled by
+ * a non-standard PCI-MSI domain
+ * @pdev:  The PCI device to check.
+ *
+ * Returns: True if the device irqdomain or the bus irqdomain is
+ * non-standard PCI/MSI.
+ */
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev)
+{
+   struct irq_domain *dom = dev_get_msi_domain(&pdev->dev);
+
+   if (!dom)
+   dom = dev_get_msi_domain(&pdev->bus->dev);
+
+   if (!dom)
+   return true;
+
+   return dom->bus_token != DOMAIN_BUS_PCI_MSI;
+}
+
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -374,6 +374,7 @@ int pci_msi_domain_check_cap(struct irq_
 struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev 
*pdev);
 struct irq_domain *pci_msi_get_device_domain(struct pci_dev *pdev);
+bool pci_dev_has_special_msi_domain(struct pci_dev *pdev);
 #else
 static inline struct irq_domain *pci_msi_get_device_domain(struct pci_dev 
*pdev)
 {


[patch RFC 20/38] PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI

2020-08-20 Thread Thomas Gleixner

Devices on the VMD bus use their own MSI irq domain, but it is not
distinguishable from regular PCI/MSI irq domains. Being able to tell them
apart is required to exclude VMD devices from getting the irq domain
pointer set by interrupt remapping.

Override the default bus token.

Signed-off-by: Thomas Gleixner 
Cc: Bjorn Helgaas 
Cc: Lorenzo Pieralisi 
Cc: Jonathan Derrick 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/controller/vmd.c |6 ++
 1 file changed, 6 insertions(+)

--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -579,6 +579,12 @@ static int vmd_enable_domain(struct vmd_
return -ENODEV;
}
 
+   /*
+* Override the irq domain bus token so the domain can be distinguished
+* from a regular PCI/MSI domain.
+*/
+   irq_domain_update_bus_token(vmd->irq_domain, DOMAIN_BUS_VMD_MSI);
+
pci_add_resource(&resources, &vmd->resources[0]);
pci_add_resource_offset(&resources, &vmd->resources[1], offset[0]);
pci_add_resource_offset(&resources, &vmd->resources[2], offset[1]);


[patch RFC 36/38] platform-msi: Add device MSI infrastructure

2020-08-20 Thread Thomas Gleixner

Add device specific MSI domain infrastructure for devices which have their
own resource management and interrupt chip. These devices are not related
to PCI and contrary to platform MSI they do not share a common resource and
interrupt chip. They provide their own domain specific resource management
and interrupt chip.

This utilizes the new alloc/free override in a non-evil way which avoids
having yet another set of specialized alloc/free functions. Just using
msi_domain_alloc/free_irqs() is sufficient.

While initially it was suggested and tried to piggyback device MSI on
platform MSI, the better variant is to reimplement platform MSI on top of
device MSI.
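
The allocate-then-unwind pattern used by device_msi_alloc_irqs() in this patch
can be sketched in userspace as follows. All devmsi_sketch_* names are
hypothetical; the backend callback stands in for __msi_domain_alloc_irqs(),
and the list is pushed at the head here for brevity whereas the patch appends
at the tail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct msi_desc on a singly linked list. */
struct devmsi_sketch_desc {
	struct devmsi_sketch_desc *next;
};

struct devmsi_sketch_dev {
	struct devmsi_sketch_desc *descs;	/* list head */
	int ndescs;
};

/* Free every descriptor attached to the device. */
void devmsi_sketch_free_descs(struct devmsi_sketch_dev *dev)
{
	while (dev->descs) {
		struct devmsi_sketch_desc *d = dev->descs;

		dev->descs = d->next;
		free(d);
		dev->ndescs--;
	}
}

/* Example backends (hypothetical): one succeeds, one fails. */
int devmsi_sketch_backend_ok(struct devmsi_sketch_dev *dev, int nvec)
{
	(void)dev; (void)nvec;
	return 0;
}

int devmsi_sketch_backend_fail(struct devmsi_sketch_dev *dev, int nvec)
{
	(void)dev; (void)nvec;
	return -1;
}

/*
 * Allocate nvec descriptors, then call the backend. On any failure the
 * descriptors are released again, so no partial state is left behind.
 */
int devmsi_sketch_alloc_irqs(struct devmsi_sketch_dev *dev, int nvec,
			     int (*backend)(struct devmsi_sketch_dev *, int))
{
	int i, ret = -1;

	assert(dev != NULL && backend != NULL);

	for (i = 0; i < nvec; i++) {
		struct devmsi_sketch_desc *d = malloc(sizeof(*d));

		if (!d)
			goto fail;
		d->next = dev->descs;
		dev->descs = d;
		dev->ndescs++;
	}

	ret = backend(dev, nvec);
	if (!ret)
		return 0;
fail:
	devmsi_sketch_free_descs(dev);
	return ret;
}
```

The single fail label covers both a descriptor allocation failure and a
backend failure, which keeps the error path in one place.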

Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Cc: Marc Zyngier 
Cc: "Rafael J. Wysocki" 
---
 drivers/base/platform-msi.c |  129 
 include/linux/irqdomain.h   |1 
 include/linux/msi.h |   24 
 kernel/irq/Kconfig  |4 +
 4 files changed, 158 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -412,3 +412,132 @@ int platform_msi_domain_alloc(struct irq
 
return err;
 }
+
+#ifdef CONFIG_DEVICE_MSI
+/*
+ * Device specific MSI domain infrastructure for devices which have their
+ * own resource management and interrupt chip. These devices are not
+ * related to PCI and contrary to platform MSI they do not share a common
+ * resource and interrupt chip. They provide their own domain specific
+ * resource management and interrupt chip.
+ */
+
+static void device_msi_free_msi_entries(struct device *dev)
+{
+   struct list_head *msi_list = dev_to_msi_list(dev);
+   struct msi_desc *entry, *tmp;
+
+   list_for_each_entry_safe(entry, tmp, msi_list, list) {
+   list_del(&entry->list);
+   free_msi_entry(entry);
+   }
+}
+
+/**
+ * device_msi_free_irqs - Free MSI interrupts assigned to  a device
+ * @dev:   Pointer to the device
+ *
+ * Frees the interrupt and the MSI descriptors.
+ */
+static void device_msi_free_irqs(struct irq_domain *domain, struct device *dev)
+{
+   __msi_domain_free_irqs(domain, dev);
+   device_msi_free_msi_entries(dev);
+}
+
+/**
+ * device_msi_alloc_irqs - Allocate MSI interrupts for a device
+ * @dev:   Pointer to the device
+ * @nvec:  Number of vectors
+ *
+ * Allocates the required number of MSI descriptors and the corresponding
+ * interrupt descriptors.
+ */
+static int device_msi_alloc_irqs(struct irq_domain *domain, struct device 
*dev, int nvec)
+{
+   int i, ret = -ENOMEM;
+
+   for (i = 0; i < nvec; i++) {
+   struct msi_desc *entry = alloc_msi_entry(dev, 1, NULL);
+
+   if (!entry)
+   goto fail;
+   list_add_tail(&entry->list, dev_to_msi_list(dev));
+   }
+
+   ret = __msi_domain_alloc_irqs(domain, dev, nvec);
+   if (!ret)
+   return 0;
+fail:
+   device_msi_free_msi_entries(dev);
+   return ret;
+}
+
+static void device_msi_update_dom_ops(struct msi_domain_info *info)
+{
+   if (!info->ops->domain_alloc_irqs)
+   info->ops->domain_alloc_irqs = device_msi_alloc_irqs;
+   if (!info->ops->domain_free_irqs)
+   info->ops->domain_free_irqs = device_msi_free_irqs;
+   if (!info->ops->msi_prepare)
+   info->ops->msi_prepare = arch_msi_prepare;
+}
+
+/**
+ * device_msi_create_irq_domain - Create an irq domain for devices
+ * @fn:    Firmware node of the interrupt controller
+ * @info:  MSI domain info to configure the new domain
+ * @parent:Parent domain
+ */
+struct irq_domain *device_msi_create_irq_domain(struct fwnode_handle *fn,
+   struct msi_domain_info *info,
+   struct irq_domain *parent)
+{
+   struct irq_domain *domain;
+
+   if (info->flags & MSI_FLAG_USE_DEF_CHIP_OPS)
+   platform_msi_update_chip_ops(info);
+
+   if (info->flags & MSI_FLAG_USE_DEF_DOM_OPS)
+   device_msi_update_dom_ops(info);
+
+   domain = msi_create_irq_domain(fn, info, parent);
+   if (domain)
+   irq_domain_update_bus_token(domain, DOMAIN_BUS_DEVICE_MSI);
+   return domain;
+}
+
+#ifdef CONFIG_PCI
+#include 
+
+/**
+ * pci_subdevice_msi_create_irq_domain - Create an irq domain for subdevices
+ * @pdev:  Pointer to PCI device for which the subdevice domain is created
+ * @info:  MSI domain info to configure the new domain
+ */
+struct irq_domain *pci_subdevice_msi_create_irq_domain(struct pci_dev *pdev,
+  struct msi_domain_info 
*info)
+{
+   struct irq_domain *domain, *pdev_msi;
+   struct fwnode_handle *fn;
+
+   /*
+* Retrieve the parent domain of the underlying PCI device's MSI
+* domain. This is going to be the parent of the new subdevice
+* domain as well.
+

[patch RFC 25/38] irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()

2020-08-20 Thread Thomas Gleixner

To support MSI irq domains which do not fit at all into the regular MSI
irqdomain scheme, like the XEN MSI interrupt management for PV/HVM/DOM0,
it's necessary to allow to override the alloc/free implementation.

This is a preparatory step to switch X86 away from arch_*_msi_irqs() and
store the irq domain pointer right in struct device.

No functional change for existing MSI irq domain users.

Aside from the evil XEN wrapper this is also useful for special MSI domains
which need to do extra alloc/free work before/after calling the generic
core function. Work like allocating/freeing MSI descriptors, MSI storage
space etc.
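
The override mechanism described above (optional callbacks that are filled in
with the default implementation when left NULL, plus wrappers that do extra
work around the core function) can be sketched in userspace like this. The
ops_sketch_* names and the simplified signatures are hypothetical; they only
mirror the shape of the domain_alloc_irqs/domain_free_irqs callbacks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table with the two overridable callbacks. */
struct ops_sketch {
	int  (*alloc_irqs)(int nvec);
	void (*free_irqs)(void);
};

/* Stand-ins for the default core functions. */
int ops_sketch_default_alloc(int nvec)
{
	return nvec > 0 ? 0 : -1;
}

void ops_sketch_default_free(void)
{
}

/* A wrapper that does extra work before calling the core function. */
int ops_sketch_wrapped_calls;

int ops_sketch_wrapped_alloc(int nvec)
{
	ops_sketch_wrapped_calls++;	/* e.g. allocate descriptors here */
	return ops_sketch_default_alloc(nvec);
}

/* Fill in the mandatory callbacks when the caller left them NULL. */
void ops_sketch_fill_defaults(struct ops_sketch *ops)
{
	assert(ops != NULL);
	if (!ops->alloc_irqs)
		ops->alloc_irqs = ops_sketch_default_alloc;
	if (!ops->free_irqs)
		ops->free_irqs = ops_sketch_default_free;
}

/* Public entry point: always dispatch through the ops table. */
int ops_sketch_alloc(struct ops_sketch *ops, int nvec)
{
	return ops->alloc_irqs(nvec);
}
```

Because the defaults are installed unconditionally when a callback is NULL,
existing users that never set these fields keep their current behaviour,
which matches the "no functional change" statement in this message.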

Signed-off-by: Thomas Gleixner 
Cc: Marc Zyngier 
---
 include/linux/msi.h |   27 
 kernel/irq/msi.c|   70 +++-
 2 files changed, 75 insertions(+), 22 deletions(-)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -241,6 +241,10 @@ struct msi_domain_info;
  * @msi_finish:Optional callback to finalize the allocation
  * @set_desc:  Set the msi descriptor for an interrupt
  * @handle_error:  Optional error handler if the allocation fails
+ * @domain_alloc_irqs: Optional function to override the default allocation
+ * function.
+ * @domain_free_irqs:  Optional function to override the default free
+ * function.
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -248,6 +252,22 @@ struct msi_domain_info;
  * @msi_check, @msi_prepare, @msi_finish, @set_desc and @handle_error
  * are callbacks used by msi_domain_alloc_irqs() and related
  * interfaces which are based on msi_desc.
+ *
+ * @domain_alloc_irqs, @domain_free_irqs can be used to override the
+ * default allocation/free functions (__msi_domain_alloc/free_irqs). This
+ * is initially for a wrapper around XEN's separate MSI universe which can't
+ * be wrapped into the regular irq domains concepts by mere mortals.  This
+ * allows to universally use msi_domain_alloc/free_irqs without having to
+ * special case XEN all over the place.
+ *
+ * Contrary to other operations @domain_alloc_irqs and @domain_free_irqs
+ * are set to the default implementation if NULL and even when
+ * MSI_FLAG_USE_DEF_DOM_OPS is not set to avoid breaking existing users and
+ * because these callbacks are obviously mandatory.
+ *
+ * This is NOT meant to be abused, but it can be useful to build wrappers
+ * for specialized MSI irq domains which need extra work before and after
+ * calling __msi_domain_alloc_irqs()/__msi_domain_free_irqs().
  */
 struct msi_domain_ops {
irq_hw_number_t (*get_hwirq)(struct msi_domain_info *info,
@@ -270,6 +290,10 @@ struct msi_domain_ops {
struct msi_desc *desc);
int (*handle_error)(struct irq_domain *domain,
struct msi_desc *desc, int error);
+   int (*domain_alloc_irqs)(struct irq_domain *domain,
+struct device *dev, int nvec);
+   void(*domain_free_irqs)(struct irq_domain *domain,
+   struct device *dev);
 };
 
 /**
@@ -327,8 +351,11 @@ int msi_domain_set_affinity(struct irq_d
 struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 struct msi_domain_info *info,
 struct irq_domain *parent);
+int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
+   int nvec);
 int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
  int nvec);
+void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
 void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev);
 struct msi_domain_info *msi_get_domain_info(struct irq_domain *domain);
 
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -229,11 +229,13 @@ static int msi_domain_ops_check(struct i
 }
 
 static struct msi_domain_ops msi_domain_ops_default = {
-   .get_hwirq  = msi_domain_ops_get_hwirq,
-   .msi_init   = msi_domain_ops_init,
-   .msi_check  = msi_domain_ops_check,
-   .msi_prepare= msi_domain_ops_prepare,
-   .set_desc   = msi_domain_ops_set_desc,
+   .get_hwirq  = msi_domain_ops_get_hwirq,
+   .msi_init   = msi_domain_ops_init,
+   .msi_check  = msi_domain_ops_check,
+   .msi_prepare= msi_domain_ops_prepare,
+   .set_desc   = msi_domain_ops_set_desc,
+   .domain_alloc_irqs  = __msi_domain_alloc_irqs,
+   .domain_free_irqs   = __msi_domain_free_irqs,
 };
 
 static void msi_domain_update_dom_ops(struct msi_domain_info *info)
@@ -245,6 +247,14 @@ static void msi_domain_update_dom_ops(st
return;
}
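
Editorial sketch, not part of the patch: the description above says the two
new callbacks fall back to the default implementation when left NULL, and
that a specialized domain may wrap the default functions with extra work.
A minimal userspace model of that idea could look like the following. All
type and function names here (struct domain, update_ops(), wrapped_*) are
hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct domain;

struct domain_ops {
	int  (*alloc_irqs)(struct domain *d, int nvec);
	void (*free_irqs)(struct domain *d);
};

struct domain {
	struct domain_ops *ops;
	int allocated;   /* vectors currently handed out */
	int extra_work;  /* counts wrapper specific setup still in place */
};

/* Default implementations, playing the role of __msi_domain_*_irqs(). */
static int default_alloc_irqs(struct domain *d, int nvec)
{
	if (nvec <= 0)
		return -1;
	d->allocated += nvec;
	return 0;
}

static void default_free_irqs(struct domain *d)
{
	d->allocated = 0;
}

/* Fill in missing callbacks unconditionally, because they are mandatory. */
static void update_ops(struct domain_ops *ops)
{
	if (!ops->alloc_irqs)
		ops->alloc_irqs = default_alloc_irqs;
	if (!ops->free_irqs)
		ops->free_irqs = default_free_irqs;
}

/* The public entry points only dispatch through the ops. */
static int domain_alloc_irqs(struct domain *d, int nvec)
{
	return d->ops->alloc_irqs(d, nvec);
}

static void domain_free_irqs(struct domain *d)
{
	d->ops->free_irqs(d);
}

/* A wrapper domain doing extra work around the default functions. */
static int wrapped_alloc_irqs(struct domain *d, int nvec)
{
	int ret;

	d->extra_work++;		/* e.g. allocate descriptors */
	ret = default_alloc_irqs(d, nvec);
	if (ret)
		d->extra_work--;	/* undo the extra work on failure */
	return ret;
}

static void wrapped_free_irqs(struct domain *d)
{
	default_free_irqs(d);
	d->extra_work--;		/* e.g. free descriptors */
}
```

Callers never need to know which variant a domain uses. They always go
through domain_alloc_irqs()/domain_free_irqs(), which is the property the
patch description relies on to avoid special casing XEN at the call sites.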

[patch RFC 13/38] PCI: MSI: Rework pci_msi_domain_calc_hwirq()

2020-08-20 Thread Thomas Gleixner

Retrieve the PCI device from the msi descriptor instead of doing so at the
call sites.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/kernel/apic/msi.c |2 +-
 drivers/pci/msi.c  |   13 ++---
 include/linux/msi.h|3 +--
 3 files changed, 8 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -232,7 +232,7 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-   arg->msi_hwirq = pci_msi_domain_calc_hwirq(arg->msi_dev, desc);
+   arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1346,17 +1346,17 @@ void pci_msi_domain_write_msg(struct irq
 
 /**
  * pci_msi_domain_calc_hwirq - Generate a unique ID for an MSI source
- * @dev:   Pointer to the PCI device
  * @desc:  Pointer to the MSI descriptor
  *
  * The ID number is only used within the irqdomain.
  */
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
- struct msi_desc *desc)
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
 {
+   struct pci_dev *pdev = msi_desc_to_pci_dev(desc);
+
return (irq_hw_number_t)desc->msi_attrib.entry_nr |
-   pci_dev_id(dev) << 11 |
-   (pci_domain_nr(dev->bus) & 0x) << 27;
+   pci_dev_id(pdev) << 11 |
+   (pci_domain_nr(pdev->bus) & 0x) << 27;
 }
 
 static inline bool pci_msi_desc_is_multi_msi(struct msi_desc *desc)
@@ -1406,8 +1406,7 @@ static void pci_msi_domain_set_desc(msi_
struct msi_desc *desc)
 {
arg->desc = desc;
-   arg->hwirq = pci_msi_domain_calc_hwirq(msi_desc_to_pci_dev(desc),
-  desc);
+   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 #else
 #define pci_msi_domain_set_descNULL
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -369,8 +369,7 @@ void pci_msi_domain_write_msg(struct irq
 struct irq_domain *pci_msi_create_irq_domain(struct fwnode_handle *fwnode,
 struct msi_domain_info *info,
 struct irq_domain *parent);
-irq_hw_number_t pci_msi_domain_calc_hwirq(struct pci_dev *dev,
- struct msi_desc *desc);
+irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc);
 int pci_msi_domain_check_cap(struct irq_domain *domain,
 struct msi_domain_info *info, struct device *dev);
 u32 pci_msi_domain_get_msi_rid(struct irq_domain *domain, struct pci_dev *pdev);
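
Editorial sketch, not part of the patch: the reworked function derives the
PCI device from the descriptor and packs three values into one number. The
userspace model below shows that packing under stated assumptions: the
fake_* types are hypothetical, the device ID is assumed to be the bus number
in the high byte and devfn in the low byte, entry_nr is assumed to fit into
the low 11 bits, and the domain mask is left out because the constant is
not legible in the archived diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct pci_dev and struct msi_desc. */
struct fake_pci_dev {
	uint32_t domain;  /* PCI domain (segment) number */
	uint8_t  bus;     /* bus number */
	uint8_t  devfn;   /* device/function number */
};

struct fake_msi_desc {
	struct fake_pci_dev *pdev; /* device the descriptor belongs to */
	uint16_t entry_nr;         /* MSI entry index, assumed < 2048 */
};

/* Assumption: bus number in the high byte, devfn in the low byte. */
static uint16_t fake_pci_dev_id(const struct fake_pci_dev *pdev)
{
	return (uint16_t)((pdev->bus << 8) | pdev->devfn);
}

/* Only the descriptor is passed in; the device is looked up from it. */
static uint64_t fake_calc_hwirq(const struct fake_msi_desc *desc)
{
	const struct fake_pci_dev *pdev = desc->pdev;

	return (uint64_t)desc->entry_nr |
	       (uint64_t)fake_pci_dev_id(pdev) << 11 |
	       (uint64_t)pdev->domain << 27;
}
```

With entry 3 on bus 0x01, devfn 0x08, domain 0, the device ID is 0x0108 and
the result is 3 | (0x108 << 11) = 0x84003. The same device in domain 1 adds
1 << 27 on top.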

[patch RFC 11/38] x86/irq: Consolidate DMAR irq allocation

2020-08-20 Thread Thomas Gleixner

None of the DMAR specific fields are required.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |6 --
 arch/x86/kernel/apic/msi.c|   10 +-
 2 files changed, 5 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -83,12 +83,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_DMAR_TABLE
-   struct {
-   int dmar_id;
-   void*dmar_data;
-   };
-#endif
 #ifdef CONFIG_X86_UV
struct {
int uv_limit;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -329,15 +329,15 @@ static struct irq_chip dmar_msi_controll
 static irq_hw_number_t dmar_msi_get_hwirq(struct msi_domain_info *info,
  msi_alloc_info_t *arg)
 {
-   return arg->dmar_id;
+   return arg->hwirq;
 }
 
 static int dmar_msi_init(struct irq_domain *domain,
 struct msi_domain_info *info, unsigned int virq,
 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
-   irq_domain_set_info(domain, virq, arg->dmar_id, info->chip, NULL,
-   handle_edge_irq, arg->dmar_data, "edge");
+   irq_domain_set_info(domain, virq, arg->devid, info->chip, NULL,
+   handle_edge_irq, arg->data, "edge");
 
return 0;
 }
@@ -384,8 +384,8 @@ int dmar_alloc_hwirq(int id, int node, v
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_DMAR;
-   info.dmar_id = id;
-   info.dmar_data = arg;
+   info.devid = id;
+   info.data = arg;
 
return irq_domain_alloc_irqs(domain, 1, node, &info);
 }

[patch RFC 31/38] x86/irq: Cleanup the arch_*_msi_irqs() leftovers

2020-08-20 Thread Thomas Gleixner

Get rid of all the gunk and enable CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS.

Signed-off-by: Thomas Gleixner 
Cc: xen-de...@lists.xenproject.org
Cc: linux-...@vger.kernel.org
---
 arch/x86/Kconfig|1 +
 arch/x86/include/asm/pci.h  |   11 ---
 arch/x86/include/asm/x86_init.h |1 -
 arch/x86/kernel/apic/msi.c  |   22 --
 arch/x86/kernel/x86_init.c  |   18 --
 arch/x86/pci/xen.c  |7 ---
 6 files changed, 1 insertion(+), 59 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -225,6 +225,7 @@ config X86
select NEED_SG_DMA_LENGTH
select PCI_DOMAINS  if PCI
select PCI_LOCKLESS_CONFIG  if PCI
+   select PCI_MSI_DISABLE_ARCH_FALLBACKS
select PERF_EVENTS
select RTC_LIB
select RTC_MC146818_LIB
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -105,17 +105,6 @@ static inline void early_quirks(void) {
 
 extern void pci_iommu_alloc(void);
 
-#ifdef CONFIG_PCI_MSI
-/* implemented in arch/x86/kernel/apic/io_apic. */
-struct msi_desc;
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
-void native_teardown_msi_irq(unsigned int irq);
-void native_restore_msi_irqs(struct pci_dev *dev);
-#else
-#define native_setup_msi_irqs  NULL
-#define native_teardown_msi_irqNULL
-#endif
-
 /* generic pci stuff */
 #include 
 
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -277,7 +277,6 @@ struct pci_dev;
 
 struct x86_msi_ops {
int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-   void (*teardown_msi_irq)(unsigned int irq);
void (*teardown_msi_irqs)(struct pci_dev *dev);
void (*restore_msi_irqs)(struct pci_dev *dev);
 };
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -182,28 +182,6 @@ static struct irq_chip pci_msi_controlle
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
 
-int native_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   struct irq_domain *domain;
-   struct irq_alloc_info info;
-
-   init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
-
-   domain = irq_remapping_get_irq_domain(&info);
-   if (domain == NULL)
-   domain = x86_pci_msi_default_domain;
-   if (domain == NULL)
-   return -ENOSYS;
-
-   return msi_domain_alloc_irqs(domain, &dev->dev, nvec);
-}
-
-void native_teardown_msi_irq(unsigned int irq)
-{
-   irq_domain_free_irqs(irq, 1);
-}
-
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg)
 {
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -145,28 +145,10 @@ EXPORT_SYMBOL_GPL(x86_platform);
 
 #if defined(CONFIG_PCI_MSI)
 struct x86_msi_ops x86_msi __ro_after_init = {
-   .setup_msi_irqs = native_setup_msi_irqs,
-   .teardown_msi_irq   = native_teardown_msi_irq,
-   .teardown_msi_irqs  = default_teardown_msi_irqs,
.restore_msi_irqs   = default_restore_msi_irqs,
 };
 
 /* MSI arch specific hooks */
-int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
-{
-   return x86_msi.setup_msi_irqs(dev, nvec, type);
-}
-
-void arch_teardown_msi_irqs(struct pci_dev *dev)
-{
-   x86_msi.teardown_msi_irqs(dev);
-}
-
-void arch_teardown_msi_irq(unsigned int irq)
-{
-   x86_msi.teardown_msi_irq(irq);
-}
-
 void arch_restore_msi_irqs(struct pci_dev *dev)
 {
x86_msi.restore_msi_irqs(dev);
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -401,11 +401,6 @@ static void xen_pv_teardown_msi_irqs(str
xen_teardown_msi_irqs(dev);
 }
 
-static void xen_teardown_msi_irq(unsigned int irq)
-{
-   WARN_ON_ONCE(1);
-}
-
 static int xen_msi_domain_alloc_irqs(struct irq_domain *domain,
 struct device *dev,  int nvec)
 {
@@ -482,8 +477,6 @@ static __init void xen_setup_pci_msi(voi
return;
}
 
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-
/*
 * Override the PCI/MSI irq domain init function. No point
 * in allocating the native domain and never use it.

[patch RFC 29/38] x86/pci: Set default irq domain in pcibios_add_device()

2020-08-20 Thread Thomas Gleixner

Now that interrupt remapping sets the irqdomain pointer when a PCI device
is added it's possible to store the default irq domain in the device struct
in pcibios_add_device().

If the bus to which a device is connected has an irq domain associated, then
this domain is used; otherwise the default domain (PCI/MSI native or XEN
PCI/MSI) is used. Using the bus domain ensures that special MSI bus domains
like VMD work.

This makes XEN and the non-remapped native case work solely based on the
irq domain pointer in struct device for PCI/MSI and allows to remove the
arch fallback and make most of the x86_msi ops private to XEN in the next
steps.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/irqdomain.h |2 ++
 arch/x86/kernel/apic/msi.c   |2 +-
 arch/x86/pci/common.c|   18 +-
 3 files changed, 20 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -53,9 +53,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #ifdef CONFIG_PCI_MSI
 void x86_create_pci_msi_domain(void);
 struct irq_domain *native_create_pci_msi_domain(void);
+extern struct irq_domain *x86_pci_msi_default_domain;
 #else
 static inline void x86_create_pci_msi_domain(void) { }
 #define native_create_pci_msi_domain   NULL
+#define x86_pci_msi_default_domain NULL
 #endif
 
 #endif
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
+struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 unsigned int pci_probe = PCI_PROBE_BIOS | PCI_PROBE_CONF1 | PCI_PROBE_CONF2 |
PCI_PROBE_MMCONF;
@@ -633,8 +634,9 @@ static void set_dev_domain_options(struc
 
 int pcibios_add_device(struct pci_dev *dev)
 {
-   struct setup_data *data;
struct pci_setup_rom *rom;
+   struct irq_domain *msidom;
+   struct setup_data *data;
u64 pa_data;
 
pa_data = boot_params.hdr.setup_data;
@@ -661,6 +663,20 @@ int pcibios_add_device(struct pci_dev *d
memunmap(data);
}
set_dev_domain_options(dev);
+
+   /*
+* Setup the initial MSI domain of the device. If the underlying
+* bus has a PCI/MSI irqdomain associated use the bus domain,
+* otherwise set the default domain. This ensures that special irq
+* domains e.g. VMD are preserved. The default ensures initial
+* operation if irq remapping is not active. If irq remapping is
+* active it will overwrite the domain pointer when the device is
+* associated to a remapping domain.
+*/
+   msidom = dev_get_msi_domain(&dev->bus->dev);
+   if (!msidom)
+   msidom = x86_pci_msi_default_domain;
+   dev_set_msi_domain(&dev->dev, msidom);
return 0;
 }
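
Editorial sketch, not part of the patch: the selection rule added to
pcibios_add_device() is "use the bus domain if there is one, else the
default domain". A minimal userspace model, with hypothetical fake_* types
standing in for struct device and struct irq_domain, might be:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct irq_domain and struct device. */
struct fake_irq_domain {
	const char *name;
};

struct fake_device {
	struct fake_irq_domain *msi_domain; /* NULL when none is assigned */
};

/* Pick the bus domain if the bus has one, otherwise the default domain. */
static void fake_set_initial_msi_domain(struct fake_device *dev,
					const struct fake_device *bus,
					struct fake_irq_domain *def)
{
	struct fake_irq_domain *msidom = bus->msi_domain;

	if (!msidom)
		msidom = def;
	dev->msi_domain = msidom;
}
```

A device behind a bus with its own domain (the VMD case named in the
comment) keeps that domain, while a device on a plain bus ends up with the
default one. In this model a later remapping setup would simply overwrite
dev->msi_domain.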
 

[patch RFC 30/38] PCI/MSI: Allow to disable arch fallbacks

2020-08-20 Thread Thomas Gleixner

If an architecture does not require the MSI setup/teardown fallback
functions, then allow them to be replaced by stub functions which emit a
warning.

Signed-off-by: Thomas Gleixner 
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org
---
 drivers/pci/Kconfig |3 +++
 drivers/pci/msi.c   |3 ++-
 include/linux/msi.h |   31 ++-
 3 files changed, 31 insertions(+), 6 deletions(-)

--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
depends on PCI_MSI
select GENERIC_MSI_IRQ_DOMAIN
 
+config PCI_MSI_DISABLE_ARCH_FALLBACKS
+   bool
+
 config PCI_QUIRKS
default y
bool "Enable PCI quirk workarounds" if EXPERT
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -58,8 +58,8 @@ static void pci_msi_teardown_msi_irqs(st
 #define pci_msi_teardown_msi_irqs  arch_teardown_msi_irqs
 #endif
 
+#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
 /* Arch hooks */
-
 int __weak arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
 {
struct msi_controller *chip = dev->bus->msi;
@@ -132,6 +132,7 @@ void __weak arch_teardown_msi_irqs(struc
 {
return default_teardown_msi_irqs(dev);
 }
+#endif /* !CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS */
 
 static void default_restore_msi_irq(struct pci_dev *dev, int irq)
 {
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -193,17 +193,38 @@ void pci_msi_mask_irq(struct irq_data *d
 void pci_msi_unmask_irq(struct irq_data *data);
 
 /*
- * The arch hooks to setup up msi irqs. Those functions are
- * implemented as weak symbols so that they /can/ be overriden by
- * architecture specific code if needed.
+ * The arch hooks to set up msi irqs. Default functions are implemented
+ * as weak symbols so that they /can/ be overridden by architecture specific
+ * code if needed.
+ *
+ * They can be replaced by stubs with warnings via
+ * CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS when the architecture fully
+ * utilizes direct irqdomain based setup.
  */
+#ifndef CONFIG_PCI_MSI_DISABLE_ARCH_FALLBACKS
 int arch_setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
 void arch_teardown_msi_irq(unsigned int irq);
 int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type);
 void arch_teardown_msi_irqs(struct pci_dev *dev);
-void arch_restore_msi_irqs(struct pci_dev *dev);
-
 void default_teardown_msi_irqs(struct pci_dev *dev);
+#else
+static inline int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+   WARN_ON_ONCE(1);
+   return -ENODEV;
+}
+
+static inline void arch_teardown_msi_irqs(struct pci_dev *dev)
+{
+   WARN_ON_ONCE(1);
+}
+#endif
+
+/*
+ * The restore hooks are still available as they are useful even
+ * for fully irq domain based setups. Courtesy to XEN/X86.
+ */
+void arch_restore_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev);
 
 struct msi_controller {

[patch RFC 34/38] x86/msi: Let pci_msi_prepare() handle non-PCI MSI

2020-08-20 Thread Thomas Gleixner

Rename it to x86_msi_prepare() and handle the allocation type setup
depending on the device type.

Add a new arch_msi_prepare define which will be utilized by the upcoming
device MSI support. Define it to NULL if not provided by an architecture in
the generic MSI header.

One arch specific function for MSI support is truly enough.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
---
 arch/x86/include/asm/msi.h  |4 +++-
 arch/x86/kernel/apic/msi.c  |   27 ---
 drivers/pci/controller/pci-hyperv.c |2 +-
 include/linux/msi.h |4 
 4 files changed, 28 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -6,7 +6,9 @@
 
 typedef struct irq_alloc_info msi_alloc_info_t;
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg);
 
+#define arch_msi_prepare   x86_msi_prepare
+
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -182,26 +182,39 @@ static struct irq_chip pci_msi_controlle
.flags  = IRQCHIP_SKIP_SET_WAKE,
 };
 
-int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
-   msi_alloc_info_t *arg)
+static void pci_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
 {
-   struct pci_dev *pdev = to_pci_dev(dev);
-   struct msi_desc *desc = first_pci_msi_entry(pdev);
+   struct msi_desc *desc = first_msi_entry(dev);
 
-   init_irq_alloc_info(arg, NULL);
if (desc->msi_attrib.is_msix) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
}
+}
+
+static void dev_msi_prepare(struct device *dev, msi_alloc_info_t *arg)
+{
+   arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI;
+}
+
+int x86_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
+   msi_alloc_info_t *arg)
+{
+   init_irq_alloc_info(arg, NULL);
+
+   if (dev_is_pci(dev))
+   pci_msi_prepare(dev, arg);
+   else
+   dev_msi_prepare(dev, arg);
 
return 0;
 }
-EXPORT_SYMBOL_GPL(pci_msi_prepare);
+EXPORT_SYMBOL_GPL(x86_msi_prepare);
 
 static struct msi_domain_ops pci_msi_domain_ops = {
-   .msi_prepare= pci_msi_prepare,
+   .msi_prepare= x86_msi_prepare,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1532,7 +1532,7 @@ static struct irq_chip hv_msi_irq_chip =
 };
 
 static struct msi_domain_ops hv_msi_ops = {
-   .msi_prepare= pci_msi_prepare,
+   .msi_prepare= arch_msi_prepare,
.msi_free   = hv_msi_free,
 };
 
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -430,4 +430,8 @@ static inline struct irq_domain *pci_msi
 }
 #endif /* CONFIG_PCI_MSI_IRQ_DOMAIN */
 
+#ifndef arch_msi_prepare
+# define arch_msi_prepare  NULL
+#endif
+
 #endif /* LINUX_MSI_H */
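
Editorial sketch, not part of the patch: the renamed prepare function
initializes the allocation info once and then picks the allocation type
based on the kind of device. The userspace model below mirrors that control
flow only. Every fake_* name is hypothetical, and the two boolean fields
replace dev_is_pci() and the is_msix attribute of the first MSI descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the x86 allocation types. */
enum fake_alloc_type {
	FAKE_ALLOC_TYPE_PCI_MSI = 1,
	FAKE_ALLOC_TYPE_PCI_MSIX,
	FAKE_ALLOC_TYPE_DEV_MSI,
};

#define FAKE_ALLOC_CONTIGUOUS_VECTORS	0x1u

struct fake_device {
	bool is_pci;   /* models dev_is_pci(dev) */
	bool is_msix;  /* models is_msix of the first MSI descriptor */
};

struct fake_alloc_info {
	enum fake_alloc_type type;
	unsigned int flags;
};

/* PCI devices: MSI-X or plain MSI, the latter needs contiguous vectors. */
static void fake_pci_msi_prepare(const struct fake_device *dev,
				 struct fake_alloc_info *arg)
{
	if (dev->is_msix) {
		arg->type = FAKE_ALLOC_TYPE_PCI_MSIX;
	} else {
		arg->type = FAKE_ALLOC_TYPE_PCI_MSI;
		arg->flags |= FAKE_ALLOC_CONTIGUOUS_VECTORS;
	}
}

/* Everything else is treated as generic device MSI. */
static void fake_dev_msi_prepare(struct fake_alloc_info *arg)
{
	arg->type = FAKE_ALLOC_TYPE_DEV_MSI;
}

/* Single entry point: common init first, then dispatch on device type. */
static int fake_msi_prepare(const struct fake_device *dev,
			    struct fake_alloc_info *arg)
{
	memset(arg, 0, sizeof(*arg));

	if (dev->is_pci)
		fake_pci_msi_prepare(dev, arg);
	else
		fake_dev_msi_prepare(arg);

	return 0;
}
```

Keeping the common initialization in the single entry point means the two
helpers cannot forget it, which matches the move of init_irq_alloc_info()
in the diff above.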

[patch RFC 33/38] x86/irq: Add DEV_MSI allocation type

2020-08-20 Thread Thomas Gleixner

For the upcoming device MSI support a new allocation type is
required.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |1 +
 1 file changed, 1 insertion(+)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,7 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
+   X86_IRQ_ALLOC_TYPE_DEV_MSI,
X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };

[patch RFC 10/38] x86/ioapic: Consolidate IOAPIC allocation

2020-08-20 Thread Thomas Gleixner

Move the IOAPIC specific fields into their own struct and reuse the common
devid. Get rid of the #ifdeffery as it does not matter at all whether the
alloc info is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: "K. Y. Srinivasan" 
Cc: Stephen Hemminger 
Cc: Joerg Roedel 
Cc: linux-hyp...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |   23 ++-
 arch/x86/kernel/apic/io_apic.c  |   70 ++--
 drivers/iommu/amd/iommu.c   |   14 +++
 drivers/iommu/hyperv-iommu.c|2 -
 drivers/iommu/intel/irq_remapping.c |   18 -
 5 files changed, 64 insertions(+), 63 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,6 +44,15 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+struct ioapic_alloc_info {
+   int pin;
+   int node;
+   u32 trigger : 1;
+   u32 polarity : 1;
+   u32 valid : 1;
+   struct IO_APIC_route_entry  *entry;
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:  X86 specific allocation type
@@ -53,6 +62,8 @@ enum irq_alloc_type {
  * @mask:  CPU mask for vector allocation
  * @desc:  Pointer to msi descriptor
  * @data:  Allocation specific data
+ *
+ * @ioapic:IOAPIC specific allocation data
  */
 struct irq_alloc_info {
enum irq_alloc_type type;
@@ -64,6 +75,7 @@ struct irq_alloc_info {
void*data;
 
union {
+   struct ioapic_alloc_infoioapic;
int unused;
 #ifdef CONFIG_PCI_MSI
struct {
@@ -71,17 +83,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_X86_IO_APIC
-   struct {
-   int ioapic_id;
-   int ioapic_pin;
-   int ioapic_node;
-   u32 ioapic_trigger : 1;
-   u32 ioapic_polarity : 1;
-   u32 ioapic_valid : 1;
-   struct IO_APIC_route_entry *ioapic_entry;
-   };
-#endif
 #ifdef CONFIG_DMAR_TABLE
struct {
int dmar_id;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -860,10 +860,10 @@ void ioapic_set_alloc_attr(struct irq_al
 {
init_irq_alloc_info(info, NULL);
info->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-   info->ioapic_node = node;
-   info->ioapic_trigger = trigger;
-   info->ioapic_polarity = polarity;
-   info->ioapic_valid = 1;
+   info->ioapic.node = node;
+   info->ioapic.trigger = trigger;
+   info->ioapic.polarity = polarity;
+   info->ioapic.valid = 1;
 }
 
 #ifndef CONFIG_ACPI
@@ -878,32 +878,32 @@ static void ioapic_copy_alloc_attr(struc
 
copy_irq_alloc_info(dst, src);
dst->type = X86_IRQ_ALLOC_TYPE_IOAPIC;
-   dst->ioapic_id = mpc_ioapic_id(ioapic_idx);
-   dst->ioapic_pin = pin;
-   dst->ioapic_valid = 1;
-   if (src && src->ioapic_valid) {
-   dst->ioapic_node = src->ioapic_node;
-   dst->ioapic_trigger = src->ioapic_trigger;
-   dst->ioapic_polarity = src->ioapic_polarity;
+   dst->devid = mpc_ioapic_id(ioapic_idx);
+   dst->ioapic.pin = pin;
+   dst->ioapic.valid = 1;
+   if (src && src->ioapic.valid) {
+   dst->ioapic.node = src->ioapic.node;
+   dst->ioapic.trigger = src->ioapic.trigger;
+   dst->ioapic.polarity = src->ioapic.polarity;
} else {
-   dst->ioapic_node = NUMA_NO_NODE;
+   dst->ioapic.node = NUMA_NO_NODE;
if (acpi_get_override_irq(gsi, &trigger, &polarity) >= 0) {
-   dst->ioapic_trigger = trigger;
-   dst->ioapic_polarity = polarity;
+   dst->ioapic.trigger = trigger;
+   dst->ioapic.polarity = polarity;
} else {
/*
 * PCI interrupts are always active low level
 * triggered.
 */
-   dst->ioapic_trigger = IOAPIC_LEVEL;
-   dst->ioapic_polarity = IOAPIC_POL_LOW;
+   dst->ioapic.trigger = IOAPIC_LEVEL;
+   dst->ioapic.polarity = IOAPIC_POL_LOW;
}
}
 }
 
 static int ioapic_alloc_attr_node(struct irq_alloc_info *info)
 {
-   return (info && info->ioapic_valid) ? info->ioapic_node : NUMA_NO_NODE;
+

[patch RFC 17/38] x86/pci: Reduce #ifdeffery in PCI init code

2020-08-20 Thread Thomas Gleixner

Adding a function call before the first #ifdef in arch_pci_init() triggers
a 'mixed declarations and code' warning if PCI_DIRECT is enabled.

Use stub functions and move the #ifdeffery to the header file where it is
not in the way.

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/pci_x86.h |   11 +++
 arch/x86/pci/init.c|   10 +++---
 2 files changed, 14 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -114,9 +114,20 @@ extern const struct pci_raw_ops pci_dire
 extern bool port_cf9_safe;
 
 /* arch_initcall level */
+#ifdef CONFIG_PCI_DIRECT
 extern int pci_direct_probe(void);
 extern void pci_direct_init(int type);
+#else
+static inline int pci_direct_probe(void) { return -1; }
+static inline  void pci_direct_init(int type) { }
+#endif
+
+#ifdef CONFIG_PCI_BIOS
 extern void pci_pcbios_init(void);
+#else
+static inline void pci_pcbios_init(void) { }
+#endif
+
 extern void __init dmi_check_pciprobe(void);
 extern void __init dmi_check_skip_isa_align(void);
 
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -8,11 +8,9 @@
in the right sequence from here. */
 static __init int pci_arch_init(void)
 {
-#ifdef CONFIG_PCI_DIRECT
-   int type = 0;
+   int type;
 
type = pci_direct_probe();
-#endif
 
if (!(pci_probe & PCI_PROBE_NOEARLY))
pci_mmcfg_early_init();
@@ -20,18 +18,16 @@ static __init int pci_arch_init(void)
if (x86_init.pci.arch_init && !x86_init.pci.arch_init())
return 0;
 
-#ifdef CONFIG_PCI_BIOS
pci_pcbios_init();
-#endif
+
/*
 * don't check for raw_pci_ops here because we want pcbios as last
 * fallback, yet it's needed to run first to set pcibios_last_bus
 * in case legacy PCI probing is used. otherwise detecting peer busses
 * fails.
 */
-#ifdef CONFIG_PCI_DIRECT
pci_direct_init(type);
-#endif
+
if (!raw_pci_ops && !raw_pci_ext_ops)
printk(KERN_ERR
"PCI: Fatal: No config space access function found\n");
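
Editorial sketch, not part of the patch: the cleanup moves the conditional
compilation out of the function body by providing inline stubs when a
feature is disabled. A reduced userspace model of that pattern, with the
hypothetical macro HAVE_DIRECT_PROBE standing in for CONFIG_PCI_DIRECT and
hypothetical fake_* function names, could look like this:

```c
#include <assert.h>

/* Hypothetical switch modelling CONFIG_PCI_DIRECT; left undefined here. */
/* #define HAVE_DIRECT_PROBE */

#ifdef HAVE_DIRECT_PROBE
int fake_direct_probe(void);
void fake_direct_init(int type);
#else
/* Stubs: the probe reports failure, init does nothing. */
static inline int fake_direct_probe(void) { return -1; }
static inline void fake_direct_init(int type) { (void)type; }
#endif

/* The caller is free of #ifdefs and declares all variables up front. */
static int fake_arch_init(void)
{
	int type;

	type = fake_direct_probe();
	fake_direct_init(type);

	return type;
}
```

Because the declaration of type is no longer wrapped in a conditional block,
new statements can be added at the top of the function body without running
into the 'mixed declarations and code' warning mentioned in the description.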

[patch RFC 15/38] x86/msi: Use generic MSI domain ops

2020-08-20 Thread Thomas Gleixner

pci_msi_get_hwirq() and pci_msi_set_desc() are no longer special. Enable the
generic MSI domain ops in the core and PCI MSI code unconditionally and get
rid of the x86 specific implementations in the X86 MSI code and in the
hyperv PCI driver.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Stephen Hemminger 
Cc: Haiyang Zhang 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
---
 arch/x86/include/asm/msi.h  |2 --
 arch/x86/kernel/apic/msi.c  |   15 ---
 drivers/pci/controller/pci-hyperv.c |8 
 drivers/pci/msi.c   |4 
 kernel/irq/msi.c|6 --
 5 files changed, 35 deletions(-)

--- a/arch/x86/include/asm/msi.h
+++ b/arch/x86/include/asm/msi.h
@@ -9,6 +9,4 @@ typedef struct irq_alloc_info msi_alloc_
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc);
-
 #endif /* _ASM_X86_MSI_H */
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -204,12 +204,6 @@ void native_teardown_msi_irq(unsigned in
irq_domain_free_irqs(irq, 1);
 }
 
-static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
-msi_alloc_info_t *arg)
-{
-   return arg->hwirq;
-}
-
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
msi_alloc_info_t *arg)
 {
@@ -228,17 +222,8 @@ int pci_msi_prepare(struct irq_domain *d
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
-void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
-{
-   arg->desc = desc;
-   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
-}
-EXPORT_SYMBOL_GPL(pci_msi_set_desc);
-
 static struct msi_domain_ops pci_msi_domain_ops = {
-   .get_hwirq  = pci_msi_get_hwirq,
.msi_prepare= pci_msi_prepare,
-   .set_desc   = pci_msi_set_desc,
 };
 
 static struct msi_domain_info pci_msi_domain_info = {
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1531,16 +1531,8 @@ static struct irq_chip hv_msi_irq_chip =
.irq_unmask = hv_irq_unmask,
 };
 
-static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
-  msi_alloc_info_t *arg)
-{
-   return arg->hwirq;
-}
-
 static struct msi_domain_ops hv_msi_ops = {
-   .get_hwirq  = hv_msi_domain_ops_get_hwirq,
.msi_prepare= pci_msi_prepare,
-   .set_desc   = pci_msi_set_desc,
.msi_free   = hv_msi_free,
 };
 
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1401,16 +1401,12 @@ static int pci_msi_domain_handle_error(s
return error;
 }
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
struct msi_desc *desc)
 {
arg->desc = desc;
arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
-#else
-#define pci_msi_domain_set_descNULL
-#endif
 
 static struct msi_domain_ops pci_msi_domain_ops_default = {
.set_desc   = pci_msi_domain_set_desc,
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -187,7 +187,6 @@ static const struct irq_domain_ops msi_d
.deactivate = msi_domain_deactivate,
 };
 
-#ifdef GENERIC_MSI_DOMAIN_OPS
 static irq_hw_number_t msi_domain_ops_get_hwirq(struct msi_domain_info *info,
msi_alloc_info_t *arg)
 {
@@ -206,11 +205,6 @@ static void msi_domain_ops_set_desc(msi_
 {
arg->desc = desc;
 }
-#else
-#define msi_domain_ops_get_hwirq   NULL
-#define msi_domain_ops_prepare NULL
-#define msi_domain_ops_set_descNULL
-#endif /* !GENERIC_MSI_DOMAIN_OPS */
 
 static int msi_domain_ops_init(struct irq_domain *domain,
   struct msi_domain_info *info,

[patch RFC 09/38] x86/msi: Consolidate HPET allocation

2020-08-20 Thread Thomas Gleixner

None of the magic HPET fields are required in any way.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |7 ---
 arch/x86/kernel/apic/msi.c  |   14 +++---
 drivers/iommu/amd/iommu.c   |2 +-
 drivers/iommu/intel/irq_remapping.c |4 ++--
 4 files changed, 10 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -65,13 +65,6 @@ struct irq_alloc_info {
 
union {
int unused;
-#ifdef CONFIG_HPET_TIMER
-   struct {
-   int hpet_id;
-   int hpet_index;
-   void*hpet_data;
-   };
-#endif
 #ifdef CONFIG_PCI_MSI
struct {
struct pci_dev  *msi_dev;
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -427,7 +427,7 @@ static struct irq_chip hpet_msi_controll
 static irq_hw_number_t hpet_msi_get_hwirq(struct msi_domain_info *info,
  msi_alloc_info_t *arg)
 {
-   return arg->hpet_index;
+   return arg->hwirq;
 }
 
 static int hpet_msi_init(struct irq_domain *domain,
@@ -435,8 +435,8 @@ static int hpet_msi_init(struct irq_doma
 irq_hw_number_t hwirq, msi_alloc_info_t *arg)
 {
irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
-   irq_domain_set_info(domain, virq, arg->hpet_index, info->chip, NULL,
-   handle_edge_irq, arg->hpet_data, "edge");
+   irq_domain_set_info(domain, virq, arg->hwirq, info->chip, NULL,
+   handle_edge_irq, arg->data, "edge");
 
return 0;
 }
@@ -477,7 +477,7 @@ struct irq_domain *hpet_create_irq_domai
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
-   info.hpet_id = hpet_id;
+   info.devid = hpet_id;
parent = irq_remapping_get_irq_domain(&info);
if (parent == NULL)
parent = x86_vector_domain;
@@ -506,9 +506,9 @@ int hpet_assign_irq(struct irq_domain *d
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET;
-   info.hpet_data = hc;
-   info.hpet_id = hpet_dev_id(domain);
-   info.hpet_index = dev_num;
+   info.data = hc;
+   info.devid = hpet_dev_id(domain);
+   info.hwirq = dev_num;
 
return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info);
 }
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3511,7 +3511,7 @@ static int get_devid(struct irq_alloc_in
return get_ioapic_devid(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET:
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return get_hpet_devid(info->hpet_id);
+   return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
return get_device_id(&info->msi_dev->dev);
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,7 +1115,7 @@ static struct irq_domain *intel_get_irq_
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
return map_ioapic_to_ir(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   return map_hpet_to_ir(info->hpet_id);
+   return map_hpet_to_ir(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
return map_dev_to_ir(info->msi_dev);
@@ -1285,7 +1285,7 @@ static void intel_irq_remapping_prepare_
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
-   set_hpet_sid(irte, info->hpet_id);
+   set_hpet_sid(irte, info->devid);
else
set_msi_sid(irte, info->msi_dev);
 

[patch RFC 24/38] x86/xen: Consolidate XEN-MSI init

2020-08-20 Thread Thomas Gleixner

X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

To achieve this XEN MSI interrupt management needs to be wrapped into an
irq domain.

Move the x86_msi ops setup into a single function to prepare for this.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/pci/xen.c |   51 ---
 1 file changed, 32 insertions(+), 19 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -371,7 +371,10 @@ static void xen_initdom_restore_msi_irqs
WARN(ret && ret != -ENOSYS, "restore_msi -> %d\n", ret);
}
 }
-#endif
+#else /* CONFIG_XEN_DOM0 */
+#define xen_initdom_setup_msi_irqs NULL
+#define xen_initdom_restore_msi_irqs   NULL
+#endif /* !CONFIG_XEN_DOM0 */
 
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
@@ -403,7 +406,31 @@ static void xen_teardown_msi_irq(unsigne
WARN_ON_ONCE(1);
 }
 
-#endif
+static __init void xen_setup_pci_msi(void)
+{
+   if (xen_initial_domain()) {
+   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
+   pci_msi_ignore_mask = 1;
+   } else if (xen_pv_domain()) {
+   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+   pci_msi_ignore_mask = 1;
+   } else if (xen_hvm_domain()) {
+   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   } else {
+   WARN_ON_ONCE(1);
+   return;
+   }
+
+   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+}
+
+#else /* CONFIG_PCI_MSI */
+static inline void xen_setup_pci_msi(void) { }
+#endif /* CONFIG_PCI_MSI */
 
 int __init pci_xen_init(void)
 {
@@ -420,12 +447,7 @@ int __init pci_xen_init(void)
/* Keep ACPI out of the picture */
acpi_noirq_set();
 
-#ifdef CONFIG_PCI_MSI
-   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
-   pci_msi_ignore_mask = 1;
-#endif
+   xen_setup_pci_msi();
return 0;
 }
 
@@ -445,10 +467,7 @@ static void __init xen_hvm_msi_init(void
((eax & XEN_HVM_CPUID_APIC_ACCESS_VIRT) && 
boot_cpu_has(X86_FEATURE_APIC)))
return;
}
-
-   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+   xen_setup_pci_msi();
 }
 #endif
 
@@ -481,13 +500,7 @@ int __init pci_xen_initial_domain(void)
 {
int irq;
 
-#ifdef CONFIG_PCI_MSI
-   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-   x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
-   x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
-   pci_msi_ignore_mask = 1;
-#endif
+   xen_setup_pci_msi();
__acpi_register_gsi = acpi_register_gsi_xen;
__acpi_unregister_gsi = NULL;
/*

[patch RFC 28/38] iommu/amd: Store irq domain in struct device

2020-08-20 Thread Thomas Gleixner

As the next step to make X86 utilize the direct MSI irq domain operations,
store the irq domain pointer in the device struct when a device is probed.

This only overrides the irq domain of devices which are handled by a regular
PCI/MSI irq domain. That protects PCI devices behind special busses like
VMD, which have their own irq domain.

No functional change.

It just avoids the redirection through arch_*_msi_irqs() and allows the
PCI/MSI core to directly invoke the irq domain alloc/free functions instead
of having to look up the irq domain for every single MSI interrupt.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |   17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -729,7 +729,21 @@ static void iommu_poll_ga_log(struct amd
}
}
 }
-#endif /* CONFIG_IRQ_REMAP */
+
+static void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu)
+{
+   if (!irq_remapping_enabled || !dev_is_pci(dev) ||
+   pci_dev_has_special_msi_domain(to_pci_dev(dev)))
+   return;
+
+   dev_set_msi_domain(dev, iommu->msi_domain);
+}
+
+#else /* CONFIG_IRQ_REMAP */
+static inline void
+amd_iommu_set_pci_msi_domain(struct device *dev, struct amd_iommu *iommu) { }
+#endif /* !CONFIG_IRQ_REMAP */
 
 #define AMD_IOMMU_INT_MASK \
(MMIO_STATUS_EVT_INT_MASK | \
@@ -2157,6 +2171,7 @@ static struct iommu_device *amd_iommu_pr
iommu_dev = ERR_PTR(ret);
iommu_ignore_device(dev);
} else {
+   amd_iommu_set_pci_msi_domain(dev, iommu);
iommu_dev = &iommu->iommu;
}
 

[patch RFC 37/38] irqdomain/msi: Provide msi_alloc/free_store() callbacks

2020-08-20 Thread Thomas Gleixner

For devices which don't have a standard storage for MSI messages like the
upcoming IMS (Interrupt Message Store) it's required to allocate storage
space before allocating interrupts and after freeing them.

This could be achieved with the existing callbacks, but that would be
awkward because they operate on msi_alloc_info_t which is not uniform
across architectures. Also these callbacks are invoked per interrupt but
the allocation might have bulk requirements depending on the device.

As such devices can operate on different architectures it is simpler to
have separate callbacks which operate on struct device. The resulting
storage information has to be stored in struct msi_desc so the underlying
irq chip implementation can retrieve it for the relevant operations.
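
The ordering described above can be sketched in plain userspace C. This is
only an illustrative model, not the kernel implementation: every type and
function name in it (model_device, model_msi_ops, demo_alloc_store and so on)
is hypothetical, and the 8-slot limit is an arbitrary assumption. It shows the
two optional callbacks being invoked once per device, with storage reserved
before the per-interrupt work and released after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct device and msi_domain_ops. */
struct model_device {
	int store_slots;	/* slots reserved in the device specific store */
	int irqs_allocated;	/* interrupts currently set up */
};

struct model_msi_ops {
	/* Both callbacks are optional and may be NULL. */
	int  (*msi_alloc_store)(struct model_device *dev, int nvec);
	void (*msi_free_store)(struct model_device *dev);
};

/* Storage is reserved in bulk before any per-interrupt work happens. */
static int model_alloc_irqs(const struct model_msi_ops *ops,
			    struct model_device *dev, int nvec)
{
	if (ops->msi_alloc_store) {
		int ret = ops->msi_alloc_store(dev, nvec);

		if (ret)
			return ret;
	}
	dev->irqs_allocated = nvec;	/* per-interrupt setup would go here */
	return 0;
}

/* Storage is released only after all interrupts have been torn down. */
static void model_free_irqs(const struct model_msi_ops *ops,
			    struct model_device *dev)
{
	dev->irqs_allocated = 0;	/* per-interrupt teardown would go here */
	if (ops->msi_free_store)
		ops->msi_free_store(dev);
}

/* Example device callbacks: a store with room for at most 8 messages. */
static int demo_alloc_store(struct model_device *dev, int nvec)
{
	if (nvec > 8)
		return -1;
	dev->store_slots = nvec;
	return 0;
}

static void demo_free_store(struct model_device *dev)
{
	dev->store_slots = 0;
}
```

A domain that leaves both callbacks NULL takes exactly the same path as
before, which is why the callbacks can be optional.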

Signed-off-by: Thomas Gleixner 
Cc: Marc Zyngier 
---
 include/linux/msi.h |8 
 kernel/irq/msi.c|   11 +++
 2 files changed, 19 insertions(+)

--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -279,6 +279,10 @@ struct msi_domain_info;
  * function.
  * @domain_free_irqs:  Optional function to override the default free
  * function.
+ * @msi_alloc_store:   Optional callback to allocate storage in a device
+ * specific non-standard MSI store
+ * @msi_free_store:    Optional callback to free storage in a device
+ * specific non-standard MSI store
  *
  * @get_hwirq, @msi_init and @msi_free are callbacks used by
  * msi_create_irq_domain() and related interfaces
@@ -328,6 +332,10 @@ struct msi_domain_ops {
 struct device *dev, int nvec);
void(*domain_free_irqs)(struct irq_domain *domain,
struct device *dev);
+   int (*msi_alloc_store)(struct irq_domain *domain,
+  struct device *dev, int nvec);
+   void(*msi_free_store)(struct irq_domain *domain,
+   struct device *dev);
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -410,6 +410,12 @@ int __msi_domain_alloc_irqs(struct irq_d
if (ret)
return ret;
 
+   if (ops->msi_alloc_store) {
+   ret = ops->msi_alloc_store(domain, dev, nvec);
+   if (ret)
+   return ret;
+   }
+
for_each_msi_entry(desc, dev) {
ops->set_desc(&arg, desc);
 
@@ -509,6 +515,8 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev)
 {
+   struct msi_domain_info *info = domain->host_data;
+   struct msi_domain_ops *ops = info->ops;
struct msi_desc *desc;
 
for_each_msi_entry(desc, dev) {
@@ -522,6 +530,9 @@ void __msi_domain_free_irqs(struct irq_d
desc->irq = 0;
}
}
+
+   if (ops->msi_free_store)
+   ops->msi_free_store(domain, dev);
 }
 
 /**

[patch RFC 14/38] x86/msi: Consolidate MSI allocation

2020-08-20 Thread Thomas Gleixner

Convert the interrupt remap drivers to retrieve the pci device from the msi
descriptor and use info::hwirq.

This is the first step to prepare x86 for using the generic MSI domain ops.

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Stephen Hemminger 
Cc: Joerg Roedel 
Cc: linux-...@vger.kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |8 
 arch/x86/kernel/apic/msi.c  |7 +++
 drivers/iommu/amd/iommu.c   |5 +++--
 drivers/iommu/intel/irq_remapping.c |4 ++--
 drivers/pci/controller/pci-hyperv.c |2 +-
 5 files changed, 9 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -85,14 +85,6 @@ struct irq_alloc_info {
union {
struct ioapic_alloc_infoioapic;
struct uv_alloc_infouv;
-
-   int unused;
-#ifdef CONFIG_PCI_MSI
-   struct {
-   struct pci_dev  *msi_dev;
-   irq_hw_number_t msi_hwirq;
-   };
-#endif
};
 };
 
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -189,7 +189,6 @@ int native_setup_msi_irqs(struct pci_dev
 
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
-   info.msi_dev = dev;
 
domain = irq_remapping_get_irq_domain(&info);
if (domain == NULL)
@@ -208,7 +207,7 @@ void native_teardown_msi_irq(unsigned in
 static irq_hw_number_t pci_msi_get_hwirq(struct msi_domain_info *info,
 msi_alloc_info_t *arg)
 {
-   return arg->msi_hwirq;
+   return arg->hwirq;
 }
 
 int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
@@ -218,7 +217,6 @@ int pci_msi_prepare(struct irq_domain *d
struct msi_desc *desc = first_pci_msi_entry(pdev);
 
init_irq_alloc_info(arg, NULL);
-   arg->msi_dev = pdev;
if (desc->msi_attrib.is_msix) {
arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
@@ -232,7 +230,8 @@ EXPORT_SYMBOL_GPL(pci_msi_prepare);
 
 void pci_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc)
 {
-   arg->msi_hwirq = pci_msi_domain_calc_hwirq(desc);
+   arg->desc = desc;
+   arg->hwirq = pci_msi_domain_calc_hwirq(desc);
 }
 EXPORT_SYMBOL_GPL(pci_msi_set_desc);
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,7 +3514,7 @@ static int get_devid(struct irq_alloc_in
return get_hpet_devid(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   return get_device_id(&info->msi_dev->dev);
+   return get_device_id(msi_desc_to_dev(info->desc));
default:
WARN_ON_ONCE(1);
return -1;
@@ -3688,7 +3688,8 @@ static int irq_remapping_alloc(struct ir
   info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
-   index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
+   index = alloc_irq_index(devid, nr_irqs, align,
+   msi_desc_to_pci_dev(info->desc));
} else {
index = alloc_irq_index(devid, nr_irqs, false, NULL);
}
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1118,7 +1118,7 @@ static struct irq_domain *intel_get_irq_
return map_hpet_to_ir(info->devid);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   return map_dev_to_ir(info->msi_dev);
+   return map_dev_to_ir(msi_desc_to_pci_dev(info->desc));
default:
WARN_ON_ONCE(1);
return NULL;
@@ -1287,7 +1287,7 @@ static void intel_irq_remapping_prepare_
if (info->type == X86_IRQ_ALLOC_TYPE_HPET)
set_hpet_sid(irte, info->devid);
else
-   set_msi_sid(irte, info->msi_dev);
+   set_msi_sid(irte, msi_desc_to_pci_dev(info->desc));
 
msg->address_hi = MSI_ADDR_BASE_HI;
msg->data = sub_handle;
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1534,7 +1534,7 @@ static struct irq_chip hv_msi_irq_chip =
 static irq_hw_number_t hv_msi_domain_ops_get_hwirq(struct msi_domain_info *info,
   msi_alloc_info_t *arg)
 {
-   return arg->msi_hwirq;
+   return arg->hwirq;
 }
 
 static struct msi_domain_ops hv_msi_ops = {

[patch RFC 16/38] x86/irq: Move apic_post_init() invocation to one place

2020-08-20 Thread Thomas Gleixner

There is no point in calling it from both the 32bit and 64bit implementations
of default_setup_apic_routing(). Move it to the caller.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/apic.c |3 +++
 arch/x86/kernel/apic/probe_32.c |3 ---
 arch/x86/kernel/apic/probe_64.c |3 ---
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1429,6 +1429,9 @@ void __init apic_intr_mode_init(void)
break;
}
 
+   if (x86_platform.apic_post_init)
+   x86_platform.apic_post_init();
+
apic_bsp_setup(upmode);
 }
 
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -170,9 +170,6 @@ void __init default_setup_apic_routing(v
 
if (apic->setup_apic_routing)
apic->setup_apic_routing();
-
-   if (x86_platform.apic_post_init)
-   x86_platform.apic_post_init();
 }
 
 void __init generic_apic_probe(void)
--- a/arch/x86/kernel/apic/probe_64.c
+++ b/arch/x86/kernel/apic/probe_64.c
@@ -32,9 +32,6 @@ void __init default_setup_apic_routing(v
break;
}
}
-
-   if (x86_platform.apic_post_init)
-   x86_platform.apic_post_init();
 }
 
 int __init default_acpi_madt_oem_check(char *oem_id, char *oem_table_id)

[patch RFC 08/38] x86/irq: Prepare consolidation of irq_alloc_info

2020-08-20 Thread Thomas Gleixner

struct irq_alloc_info is a horrible zoo of unnamed structs in a union. Many
of the struct fields can be generic and don't have to be type specific like
hpet_id, ioapic_id...

Provide a generic set of members to prepare for the consolidation. The goal
is to make irq_alloc_info have the same basic members as the generic
msi_alloc_info so generic MSI domain ops can be reused and yet more mess
can be avoided when (non-PCI) device MSI support comes along.
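
A minimal userspace sketch of the idea, under the assumption that a cut-down
struct is enough to show it: the names model_alloc_info, model_alloc_type and
model_get_hwirq are hypothetical and only mirror the generic members listed in
the kernel-doc of this patch. With generic members, one accessor can serve
every allocation type instead of one accessor per type specific field.

```c
#include <assert.h>

enum model_alloc_type { MODEL_TYPE_HPET, MODEL_TYPE_PCI_MSI };

/* Hypothetical, cut-down model of an allocation info with generic members. */
struct model_alloc_info {
	enum model_alloc_type type;
	unsigned int devid;	/* replaces type specific ids such as hpet_id */
	unsigned long hwirq;	/* replaces type specific hw interrupt numbers */
	void *data;		/* replaces type specific data pointers */
};

/* One accessor serves every allocation type instead of one per type. */
static unsigned long model_get_hwirq(const struct model_alloc_info *info)
{
	return info->hwirq;
}
```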

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/hw_irq.h |   22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -44,10 +44,25 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
+/**
+ * irq_alloc_info - X86 specific interrupt allocation info
+ * @type:  X86 specific allocation type
+ * @flags: Flags for allocation tweaks
+ * @devid: Device ID for allocations
+ * @hwirq: Associated hw interrupt number in the domain
+ * @mask:  CPU mask for vector allocation
+ * @desc:  Pointer to msi descriptor
+ * @data:  Allocation specific data
+ */
 struct irq_alloc_info {
enum irq_alloc_type type;
u32 flags;
-   const struct cpumask*mask;  /* CPU mask for vector allocation */
+   u32 devid;
+   irq_hw_number_t hwirq;
+   const struct cpumask*mask;
+   struct msi_desc *desc;
+   void*data;
+
union {
int unused;
 #ifdef CONFIG_HPET_TIMER
@@ -88,11 +103,6 @@ struct irq_alloc_info {
char*uv_name;
};
 #endif
-#if IS_ENABLED(CONFIG_VMD)
-   struct {
-   struct msi_desc *desc;
-   };
-#endif
};
 };
 

[patch RFC 35/38] platform-msi: Provide default irq_chip::ack

2020-08-20 Thread Thomas Gleixner

For the upcoming device MSI support it's required to have a default
irq_chip::ack implementation (irq_chip_ack_parent) so the drivers do not
need to care.
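
The "fill in what the driver left unset" pattern can be modelled in userspace
as follows. This is a sketch only: model_chip and the model_* functions are
hypothetical stand-ins, and the counters exist purely so the behaviour can be
observed. A callback supplied by the driver is kept, a missing one is pointed
at the parent default.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct irq_chip. */
struct model_chip {
	void (*irq_ack)(int irq);
	void (*irq_eoi)(int irq);
};

static int parent_acks, driver_acks, parent_eois;

/* Stand-ins for the parent forwarding helpers and a driver callback. */
static void model_ack_parent(int irq) { (void)irq; parent_acks++; }
static void model_eoi_parent(int irq) { (void)irq; parent_eois++; }
static void model_driver_ack(int irq) { (void)irq; driver_acks++; }

/* Fill in only the callbacks the driver left unset. */
static void model_update_chip_ops(struct model_chip *chip)
{
	if (!chip->irq_ack)
		chip->irq_ack = model_ack_parent;
	if (!chip->irq_eoi)
		chip->irq_eoi = model_eoi_parent;
}
```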

Signed-off-by: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
---
 drivers/base/platform-msi.c |2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/base/platform-msi.c
+++ b/drivers/base/platform-msi.c
@@ -95,6 +95,8 @@ static void platform_msi_update_chip_ops
chip->irq_mask = irq_chip_mask_parent;
if (!chip->irq_unmask)
chip->irq_unmask = irq_chip_unmask_parent;
+   if (!chip->irq_ack)
+   chip->irq_ack = irq_chip_ack_parent;
if (!chip->irq_eoi)
chip->irq_eoi = irq_chip_eoi_parent;
if (!chip->irq_set_affinity)

[patch RFC 27/38] iommu/vt-d: Store irq domain in struct device

2020-08-20 Thread Thomas Gleixner

As a first step to make X86 utilize the direct MSI irq domain operations,
store the irq domain pointer in the device struct when a device is probed.

This is done from dmar_pci_bus_add_dev() because it has to work even when
DMA remapping is disabled. It only overrides the irq domain of devices which
are handled by a regular PCI/MSI irq domain. That protects PCI devices
behind special busses like VMD, which have their own irq domain.

No functional change. It just avoids the redirection through
arch_*_msi_irqs() and allows the PCI/MSI core to directly invoke the irq
domain alloc/free functions instead of having to look up the irq domain for
every single MSI interrupt.
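
The rule for when the pointer is stored can be sketched as a small userspace
model. All names here (model_pci_dev, model_irq_domain,
model_remap_add_device) are hypothetical, and the boolean flag is an assumed
simplification of the check for a special MSI domain. The pointer is written
only when remapping is enabled and the device is not already owned by a
special domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_irq_domain { const char *name; };

/* Hypothetical, simplified stand-in for a PCI device. */
struct model_pci_dev {
	struct model_irq_domain *msi_domain;	/* NULL: use the fallback path */
	bool has_special_msi_domain;		/* e.g. a device behind VMD */
};

/* Store the remapping domain unless the device already has a special one. */
static void model_remap_add_device(struct model_pci_dev *dev,
				   bool remapping_enabled,
				   struct model_irq_domain *remap_domain)
{
	if (!remapping_enabled || dev->has_special_msi_domain)
		return;
	dev->msi_domain = remap_domain;
}
```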

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
---
 drivers/iommu/intel/dmar.c  |3 +++
 drivers/iommu/intel/irq_remapping.c |   16 
 include/linux/intel-iommu.h |5 +
 3 files changed, 24 insertions(+)

--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -316,6 +316,9 @@ static int dmar_pci_bus_add_dev(struct d
if (ret < 0 && dmar_dev_scope_status == 0)
dmar_dev_scope_status = ret;
 
+   if (ret >= 0)
+   intel_irq_remap_add_device(info);
+
return ret;
 }
 
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1086,6 +1086,22 @@ static int reenable_irq_remapping(int ei
return -1;
 }
 
+/*
+ * Store the MSI remapping domain pointer in the device if enabled.
+ *
+ * This is called from dmar_pci_bus_add_dev() so it works even when DMA
+ * remapping is disabled. Only update the pointer if the device is not
+ * already handled by a non default PCI/MSI interrupt domain. This protects
+ * e.g. VMD devices.
+ */
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info)
+{
+   if (!irq_remapping_enabled || pci_dev_has_special_msi_domain(info->dev))
+   return;
+
+   dev_set_msi_domain(&info->dev->dev, map_dev_to_ir(info->dev));
+}
+
 static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
 {
memset(irte, 0, sizeof(*irte));
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -439,6 +439,11 @@ struct ir_table {
struct irte *base;
unsigned long *bitmap;
 };
+
+void intel_irq_remap_add_device(struct dmar_pci_notify_info *info);
+#else
+static inline void
+intel_irq_remap_add_device(struct dmar_pci_notify_info *info) { }
 #endif
 
 struct iommu_flush {

[patch RFC 01/38] iommu/amd: Prevent NULL pointer dereference

2020-08-20 Thread Thomas Gleixner

Dereferencing irq_data before checking it for NULL is suboptimal.
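
The ordering problem can be shown with a small userspace sketch. The accessor
model_cfg() below is hypothetical and is assumed, for illustration only, to
dereference its argument unconditionally; nothing here makes a claim about how
irqd_cfg() itself behaves. The point is that the outer pointer is tested
before anything is read through it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EINVAL 22

struct model_irq_cfg { int vector; };
struct model_irq_data { struct model_irq_cfg *cfg; };

/*
 * Hypothetical accessor that dereferences its argument unconditionally.
 * This is an assumption of the sketch, not a statement about irqd_cfg().
 */
static struct model_irq_cfg *model_cfg(struct model_irq_data *d)
{
	return d->cfg;
}

/* Check the outer pointer first, then use the result of the accessor. */
static int model_check_entry(struct model_irq_data *d)
{
	struct model_irq_cfg *cfg = d ? model_cfg(d) : NULL;

	if (!cfg)
		return -MODEL_EINVAL;
	return cfg->vector;
}
```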

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3717,8 +3717,8 @@ static int irq_remapping_alloc(struct ir
 
for (i = 0; i < nr_irqs; i++) {
irq_data = irq_domain_get_irq_data(domain, virq + i);
-   cfg = irqd_cfg(irq_data);
-   if (!irq_data || !cfg) {
+   cfg = irq_data ? irqd_cfg(irq_data) : NULL;
+   if (!cfg) {
ret = -EINVAL;
goto out_free_data;
}

[patch RFC 18/38] x86/irq: Initialize PCI/MSI domain at PCI init time

2020-08-20 Thread Thomas Gleixner

No point in initializing the default PCI/MSI interrupt domain early and no
point in creating it when XEN PV/HVM/DOM0 are active.

Move the initialization to pci_arch_init() and convert it to init ops so
that XEN can override it as XEN has its own PCI/MSI management. The XEN
override comes in a later step.
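
How an init op makes the creation overridable can be sketched in userspace C.
Everything named model_* is hypothetical, and model_platform_create() is an
assumed example of an override, not the XEN implementation, which is not part
of this patch. The caller only invokes whatever function the ops table
currently holds.

```c
#include <assert.h>
#include <stddef.h>

struct model_irq_domain { const char *name; };

static struct model_irq_domain native_domain = { "PCI-MSI" };

/* Default implementation: hands back the native domain. */
static struct model_irq_domain *model_native_create(void)
{
	return &native_domain;
}

/*
 * Hypothetical platform override: a platform that manages MSI itself
 * and therefore creates no default domain.
 */
static struct model_irq_domain *model_platform_create(void)
{
	return NULL;
}

/* Hypothetical stand-in for an init ops table with one callback. */
static struct {
	struct model_irq_domain *(*create_pci_msi_domain)(void);
} model_init_irqs = { model_native_create };

static struct model_irq_domain *model_default_domain;

/* Called once at (modelled) PCI init time; the installed op decides. */
static void model_create_pci_msi_domain(void)
{
	model_default_domain = model_init_irqs.create_pci_msi_domain();
}
```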

Signed-off-by: Thomas Gleixner 
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/irqdomain.h |6 --
 arch/x86/include/asm/x86_init.h  |3 +++
 arch/x86/kernel/apic/msi.c   |   26 --
 arch/x86/kernel/apic/vector.c|2 --
 arch/x86/kernel/x86_init.c   |3 ++-
 arch/x86/pci/init.c  |3 +++
 6 files changed, 28 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/irqdomain.h
+++ b/arch/x86/include/asm/irqdomain.h
@@ -51,9 +51,11 @@ extern int mp_irqdomain_ioapic_idx(struc
 #endif /* CONFIG_X86_IO_APIC */
 
 #ifdef CONFIG_PCI_MSI
-extern void arch_init_msi_domain(struct irq_domain *domain);
+void x86_create_pci_msi_domain(void);
+struct irq_domain *native_create_pci_msi_domain(void);
 #else
-static inline void arch_init_msi_domain(struct irq_domain *domain) { }
+static inline void x86_create_pci_msi_domain(void) { }
+#define native_create_pci_msi_domain   NULL
 #endif
 
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -8,6 +8,7 @@ struct mpc_bus;
 struct mpc_cpu;
 struct mpc_table;
 struct cpuinfo_x86;
+struct irq_domain;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
@@ -42,12 +43,14 @@ struct x86_init_resources {
  * @intr_init: interrupt init code
  * @intr_mode_select:  interrupt delivery mode selection
  * @intr_mode_init:interrupt delivery mode setup
+ * @create_pci_msi_domain: Create the PCI/MSI interrupt domain
  */
 struct x86_init_irqs {
void (*pre_vector_init)(void);
void (*intr_init)(void);
void (*intr_mode_select)(void);
void (*intr_mode_init)(void);
+   struct irq_domain *(*create_pci_msi_domain)(void);
 };
 
 /**
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -21,7 +21,7 @@
 #include 
 #include 
 
-static struct irq_domain *msi_default_domain;
+static struct irq_domain *x86_pci_msi_default_domain __ro_after_init;
 
 static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
 {
@@ -192,7 +192,7 @@ int native_setup_msi_irqs(struct pci_dev
 
domain = irq_remapping_get_irq_domain(&info);
if (domain == NULL)
-   domain = msi_default_domain;
+   domain = x86_pci_msi_default_domain;
if (domain == NULL)
return -ENOSYS;
 
@@ -243,25 +243,31 @@ static struct msi_domain_info pci_msi_do
.handler_name   = "edge",
 };
 
-void __init arch_init_msi_domain(struct irq_domain *parent)
+struct irq_domain * __init native_create_pci_msi_domain(void)
 {
struct fwnode_handle *fn;
+   struct irq_domain *d = NULL;
 
if (disable_apic)
-   return;
+   return NULL;
 
fn = irq_domain_alloc_named_fwnode("PCI-MSI");
if (fn) {
-   msi_default_domain =
-   pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
- parent);
+   d = pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
+ x86_vector_domain);
}
-   if (!msi_default_domain) {
+   if (!d) {
irq_domain_free_fwnode(fn);
-   pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+   pr_warn("Failed to initialize PCI-MSI irqdomain.\n");
} else {
-   msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
+   d->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
}
+   return d;
+}
+
+void __init x86_create_pci_msi_domain(void)
+{
+   x86_pci_msi_default_domain = x86_init.irqs.create_pci_msi_domain();
 }
 
 #ifdef CONFIG_IRQ_REMAP
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -713,8 +713,6 @@ int __init arch_early_irq_init(void)
BUG_ON(x86_vector_domain == NULL);
irq_set_default_host(x86_vector_domain);
 
-   arch_init_msi_domain(x86_vector_domain);
-
BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
 
/*
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -76,7 +76,8 @@ struct x86_init_ops x86_init __initdata
.pre_vector_init= init_ISA_irqs,
.intr_init  = native_init_IRQ,
.intr_mode_select   = apic_intr_mode_select,
-   .intr_mode_init = apic_intr_mode_init
+   .intr_mode_init = apic_intr_mode_init,
+   .create_pci_msi_domain  = native_create_pci_msi_domain,
},
 
.oem = {
--- a/arch/x86/pci/init.c
+++ b/arch/x86/pci/init.c
@@ -3,6 +3,7 @@
 #include 
 #i

[patch RFC 12/38] x86/irq: Consolidate UV domain allocation

2020-08-20 Thread Thomas Gleixner

Move the UV specific fields into their own struct for readability's sake. Get
rid of the #ifdeffery as it does not matter at all whether the alloc info
is a couple of bytes longer or not.

Signed-off-by: Thomas Gleixner 
Cc: Steve Wahl 
Cc: Dimitri Sivanich 
Cc: Russ Anderson 
---
 arch/x86/include/asm/hw_irq.h |   21 -
 arch/x86/platform/uv/uv_irq.c |   16 
 2 files changed, 20 insertions(+), 17 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -53,6 +53,14 @@ struct ioapic_alloc_info {
struct IO_APIC_route_entry  *entry;
 };
 
+struct uv_alloc_info {
+   int limit;
+   int blade;
+   unsigned long   offset;
+   char*name;
+
+};
+
 /**
  * irq_alloc_info - X86 specific interrupt allocation info
  * @type:  X86 specific allocation type
@@ -64,7 +72,8 @@ struct ioapic_alloc_info {
  * @data:  Allocation specific data
  *
  * @ioapic:IOAPIC specific allocation data
- */
+ * @uv:UV specific allocation data
+*/
 struct irq_alloc_info {
enum irq_alloc_type type;
u32 flags;
@@ -76,6 +85,8 @@ struct irq_alloc_info {
 
union {
struct ioapic_alloc_infoioapic;
+   struct uv_alloc_infouv;
+
int unused;
 #ifdef CONFIG_PCI_MSI
struct {
@@ -83,14 +94,6 @@ struct irq_alloc_info {
irq_hw_number_t msi_hwirq;
};
 #endif
-#ifdef CONFIG_X86_UV
-   struct {
-   int uv_limit;
-   int uv_blade;
-   unsigned long   uv_offset;
-   char*uv_name;
-   };
-#endif
};
 };
 
--- a/arch/x86/platform/uv/uv_irq.c
+++ b/arch/x86/platform/uv/uv_irq.c
@@ -90,15 +90,15 @@ static int uv_domain_alloc(struct irq_do
 
ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
if (ret >= 0) {
-   if (info->uv_limit == UV_AFFINITY_CPU)
+   if (info->uv.limit == UV_AFFINITY_CPU)
irq_set_status_flags(virq, IRQ_NO_BALANCING);
else
irq_set_status_flags(virq, IRQ_MOVE_PCNTXT);
 
-   chip_data->pnode = uv_blade_to_pnode(info->uv_blade);
-   chip_data->offset = info->uv_offset;
+   chip_data->pnode = uv_blade_to_pnode(info->uv.blade);
+   chip_data->offset = info->uv.offset;
irq_domain_set_info(domain, virq, virq, &uv_irq_chip, chip_data,
-   handle_percpu_irq, NULL, info->uv_name);
+   handle_percpu_irq, NULL, info->uv.name);
} else {
kfree(chip_data);
}
@@ -193,10 +193,10 @@ int uv_setup_irq(char *irq_name, int cpu
 
init_irq_alloc_info(&info, cpumask_of(cpu));
info.type = X86_IRQ_ALLOC_TYPE_UV;
-   info.uv_limit = limit;
-   info.uv_blade = mmr_blade;
-   info.uv_offset = mmr_offset;
-   info.uv_name = irq_name;
+   info.uv.limit = limit;
+   info.uv.blade = mmr_blade;
+   info.uv.offset = mmr_offset;
+   info.uv.name = irq_name;
 
return irq_domain_alloc_irqs(domain, 1,
 uv_blade_to_memory_nid(mmr_blade), &info);

[patch RFC 07/38] iommu/irq_remapping: Consolidate irq domain lookup

2020-08-20 Thread Thomas Gleixner

Now that the iommu implementations handle the X86_*_GET_PARENT_DOMAIN
types, consolidate the two getter functions. 

Signed-off-by: Thomas Gleixner 
Cc: Wei Liu 
Cc: Joerg Roedel 
Cc: linux-hyp...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/irq_remapping.h |8 
 arch/x86/kernel/apic/io_apic.c   |2 +-
 arch/x86/kernel/apic/msi.c   |2 +-
 drivers/iommu/amd/iommu.c|1 -
 drivers/iommu/hyperv-iommu.c |4 ++--
 drivers/iommu/intel/irq_remapping.c  |1 -
 drivers/iommu/irq_remapping.c|   23 +--
 drivers/iommu/irq_remapping.h|5 +
 8 files changed, 6 insertions(+), 40 deletions(-)

--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -45,8 +45,6 @@ extern int irq_remap_enable_fault_handli
 extern void panic_if_irq_remap(const char *msg);
 
 extern struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info);
-extern struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info);
 
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
@@ -74,12 +72,6 @@ static inline void panic_if_irq_remap(co
 }
 
 static inline struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-   return NULL;
-}
-
-static inline struct irq_domain *
 irq_remapping_get_irq_domain(struct irq_alloc_info *info)
 {
return NULL;
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2298,7 +2298,7 @@ static int mp_irqdomain_create(int ioapi
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
info.ioapic_id = mpc_ioapic_id(ioapic);
-   parent = irq_remapping_get_ir_irq_domain(&info);
+   parent = irq_remapping_get_irq_domain(&info);
if (!parent)
parent = x86_vector_domain;
else
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -478,7 +478,7 @@ struct irq_domain *hpet_create_irq_domai
init_irq_alloc_info(&info, NULL);
info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
info.hpet_id = hpet_id;
-   parent = irq_remapping_get_ir_irq_domain(&info);
+   parent = irq_remapping_get_irq_domain(&info);
if (parent == NULL)
parent = x86_vector_domain;
else
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3561,7 +3561,6 @@ struct irq_remap_ops amd_iommu_irq_ops =
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_ir_irq_domain  = get_irq_domain,
.get_irq_domain = get_irq_domain,
 };
 
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -182,7 +182,7 @@ static int __init hyperv_enable_irq_rema
return IRQ_REMAP_X2APIC_MODE;
 }
 
-static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *hyperv_get_irq_domain(struct irq_alloc_info *info)
 {
if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
return ioapic_ir_domain;
@@ -193,7 +193,7 @@ static struct irq_domain *hyperv_get_ir_
 struct irq_remap_ops hyperv_irq_remap_ops = {
.prepare= hyperv_prepare_irq_remapping,
.enable = hyperv_enable_irq_remapping,
-   .get_ir_irq_domain  = hyperv_get_ir_irq_domain,
+   .get_irq_domain = hyperv_get_irq_domain,
 };
 
 #endif
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1131,7 +1131,6 @@ struct irq_remap_ops intel_irq_remap_ops
.disable= disable_irq_remapping,
.reenable   = reenable_irq_remapping,
.enable_faulting= enable_drhd_fault_handling,
-   .get_ir_irq_domain  = intel_get_irq_domain,
.get_irq_domain = intel_get_irq_domain,
 };
 
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -160,33 +160,12 @@ void panic_if_irq_remap(const char *msg)
 }
 
 /**
- * irq_remapping_get_ir_irq_domain - Get the irqdomain associated with the IOMMU
- *  device serving request @info
- * @info: interrupt allocation information, used to identify the IOMMU device
- *
- * It's used to get parent irqdomain for HPET and IOAPIC irqdomains.
- * Returns pointer to IRQ domain, or NULL on failure.
- */
-struct irq_domain *
-irq_remapping_get_ir_irq_domain(struct irq_alloc_info *info)
-{
-   if (!remap_ops || !remap_ops->get_ir_irq_domain)
-   return NULL;
-
-   return remap_ops->get_ir_irq_domain(info);
-}
-
-/**
  * irq_remapping_get_irq_domain - Get the irqdomain serving the request @info
  * @info: interrupt allocation information,

[patch RFC 23/38] x86/xen: Rework MSI teardown

2020-08-20 Thread Thomas Gleixner

X86 cannot store the irq domain pointer in struct device without breaking
XEN because the irq domain pointer takes precedence over arch_*_msi_irqs()
fallbacks.

XEN's MSI teardown relies on default_teardown_msi_irqs() which invokes
arch_teardown_msi_irq(). default_teardown_msi_irqs() is a trivial iterator
over the msi entries associated with a device.

Implement this loop in xen_teardown_msi_irqs() to prepare for removal of
the fallbacks for X86.

This is a preparatory step to wrap XEN MSI alloc/free into an irq domain,
which in turn allows storing the irq domain pointer in struct device and
using the irq domain functions directly.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/pci/xen.c |   23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)
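
Not part of the patch: a minimal userspace sketch of the open-coded
teardown loop described in the changelog above. The model_* names and the
two-field descriptor are hypothetical stand-ins for struct msi_desc and
xen_destroy_irq(); only the loop shape is taken from the diff below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct msi_desc: only the fields the loop reads. */
struct model_msi_desc {
	unsigned int irq;       /* first irq number, 0 if nothing was allocated */
	unsigned int nvec_used; /* number of consecutive irqs owned by the entry */
};

/* Log of destroyed irqs so the behaviour can be inspected. */
#define MODEL_MAX_DESTROYED 16
static unsigned int model_destroyed[MODEL_MAX_DESTROYED];
static size_t model_destroyed_count;

/* Stand-in for xen_destroy_irq(): only records the irq number. */
static void model_destroy_irq(unsigned int irq)
{
	if (model_destroyed_count < MODEL_MAX_DESTROYED)
		model_destroyed[model_destroyed_count++] = irq;
}

/* Same shape as the reworked xen_teardown_msi_irqs(): walk all entries,
 * skip those without an irq, destroy irq .. irq + nvec_used - 1. */
static void model_teardown_msi_irqs(const struct model_msi_desc *descs, size_t n)
{
	for (size_t d = 0; d < n; d++) {
		if (!descs[d].irq)
			continue;
		for (unsigned int i = 0; i < descs[d].nvec_used; i++)
			model_destroy_irq(descs[d].irq + i);
	}
}

/* Runs the loop over three sample entries and returns how many irqs
 * were destroyed. */
static size_t model_teardown_demo(void)
{
	static const struct model_msi_desc descs[] = {
		{ .irq = 40, .nvec_used = 2 },
		{ .irq = 0,  .nvec_used = 4 }, /* never set up, must be skipped */
		{ .irq = 50, .nvec_used = 1 },
	};

	model_destroyed_count = 0;
	model_teardown_msi_irqs(descs, sizeof(descs) / sizeof(descs[0]));
	return model_destroyed_count;
}
```

The entry with irq == 0 contributes nothing, which is the case the
"if (msidesc->irq)" check in the patch guards against.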

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -376,20 +376,31 @@ static void xen_initdom_restore_msi_irqs
 static void xen_teardown_msi_irqs(struct pci_dev *dev)
 {
struct msi_desc *msidesc;
+   int i;
+
+   for_each_pci_msi_entry(msidesc, dev) {
+   if (msidesc->irq) {
+   for (i = 0; i < msidesc->nvec_used; i++)
+   xen_destroy_irq(msidesc->irq + i);
+   }
+   }
+}
+
+static void xen_pv_teardown_msi_irqs(struct pci_dev *dev)
+{
+   struct msi_desc *msidesc = first_pci_msi_entry(dev);
 
-   msidesc = first_pci_msi_entry(dev);
if (msidesc->msi_attrib.is_msix)
xen_pci_frontend_disable_msix(dev);
else
xen_pci_frontend_disable_msi(dev);
 
-   /* Free the IRQ's and the msidesc using the generic code. */
-   default_teardown_msi_irqs(dev);
+   xen_teardown_msi_irqs(dev);
 }
 
 static void xen_teardown_msi_irq(unsigned int irq)
 {
-   xen_destroy_irq(irq);
+   WARN_ON_ONCE(1);
 }
 
 #endif
@@ -412,7 +423,7 @@ int __init pci_xen_init(void)
 #ifdef CONFIG_PCI_MSI
x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
pci_msi_ignore_mask = 1;
 #endif
return 0;
@@ -436,6 +447,7 @@ static void __init xen_hvm_msi_init(void
}
 
x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
 }
 #endif
@@ -472,6 +484,7 @@ int __init pci_xen_initial_domain(void)
 #ifdef CONFIG_PCI_MSI
x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
pci_msi_ignore_mask = 1;
 #endif

[patch RFC 19/38] irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI

2020-08-20 Thread Thomas Gleixner

PCI devices behind a VMD bus are not subject to interrupt remapping, but
the irq domain for VMD MSI cannot be distinguished from a regular PCI/MSI
irq domain.

Add a new domain bus token and allow it in the bus token check in
msi_check_reservation_mode() to keep the functionality the same once VMD
uses this token.

Signed-off-by: Thomas Gleixner 
Cc: Jon Derrick 
---
 include/linux/irqdomain.h |1 +
 kernel/irq/msi.c  |7 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)
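
Not part of the patch: a small userspace sketch of the token gate the
changelog describes. The enum and function names are hypothetical; the
real msi_check_reservation_mode() performs further checks after this
gate, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of enum irq_domain_bus_token. */
enum model_bus_token {
	MODEL_BUS_ANY,
	MODEL_BUS_PCI_MSI,
	MODEL_BUS_PLATFORM_MSI,
	MODEL_BUS_VMD_MSI,
};

/* First gate only: both the regular PCI/MSI token and the new VMD token
 * pass, every other token is rejected. */
static bool model_token_allows_reservation(enum model_bus_token token)
{
	switch (token) {
	case MODEL_BUS_PCI_MSI:
	case MODEL_BUS_VMD_MSI:
		return true;
	default:
		return false;
	}
}
```

Using a switch rather than a chained comparison keeps the gate easy to
extend if another PCI-like token shows up later.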

--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -84,6 +84,7 @@ enum irq_domain_bus_token {
DOMAIN_BUS_FSL_MC_MSI,
DOMAIN_BUS_TI_SCI_INTA_MSI,
DOMAIN_BUS_WAKEUP,
+   DOMAIN_BUS_VMD_MSI,
 };
 
 /**
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -370,8 +370,13 @@ static bool msi_check_reservation_mode(s
 {
struct msi_desc *desc;
 
-   if (domain->bus_token != DOMAIN_BUS_PCI_MSI)
+   switch(domain->bus_token) {
+   case DOMAIN_BUS_PCI_MSI:
+   case DOMAIN_BUS_VMD_MSI:
+   break;
+   default:
return false;
+   }
 
if (!(info->flags & MSI_FLAG_MUST_REACTIVATE))
return false;

[patch RFC 32/38] x86/irq: Make most MSI ops XEN private

2020-08-20 Thread Thomas Gleixner

Nothing except XEN uses the setup/teardown ops. Hide them there.

Signed-off-by: Thomas Gleixner 
Cc: xen-de...@lists.xenproject.org
Cc: linux-...@vger.kernel.org
---
 arch/x86/include/asm/x86_init.h |2 --
 arch/x86/pci/xen.c  |   23 +++
 2 files changed, 15 insertions(+), 10 deletions(-)
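
Not part of the patch: a userspace sketch of the "file-private ops
table filled in once at init" pattern used here. All model_* names are
hypothetical, and the callbacks return tag values purely so the
selection can be observed; the real callbacks take a struct pci_dev and
the teardown one returns void.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical private ops table, analogous in spirit to xen_msi_ops. */
struct model_msi_ops {
	int (*setup_msi_irqs)(int nvec);
	int (*teardown_msi_irqs)(void);
};

static struct model_msi_ops model_xen_msi_ops;

enum model_domain_kind { MODEL_INITDOM, MODEL_PV, MODEL_HVM, MODEL_NOT_XEN };

/* Tagged dummy callbacks: the hundreds digit identifies the flavour. */
static int model_initdom_setup(int nvec) { return 100 + nvec; }
static int model_pv_setup(int nvec)      { return 200 + nvec; }
static int model_hvm_setup(int nvec)     { return 300 + nvec; }
static int model_common_teardown(void)   { return 1; }
static int model_pv_teardown(void)       { return 2; }

/* Pick the callbacks once, the way xen_setup_pci_msi() does in the diff. */
static int model_setup_pci_msi(enum model_domain_kind kind)
{
	switch (kind) {
	case MODEL_INITDOM:
		model_xen_msi_ops.setup_msi_irqs = model_initdom_setup;
		model_xen_msi_ops.teardown_msi_irqs = model_common_teardown;
		return 0;
	case MODEL_PV:
		model_xen_msi_ops.setup_msi_irqs = model_pv_setup;
		model_xen_msi_ops.teardown_msi_irqs = model_pv_teardown;
		return 0;
	case MODEL_HVM:
		model_xen_msi_ops.setup_msi_irqs = model_hvm_setup;
		model_xen_msi_ops.teardown_msi_irqs = model_common_teardown;
		return 0;
	default:
		return -1; /* unknown flavour: leave the table untouched */
	}
}

/* Domain-level entry points only ever go through the private table. */
static int model_alloc_irqs(int nvec)
{
	if (!model_xen_msi_ops.setup_msi_irqs)
		return -1;
	return model_xen_msi_ops.setup_msi_irqs(nvec);
}

static int model_free_irqs(void)
{
	if (!model_xen_msi_ops.teardown_msi_irqs)
		return -1;
	return model_xen_msi_ops.teardown_msi_irqs();
}
```

Since nothing outside the file can reach the table, the generic
x86_msi_ops structure only needs to keep restore_msi_irqs.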

--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -276,8 +276,6 @@ struct x86_platform_ops {
 struct pci_dev;
 
 struct x86_msi_ops {
-   int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
-   void (*teardown_msi_irqs)(struct pci_dev *dev);
void (*restore_msi_irqs)(struct pci_dev *dev);
 };
 
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -156,6 +156,13 @@ static int acpi_register_gsi_xen(struct
 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);
 
+struct xen_msi_ops {
+   int (*setup_msi_irqs)(struct pci_dev *dev, int nvec, int type);
+   void (*teardown_msi_irqs)(struct pci_dev *dev);
+};
+
+static struct xen_msi_ops xen_msi_ops __ro_after_init;
+
 static int xen_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
int irq, ret, i;
@@ -414,7 +421,7 @@ static int xen_msi_domain_alloc_irqs(str
else
type = PCI_CAP_ID_MSI;
 
-   return x86_msi.setup_msi_irqs(to_pci_dev(dev), nvec, type);
+   return xen_msi_ops.setup_msi_irqs(to_pci_dev(dev), nvec, type);
 }
 
 static void xen_msi_domain_free_irqs(struct irq_domain *domain,
@@ -423,7 +430,7 @@ static void xen_msi_domain_free_irqs(str
if (WARN_ON_ONCE(!dev_is_pci(dev)))
return;
 
-   x86_msi.teardown_msi_irqs(to_pci_dev(dev));
+   xen_msi_ops.teardown_msi_irqs(to_pci_dev(dev));
 }
 
 static struct msi_domain_ops xen_pci_msi_domain_ops = {
@@ -461,17 +468,17 @@ static __init struct irq_domain *xen_cre
 static __init void xen_setup_pci_msi(void)
 {
if (xen_initial_domain()) {
-   x86_msi.setup_msi_irqs = xen_initdom_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_initdom_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
x86_msi.restore_msi_irqs = xen_initdom_restore_msi_irqs;
pci_msi_ignore_mask = 1;
} else if (xen_pv_domain()) {
-   x86_msi.setup_msi_irqs = xen_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_pv_teardown_msi_irqs;
pci_msi_ignore_mask = 1;
} else if (xen_hvm_domain()) {
-   x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
-   x86_msi.teardown_msi_irqs = xen_teardown_msi_irqs;
+   xen_msi_ops.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+   xen_msi_ops.teardown_msi_irqs = xen_teardown_msi_irqs;
} else {
WARN_ON_ONCE(1);
return;

[patch RFC 38/38] irqchip: Add IMS array driver - NOT FOR MERGING

2020-08-20 Thread Thomas Gleixner

A generic IMS irq chip and irq domain implementation for IMS based devices
which utilize an MSI message store array on chip.

Allows IMS devices with an MSI message store array to reuse this code for
different array sizes.

Allocation and freeing of interrupts happens via the generic
msi_domain_alloc/free_irqs() interface. No special purpose IMS magic
required as long as the interrupt domain is stored in the underlying device
struct.

Completely untested of course and mostly for illustration and educational
purpose. This should of course be a modular irq chip, but adding that
support is left as an exercise for the people who care about this deeply.

Signed-off-by: Thomas Gleixner 
Cc: Marc Zyngier 
Cc: Megha Dey 
Cc: Jason Gunthorpe 
Cc: Dave Jiang 
Cc: Alex Williamson 
Cc: Jacob Pan 
Cc: Baolu Lu 
Cc: Kevin Tian 
Cc: Dan Williams 
---
 drivers/irqchip/Kconfig |8 +
 drivers/irqchip/Makefile|1 
 drivers/irqchip/irq-ims-msi.c   |  169 
 include/linux/irqchip/irq-ims-msi.h |   41 
 4 files changed, 219 insertions(+)
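
Not part of the patch: a userspace sketch of slot bookkeeping with a
bitmap, in the spirit of the ims->map handling below. The free path in
the diff visibly clears the bit; that the allocation path sets the bit
it found is an assumption, since the archived text of
ims_alloc_msi_store() is cut off. All model_* names and the slot count
of 70 are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MODEL_MAX_SLOTS     70
#define MODEL_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MODEL_MAP_LONGS \
	((MODEL_MAX_SLOTS + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG)

/* One bit per slot: set means the slot is in use. */
static unsigned long model_map[MODEL_MAP_LONGS];

static bool model_test_bit(unsigned int idx)
{
	return (model_map[idx / MODEL_BITS_PER_LONG] >>
		(idx % MODEL_BITS_PER_LONG)) & 1UL;
}

/* Find the first free slot and mark it used. Returns the slot index,
 * or -1 when all MODEL_MAX_SLOTS slots are taken. */
static int model_alloc_slot(void)
{
	for (unsigned int idx = 0; idx < MODEL_MAX_SLOTS; idx++) {
		if (!model_test_bit(idx)) {
			model_map[idx / MODEL_BITS_PER_LONG] |=
				1UL << (idx % MODEL_BITS_PER_LONG);
			return (int)idx;
		}
	}
	return -1;
}

/* Release a slot. Returns 0 on success, -1 for an out-of-range index. */
static int model_free_slot(unsigned int idx)
{
	if (idx >= MODEL_MAX_SLOTS)
		return -1;
	model_map[idx / MODEL_BITS_PER_LONG] &=
		~(1UL << (idx % MODEL_BITS_PER_LONG));
	return 0;
}
```

A freed slot is handed out again before any higher index, which is what
a first-zero-bit search gives you.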

--- a/drivers/irqchip/Kconfig
+++ b/drivers/irqchip/Kconfig
@@ -571,4 +571,12 @@ config LOONGSON_PCH_MSI
help
  Support for the Loongson PCH MSI Controller.
 
+config IMS_MSI
+   bool "IMS Interrupt Message Store MSI controller"
+   depends on PCI
+   select DEVICE_MSI
+   help
+ Support for IMS Interrupt Message Store MSI controller
+ with IMS slot storage in a slot array
+
 endmenu
--- a/drivers/irqchip/Makefile
+++ b/drivers/irqchip/Makefile
@@ -111,3 +111,4 @@ obj-$(CONFIG_LOONGSON_HTPIC)+= irq-loo
 obj-$(CONFIG_LOONGSON_HTVEC)   += irq-loongson-htvec.o
 obj-$(CONFIG_LOONGSON_PCH_PIC) += irq-loongson-pch-pic.o
 obj-$(CONFIG_LOONGSON_PCH_MSI) += irq-loongson-pch-msi.o
+obj-$(CONFIG_IMS_MSI)  += irq-ims-msi.o
--- /dev/null
+++ b/drivers/irqchip/irq-ims-msi.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+// (C) Copyright 2020 Thomas Gleixner 
+/*
+ * Shared interrupt chip and irq domain for Intel IMS devices
+ */
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+struct ims_data {
+   struct ims_array_info   info;
+   unsigned long   map[0];
+};
+
+static void ims_mask_irq(struct irq_data *data)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+   u32 __iomem *ctrl = &slot->ctrl;
+
+   iowrite32(ioread32(ctrl) & ~IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_unmask_irq(struct irq_data *data)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+   u32 __iomem *ctrl = &slot->ctrl;
+
+   iowrite32(ioread32(ctrl) | IMS_VECTOR_CTRL_UNMASK, ctrl);
+}
+
+static void ims_write_msi_msg(struct irq_data *data, struct msi_msg *msg)
+{
+   struct msi_desc *desc = irq_data_get_msi_desc(data);
+   struct ims_array_slot __iomem *slot = desc->device_msi.priv_iomem;
+
+   iowrite32(msg->address_lo, &slot->address_lo);
+   iowrite32(msg->address_hi, &slot->address_hi);
+   iowrite32(msg->data, &slot->data);
+}
+
+static const struct irq_chip ims_msi_controller = {
+   .name   = "IMS",
+   .irq_mask   = ims_mask_irq,
+   .irq_unmask = ims_unmask_irq,
+   .irq_write_msi_msg  = ims_write_msi_msg,
+   .irq_retrigger  = irq_chip_retrigger_hierarchy,
+   .flags  = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void ims_reset_slot(struct ims_array_slot __iomem *slot)
+{
+   iowrite32(0, &slot->address_lo);
+   iowrite32(0, &slot->address_hi);
+   iowrite32(0, &slot->data);
+   iowrite32(0, &slot->ctrl);
+}
+
+static void ims_free_msi_store(struct irq_domain *domain, struct device *dev)
+{
+   struct msi_domain_info *info = domain->host_data;
+   struct ims_data *ims = info->data;
+   struct msi_desc *entry;
+
+   for_each_msi_entry(entry, dev) {
+   if (entry->device_msi.priv_iomem) {
+   clear_bit(entry->device_msi.hwirq, ims->map);
+   ims_reset_slot(entry->device_msi.priv_iomem);
+   entry->device_msi.priv_iomem = NULL;
+   entry->device_msi.hwirq = 0;
+   }
+   }
+}
+
+static int ims_alloc_msi_store(struct irq_domain *domain, struct device *dev,
+  int nvec)
+{
+   struct msi_domain_info *info = domain->host_data;
+   struct ims_data *ims = info->data;
+   struct msi_desc *entry;
+
+   for_each_msi_entry(entry, dev) {
+   unsigned int idx;
+
+   idx = find_first_zero_bit(ims->map, ims->info.max_slots);
+   if (idx >= ims->info.max_slots)
+   goto fail;
+   set_

[patch RFC 22/38] x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()

2020-08-20 Thread Thomas Gleixner

The only user is in the same file and the name is too generic because this
function is only ever used for HVM domains.

Signed-off-by: Thomas Gleixner 
Cc: Konrad Rzeszutek Wilk 
Cc: linux-...@vger.kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: Juergen Gross 
Cc: Boris Ostrovsky 
Cc: Stefano Stabellini 

---
 arch/x86/pci/xen.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -419,7 +419,7 @@ int __init pci_xen_init(void)
 }
 
 #ifdef CONFIG_PCI_MSI
-void __init xen_msi_init(void)
+static void __init xen_hvm_msi_init(void)
 {
if (!disable_apic) {
/*
@@ -459,7 +459,7 @@ int __init pci_xen_hvm_init(void)
 * We need to wait until after x2apic is initialized
 * before we can set MSI IRQ ops.
 */
-   x86_platform.apic_post_init = xen_msi_init;
+   x86_platform.apic_post_init = xen_hvm_msi_init;
 #endif
return 0;
 }

[patch RFC 02/38] x86/init: Remove unused init ops

2020-08-20 Thread Thomas Gleixner

Some past platform removal forgot to get rid of this unused ballast.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/include/asm/mpspec.h   |   10 --
 arch/x86/include/asm/x86_init.h |   10 --
 arch/x86/kernel/mpparse.c   |   26 --
 arch/x86/kernel/x86_init.c  |4 
 4 files changed, 4 insertions(+), 46 deletions(-)

--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -67,21 +67,11 @@ static inline void find_smp_config(void)
 #ifdef CONFIG_X86_MPPARSE
 extern void e820__memblock_alloc_reserved_mpc_new(void);
 extern int enable_update_mptable;
-extern int default_mpc_apic_id(struct mpc_cpu *m);
-extern void default_smp_read_mpc_oem(struct mpc_table *mpc);
-# ifdef CONFIG_X86_IO_APIC
-extern void default_mpc_oem_bus_info(struct mpc_bus *m, char *str);
-# else
-#  define default_mpc_oem_bus_info NULL
-# endif
 extern void default_find_smp_config(void);
 extern void default_get_smp_config(unsigned int early);
 #else
 static inline void e820__memblock_alloc_reserved_mpc_new(void) { }
 #define enable_update_mptable 0
-#define default_mpc_apic_id NULL
-#define default_smp_read_mpc_oem NULL
-#define default_mpc_oem_bus_info NULL
 #define default_find_smp_config x86_init_noop
 #define default_get_smp_config x86_init_uint_noop
 #endif
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -11,22 +11,12 @@ struct cpuinfo_x86;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
- * @mpc_record:platform specific mpc record accounting
  * @setup_ioapic_ids:  platform specific ioapic id override
- * @mpc_apic_id:   platform specific mpc apic id assignment
- * @smp_read_mpc_oem:  platform specific oem mpc table setup
- * @mpc_oem_pci_bus:   platform specific pci bus setup (default NULL)
- * @mpc_oem_bus_info:  platform specific mpc bus info
  * @find_smp_config:   find the smp configuration
  * @get_smp_config:get the smp configuration
  */
 struct x86_init_mpparse {
-   void (*mpc_record)(unsigned int mode);
void (*setup_ioapic_ids)(void);
-   int (*mpc_apic_id)(struct mpc_cpu *m);
-   void (*smp_read_mpc_oem)(struct mpc_table *mpc);
-   void (*mpc_oem_pci_bus)(struct mpc_bus *m);
-   void (*mpc_oem_bus_info)(struct mpc_bus *m, char *name);
void (*find_smp_config)(void);
void (*get_smp_config)(unsigned int early);
 };
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -46,11 +46,6 @@ static int __init mpf_checksum(unsigned
return sum & 0xFF;
 }
 
-int __init default_mpc_apic_id(struct mpc_cpu *m)
-{
-   return m->apicid;
-}
-
 static void __init MP_processor_info(struct mpc_cpu *m)
 {
int apicid;
@@ -61,7 +56,7 @@ static void __init MP_processor_info(str
return;
}
 
-   apicid = x86_init.mpparse.mpc_apic_id(m);
+   apicid = m->apicid;
 
if (m->cpuflag & CPU_BOOTPROCESSOR) {
bootup_cpu = " (Bootup-CPU)";
@@ -73,7 +68,7 @@ static void __init MP_processor_info(str
 }
 
 #ifdef CONFIG_X86_IO_APIC
-void __init default_mpc_oem_bus_info(struct mpc_bus *m, char *str)
+static void __init mpc_oem_bus_info(struct mpc_bus *m, char *str)
 {
memcpy(str, m->bustype, 6);
str[6] = 0;
@@ -84,7 +79,7 @@ static void __init MP_bus_info(struct mp
 {
char str[7];
 
-   x86_init.mpparse.mpc_oem_bus_info(m, str);
+   mpc_oem_bus_info(m, str);
 
 #if MAX_MP_BUSSES < 256
if (m->busid >= MAX_MP_BUSSES) {
@@ -100,9 +95,6 @@ static void __init MP_bus_info(struct mp
mp_bus_id_to_type[m->busid] = MP_BUS_ISA;
 #endif
} else if (strncmp(str, BUSTYPE_PCI, sizeof(BUSTYPE_PCI) - 1) == 0) {
-   if (x86_init.mpparse.mpc_oem_pci_bus)
-   x86_init.mpparse.mpc_oem_pci_bus(m);
-
clear_bit(m->busid, mp_bus_not_pci);
 #ifdef CONFIG_EISA
mp_bus_id_to_type[m->busid] = MP_BUS_PCI;
@@ -198,8 +190,6 @@ static void __init smp_dump_mptable(stru
1, mpc, mpc->length, 1);
 }
 
-void __init default_smp_read_mpc_oem(struct mpc_table *mpc) { }
-
 static int __init smp_read_mpc(struct mpc_table *mpc, unsigned early)
 {
char str[16];
@@ -218,14 +208,7 @@ static int __init smp_read_mpc(struct mp
if (early)
return 1;
 
-   if (mpc->oemptr)
-   x86_init.mpparse.smp_read_mpc_oem(mpc);
-
-   /*
-*  Now process the configuration blocks.
-*/
-   x86_init.mpparse.mpc_record(0);
-
+   /* Now process the configuration blocks. */
while (count < mpc->length) {
switch (*mpt) {
case MP_PROCESSOR:
@@ -256,7 +239,6 @@ static int __init smp_read_mpc(struct mp
count = mpc->length;
break;
}
-   x86_init.mpparse.mpc_record(1)

[patch RFC 03/38] x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency

2020-08-20 Thread Thomas Gleixner

No functional change.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
---
 arch/x86/include/asm/hw_irq.h   |4 ++--
 arch/x86/kernel/apic/msi.c  |6 +++---
 drivers/iommu/amd/iommu.c   |   24 
 drivers/iommu/intel/irq_remapping.c |   18 +-
 4 files changed, 26 insertions(+), 26 deletions(-)

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -36,8 +36,8 @@ struct msi_desc;
 enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_IOAPIC = 1,
X86_IRQ_ALLOC_TYPE_HPET,
-   X86_IRQ_ALLOC_TYPE_MSI,
-   X86_IRQ_ALLOC_TYPE_MSIX,
+   X86_IRQ_ALLOC_TYPE_PCI_MSI,
+   X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
 };
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -188,7 +188,7 @@ int native_setup_msi_irqs(struct pci_dev
struct irq_alloc_info info;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_MSI;
+   info.type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
info.msi_dev = dev;
 
domain = irq_remapping_get_irq_domain(&info);
@@ -220,9 +220,9 @@ int pci_msi_prepare(struct irq_domain *d
init_irq_alloc_info(arg, NULL);
arg->msi_dev = pdev;
if (desc->msi_attrib.is_msix) {
-   arg->type = X86_IRQ_ALLOC_TYPE_MSIX;
+   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSIX;
} else {
-   arg->type = X86_IRQ_ALLOC_TYPE_MSI;
+   arg->type = X86_IRQ_ALLOC_TYPE_PCI_MSI;
arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
}
 
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3514,8 +3514,8 @@ static int get_devid(struct irq_alloc_in
case X86_IRQ_ALLOC_TYPE_HPET:
devid = get_hpet_devid(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
devid = get_device_id(&info->msi_dev->dev);
break;
default:
@@ -3553,8 +3553,8 @@ static struct irq_domain *get_irq_domain
return NULL;
 
switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
devid = get_device_id(&info->msi_dev->dev);
if (devid < 0)
return NULL;
@@ -3615,8 +3615,8 @@ static void irq_remapping_prepare_irte(s
break;
 
case X86_IRQ_ALLOC_TYPE_HPET:
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
msg->address_hi = MSI_ADDR_BASE_HI;
msg->address_lo = MSI_ADDR_BASE_LO;
msg->data = irte_info->index;
@@ -3660,15 +3660,15 @@ static int irq_remapping_alloc(struct ir
 
if (!info)
return -EINVAL;
-   if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI &&
-   info->type != X86_IRQ_ALLOC_TYPE_MSIX)
+   if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_PCI_MSI &&
+   info->type != X86_IRQ_ALLOC_TYPE_PCI_MSIX)
return -EINVAL;
 
/*
 * With IRQ remapping enabled, don't need contiguous CPU vectors
 * to support multiple MSI interrupts.
 */
-   if (info->type == X86_IRQ_ALLOC_TYPE_MSI)
+   if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI)
info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
 
devid = get_devid(info);
@@ -3700,9 +3700,9 @@ static int irq_remapping_alloc(struct ir
} else {
index = -ENOMEM;
}
-   } else if (info->type == X86_IRQ_ALLOC_TYPE_MSI ||
-  info->type == X86_IRQ_ALLOC_TYPE_MSIX) {
-   bool align = (info->type == X86_IRQ_ALLOC_TYPE_MSI);
+   } else if (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI ||
+  info->type == X86_IRQ_ALLOC_TYPE_PCI_MSIX) {
+   bool align = (info->type == X86_IRQ_ALLOC_TYPE_PCI_MSI);
 
index = alloc_irq_index(devid, nr_irqs, align, info->msi_dev);
} else {
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1115,8 +1115,8 @@ static struct irq_domain *intel_get_ir_i
case X86_IRQ_ALLOC_TYPE_HPET:
iommu = map_hpet_to_ir(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_MSI:
-   case X86_IRQ_ALLOC_TYPE_MSIX:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
iommu = map_dev_to_ir(info->msi_dev);
break;
default:
@@ -1135,8 +1135,8 @@ static struct irq_domain *intel_get_irq_
return N

[patch RFC 00/38] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-08-20 Thread Thomas Gleixner

First of all, sorry for the horribly long Cc list, which was
unfortunately unavoidable as this touches the world and some more.

This patch series aims to provide a base to support device MSI (non
PCI based) in a halfway architecture-independent way.

It's a mixed bag of bug fixes, cleanups and general improvements which
are worthwhile independent of the device MSI stuff. Unfortunately this
also comes with an evil abuse of the irqdomain system to coerce XEN on
x86 into compliance without rewriting XEN from scratch.

As discussed at length in this mail thread:

  https://lore.kernel.org/r/87h7tcgbs2@nanos.tec.linutronix.de

the initial attempt of piggybacking device MSI support on platform MSI
is doomed for various reasons, but creating independent interrupt
domains for these upcoming magic PCI subdevices which are not PCI, but
might be exposed as PCI devices is not as trivial as it seems.

The initially suggested and evaluated approach of extending platform
MSI turned out to be the completely wrong direction and in fact
platform MSI should be rewritten on top of device MSI or completely
replaced by it.

One of the main issues is that x86 does not support the concept of irq
domain associations stored in device::msi_domain and still relies on
the arch_*_msi_irqs() fallback implementations, which have their own set
of problems as outlined in

  https://lore.kernel.org/r/87bljg7u4f@nanos.tec.linutronix.de/

in the very same thread.

The main obstacle to storing that pointer is XEN, which has its own
historical notion of handling PCI MSI interrupts.

This series tries to address these issues in several steps:

 1) Accidental bug fixes
iommu/amd: Prevent NULL pointer dereference

 2) Janitoring
x86/init: Remove unused init ops

 3) Simplification of the x86 specific interrupt allocation mechanism

x86/irq: Rename X86_IRQ_ALLOC_TYPE_MSI* to reflect PCI dependency
x86/irq: Add allocation type for parent domain retrieval
iommu/vt-d: Consolidate irq domain getter
iommu/amd: Consolidate irq domain getter
iommu/irq_remapping: Consolidate irq domain lookup

 4) Consolidation of the X86 specific interrupt allocation mechanism to be as
    close as possible to the generic MSI allocation mechanism which allows to
    get rid of quite a bunch of x86'isms which are pointless

x86/irq: Prepare consolidation of irq_alloc_info
x86/msi: Consolidate HPET allocation
x86/ioapic: Consolidate IOAPIC allocation
x86/irq: Consolidate DMAR irq allocation
x86/irq: Consolidate UV domain allocation
PCI: MSI: Rework pci_msi_domain_calc_hwirq()
x86/msi: Consolidate MSI allocation
x86/msi: Use generic MSI domain ops

  5) x86 specific cleanups to remove the dependency on arch_*_msi_irqs()

x86/irq: Move apic_post_init() invocation to one place
x86/pci: Reduce #ifdeffery in PCI init code
x86/irq: Initialize PCI/MSI domain at PCI init time
irqdomain/msi: Provide DOMAIN_BUS_VMD_MSI
PCI: vmd: Mark VMD irqdomain with DOMAIN_BUS_VMD_MSI
PCI: MSI: Provide pci_dev_has_special_msi_domain() helper
x86/xen: Make xen_msi_init() static and rename it to xen_hvm_msi_init()
x86/xen: Rework MSI teardown
x86/xen: Consolidate XEN-MSI init
irqdomain/msi: Allow to override msi_domain_alloc/free_irqs()
x86/xen: Wrap XEN MSI management into irqdomain
iommu/vt-d: Store irq domain in struct device
iommu/amd: Store irq domain in struct device
x86/pci: Set default irq domain in pcibios_add_device()
PCI/MSI: Allow to disable arch fallbacks
x86/irq: Cleanup the arch_*_msi_irqs() leftovers
x86/irq: Make most MSI ops XEN private

This one is paving the way to device MSI support, but it comes
with an ugly and evil hack. The ability to override the default
allocation/free functions of an MSI irq domain is useful in general as
(hopefully) demonstrated with the device MSI POC, but the abuse
in the context of XEN is evil. OTOH without enough XENology and without
rewriting XEN from scratch wrapping XEN MSI handling into a pseudo
irq domain is a reasonable step forward for mere mortals with severely
limited XENology. One day the XEN folks might make it a real irq domain.
Perhaps when they have to support the same mess on other architectures.
Hope dies last...

At least the mechanism to override alloc/free turned out to be useful
for implementing the base infrastructure for device MSI. So it's not a
completely lost case.

  6) X86 specific preparation for device MSI

   x86/irq: Add DEV_MSI allocation type
   x86/msi: Let pci_msi_prepare() handle non-PCI MSI

  7) Generic device MSI infrastructure

   platform-msi: Provide default irq_chip:ack
   platform-msi: Add device MSI infrastructure

  8) Infrastructure for and a POC of an IMS (Interrupt Message

[patch RFC 04/38] x86/irq: Add allocation type for parent domain retrieval

2020-08-20 Thread Thomas Gleixner

irq_remapping_get_ir_irq_domain() is used to retrieve the remapping parent
domain for an allocation type. irq_remapping_get_irq_domain() is for
retrieving the actual device domain for allocating interrupts for a device.

The two functions are similar and can be unified by using explicit modes
for parent irq domain retrieval.

Add X86_IRQ_ALLOC_TYPE_IOAPIC/HPET_GET_PARENT and use it in the iommu
implementations. Drop the parent domain retrieval for PCI_MSI/X as that is
unused.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: x...@kernel.org
Cc: linux-hyp...@vger.kernel.org
Cc: iommu@lists.linux-foundation.org
Cc: Haiyang Zhang 
Cc: Jon Derrick 
Cc: Lu Baolu 
---
 arch/x86/include/asm/hw_irq.h   |2 ++
 arch/x86/kernel/apic/io_apic.c  |2 +-
 arch/x86/kernel/apic/msi.c  |2 +-
 drivers/iommu/amd/iommu.c   |8 
 drivers/iommu/hyperv-iommu.c|2 +-
 drivers/iommu/intel/irq_remapping.c |8 ++--
 6 files changed, 15 insertions(+), 9 deletions(-)
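
Not part of the patch: a userspace sketch of the "explicit mode in the
type field" idea from the changelog. The model_* names are
hypothetical; the filter mirrors the AMD hunk below (only the
*_GET_PARENT types yield a parent) and the fallback mirrors the callers
in io_apic.c and msi.c, which use the vector domain when no remapping
parent is returned.

```c
#include <assert.h>
#include <stddef.h>

struct model_domain { const char *name; };

static struct model_domain model_vector_domain = { "vector" };
static struct model_domain model_ir_domain     = { "remap" };

/* Hypothetical subset of enum irq_alloc_type. */
enum model_alloc_type {
	MODEL_TYPE_IOAPIC = 1,
	MODEL_TYPE_HPET,
	MODEL_TYPE_IOAPIC_GET_PARENT,
	MODEL_TYPE_HPET_GET_PARENT,
};

/* Set to 0 to model a system without interrupt remapping. */
static int model_remapping_enabled = 1;

/* Parent lookup: only the explicit GET_PARENT request types are served,
 * every other type gets NULL. */
static struct model_domain *model_get_parent(enum model_alloc_type type)
{
	if (!model_remapping_enabled)
		return NULL;

	switch (type) {
	case MODEL_TYPE_IOAPIC_GET_PARENT:
	case MODEL_TYPE_HPET_GET_PARENT:
		return &model_ir_domain;
	default:
		return NULL;
	}
}

/* Caller side: fall back to the vector domain when no parent is found. */
static struct model_domain *model_pick_parent(enum model_alloc_type type)
{
	struct model_domain *parent = model_get_parent(type);

	return parent ? parent : &model_vector_domain;
}
```

With the request mode encoded in the type, one getter can later serve
both parent and device domain lookups without a second callback.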

--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -40,6 +40,8 @@ enum irq_alloc_type {
X86_IRQ_ALLOC_TYPE_PCI_MSIX,
X86_IRQ_ALLOC_TYPE_DMAR,
X86_IRQ_ALLOC_TYPE_UV,
+   X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT,
+   X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT,
 };
 
 struct irq_alloc_info {
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2296,7 +2296,7 @@ static int mp_irqdomain_create(int ioapi
return 0;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC;
+   info.type = X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT;
info.ioapic_id = mpc_ioapic_id(ioapic);
parent = irq_remapping_get_ir_irq_domain(&info);
if (!parent)
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -476,7 +476,7 @@ struct irq_domain *hpet_create_irq_domai
domain_info->data = (void *)(long)hpet_id;
 
init_irq_alloc_info(&info, NULL);
-   info.type = X86_IRQ_ALLOC_TYPE_HPET;
+   info.type = X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT;
info.hpet_id = hpet_id;
parent = irq_remapping_get_ir_irq_domain(&info);
if (parent == NULL)
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3534,6 +3534,14 @@ static struct irq_domain *get_ir_irq_dom
if (!info)
return NULL;
 
+   switch (info->type) {
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+   break;
+   default:
+   return NULL;
+   }
+
devid = get_devid(info);
if (devid >= 0) {
iommu = amd_iommu_rlookup_table[devid];
--- a/drivers/iommu/hyperv-iommu.c
+++ b/drivers/iommu/hyperv-iommu.c
@@ -184,7 +184,7 @@ static int __init hyperv_enable_irq_rema
 
 static struct irq_domain *hyperv_get_ir_irq_domain(struct irq_alloc_info *info)
 {
-   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC)
+   if (info->type == X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT)
return ioapic_ir_domain;
else
return NULL;
--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -1109,16 +1109,12 @@ static struct irq_domain *intel_get_ir_i
return NULL;
 
switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_IOAPIC:
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
iommu = map_ioapic_to_ir(info->ioapic_id);
break;
-   case X86_IRQ_ALLOC_TYPE_HPET:
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
iommu = map_hpet_to_ir(info->hpet_id);
break;
-   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   iommu = map_dev_to_ir(info->msi_dev);
-   break;
default:
BUG_ON(1);
break;

[patch RFC 05/38] iommu/vt-d: Consolidate irq domain getter

2020-08-20 Thread Thomas Gleixner

The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
Cc: Lu Baolu 
---
 drivers/iommu/intel/irq_remapping.c |   67 
 1 file changed, 24 insertions(+), 43 deletions(-)
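
Not part of the patch: a userspace sketch of the consolidated getter,
where the lookup helpers hand back a domain (or NULL) directly and a
single switch serves both parent and device requests. All model_* names
are hypothetical, and the domains are embedded in the model IOMMU
instead of being pointers as in struct intel_iommu.

```c
#include <assert.h>
#include <stddef.h>

struct model_domain { int id; };

struct model_iommu {
	struct model_domain ir_domain;     /* parent domain for IOAPIC/HPET */
	struct model_domain ir_msi_domain; /* device domain for PCI MSI/MSI-X */
};

static struct model_iommu model_iommu0 = { { 100 }, { 200 } };

struct model_hpet_entry {
	int id;
	struct model_iommu *iommu;
};

static struct model_hpet_entry model_hpets[] = {
	{ 2, &model_iommu0 },
	{ 5, NULL }, /* known id, but no IOMMU attached */
};

enum model_type { MODEL_HPET_GET_PARENT, MODEL_PCI_MSI, MODEL_OTHER };

/* Table lookup that returns the domain itself, NULL if nothing matches. */
static struct model_domain *model_map_hpet_to_ir(int hpet_id)
{
	size_t n = sizeof(model_hpets) / sizeof(model_hpets[0]);

	for (size_t i = 0; i < n; i++) {
		if (model_hpets[i].id == hpet_id && model_hpets[i].iommu)
			return &model_hpets[i].iommu->ir_domain;
	}
	return NULL;
}

/* NULL-safe device lookup; the caller passes the IOMMU serving the device. */
static struct model_domain *model_map_dev_to_ir(struct model_iommu *iommu)
{
	return iommu ? &iommu->ir_msi_domain : NULL;
}

/* One getter for all request modes. */
static struct model_domain *model_get_irq_domain(enum model_type type,
						 int hpet_id,
						 struct model_iommu *dev_iommu)
{
	switch (type) {
	case MODEL_HPET_GET_PARENT:
		return model_map_hpet_to_ir(hpet_id);
	case MODEL_PCI_MSI:
		return model_map_dev_to_ir(dev_iommu);
	default:
		return NULL;
	}
}

/* Helper for inspection: -1 stands for "no domain". */
static int model_domain_id(const struct model_domain *d)
{
	return d ? d->id : -1;
}
```

Pushing the NULL handling into the helpers is what lets each case in
the switch collapse to a single return statement.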

--- a/drivers/iommu/intel/irq_remapping.c
+++ b/drivers/iommu/intel/irq_remapping.c
@@ -204,35 +204,40 @@ static int modify_irte(struct irq_2_iomm
return rc;
 }
 
-static struct intel_iommu *map_hpet_to_ir(u8 hpet_id)
+static struct irq_domain *map_hpet_to_ir(u8 hpet_id)
 {
int i;
 
-   for (i = 0; i < MAX_HPET_TBS; i++)
+   for (i = 0; i < MAX_HPET_TBS; i++) {
if (ir_hpet[i].id == hpet_id && ir_hpet[i].iommu)
-   return ir_hpet[i].iommu;
+   return ir_hpet[i].iommu->ir_domain;
+   }
return NULL;
 }
 
-static struct intel_iommu *map_ioapic_to_ir(int apic)
+static struct intel_iommu *map_ioapic_to_iommu(int apic)
 {
int i;
 
-   for (i = 0; i < MAX_IO_APICS; i++)
+   for (i = 0; i < MAX_IO_APICS; i++) {
if (ir_ioapic[i].id == apic && ir_ioapic[i].iommu)
return ir_ioapic[i].iommu;
+   }
return NULL;
 }
 
-static struct intel_iommu *map_dev_to_ir(struct pci_dev *dev)
+static struct irq_domain *map_ioapic_to_ir(int apic)
 {
-   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu = map_ioapic_to_iommu(apic);
 
-   drhd = dmar_find_matched_drhd_unit(dev);
-   if (!drhd)
-   return NULL;
+   return iommu ? iommu->ir_domain : NULL;
+}
+
+static struct irq_domain *map_dev_to_ir(struct pci_dev *dev)
+{
+   struct dmar_drhd_unit *drhd = dmar_find_matched_drhd_unit(dev);
 
-   return drhd->iommu;
+   return drhd ? drhd->iommu->ir_msi_domain : NULL;
 }
 
 static int clear_entries(struct irq_2_iommu *irq_iommu)
@@ -996,7 +1001,7 @@ static int __init parse_ioapics_under_ir
 
for (ioapic_idx = 0; ioapic_idx < nr_ioapics; ioapic_idx++) {
int ioapic_id = mpc_ioapic_id(ioapic_idx);
-   if (!map_ioapic_to_ir(ioapic_id)) {
+   if (!map_ioapic_to_iommu(ioapic_id)) {
pr_err(FW_BUG "ioapic %d has no mapping iommu, "
   "interrupt remapping will be disabled\n",
   ioapic_id);
@@ -1101,47 +1106,23 @@ static void prepare_irte(struct irte *ir
irte->redir_hint = 1;
 }
 
-static struct irq_domain *intel_get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
 {
-   struct intel_iommu *iommu = NULL;
-
if (!info)
return NULL;
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
-   iommu = map_ioapic_to_ir(info->ioapic_id);
-   break;
+   return map_ioapic_to_ir(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   iommu = map_hpet_to_ir(info->hpet_id);
-   break;
-   default:
-   BUG_ON(1);
-   break;
-   }
-
-   return iommu ? iommu->ir_domain : NULL;
-}
-
-static struct irq_domain *intel_get_irq_domain(struct irq_alloc_info *info)
-{
-   struct intel_iommu *iommu;
-
-   if (!info)
-   return NULL;
-
-   switch (info->type) {
+   return map_hpet_to_ir(info->hpet_id);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   iommu = map_dev_to_ir(info->msi_dev);
-   if (iommu)
-   return iommu->ir_msi_domain;
-   break;
+   return map_dev_to_ir(info->msi_dev);
default:
-   break;
+   WARN_ON_ONCE(1);
+   return NULL;
}
-
-   return NULL;
 }
 
 struct irq_remap_ops intel_irq_remap_ops = {
@@ -1150,7 +1131,7 @@ struct irq_remap_ops intel_irq_remap_ops
.disable= disable_irq_remapping,
.reenable   = reenable_irq_remapping,
.enable_faulting= enable_drhd_fault_handling,
-   .get_ir_irq_domain  = intel_get_ir_irq_domain,
+   .get_ir_irq_domain  = intel_get_irq_domain,
.get_irq_domain = intel_get_irq_domain,
 };
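
The consolidated getter above can be sketched outside the kernel as follows. This is a minimal userspace model, not the driver code: the enum values, the `*_model` structs, and `model_get_irq_domain()` are simplified stand-ins invented for illustration, and the per-source lookup (IOAPIC, HPET, PCI device) is assumed to have already produced the owning IOMMU.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; all names here are illustrative. */
enum alloc_type {
	ALLOC_IOAPIC_GET_PARENT,
	ALLOC_HPET_GET_PARENT,
	ALLOC_PCI_MSI,
	ALLOC_PCI_MSIX,
	ALLOC_UNKNOWN,
};

struct irq_domain_model {
	const char *name;
};

struct iommu_model {
	struct irq_domain_model *ir_domain;     /* parent for IOAPIC and HPET */
	struct irq_domain_model *ir_msi_domain; /* domain for PCI MSI and MSI-X */
};

struct alloc_info_model {
	enum alloc_type type;
	struct iommu_model *iommu; /* assumed result of the per-source lookup */
};

/* Single getter: the request type alone decides which domain is returned. */
struct irq_domain_model *model_get_irq_domain(const struct alloc_info_model *info)
{
	if (!info || !info->iommu)
		return NULL;

	switch (info->type) {
	case ALLOC_IOAPIC_GET_PARENT:
	case ALLOC_HPET_GET_PARENT:
		return info->iommu->ir_domain;
	case ALLOC_PCI_MSI:
	case ALLOC_PCI_MSIX:
		return info->iommu->ir_msi_domain;
	default:
		/* the patch warns once here (WARN_ON_ONCE) instead of BUG_ON */
		return NULL;
	}
}
```

Because both former callbacks now dispatch on the same type field, a single function can serve as `.get_ir_irq_domain` and `.get_irq_domain`, which is what the ops table change in the patch does.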
 


[patch RFC 06/38] iommu/amd: Consolidate irq domain getter

2020-08-20 Thread Thomas Gleixner

The irq domain request mode is now indicated in irq_alloc_info::type.

Consolidate the two getter functions into one.

Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: iommu@lists.linux-foundation.org
---
 drivers/iommu/amd/iommu.c |   65 ++
 1 file changed, 21 insertions(+), 44 deletions(-)

--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -3505,77 +3505,54 @@ static void irte_ga_clear_allocated(stru
 
 static int get_devid(struct irq_alloc_info *info)
 {
-   int devid = -1;
-
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC:
-   devid = get_ioapic_devid(info->ioapic_id);
-   break;
+   case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
+   return get_ioapic_devid(info->ioapic_id);
case X86_IRQ_ALLOC_TYPE_HPET:
-   devid = get_hpet_devid(info->hpet_id);
-   break;
+   case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
+   return get_hpet_devid(info->hpet_id);
case X86_IRQ_ALLOC_TYPE_PCI_MSI:
case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   devid = get_device_id(&info->msi_dev->dev);
-   break;
+   return get_device_id(&info->msi_dev->dev);
default:
-   BUG_ON(1);
-   break;
+   WARN_ON_ONCE(1);
+   return -1;
}
-
-   return devid;
 }
 
-static struct irq_domain *get_ir_irq_domain(struct irq_alloc_info *info)
+static struct irq_domain *get_irq_domain_for_devid(struct irq_alloc_info *info,
+  int devid)
 {
-   struct amd_iommu *iommu;
-   int devid;
+   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
 
-   if (!info)
+   if (!iommu)
return NULL;
 
switch (info->type) {
case X86_IRQ_ALLOC_TYPE_IOAPIC_GET_PARENT:
case X86_IRQ_ALLOC_TYPE_HPET_GET_PARENT:
-   break;
+   return iommu->ir_domain;
+   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
+   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
+   return iommu->msi_domain;
default:
+   WARN_ON_ONCE(1);
return NULL;
}
-
-   devid = get_devid(info);
-   if (devid >= 0) {
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu)
-   return iommu->ir_domain;
-   }
-
-   return NULL;
 }
 
 static struct irq_domain *get_irq_domain(struct irq_alloc_info *info)
 {
-   struct amd_iommu *iommu;
int devid;
 
if (!info)
return NULL;
 
-   switch (info->type) {
-   case X86_IRQ_ALLOC_TYPE_PCI_MSI:
-   case X86_IRQ_ALLOC_TYPE_PCI_MSIX:
-   devid = get_device_id(&info->msi_dev->dev);
-   if (devid < 0)
-   return NULL;
-
-   iommu = amd_iommu_rlookup_table[devid];
-   if (iommu)
-   return iommu->msi_domain;
-   break;
-   default:
-   break;
-   }
-
-   return NULL;
+   devid = get_devid(info);
+   if (devid < 0)
+   return NULL;
+   return get_irq_domain_for_devid(info, devid);
 }
 
 struct irq_remap_ops amd_iommu_irq_ops = {
@@ -3584,7 +3561,7 @@ struct irq_remap_ops amd_iommu_irq_ops =
.disable= amd_iommu_disable,
.reenable   = amd_iommu_reenable,
.enable_faulting= amd_iommu_enable_faulting,
-   .get_ir_irq_domain  = get_ir_irq_domain,
+   .get_ir_irq_domain  = get_irq_domain,
.get_irq_domain = get_irq_domain,
 };
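
The AMD variant resolves the domain in two steps: first a device ID from the request, then the owning IOMMU from a reverse lookup table. The sketch below models that flow in userspace. The table size, the `*_model` types, and the pre-resolved `source_devid` field are assumptions made for illustration; the real driver derives the ID from the IOAPIC, HPET, or PCI device.

```c
#include <stddef.h>

#define MODEL_MAX_DEVID 4 /* illustrative table size, not the kernel's */

enum req_type { REQ_IOAPIC_PARENT, REQ_HPET_PARENT, REQ_PCI_MSI, REQ_OTHER };

struct domain_model {
	int id;
};

struct amd_iommu_model {
	struct domain_model *ir_domain;
	struct domain_model *msi_domain;
};

struct req_info {
	enum req_type type;
	int source_devid; /* assumed: already-resolved ID, negative if unknown */
};

/* Hypothetical reverse lookup table: device ID -> owning IOMMU (or NULL). */
struct amd_iommu_model *model_rlookup[MODEL_MAX_DEVID];

/* Step 1: resolve a device ID, or -1 for unknown types and bad IDs. */
int model_get_devid(const struct req_info *info)
{
	if (info->type == REQ_OTHER)
		return -1;
	if (info->source_devid < 0 || info->source_devid >= MODEL_MAX_DEVID)
		return -1;
	return info->source_devid;
}

/* Step 2: pick the domain of the owning IOMMU according to the type. */
struct domain_model *model_get_domain(const struct req_info *info)
{
	struct amd_iommu_model *iommu;
	int devid;

	if (!info)
		return NULL;

	devid = model_get_devid(info);
	if (devid < 0)
		return NULL; /* never index the table with a negative ID */

	iommu = model_rlookup[devid];
	if (!iommu)
		return NULL;

	return info->type == REQ_PCI_MSI ? iommu->msi_domain : iommu->ir_domain;
}
```

Checking the ID before touching the table matters here: the lookup helper in the patch indexes the table directly, so the caller's `devid < 0` test is the only guard.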
 


Re: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-08-20 Thread Alex Williamson

On Fri, 21 Aug 2020 00:37:19 +
"Liu, Yi L"  wrote:

> Hi Alex,
> 
> > From: Alex Williamson 
> > Sent: Friday, August 21, 2020 4:51 AM
> > 
> > On Mon, 27 Jul 2020 23:27:36 -0700
> > Liu Yi L  wrote:
> >   
> > > This patch allows userspace to request PASID allocation/free, e.g.
> > > when serving the request from the guest.
> > >
> > > PASIDs that are not freed by userspace are automatically freed when
> > > the IOASID set is destroyed when process exits.
> > >
> > > Cc: Kevin Tian 
> > > CC: Jacob Pan 
> > > Cc: Alex Williamson 
> > > Cc: Eric Auger 
> > > Cc: Jean-Philippe Brucker 
> > > Cc: Joerg Roedel 
> > > Cc: Lu Baolu 
> > > Signed-off-by: Liu Yi L 
> > > Signed-off-by: Yi Sun 
> > > Signed-off-by: Jacob Pan 
> > > ---
> > > v5 -> v6:
> > > *) address comments from Eric against v5. remove the alloc/free helper.
> > >
> > > v4 -> v5:
> > > *) address comments from Eric Auger.
> > > *) the comments for the PASID_FREE request is addressed in patch 5/15 of
> > >this series.
> > >
> > > v3 -> v4:
> > > *) address comments from v3, except the below comment against the range
> > >of PASID_FREE request. needs more help on it.  
> > > "> +if (req.range.min > req.range.max)  
> > >
> > >  Is it exploitable that a user can spin the kernel for a long time in
> > >  the case of a free by calling this with [0, MAX_UINT] regardless of
> > >  their actual allocations?"
> > >
> > > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/
> > >
> > > v1 -> v2:
> > > > *) move the vfio_mm related code to be a separate module
> > > *) use a single structure for alloc/free, could support a range of
> > > PASIDs
> > > *) fetch vfio_mm at group_attach time instead of at iommu driver open
> > > time
> > > ---
> > >  drivers/vfio/Kconfig|  1 +
> > >  drivers/vfio/vfio_iommu_type1.c | 69  
> > +  
> > >  drivers/vfio/vfio_pasid.c   | 10 ++
> > >  include/linux/vfio.h|  6 
> > >  include/uapi/linux/vfio.h   | 37 ++
> > >  5 files changed, 123 insertions(+)
> > >
> > > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > > 3d8a108..95d90c6 100644
> > > --- a/drivers/vfio/Kconfig
> > > +++ b/drivers/vfio/Kconfig
> > > @@ -2,6 +2,7 @@
> > >  config VFIO_IOMMU_TYPE1
> > >   tristate
> > >   depends on VFIO
> > > + select VFIO_PASID if (X86)
> > >   default n
> > >
> > >  config VFIO_IOMMU_SPAPR_TCE
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -76,6 +76,7 @@ struct vfio_iommu {
> > >   booldirty_page_tracking;
> > >   boolpinned_page_dirty_scope;
> > >   struct iommu_nesting_info   *nesting_info;
> > > + struct vfio_mm  *vmm;
> > >  };
> > >
> > >  struct vfio_domain {
> > > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct
> > > vfio_iommu *iommu,
> > >
> > >  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > > {
> > > + if (iommu->vmm) {
> > > + vfio_mm_put(iommu->vmm);
> > > + iommu->vmm = NULL;
> > > + }
> > > +
> > >   kfree(iommu->nesting_info);
> > >   iommu->nesting_info = NULL;
> > >  }
> > > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void  
> > *iommu_data,  
> > >   iommu->nesting_info);
> > >   if (ret)
> > >   goto out_detach;
> > > +
> > > + if (iommu->nesting_info->features &
> > > + IOMMU_NESTING_FEAT_SYSWIDE_PASID)  
> > {  
> > > + struct vfio_mm *vmm;
> > > + int sid;
> > > +
> > > + vmm = vfio_mm_get_from_task(current);
> > > + if (IS_ERR(vmm)) {
> > > + ret = PTR_ERR(vmm);
> > > + goto out_detach;
> > > + }
> > > + iommu->vmm = vmm;
> > > +
> > > + sid = vfio_mm_ioasid_sid(vmm);
> > > + ret = iommu_domain_set_attr(domain->domain,
> > > + DOMAIN_ATTR_IOASID_SID,
> > > + &sid);
> > > + if (ret)
> > > + goto out_detach;
> > > + }
> > >   }
> > >
> > >   /* Get aperture info */
> > > @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct  
> > vfio_iommu *iommu,  
> > >   return -EINVAL;
> > >  }
> > >
> > > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > > +   unsigned long arg)
> > > +{
> > > + struct vfio_iommu_type1_pasid_request req;
> > > + unsigned long minsz;
> > > + int ret;
> > > +
> > > + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);

RE: [PATCH v6 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs

2020-08-20 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, August 21, 2020 5:49 AM
> 
> On Mon, 27 Jul 2020 23:27:41 -0700
> Liu Yi L  wrote:
> 
> > In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
> > has been used to achieve flexible device sharing across domains (e.g. VMs).
> > Also there are hardware assisted mediated pass-through solutions from
> > platform vendors. e.g. Intel VT-d scalable mode which supports Intel
> > Scalable I/O Virtualization technology. Such mdevs are called IOMMU-
> > backed mdevs as there are IOMMU enforced DMA isolation for such mdevs.
> > In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain
> 
> Or a physical IOMMU backing device.

got it. :-)

> > concept, which means mdevs are protected by an iommu domain which is
> > auxiliary to the domain that the kernel driver primarily uses for DMA
> > API. Details can be found in the KVM presentation as below:
> >
> > https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
> > Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf
> 
> I think letting the line exceed 80 columns is preferable so that it's 
> clickable.  Thanks,

yeah, it's clickable now. will do it. :-)

Thanks,
Yi Liu

> Alex
> 
> > This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
> > main requirement is to use the auxiliary domain associated with mdev.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > CC: Jun Tian 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Reviewed-by: Eric Auger 
> > Signed-off-by: Liu Yi L 
> > ---
> > v5 -> v6:
> > *) add review-by from Eric Auger.
> >
> > v1 -> v2:
> > *) check the iommu_device to ensure the handling mdev is IOMMU-backed
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 40
> > 
> >  1 file changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index bf95a0f..9d8f252 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2379,20 +2379,41 @@ static int vfio_iommu_resv_refresh(struct
> vfio_iommu *iommu,
> > return ret;
> >  }
> >
> > +static struct device *vfio_get_iommu_device(struct vfio_group *group,
> > +   struct device *dev)
> > +{
> > +   if (group->mdev_group)
> > +   return vfio_mdev_get_iommu_device(dev);
> > +   else
> > +   return dev;
> > +}
> > +
> >  static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)  {
> > struct domain_capsule *dc = (struct domain_capsule *)data;
> > unsigned long arg = *(unsigned long *)dc->data;
> > +   struct device *iommu_device;
> > +
> > +   iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +   if (!iommu_device)
> > +   return -EINVAL;
> >
> > -   return iommu_uapi_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> > +   return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device,
> > + (void __user *)arg);
> >  }
> >
> >  static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
> > {
> > struct domain_capsule *dc = (struct domain_capsule *)data;
> > unsigned long arg = *(unsigned long *)dc->data;
> > +   struct device *iommu_device;
> >
> > -   iommu_uapi_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> > +   iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +   if (!iommu_device)
> > +   return -EINVAL;
> > +
> > +   iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device,
> > +(void __user *)arg);
> > return 0;
> >  }
> >
> > @@ -2401,8 +2422,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device
> *dev, void *data)
> > struct domain_capsule *dc = (struct domain_capsule *)data;
> > struct iommu_gpasid_bind_data *unbind_data =
> > (struct iommu_gpasid_bind_data *)dc->data;
> > +   struct device *iommu_device;
> > +
> > +   iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +   if (!iommu_device)
> > +   return -EINVAL;
> >
> > -   iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> > +   iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
> > return 0;
> >  }
> >
> > @@ -3060,8 +3086,14 @@ static int vfio_dev_cache_invalidate_fn(struct
> > device *dev, void *data)  {
> > struct domain_capsule *dc = (struct domain_capsule *)data;
> > unsigned long arg = *(unsigned long *)dc->data;
> > +   struct device *iommu_device;
> > +
> > +   iommu_device = vfio_get_iommu_device(dc->group, dev);
> > +   if (!iommu_device)
> > +   return -EINVAL;
> >
> > -   iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
> > +   iommu_uapi_cache_invalidate(dc->domain, iommu_device,
> > +   (void __user *)arg);
> > return 0;
> >  }
> >
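
The device substitution done by the quoted patch can be modelled in a few lines. This is a userspace sketch under stated assumptions: `dev_model`, `group_model`, and the `backing` pointer are hypothetical stand-ins for `struct device`, `struct vfio_group`, and whatever `vfio_mdev_get_iommu_device()` returns in the kernel.

```c
#include <errno.h>
#include <stddef.h>

struct dev_model {
	const char *name;
	struct dev_model *backing; /* assumed: physical IOMMU backing device of an mdev */
};

struct group_model {
	int is_mdev_group;
};

/* For mdev groups, IOMMU operations must target the backing device. */
struct dev_model *model_get_iommu_device(const struct group_model *grp,
					 struct dev_model *dev)
{
	if (grp->is_mdev_group)
		return dev->backing; /* NULL when the mdev is not IOMMU-backed */
	return dev;
}

/* Each per-device callback resolves the target first and rejects NULL. */
int model_bind(const struct group_model *grp, struct dev_model *dev,
	       struct dev_model **target)
{
	struct dev_model *iommu_dev = model_get_iommu_device(grp, dev);

	if (!iommu_dev)
		return -EINVAL;

	*target = iommu_dev; /* the real code passes this to the IOMMU uAPI call */
	return 0;
}
```

The same resolve-then-check step appears in the bind, unbind, and cache invalidation callbacks of the patch, which is why it is factored into one helper.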


RE: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

2020-08-20 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, August 21, 2020 3:52 AM
> 
> On Mon, 27 Jul 2020 23:27:33 -0700
> Liu Yi L  wrote:
> 
> > This patch exports iommu nesting capability info to user space through
> > VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> > PASID alloc/free, bind page table, and cache invalidation) and the vendor
> > specific format information for first level/stage page table that will be
> > bound to.
> >
> > The nesting info is available only after container set to be NESTED type.
> > Current implementation imposes one limitation - one nesting container
> > should include at most one iommu group. The philosophy of vfio container
> > is having all groups/devices within the container share the same IOMMU
> > context. When vSVA is enabled, one IOMMU context could include one 2nd-
> > level address space and multiple 1st-level address spaces. While the
> > 2nd-level address space is reasonably sharable by multiple groups, blindly
> > sharing 1st-level address spaces across all groups within the container
> > might instead break the guest expectation. In the future sub/super container
> > concept might be introduced to allow partial address space sharing within
> > an IOMMU context. But for now let's go with this restriction by requiring
> > singleton container for using nesting iommu features. Below link has the
> > related discussion about this decision.
> >
> > https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/
> >
> > This patch also changes the NESTING type container behaviour. Something
> > that would have succeeded before will now fail: Before this series, if
> > user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
> > if the SMMU didn't support stage-2, as the driver would have silently
> > fallen back on stage-1 mappings (which work exactly the same as stage-2
> > only since there was no nesting supported). After the series, we do check
> > for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING
> and
> > the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
> > fix and completely harmless. Detail can be found in below link as well.
> >
> > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > ---
> > v5 -> v6:
> > *) address comments against v5 from Eric Auger.
> > *) don't report nesting cap to userspace if the nesting_info->format is
> >invalid.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) return struct iommu_nesting_info for
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> >cap is much "cheap", if needs extension in future, just define another 
> > cap.
> >https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/
> >
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 106
> +++-
> >  include/uapi/linux/vfio.h   |  19 +++
> >  2 files changed, 113 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 3bd70ff..18ff0c3 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> >  "Maximum number of user DMA mappings per container (65535).");
> >
> >  struct vfio_iommu {
> > -   struct list_headdomain_list;
> > -   struct list_headiova_list;
> > -   struct vfio_domain  *external_domain; /* domain for external user */
> > -   struct mutexlock;
> > -   struct rb_root  dma_list;
> > -   struct blocking_notifier_head notifier;
> > -   unsigned intdma_avail;
> > -   uint64_tpgsize_bitmap;
> > -   boolv2;
> > -   boolnesting;
> > -   booldirty_page_tracking;
> > -   boolpinned_page_dirty_scope;
> > +   struct list_headdomain_list;
> > +   struct list_headiova_list;
> > +   /* domain for external user */
> > +   struct vfio_domain  *external_domain;
> > +   struct mutexlock;
> > +   struct rb_root  dma_list;
> > +   struct blocking_notifier_head   notifier;
> > +   unsigned intdma_avail;
> > +   uint64_tpgsize_bitmap;
> > +   boolv2;
> > +   boolnesting;
> > +   booldirty_page_tracking;
> > +   boolpinned_page_dirty_scope;
> > +   struct iommu_nesting_info   *nesting_info;
> >  };
> >
> >  struct vfio_domain {
> > @@ -130,6 +132,9 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_

RE: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-08-20 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, August 21, 2020 4:51 AM
> 
> On Mon, 27 Jul 2020 23:27:36 -0700
> Liu Yi L  wrote:
> 
> > This patch allows userspace to request PASID allocation/free, e.g.
> > when serving the request from the guest.
> >
> > PASIDs that are not freed by userspace are automatically freed when
> > the IOASID set is destroyed when process exits.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Yi Sun 
> > Signed-off-by: Jacob Pan 
> > ---
> > v5 -> v6:
> > *) address comments from Eric against v5. remove the alloc/free helper.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) the comments for the PASID_FREE request is addressed in patch 5/15 of
> >this series.
> >
> > v3 -> v4:
> > *) address comments from v3, except the below comment against the range
> >of PASID_FREE request. needs more help on it.
> > "> +if (req.range.min > req.range.max)
> >
> >  Is it exploitable that a user can spin the kernel for a long time in
> >  the case of a free by calling this with [0, MAX_UINT] regardless of
> >  their actual allocations?"
> >
> > https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/
> >
> > v1 -> v2:
> > *) move the vfio_mm related code to be a separate module
> > *) use a single structure for alloc/free, could support a range of
> > PASIDs
> > *) fetch vfio_mm at group_attach time instead of at iommu driver open
> > time
> > ---
> >  drivers/vfio/Kconfig|  1 +
> >  drivers/vfio/vfio_iommu_type1.c | 69
> +
> >  drivers/vfio/vfio_pasid.c   | 10 ++
> >  include/linux/vfio.h|  6 
> >  include/uapi/linux/vfio.h   | 37 ++
> >  5 files changed, 123 insertions(+)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index
> > 3d8a108..95d90c6 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -2,6 +2,7 @@
> >  config VFIO_IOMMU_TYPE1
> > tristate
> > depends on VFIO
> > +   select VFIO_PASID if (X86)
> > default n
> >
> >  config VFIO_IOMMU_SPAPR_TCE
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 18ff0c3..ea89c7c 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -76,6 +76,7 @@ struct vfio_iommu {
> > booldirty_page_tracking;
> > boolpinned_page_dirty_scope;
> > struct iommu_nesting_info   *nesting_info;
> > +   struct vfio_mm  *vmm;
> >  };
> >
> >  struct vfio_domain {
> > @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct
> > vfio_iommu *iommu,
> >
> >  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > {
> > +   if (iommu->vmm) {
> > +   vfio_mm_put(iommu->vmm);
> > +   iommu->vmm = NULL;
> > +   }
> > +
> > kfree(iommu->nesting_info);
> > iommu->nesting_info = NULL;
> >  }
> > @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> > iommu->nesting_info);
> > if (ret)
> > goto out_detach;
> > +
> > +   if (iommu->nesting_info->features &
> > +   IOMMU_NESTING_FEAT_SYSWIDE_PASID)
> {
> > +   struct vfio_mm *vmm;
> > +   int sid;
> > +
> > +   vmm = vfio_mm_get_from_task(current);
> > +   if (IS_ERR(vmm)) {
> > +   ret = PTR_ERR(vmm);
> > +   goto out_detach;
> > +   }
> > +   iommu->vmm = vmm;
> > +
> > +   sid = vfio_mm_ioasid_sid(vmm);
> > +   ret = iommu_domain_set_attr(domain->domain,
> > +   DOMAIN_ATTR_IOASID_SID,
> > +   &sid);
> > +   if (ret)
> > +   goto out_detach;
> > +   }
> > }
> >
> > /* Get aperture info */
> > @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct
> vfio_iommu *iommu,
> > return -EINVAL;
> >  }
> >
> > +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> > + unsigned long arg)
> > +{
> > +   struct vfio_iommu_type1_pasid_request req;
> > +   unsigned long minsz;
> > +   int ret;
> > +
> > +   minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> > +
> > +   if (copy_from_user(&req, (void __user *)arg, minsz))
> > +   return -EFAULT;
> > +
> > +   if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> > +   return -EINVAL;
> > +
> > +   if (req.range.min > req.range.max)
> > +
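
The argument checks in the quoted hunk follow the usual pattern for extensible ioctl structures: copy only the minimum known size, then validate `argsz`, the flag mask, and the range ordering. The sketch below models that in userspace. The struct layout, the flag values, and `memcpy()` standing in for `copy_from_user()` are assumptions for illustration, not the uAPI definition.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative flag values; the real ones live in the VFIO uAPI header. */
#define MODEL_PASID_FLAG_ALLOC   (1u << 0)
#define MODEL_PASID_FLAG_FREE    (1u << 1)
#define MODEL_PASID_REQUEST_MASK (MODEL_PASID_FLAG_ALLOC | MODEL_PASID_FLAG_FREE)

struct model_pasid_request {
	uint32_t argsz;
	uint32_t flags;
	struct {
		uint32_t min;
		uint32_t max;
	} range;
};

/* Returns 0 when the request is well formed, -EINVAL otherwise. */
int model_validate_pasid_request(const void *user_buf,
				 struct model_pasid_request *req)
{
	/* same idea as offsetofend(): bytes up to and including 'range' */
	size_t minsz = offsetof(struct model_pasid_request, range) +
		       sizeof(req->range);

	memcpy(req, user_buf, minsz); /* stands in for copy_from_user() */

	if (req->argsz < minsz || (req->flags & ~MODEL_PASID_REQUEST_MASK))
		return -EINVAL;

	if (req->range.min > req->range.max)
		return -EINVAL;

	return 0;
}
```

Note that these checks only reject malformed input. They do not bound how wide a valid range may be, which is the concern raised in the quoted v3 review comment about a free over [0, MAX_UINT].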

Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy


On 2020-08-20 21:16, Dmitry Osipenko wrote:

20.08.2020 18:08, Robin Murphy wrote:

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy 
---
  drivers/iommu/tegra-gart.c | 17 -
  1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..e081387080f6 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -9,6 +9,7 @@
  
  #define dev_fmt(fmt)	"gart: " fmt
  
+#include 

  #include 
  #include 
  #include 
@@ -145,16 +146,22 @@ static struct iommu_domain 
*gart_iommu_domain_alloc(unsigned type)
  {
struct iommu_domain *domain;


Hello, Robin!

Tegra20 GART isn't a real IOMMU, but a small relocation aperture. We
would only want to use it for temporary mappings (managed by the GPU
driver) while the GPU hardware is busy working with sparse DMA buffers;
the driver will take care of unmapping the sparse buffers once the GPU
work is finished [1]. In the case of contiguous DMA buffers, we want to
bypass the IOMMU and use the buffer's physical address, because the GART
aperture is small and all the buffers simply can't fit into it for
complex GPU operations that involve multiple buffers [2][3]. The
upstream GPU driver still doesn't support GART, but eventually it needs
to be changed.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L489

[2]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L542

[3]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L90
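
The policy described above (physical address for contiguous buffers, a temporary GART mapping only for sparse ones) can be sketched as a small decision function. Everything in this block is hypothetical: the `*_model` types, the bump-style aperture accounting, and the function name are illustration only and are not taken from the grate driver linked above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct buf_model {
	uint64_t phys_addr; /* meaningful only when the buffer is contiguous */
	size_t size;
	bool contiguous;
};

struct gart_model {
	uint64_t aperture_base;
	size_t aperture_size;
	size_t used; /* simplistic bump accounting, illustration only */
};

/*
 * Choose the address the device should use for a buffer.
 * Returns false when a sparse buffer does not fit into the aperture.
 */
bool model_pick_address(struct gart_model *gart, const struct buf_model *buf,
			uint64_t *dev_addr)
{
	if (buf->contiguous) {
		/* bypass the aperture entirely, consuming no GART space */
		*dev_addr = buf->phys_addr;
		return true;
	}

	if (buf->size > gart->aperture_size - gart->used)
		return false;

	*dev_addr = gart->aperture_base + gart->used;
	gart->used += buf->size;
	return true;
}
```

The point of the sketch is the asymmetry: contiguous buffers never consume aperture space, so the small aperture stays available for the sparse buffers that cannot work without it.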


-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;


Will a returned NULL tell the IOMMU core that an implicit domain shouldn't
be used? Is it possible to leave this driver as-is?


The aim of this patch was just to make the conversion without functional 
changes wherever possible, i.e. maintain an equivalent to the existing 
ARM behaviour of allocating its own implicit domains for everything. It 
doesn't represent any judgement of whether that was ever appropriate for 
this driver in the first place ;)


Hopefully my other reply already covered the degree of control drivers 
can have with proper default domains, but do shout if anything wasn't clear.


Cheers,
Robin.

RE: [PATCH v6 08/15] iommu: Pass domain to sva_unbind_gpasid()

2020-08-20 Thread Liu, Yi L

Hi Alex,

> From: Alex Williamson 
> Sent: Friday, August 21, 2020 5:06 AM
> 
> On Mon, 27 Jul 2020 23:27:37 -0700
> Liu Yi L  wrote:
> 
> > From: Yi Sun 
> >
> > Current interface is good enough for SVA virtualization on an assigned
> > physical PCI device, but when it comes to mediated devices, a physical
> > device may attached with multiple aux-domains. Also, for guest unbind,
> 
> s/may/may be/

got it.

> 
> > the PASID to be unbind should be allocated to the VM. This check
> > requires to know the ioasid_set which is associated with the domain.
> >
> > So this interface needs to pass in domain info. Then the iommu driver
> > is able to know which domain will be used for the 2nd stage
> > translation of the nesting mode and also be able to do PASID ownership
> > check. This patch passes @domain per the above reason. Also, the
> > prototype of &pasid is changed frnt" to "u32" as the below link.
> 
> s/frnt"/from an "int"/

got it.

> > https://lore.kernel.org/kvm/27ac7880-bdd3-2891-139e-b4a7cd18420b@redha
> > t.com/
> 
> This is really confusing, the link is to Eric's comment asking that the 
> conversion from
> (at the time) int to ioasid_t be included in the commit log.  The text here 
> implies that
> it's pointing to some sort of justification for the change, which it isn't.  
> It just notes
> that it happened, not why it happened, with a mostly irrelevant link.

really sorry, a mistake from me. it should be the below link.

[PATCH v6 01/12] iommu: Change type of pasid to u32
https://lore.kernel.org/linux-iommu/1594684087-61184-2-git-send-email-fenghua...@intel.com/

> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Reviewed-by: Eric Auger 
> > Signed-off-by: Yi Sun 
> > Signed-off-by: Liu Yi L 
> > ---
> > v5 -> v6:
> > *) use "u32" prototype for @pasid.
> > *) add review-by from Eric Auger.
> 
> I'd probably hold off on adding Eric's R-b given the additional change in 
> this version
> FWIW.  Thanks,

ok, will hold on it. :-)

Regards,
Yi Liu

> Alex
> 
> > v2 -> v3:
> > *) pass in domain info only
> > *) use u32 for pasid instead of int type
> >
> > v1 -> v2:
> > *) added in v2.
> > ---
> >  drivers/iommu/intel/svm.c   | 3 ++-
> >  drivers/iommu/iommu.c   | 2 +-
> >  include/linux/intel-iommu.h | 3 ++-
> >  include/linux/iommu.h   | 3 ++-
> >  4 files changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index c27d16a..c85b8d5 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -436,7 +436,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
> struct device *dev,
> > return ret;
> >  }
> >
> > -int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> > +   struct device *dev, u32 pasid)
> >  {
> > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > struct intel_svm_dev *sdev;
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index
> > 1ce2a61..bee79d7 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2145,7 +2145,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain
> *domain, struct device *dev,
> > if (unlikely(!domain->ops->sva_unbind_gpasid))
> > return -ENODEV;
> >
> > -   return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
> > +   return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
> >  }
> >  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
> >
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 0d0ab32..f98146b 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu
> > *iommu);  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> >   struct iommu_gpasid_bind_data *data); -int
> > intel_svm_unbind_gpasid(struct device *dev, int pasid);
> > +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> > +   struct device *dev, u32 pasid);
> >  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
> >  void *drvdata);
> >  void intel_svm_unbind(struct iommu_sva *handle); diff --git
> > a/include/linux/iommu.h b/include/linux/iommu.h index b1ff702..80467fc
> > 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -303,7 +303,8 @@ struct iommu_ops {
> > int (*sva_bind_gpasid)(struct iommu_domain *domain,
> > struct device *dev, struct iommu_gpasid_bind_data 
> > *data);
> >
> > -   int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> > +   int (*sva_unbind_gpasid)(struct iommu_domain *domain,
> > +struct d

Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy


On 2020-08-20 20:51, Dmitry Osipenko wrote:

20.08.2020 18:08, Robin Murphy wrote:

Now that arch/arm is wired up for default domains and iommu-dma, we no
longer need to work around the arch-private mapping.

Signed-off-by: Robin Murphy 
---
  drivers/staging/media/tegra-vde/iommu.c | 12 
  1 file changed, 12 deletions(-)

diff --git a/drivers/staging/media/tegra-vde/iommu.c 
b/drivers/staging/media/tegra-vde/iommu.c
index 6af863d92123..4f770189ed34 100644
--- a/drivers/staging/media/tegra-vde/iommu.c
+++ b/drivers/staging/media/tegra-vde/iommu.c
@@ -10,10 +10,6 @@
  #include 
  #include 
  
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)

-#include 
-#endif
-
  #include "vde.h"
  
  int tegra_vde_iommu_map(struct tegra_vde *vde,

@@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde)
if (!vde->group)
return 0;
  
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)

-   if (dev->archdata.mapping) {
-   struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-   arm_iommu_detach_device(dev);
-   arm_iommu_release_mapping(mapping);
-   }
-#endif
vde->domain = iommu_domain_alloc(&platform_bus_type);
if (!vde->domain) {
err = -ENOMEM;



> Hello, Robin! Thank you for your work!
>
> Some drivers, like this Tegra VDE (Video Decoder Engine) driver for
> example, do not want to use an implicit IOMMU domain.


That isn't (intentionally) changing here - the only difference should be 
that instead of having the ARM-special implicit domain, which you have 
to kick out of the way with the ARM-specific API before you're able to 
attach your own domain, the implicit domain is now a proper IOMMU API 
default domain, which automatically gets bumped by your attach. The 
default domains should still only be created in the same cases that the 
ARM dma_iommu_mappings were.
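
As a rough illustration of the paragraph above, here is a userspace toy model rather than kernel code. Every toy_* name is made up for the sketch, and the fall-back to the default domain on detach is an assumption of the model, not something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, not kernel code; every toy_* name is hypothetical. */
struct toy_domain {
	const char *name;
};

struct toy_group {
	struct toy_domain *default_domain; /* set up by the core, may be NULL */
	struct toy_domain *domain;         /* the domain currently live */
};

/* The core creates the default domain and makes it live at probe time. */
void toy_group_init(struct toy_group *g, struct toy_domain *def)
{
	g->default_domain = def;
	g->domain = def;
}

/* An explicit attach simply replaces whatever was live before. */
void toy_attach(struct toy_group *g, struct toy_domain *d)
{
	g->domain = d;
}

/* Assumption of this model: detaching falls back to the default domain. */
void toy_detach(struct toy_group *g)
{
	g->domain = g->default_domain;
}

/* Usage: no "kick the implicit domain out first" step is needed. */
void toy_example(void)
{
	struct toy_domain def = { "default" }, own = { "vde" };
	struct toy_group g;

	toy_group_init(&g, &def);
	toy_attach(&g, &own);      /* the default domain gets bumped */
	assert(g.domain == &own);
	toy_detach(&g);
	assert(g.domain == &def);
}
```

The point of the sketch is only that a group has a single live domain at a time, so the driver's own attach is enough on its own.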



> The Tegra VDE driver
> relies on an explicit IOMMU domain in the case of the Tegra SMMU because the
> VDE hardware can't access the last page of the AS and because the driver
> wants to reserve some fixed addresses [1].
>
> [1]
> https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/iommu.c#L100
>
> The Tegra30 SoC supports up to 4 domains, hence it can't afford to waste
> them on unused implicit domains. I think this needs to be addressed
> before this patch can be applied.


Yeah, there is one subtle change in behaviour from removing the ARM 
layer on top of the core API, in that the IOMMU driver will no longer 
see an explicit detach call. Thus it does stand to benefit from being a 
bit cleverer about noticing devices being moved from one domain to 
another by an attach call, either by releasing the hardware context for 
the inactive domain once the device(s) are moved across to the new one, 
or by simply reprogramming the hardware context in-place for the new 
domain's address space without allocating a new one at all (most of the 
drivers that don't have multiple contexts already handle the latter 
approach quite well).
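
A minimal sketch of that idea, again as a userspace toy model and not as any real driver's code: the toy_* names, the fixed pool of four contexts and the return values are all assumptions made for illustration. An attach that moves the last device out of a domain either hands that domain's context over in place or releases it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_CONTEXTS 4 /* mirrors the "up to 4 domains" figure quoted above */

struct toy_domain {
	int ctx;   /* hardware context index, or -1 while none is held */
	int ndevs; /* number of devices currently live in this domain */
};

struct toy_device {
	struct toy_domain *domain; /* domain the device is live in, or NULL */
};

static int toy_ctx_used[TOY_NUM_CONTEXTS];

static int toy_ctx_alloc(void)
{
	for (int i = 0; i < TOY_NUM_CONTEXTS; i++) {
		if (!toy_ctx_used[i]) {
			toy_ctx_used[i] = 1;
			return i;
		}
	}
	return -1; /* pool exhausted */
}

int toy_contexts_in_use(void)
{
	int n = 0;

	for (int i = 0; i < TOY_NUM_CONTEXTS; i++)
		n += toy_ctx_used[i];
	return n;
}

/* Attach with no preceding detach call. Returns 0 on success, -1 on failure. */
int toy_attach_dev(struct toy_domain *next, struct toy_device *dev)
{
	struct toy_domain *prev = dev->domain;
	int prev_goes_idle = prev && prev->ndevs == 1;

	if (prev == next)
		return 0;

	if (next->ctx < 0 && prev_goes_idle) {
		/* Reprogram in place: the old context now serves the new domain. */
		next->ctx = prev->ctx;
		prev->ctx = -1;
	} else {
		if (next->ctx < 0) {
			next->ctx = toy_ctx_alloc();
			if (next->ctx < 0)
				return -1;
		}
		if (prev_goes_idle) {
			/* Release the context of the now-inactive domain. */
			toy_ctx_used[prev->ctx] = 0;
			prev->ctx = -1;
		}
	}

	if (prev)
		prev->ndevs--;
	next->ndevs++;
	dev->domain = next;
	return 0;
}

void toy_example(void)
{
	struct toy_domain def = { -1, 0 }, own = { -1, 0 };
	struct toy_device vde = { NULL };

	assert(toy_attach_dev(&def, &vde) == 0); /* implicit default domain */
	assert(toy_attach_dev(&own, &vde) == 0); /* driver's explicit domain */
	assert(toy_contexts_in_use() == 1);      /* no context is left stranded */
	assert(def.ctx == -1 && own.ctx == 0);
}
```

In this model the default domain costs a context only while it is actually live, which is the property the Tegra30 case would need.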



> Would it be possible for IOMMU drivers to gain support for filtering out
> devices in iommu_domain_alloc(dev, type)? Then perhaps the Tegra SMMU driver
> could simply return NULL in the case of type=IOMMU_DOMAIN_DMA and
> dev=tegra-vde.


If you can implement IOMMU_DOMAIN_IDENTITY by allowing the relevant 
devices to bypass translation entirely without needing a hardware 
context (or at worst, can spare one context which all identity-mapped 
logical domains can share), then you could certainly do that kind of 
filtering with the .def_domain_type callback if you really wanted to. As 
above, the intent is that that shouldn't be necessary for this 
particular case, since only one of a group's default domain and 
explicitly attached domain can be live at any given time, so the driver 
should be able to take advantage of that.
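
For illustration only, the kind of filtering described above could look roughly like the following userspace sketch. The real hook is the def_domain_type callback in struct iommu_ops, which takes a struct device; here the device type, the enum values and the name match are hypothetical stand-ins so that the example is self-contained.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel's domain types; the values are placeholders. */
enum toy_domain_type {
	TOY_DOMAIN_NO_PREFERENCE = 0, /* let the core pick its default */
	TOY_DOMAIN_IDENTITY,          /* bypass translation */
	TOY_DOMAIN_DMA,               /* translated, managed by iommu-dma */
};

/* Hypothetical minimal device: only the name matters for this sketch. */
struct toy_dev {
	const char *name;
};

/*
 * Shaped like a def_domain_type callback: ask for an identity default
 * domain for the one device that always brings its own explicit domain.
 */
int toy_def_domain_type(const struct toy_dev *dev)
{
	if (strcmp(dev->name, "tegra-vde") == 0)
		return TOY_DOMAIN_IDENTITY;

	return TOY_DOMAIN_NO_PREFERENCE;
}

void toy_example(void)
{
	struct toy_dev vde = { "tegra-vde" };
	struct toy_dev other = { "some-other-device" };

	assert(toy_def_domain_type(&vde) == TOY_DOMAIN_IDENTITY);
	assert(toy_def_domain_type(&other) == TOY_DOMAIN_NO_PREFERENCE);
}
```

Matching on a device name is just the simplest thing to show here; a real driver would more likely key off compatible strings or per-device data.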


If you simply have more active devices (groups) than available contexts 
then yes, you probably would want to do some filtering to decide who 
deserves a translation domain and who doesn't, but in that case you 
should already have had a long-standing problem with the ARM implicit 
domains.



> Alternatively, the Tegra SMMU could be changed such that the devices
> will be attached to a domain at the time of a first IOMMU mapping
> invocation instead of attaching at the time of attach_dev() callback
> invocation.
>
> Or maybe even IOMMU core could be changed to attach devices at the time
> of the first IOMMU mapping invocation? This could be a universal
> solution for all drivers.


I suppose technically you could do that within an IOMMU driver already 
(similar to how some defer most of setup that logically belongs to 
->domain_alloc until the first ->attach_dev). It's a bit grim from the 
caller's PoV though, in terms of the failure mode being non-obvious and 
having no real way to recover. Again, you'd be better off simply making 
decisions up-front at domain_alloc or attach time based on the domain type.


Robin.

Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy


On 2020-08-20 20:55, Sakari Ailus wrote:

> On Thu, Aug 20, 2020 at 06:25:19PM +0100, Robin Murphy wrote:
> > On 2020-08-20 17:53, Sakari Ailus wrote:
> > > Hi Robin,
> > >
> > > On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
> > > > Now that arch/arm is wired up for default domains and iommu-dma, devices
> > > > behind IOMMUs will get mappings set up automatically as appropriate, so
> > > > there is no need for drivers to do so manually.
> > > >
> > > > Signed-off-by: Robin Murphy
> > >
> > > Thanks for the patch.
> >
> > Many thanks for testing so quickly!
> >
> > > I haven't looked at the details but it seems that this causes the buffer
> > > memory allocation to be physically contiguous, which causes a failure to
> > > allocate video buffers of entirely normal size. I guess that was not
> > > intentional?
> >
> > Hmm, it looks like the device ends up with the wrong DMA ops, which implies
> > something didn't go as expected with the earlier IOMMU setup and default
> > domain creation. Chances are that either I missed some subtlety in the
> > omap_iommu change, or I've fundamentally misjudged how the ISP probing works
> > and it never actually goes down the of_iommu_configure() path in the first
> > place. Do you get any messages from the IOMMU layer earlier on during boot?
>
> I do get these:
>
> [2.934936] iommu: Default domain type: Translated
> [2.940917] omap-iommu 480bd400.mmu: 480bd400.mmu registered
> [2.946899] platform 480bc000.isp: Adding to iommu group 0



So that much looks OK, if there are no obvious errors. Unfortunately 
there's no easy way to tell exactly what of_iommu_configure() is doing 
(beyond enabling a couple of vague debug messages). The first thing I'll 
do tomorrow is double-check whether it's really working on my boards 
here, or whether I was just getting lucky with CMA... (I assume you 
don't have CMA enabled if you're ending up in remap_allocator_alloc())


Robin.

Re: [PATCH v6 12/15] vfio/type1: Add vSVA support for IOMMU-backed mdevs

2020-08-20 Thread Alex Williamson

On Mon, 27 Jul 2020 23:27:41 -0700
Liu Yi L  wrote:

> In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
> has been used to achieve flexible device sharing across domains (e.g. VMs).
> There are also hardware-assisted mediated pass-through solutions from
> platform vendors, e.g. Intel VT-d scalable mode, which supports Intel
> Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed
> mdevs, as there is IOMMU-enforced DMA isolation for them.
> In kernel, IOMMU-backed mdevs are exposed to IOMMU layer by aux-domain

Or a physical IOMMU backing device.

> concept, which means mdevs are protected by an iommu domain which is
> auxiliary to the domain that the kernel driver primarily uses for DMA
> API. Details can be found in the KVM presentation as below:
> 
> https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\
> Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

I think letting the line exceed 80 columns is preferable so that it's
clickable.  Thanks,

Alex

> This patch extends NESTING_IOMMU ops to IOMMU-backed mdev devices. The
> main requirement is to use the auxiliary domain associated with mdev.
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> CC: Jun Tian 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Reviewed-by: Eric Auger 
> Signed-off-by: Liu Yi L 
> ---
> v5 -> v6:
> *) add review-by from Eric Auger.
> 
> v1 -> v2:
> *) check the iommu_device to ensure the handling mdev is IOMMU-backed
> ---
>  drivers/vfio/vfio_iommu_type1.c | 40 
>  1 file changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index bf95a0f..9d8f252 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -2379,20 +2379,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu 
> *iommu,
>   return ret;
>  }
>  
> +static struct device *vfio_get_iommu_device(struct vfio_group *group,
> + struct device *dev)
> +{
> + if (group->mdev_group)
> + return vfio_mdev_get_iommu_device(dev);
> + else
> + return dev;
> +}
> +
>  static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
>  {
>   struct domain_capsule *dc = (struct domain_capsule *)data;
>   unsigned long arg = *(unsigned long *)dc->data;
> + struct device *iommu_device;
> +
> + iommu_device = vfio_get_iommu_device(dc->group, dev);
> + if (!iommu_device)
> + return -EINVAL;
>  
> - return iommu_uapi_sva_bind_gpasid(dc->domain, dev, (void __user *)arg);
> + return iommu_uapi_sva_bind_gpasid(dc->domain, iommu_device,
> +   (void __user *)arg);
>  }
>  
>  static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
>  {
>   struct domain_capsule *dc = (struct domain_capsule *)data;
>   unsigned long arg = *(unsigned long *)dc->data;
> + struct device *iommu_device;
>  
> - iommu_uapi_sva_unbind_gpasid(dc->domain, dev, (void __user *)arg);
> + iommu_device = vfio_get_iommu_device(dc->group, dev);
> + if (!iommu_device)
> + return -EINVAL;
> +
> + iommu_uapi_sva_unbind_gpasid(dc->domain, iommu_device,
> +  (void __user *)arg);
>   return 0;
>  }
>  
> @@ -2401,8 +2422,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device 
> *dev, void *data)
>   struct domain_capsule *dc = (struct domain_capsule *)data;
>   struct iommu_gpasid_bind_data *unbind_data =
>   (struct iommu_gpasid_bind_data *)dc->data;
> + struct device *iommu_device;
> +
> + iommu_device = vfio_get_iommu_device(dc->group, dev);
> + if (!iommu_device)
> + return -EINVAL;
>  
> - iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
> + iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
>   return 0;
>  }
>  
> @@ -3060,8 +3086,14 @@ static int vfio_dev_cache_invalidate_fn(struct device 
> *dev, void *data)
>  {
>   struct domain_capsule *dc = (struct domain_capsule *)data;
>   unsigned long arg = *(unsigned long *)dc->data;
> + struct device *iommu_device;
> +
> + iommu_device = vfio_get_iommu_device(dc->group, dev);
> + if (!iommu_device)
> + return -EINVAL;
>  
> - iommu_uapi_cache_invalidate(dc->domain, dev, (void __user *)arg);
> + iommu_uapi_cache_invalidate(dc->domain, iommu_device,
> + (void __user *)arg);
>   return 0;
>  }
>  


Re: [GIT PULL] dma-mapping fixes for 5.9

2020-08-20 Thread pr-tracker-bot

The pull request you sent on Thu, 20 Aug 2020 18:41:58 +0200:

> git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d271b51c60ebe71e0435a9059b315a3d8bb8a099

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

Re: [PATCH v6 08/15] iommu: Pass domain to sva_unbind_gpasid()

2020-08-20 Thread Alex Williamson

On Mon, 27 Jul 2020 23:27:37 -0700
Liu Yi L  wrote:

> From: Yi Sun 
> 
> Current interface is good enough for SVA virtualization on an assigned
> physical PCI device, but when it comes to mediated devices, a physical
> device may attached with multiple aux-domains. Also, for guest unbind,

s/may/may be/

> the PASID to be unbind should be allocated to the VM. This check requires
> to know the ioasid_set which is associated with the domain.
> 
> So this interface needs to pass in domain info. Then the iommu driver is
> able to know which domain will be used for the 2nd stage translation of
> the nesting mode and also be able to do PASID ownership check. This patch
> passes @domain per the above reason. Also, the prototype of &pasid is
> changed frnt" to "u32" as the below link.

s/frnt"/from an "int"/
 
> https://lore.kernel.org/kvm/27ac7880-bdd3-2891-139e-b4a7cd184...@redhat.com/

This is really confusing, the link is to Eric's comment asking that the
conversion from (at the time) int to ioasid_t be included in the commit
log.  The text here implies that it's pointing to some sort of
justification for the change, which it isn't.  It just notes that it
happened, not why it happened, with a mostly irrelevant link.

> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Reviewed-by: Eric Auger 
> Signed-off-by: Yi Sun 
> Signed-off-by: Liu Yi L 
> ---
> v5 -> v6:
> *) use "u32" prototype for @pasid.
> *) add review-by from Eric Auger.

I'd probably hold off on adding Eric's R-b given the additional change
in this version FWIW.  Thanks,

Alex
 
> v2 -> v3:
> *) pass in domain info only
> *) use u32 for pasid instead of int type
> 
> v1 -> v2:
> *) added in v2.
> ---
>  drivers/iommu/intel/svm.c   | 3 ++-
>  drivers/iommu/iommu.c   | 2 +-
>  include/linux/intel-iommu.h | 3 ++-
>  include/linux/iommu.h   | 3 ++-
>  4 files changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index c27d16a..c85b8d5 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -436,7 +436,8 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
> struct device *dev,
>   return ret;
>  }
>  
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> + struct device *dev, u32 pasid)
>  {
>   struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>   struct intel_svm_dev *sdev;
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 1ce2a61..bee79d7 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2145,7 +2145,7 @@ int iommu_sva_unbind_gpasid(struct iommu_domain 
> *domain, struct device *dev,
>   if (unlikely(!domain->ops->sva_unbind_gpasid))
>   return -ENODEV;
>  
> - return domain->ops->sva_unbind_gpasid(dev, data->hpasid);
> + return domain->ops->sva_unbind_gpasid(domain, dev, data->hpasid);
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
>  
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 0d0ab32..f98146b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -738,7 +738,8 @@ extern int intel_svm_enable_prq(struct intel_iommu 
> *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
>  int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
> struct iommu_gpasid_bind_data *data);
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> +int intel_svm_unbind_gpasid(struct iommu_domain *domain,
> + struct device *dev, u32 pasid);
>  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
>void *drvdata);
>  void intel_svm_unbind(struct iommu_sva *handle);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index b1ff702..80467fc 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -303,7 +303,8 @@ struct iommu_ops {
>   int (*sva_bind_gpasid)(struct iommu_domain *domain,
>   struct device *dev, struct iommu_gpasid_bind_data 
> *data);
>  
> - int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> + int (*sva_unbind_gpasid)(struct iommu_domain *domain,
> +  struct device *dev, u32 pasid);
>  
>   int (*def_domain_type)(struct device *dev);
>  


Re: [PATCH v6 07/15] vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)

2020-08-20 Thread Alex Williamson

On Mon, 27 Jul 2020 23:27:36 -0700
Liu Yi L  wrote:

> This patch allows userspace to request PASID allocation/free, e.g. when
> serving the request from the guest.
> 
> PASIDs that are not freed by userspace are automatically freed when the
> IOASID set is destroyed when process exits.
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Yi Sun 
> Signed-off-by: Jacob Pan 
> ---
> v5 -> v6:
> *) address comments from Eric against v5. remove the alloc/free helper.
> 
> v4 -> v5:
> *) address comments from Eric Auger.
> *) the comments for the PASID_FREE request is addressed in patch 5/15 of
>this series.
> 
> v3 -> v4:
> *) address comments from v3, except the below comment against the range
>of PASID_FREE request. needs more help on it.
> "> +if (req.range.min > req.range.max)  
> 
>  Is it exploitable that a user can spin the kernel for a long time in
>  the case of a free by calling this with [0, MAX_UINT] regardless of
>  their actual allocations?"
> https://lore.kernel.org/linux-iommu/20200702151832.048b4...@x1.home/
> 
> v1 -> v2:
> *) move the vfio_mm related code to be a separate module
> *) use a single structure for alloc/free, could support a range of PASIDs
> *) fetch vfio_mm at group_attach time instead of at iommu driver open time
> ---
>  drivers/vfio/Kconfig|  1 +
>  drivers/vfio/vfio_iommu_type1.c | 69 
> +
>  drivers/vfio/vfio_pasid.c   | 10 ++
>  include/linux/vfio.h|  6 
>  include/uapi/linux/vfio.h   | 37 ++
>  5 files changed, 123 insertions(+)
> 
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 3d8a108..95d90c6 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -2,6 +2,7 @@
>  config VFIO_IOMMU_TYPE1
>   tristate
>   depends on VFIO
> + select VFIO_PASID if (X86)
>   default n
>  
>  config VFIO_IOMMU_SPAPR_TCE
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 18ff0c3..ea89c7c 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -76,6 +76,7 @@ struct vfio_iommu {
>   booldirty_page_tracking;
>   boolpinned_page_dirty_scope;
>   struct iommu_nesting_info   *nesting_info;
> + struct vfio_mm  *vmm;
>  };
>  
>  struct vfio_domain {
> @@ -1937,6 +1938,11 @@ static void vfio_iommu_iova_insert_copy(struct 
> vfio_iommu *iommu,
>  
>  static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
>  {
> + if (iommu->vmm) {
> + vfio_mm_put(iommu->vmm);
> + iommu->vmm = NULL;
> + }
> +
>   kfree(iommu->nesting_info);
>   iommu->nesting_info = NULL;
>  }
> @@ -2071,6 +2077,26 @@ static int vfio_iommu_type1_attach_group(void 
> *iommu_data,
>   iommu->nesting_info);
>   if (ret)
>   goto out_detach;
> +
> + if (iommu->nesting_info->features &
> + IOMMU_NESTING_FEAT_SYSWIDE_PASID) {
> + struct vfio_mm *vmm;
> + int sid;
> +
> + vmm = vfio_mm_get_from_task(current);
> + if (IS_ERR(vmm)) {
> + ret = PTR_ERR(vmm);
> + goto out_detach;
> + }
> + iommu->vmm = vmm;
> +
> + sid = vfio_mm_ioasid_sid(vmm);
> + ret = iommu_domain_set_attr(domain->domain,
> + DOMAIN_ATTR_IOASID_SID,
> + &sid);
> + if (ret)
> + goto out_detach;
> + }
>   }
>  
>   /* Get aperture info */
> @@ -2859,6 +2885,47 @@ static int vfio_iommu_type1_dirty_pages(struct 
> vfio_iommu *iommu,
>   return -EINVAL;
>  }
>  
> +static int vfio_iommu_type1_pasid_request(struct vfio_iommu *iommu,
> +   unsigned long arg)
> +{
> + struct vfio_iommu_type1_pasid_request req;
> + unsigned long minsz;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, range);
> +
> + if (copy_from_user(&req, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (req.argsz < minsz || (req.flags & ~VFIO_PASID_REQUEST_MASK))
> + return -EINVAL;
> +
> + if (req.range.min > req.range.max)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + if (!iommu->vmm) {
> + mutex_unlock(&iommu->lock);
> + return -EOPNOTSUPP;
> + }
> +
> + switch (req.flags & VFIO_PASID_REQUEST_MASK) {
> +

Re: [PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Dmitry Osipenko

20.08.2020 18:08, Robin Murphy wrote:
> Now that arch/arm is wired up for default domains and iommu-dma,
> implement the corresponding driver-side support for DMA domains.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/tegra-gart.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
> index fac720273889..e081387080f6 100644
> --- a/drivers/iommu/tegra-gart.c
> +++ b/drivers/iommu/tegra-gart.c
> @@ -9,6 +9,7 @@
>  
>  #define dev_fmt(fmt) "gart: " fmt
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -145,16 +146,22 @@ static struct iommu_domain 
> *gart_iommu_domain_alloc(unsigned type)
>  {
>   struct iommu_domain *domain;

Hello, Robin!

Tegra20 GART isn't a real IOMMU, but a small relocation aperture. We
would only want to use it for temporary mappings (managed by the GPU
driver) while the GPU hardware is busy working with sparse DMA
buffers; the driver will take care of unmapping the sparse buffers
once the GPU work is finished [1]. In the case of contiguous DMA
buffers, we want to bypass the IOMMU and use the buffer's physical
address, because the GART aperture is small and the buffers simply can't
all fit into it for complex GPU operations that involve multiple
buffers [2][3]. The upstream GPU driver still doesn't support GART, but
eventually it needs to be changed.

[1]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L489

[2]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/gart.c#L542

[3]
https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L90

> - if (type != IOMMU_DOMAIN_UNMANAGED)
> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>   return NULL;

Will a returned NULL tell the IOMMU core that the implicit domain shouldn't
be used? Is it possible to leave this driver as-is?

Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-20 Thread Dmitry Osipenko

20.08.2020 22:51, Dmitry Osipenko wrote:
> Alternatively, the Tegra SMMU could be changed such that the devices
> will be attached to a domain at the time of a first IOMMU mapping
> invocation instead of attaching at the time of attach_dev() callback
> invocation.
> 
> Or maybe even IOMMU core could be changed to attach devices at the time
> of the first IOMMU mapping invocation? This could be a universal
> solution for all drivers.

Although, please scratch this :) I'll need to revisit how the DMA mapping
API works.

Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-20 Thread Sakari Ailus

On Thu, Aug 20, 2020 at 06:25:19PM +0100, Robin Murphy wrote:
> On 2020-08-20 17:53, Sakari Ailus wrote:
> > Hi Robin,
> > 
> > On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
> > > Now that arch/arm is wired up for default domains and iommu-dma, devices
> > > behind IOMMUs will get mappings set up automatically as appropriate, so
> > > there is no need for drivers to do so manually.
> > > 
> > > Signed-off-by: Robin Murphy 
> > 
> > Thanks for the patch.
> 
> Many thanks for testing so quickly!
> 
> > I haven't looked at the details but it seems that this causes the buffer
> > memory allocation to be physically contiguous, which causes a failure to
> > allocate video buffers of entirely normal size. I guess that was not
> > intentional?
> 
> Hmm, it looks like the device ends up with the wrong DMA ops, which implies
> something didn't go as expected with the earlier IOMMU setup and default
> domain creation. Chances are that either I missed some subtlety in the
> omap_iommu change, or I've fundamentally misjudged how the ISP probing works
> and it never actually goes down the of_iommu_configure() path in the first
> place. Do you get any messages from the IOMMU layer earlier on during boot?

I do get these:

[2.934936] iommu: Default domain type: Translated 
[2.940917] omap-iommu 480bd400.mmu: 480bd400.mmu registered
[2.946899] platform 480bc000.isp: Adding to iommu group 0

-- 
Sakari Ailus

Re: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

2020-08-20 Thread Alex Williamson

On Mon, 27 Jul 2020 23:27:33 -0700
Liu Yi L  wrote:

> This patch exports iommu nesting capability info to user space through
> VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
> 
> The nesting info is available only after container set to be NESTED type.
> Current implementation imposes one limitation - one nesting container
> should include at most one iommu group. The philosophy of vfio container
> is having all groups/devices within the container share the same IOMMU
> context. When vSVA is enabled, one IOMMU context could include one 2nd-
> level address space and multiple 1st-level address spaces. While the
> 2nd-level address space is reasonably sharable by multiple groups, blindly
> sharing 1st-level address spaces across all groups within the container
> might instead break the guest expectation. In the future sub/super container
> concept might be introduced to allow partial address space sharing within
> an IOMMU context. But for now let's go with this restriction by requiring
> singleton container for using nesting iommu features. Below link has the
> related discussion about this decision.
> 
> https://lore.kernel.org/kvm/20200515115924.37e69...@w520.home/
> 
> This patch also changes the NESTING type container behaviour. Something
> that would have succeeded before will now fail: Before this series, if
> user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
> if the SMMU didn't support stage-2, as the driver would have silently
> fallen back on stage-1 mappings (which work exactly the same as stage-2
> only since there was no nesting supported). After the series, we do check
> for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and
> the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
> fix and completely harmless. Detail can be found in below link as well.
> 
> https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> 
> Cc: Kevin Tian 
> CC: Jacob Pan 
> Cc: Alex Williamson 
> Cc: Eric Auger 
> Cc: Jean-Philippe Brucker 
> Cc: Joerg Roedel 
> Cc: Lu Baolu 
> Signed-off-by: Liu Yi L 
> ---
> v5 -> v6:
> *) address comments against v5 from Eric Auger.
> *) don't report nesting cap to userspace if the nesting_info->format is
>invalid.
> 
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
>cap is much "cheap", if needs extension in future, just define another cap.
>https://lore.kernel.org/kvm/20200708132947.5b7ee...@x1.home/
> 
> v3 -> v4:
> *) address comments against v3.
> 
> v1 -> v2:
> *) added in v2
> ---
>  drivers/vfio/vfio_iommu_type1.c | 106 
> +++-
>  include/uapi/linux/vfio.h   |  19 +++
>  2 files changed, 113 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3bd70ff..18ff0c3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
>"Maximum number of user DMA mappings per container (65535).");
>  
>  struct vfio_iommu {
> - struct list_headdomain_list;
> - struct list_headiova_list;
> - struct vfio_domain  *external_domain; /* domain for external user */
> - struct mutexlock;
> - struct rb_root  dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned intdma_avail;
> - uint64_tpgsize_bitmap;
> - boolv2;
> - boolnesting;
> - booldirty_page_tracking;
> - boolpinned_page_dirty_scope;
> + struct list_headdomain_list;
> + struct list_headiova_list;
> + /* domain for external user */
> + struct vfio_domain  *external_domain;
> + struct mutexlock;
> + struct rb_root  dma_list;
> + struct blocking_notifier_head   notifier;
> + unsigned intdma_avail;
> + uint64_tpgsize_bitmap;
> + boolv2;
> + boolnesting;
> + booldirty_page_tracking;
> + boolpinned_page_dirty_scope;
> + struct iommu_nesting_info   *nesting_info;
>  };
>  
>  struct vfio_domain {
> @@ -130,6 +132,9 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +#define CONTAINER_HAS_DOMAIN(iommu)  (((iommu)->external_domain) || \
> +  (!list_empty(&(i

Re: [PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-20 Thread Dmitry Osipenko

20.08.2020 18:08, Robin Murphy wrote:
> Now that arch/arm is wired up for default domains and iommu-dma, we no
> longer need to work around the arch-private mapping.
> 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/staging/media/tegra-vde/iommu.c | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/staging/media/tegra-vde/iommu.c 
> b/drivers/staging/media/tegra-vde/iommu.c
> index 6af863d92123..4f770189ed34 100644
> --- a/drivers/staging/media/tegra-vde/iommu.c
> +++ b/drivers/staging/media/tegra-vde/iommu.c
> @@ -10,10 +10,6 @@
>  #include 
>  #include 
>  
> -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
> -#include 
> -#endif
> -
>  #include "vde.h"
>  
>  int tegra_vde_iommu_map(struct tegra_vde *vde,
> @@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde)
>   if (!vde->group)
>   return 0;
>  
> -#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
> - if (dev->archdata.mapping) {
> - struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
> -
> - arm_iommu_detach_device(dev);
> - arm_iommu_release_mapping(mapping);
> - }
> -#endif
>   vde->domain = iommu_domain_alloc(&platform_bus_type);
>   if (!vde->domain) {
>   err = -ENOMEM;
> 

Hello, Robin! Thank you for your work!

Some drivers, like this Tegra VDE (Video Decoder Engine) driver for
example, do not want to use an implicit IOMMU domain. The Tegra VDE driver
relies on an explicit IOMMU domain in the case of the Tegra SMMU because the
VDE hardware can't access the last page of the AS and because the driver
wants to reserve some fixed addresses [1].

[1]
https://elixir.bootlin.com/linux/v5.9-rc1/source/drivers/staging/media/tegra-vde/iommu.c#L100

The Tegra30 SoC supports up to 4 domains, hence it can't afford to waste
them on unused implicit domains. I think this needs to be addressed
before this patch can be applied.

Would it be possible for IOMMU drivers to gain support for filtering out
devices in iommu_domain_alloc(dev, type)? Then perhaps Tegra SMMU driver
could simply return NULL in a case of type=IOMMU_DOMAIN_DMA and
dev=tegra-vde.

Alternatively, the Tegra SMMU could be changed such that the devices
will be attached to a domain at the time of a first IOMMU mapping
invocation instead of attaching at the time of attach_dev() callback
invocation.

Or maybe even IOMMU core could be changed to attach devices at the time
of the first IOMMU mapping invocation? This could be a universal
solution for all drivers.


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa

On Thu, Aug 20, 2020 at 6:52 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote:
> > > Of course this still uses the scatterlist structure with its annoying
> > > mix of input and output parameters, so I'd rather not expose it as
> > > an official API at the DMA layer.
> >
> > The problem with the above open coded approach is that it requires
> > explicit handling of the non-IOMMU and IOMMU cases and this is exactly
> > what we don't want to have in vb2 and what was actually the job of the
> > DMA API to hide. Is the plan to actually move the IOMMU handling out
> > of the DMA API?
> >
> > Do you think we could instead turn it into a dma_alloc_noncoherent()
> > helper, which has similar semantics as dma_alloc_attrs() and handles
> > the various corner cases (e.g. invalidate_kernel_vmap_range and
> > flush_kernel_vmap_range) to achieve the desired functionality without
> > delegating the "hell", as you called it, to the users?
>
> Yes, I guess I could do something in that direction.  At least for
> dma-iommu, which thanks to Robin should be all you'll need in the
> foreseeable future.

That would be really great. Let me know if we can help by testing with
V4L2/vb2 or in any other way.

Best regards,
Tomasz

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa

On Thu, Aug 20, 2020 at 6:54 PM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote:
> > The UAPI and V4L2/videobuf2 changes are in good shape and the only
> > wrong part is the use of DMA API, which was based on an earlier email
> > guidance anyway, and a change to the synchronization part. I find
> > conclusions like the above insulting for people who put many hours
> > into designing and implementing the related functionality, given the
> > complexity of the videobuf2 framework and how ill-defined the DMA API
> > was, and would feel better if such could be avoided in future
> > communication.
>
> It wasn't meant to be too insulting, but I found this out when trying
> to figure out how to just disable it.  But it also ends up using
> the actual dma attr flags for its own consistency checks, so just
> not setting the flag did not turn out to work that easily.
>

Yes, sadly videobuf2 ended up becoming quite counterintuitive after
growing over the years, and that is reflected in the design of this
feature as well. I think we need to do something about it.

> But in general it helps to add a few more people to the Cc list for
> such things that do stranger things.  Especially if you think you did
> it based on the advice of those people.

Indeed, we should have CCed you and other DMA folks. Sergey who worked
on this series is quite new to these areas of the kernel (although not
to the kernel itself) and it's my fault for not explicitly letting him
know to do that.

Best regards,
Tomasz

Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy


On 2020-08-20 17:53, Sakari Ailus wrote:
> Hi Robin,
>
> On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
>> Now that arch/arm is wired up for default domains and iommu-dma, devices
>> behind IOMMUs will get mappings set up automatically as appropriate, so
>> there is no need for drivers to do so manually.
>>
>> Signed-off-by: Robin Murphy 
>
> Thanks for the patch.

Many thanks for testing so quickly!

> I haven't looked at the details but it seems that this causes the buffer
> memory allocation to be physically contiguous, which causes a failure to
> allocate video buffers of entirely normal size. I guess that was not
> intentional?

Hmm, it looks like the device ends up with the wrong DMA ops, which
implies something didn't go as expected with the earlier IOMMU setup and
default domain creation. Chances are that either I missed some subtlety
in the omap_iommu change, or I've fundamentally misjudged how the ISP
probing works and it never actually goes down the of_iommu_configure()
path in the first place. Do you get any messages from the IOMMU layer
earlier on during boot?

Robin.



Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Rob Clark

On Thu, Aug 20, 2020 at 9:58 AM Robin Murphy  wrote:
>
> On 2020-08-20 16:55, Rob Clark wrote:
> > Side note, I suspect we'll end up needing something like
> > 0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b
> > device out of the closet and dust it off, the fix is easy enough.
> > Just wanted to mention that here so anyone with a 32b device could
> > find what is needed.
>
> FWIW there shouldn't be any material change here - the generic default
> domain is installed under the same circumstances as the Arm
> dma_iommu_mapping was, so if any platform does have an issue, then it
> should already have started 4 years ago with f78ebca8ff3d ("iommu/msm: Add
> support for generic master bindings").

ok, it has, I guess, been a while since playing with 32b things..
someone on IRC had mentioned a problem that sounded like what
0e764a01015dfebff8a8ffd297d74663772e248a solved, unless they disabled
some ARCH_HAS_xyz thing (IIRC), which I guess is related..

BR,
-R

> Robin.
>
> >
> > BR,
> > -R
> >
> > On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy  wrote:
> >>
> >> Now that arch/arm is wired up for default domains and iommu-dma,
> >> implement the corresponding driver-side support for DMA domains.
> >>
> >> Signed-off-by: Robin Murphy 
> >> ---
> >>   drivers/iommu/msm_iommu.c | 7 ++-
> >>   1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> >> index 3615cd6241c4..f34efcbb0b2b 100644
> >> --- a/drivers/iommu/msm_iommu.c
> >> +++ b/drivers/iommu/msm_iommu.c
> >> @@ -8,6 +8,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> @@ -314,13 +315,16 @@ static struct iommu_domain 
> >> *msm_iommu_domain_alloc(unsigned type)
> >>   {
> >>  struct msm_priv *priv;
> >>
> >> -   if (type != IOMMU_DOMAIN_UNMANAGED)
> >> +   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> >>  return NULL;
> >>
> >>  priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> >>  if (!priv)
> >>  goto fail_nomem;
> >>
> >> +   if (type == IOMMU_DOMAIN_DMA && 
> >> iommu_get_dma_cookie(&priv->domain))
> >> +   goto fail_nomem;
> >> +
> >>  INIT_LIST_HEAD(&priv->list_attached);
> >>
> >>  priv->domain.geometry.aperture_start = 0;
> >> @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain 
> >> *domain)
> >>  struct msm_priv *priv;
> >>  unsigned long flags;
> >>
> >> +   iommu_put_dma_cookie(domain);
> >>  spin_lock_irqsave(&msm_iommu_lock, flags);
> >>  priv = to_msm_priv(domain);
> >>  kfree(priv);
> >> --
> >> 2.28.0.dirty
> >>
> >> ___
> >> dri-devel mailing list
> >> dri-de...@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-20 Thread Sakari Ailus

Hi Robin,

On Thu, Aug 20, 2020 at 04:08:36PM +0100, Robin Murphy wrote:
> Now that arch/arm is wired up for default domains and iommu-dma, devices
> behind IOMMUs will get mappings set up automatically as appropriate, so
> there is no need for drivers to do so manually.
> 
> Signed-off-by: Robin Murphy 

Thanks for the patch.

I haven't looked at the details but it seems that this causes the buffer
memory allocation to be physically contiguous, which causes a failure to
allocate video buffers of entirely normal size. I guess that was not
intentional?
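
As a rough sanity check on the trace below, and assuming that the
010ec000 value in r7 is the requested buffer size, that pages are 4 KiB,
and that MAX_ORDER has its common default of 11 (it is configurable), the
required page order can be worked out with a small userspace stand-in
for get_order(). The names order_for_size() and order_is_allocatable()
are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER  11  /* assumption: default value, configurable */

/*
 * Userspace stand-in for the kernel's get_order(): the smallest order
 * such that (1 << order) pages cover "size" bytes.
 */
unsigned int order_for_size(size_t size)
{
    size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
    size_t pages;
    unsigned int order = 0;

    assert(size > 0);

    pages = (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* The page allocator only serves orders strictly below MAX_ORDER. */
int order_is_allocatable(unsigned int order)
{
    return order < SKETCH_MAX_ORDER;
}
```

Under those assumptions a 0x10ec000-byte (about 17 MiB) buffer needs
order 13, which matches the 000d seen in the trace and is beyond what
the page allocator will hand out as one physically contiguous block.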

-8<---
[  218.934448] WARNING: CPU: 0 PID: 1994 at mm/page_alloc.c:4859 
__alloc_pages_nodemask+0x9c/0xb1c
[  218.943847] Modules linked in: omap3_isp videobuf2_dma_contig 
videobuf2_memops videobuf2_v4l2 videobuf2_common leds_as3645a smiapp 
v4l2_flash_led_class led_class_flash v4l2_fwnode smiapp_pll videodev leds_gpio 
mc led_class
[  218.964660] CPU: 0 PID: 1994 Comm: yavta Not tainted 5.9.0-rc1-dirty #1818
[  218.972442] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[  218.978973] Backtrace: 
[  218.981842] [] (dump_backtrace) from [] 
(show_stack+0x20/0x24)
[  218.989715]  r7: r6:0009 r5:c08f03bc r4:c08f2fef
[  218.995880] [] (show_stack) from [] 
(dump_stack+0x28/0x30)
[  219.003631] [] (dump_stack) from [] (__warn+0x100/0x118)
[  219.010955]  r5:c08f03bc r4:
[  219.014953] [] (__warn) from [] 
(warn_slowpath_fmt+0x84/0xa8)
[  219.022949]  r9:c0232090 r8:c08f03bc r7:c0b08a88 r6:0009 r5:12fb 
r4:
[  219.031036] [] (warn_slowpath_fmt) from [] 
(__alloc_pages_nodemask+0x9c/0xb1c)
[  219.040557]  r9:c0185c3c r8: r7:010ec000 r6: r5:000d 
r4:
[  219.048858] [] (__alloc_pages_nodemask) from [] 
(__dma_alloc_buffer.constprop.14+0x3c/0x90)
[  219.059570]  r10:0cc0 r9:c0185c3c r8: r7:010ec000 r6:000d 
r5:c0b08a88
[  219.067901]  r4:0cc0
[  219.070587] [] (__dma_alloc_buffer.constprop.14) from [] 
(remap_allocator_alloc+0x34/0x7c)
[  219.081207]  r9:c0185c3c r8:0247 r7:e6d7fb84 r6:010ec000 r5:c0b08a88 
r4:0001
[  219.089263] [] (remap_allocator_alloc) from [] 
(__dma_alloc+0x124/0x21c)
[  219.098236]  r9:ed99fc10 r8:e69aa890 r7: r6: r5:c0b08a88 
r4:e6fdd680
[  219.106536] [] (__dma_alloc) from [] 
(arm_dma_alloc+0x68/0x74)
[  219.114654]  r10:0cc0 r9:c0185c3c r8:0cc0 r7:e69aa890 r6:010ec000 
r5:ed99fc10
[  219.122985]  r4:
[  219.125671] [] (arm_dma_alloc) from [] 
(dma_alloc_attrs+0xe4/0x120)
[  219.134216]  r9: r8:e69aa890 r7:010ec000 r6:c0b08a88 r5:ed99fc10 
r4:c010f634
[  219.142517] [] (dma_alloc_attrs) from [] 
(vb2_dc_alloc+0xcc/0x108 [videobuf2_dma_contig])
[  219.153076]  r10:e6885ca8 r9:e6abfc48 r8:0002 r7: r6:010ec000 
r5:ed99fc10
[  219.161407]  r4:e69aa880
[  219.164184] [] (vb2_dc_alloc [videobuf2_dma_contig]) from 
[] (__vb2_queue_alloc+0x258/0x4a4 [videobuf2_common])
[  219.176696]  r8:bf095b70 r7:010ec000 r6: r5:e6885ca8 r4:e6abfc00
[  219.183959] [] (__vb2_queue_alloc [videobuf2_common]) from 
[] (vb2_core_reqbufs+0x408/0x498 [videobuf2_common])
[  219.196533]  r10:e6885ce8 r9: r8:e6d7fe24 r7:e6d7fcec r6:bf09ced4 
r5:bf088580
[  219.204895]  r4:e6885ca8
[  219.207672] [] (vb2_core_reqbufs [videobuf2_common]) from 
[] (vb2_reqbufs+0x64/0x70 [videobuf2_v4l2])
[  219.219268]  r10: r9:bf032bc0 r8:c0145608 r7:bf0ad4a4 r6:e6885ca8 
r5:
[  219.227600]  r4:e6d7fe24
[  219.230499] [] (vb2_reqbufs [videobuf2_v4l2]) from [] 
(isp_video_reqbufs+0x40/0x54 [omap3_isp])
[  219.241607]  r7:bf0ad4a4 r6:e6d7fe24 r5:e6885c00 r4:e6cca928
[  219.247924] [] (isp_video_reqbufs [omap3_isp]) from [] 
(v4l_reqbufs+0x4c/0x50 [videodev])
[  219.258514]  r7:bf0ad4a4 r6:e6885c00 r5:e6d7fe24 r4:e7efbec0
[  219.264984] [] (v4l_reqbufs [videodev]) from [] 
(__video_do_ioctl+0x2d8/0x414 [videodev])
[  219.275512]  r7:bf01de00 r6: r5: r4:e6cca2e0
[  219.281982] [] (__video_do_ioctl [videodev]) from [] 
(video_usercopy+0x144/0x508 [videodev])
[  219.292816]  r10:e7efbec0 r9:c0145608 r8:e6d7fe24 r7: r6: 
r5:bf01ebdc
[  219.300933]  r4:c0145608
[  219.304168] [] (video_usercopy [videodev]) from [] 
(video_ioctl2+0x1c/0x24 [videodev])
[  219.314453]  r10:e7fbfda0 r9:e7efbec0 r8:0003 r7: r6:bee658f4 
r5:c0145608
[  219.322784]  r4:e7efbec0
[  219.325775] [] (video_ioctl2 [videodev]) from [] 
(v4l2_ioctl+0x50/0x64 [videodev])
[  219.335845] [] (v4l2_ioctl [videodev]) from [] 
(vfs_ioctl+0x30/0x44)
[  219.344482]  r7: r6:e7efbec0 r5:bee658f4 r4:c0145608
[  219.350402] [] (vfs_ioctl) from [] (sys_ioctl+0xdc/0x7ec)
[  219.358062] [] (sys_ioctl) from [] 
(ret_fast_syscall+0x0/0x28)
[  219.366149] Exception stack(0xe6d7ffa8 to 0xe6d7fff0)
[  219.371673] ffa0:    bee65c1a 0003 c0145608 
bee658f4 0001
[  219.380157] ffc0:  bee65c1a  0036 09a0  
ef30 010

Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy


On 2020-08-20 16:55, Rob Clark wrote:
> Side note, I suspect we'll end up needing something like
> 0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b
> device out of the closet and dust it off, the fix is easy enough.
> Just wanted to mention that here so anyone with a 32b device could
> find what is needed.

FWIW there shouldn't be any material change here - the generic default
domain is installed under the same circumstances as the Arm
dma_iommu_mapping was, so if any platform does have an issue, then it
should already have started 4 years ago with f78ebca8ff3d ("iommu/msm: Add
support for generic master bindings").

Robin.

> BR,
> -R
>
> On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy  wrote:
>>
>> Now that arch/arm is wired up for default domains and iommu-dma,
>> implement the corresponding driver-side support for DMA domains.
>>
>> Signed-off-by: Robin Murphy 
>> ---
>>   drivers/iommu/msm_iommu.c | 7 ++-
>>   1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
>> index 3615cd6241c4..f34efcbb0b2b 100644
>> --- a/drivers/iommu/msm_iommu.c
>> +++ b/drivers/iommu/msm_iommu.c
>> @@ -8,6 +8,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -314,13 +315,16 @@ static struct iommu_domain 
>> *msm_iommu_domain_alloc(unsigned type)
>>   {
>>  struct msm_priv *priv;
>>
>> -   if (type != IOMMU_DOMAIN_UNMANAGED)
>> +   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>>  return NULL;
>>
>>  priv = kzalloc(sizeof(*priv), GFP_KERNEL);
>>  if (!priv)
>>  goto fail_nomem;
>>
>> +   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain))
>> +   goto fail_nomem;
>> +
>>  INIT_LIST_HEAD(&priv->list_attached);
>>
>>  priv->domain.geometry.aperture_start = 0;
>> @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain 
>> *domain)
>>  struct msm_priv *priv;
>>  unsigned long flags;
>>
>> +   iommu_put_dma_cookie(domain);
>>  spin_lock_irqsave(&msm_iommu_lock, flags);
>>  priv = to_msm_priv(domain);
>>  kfree(priv);
>> --
>> 2.28.0.dirty
>>
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Christoph Hellwig

On Thu, Aug 20, 2020 at 12:05:29PM +0200, Tomasz Figa wrote:
> The UAPI and V4L2/videobuf2 changes are in good shape and the only
> wrong part is the use of DMA API, which was based on an earlier email
> guidance anyway, and a change to the synchronization part. I find
> conclusions like the above insulting for people who put many hours
> into designing and implementing the related functionality, given the
> complexity of the videobuf2 framework and how ill-defined the DMA API
> was, and would feel better if such could be avoided in future
> communication.

It wasn't meant to be too insulting, but I found this out when trying
to figure out how to just disable it.  But it also ends up using
the actual dma attr flags for its own consistency checks, so just
not setting the flag did not turn out to work that easily.

But in general it helps to add a few more people to the Cc list for
such things that do stranger things.  Especially if you think you did
it based on the advice of those people.

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Christoph Hellwig

On Thu, Aug 20, 2020 at 12:24:31PM +0200, Tomasz Figa wrote:
> > Of course this still uses the scatterlist structure with its annoying
> > mix of input and output parameters, so I'd rather not expose it as
> > an official API at the DMA layer.
> 
> The problem with the above open coded approach is that it requires
> explicit handling of the non-IOMMU and IOMMU cases and this is exactly
> what we don't want to have in vb2 and what was actually the job of the
> DMA API to hide. Is the plan to actually move the IOMMU handling out
> of the DMA API?
> 
> Do you think we could instead turn it into a dma_alloc_noncoherent()
> helper, which has similar semantics as dma_alloc_attrs() and handles
> the various corner cases (e.g. invalidate_kernel_vmap_range and
> flush_kernel_vmap_range) to achieve the desired functionality without
> delegating the "hell", as you called it, to the users?

Yes, I guess I could do something in that direction.  At least for
dma-iommu, which thanks to Robin should be all you'll need in the
foreseeable future.

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Christoph Hellwig

On Thu, Aug 20, 2020 at 12:09:34PM +0200, Tomasz Figa wrote:
> > I'm happy to Cc any active participant in the discussion.  I'm not
> > going to add all reviewers because even with the trimmed CC list
> > I'm already hitting the number of recipients limit on various lists.
> 
> Fair enough.
> 
> We'll make your job easier and just turn my MAINTAINERS entry into a
> maintainer. :)

Sounds like a plan.

[GIT PULL] dma-mapping fixes for 5.9

2020-08-20 Thread Christoph Hellwig

The following changes since commit a1d21081a60dfb7fddf4a38b66d9cef603b317a9:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2020-08-13 20:03:11 -0700)

are available in the Git repository at:

  git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-5.9-1

for you to fetch changes up to d7e673ec2c8e0ea39c4c70fc490d67d7fbda869d:

  dma-pool: Only allocate from CMA when in same memory zone (2020-08-14 16:27:05 +0200)


dma-mapping fixes for 5.9

 - fix more fallout from the dma-pool changes
   (Nicolas Saenz Julienne, me)


Christoph Hellwig (1):
  dma-pool: fix coherent pool allocations for IOMMU mappings

Nicolas Saenz Julienne (1):
  dma-pool: Only allocate from CMA when in same memory zone

 drivers/iommu/dma-iommu.c   |   4 +-
 include/linux/dma-direct.h  |   3 -
 include/linux/dma-mapping.h |   5 +-
 kernel/dma/direct.c |  13 ++--
 kernel/dma/pool.c   | 145 
 5 files changed, 92 insertions(+), 78 deletions(-)

Re: [PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Rob Clark

Side note, I suspect we'll end up needing something like
0e764a01015dfebff8a8ffd297d74663772e248a .. if someone can dig a 32b
device out of the closet and dust it off, the fix is easy enough.
Just wanted to mention that here so anyone with a 32b device could
find what is needed.

BR,
-R

On Thu, Aug 20, 2020 at 8:09 AM Robin Murphy  wrote:
>
> Now that arch/arm is wired up for default domains and iommu-dma,
> implement the corresponding driver-side support for DMA domains.
>
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/msm_iommu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index 3615cd6241c4..f34efcbb0b2b 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -8,6 +8,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -314,13 +315,16 @@ static struct iommu_domain 
> *msm_iommu_domain_alloc(unsigned type)
>  {
> struct msm_priv *priv;
>
> -   if (type != IOMMU_DOMAIN_UNMANAGED)
> +   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> return NULL;
>
> priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> if (!priv)
> goto fail_nomem;
>
> +   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain))
> +   goto fail_nomem;
> +
> INIT_LIST_HEAD(&priv->list_attached);
>
> priv->domain.geometry.aperture_start = 0;
> @@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain 
> *domain)
> struct msm_priv *priv;
> unsigned long flags;
>
> +   iommu_put_dma_cookie(domain);
> spin_lock_irqsave(&msm_iommu_lock, flags);
> priv = to_msm_priv(domain);
> kfree(priv);
> --
> 2.28.0.dirty
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH 18/18] ARM/dma-mapping: Remove legacy dma-iommu API

2020-08-20 Thread Robin Murphy

With no users left and generic iommu-dma now doing all the work,
clean up the last traces of the arch-specific API, plus the temporary
workarounds that you'd forgotten about because you were thinking about
zebras instead.

Signed-off-by: Robin Murphy 
---
 arch/arm/common/dmabounce.c  |   1 -
 arch/arm/include/asm/device.h|   9 --
 arch/arm/include/asm/dma-iommu.h |  29 -
 arch/arm/mm/dma-mapping.c| 200 +--
 drivers/iommu/dma-iommu.c|  38 ++
 5 files changed, 11 insertions(+), 266 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index f4b719bde763..064349df7bbf 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -30,7 +30,6 @@
 #include 
 
 #include 
-#include 
 
 #undef STATS
 
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index be666f58bf7a..db33f389c94e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -8,9 +8,6 @@
 struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
struct dmabounce_device_info *dmabounce;
-#endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-   struct dma_iommu_mapping*mapping;
 #endif
unsigned int dma_coherent:1;
unsigned int dma_ops_setup:1;
@@ -24,10 +21,4 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
 #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
deleted file mode 100644
index f39cfa509fe4..
--- a/arch/arm/include/asm/dma-iommu.h
+++ /dev/null
@@ -1,29 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef ASMARM_DMA_IOMMU_H
-#define ASMARM_DMA_IOMMU_H
-
-#ifdef __KERNEL__
-
-#include 
-#include 
-#include 
-#include 
-
-struct dma_iommu_mapping {
-   /* iommu specific data */
-   struct iommu_domain *domain;
-
-   struct kref kref;
-};
-
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
-
-int arm_iommu_attach_device(struct device *dev,
-   struct dma_iommu_mapping *mapping);
-void arm_iommu_detach_device(struct device *dev);
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 2ef0afc17645..ff6c4962161a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -33,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -1073,201 +1072,6 @@ static const struct dma_map_ops 
*arm_get_dma_map_ops(bool coherent)
return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
-extern const struct dma_map_ops iommu_dma_ops;
-extern int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
-   u64 size, struct device *dev);
-/**
- * arm_iommu_create_mapping
- * @bus: pointer to the bus holding the client device (for IOMMU calls)
- * @base: start address of the valid IO address space
- * @size: maximum size of the valid IO address space
- *
- * Creates a mapping structure which holds information about used/unused
- * IO address ranges, which is required to perform memory allocation and
- * mapping with IOMMU aware functions.
- *
- * The client device need to be attached to the mapping with
- * arm_iommu_attach_device function.
- */
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
-{
-   struct dma_iommu_mapping *mapping;
-   int err = -ENOMEM;
-
-   mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
-   if (!mapping)
-   goto err;
-
-   mapping->domain = iommu_domain_alloc(bus);
-   if (!mapping->domain)
-   goto err2;
-
-   err = iommu_get_dma_cookie(mapping->domain);
-   if (err)
-   goto err3;
-
-   err = iommu_dma_init_domain(mapping->domain, base, size, NULL);
-   if (err)
-   goto err4;
-
-   kref_init(&mapping->kref);
-   return mapping;
-err4:
-   iommu_put_dma_cookie(mapping->domain);
-err3:
-   iommu_domain_free(mapping->domain);
-err2:
-   kfree(mapping);
-err:
-   return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
-
-static void release_iommu_mapping(struct kref *kref)
-{
-   struct dma_iommu_mapping *mapping =
-   container_of(kref, struct dma_iommu_mapping, kref);
-
-   iommu_put_dma_cookie(mapping->domain);
-   iommu_domain_free(mapping->domain);
-   kfree(mapping);
-}
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
-{
-   if (mapping)
-   kref_put(&mapping->kref, release_iommu_mapping);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_release_ma

[PATCH 07/18] iommu/arm-smmu: Remove arch/arm workaround

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, remove
the add_device workaround.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 10 --
 1 file changed, 10 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 09c42af9f31e..4e52d8cb67dd 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1164,17 +1164,7 @@ static int arm_smmu_attach_dev(struct iommu_domain 
*domain, struct device *dev)
return -ENXIO;
}
 
-   /*
-* FIXME: The arch/arm DMA API code tries to attach devices to its own
-* domains between of_xlate() and probe_device() - we have no way to 
cope
-* with that, so until ARM gets converted to rely on groups and default
-* domains, just say no (but more politely than by dereferencing NULL).
-* This should be at least a WARN_ON once that's sorted.
-*/
cfg = dev_iommu_priv_get(dev);
-   if (!cfg)
-   return -ENODEV;
-
smmu = cfg->smmu;
 
ret = arm_smmu_rpm_get(smmu);
-- 
2.28.0.dirty


[PATCH 10/18] iommu/msm: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/msm_iommu.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 3615cd6241c4..f34efcbb0b2b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -314,13 +315,16 @@ static struct iommu_domain 
*msm_iommu_domain_alloc(unsigned type)
 {
struct msm_priv *priv;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
goto fail_nomem;
 
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain))
+   goto fail_nomem;
+
INIT_LIST_HEAD(&priv->list_attached);
 
priv->domain.geometry.aperture_start = 0;
@@ -339,6 +343,7 @@ static void msm_iommu_domain_free(struct iommu_domain 
*domain)
struct msm_priv *priv;
unsigned long flags;
 
+   iommu_put_dma_cookie(domain);
spin_lock_irqsave(&msm_iommu_lock, flags);
priv = to_msm_priv(domain);
kfree(priv);
-- 
2.28.0.dirty


[PATCH 12/18] iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/tegra-gart.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index fac720273889..e081387080f6 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -9,6 +9,7 @@
 
 #define dev_fmt(fmt)   "gart: " fmt
 
+#include 
 #include 
 #include 
 #include 
@@ -145,16 +146,22 @@ static struct iommu_domain 
*gart_iommu_domain_alloc(unsigned type)
 {
struct iommu_domain *domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
-   if (domain) {
-   domain->geometry.aperture_start = gart_handle->iovmm_base;
-   domain->geometry.aperture_end = gart_handle->iovmm_end - 1;
-   domain->geometry.force_aperture = true;
+   if (!domain)
+   return NULL;
+
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(domain)) {
+   kfree(domain);
+   return NULL;
}
 
+   domain->geometry.aperture_start = gart_handle->iovmm_base;
+   domain->geometry.aperture_end = gart_handle->iovmm_end - 1;
+   domain->geometry.force_aperture = true;
+
return domain;
 }
 
-- 
2.28.0.dirty
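
For readers following the series, the allocation pattern these driver
patches share can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct domain, get_dma_cookie() and
put_dma_cookie() are hypothetical stand-ins for struct iommu_domain,
iommu_get_dma_cookie() and iommu_put_dma_cookie(), and malloc()/free()
stand in for the kernel allocators. The release path is shown for
completeness, mirroring the iommu_put_dma_cookie() calls made elsewhere
in the series.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's domain types; not the real API. */
enum domain_type { DOMAIN_UNMANAGED, DOMAIN_DMA, DOMAIN_IDENTITY };

struct domain {
	enum domain_type type;
	void *dma_cookie;	/* models the per-domain DMA cookie */
};

/* Models iommu_get_dma_cookie(): returns 0 on success, nonzero on failure. */
static int get_dma_cookie(struct domain *d)
{
	d->dma_cookie = malloc(16);
	return d->dma_cookie ? 0 : -1;
}

/* Models iommu_put_dma_cookie(): harmless when no cookie was ever taken. */
static void put_dma_cookie(struct domain *d)
{
	free(d->dma_cookie);
	d->dma_cookie = NULL;
}

static struct domain *domain_alloc(enum domain_type type)
{
	struct domain *d;

	/* Accept DMA domains in addition to unmanaged ones. */
	if (type != DOMAIN_UNMANAGED && type != DOMAIN_DMA)
		return NULL;

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;

	/* Only DMA domains need a cookie; undo the allocation if it fails. */
	if (type == DOMAIN_DMA && get_dma_cookie(d)) {
		free(d);
		return NULL;
	}

	d->type = type;
	return d;
}

static void domain_free(struct domain *d)
{
	put_dma_cookie(d);
	free(d);
}
```

In this model an unmanaged domain never acquires a cookie, so the
unconditional put in domain_free() relies on releasing a missing cookie
being a no-op.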

[PATCH 17/18] media/omap3isp: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, devices
behind IOMMUs will get mappings set up automatically as appropriate, so
there is no need for drivers to do so manually.

Signed-off-by: Robin Murphy 
---
 drivers/media/platform/omap3isp/isp.c | 68 ++-
 drivers/media/platform/omap3isp/isp.h |  3 --
 2 files changed, 3 insertions(+), 68 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index b91e472ee764..196522883231 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -56,10 +56,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#include 
-#endif
-
 #include 
 #include 
 #include 
@@ -1942,51 +1938,6 @@ static int isp_initialize_modules(struct isp_device *isp)
return ret;
 }
 
-static void isp_detach_iommu(struct isp_device *isp)
-{
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-   arm_iommu_detach_device(isp->dev);
-   arm_iommu_release_mapping(isp->mapping);
-   isp->mapping = NULL;
-#endif
-}
-
-static int isp_attach_iommu(struct isp_device *isp)
-{
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-   struct dma_iommu_mapping *mapping;
-   int ret;
-
-   /*
-* Create the ARM mapping, used by the ARM DMA mapping core to allocate
-* VAs. This will allocate a corresponding IOMMU domain.
-*/
-   mapping = arm_iommu_create_mapping(&platform_bus_type, SZ_1G, SZ_2G);
-   if (IS_ERR(mapping)) {
-   dev_err(isp->dev, "failed to create ARM IOMMU mapping\n");
-   return PTR_ERR(mapping);
-   }
-
-   isp->mapping = mapping;
-
-   /* Attach the ARM VA mapping to the device. */
-   ret = arm_iommu_attach_device(isp->dev, mapping);
-   if (ret < 0) {
-   dev_err(isp->dev, "failed to attach device to VA mapping\n");
-   goto error;
-   }
-
-   return 0;
-
-error:
-   arm_iommu_release_mapping(isp->mapping);
-   isp->mapping = NULL;
-   return ret;
-#else
-   return -ENODEV;
-#endif
-}
-
 /*
  * isp_remove - Remove ISP platform device
  * @pdev: Pointer to ISP platform device
@@ -2002,10 +1953,6 @@ static int isp_remove(struct platform_device *pdev)
isp_cleanup_modules(isp);
isp_xclk_cleanup(isp);
 
-   __omap3isp_get(isp, false);
-   isp_detach_iommu(isp);
-   __omap3isp_put(isp, false);
-
media_entity_enum_cleanup(&isp->crashed);
v4l2_async_notifier_cleanup(&isp->notifier);
 
@@ -2383,18 +2330,11 @@ static int isp_probe(struct platform_device *pdev)
isp->mmio_hist_base_phys =
mem->start + isp_res_maps[m].offset[OMAP3_ISP_IOMEM_HIST];
 
-   /* IOMMU */
-   ret = isp_attach_iommu(isp);
-   if (ret < 0) {
-   dev_err(&pdev->dev, "unable to attach to IOMMU\n");
-   goto error_isp;
-   }
-
/* Interrupt */
ret = platform_get_irq(pdev, 0);
if (ret <= 0) {
ret = -ENODEV;
-   goto error_iommu;
+   goto error_isp;
}
isp->irq_num = ret;
 
@@ -2402,13 +2342,13 @@ static int isp_probe(struct platform_device *pdev)
 "OMAP3 ISP", isp)) {
dev_err(isp->dev, "Unable to request IRQ\n");
ret = -EINVAL;
-   goto error_iommu;
+   goto error_isp;
}
 
/* Entities */
ret = isp_initialize_modules(isp);
if (ret < 0)
-   goto error_iommu;
+   goto error_isp;
 
ret = isp_register_entities(isp);
if (ret < 0)
@@ -2433,8 +2373,6 @@ static int isp_probe(struct platform_device *pdev)
isp_unregister_entities(isp);
 error_modules:
isp_cleanup_modules(isp);
-error_iommu:
-   isp_detach_iommu(isp);
 error_isp:
isp_xclk_cleanup(isp);
__omap3isp_put(isp, false);
diff --git a/drivers/media/platform/omap3isp/isp.h b/drivers/media/platform/omap3isp/isp.h
index a9d760fbf349..b50459106d89 100644
--- a/drivers/media/platform/omap3isp/isp.h
+++ b/drivers/media/platform/omap3isp/isp.h
@@ -145,7 +145,6 @@ struct isp_xclk {
  * @syscon: Regmap for the syscon register space
  * @syscon_offset: Offset of the CSIPHY control register in syscon
  * @phy_type: ISP_PHY_TYPE_{3430,3630}
- * @mapping: IOMMU mapping
  * @stat_lock: Spinlock for handling statistics
  * @isp_mutex: Mutex for serializing requests to ISP.
  * @stop_failure: Indicates that an entity failed to stop.
@@ -185,8 +184,6 @@ struct isp_device {
u32 syscon_offset;
u32 phy_type;
 
-   struct dma_iommu_mapping *mapping;
-
/* ISP Obj */
spinlock_t stat_lock;   /* common lock for statistic drivers */
struct mutex isp_mutex; /* For handling ref_count field */
-- 
2.28.0.dirty

[PATCH 15/18] drm/nouveau/tegra: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, we no
longer need to work around the arch-private mapping.

Signed-off-by: Robin Murphy 
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
index d0d52c1d4aee..410ee1f83e0b 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/tegra.c
@@ -23,10 +23,6 @@
 #ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
 #include "priv.h"
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-#include 
-#endif
-
 static int
 nvkm_device_tegra_power_up(struct nvkm_device_tegra *tdev)
 {
@@ -109,15 +105,6 @@ nvkm_device_tegra_probe_iommu(struct nvkm_device_tegra *tdev)
unsigned long pgsize_bitmap;
int ret;
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-   if (dev->archdata.mapping) {
-   struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-   arm_iommu_detach_device(dev);
-   arm_iommu_release_mapping(mapping);
-   }
-#endif
-
if (!tdev->func->iommu_bit)
return;
 
-- 
2.28.0.dirty

[PATCH 16/18] staging/media/tegra-vde: Clean up IOMMU workaround

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, we no
longer need to work around the arch-private mapping.

Signed-off-by: Robin Murphy 
---
 drivers/staging/media/tegra-vde/iommu.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/staging/media/tegra-vde/iommu.c b/drivers/staging/media/tegra-vde/iommu.c
index 6af863d92123..4f770189ed34 100644
--- a/drivers/staging/media/tegra-vde/iommu.c
+++ b/drivers/staging/media/tegra-vde/iommu.c
@@ -10,10 +10,6 @@
 #include 
 #include 
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-#include 
-#endif
-
 #include "vde.h"
 
 int tegra_vde_iommu_map(struct tegra_vde *vde,
@@ -70,14 +66,6 @@ int tegra_vde_iommu_init(struct tegra_vde *vde)
if (!vde->group)
return 0;
 
-#if IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU)
-   if (dev->archdata.mapping) {
-   struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-   arm_iommu_detach_device(dev);
-   arm_iommu_release_mapping(mapping);
-   }
-#endif
vde->domain = iommu_domain_alloc(&platform_bus_type);
if (!vde->domain) {
err = -ENOMEM;
-- 
2.28.0.dirty

[PATCH 06/18] ARM/dma-mapping: Support IOMMU default domains

2020-08-20 Thread Robin Murphy

Now that iommu-dma is wired up, we can let it work as normal
without the dma_iommu_mapping hacks if the IOMMU driver already
supports default domains.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0f69ede44cd7..2ef0afc17645 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1220,6 +1220,13 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
if (!iommu)
return false;
 
+   /* If a default domain exists, just let iommu-dma work normally */
+   if (iommu_get_domain_for_dev(dev)) {
+   iommu_setup_dma_ops(dev, dma_base, size);
+   return true;
+   }
+
+   /* Otherwise, use the workaround until the IOMMU driver is updated */
mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
if (IS_ERR(mapping)) {
pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
@@ -1234,6 +1241,7 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
return false;
}
 
+   set_dma_ops(dev, &iommu_dma_ops);
return true;
 }
 
@@ -1263,8 +1271,6 @@ static void arm_teardown_iommu_dma_ops(struct device *dev) { }
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
-   const struct dma_map_ops *dma_ops;
-
dev->archdata.dma_coherent = coherent;
 #ifdef CONFIG_SWIOTLB
dev->dma_coherent = coherent;
@@ -1278,12 +1284,9 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
if (dev->dma_ops)
return;
 
-   if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-   dma_ops = &iommu_dma_ops;
-   else
-   dma_ops = arm_get_dma_map_ops(coherent);
+   set_dma_ops(dev, arm_get_dma_map_ops(coherent));
 
-   set_dma_ops(dev, dma_ops);
+   arm_setup_iommu_dma_ops(dev, dma_base, size, iommu);
 
 #ifdef CONFIG_XEN
if (xen_initial_domain())
-- 
2.28.0.dirty
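
The resulting control flow in arch_setup_dma_ops() can be summarised
with a small userspace model. This is a sketch only: struct model_dev
and its boolean fields are hypothetical and merely encode the three
questions the patched code asks (is there an IOMMU, does a default
domain already exist, does the legacy mapping succeed). It is not the
kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which set of DMA ops a device ends up with in this model. */
enum dma_ops_kind { OPS_NONE, OPS_ARCH_DIRECT, OPS_IOMMU_DMA };

/* Hypothetical device model; field names are illustrative only. */
struct model_dev {
	bool behind_iommu;	 /* an IOMMU was described for the device */
	bool has_default_domain; /* IOMMU driver already set up a default domain */
	bool legacy_mapping_ok;	 /* whether the legacy workaround would succeed */
	enum dma_ops_kind ops;
};

/* Models arm_setup_iommu_dma_ops() after the patch. */
static bool setup_iommu_dma_ops(struct model_dev *dev)
{
	if (!dev->behind_iommu)
		return false;

	/* A default domain exists: let iommu-dma work normally. */
	if (dev->has_default_domain) {
		dev->ops = OPS_IOMMU_DMA;
		return true;
	}

	/* Otherwise fall back to the legacy per-device mapping workaround. */
	if (!dev->legacy_mapping_ok)
		return false;

	dev->ops = OPS_IOMMU_DMA;
	return true;
}

/* Models arch_setup_dma_ops(): install the arch ops first, then try the IOMMU. */
static void setup_dma_ops(struct model_dev *dev)
{
	if (dev->ops != OPS_NONE)
		return;

	dev->ops = OPS_ARCH_DIRECT;
	(void)setup_iommu_dma_ops(dev);
}
```

The point of the reordering is visible in setup_dma_ops(): the
non-IOMMU ops are installed unconditionally first, and the IOMMU setup
overrides them only when it succeeds.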

[PATCH 14/18] drm/exynos: Consolidate IOMMU mapping code

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, we can
consolidate the shared mapping code onto the generic IOMMU API version,
and retire the arch-specific implementation.

Signed-off-by: Robin Murphy 

---
This is a cheeky revert of 07dc3678bacc ("drm/exynos: Fix cleanup of
IOMMU related objects"), plus removal of the remaining arm_iommu_*
references on top.
---
 drivers/gpu/drm/exynos/exynos5433_drm_decon.c |  5 +-
 drivers/gpu/drm/exynos/exynos7_drm_decon.c|  5 +-
 drivers/gpu/drm/exynos/exynos_drm_dma.c   | 61 +++
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |  6 +-
 drivers/gpu/drm/exynos/exynos_drm_fimc.c  |  5 +-
 drivers/gpu/drm/exynos/exynos_drm_fimd.c  |  5 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   |  5 +-
 drivers/gpu/drm/exynos/exynos_drm_gsc.c   |  5 +-
 drivers/gpu/drm/exynos/exynos_drm_rotator.c   |  5 +-
 drivers/gpu/drm/exynos/exynos_drm_scaler.c|  6 +-
 drivers/gpu/drm/exynos/exynos_mixer.c |  7 +--
 11 files changed, 29 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c
index 1f79bc2a881e..8428ae12dfa5 100644
--- a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c
+++ b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c
@@ -55,7 +55,6 @@ static const char * const decon_clks_name[] = {
 struct decon_context {
struct device   *dev;
struct drm_device   *drm_dev;
-   void*dma_priv;
struct exynos_drm_crtc  *crtc;
struct exynos_drm_plane planes[WINDOWS_NR];
struct exynos_drm_plane_config  configs[WINDOWS_NR];
@@ -645,7 +644,7 @@ static int decon_bind(struct device *dev, struct device *master, void *data)
 
decon_clear_channels(ctx->crtc);
 
-   return exynos_drm_register_dma(drm_dev, dev, &ctx->dma_priv);
+   return exynos_drm_register_dma(drm_dev, dev);
 }
 
 static void decon_unbind(struct device *dev, struct device *master, void *data)
@@ -655,7 +654,7 @@ static void decon_unbind(struct device *dev, struct device *master, void *data)
decon_atomic_disable(ctx->crtc);
 
/* detach this sub driver from iommu mapping if supported. */
-   exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev, &ctx->dma_priv);
+   exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev);
 }
 
 static const struct component_ops decon_component_ops = {
diff --git a/drivers/gpu/drm/exynos/exynos7_drm_decon.c b/drivers/gpu/drm/exynos/exynos7_drm_decon.c
index f2d87a7445c7..e7b58097ccdc 100644
--- a/drivers/gpu/drm/exynos/exynos7_drm_decon.c
+++ b/drivers/gpu/drm/exynos/exynos7_drm_decon.c
@@ -40,7 +40,6 @@
 struct decon_context {
struct device   *dev;
struct drm_device   *drm_dev;
-   void*dma_priv;
struct exynos_drm_crtc  *crtc;
struct exynos_drm_plane planes[WINDOWS_NR];
struct exynos_drm_plane_config  configs[WINDOWS_NR];
@@ -128,13 +127,13 @@ static int decon_ctx_initialize(struct decon_context *ctx,
 
decon_clear_channels(ctx->crtc);
 
-   return exynos_drm_register_dma(drm_dev, ctx->dev, &ctx->dma_priv);
+   return exynos_drm_register_dma(drm_dev, ctx->dev);
 }
 
 static void decon_ctx_remove(struct decon_context *ctx)
 {
/* detach this sub driver from iommu mapping if supported. */
-   exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev, &ctx->dma_priv);
+   exynos_drm_unregister_dma(ctx->drm_dev, ctx->dev);
 }
 
 static u32 decon_calc_clkdiv(struct decon_context *ctx,
diff --git a/drivers/gpu/drm/exynos/exynos_drm_dma.c b/drivers/gpu/drm/exynos/exynos_drm_dma.c
index 58b89ec11b0e..fd5f9bcf1857 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dma.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dma.c
@@ -14,19 +14,6 @@
 
 #include "exynos_drm_drv.h"
 
-#if defined(CONFIG_ARM_DMA_USE_IOMMU)
-#include 
-#else
-#define arm_iommu_create_mapping(...)  ({ NULL; })
-#define arm_iommu_attach_device(...)   ({ -ENODEV; })
-#define arm_iommu_release_mapping(...) ({ })
-#define arm_iommu_detach_device(...)   ({ })
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
-#if !defined(CONFIG_IOMMU_DMA)
-#define iommu_dma_init_domain(...) ({ -EINVAL; })
-#endif
 
 #define EXYNOS_DEV_ADDR_START  0x2000
 #define EXYNOS_DEV_ADDR_SIZE   0x4000
@@ -58,7 +45,7 @@ static inline void clear_dma_max_seg_size(struct device *dev)
  * mapping.
  */
 static int drm_iommu_attach_device(struct drm_device *drm_dev,
-   struct device *subdrv_dev, void **dma_priv)
+   struct device *subdrv_dev)
 {
struct exynos_drm_private *priv = drm_dev->dev_private;
int ret = 0;
@@ -73,22 +60,7 @@ static int drm_iommu_attach_device(struct drm_device *drm_dev,
if (ret)
return ret;
 
-   if (IS_ENABLED(CONFIG_ARM_DMA_USE_IOMMU))

[PATCH 05/18] ARM/dma-mapping: Switch to iommu_dma_ops

2020-08-20 Thread Robin Murphy

With the IOMMU ops now looking much the same shape as iommu_dma_ops,
switch them out in favour of the iommu-dma library, currently enhanced
with temporary workarounds that allow it to also sit underneath the
arch-specific API. With that in place, we can now start converting the
remaining IOMMU drivers and consumers to work with IOMMU API default
domains instead.

Signed-off-by: Robin Murphy 
---
 arch/arm/Kconfig |  24 +-
 arch/arm/include/asm/dma-iommu.h |   8 -
 arch/arm/mm/dma-mapping.c| 887 +--
 drivers/iommu/Kconfig|   8 -
 drivers/media/platform/Kconfig   |   1 -
 5 files changed, 22 insertions(+), 906 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b91273f9fd43..79406fe5cd6b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -133,31 +133,11 @@ config ARM_HAS_SG_CHAIN
bool
 
 config ARM_DMA_USE_IOMMU
-   bool
+   def_bool IOMMU_SUPPORT
select ARM_HAS_SG_CHAIN
+   select IOMMU_DMA
select NEED_SG_DMA_LENGTH
 
-if ARM_DMA_USE_IOMMU
-
-config ARM_DMA_IOMMU_ALIGNMENT
-   int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers"
-   range 4 9
-   default 8
-   help
- DMA mapping framework by default aligns all buffers to the smallest
- PAGE_SIZE order which is greater than or equal to the requested buffer
- size. This works well for buffers up to a few hundreds kilobytes, but
- for larger buffers it just a waste of address space. Drivers which has
- relatively small addressing window (like 64Mib) might run out of
- virtual space with just a few allocations.
-
- With this parameter you can specify the maximum PAGE_SIZE order for
- DMA IOMMU buffers. Larger buffers will be aligned only to this
- specified order. The order is expressed as a power of two multiplied
- by the PAGE_SIZE.
-
-endif
-
 config SYS_SUPPORTS_APM_EMULATION
bool
 
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 86405cc81385..f39cfa509fe4 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -13,14 +13,6 @@ struct dma_iommu_mapping {
/* iommu specific data */
struct iommu_domain *domain;
 
-   unsigned long   **bitmaps;  /* array of bitmaps */
-   unsigned intnr_bitmaps; /* nr of elements in array */
-   unsigned intextensions;
-   size_t  bitmap_size;/* size of a single bitmap */
-   size_t  bits;   /* per bitmap */
-   dma_addr_t  base;
-
-   spinlock_t  lock;
struct kref kref;
 };
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0537c97cebe1..0f69ede44cd7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1074,812 +1075,9 @@ static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 
 #ifdef CONFIG_ARM_DMA_USE_IOMMU
 
-static int __dma_info_to_prot(enum dma_data_direction dir, unsigned long attrs)
-{
-   int prot = 0;
-
-   if (attrs & DMA_ATTR_PRIVILEGED)
-   prot |= IOMMU_PRIV;
-
-   switch (dir) {
-   case DMA_BIDIRECTIONAL:
-   return prot | IOMMU_READ | IOMMU_WRITE;
-   case DMA_TO_DEVICE:
-   return prot | IOMMU_READ;
-   case DMA_FROM_DEVICE:
-   return prot | IOMMU_WRITE;
-   default:
-   return prot;
-   }
-}
-
-/* IOMMU */
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping);
-
-static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
- size_t size)
-{
-   unsigned int order = get_order(size);
-   unsigned int align = 0;
-   unsigned int count, start;
-   size_t mapping_size = mapping->bits << PAGE_SHIFT;
-   unsigned long flags;
-   dma_addr_t iova;
-   int i;
-
-   if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)
-   order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;
-
-   count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   align = (1 << order) - 1;
-
-   spin_lock_irqsave(&mapping->lock, flags);
-   for (i = 0; i < mapping->nr_bitmaps; i++) {
-   start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-   mapping->bits, 0, count, align);
-
-   if (start > mapping->bits)
-   continue;
-
-   bitmap_set(mapping->bitmaps[i], start, count);
-   break;
-   }
-
-   /*
-* No unused range found. Try to extend the existing mapping
-* and perform a second attempt to reserve an IO virtual
-* address range of size bytes.
-*/
-   if (i == mapping->nr_bitmaps) {
-

[PATCH 08/18] iommu/renesas: Remove arch/arm workaround

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma, remove
the shared mapping workaround and rely on groups there as well.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/ipmmu-vmsa.c | 69 --
 1 file changed, 69 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 0f18abda0e20..8ad74a76f402 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -26,15 +26,6 @@
 #include 
 #include 
 
-#if defined(CONFIG_ARM) && !defined(CONFIG_IOMMU_DMA)
-#include 
-#else
-#define arm_iommu_create_mapping(...)  NULL
-#define arm_iommu_attach_device(...)   -ENODEV
-#define arm_iommu_release_mapping(...) do {} while (0)
-#define arm_iommu_detach_device(...)   do {} while (0)
-#endif
-
 #define IPMMU_CTX_MAX  8U
 #define IPMMU_CTX_INVALID  -1
 
@@ -67,7 +58,6 @@ struct ipmmu_vmsa_device {
s8 utlb_ctx[IPMMU_UTLB_MAX];
 
struct iommu_group *group;
-   struct dma_iommu_mapping *mapping;
 };
 
 struct ipmmu_vmsa_domain {
@@ -805,50 +795,6 @@ static int ipmmu_of_xlate(struct device *dev,
return ipmmu_init_platform_device(dev, spec);
 }
 
-static int ipmmu_init_arm_mapping(struct device *dev)
-{
-   struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
-   int ret;
-
-   /*
-* Create the ARM mapping, used by the ARM DMA mapping core to allocate
-* VAs. This will allocate a corresponding IOMMU domain.
-*
-* TODO:
-* - Create one mapping per context (TLB).
-* - Make the mapping size configurable ? We currently use a 2GB mapping
-*   at a 1GB offset to ensure that NULL VAs will fault.
-*/
-   if (!mmu->mapping) {
-   struct dma_iommu_mapping *mapping;
-
-   mapping = arm_iommu_create_mapping(&platform_bus_type,
-  SZ_1G, SZ_2G);
-   if (IS_ERR(mapping)) {
-   dev_err(mmu->dev, "failed to create ARM IOMMU mapping\n");
-   ret = PTR_ERR(mapping);
-   goto error;
-   }
-
-   mmu->mapping = mapping;
-   }
-
-   /* Attach the ARM VA mapping to the device. */
-   ret = arm_iommu_attach_device(dev, mmu->mapping);
-   if (ret < 0) {
-   dev_err(dev, "Failed to attach device to VA mapping\n");
-   goto error;
-   }
-
-   return 0;
-
-error:
-   if (mmu->mapping)
-   arm_iommu_release_mapping(mmu->mapping);
-
-   return ret;
-}
-
 static struct iommu_device *ipmmu_probe_device(struct device *dev)
 {
struct ipmmu_vmsa_device *mmu = to_ipmmu(dev);
@@ -862,20 +808,8 @@ static struct iommu_device *ipmmu_probe_device(struct device *dev)
return &mmu->iommu;
 }
 
-static void ipmmu_probe_finalize(struct device *dev)
-{
-   int ret = 0;
-
-   if (IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA))
-   ret = ipmmu_init_arm_mapping(dev);
-
-   if (ret)
-   dev_err(dev, "Can't create IOMMU mapping - DMA-OPS will not work\n");
-}
-
 static void ipmmu_release_device(struct device *dev)
 {
-   arm_iommu_detach_device(dev);
 }
 
 static struct iommu_group *ipmmu_find_group(struct device *dev)
@@ -905,7 +839,6 @@ static const struct iommu_ops ipmmu_ops = {
.iova_to_phys = ipmmu_iova_to_phys,
.probe_device = ipmmu_probe_device,
.release_device = ipmmu_release_device,
-   .probe_finalize = ipmmu_probe_finalize,
.device_group = IS_ENABLED(CONFIG_ARM) && !IS_ENABLED(CONFIG_IOMMU_DMA)
? generic_device_group : ipmmu_find_group,
.pgsize_bitmap = SZ_1G | SZ_2M | SZ_4K,
@@ -1118,8 +1051,6 @@ static int ipmmu_remove(struct platform_device *pdev)
iommu_device_sysfs_remove(&mmu->iommu);
iommu_device_unregister(&mmu->iommu);
 
-   arm_iommu_release_mapping(mmu->mapping);
-
ipmmu_device_reset(mmu);
 
return 0;
-- 
2.28.0.dirty

[PATCH 13/18] iommu/tegra: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/tegra-smmu.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 124c8848ab7e..8e276eac84d9 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -278,7 +279,7 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
 {
struct tegra_smmu_as *as;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
as = kzalloc(sizeof(*as), GFP_KERNEL);
@@ -288,25 +289,19 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
 
as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-   if (!as->pd) {
-   kfree(as);
-   return NULL;
-   }
+   if (!as->pd)
+   goto out_free_as;
 
as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
-   if (!as->count) {
-   __free_page(as->pd);
-   kfree(as);
-   return NULL;
-   }
+   if (!as->count)
+   goto out_free_all;
 
as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
-   if (!as->pts) {
-   kfree(as->count);
-   __free_page(as->pd);
-   kfree(as);
-   return NULL;
-   }
+   if (!as->pts)
+   goto out_free_all;
+
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&as->domain))
+   goto out_free_all;
 
/* setup aperture */
as->domain.geometry.aperture_start = 0;
@@ -314,12 +309,22 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
as->domain.geometry.force_aperture = true;
 
return &as->domain;
+
+out_free_all:
+   kfree(as->pts);
+   kfree(as->count);
+   __free_page(as->pd);
+out_free_as:
+   kfree(as);
+   return NULL;
 }
 
 static void tegra_smmu_domain_free(struct iommu_domain *domain)
 {
struct tegra_smmu_as *as = to_smmu_as(domain);
 
+   iommu_put_dma_cookie(domain);
+
/* TODO: free page directory and page tables */
 
WARN_ON_ONCE(as->use_count);
-- 
2.28.0.dirty
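
The error-path consolidation in this patch follows the usual goto
unwinding idiom, which can be exercised in userspace. The sketch below
is an illustration under assumptions: struct as_model, NUM_PDE and the
fail_at parameter are hypothetical, and malloc()/calloc()/free() stand
in for alloc_page()/kcalloc()/kfree(). It relies on the container being
zero-initialised so that not-yet-allocated members are NULL when the
shared label is reached.

```c
#include <stdlib.h>

#define NUM_PDE 16	/* arbitrary size for the model */

/* Hypothetical stand-in for struct tegra_smmu_as. */
struct as_model {
	void *pd;
	unsigned int *count;
	void **pts;
};

/*
 * fail_at selects which allocation step is forced to fail (0 = none),
 * purely so that each unwinding path can be exercised.
 */
static struct as_model *as_alloc(int fail_at)
{
	struct as_model *as = calloc(1, sizeof(*as));

	if (!as)
		return NULL;

	as->pd = (fail_at == 1) ? NULL : malloc(64);
	if (!as->pd)
		goto out_free_as;

	as->count = (fail_at == 2) ? NULL : calloc(NUM_PDE, sizeof(*as->count));
	if (!as->count)
		goto out_free_all;

	as->pts = (fail_at == 3) ? NULL : calloc(NUM_PDE, sizeof(*as->pts));
	if (!as->pts)
		goto out_free_all;

	return as;

out_free_all:
	/* free(NULL) is a no-op, so one label covers every later failure. */
	free(as->pts);
	free(as->count);
	free(as->pd);
out_free_as:
	free(as);
	return NULL;
}

static void as_free(struct as_model *as)
{
	if (!as)
		return;
	free(as->pts);
	free(as->count);
	free(as->pd);
	free(as);
}
```

Adding a further fallible step, as the patch does with the DMA cookie,
then costs a single extra goto instead of another copy of the cleanup
sequence.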

[PATCH 09/18] iommu/mediatek-v1: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for groups and DMA
domains to replace the shared mapping workaround.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/mtk_iommu.h|   2 -
 drivers/iommu/mtk_iommu_v1.c | 153 +++
 2 files changed, 48 insertions(+), 107 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 122925dbe547..6253e98d810c 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -67,8 +67,6 @@ struct mtk_iommu_data {
struct iommu_device iommu;
const struct mtk_iommu_plat_data *plat_data;
 
-   struct dma_iommu_mapping*mapping; /* For mtk_iommu_v1.c */
-
struct list_headlist;
struct mtk_smi_larb_iommu   larb_imu[MTK_LARB_NR_MAX];
 };
diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index 82ddfe9170d4..40c89b8d3ac4 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -28,7 +28,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -240,13 +239,18 @@ static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
 {
struct mtk_iommu_domain *dom;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
dom = kzalloc(sizeof(*dom), GFP_KERNEL);
if (!dom)
return NULL;
 
+   if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&dom->domain)) {
+   kfree(dom);
+   return NULL;
+   }
+
return &dom->domain;
 }
 
@@ -257,6 +261,7 @@ static void mtk_iommu_domain_free(struct iommu_domain *domain)
 
dma_free_coherent(data->dev, M2701_IOMMU_PGT_SIZE,
dom->pgt_va, dom->pgt_pa);
+   iommu_put_dma_cookie(domain);
kfree(to_mtk_domain(domain));
 }
 
@@ -265,14 +270,8 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 {
struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
struct mtk_iommu_domain *dom = to_mtk_domain(domain);
-   struct dma_iommu_mapping *mtk_mapping;
int ret;
 
-   /* Only allow the domain created internally. */
-   mtk_mapping = data->mapping;
-   if (mtk_mapping->domain != domain)
-   return 0;
-
if (!data->m4u_dom) {
data->m4u_dom = dom;
ret = mtk_iommu_domain_finalise(data);
@@ -358,18 +357,39 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
 
 static const struct iommu_ops mtk_iommu_ops;
 
-/*
- * MTK generation one iommu HW only support one iommu domain, and all the client
- * sharing the same iova address space.
- */
-static int mtk_iommu_create_mapping(struct device *dev,
-   struct of_phandle_args *args)
+static struct iommu_device *mtk_iommu_probe_device(struct device *dev)
 {
struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
struct mtk_iommu_data *data;
+
+   if (!fwspec || fwspec->ops != &mtk_iommu_ops)
+   return ERR_PTR(-ENODEV); /* Not a iommu client device */
+
+   data = dev_iommu_priv_get(dev);
+
+   return &data->iommu;
+}
+
+static void mtk_iommu_release_device(struct device *dev)
+{
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+
+   if (!fwspec || fwspec->ops != &mtk_iommu_ops)
+   return;
+
+   iommu_fwspec_free(dev);
+}
+
+static struct iommu_group *mtk_iommu_device_group(struct device *dev)
+{
+   struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
+
+   return iommu_group_ref_get(data->m4u_group);
+}
+
+static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
struct platform_device *m4updev;
-   struct dma_iommu_mapping *mtk_mapping;
-   int ret;
 
if (args->args_count != 1) {
dev_err(dev, "invalid #iommu-cells(%d) property for IOMMU\n",
@@ -377,15 +397,6 @@ static int mtk_iommu_create_mapping(struct device *dev,
return -EINVAL;
}
 
-   if (!fwspec) {
-   ret = iommu_fwspec_init(dev, &args->np->fwnode, &mtk_iommu_ops);
-   if (ret)
-   return ret;
-   fwspec = dev_iommu_fwspec_get(dev);
-   } else if (dev_iommu_fwspec_get(dev)->ops != &mtk_iommu_ops) {
-   return -EINVAL;
-   }
-
if (!dev_iommu_priv_get(dev)) {
/* Get the m4u device */
m4updev = of_find_device_by_node(args->np);
@@ -395,83 +406,7 @@ static int mtk_iommu_create_mapping(struct device *dev,
dev_iommu_priv_set(dev, platform_get_drvdata(m4updev));
}
 
-   ret = iommu_fwspec_add_ids(dev, args->args, 1);
-   if (ret)
-   return ret;
-
-   data = dev_iommu_priv_get(dev);
-

[PATCH 11/18] iommu/omap: Add IOMMU_DOMAIN_DMA support

2020-08-20 Thread Robin Murphy

Now that arch/arm is wired up for default domains and iommu-dma,
implement the corresponding driver-side support for DMA domains.

Signed-off-by: Robin Murphy 
---
 drivers/iommu/omap-iommu.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 71f29c0927fc..ea25c2fe0418 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -9,6 +9,7 @@
  * Paul Mundt and Toshihiro Kobayashi
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -1574,13 +1575,19 @@ static struct iommu_domain *omap_iommu_domain_alloc(unsigned type)
 {
struct omap_iommu_domain *omap_domain;
 
-   if (type != IOMMU_DOMAIN_UNMANAGED)
+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
return NULL;
 
omap_domain = kzalloc(sizeof(*omap_domain), GFP_KERNEL);
if (!omap_domain)
return NULL;
 
+   if (type == IOMMU_DOMAIN_DMA &&
+   iommu_get_dma_cookie(&omap_domain->domain)) {
+   kfree(omap_domain);
+   return NULL;
+   }
+
spin_lock_init(&omap_domain->lock);
 
omap_domain->domain.geometry.aperture_start = 0;
@@ -1601,6 +1608,7 @@ static void omap_iommu_domain_free(struct iommu_domain *domain)
if (omap_domain->dev)
_omap_iommu_detach_dev(omap_domain, omap_domain->dev);
 
+   iommu_put_dma_cookie(&omap_domain->domain);
kfree(omap_domain);
 }
 
@@ -1736,6 +1744,17 @@ static struct iommu_group *omap_iommu_device_group(struct device *dev)
return group;
 }
 
+static int omap_iommu_of_xlate(struct device *dev,
+  struct of_phandle_args *args)
+{
+   /*
+* Logically, some of the housekeeping from _omap_iommu_add_device()
+* should probably move here, but the minimum we *need* is simply to
+* cooperate with of_iommu at all to let iommu-dma work.
+*/
+   return 0;
+}
+
 static const struct iommu_ops omap_iommu_ops = {
.domain_alloc   = omap_iommu_domain_alloc,
.domain_free= omap_iommu_domain_free,
@@ -1747,6 +1766,7 @@ static const struct iommu_ops omap_iommu_ops = {
.probe_device   = omap_iommu_probe_device,
.release_device = omap_iommu_release_device,
.device_group   = omap_iommu_device_group,
+   .of_xlate   = omap_iommu_of_xlate,
.pgsize_bitmap  = OMAP_IOMMU_PGSIZES,
 };
 
-- 
2.28.0.dirty
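
Why an empty of_xlate callback is enough can be illustrated with a
simplified model of an ops table with an optional callback. Everything
here is hypothetical: struct model_iommu_ops, model_of_xlate() and
MODEL_NO_IOMMU only approximate the idea that the firmware glue skips a
driver whose ops provide no of_xlate, and the real of_iommu code does
considerably more than this.

```c
#include <stddef.h>

/* Hypothetical device and ops types; not the kernel structures. */
struct model_dev {
	int id;
};

struct model_iommu_ops {
	/* Optional callback, modelled on the of_xlate member of iommu_ops. */
	int (*of_xlate)(struct model_dev *dev, const unsigned int *args);
};

/* Result used by the model when the driver cannot take part. */
enum { MODEL_NO_IOMMU = 1 };

/* Simplified caller standing in for the firmware glue. */
static int model_of_xlate(const struct model_iommu_ops *ops,
			  struct model_dev *dev, const unsigned int *args)
{
	/* Without a callback the device is treated as having no IOMMU. */
	if (!ops->of_xlate)
		return MODEL_NO_IOMMU;

	return ops->of_xlate(dev, args);
}

/* Stub in the spirit of the patch: accept the specifier, record nothing. */
static int stub_of_xlate(struct model_dev *dev, const unsigned int *args)
{
	(void)dev;
	(void)args;
	return 0;
}

static const struct model_iommu_ops ops_without_xlate = { .of_xlate = NULL };
static const struct model_iommu_ops ops_with_stub = { .of_xlate = stub_of_xlate };
```

With the stub in place the caller gets a success code instead of
skipping the driver, which is the minimum the commit message describes.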

[PATCH 03/18] ARM/dma-mapping: Merge IOMMU ops

2020-08-20 Thread Robin Murphy

The dma_sync_* operations are now the only difference between the
coherent and non-coherent IOMMU ops. Some minor tweaks to make those
safe for coherent devices with minimal overhead, and we can condense
down to a single set of DMA ops.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 41 +--
 1 file changed, 13 insertions(+), 28 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 1bb7e9608f75..0537c97cebe1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1677,6 +1677,9 @@ static void arm_iommu_sync_sg_for_cpu(struct device *dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
 
@@ -1696,6 +1699,9 @@ static void arm_iommu_sync_sg_for_device(struct device *dev,
struct scatterlist *s;
int i;
 
+   if (dev->dma_coherent)
+   return;
+
for_each_sg(sg, s, nents, i)
__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
 }
@@ -1829,12 +1835,13 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_dev_to_cpu(page, offset, size, dir);
 }
 
@@ -1843,12 +1850,13 @@ static void arm_iommu_sync_single_for_device(struct device *dev,
 {
struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
dma_addr_t iova = handle & PAGE_MASK;
-   struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+   struct page *page;
unsigned int offset = handle & ~PAGE_MASK;
 
-   if (!iova)
+   if (dev->dma_coherent || !iova)
return;
 
+   page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
@@ -1872,22 +1880,6 @@ static const struct dma_map_ops iommu_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 };
 
-static const struct dma_map_ops iommu_coherent_ops = {
-   .alloc  = arm_iommu_alloc_attrs,
-   .free   = arm_iommu_free_attrs,
-   .mmap   = arm_iommu_mmap_attrs,
-   .get_sgtable= arm_iommu_get_sgtable,
-
-   .map_page   = arm_iommu_map_page,
-   .unmap_page = arm_iommu_unmap_page,
-
-   .map_sg = arm_iommu_map_sg,
-   .unmap_sg   = arm_iommu_unmap_sg,
-
-   .map_resource   = arm_iommu_map_resource,
-   .unmap_resource = arm_iommu_unmap_resource,
-};
-
 /**
  * arm_iommu_create_mapping
  * @bus: pointer to the bus holding the client device (for IOMMU calls)
@@ -2067,11 +2059,6 @@ void arm_iommu_detach_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
 
-static const struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
-{
-   return coherent ? &iommu_coherent_ops : &iommu_ops;
-}
-
 static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu)
 {
@@ -2118,8 +2105,6 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 static void arm_teardown_iommu_dma_ops(struct device *dev) { }
 
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
 #endif /* CONFIG_ARM_DMA_USE_IOMMU */
 
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -2141,7 +2126,7 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
return;
 
if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-   dma_ops = arm_get_iommu_dma_map_ops(coherent);
+   dma_ops = &iommu_ops;
else
dma_ops = arm_get_dma_map_ops(coherent);
 
-- 
2.28.0.dirty
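
For readers following the series, the effect of this patch can be sketched outside the kernel. What follows is a minimal userspace model, not the kernel code: model_device, model_sg, model_cache_maint() and model_dma_ops are hypothetical stand-ins invented for illustration, and the cache maintenance is only counted, not performed. It shows a single ops table whose sync callback returns early for coherent devices, which is why a separate coherent table is no longer needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; only the coherency flag matters. */
struct model_device {
	bool dma_coherent;
	unsigned int cache_ops;	/* counts simulated cache maintenance calls */
};

/* Hypothetical stand-in for one scatterlist entry. */
struct model_sg {
	size_t offset;
	size_t length;
};

/* Simulated per-segment cache maintenance (the kernel would clean or
 * invalidate caches here; the model only counts the call). */
static void model_cache_maint(struct model_device *dev, const struct model_sg *s)
{
	(void)s;
	dev->cache_ops++;
}

/* One sync callback serves both coherent and non-coherent devices. */
static void model_sync_sg_for_cpu(struct model_device *dev,
				  const struct model_sg *sg, size_t nents)
{
	size_t i;

	assert(sg != NULL || nents == 0);

	/* Coherent devices need no cache maintenance: bail out early. */
	if (dev->dma_coherent)
		return;

	for (i = 0; i < nents; i++)
		model_cache_maint(dev, &sg[i]);
}

/* A single ops table replaces the former coherent/non-coherent pair. */
struct model_dma_ops {
	void (*sync_sg_for_cpu)(struct model_device *dev,
				const struct model_sg *sg, size_t nents);
};

static const struct model_dma_ops model_iommu_ops = {
	.sync_sg_for_cpu = model_sync_sg_for_cpu,
};
```

Calling model_iommu_ops.sync_sg_for_cpu() on a device with dma_coherent set leaves cache_ops untouched, while a non-coherent device sees one simulated maintenance call per segment.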


[PATCH 04/18] iommu/dma: Add temporary hacks for arch/arm

2020-08-20 Thread Robin Murphy

In order to wrangle arch/arm over to iommu_dma_ops, we first need to
convert all its associated IOMMU drivers over to default domains, and
deal with users of its public dma_iommu_mapping API. Since that can't
reasonably be done in a single patch, we've no choice but to go through
an ugly transitional phase. That starts with exposing some hooks into
iommu-dma's internals so that it can start to do most of the heavy
lifting.

Before you start thinking about how horrible that is, here's a zebra:
  ,
 c@
  `)\
   <  /

Signed-off-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 38 +-
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..ab157d155bf7 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -25,6 +25,19 @@
 #include 
 #include 
 
+#ifdef CONFIG_ARM
+#include 
+#endif
+static struct iommu_domain *__iommu_get_dma_domain(struct device *dev)
+{
+#ifdef CONFIG_ARM
+   struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+   if (mapping)
+   return mapping->domain;
+#endif
+   return iommu_get_dma_domain(dev);
+}
+
 struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
@@ -298,8 +311,11 @@ static void iommu_dma_flush_iotlb_all(struct iova_domain *iovad)
  * avoid rounding surprises. If necessary, we reserve the page at address 0
  * to ensure it is an invalid IOVA. It is safe to reinitialise a domain, but
  * any change which could make prior IOVAs invalid will fail.
+ *
+ * XXX: Not formally exported, but needs to be referenced
+ * from arch/arm/mm/dma-mapping.c temporarily
  */
-static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -456,7 +472,7 @@ static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
 static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
size_t size)
 {
-   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_domain *domain = __iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
size_t iova_off = iova_offset(iovad, dma_addr);
@@ -478,7 +494,7 @@ static void __iommu_dma_unmap(struct device *dev, dma_addr_t dma_addr,
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
size_t size, int prot, u64 dma_mask)
 {
-   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_domain *domain = __iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
size_t iova_off = iova_offset(iovad, phys);
@@ -582,7 +598,7 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,
 static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
-   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_domain *domain = __iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
@@ -678,7 +694,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (dev_is_dma_coherent(dev))
return;
 
-   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
+   phys = iommu_iova_to_phys(__iommu_get_dma_domain(dev), dma_handle);
arch_sync_dma_for_cpu(phys, size, dir);
 }
 
@@ -690,7 +706,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
if (dev_is_dma_coherent(dev))
return;
 
-   phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
+   phys = iommu_iova_to_phys(__iommu_get_dma_domain(dev), dma_handle);
arch_sync_dma_for_device(phys, size, dir);
 }
 
@@ -831,7 +847,7 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
 static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs)
 {
-   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_domain *domain = __iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
@@ -1112,12 +1128,16 @@ static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 
 static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
 {
-   struct iommu_domain *domain = iommu_

[PATCH 00/18] Convert arch/arm to use iommu-dma

2020-08-20 Thread Robin Murphy

Hi all,

After 5 years or so of intending to get round to this, finally the
time comes! The changes themselves actually turn out to be relatively
mechanical; the bigger concern appears to be how to get everything
merged across about 5 different trees given the dependencies.

I've lightly boot-tested things on Rockchip RK3288 and Exynos 4412
(Odroid-U3), to the degree that their display drivers should be using
IOMMU-backed buffers and don't explode (the Odroid doesn't manage to
send a working HDMI signal to the one monitor I have that it actually
detects, but that's a pre-existing condition...) Confirmation that the
Mediatek, OMAP and Tegra changes work will be most welcome.

Patches are based on 5.9-rc1, branch available here:

  git://linux-arm.org/linux-rm arm/dma


Robin.


Robin Murphy (18):
  ARM/dma-mapping: Drop .dma_supported for IOMMU ops
  ARM/dma-mapping: Consolidate IOMMU ops callbacks
  ARM/dma-mapping: Merge IOMMU ops
  iommu/dma: Add temporary hacks for arch/arm
  ARM/dma-mapping: Switch to iommu_dma_ops
  ARM/dma-mapping: Support IOMMU default domains
  iommu/arm-smmu: Remove arch/arm workaround
  iommu/renesas: Remove arch/arm workaround
  iommu/mediatek-v1: Add IOMMU_DOMAIN_DMA support
  iommu/msm: Add IOMMU_DOMAIN_DMA support
  iommu/omap: Add IOMMU_DOMAIN_DMA support
  iommu/tegra-gart: Add IOMMU_DOMAIN_DMA support
  iommu/tegra: Add IOMMU_DOMAIN_DMA support
  drm/exynos: Consolidate IOMMU mapping code
  drm/nouveau/tegra: Clean up IOMMU workaround
  staging/media/tegra-vde: Clean up IOMMU workaround
  media/omap3isp: Clean up IOMMU workaround
  ARM/dma-mapping: Remove legacy dma-iommu API

 arch/arm/Kconfig  |   28 +-
 arch/arm/common/dmabounce.c   |1 -
 arch/arm/include/asm/device.h |9 -
 arch/arm/include/asm/dma-iommu.h  |   37 -
 arch/arm/mm/dma-mapping.c | 1198 +
 drivers/gpu/drm/exynos/exynos5433_drm_decon.c |5 +-
 drivers/gpu/drm/exynos/exynos7_drm_decon.c|5 +-
 drivers/gpu/drm/exynos/exynos_drm_dma.c   |   61 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |6 +-
 drivers/gpu/drm/exynos/exynos_drm_fimc.c  |5 +-
 drivers/gpu/drm/exynos/exynos_drm_fimd.c  |5 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c   |5 +-
 drivers/gpu/drm/exynos/exynos_drm_gsc.c   |5 +-
 drivers/gpu/drm/exynos/exynos_drm_rotator.c   |5 +-
 drivers/gpu/drm/exynos/exynos_drm_scaler.c|6 +-
 drivers/gpu/drm/exynos/exynos_mixer.c |7 +-
 .../drm/nouveau/nvkm/engine/device/tegra.c|   13 -
 drivers/iommu/Kconfig |8 -
 drivers/iommu/arm/arm-smmu/arm-smmu.c |   10 -
 drivers/iommu/ipmmu-vmsa.c|   69 -
 drivers/iommu/msm_iommu.c |7 +-
 drivers/iommu/mtk_iommu.h |2 -
 drivers/iommu/mtk_iommu_v1.c  |  153 +--
 drivers/iommu/omap-iommu.c|   22 +-
 drivers/iommu/tegra-gart.c|   17 +-
 drivers/iommu/tegra-smmu.c|   37 +-
 drivers/media/platform/Kconfig|1 -
 drivers/media/platform/omap3isp/isp.c |   68 +-
 drivers/media/platform/omap3isp/isp.h |3 -
 drivers/staging/media/tegra-vde/iommu.c   |   12 -
 30 files changed, 150 insertions(+), 1660 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

-- 
2.28.0.dirty


[PATCH 01/18] ARM/dma-mapping: Drop .dma_supported for IOMMU ops

2020-08-20 Thread Robin Murphy

When an IOMMU is present, we trust that it should be capable
of remapping any physical memory, and since the device masks
represent the input (virtual) addresses to the IOMMU it makes
no sense to validate them against physical PFNs anyway.

Signed-off-by: Robin Murphy 
---
 arch/arm/mm/dma-mapping.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8a8949174b1c..ffa387f343dc 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1997,8 +1997,6 @@ static const struct dma_map_ops iommu_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 static const struct dma_map_ops iommu_coherent_ops = {
@@ -2015,8 +2013,6 @@ static const struct dma_map_ops iommu_coherent_ops = {
 
.map_resource   = arm_iommu_map_resource,
.unmap_resource = arm_iommu_unmap_resource,
-
-   .dma_supported  = arm_dma_supported,
 };
 
 /**
-- 
2.28.0.dirty


[PATCH 02/18] ARM/dma-mapping: Consolidate IOMMU ops callbacks

2020-08-20 Thread Robin Murphy

Merge the coherent and non-coherent callbacks down to a single
implementation each, relying on the generic dev->dma_coherent
flag at the points where the difference matters.

Signed-off-by: Robin Murphy 
---
 arch/arm/Kconfig  |   4 +-
 arch/arm/mm/dma-mapping.c | 281 +++---
 2 files changed, 79 insertions(+), 206 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e00d94b16658..b91273f9fd43 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -19,8 +19,8 @@ config ARM
select ARCH_HAS_SET_MEMORY
select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL
select ARCH_HAS_STRICT_MODULE_RWX if MMU
-   select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB
-   select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB
+   select ARCH_HAS_SYNC_DMA_FOR_DEVICE if SWIOTLB || ARM_DMA_USE_IOMMU
+   select ARCH_HAS_SYNC_DMA_FOR_CPU if SWIOTLB || ARM_DMA_USE_IOMMU
select ARCH_HAS_TEARDOWN_DMA_OPS if MMU
select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
select ARCH_HAVE_CUSTOM_GPIO_H
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index ffa387f343dc..1bb7e9608f75 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1418,13 +1418,13 @@ static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
__free_from_pool(cpu_addr, size);
 }
 
-static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs,
-   int coherent_flag)
+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
+   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
struct page **pages;
void *addr = NULL;
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
 
*handle = DMA_MAPPING_ERROR;
size = PAGE_ALIGN(size);
@@ -1467,19 +1467,7 @@ static void *__arm_iommu_alloc_attrs(struct device *dev, size_t size,
return NULL;
 }
 
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, NORMAL);
-}
-
-static void *arm_coherent_iommu_alloc_attrs(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
-{
-   return __arm_iommu_alloc_attrs(dev, size, handle, gfp, attrs, COHERENT);
-}
-
-static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
@@ -1493,35 +1481,24 @@ static int __arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma
if (vma->vm_pgoff >= nr_pages)
return -ENXIO;
 
+   if (!dev->dma_coherent)
+   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+
err = vm_map_pages(vma, pages, nr_pages);
if (err)
pr_err("Remapping memory failed: %d\n", err);
 
return err;
 }
-static int arm_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-   return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
-
-static int arm_coherent_iommu_mmap_attrs(struct device *dev,
-   struct vm_area_struct *vma, void *cpu_addr,
-   dma_addr_t dma_addr, size_t size, unsigned long attrs)
-{
-   return __arm_iommu_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, attrs);
-}
 
 /*
  * free a page as defined by the above mapping.
  * Must not be called with IRQs disabled.
  */
-static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs, int coherent_flag)
+static void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+   dma_addr_t handle, unsigned long attrs)
 {
+   int coherent_flag = dev->dma_coherent ? COHERENT : NORMAL;
struct page **pages;
size = PAGE_ALIGN(size);
 
@@ -1543,19 +1520,6 @@ static void __arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_ad
__iommu_free_buffer(dev, pages, size, attrs);
 }
 
-static void arm_iommu_free_attrs(struct device *dev, size_t size,
-void *cpu_addr, dma_addr_t handle,
-unsigned long attrs)
-{
-   __arm_iommu_free_attrs(dev, size, cpu_addr, handle, attrs, NORMAL);
-}
-
-static void arm_coherent_iommu_free_attrs(struct device *dev, size_t size,
-   void *cpu_addr, dma_addr_t handle,

Re: [PATCH RESEND v10 07/11] device-mapping: Introduce DMA range map, supplanting dma_pfn_offset

2020-08-20 Thread Jim Quinlan via iommu

Hi Andy,


On Tue, Aug 18, 2020 at 4:14 AM Andy Shevchenko
 wrote:
>
> On Mon, Aug 17, 2020 at 05:53:09PM -0400, Jim Quinlan wrote:
> > The new field 'dma_range_map' in struct device is used to facilitate the
> > use of single or multiple offsets between mapping regions of cpu addrs and
> > dma addrs.  It subsumes the role of "dev->dma_pfn_offset" which was only
> > capable of holding a single uniform offset and had no region bounds
> > checking.
> >
> > The function of_dma_get_range() has been modified so that it takes a single
> > argument -- the device node -- and returns a map, NULL, or an error code.
> > The map is an array that holds the information regarding the DMA regions.
> > Each range entry contains the address offset, the cpu_start address, the
> > dma_start address, and the size of the region.
> >
> > of_dma_configure() is the typical manner to set range offsets but there are
> > a number of ad hoc assignments to "dev->dma_pfn_offset" in the kernel
> > driver code.  These cases now invoke the function
> > dma_attach_offset_range(dev, cpu_addr, dma_addr, size).
>
> ...
>
> > + if (dev) {
> > + phys_addr_t paddr = PFN_PHYS(pfn);
> > +
>
> > + pfn -= (dma_offset_from_phys_addr(dev, paddr) >> PAGE_SHIFT);
>
> PFN_DOWN() ?
Yep.
>
> > + }
>
> ...
>
> > + pfn += (dma_offset_from_dma_addr(dev, addr) >> PAGE_SHIFT);
>
> Ditto.
Yep.
>
>
> ...
>
> > +static inline u64 dma_offset_from_dma_addr(struct device *dev, dma_addr_t dma_addr)
> > +{
> > + const struct bus_dma_region *m = dev->dma_range_map;
> > +
> > + if (!m)
> > + return 0;
> > + for (; m->size; m++)
> > + if (dma_addr >= m->dma_start && dma_addr - m->dma_start < m->size)
> > + return m->offset;
> > + return 0;
> > +}
> > +
> > +static inline u64 dma_offset_from_phys_addr(struct device *dev, phys_addr_t paddr)
> > +{
> > + const struct bus_dma_region *m = dev->dma_range_map;
> > +
> > + if (!m)
> > + return 0;
> > + for (; m->size; m++)
> > + if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
> > + return m->offset;
> > + return 0;
> > +}
>
> Perhaps for these the form with one return 0 is easier to read
>
> if (m) {
> for (; m->size; m++)
> if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
> return m->offset;
> }
> return 0;
>
> ?
I see what you are saying but I don't think there is enough difference
between the two to justify changing it.
>
> ...
>
> > + if (mem->use_dev_dma_pfn_offset) {
> > + u64 base_addr = (u64)mem->pfn_base << PAGE_SHIFT;
>
> PFN_PHYS() ?
Yep.

>
> > +
> > + return base_addr - dma_offset_from_phys_addr(dev, base_addr);
> > + }
>
> ...
>
> > + * It returns -ENOMEM if out of memory, 0 otherwise.
>
> This doesn't describe cases dev->dma_range_map != NULL and offset == 0.
Okay, I'll fix this.

>
> > +int dma_set_offset_range(struct device *dev, phys_addr_t cpu_start,
> > +  dma_addr_t dma_start, u64 size)
> > +{
> > + struct bus_dma_region *map;
> > + u64 offset = (u64)cpu_start - (u64)dma_start;
> > +
> > + if (!offset)
> > + return 0;
> > +
> > + if (dev->dma_range_map) {
> > + dev_err(dev, "attempt to add DMA range to existing map\n");
> > + return -EINVAL;
> > + }
> > +
> > + map = kcalloc(2, sizeof(*map), GFP_KERNEL);
> > + if (!map)
> > + return -ENOMEM;
> > + map[0].cpu_start = cpu_start;
> > + map[0].dma_start = dma_start;
> > + map[0].offset = offset;
> > + map[0].size = size;
> > + dev->dma_range_map = map;
> > +
> > + return 0;
> > +}
>
> ...
>
> > +void *dma_copy_dma_range_map(const struct bus_dma_region *map)
> > +{
> > + int num_ranges;
> > + struct bus_dma_region *new_map;
> > + const struct bus_dma_region *r = map;
> > +
> > + for (num_ranges = 0; r->size; num_ranges++)
> > + r++;
>
> > + new_map = kcalloc(num_ranges + 1, sizeof(*map), GFP_KERNEL);
> > + if (new_map)
> > + memcpy(new_map, map, sizeof(*map) * num_ranges);
>
> Looks like krealloc() on the first glance...
It's not.  We are making a distinct copy of the original, not resizing it.
>
> > +
> > + return new_map;
> > +}
>
> --
> With Best Regards,
> Andy Shevchenko
Thanks again,
Jim
>
>
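
The range-map lookup and the copy helper discussed in this thread can be tried out in userspace. The sketch below is an illustration under stated assumptions, not the patch itself: model_dma_region mirrors the fields named in the quoted description (cpu_start, dma_start, size, offset), all model_* names are hypothetical, and calloc()/memcpy() stand in for the kernel allocators. It shows the zero-size terminator convention, the bounds-checked lookup, and why the copy is a distinct allocation rather than a resize of the original.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the range entry described in the patch. */
struct model_dma_region {
	uint64_t cpu_start;
	uint64_t dma_start;
	uint64_t size;		/* size == 0 marks the terminating entry */
	uint64_t offset;	/* cpu_start - dma_start */
};

/* Offset for a CPU physical address, or 0 if no region covers it. */
static uint64_t model_offset_from_phys(const struct model_dma_region *m,
				       uint64_t paddr)
{
	if (!m)
		return 0;
	for (; m->size; m++)
		if (paddr >= m->cpu_start && paddr - m->cpu_start < m->size)
			return m->offset;
	return 0;
}

/* Offset for a DMA address, or 0 if no region covers it. */
static uint64_t model_offset_from_dma(const struct model_dma_region *m,
				      uint64_t dma_addr)
{
	if (!m)
		return 0;
	for (; m->size; m++)
		if (dma_addr >= m->dma_start && dma_addr - m->dma_start < m->size)
			return m->offset;
	return 0;
}

/* Translate a CPU physical address to a DMA address through the map. */
static uint64_t model_phys_to_dma(const struct model_dma_region *m,
				  uint64_t paddr)
{
	return paddr - model_offset_from_phys(m, paddr);
}

/* Make a distinct copy of the map; the zeroed extra slot is the terminator. */
static struct model_dma_region *
model_copy_map(const struct model_dma_region *map)
{
	struct model_dma_region *new_map;
	size_t n = 0;

	assert(map != NULL);
	while (map[n].size)
		n++;

	new_map = calloc(n + 1, sizeof(*map));
	if (new_map)
		memcpy(new_map, map, sizeof(*map) * n);
	return new_map;
}
```

Because the lookup compares `addr - start < size` rather than `addr < start + size`, it does not overflow for regions that end at the top of the address space.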

Re: [PATCH v4 0/3] iommu/arm-smmu-v3: permit users to disable msi polling

2020-08-20 Thread Robin Murphy


On 2020-08-19 00:38, Barry Song wrote:

patch 1/3 and patch 2/3 are the preparation of patch 3/3 which permits users
to disable MSI-based polling by cmd line.

-v4:
   with respect to Robin's comments
   * cleanup the code of the existing module parameter disable_bypass
   * add ARM_SMMU_OPT_MSIPOLL flag. on the other hand, we only need to check
 a bit in options rather than two bits in features


Thanks Barry - for all 3 patches,

Reviewed-by: Robin Murphy 

I'd be inclined to squash #2 into #1, but I'll leave that up to Will.

Cheers,
Robin.



Barry Song (3):
   iommu/arm-smmu-v3: replace symbolic permissions by octal permissions
 for module parameter
   iommu/arm-smmu-v3: replace module_param_named by module_param for
 disable_bypass
   iommu/arm-smmu-v3: permit users to disable msi polling

  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +--
  1 file changed, 13 insertions(+), 6 deletions(-)



[PATCH v2 2/2] iommu/iova: Free global iova rcache on iova alloc failure

2020-08-20 Thread vjitta

From: Vijayanand Jitta 

Whenever an iova alloc request fails, we free the iova
ranges present in the percpu iova rcaches and then retry.
However, the global iova rcache is not freed, so we could
still see an iova alloc failure even after the retry, because
the global rcache is holding iovas, which can cause fragmentation.
So, free the global iova rcache as well and then go for the
retry.

Signed-off-by: Vijayanand Jitta 
---
 drivers/iommu/iova.c | 23 +++
 include/linux/iova.h |  6 ++
 2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 4e77116..5836c87 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -442,6 +442,7 @@ struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
flush_rcache = false;
for_each_online_cpu(cpu)
free_cpu_cached_iovas(cpu, iovad);
+   free_global_cached_iovas(iovad);
goto retry;
}
 
@@ -1055,5 +1056,27 @@ void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad)
}
 }
 
+/*
+ * free all the IOVA ranges of global cache
+ */
+void free_global_cached_iovas(struct iova_domain *iovad)
+{
+   struct iova_rcache *rcache;
+   unsigned long flags;
+   int i, j;
+
+   for (i = 0; i < IOVA_RANGE_CACHE_MAX_SIZE; ++i) {
+   rcache = &iovad->rcaches[i];
+   spin_lock_irqsave(&rcache->lock, flags);
+   for (j = 0; j < rcache->depot_size; ++j) {
+   iova_magazine_free_pfns(rcache->depot[j], iovad);
+   iova_magazine_free(rcache->depot[j]);
+   rcache->depot[j] = NULL;
+   }
+   rcache->depot_size = 0;
+   spin_unlock_irqrestore(&rcache->lock, flags);
+   }
+}
+
 MODULE_AUTHOR("Anil S Keshavamurthy ");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/iova.h b/include/linux/iova.h
index a0637ab..a905726 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -163,6 +163,7 @@ int init_iova_flush_queue(struct iova_domain *iovad,
 struct iova *split_and_remove_iova(struct iova_domain *iovad,
struct iova *iova, unsigned long pfn_lo, unsigned long pfn_hi);
 void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
+void free_global_cached_iovas(struct iova_domain *iovad);
 #else
 static inline int iova_cache_get(void)
 {
@@ -270,6 +271,11 @@ static inline void free_cpu_cached_iovas(unsigned int cpu,
 struct iova_domain *iovad)
 {
 }
+
+static inline void free_global_cached_iovas(struct iova_domain *iovad)
+{
+}
+
 #endif
 
 #endif
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
1.9.1
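
As a rough illustration of what draining the global cache means, here is a small userspace model. It is only a sketch under assumptions: model_magazine, model_rcache, model_depot_push() and model_free_depot() are hypothetical names, malloc()/free() replace the kernel allocators, the spinlock is omitted, and returning pfns to the IOVA tree is modelled by merely counting them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_MAG_SIZE	4	/* pfns per magazine in this model */
#define MODEL_DEPOT_MAX	2	/* magazines the depot can hold in this model */

/* Hypothetical stand-in for a magazine of cached pfns. */
struct model_magazine {
	size_t size;
	unsigned long pfns[MODEL_MAG_SIZE];
};

/* Hypothetical stand-in for the global (depot) part of one rcache. */
struct model_rcache {
	struct model_magazine *depot[MODEL_DEPOT_MAX];
	size_t depot_size;
};

/* Park a magazine holding 'count' pfns in the depot; 0 on success. */
static int model_depot_push(struct model_rcache *rc, size_t count)
{
	struct model_magazine *mag;

	assert(count <= MODEL_MAG_SIZE);
	if (rc->depot_size == MODEL_DEPOT_MAX)
		return -1;

	mag = calloc(1, sizeof(*mag));
	if (!mag)
		return -1;
	mag->size = count;
	rc->depot[rc->depot_size++] = mag;
	return 0;
}

/* Drain the depot: release every magazine and report how many cached
 * pfns became available to the allocator again. */
static size_t model_free_depot(struct model_rcache *rc)
{
	size_t released = 0;
	size_t j;

	for (j = 0; j < rc->depot_size; j++) {
		released += rc->depot[j]->size;	/* pfns handed back */
		free(rc->depot[j]);
		rc->depot[j] = NULL;
	}
	rc->depot_size = 0;
	return released;
}
```

After the drain the depot is empty, so a retried allocation sees every range that was previously parked there.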


[PATCH v2 1/2] iommu/iova: Retry from last rb tree node if iova search fails

2020-08-20 Thread vjitta

From: Vijayanand Jitta 

Whenever a new iova alloc request comes in, the iova is always searched
for from the cached node and the nodes previous to the cached
node. So, even if there is free iova space available in the nodes
next to the cached node, the iova allocation can still fail
because of this approach.

Consider the following sequence of iova allocs and frees on
1GB of iova space:

1) alloc - 500MB
2) alloc - 12MB
3) alloc - 499MB
4) free -  12MB which was allocated in step 2
5) alloc - 13MB

After the above sequence we will have 12MB of free iova space, and the
cached node will be pointing to the iova pfn of the last alloc of 13MB,
which will be the lowest iova pfn of that iova space. Now if we get an
alloc request of 2MB, we just search from the cached node and then look
at lower iova pfns for free iova, and as there aren't any, the iova alloc
fails even though there is 12MB of free iova space.

To avoid such iova search failures, do a retry from the last rb tree node
when the iova search fails. This will search the entire tree and get an iova
if it's available.

Signed-off-by: Vijayanand Jitta 
---
 drivers/iommu/iova.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 49fc01f..4e77116 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -184,8 +184,9 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
struct rb_node *curr, *prev;
struct iova *curr_iova;
unsigned long flags;
-   unsigned long new_pfn;
+   unsigned long new_pfn, low_pfn_new;
unsigned long align_mask = ~0UL;
+   unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
 
if (size_aligned)
align_mask <<= fls_long(size - 1);
@@ -198,15 +199,25 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
 
curr = __get_cached_rbnode(iovad, limit_pfn);
curr_iova = rb_entry(curr, struct iova, node);
+   low_pfn_new = curr_iova->pfn_hi + 1;
+
+retry:
do {
-   limit_pfn = min(limit_pfn, curr_iova->pfn_lo);
-   new_pfn = (limit_pfn - size) & align_mask;
+   high_pfn = min(high_pfn, curr_iova->pfn_lo);
+   new_pfn = (high_pfn - size) & align_mask;
prev = curr;
curr = rb_prev(curr);
curr_iova = rb_entry(curr, struct iova, node);
-   } while (curr && new_pfn <= curr_iova->pfn_hi);
-
-   if (limit_pfn < size || new_pfn < iovad->start_pfn) {
+   } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
+
+   if (high_pfn < size || new_pfn < low_pfn) {
+   if (low_pfn == iovad->start_pfn && low_pfn_new < limit_pfn) {
+   high_pfn = limit_pfn;
+   low_pfn = low_pfn_new;
+   curr = &iovad->anchor.node;
+   curr_iova = rb_entry(curr, struct iova, node);
+   goto retry;
+   }
iovad->max32_alloc_size = size;
goto iova32_full;
}
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
1.9.1
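
The alloc/free sequence from the commit message can be replayed with a small userspace model. This is a sketch under assumptions, not the kernel allocator: sizes are whole MB, alignment is ignored, a sorted array replaces the rb tree, every model_* name is hypothetical, and the cached-node update on free (move to the next-higher range) is a simplification. It shows a top-down search that starts at the cached node, plus an optional retry over the gaps above that node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LIMIT	1024	/* size of the modelled IOVA space, in MB */
#define MODEL_MAX	16	/* capacity of the range array */

struct model_range {
	long lo;	/* first MB of the range */
	long end;	/* one past the last MB of the range */
};

struct model_space {
	struct model_range r[MODEL_MAX];	/* allocated ranges, ascending */
	size_t n;				/* number of allocated ranges */
	size_t cached;				/* cached node index; n == top anchor */
};

/* Try to allocate at the top of the gap directly below node i. */
static long model_try_gap(struct model_space *s, size_t i, long size)
{
	long top = (i == s->n) ? MODEL_LIMIT : s->r[i].lo;
	long bottom = (i == 0) ? 0 : s->r[i - 1].end;

	if (top - bottom < size)
		return -1;

	assert(s->n < MODEL_MAX);
	memmove(&s->r[i + 1], &s->r[i], (s->n - i) * sizeof(s->r[0]));
	s->r[i].lo = top - size;
	s->r[i].end = top;
	s->n++;
	s->cached = i;		/* the new range becomes the cached node */
	return top - size;
}

/* Returns the start of the new range in MB, or -1 on failure. */
static long model_alloc(struct model_space *s, long size, bool retry)
{
	size_t from = s->cached;
	size_t i;
	long lo;

	/* First pass: only the gaps at or below the cached node. */
	for (i = from; ; i--) {
		lo = model_try_gap(s, i, size);
		if (lo >= 0)
			return lo;
		if (i == 0)
			break;
	}

	if (!retry)
		return -1;

	/* Retry pass: from the top anchor down to just above the cached node. */
	for (i = s->n; i > from; i--) {
		lo = model_try_gap(s, i, size);
		if (lo >= 0)
			return lo;
	}
	return -1;
}

/* Free the range that starts at lo, if there is one. */
static void model_free(struct model_space *s, long lo)
{
	size_t d;

	for (d = 0; d < s->n; d++)
		if (s->r[d].lo == lo)
			break;
	if (d == s->n)
		return;

	memmove(&s->r[d], &s->r[d + 1], (s->n - d - 1) * sizeof(s->r[0]));
	s->n--;
	if (d >= s->cached)
		s->cached = d;	/* next-higher range, or the top anchor */
	else
		s->cached--;	/* ranges below the cached node shifted down */
}

/* Replay steps 1-5 from the commit message, then request 2MB. */
static long model_replay(bool retry)
{
	struct model_space s = { .n = 0, .cached = 0 };
	long second;

	model_alloc(&s, 500, retry);		/* 1) [524, 1024) */
	second = model_alloc(&s, 12, retry);	/* 2) [512, 524) */
	model_alloc(&s, 499, retry);		/* 3) [13, 512) */
	model_free(&s, second);			/* 4) leaves a 12MB hole */
	model_alloc(&s, 13, retry);		/* 5) [0, 13), the new cached node */

	return model_alloc(&s, 2, retry);
}
```

In this model, model_replay(false) fails even though 12MB is free, while model_replay(true) places the 2MB request at the top of the hole.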


Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa

On Thu, Aug 20, 2020 at 7:02 AM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 03:07:04PM +0100, Robin Murphy wrote:
> >> FWIW, I asked back in time what the plan is for non-coherent
> >> allocations and it seemed like DMA_ATTR_NON_CONSISTENT and
> >> dma_sync_*() was supposed to be the right thing to go with. [2] The
> >> same thread also explains why dma_alloc_pages() isn't suitable for the
> >> users of dma_alloc_attrs() and DMA_ATTR_NON_CONSISTENT.
> >
> > AFAICS even back then Christoph was implying getting rid of NON_CONSISTENT
> > and *replacing* it with something streaming-API-based - i.e. this series -
> > not encouraging mixing the existing APIs. It doesn't seem impossible to
> > implement a remapping version of this new dma_alloc_pages() for
> > IOMMU-backed ops if it's really warranted (although at that point it seems
> > like "non-coherent" vb2-dc starts to have significant conceptual overlap
> > with vb2-sg).
>
> You can always vmap the returned pages from dma_alloc_pages, but it will
> make cache invalidation hell - you'll need to use
> invalidate_kernel_vmap_range and flush_kernel_vmap_range to properly
> handle virtually indexed caches.
>
> Or with remapping you mean using the iommu to de-scatter/gather?

Ideally, both.

For remapping in the CPU sense, there are drivers which rely on a
contiguous kernel mapping of the vb2 buffers, which was provided by
dma_alloc_attrs(). I think they could be reworked to work on single
pages, but that would significantly complicate the code. At the same
time, such drivers would actually benefit from a cached mapping,
because they often have non-bursty, random access patterns.

Then, in the IOMMU sense, the whole idea of videobuf2-dma-contig is to
rely on the DMA API to always provide device-contiguous memory, as
required by the hardware which only has a single pointer and size.

>
> You can trivially implement it yourself for the iommu
> case:
>
> {
> merge_boundary = dma_get_merge_boundary(dev);
> if (!merge_boundary || merge_boundary > chunk_size - 1) {
> /* can't coalesce */
> return -EINVAL;
> }
>
>
> nents = DIV_ROUND_UP(total_size, chunk_size);
> sg = sgl_alloc();
> for_each_sgl() {
> sg->page = __alloc_pages(get_order(chunk_size))
> sg->len = chunk_size;
> }
> dma_map_sg(sg, DMA_ATTR_SKIP_CPU_SYNC);
> // you are guaranteed to get a single dma_addr out
> }
>
> Of course this still uses the scatterlist structure with its annoying
> mix of input and output parameters, so I'd rather not expose it as
> an official API at the DMA layer.

The problem with the above open coded approach is that it requires
explicit handling of the non-IOMMU and IOMMU cases and this is exactly
what we don't want to have in vb2 and what was actually the job of the
DMA API to hide. Is the plan to actually move the IOMMU handling out
of the DMA API?

Do you think we could instead turn it into a dma_alloc_noncoherent()
helper, which has similar semantics as dma_alloc_attrs() and handles
the various corner cases (e.g. invalidate_kernel_vmap_range and
flush_kernel_vmap_range) to achieve the desired functionality without
delegating the "hell", as you called it, to the users?

Best regards,
Tomasz

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa

On Thu, Aug 20, 2020 at 6:45 AM Christoph Hellwig  wrote:
>
> On Wed, Aug 19, 2020 at 04:11:52PM +0200, Tomasz Figa wrote:
> > > > By the way, as a videobuf2 reviewer, I'd appreciate being CC'd on any
> > > > series related to the subsystem-facing DMA API changes, since
> > > > videobuf2 is one of the biggest users of it.
> > >
> > > The cc list is too long - I cc lists and key maintainers.  As a reviewer
> > > you should watch your subsystem's lists closely.
> >
> > Well, I guess we can disagree on this, because there is no clear
> > policy. I'm listed in the MAINTAINERS file for the subsystem and I
> > believe the purpose of the file is to list the people to CC on
> > relevant patches. We're all overloaded with work and having to look
> > through the huge volume of mailing lists like linux-media doesn't help
> > and thus I'd still appreciate being added on CC.
>
> I'm happy to Cc an active participant in the discussion.  I'm not
> going to add all reviewers because even with the trimmed CC list
> I'm already hitting the number of recipients limit on various lists.

Fair enough.

We'll make your job easier and just turn my MAINTAINERS entry into a
maintainer. :)

Best regards,
Tomasz

Re: [PATCH 05/28] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT

2020-08-20 Thread Tomasz Figa

On Thu, Aug 20, 2020 at 7:20 AM Christoph Hellwig  wrote:
>
> On Thu, Aug 20, 2020 at 06:43:47AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 19, 2020 at 03:57:53PM +0200, Tomasz Figa wrote:
> > > > > Could you explain what makes you think it's unused? It's a feature of
> > > > > the UAPI generally supported by the videobuf2 framework and relied on
> > > > > by Chromium OS to get any kind of reasonable performance when
> > > > > accessing V4L2 buffers in the userspace.
> > > >
> > > > Because it doesn't do anything except on PARISC and non-coherent MIPS,
> > > > so by definition it isn't used by any of these media drivers.
> > >
> > > It's still a UAPI feature, so we can't simply remove the flag, it
> > > must stay there as a no-op, until the problem is resolved.
> >
> > Ok, I'll switch to just ignoring it for the next version.
>
> So I took a deeper look.  I don't really think it qualifies as a UAPI
> in our traditional sense.  For one it only appeared in 5.9-rc1, so we
> can trivially expedite the patch into 5.9-rc and not actually make it
> show up in any released kernel version.  And even as of the current
> Linus' tree the only user is a test driver.  So I really think the best
> way to go ahead is to just revert it ASAP as the design wasn't thought
> out at all.

The UAPI and V4L2/videobuf2 changes are in good shape and the only
wrong part is the use of the DMA API, which was based on earlier email
guidance anyway, and a change to the synchronization part. I find
conclusions like the above insulting for people who put many hours
into designing and implementing the related functionality, given the
complexity of the videobuf2 framework and how ill-defined the DMA API
was, and would feel better if such could be avoided in future
communication.

That said, we can revert it on the basis of the implementation issues,
but I feel like we wouldn't get anything by doing so, because as I
said, the design is sane and most of the implementation is fine as
well. Instead, I'd suggest simply removing the use of the attribute
being removed, so that the feature stays no-op until the DMA API
provides a way to implement it or we just migrate videobuf2 to stop
using the DMA API as much as possible, like many drivers in the DRM
subsystem did.

Best regards,
Tomasz
