Re: [PATCH -next] swiotlb: drop pointless static qualifier in swiotlb_dma_supported()

2019-02-13 Thread Christoph Hellwig
On Thu, Feb 14, 2019 at 01:41:47AM +, YueHaibing wrote:
> There is no need to have the 'struct dentry *d_swiotlb_usage' variable
> static since a new value is always assigned to it before use.

FYI, this is in swiotlb_create_debugfs, not swiotlb_dma_supported.


Re: [PATCH v6 1/9] iommu: Add APIs for multiple domains per device

2019-02-13 Thread Lu Baolu

Hi Jean,

On 2/13/19 7:55 PM, Jean-Philippe Brucker wrote:

Hi,

I have a few boring nits and one question below


Thanks a lot for reviewing my patch.



On 13/02/2019 04:02, Lu Baolu wrote:

Sharing a physical PCI device in a finer-granularity way
is becoming a consensus in the industry. IOMMU vendors
are also making efforts to support such sharing as well
as possible. Among these efforts, the capability of
supporting finer-granularity DMA isolation is a common
requirement due to security considerations. With finer-granularity
DMA isolation, all DMA requests out of or to a subset of
a physical PCI device can be protected by the IOMMU.


That last sentence seems strange, how about "With finer-granularity DMA
isolation, subsets of a PCI function can be isolated from each other by
the IOMMU."


Yours looks better. Thanks!




As a
result, there is a request in software to attach multiple
domains to a physical PCI device. One example of such a
usage model is the Intel Scalable IOV [1] [2]. The Intel VT-d
3.0 spec [3] introduces scalable mode, which enables
PASID-granularity DMA isolation.

This adds the APIs to support multiple domains per device.
In order to ease the discussions, we call it 'a domain in
auxiliary mode' or simply 'auxiliary domain' when multiple
domains are attached to a physical device.

The APIs include:

* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
   - Check whether both IOMMU and device support IOMMU aux
     domain feature. Below aux-domain specific interfaces
     are available only after this returns true.


s/after/if/ since calling has_feature() shouldn't be a prerequisite to
using the aux-domain interface (unlike calling enable_feature()).


After reconsideration, I think my comments about this API aren't
correct. It should be more like:

* iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
  - Detect whether both the IOMMU and the PCI endpoint device support the
    feature (aux-domain here), without any host driver dependency.

* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
  - Check the enabling status of the feature (aux-domain here). The
aux-domain interfaces are available only if this returns true.





* iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
   - Enable/disable device specific aux-domain feature.

* iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
   - Check whether the aux domain specific feature enabled or
     not.


"is enabled"


Sure.





* iommu_aux_attach_device(domain, dev)
   - Attaches @domain to @dev in the auxiliary mode. Multiple
     domains could be attached to a single device in the
     auxiliary mode with each domain representing an isolated
     address space for an assignable subset of the device.

* iommu_aux_detach_device(domain, dev)
   - Detach @domain which has been attached to @dev in the
     auxiliary mode.

* iommu_aux_get_pasid(domain, dev)
   - Return ID used for finer-granularity DMA translation.
     For the Intel Scalable IOV usage model, this will be
     a PASID. The device which supports Scalable IOV needs
     to write this ID to the device register so that DMA
     requests could be tagged with a right PASID prefix.

This has been updated with the latest proposal from Joerg
posted here [5].
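
To make the calling sequence concrete, below is a rough usage sketch for a
hypothetical device driver. Only the iommu_dev_*/iommu_aux_* calls come from
this patch; iommu_domain_alloc()/iommu_domain_free() are the existing core
API, and my_dev_set_pasid() is a made-up driver helper that programs the
PASID into a device register:

    struct iommu_domain *domain;
    int pasid, ret;

    /* Both the IOMMU and the device must support aux domains. */
    if (!iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX))
            return -ENODEV;

    ret = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX);
    if (ret)
            return ret;

    /* Allocate an unmanaged domain and attach it in auxiliary mode. */
    domain = iommu_domain_alloc(dev->bus);
    if (!domain)
            return -ENOMEM;

    ret = iommu_aux_attach_device(domain, dev);
    if (ret) {
            iommu_domain_free(domain);
            return ret;
    }

    /* Tag DMA from the assignable subset with the returned PASID. */
    pasid = iommu_aux_get_pasid(domain, dev);
    my_dev_set_pasid(dev, pasid);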

Many people were involved in the discussions of this design.

Kevin Tian 
Liu Yi L 
Ashok Raj 
Sanjay Kumar 
Jacob Pan 
Alex Williamson 
Jean-Philippe Brucker 
Joerg Roedel 

and some discussions can be found here [4] [5].

[1]
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[3]
https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[4] https://lkml.org/lkml/2018/7/26/4
[5] https://www.spinics.net/lists/iommu/msg31874.html

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Liu Yi L 
Suggested-by: Kevin Tian 
Suggested-by: Jean-Philippe Brucker 
Suggested-by: Joerg Roedel 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/iommu.c | 91 +++
  include/linux/iommu.h | 70 +
  2 files changed, 161 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ed4db334341..d0b323e8357f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2033,3 +2033,94 @@ int iommu_fwspec_add_ids(struct device *dev, u32
*ids, int num_ids)
  return 0;
  }
  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
+
+/*
+ * Per device IOMMU features.
+ */
+bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features
feat)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (ops && ops->dev_has_feat)
+   return ops->dev_has_feat(dev, feat);
+
+   return false;
+}
+EXPORT_SYMBOL_GPL(iommu_dev_has_feature);
+
+int iommu_dev_enable_feature(struct device *dev, enum
iommu_dev_features feat)
+{
+   const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+   if (o

[PATCH -next] swiotlb: drop pointless static qualifier in swiotlb_dma_supported()

2019-02-13 Thread YueHaibing
There is no need to have the 'struct dentry *d_swiotlb_usage' variable
static since a new value is always assigned to it before use.

Signed-off-by: YueHaibing 
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a7b53786db9f..02fa517c47d9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -689,7 +689,7 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 
 static int __init swiotlb_create_debugfs(void)
 {
-   static struct dentry *d_swiotlb_usage;
+   struct dentry *d_swiotlb_usage;
struct dentry *ent;
 
d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);







Re: [PATCH v2 1/2] PCI/ATS: Add pci_prg_resp_pasid_required() interface.

2019-02-13 Thread Bjorn Helgaas
On Mon, Feb 11, 2019 at 01:50:31PM -0800, 
sathyanarayanan.kuppusw...@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan 
> 
> Return the PRG Response PASID Required bit in the Page Request
> Status Register.
> 
> As per PCIe spec r4.0, sec 10.5.2.3, if this bit is Set then the device
> expects a PASID TLP Prefix on PRG Response Messages when the
> corresponding Page Requests had a PASID TLP Prefix. If Clear, the device
> does not expect PASID TLP Prefixes on any PRG Response Message, and the
> device behavior is undefined if this bit is Clear and the device
> receives a PRG Response Message with a PASID TLP Prefix. Also the device
> behavior is undefined in the this bit is Set and the device receives a
> PRG Response Message with no PASID TLP Prefix when the corresponding
> Page Requests had a PASID TLP Prefix.

s/Set then the device/Set, the device/
s/undefined if this bit is Clear and the device/undefined if the device/
s/is undefined in the this/is undefined if this/

> This function will be used by drivers like IOMMU, if it is required to
> check the status of the PRG Response PASID Required bit before enabling
> the PASID support of the device.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Keith Busch 
> Suggested-by: Ashok Raj 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 

With typos (also below) addressed,

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/ats.c | 31 +++
>  include/linux/pci-ats.h   |  5 +
>  include/uapi/linux/pci_regs.h |  1 +
>  3 files changed, 37 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 5b78f3b1b918..f843cd846dff 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -368,6 +368,37 @@ int pci_pasid_features(struct pci_dev *pdev)
>  }
>  EXPORT_SYMBOL_GPL(pci_pasid_features);
>  
> +/**
> + * pci_prg_resp_pasid_required - Return PRG Response PASID Required bit
> + *status.
> + * @pdev: PCI device structure
> + *
> + * Returns 1 if PASID is required in PRG Response message, 0 otherwise.
> + *
> + * Even though the PRG response PASID status is read from PRI status
> + * register, since this API will mainly be used by PASID users, this
> + * function is defined within #ifdef CONFIG_PCI_PASID instead of
> + * CONFIG_PCI_PRI.
> + *

Remove blank comment line.

> + */
> +int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> +{
> + u16 status;
> + int pos;
> +
> + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PRI);
> + if (!pos)
> + return 0;
> +
> + pci_read_config_word(pdev, pos + PCI_PRI_STATUS, &status);
> +
> + if (status & PCI_PRI_STATUS_PASID)
> + return 1;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_prg_resp_pasid_required);
> +
>  #define PASID_NUMBER_SHIFT   8
>  #define PASID_NUMBER_MASK(0x1f << PASID_NUMBER_SHIFT)
>  /**
> diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h
> index 7c4b8e27268c..facfd6a18fe1 100644
> --- a/include/linux/pci-ats.h
> +++ b/include/linux/pci-ats.h
> @@ -40,6 +40,7 @@ void pci_disable_pasid(struct pci_dev *pdev);
>  void pci_restore_pasid_state(struct pci_dev *pdev);
>  int pci_pasid_features(struct pci_dev *pdev);
>  int pci_max_pasids(struct pci_dev *pdev);
> +int pci_prg_resp_pasid_required(struct pci_dev *pdev);
>  
>  #else  /* CONFIG_PCI_PASID */
>  
> @@ -66,6 +67,10 @@ static inline int pci_max_pasids(struct pci_dev *pdev)
>   return -EINVAL;
>  }
>  
> +static int pci_prg_resp_pasid_required(struct pci_dev *pdev)
> +{
> + return 0;
> +}
>  #endif /* CONFIG_PCI_PASID */
>  
>  
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index e1e9888c85e6..898be572b010 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -880,6 +880,7 @@
>  #define  PCI_PRI_STATUS_RF   0x001   /* Response Failure */
>  #define  PCI_PRI_STATUS_UPRGI0x002   /* Unexpected PRG index */
>  #define  PCI_PRI_STATUS_STOPPED  0x100   /* PRI Stopped */
> +#define  PCI_PRI_STATUS_PASID0x8000  /* PRG Response PASID Required 
> */
>  #define PCI_PRI_MAX_REQ  0x08/* PRI max reqs supported */
>  #define PCI_PRI_ALLOC_REQ0x0c/* PRI max reqs allowed */
>  #define PCI_EXT_CAP_PRI_SIZEOF   16
> -- 
> 2.20.1
> 
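
For illustration, a minimal sketch of how an IOMMU driver might consume the
new helper (iommu_supports_prg_resp_pasid() is a hypothetical check standing
in for whatever the IOMMU reports about its PRG response capabilities, and
"features" is whatever PASID capability flags the caller wants enabled):

    /*
     * Refuse to enable PASID if the device requires PASID-prefixed
     * PRG responses that this IOMMU cannot generate.
     */
    if (pci_prg_resp_pasid_required(pdev) &&
        !iommu_supports_prg_resp_pasid(iommu))
            return -EINVAL;

    return pci_enable_pasid(pdev, features);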


Re: [PATCH v2 1/2] PCI/ATS: Add pci_ats_page_aligned() interface

2019-02-13 Thread Bjorn Helgaas
On Mon, Feb 11, 2019 at 01:44:34PM -0800, 
sathyanarayanan.kuppusw...@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan 
> 
> Return the Page Aligned Request bit in the ATS Capability Register.
> 
> As per PCIe spec r4.0, sec 10.5.1.2, If Page Aligned Request bit is
> set, then it indicates the Untranslated Addresses generated by the
> device are alwayis always aligned to a 4096 byte boundary.

s/, If/, if the/
s/then it/it/
s/alwayis//

> This interface will be used by drivers like IOMMU, if it is required
> to check whether the Untranslated Address generated by the device will
> be aligned before enabling the ATS service.

Maybe something like this?

  An IOMMU that can only translate page-aligned addresses can only be used
  with devices that always produce aligned Untranslated Addresses.  This
  interface will be used by drivers for such IOMMUs to determine whether
  devices can use the ATS service.

> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Keith Busch 
> Suggested-by: Ashok Raj 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 

With typos addressed (more below),

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/ats.c | 27 +++
>  include/linux/pci.h   |  2 ++
>  include/uapi/linux/pci_regs.h |  1 +
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index 5b78f3b1b918..b3c7f1496081 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -142,6 +142,33 @@ int pci_ats_queue_depth(struct pci_dev *dev)
>  }
>  EXPORT_SYMBOL_GPL(pci_ats_queue_depth);
>  
> +/**
> + * pci_ats_page_aligned - Return Page Aligned Request bit status.
> + * @pdev: the PCI device
> + *
> + * Returns 1, if Untranslated Addresses generated by the device are
> + * always aligned or 0 otherwise.
> + *
> + * Per PCIe spec r4.0, sec 10.5.1.2, If Page Aligned Request bit is
> + * set, it indicates the Untranslated Addresses generated by the
> + * device are always aligned to a 4096 byte boundary.

s/, If/, if the/

> + */
> +int pci_ats_page_aligned(struct pci_dev *pdev)
> +{
> + u16 cap;
> +
> + if (!pdev->ats_cap)
> + return 0;
> +
> + pci_read_config_word(pdev, pdev->ats_cap + PCI_ATS_CAP, &cap);
> +
> + if (cap & PCI_ATS_CAP_PAGE_ALIGNED)
> + return 1;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_ats_page_aligned);
> +
>  #ifdef CONFIG_PCI_PRI
>  /**
>   * pci_enable_pri - Enable PRI capability
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 65f1d8c2f082..9724a8c0496b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1524,11 +1524,13 @@ void pci_ats_init(struct pci_dev *dev);
>  int pci_enable_ats(struct pci_dev *dev, int ps);
>  void pci_disable_ats(struct pci_dev *dev);
>  int pci_ats_queue_depth(struct pci_dev *dev);
> +int pci_ats_page_aligned(struct pci_dev *dev);
>  #else
>  static inline void pci_ats_init(struct pci_dev *d) { }
>  static inline int pci_enable_ats(struct pci_dev *d, int ps) { return 
> -ENODEV; }
>  static inline void pci_disable_ats(struct pci_dev *d) { }
>  static inline int pci_ats_queue_depth(struct pci_dev *d) { return -ENODEV; }
> +static inline int pci_ats_page_aligned(struct pci_dev *dev) { return 0; }
>  #endif
>  
>  #ifdef CONFIG_PCIE_PTM
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index e1e9888c85e6..7973bb02ed4b 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -866,6 +866,7 @@
>  #define PCI_ATS_CAP  0x04/* ATS Capability Register */
>  #define  PCI_ATS_CAP_QDEP(x) ((x) & 0x1f)/* Invalidate Queue Depth */
>  #define  PCI_ATS_MAX_QDEP32  /* Max Invalidate Queue Depth */
> +#define  PCI_ATS_CAP_PAGE_ALIGNED0x0020 /* Page Aligned Request */
>  #define PCI_ATS_CTRL 0x06/* ATS Control Register */
>  #define  PCI_ATS_CTRL_ENABLE 0x8000  /* ATS Enable */
>  #define  PCI_ATS_CTRL_STU(x) ((x) & 0x1f)/* Smallest Translation Unit */
> -- 
> 2.20.1
> 
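
Following the wording above, a rough sketch of the intended caller on the
IOMMU side (hypothetical code; the page-shift argument and error handling
are placeholders):

    /*
     * An IOMMU that can only translate page-aligned addresses must not
     * enable ATS for a device that may emit unaligned Untranslated
     * Addresses.
     */
    if (!pci_ats_page_aligned(pdev))
            return -EINVAL;

    return pci_enable_ats(pdev, PAGE_SHIFT);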


Re: [PATCH 06/12] dma-mapping: improve selection of dma_declare_coherent availability

2019-02-13 Thread Rob Herring
On Wed, Feb 13, 2019 at 12:24 PM Christoph Hellwig  wrote:
>
> On Tue, Feb 12, 2019 at 02:40:23PM -0600, Rob Herring wrote:
> > > diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> > > index 3607fd2810e4..f8c66a9472a4 100644
> > > --- a/drivers/of/Kconfig
> > > +++ b/drivers/of/Kconfig
> > > @@ -43,6 +43,7 @@ config OF_FLATTREE
> > >
> > >  config OF_EARLY_FLATTREE
> > > bool
> > > +   select DMA_DECLARE_COHERENT
> >
> > Is selecting DMA_DECLARE_COHERENT okay on UML? We run the unittests with 
> > UML.
>
> No, that will fail with undefined references to memunmap.
>
> I guess this needs to be
>
> select DMA_DECLARE_COHERENT if HAS_DMA
>
> > Maybe we should just get rid of OF_RESERVED_MEM. If we support booting
> > from DT, then it should always be enabled anyways.
>
> Fine with me.  Do you want me to respin the series to just remove
> it?

Either now or it can wait. I don't want to hold this up any.

Rob


[PATCH 21/21] arm64: trim includes in dma-mapping.c

2019-02-13 Thread Christoph Hellwig
With most of the previous functionality now elsewhere, a lot of the
headers included in this file are no longer needed.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index ad46594b3799..cfa084bca3ea 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -5,20 +5,9 @@
  */
 
 #include 
-#include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-
 #include 
 
 pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: implement generic dma_map_ops for IOMMUs v2

2019-02-13 Thread Christoph Hellwig
Sorry,

please ignore this thread.  This just resent the start of the
dma-mapping for-next branch instead of the actual series that
sits on top of it.  



[PATCH 20/21] arm64: switch copyright boilerplate to SPDX in dma-mapping.c

2019-02-13 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Robin Murphy 
---
 arch/arm64/mm/dma-mapping.c | 15 +--
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bf49e982c978..ad46594b3799 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -1,20 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
- * SWIOTLB-based DMA API implementation
- *
  * Copyright (C) 2012 ARM Ltd.
  * Author: Catalin Marinas 
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 
 #include 
-- 
2.20.1



[PATCH 19/21] dma-iommu: switch copyright boilerplate to SPDX

2019-02-13 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Robin Murphy 
---
 drivers/iommu/dma-iommu.c | 13 +
 include/linux/dma-iommu.h | 13 +
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 35a5c219b82e..625d6085adfe 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /*
  * A fairly generic DMA-API to IOMMU-API glue layer.
  *
@@ -5,18 +6,6 @@
  *
  * based in part on arch/arm/mm/dma-mapping.c:
  * Copyright (C) 2000-2004 Russell King
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 
 #include 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 1d1a3b58d574..b4e283c26ad3 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -1,17 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2014-2015 ARM Ltd.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see .
  */
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
-- 
2.20.1



[PATCH 18/21] dma-iommu: don't depend on CONFIG_DMA_DIRECT_REMAP

2019-02-13 Thread Christoph Hellwig
For entirely dma coherent architectures, there is no requirement to ever
remap dma coherent allocations.  Move all the remap and pool code under
CONFIG_DMA_DIRECT_REMAP ifdefs, and drop the Kconfig dependency.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/Kconfig |  1 -
 drivers/iommu/dma-iommu.c | 10 ++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 8b13fb7d0263..d9a25715650e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -94,7 +94,6 @@ config IOMMU_DMA
select IOMMU_API
select IOMMU_IOVA
select NEED_SG_DMA_LENGTH
-   depends on DMA_DIRECT_REMAP
 
 config FSL_PAMU
bool "Freescale IOMMU support"
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5f3c70c65d50..35a5c219b82e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -501,6 +501,7 @@ static void *iommu_dma_alloc_contiguous(struct device *dev, 
size_t size,
return page_address(page);
 }
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -783,6 +784,7 @@ static void *iommu_dma_alloc_noncoherent(struct device 
*dev, size_t size,
gfp, attrs);
return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
 }
+#endif /* CONFIG_DMA_DIRECT_REMAP */
 
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
@@ -1065,6 +1067,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 */
gfp |= __GFP_ZERO;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (!dev_is_dma_coherent(dev))
return iommu_dma_alloc_noncoherent(dev, size, dma_handle, gfp,
attrs);
@@ -1072,6 +1075,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (gfpflags_allow_blocking(gfp) &&
!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
+#endif
 
return iommu_dma_alloc_contiguous(dev, size, dma_handle, gfp, attrs);
 }
@@ -1091,6 +1095,7 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
 *
 * Hence how dodgy the below logic looks...
 */
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (dma_in_atomic_pool(cpu_addr, PAGE_ALIGN(size))) {
iommu_dma_free_pool(dev, size, cpu_addr, dma_handle);
return;
@@ -1104,6 +1109,7 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
page = vmalloc_to_page(cpu_addr);
dma_common_free_remap(cpu_addr, PAGE_ALIGN(size), VM_USERMAP);
} else
+#endif
page = virt_to_page(cpu_addr);
 
iommu_dma_free_contiguous(dev, size, page, dma_handle);
@@ -1126,11 +1132,13 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
return -ENXIO;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (is_vmalloc_addr(cpu_addr)) {
if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_mmap_remap(cpu_addr, size, vma);
pfn = vmalloc_to_pfn(cpu_addr);
} else
+#endif
pfn = page_to_pfn(virt_to_page(cpu_addr));
 
return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
@@ -1144,11 +1152,13 @@ static int iommu_dma_get_sgtable(struct device *dev, 
struct sg_table *sgt,
struct page *page;
int ret;
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
if (is_vmalloc_addr(cpu_addr)) {
if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
return iommu_dma_get_sgtable_remap(sgt, cpu_addr, size);
page = vmalloc_to_page(cpu_addr);
} else
+#endif
page = virt_to_page(cpu_addr);
 
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-- 
2.20.1



[PATCH 15/21] dma-iommu: don't remap contiguous allocations for coherent devices

2019-02-13 Thread Christoph Hellwig
There is no need to remap for pte attributes, or for a virtually
contiguous address, so just don't do it.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ff6c6bf30c90..3199c9c81294 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1049,10 +1049,10 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
attrs);
-   if (!addr)
-   return NULL;
-   page = virt_to_page(addr);
+   if (coherent || !addr)
+   return addr;
 
+   page = virt_to_page(addr);
addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
__builtin_return_address(0));
if (!addr) {
@@ -1060,8 +1060,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return NULL;
}
 
-   if (!coherent)
-   arch_dma_prep_coherent(page, iosize);
+   arch_dma_prep_coherent(page, iosize);
} else {
addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
}
-- 
2.20.1



[PATCH 11/21] dma-iommu: refactor page array remap helpers

2019-02-13 Thread Christoph Hellwig
Move the call to dma_common_pages_remap / dma_common_free_remap  into
__iommu_dma_alloc / __iommu_dma_free and rename those functions to
better describe what they do.  This keeps the functionality that
allocates and remaps a non-contiguous array of pages nicely abstracted
out from the calling code.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 75 +++
 1 file changed, 36 insertions(+), 39 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2b446178fb8d..c1ecb0c3436e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -525,51 +525,57 @@ static struct page **__iommu_dma_alloc_pages(struct 
device *dev,
 }
 
 /**
- * iommu_dma_free - Free a buffer allocated by __iommu_dma_alloc()
+ * iommu_dma_free_remap - Free a buffer allocated by iommu_dma_alloc_remap
  * @dev: Device which owns this buffer
- * @pages: Array of buffer pages as returned by __iommu_dma_alloc()
  * @size: Size of buffer in bytes
+ * @cpu_address: Virtual address of the buffer
  * @handle: DMA address of buffer
  *
  * Frees both the pages associated with the buffer, and the array
  * describing them
  */
-static void __iommu_dma_free(struct device *dev, struct page **pages,
-   size_t size, dma_addr_t *handle)
+static void iommu_dma_free_remap(struct device *dev, size_t size,
+   void *cpu_addr, dma_addr_t dma_handle)
 {
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), *handle, size);
-   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
-   *handle = DMA_MAPPING_ERROR;
+   struct vm_struct *area = find_vm_area(cpu_addr);
+
+   if (WARN_ON(!area || !area->pages))
+   return;
+   __iommu_dma_unmap(iommu_get_dma_domain(dev), dma_handle, size);
+   __iommu_dma_free_pages(area->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   dma_common_free_remap(cpu_addr, PAGE_ALIGN(size), VM_USERMAP);
 }
 
 /**
- * __iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * iommu_dma_alloc_remap - Allocate and map a buffer contiguous in IOVA space
  * @dev: Device to allocate memory for. Must be a real device
  *  attached to an iommu_dma_domain
  * @size: Size of buffer in bytes
+ * @dma_handle: Out argument for allocated DMA handle
  * @gfp: Allocation flags
  * @attrs: DMA attributes for this allocation
- * @prot: IOMMU mapping flags
- * @handle: Out argument for allocated DMA handle
  *
  * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
  *
- * Return: Array of struct page pointers describing the buffer,
- *or NULL on failure.
+ * Return: Mapped virtual address, or NULL on failure.
  */
-static struct page **__iommu_dma_alloc(struct device *dev, size_t size,
-   gfp_t gfp, unsigned long attrs, int prot, dma_addr_t *handle)
+static void *iommu_dma_alloc_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
+   bool coherent = dev_is_dma_coherent(dev);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
+   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
struct sg_table sgt;
dma_addr_t iova;
-   unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
+   void *vaddr;
 
-   *handle = DMA_MAPPING_ERROR;
+   *dma_handle = DMA_MAPPING_ERROR;
 
min_size = alloc_sizes & -alloc_sizes;
if (min_size < PAGE_SIZE) {
@@ -595,7 +601,7 @@ static struct page **__iommu_dma_alloc(struct device *dev, 
size_t size,
if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
goto out_free_iova;
 
-   if (!(prot & IOMMU_CACHE)) {
+   if (!(ioprot & IOMMU_CACHE)) {
struct scatterlist *sg;
int i;
 
@@ -603,14 +609,21 @@ static struct page **__iommu_dma_alloc(struct device 
*dev, size_t size,
arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
-   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
+   if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, ioprot)
< size)
goto out_free_sg;
 
-   *handle = iova;
+   vaddr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+   __builtin_return_address(0));
+   if (!vaddr)
+   goto out_unmap;
+
+   *dma_handle = iova;
sg_free_table(&sgt);
-   return pages;
+   return vaddr;
 
+out_unmap:
+   __iommu_dma_unmap(domain, iova, size);
 ou

[PATCH 09/21] dma-iommu: refactor iommu_dma_get_sgtable

2019-02-13 Thread Christoph Hellwig
Move the vm_area handling into a new iommu_dma_get_sgtable_remap helper.

Inline __iommu_dma_get_sgtable_page into the main function to simplify
the code flow a bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 54 +--
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ed2ef8409806..7ea9b2fac74b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -626,6 +626,18 @@ static int iommu_dma_mmap_remap(void *cpu_addr, size_t 
size,
return ret;
 }
 
+static int iommu_dma_get_sgtable_remap(struct sg_table *sgt, void *cpu_addr,
+   size_t size)
+{
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   struct vm_struct *area = find_vm_area(cpu_addr);
+
+   if (WARN_ON(!area || !area->pages))
+   return -ENXIO;
+   return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+   GFP_KERNEL);
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1085,42 +1097,24 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
 }
 
-static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page 
*page,
-   size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-   return ret;
-}
-
 static int iommu_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
-   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   struct vm_struct *area = find_vm_area(cpu_addr);
-
-   if (!is_vmalloc_addr(cpu_addr)) {
-   struct page *page = virt_to_page(cpu_addr);
-   return __iommu_dma_get_sgtable_page(sgt, page, size);
-   }
-
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   struct page *page = vmalloc_to_page(cpu_addr);
-   return __iommu_dma_get_sgtable_page(sgt, page, size);
-   }
+   struct page *page;
+   int ret;
 
-   if (WARN_ON(!area || !area->pages))
-   return -ENXIO;
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_get_sgtable_remap(sgt, cpu_addr, size);
+   page = vmalloc_to_page(cpu_addr);
+   } else
+   page = virt_to_page(cpu_addr);
 
-   return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
-GFP_KERNEL);
+   ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
+   if (!ret)
+   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
+   return ret;
 }
 
 static const struct dma_map_ops iommu_dma_ops = {
-- 
2.20.1



[PATCH 07/21] dma-iommu: move the arm64 wrappers to common code

2019-02-13 Thread Christoph Hellwig
There is nothing really arm64 specific in the iommu_dma_ops
implementation, so move it to dma-iommu.c and keep a lot of symbols
self-contained.  Note the implementation does depend on the
DMA_DIRECT_REMAP infrastructure for now, so we'll have to make the
DMA_IOMMU support depend on it, but this will be relaxed soon.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 389 +---
 drivers/iommu/Kconfig   |   1 +
 drivers/iommu/dma-iommu.c   | 383 ---
 include/linux/dma-iommu.h   |  44 +---
 4 files changed, 365 insertions(+), 452 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 54787a3d4ad9..bf49e982c978 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -58,27 +59,6 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
__dma_flush_area(page_address(page), size);
 }
 
-#ifdef CONFIG_IOMMU_DMA
-static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
- struct page *page, size_t size)
-{
-   int ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
-
-   if (!ret)
-   sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0);
-
-   return ret;
-}
-
-static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-}
-#endif /* CONFIG_IOMMU_DMA */
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),
@@ -90,379 +70,18 @@ static int __init arm64_dma_init(void)
 arch_initcall(arm64_dma_init);
 
 #ifdef CONFIG_IOMMU_DMA
-#include 
-#include 
-#include 
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-dma_addr_t *handle, gfp_t gfp,
-unsigned long attrs)
-{
-   bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
-   size_t iosize = size;
-   void *addr;
-
-   if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-   return NULL;
-
-   size = PAGE_ALIGN(size);
-
-   /*
-* Some drivers rely on this, and we probably don't want the
-* possibility of stale kernel data being read by devices anyway.
-*/
-   gfp |= __GFP_ZERO;
-
-   if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
-   /*
-* In atomic context we can't remap anything, so we'll only
-* get the virtually contiguous buffer we need by way of a
-* physically contiguous allocation.
-*/
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
-   if (!addr)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
-   addr = NULL;
-   }
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
-   return NULL;
-
-   *handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   dma_release_from_contiguous(dev, page,
-   size >> PAGE_SHIFT);
-   return NULL;
-   }
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP,
-  prot,
-  __builtin_return_address(0));
-   if (addr) {
-   if (!coherent)
-   __dma_flush_area(page_to_virt(page), iosize);
-   memset(addr, 0, size);
-   } else {
-   iommu_dma_unmap_page(dev, *handle, iosize, 0, attrs);
-   dma_release_from_contiguous(dev, page,
-

[PATCH 08/21] dma-iommu: refactor iommu_dma_mmap

2019-02-13 Thread Christoph Hellwig
Move the vm_area handling into __iommu_dma_mmap, which is renamed
to iommu_dma_mmap_remap.

Inline __iommu_dma_mmap_pfn into the main function to simplify the code
flow a bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 50 ++-
 1 file changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c320c52cdac4..ed2ef8409806 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -598,23 +598,27 @@ static struct page **__iommu_dma_alloc(struct device 
*dev, size_t size,
 }
 
 /**
- * __iommu_dma_mmap - Map a buffer into provided user VMA
- * @pages: Array representing buffer from __iommu_dma_alloc()
+ * iommu_dma_mmap_remap - Map a remapped page array into provided user VMA
+ * @cpu_addr: virtual address of the memory to be remapped
  * @size: Size of buffer in bytes
  * @vma: VMA describing requested userspace mapping
  *
- * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * Maps the pages pointed to by @cpu_addr into @vma. The caller is responsible
  * for verifying the correct size and protection of @vma beforehand.
  */
-static int __iommu_dma_mmap(struct page **pages, size_t size,
+static int iommu_dma_mmap_remap(void *cpu_addr, size_t size,
struct vm_area_struct *vma)
 {
+   struct vm_struct *area = find_vm_area(cpu_addr);
unsigned long uaddr = vma->vm_start;
unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
int ret = -ENXIO;
 
+   if (WARN_ON(!area || !area->pages))
+   return -ENXIO;
+
for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-   ret = vm_insert_page(vma, uaddr, pages[i]);
+   ret = vm_insert_page(vma, uaddr, area->pages[i]);
if (ret)
break;
uaddr += PAGE_SIZE;
@@ -1053,21 +1057,13 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
}
 }
 
-static int __iommu_dma_mmap_pfn(struct vm_area_struct *vma,
- unsigned long pfn, size_t size)
-{
-   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
-  vma->vm_end - vma->vm_start,
-  vma->vm_page_prot);
-}
-
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
 {
unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
unsigned long off = vma->vm_pgoff;
-   struct vm_struct *area;
+   unsigned long pfn;
int ret;
 
vma->vm_page_prot = arch_dma_mmap_pgprot(dev, vma->vm_page_prot, attrs);
@@ -1078,25 +1074,15 @@ static int iommu_dma_mmap(struct device *dev, struct 
vm_area_struct *vma,
if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
return -ENXIO;
 
-   if (!is_vmalloc_addr(cpu_addr)) {
-   unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
-   return __iommu_dma_mmap_pfn(vma, pfn, size);
-   }
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_mmap_remap(cpu_addr, size, vma);
+   pfn = vmalloc_to_pfn(cpu_addr);
+   } else
+   pfn = page_to_pfn(virt_to_page(cpu_addr));
 
-   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   /*
-* DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
-*/
-   unsigned long pfn = vmalloc_to_pfn(cpu_addr);
-   return __iommu_dma_mmap_pfn(vma, pfn, size);
-   }
-
-   area = find_vm_area(cpu_addr);
-   if (WARN_ON(!area || !area->pages))
-   return -ENXIO;
-
-   return __iommu_dma_mmap(area->pages, size, vma);
+   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+   vma_pages(vma) << PAGE_SHIFT, vma->vm_page_prot);
 }
 
 static int __iommu_dma_get_sgtable_page(struct sg_table *sgt, struct page 
*page,
-- 
2.20.1



[PATCH 14/21] dma-iommu: refactor iommu_dma_free

2019-02-13 Thread Christoph Hellwig
Reorder the checks a bit so that a non-remapped allocation is the
fallthrough case, as this will ease making remapping conditional.
Also get rid of the confusing game with the size and iosize variables
and rename the handle argument to the more standard dma_handle.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 46 ---
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b8a2159ca31a..ff6c6bf30c90 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1069,34 +1069,36 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 }
 
 static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
-   dma_addr_t handle, unsigned long attrs)
+   dma_addr_t dma_handle, unsigned long attrs)
 {
-   size_t iosize = size;
+   struct page *page;
 
-   size = PAGE_ALIGN(size);
/*
-* @cpu_addr will be one of 4 things depending on how it was allocated:
-* - A remapped array of pages for contiguous allocations.
-* - A remapped array of pages from iommu_dma_alloc_remap(), for all
-*   non-atomic allocations.
-* - A non-cacheable alias from the atomic pool, for atomic
-*   allocations by non-coherent devices.
-* - A normal lowmem address, for atomic allocations by
-*   coherent devices.
+* cpu_addr can be one of 4 things depending on how it was allocated:
+*
+*  (1) A non-cacheable alias from the atomic pool.
+*  (2) A remapped array of pages from iommu_dma_alloc_remap().
+*  (3) A remapped contiguous lowmem allocation.
+*  (4) A normal lowmem address.
+*
 * Hence how dodgy the below logic looks...
 */
-   if (dma_in_atomic_pool(cpu_addr, size)) {
-   iommu_dma_free_pool(dev, size, cpu_addr, handle);
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   iommu_dma_free_contiguous(dev, iosize,
-   vmalloc_to_page(cpu_addr), handle);
-   dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-   } else if (is_vmalloc_addr(cpu_addr)){
-   iommu_dma_free_remap(dev, iosize, cpu_addr, handle);
-   } else {
-   iommu_dma_free_contiguous(dev, iosize, virt_to_page(cpu_addr),
-   handle);
+   if (dma_in_atomic_pool(cpu_addr, PAGE_ALIGN(size))) {
+   iommu_dma_free_pool(dev, size, cpu_addr, dma_handle);
+   return;
}
+
+   if (is_vmalloc_addr(cpu_addr)) {
+   if (!(attrs & DMA_ATTR_FORCE_CONTIGUOUS)) {
+   iommu_dma_free_remap(dev, size, cpu_addr, dma_handle);
+   return;
+   }
+   page = vmalloc_to_page(cpu_addr);
+   dma_common_free_remap(cpu_addr, PAGE_ALIGN(size), VM_USERMAP);
+   } else
+   page = virt_to_page(cpu_addr);
+
+   iommu_dma_free_contiguous(dev, size, page, dma_handle);
 }
 
 static int iommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
-- 
2.20.1



[PATCH 17/21] dma-iommu: refactor iommu_dma_alloc

2019-02-13 Thread Christoph Hellwig
Split all functionality related to non-coherent devices into a
separate helper, and make the decision flow more obvious.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 51 +++
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5a7ca5271532..5f3c70c65d50 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -768,6 +768,22 @@ static void *iommu_dma_alloc_pool(struct device *dev, 
size_t size,
return vaddr;
 }
 
+static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   /*
+* In atomic context we can't remap anything, so we'll only get the
+* virtually contiguous buffer we need by way of a physically
+* contiguous allocation.
+*/
+   if (!gfpflags_allow_blocking(gfp))
+   return iommu_dma_alloc_pool(dev, size, dma_handle, gfp, attrs);
+   if (attrs & DMA_ATTR_FORCE_CONTIGUOUS)
+   return iommu_dma_alloc_contiguous_remap(dev, size, dma_handle,
+   gfp, attrs);
+   return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -1041,40 +1057,23 @@ static void iommu_dma_unmap_resource(struct device 
*dev, dma_addr_t handle,
 }
 
 static void *iommu_dma_alloc(struct device *dev, size_t size,
-   dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
 {
-   bool coherent = dev_is_dma_coherent(dev);
-   size_t iosize = size;
-   void *addr;
-
/*
 * Some drivers rely on this, and we probably don't want the
 * possibility of stale kernel data being read by devices anyway.
 */
gfp |= __GFP_ZERO;
 
-   if (!gfpflags_allow_blocking(gfp)) {
-   /*
-* In atomic context we can't remap anything, so we'll only
-* get the virtually contiguous buffer we need by way of a
-* physically contiguous allocation.
-*/
-   if (!coherent)
-   return iommu_dma_alloc_pool(dev, iosize, handle, gfp,
-   attrs);
-   return iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
+   if (!dev_is_dma_coherent(dev))
+   return iommu_dma_alloc_noncoherent(dev, size, dma_handle, gfp,
attrs);
-   } else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   if (coherent)
-   addr = iommu_dma_alloc_contiguous(dev, iosize, handle,
-   gfp, attrs);
-   else
-   addr = iommu_dma_alloc_contiguous_remap(dev, iosize,
-   handle, gfp, attrs);
-   } else {
-   addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
-   }
-   return addr;
+
+   if (gfpflags_allow_blocking(gfp) &&
+   !(attrs & DMA_ATTR_FORCE_CONTIGUOUS))
+   return iommu_dma_alloc_remap(dev, size, dma_handle, gfp, attrs);
+
+   return iommu_dma_alloc_contiguous(dev, size, dma_handle, gfp, attrs);
 }
 
 static void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
-- 
2.20.1



[PATCH 12/21] dma-iommu: factor atomic pool allocations into helpers

2019-02-13 Thread Christoph Hellwig
This keeps the code together and will simplify compiling the code
out on architectures that are always dma coherent.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 51 +--
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c1ecb0c3436e..ff1ada9f98f5 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -674,6 +674,35 @@ static int iommu_dma_get_sgtable_remap(struct sg_table 
*sgt, void *cpu_addr,
GFP_KERNEL);
 }
 
+static void iommu_dma_free_pool(struct device *dev, size_t size,
+   void *vaddr, dma_addr_t dma_handle)
+{
+   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), dma_handle, size);
+   dma_free_from_pool(vaddr, PAGE_ALIGN(size));
+}
+
+static void *iommu_dma_alloc_pool(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   bool coherent = dev_is_dma_coherent(dev);
+   struct page *page;
+   void *vaddr;
+
+   vaddr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   if (!vaddr)
+   return NULL;
+
+   *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size,
+   dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs),
+   iommu_get_domain_for_dev(dev));
+   if (*dma_handle == DMA_MAPPING_ERROR) {
+   dma_free_from_pool(vaddr, PAGE_ALIGN(size));
+   return NULL;
+   }
+
+   return vaddr;
+}
+
 static void iommu_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
@@ -982,21 +1011,18 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 * get the virtually contiguous buffer we need by way of a
 * physically contiguous allocation.
 */
-   if (coherent) {
-   page = alloc_pages(gfp, get_order(size));
-   addr = page ? page_address(page) : NULL;
-   } else {
-   addr = dma_alloc_from_pool(size, &page, gfp);
-   }
-   if (!addr)
+   if (!coherent)
+   return iommu_dma_alloc_pool(dev, iosize, handle, gfp,
+   attrs);
+
+   page = alloc_pages(gfp, get_order(size));
+   if (!page)
return NULL;
 
+   addr = page_address(page);
*handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
if (*handle == DMA_MAPPING_ERROR) {
-   if (coherent)
-   __free_pages(page, get_order(size));
-   else
-   dma_free_from_pool(addr, size);
+   __free_pages(page, get_order(size));
addr = NULL;
}
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
@@ -1050,8 +1076,7 @@ static void iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr,
 * Hence how dodgy the below logic looks...
 */
if (dma_in_atomic_pool(cpu_addr, size)) {
-   __iommu_dma_unmap_page(dev, handle, iosize, 0, 0);
-   dma_free_from_pool(cpu_addr, size);
+   iommu_dma_free_pool(dev, size, cpu_addr, handle);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
struct page *page = vmalloc_to_page(cpu_addr);
 
-- 
2.20.1



[PATCH 16/21] dma-iommu: factor contiguous remapped allocations into helpers

2019-02-13 Thread Christoph Hellwig
This moves the last remaining non-dispatch code out of iommu_dma_alloc,
preparing to refactor the allocation method selection.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 48 +++
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3199c9c81294..5a7ca5271532 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -675,6 +675,29 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 }
 
+static void *iommu_dma_alloc_contiguous_remap(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
+   struct page *page;
+   void *addr;
+
+   addr = iommu_dma_alloc_contiguous(dev, size, dma_handle, gfp, attrs);
+   if (!addr)
+   return NULL;
+
+   page = virt_to_page(addr);
+   addr = dma_common_contiguous_remap(page, PAGE_ALIGN(size), VM_USERMAP,
+   prot, __builtin_return_address(0));
+   if (!addr)
+   goto out_free;
+   arch_dma_prep_coherent(page, size);
+   return addr;
+out_free:
+   iommu_dma_free_contiguous(dev, size, page, *dma_handle);
+   return NULL;
+}
+
 /**
  * iommu_dma_mmap_remap - Map a remapped page array into provided user VMA
  * @cpu_addr: virtual address of the memory to be remapped
@@ -1024,8 +1047,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
size_t iosize = size;
void *addr;
 
-   size = PAGE_ALIGN(size);
-
/*
 * Some drivers rely on this, and we probably don't want the
 * possibility of stale kernel data being read by devices anyway.
@@ -1044,23 +1065,12 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
attrs);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
-   pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
-   struct page *page;
-
-   addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
-   attrs);
-   if (coherent || !addr)
-   return addr;
-
-   page = virt_to_page(addr);
-   addr = dma_common_contiguous_remap(page, size, VM_USERMAP, prot,
-   __builtin_return_address(0));
-   if (!addr) {
-   iommu_dma_free_contiguous(dev, iosize, page, *handle);
-   return NULL;
-   }
-
-   arch_dma_prep_coherent(page, iosize);
+   if (coherent)
+   addr = iommu_dma_alloc_contiguous(dev, iosize, handle,
+   gfp, attrs);
+   else
+   addr = iommu_dma_alloc_contiguous_remap(dev, iosize,
+   handle, gfp, attrs);
} else {
addr = iommu_dma_alloc_remap(dev, iosize, handle, gfp, attrs);
}
-- 
2.20.1



[PATCH 10/21] dma-iommu: move __iommu_dma_map

2019-02-13 Thread Christoph Hellwig
Moving this function up to its unmap counterpart helps to keep related
code together for the following changes.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 46 +++
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7ea9b2fac74b..2b446178fb8d 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -436,6 +436,29 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, 
dma_addr_t dma_addr,
iommu_dma_free_iova(cookie, dma_addr, size);
 }
 
+static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
+   size_t size, int prot, struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+   size_t iova_off = 0;
+   dma_addr_t iova;
+
+   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+   iova_off = iova_offset(&cookie->iovad, phys);
+   size = iova_align(&cookie->iovad, size + iova_off);
+   }
+
+   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
+   iommu_dma_free_iova(cookie, iova, size);
+   return DMA_MAPPING_ERROR;
+   }
+   return iova + iova_off;
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -690,29 +713,6 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, struct iommu_domain *domain)
-{
-   struct iommu_dma_cookie *cookie = domain->iova_cookie;
-   size_t iova_off = 0;
-   dma_addr_t iova;
-
-   if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
-   iova_off = iova_offset(&cookie->iovad, phys);
-   size = iova_align(&cookie->iovad, size + iova_off);
-   }
-
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
-   return DMA_MAPPING_ERROR;
-
-   if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
-   iommu_dma_free_iova(cookie, iova, size);
-   return DMA_MAPPING_ERROR;
-   }
-   return iova + iova_off;
-}
-
 static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, int prot)
 {
-- 
2.20.1



[PATCH 13/21] dma-iommu: factor contiguous allocations into helpers

2019-02-13 Thread Christoph Hellwig
This keeps the code together and will simplify using it in different
ways.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 110 --
 1 file changed, 59 insertions(+), 51 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ff1ada9f98f5..b8a2159ca31a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -459,6 +459,48 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
return iova + iova_off;
 }
 
+static void iommu_dma_free_contiguous(struct device *dev, size_t size,
+   struct page *page, dma_addr_t dma_handle)
+{
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+   __iommu_dma_unmap(iommu_get_domain_for_dev(dev), dma_handle, size);
+   if (!dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, get_order(size));
+}
+
+
+static void *iommu_dma_alloc_contiguous(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+{
+   bool coherent = dev_is_dma_coherent(dev);
+   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned int page_order = get_order(size);
+   struct page *page = NULL;
+
+   if (gfpflags_allow_blocking(gfp))
+   page = dma_alloc_from_contiguous(dev, count, page_order,
+gfp & __GFP_NOWARN);
+
+   if (page)
+   memset(page_address(page), 0, PAGE_ALIGN(size));
+   else
+   page = alloc_pages(gfp, page_order);
+   if (!page)
+   return NULL;
+
+   *dma_handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
+   iommu_get_dma_domain(dev));
+   if (*dma_handle == DMA_MAPPING_ERROR) {
+   if (!dma_release_from_contiguous(dev, page, count))
+   __free_pages(page, page_order);
+   return NULL;
+   }
+
+   return page_address(page);
+}
+
 static void __iommu_dma_free_pages(struct page **pages, int count)
 {
while (count--)
@@ -755,19 +797,6 @@ static void iommu_dma_sync_sg_for_device(struct device 
*dev,
arch_sync_dma_for_device(dev, sg_phys(sg), sg->length, dir);
 }
 
-static dma_addr_t __iommu_dma_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, int prot)
-{
-   return __iommu_dma_map(dev, page_to_phys(page) + offset, size, prot,
-   iommu_get_dma_domain(dev));
-}
-
-static void __iommu_dma_unmap_page(struct device *dev, dma_addr_t handle,
-   size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
-   __iommu_dma_unmap(iommu_get_dma_domain(dev), handle, size);
-}
-
 static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
@@ -992,7 +1021,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
dma_addr_t *handle, gfp_t gfp, unsigned long attrs)
 {
bool coherent = dev_is_dma_coherent(dev);
-   int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
size_t iosize = size;
void *addr;
 
@@ -1005,7 +1033,6 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
gfp |= __GFP_ZERO;
 
if (!gfpflags_allow_blocking(gfp)) {
-   struct page *page;
/*
 * In atomic context we can't remap anything, so we'll only
 * get the virtually contiguous buffer we need by way of a
@@ -1014,44 +1041,27 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
if (!coherent)
return iommu_dma_alloc_pool(dev, iosize, handle, gfp,
attrs);
-
-   page = alloc_pages(gfp, get_order(size));
-   if (!page)
-   return NULL;
-
-   addr = page_address(page);
-   *handle = __iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-   if (*handle == DMA_MAPPING_ERROR) {
-   __free_pages(page, get_order(size));
-   addr = NULL;
-   }
+   return iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
+   attrs);
} else if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
pgprot_t prot = arch_dma_mmap_pgprot(dev, PAGE_KERNEL, attrs);
struct page *page;
 
-   page = dma_alloc_from_contiguous(dev, size >> PAGE_SHIFT,
-   get_order(size), gfp & __GFP_NOWARN);
-   if (!page)
+   addr = iommu_dma_alloc_contiguous(dev, iosize, handle, gfp,
+ 

[PATCH 06/21] dma-iommu: use for_each_sg in iommu_dma_alloc

2019-02-13 Thread Christoph Hellwig
arch_dma_prep_coherent can handle physically contiguous ranges larger
than PAGE_SIZE just fine, which means we don't need a page-based
iterator.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index af8a39861b8f..ee697cfb2227 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -578,15 +578,11 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
goto out_free_iova;
 
if (!(prot & IOMMU_CACHE)) {
-   struct sg_mapping_iter miter;
-   /*
-* The CPU-centric flushing implied by SG_MITER_TO_SG isn't
-* sufficient here, so skip it by using the "wrong" direction.
-*/
-   sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, 
SG_MITER_FROM_SG);
-   while (sg_miter_next(&miter))
-   arch_dma_prep_coherent(miter.page, PAGE_SIZE);
-   sg_miter_stop(&miter);
+   struct scatterlist *sg;
+   int i;
+
+   for_each_sg(sgt.sgl, sg, sgt.orig_nents, i)
+   arch_dma_prep_coherent(sg_page(sg), sg->length);
}
 
if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot)
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 05/21] dma-iommu: remove the flush_page callback

2019-02-13 Thread Christoph Hellwig
We now have an arch_dma_prep_coherent architecture hook that is used
for the generic DMA remap allocator, and we should use the same
interface for the dma-iommu code.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 8 +---
 drivers/iommu/dma-iommu.c   | 8 +++-
 include/linux/dma-iommu.h   | 3 +--
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index e54288921e72..54787a3d4ad9 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -94,12 +94,6 @@ arch_initcall(arm64_dma_init);
 #include 
 #include 
 
-/* Thankfully, all cache ops are by VA so we can ignore phys here */
-static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
-{
-   __dma_flush_area(virt, PAGE_SIZE);
-}
-
 static void *__iommu_alloc_attrs(struct device *dev, size_t size,
 dma_addr_t *handle, gfp_t gfp,
 unsigned long attrs)
@@ -176,7 +170,7 @@ static void *__iommu_alloc_attrs(struct device *dev, size_t 
size,
struct page **pages;
 
pages = iommu_dma_alloc(dev, iosize, gfp, attrs, ioprot,
-   handle, flush_page);
+   handle);
if (!pages)
return NULL;
 
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d19f3d6b43c1..af8a39861b8f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -532,8 +533,6 @@ void iommu_dma_free(struct device *dev, struct page 
**pages, size_t size,
  * @attrs: DMA attributes for this allocation
  * @prot: IOMMU mapping flags
  * @handle: Out argument for allocated DMA handle
- * @flush_page: Arch callback which must ensure PAGE_SIZE bytes from the
- * given VA/PA are visible to the given non-coherent device.
  *
  * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
  * but an IOMMU which supports smaller pages might not map the whole thing.
@@ -542,8 +541,7 @@ void iommu_dma_free(struct device *dev, struct page 
**pages, size_t size,
  *or NULL on failure.
  */
 struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
-   unsigned long attrs, int prot, dma_addr_t *handle,
-   void (*flush_page)(struct device *, const void *, phys_addr_t))
+   unsigned long attrs, int prot, dma_addr_t *handle)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -587,7 +585,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
 */
sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, 
SG_MITER_FROM_SG);
while (sg_miter_next(&miter))
-   flush_page(dev, miter.addr, page_to_phys(miter.page));
+   arch_dma_prep_coherent(miter.page, PAGE_SIZE);
sg_miter_stop(&miter);
}
 
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 3e206f4ee173..10ef708a605c 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -45,8 +45,7 @@ int dma_info_to_prot(enum dma_data_direction dir, bool 
coherent,
  * the arch code to take care of attributes and cache maintenance
  */
 struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
-   unsigned long attrs, int prot, dma_addr_t *handle,
-   void (*flush_page)(struct device *, const void *, phys_addr_t));
+   unsigned long attrs, int prot, dma_addr_t *handle);
 void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
dma_addr_t *handle);
 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/21] dma-iommu: cleanup dma-iommu.h

2019-02-13 Thread Christoph Hellwig
No need for a __KERNEL__ guard outside uapi and add a missing comment
describing the #else cpp statement.  Last but not least include
 instead of the asm version, which is frowned upon.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-iommu.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index e760dc5d1fa8..3e206f4ee173 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -16,14 +16,14 @@
 #ifndef __DMA_IOMMU_H
 #define __DMA_IOMMU_H
 
-#ifdef __KERNEL__
+#include 
 #include 
-#include 
 
 #ifdef CONFIG_IOMMU_DMA
 #include 
 #include 
 #include 
+#include 
 
 int iommu_dma_init(void);
 
@@ -74,7 +74,7 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t 
handle,
 void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);
 void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
 
-#else
+#else /* CONFIG_IOMMU_DMA */
 
 struct iommu_domain;
 struct msi_msg;
@@ -108,5 +108,4 @@ static inline void iommu_dma_get_resv_regions(struct device 
*dev, struct list_he
 }
 
 #endif /* CONFIG_IOMMU_DMA */
-#endif /* __KERNEL__ */
 #endif /* __DMA_IOMMU_H */
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 03/21] dma-mapping: add a Kconfig symbol to indicated arch_dma_prep_coherent presence

2019-02-13 Thread Christoph Hellwig
Add a Kconfig symbol that indicates an architecture provides an
arch_dma_prep_coherent implementation, and provide a stub otherwise.

This will allow the generic dma-iommu code to use it while still allowing
it to be built for cache coherent architectures.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/Kconfig  | 1 +
 arch/csky/Kconfig   | 1 +
 include/linux/dma-noncoherent.h | 6 ++
 kernel/dma/Kconfig  | 3 +++
 4 files changed, 11 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 87ec7be25e97..52175007ffb7 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -13,6 +13,7 @@ config ARM64
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_DMA_COHERENT_TO_PFN
select ARCH_HAS_DMA_MMAP_PGPROT
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FAST_MULTIPLIER
diff --git a/arch/csky/Kconfig b/arch/csky/Kconfig
index 398113c845f5..8b84d4362ff6 100644
--- a/arch/csky/Kconfig
+++ b/arch/csky/Kconfig
@@ -1,5 +1,6 @@
 config CSKY
def_bool y
+   select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_SYNC_DMA_FOR_CPU
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_USE_BUILTIN_BSWAP
diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h
index 69b36ed31a99..9741767e400f 100644
--- a/include/linux/dma-noncoherent.h
+++ b/include/linux/dma-noncoherent.h
@@ -72,6 +72,12 @@ static inline void arch_sync_dma_for_cpu_all(struct device 
*dev)
 }
 #endif /* CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL */
 
+#ifdef CONFIG_ARCH_HAS_DMA_PREP_COHERENT
 void arch_dma_prep_coherent(struct page *page, size_t size);
+#else
+static inline void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+}
+#endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
 
 #endif /* _LINUX_DMA_NONCOHERENT_H */
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index bde9179c6ed7..c3b2a5ae2dd4 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -35,6 +35,9 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU
 config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL
bool
 
+config ARCH_HAS_DMA_PREP_COHERENT
+   bool
+
 config ARCH_HAS_DMA_COHERENT_TO_PFN
bool
 
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/21] arm64/iommu: improve mmap bounds checking

2019-02-13 Thread Christoph Hellwig
The nr_pages checks should be done for all mmap requests, not just those
using remap_pfn_range.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index be88beb2e377..e54288921e72 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -73,19 +73,9 @@ static int __swiotlb_get_sgtable_page(struct sg_table *sgt,
 static int __swiotlb_mmap_pfn(struct vm_area_struct *vma,
  unsigned long pfn, size_t size)
 {
-   int ret = -ENXIO;
-   unsigned long nr_vma_pages = vma_pages(vma);
-   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   unsigned long off = vma->vm_pgoff;
-
-   if (off < nr_pages && nr_vma_pages <= (nr_pages - off)) {
-   ret = remap_pfn_range(vma, vma->vm_start,
- pfn + off,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-   }
-
-   return ret;
+   return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff,
+ vma->vm_end - vma->vm_start,
+ vma->vm_page_prot);
 }
 #endif /* CONFIG_IOMMU_DMA */
 
@@ -241,6 +231,8 @@ static int __iommu_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
  void *cpu_addr, dma_addr_t dma_addr, size_t size,
  unsigned long attrs)
 {
+   unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+   unsigned long off = vma->vm_pgoff;
struct vm_struct *area;
int ret;
 
@@ -249,6 +241,9 @@ static int __iommu_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
+   if (off >= nr_pages || vma_pages(vma) > nr_pages - off)
+   return -ENXIO;
+
if (!is_vmalloc_addr(cpu_addr)) {
unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
return __swiotlb_mmap_pfn(vma, pfn, size);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/21] arm64/iommu: handle non-remapped addresses in ->mmap and ->get_sgtable

2019-02-13 Thread Christoph Hellwig
DMA allocations that can't sleep may return non-remapped addresses, but
we do not properly handle them in the mmap and get_sgtable methods.
Resolve non-vmalloc addresses using virt_to_page to handle this corner
case.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/mm/dma-mapping.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 78c0a72f822c..be88beb2e377 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -249,6 +249,11 @@ static int __iommu_mmap_attrs(struct device *dev, struct 
vm_area_struct *vma,
if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret))
return ret;
 
+   if (!is_vmalloc_addr(cpu_addr)) {
+   unsigned long pfn = page_to_pfn(virt_to_page(cpu_addr));
+   return __swiotlb_mmap_pfn(vma, pfn, size);
+   }
+
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
/*
 * DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
@@ -272,10 +277,15 @@ static int __iommu_get_sgtable(struct device *dev, struct 
sg_table *sgt,
unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
struct vm_struct *area = find_vm_area(cpu_addr);
 
+   if (!is_vmalloc_addr(cpu_addr)) {
+   struct page *page = virt_to_page(cpu_addr);
+   return __swiotlb_get_sgtable_page(sgt, page, size);
+   }
+
if (attrs & DMA_ATTR_FORCE_CONTIGUOUS) {
/*
 * DMA_ATTR_FORCE_CONTIGUOUS allocations are always remapped,
-* hence in the vmalloc space.
+*  hence in the vmalloc space.
 */
struct page *page = vmalloc_to_page(cpu_addr);
return __swiotlb_get_sgtable_page(sgt, page, size);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


implement generic dma_map_ops for IOMMUs v2

2019-02-13 Thread Christoph Hellwig
Hi Robin,

please take a look at this series, which implements a completely generic
set of dma_map_ops for IOMMU drivers.  This is done by taking the
existing arm64 code, moving it to drivers/iommu and then massaging it
so that it can also work for architectures with DMA remapping.  This
should help future ports to support IOMMUs more easily, and also allow
to remove various custom IOMMU dma_map_ops implementations, like Tom
was planning to for the AMD one.

A git tree is also available at:

git://git.infradead.org/users/hch/misc.git dma-iommu-ops.2

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-iommu-ops.2

Changes since v1:
 - only include other headers in dma-iommu.h if CONFIG_DMA_IOMMU is enabled
 - keep using a scatterlist in iommu_dma_alloc
 - split out mmap/sgtable fixes and move them early in the series
 - updated a few commit logs
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/29] dma-mapping: don't BUG when calling dma_map_resource on RAM

2019-02-13 Thread Christoph Hellwig
Use WARN_ON_ONCE to print a stack trace and return a proper error
code instead.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
Tested-by: Marek Szyprowski 
---
 include/linux/dma-mapping.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 9842085e6774..b904d55247ab 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -353,7 +353,8 @@ static inline dma_addr_t dma_map_resource(struct device 
*dev,
BUG_ON(!valid_dma_direction(dir));
 
/* Don't allow RAM to be mapped */
-   BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
+   if (WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr
+   return DMA_MAPPING_ERROR;
 
if (dma_is_direct(ops))
addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);
-- 
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/29] dma-mapping: remove the default map_resource implementation

2019-02-13 Thread Christoph Hellwig
Instead provide a proper implementation in the direct mapping code, and
also wire it up for arm and powerpc, leaving an error return for all the
IOMMU or virtual mapping instances for which we'd have to wire up an
actual implementation.

Signed-off-by: Christoph Hellwig 
Tested-by: Marek Szyprowski 
---
 arch/arm/mm/dma-mapping.c |  2 ++
 arch/powerpc/kernel/dma-swiotlb.c |  1 +
 arch/powerpc/kernel/dma.c |  1 +
 include/linux/dma-mapping.h   | 12 +++-
 kernel/dma/direct.c   | 14 ++
 5 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f1e2922e447c..3c8534904209 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -188,6 +188,7 @@ const struct dma_map_ops arm_dma_ops = {
.unmap_page = arm_dma_unmap_page,
.map_sg = arm_dma_map_sg,
.unmap_sg   = arm_dma_unmap_sg,
+   .map_resource   = dma_direct_map_resource,
.sync_single_for_cpu= arm_dma_sync_single_for_cpu,
.sync_single_for_device = arm_dma_sync_single_for_device,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
@@ -211,6 +212,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
+   .map_resource   = dma_direct_map_resource,
.dma_supported  = arm_dma_supported,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
b/arch/powerpc/kernel/dma-swiotlb.c
index 7d5fc9751622..fbb2506a414e 100644
--- a/arch/powerpc/kernel/dma-swiotlb.c
+++ b/arch/powerpc/kernel/dma-swiotlb.c
@@ -55,6 +55,7 @@ const struct dma_map_ops powerpc_swiotlb_dma_ops = {
.dma_supported = swiotlb_dma_supported,
.map_page = dma_direct_map_page,
.unmap_page = dma_direct_unmap_page,
+   .map_resource = dma_direct_map_resource,
.sync_single_for_cpu = dma_direct_sync_single_for_cpu,
.sync_single_for_device = dma_direct_sync_single_for_device,
.sync_sg_for_cpu = dma_direct_sync_sg_for_cpu,
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index b1903ebb2e9c..258b9e8ebb99 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -273,6 +273,7 @@ const struct dma_map_ops dma_nommu_ops = {
.dma_supported  = dma_nommu_dma_supported,
.map_page   = dma_nommu_map_page,
.unmap_page = dma_nommu_unmap_page,
+   .map_resource   = dma_direct_map_resource,
.get_required_mask  = dma_nommu_get_required_mask,
 #ifdef CONFIG_NOT_COHERENT_CACHE
.sync_single_for_cpu= dma_nommu_sync_single,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f6ded992c183..9842085e6774 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -208,6 +208,8 @@ dma_addr_t dma_direct_map_page(struct device *dev, struct 
page *page,
unsigned long attrs);
 int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
 defined(CONFIG_SWIOTLB)
@@ -346,19 +348,19 @@ static inline dma_addr_t dma_map_resource(struct device 
*dev,
  unsigned long attrs)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-   dma_addr_t addr;
+   dma_addr_t addr = DMA_MAPPING_ERROR;
 
BUG_ON(!valid_dma_direction(dir));
 
/* Don't allow RAM to be mapped */
BUG_ON(pfn_valid(PHYS_PFN(phys_addr)));
 
-   addr = phys_addr;
-   if (ops && ops->map_resource)
+   if (dma_is_direct(ops))
+   addr = dma_direct_map_resource(dev, phys_addr, size, dir, 
attrs);
+   else if (ops->map_resource)
addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
 
debug_dma_map_resource(dev, phys_addr, size, dir, addr);
-
return addr;
 }
 
@@ -369,7 +371,7 @@ static inline void dma_unmap_resource(struct device *dev, 
dma_addr_t addr,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (ops && ops->unmap_resource)
+   if (!dma_is_direct(ops) && ops->unmap_resource)
ops->unmap_resource(dev, addr, size, dir, attrs);
debug_dma_unmap_resource(dev, addr, size, dir);
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 355d16acee6d..25bd19974223 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -356

Re: [PATCH 06/12] dma-mapping: improve selection of dma_declare_coherent availability

2019-02-13 Thread Christoph Hellwig
On Tue, Feb 12, 2019 at 02:40:23PM -0600, Rob Herring wrote:
> > diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
> > index 3607fd2810e4..f8c66a9472a4 100644
> > --- a/drivers/of/Kconfig
> > +++ b/drivers/of/Kconfig
> > @@ -43,6 +43,7 @@ config OF_FLATTREE
> >
> >  config OF_EARLY_FLATTREE
> > bool
> > +   select DMA_DECLARE_COHERENT
> 
> Is selecting DMA_DECLARE_COHERENT okay on UML? We run the unittests with UML.

No, that will fail with undefined references to memunmap.

I guess this needs to be

select DMA_DECLARE_COHERENT if HAS_DMA

> Maybe we should just get rid of OF_RESERVED_MEM. If we support booting
> from DT, then it should always be enabled anyways.

Fine with me.  Do you want me to respin the series to just remove
it?
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] iommu/vt-d: Enable PASID only if device expects PASID in PRG Response.

2019-02-13 Thread sathyanarayanan kuppuswamy



On 2/13/19 12:26 AM, Tian, Kevin wrote:

From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
boun...@lists.linux-foundation.org] On Behalf Of
sathyanarayanan.kuppusw...@linux.intel.com
Sent: Tuesday, February 12, 2019 5:51 AM
To: bhelg...@google.com; j...@8bytes.org; dw...@infradead.org
Cc: Raj, Ashok ; linux-...@vger.kernel.org; linux-
ker...@vger.kernel.org; Busch, Keith ;
iommu@lists.linux-foundation.org; Pan, Jacob jun

Subject: [PATCH v2 2/2] iommu/vt-d: Enable PASID only if device expects
PASID in PRG Response.

From: Kuppuswamy Sathyanarayanan


In the Intel IOMMU, if the Page Request Queue (PRQ) is full, it will
automatically respond to the device with a success message as a keep
alive. When sending the success message, the IOMMU will include a PASID in
the Response Message when the Page Request has a PASID in the Request
Message, and it does not check against the PRG Response PASID requirement
of the device before sending the response. Also, if the device receives a
PRG response with a PASID when it is not expecting one, the device behavior
is undefined. So enable PASID support only if the device expects a PASID in
the PRG response message.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Keith Busch 
Suggested-by: Ashok Raj 
Signed-off-by: Kuppuswamy Sathyanarayanan

---
  drivers/iommu/intel-iommu.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1457f931218e..af2e4a011787 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1399,7 +1399,8 @@ static void iommu_enable_dev_iotlb(struct
device_domain_info *info)
   undefined. So always enable PASID support on devices which
   have it, even if we can't yet know if we're ever going to
   use it. */
-   if (info->pasid_supported && !pci_enable_pasid(pdev, info->pasid_supported & ~1))
+   if (info->pasid_supported && pci_prg_resp_pasid_required(pdev) &&
+   !pci_enable_pasid(pdev, info->pasid_supported & ~1))
                info->pasid_enabled = 1;

Above logic looks problematic. As Dave commented in another thread,
PRI and PASID are orthogonal capabilities. Especially with introduction
of VT-d scalable mode, PASID will be used alone even w/o PRI...

Why not doing the check when PRI is actually enabled? At that point
you can fail the request if above condition is false.

yes, makes sense. I will fix it in next version.



if (info->pri_supported && !pci_reset_pri(pdev)
&& !pci_enable_pri(pdev, 32))
--
2.20.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


--
Sathyanarayanan Kuppuswamy
Linux kernel developer

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 03/12] of: mark early_init_dt_alloc_reserved_memory_arch static

2019-02-13 Thread Christoph Hellwig
On Tue, Feb 12, 2019 at 02:24:19PM -0600, Rob Herring wrote:
> Looks like this one isn't a dependency, so I can take it if you want.

Sure, please go ahead.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/12] mfd/sm501: depend on HAS_DMA

2019-02-13 Thread Christoph Hellwig
On Wed, Feb 13, 2019 at 07:29:31AM +, Lee Jones wrote:
> I would normally have taken this, but I fear it will conflict with
> [PATCH 06/12].  For that reason, just take my:
> 
>   Acked-by: Lee Jones 

Yes, I'll need it for the later patches in the series.

Thanks for the review.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 01/12] iommu/vt-d: Implement dma_[un]map_resource()

2019-02-13 Thread Logan Gunthorpe
Oops, sorry. Please ignore the first two patches in this series. They
have already been merged independently.

Logan



On 2019-02-13 10:54 a.m., Logan Gunthorpe wrote:
> Currently the Intel IOMMU uses the default dma_[un]map_resource()
> implementations, which do nothing and simply return the physical address
> unmodified.
> 
> However, this doesn't create the IOVA entries necessary for addresses
> mapped this way to work when the IOMMU is enabled. Thus, when the
> IOMMU is enabled, drivers relying on dma_map_resource() will trigger
> DMAR errors. We see this when running ntb_transport with the IOMMU
> enabled, DMA, and switchtec hardware.
> 
> The implementation for intel_map_resource() is nearly identical to
> intel_map_page(), we just have to re-create __intel_map_single().
> dma_unmap_resource() uses intel_unmap_page() directly as the
> functions are identical.
> 
> Signed-off-by: Logan Gunthorpe 
> Cc: David Woodhouse 
> Cc: Joerg Roedel 
> ---
>  drivers/iommu/intel-iommu.c | 23 ---
>  1 file changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 78188bf7e90d..ad737e16575b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -3649,11 +3649,9 @@ static int iommu_no_mapping(struct device *dev)
>   return 0;
>  }
>  
> -static dma_addr_t __intel_map_page(struct device *dev, struct page *page,
> -unsigned long offset, size_t size, int dir,
> -u64 dma_mask)
> +static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
> +  size_t size, int dir, u64 dma_mask)
>  {
> - phys_addr_t paddr = page_to_phys(page) + offset;
>   struct dmar_domain *domain;
>   phys_addr_t start_paddr;
>   unsigned long iova_pfn;
> @@ -3715,7 +3713,15 @@ static dma_addr_t intel_map_page(struct device *dev, 
> struct page *page,
>enum dma_data_direction dir,
>unsigned long attrs)
>  {
> - return __intel_map_page(dev, page, offset, size, dir, *dev->dma_mask);
> + return __intel_map_single(dev, page_to_phys(page) + offset, size,
> +   dir, *dev->dma_mask);
> +}
> +
> +static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t 
> phys_addr,
> +  size_t size, enum dma_data_direction dir,
> +  unsigned long attrs)
> +{
> + return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
>  }
>  
>  static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
> @@ -3806,8 +3812,9 @@ static void *intel_alloc_coherent(struct device *dev, 
> size_t size,
>   return NULL;
>   memset(page_address(page), 0, size);
>  
> - *dma_handle = __intel_map_page(dev, page, 0, size, DMA_BIDIRECTIONAL,
> -dev->coherent_dma_mask);
> + *dma_handle = __intel_map_single(dev, page_to_phys(page), size,
> +  DMA_BIDIRECTIONAL,
> +  dev->coherent_dma_mask);
>   if (*dma_handle != DMA_MAPPING_ERROR)
>   return page_address(page);
>   if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
> @@ -3924,6 +3931,8 @@ static const struct dma_map_ops intel_dma_ops = {
>   .unmap_sg = intel_unmap_sg,
>   .map_page = intel_map_page,
>   .unmap_page = intel_unmap_page,
> + .map_resource = intel_map_resource,
> + .unmap_resource = intel_unmap_page,
>   .dma_supported = dma_direct_supported,
>  };
>  
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: add config symbols for arch_{setup,teardown}_dma_ops

2019-02-13 Thread Christoph Hellwig
Thanks Catalin and Paul.  I've merged this into the dma-mapping
for-next branch.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2] dma-mapping: Move debug configuration options to kernel/dma

2019-02-13 Thread Christoph Hellwig
Thanks, applied to the dma-mapping for-next branch.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 2/2] iommu/vt-d: Enable PASID only if device expects PASID in PRG Response.

2019-02-13 Thread Raj, Ashok
On Wed, Feb 13, 2019 at 12:26:33AM -0800, Tian, Kevin wrote:
> > 
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > index 1457f931218e..af2e4a011787 100644
> > --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1399,7 +1399,8 @@ static void iommu_enable_dev_iotlb(struct
> > device_domain_info *info)
> >undefined. So always enable PASID support on devices which
> >have it, even if we can't yet know if we're ever going to
> >use it. */
> > -   if (info->pasid_supported && !pci_enable_pasid(pdev, info-
> > >pasid_supported & ~1))
> > +   if (info->pasid_supported && pci_prg_resp_pasid_required(pdev)
> > &&
> > +   !pci_enable_pasid(pdev, info->pasid_supported & ~1))
> > info->pasid_enabled = 1;
> 
> Above logic looks problematic. As Dave commented in another thread,
> PRI and PASID are orthogonal capabilities. Especially with introduction
> of VT-d scalable mode, PASID will be used alone even w/o PRI...
> 
> Why not doing the check when PRI is actually enabled? At that point
> you can fail the request if above condition is false. 
> 

That makes sense. 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 09/12] NTB: Introduce MSI library

2019-02-13 Thread Logan Gunthorpe
The NTB MSI library allows passing MSI interrupts across a memory
window. This offers similar functionality to doorbells or messages,
except it will often have much better latency and the client can
potentially use significantly more remote interrupts than typical hardware
provides for doorbells, which can be important in high-multiport
setups.

The library utilizes one memory window per peer and uses the highest
index memory windows. Before any ntb_msi function may be used, the user
must call ntb_msi_init(). It may then setup and tear down the memory
windows when the link state changes using ntb_msi_setup_mws() and
ntb_msi_clear_mws().

The peer which receives the interrupt must call ntb_msim_request_irq()
to assign the interrupt handler (this function is functionally
similar to devm_request_irq()) and the returned descriptor must be
transferred to the peer which can use it to trigger the interrupt.
The triggering peer, once having received the descriptor, can
trigger the interrupt by calling ntb_msi_peer_trigger().
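
For illustration, a minimal sketch of how a client might wire this up on
both sides (hypothetical client code: the names my_isr, my_client_setup and
my_client_kick are made up, error handling is abbreviated, and the exact
ntb_msi_peer_trigger() arguments are assumed from the description above):

#include <linux/interrupt.h>
#include <linux/ntb.h>

/* Receiving peer: set up the MSI context, request a remote-triggerable
 * interrupt, then pass the resulting descriptor to the other peer. */
static irqreturn_t my_isr(int irq, void *ctx)
{
	/* handle the interrupt triggered by the remote peer */
	return IRQ_HANDLED;
}

static int my_client_setup(struct ntb_dev *ntb, struct ntb_msi_desc *desc)
{
	int rc, irq;

	rc = ntb_msi_init(ntb, NULL);	/* must precede any other ntb_msi call */
	if (rc)
		return rc;

	rc = ntb_msi_setup_mws(ntb);	/* once the link is up */
	if (rc)
		return rc;

	irq = ntbm_msi_request_irq(ntb, my_isr, "my_client", NULL, desc);
	if (irq < 0)
		return irq;

	/* ... transfer *desc to the remote peer, e.g. via spads ... */
	return 0;
}

/* Triggering peer: fire the interrupt once it holds the peer's descriptor. */
static void my_client_kick(struct ntb_dev *ntb, int pidx,
			   struct ntb_msi_desc *peer_desc)
{
	ntb_msi_peer_trigger(ntb, pidx, peer_desc);
}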

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/Kconfig  |  11 ++
 drivers/ntb/Makefile |   3 +-
 drivers/ntb/msi.c| 415 +++
 include/linux/ntb.h  |  73 
 4 files changed, 501 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ntb/msi.c

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index 95944e52fa36..5760764052be 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -12,6 +12,17 @@ menuconfig NTB
 
 if NTB
 
+config NTB_MSI
+   bool "MSI Interrupt Support"
+   depends on PCI_MSI
+   help
+Support using MSI interrupt forwarding instead of (or in addition to)
+hardware doorbells. MSI interrupts typically offer lower latency
+than doorbells and more MSI interrupts can be made available to
+clients. However this requires an extra memory window and support
+in the hardware driver for creating the MSI interrupts.
+
+If unsure, say N.
 source "drivers/ntb/hw/Kconfig"
 
 source "drivers/ntb/test/Kconfig"
diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 537226f8e78d..cc27ad2ef150 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
 
-ntb-y := core.o
+ntb-y  := core.o
+ntb-$(CONFIG_NTB_MSI)  += msi.o
diff --git a/drivers/ntb/msi.c b/drivers/ntb/msi.c
new file mode 100644
index ..5d4bd7a63924
--- /dev/null
+++ b/drivers/ntb/msi.c
@@ -0,0 +1,415 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("NTB MSI Interrupt Library");
+
+struct ntb_msi {
+   u64 base_addr;
+   u64 end_addr;
+
+   void (*desc_changed)(void *ctx);
+
+   u32 *peer_mws[];
+};
+
+/**
+ * ntb_msi_init() - Initialize the MSI context
+ * @ntb:   NTB device context
+ *
+ * This function must be called before any other ntb_msi function.
+ * It initializes the context for MSI operations and maps
+ * the peer memory windows.
+ *
+ * This function reserves the last N outbound memory windows (where N
+ * is the number of peers).
+ *
+ * Return: Zero on success, otherwise a negative error number.
+ */
+int ntb_msi_init(struct ntb_dev *ntb,
+void (*desc_changed)(void *ctx))
+{
+   phys_addr_t mw_phys_addr;
+   resource_size_t mw_size;
+   size_t struct_size;
+   int peer_widx;
+   int peers;
+   int ret;
+   int i;
+
+   peers = ntb_peer_port_count(ntb);
+   if (peers <= 0)
+   return -EINVAL;
+
+   struct_size = sizeof(*ntb->msi) + sizeof(*ntb->msi->peer_mws) * peers;
+
+   ntb->msi = devm_kzalloc(&ntb->dev, struct_size, GFP_KERNEL);
+   if (!ntb->msi)
+   return -ENOMEM;
+
+   ntb->msi->desc_changed = desc_changed;
+
+   for (i = 0; i < peers; i++) {
+   peer_widx = ntb_peer_mw_count(ntb) - 1 - i;
+
+   ret = ntb_peer_mw_get_addr(ntb, peer_widx, &mw_phys_addr,
+  &mw_size);
+   if (ret)
+   goto unroll;
+
+   ntb->msi->peer_mws[i] = devm_ioremap(&ntb->dev, mw_phys_addr,
+mw_size);
+   if (!ntb->msi->peer_mws[i]) {
+   ret = -EFAULT;
+   goto unroll;
+   }
+   }
+
+   return 0;
+
+unroll:
+   for (i = 0; i < peers; i++)
+   if (ntb->msi->peer_mws[i])
+   devm_iounmap(&ntb->dev, ntb->msi->peer_mws[i]);
+
+   devm_kfree(&ntb->dev, ntb->msi);
+   ntb->msi = NULL;
+   return ret;
+}
+EXPORT_SYMBOL(ntb_msi_init);
+
+/**
+ * ntb_msi_setup_mws() - Initialize

[PATCH v2 06/12] PCI/switchtec: Add module parameter to request more interrupts

2019-02-13 Thread Logan Gunthorpe
Seeing that we want to use more interrupts in the NTB MSI code,
we need to be able to allocate more (sometimes virtual) interrupts
in the switchtec driver. Therefore add a module parameter to
request the allocation of additional interrupts.

This puts virtually no limit on the number of MSI interrupts available
to NTB clients.

Signed-off-by: Logan Gunthorpe 
Cc: Bjorn Helgaas 
---
 drivers/pci/switch/switchtec.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index e22766c79fe9..8b1db78197d9 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -30,6 +30,10 @@ module_param(use_dma_mrpc, bool, 0644);
 MODULE_PARM_DESC(use_dma_mrpc,
 "Enable the use of the DMA MRPC feature");
 
+static int nirqs = 32;
+module_param(nirqs, int, 0644);
+MODULE_PARM_DESC(nirqs, "number of interrupts to allocate (more may be useful 
for NTB applications)");
+
 static dev_t switchtec_devt;
 static DEFINE_IDA(switchtec_minor_ida);
 
@@ -1247,8 +1251,12 @@ static int switchtec_init_isr(struct switchtec_dev 
*stdev)
int dma_mrpc_irq;
int rc;
 
-   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, 4,
- PCI_IRQ_MSIX | PCI_IRQ_MSI);
+   if (nirqs < 4)
+   nirqs = 4;
+
+   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, nirqs,
+ PCI_IRQ_MSIX | PCI_IRQ_MSI |
+ PCI_IRQ_VIRTUAL);
if (nvecs < 0)
return nvecs;
 
-- 
2.19.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 10/12] NTB: Introduce NTB MSI Test Client

2019-02-13 Thread Logan Gunthorpe
Introduce a tool to test NTB MSI interrupts similar to the other
NTB test tools. This tool creates a debugfs directory for each
NTB device with the following files:

port
irqX_occurrences
peerX/port
peerX/count
peerX/trigger

The 'port' file tells the user the local port number and the
'occurrences' files tell the number of local interrupts that
have been received for each interrupt.

For each peer, the 'port' file and the 'count' file tell you the
peer's port number and number of interrupts respectively. Writing
the interrupt number to the 'trigger' file triggers the interrupt
handler for the peer which should increment their corresponding
'occurrences' file. The 'ready' file indicates if a peer is ready;
writing to this file blocks until it is ready.

The module parameter num_irqs can be used to set the number of
local interrupts. By default this is 4. This is only limited by
the number of unused MSI interrupts registered by the hardware
(this will require support from the hardware driver) and there must
be at least 2*num_irqs + 1 spad registers available.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/test/Kconfig|   9 +
 drivers/ntb/test/Makefile   |   1 +
 drivers/ntb/test/ntb_msi_test.c | 433 
 3 files changed, 443 insertions(+)
 create mode 100644 drivers/ntb/test/ntb_msi_test.c

diff --git a/drivers/ntb/test/Kconfig b/drivers/ntb/test/Kconfig
index a5d0eda44438..a3f3e2638935 100644
--- a/drivers/ntb/test/Kconfig
+++ b/drivers/ntb/test/Kconfig
@@ -25,3 +25,12 @@ config NTB_PERF
 to and from the window without additional software interaction.
 
 If unsure, say N.
+
+config NTB_MSI_TEST
+   tristate "NTB MSI Test Client"
+   depends on NTB_MSI
+   help
+ This tool demonstrates the use of the NTB MSI library to
+ send MSI interrupts between peers.
+
+ If unsure, say N.
diff --git a/drivers/ntb/test/Makefile b/drivers/ntb/test/Makefile
index 9e77e0b761c2..d2895ca995e4 100644
--- a/drivers/ntb/test/Makefile
+++ b/drivers/ntb/test/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_NTB_PINGPONG) += ntb_pingpong.o
 obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
 obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+obj-$(CONFIG_NTB_MSI_TEST) += ntb_msi_test.o
diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c
new file mode 100644
index ..99d826ed9c34
--- /dev/null
+++ b/drivers/ntb/test/ntb_msi_test.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("Test for sending MSI interrupts over an NTB memory 
window");
+
+static int num_irqs = 4;
+module_param(num_irqs, int, 0644);
+MODULE_PARM_DESC(num_irqs, "number of irqs to use");
+
+struct ntb_msit_ctx {
+   struct ntb_dev *ntb;
+   struct dentry *dbgfs_dir;
+   struct work_struct setup_work;
+
+   struct ntb_msit_isr_ctx {
+   int irq_idx;
+   int irq_num;
+   int occurrences;
+   struct ntb_msit_ctx *nm;
+   struct ntb_msi_desc desc;
+   } *isr_ctx;
+
+   struct ntb_msit_peer {
+   struct ntb_msit_ctx *nm;
+   int pidx;
+   int num_irqs;
+   struct completion init_comp;
+   struct ntb_msi_desc *msi_desc;
+   } peers[];
+};
+
+static struct dentry *ntb_msit_dbgfs_topdir;
+
+static irqreturn_t ntb_msit_isr(int irq, void *dev)
+{
+   struct ntb_msit_isr_ctx *isr_ctx = dev;
+   struct ntb_msit_ctx *nm = isr_ctx->nm;
+
+   dev_dbg(&nm->ntb->dev, "Interrupt Occurred: %d",
+   isr_ctx->irq_idx);
+
+   isr_ctx->occurrences++;
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_msit_setup_work(struct work_struct *work)
+{
+   struct ntb_msit_ctx *nm = container_of(work, struct ntb_msit_ctx,
+  setup_work);
+   int irq_count = 0;
+   int irq;
+   int ret;
+   uintptr_t i;
+
+   ret = ntb_msi_setup_mws(nm->ntb);
+   if (ret) {
+   dev_err(&nm->ntb->dev, "Unable to setup MSI windows: %d\n",
+   ret);
+   return;
+   }
+
+   for (i = 0; i < num_irqs; i++) {
+   nm->isr_ctx[i].irq_idx = i;
+   nm->isr_ctx[i].nm = nm;
+
+   if (!nm->isr_ctx[i].irq_num) {
+   irq = ntbm_msi_request_irq(nm->ntb, ntb_msit_isr,
+  KBUILD_MODNAME,
+  &nm->isr_ctx[i],
+  &nm->isr_ctx[i].desc);
+   if (irq < 0)
+   break;
+
+   nm->isr_ctx[i].irq_num = irq;
+   }
+
+ 

[PATCH v2 05/12] PCI/MSI: Support allocating virtual MSI interrupts

2019-02-13 Thread Logan Gunthorpe
For NTB devices, we want to be able to trigger MSI interrupts
through a memory window. In these cases we may want to use
more interrupts than the NTB PCI device has available in its MSI-X
table.

We allow for this by creating a new 'virtual' interrupt. These
interrupts are allocated as usual but are not programmed into the
MSI-X table (as there may not be space for them).

The MSI address and data will then be handled through an NTB MSI library
introduced later in this series.
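
As a rough usage sketch (hypothetical driver code; the switchtec patch later
in this series does essentially the same thing), a driver opts in by passing
the new PCI_IRQ_VIRTUAL flag and may then be granted more vectors than the
MSI-X table can hold:

#include <linux/pci.h>

/* Sketch only: request up to 'want' vectors, allowing 'virtual' ones
 * beyond the device's MSI-X table size. */
static int example_alloc_vectors(struct pci_dev *pdev, int want)
{
	int nvecs = pci_alloc_irq_vectors(pdev, 1, want,
					  PCI_IRQ_MSIX | PCI_IRQ_MSI |
					  PCI_IRQ_VIRTUAL);
	if (nvecs < 0)
		return nvecs;

	/*
	 * Entries numbered at or above pci_msix_vec_count(pdev) are virtual:
	 * they have no MSI-X table slot, so their address/data must reach the
	 * interrupt source by other means (here, the NTB MSI library).
	 */
	return nvecs;
}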

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/msi.c   | 55 +
 include/linux/msi.h |  8 +++
 include/linux/pci.h |  9 
 3 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 4c0b47867258..e7810ec45c9d 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -192,6 +192,9 @@ static void msi_mask_irq(struct msi_desc *desc, u32 mask, 
u32 flag)
 
 static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 {
+   if (desc->msi_attrib.is_virtual)
+   return NULL;
+
return desc->mask_base +
desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
 }
@@ -206,14 +209,19 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc 
*desc)
 u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
 {
u32 mask_bits = desc->masked;
+   void __iomem *desc_addr;
 
if (pci_msi_ignore_mask)
return 0;
+   desc_addr = pci_msix_desc_addr(desc);
+   if (!desc_addr)
+   return 0;
 
mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
if (flag)
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-   writel(mask_bits, pci_msix_desc_addr(desc) + 
PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+   writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 
return mask_bits;
 }
@@ -273,6 +281,11 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base) {
+   WARN_ON(1);
+   return;
+   }
+
msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
@@ -303,6 +316,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
} else if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base)
+   goto skip;
+
writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
@@ -327,7 +343,13 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct 
msi_msg *msg)
  msg->data);
}
}
+
+skip:
entry->msg = *msg;
+
+   if (entry->write_msi_msg)
+   entry->write_msi_msg(entry, entry->write_msi_msg_data);
+
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -550,6 +572,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, const struct 
irq_affinity *affd)
 
entry->msi_attrib.is_msix   = 0;
entry->msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
+   entry->msi_attrib.is_virtual= 0;
entry->msi_attrib.entry_nr  = 0;
entry->msi_attrib.maskbit   = !!(control & PCI_MSI_FLAGS_MASKBIT);
entry->msi_attrib.default_irq   = dev->irq; /* Save IOAPIC IRQ */
@@ -674,6 +697,7 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
struct irq_affinity_desc *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;
+   int vec_count = pci_msix_vec_count(dev);
 
if (affd)
masks = irq_create_affinity_masks(nvec, affd);
@@ -696,6 +720,10 @@ static int msix_setup_entries(struct pci_dev *dev, void 
__iomem *base,
entry->msi_attrib.entry_nr = entries[i].entry;
else
entry->msi_attrib.entry_nr = i;
+
+   entry->msi_attrib.is_virtual =
+   entry->msi_attrib.entry_nr >= vec_count;
+
entry->msi_attrib.default_irq   = dev->irq;
entry->mask_base= base;
 
@@ -714,12 +742,19 @@ static void msix_program_entries(struct pci_dev *dev,
 {
struct msi_desc *entry;
int i = 0;
+   void __iomem *desc_addr;
 
for_each_pci_msi_entry(entry, dev) {
if (entries)
entries[i++].vector = entry->irq;
-   entry->masked = readl(pci_msix_desc_addr(entry) +
-   PCI_MSIX_ENTRY_VECTOR_CTRL

[PATCH v2 01/12] iommu/vt-d: Implement dma_[un]map_resource()

2019-02-13 Thread Logan Gunthorpe
Currently the Intel IOMMU uses the default dma_[un]map_resource()
implementations, which do nothing and simply return the physical address
unmodified.

However, this doesn't create the IOVA entries necessary for addresses
mapped this way to work when the IOMMU is enabled. Thus, when the
IOMMU is enabled, drivers relying on dma_map_resource() will trigger
DMAR errors. We see this when running ntb_transport with the IOMMU
enabled, DMA, and switchtec hardware.

The implementation for intel_map_resource() is nearly identical to
intel_map_page(); we just have to re-create __intel_map_single().
dma_unmap_resource() uses intel_unmap_page() directly as the
functions are identical.
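
For reference, a hedged sketch of the kind of caller this fixes (a
hypothetical helper mapping a peer device's MMIO region, such as an NTB
memory window, for DMA; the function name is made up):

#include <linux/dma-mapping.h>

/* With this patch the Intel IOMMU backs the mapping with a real IOVA
 * instead of returning the physical address unmodified. */
static dma_addr_t example_map_peer_mmio(struct device *dma_dev,
					phys_addr_t mmio_addr, size_t size)
{
	dma_addr_t dma_addr;

	dma_addr = dma_map_resource(dma_dev, mmio_addr, size,
				    DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(dma_dev, dma_addr))
		return DMA_MAPPING_ERROR;

	return dma_addr;
}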

Signed-off-by: Logan Gunthorpe 
Cc: David Woodhouse 
Cc: Joerg Roedel 
---
 drivers/iommu/intel-iommu.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 78188bf7e90d..ad737e16575b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3649,11 +3649,9 @@ static int iommu_no_mapping(struct device *dev)
return 0;
 }
 
-static dma_addr_t __intel_map_page(struct device *dev, struct page *page,
-  unsigned long offset, size_t size, int dir,
-  u64 dma_mask)
+static dma_addr_t __intel_map_single(struct device *dev, phys_addr_t paddr,
+size_t size, int dir, u64 dma_mask)
 {
-   phys_addr_t paddr = page_to_phys(page) + offset;
struct dmar_domain *domain;
phys_addr_t start_paddr;
unsigned long iova_pfn;
@@ -3715,7 +3713,15 @@ static dma_addr_t intel_map_page(struct device *dev, 
struct page *page,
 enum dma_data_direction dir,
 unsigned long attrs)
 {
-   return __intel_map_page(dev, page, offset, size, dir, *dev->dma_mask);
+   return __intel_map_single(dev, page_to_phys(page) + offset, size,
+ dir, *dev->dma_mask);
+}
+
+static dma_addr_t intel_map_resource(struct device *dev, phys_addr_t phys_addr,
+size_t size, enum dma_data_direction dir,
+unsigned long attrs)
+{
+   return __intel_map_single(dev, phys_addr, size, dir, *dev->dma_mask);
 }
 
 static void intel_unmap(struct device *dev, dma_addr_t dev_addr, size_t size)
@@ -3806,8 +3812,9 @@ static void *intel_alloc_coherent(struct device *dev, 
size_t size,
return NULL;
memset(page_address(page), 0, size);
 
-   *dma_handle = __intel_map_page(dev, page, 0, size, DMA_BIDIRECTIONAL,
-  dev->coherent_dma_mask);
+   *dma_handle = __intel_map_single(dev, page_to_phys(page), size,
+DMA_BIDIRECTIONAL,
+dev->coherent_dma_mask);
if (*dma_handle != DMA_MAPPING_ERROR)
return page_address(page);
if (!dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT))
@@ -3924,6 +3931,8 @@ static const struct dma_map_ops intel_dma_ops = {
.unmap_sg = intel_unmap_sg,
.map_page = intel_map_page,
.unmap_page = intel_unmap_page,
+   .map_resource = intel_map_resource,
+   .unmap_resource = intel_unmap_page,
.dma_supported = dma_direct_supported,
 };
 
-- 
2.19.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 03/12] iommu/vt-d: Add helper to set an IRTE to verify only the bus number

2019-02-13 Thread Logan Gunthorpe
The current code uses set_irte_sid() with SVT_VERIFY_BUS and PCI_DEVID
to set the SID value. However, this is very confusing because, with
SVT_VERIFY_BUS, the SID value is not a PCI devfn address, but the start
and end bus numbers to match against.

According to the Intel Virtualization Technology for Directed I/O
Architecture Specification, Rev. 3.0, page 9-36:

  The most significant 8-bits of the SID field contains the Startbus#,
  and the least significant 8-bits of the SID field contains the Endbus#.
  Interrupt requests that reference this IRTE must have a requester-id
  whose bus# (most significant 8-bits of requester-id) has a value equal
  to or within the Startbus# to Endbus# range.

So to make this more clear, introduce a new set_irte_verify_bus() that
explicitly takes a start bus and end bus so that we can stop abusing
the PCI_DEVID macro.

This helper function will be called a second time in a subsequent patch.
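
As a concrete illustration (made-up bus numbers, not from this series), an
IRTE that should accept requester-ids on buses 0x20 through 0x3f would be
programmed like this:

	struct irte irte = { };

	/* Accept requests whose bus number lies in [0x20, 0x3f]. */
	set_irte_verify_bus(&irte, 0x20, 0x3f);
	/* This packs (0x20 << 8) | 0x3f == 0x203f into the SID field, the
	 * same value set_irte_sid(&irte, SVT_VERIFY_BUS, SQ_ALL_16, 0x203f)
	 * would set. */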

Signed-off-by: Logan Gunthorpe 
Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Jacob Pan 
---
 drivers/iommu/intel_irq_remapping.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 24d45b07f425..5a55bef8e379 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -294,6 +294,18 @@ static void set_irte_sid(struct irte *irte, unsigned int 
svt,
irte->sid = sid;
 }
 
+/*
+ * Set an IRTE to match only the bus number. Interrupt requests that reference
+ * this IRTE must have a requester-id whose bus number is between or equal
+ * to the start_bus and end_bus arguments.
+ */
+static void set_irte_verify_bus(struct irte *irte, unsigned int start_bus,
+   unsigned int end_bus)
+{
+   set_irte_sid(irte, SVT_VERIFY_BUS, SQ_ALL_16,
+(start_bus << 8) | end_bus);
+}
+
 static int set_ioapic_sid(struct irte *irte, int apic)
 {
int i;
@@ -391,9 +403,8 @@ static int set_msi_sid(struct irte *irte, struct pci_dev 
*dev)
 * original device.
 */
if (PCI_BUS_NUM(data.alias) != data.pdev->bus->number)
-   set_irte_sid(irte, SVT_VERIFY_BUS, SQ_ALL_16,
-PCI_DEVID(PCI_BUS_NUM(data.alias),
-  dev->bus->number));
+   set_irte_verify_bus(irte, PCI_BUS_NUM(data.alias),
+   dev->bus->number);
else if (data.pdev->bus->number != dev->bus->number)
set_irte_sid(irte, SVT_VERIFY_SID_SQ, SQ_ALL_16, data.alias);
else
-- 
2.19.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 08/12] NTB: Rename ntb.c to support multiple source files in the module

2019-02-13 Thread Logan Gunthorpe
The kbuild system does not support having multiple source files in
a module if one of those source files has the same name as the module.

Therefore, we must rename ntb.c to core.c, while the module remains
ntb.ko.

This is similar to the way the nvme modules are structured.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/Makefile  | 2 ++
 drivers/ntb/{ntb.c => core.c} | 0
 2 files changed, 2 insertions(+)
 rename drivers/ntb/{ntb.c => core.c} (100%)

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 1921dec1949d..537226f8e78d 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+
+ntb-y := core.o
diff --git a/drivers/ntb/ntb.c b/drivers/ntb/core.c
similarity index 100%
rename from drivers/ntb/ntb.c
rename to drivers/ntb/core.c
-- 
2.19.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v2 12/12] NTB: Add MSI interrupt support to ntb_transport

2019-02-13 Thread Logan Gunthorpe
Introduce the module parameter 'use_msi' which, when set, uses
MSI interrupts instead of doorbells for each queue pair (QP).
The parameter is only available if NTB MSI support is configured into
the kernel. We also require there to be more than one memory window
(MW) so that an extra one is available to forward the APIC region.

To use MSIs, we request one interrupt per QP and forward the MSI address
and data to the peer using scratch pad registers (SPADS) above the MW
spads. (If there are not enough SPADS the MSI interrupt will not be used.)

Once registered, we simply use ntb_msi_peer_trigger and the receiving
ISR simply queues up the rxc_db_work for the queue.

This addition can significantly improve performance of ntb_transport.
In a simple, untuned, apples-to-apples comparison using ntb_netdev
and iperf with switchtec hardware, I see 3.88Gb/s without MSI
interrupts and 14.1Gb/s with them, which is a more than 3x improvement.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/ntb_transport.c | 169 +++-
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 526b65afc16a..90e3ea67d48a 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -93,6 +93,12 @@ static bool use_dma;
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
 
+static bool use_msi;
+#ifdef CONFIG_NTB_MSI
+module_param(use_msi, bool, 0644);
+MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
+#endif
+
 static struct dentry *nt_debugfs_dir;
 
 /* Only two-ports NTB devices are supported */
@@ -188,6 +194,11 @@ struct ntb_transport_qp {
u64 tx_err_no_buf;
u64 tx_memcpy;
u64 tx_async;
+
+   bool use_msi;
+   int msi_irq;
+   struct ntb_msi_desc msi_desc;
+   struct ntb_msi_desc peer_msi_desc;
 };
 
 struct ntb_transport_mw {
@@ -221,6 +232,10 @@ struct ntb_transport_ctx {
u64 qp_bitmap;
u64 qp_bitmap_free;
 
+   bool use_msi;
+   unsigned int msi_spad_offset;
+   u64 msi_db_mask;
+
bool link_is_up;
struct delayed_work link_work;
struct work_struct link_cleanup;
@@ -667,6 +682,114 @@ static int ntb_transport_setup_qp_mw(struct 
ntb_transport_ctx *nt,
return 0;
 }
 
+static irqreturn_t ntb_transport_isr(int irq, void *dev)
+{
+   struct ntb_transport_qp *qp = dev;
+
+   tasklet_schedule(&qp->rxc_db_work);
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_transport_setup_qp_peer_msi(struct ntb_transport_ctx *nt,
+   unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev))
+   return;
+
+   qp->peer_msi_desc.addr_offset =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad);
+   qp->peer_msi_desc.data =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad + 1);
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d Peer MSI addr=%x data=%x\n",
+   qp_num, qp->peer_msi_desc.addr_offset, qp->peer_msi_desc.data);
+
+   if (qp->peer_msi_desc.addr_offset) {
+   qp->use_msi = true;
+   dev_info(&qp->ndev->pdev->dev,
+"Using MSI interrupts for QP%d\n", qp_num);
+   }
+}
+
+static void ntb_transport_setup_qp_msi(struct ntb_transport_ctx *nt,
+  unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+   int rc;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev)) {
+   dev_warn_once(&qp->ndev->pdev->dev,
+ "Not enough SPADS to use MSI interrupts\n");
+   return;
+   }
+
+   ntb_spad_write(qp->ndev, spad, 0);
+   ntb_spad_write(qp->ndev, spad + 1, 0);
+
+   if (!qp->msi_irq) {
+   qp->msi_irq = ntbm_msi_request_irq(qp->ndev, ntb_transport_isr,
+  KBUILD_MODNAME, qp,
+  &qp->msi_desc);
+   if (qp->msi_irq < 0) {
+   dev_warn(&qp->ndev->pdev->dev,
+"Unable to allocate MSI interrupt for qp%d\n",
+qp_num);
+   return;
+   }
+   }
+
+   rc = ntb_spad_write(qp->ndev, spad, qp->msi_desc.addr_offset);
+   if (rc)
+   goto err_free_interrupt;
+
+   rc = ntb_spad_write(qp->ndev, spad + 1, qp->msi_desc.data);
+   if (rc)
+   goto err_free_interrupt;
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d MSI %d addr=%x data=%

[PATCH v2 04/12] iommu/vt-d: Allow interrupts from the entire bus for aliased devices

2019-02-13 Thread Logan Gunthorpe
When a device has multiple aliases that are all from the same bus,
we program the IRTE to accept requests from any matching device on the
bus.

This allows NTB devices, which can generate requests from multiple
bus-devfns, to pass MSI interrupts across the bridge.

Signed-off-by: Logan Gunthorpe 
Cc: David Woodhouse 
Cc: Joerg Roedel 
Cc: Jacob Pan 
---
 drivers/iommu/intel_irq_remapping.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index 5a55bef8e379..2d74641b7f7b 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -368,6 +368,8 @@ static int set_hpet_sid(struct irte *irte, u8 id)
 struct set_msi_sid_data {
struct pci_dev *pdev;
u16 alias;
+   int count;
+   int busmatch_count;
 };
 
 static int set_msi_sid_cb(struct pci_dev *pdev, u16 alias, void *opaque)
@@ -376,6 +378,10 @@ static int set_msi_sid_cb(struct pci_dev *pdev, u16 alias, 
void *opaque)
 
data->pdev = pdev;
data->alias = alias;
+   data->count++;
+
+   if (PCI_BUS_NUM(alias) == pdev->bus->number)
+   data->busmatch_count++;
 
return 0;
 }
@@ -387,6 +393,8 @@ static int set_msi_sid(struct irte *irte, struct pci_dev 
*dev)
if (!irte || !dev)
return -1;
 
+   data.count = 0;
+   data.busmatch_count = 0;
pci_for_each_dma_alias(dev, set_msi_sid_cb, &data);
 
/*
@@ -395,6 +403,11 @@ static int set_msi_sid(struct irte *irte, struct pci_dev 
*dev)
 * device is the case of a PCIe-to-PCI bridge, where the alias is for
 * the subordinate bus.  In this case we can only verify the bus.
 *
+* If there are multiple aliases, all with the same bus number,
+* then all we can do is verify the bus. This is typical in NTB
+* hardware which use proxy IDs where the device will generate traffic
+* from multiple devfn numbers on the same bus.
+*
 * If the alias device is on a different bus than our source device
 * then we have a topology based alias, use it.
 *
@@ -405,6 +418,8 @@ static int set_msi_sid(struct irte *irte, struct pci_dev 
*dev)
if (PCI_BUS_NUM(data.alias) != data.pdev->bus->number)
set_irte_verify_bus(irte, PCI_BUS_NUM(data.alias),
dev->bus->number);
+   else if (data.count >= 2 && data.busmatch_count == data.count)
+   set_irte_verify_bus(irte, dev->bus->number, dev->bus->number);
else if (data.pdev->bus->number != dev->bus->number)
set_irte_sid(irte, SVT_VERIFY_SID_SQ, SQ_ALL_16, data.alias);
else
-- 
2.19.0



[PATCH v2 11/12] NTB: Add ntb_msi_test support to ntb_test

2019-02-13 Thread Logan Gunthorpe
When the ntb_msi_test module is available, the test code will trigger
each of the interrupts and ensure the corresponding occurrences file
gets incremented.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 tools/testing/selftests/ntb/ntb_test.sh | 54 -
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/ntb/ntb_test.sh 
b/tools/testing/selftests/ntb/ntb_test.sh
index 17ca36403d04..1a10b8f67727 100755
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -87,10 +87,10 @@ set -e
 
 function _modprobe()
 {
-   modprobe "$@"
+   modprobe "$@" || return 1
 
if [[ "$REMOTE_HOST" != "" ]]; then
-   ssh "$REMOTE_HOST" modprobe "$@"
+   ssh "$REMOTE_HOST" modprobe "$@" || return 1
fi
 }
 
@@ -451,6 +451,30 @@ function pingpong_test()
echo "  Passed"
 }
 
+function msi_test()
+{
+   LOC=$1
+   REM=$2
+
+   write_file 1 $LOC/ready
+
+   echo "Running MSI interrupt tests on: $(subdirname $LOC) / $(subdirname 
$REM)"
+
+   CNT=$(read_file "$LOC/count")
+   for ((i = 0; i < $CNT; i++)); do
+   START=$(read_file $REM/../irq${i}_occurrences)
+   write_file $i $LOC/trigger
+   END=$(read_file $REM/../irq${i}_occurrences)
+
+   if [[ $(($END - $START)) != 1 ]]; then
+   echo "MSI did not trigger the interrupt on the remote 
side!" >&2
+   exit 1
+   fi
+   done
+
+   echo "  Passed"
+}
+
 function perf_test()
 {
USE_DMA=$1
@@ -529,6 +553,29 @@ function ntb_pingpong_tests()
_modprobe -r ntb_pingpong
 }
 
+function ntb_msi_tests()
+{
+   LOCAL_MSI="$DEBUGFS/ntb_msi_test/$LOCAL_DEV"
+   REMOTE_MSI="$REMOTE_HOST:$DEBUGFS/ntb_msi_test/$REMOTE_DEV"
+
+   echo "Starting ntb_msi_test tests..."
+
+   if ! _modprobe ntb_msi_test 2> /dev/null; then
+   echo "  Not doing MSI tests seeing the module is not available."
+   return
+   fi
+
+   port_test $LOCAL_MSI $REMOTE_MSI
+
+   LOCAL_PEER="$LOCAL_MSI/peer$LOCAL_PIDX"
+   REMOTE_PEER="$REMOTE_MSI/peer$REMOTE_PIDX"
+
+   msi_test $LOCAL_PEER $REMOTE_PEER
+   msi_test $REMOTE_PEER $LOCAL_PEER
+
+   _modprobe -r ntb_msi_test
+}
+
 function ntb_perf_tests()
 {
LOCAL_PERF="$DEBUGFS/ntb_perf/$LOCAL_DEV"
@@ -550,6 +597,7 @@ function cleanup()
_modprobe -r ntb_perf 2> /dev/null
_modprobe -r ntb_pingpong 2> /dev/null
_modprobe -r ntb_transport 2> /dev/null
+   _modprobe -r ntb_msi_test 2> /dev/null
set -e
 }
 
@@ -586,5 +634,7 @@ ntb_tool_tests
 echo
 ntb_pingpong_tests
 echo
+ntb_msi_tests
+echo
 ntb_perf_tests
 echo
-- 
2.19.0



[PATCH v2 00/12] Support using MSI interrupts in ntb_transport

2019-02-13 Thread Logan Gunthorpe
Note: this version will likely trivially conflict with some cleanup
patches I sent to Bjorn. So this is meant for review purposes only.
If there are no objections, I'd like to look at getting it merged in
the next cycle through the NTB tree.

--

Changes in v2:

* Cleaned up the changes in intel_irq_remapping.c to make them
  less confusing and add a comment. (Per discussion with Jacob and
  Joerg)

* Fixed a nit from Bjorn and collected his Ack

* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI
  as the Kbuild robot hit a random config that didn't build
  without it.

* Worked in a callback for when the MSI descriptor changes so that
  the clients can resend the new address and data values to the peer.
  On my test system this was never necessary, but there may be
  other platforms where this can occur. I tested this by hacking
  in a path to rewrite the MSI descriptor when I change the cpu
  affinity of an IRQ. There's a bit of uncertainty over the latency
  of the change, but without hardware on which this can actually
  occur, we can't test this. This was the result of a discussion with Dave.

--

This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable because doorbells on
current hardware are quite slow, so switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.

To do this, a couple changes are required outside of the NTB tree:

1) The IOMMU must know to accept MSI requests from aliased bus numbers,
since NTB hardware typically sends proxied requests through
additional requester IDs. The first patch in this series adds support
for the Intel IOMMU. A quirk to add these aliases for switchtec hardware
was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA alias quirk
for Microsemi Switchtec NTB") for a description of NTB proxy IDs and why
this is necessary.

2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, since
these interrupts will not be triggered by the hardware but through an
NTB memory window, the hardware does not actually need support or need
to know about them. Therefore we add the concept of Virtual MSI
interrupts which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 2 and then made use of in Patch 3.

The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.
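
As a rough illustration of the client-side flow (based on the usage visible
in the ntb_transport patch of this series; my_isr and my_ctx are placeholder
names, and the ntb_msi_peer_trigger() signature is assumed, not authoritative):

	struct ntb_msi_desc desc, peer_desc;
	int irq;

	/* allocate a (possibly virtual) MSI vector and fetch its address/data */
	irq = ntbm_msi_request_irq(ntb, my_isr, KBUILD_MODNAME, my_ctx, &desc);
	if (irq < 0)
		return irq;

	/* publish desc.addr_offset and desc.data to the peer (e.g. via SPADs);
	 * the peer, after reading them into its peer_desc, kicks us with:
	 */
	ntb_msi_peer_trigger(ntb, PIDX, &peer_desc);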

The series is based off of v5.0-rc4 and I've tested it on top of
the patches I've already sent to the NTB tree (though they are
independent changes). A git repo is available here:

https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v2

Thanks,

Logan

--

Logan Gunthorpe (12):
  iommu/vt-d: Implement dma_[un]map_resource()
  NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA
  iommu/vt-d: Add helper to set an IRTE to verify only the bus number
  iommu/vt-d: Allow interrupts from the entire bus for aliased devices
  PCI/MSI: Support allocating virtual MSI interrupts
  PCI/switchtec: Add module parameter to request more interrupts
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce MSI library
  NTB: Introduce NTB MSI Test Client
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Add MSI interrupt support to ntb_transport

 drivers/iommu/intel-iommu.c |  23 +-
 drivers/iommu/intel_irq_remapping.c |  32 +-
 drivers/ntb/Kconfig |  11 +
 drivers/ntb/Makefile|   3 +
 drivers/ntb/{ntb.c => core.c}   |   0
 drivers/ntb/msi.c   | 415 +++
 drivers/ntb/ntb_transport.c | 197 ++-
 drivers/ntb/test/Kconfig|   9 +
 drivers/ntb/test/Makefile   |   1 +
 drivers/ntb/test/ntb_msi_test.c | 433 
 drivers/pci/msi.c   |  55 ++-
 drivers/pci/switch/switchtec.c  |  12 +-
 include/linux/msi.h |   8 +
 include/linux/ntb.h | 143 
 include/linux/pci.h |   9 +
 tools/testing/selftests/ntb/ntb_test.sh |  54 ++-
 16 files changed, 1379 insertions(+), 26 deletions(-)
 rename drivers/ntb/{ntb.c => core.c} (100%)
 create mode 100644 drivers/ntb/msi.c
 create mode 100644 drivers/ntb/test/ntb_msi_test.c

--
2.19.0


[PATCH v2 02/12] NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA

2019-02-13 Thread Logan Gunthorpe
Presently, when ntb_transport is used with DMA and the IOMMU turned on,
it fails with errors from the IOMMU such as:

  DMAR: DRHD: handling fault status reg 202
  DMAR: [DMA Write] Request device [00:04.0] fault addr
381fc034 [fault reason 05] PTE Write access is not set

This is because ntb_transport does not map the BAR space with the IOMMU.

To fix this, we map the entire MW region for each QP after we assign
the DMA channel. This prevents needing an extra DMA map in the fast
path.
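
For reference, a minimal sketch of the mapping pattern described above
(device and variable names are illustrative, not the exact ntb_transport
code):

	dma_addr_t dma_addr;

	/* map the memory window's physical range for the DMA engine */
	dma_addr = dma_map_resource(dma_dev, mw_phys, mw_size,
				    DMA_FROM_DEVICE, 0);
	if (dma_mapping_error(dma_dev, dma_addr))
		return -EIO;

	/* ... use dma_addr as the DMA destination in the TX fast path ... */

	dma_unmap_resource(dma_dev, dma_addr, mw_size, DMA_FROM_DEVICE, 0);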

Link: 
https://lore.kernel.org/linux-pci/499934e7-3734-1aee-37dd-b42a5d2a2...@intel.com/
Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/ntb_transport.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 3bfdb4562408..526b65afc16a 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -144,7 +144,9 @@ struct ntb_transport_qp {
struct list_head tx_free_q;
spinlock_t ntb_tx_free_q_lock;
void __iomem *tx_mw;
-   dma_addr_t tx_mw_phys;
+   phys_addr_t tx_mw_phys;
+   size_t tx_mw_size;
+   dma_addr_t tx_mw_dma_addr;
unsigned int tx_index;
unsigned int tx_max_entry;
unsigned int tx_max_frame;
@@ -1049,6 +1051,7 @@ static int ntb_transport_init_queue(struct 
ntb_transport_ctx *nt,
tx_size = (unsigned int)mw_size / num_qps_mw;
qp_offset = tx_size * (qp_num / mw_count);
 
+   qp->tx_mw_size = tx_size;
qp->tx_mw = nt->mw_vec[mw_num].vbase + qp_offset;
if (!qp->tx_mw)
return -EINVAL;
@@ -1644,7 +1647,7 @@ static int ntb_async_tx_submit(struct ntb_transport_qp 
*qp,
dma_cookie_t cookie;
 
device = chan->device;
-   dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;
+   dest = qp->tx_mw_dma_addr + qp->tx_max_frame * entry->tx_index;
buff_off = (size_t)buf & ~PAGE_MASK;
dest_off = (size_t)dest & ~PAGE_MASK;
 
@@ -1863,6 +1866,18 @@ ntb_transport_create_queue(void *data, struct device 
*client_dev,
qp->rx_dma_chan = NULL;
}
 
+   if (qp->tx_dma_chan) {
+   qp->tx_mw_dma_addr =
+   dma_map_resource(qp->tx_dma_chan->device->dev,
+qp->tx_mw_phys, qp->tx_mw_size,
+DMA_FROM_DEVICE, 0);
+   if (dma_mapping_error(qp->tx_dma_chan->device->dev,
+ qp->tx_mw_dma_addr)) {
+   qp->tx_mw_dma_addr = 0;
+   goto err1;
+   }
+   }
+
dev_dbg(&pdev->dev, "Using %s memcpy for TX\n",
qp->tx_dma_chan ? "DMA" : "CPU");
 
@@ -1904,6 +1919,10 @@ ntb_transport_create_queue(void *data, struct device 
*client_dev,
qp->rx_alloc_entry = 0;
while ((entry = ntb_list_rm(&qp->ntb_rx_q_lock, &qp->rx_free_q)))
kfree(entry);
+   if (qp->tx_mw_dma_addr)
+   dma_unmap_resource(qp->tx_dma_chan->device->dev,
+  qp->tx_mw_dma_addr, qp->tx_mw_size,
+  DMA_FROM_DEVICE, 0);
if (qp->tx_dma_chan)
dma_release_channel(qp->tx_dma_chan);
if (qp->rx_dma_chan)
@@ -1945,6 +1964,11 @@ void ntb_transport_free_queue(struct ntb_transport_qp 
*qp)
 */
dma_sync_wait(chan, qp->last_cookie);
dmaengine_terminate_all(chan);
+
+   dma_unmap_resource(chan->device->dev,
+  qp->tx_mw_dma_addr, qp->tx_mw_size,
+  DMA_FROM_DEVICE, 0);
+
dma_release_channel(chan);
}
 
-- 
2.19.0



[PATCH v2 07/12] NTB: Introduce functions to calculate multi-port resource index

2019-02-13 Thread Logan Gunthorpe
When using multiple ports, each port uses resources (dbs, msgs, mws, etc)
on every other port. Creating a mapping for these resources such that
each port has a corresponding resource on every other port is a bit
tricky.

Introduce the ntb_peer_resource_idx() function for this purpose.
It returns the peer resource number that will correspond with the
local peer index on the remote peer.

Also, introduce ntb_peer_highest_mw_idx() which will use
ntb_peer_resource_idx() but return the MW index starting with the
highest index and working down.
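
A minimal sketch of how a client might use these helpers when bringing up
a link to peer 'pidx' (illustrative only; error handling trimmed):

	int mw_idx;

	/* pick the highest-index MW that corresponds to this peer */
	mw_idx = ntb_peer_highest_mw_idx(ntb, pidx);
	if (mw_idx < 0)
		return mw_idx;

	/* claim memory window 'mw_idx' for communication with this peer */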

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 include/linux/ntb.h | 70 +
 1 file changed, 70 insertions(+)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 181d16601dd9..f5c69d853489 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -1502,4 +1502,74 @@ static inline int ntb_peer_msg_write(struct ntb_dev 
*ntb, int pidx, int midx,
return ntb->ops->peer_msg_write(ntb, pidx, midx, msg);
 }
 
+/**
+ * ntb_peer_resource_idx() - get a resource index for a given peer idx
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * When constructing a graph of peers, each remote peer must use a different
+ * resource index (mw, doorbell, etc) to communicate with each other
+ * peer.
+ *
+ * In a two peer system, this function should always return 0 such that
+ * resource 0 points to the remote peer on both ports.
+ *
+ * In a 5 peer system, this function will return the following matrix
+ *
+ * pidx \ port     0     1     2     3     4
+ * 0               0     0     1     2     3
+ * 1               0     1     2     3     4
+ * 2               0     1     2     3     4
+ * 3               0     1     2     3     4
+ *
+ * For example, if this function is used to program peer's memory
+ * windows, port 0 will program MW 0 on all its peers to point to itself.
+ * port 1 will program MW 0 in port 0 to point to itself and MW 1 on all
+ * other ports. etc.
+ *
+ * For the legacy two host case, ntb_port_number() and ntb_peer_port_number()
+ * both return zero and therefore this function will always return zero.
+ * So MW 0 on each host would be programmed to point to the other host.
+ *
+ * Return: the resource index to use for that peer.
+ */
+static inline int ntb_peer_resource_idx(struct ntb_dev *ntb, int pidx)
+{
+   int local_port, peer_port;
+
+   if (pidx >= ntb_peer_port_count(ntb))
+   return -EINVAL;
+
+   local_port = ntb_port_number(ntb);
+   peer_port = ntb_peer_port_number(ntb, pidx);
+
+   if (peer_port < local_port)
+   return local_port - 1;
+   else
+   return local_port;
+}
+
+/**
+ * ntb_peer_highest_mw_idx() - get a memory window index for a given peer idx
+ * using the highest index memory windows first
+ *
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * Like ntb_peer_resource_idx(), except it returns indexes starting with
+ * last memory window index.
+ *
+ * Return: the resource index to use for that peer.
+ */
+static inline int ntb_peer_highest_mw_idx(struct ntb_dev *ntb, int pidx)
+{
+   int ret;
+
+   ret = ntb_peer_resource_idx(ntb, pidx);
+   if (ret < 0)
+   return ret;
+
+   return ntb_mw_count(ntb, pidx) - ret - 1;
+}
+
 #endif
-- 
2.19.0



Re: [PATCH v6 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2019-02-13 Thread Vlastimil Babka
On 1/22/19 11:51 PM, Nicolas Boichat wrote:
> Hi Andrew,
> 
> On Fri, Jan 11, 2019 at 6:21 PM Joerg Roedel  wrote:
>>
>> On Wed, Jan 02, 2019 at 01:51:45PM +0800, Nicolas Boichat wrote:
>> > Does anyone have any further comment on this series? If not, which
>> > maintainer is going to pick this up? I assume Andrew Morton?
>>
>> Probably, yes. I don't like to carry the mm-changes in iommu-tree, so
>> this should go through mm.
> 
> Gentle ping on this series, it seems like it's better if it goes
> through your tree.
> 
> Series still applies cleanly on linux-next, but I'm happy to resend if
> that helps.

Ping, Andrew?

> Thanks!
> 
>> Regards,
>>
>> Joerg
> 



[PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_map_pages()

2019-02-13 Thread Souptick Joarder
Convert to use vm_map_pages() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
---
 drivers/iommu/dma-iommu.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d19f3d6..bacebff 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -620,17 +620,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t 
size, gfp_t gfp,
 
 int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct 
*vma)
 {
-   unsigned long uaddr = vma->vm_start;
-   unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   int ret = -ENXIO;
-
-   for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-   ret = vm_insert_page(vma, uaddr, pages[i]);
-   if (ret)
-   break;
-   uaddr += PAGE_SIZE;
-   }
-   return ret;
+   return vm_map_pages(vma, pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-- 
1.9.1



[PATCH v3 1/9] mm: Introduce new vm_map_pages() and vm_map_pages_zero() API

2019-02-13 Thread Souptick Joarder
Previously, drivers had their own way of mapping a range of
kernel pages/memory into a user vma, and this was done by
invoking vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating new functions and using them across
the drivers.

vm_map_pages() is the API which could be used to map
kernel memory/pages in drivers which have considered vm_pgoff.

vm_map_pages_zero() is the API which could be used to map a
range of kernel memory/pages in drivers which have not considered
vm_pgoff. vm_pgoff is passed as 0 by default for those drivers.

We _could_ then at a later point "fix" these drivers which are using
vm_map_pages_zero() to behave according to the normal vm_pgoff
offsetting simply by removing the _zero suffix on the function
name and, if that causes regressions, it gives us an easy way to revert.
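
A hedged sketch of what a converted driver's mmap handler can look like
with the new helper (the driver and structure names here are made up for
illustration):

	static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
	{
		struct mydrv_buffer *buf = file->private_data;

		/* maps buf->pages, honouring the user's requested vm_pgoff */
		return vm_map_pages(vma, buf->pages, buf->num_pages);
	}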

Tested on Rockchip hardware and display is working, including talking
to Lima via prime.

Signed-off-by: Souptick Joarder 
Suggested-by: Russell King 
Suggested-by: Matthew Wilcox 
Reviewed-by: Mike Rapoport 
Tested-by: Heiko Stuebner 
---
 include/linux/mm.h |  4 +++
 mm/memory.c| 81 ++
 mm/nommu.c | 14 ++
 3 files changed, 99 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80bb640..e0aaa73 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2565,6 +2565,10 @@ unsigned long change_prot_numa(struct vm_area_struct 
*vma,
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
+int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num);
+int vm_map_pages_zero(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num);
 vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
 vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index e11ca9d..cad3e27 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1520,6 +1520,87 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned 
long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+/*
+ * __vm_map_pages - maps range of kernel pages into user vma
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ * @offset: user's requested vm_pgoff
+ *
+ * This allows drivers to map range of kernel pages into a user vma.
+ *
+ * Return: 0 on success and error code otherwise.
+ */
+static int __vm_map_pages(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num, unsigned long offset)
+{
+   unsigned long count = vma_pages(vma);
+   unsigned long uaddr = vma->vm_start;
+   int ret, i;
+
+   /* Fail if the user requested offset is beyond the end of the object */
+   if (offset > num)
+   return -ENXIO;
+
+   /* Fail if the user requested size exceeds available object size */
+   if (count > num - offset)
+   return -ENXIO;
+
+   for (i = 0; i < count; i++) {
+   ret = vm_insert_page(vma, uaddr, pages[offset + i]);
+   if (ret < 0)
+   return ret;
+   uaddr += PAGE_SIZE;
+   }
+
+   return 0;
+}
+
+/**
+ * vm_map_pages - maps range of kernel pages starts with non zero offset
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ *
+ * Maps an object consisting of @num pages, catering for the user's
+ * requested vm_pgoff
+ *
+ * If we fail to insert any page into the vma, the function will return
+ * immediately leaving any previously inserted pages present.  Callers
+ * from the mmap handler may immediately return the error as their caller
+ * will destroy the vma, removing any successfully inserted pages. Other
+ * callers should make their own arrangements for calling unmap_region().
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise.
+ */
+int vm_map_pages(struct vm_area_struct *vma, struct page **pages,
+   unsigned long num)
+{
+   return __vm_map_pages(vma, pages, num, vma->vm_pgoff);
+}
+EXPORT_SYMBOL(vm_map_pages);
+
+/**
+ * vm_map_pages_zero - map range of kernel pages starts with zero offset
+ * @vma: user vma to map to
+ * @pages: pointer to array of source kernel pages
+ * @num: number of pages in page array
+ *
+ * Similar to vm_map_pages(), except that it explicitly sets the offset
+ * to 0. This function is intended for the drivers that did not consider
+ * vm_pgoff.
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise.

[PATCH v3 0/9] mm: Use vm_map_pages() and vm_map_pages_zero() API

2019-02-13 Thread Souptick Joarder
Previously, drivers had their own way of mapping a range of
kernel pages/memory into a user vma, and this was done by
invoking vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating new functions and using them across
the drivers.

vm_map_pages() is the API which could be used to map
kernel memory/pages in drivers which have considered vm_pgoff.

vm_map_pages_zero() is the API which could be used to map a
range of kernel memory/pages in drivers which have not considered
vm_pgoff. vm_pgoff is passed as 0 by default for those drivers.

We _could_ then at a later point "fix" these drivers which are using
vm_map_pages_zero() to behave according to the normal vm_pgoff
offsetting simply by removing the _zero suffix on the function
name and, if that causes regressions, it gives us an easy way to revert.

Tested on Rockchip hardware and display is working fine, including talking
to Lima via prime.

v1 -> v2:
Few Reviewed-by.

Updated the change log in [8/9]

In [7/9], vm_pgoff is treated in the V4L2 API as a 'cookie'
to select a buffer, not as an in-buffer offset by design,
and it always wants to mmap a whole buffer from its beginning.
Added additional changes after discussing with Marek so that
vm_map_pages() could be used instead of vm_map_pages_zero().

v2 -> v3:
Corrected the documentation as per review comment.

As suggested in v2, renaming the interfaces to -
*vm_insert_range() -> vm_map_pages()* and
*vm_insert_range_buggy() -> vm_map_pages_zero()*.
As the interface is renamed, modified the code accordingly,
updated the change logs and modified the subject lines to use the
new interfaces. There is no other change apart from renaming and
using the new interface.

Patch[1/9] & [4/9], Tested on Rockchip hardware.

Souptick Joarder (9):
  mm: Introduce new vm_map_pages() and vm_map_pages_zero() API
  arm: mm: dma-mapping: Convert to use vm_map_pages()
  drivers/firewire/core-iso.c: Convert to use vm_map_pages_zero()
  drm/rockchip/rockchip_drm_gem.c: Convert to use vm_map_pages()
  drm/xen/xen_drm_front_gem.c: Convert to use vm_map_pages()
  iommu/dma-iommu.c: Convert to use vm_map_pages()
  videobuf2/videobuf2-dma-sg.c: Convert to use vm_map_pages()
  xen/gntdev.c: Convert to use vm_map_pages()
  xen/privcmd-buf.c: Convert to use vm_map_pages_zero()

 arch/arm/mm/dma-mapping.c  | 22 ++
 drivers/firewire/core-iso.c| 15 +---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c| 17 +
 drivers/gpu/drm/xen/xen_drm_front_gem.c| 18 ++---
 drivers/iommu/dma-iommu.c  | 12 +---
 drivers/media/common/videobuf2/videobuf2-core.c|  7 ++
 .../media/common/videobuf2/videobuf2-dma-contig.c  |  6 --
 drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 22 ++
 drivers/xen/gntdev.c   | 16 ++---
 drivers/xen/privcmd-buf.c  |  8 +--
 include/linux/mm.h |  4 ++
 mm/memory.c| 81 ++
 mm/nommu.c | 14 
 13 files changed, 136 insertions(+), 106 deletions(-)

-- 
1.9.1



Re: [PATCH v6 1/9] iommu: Add APIs for multiple domains per device

2019-02-13 Thread Jean-Philippe Brucker
Hi,

I have a few boring nits and one question below

On 13/02/2019 04:02, Lu Baolu wrote:
> Sharing a physical PCI device in a finer-granularity way
> is becoming a consensus in the industry. IOMMU vendors
> are also engaging efforts to support such sharing as well
> as possible. Among the efforts, the capability of support
> finer-granularity DMA isolation is a common requirement
> due to the security consideration. With finer-granularity
> DMA isolation, all DMA requests out of or to a subset of
> a physical PCI device can be protected by the IOMMU.

That last sentence seems strange, how about "With finer-granularity DMA
isolation, subsets of a PCI function can be isolated from each others by
the IOMMU."

> As a
> result, there is a request in software to attach multiple
> domains to a physical PCI device. One example of such use
> model is the Intel Scalable IOV [1] [2]. The Intel vt-d
> 3.0 spec [3] introduces the scalable mode which enables
> PASID granularity DMA isolation.
> 
> This adds the APIs to support multiple domains per device.
> In order to ease the discussions, we call it 'a domain in
> auxiliary mode' or simply 'auxiliary domain' when multiple
> domains are attached to a physical device.
> 
> The APIs include:
> 
> * iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)
>   - Check whether both IOMMU and device support IOMMU aux
>     domain feature. Below aux-domain specific interfaces
>     are available only after this returns true.

s/after/if/ since calling has_feature() shouldn't be a prerequisite to
using the aux-domain interface (unlike calling enable_feature()).

> 
> * iommu_dev_enable/disable_feature(dev, IOMMU_DEV_FEAT_AUX)
>   - Enable/disable device specific aux-domain feature.
> 
> * iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)
>   - Check whether the aux domain specific feature enabled or
>     not.

"is enabled"

> 
> * iommu_aux_attach_device(domain, dev)
>   - Attaches @domain to @dev in the auxiliary mode. Multiple
>     domains could be attached to a single device in the
>     auxiliary mode with each domain representing an isolated
>     address space for an assignable subset of the device.
> 
> * iommu_aux_detach_device(domain, dev)
>   - Detach @domain which has been attached to @dev in the
>     auxiliary mode.
> 
> * iommu_aux_get_pasid(domain, dev)
>   - Return ID used for finer-granularity DMA translation.
>     For the Intel Scalable IOV usage model, this will be
>     a PASID. The device which supports Scalable IOV needs
>     to write this ID to the device register so that DMA
>     requests could be tagged with a right PASID prefix.
> 
> This has been updated with the latest proposal from Joerg
> posted here [5].
> 
> Many people involved in discussions of this design.
> 
> Kevin Tian 
> Liu Yi L 
> Ashok Raj 
> Sanjay Kumar 
> Jacob Pan 
> Alex Williamson 
> Jean-Philippe Brucker 
> Joerg Roedel 
> 
> and some discussions can be found here [4] [5].
> 
> [1]
> https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
> [2] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
> [3]
> https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
> [4] https://lkml.org/lkml/2018/7/26/4
> [5] https://www.spinics.net/lists/iommu/msg31874.html
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Liu Yi L 
> Suggested-by: Kevin Tian 
> Suggested-by: Jean-Philippe Brucker 
> Suggested-by: Joerg Roedel 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/iommu.c | 91 +++
>  include/linux/iommu.h | 70 +
>  2 files changed, 161 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3ed4db334341..d0b323e8357f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2033,3 +2033,94 @@ int iommu_fwspec_add_ids(struct device *dev, u32
> *ids, int num_ids)
>  return 0;
>  }
>  EXPORT_SYMBOL_GPL(iommu_fwspec_add_ids);
> +
> +/*
> + * Per device IOMMU features.
> + */
> +bool iommu_dev_has_feature(struct device *dev, enum iommu_dev_features
> feat)
> +{
> +   const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> +   if (ops && ops->dev_has_feat)
> +   return ops->dev_has_feat(dev, feat);
> +
> +   return false;
> +}
> +EXPORT_SYMBOL_GPL(iommu_dev_has_feature);
> +
> +int iommu_dev_enable_feature(struct device *dev, enum
> iommu_dev_features feat)
> +{
> +   const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> +   if (ops && ops->dev_enable_feat)
> +   return ops->dev_enable_feat(dev, feat);
> +
> +   return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(iommu_dev_enable_feature);
> +
> +int iommu_dev_disable_feature(struct device *dev, enum
> iommu_dev_features feat)
> +{
> +   const struct iommu_ops *ops = dev->bus->iommu_ops;
> +
> +   if (ops && ops->dev_disable_feat)
> +   retur

Re: [PATCH 06/12] dma-mapping: improve selection of dma_declare_coherent availability

2019-02-13 Thread Lee Jones
On Mon, 11 Feb 2019, Christoph Hellwig wrote:

> This API is primarily used through DT entries, but two architectures
> and two drivers call it directly.  So instead of selecting the config
> symbol for random architectures pull it in implicitly for the actual
> users.  Also rename the Kconfig option to describe the feature better.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arc/Kconfig| 1 -
>  arch/arm/Kconfig| 2 +-
>  arch/arm64/Kconfig  | 1 -
>  arch/csky/Kconfig   | 1 -
>  arch/mips/Kconfig   | 1 -
>  arch/riscv/Kconfig  | 1 -
>  arch/sh/Kconfig | 2 +-
>  arch/unicore32/Kconfig  | 1 -
>  arch/x86/Kconfig| 1 -

>  drivers/mfd/Kconfig | 2 ++

If everyone else is happy with these changes, then so am I.

  Acked-by: Lee Jones 

>  drivers/of/Kconfig  | 3 ++-
>  include/linux/device.h  | 2 +-
>  include/linux/dma-mapping.h | 8 
>  kernel/dma/Kconfig  | 2 +-
>  kernel/dma/Makefile | 2 +-
>  15 files changed, 13 insertions(+), 17 deletions(-)

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 01/12] mfd/sm501: depend on HAS_DMA

2019-02-13 Thread Lee Jones
On Mon, 11 Feb 2019, Christoph Hellwig wrote:

> Currently the sm501 mfd driver can be compiled without any dependencies,
> but through the use of dma_declare_coherent it really depends on
> having DMA and iomem support.  Normally we don't explicitly require DMA
> support as we have stubs for it if on UML, but in this case the driver
> selects support for dma_declare_coherent and thus also requires
> memmap support.  Guard this by an explicit dependency.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/mfd/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> index f461460a2aeb..f15f6489803d 100644
> --- a/drivers/mfd/Kconfig
> +++ b/drivers/mfd/Kconfig
> @@ -1066,6 +1066,7 @@ config MFD_SI476X_CORE
>  
>  config MFD_SM501
>   tristate "Silicon Motion SM501"
> + depends on HAS_DMA
>---help---
> This is the core driver for the Silicon Motion SM501 multimedia
> companion chip. This device is a multifunction device which may

I would normally have taken this, but I fear it will conflict with
[PATCH 06/12].  For that reason, just take my:

  Acked-by: Lee Jones 

-- 
Lee Jones [李琼斯]
Linaro Services Technical Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

RE: [PATCH v2 2/2] iommu/vt-d: Enable PASID only if device expects PASID in PRG Response.

2019-02-13 Thread Tian, Kevin
> From: iommu-boun...@lists.linux-foundation.org [mailto:iommu-
> boun...@lists.linux-foundation.org] On Behalf Of
> sathyanarayanan.kuppusw...@linux.intel.com
> Sent: Tuesday, February 12, 2019 5:51 AM
> To: bhelg...@google.com; j...@8bytes.org; dw...@infradead.org
> Cc: Raj, Ashok ; linux-...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Busch, Keith ;
> iommu@lists.linux-foundation.org; Pan, Jacob jun
> 
> Subject: [PATCH v2 2/2] iommu/vt-d: Enable PASID only if device expects
> PASID in PRG Response.
> 
> From: Kuppuswamy Sathyanarayanan
> 
> 
> In Intel IOMMU, if the Page Request Queue (PRQ) is full, it will
> automatically respond to the device with a success message as a keep
> alive. And when sending the success message, the IOMMU will include PASID in
> the Response Message when the Page Request has a PASID in the Request
> Message, and it does not check against the PRG Response PASID
> requirement
> of the device before sending the response. Also, if the device receives the
> PRG response with PASID when it's not expecting it, then the device behavior
> is undefined. So enable PASID support only if device expects PASID in PRG
> response message.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Keith Busch 
> Suggested-by: Ashok Raj 
> Signed-off-by: Kuppuswamy Sathyanarayanan
> 
> ---
>  drivers/iommu/intel-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 1457f931218e..af2e4a011787 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1399,7 +1399,8 @@ static void iommu_enable_dev_iotlb(struct
> device_domain_info *info)
>  undefined. So always enable PASID support on devices which
>  have it, even if we can't yet know if we're ever going to
>  use it. */
> - if (info->pasid_supported && !pci_enable_pasid(pdev, info-
> >pasid_supported & ~1))
> + if (info->pasid_supported && pci_prg_resp_pasid_required(pdev)
> &&
> + !pci_enable_pasid(pdev, info->pasid_supported & ~1))
>   info->pasid_enabled = 1;

The above logic looks problematic. As Dave commented in another thread,
PRI and PASID are orthogonal capabilities. Especially with the introduction
of VT-d scalable mode, PASID will be used alone even w/o PRI...

Why not do the check when PRI is actually enabled? At that point
you can fail the request if the above condition is false.

> 
>   if (info->pri_supported && !pci_reset_pri(pdev)
> && !pci_enable_pri(pdev, 32))
> --
> 2.20.1
> 