[PATCH v3 3/3] swiotlb: checking whether swiotlb buffer is full with io_tlb_used

2019-01-17 Thread Dongli Zhang
This patch uses io_tlb_used to help check whether the swiotlb buffer is full.
io_tlb_used is no longer used only for debugfs; it is also used to help
optimize swiotlb_tbl_map_single().
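
For illustration, the invariant this relies on (a simplified sketch of the
hunk below, not part of the patch): map adds nslots to io_tlb_used and unmap
subtracts them, both under io_tlb_lock, so the number of free slots is known
in O(1) before any free-list scan:

	/* Under io_tlb_lock: io_tlb_used of io_tlb_nslabs slots are in use,
	 * so a request for nslots contiguous slots cannot possibly succeed
	 * once fewer than nslots slots remain free.
	 */
	spin_lock_irqsave(&io_tlb_lock, flags);
	if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
		goto not_found;		/* skip the free-list scan entirely */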

Suggested-by: Joe Jin 
Signed-off-by: Dongli Zhang 
---
Changed since v2:
  * move #ifdef folding to previous patch (suggested by Konrad Rzeszutek Wilk)

 kernel/dma/swiotlb.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bedc9f9..a01b83e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 * request and allocate a buffer from that IO TLB pool.
 */
	spin_lock_irqsave(&io_tlb_lock, flags);
+
+   if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
+   goto not_found;
+
index = ALIGN(io_tlb_index, stride);
if (index >= io_tlb_nslabs)
index = 0;
-- 
2.7.4



[PATCH v3 1/3] swiotlb: fix comment on swiotlb_bounce()

2019-01-17 Thread Dongli Zhang
Fix the comment: swiotlb_bounce() is used to copy from the original dma
location to the swiotlb buffer during swiotlb_tbl_map_single(), and to
copy from the swiotlb buffer back to the original dma location during
swiotlb_tbl_unmap_single().
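
For reference, the two directions as used by the call sites (a simplified
sketch of the existing code, not part of the patch):

	/* swiotlb_tbl_map_single(): original buffer -> swiotlb buffer */
	swiotlb_bounce(orig_addr, tlb_addr, size, DMA_TO_DEVICE);

	/* swiotlb_tbl_unmap_single(): swiotlb buffer -> original buffer */
	swiotlb_bounce(orig_addr, tlb_addr, size, DMA_FROM_DEVICE);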

Signed-off-by: Dongli Zhang 
---
 kernel/dma/swiotlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fb6fd6..1d8b377 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -385,7 +385,7 @@ void __init swiotlb_exit(void)
 }
 
 /*
- * Bounce: copy the swiotlb buffer back to the original dma location
+ * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
   size_t size, enum dma_data_direction dir)
-- 
2.7.4



[PATCH v3 2/3] swiotlb: add debugfs to track swiotlb buffer usage

2019-01-17 Thread Dongli Zhang
The device driver will not be able to do dma operations once the swiotlb
buffer is full, either because the driver has too many IO TLB blocks in
flight, or because there is a memory leak in the device driver. Exporting
the swiotlb buffer usage via debugfs helps the user estimate the swiotlb
buffer size to pre-allocate, or analyze a device driver memory leak.

Signed-off-by: Dongli Zhang 
---
Changed since v1:
  * init debugfs with late_initcall (suggested by Robin Murphy)
  * create debugfs entries with debugfs_create_ulong (suggested by Robin Murphy)

Changed since v2:
  * some #ifdef folding (suggested by Konrad Rzeszutek Wilk)

 kernel/dma/swiotlb.c | 44 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1d8b377..bedc9f9 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -34,6 +34,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+#endif
 
 #include 
 #include 
@@ -73,6 +76,11 @@ phys_addr_t io_tlb_start, io_tlb_end;
 static unsigned long io_tlb_nslabs;
 
 /*
+ * The number of used IO TLB block
+ */
+static unsigned long io_tlb_used;
+
+/*
  * This is a free list describing the number of free entries available from
  * each index
  */
@@ -524,6 +532,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
	dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
	return DMA_MAPPING_ERROR;
return DMA_MAPPING_ERROR;
 found:
+   io_tlb_used += nslots;
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 
/*
@@ -584,6 +593,8 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
 */
	for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
		io_tlb_list[i] = ++count;
io_tlb_list[i] = ++count;
+
+   io_tlb_used -= nslots;
}
	spin_unlock_irqrestore(&io_tlb_lock, flags);
 }
@@ -662,3 +673,36 @@ swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
return __phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
 }
+
+#ifdef CONFIG_DEBUG_FS
+
+static int __init swiotlb_create_debugfs(void)
+{
+   static struct dentry *d_swiotlb_usage;
+   struct dentry *ent;
+
+   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
+
+   if (!d_swiotlb_usage)
+   return -ENOMEM;
+
+   ent = debugfs_create_ulong("io_tlb_nslabs", 0400,
+			   d_swiotlb_usage, &io_tlb_nslabs);
+   if (!ent)
+   goto fail;
+
+   ent = debugfs_create_ulong("io_tlb_used", 0400,
+			   d_swiotlb_usage, &io_tlb_used);
+   if (!ent)
+   goto fail;
+
+   return 0;
+
+fail:
+   debugfs_remove_recursive(d_swiotlb_usage);
+   return -ENOMEM;
+}
+
+late_initcall(swiotlb_create_debugfs);
+
+#endif
-- 
2.7.4



[PATCH 1/3] PCI: iproc: Add feature to set order mode

2019-01-17 Thread Srinath Mannam via iommu
The order mode in the RX header of incoming pcie packets can be overridden
to strict or loose order based on requirements.
A sysfs entry is provided to set the dynamic and default order modes of
upstream traffic.

To improve performance on a few endpoints we are required to modify the
ordering attributes. Using this feature we can override the order mode of
RX packets at the IPROC RC.

Ex:
In PCIe based NVMe SSD endpoints, data reads/writes from the disk use
queue pairs (submission/completion). After completing a block read/write,
the EP writes a completion command to the completion queue to notify the
RC. All data transfers preceding the completion command therefore do not
require strict order; only the completion command does: all packets before
the completion command write must have been written to memory and
acknowledged before it reaches the RC.

Signed-off-by: Srinath Mannam 
Reviewed-by: Ray Jui 
---
 drivers/pci/controller/pcie-iproc.c | 128 
 drivers/pci/controller/pcie-iproc.h |  16 +
 2 files changed, 144 insertions(+)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index c20fd6b..13ce80f 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -57,6 +57,9 @@
 #define PCIE_DL_ACTIVE_SHIFT   2
 #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)
 
+#define MPS_MRRS_CFG_MPS_SHIFT 0
+#define MPS_MRRS_CFG_MRRS_SHIFT16
+
 #define APB_ERR_EN_SHIFT   0
 #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
 
@@ -91,6 +94,14 @@
 
 #define IPROC_PCIE_REG_INVALID 0xffff
 
+#define RO_FIELD(window)   BIT((window) << 1)
+#define RO_VALUE(window)   BIT(((window) << 1) + 1)
+/* All Windows are allowed */
+#define RO_ALL_WINDOW  0x
+/* Wait on All Windows */
+#define RO_FIELD_ALL_WINDOW0x
+#define DYNAMIC_ORDER_MODE 0x5
+
 /**
  * iProc PCIe outbound mapping controller specific parameters
  *
@@ -295,6 +306,15 @@ enum iproc_pcie_reg {
/* enable APB error for unsupported requests */
IPROC_PCIE_APB_ERR_EN,
 
+   /* Ordering Mode configuration registers */
+   IPROC_PCIE_ORDERING_CFG,
+   IPROC_PCIE_MPS_MRRS_CFG,
+   IPROC_PCIE_IMAP0_RO_CONTROL,
+   IPROC_PCIE_IMAP1_RO_CONTROL,
+   IPROC_PCIE_IMAP2_RO_CONTROL,
+   IPROC_PCIE_IMAP3_RO_CONTROL,
+   IPROC_PCIE_IMAP4_RO_CONTROL,
+
/* total number of core registers */
IPROC_PCIE_MAX_NUM_REG,
 };
@@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_IMAP4]  = 0xe70,
[IPROC_PCIE_LINK_STATUS]= 0xf0c,
[IPROC_PCIE_APB_ERR_EN] = 0xf40,
+   [IPROC_PCIE_ORDERING_CFG]   = 0x2000,
+   [IPROC_PCIE_MPS_MRRS_CFG]   = 0x2008,
+   [IPROC_PCIE_IMAP0_RO_CONTROL]   = 0x201c,
+   [IPROC_PCIE_IMAP1_RO_CONTROL]   = 0x2020,
+   [IPROC_PCIE_IMAP2_RO_CONTROL]   = 0x2024,
+   [IPROC_PCIE_IMAP3_RO_CONTROL]   = 0x2028,
+   [IPROC_PCIE_IMAP4_RO_CONTROL]   = 0x202c,
 };
 
 /* iProc PCIe PAXC v1 registers */
@@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
return 0;
 }
 
+static
+ssize_t order_mode_show(struct device *dev,
+   struct device_attribute *attr,
+   char *buff)
+{
+   struct pci_host_bridge *host = to_pci_host_bridge(dev);
+   struct iproc_pcie *pcie = pci_host_bridge_priv(host);
+
+   return sprintf(buff, "Current PAXB order configuration %d\n",
+  pcie->order_cfg);
+}
+
+static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
+{
+   u32 val = 0, mps;
+
+   /* Set all IMAPs to relaxed order in dynamic order mode */
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
+DYNAMIC_ORDER_MODE);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
+RO_ALL_WINDOW);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
+RO_ALL_WINDOW);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
+RO_ALL_WINDOW);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
+RO_ALL_WINDOW);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
+RO_ALL_WINDOW);
+
+   /* PAXB MPS/MRRS settings configuration */
+   iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
+   2, );
+   mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
+   /* set MRRS to match system MPS */
+   val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
+   /* set MPS to 4096 bytes */
+   val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
+   iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
+}
+
+static
+ssize_t order_mode_store(struct device *dev,
+   

[PATCH 2/3] PCI: iproc: CRS state check in config request

2019-01-17 Thread Srinath Mannam via iommu
In the current implementation, a config read returning 0xffff0001 is
assumed to be a CRS completion, but sometimes 0xffff0001 can be valid
data.
The IPROC PCIe RC has a register that reports config request status flags
such as SC, UR, CRS and CA, so an extra check is added to confirm the CRS
state using this register before reissuing the config request.

Signed-off-by: Srinath Mannam 
Reviewed-by: Ray Jui 
---
 drivers/pci/controller/pcie-iproc.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index 13ce80f..ee89d56 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -63,6 +63,10 @@
 #define APB_ERR_EN_SHIFT   0
 #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
 
+#define CFG_RD_SUCCESS 0
+#define CFG_RD_UR  1
+#define CFG_RD_CRS 2
+#define CFG_RD_CA  3
 #define CFG_RETRY_STATUS   0xffff0001
 #define CFG_RETRY_STATUS_TIMEOUT_US    500000 /* 500 milliseconds */
 
@@ -300,6 +304,9 @@ enum iproc_pcie_reg {
IPROC_PCIE_IARR4,
IPROC_PCIE_IMAP4,
 
+   /* config read status */
+   IPROC_PCIE_CFG_RD_STATUS,
+
/* link status */
IPROC_PCIE_LINK_STATUS,
 
@@ -370,6 +377,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_IMAP3]  = 0xe08,
[IPROC_PCIE_IARR4]  = 0xe68,
[IPROC_PCIE_IMAP4]  = 0xe70,
+   [IPROC_PCIE_CFG_RD_STATUS]  = 0xee0,
[IPROC_PCIE_LINK_STATUS]= 0xf0c,
[IPROC_PCIE_APB_ERR_EN] = 0xf40,
[IPROC_PCIE_ORDERING_CFG]   = 0x2000,
@@ -501,10 +509,12 @@ static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
return (pcie->base + offset);
 }
 
-static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
+static unsigned int iproc_pcie_cfg_retry(struct iproc_pcie *pcie,
+void __iomem *cfg_data_p)
 {
int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
unsigned int data;
+   u32 status;
 
/*
 * As per PCIe spec r3.1, sec 2.3.2, CRS Software Visibility only
@@ -525,6 +535,15 @@ static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
 */
data = readl(cfg_data_p);
while (data == CFG_RETRY_STATUS && timeout--) {
+   /*
+* CRS state is set in CFG_RD status register
+* This will handle the case where CFG_RETRY_STATUS is
+* valid config data.
+*/
+   status = iproc_pcie_read_reg(pcie, IPROC_PCIE_CFG_RD_STATUS);
+   if (status != CFG_RD_CRS)
+   return data;
+
udelay(1);
data = readl(cfg_data_p);
}
@@ -603,7 +622,7 @@ static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
if (!cfg_data_p)
return PCIBIOS_DEVICE_NOT_FOUND;
 
-   data = iproc_pcie_cfg_retry(cfg_data_p);
+   data = iproc_pcie_cfg_retry(pcie, cfg_data_p);
 
*val = data;
if (size <= 2)
-- 
2.7.4



[PATCH 0/3] Add IPROC PCIe new features

2019-01-17 Thread Srinath Mannam via iommu
Add changes related to IPROC PCIe RC IP new features.

This patch set is based on Linux-5.0-rc2.

Srinath Mannam (3):
  PCI: iproc: Add feature to set order mode
  PCI: iproc: CRS state check in config request
  PCI: iproc: Add PCIe 32bit outbound memory configuration

 drivers/pci/controller/pcie-iproc.c | 172 +++-
 drivers/pci/controller/pcie-iproc.h |  16 
 2 files changed, 184 insertions(+), 4 deletions(-)

-- 
2.7.4



[PATCH 3/3] PCI: iproc: Add PCIe 32bit outbound memory configuration

2019-01-17 Thread Srinath Mannam via iommu
The IPROC PCIe RC supports outbound memory mapping with the SoC address
inside the 4GB address boundary, i.e. a 32-bit AXI address.
This patch adds that support.
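
A worked sketch of the minimum-window corner case handled below (values are
illustrative; SZ_128M stands in for the smallest supported window size):

	/* Requested: axi_addr = 0x42500000, size = 64MB, but the smallest
	 * window (size_idx == 0, window_idx == 0) is 128MB. Rather than
	 * failing, round down and map the full minimum window:
	 */
	axi_addr = ALIGN_DOWN(0x42500000, SZ_128M);	/* -> 0x40000000 */
	pci_addr = ALIGN_DOWN(pci_addr, SZ_128M);
	size = SZ_128M;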

Signed-off-by: Srinath Mannam 
Signed-off-by: Abhishek Shah 
Signed-off-by: Ray Jui 
Reviewed-by: Scott Branden 
Reviewed-by: Vikram Prakash 
---
 drivers/pci/controller/pcie-iproc.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index ee89d56..4d0e63b 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -982,8 +982,25 @@ static int iproc_pcie_setup_ob(struct iproc_pcie *pcie, u64 axi_addr,
resource_size_t window_size =
ob_map->window_sizes[size_idx] * SZ_1M;
 
-   if (size < window_size)
-   continue;
+   /*
+* Keep iterating until we reach the last window and
+* with the minimal window size at index zero. In this
+* case, we take a compromise by mapping it using the
+* minimum window size that can be supported
+*/
+   if (size < window_size) {
+   if (size_idx > 0 || window_idx > 0)
+   continue;
+
+   /*
+* For the corner case of reaching the minimal
+* window size that can be supported on the
+* last window
+*/
+   axi_addr = ALIGN_DOWN(axi_addr, window_size);
+   pci_addr = ALIGN_DOWN(pci_addr, window_size);
+   size = window_size;
+   }
 
if (!IS_ALIGNED(axi_addr, window_size) ||
!IS_ALIGNED(pci_addr, window_size)) {
-- 
2.7.4



[PATCH] iommu: amd: call free_iova_fast with pfn in map_sg

2019-01-17 Thread Jerry Snitselaar
In the error path of map_sg, free_iova_fast is being called with
address instead of the pfn. This results in a bad value getting into
the rcache, and can result in hitting a BUG_ON when
iova_magazine_free_pfns is called.
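
A sketch of the unit mismatch (simplified from the map_sg path; the iova
rcache operates on page frame numbers, so alloc and free must use the same
units):

	unsigned long pfn = alloc_iova_fast(&dma_dom->iovad, npages,
					    dma_mask >> PAGE_SHIFT, true);
	dma_addr_t address = (dma_addr_t)pfn << PAGE_SHIFT;
	...
	/* error unwind: hand back the pfn, not the byte address */
	free_iova_fast(&dma_dom->iovad, address >> PAGE_SHIFT, npages);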

Cc: Joerg Roedel 
Cc: Suravee Suthikulpanit 
Signed-off-by: Jerry Snitselaar 
---
 drivers/iommu/amd_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 87ba23a75b38..418df8ff3e50 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2623,7 +2623,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
}
 
 out_free_iova:
-	free_iova_fast(&dma_dom->iovad, address, npages);
+	free_iova_fast(&dma_dom->iovad, address >> PAGE_SHIFT, npages);
 
 out_err:
return 0;
-- 
2.20.1.98.gecbdaf0899



Re: [PATCH 3/3] videobuf2: replace a layering violation with dma_map_resource

2019-01-17 Thread Christoph Hellwig
On Mon, Jan 14, 2019 at 01:42:26PM +0100, Marek Szyprowski wrote:
> Hi Christoph,
> 
> On 2019-01-11 19:17, Christoph Hellwig wrote:
> > vb2_dc_get_userptr pokes into arm direct mapping details to get the
> > resemblance of a dma address for a physical address that is
> > not backed by a page struct.  Not only is this not portable to other
> > architectures with dma direct mapping offsets, but also not to uses
> > of IOMMUs of any kind.  Switch to the proper dma_map_resource /
> > dma_unmap_resource interface instead.
> >
> > Signed-off-by: Christoph Hellwig 
> 
> There are checks for IOMMU presence in other places in vb2-dma-contig,
> so it was used only when no IOMMU is available, but I agree that the
> hacky code should be replaced by a generic code if possible.
> 
> Tested-by: Marek Szyprowski 
> 
> V4L2 pipeline works fine on older Exynos-based boards with CMA and IOMMU
> disabled.

Do you know if these rely on the offsets?  E.g. would they still work
with the patch below applied on top.  That would keep the map_resource
semantics as-is as solve the issue pointed out by Robin for now.

If not I can only think of a flag to bypass the offseting for now, but
that would be pretty ugly.  Or go for the long-term solution of
discovering the relationship between the two devices, as done by the
PCIe P2P code..

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8e0359b04957..25bd19974223 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -359,7 +359,7 @@ EXPORT_SYMBOL(dma_direct_map_sg);
 dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
-   dma_addr_t dma_addr = phys_to_dma(dev, paddr);
+   dma_addr_t dma_addr = paddr;
 
if (unlikely(!dma_direct_possible(dev, dma_addr, size))) {
report_addr(dev, dma_addr, size);



[PATCH] arm64/xen: fix xen-swiotlb cache flushing

2019-01-17 Thread Christoph Hellwig
Xen-swiotlb hooks into the arm/arm64 arch code through a copy of the
DMA mapping operations stored in the struct device arch data.

Switching arm64 to use the direct calls for the merged DMA direct /
swiotlb code broke this scheme.  Replace the indirect calls with
direct-calls in xen-swiotlb as well to fix this problem.

Fixes: 356da6d0cd ("dma-mapping: bypass indirect calls for dma-direct")
Reported-by: Julien Grall 
Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/xen/page-coherent.h   | 94 +
 arch/arm64/include/asm/device.h|  3 -
 arch/arm64/include/asm/xen/page-coherent.h | 76 +
 arch/arm64/mm/dma-mapping.c|  4 +-
 drivers/xen/swiotlb-xen.c  |  4 +-
 include/xen/arm/page-coherent.h| 97 +-
 6 files changed, 176 insertions(+), 102 deletions(-)

diff --git a/arch/arm/include/asm/xen/page-coherent.h b/arch/arm/include/asm/xen/page-coherent.h
index b3ef061d8b74..2c403e7c782d 100644
--- a/arch/arm/include/asm/xen/page-coherent.h
+++ b/arch/arm/include/asm/xen/page-coherent.h
@@ -1 +1,95 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_XEN_PAGE_COHERENT_H
+#define _ASM_ARM_XEN_PAGE_COHERENT_H
+
+#include <linux/dma-mapping.h>
+#include <asm/page.h>
 #include <xen/arm/page-coherent.h>
+
+static inline const struct dma_map_ops *xen_get_dma_ops(struct device *dev)
+{
+   if (dev && dev->archdata.dev_dma_ops)
+   return dev->archdata.dev_dma_ops;
+   return get_arch_dma_ops(NULL);
+}
+
+static inline void *xen_alloc_coherent_pages(struct device *hwdev, size_t size,
+   dma_addr_t *dma_handle, gfp_t flags, unsigned long attrs)
+{
+	return xen_get_dma_ops(hwdev)->alloc(hwdev, size, dma_handle, flags,
+			attrs);
+}
+
+static inline void xen_free_coherent_pages(struct device *hwdev, size_t size,
+   void *cpu_addr, dma_addr_t dma_handle, unsigned long attrs)
+{
+   xen_get_dma_ops(hwdev)->free(hwdev, size, cpu_addr, dma_handle, attrs);
+}
+
+static inline void xen_dma_map_page(struct device *hwdev, struct page *page,
+dma_addr_t dev_addr, unsigned long offset, size_t size,
+enum dma_data_direction dir, unsigned long attrs)
+{
+   unsigned long page_pfn = page_to_xen_pfn(page);
+   unsigned long dev_pfn = XEN_PFN_DOWN(dev_addr);
+	unsigned long compound_pages =
+		(1 << compound_order(page)) * XEN_PFN_PER_PAGE;
[...]
+		if (xen_get_dma_ops(hwdev)->unmap_page)
+			xen_get_dma_ops(hwdev)->unmap_page(hwdev, handle, size, dir, attrs);
+	} else
+		__xen_dma_unmap_page(hwdev, handle, size, dir, attrs);
+}
+
+static inline void xen_dma_sync_single_for_cpu(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+   if (pfn_valid(pfn)) {
+   if (xen_get_dma_ops(hwdev)->sync_single_for_cpu)
+			xen_get_dma_ops(hwdev)->sync_single_for_cpu(hwdev,
+					handle, size, dir);
+   } else
+   __xen_dma_sync_single_for_cpu(hwdev, handle, size, dir);
+}
+
+static inline void xen_dma_sync_single_for_device(struct device *hwdev,
+   dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+   unsigned long pfn = PFN_DOWN(handle);
+   if (pfn_valid(pfn)) {
+   if (xen_get_dma_ops(hwdev)->sync_single_for_device)
+			xen_get_dma_ops(hwdev)->sync_single_for_device(hwdev,
+					handle, size, dir);
+   } else
+   __xen_dma_sync_single_for_device(hwdev, handle, size, dir);
+}
+
+#endif /* _ASM_ARM_XEN_PAGE_COHERENT_H */
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 3dd3d664c5c5..4658c937e173 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -20,9 +20,6 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
void *iommu;/* private IOMMU data */
 #endif
-#ifdef CONFIG_XEN
-   const struct dma_map_ops *dev_dma_ops;
-#endif
 };
 
 struct pdev_archdata {
diff --git a/arch/arm64/include/asm/xen/page-coherent.h b/arch/arm64/include/asm/xen/page-coherent.h

Re: [PATCH 5/5] iommu/tegra-smmu: Add resv_regions ops

2019-01-17 Thread Robin Murphy

On 16/01/2019 20:50, Navneet Kumar wrote:

Add support for get/put reserved regions.


For any particular reason? If you're aiming to get 
VFIO-passthrough-with-MSIs working on Tegra, this probably won't be 
correct anyway...



Signed-off-by: Navneet Kumar 
---
  drivers/iommu/tegra-smmu.c | 31 +++
  1 file changed, 31 insertions(+)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 4b43c63..e848a45 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -22,6 +22,9 @@
  #include 
  #include 
  
+#define MSI_IOVA_BASE	0x8000000


...because this arbitrary IOVA relies on writes to the MSI doorbell 
undergoing address translation at the IOMMU just like any other access. 
Looking at Tegra 134, that seems to implement an MSI doorbell internal 
to the PCIe root complex, far upstream of translation, therefore at the 
very least you'd need to reserve whatever IOVA region is shadowed by 
that doorbell in PCI memory space...



+#define MSI_IOVA_LENGTH	0x100000
+
  struct tegra_smmu_group {
struct list_head list;
const struct tegra_smmu_group_soc *soc;
@@ -882,6 +885,31 @@ static int tegra_smmu_of_xlate(struct device *dev,
	return iommu_fwspec_add_ids(dev, &id, 1);
  }
  
+static void tegra_smmu_get_resv_regions(struct device *dev,

+   struct list_head *head)
+{
+   struct iommu_resv_region *region;
+   int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+   prot, IOMMU_RESV_SW_MSI);


...and as an untranslatable hardware MSI region, not a software-managed one.

Robin.


+   if (!region)
+   return;
+
+	list_add_tail(&region->list, head);
+
+   iommu_dma_get_resv_regions(dev, head);
+}
+
+static void tegra_smmu_put_resv_regions(struct device *dev,
+   struct list_head *head)
+{
+   struct iommu_resv_region *entry, *next;
+
+   list_for_each_entry_safe(entry, next, head, list)
+   kfree(entry);
+}
+
  static const struct iommu_ops tegra_smmu_ops = {
.capable = tegra_smmu_capable,
.domain_alloc = tegra_smmu_domain_alloc,
@@ -896,6 +924,9 @@ static const struct iommu_ops tegra_smmu_ops = {
.iova_to_phys = tegra_smmu_iova_to_phys,
.of_xlate = tegra_smmu_of_xlate,
.pgsize_bitmap = SZ_4K,
+
+   .get_resv_regions = tegra_smmu_get_resv_regions,
+   .put_resv_regions = tegra_smmu_put_resv_regions,
  };
  
  static void tegra_smmu_ahb_enable(void)





Re: [PATCH 1/5] iommu/tegra-smmu: Fix domain_alloc

2019-01-17 Thread Robin Murphy

On 17/01/2019 15:13, Dmitry Osipenko wrote:

16.01.2019 23:50, Navneet Kumar пишет:

* Allocate dma iova cookie for a domain while adding dma iommu
devices.
* Perform a stricter check for domain type parameter.



Commit message should tell what exactly is getting "fixed". Apparently you're 
trying to support T132 ARM64 here.


Signed-off-by: Navneet Kumar 
---
  drivers/iommu/tegra-smmu.c | 43 +++
  1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 543f7c9..ee4d8a8 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -16,6 +16,7 @@
  #include 
  #include 
  #include 
+#include <linux/dma-iommu.h>
  
  #include 

  #include 
@@ -271,8 +272,10 @@ static bool tegra_smmu_capable(enum iommu_cap cap)
  static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
  {
struct tegra_smmu_as *as;
+   int ret;
  
-	if (type != IOMMU_DOMAIN_UNMANAGED)

+   if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
+   type != IOMMU_DOMAIN_IDENTITY)
return NULL;


Should be better to write these lines like this for the sake of readability:

if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_IDENTITY)
return NULL;


Furthermore, AFAICS there's no special handling being added for identity 
domains - don't pretend to support them by giving back a regular 
translation domain, that's just misleading and broken.





  
  	as = kzalloc(sizeof(*as), GFP_KERNEL);

@@ -281,26 +284,22 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
  
  	as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
  
+	ret = (type == IOMMU_DOMAIN_DMA) ? iommu_get_dma_cookie(&as->domain) :
+			-ENODEV;


This makes to fail allocation of domain that has type other than 
IOMMU_DOMAIN_DMA.

Should be:

if (type == IOMMU_DOMAIN_DMA) {
	err = iommu_get_dma_cookie(&as->domain);
if (err)
goto free_as;
}



+   if (ret)
+   goto free_as;
+
as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
-   if (!as->pd) {
-   kfree(as);
-   return NULL;
-   }
+   if (!as->pd)
+   goto put_dma_cookie;
  
  	as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);

-   if (!as->count) {
-   __free_page(as->pd);
-   kfree(as);
-   return NULL;
-   }
+   if (!as->count)
+   goto free_pd_range;
  
  	as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);

-   if (!as->pts) {
-   kfree(as->count);
-   __free_page(as->pd);
-   kfree(as);
-   return NULL;
-   }
+   if (!as->pts)
+   goto free_pts;
  
  	/* setup aperture */

as->domain.geometry.aperture_start = 0;
@@ -308,6 +307,18 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
as->domain.geometry.force_aperture = true;
  
	return &as->domain;

+
+free_pts:
+   kfree(as->pts);
+free_pd_range:
+   __free_page(as->pd);
+put_dma_cookie:
+   if (type == IOMMU_DOMAIN_DMA)


FWIW you don't strictly need that check - since domain->iova_cookie 
won't be set for other domain types anyway, iommu_put_dma_cookie() will 
simply ignore them.



+	iommu_put_dma_cookie(&as->domain);
+free_as:
+   kfree(as);
+
+   return NULL;
  }
  
  static void tegra_smmu_domain_free(struct iommu_domain *domain)


How about not leaking the cookie when you free a DMA ops domain?

Robin.

Re: [PATCH v2 2/2] swiotlb: checking whether swiotlb buffer is full with io_tlb_used

2019-01-17 Thread Konrad Rzeszutek Wilk
On Mon, Dec 10, 2018 at 08:37:58AM +0800, Dongli Zhang wrote:
> This patch uses io_tlb_used to help check whether swiotlb buffer is full.
> io_tlb_used is no longer used for only debugfs. It is also used to help
> optimize swiotlb_tbl_map_single().

Please split this up.

That is have the 'if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))'
as a separate patch.

And the #ifdef folding in the previous patch.

Also rebase on top of latest Linus please.

> 
> Suggested-by: Joe Jin 
> Signed-off-by: Dongli Zhang 
> ---
>  kernel/dma/swiotlb.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3979c2c..9300341 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,12 +76,10 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> -#ifdef CONFIG_DEBUG_FS
>  /*
>   * The number of used IO TLB block
>   */
>  static unsigned long io_tlb_used;
> -#endif
>  
>  /*
>   * This is a free list describing the number of free entries available from
> @@ -489,6 +487,10 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>* request and allocate a buffer from that IO TLB pool.
>*/
>   spin_lock_irqsave(&io_tlb_lock, flags);
> +
> + if (unlikely(nslots > io_tlb_nslabs - io_tlb_used))
> + goto not_found;
> +
>   index = ALIGN(io_tlb_index, stride);
>   if (index >= io_tlb_nslabs)
>   index = 0;
> @@ -538,9 +540,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used += nslots;
> -#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -602,9 +602,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
>   io_tlb_list[i] = ++count;
>  
> -#ifdef CONFIG_DEBUG_FS
>   io_tlb_used -= nslots;
> -#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> -- 
> 2.7.4
> 


Re: [PATCH 2/5] iommu/tegra-smmu: Use non-secure register for flushing

2019-01-17 Thread Dmitry Osipenko
16.01.2019 23:50, Navneet Kumar пишет:
> Use PTB_ASID instead of SMMU_CONFIG to flush smmu.
> PTB_ASID can be accessed from non-secure mode, SMMU_CONFIG cannot be.
> Using SMMU_CONFIG could pose a problem when kernel doesn't have secure
> mode access enabled from boot.
> 
> Signed-off-by: Navneet Kumar 
> ---
>  drivers/iommu/tegra-smmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index ee4d8a8..fa175d9 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -235,7 +235,7 @@ static inline void smmu_flush_tlb_group(struct tegra_smmu *smmu,
>  
>  static inline void smmu_flush(struct tegra_smmu *smmu)
>  {
> - smmu_readl(smmu, SMMU_CONFIG);
> + smmu_readl(smmu, SMMU_PTB_ASID);
>  }
>  
>  static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp)
> 

The driver writes to SMMU_CONFIG during probing. Looks like it's better to
drop this patch for now and make it part of a complete solution that adds
proper support for stricter insecure-mode platforms.

Re: [PATCH 1/5] iommu/tegra-smmu: Fix domain_alloc

2019-01-17 Thread Dmitry Osipenko
16.01.2019 23:50, Navneet Kumar пишет:
> * Allocate dma iova cookie for a domain while adding dma iommu
> devices.
> * Perform a stricter check for domain type parameter.
> 

Commit message should tell what exactly is getting "fixed". Apparently you're 
trying to support T132 ARM64 here.

> Signed-off-by: Navneet Kumar 
> ---
>  drivers/iommu/tegra-smmu.c | 43 +++
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
> index 543f7c9..ee4d8a8 100644
> --- a/drivers/iommu/tegra-smmu.c
> +++ b/drivers/iommu/tegra-smmu.c
> @@ -16,6 +16,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/dma-iommu.h>
>  
>  #include 
>  #include 
> @@ -271,8 +272,10 @@ static bool tegra_smmu_capable(enum iommu_cap cap)
>  static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
>  {
>   struct tegra_smmu_as *as;
> + int ret;
>  
> - if (type != IOMMU_DOMAIN_UNMANAGED)
> + if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA &&
> + type != IOMMU_DOMAIN_IDENTITY)
>   return NULL;

Should be better to write these lines like this for the sake of readability:

if (type != IOMMU_DOMAIN_UNMANAGED && 
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_IDENTITY)
return NULL;


>  
>   as = kzalloc(sizeof(*as), GFP_KERNEL);
> @@ -281,26 +284,22 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
>  
>   as->attr = SMMU_PD_READABLE | SMMU_PD_WRITABLE | SMMU_PD_NONSECURE;
>  
> +	ret = (type == IOMMU_DOMAIN_DMA) ? iommu_get_dma_cookie(&as->domain) :
> + -ENODEV;

This makes to fail allocation of domain that has type other than 
IOMMU_DOMAIN_DMA.

Should be:

if (type == IOMMU_DOMAIN_DMA) {
	err = iommu_get_dma_cookie(&as->domain);
if (err)
goto free_as;
}


> + if (ret)
> + goto free_as;
> +
>   as->pd = alloc_page(GFP_KERNEL | __GFP_DMA | __GFP_ZERO);
> - if (!as->pd) {
> - kfree(as);
> - return NULL;
> - }
> + if (!as->pd)
> + goto put_dma_cookie;
>  
>   as->count = kcalloc(SMMU_NUM_PDE, sizeof(u32), GFP_KERNEL);
> - if (!as->count) {
> - __free_page(as->pd);
> - kfree(as);
> - return NULL;
> - }
> + if (!as->count)
> + goto free_pd_range;
>  
>   as->pts = kcalloc(SMMU_NUM_PDE, sizeof(*as->pts), GFP_KERNEL);
> - if (!as->pts) {
> - kfree(as->count);
> - __free_page(as->pd);
> - kfree(as);
> - return NULL;
> - }
> + if (!as->pts)
> + goto free_pts;
>  
>   /* setup aperture */
>   as->domain.geometry.aperture_start = 0;
> @@ -308,6 +307,18 @@ static struct iommu_domain *tegra_smmu_domain_alloc(unsigned type)
>   as->domain.geometry.force_aperture = true;
>  
>   return &as->domain;
> +
> +free_pts:
> + kfree(as->pts);
> +free_pd_range:
> + __free_page(as->pd);
> +put_dma_cookie:
> + if (type == IOMMU_DOMAIN_DMA)
> +	iommu_put_dma_cookie(&as->domain);
> +free_as:
> + kfree(as);
> +
> + return NULL;
>  }
>  
>  static void tegra_smmu_domain_free(struct iommu_domain *domain)
> 


Re: Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"

2019-01-17 Thread Christoph Hellwig
On Thu, Jan 17, 2019 at 11:43:49AM +, Julien Grall wrote:
> Looking at the change for arm64, you will always call dma-direct API. In
> previous Linux version, xen-swiotlb will call dev->archdata.dev_dma_ops (a
> copy of dev->dma_ops before setting Xen DMA ops) if not NULL. Does it mean
> we expect dev->dma_ops to always be NULL and hence using dma-direct API?

The way I understood the code from inspecting it and asking the
maintainers a few questions is that for DOM0 we always use xen-swiotlb
as the actual dma_map_ops, but then use the functions in page-coherent.h
only to deal with cache maintenance, so it should be safe.


Re: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions

2019-01-17 Thread Robin Murphy

On 15/01/2019 17:37, Pierre Morel wrote:

The s390 iommu can only allow DMA transactions between the zPCI device
entries start_dma and end_dma.

Let's declare the regions before start_dma and after end_dma as
reserved regions using the appropriate callback in iommu_ops.

The reserved region may later be retrieved from sysfs or from
the vfio iommu internal interface.


For this particular case, I think the best solution is to give VFIO the 
ability to directly interrogate the domain geometry (which s390 appears 
to set correctly already). The idea of reserved regions was really for 
'unexpected' holes inside the usable address space - using them to also 
describe places that are entirely outside that address space rather 
confuses things IMO.


Furthermore, even if we *did* end up going down the route of actively 
reserving addresses beyond the usable aperture, it doesn't seem sensible 
for individual drivers to do it themselves when the core API already 
describes the relevant information generically.
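
As a sketch of what interrogating the geometry directly could look like on
the VFIO side (hypothetical helper, relying only on the existing struct
iommu_domain_geometry fields):

	static bool vfio_iova_in_aperture(struct iommu_domain *domain,
					  dma_addr_t iova, size_t size)
	{
		struct iommu_domain_geometry *geo = &domain->geometry;

		if (!geo->force_aperture)
			return true;	/* no hard limit advertised */

		return iova >= geo->aperture_start &&
		       iova + size - 1 <= geo->aperture_end;
	}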


Robin.



This seems to me related with the work Shameer has started on
vfio_iommu_type1 so I add Alex and Shameer to the CC list.


Pierre Morel (1):
   iommu/s390: Declare s390 iommu reserved regions

  drivers/iommu/s390-iommu.c | 29 +
  1 file changed, 29 insertions(+)




Re: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions

2019-01-17 Thread Pierre Morel

On 17/01/2019 10:23, Shameerali Kolothum Thodi wrote:

Hi Pierre,


-Original Message-
From: Pierre Morel [mailto:pmo...@linux.ibm.com]
Sent: 15 January 2019 17:37
To: gerald.schae...@de.ibm.com
Cc: j...@8bytes.org; linux-s...@vger.kernel.org;
iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
alex.william...@redhat.com; Shameerali Kolothum Thodi
; wall...@linux.ibm.com
Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions

The s390 iommu can only allow DMA transactions between the zPCI device
entries start_dma and end_dma.

Let's declare the regions before start_dma and after end_dma as
reserved regions using the appropriate callback in iommu_ops.

The reserved region may later be retrieved from sysfs or from
the vfio iommu internal interface.


Just in case you are planning to use the sysfs interface to retrieve the valid
regions, and intend to use that in Qemu vfio path, please see the discussion
here [1] (If you haven't seen this already)

Thanks,
Shameer

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html
  

This seems to me related with the work Shameer has started on
vfio_iommu_type1 so I add Alex and Shameer to the CC list.

Pierre Morel (1):
   iommu/s390: Declare s390 iommu reserved regions

  drivers/iommu/s390-iommu.c | 29 +
  1 file changed, 29 insertions(+)

--
2.7.4




Thanks Shameer,

Interesting discussion indeed.

AFAIK the patch series you are working on will provide a way to retrieve 
the reserved region through the VFIO IOMMU interface, using capabilities 
in the VFIO_IOMMU_GET_INFO.
Before this patch, the iommu_type1 was not able to retrieve the 
forbidden region from the s390_iommu.


See, this patch is a contribution so that these regions will appear in
the reserved list once VFIO_IOMMU_GET_INFO is able to report the
information.


I am expecting to be able to retrieve this information from the
VFIO_IOMMU_GET_INFO syscall as soon as it is available.


Regards,
Pierre


--
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany


Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
Joerg,

On 1/17/19 3:44 PM, Suravee Suthikulpanit wrote:
> Joerg,
> 
> On 1/17/19 12:08 AM, j...@8bytes.org wrote:
>> On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote:
>>> Actually, I am not sure how we would be missing the flush on the last 
>>> device.
>>> In my test, I am seeing the flush command being issued correctly during
>>> vfio_unmap_unpin(), which is after all devices are detached.
>>> Although, I might be missing your point here. Could you please elaborate?
>>
>> Okay, you are right, the patch effectivly adds an unconditional flush of
>> the domain on all iommus when the last device is detached. So it is
>> correct in that regard. But that code path is also used in the
>> iommu_unmap() path.
>>
>> The problem now is, that with your change we send flush commands to all
>> IOMMUs in the unmap path when no device is attached to the domain.
>> This will hurt performance there, no?
>>
>> Regards,
>>
>> Joerg
>>
> 
> Sounds like we need a way to track state of each IOMMU for a domain.
> What if we define the following:
> 
>    enum IOMMU_DOMAIN_STATES {
>      DOMAIN_FREE = -1,
>      DOMAIN_DETACHED = 0,
>      DOMAIN_ATTACHED >= 1
>    }
> 
> We should be able to use the dev_iommu[] to help track the state.
>      - In amd_iommu_domain_alloc, we initialize the array to DOMAIN_FREE
>      - In do_attach(), we change to DOMAIN_ATTACH or we can increment the 
> count
>    if it is already in DOMAIN_ATTACH state.
>      - In do_detach(). we change to DOMAIN_DETACH.
> 
> Then, in __domain_flush_pages, we issue command when the dev_iommu[] >= 0.
> This should preserve previous behavior, and only add flushing condition to
> the specific IOMMU in detached state. Please let me know what you think.
> 
> Regards,
> Suravee

By the way, I just sent V2 of this patch since it might be more clear.

Regards,
Suravee

[PATCH v2] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
From: Suravee Suthikulpanit 

When a VM is terminated, the VFIO driver detaches all pass-through
devices from VFIO domain by clearing domain id and page table root
pointer from each device table entry (DTE), and then invalidates
the DTE. Then, the VFIO driver unmaps pages and invalidates IOMMU pages.

Currently, the IOMMU driver keeps track of which IOMMU and how many
devices are attached to the domain. When invalidating IOMMU pages,
the driver checks if the IOMMU is still attached to the domain before
issuing the invalidate page command.

However, since VFIO has already detached all devices from the domain,
the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as
there is no IOMMU attached to the domain. This results in data
corruption and could cause the PCI device to end up in an indeterministic
state.

Fix this by introducing the IOMMU domain states attached, detached and
free, and by only issuing the IOMMU page invalidation command when the
domain is in the attached or detached state.
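
The intended semantics of the per-IOMMU state field (a summary sketch,
matching the hunks below):

	/* dev_iommu[i] == IOMMU_DOMAIN_STATE_FREE (-1):
	 *	no device behind IOMMU i was ever attached; skip flushes.
	 * dev_iommu[i] == IOMMU_DOMAIN_STATE_DETACHED (0):
	 *	devices were attached and have since been detached;
	 *	invalidation commands are still sent, since VFIO unmaps
	 *	only after detaching.
	 * dev_iommu[i] >= IOMMU_DOMAIN_STATE_ATTACHED (1):
	 *	acts as the usual per-IOMMU reference count.
	 */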

Cc: Boris Ostrovsky 
Signed-off-by: Brijesh Singh 
Signed-off-by: Suravee Suthikulpanit 
---
 drivers/iommu/amd_iommu.c   | 25 -
 drivers/iommu/amd_iommu_types.h |  8 +++-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 525659b88ade..1c64a26c50fb 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1248,7 +1248,14 @@ static void __domain_flush_pages(struct protection_domain *domain,
	build_inv_iommu_pages(&cmd, address, size, domain->id, pde);
 
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (!domain->dev_iommu[i])
+   /*
+* We issue the command only when the domain is
+* in ATTACHED and DETACHED state. This is required
+* since VFIO detaches all devices from the group
+* before invalidating IOMMU pages. Please see
+* vfio_iommu_type1_detach_group().
+*/
+   if (domain->dev_iommu[i] == IOMMU_DOMAIN_STATE_FREE)
continue;
 
/*
@@ -1292,7 +1299,8 @@ static void domain_flush_complete(struct protection_domain *domain)
int i;
 
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (domain && !domain->dev_iommu[i])
+   if (domain &&
+   domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED)
continue;
 
/*
@@ -1978,8 +1986,11 @@ static void do_attach(struct iommu_dev_data *dev_data,
	list_add(&dev_data->list, &domain->dev_list);
 
/* Do reference counting */
-   domain->dev_iommu[iommu->index] += 1;
-   domain->dev_cnt += 1;
+   if (domain->dev_iommu[iommu->index] < IOMMU_DOMAIN_STATE_ATTACHED)
+   domain->dev_iommu[iommu->index] = IOMMU_DOMAIN_STATE_ATTACHED;
+   else
+   domain->dev_iommu[iommu->index] += 1;
+   domain->dev_cnt += 1;
 
/* Update device table */
set_dte_entry(dev_data->devid, domain, ats, dev_data->iommu_v2);
@@ -2904,6 +2915,7 @@ static int protection_domain_init(struct protection_domain *domain)
 
 static struct protection_domain *protection_domain_alloc(void)
 {
+   int i;
struct protection_domain *domain;
 
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
@@ -2915,6 +2927,9 @@ static struct protection_domain *protection_domain_alloc(void)
 
add_domain_to_list(domain);
 
+   for (i = 0; i < MAX_IOMMUS; i++)
+   domain->dev_iommu[i] = IOMMU_DOMAIN_STATE_FREE;
+
return domain;
 
 out_err:
@@ -3364,7 +3379,7 @@ static int __flush_pasid(struct protection_domain *domain, int pasid,
 * prevent device TLB refill from IOMMU TLB
 */
for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
-   if (domain->dev_iommu[i] == 0)
+   if (domain->dev_iommu[i] < IOMMU_DOMAIN_STATE_ATTACHED)
continue;
 
	ret = iommu_queue_command(amd_iommus[i], &cmd);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index eae0741f72dc..6d17cdcfa306 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -461,6 +461,12 @@ struct amd_irte_ops;
 
 #define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED  (1 << 0)
 
+enum IOMMU_DOMAIN_STATES {
+   IOMMU_DOMAIN_STATE_FREE = -1,
+   IOMMU_DOMAIN_STATE_DETACHED,
+   IOMMU_DOMAIN_STATE_ATTACHED
+};
+
 /*
  * This structure contains generic data for  IOMMU protection domains
  * independent of their use.
@@ -480,7 +486,7 @@ struct protection_domain {
unsigned long flags;/* flags to find out type of domain */
bool updated;   /* complete domain flush required */
unsigned dev_cnt;   /* devices assigned to this domain */
-   unsigned dev_iommu[MAX_IOMMUS]; /* per-IOMMU reference count */

Re: Fail to boot Dom0 on Xen Arm64 after "dma-mapping: bypass indirect calls for dma-direct"

2019-01-17 Thread Julien Grall

(+ Leo Yan who reported a similar issues on xen-devel)

Hi Christoph,

On 16/01/2019 18:19, Christoph Hellwig wrote:

Does this fix your problem?


Thank you for the patch. This allows me to boot Dom0 on Juno r2.

My knowledge of DMA is quite limited, so I may have missed something.

Looking at the change for arm64, you will always call dma-direct API. In 
previous Linux version, xen-swiotlb will call dev->archdata.dev_dma_ops (a copy 
of dev->dma_ops before setting Xen DMA ops) if not NULL. Does it mean we expect 
dev->dma_ops to always be NULL and hence using dma-direct API?


Cheers,

--
Julien Grall


Re: [PATCH 0/9] Use vm_insert_range and vm_insert_range_buggy

2019-01-17 Thread Souptick Joarder
On Fri, Jan 11, 2019 at 8:31 PM Souptick Joarder  wrote:
>
> Previously, drivers had their own way of mapping a range of
> kernel pages/memory into a user vma, done by
> invoking vm_insert_page() within a loop.
>
> As this pattern is common across different drivers, it can
> be generalized by creating new functions and use it across
> the drivers.
>
> vm_insert_range() is the API which could be used to mapped
> kernel memory/pages in drivers which has considered vm_pgoff
>
> vm_insert_range_buggy() is the API which could be used to map
> range of kernel memory/pages in drivers which has not considered
> vm_pgoff. vm_pgoff is passed default as 0 for those drivers.
>
> We _could_ then at a later "fix" these drivers which are using
> vm_insert_range_buggy() to behave according to the normal vm_pgoff
> offsetting simply by removing the _buggy suffix on the function
> name and if that causes regressions, it gives us an easy way to revert.
>
> There is an existing bug in [7/9], where the user-passed length is not
> verified against object_count. For any value of length > object_count
> it will end up overrunning the page array, which could lead to a
> potential bug. This is fixed as part of this conversion.
>
> Souptick Joarder (9):
>   mm: Introduce new vm_insert_range and vm_insert_range_buggy API
>   arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
>   drivers/firewire/core-iso.c: Convert to use vm_insert_range_buggy
>   drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
>   drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
>   iommu/dma-iommu.c: Convert to use vm_insert_range
>   videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range_buggy
>   xen/gntdev.c: Convert to use vm_insert_range
>   xen/privcmd-buf.c: Convert to use vm_insert_range_buggy

Any further comment on these patches ?

>
>  arch/arm/mm/dma-mapping.c | 22 ++
>  drivers/firewire/core-iso.c   | 15 +
>  drivers/gpu/drm/rockchip/rockchip_drm_gem.c   | 17 +
>  drivers/gpu/drm/xen/xen_drm_front_gem.c   | 18 ++---
>  drivers/iommu/dma-iommu.c | 12 +---
>  drivers/media/common/videobuf2/videobuf2-dma-sg.c | 22 ++
>  drivers/xen/gntdev.c  | 16 ++---
>  drivers/xen/privcmd-buf.c |  8 +--
>  include/linux/mm.h|  4 ++
>  mm/memory.c   | 81 +++
>  mm/nommu.c| 14 
>  11 files changed, 129 insertions(+), 100 deletions(-)
>
> --
> 1.9.1
>
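
For context, the conversion pattern the series enables (a sketch; the exact
vm_insert_range() signature is assumed from the cover letter, roughly
vm_insert_range(vma, pages, num)):

	/* before: open-coded loop in a driver's mmap handler */
	for (i = 0; i < count; i++) {
		ret = vm_insert_page(vma, uaddr, pages[i]);
		if (ret)
			return ret;
		uaddr += PAGE_SIZE;
	}

	/* after: one call, which can also validate the request against
	 * the vma size and vm_pgoff in a single place */
	ret = vm_insert_range(vma, pages, count);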


Re: [RFC PATCH] i2c: imx: dma map the i2c data i/o register

2019-01-17 Thread Laurentiu Tudor
Hi Robin,

On 16.01.2019 19:55, Robin Murphy wrote:
> On 16/01/2019 16:17, Laurentiu Tudor wrote:
>> This is an attempt to fix an iommu exception when doing dma to the
>> i2c controller with EDMA. Without these mappings the smmu raises a
>> context fault [1] exactly with the address of the i2c data i/o reg.
>> This was seen on an NXP LS1043A chip while working on enabling SMMU.
> 
> Rather than gradually adding much the same code to potentially every 
> possible client driver, can it not be implemented once in the edma 
> driver as was done for pl330 and rcar-dmac? That also sidesteps any of 
> the nastiness of smuggling a dma_addr_t via a phys_addr_t variable.

Thanks for the pointer. I was actually unsure where this should be 
tackled: either i2c or dma side. Plus I somehow managed to completely 
miss the support added in the dma drivers you mention. I'll start 
looking into stealing some of your code [1]. :-)

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4d6d74e22096543cb3b35e717cf1b9aea3655f37
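
For reference, the shape of that approach: the DMA engine driver maps the
slave FIFO register itself with dma_map_resource() when the slave is
configured, roughly (a sketch, not the actual rcar-dmac code):

	/* in the dmaengine driver's slave-config path */
	dma_addr_t map = dma_map_resource(chan->device->dev, slave_phys_addr,
					  reg_size, dir, 0);
	if (dma_mapping_error(chan->device->dev, map))
		return -EIO;
	/* use 'map' as the hardware address for transfers, and
	 * dma_unmap_resource() it when the channel is released */

This keeps client drivers such as i2c-imx unaware of IOMMU details.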

---
Best Regards, Laurentiu


> 
>> [1] arm-smmu 900.iommu: Unhandled context fault: fsr=0x402,
>>  iova=0x02180004, fsynr=0x150021, cb=7
>>
>> Signed-off-by: Laurentiu Tudor 
>> ---
>>   drivers/i2c/busses/i2c-imx.c | 57 +---
>>   1 file changed, 47 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
>> index 4e34b1572756..07cc8f4b45b9 100644
>> --- a/drivers/i2c/busses/i2c-imx.c
>> +++ b/drivers/i2c/busses/i2c-imx.c
>> @@ -202,6 +202,9 @@ struct imx_i2c_struct {
>>   struct pinctrl_state *pinctrl_pins_gpio;
>>   struct imx_i2c_dma    *dma;
>> +
>> +    dma_addr_t    dma_tx_addr;
>> +    dma_addr_t    dma_rx_addr;
>>   };
>>   static const struct imx_i2c_hwdata imx1_i2c_hwdata = {
>> @@ -274,17 +277,20 @@ static inline unsigned char imx_i2c_read_reg(struct imx_i2c_struct *i2c_imx,
>>   /* Functions for DMA support */
>>   static int i2c_imx_dma_request(struct imx_i2c_struct *i2c_imx,
>> -   dma_addr_t phy_addr)
>> +   phys_addr_t phy_addr)
>>   {
>>   struct imx_i2c_dma *dma;
>>   struct dma_slave_config dma_sconfig;
>>   struct device *dev = &i2c_imx->adapter.dev;
>>   int ret;
>> +    phys_addr_t i2dr_pa;
>>   dma = devm_kzalloc(dev, sizeof(*dma), GFP_KERNEL);
>>   if (!dma)
>>   return -ENOMEM;
>> +    i2dr_pa = phy_addr + (IMX_I2C_I2DR << i2c_imx->hwdata->regshift);
>> +
>>   dma->chan_tx = dma_request_chan(dev, "tx");
>>   if (IS_ERR(dma->chan_tx)) {
>>   ret = PTR_ERR(dma->chan_tx);
>> @@ -293,15 +299,25 @@ static int i2c_imx_dma_request(struct imx_i2c_struct *i2c_imx,
>>   goto fail_al;
>>   }
>> -    dma_sconfig.dst_addr = phy_addr +
>> -    (IMX_I2C_I2DR << i2c_imx->hwdata->regshift);
>> +    i2c_imx->dma_tx_addr = dma_map_resource(dma->chan_tx->device->dev,
>> +    i2dr_pa,
>> +    DMA_SLAVE_BUSWIDTH_1_BYTE,
>> +    DMA_MEM_TO_DEV, 0);
>> +    ret = dma_mapping_error(dma->chan_tx->device->dev,
>> +    i2c_imx->dma_tx_addr);
>> +    if (ret) {
>> +    dev_err(dev, "can't dma map tx destination (%d)\n", ret);
>> +    goto fail_tx;
>> +    }
>> +
>> +    dma_sconfig.dst_addr = i2c_imx->dma_tx_addr;
>>   dma_sconfig.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
>>   dma_sconfig.dst_maxburst = 1;
>>   dma_sconfig.direction = DMA_MEM_TO_DEV;
>>   ret = dmaengine_slave_config(dma->chan_tx, &dma_sconfig);
>>   if (ret < 0) {
>>   dev_err(dev, "can't configure tx channel (%d)\n", ret);
>> -    goto fail_tx;
>> +    goto fail_tx_dma;
>>   }
>>   dma->chan_rx = dma_request_chan(dev, "rx");
>> @@ -309,18 +325,28 @@ static int i2c_imx_dma_request(struct imx_i2c_struct *i2c_imx,
>>   ret = PTR_ERR(dma->chan_rx);
>>   if (ret != -ENODEV && ret != -EPROBE_DEFER)
>>   dev_err(dev, "can't request DMA rx channel (%d)\n", ret);
>> -    goto fail_tx;
>> +    goto fail_tx_dma;
>>   }
>> -    dma_sconfig.src_addr = phy_addr +
>> -    (IMX_I2C_I2DR << i2c_imx->hwdata->regshift);
>> +    i2c_imx->dma_rx_addr = dma_map_resource(dma->chan_rx->device->dev,
>> +    i2dr_pa,
>> +    DMA_SLAVE_BUSWIDTH_1_BYTE,
>> +    DMA_DEV_TO_MEM, 0);
>> +    ret = dma_mapping_error(dma->chan_rx->device->dev,
>> +    i2c_imx->dma_rx_addr);
>> +    if (ret) {
>> +    dev_err(dev, "can't dma map rx source (%d)\n", ret);
>> +    goto fail_rx;
>> +    }
>> +
>> +    dma_sconfig.src_addr = i2c_imx->dma_rx_addr;
>>   dma_sconfig.src_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
>>   dma_sconfig.src_maxburst = 1;
>>   dma_sconfig.direction = DMA_DEV_TO_MEM;
>>   ret = dmaengine_slave_config(dma->chan_rx, &dma_sconfig);
>>   if (ret < 0) 

Re: use generic DMA mapping code in powerpc V4

2019-01-17 Thread Christoph Hellwig
On Thu, Jan 17, 2019 at 10:21:11AM +0100, Christian Zigotzky wrote:
> The X1000 boots and the PASEMI onboard ethernet works!
>
> Bad news for the X5000 (P5020 board). U-Boot loads the kernel and the dtb 
> file. Then the kernel starts but it doesn't find any hard disks 
> (partitions).

Thanks for bisecting so far, and lets stop here until I manage to
resolve the problem.

Can you send me the .config and the dtb file for this board?


[PATCH 2/2] iommu/arm-smmu: Add support for non-coherent page table mappings

2019-01-17 Thread Vivek Gautam
Add a device tree option for arm smmu to enable non-cacheable memory for
page tables.
We already enable an smmu feature for coherent walk based on whether the
smmu device is dma-coherent or not. Add an option to force non-cacheable
page table memory for particular smmu devices.

Signed-off-by: Vivek Gautam 
---
 drivers/iommu/arm-smmu.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index af18a7e7f917..7ebbcf1b2eb3 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -188,6 +188,7 @@ struct arm_smmu_device {
u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_PGTBL_NON_COHERENT (1 << 1)
u32 options;
enum arm_smmu_arch_version  version;
enum arm_smmu_implementationmodel;
@@ -273,6 +274,7 @@ static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+   { ARM_SMMU_OPT_PGTBL_NON_COHERENT, "arm,smmu-pgtable-non-coherent" },
{ 0, NULL},
 };
 
@@ -902,6 +904,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+   /* Non coherent page table mappings only for Stage-1 */
+   if (smmu->options & ARM_SMMU_OPT_PGTBL_NON_COHERENT &&
+   smmu_domain->stage == ARM_SMMU_DOMAIN_S1)
+   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_COHERENT;
+
smmu_domain->smmu = smmu;
	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops) {
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 0/2] iommu/arm: Add support for non-coherent page tables

2019-01-17 Thread Vivek Gautam
As discussed in the Qcom system cache support thread [1], it is
imperative that we enable support for non-cacheable page tables on SMMU
implementations for which removing snoop latency on walks, by making the
mappings non-cacheable, outweighs the cost of cache maintenance on PTE
updates.

This series adds a new SMMU device tree option to let the particular
SMMU configuration setup cacheable or non-cacheable mappings for
page-tables out of box. We set a new quirk for i/o page tables -
IO_PGTABLE_QUIRK_NON_COHERENT, that lets us set different TCR
configurations.

This quirk enables non-cacheable page tables for all masters
sitting on the SMMU. Should this control be available per smmu_domain,
as each master may have a different perf requirement?
Enabling this for the entire SMMU may not be desirable for all
masters.
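
One possible shape for such a per-domain control - purely a
hypothetical sketch, where DOMAIN_ATTR_PGTBL_NON_COHERENT and the
pgtbl_non_coherent field are invented names, not existing kernel API -
would be to plumb it through arm_smmu_domain_set_attr():

static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
				    enum iommu_attr attr, void *data)
{
	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);

	switch (attr) {
	case DOMAIN_ATTR_PGTBL_NON_COHERENT:	/* invented for illustration */
		/* latched later, when the domain context is initialised */
		smmu_domain->pgtbl_non_coherent = *(bool *)data;
		return 0;
	default:
		return -ENODEV;
	}
}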

[1] https://lore.kernel.org/patchwork/patch/1020906/

Vivek Gautam (2):
  iommu/io-pgtable-arm: Add support for non-coherent page tables
  iommu/arm-smmu: Add support for non-coherent page table mappings

 drivers/iommu/arm-smmu.c   |  7 +++
 drivers/iommu/io-pgtable-arm.c | 17 -
 drivers/iommu/io-pgtable.h |  6 ++
 3 files changed, 25 insertions(+), 5 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/2] iommu/io-pgtable-arm: Add support for non-coherent page tables

2019-01-17 Thread Vivek Gautam
From Robin's comment [1] about touching TCR configurations -

"TBH if we're going to touch the TCR attributes at all then we should
probably correct that sloppiness first - there's an occasional argument
for using non-cacheable pagetables even on a coherent SMMU if reducing
snoop traffic/latency on walks outweighs the cost of cache maintenance
on PTE updates, but anyone thinking they can get that by overriding
dma-coherent silently gets the worst of both worlds thanks to this
current TCR value."

We already have the IO_PGTABLE_QUIRK_NO_DMA quirk, but nothing forces
anybody _not_ using a dma-coherent smmu to have non-cacheable page
table mappings.
Adding another quirk flag lets us request non-cacheable memory for
page tables explicitly, once and for all.

[1] https://lore.kernel.org/patchwork/patch/1020906/
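
For reference, the TCR walk-attribute encodings the patch relies on,
as defined in io-pgtable-arm.c (values per the Arm ARM; quoted here
for convenience, not added by this patch):

#define ARM_LPAE_TCR_IRGN0_SHIFT	8	/* inner cacheability */
#define ARM_LPAE_TCR_ORGN0_SHIFT	10	/* outer cacheability */
#define ARM_LPAE_TCR_SH0_SHIFT		12	/* shareability */
#define ARM_LPAE_TCR_RGN_NC		0	/* Non-cacheable */
#define ARM_LPAE_TCR_RGN_WBWA		1	/* Write-Back Write-Allocate */
#define ARM_LPAE_TCR_SH_IS		3	/* Inner Shareable */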

Signed-off-by: Vivek Gautam 
---
 drivers/iommu/io-pgtable-arm.c | 17 -
 drivers/iommu/io-pgtable.h |  6 ++
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 237cacd4a62b..c76919c30f1a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -780,7 +780,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
struct arm_lpae_io_pgtable *data;
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA |
-   IO_PGTABLE_QUIRK_NON_STRICT))
+   IO_PGTABLE_QUIRK_NON_STRICT |
+   IO_PGTABLE_QUIRK_NON_COHERENT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -788,9 +789,14 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
return NULL;
 
/* TCR */
-   reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
- (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
- (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+   reg = ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT;
+
+   if (cfg->quirks & IO_PGTABLE_QUIRK_NON_COHERENT)
+   reg |= ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT |
+  ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT;
+   else
+   reg |= ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT |
+  ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT;
 
switch (ARM_LPAE_GRANULE(data)) {
case SZ_4K:
@@ -873,7 +879,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
/* The NS quirk doesn't apply at stage 2 */
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_NO_DMA |
-   IO_PGTABLE_QUIRK_NON_STRICT))
+   IO_PGTABLE_QUIRK_NON_STRICT |
+   IO_PGTABLE_QUIRK_NON_COHERENT))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 47d5ae559329..46604cf7b017 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -75,6 +75,11 @@ struct io_pgtable_cfg {
 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 *  on unmap, for DMA domains using the flush queue mechanism for
 *  delayed invalidation.
+*
+* IO_PGTABLE_QUIRK_NON_COHERENT: Enforce non-cacheable mappings for
+*  pagetables even on a coherent SMMU for cases where reducing
+*  snoop traffic/latency on walks outweighs the cost of cache
+*  maintenance on PTE updates.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -82,6 +87,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_MTK_4GBBIT(3)
#define IO_PGTABLE_QUIRK_NO_DMA BIT(4)
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(5)
+   #define IO_PGTABLE_QUIRK_NON_COHERENT   BIT(6)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions

2019-01-17 Thread Shameerali Kolothum Thodi
Hi Pierre,

> -Original Message-
> From: Pierre Morel [mailto:pmo...@linux.ibm.com]
> Sent: 15 January 2019 17:37
> To: gerald.schae...@de.ibm.com
> Cc: j...@8bytes.org; linux-s...@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> alex.william...@redhat.com; Shameerali Kolothum Thodi
> ; wall...@linux.ibm.com
> Subject: [PATCH v1] iommu/s390: Declare s390 iommu reserved regions
> 
> The s390 iommu can only allow DMA transactions between the zPCI device
> entries start_dma and end_dma.
> 
> Let's declare the regions before start_dma and after end_dma as
> reserved regions using the appropriate callback in iommu_ops.
> 
> The reserved region may later be retrieved from sysfs or from
> the vfio iommu internal interface.

Just in case you are planning to use the sysfs interface to retrieve the
valid regions, and intend to use that in the QEMU vfio path, please see
the discussion here [1] (if you haven't seen it already).

Thanks,
Shameer

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03651.html
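
For readers following along, the usual shape of such a get_resv_regions
callback - a rough, untested editorial sketch under the assumption that
the zPCI start_dma/end_dma fields bound the usable DMA window (the ~0UL
upper bound is an assumption, not taken from the patch) - is:

static void s390_iommu_get_resv_regions(struct device *dev,
					struct list_head *list)
{
	struct zpci_dev *zdev = to_pci_dev(dev)->sysdata;
	struct iommu_resv_region *region;

	/* DMA addresses below start_dma are not translatable: reserve them */
	if (zdev->start_dma) {
		region = iommu_alloc_resv_region(0, zdev->start_dma, 0,
						 IOMMU_RESV_RESERVED);
		if (region)
			list_add_tail(&region->list, list);
	}

	/* Likewise everything above end_dma */
	if (zdev->end_dma < ~0UL) {
		region = iommu_alloc_resv_region(zdev->end_dma + 1,
						 ~0UL - zdev->end_dma, 0,
						 IOMMU_RESV_RESERVED);
		if (region)
			list_add_tail(&region->list, list);
	}
}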
 
> This seems to me related to the work Shameer has started on
> vfio_iommu_type1 so I add Alex and Shameer to the CC list.
> 
> Pierre Morel (1):
>   iommu/s390: Declare s390 iommu reserved regions
> 
>  drivers/iommu/s390-iommu.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> --
> 2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: use generic DMA mapping code in powerpc V4

2019-01-17 Thread Christian Zigotzky

Hi All,

I compiled the fixed commit '257002094bc5935dd63207a380d9698ab81f0775'
(powerpc/dma: use the dma-direct allocator for coherent platforms) today.


git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.6 a

git checkout 257002094bc5935dd63207a380d9698ab81f0775

Link to the Git: 
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/powerpc-dma.6 



env LANG=C make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc zImage

env LANG=C make CROSS_COMPILE=powerpc-linux-gnu- ARCH=powerpc uImage

The X1000 boots and the PASEMI onboard ethernet works!

Bad news for the X5000 (P5020 board). U-Boot loads the kernel and the 
dtb file. Then the kernel starts but it doesn't find any hard disks 
(partitions).


Cheers,
Christian


On 15 January 2019 at 4:17PM, Christoph Hellwig wrote:

So 257002094bc5935dd63207a380d9698ab81f0775 above is the fixed version
of the commit - it switches around the ifdef in dma.c that I had
inverted.  Can you try that one instead?  And then move on with the
commits after it in the updated powerpc-dma.6 branch - they are
identical to the original branch except for carrying this fix forward.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH] iommu/amd: Fix IOMMU page flush when detach all devices from a domain

2019-01-17 Thread Suthikulpanit, Suravee
Joerg,

On 1/17/19 12:08 AM, j...@8bytes.org wrote:
> On Wed, Jan 16, 2019 at 02:08:55PM +, Suthikulpanit, Suravee wrote:
>> Actually, I am not sure how we would be missing the flush on the last device.
>> In my test, I am seeing the flush command being issued correctly during
>> vfio_unmap_unpin(), which is after all devices are detached.
>> Although, I might be missing your point here. Could you please elaborate?
> 
> Okay, you are right, the patch effectively adds an unconditional flush of
> the domain on all iommus when the last device is detached. So it is
> correct in that regard. But that code path is also used in the
> iommu_unmap() path.
> 
> The problem now is that, with your change, we send flush commands to all
> IOMMUs in the unmap path when no device is attached to the domain.
> This will hurt performance there, no?
> 
> Regards,
> 
>   Joerg
> 

Sounds like we need a way to track the state of each IOMMU for a domain.
What if we define the following:

   enum iommu_domain_state {
	DOMAIN_FREE     = -1,	/* this IOMMU does not reference the domain */
	DOMAIN_DETACHED = 0,	/* the last device was detached */
	/* values >= 1 (DOMAIN_ATTACHED) count the attached devices */
   };

We should be able to use dev_iommu[] to track the state:
 - In amd_iommu_domain_alloc(), initialize the array to DOMAIN_FREE.
 - In do_attach(), change the state to DOMAIN_ATTACHED, or increment the
   count if it is already in the DOMAIN_ATTACHED state.
 - In do_detach(), change the state to DOMAIN_DETACHED.

Then, in __domain_flush_pages(), we issue the command when dev_iommu[] >= 0.
This preserves the previous behavior, and only adds a flush condition for
the specific IOMMU in the detached state. A rough sketch follows; please
let me know what you think.
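
An untested illustration of the proposal, reusing the existing loop body
of __domain_flush_pages() (i, ret and cmd are the locals already there;
the state values are from the enum proposed above):

	for (i = 0; i < amd_iommu_get_num_iommus(); ++i) {
		/* DOMAIN_FREE (-1): this IOMMU never saw the domain, skip */
		if (domain->dev_iommu[i] < 0)
			continue;

		/*
		 * DOMAIN_DETACHED (0) or attached (>= 1): the IOMMU may
		 * still hold TLB entries for this domain, so flush it.
		 */
		ret |= iommu_queue_command(amd_iommus[i], &cmd);
	}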

Regards,
Suravee
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu